Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/10 1:06, Mike Christie wrote: On 07/06/2018 08:28 PM, Xiubo Li wrote: On 2018/7/7 2:23, Mike Christie wrote: On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: static irqreturn_t uio_interrupt(int irq, void *dev_id) { struct uio_device *idev = (struct uio_device *)dev_id; -irqreturn_t ret = idev->info->handler(irq, idev->info); +irqreturn_t ret; + +mutex_lock(&idev->info_lock); +if (!idev->info) { +ret = IRQ_NONE; +goto out; +} +ret = idev->info->handler(irq, idev->info); if (ret == IRQ_HANDLED) uio_event_notify(idev->info); +out: +mutex_unlock(&idev->info_lock); return ret; } Do you need the interrupt related changes in this patch and the first one? Actually, the NULL checking is not a must, we can remove this. But the lock/unlock is needed. When we do uio_unregister_device -> free_irq does free_irq return when there are no longer running interrupt handlers that we requested? If that is not the case then I think we can hit a similar bug. We do: __uio_register_device -> device_register -> device's refcount goes to zero so we do -> uio_device_release -> kfree(idev) and if it is possible the interrupt handler could still run after free_irq then we would end up doing: uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed memory. I think this shouldn't happen. Because the free_irq function does not return until any executing interrupts for this IRQ have completed. If free_irq returns after executing interrupts and does not allow new executions what is the lock protecting in uio_interrupt? I meant idev->info->handler(irq, idev->info), it may should be protected by the related lock in each driver. Thanks,
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/10 0:40, Mike Christie wrote: On 07/06/2018 08:47 PM, Xiubo Li wrote: On 2018/7/7 2:58, Mike Christie wrote: On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: void uio_event_notify(struct uio_info *info) { -struct uio_device *idev = info->uio_dev; +struct uio_device *idev; + +if (!info) +return; + +idev = info->uio_dev; For this one too, I am not sure if it is needed. uio_interrupt -> uio_event_notify. See other mail. driver XYZ -> uio_event_notify. I think drivers need to handle this and set some bits and/or perform some cleanup to make sure they are not calling uio_event_notify after it has called uio_unregister_device. The problem with the above test is if they do not they could have called uio_unregister_device right after the info test so you could still hit the problem. When we are tcmu_destroy_device(), if the netlink notify event to userspace is not successful then the TCMU will call the uio unregister, which will set the idev->info = NULL, without close and deleting the device in userspace. But the TCMU could still queue cmds to the ring buffer, then the uio_event_notify will be called. Before tcmu_destroy_device is called LIO will have stopped new IO and waited for outstanding IO to be completed, so it would never be called after uio_unregister_device. If it does, it's a bug in LIO. Yeah, So this should be fixed now. Only hit one crash seem related to uio of this code with lower kernel before. Thanks, BRs
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 07/06/2018 08:28 PM, Xiubo Li wrote: > On 2018/7/7 2:23, Mike Christie wrote: >> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: >>> static irqreturn_t uio_interrupt(int irq, void *dev_id) >>> { >>> struct uio_device *idev = (struct uio_device *)dev_id; >>> -irqreturn_t ret = idev->info->handler(irq, idev->info); >>> +irqreturn_t ret; >>> + >>> +mutex_lock(&idev->info_lock); >>> +if (!idev->info) { >>> +ret = IRQ_NONE; >>> +goto out; >>> +} >>> +ret = idev->info->handler(irq, idev->info); >>> if (ret == IRQ_HANDLED) >>> uio_event_notify(idev->info); >>> +out: >>> +mutex_unlock(&idev->info_lock); >>> return ret; >>> } >> >> Do you need the interrupt related changes in this patch and the first >> one? > Actually, the NULL checking is not a must, we can remove this. But the > lock/unlock is needed. >> When we do uio_unregister_device -> free_irq does free_irq return >> when there are no longer running interrupt handlers that we requested? >> >> If that is not the case then I think we can hit a similar bug. We do: >> >> __uio_register_device -> device_register -> device's refcount goes to >> zero so we do -> uio_device_release -> kfree(idev) >> >> and if it is possible the interrupt handler could still run after >> free_irq then we would end up doing: >> >> uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed >> memory. > > I think this shouldn't happen. Because the free_irq function does not > return until any executing interrupts for this IRQ have completed. > If free_irq returns after executing interrupts and does not allow new executions what is the lock protecting in uio_interrupt?
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 07/06/2018 08:47 PM, Xiubo Li wrote: > On 2018/7/7 2:58, Mike Christie wrote: >> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: >>> void uio_event_notify(struct uio_info *info) >>> { >>> -struct uio_device *idev = info->uio_dev; >>> +struct uio_device *idev; >>> + >>> +if (!info) >>> +return; >>> + >>> +idev = info->uio_dev; >>> >> For this one too, I am not sure if it is needed. >> >> uio_interrupt -> uio_event_notify. See other mail. >> >> driver XYZ -> uio_event_notify. I think drivers need to handle this and >> set some bits and/or perform some cleanup to make sure they are not >> calling uio_event_notify after it has called uio_unregister_device. The >> problem with the above test is if they do not they could have called >> uio_unregister_device right after the info test so you could still hit >> the problem. > > When we are tcmu_destroy_device(), if the netlink notify event to > userspace is not successful then the TCMU will call the uio unregister, > which will set the idev->info = NULL, without close and deleting the > device in userspace. But the TCMU could still queue cmds to the ring > buffer, then the uio_event_notify will be called. Before tcmu_destroy_device is called LIO will have stopped new IO and waited for outstanding IO to be completed, so it would never be called after uio_unregister_device. If it does, it's a bug in LIO.
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/7 2:58, Mike Christie wrote: On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: void uio_event_notify(struct uio_info *info) { - struct uio_device *idev = info->uio_dev; + struct uio_device *idev; + + if (!info) + return; + + idev = info->uio_dev; For this one too, I am not sure if it is needed. uio_interrupt -> uio_event_notify. See other mail. driver XYZ -> uio_event_notify. I think drivers need to handle this and set some bits and/or perform some cleanup to make sure they are not calling uio_event_notify after it has called uio_unregister_device. The problem with the above test is if they do not they could have called uio_unregister_device right after the info test so you could still hit the problem. When we are tcmu_destroy_device(), if the netlink notify event to userspace is not successful then the TCMU will call the uio unregister, which will set the idev->info = NULL, without close and deleting the device in userspace. But the TCMU could still queue cmds to the ring buffer, then the uio_event_notify will be called. For this case only when using idev->info it would happen. And currently there is no need to check this, I will remove it for now. Thanks, BRs
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/7 2:23, Mike Christie wrote: On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: static irqreturn_t uio_interrupt(int irq, void *dev_id) { struct uio_device *idev = (struct uio_device *)dev_id; - irqreturn_t ret = idev->info->handler(irq, idev->info); + irqreturn_t ret; + + mutex_lock(&idev->info_lock); + if (!idev->info) { + ret = IRQ_NONE; + goto out; + } + ret = idev->info->handler(irq, idev->info); if (ret == IRQ_HANDLED) uio_event_notify(idev->info); +out: + mutex_unlock(&idev->info_lock); return ret; } Do you need the interrupt related changes in this patch and the first one? Actually, the NULL checking is not a must, we can remove this. But the lock/unlock is needed. When we do uio_unregister_device -> free_irq does free_irq return when there are no longer running interrupt handlers that we requested? If that is not the case then I think we can hit a similar bug. We do: __uio_register_device -> device_register -> device's refcount goes to zero so we do -> uio_device_release -> kfree(idev) and if it is possible the interrupt handler could still run after free_irq then we would end up doing: uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed memory. I think this shouldn't happen. Because the free_irq function does not return until any executing interrupts for this IRQ have completed. Thanks, BRs
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: > void uio_event_notify(struct uio_info *info) > { > - struct uio_device *idev = info->uio_dev; > + struct uio_device *idev; > + > + if (!info) > + return; > + > + idev = info->uio_dev; > For this one too, I am not sure if it is needed. uio_interrupt -> uio_event_notify. See other mail. driver XYZ -> uio_event_notify. I think drivers need to handle this and set some bits and/or perform some cleanup to make sure they are not calling uio_event_notify after it has called uio_unregister_device. The problem with the above test is if they do not they could have called uio_unregister_device right after the info test so you could still hit the problem.
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 07/05/2018 09:57 PM, xiu...@redhat.com wrote: > static irqreturn_t uio_interrupt(int irq, void *dev_id) > { > struct uio_device *idev = (struct uio_device *)dev_id; > - irqreturn_t ret = idev->info->handler(irq, idev->info); > + irqreturn_t ret; > + > + mutex_lock(&idev->info_lock); > + if (!idev->info) { > + ret = IRQ_NONE; > + goto out; > + } > > + ret = idev->info->handler(irq, idev->info); > if (ret == IRQ_HANDLED) > uio_event_notify(idev->info); > > +out: > + mutex_unlock(&idev->info_lock); > return ret; > } Do you need the interrupt related changes in this patch and the first one? When we do uio_unregister_device -> free_irq does free_irq return when there are no longer running interrupt handlers that we requested? If that is not the case then I think we can hit a similar bug. We do: __uio_register_device -> device_register -> device's refcount goes to zero so we do -> uio_device_release -> kfree(idev) and if it is possible the interrupt handler could still run after free_irq then we would end up doing: uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed memory.
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
Looks ok to me. Reviewed-by: Hamish Martin On 07/06/2018 02:57 PM, xiu...@redhat.com wrote: > From: Xiubo Li > > For the target_core_user use case, after the device is unregistered > it maybe still opened in user space, then the kernel will crash, like: > > [ 251.163692] BUG: unable to handle kernel NULL pointer dereference at > 0008 > [ 251.163820] IP: [] show_name+0x23/0x40 [uio] > [ 251.163965] PGD 800062694067 PUD 62696067 PMD 0 > [ 251.164097] Oops: [#1] SMP > ... > [ 251.165605] e1000 mptscsih mptbase drm_panel_orientation_quirks dm_mirror > dm_region_hash dm_log dm_mod > [ 251.166014] CPU: 0 PID: 13380 Comm: tcmu-runner Kdump: loaded Not tainted > 3.10.0-916.el7.test.x86_64 #1 > [ 251.166381] Hardware name: VMware, Inc. VMware Virtual Platform/440BX > Desktop Reference Platform, BIOS 6.00 05/19/2017 > [ 251.166747] task: 971eb91db0c0 ti: 971e9e384000 task.ti: > 971e9e384000 > [ 251.167137] RIP: 0010:[] [] > show_name+0x23/0x40 [uio] > [ 251.167563] RSP: 0018:971e9e387dc8 EFLAGS: 00010282 > [ 251.167978] RAX: RBX: 971e9e3f8000 RCX: > 971eb8368d98 > [ 251.168408] RDX: 971e9e3f8000 RSI: c0738084 RDI: > 971e9e3f8000 > [ 251.168856] RBP: 971e9e387dd0 R08: 971eb8bc0018 R09: > > [ 251.169296] R10: 1000 R11: a09d444d R12: > a1076e80 > [ 251.169750] R13: 971e9e387f18 R14: 0001 R15: > 971e9cfb1c80 > [ 251.170213] FS: 7ff37d175880() GS:971ebb60() > knlGS: > [ 251.170693] CS: 0010 DS: ES: CR0: 80050033 > [ 251.171248] CR2: 0008 CR3: 001f6000 CR4: > 003607f0 > [ 251.172071] DR0: DR1: DR2: > > [ 251.172640] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 251.173236] Call Trace: > [ 251.173789] [] dev_attr_show+0x23/0x60 > [ 251.174356] [] ? mutex_lock+0x12/0x2f > [ 251.174892] [] sysfs_kf_seq_show+0xcf/0x1f0 > [ 251.175433] [] kernfs_seq_show+0x26/0x30 > [ 251.175981] [] seq_read+0x110/0x3f0 > [ 251.176609] [] kernfs_fop_read+0xf5/0x160 > [ 251.177158] [] vfs_read+0x9f/0x170 > [ 251.177707] [] SyS_read+0x7f/0xf0 > [ 251.178268] [] system_call_fastpath+0x1c/0x21 > [ 251.178823] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 > 89 d3 e8 7e 96 56 e0 48 8b 80 d8 02 00 00 48 89 df 48 c7 c6 84 80 73 c0 <48> > 8b 50 08 31 c0 e8 e2 67 44 e0 5b 48 98 5d c3 0f 1f 00 66 2e > [ 251.180115] RIP [] show_name+0x23/0x40 [uio] > [ 251.180820] RSP > [ 251.181473] CR2: 0008 > > CC: Hamish Martin > CC: Mike Christie > Signed-off-by: Xiubo Li > --- > drivers/uio/uio.c | 116 > ++ > 1 file changed, 99 insertions(+), 17 deletions(-) > > diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c > index 655ade4..a0d926e 100644 > --- a/drivers/uio/uio.c > +++ b/drivers/uio/uio.c > @@ -215,7 +215,20 @@ static ssize_t name_show(struct device *dev, >struct device_attribute *attr, char *buf) > { > struct uio_device *idev = dev_get_drvdata(dev); > - return sprintf(buf, "%s\n", idev->info->name); > + int ret; > + > + mutex_lock(&idev->info_lock); > + if (!idev->info) { > + ret = -EINVAL; > + dev_err(dev, "the device has been unregistered\n"); > + goto out; > + } > + > + ret = sprintf(buf, "%s\n", idev->info->name); > + > +out: > + mutex_unlock(&idev->info_lock); > + return ret; > } > static DEVICE_ATTR_RO(name); > > @@ -223,7 +236,20 @@ static ssize_t version_show(struct device *dev, > struct device_attribute *attr, char *buf) > { > struct uio_device *idev = dev_get_drvdata(dev); > - return sprintf(buf, "%s\n", idev->info->version); > + int ret; > + > + mutex_lock(&idev->info_lock); > + if (!idev->info) { > + ret = -EINVAL; > + dev_err(dev, "the device has been unregistered\n"); > + goto out; > + } > + > + ret = sprintf(buf, "%s\n", idev->info->version); > + > +out: > + mutex_unlock(&idev->info_lock); > + return ret; > } > static DEVICE_ATTR_RO(version); > > @@ -399,7 +425,12 @@ static void uio_free_minor(struct uio_device *idev) >*/ > void uio_event_notify(struct uio_info *info) > { > - struct uio_device *idev = info->uio_dev; > + struct uio_device *idev; > + > + if (!info) > + return; > + > + idev = info->uio_dev; > > atomic_inc(&idev->event); > wake_up_interruptible(&idev->wait); > @@ -415,11 +446,20 @@ void uio_event_notify(struct uio_info *info) > static irqreturn_t uio_interrupt(int irq, void *dev_id) > { > struct uio_device *idev = (struct uio_device *)dev_id; > - irqreturn_t ret = idev->info->handler(irq, idev-
[PATCH v3 3/3] uio: fix crash after the device is unregistered
From: Xiubo Li For the target_core_user use case, after the device is unregistered it maybe still opened in user space, then the kernel will crash, like: [ 251.163692] BUG: unable to handle kernel NULL pointer dereference at 0008 [ 251.163820] IP: [] show_name+0x23/0x40 [uio] [ 251.163965] PGD 800062694067 PUD 62696067 PMD 0 [ 251.164097] Oops: [#1] SMP ... [ 251.165605] e1000 mptscsih mptbase drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod [ 251.166014] CPU: 0 PID: 13380 Comm: tcmu-runner Kdump: loaded Not tainted 3.10.0-916.el7.test.x86_64 #1 [ 251.166381] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 251.166747] task: 971eb91db0c0 ti: 971e9e384000 task.ti: 971e9e384000 [ 251.167137] RIP: 0010:[] [] show_name+0x23/0x40 [uio] [ 251.167563] RSP: 0018:971e9e387dc8 EFLAGS: 00010282 [ 251.167978] RAX: RBX: 971e9e3f8000 RCX: 971eb8368d98 [ 251.168408] RDX: 971e9e3f8000 RSI: c0738084 RDI: 971e9e3f8000 [ 251.168856] RBP: 971e9e387dd0 R08: 971eb8bc0018 R09: [ 251.169296] R10: 1000 R11: a09d444d R12: a1076e80 [ 251.169750] R13: 971e9e387f18 R14: 0001 R15: 971e9cfb1c80 [ 251.170213] FS: 7ff37d175880() GS:971ebb60() knlGS: [ 251.170693] CS: 0010 DS: ES: CR0: 80050033 [ 251.171248] CR2: 0008 CR3: 001f6000 CR4: 003607f0 [ 251.172071] DR0: DR1: DR2: [ 251.172640] DR3: DR6: fffe0ff0 DR7: 0400 [ 251.173236] Call Trace: [ 251.173789] [] dev_attr_show+0x23/0x60 [ 251.174356] [] ? mutex_lock+0x12/0x2f [ 251.174892] [] sysfs_kf_seq_show+0xcf/0x1f0 [ 251.175433] [] kernfs_seq_show+0x26/0x30 [ 251.175981] [] seq_read+0x110/0x3f0 [ 251.176609] [] kernfs_fop_read+0xf5/0x160 [ 251.177158] [] vfs_read+0x9f/0x170 [ 251.177707] [] SyS_read+0x7f/0xf0 [ 251.178268] [] system_call_fastpath+0x1c/0x21 [ 251.178823] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 d3 e8 7e 96 56 e0 48 8b 80 d8 02 00 00 48 89 df 48 c7 c6 84 80 73 c0 <48> 8b 50 08 31 c0 e8 e2 67 44 e0 5b 48 98 5d c3 0f 1f 00 66 2e [ 251.180115] RIP [] show_name+0x23/0x40 [uio] [ 251.180820] RSP [ 251.181473] CR2: 0008 CC: Hamish Martin CC: Mike Christie Signed-off-by: Xiubo Li --- drivers/uio/uio.c | 116 ++ 1 file changed, 99 insertions(+), 17 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 655ade4..a0d926e 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -215,7 +215,20 @@ static ssize_t name_show(struct device *dev, struct device_attribute *attr, char *buf) { struct uio_device *idev = dev_get_drvdata(dev); - return sprintf(buf, "%s\n", idev->info->name); + int ret; + + mutex_lock(&idev->info_lock); + if (!idev->info) { + ret = -EINVAL; + dev_err(dev, "the device has been unregistered\n"); + goto out; + } + + ret = sprintf(buf, "%s\n", idev->info->name); + +out: + mutex_unlock(&idev->info_lock); + return ret; } static DEVICE_ATTR_RO(name); @@ -223,7 +236,20 @@ static ssize_t version_show(struct device *dev, struct device_attribute *attr, char *buf) { struct uio_device *idev = dev_get_drvdata(dev); - return sprintf(buf, "%s\n", idev->info->version); + int ret; + + mutex_lock(&idev->info_lock); + if (!idev->info) { + ret = -EINVAL; + dev_err(dev, "the device has been unregistered\n"); + goto out; + } + + ret = sprintf(buf, "%s\n", idev->info->version); + +out: + mutex_unlock(&idev->info_lock); + return ret; } static DEVICE_ATTR_RO(version); @@ -399,7 +425,12 @@ static void uio_free_minor(struct uio_device *idev) */ void uio_event_notify(struct uio_info *info) { - struct uio_device *idev = info->uio_dev; + struct uio_device *idev; + + if (!info) + return; + + idev = info->uio_dev; atomic_inc(&idev->event); wake_up_interruptible(&idev->wait); @@ -415,11 +446,20 @@ void uio_event_notify(struct uio_info *info) static irqreturn_t uio_interrupt(int irq, void *dev_id) { struct uio_device *idev = (struct uio_device *)dev_id; - irqreturn_t ret = idev->info->handler(irq, idev->info); + irqreturn_t ret; + + mutex_lock(&idev->info_lock); + if (!idev->info) { + ret = IRQ_NONE; + goto out; + } + ret = idev->info->handler(irq, idev->info); if (ret == IRQ_HANDLED) uio_event_notify(idev->info); +out: +