Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-09 Thread Xiubo Li

On 2018/7/10 1:06, Mike Christie wrote:

On 07/06/2018 08:28 PM, Xiubo Li wrote:

On 2018/7/7 2:23, Mike Christie wrote:

On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:

   static irqreturn_t uio_interrupt(int irq, void *dev_id)
   {
   struct uio_device *idev = (struct uio_device *)dev_id;
-irqreturn_t ret = idev->info->handler(irq, idev->info);
+irqreturn_t ret;
+
+mutex_lock(&idev->info_lock);
+if (!idev->info) {
+ret = IRQ_NONE;
+goto out;
+}
   +ret = idev->info->handler(irq, idev->info);
   if (ret == IRQ_HANDLED)
   uio_event_notify(idev->info);
   +out:
+mutex_unlock(&idev->info_lock);
   return ret;
   }

Do you need the interrupt related changes in this patch and the first
one?

Actually, the NULL checking is not a must, we can remove this. But the
lock/unlock is needed.

   When we do uio_unregister_device -> free_irq does free_irq return
when there are no longer running interrupt handlers that we requested?

If that is not the case then I think we can hit a similar bug. We do:

__uio_register_device -> device_register -> device's refcount goes to
zero so we do -> uio_device_release -> kfree(idev)

and if it is possible the interrupt handler could still run after
free_irq then we would end up doing:

uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed
memory.

I think this shouldn't happen. Because the free_irq function does not
return until any executing interrupts for this IRQ have completed.


If free_irq returns after executing interrupts and does not allow new
executions what is the lock protecting in uio_interrupt?

I meant idev->info->handler(irq, idev->info), it may should be protected 
by the related lock in each driver.


Thanks,



Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-09 Thread Xiubo Li

On 2018/7/10 0:40, Mike Christie wrote:

On 07/06/2018 08:47 PM, Xiubo Li wrote:

On 2018/7/7 2:58, Mike Christie wrote:

On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:

   void uio_event_notify(struct uio_info *info)
   {
-struct uio_device *idev = info->uio_dev;
+struct uio_device *idev;
+
+if (!info)
+return;
+
+idev = info->uio_dev;
   

For this one too, I am not sure if it is needed.

uio_interrupt -> uio_event_notify. See other mail.

driver XYZ -> uio_event_notify. I think drivers need to handle this and
set some bits and/or perform some cleanup to make sure they are not
calling uio_event_notify after it has called uio_unregister_device. The
problem with the above test is if they do not they could have called
uio_unregister_device right after the info test so you could still hit
the problem.

When we are tcmu_destroy_device(), if the netlink notify event to
userspace is not successful then the TCMU will call the uio unregister,
which will set the idev->info = NULL, without close and deleting the
device in userspace. But the TCMU could still queue cmds to the ring
buffer, then the uio_event_notify will be called.

Before tcmu_destroy_device is called LIO will have stopped new IO and
waited for outstanding IO to be completed, so it would never be called
after uio_unregister_device. If it does, it's a bug in LIO.
Yeah, So this should be fixed now. Only hit one crash seem related to 
uio of this code with lower kernel before.


Thanks,
BRs





Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-09 Thread Mike Christie
On 07/06/2018 08:28 PM, Xiubo Li wrote:
> On 2018/7/7 2:23, Mike Christie wrote:
>> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>>>   static irqreturn_t uio_interrupt(int irq, void *dev_id)
>>>   {
>>>   struct uio_device *idev = (struct uio_device *)dev_id;
>>> -irqreturn_t ret = idev->info->handler(irq, idev->info);
>>> +irqreturn_t ret;
>>> +
>>> +mutex_lock(&idev->info_lock);
>>> +if (!idev->info) {
>>> +ret = IRQ_NONE;
>>> +goto out;
>>> +}
>>>   +ret = idev->info->handler(irq, idev->info);
>>>   if (ret == IRQ_HANDLED)
>>>   uio_event_notify(idev->info);
>>>   +out:
>>> +mutex_unlock(&idev->info_lock);
>>>   return ret;
>>>   }
>>
>> Do you need the interrupt related changes in this patch and the first
>> one?
> Actually, the NULL checking is not a must, we can remove this. But the
> lock/unlock is needed.
>>   When we do uio_unregister_device -> free_irq does free_irq return
>> when there are no longer running interrupt handlers that we requested?
>>
>> If that is not the case then I think we can hit a similar bug. We do:
>>
>> __uio_register_device -> device_register -> device's refcount goes to
>> zero so we do -> uio_device_release -> kfree(idev)
>>
>> and if it is possible the interrupt handler could still run after
>> free_irq then we would end up doing:
>>
>> uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed
>> memory.
> 
> I think this shouldn't happen. Because the free_irq function does not
> return until any executing interrupts for this IRQ have completed.
> 

If free_irq returns after executing interrupts and does not allow new
executions what is the lock protecting in uio_interrupt?



Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-09 Thread Mike Christie
On 07/06/2018 08:47 PM, Xiubo Li wrote:
> On 2018/7/7 2:58, Mike Christie wrote:
>> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>>>   void uio_event_notify(struct uio_info *info)
>>>   {
>>> -struct uio_device *idev = info->uio_dev;
>>> +struct uio_device *idev;
>>> +
>>> +if (!info)
>>> +return;
>>> +
>>> +idev = info->uio_dev;
>>>   
>> For this one too, I am not sure if it is needed.
>>
>> uio_interrupt -> uio_event_notify. See other mail.
>>
>> driver XYZ -> uio_event_notify. I think drivers need to handle this and
>> set some bits and/or perform some cleanup to make sure they are not
>> calling uio_event_notify after it has called uio_unregister_device. The
>> problem with the above test is if they do not they could have called
>> uio_unregister_device right after the info test so you could still hit
>> the problem.
> 
> When we are tcmu_destroy_device(), if the netlink notify event to
> userspace is not successful then the TCMU will call the uio unregister,
> which will set the idev->info = NULL, without close and deleting the
> device in userspace. But the TCMU could still queue cmds to the ring
> buffer, then the uio_event_notify will be called.

Before tcmu_destroy_device is called LIO will have stopped new IO and
waited for outstanding IO to be completed, so it would never be called
after uio_unregister_device. If it does, it's a bug in LIO.


Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-06 Thread Xiubo Li

On 2018/7/7 2:58, Mike Christie wrote:

On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:

  void uio_event_notify(struct uio_info *info)
  {
-   struct uio_device *idev = info->uio_dev;
+   struct uio_device *idev;
+
+   if (!info)
+   return;
+
+   idev = info->uio_dev;
  

For this one too, I am not sure if it is needed.

uio_interrupt -> uio_event_notify. See other mail.

driver XYZ -> uio_event_notify. I think drivers need to handle this and
set some bits and/or perform some cleanup to make sure they are not
calling uio_event_notify after it has called uio_unregister_device. The
problem with the above test is if they do not they could have called
uio_unregister_device right after the info test so you could still hit
the problem.


When we are tcmu_destroy_device(), if the netlink notify event to 
userspace is not successful then the TCMU will call the uio unregister, 
which will set the idev->info = NULL, without close and deleting the 
device in userspace. But the TCMU could still queue cmds to the ring 
buffer, then the uio_event_notify will be called.


For this case only when using idev->info it would happen. And currently 
there is no need to check this, I will remove it for now.


Thanks,

BRs



Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-06 Thread Xiubo Li

On 2018/7/7 2:23, Mike Christie wrote:

On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:

  static irqreturn_t uio_interrupt(int irq, void *dev_id)
  {
struct uio_device *idev = (struct uio_device *)dev_id;
-   irqreturn_t ret = idev->info->handler(irq, idev->info);
+   irqreturn_t ret;
+
+   mutex_lock(&idev->info_lock);
+   if (!idev->info) {
+   ret = IRQ_NONE;
+   goto out;
+   }
  
+	ret = idev->info->handler(irq, idev->info);

if (ret == IRQ_HANDLED)
uio_event_notify(idev->info);
  
+out:

+   mutex_unlock(&idev->info_lock);
return ret;
  }


Do you need the interrupt related changes in this patch and the first
one?
Actually, the NULL checking is not a must, we can remove this. But the 
lock/unlock is needed.

  When we do uio_unregister_device -> free_irq does free_irq return
when there are no longer running interrupt handlers that we requested?

If that is not the case then I think we can hit a similar bug. We do:

__uio_register_device -> device_register -> device's refcount goes to
zero so we do -> uio_device_release -> kfree(idev)

and if it is possible the interrupt handler could still run after
free_irq then we would end up doing:

uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed memory.


I think this shouldn't happen. Because the free_irq function does not 
return until any executing interrupts for this IRQ have completed.


Thanks,
BRs



Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-06 Thread Mike Christie
On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>  void uio_event_notify(struct uio_info *info)
>  {
> - struct uio_device *idev = info->uio_dev;
> + struct uio_device *idev;
> +
> + if (!info)
> + return;
> +
> + idev = info->uio_dev;
>  

For this one too, I am not sure if it is needed.

uio_interrupt -> uio_event_notify. See other mail.

driver XYZ -> uio_event_notify. I think drivers need to handle this and
set some bits and/or perform some cleanup to make sure they are not
calling uio_event_notify after it has called uio_unregister_device. The
problem with the above test is if they do not they could have called
uio_unregister_device right after the info test so you could still hit
the problem.


Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-06 Thread Mike Christie
On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>  static irqreturn_t uio_interrupt(int irq, void *dev_id)
>  {
>   struct uio_device *idev = (struct uio_device *)dev_id;
> - irqreturn_t ret = idev->info->handler(irq, idev->info);
> + irqreturn_t ret;
> +
> + mutex_lock(&idev->info_lock);
> + if (!idev->info) {
> + ret = IRQ_NONE;
> + goto out;
> + }
>  
> + ret = idev->info->handler(irq, idev->info);
>   if (ret == IRQ_HANDLED)
>   uio_event_notify(idev->info);
>  
> +out:
> + mutex_unlock(&idev->info_lock);
>   return ret;
>  }


Do you need the interrupt related changes in this patch and the first
one? When we do uio_unregister_device -> free_irq does free_irq return
when there are no longer running interrupt handlers that we requested?

If that is not the case then I think we can hit a similar bug. We do:

__uio_register_device -> device_register -> device's refcount goes to
zero so we do -> uio_device_release -> kfree(idev)

and if it is possible the interrupt handler could still run after
free_irq then we would end up doing:

uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed memory.


Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-05 Thread Hamish Martin
Looks ok to me.

Reviewed-by: Hamish Martin 

On 07/06/2018 02:57 PM, xiu...@redhat.com wrote:
> From: Xiubo Li 
>
> For the target_core_user use case, after the device is unregistered
> it maybe still opened in user space, then the kernel will crash, like:
>
> [  251.163692] BUG: unable to handle kernel NULL pointer dereference at 
> 0008
> [  251.163820] IP: [] show_name+0x23/0x40 [uio]
> [  251.163965] PGD 800062694067 PUD 62696067 PMD 0
> [  251.164097] Oops:  [#1] SMP
> ...
> [  251.165605]  e1000 mptscsih mptbase drm_panel_orientation_quirks dm_mirror 
> dm_region_hash dm_log dm_mod
> [  251.166014] CPU: 0 PID: 13380 Comm: tcmu-runner Kdump: loaded Not tainted 
> 3.10.0-916.el7.test.x86_64 #1
> [  251.166381] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
> Desktop Reference Platform, BIOS 6.00 05/19/2017
> [  251.166747] task: 971eb91db0c0 ti: 971e9e384000 task.ti: 
> 971e9e384000
> [  251.167137] RIP: 0010:[]  [] 
> show_name+0x23/0x40 [uio]
> [  251.167563] RSP: 0018:971e9e387dc8  EFLAGS: 00010282
> [  251.167978] RAX:  RBX: 971e9e3f8000 RCX: 
> 971eb8368d98
> [  251.168408] RDX: 971e9e3f8000 RSI: c0738084 RDI: 
> 971e9e3f8000
> [  251.168856] RBP: 971e9e387dd0 R08: 971eb8bc0018 R09: 
> 
> [  251.169296] R10: 1000 R11: a09d444d R12: 
> a1076e80
> [  251.169750] R13: 971e9e387f18 R14: 0001 R15: 
> 971e9cfb1c80
> [  251.170213] FS:  7ff37d175880() GS:971ebb60() 
> knlGS:
> [  251.170693] CS:  0010 DS:  ES:  CR0: 80050033
> [  251.171248] CR2: 0008 CR3: 001f6000 CR4: 
> 003607f0
> [  251.172071] DR0:  DR1:  DR2: 
> 
> [  251.172640] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  251.173236] Call Trace:
> [  251.173789]  [] dev_attr_show+0x23/0x60
> [  251.174356]  [] ? mutex_lock+0x12/0x2f
> [  251.174892]  [] sysfs_kf_seq_show+0xcf/0x1f0
> [  251.175433]  [] kernfs_seq_show+0x26/0x30
> [  251.175981]  [] seq_read+0x110/0x3f0
> [  251.176609]  [] kernfs_fop_read+0xf5/0x160
> [  251.177158]  [] vfs_read+0x9f/0x170
> [  251.177707]  [] SyS_read+0x7f/0xf0
> [  251.178268]  [] system_call_fastpath+0x1c/0x21
> [  251.178823] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 
> 89 d3 e8 7e 96 56 e0 48 8b 80 d8 02 00 00 48 89 df 48 c7 c6 84 80 73 c0 <48> 
> 8b 50 08 31 c0 e8 e2 67 44 e0 5b 48 98 5d c3 0f 1f 00 66 2e
> [  251.180115] RIP  [] show_name+0x23/0x40 [uio]
> [  251.180820]  RSP 
> [  251.181473] CR2: 0008
>
> CC: Hamish Martin 
> CC: Mike Christie 
> Signed-off-by: Xiubo Li 
> ---
>   drivers/uio/uio.c | 116 
> ++
>   1 file changed, 99 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> index 655ade4..a0d926e 100644
> --- a/drivers/uio/uio.c
> +++ b/drivers/uio/uio.c
> @@ -215,7 +215,20 @@ static ssize_t name_show(struct device *dev,
>struct device_attribute *attr, char *buf)
>   {
>   struct uio_device *idev = dev_get_drvdata(dev);
> - return sprintf(buf, "%s\n", idev->info->name);
> + int ret;
> +
> + mutex_lock(&idev->info_lock);
> + if (!idev->info) {
> + ret = -EINVAL;
> + dev_err(dev, "the device has been unregistered\n");
> + goto out;
> + }
> +
> + ret = sprintf(buf, "%s\n", idev->info->name);
> +
> +out:
> + mutex_unlock(&idev->info_lock);
> + return ret;
>   }
>   static DEVICE_ATTR_RO(name);
>   
> @@ -223,7 +236,20 @@ static ssize_t version_show(struct device *dev,
>   struct device_attribute *attr, char *buf)
>   {
>   struct uio_device *idev = dev_get_drvdata(dev);
> - return sprintf(buf, "%s\n", idev->info->version);
> + int ret;
> +
> + mutex_lock(&idev->info_lock);
> + if (!idev->info) {
> + ret = -EINVAL;
> + dev_err(dev, "the device has been unregistered\n");
> + goto out;
> + }
> +
> + ret = sprintf(buf, "%s\n", idev->info->version);
> +
> +out:
> + mutex_unlock(&idev->info_lock);
> + return ret;
>   }
>   static DEVICE_ATTR_RO(version);
>   
> @@ -399,7 +425,12 @@ static void uio_free_minor(struct uio_device *idev)
>*/
>   void uio_event_notify(struct uio_info *info)
>   {
> - struct uio_device *idev = info->uio_dev;
> + struct uio_device *idev;
> +
> + if (!info)
> + return;
> +
> + idev = info->uio_dev;
>   
>   atomic_inc(&idev->event);
>   wake_up_interruptible(&idev->wait);
> @@ -415,11 +446,20 @@ void uio_event_notify(struct uio_info *info)
>   static irqreturn_t uio_interrupt(int irq, void *dev_id)
>   {
>   struct uio_device *idev = (struct uio_device *)dev_id;
> - irqreturn_t ret = idev->info->handler(irq, idev-

[PATCH v3 3/3] uio: fix crash after the device is unregistered

2018-07-05 Thread xiubli
From: Xiubo Li 

For the target_core_user use case, after the device is unregistered
it maybe still opened in user space, then the kernel will crash, like:

[  251.163692] BUG: unable to handle kernel NULL pointer dereference at 
0008
[  251.163820] IP: [] show_name+0x23/0x40 [uio]
[  251.163965] PGD 800062694067 PUD 62696067 PMD 0
[  251.164097] Oops:  [#1] SMP
...
[  251.165605]  e1000 mptscsih mptbase drm_panel_orientation_quirks dm_mirror 
dm_region_hash dm_log dm_mod
[  251.166014] CPU: 0 PID: 13380 Comm: tcmu-runner Kdump: loaded Not tainted 
3.10.0-916.el7.test.x86_64 #1
[  251.166381] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 05/19/2017
[  251.166747] task: 971eb91db0c0 ti: 971e9e384000 task.ti: 
971e9e384000
[  251.167137] RIP: 0010:[]  [] 
show_name+0x23/0x40 [uio]
[  251.167563] RSP: 0018:971e9e387dc8  EFLAGS: 00010282
[  251.167978] RAX:  RBX: 971e9e3f8000 RCX: 971eb8368d98
[  251.168408] RDX: 971e9e3f8000 RSI: c0738084 RDI: 971e9e3f8000
[  251.168856] RBP: 971e9e387dd0 R08: 971eb8bc0018 R09: 
[  251.169296] R10: 1000 R11: a09d444d R12: a1076e80
[  251.169750] R13: 971e9e387f18 R14: 0001 R15: 971e9cfb1c80
[  251.170213] FS:  7ff37d175880() GS:971ebb60() 
knlGS:
[  251.170693] CS:  0010 DS:  ES:  CR0: 80050033
[  251.171248] CR2: 0008 CR3: 001f6000 CR4: 003607f0
[  251.172071] DR0:  DR1:  DR2: 
[  251.172640] DR3:  DR6: fffe0ff0 DR7: 0400
[  251.173236] Call Trace:
[  251.173789]  [] dev_attr_show+0x23/0x60
[  251.174356]  [] ? mutex_lock+0x12/0x2f
[  251.174892]  [] sysfs_kf_seq_show+0xcf/0x1f0
[  251.175433]  [] kernfs_seq_show+0x26/0x30
[  251.175981]  [] seq_read+0x110/0x3f0
[  251.176609]  [] kernfs_fop_read+0xf5/0x160
[  251.177158]  [] vfs_read+0x9f/0x170
[  251.177707]  [] SyS_read+0x7f/0xf0
[  251.178268]  [] system_call_fastpath+0x1c/0x21
[  251.178823] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 
89 d3 e8 7e 96 56 e0 48 8b 80 d8 02 00 00 48 89 df 48 c7 c6 84 80 73 c0 <48> 8b 
50 08 31 c0 e8 e2 67 44 e0 5b 48 98 5d c3 0f 1f 00 66 2e
[  251.180115] RIP  [] show_name+0x23/0x40 [uio]
[  251.180820]  RSP 
[  251.181473] CR2: 0008

CC: Hamish Martin 
CC: Mike Christie 
Signed-off-by: Xiubo Li 
---
 drivers/uio/uio.c | 116 ++
 1 file changed, 99 insertions(+), 17 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 655ade4..a0d926e 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -215,7 +215,20 @@ static ssize_t name_show(struct device *dev,
 struct device_attribute *attr, char *buf)
 {
struct uio_device *idev = dev_get_drvdata(dev);
-   return sprintf(buf, "%s\n", idev->info->name);
+   int ret;
+
+   mutex_lock(&idev->info_lock);
+   if (!idev->info) {
+   ret = -EINVAL;
+   dev_err(dev, "the device has been unregistered\n");
+   goto out;
+   }
+
+   ret = sprintf(buf, "%s\n", idev->info->name);
+
+out:
+   mutex_unlock(&idev->info_lock);
+   return ret;
 }
 static DEVICE_ATTR_RO(name);
 
@@ -223,7 +236,20 @@ static ssize_t version_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
struct uio_device *idev = dev_get_drvdata(dev);
-   return sprintf(buf, "%s\n", idev->info->version);
+   int ret;
+
+   mutex_lock(&idev->info_lock);
+   if (!idev->info) {
+   ret = -EINVAL;
+   dev_err(dev, "the device has been unregistered\n");
+   goto out;
+   }
+
+   ret = sprintf(buf, "%s\n", idev->info->version);
+
+out:
+   mutex_unlock(&idev->info_lock);
+   return ret;
 }
 static DEVICE_ATTR_RO(version);
 
@@ -399,7 +425,12 @@ static void uio_free_minor(struct uio_device *idev)
  */
 void uio_event_notify(struct uio_info *info)
 {
-   struct uio_device *idev = info->uio_dev;
+   struct uio_device *idev;
+
+   if (!info)
+   return;
+
+   idev = info->uio_dev;
 
atomic_inc(&idev->event);
wake_up_interruptible(&idev->wait);
@@ -415,11 +446,20 @@ void uio_event_notify(struct uio_info *info)
 static irqreturn_t uio_interrupt(int irq, void *dev_id)
 {
struct uio_device *idev = (struct uio_device *)dev_id;
-   irqreturn_t ret = idev->info->handler(irq, idev->info);
+   irqreturn_t ret;
+
+   mutex_lock(&idev->info_lock);
+   if (!idev->info) {
+   ret = IRQ_NONE;
+   goto out;
+   }
 
+   ret = idev->info->handler(irq, idev->info);
if (ret == IRQ_HANDLED)
uio_event_notify(idev->info);
 
+out:
+