On Mon, 2019-03-25 at 10:26 +0100, Hannes Reinecke wrote:
> The original issue leading to this patchset was this crash:
> 
> 
> [159135.508116] Pid: 2638, comm: ssea Tainted: G        W    X 
> 3.0.101-0.40-default #1 HP ProLiant BL460c Gen9
> [159135.508119] RIP: 0010:[<ffffffffa00bb5d1>]  [<ffffffffa00bb5d1>] 
> scsi_device_get+0x11/0xb0 [scsi_mod]
> [159135.508126] RSP: 0018:ffff88100fdf5c88  EFLAGS: 00010296
> [159135.508128] RAX: ffff88101b31d5c0 RBX: 0000000000000000 RCX: 
> ffff88101b31d5c0
> [159135.508130] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 
> 0000000000000000
> [159135.508132] RBP: ffff88101c1c4780 R08: 0000000000000000 R09: 
> ffff88201f387af0
> [159135.508134] R10: ffff88100fdf5e68 R11: ffffffff811eee70 R12: 
> ffffffffa06ea120
> [159135.508136] R13: ffff88201ef903c0 R14: ffff881007bdda00 R15: 
> ffff88101c1c4780
> [159135.508139] FS:  00007faae06d2700(0000) GS:ffff88107fc00000(0000) 
> knlGS:0000000000000000
> [159135.508141] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [159135.508143] CR2: 0000000000000650 CR3: 0000001018d0f000 CR4: 
> 00000000001407f0
> [159135.508145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [159135.508148] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
> [159135.508150] Process ssea (pid: 2638, threadinfo ffff88100fdf4000, 
> task ffff881012080140)
> [159135.508152] Stack:
> [159135.508153]  ffff88201ef903c0 ffff88101b31d5c0 ffff88101c1c4780 
> ffffffffa06e767c
> [159135.508160]  0000000000000000 0000000000000000 0000000000000000 
> ffffffff8116119e
> [159135.508163]  0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> [159135.508167] Call Trace:
> [159135.508177]  [<ffffffffa06e767c>] ch_open+0x4c/0xa0 [ch]
> [159135.508189]  [<ffffffff8116119e>] chrdev_open+0x13e/0x200
> [159135.508196]  [<ffffffff8115ade8>] __dentry_open+0x198/0x310
> [159135.508201]  [<ffffffff8116a432>] do_last+0x1f2/0x800
> [159135.508206]  [<ffffffff8116b6a9>] path_openat+0xd9/0x420
> [159135.508210]  [<ffffffff8116bb2c>] do_filp_open+0x4c/0xc0
> [159135.508214]  [<ffffffff8115c7cf>] do_sys_open+0x17f/0x250
> [159135.508219]  [<ffffffff8146c292>] system_call_fastpath+0x16/0x1b
> [159135.508225]  [<00007faadfa2a040>] 0x7faadfa2a03f
> [159135.508227] Code: 56 27 e1 0f 1f 80 00 00 00 00 48 89 df e8 98 47 fe 
> e0 eb d5 66 0f 1f 44 00 00 48 83 ec 18 48 89 5c 24 08 48 89 6c 24 10 48 
> 89 fb <83>[159135.508241]  bf 50 06 00 00 04 75 16 b8 fa ff ff ff 48 8b 5c 24 
> 08 48 8b
> [159135.508248] RIP  [<ffffffffa00bb5d1>] scsi_device_get+0x11/0xb0 
> [scsi_mod]
> [159135.508254]  RSP <ffff88100fdf5c88>
> [159135.508256] CR2: 0000000000000650
> 
> And we had been crashing because 'ch->device' was NULL in ch_open().
> This patch is to guarantee atomicity on 'scsi_device_put()' and 
> 'ch->device = NULL'; otherwise we'd be having a race window between 
> those calls, allowing another thread to find a 'ch' device with an 
> invalid but non-NULL ch->device pointer.

Hi Hannes,

Thank you for having shared this call trace. Do you agree that moving
the ch->device = NULL assignment from ch_release() into ch_destroy() is
sufficient to fix this crash?
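
To make the question concrete, here is a minimal userspace model of the
lifetime rule I have in mind. It is only a sketch and not the ch.c code: the
model_* names, the clear_in_release flag and the caller-owned storage are all
assumptions made for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_device. */
struct model_device {
	int users;
};

/* Hypothetical stand-in for scsi_changer: a refcount plus ch->device. */
struct model_changer {
	int refs;                     /* models the kref in the changer */
	struct model_device *device;  /* models ch->device */
};

/* Models scsi_device_get(), but fails on NULL instead of oopsing. */
static int model_device_get(struct model_device *dev)
{
	if (dev == NULL)
		return -1;
	dev->users++;
	return 0;
}

/* Models scsi_device_put(). */
static void model_device_put(struct model_device *dev)
{
	assert(dev != NULL && dev->users > 0);
	dev->users--;
}

/* Models ch_destroy(): runs only when the last reference goes away. */
static void model_destroy(struct model_changer *ch)
{
	ch->device = NULL;
}

/* Models kref_put(&ch->ref, ch_destroy). */
static void model_put(struct model_changer *ch)
{
	assert(ch->refs > 0);
	if (--ch->refs == 0)
		model_destroy(ch);
}

/* Models ch_open(): take a device reference, then a changer reference. */
static int model_open(struct model_changer *ch)
{
	if (model_device_get(ch->device) != 0)
		return -1;
	ch->refs++;
	return 0;
}

/*
 * Models ch_release(). With clear_in_release != 0 the pointer is reset
 * here, so a later open of the still-registered changer sees NULL.
 */
static void model_release(struct model_changer *ch, int clear_in_release)
{
	model_device_put(ch->device);
	if (clear_in_release)
		ch->device = NULL;
	model_put(ch);
}
```

In this model a changer starts with refs == 1 for its registration. With
clear_in_release == 0 an open/release/open sequence keeps finding a valid
device pointer, and the pointer is only cleared once the registration
reference is dropped as well.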

Bart.
