On Mon, 2019-03-25 at 10:26 +0100, Hannes Reinecke wrote: > The original issue leading to this patchset was this crash: > > > [159135.508116] Pid: 2638, comm: ssea Tainted: G W X > 3.0.101-0.40-default #1 HP ProLiant BL460c Gen9 > [159135.508119] RIP: 0010:[<ffffffffa00bb5d1>] [<ffffffffa00bb5d1>] > scsi_device_get+0x11/0xb0 [scsi_mod] > [159135.508126] RSP: 0018:ffff88100fdf5c88 EFLAGS: 00010296 > [159135.508128] RAX: ffff88101b31d5c0 RBX: 0000000000000000 RCX: > ffff88101b31d5c0 > [159135.508130] RDX: 0000000000000000 RSI: 0000000000000002 RDI: > 0000000000000000 > [159135.508132] RBP: ffff88101c1c4780 R08: 0000000000000000 R09: > ffff88201f387af0 > [159135.508134] R10: ffff88100fdf5e68 R11: ffffffff811eee70 R12: > ffffffffa06ea120 > [159135.508136] R13: ffff88201ef903c0 R14: ffff881007bdda00 R15: > ffff88101c1c4780 > [159135.508139] FS: 00007faae06d2700(0000) GS:ffff88107fc00000(0000) > knlGS:0000000000000000 > [159135.508141] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [159135.508143] CR2: 0000000000000650 CR3: 0000001018d0f000 CR4: > 00000000001407f0 > [159135.508145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [159135.508148] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [159135.508150] Process ssea (pid: 2638, threadinfo ffff88100fdf4000, > task ffff881012080140) > [159135.508152] Stack: > [159135.508153] ffff88201ef903c0 ffff88101b31d5c0 ffff88101c1c4780 > ffffffffa06e767c > [159135.508160] 0000000000000000 0000000000000000 0000000000000000 > ffffffff8116119e > [159135.508163] 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 > [159135.508167] Call Trace: > [159135.508177] [<ffffffffa06e767c>] ch_open+0x4c/0xa0 [ch] > [159135.508189] [<ffffffff8116119e>] chrdev_open+0x13e/0x200 > [159135.508196] [<ffffffff8115ade8>] __dentry_open+0x198/0x310 > [159135.508201] [<ffffffff8116a432>] do_last+0x1f2/0x800 > [159135.508206] [<ffffffff8116b6a9>] path_openat+0xd9/0x420 > [159135.508210] [<ffffffff8116bb2c>] do_filp_open+0x4c/0xc0 > [159135.508214] [<ffffffff8115c7cf>] do_sys_open+0x17f/0x250 > [159135.508219] [<ffffffff8146c292>] system_call_fastpath+0x16/0x1b > [159135.508225] [<00007faadfa2a040>] 0x7faadfa2a03f > [159135.508227] Code: 56 27 e1 0f 1f 80 00 00 00 00 48 89 df e8 98 47 fe > e0 eb d5 66 0f 1f 44 00 00 48 83 ec 18 48 89 5c 24 08 48 89 6c 24 10 48 > 89 fb > 83>[159135.508241] bf 50 06 00 00 04 75 16 b8 fa ff ff ff 48 8b 5c 24 > 08 48 8b > [159135.508248] RIP [<ffffffffa00bb5d1>] scsi_device_get+0x11/0xb0 > [scsi_mod] > [159135.508254] RSP <ffff88100fdf5c88> > [159135.508256] CR2: 0000000000000650 > > And we had been crashing because 'ch->device' was NULL in ch_open(). > This patch is to guarantee atomicity on 'scsi_device_put()' and > 'ch->device = NULL'; otherwise we'd be having a race window between > those calls, allowing another thread to find a 'ch' device with an > invalid but non-NULL ch->device pointer.
Hi Hannes, Thank you for having shared this call trace. Do you agree that moving the ch->device = NULL assignment from ch_release() into ch_destroy() is sufficient to fix this crash? Bart.

