Re: [BUG] Deadlock in blk_mq_register_disk error path
On Mon, Aug 15, 2016 at 6:22 PM, Bart Van Assche wrote:
> On 08/15/2016 09:01 AM, Jinpu Wang wrote:
>> It's more likely that you hit another bug, which my colleague Roman fixed:
>>
>> http://www.spinics.net/lists/linux-block/msg04552.html
>
> Hello Jinpu,
>
> Interesting. However, I see that Roman wrote the following: "Firstly this
> wrong sequence raises two kernel warnings: 1st. WARNING at
> lib/percpu-refcount.c:309 percpu_ref_kill_and_confirm called more than once
> 2nd. WARNING at lib/percpu-refcount.c:331". I haven't seen any of these
> kernel warnings ...
>
> Thanks,
>
> Bart.

The warnings happened only from time to time, but your hung tasks are similar to ours. We injected some delay in order to reproduce the problem more easily.

--
Best Regards,
Jack Wang
Linux Kernel Developer Storage
ProfitBricks GmbH The IaaS-Company.

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 5770083-42
Fax: +49 30 5770085-98
Email: jinpu.w...@profitbricks.com
URL: http://www.profitbricks.de
Registered office: Berlin.
Court of registration: Amtsgericht Charlottenburg, HRB 125506 B.
Managing directors: Andreas Gauger, Achim Weiss.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Deadlock in blk_mq_register_disk error path
On 08/15/2016 10:15 AM, Jens Axboe wrote:
> Can you reproduce at will? Would be nice to know if it hit the error
> case, which is where it would hang.

Hello Jens,

Unfortunately this hang is only triggered sporadically by my tests. Over the past four weeks or so I have triggered several thousand scsi_remove_host() calls with my https://github.com/bvanassche/srp-test software. This morning was the first time that I ran into a blk-mq related hang.

Thanks,

Bart.
Re: [BUG] Deadlock in blk_mq_register_disk error path
On 08/15/2016 09:53 AM, Bart Van Assche wrote:
> On 08/02/2016 10:21 AM, Jens Axboe wrote:
>> On 08/02/2016 06:58 AM, Jinpu Wang wrote:
>>> Hi Jens,
>>>
>>> I found that in blk_mq_register_disk() we call blk_mq_disable_hotplug(),
>>> which in turn does mutex_lock(&all_q_mutex);
>>>
>>>     queue_for_each_hw_ctx(q, hctx, i) {
>>>         ret = blk_mq_register_hctx(hctx);
>>>         if (ret)
>>>             break; /* about to error out; unregister is called below */
>>>     }
>>>
>>>     if (ret)
>>>         blk_mq_unregister_disk(disk);
>>>
>>> In blk_mq_unregister_disk() we try to disable hotplug again, which
>>> leads to a deadlock.
>>>
>>> Did I miss anything?
>>
>> Nope, your analysis looks correct. This should fix it:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=6316338a94b2319abe9d3790eb9cdc56ef81ac1a
>
> Hi Jens,
>
> Will that patch be included in stable kernels? I just encountered a
> deadlock with kernel v4.7 that looks similar.

Sure, we can push to stable, it's a pretty straightforward patch. Can you reproduce at will? Would be nice to know if it hit the error case, which is where it would hang.

--
Jens Axboe
Re: [BUG] Deadlock in blk_mq_register_disk error path
On 08/15/2016 09:01 AM, Jinpu Wang wrote:
> It's more likely that you hit another bug, which my colleague Roman fixed:
>
> http://www.spinics.net/lists/linux-block/msg04552.html

Hello Jinpu,

Interesting. However, I see that Roman wrote the following: "Firstly this wrong sequence raises two kernel warnings: 1st. WARNING at lib/percpu-refcount.c:309 percpu_ref_kill_and_confirm called more than once 2nd. WARNING at lib/percpu-refcount.c:331". I haven't seen any of these kernel warnings ...

Thanks,

Bart.
Re: Re: [BUG] Deadlock in blk_mq_register_disk error path
Hi Bart,

>> Nope, your analysis looks correct. This should fix it:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=6316338a94b2319abe9d3790eb9cdc56ef81ac1a
>
> Hi Jens,
>
> Will that patch be included in stable kernels? I just encountered a
> deadlock with kernel v4.7 that looks similar.
>
> Thank you,
>
> Bart.
>
> INFO: task kworker/u64:6:136 blocked for more than 480 seconds.
>       Tainted: GW 4.7.0-dbg+ #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u64:6 D 88016f677bb0 0 136 2 0x
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
>  [] schedule+0x37/0x90
>  [] schedule_preempt_disabled+0x10/0x20
>  [] mutex_lock_nested+0x144/0x350
>  [] blk_mq_disable_hotplug+0x12/0x20
>  [] blk_mq_register_disk+0x29/0x120
>  [] blk_register_queue+0xb6/0x160
>  [] add_disk+0x219/0x4a0
>  [] sd_probe_async+0x100/0x1b0
>  [] async_run_entry_fn+0x45/0x140
>  [] process_one_work+0x1f9/0x6a0
>  [] worker_thread+0x49/0x490
>  [] kthread+0xea/0x100
>  [] ret_from_fork+0x1f/0x40
> 3 locks held by kworker/u64:6/136:
>  #0: ("events_unbound"){.+.+.+}, at: [] process_one_work+0x17a/0x6a0
>  #1: ((&entry->work)){+.+.+.}, at: [] process_one_work+0x17a/0x6a0
>  #2: (all_q_mutex){+.+.+.}, at: [] blk_mq_disable_hotplug+0x12/0x20
> INFO: task 02:8101 blocked for more than 480 seconds.
>       Tainted: GW 4.7.0-dbg+ #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 02 D 88039b747968 0 8101 1 0x0004
> Call Trace:
>  [] schedule+0x37/0x90
>  [] blk_mq_freeze_queue_wait+0x51/0xb0
>  [] blk_mq_update_tag_set_depth+0x3a/0xb0
>  [] blk_mq_init_allocated_queue+0x432/0x450
>  [] blk_mq_init_queue+0x35/0x60
>  [] scsi_mq_alloc_queue+0x17/0x50
>  [] scsi_alloc_sdev+0x2b9/0x350
>  [] scsi_probe_and_add_lun+0x98b/0xe50
>  [] __scsi_scan_target+0x5ca/0x6b0
>  [] scsi_scan_target+0xe1/0xf0
>  [] srp_create_target+0xf06/0x13d4 [ib_srp]
>  [] dev_attr_store+0x13/0x20
>  [] sysfs_kf_write+0x40/0x50
>  [] kernfs_fop_write+0x137/0x1c0
>  [] __vfs_write+0x23/0x140
>  [] vfs_write+0xb0/0x190
>  [] SyS_write+0x44/0xa0
>  [] entry_SYSCALL_64_fastpath+0x18/0xa8
> 8 locks held by 02/8101:
>  #0: (sb_writers#4){.+.+.+}, at: [] __sb_start_write+0xb2/0xf0
>  #1: (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x101/0x1c0
>  #2: (s_active#363){.+.+.+}, at: [] kernfs_fop_write+0x10a/0x1c0
>  #3: (&host->add_target_mutex){+.+.+.}, at: [] srp_create_target+0x134/0x13d4 [ib_srp]
>  #4: (&shost->scan_mutex){+.+.+.}, at: [] scsi_scan_target+0x8d/0xf0
>  #5: (cpu_hotplug.lock){++}, at: [] get_online_cpus+0x2d/0x80
>  #6: (all_q_mutex){+.+.+.}, at: [] blk_mq_init_allocated_queue+0x34a/0x450
>  #7: (&set->tag_list_lock){+.+...}, at: [] blk_mq_init_allocated_queue+0x37a/0x450

It's more likely that you hit another bug, which my colleague Roman fixed:

http://www.spinics.net/lists/linux-block/msg04552.html

It would be great if you could test it and see whether it works for you!

--
Best Regards,
Jack Wang
Linux Kernel Developer Storage
ProfitBricks GmbH The IaaS-Company.
Re: Re: [BUG] Deadlock in blk_mq_register_disk error path
On 08/02/2016 10:21 AM, Jens Axboe wrote:
> On 08/02/2016 06:58 AM, Jinpu Wang wrote:
>> Hi Jens,
>>
>> I found that in blk_mq_register_disk() we call blk_mq_disable_hotplug(),
>> which in turn does mutex_lock(&all_q_mutex);
>>
>>     queue_for_each_hw_ctx(q, hctx, i) {
>>         ret = blk_mq_register_hctx(hctx);
>>         if (ret)
>>             break; /* about to error out; unregister is called below */
>>     }
>>
>>     if (ret)
>>         blk_mq_unregister_disk(disk);
>>
>> In blk_mq_unregister_disk() we try to disable hotplug again, which
>> leads to a deadlock.
>>
>> Did I miss anything?
>
> Nope, your analysis looks correct. This should fix it:
>
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=6316338a94b2319abe9d3790eb9cdc56ef81ac1a

Hi Jens,

Will that patch be included in stable kernels? I just encountered a deadlock with kernel v4.7 that looks similar.

Thank you,

Bart.

INFO: task kworker/u64:6:136 blocked for more than 480 seconds.
      Tainted: GW 4.7.0-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u64:6 D 88016f677bb0 0 136 2 0x
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 [] schedule+0x37/0x90
 [] schedule_preempt_disabled+0x10/0x20
 [] mutex_lock_nested+0x144/0x350
 [] blk_mq_disable_hotplug+0x12/0x20
 [] blk_mq_register_disk+0x29/0x120
 [] blk_register_queue+0xb6/0x160
 [] add_disk+0x219/0x4a0
 [] sd_probe_async+0x100/0x1b0
 [] async_run_entry_fn+0x45/0x140
 [] process_one_work+0x1f9/0x6a0
 [] worker_thread+0x49/0x490
 [] kthread+0xea/0x100
 [] ret_from_fork+0x1f/0x40
3 locks held by kworker/u64:6/136:
 #0: ("events_unbound"){.+.+.+}, at: [] process_one_work+0x17a/0x6a0
 #1: ((&entry->work)){+.+.+.}, at: [] process_one_work+0x17a/0x6a0
 #2: (all_q_mutex){+.+.+.}, at: [] blk_mq_disable_hotplug+0x12/0x20
INFO: task 02:8101 blocked for more than 480 seconds.
      Tainted: GW 4.7.0-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
02 D 88039b747968 0 8101 1 0x0004
Call Trace:
 [] schedule+0x37/0x90
 [] blk_mq_freeze_queue_wait+0x51/0xb0
 [] blk_mq_update_tag_set_depth+0x3a/0xb0
 [] blk_mq_init_allocated_queue+0x432/0x450
 [] blk_mq_init_queue+0x35/0x60
 [] scsi_mq_alloc_queue+0x17/0x50
 [] scsi_alloc_sdev+0x2b9/0x350
 [] scsi_probe_and_add_lun+0x98b/0xe50
 [] __scsi_scan_target+0x5ca/0x6b0
 [] scsi_scan_target+0xe1/0xf0
 [] srp_create_target+0xf06/0x13d4 [ib_srp]
 [] dev_attr_store+0x13/0x20
 [] sysfs_kf_write+0x40/0x50
 [] kernfs_fop_write+0x137/0x1c0
 [] __vfs_write+0x23/0x140
 [] vfs_write+0xb0/0x190
 [] SyS_write+0x44/0xa0
 [] entry_SYSCALL_64_fastpath+0x18/0xa8
8 locks held by 02/8101:
 #0: (sb_writers#4){.+.+.+}, at: [] __sb_start_write+0xb2/0xf0
 #1: (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x101/0x1c0
 #2: (s_active#363){.+.+.+}, at: [] kernfs_fop_write+0x10a/0x1c0
 #3: (&host->add_target_mutex){+.+.+.}, at: [] srp_create_target+0x134/0x13d4 [ib_srp]
 #4: (&shost->scan_mutex){+.+.+.}, at: [] scsi_scan_target+0x8d/0xf0
 #5: (cpu_hotplug.lock){++}, at: [] get_online_cpus+0x2d/0x80
 #6: (all_q_mutex){+.+.+.}, at: [] blk_mq_init_allocated_queue+0x34a/0x450
 #7: (&set->tag_list_lock){+.+...}, at: [] blk_mq_init_allocated_queue+0x37a/0x450
Re: [BUG] Deadlock in blk_mq_register_disk error path
On Tue, Aug 2, 2016 at 7:21 PM, Jens Axboe wrote:
> On 08/02/2016 06:58 AM, Jinpu Wang wrote:
>> Hi Jens,
>>
>> I found that in blk_mq_register_disk() we call blk_mq_disable_hotplug(),
>> which in turn does mutex_lock(&all_q_mutex);
>>
>>     queue_for_each_hw_ctx(q, hctx, i) {
>>         ret = blk_mq_register_hctx(hctx);
>>         if (ret)
>>             break; /* about to error out; unregister is called below */
>>     }
>>
>>     if (ret)
>>         blk_mq_unregister_disk(disk);
>>
>> In blk_mq_unregister_disk() we try to disable hotplug again, which
>> leads to a deadlock.
>>
>> Did I miss anything?
>
> Nope, your analysis looks correct. This should fix it:
>
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=6316338a94b2319abe9d3790eb9cdc56ef81ac1a
>
> --
> Jens Axboe

Thanks Jens, looks good to me!

--
Best Regards,
Jack Wang
Linux Kernel Developer Storage
ProfitBricks GmbH The IaaS-Company.
Re: [BUG] Deadlock in blk_mq_register_disk error path
On 08/02/2016 06:58 AM, Jinpu Wang wrote:
> Hi Jens,
>
> I found that in blk_mq_register_disk() we call blk_mq_disable_hotplug(),
> which in turn does mutex_lock(&all_q_mutex);
>
>     queue_for_each_hw_ctx(q, hctx, i) {
>         ret = blk_mq_register_hctx(hctx);
>         if (ret)
>             break; /* about to error out; unregister is called below */
>     }
>
>     if (ret)
>         blk_mq_unregister_disk(disk);
>
> In blk_mq_unregister_disk() we try to disable hotplug again, which
> leads to a deadlock.
>
> Did I miss anything?

Nope, your analysis looks correct. This should fix it:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=6316338a94b2319abe9d3790eb9cdc56ef81ac1a

--
Jens Axboe
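For readers who do not want to open the commit link, one common way to remove this kind of double acquisition is to split the teardown into a helper that expects the lock to be held and a thin wrapper that takes the lock first. The following is only a user-space sketch of that pattern under stated assumptions: a pthread mutex stands in for all_q_mutex, and every name in it (q_lock, registered_hctx, unregister_disk_locked, and so on) is hypothetical. It is not a copy of the linked commit.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t q_lock;   /* hypothetical stand-in for all_q_mutex */
static int registered_hctx;      /* number of fake hctx currently registered */

/* Use an error-checking mutex so a relock by the owner would be reported. */
static void sketch_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&q_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Teardown helper: the caller must already hold q_lock. */
static void unregister_disk_locked(void)
{
    registered_hctx = 0;
}

/* Public teardown: takes the lock, then delegates to the helper. */
static int unregister_disk(void)
{
    int err = pthread_mutex_lock(&q_lock);

    if (err)
        return err;
    unregister_disk_locked();
    pthread_mutex_unlock(&q_lock);
    return 0;
}

/* fail_at: index of the hctx whose registration fails, or -1 for none. */
static int register_disk(int nr_hctx, int fail_at)
{
    int i, ret = 0;

    pthread_mutex_lock(&q_lock);
    for (i = 0; i < nr_hctx; i++) {
        if (i == fail_at) {
            ret = -ENOMEM;   /* simulated registration failure */
            break;
        }
        registered_hctx++;
    }
    if (ret)
        unregister_disk_locked();   /* lock is held, so no second lock */
    pthread_mutex_unlock(&q_lock);
    return ret;
}
```

After sketch_init(), a call such as register_disk(4, 2) fails, rolls back through the locked helper, and returns with the mutex released, so later callers are not blocked.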
[BUG] Deadlock in blk_mq_register_disk error path
Hi Jens,

I found that in blk_mq_register_disk() we call blk_mq_disable_hotplug(),
which in turn does mutex_lock(&all_q_mutex);

    queue_for_each_hw_ctx(q, hctx, i) {
        ret = blk_mq_register_hctx(hctx);
        if (ret)
            break; /* about to error out; unregister is called below */
    }

    if (ret)
        blk_mq_unregister_disk(disk);

In blk_mq_unregister_disk() we try to disable hotplug again, which
leads to a deadlock.

Did I miss anything?

--
Best Regards,
Jack Wang
Linux Kernel Developer Storage
ProfitBricks GmbH The IaaS-Company.
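The sequence described in this report can be modelled in user space. The sketch below is an illustration only, not kernel code: it assumes a pthread mutex of type PTHREAD_MUTEX_ERRORCHECK as a stand-in for all_q_mutex, and the model_* function names are hypothetical. Kernel mutexes are not recursive, so the real code would block forever at the second lock; the error-checking pthread mutex returns EDEADLK instead, which makes the problem observable without hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for all_q_mutex. */
static pthread_mutex_t all_q_mutex;

/* Error-checking mutexes report a relock by the owner instead of hanging. */
static void model_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&all_q_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Models blk_mq_unregister_disk(): it takes the lock itself. */
static int model_unregister_disk(void)
{
    int err = pthread_mutex_lock(&all_q_mutex);

    if (err)
        return err;   /* EDEADLK when the caller already owns the lock */
    /* ... teardown would happen here ... */
    pthread_mutex_unlock(&all_q_mutex);
    return 0;
}

/* Models blk_mq_register_disk(); hctx_ret fakes blk_mq_register_hctx(). */
static int model_register_disk(int hctx_ret)
{
    int ret;

    pthread_mutex_lock(&all_q_mutex);
    ret = hctx_ret;
    if (ret)
        ret = model_unregister_disk();   /* lock is still held here */
    pthread_mutex_unlock(&all_q_mutex);
    return ret;
}
```

After model_init(), model_register_disk(0) returns 0, while any non-zero argument takes the error path and returns EDEADLK from the nested lock attempt.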