NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
RIP: 0010:native_queued_spin_lock_slowpath+0x20d/0x2b0
Call Trace:
 dump_stack_lvl+0x57/0x81
 validate_chain.cold+0x157/0x16a
 __lock_acquire+0xbb1/0x1900
 lock_acquire+0x1da/0x640
 _raw_spin_lock_irqsave+0x43/0x90
 ploop_allocate_cluster+0x12f/0x8c0 [ploop]
 ploop_alloc_cluster.isra.0+0xf7/0x1f0 [ploop]
 ploop_prepare_reloc_index_wb+0x2ab/0x4e0 [ploop]
 ploop_grow_relocate_cluster+0x849/0xcc0 [ploop]
 ploop_process_resize_cmd+0x65/0x430 [ploop]
 ploop_resize+0x415/0x680 [ploop]
 ploop_message+0x420/0xc90 [ploop]
 target_message+0x453/0x5e0 [dm_mod]
 ctl_ioctl+0x41f/0x6a0 [dm_mod]
 dm_ctl_ioctl+0xa/0x20 [dm_mod]
 __x64_sys_ioctl+0x12b/0x1a0
 do_syscall_64+0x5c/0x90
 entry_SYSCALL_64_after_hwframe+0x77/0xe1

ploop_prepare_reloc_index_wb() acquires ploop->bat_lock and, while still
holding it, calls ploop_alloc_cluster(), which acquires ploop->bat_lock
again => deadlock.

  ploop_prepare_reloc_index_wb
    spin_lock_irq(&ploop->bat_lock);
    ploop_alloc_cluster
      ploop_allocate_cluster
        spin_lock_irqsave(&ploop->bat_lock, flags);

Let's move ploop_alloc_cluster() out of the bat_lock. This is safe because
the MD_UPDATING bit, which is checked and set under the lock, protects us
from parallel ploop_alloc_cluster() execution for a particular md_page.
If the function is called in parallel for different md_pages, that is
also OK: the new cluster is searched for in the bitmask under the bat_lock,
so the same cluster will never be found and "allocated" in parallel.

Fixes: 9caa1af11b0a ("dm-ploop: fix and rework md updates")

https://virtuozzo.atlassian.net/browse/VSTOR-107975

Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com>

Feature: dm-ploop: ploop target driver
---
 drivers/md/dm-ploop-map.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index 35085a04ca5f..5a2ef5691405 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -2746,6 +2746,7 @@ int ploop_prepare_reloc_index_wb(struct ploop *ploop,
 	add_to_wblist = ploop_md_make_dirty(ploop, md);
 
 	piwb = md->piwb;
+	spin_unlock_irq(&ploop->bat_lock);
 
 	if (dst_clu) {
 		/*
@@ -2759,7 +2760,6 @@ int ploop_prepare_reloc_index_wb(struct ploop *ploop,
 		if (err)
 			goto out_reset;
 	}
-	spin_unlock_irq(&ploop->bat_lock);
 
 	*ret_md = md;
 	*add_for_wb = add_to_wblist ? 1 : 0;
@@ -2768,6 +2768,7 @@ int ploop_prepare_reloc_index_wb(struct ploop *ploop,
 
 out_reset:
 	ploop_break_bat_update(ploop, md, piwb);
+	spin_lock_irq(&ploop->bat_lock);
 out_error:
 	if (add_to_wblist)
 		clear_bit(MD_DIRTY, &md->status);
-- 
2.43.5