Re: [Ocfs2-devel] Ocfs2-devel Digest, Vol 167, Issue 20
Hi Guozhonghua,

It seems that the deadlock can be reproduced easily, right? Sharing the lock with the VFS layer is probably risky, and introducing a new lock for quota recovery sounds good. Could you post a patch to fix this problem?

Thanks,
Jun

On 2018/1/13 11:04, Guozhonghua wrote:
>
>> Message: 1
>> Date: Fri, 12 Jan 2018 06:15:01 +
>> From: Shichangkuo
>> Subject: Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and
>>          ocfs2 workqueue triggered by ocfs2rec thread
>> To: Joseph Qi , "z...@suse.com" , "j...@suse.cz"
>> Cc: "ocfs2-devel@oss.oracle.com"
>>
>> Hi Joseph,
>> Thanks for replying.
>> Umount will flush the ocfs2 workqueue in ocfs2_truncate_log_shutdown,
>> and journal recovery is one of the works on the ocfs2 workqueue.
>>
>> Thanks,
>> Changkuo
>
> Umount
>   mntput
>     cleanup_mnt
>       deactivate_super: down_write(&s->s_umount)   <- takes the rw_semaphore for write
>         deactivate_locked_super
>           kill_sb: kill_block_super
>             generic_shutdown_super
>               put_super: ocfs2_put_super
>                 ocfs2_dismount_volume
>                   ocfs2_truncate_log_shutdown
>                     flush_workqueue(osb->ocfs2_wq)
>
> ocfs2_finish_quota_recovery (a work item on ocfs2_wq)
>   down_read(&sb->s_umount)   <- blocks here: the rw_semaphore is already held
>                                 for write, so taking it again deadlocks
>
> The flush of the ocfs2_wq workqueue is blocked, and so is the umount operation.
>
> Thanks.
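To illustrate the new-lock idea suggested above, here is a minimal, hypothetical sketch. It is not an actual posted patch: the field name quota_recovery_lock and the places it is taken are assumptions made only for illustration.

	/*
	 * Hypothetical sketch only, not a real ocfs2 patch: give quota recovery
	 * its own rw_semaphore in struct ocfs2_super (the field name is made up
	 * here) so the work item no longer needs the VFS s_umount semaphore
	 * that umount already holds for write.
	 */

	/* fs/ocfs2/ocfs2.h -- assumed new field in struct ocfs2_super: */
	struct rw_semaphore	quota_recovery_lock;

	/* fs/ocfs2/super.c -- initialize it alongside the other osb locks: */
	init_rwsem(&osb->quota_recovery_lock);

	/* fs/ocfs2/quota_local.c, ocfs2_finish_quota_recovery() -- take the
	 * new lock instead of sb->s_umount: */
	down_read(&osb->quota_recovery_lock);
	/* ... recover the local quota files ... */
	up_read(&osb->quota_recovery_lock);

	/*
	 * The umount/quota-off path would take quota_recovery_lock for write
	 * wherever it previously relied on s_umount excluding recovery; care
	 * would still be needed not to hold it across
	 * flush_workqueue(osb->ocfs2_wq).
	 */

Whether the read side belongs in ocfs2_finish_quota_recovery itself or one level up in ocfs2_complete_recovery is a detail an actual patch would have to settle.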
Re: [Ocfs2-devel] Ocfs2-devel Digest, Vol 167, Issue 20
> Message: 1
> Date: Fri, 12 Jan 2018 06:15:01 +
> From: Shichangkuo
> Subject: Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and
>          ocfs2 workqueue triggered by ocfs2rec thread
> To: Joseph Qi , "z...@suse.com" , "j...@suse.cz"
> Cc: "ocfs2-devel@oss.oracle.com"
>
> Hi Joseph,
> Thanks for replying.
> Umount will flush the ocfs2 workqueue in ocfs2_truncate_log_shutdown,
> and journal recovery is one of the works on the ocfs2 workqueue.
>
> Thanks,
> Changkuo

Umount
  mntput
    cleanup_mnt
      deactivate_super: down_write(&s->s_umount)   <- takes the rw_semaphore for write
        deactivate_locked_super
          kill_sb: kill_block_super
            generic_shutdown_super
              put_super: ocfs2_put_super
                ocfs2_dismount_volume
                  ocfs2_truncate_log_shutdown
                    flush_workqueue(osb->ocfs2_wq)

ocfs2_finish_quota_recovery (a work item on ocfs2_wq)
  down_read(&sb->s_umount)   <- retries down_read on the same rw_semaphore while
                                umount holds it for write; taking it twice this
                                way is a deadlock

The flush of the ocfs2_wq workqueue is blocked, and so is the umount operation.

Thanks.
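The same pattern can be reproduced outside the kernel. Below is a small userspace analogy (not ocfs2 code; all names are just stand-ins): an "umount" thread takes a rwlock for write and then waits for a worker that needs the same lock for read. A timed lock is used so the program reports the cycle after a couple of seconds instead of hanging forever the way the kernel does.

	/* deadlock_demo.c -- userspace analogy of the reported hang, not ocfs2 code.
	 * "umount" thread: takes the rwlock for write, then waits for the worker
	 * (the analogue of flush_workqueue()).  Worker: needs the same rwlock for
	 * read (the analogue of ocfs2_finish_quota_recovery()), so it can never
	 * finish and the join can never return.
	 * Build: cc -o deadlock_demo deadlock_demo.c -lpthread
	 */
	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	static pthread_rwlock_t s_umount = PTHREAD_RWLOCK_INITIALIZER;

	static void *recovery_work(void *arg)
	{
		struct timespec deadline;

		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += 2;	/* give up after 2s instead of hanging */

		/* analogue of down_read(&sb->s_umount) in the work item */
		if (pthread_rwlock_timedrdlock(&s_umount, &deadline) != 0) {
			printf("worker: cannot take read lock, writer holds it -- deadlock\n");
			return NULL;
		}
		printf("worker: got read lock (no deadlock)\n");
		pthread_rwlock_unlock(&s_umount);
		return NULL;
	}

	int main(void)
	{
		pthread_t worker;

		/* analogue of deactivate_super(): down_write(&s->s_umount) */
		pthread_rwlock_wrlock(&s_umount);

		pthread_create(&worker, NULL, recovery_work, NULL);

		/* analogue of flush_workqueue(osb->ocfs2_wq): wait for the queued
		 * work to finish while still holding the write lock -- this is
		 * the problematic step. */
		pthread_join(worker, NULL);

		pthread_rwlock_unlock(&s_umount);
		printf("umount: finished\n");
		return 0;
	}

After roughly two seconds the worker reports that it cannot take the read lock while the main thread sits in pthread_join(), which mirrors flush_workqueue() waiting on ocfs2_finish_quota_recovery(); in the kernel there is no timeout, so both sides wait forever.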
Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi,

On 01/12/2018 11:43 AM, Shichangkuo wrote:
> Hi all,
> Now we are testing ocfs2 with the 4.14 kernel, and we found a deadlock between
> umount and the ocfs2 workqueue, triggered by the ocfs2rec thread. The stacks
> are as follows:
>
> journal recovery work:
> [] call_rwsem_down_read_failed+0x14/0x30
> [] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
> [] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
> [] process_one_work+0x130/0x350
> [] worker_thread+0x46/0x3b0
> [] kthread+0x101/0x140
> [] ret_from_fork+0x1f/0x30
> [] 0x
>
> /bin/umount:
> [] flush_workqueue+0x104/0x3e0
> [] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
> [] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
> [] ocfs2_put_super+0x31/0xa0 [ocfs2]
> [] generic_shutdown_super+0x6d/0x120
> [] kill_block_super+0x2d/0x60
> [] deactivate_locked_super+0x51/0x90
> [] cleanup_mnt+0x3b/0x70
> [] task_work_run+0x86/0xa0
> [] exit_to_usermode_loop+0x6d/0xa9
> [] do_syscall_64+0x11d/0x130
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
>
> Function ocfs2_finish_quota_recovery tries to take sb->s_umount, which is
> already locked by the umount thread, so we end up with a deadlock.

Good catch, thanks for reporting. Is it reproducible? Can you please share the
steps for reproducing this issue?

> This issue was introduced by commits c3b004460d77bf3f980d877be539016f2df4df12
> and 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> I think we cannot use ::s_umount, but the mutex ::dqonoff_mutex was already
> removed.
> Shall we add a new mutex?

@Jan, I haven't looked into the code yet; could you help me understand why we
need to take sb->s_umount in ocfs2_finish_quota_recovery? Is it because the
quota recovery process may start during umount, or somewhere else?

Thanks,
Eric
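Read together, the two stacks form a simple wait cycle; the sketch below is only a rearrangement of the traces above:

	/bin/umount task                         ocfs2_wq worker (ocfs2rec)
	----------------                         --------------------------
	down_write(&sb->s_umount)
	ocfs2_dismount_volume()
	  ocfs2_truncate_log_shutdown()
	    flush_workqueue(osb->ocfs2_wq)       ocfs2_complete_recovery()
	      waits for the worker ...             ocfs2_finish_quota_recovery()
	                                             down_read(&sb->s_umount)
	                                               ... which waits for umount

	Each side waits for the other, so neither can make progress.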