Re: [Ocfs2-devel] Ocfs2-devel Digest, Vol 167, Issue 20

2018-01-12 Thread piaojun
Hi Guozhonghua,

It seems that the deadlock can be reproduced easily, right? Sharing the
lock with the VFS layer is probably risky, and introducing a new lock
for quota recovery sounds good. Could you post a patch to fix this
problem?
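
To show the shape I have in mind, here is a rough, untested sketch (all
names and placements are hypothetical):

/* in struct ocfs2_super (fs/ocfs2/ocfs2.h): */
	struct rw_semaphore quota_recovery_lock;

/* ocfs2_finish_quota_recovery() would take the new lock instead of
 * sb->s_umount: */
	down_read(&osb->quota_recovery_lock);
	/* ... recover local quota files ... */
	up_read(&osb->quota_recovery_lock);

/* The write side would be taken by whichever path really needs to
 * exclude quota recovery (e.g. turning quotas off). Crucially, it must
 * NOT be held across flush_workqueue(osb->ocfs2_wq), or the same cycle
 * comes right back: */
	down_write(&osb->quota_recovery_lock);
	/* ... disable quotas ... */
	up_write(&osb->quota_recovery_lock);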

thanks,
Jun

On 2018/1/13 11:04, Guozhonghua wrote:
> 
>> Message: 1
>> Date: Fri, 12 Jan 2018 06:15:01 +
>> From: Shichangkuo 
>> Subject: Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and
>>  ocfs2 workqueue triggered by ocfs2rec thread
>> To: Joseph Qi , "z...@suse.com" ,
>>  "j...@suse.cz" 
>> Cc: "ocfs2-devel@oss.oracle.com" 
>>
>> Hi Joseph
>> Thanks for replying.
>> Umount flushes the ocfs2 workqueue in ocfs2_truncate_log_shutdown,
>> and journal recovery is one work item on that workqueue.
>>
>> Thanks
>> Changkuo
>>
> 
> Umount:
>   mntput
>     cleanup_mnt
>       deactivate_super: down_write(&s->s_umount)   <-- takes s_umount for write
>         deactivate_locked_super
>           kill_sb: kill_block_super
>             generic_shutdown_super
>               put_super: ocfs2_put_super
>                 ocfs2_dismount_volume
>                   ocfs2_truncate_log_shutdown
>                     flush_workqueue(osb->ocfs2_wq)
>                       -> waits for the ocfs2_complete_recovery work item,
>                          which calls ocfs2_finish_quota_recovery
>                            down_read(&sb->s_umount)   <-- tries to take s_umount for read
> 
> Here down_read() is attempted on the rw_semaphore that umount already
> holds for write, so it can never succeed: the same rw_semaphore is in
> effect taken twice, which is a deadlock. The flush of ocfs2_wq blocks
> forever, and so does the umount.
> 
> Thanks. 
> 
> 



Re: [Ocfs2-devel] Ocfs2-devel Digest, Vol 167, Issue 20

2018-01-12 Thread Guozhonghua

> Message: 1
> Date: Fri, 12 Jan 2018 06:15:01 +
> From: Shichangkuo 
> Subject: Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and
>   ocfs2 workqueue triggered by ocfs2rec thread
> To: Joseph Qi , "z...@suse.com" ,
>   "j...@suse.cz" 
> Cc: "ocfs2-devel@oss.oracle.com" 
> 
> Hi Joseph
> Thanks for replying.
> Umount flushes the ocfs2 workqueue in ocfs2_truncate_log_shutdown,
> and journal recovery is one work item on that workqueue.
> 
> Thanks
> Changkuo
> 

Umount:
  mntput
    cleanup_mnt
      deactivate_super: down_write(&s->s_umount)   <-- takes s_umount for write
        deactivate_locked_super
          kill_sb: kill_block_super
            generic_shutdown_super
              put_super: ocfs2_put_super
                ocfs2_dismount_volume
                  ocfs2_truncate_log_shutdown
                    flush_workqueue(osb->ocfs2_wq)
                      -> waits for the ocfs2_complete_recovery work item,
                         which calls ocfs2_finish_quota_recovery
                           down_read(&sb->s_umount)   <-- tries to take s_umount for read

Here down_read() is attempted on the rw_semaphore that umount already
holds for write, so it can never succeed: the same rw_semaphore is in
effect taken twice, which is a deadlock. The flush of ocfs2_wq blocks
forever, and so does the umount.
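
To make the cycle concrete, the same pattern can be demonstrated with a
minimal module (a hypothetical demo, not the actual ocfs2 code; by
design it hangs in its init function):

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/rwsem.h>

static DECLARE_RWSEM(demo_umount_sem);		/* stands in for sb->s_umount */
static struct workqueue_struct *demo_wq;	/* stands in for osb->ocfs2_wq */

static void demo_recovery_work(struct work_struct *work)
{
	/* like ocfs2_finish_quota_recovery(): blocks here until the
	 * writer releases the semaphore -- which it never does */
	down_read(&demo_umount_sem);
	up_read(&demo_umount_sem);
}
static DECLARE_WORK(demo_work, demo_recovery_work);

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo_wq", 0, 0);
	if (!demo_wq)
		return -ENOMEM;

	down_write(&demo_umount_sem);	/* like deactivate_super() in umount */
	queue_work(demo_wq, &demo_work);
	/* DEADLOCK: flush_workqueue() waits for demo_recovery_work(),
	 * which waits in down_read() for this thread to release the
	 * rwsem -- the same cycle as in the call chain above. */
	flush_workqueue(demo_wq);
	up_write(&demo_umount_sem);
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");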

Thanks. 




Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread

2018-01-12 Thread Eric Ren
Hi,

On 01/12/2018 11:43 AM, Shichangkuo wrote:
> Hi all,
>   We are now testing ocfs2 with the 4.14 kernel, and we found a deadlock
> between umount and the ocfs2 workqueue, triggered by the ocfs2rec thread.
> The stacks are as follows:
> journal recovery work:
> [] call_rwsem_down_read_failed+0x14/0x30
> [] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
> [] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
> [] process_one_work+0x130/0x350
> [] worker_thread+0x46/0x3b0
> [] kthread+0x101/0x140
> [] ret_from_fork+0x1f/0x30
> [] 0x
>
> /bin/umount:
> [] flush_workqueue+0x104/0x3e0
> [] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
> [] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
> [] ocfs2_put_super+0x31/0xa0 [ocfs2]
> [] generic_shutdown_super+0x6d/0x120
> [] kill_block_super+0x2d/0x60
> [] deactivate_locked_super+0x51/0x90
> [] cleanup_mnt+0x3b/0x70
> [] task_work_run+0x86/0xa0
> [] exit_to_usermode_loop+0x6d/0xa9
> [] do_syscall_64+0x11d/0x130
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
>   
> Function ocfs2_finish_quota_recovery tries to take sb->s_umount, which is
> already held for write by the umount thread, so we get a deadlock.

Good catch, thanks for reporting.  Is it reproducible? Can you please 
share the steps for reproducing this issue?
> This issue was introduced by c3b004460d77bf3f980d877be539016f2df4df12 and 
> 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> I think we cannot keep using ::s_umount here, and the old ::dqonoff_mutex
> has already been removed.
> Shall we add a new mutex?

@Jan, I haven't looked into the code yet; could you help me understand why
we need to take sb->s_umount in ocfs2_finish_quota_recovery?
Is it because the quota recovery process can start during umount, or
somewhere else?

Thanks,
Eric


