Hi!

On Fri 19-01-18 15:03:30, piaojun wrote:
> On 2018/1/19 13:42, Changwei Ge wrote:
> > Hi Jun,
> >
> > On 2018/1/19 11:59, piaojun wrote:
> >> Hi Changwei,
> >>
> >> On 2018/1/19 11:38, Changwei Ge wrote:
> >>> Hi Jun,
> >>>
> >>> On 2018/1/19 11:17, piaojun wrote:
> >>>> Hi Jan, Eric and Changwei,
> >>>>
> >>>> Could we use another mutex lock to protect quota recovery? Sharing the
> >>>> lock with the VFS layer seems a little weird.
> >>>
> >>> I am afraid that we can't, since quota needs ::s_umount and we indeed
> >>> need ::s_umount to get rid of the race where quota has freed structs
> >>> that will be used by quota recovery in ocfs2.
> >>>
> >> Could you explain which 'structs' are used by quota recovery? Do you mean
> >> 'struct super_block'?
> >
> > I am not pointing to super_block.
> >
> > Sure.
> > You can refer to:
> > ocfs2_finish_quota_recovery
> >   ocfs2_recover_local_quota_file -> here, operations on quota happen
> >
> > Thanks,
> > Changwei
> >
> I looked through the code in deactivate_locked_super(), and did not
> find any *structs* that need to be protected except 'sb'. In addition I

There are quota structures (generally attached to the superblock in some way)
that get freed in ocfs2_local_free_info() and that are used by quota recovery.
ocfs2_local_free_info() gets called when we call dquot_disable() from
ocfs2_disable_quotas() from ocfs2_dismount_volume(). That's why we need to
serialize quota recovery and umount.

> still could not figure out why 'dqonoff_mutex' was replaced by
> 's_umount'. At least, the patch 5f530de63cfc did not mention that. I
> would appreciate a detailed explanation.

It was replaced mostly to fix a lock inversion issue in generic code - the
details are described in commit 9d1ccbe70e0b145 "quota: Use s_umount
protection for quota operations". Looking into the ocfs2 quota code, I think
it will be easiest to introduce an ocfs2 variant of the dqonoff mutex and use
it to serialize umount and recovery. While reading the recovery / umount /
remount code, I've also found several other problems / races that need fixing
(e.g. if some node mounts a filesystem read-only and ends up doing recovery
for some crashed node, quota recovery will not work). So I plan to do some
more fixing in this area, I just didn't get to it yet.

								Honza
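
(For illustration only: this is not a patch from this thread, and the struct,
field and function names below are made up. It is just a minimal sketch of the
serialization described above, where quota recovery and dquot_disable() take
the same ocfs2-local mutex instead of sb->s_umount.)

#include <linux/mutex.h>

struct ocfs2_super_sketch {
        /* ... existing ocfs2_super members ... */
        struct mutex osb_quota_recovery_lock;   /* hypothetical field */
};

/* called once at mount time */
static void ocfs2_quota_recovery_lock_init(struct ocfs2_super_sketch *osb)
{
        mutex_init(&osb->osb_quota_recovery_lock);
}

/* recovery thread: replaces down_read(&sb->s_umount) */
static void ocfs2_finish_quota_recovery_sketch(struct ocfs2_super_sketch *osb)
{
        mutex_lock(&osb->osb_quota_recovery_lock);
        /* replay the local quota files of crashed nodes here */
        mutex_unlock(&osb->osb_quota_recovery_lock);
}

/*
 * Umount path: hold the same mutex around dquot_disable() so that
 * ocfs2_local_free_info() cannot free the quota structures while a
 * recovery run is still using them.  Umount must not hold this mutex
 * while it waits for the recovery work to complete, or the reported
 * deadlock would simply come back with a different lock.
 */
static void ocfs2_disable_quotas_sketch(struct ocfs2_super_sketch *osb)
{
        mutex_lock(&osb->osb_quota_recovery_lock);
        /* dquot_disable() / ocfs2_local_free_info() happen here */
        mutex_unlock(&osb->osb_quota_recovery_lock);
}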

> >>>> On 2018/1/19 9:48, Changwei Ge wrote:
> >>>>> Hi Jan,
> >>>>>
> >>>>> On 2018/1/18 0:03, Jan Kara wrote:
> >>>>>> On Wed 17-01-18 16:21:35, Jan Kara wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> On Fri 12-01-18 16:25:56, Eric Ren wrote:
> >>>>>>>> On 01/12/2018 11:43 AM, Shichangkuo wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>> Now we are testing ocfs2 with the 4.14 kernel, and we found a
> >>>>>>>>> deadlock between umount and the ocfs2 workqueue, triggered by the
> >>>>>>>>> ocfs2rec thread. The stacks are as follows:
> >>>>>>>>>
> >>>>>>>>> journal recovery work:
> >>>>>>>>> [<ffffffff8a8c0694>] call_rwsem_down_read_failed+0x14/0x30
> >>>>>>>>> [<ffffffffc0d5d652>] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
> >>>>>>>>> [<ffffffffc0d21221>] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
> >>>>>>>>> [<ffffffff8a09a1f0>] process_one_work+0x130/0x350
> >>>>>>>>> [<ffffffff8a09a946>] worker_thread+0x46/0x3b0
> >>>>>>>>> [<ffffffff8a0a0e51>] kthread+0x101/0x140
> >>>>>>>>> [<ffffffff8aa002ff>] ret_from_fork+0x1f/0x30
> >>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>>>>>>>
> >>>>>>>>> /bin/umount:
> >>>>>>>>> [<ffffffff8a099b24>] flush_workqueue+0x104/0x3e0
> >>>>>>>>> [<ffffffffc0cf18db>] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
> >>>>>>>>> [<ffffffffc0d4fd6c>] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
> >>>>>>>>> [<ffffffffc0d500e1>] ocfs2_put_super+0x31/0xa0 [ocfs2]
> >>>>>>>>> [<ffffffff8a2445bd>] generic_shutdown_super+0x6d/0x120
> >>>>>>>>> [<ffffffff8a24469d>] kill_block_super+0x2d/0x60
> >>>>>>>>> [<ffffffff8a244e71>] deactivate_locked_super+0x51/0x90
> >>>>>>>>> [<ffffffff8a263a1b>] cleanup_mnt+0x3b/0x70
> >>>>>>>>> [<ffffffff8a09e9c6>] task_work_run+0x86/0xa0
> >>>>>>>>> [<ffffffff8a003d70>] exit_to_usermode_loop+0x6d/0xa9
> >>>>>>>>> [<ffffffff8a003a2d>] do_syscall_64+0x11d/0x130
> >>>>>>>>> [<ffffffff8aa00113>] entry_SYSCALL64_slow_path+0x25/0x25
> >>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>>>>>>>
> >>>>>>>>> ocfs2_finish_quota_recovery() tries to take sb->s_umount, which is
> >>>>>>>>> already held by the umount thread, so we get a deadlock.
> >>>>>>>>
> >>>>>>>> Good catch, thanks for reporting. Is it reproducible? Can you please
> >>>>>>>> share the steps for reproducing this issue?
> >>>>>>>>
> >>>>>>>>> This issue was introduced by
> >>>>>>>>> c3b004460d77bf3f980d877be539016f2df4df12 and
> >>>>>>>>> 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> >>>>>>>>> I think we cannot use ::s_umount, but the mutex ::dqonoff_mutex
> >>>>>>>>> was already removed.
> >>>>>>>>> Shall we add a new mutex?
> >>>>>>>>
> >>>>>>>> @Jan, I haven't looked into the code yet; could you help me
> >>>>>>>> understand why we need to take sb->s_umount in
> >>>>>>>> ocfs2_finish_quota_recovery?
> >>>>>>>> Is it because the quota recovery process will start at umount, or
> >>>>>>>> somewhere else?
> >>>>>>>
> >>>>>>> I was refreshing my memory wrt how ocfs2 quota recovery works. The
> >>>>>>> problem is the following: We load information about all the quota
> >>>>>>> files that need recovering (possibly for other nodes) in
> >>>>>>> ocfs2_begin_quota_recovery(), which gets called during mount. The
> >>>>>>> real quota recovery happens from the recovery thread in
> >>>>>>> ocfs2_finish_quota_recovery().
> >>>>>>> We need to protect the code running there from dquot_disable()
> >>>>>>> calls, as those will free the structures we use for updating quota
> >>>>>>> information etc. Currently we use sb->s_umount for that protection.
> >>>>>>>
> >>>>>>> The problem above apparently happens when someone calls umount
> >>>>>>> before the recovery thread can finish quota recovery. I will think
> >>>>>>> more about how to fix the locking so that this lock inversion does
> >>>>>>> not happen...
> >>>>>>
> >>>>>> So could we move the ocfs2_recovery_exit() call in
> >>>>>> ocfs2_dismount_volume() up before ocfs2_disable_quotas()? It seems
> >>>>>> possible to me, I'm just not sure whether there are some hidden
> >>>>>> dependencies on recovery being shut down only after truncate log /
> >>>>>> local alloc. If we can do that, we could remove the s_umount
> >>>>>> protection from ocfs2_finish_quota_recovery() and thus resolve the
> >>>>>> race.
> >>>>>>
> >>>>>> 							Honza
> >>>>>
> >>>>> Thanks for looking into this.
> >>>>> I am not quite familiar with the quota part. :)
> >>>>>
> >>>>> Or can we move ocfs2_disable_quotas() in ocfs2_dismount_volume() down
> >>>>> after ocfs2_recovery_exit(), with the down_read(&sb->s_umount) call
> >>>>> eliminated?
> >>>>>
> >>>>> Another way I can figure out is:
> >>>>> I think we might get inspired by qsync_work_fn().
> >>>>> In that function, if the current work runs in the context of an umount
> >>>>> with ::s_umount held, it just delays the work until the next run.
> >>>>>
> >>>>> So can we also _try lock_ in ocfs2_finish_quota_recovery() and have
> >>>>> the quota recovered by other ocfs2 cluster member nodes, or on the
> >>>>> local node's next mount?
> >>>>
> >>>> I guess we need to analyse the impact of _try lock_, such as no other
> >>>> node being able to help recover quota when I'm the only node in the
> >>>> cluster.
> >>>
> >>> I don't see any risk for now. I will think about it more later.
> >>>
> >>> Thanks,
> >>> Changwei
> >>>>
> >>>> thanks,
> >>>> Jun
> >>>>
> >>>>> Thanks,
> >>>>> Changwei

--
Jan Kara <j...@suse.com>
SUSE Labs, CR
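
(Likewise purely illustrative: a sketch of the _try lock_ idea mentioned above,
modelled on the down_read_trylock() pattern that qsync_work_fn() already uses.
ocfs2_replay_quota_sketch() is only a stand-in for the real recovery body;
skipping recovery when the lock cannot be taken is exactly the trade-off
piaojun points out for a single-node cluster.)

#include <linux/fs.h>
#include <linux/rwsem.h>

/* stand-in for the real replay of crashed nodes' local quota files */
static void ocfs2_replay_quota_sketch(struct super_block *sb)
{
}

/*
 * Try-lock variant of the recovery entry point: if umount already holds
 * sb->s_umount we would deadlock by blocking here (umount later waits for
 * this work in flush_workqueue()), so give up and leave the recovery to
 * another cluster node or to the next mount.
 */
static void ocfs2_finish_quota_recovery_trylock_sketch(struct super_block *sb)
{
        if (!down_read_trylock(&sb->s_umount))
                return;
        ocfs2_replay_quota_sketch(sb);
        up_read(&sb->s_umount);
}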