Hi!

On Fri 19-01-18 15:03:30, piaojun wrote:
> On 2018/1/19 13:42, Changwei Ge wrote:
> > Hi Jun,
> > 
> > On 2018/1/19 11:59, piaojun wrote:
> >> Hi Changwei,
> >>
> >> On 2018/1/19 11:38, Changwei Ge wrote:
> >>> Hi Jun,
> >>>
> >>> On 2018/1/19 11:17, piaojun wrote:
> >>>> Hi Jan, Eric and Changwei,
> >>>>
> >>>> Could we use another mutex lock to protect quota recovery? Sharing the
> >>>> lock with the VFS layer seems a little weird.
> >>>
> >>> I am afraid we can't, since quota needs ::s_umount, and we need
> >>> ::s_umount to get rid of the race where quota has already freed structs
> >>> that quota recovery in ocfs2 will use.
> >>>
> >> Could you explain which 'structs' are used by quota recovery? Do you mean
> >> 'struct super_block'?
> > I am not referring to super_block.
> > 
> > Sure.
> > You can refer to
> > ocfs2_finish_quota_recovery
> >    ocfs2_recover_local_quota_file -> here, operations on quota happen
> > 
> > Thanks,
> > Changwei
> > 
> I looked through the code in deactivate_locked_super() and did not
> find any *structs* that need to be protected except 'sb'. In addition, I

There are quota structures (generally attached to the superblock in some way)
that get freed in ocfs2_local_free_info() and that are used by quota recovery.
ocfs2_local_free_info() gets called when we call dquot_disable() from
ocfs2_disable_quotas() from ocfs2_dismount_volume(). That's why we need to
serialize quota recovery and umount.
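
If it helps, here is the situation boiled down to a bare userspace analogy.
This is obviously not ocfs2 code - every name in it is made up, and the plain
pthread mutex only stands in for "whatever lock we end up using" (compile
with gcc -pthread):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the quota structures that ocfs2_local_free_info() frees. */
struct quota_info {
	int dirty_blocks;
};

static struct quota_info *qinfo;
static int quota_disabled;
static pthread_mutex_t quota_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of the recovery worker that uses the quota structures. */
static void *recovery_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&quota_lock);
	if (!quota_disabled)	/* bail out if "umount" already freed them */
		printf("recovering %d dirty blocks\n", qinfo->dirty_blocks);
	pthread_mutex_unlock(&quota_lock);
	return NULL;
}

int main(void)
{
	pthread_t worker;

	qinfo = malloc(sizeof(*qinfo));
	qinfo->dirty_blocks = 42;

	pthread_create(&worker, NULL, recovery_worker, NULL);

	/* Plays the role of umount: disable quotas and free the structures. */
	pthread_mutex_lock(&quota_lock);
	quota_disabled = 1;
	free(qinfo);
	qinfo = NULL;
	pthread_mutex_unlock(&quota_lock);

	pthread_join(worker, NULL);
	return 0;
}

Without the lock and the flag, the worker could dereference qinfo after it
has been freed, which is exactly what the down_read(&sb->s_umount) in
ocfs2_finish_quota_recovery() currently protects against. The open question
is which lock should play the role of quota_lock; more on that below.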

> still could not figure out why 'dqonoff_mutex' was replaced by
> 's_umount'. At least, the patch 5f530de63cfc did not mention that. I
> would appreciate a detailed explanation.

It was replaced mostly to fix a lock inversion issue in generic code -
details are described in commit 9d1ccbe70e0b145 "quota: Use s_umount
protection for quota operations".

Looking into the ocfs2 quota code, I think it will be easiest to introduce an
ocfs2 variant of the dqonoff mutex and use it to serialize umount and
recovery. While reading the recovery / umount / remount code, I've also
found several other problems / races that need fixing (e.g. if some node
mounts a filesystem in read-only mode and ends up doing recovery of some
crashed node, quota recovery will not work). So I plan to do some more
fixing in this area, I just haven't gotten to it yet.
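
To make the dqonoff-mutex idea a bit more concrete, the rough shape could be
something like the fragments below. This is only a sketch, not a patch; the
field name quota_recovery_lock, the exact lock sites and the dquot_disable()
arguments are illustrative guesses, not necessarily what the final fix will
look like:

/* in struct ocfs2_super */
	struct mutex quota_recovery_lock;	/* hypothetical name: serializes
						 * quota recovery vs. quota off */

/* set up with mutex_init() at mount time */

/* in ocfs2_finish_quota_recovery(), instead of down_read(&sb->s_umount) */
	mutex_lock(&osb->quota_recovery_lock);
	/* ... replay local quota files of crashed nodes ... */
	mutex_unlock(&osb->quota_recovery_lock);

/* on the umount path, around whatever ends up calling
 * dquot_disable() -> ocfs2_local_free_info() */
	mutex_lock(&osb->quota_recovery_lock);
	dquot_disable(sb, type, DQUOT_USAGE_ENABLED | DQUOT_LIMITS_ENABLED);
	mutex_unlock(&osb->quota_recovery_lock);

With that, the recovery worker would not touch s_umount at all, so umount
holding s_umount while flushing the workqueue could no longer deadlock
against it.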

                                                                Honza

> >>>> On 2018/1/19 9:48, Changwei Ge wrote:
> >>>>> Hi Jan,
> >>>>>
> >>>>> On 2018/1/18 0:03, Jan Kara wrote:
> >>>>>> On Wed 17-01-18 16:21:35, Jan Kara wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> On Fri 12-01-18 16:25:56, Eric Ren wrote:
> >>>>>>>> On 01/12/2018 11:43 AM, Shichangkuo wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>>   Now we are testing ocfs2 with the 4.14 kernel, and we are hitting
> >>>>>>>>> a deadlock between umount and the ocfs2 workqueue, triggered by the
> >>>>>>>>> ocfs2rec thread. The stacks are as follows:
> >>>>>>>>> journal recovery work:
> >>>>>>>>> [<ffffffff8a8c0694>] call_rwsem_down_read_failed+0x14/0x30
> >>>>>>>>> [<ffffffffc0d5d652>] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
> >>>>>>>>> [<ffffffffc0d21221>] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
> >>>>>>>>> [<ffffffff8a09a1f0>] process_one_work+0x130/0x350
> >>>>>>>>> [<ffffffff8a09a946>] worker_thread+0x46/0x3b0
> >>>>>>>>> [<ffffffff8a0a0e51>] kthread+0x101/0x140
> >>>>>>>>> [<ffffffff8aa002ff>] ret_from_fork+0x1f/0x30
> >>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>>>>>>>
> >>>>>>>>> /bin/umount:
> >>>>>>>>> [<ffffffff8a099b24>] flush_workqueue+0x104/0x3e0
> >>>>>>>>> [<ffffffffc0cf18db>] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
> >>>>>>>>> [<ffffffffc0d4fd6c>] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
> >>>>>>>>> [<ffffffffc0d500e1>] ocfs2_put_super+0x31/0xa0 [ocfs2]
> >>>>>>>>> [<ffffffff8a2445bd>] generic_shutdown_super+0x6d/0x120
> >>>>>>>>> [<ffffffff8a24469d>] kill_block_super+0x2d/0x60
> >>>>>>>>> [<ffffffff8a244e71>] deactivate_locked_super+0x51/0x90
> >>>>>>>>> [<ffffffff8a263a1b>] cleanup_mnt+0x3b/0x70
> >>>>>>>>> [<ffffffff8a09e9c6>] task_work_run+0x86/0xa0
> >>>>>>>>> [<ffffffff8a003d70>] exit_to_usermode_loop+0x6d/0xa9
> >>>>>>>>> [<ffffffff8a003a2d>] do_syscall_64+0x11d/0x130
> >>>>>>>>> [<ffffffff8aa00113>] entry_SYSCALL64_slow_path+0x25/0x25
> >>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>>>>>>>   
> >>>>>>>>> Function ocfs2_finish_quota_recovery() tries to take sb->s_umount,
> >>>>>>>>> which is already held by the umount thread, so we get a deadlock.
> >>>>>>>>
> >>>>>>>> Good catch, thanks for reporting. Is it reproducible? Can you please
> >>>>>>>> share the steps for reproducing this issue?
> >>>>>>>>> This issue was introduced by 
> >>>>>>>>> c3b004460d77bf3f980d877be539016f2df4df12 and 
> >>>>>>>>> 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> >>>>>>>>> I think we cannot use ::s_umount, but the mutex ::dqonoff_mutex
> >>>>>>>>> has already been removed.
> >>>>>>>>> Shall we add a new mutex?
> >>>>>>>>
> >>>>>>>> @Jan, I haven't looked into the code yet; could you help me understand
> >>>>>>>> why we need to take sb->s_umount in ocfs2_finish_quota_recovery()?
> >>>>>>>> Is it because the quota recovery process will start at umount time, or
> >>>>>>>> is it something else?
> >>>>>>>
> >>>>>>> I was refreshing my memory wrt how ocfs2 quota recovery works. The
> >>>>>>> problem is the following: we load information about everything that
> >>>>>>> needs quota recovery (possibly for other nodes) in
> >>>>>>> ocfs2_begin_quota_recovery(), which gets called during mount. Real quota
> >>>>>>> recovery happens from the recovery thread in
> >>>>>>> ocfs2_finish_quota_recovery(). We need to protect the code running there
> >>>>>>> from dquot_disable() calls, as those will free structures we use for
> >>>>>>> updating quota information etc. Currently we use sb->s_umount for that
> >>>>>>> protection.
> >>>>>>>
> >>>>>>> The problem above apparently happens when someone calls umount before
> >>>>>>> the recovery thread can finish quota recovery. I will think more about
> >>>>>>> how to fix the locking so that this lock inversion does not happen...
> >>>>>>
> >>>>>> So could we move the ocfs2_recovery_exit() call in ocfs2_dismount_volume()
> >>>>>> up before ocfs2_disable_quotas()? It seems possible to me, I'm just not
> >>>>>> sure whether there are some hidden dependencies on recovery being shut
> >>>>>> down only after truncate log / local alloc. If we can do that, we could
> >>>>>> remove the s_umount protection from ocfs2_finish_quota_recovery() and
> >>>>>> thus resolve the race.
> >>>>>>
> >>>>>>                                                                Honza
> >>>>>
> >>>>> Thanks for looking into this.
> >>>>> I am not quite familiar with the quota part. :)
> >>>>>
> >>>>> Or can we move ocfs2_disable_quotas() in ocfs2_dismount_volume() down
> >>>>> after ocfs2_recovery_exit(), with the down_read(&sb->s_umount) call
> >>>>> eliminated?
> >>>>>
> >>>>> Another way I can think of: we might take inspiration from
> >>>>> qsync_work_fn(). In that function, if the current work runs while umount
> >>>>> is holding ::s_umount, it just delays the work until next time.
> >>>>>
> >>>>> So can we also _try lock_ in ocfs2_finish_quota_recovery(), and let quota
> >>>>> be recovered by another ocfs2 cluster member node, or by the local node
> >>>>> on its next mount?
> >>>>>
> >>>> I guess we need to analyse the impact of _try lock_, e.g. no other node
> >>>> will help recover quota when I'm the only node in the cluster.
> >>>
> >>> I don't see any risk for now. I will think about it more later.
> >>>
> >>> Thanks,
> >>> Changwei
> >>>>
> >>>> thanks,
> >>>> Jun
> >>>>
> >>>>> Thanks,
> >>>>> Changwei
> >>>>>
> >>>>>
> >>>>>>
> >>>>
> >>>
> >>
> > 
-- 
Jan Kara <j...@suse.com>
SUSE Labs, CR

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
