Hi Changwei, I do think we need follow the principle that use 'dlm_domain_lock' to protect 'dlm_state' as the NOTE says in 'dlm_ctxt': /* NOTE: Next three are protected by dlm_domain_lock */
deadnode won't be cleared from 'dlm->domain_map' if return from __dlm_hb_node_down(), and NodeA will retry migrating to NodeB forever if only NodeA and NodeB in domain. I suggest more testing needed in your solution. thanks, Jun On 2017/11/1 16:11, Changwei Ge wrote: > Hi Jun, > > Thanks for reviewing. > I don't think we have to worry about misusing *dlm_domain_lock* and > *dlm::spinlock*. I admit my change may look a little different from most > of other code snippets where using these two spin locks. But our purpose > is to close the race between __dlm_hb_node_down and > dlm_unregister_domain, right? And my change meets that. :-) > > I suppose we can do it in a flexible way. > > Thanks, > Changwei > > > On 2017/11/1 15:57, piaojun wrote: >> Hi Changwei, >> >> thanks for reviewing, and I think waiting for recoverying done before >> migrating seems another solution, but I wonder if new problems will be >> invoked as following comments. >> >> thanks, >> Jun >> >> On 2017/11/1 15:13, Changwei Ge wrote: >>> Hi Jun, >>> >>> I probably get your point. >>> >>> You mean that dlm finds no lock resource to be migrated and no more lock >>> resource is managed by its hash table. After that a node dies all of a >>> sudden and the dead node is put into dlm's recovery map, right? >> that is it. >>> Furthermore, a lock resource is migrated from other nodes and local node >>> has already asserted master to them. >>> >>> If so, I want to suggest a easier way to solve it. >>> We don't have to add a new flag to dlm structure, just leverage existed >>> dlm status and bitmap. >>> It will bring a bonus - no lock resource will be marked with RECOVERING, >>> it's a safer way, I suppose. >>> >>> Please take a review. >>> >>> Thanks, >>> Changwei >>> >>> >>> Subject: [PATCH] ocfs2/dlm: a node can't be involved in recovery if it >>> is being shutdown >>> >>> Signed-off-by: Changwei Ge <ge.chang...@h3c.com> >>> --- >>> fs/ocfs2/dlm/dlmdomain.c | 4 ++++ >>> fs/ocfs2/dlm/dlmrecovery.c | 3 +++ >>> 2 files changed, 7 insertions(+) >>> >>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c >>> index a2b19fbdcf46..5e9283e509a4 100644 >>> --- a/fs/ocfs2/dlm/dlmdomain.c >>> +++ b/fs/ocfs2/dlm/dlmdomain.c >>> @@ -707,11 +707,15 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm) >>> * want new domain joins to communicate with us at >>> * least until we've completed migration of our >>> * resources. */ >>> + spin_lock(&dlm->spinlock); >>> dlm->dlm_state = DLM_CTXT_IN_SHUTDOWN; >>> + spin_unlock(&dlm->spinlock); >> I guess there will be misuse of 'dlm->spinlock' and dlm_domain_lock. >>> leave = 1; >>> } >>> spin_unlock(&dlm_domain_lock); >>> >>> + dlm_wait_for_recovery(dlm); >>> + >>> if (leave) { >>> mlog(0, "shutting down domain %s\n", dlm->name); >>> dlm_begin_exit_domain(dlm); >>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c >>> index 74407c6dd592..764c95b2b35c 100644 >>> --- a/fs/ocfs2/dlm/dlmrecovery.c >>> +++ b/fs/ocfs2/dlm/dlmrecovery.c >>> @@ -2441,6 +2441,9 @@ static void __dlm_hb_node_down(struct dlm_ctxt >>> *dlm, int idx) >>> { >>> assert_spin_locked(&dlm->spinlock); >>> >>> + if (dlm->dlm_state == DLM_CTXT_IN_SHUTDOWN) >>> + return; >>> + >> 'dlm->dlm_state' probably need be to protected by 'dlm_domain_lock'. >> and I wander if there is more work to be done when in >> 'DLM_CTXT_IN_SHUTDOWN'? >>> if (dlm->reco.new_master == idx) { >>> mlog(0, "%s: recovery master %d just died\n", >>> dlm->name, idx); >>> >> > > . > _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel