On Mon, Aug 13, 2012 at 7:03 PM, Xue jiufei <xuejiu...@huawei.com> wrote:
> A parallel umount on 4 nodes triggered a bug in
> dlm_process_recovery_data(). Here's the situation:
> On receiving a MIG_LOCKRES message, a node processes the locks in the
> migratable lockres. It copies the lvb from the migratable lockres when
> processing the first valid lock.
> If there is a lock in the blocked list with the EX level, it triggers the
> BUG. Since valid lvbs are set when locks are granted with EX or PR levels,
> locks in the blocked list cannot have valid lvbs. Therefore I think we
> should skip the locks in the blocked list.
>
> Signed-off-by: Xuejiufei <xuejiu...@huawei.com>
> ---
>  fs/ocfs2/dlm/dlmrecovery.c |    7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 01ebfd0..15d81ad 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1887,6 +1887,13 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
>
>                         if (ml->type == LKM_NLMODE)
>                                 goto skip_lvb;
> +
> +                       /*
> +                        * If the lock is in the blocked list it can't have a valid lvb,
> +                        * so skip it
> +                        */
> +                       if (ml->list == DLM_BLOCKED_LIST)
> +                               goto skip_lvb;
>
>                         if (!dlm_lvb_is_empty(mres->lvb)) {
>                                 if (lksb->flags & DLM_LKSB_PUT_LVB) {
> --

Looks reasonable. Just wanted to confirm. Did this BUG_ON in dlmrecovery.c
get tripped?

1903                         /* otherwise, the node is sending its
1904                          * most recent valid lvb info */
1905                         BUG_ON(ml->type != LKM_EXMODE &&
1906                                ml->type != LKM_PRMODE);
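
For anyone following along, here is a rough userspace sketch of the skip decision the patch adds. It is not the kernel code: the enum values, struct migratable_lock, and reaches_lvb_handling() are hypothetical stand-ins that only borrow names from the patch, and the sketch says nothing about which BUG_ON fires afterwards.

```c
#include <stdbool.h>

/* Simplified stand-ins for the lock modes and queue identifiers named in
 * the patch. The numeric values here are illustrative assumptions, not the
 * kernel's definitions. */
enum lock_mode { LKM_NLMODE, LKM_PRMODE, LKM_EXMODE };
enum lock_list { DLM_GRANTED_LIST, DLM_CONVERTING_LIST, DLM_BLOCKED_LIST };

/* Hypothetical, trimmed-down view of one migrated lock entry. */
struct migratable_lock {
	enum lock_mode type;
	enum lock_list list;
};

/* Returns true if the lock would fall through to the lvb handling,
 * false if it would take the skip_lvb path with the patch applied. */
static bool reaches_lvb_handling(const struct migratable_lock *ml)
{
	/* Existing check: NL locks never carry an lvb. */
	if (ml->type == LKM_NLMODE)
		return false;

	/* Check added by the patch: a blocked lock was never granted,
	 * so it cannot hold a valid lvb either. */
	if (ml->list == DLM_BLOCKED_LIST)
		return false;

	return true;
}
```

With this model, an EX lock sitting on the blocked list is skipped, while a granted EX or PR lock still goes through the lvb handling as before.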

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel