Re: [Ocfs2-users] umount hang + high CPU

sylarrrrrrr Thu, 09 Jul 2009 09:06:01 -0700

I have now checked the logs on the hung machine. This is what was written there
prior to the hang:


Jul  7 23:35:14 ocfs2Server kernel: [159179.624911] ocfs2_dlm: Nodes in domain ("6A468A219FF141429D2BAFF54FA8D514"): 1
Jul  7 23:35:14 ocfs2Server kernel: [159179.625090] (9464,0):ocfs2_find_slot:502 slot 1 is already allocated to this node!
Jul  7 23:35:14 ocfs2Server kernel: [159179.630086] (9464,3):ocfs2_check_volume:2270 File system was not unmounted cleanly, recovering volume.
Jul  7 23:35:14 ocfs2Server kernel: [159179.686060] kjournald2 starting: pid 9471, dev dm-5:25, commit interval 5 seconds
Jul  7 23:35:14 ocfs2Server kernel: [159179.687710] ocfs2: Mounting device (253,5) on (node 1, slot 1) with ordered data mode.
Jul  7 23:35:14 ocfs2Server kernel: [159179.688139] (9473,1):ocfs2_replay_journal:1593 Recovering node 0 from slot 0 on device (253,5)
Jul  7 23:35:17 ocfs2Server kernel: [159182.532458] kjournald2 starting: pid 9485, dev dm-5:24, commit interval 5 seconds
Jul  7 23:35:17 ocfs2Server kernel: [159182.591988] (9473,2):ocfs2_begin_quota_recovery:374 Beginning quota recovery in slot 0
Jul  7 23:35:18 ocfs2Server kernel: [159183.125091] (4797,2):ocfs2_finish_quota_recovery:564 Finishing quota recovery in slot 0
Jul  8 20:39:13 ocfs2Server syslogd 1.5.0#5: restart.

I am going to install netconsole and try again.



-----Original Message-----
From: Sunil Mushran <sunil.mush...@oracle.com>
To: sylarrrr...@aim.com
Cc: ocfs2-users@oss.oracle.com
Sent: Wed, Jul 8, 2009 1:25 am
Subject: Re: [Ocfs2-users] umount hang + high CPU

Well, that means the hung node exited the dlm domain successfully.

This is not a dlm issue. 
 

Run alt-sysrq-t on the hung node. If you have netconsole set up, you
should see a log.
 

sylarrrr...@aim.com wrote: 

> Aha, OK. I don't see the oops, or anything about the hang, in the logs.
> The hung machine still replies to pings.
>
> The story now is that I thought I could use
>
>  tunefs.ocfs2 --cloned-volume /dev/mylvmsnapshot
>
> in order to mount the snapshot (big mistake). Well, I did manage to
> mount the snapshot, but as soon as I unmounted it, the umount process
> hung, and then the whole machine hung, except that it still responds
> to pings.
>
> Now, I have downloaded ocfs2-1.4-userguide.pdf, gone to section
> 'f) DLM Debugging', and tried the commands there on the still-working
> node, but only 'cat /sys/kernel/debug/o2dlm/*/dlm_state' worked and
> produced the following output:
>
> Domain: 1ACAFCEE7ACA47C089069117560F5C91  Key: 0xb9d649ba
> Thread Pid: 5664  Node: 0  State: JOINED
> Number of Joins: 1  Joining Node: 255
> Domain Map: 0
> Live Map: 0
> Lock Resources: 51168 (180512)
> MLEs: 0 (291689)
>   Blocking: 0 (139713)
>   Mastery: 0 (151976)
>   Migration: 0 (0)
> Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  PendingBASTs=Empty
> Purge Count: 8  Refs: 51169
> Dead Node: 255
> Recovery Pid: 5665  Master: 255  State: INACTIVE
> Recovery Map:
> Recovery Node State:
>
> The other commands:
>
> debugfs.ocfs2 –R “fs_locks –B” /dev/drbd0
> debugfs.ocfs2 –R “fs_locks –B” /dev/vg/lv
> debugfs.ocfs2 –R “dlm_locks M000000000000000022d63c00000000” /dev/drbd0
>
> produced the error:
>
> open: Device name specified was not found while opening context for
> device –R
> debugfs.ocfs2 1.4.2
> debugfs:
>
> and:
>
> ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
>
> produced no D-state process.
>
> I am sorry to write about this on the mailing list, but I am a noob, so I
> don't even know whether it is a bug, a misconfiguration, or a
> misunderstanding.
>
> PS. Is the nodiratime option supported for mounts? I used it, but I don't
> see it in the user's guide.
>
> -----Original Message-----
> From: Sunil Mushran <sunil.mush...@oracle.com>
> To: sylarrrr...@aim.com
> Cc: tao...@oracle.com; ocfs2-us...@oss.oracle.com
> Sent: Tue, Jul 7, 2009 8:46 pm
> Subject: Re: [Ocfs2-users] umount hang + high CPU
>
> The fix was for the oops you saw.
>
> The hang is a different issue. We have no info on that.
>
> For that, if you would like to diagnose the problem, read up on the dlm
> notes in the 1.4 user's guide. It explains a debugging process vis-a-vis
> hangs.
>
> If the issue is dlm related, then we would like to have the tcpdumps.
>
> Lastly, emails are not an efficient vehicle for handling such issues.
> Use the bugzilla as it allows us to collect information in one place.
>
> Sunil
>
> sylarrrr...@aim.com wrote:
> > So this bug is not over yet :(
> >
> > I have checked my kernel source and it does have this patch, but I
> > still get the hang.
> >
> > PS. My linux-2.6-2.6.30/fs/ocfs2/dcache.c kernel source has (lines 290-311):
> >
> > 	else
> > 		mlog_errno(ret);
> >
> > 	/*
> > 	 * In case of error, manually free the allocation and do the iput().
> > 	 * We need to do this because error here means no d_instantiate(),
> > 	 * which means iput() will not be called during dput(dentry).
> > 	 */
> > 	if (ret < 0 && !alias) {
> > 		ocfs2_lock_res_free(&dl->dl_lockres);
> > 		BUG_ON(dl->dl_count != 1);
> > 		spin_lock(&dentry_attach_lock);
> > 		dentry->d_fsdata = NULL;
> > 		spin_unlock(&dentry_attach_lock);
> > 		kfree(dl);
> > 		iput(inode);
> > 	}
> >
> > 	dput(alias);
> >
> > 	return ret;
> > }
> >
> >
>
> 

> _______________________________________________
> Ocfs2-users mailing list
> ocfs2-us...@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users

