I have now checked the logs on the hung machine. This is what was written there prior to the hang:

Jul 7 23:35:14 ocfs2Server kernel: [159179.624911] ocfs2_dlm: Nodes in domain ("6A468A219FF141429D2BAFF54FA8D514"): 1
Jul 7 23:35:14 ocfs2Server kernel: [159179.625090] (9464,0):ocfs2_find_slot:502 slot 1 is already allocated to this node!
Jul 7 23:35:14 ocfs2Server kernel: [159179.630086] (9464,3):ocfs2_check_volume:2270 File system was not unmounted cleanly, recovering volume.
Jul 7 23:35:14 ocfs2Server kernel: [159179.686060] kjournald2 starting: pid 9471, dev dm-5:25, commit interval 5 seconds
Jul 7 23:35:14 ocfs2Server kernel: [159179.687710] ocfs2: Mounting device (253,5) on (node 1, slot 1) with ordered data mode.
Jul 7 23:35:14 ocfs2Server kernel: [159179.688139] (9473,1):ocfs2_replay_journal:1593 Recovering node 0 from slot 0 on device (253,5)
Jul 7 23:35:17 ocfs2Server kernel: [159182.532458] kjournald2 starting: pid 9485, dev dm-5:24, commit interval 5 seconds
Jul 7 23:35:17 ocfs2Server kernel: [159182.591988] (9473,2):ocfs2_begin_quota_recovery:374 Beginning quota recovery in slot 0
Jul 7 23:35:18 ocfs2Server kernel: [159183.125091] (4797,2):ocfs2_finish_quota_recovery:564 Finishing quota recovery in slot 0
Jul 8 20:39:13 ocfs2Server syslogd 1.5.0#5: restart.

I am going to install netconsole and try again.

-----Original Message-----
From: Sunil Mushran <sunil.mush...@oracle.com>
To: sylarrrr...@aim.com
Cc: ocfs2-users@oss.oracle.com
Sent: Wed, Jul 8, 2009 1:25 am
Subject: Re: [Ocfs2-users] umount hang + high CPU

Well, that means the hung node exited the dlm domain successfully.
This is not a dlm issue. Run alt-sysrq-t on the hung node. If you have
netconsole setup you should see a log.

sylarrrr...@aim.com wrote:
> Aha, ok, I don't see the oops, or anything about the hang in the logs.
> The hung machine still replies to pings.
>
> The story now is that I thought I could use:
>
> tunefs.ocfs2 --cloned-volume /dev/mylvmsnapshot
>
> in order to mount the snapshot... (big mistake)... well, I did manage to
> mount the snapshot, but as soon as I umounted it, the umount process
> hung, and then the whole machine hung, except that it responds to pings.
>
> Now, I have downloaded the ocfs2-1.4-userguide.pdf, and went to
> section 'f) DLM Debuging', and tried the commands there on the still
> working node, but only 'cat /sys/kernel/debug/o2dlm/*/dlm_state'
> worked and produced the following output:
>
> Domain: 1ACAFCEE7ACA47C089069117560F5C91  Key: 0xb9d649ba
> Thread Pid: 5664  Node: 0  State: JOINED
> Number of Joins: 1  Joining Node: 255
> Domain Map: 0
> Live Map: 0
> Lock Resources: 51168 (180512)
> MLEs: 0 (291689)
>   Blocking: 0 (139713)
>   Mastery: 0 (151976)
>   Migration: 0 (0)
> Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  PendingBASTs=Empty
> Purge Count: 8  Refs: 51169
> Dead Node: 255
> Recovery Pid: 5665  Master: 255  State: INACTIVE
> Recovery Map:
> Recovery Node State:
>
> The other commands:
>
> debugfs.ocfs2 –R “fs_locks –B” /dev/drbd0
> debugfs.ocfs2 –R “fs_locks –B” /dev/vg/lv
> debugfs.ocfs2 –R “dlm_locks M000000000000000022d63c00000000” /dev/drbd0
>
> produced the error:
>
> open: Device name specified was not found while opening context for
> device –R
> debugfs.ocfs2 1.4.2
> debugfs:
>
> and:
>
> ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
>
> produced no D state process.
>
> I am sorry I write it in the mailing list, but I am a noob, so I don't
> even know if it is a bug, or a misconfiguration, or a misunderstanding.
>
> PS. Is the nodiratime option supported for mounts? I used it, but I
> don't see it in the user guide.
>
> -----Original Message-----
> From: Sunil Mushran <sunil.mush...@oracle.com>
> To: sylarrrr...@aim.com
> Cc: tao...@oracle.com; ocfs2-us...@oss.oracle.com
> Sent: Tue, Jul 7, 2009 8:46 pm
> Subject: Re: [Ocfs2-users] umount hang + high CPU
>
> The fix was for the oops you saw.
>
> The hang is a different issue. We have no info on that.
>
> For that, if you would like to diagnose the problem, read up the dlm
> notes in the 1.4 user's guide. It explains a debugging process
> vis-a-vis hangs.
>
> If the issue is dlm related, then we would like to have the tcpdumps.
>
> Lastly, emails are not an efficient vehicle for handling such issues.
> Use the bugzilla as it allows us to collect information in one place.
>
> Sunil
>
> sylarrrr...@aim.com wrote:
> > So this bug is not over yet :(
> >
> > I have checked my kernel source and indeed it has this patch, but I
> > still get the hang.
> >
> > PS. my linux-2.6-2.6.30/fs/ocfs2/dcache.c kernel source has:
> >
> >         else
> >                 mlog_errno(ret);
> >
> >         /*
> >          * In case of error, manually free the allocation and do the iput().
> >          * We need to do this because error here means no d_instantiate(),
> >          * which means iput() will not be called during dput(dentry).
> >          */
> >         if (ret < 0 && !alias) {
> >                 ocfs2_lock_res_free(&dl->dl_lockres);
> >                 BUG_ON(dl->dl_count != 1);
> >                 spin_lock(&dentry_attach_lock);
> >                 dentry->d_fsdata = NULL;
> >                 spin_unlock(&dentry_attach_lock);
> >                 kfree(dl);
> >                 iput(inode);
> >         }
> >
> >         dput(alias);
> >
> >         return ret;
> > }
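
A note on the debugfs.ocfs2 error quoted above: the message "Device name specified was not found while opening context for device –R" suggests that debugfs.ocfs2 took "–R" as the device name. The commands as quoted contain an en dash and curly quotes (possibly picked up when copying from the PDF guide) where the shell and the tool expect ASCII "-" and '"'. Below is a minimal sketch, not part of the original thread, of a Python helper that normalizes such characters before a command is run. The name asciify_command and the choice of characters in the mapping are assumptions made for illustration only.

```python
# Hypothetical helper (not from the thread or from ocfs2-tools):
# map typographic punctuation, as often produced by PDFs and mail
# clients, back to the ASCII characters a shell command needs.
TYPOGRAPHIC_TO_ASCII = {
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2212": "-",   # minus sign
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
}


def asciify_command(command: str) -> str:
    """Return the command with typographic dashes and quotes replaced by ASCII."""
    return command.translate(str.maketrans(TYPOGRAPHIC_TO_ASCII))


if __name__ == "__main__":
    # One of the commands from the thread, as it appears after pasting.
    pasted = "debugfs.ocfs2 \u2013R \u201cfs_locks \u2013B\u201d /dev/drbd0"
    print(asciify_command(pasted))
```

The printed command can then be compared by eye with the one in the user's guide before running it against a device.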
_______________________________________________ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users