Re: [Ocfs2-users] umount hang + high CPU

2009-07-09 Thread sylarrrrrrr

 I have now checked the logs on the hung machine; this is what was written there
prior to the hang:

Jul  7 23:35:14 ocfs2Server kernel: [159179.624911] ocfs2_dlm: Nodes in domain ("6A468A219FF141429D2BAFF54FA8D514"): 1
Jul  7 23:35:14 ocfs2Server kernel: [159179.625090] (9464,0):ocfs2_find_slot:502 slot 1 is already allocated to this node!
Jul  7 23:35:14 ocfs2Server kernel: [159179.630086] (9464,3):ocfs2_check_volume:2270 File system was not unmounted cleanly, recovering volume.
Jul  7 23:35:14 ocfs2Server kernel: [159179.686060] kjournald2 starting: pid 9471, dev dm-5:25, commit interval 5 seconds
Jul  7 23:35:14 ocfs2Server kernel: [159179.687710] ocfs2: Mounting device (253,5) on (node 1, slot 1) with ordered data mode.
Jul  7 23:35:14 ocfs2Server kernel: [159179.688139] (9473,1):ocfs2_replay_journal:1593 Recovering node 0 from slot 0 on device (253,5)
Jul  7 23:35:17 ocfs2Server kernel: [159182.532458] kjournald2 starting: pid 9485, dev dm-5:24, commit interval 5 seconds
Jul  7 23:35:17 ocfs2Server kernel: [159182.591988] (9473,2):ocfs2_begin_quota_recovery:374 Beginning quota recovery in slot 0
Jul  7 23:35:18 ocfs2Server kernel: [159183.125091] (4797,2):ocfs2_finish_quota_recovery:564 Finishing quota recovery in slot 0
Jul  8 20:39:13 ocfs2Server syslogd 1.5.0#5: restart.

I am going to install netconsole and try again.



-Original Message-
From: Sunil Mushran 
To: syla...@aim.com
Cc: ocfs2-users@oss.oracle.com
Sent: Wed, Jul 8, 2009 1:25 am
Subject: Re: [Ocfs2-users] umount hang + high CPU









Well, that means the hung node exited the dlm domain successfully.
This is not a dlm issue.

Run alt-sysrq-t on the hung node. If you have netconsole setup you
should see a log.

syla...@aim.com wrote:
> Aha, ok, I don't see the oops, or anything about the hang in the logs.
> The hanged machine still reply to pings.
>
> The story now is , that I thought that I can use the :
>
>  tunefs.ocfs2  --cloned-volume /dev/mylvmsnapshot
>
> in order to mount the snapshot... (big mistake)...well I did manage to
> mount the snapshot, but as soon as
> I umounted it, the umount process hanged, and then the whole machine
> hanged, except that it responds to pings.
>
>
> Now, I have downloaded the ocfs2-1.4-userguide.pdf , and went to
> section 'f) DLM Debuging', and tried the commands
> there on the still working node, but only 'cat
> /sys/kernel/debug/o2dlm/*/dlm_state' worked and produced the following
> output:
>
> Domain: 1ACAFCEE7ACA47C089069117560F5C91  Key: 0xb9d649ba
> Thread Pid: 5664  Node: 0  State: JOINED
> Number of Joins: 1  Joining Node: 255
> Domain Map: 0
> Live Map: 0
> Lock Resources: 51168 (180512)
> MLEs: 0 (291689)
>   Blocking: 0 (139713)
>   Mastery: 0 (151976)
>   Migration: 0 (0)
> Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  PendingBASTs=Empty
> Purge Count: 8  Refs: 51169
> Dead Node: 255
> Recovery Pid: 5665  Master: 255  State: INACTIVE
> Recovery Map:
> Recovery Node State:
>
> the other commands:
> debugfs.ocfs2 –R “fs_locks –B” /dev/drbd0
> debugfs.ocfs2 –R “fs_locks –B” /dev/vg/lv
> debugfs.ocfs2 –R “dlm_locks M22d63c” /dev/drbd0
>
> produced the error:
> open: Device name specified was not found while opening context for
> device –R
> debugfs.ocfs2 1.4.2
> debugfs:
>
> and:
>
> ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
>
> procuded no D state process.
>
>
> I am sorry I write it in the mailing list, but I am a noob, so I don't
> even know if it is a bug, or a misconfiguration, or a misunderstanding.
>
> PS. Is nodiratime option supported for mounts? I used it, but I don't
> see it in the user-guide.
>
> -Original Message-
> From: Sunil Mushran
> To: syla...@aim.com
> Cc: tao...@oracle.com; ocfs2-us...@oss.oracle.com
> Sent: Tue, Jul 7, 2009 8:46 pm
> Subject: Re: [Ocfs2-users] umount hang + high CPU
>
> The fix was for the oops you saw.
>
> The hang is a different issue. We have no info on that.
>
> For that, if you would like to diagnose the problem, read up the dlm
> notes
> in the 1.4 user's guide. It explains a debugging process vis-a-vis hangs.
>
> If the issue is dlm related, then we would like to have the tcpdumps.
>
> Lastly, emails are not an efficient vehicle for handling such issues.
> Use
> the bugzilla as it allows us to collect information in one place.
>
> Sunil
>
> syla...@aim.com <mailto:syla...@aim.com> wrote:
> > So this bug is not over yet :(
> >
> > I have checked my kernel source and indeed it have this patch but I
> >

Re: [Ocfs2-users] umount hang + high CPU

2009-07-07 Thread Sunil Mushran
Well, that means the hung node exited the dlm domain successfully.
This is not a dlm issue.

Run alt-sysrq-t on the hung node. If you have netconsole set up, you
should see a log.

syla...@aim.com wrote:
> Aha, ok, I don't see the oops, or anything about the hang in the logs. 
> The hanged machine still reply to pings.
>
> The story now is , that I thought that I can use the :
>
>  tunefs.ocfs2  --cloned-volume /dev/mylvmsnapshot
>
> in order to mount the snapshot... (big mistake)...well I did manage to 
> mount the snapshot, but as soon as
> I umounted it, the umount process hanged, and then the whole machine 
> hanged, except that it responds to pings.
>
>
> Now, I have downloaded the ocfs2-1.4-userguide.pdf , and went to 
> section 'f) DLM Debuging', and tried the commands
> there on the still working node, but only 'cat 
> /sys/kernel/debug/o2dlm/*/dlm_state' worked and produced the following 
> output:
>
> Domain: 1ACAFCEE7ACA47C089069117560F5C91  Key: 0xb9d649ba
> Thread Pid: 5664  Node: 0  State: JOINED
> Number of Joins: 1  Joining Node: 255
> Domain Map: 0
> Live Map: 0
> Lock Resources: 51168 (180512)
> MLEs: 0 (291689)
>   Blocking: 0 (139713)
>   Mastery: 0 (151976)
>   Migration: 0 (0)
> Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  PendingBASTs=Empty
> Purge Count: 8  Refs: 51169
> Dead Node: 255
> Recovery Pid: 5665  Master: 255  State: INACTIVE
> Recovery Map:
> Recovery Node State:
>
> the other commands:
> debugfs.ocfs2 –R “fs_locks –B” /dev/drbd0
> debugfs.ocfs2 –R “fs_locks –B” /dev/vg/lv
> debugfs.ocfs2 –R “dlm_locks M22d63c” /dev/drbd0
>
> produced the error:
> open: Device name specified was not found while opening context for 
> device –R
> debugfs.ocfs2 1.4.2
> debugfs:
>
> and:
>
> ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
>
> procuded no D state process.
>
>
> I am sorry I write it in the mailing list, but I am a noob, so I don't 
> even know if it is a bug, or a misconfiguration, or a misunderstanding.
>
> PS. Is nodiratime option supported for mounts? I used it, but I don't 
> see it in the user-guide.
>
> -Original Message-
> From: Sunil Mushran 
> To: syla...@aim.com
> Cc: tao...@oracle.com; ocfs2-users@oss.oracle.com
> Sent: Tue, Jul 7, 2009 8:46 pm
> Subject: Re: [Ocfs2-users] umount hang + high CPU
>
> The fix was for the oops you saw. 
>  
> The hang is a different issue. We have no info on that. 
>  
> For that, if you would like to diagnose the problem, read up the dlm 
> notes 
> in the 1.4 user's guide. It explains a debugging process vis-a-vis hangs. 
>  
> If the issue is dlm related, then we would like to have the tcpdumps. 
>  
> Lastly, emails are not an efficient vehicle for handling such issues. 
> Use 
> the bugzilla as it allows us to collect information in one place. 
>  
> Sunil 
>  
> syla...@aim.com <mailto:syla...@aim.com> wrote: 
> > So this bug is not over yet :( 
> > 
> > I have checked my kernel source and indeed it have this patch but I 
> > still get the hang. 
> > 
> > PS. my linux-2.6-2.6.30/fs/ocfs2/dcache.c kernel source has: 
> > 
> > 290 else 
> > 291 mlog_errno(ret); 
> > 292 
> > 293 /* 
> > 294 * In case of error, manually free the allocation and do the iput().
> > 295 * We need to do this because error here means no d_instantiate(),
> > 296 * which means iput() will not be called during dput(dentry).
> > 297 */ 
> > 298 if (ret < 0 && !alias) { 
> > 299 ocfs2_lock_res_free(&dl->dl_lockres); 
> > 300 BUG_ON(dl->dl_count != 1); 
> > 301 spin_lock(&dentry_attach_lock); 
> > 302 dentry->d_fsdata = NULL; 
> > 303 spin_unlock(&dentry_attach_lock); 
> > 304 kfree(dl); 
> > 305 iput(inode); 
> > 306 } 
> > 307 
> > 308 dput(alias); 
> > 309 
> > 310 return ret; 
> > 311 } 
> > 
> > 
>  
>
>
> ___
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users



Re: [Ocfs2-users] umount hang + high CPU

2009-07-07 Thread sylarrrrrrr

 Aha, ok, I don't see the oops, or anything about the hang, in the logs. The
hung machine still replies to pings.

The story now is that I thought I could use:

 tunefs.ocfs2  --cloned-volume /dev/mylvmsnapshot

in order to mount the snapshot (big mistake). I did manage to mount the
snapshot, but as soon as I unmounted it, the umount process hung, and then the
whole machine hung, except that it still responds to pings.


Now, I have downloaded ocfs2-1.4-userguide.pdf, gone to section 'f) DLM
Debugging', and tried the commands there on the node that is still working, but
only 'cat /sys/kernel/debug/o2dlm/*/dlm_state' worked. It produced the following
output:

Domain: 1ACAFCEE7ACA47C089069117560F5C91  Key: 0xb9d649ba
Thread Pid: 5664  Node: 0  State: JOINED
Number of Joins: 1  Joining Node: 255
Domain Map: 0
Live Map: 0
Lock Resources: 51168 (180512)
MLEs: 0 (291689)
  Blocking: 0 (139713)
  Mastery: 0 (151976)
  Migration: 0 (0)
Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  PendingBASTs=Empty
Purge Count: 8  Refs: 51169
Dead Node: 255
Recovery Pid: 5665  Master: 255  State: INACTIVE
Recovery Map:
Recovery Node State:
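
As a hedged aside, output in this shape can be turned into key/value pairs so that it is easier to compare between nodes or between runs. The sketch below assumes a layout inferred only from the sample above (fields separated by two or more spaces, plus one "Lists:" line of name=value tokens). `parse_dlm_state` is a hypothetical helper and not part of ocfs2-tools.

```python
import re

# Sample copied from the dlm_state output above.
SAMPLE = """\
Domain: 1ACAFCEE7ACA47C089069117560F5C91  Key: 0xb9d649ba
Thread Pid: 5664  Node: 0  State: JOINED
Number of Joins: 1  Joining Node: 255
Domain Map: 0
Live Map: 0
Lock Resources: 51168 (180512)
MLEs: 0 (291689)
  Blocking: 0 (139713)
  Mastery: 0 (151976)
  Migration: 0 (0)
Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  PendingBASTs=Empty
Purge Count: 8  Refs: 51169
Dead Node: 255
Recovery Pid: 5665  Master: 255  State: INACTIVE
Recovery Map:
Recovery Node State:
"""


def parse_dlm_state(text):
    """Return (fields, lists) parsed from dlm_state text.

    Assumes "Name: value" pairs separated by two or more spaces. When a
    name repeats (for example "State"), the first occurrence is kept.
    Values are kept as strings and not interpreted.
    """
    fields, lists = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Lists:"):
            # "Lists:" carries name=value tokens rather than "Name: value".
            for token in line[len("Lists:"):].split():
                name, _, value = token.partition("=")
                lists[name] = value
            continue
        for chunk in re.split(r"\s{2,}", line):
            name, sep, value = chunk.partition(":")
            if sep:
                fields.setdefault(name.strip(), value.strip())
    return fields, lists


fields, lists = parse_dlm_state(SAMPLE)
print(fields["Domain"], fields["State"], fields["Lock Resources"])
print(lists)
```

Keeping the first "State" means the domain state (JOINED) wins over the recovery state on the later line; a fuller parser would namespace the two.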

the other commands:
debugfs.ocfs2 –R “fs_locks –B” /dev/drbd0
debugfs.ocfs2 –R “fs_locks –B” /dev/vg/lv
debugfs.ocfs2 –R “dlm_locks M22d63c” /dev/drbd0

produced the error:
open: Device name specified was not found while opening context
 for device –R
debugfs.ocfs2 1.4.2
debugfs:
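
One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis: the commands were copied from the PDF with typographic dashes and quotes, so "–R" was not recognised as the -R option and was taken as the device name. Below is a minimal sketch of normalising such a pasted line to plain ASCII before running it. `to_ascii_command` is a hypothetical helper.

```python
import shlex

# Typographic characters that PDFs and word processors substitute for ASCII.
REPLACEMENTS = {
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def to_ascii_command(pasted):
    """Replace typographic dashes and quotes with their ASCII equivalents."""
    return pasted.translate(str.maketrans(REPLACEMENTS))


# The first command from the mail, written with explicit escapes.
pasted = "debugfs.ocfs2 \u2013R \u201cfs_locks \u2013B\u201d /dev/drbd0"
fixed = to_ascii_command(pasted)
argv = shlex.split(fixed)  # how a POSIX shell would split the fixed line
print(fixed)
print(argv)
```

After normalisation the request string reaches the tool as a single argument following -R, and the device path is the last argument.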

and:

ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN

produced no D state process.
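
For reference, a small sketch of filtering that ps listing for processes in uninterruptible sleep (a STAT value beginning with "D"). The sample rows and the `example_wait` wait channel are invented for illustration and are not output from the machines in this thread. `d_state_rows` is a hypothetical helper.

```python
# Invented sample in the column layout of
# `ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN`; not real output.
SAMPLE = """\
  PID STAT COMMAND         WIDE-WCHAN-COLUMN
    1 Ss   init            -
 5664 S<   dlm_thread      -
 7254 D+   umount          example_wait
"""


def d_state_rows(ps_output):
    """Return (pid, stat, rest) for rows whose STAT starts with 'D'."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[1].startswith("D"):
            rest = parts[2].strip() if len(parts) > 2 else ""
            rows.append((int(parts[0]), parts[1], rest))
    return rows


rows = d_state_rows(SAMPLE)
for pid, stat, rest in rows:
    print(pid, stat, rest)
```

On a live system the same function could be given the captured text of the ps command; an empty result matches the "no D state process" observation above.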


I am sorry for posting this to the mailing list, but I am a noob, so I don't even
know whether it is a bug, a misconfiguration, or a misunderstanding.

PS. Is the nodiratime option supported for mounts? I used it, but I don't see it in
the user-guide.




-Original Message-
From: Sunil Mushran 
To: syla...@aim.com
Cc: tao...@oracle.com; ocfs2-users@oss.oracle.com
Sent: Tue, Jul 7, 2009 8:46 pm
Subject: Re: [Ocfs2-users] umount hang + high CPU









The fix was for the oops you saw.

The hang is a different issue. We have no info on that.

For that, if you would like to diagnose the problem, read up the dlm notes
in the 1.4 user's guide. It explains a debugging process vis-a-vis hangs.

If the issue is dlm related, then we would like to have the tcpdumps.

Lastly, emails are not an efficient vehicle for handling such issues. Use
the bugzilla as it allows us to collect information in one place.

Sunil

syla...@aim.com wrote:
> So this bug is not over yet :(
>
> I have checked my kernel source and indeed it have this patch but I
> still get the hang.
>
> PS. my linux-2.6-2.6.30/fs/ocfs2/dcache.c kernel source has:
>
> 290 else
> 291 mlog_errno(ret);
> 292
> 293 /*
> 294  * In case of error, manually free the allocation and do the iput().
> 295  * We need to do this because error here means no d_instantiate(),
> 296  * which means iput() will not be called during dput(dentry).
> 297  */
> 298 if (ret < 0 && !alias) {
> 299 ocfs2_lock_res_free(&dl->dl_lockres);
> 300 BUG_ON(dl->dl_count != 1);
> 301 spin_lock(&dentry_attach_lock);
> 302 dentry->d_fsdata = NULL;
> 303 spin_unlock(&dentry_attach_lock);
> 304 kfree(dl);
> 305 iput(inode);
> 306 }
> 307
> 308 dput(alias);
> 309
> 310 return ret;
> 311 }
>
>






Re: [Ocfs2-users] umount hang + high CPU

2009-07-07 Thread Sunil Mushran
The fix was for the oops you saw.

The hang is a different issue. We have no info on that.

For that, if you would like to diagnose the problem, read up the dlm notes
in the 1.4 user's guide. It explains a debugging process vis-a-vis hangs.

If the issue is dlm related, then we would like to have the tcpdumps.

Lastly, emails are not an efficient vehicle for handling such issues. Use
the bugzilla as it allows us to collect information in one place.

Sunil

syla...@aim.com wrote:
> So this bug is not over yet :(
>
> I have checked my kernel source and indeed it have this patch but I 
> still get the hang.
>
> PS. my linux-2.6-2.6.30/fs/ocfs2/dcache.c kernel source has:
>
> 290 else
> 291 mlog_errno(ret);
> 292
> 293 /*
> 294  * In case of error, manually free the allocation and 
> do the iput().
> 295  * We need to do this because error here means no 
> d_instantiate(),
> 296  * which means iput() will not be called during 
> dput(dentry).
> 297  */
> 298 if (ret < 0 && !alias) {
> 299 ocfs2_lock_res_free(&dl->dl_lockres);
> 300 BUG_ON(dl->dl_count != 1);
> 301 spin_lock(&dentry_attach_lock);
> 302 dentry->d_fsdata = NULL;
> 303 spin_unlock(&dentry_attach_lock);
> 304 kfree(dl);
> 305 iput(inode);
> 306 }
> 307
> 308 dput(alias);
> 309
> 310 return ret;
> 311 }
>
>




Re: [Ocfs2-users] umount hang + high CPU

2009-07-07 Thread sylarrrrrrr
So this bug is not over yet :(

I have checked my kernel source and indeed it has this patch, but I still get
the hang.

PS. my linux-2.6-2.6.30/fs/ocfs2/dcache.c kernel source has:

    else
        mlog_errno(ret);

    /*
     * In case of error, manually free the allocation and do the iput().
     * We need to do this because error here means no d_instantiate(),
     * which means iput() will not be called during dput(dentry).
     */
    if (ret < 0 && !alias) {
        ocfs2_lock_res_free(&dl->dl_lockres);
        BUG_ON(dl->dl_count != 1);
        spin_lock(&dentry_attach_lock);
        dentry->d_fsdata = NULL;
        spin_unlock(&dentry_attach_lock);
        kfree(dl);
        iput(inode);
    }

    dput(alias);

    return ret;
}


-Original Message-
From: Tao Ma 
To: syla...@aim.com
Cc: sunil.mush...@oracle.com; ocfs2-users@oss.oracle.com
Sent: Tue, Jul 7, 2009 11:16 am
Subject: Re: [Ocfs2-users] umount hang + high CPU









syla...@aim.com wrote:
> That's a quick fix :D
>
> How do I put it in my system?
>
> I have only recently downloaded and upgraded both tools and kernel, so I
> gather that it is not on the recent version of either. That dlmmaster.c
> file is not in the tools package, does that mean that this file is in
> the kernel? Do I need to patch and compile the kernel from the
> development code? (which is where?)
yes, 2.6.30 already have the fix for
http://oss.oracle.com/bugzilla/show_bug.cgi?id=914

And yes, it is in the kernel. So your tools aren't affected.

Regards,
Tao

>
>
> -Original Message-
> From: Sunil Mushran
> To: syla...@aim.com
> Cc: ocfs2-us...@oss.oracle.com
> Sent: Mon, Jul 6, 2009 11:03 pm
> Subject: Re: [Ocfs2-users] umount hang + high CPU

> 
> Fixed. Details in http://oss.oracle.com/bugzilla/show_bug.cgi?id=914 
>  
> syla...@aim.com <mailto:syla...@aim.com> wrote: 
>  >
>  > Hi,
>  >
>  > On kernel 2.6.30 (and I have upgraded drbd there too to 8.3.2) I have
>  > nothing in the logs, and the umount hangs, and after a few minutes the
>  > whole computer hangs, and I have to hard reset it. On kernel 2.6.26 it
>  > also hanged but the computer didn't hang, but it refused to reboot, or
>  > poweroff, so I also had to hard reset it. In 2.6.26 I had this in syslog :
>  >
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] (7254,1):dlm_empty_lockres:2709 ERROR: lockres O0003cb1e30 still has local locks!
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] [ cut here ]
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2710!
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] invalid opcode:  [1] SMP
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] CPU 1
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] Modules linked in:
>  > ocfs2 ppdev lp parport drbd cn rfcomm l2cap bluetooth xt_tcpudp
>  > iptable_filter battery ip_tables x_tables ipv6 ocfs2_dlmfs
>  > ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs
>  > linear coretemp loop snd_hda_intel snd_pcsp snd_pcm snd_timer snd
>  > soundcore nvidiafb i2c_i801 psmouse snd_page_alloc i2c_core button
>  > vgastate serio_raw intel_agp evdev ext3 jbd mbcache dm_mirror dm_log
>  > dm_snapshot dm_mod raid456 async_xor async_memcpy async_tx xor raid1
>  > md_mod sg sr_mod cdrom sd_mod ide_pci_generic ide_core ata_generic
>  > usbhid hid ff_memless usb_storage floppy ahci ohci1394 pata_marvell
>  > atl1e ieee1394 libata tulip scsi_mod dock ehci_hcd uhci_hcd thermal
>  > processor fan thermal_sys
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] Pid: 7254, comm: umount Not tainted 2.6.26-2-amd64 #1
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RIP: 0010:[] [] :ocfs2_dlm:dlm_empty_lockres+0x13fb/0x14a0
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RSP: 0018:81023c971c18 EFLAGS: 00010292
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RAX: 0079 RBX: 8101db4dae40 RCX: 804fe108
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RDX: 0001 RSI: 0096 RDI: 0286
>  >

Re: [Ocfs2-users] umount hang + high CPU

2009-07-07 Thread Tao Ma


syla...@aim.com wrote:
> That's a quick fix :D
> 
> How do I put it in my system?
> 
> I have only recently downloaded and upgraded both tools and kernel, so I 
> gather that it is not on the recent version of either. That dlmmaster.c 
> file is not in the tools package, does that mean that this file is in 
> the kernel? Do I need to patch and compile the kernel from the 
> development code? (which is where?)
Yes, 2.6.30 already has the fix for
http://oss.oracle.com/bugzilla/show_bug.cgi?id=914

And yes, it is in the kernel. So your tools aren't affected.

Regards,
Tao

> 
> 
> -Original Message-
> From: Sunil Mushran 
> To: syla...@aim.com
> Cc: ocfs2-users@oss.oracle.com
> Sent: Mon, Jul 6, 2009 11:03 pm
> Subject: Re: [Ocfs2-users] umount hang + high CPU
> 
> Fixed. Details in http://oss.oracle.com/bugzilla/show_bug.cgi?id=914 
>  
> syla...@aim.com <mailto:syla...@aim.com> wrote: 
>  >
>  > Hi,
>  >
>  > On kernel 2.6.30 (and I have upgraded drbd there too to 8.3.2) I have
>  > nothing in the logs, and the umount hangs, and after a few minutes the
>  > whole computer hangs, and I have to hard reset it. On kernel 2.6.26 it
>  > also hanged but the computer didn't hang, but it refused to reboot, or
>  > poweroff, so I also had to hard reset it. In 2.6.26 I had this in syslog :
>  >
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] (7254,1):dlm_empty_lockres:2709 ERROR: lockres O0003cb1e30 still has local locks!
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] [ cut here ]
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2710!
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] invalid opcode:  [1] SMP
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] CPU 1
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] Modules linked in:
>  > ocfs2 ppdev lp parport drbd cn rfcomm l2cap bluetooth xt_tcpudp
>  > iptable_filter battery ip_tables x_tables ipv6 ocfs2_dlmfs
>  > ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs
>  > linear coretemp loop snd_hda_intel snd_pcsp snd_pcm snd_timer snd
>  > soundcore nvidiafb i2c_i801 psmouse snd_page_alloc i2c_core button
>  > vgastate serio_raw intel_agp evdev ext3 jbd mbcache dm_mirror dm_log
>  > dm_snapshot dm_mod raid456 async_xor async_memcpy async_tx xor raid1
>  > md_mod sg sr_mod cdrom sd_mod ide_pci_generic ide_core ata_generic
>  > usbhid hid ff_memless usb_storage floppy ahci ohci1394 pata_marvell
>  > atl1e ieee1394 libata tulip scsi_mod dock ehci_hcd uhci_hcd thermal
>  > processor fan thermal_sys
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] Pid: 7254, comm: umount Not tainted 2.6.26-2-amd64 #1
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RIP: 0010:[] [] :ocfs2_dlm:dlm_empty_lockres+0x13fb/0x14a0
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RSP: 0018:81023c971c18 EFLAGS: 00010292
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RAX: 0079 RBX: 8101db4dae40 RCX: 804fe108
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RDX: 0001 RSI: 0096 RDI: 0286
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] RBP: 8101db4dae40 R08: 804fe0f0 R09: 81000103b918
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] R10: 81000103b880 R11: 0046 R12: 8101cae4e800
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] R13: 001f R14: ffd9 R15: 00c5
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] FS: () GS:81023f08e8c0(0063) knlGS:f7deb6f0
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] CS: 0010 DS: 002b ES: 002b CR0: 8005003b
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] CR2: f7e2e2a0 CR3: 0001dc9ef000 CR4: 06e0
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] DR0:  DR1:  DR2:
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] DR3:  DR6: 0ff0 DR7: 0400
>  > Jul 5 21:10:34 ocfs2Server kernel: [249187.320327] Process umount (pid: 7254, threadinfo 81023c97, task 81019e998040)
>  > Jul 5 21:10:34 ocfs2Server kernel:

Re: [Ocfs2-users] umount hang + high CPU

2009-07-07 Thread sylarrrrrrr
That's a quick fix :D

How do I put it in my system?

I have only recently downloaded and upgraded both tools and kernel, so I gather 
that it is not on the recent version of either. That dlmmaster.c file is not in 
the tools package, does that mean that this file is in the kernel? Do I need to 
patch and compile the kernel from the development code? (which is where?)


-Original Message-
From: Sunil Mushran 
To: syla...@aim.com
Cc: ocfs2-users@oss.oracle.com
Sent: Mon, Jul 6, 2009 11:03 pm
Subject: Re: [Ocfs2-users] umount hang + high CPU









Fixed. Details in http://oss.oracle.com/bugzilla/show_bug.cgi?id=914

syla...@aim.com wrote:
>
> Hi,
>
>  On kernel 2.6.30 (and I have upgraded drbd there too to 8.3.2) I have
> nothing in the logs, and the umount hangs, and after a few minutes the
> whole computer hangs, and I have to hard reset it. On kernel 2.6.26 it
> also hanged but the computer didn't hang, but it refused to reboot, or
> poweroff, so I also had to hard reset it. In 2.6.26 I had this in syslog :
>
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] (7254,1):dlm_empty_lockres:2709 ERROR: lockres O0003cb1e30 still has local locks!
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] [ cut here ]
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2710!
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] invalid opcode:  [1] SMP
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] CPU 1
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] Modules linked in:
> ocfs2 ppdev lp parport drbd cn rfcomm l2cap bluetooth xt_tcpudp
> iptable_filter battery ip_tables x_tables ipv6 ocfs2_dlmfs
> ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs
> linear coretemp loop snd_hda_intel snd_pcsp snd_pcm snd_timer snd
> soundcore nvidiafb i2c_i801 psmouse snd_page_alloc i2c_core button
> vgastate serio_raw intel_agp evdev ext3 jbd mbcache dm_mirror dm_log
> dm_snapshot dm_mod raid456 async_xor async_memcpy async_tx xor raid1
> md_mod sg sr_mod cdrom sd_mod ide_pci_generic ide_core ata_generic
> usbhid hid ff_memless usb_storage floppy ahci ohci1394 pata_marvell
> atl1e ieee1394 libata tulip scsi_mod dock ehci_hcd uhci_hcd thermal
> processor fan thermal_sys
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] Pid: 7254, comm: umount Not tainted 2.6.26-2-amd64 #1
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] RIP: 0010:[]  [] :ocfs2_dlm:dlm_empty_lockres+0x13fb/0x14a0
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] RSP: 0018:81023c971c18  EFLAGS: 00010292
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] RAX: 0079 RBX: 8101db4dae40 RCX: 804fe108
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] RDX: 0001 RSI: 0096 RDI: 0286
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] RBP: 8101db4dae40 R08: 804fe0f0 R09: 81000103b918
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] R10: 81000103b880 R11: 0046 R12: 8101cae4e800
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] R13: 001f R14: ffd9 R15: 00c5
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] FS:  () GS:81023f08e8c0(0063) knlGS:f7deb6f0
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] CS:  0010 DS: 002b ES: 002b CR0: 8005003b
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] CR2: f7e2e2a0 CR3: 0001dc9ef000 CR4: 06e0
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] DR0:  DR1:  DR2:
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] DR3:  DR6: 0ff0 DR7: 0400
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] Process umount (pid: 7254, threadinfo 81023c97, task 81019e998040)
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] Stack:  81020c580800 81010001 0001
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  8101cae4ea48  81020c580800
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  8101cae4ea38 0003  81019e998040
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327] Call Trace:
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? autoremove_wake_function+0x0/0x2e
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? :ocfs2_dlm:__dlm_lockres_unuse

Re: [Ocfs2-users] umount hang + high CPU

2009-07-06 Thread Sunil Mushran
fs2_dismount_volume+0x1a1/0x34e
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? filemap_write_and_wait+0x26/0x31
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? :ocfs2:ocfs2_put_super+0x67/0xb8
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? generic_shutdown_super+0x60/0xee
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? kill_block_super+0xd/0x1e
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? deactivate_super+0x5f/0x78
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? sys_umount+0x2f9/0x353
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? do_page_fault+0x5d8/0x9c8
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? sys32_stat64+0x11/0x29
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]  [] ? __up_write+0x21/0x10e
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]  [] ? sysenter_do_call+0x1b/0x66
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] Code: 00 00 8b b0 98 01 00 00 48 c7 c7 2f a9 36 a0 31 c0 65 8b 14 25 24 00 00 00 48 89 0c 24 89 d2 48 c7 c1 00 48 36 a0 e8 e8 bb ed df <0f> 0b eb fe 48 f7 05 32 dc fc ff 00 09 00 00 74 4d 48 f7 05 2d
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] RIP  [] :ocfs2_dlm:dlm_empty_lockres+0x13fb/0x14a0
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]  RSP
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] ---[ end trace 10e3d919ff4fa443 ]---
> Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] [ cut here ]----
>
> PS. I see that both kernels have the same 1.5.0 version, so upgrading 
> was pointless in this regard.
>
>
> -Original Message-
> From: Tao Ma <mailto:tao...@oracle.com>
> To: syla...@aim.com <mailto:syla...@aim.com>
> Cc: ocfs2-users@oss.oracle.com <mailto:ocfs2-users@oss.oracle.com>
> Sent: Sun, Jul 5, 2009 9:22 pm
> Subject: Re: [Ocfs2-users] umount hang + high CPU
>
> Hi, 
>   Is there something in your system log? 
>   I would guess there should be some info there. 
>  
> Regards, 
> Tao 
>  
> syla...@aim.com wrote: 
> > Hi,
> >
> > I had a problem where I got a "kernel bug" in the logs in ocfs2. That
> > happened when I unmounted the volume after a day or two that it was
> > mounted, so I thought I needed to upgrade the kernel (maybe the next
> > version will be bug free), so I did to 2.6.30, and now I tried mounting
> > and unmounting the volume right away... and it hanged, and the CPU got
> > high with that umount process.
> >
> > Please advice
> >
> > PS. tools and console packages are version 1.4.2.




Re: [Ocfs2-users] umount hang + high CPU

2009-07-06 Thread sylarrrrrrr
20327]  [] ? deactivate_super+0x5f/0x78
Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? sys_umount+0x2f9/0x353
Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? do_page_fault+0x5d8/0x9c8
Jul  5 21:10:34 ocfs2Server kernel: [249187.320327]  [] ? sys32_stat64+0x11/0x29
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]  [] ? __up_write+0x21/0x10e
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]  [] ? sysenter_do_call+0x1b/0x66
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] Code: 00 00 8b b0 98 01 00 00 48 c7 c7 2f a9 36 a0 31 c0 65 8b 14 25 24 00 00 00 48 89 0c 24 89 d2 48 c7 c1 00 48 36 a0 e8 e8 bb ed df <0f> 0b eb fe 48 f7 05 32 dc fc ff 00 09 00 00 74 4d 48 f7 05 2d
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] RIP  [] :ocfs2_dlm:dlm_empty_lockres+0x13fb/0x14a0
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337]  RSP
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] ---[ end trace 10e3d919ff4fa443 ]---
Jul  5 21:10:34 ocfs2Server kernel: [249187.320337] [ cut here ]








 

PS. I see that both kernels have the same 1.5.0 version, so upgrading was 
pointless in this regard.









-Original Message-
From: Tao Ma
To: syla...@aim.com
Cc: ocfs2-users@oss.oracle.com
Sent: Sun, Jul 5, 2009 9:22 pm
Subject: Re: [Ocfs2-users] umount hang + high CPU

Hi,
  Is there something in your system log?
  I would guess there should be some info there.

Regards,
Tao

syla...@aim.com wrote:
>   Hi,
>
>  I had a problem where I got a "kernel bug" in the logs in ocfs2. That
> happened when I unmounted the volume after a day or two that it was
> mounted, so I thought I needed to upgrade the kernel (maybe the next
> version will be bug free), so I did to 2.6.30, and now I tried mounting
> and unmounting the volume right away... and it hanged, and the CPU got
> high with that umount process.
>
> Please advice
>
> PS. tools and console packages are version 1.4.2.







 



 


Re: [Ocfs2-users] umount hang + high CPU

2009-07-05 Thread Tao Ma
Hi,
Is there something in your system log?
I would guess there should be some info there.

Regards,
Tao

syla...@aim.com wrote:
>   Hi,
> 
>  I had a problem where I got a "kernel bug" in the logs in ocfs2. That 
> happened when I unmounted the volume after a day or two that it was 
> mounted, so I thought I needed to upgrade the kernel (maybe the next 
> version will be bug free), so I did to 2.6.30, and now I tried mounting 
> and unmounting the volume right away... and it hanged, and the CPU got 
> high with that umount process.
> 
> Please advice
> 
> PS. tools and console packages are version 1.4.2.
> 



[Ocfs2-users] umount hang + high CPU

2009-07-05 Thread sylarrrrrrr
Hi,

I had a problem where I got a "kernel bug" in the logs in ocfs2. That happened
when I unmounted the volume after it had been mounted for a day or two, so I
thought I needed to upgrade the kernel (maybe the next version would be bug
free). I upgraded to 2.6.30, and now I tried mounting and unmounting the volume
right away... and it hung, and the CPU usage got high with that umount process.

Please advise.

PS. tools and console packages are version 1.4.2.