i guess i found the solution. while dumping some files with debugfs.ocfs2, it suddenly stopped working and could not be killed. and guess what: a media error on the drive :-/. funny that a filesystem check still succeeds.
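For reference, a minimal sketch of the debugfs.ocfs2 read-only dump described in this thread (the ls, cd, dump and rdump commands), assuming the -R single-request form and the /dev/sda4 device from the logs below; the source paths and the local target directory /mnt/rescue are placeholders, not values from the thread:

```bash
# Read-only inspection/recovery of an unmounted OCFS2 volume with debugfs.ocfs2.
# /dev/sda4 is taken from the thread; /mnt/rescue and the paths below are
# placeholders for a local (non-shared) destination and the data to recover.
DEV=/dev/sda4
DEST=/mnt/rescue
mkdir -p "$DEST"

# List the root directory of the volume without mounting it.
debugfs.ocfs2 -R "ls /" "$DEV"

# Copy a single file out of the volume.
debugfs.ocfs2 -R "dump /path/to/file $DEST/file" "$DEV"

# Recursively copy an entire directory tree to the local destination.
debugfs.ocfs2 -R "rdump /path/to/dir $DEST" "$DEV"
```

The same commands can be run from the interactive debugfs.ocfs2 prompt; either way the volume stays unmounted, which sidesteps the heartbeat/fencing path entirely.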
anyway thx a lot to those who responded.

holger

On Thu, 2006-09-14 at 11:03 -0700, Sunil Mushran wrote:
> Not sure why a power outage should cause this.
>
> Do you have the full stack of the oops? It will show the times taken
> in the last 24 operations in the hb thread. That should tell us as to
> what is up.
>
> Holger Brueckner wrote:
> > i just discovered the ls, cd, dump and rdump commands in debugfs.ocfs2.
> > they work fine :-). nevertheless i would really like to know why mounting
> > and accessing the volume is not possible anymore.
> >
> > but thanks for the hint pieter
> >
> > holger brueckner
> >
> > On Thu, 2006-09-14 at 14:30 +0200, Pieter Viljoen - MWEB wrote:
> >
> >> Hi Holger
> >>
> >> Maybe you should try the fscat tools
> >> (http://oss.oracle.com/projects/fscat/) - which have fsls (to list) and
> >> fscp (to copy) directly from the device.
> >>
> >> I have not tried it yet, so good luck!
> >>
> >> Pieter Viljoen
> >>
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED]
> >> [mailto:[EMAIL PROTECTED] On Behalf Of Holger Brueckner
> >> Sent: Thursday, September 14, 2006 14:17
> >> To: [email protected]
> >> Subject: Re: [Ocfs2-users] self fencing and system panic problem after forced reboot
> >>
> >> side note: setting HEARTBEAT_THRESHOLD to 30 did not help either.
> >>
> >> could it be that the synchronization between the daemons does not work?
> >> (e.g. daemons think the fs is mounted on some nodes and try to synchronize,
> >> but actually the fs isn't mounted on any node?)
> >>
> >> i'm rather clueless now. finding a way to access the data and copy it to
> >> the non-shared partitions would help me a lot.
> >>
> >> thx
> >>
> >> holger brueckner
> >>
> >> On Thu, 2006-09-14 at 13:47 +0200, Holger Brueckner wrote:
> >>
> >>> hello,
> >>>
> >>> i'm running ocfs2 to provide a shared disk throughout a xen cluster.
> >>> this setup was working fine until today, when there was a power outage
> >>> and all xen nodes were forcefully shut down. whenever i try to
> >>> mount/access the ocfs2 partition the system panics and reboots:
> >>>
> >>> darks:~# fsck.ocfs2 -y -f /dev/sda4
> >>> (617,0):__dlm_print_nodes:377 Nodes in my domain ("5BA3969FC2714FFEAD66033486242B58"):
> >>> (617,0):__dlm_print_nodes:381 node 0
> >>> Checking OCFS2 filesystem in /dev/sda4:
> >>>   label:              <NONE>
> >>>   uuid:               5b a3 96 9f c2 71 4f fe ad 66 03 34 86 24 2b 58
> >>>   number of blocks:   35983584
> >>>   bytes per block:    4096
> >>>   number of clusters: 4497948
> >>>   bytes per cluster:  32768
> >>>   max slots:          4
> >>>
> >>> /dev/sda4 was run with -f, check forced.
> >>> Pass 0a: Checking cluster allocation chains
> >>> Pass 0b: Checking inode allocation chains
> >>> Pass 0c: Checking extent block allocation chains
> >>> Pass 1: Checking inodes and blocks.
> >>> [CLUSTER_ALLOC_BIT] Cluster 295771 is marked in the global cluster
> >>> bitmap but it isn't in use. Clear its bit in the bitmap? y
> >>> [CLUSTER_ALLOC_BIT] Cluster 2456870 is marked in the global cluster
> >>> bitmap but it isn't in use. Clear its bit in the bitmap? y
> >>> [CLUSTER_ALLOC_BIT] Cluster 2683096 is marked in the global cluster
> >>> bitmap but it isn't in use. Clear its bit in the bitmap? y
> >>> Pass 2: Checking directory entries.
> >>> Pass 3: Checking directory connectivity.
> >>> Pass 4a: checking for orphaned inodes
> >>> Pass 4b: Checking inodes link counts.
> >>> All passes succeeded.
> >>> darks:~# mount /data
> >>> (622,0):ocfs2_initialize_super:1326 max_slots for this device: 4
> >>> (622,0):ocfs2_fill_local_node_info:1019 I am node 0
> >>> (622,0):__dlm_print_nodes:377 Nodes in my domain ("5BA3969FC2714FFEAD66033486242B58"):
> >>> (622,0):__dlm_print_nodes:381 node 0
> >>> (622,0):ocfs2_find_slot:261 slot 2 is already allocated to this node!
> >>> (622,0):ocfs2_find_slot:267 taking node slot 2
> >>> (622,0):ocfs2_check_volume:1586 File system was not unmounted cleanly, recovering volume.
> >>> kjournald starting. Commit interval 5 seconds
> >>> ocfs2: Mounting device (8,4) on (node 0, slot 2) with ordered data mode.
> >>> (630,0):ocfs2_replay_journal:1181 Recovering node 2 from slot 0 on device (8,4)
> >>> darks:~# (4,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device sda4 after 12000 milliseconds
> >>> (4,0):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions.
> >>> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
> >>>
> >>> ocfs2-tools 1.2.1-1
> >>> kernel 2.6.16-xen (with the corresponding ocfs2 compiled into the kernel)
> >>>
> >>> i already tried the elevator=deadline scheduler option with no effect.
> >>> any further help debugging this issue is greatly appreciated. are there
> >>> any other possibilities to get access to the data from outside the
> >>> cluster (obviously while the partition isn't mounted)?
> >>>
> >>> thanks for your help
> >>>
> >>> holger brueckner
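On the heartbeat write timeout in the log above, a hedged sketch of how O2CB_HEARTBEAT_THRESHOLD relates to that window: the timeout is (threshold - 1) * 2000 ms, so the default threshold of 7 matches the 12000 ms in the message. The config path and init-script commands below are distribution-dependent assumptions, and as this thread shows, raising the threshold cannot help when the real cause is a failing drive:

```bash
# o2cb heartbeat tuning sketch. The write-timeout window is
# (O2CB_HEARTBEAT_THRESHOLD - 1) * 2000 ms, so the default of 7
# corresponds to the "after 12000 milliseconds" message above.
#
# Config location varies by distribution; /etc/default/o2cb (Debian-style)
# and /etc/sysconfig/o2cb (RHEL-style) are assumptions here.
grep O2CB_HEARTBEAT_THRESHOLD /etc/default/o2cb /etc/sysconfig/o2cb 2>/dev/null

# For a 60 s window, set O2CB_HEARTBEAT_THRESHOLD=31 in that file
# ((31 - 1) * 2000 ms = 60000 ms), then restart the cluster stack on
# every node so all nodes agree on the value:
/etc/init.d/o2cb offline
/etc/init.d/o2cb online
```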
