i guess i found the solution. while dumping some files with debugfs.ocfs2, it suddenly stopped working and could not be killed. and guess what: a media error on the drive :-/. funny that a filesystem check still succeeds.
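For reference, a minimal sketch of the debugfs.ocfs2 read-only dump described in this thread (the ls, cd, dump and rdump commands), assuming the -R single-request form and the /dev/sda4 device from the logs below; the source paths and the local target directory /mnt/rescue are placeholders, not values from the thread:

```bash
# Read-only inspection/recovery of an unmounted OCFS2 volume with debugfs.ocfs2.
# /dev/sda4 is taken from the thread; /mnt/rescue and the paths below are
# placeholders for a local (non-shared) destination and the data to recover.
DEV=/dev/sda4
DEST=/mnt/rescue
mkdir -p "$DEST"

# List the root directory of the volume without mounting it.
debugfs.ocfs2 -R "ls /" "$DEV"

# Copy a single file out of the volume.
debugfs.ocfs2 -R "dump /path/to/file $DEST/file" "$DEV"

# Recursively copy an entire directory tree to the local destination.
debugfs.ocfs2 -R "rdump /path/to/dir $DEST" "$DEV"
```

The same commands can be run from the interactive debugfs.ocfs2 prompt; either way the volume stays unmounted, which sidesteps the heartbeat/fencing path entirely.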
anyway thx a lot to those who responded.

holger

On Thu, 2006-09-14 at 11:03 -0700, Sunil Mushran wrote:
> Not sure why a power outage should cause this.
>
> Do you have the full stack of the oops? It will show the times taken
> in the last 24 operations in the hb thread. That should tell us as to
> what is up.
>
> Holger Brueckner wrote:
> > i just discovered the ls, cd, dump and rdump commands in debugfs.ocfs2.
> > they work fine :-). nevertheless i would really like to know why mounting
> > and accessing the volume is not possible anymore.
> >
> > but thanks for the hint pieter
> >
> > holger brueckner
> >
> > On Thu, 2006-09-14 at 14:30 +0200, Pieter Viljoen - MWEB wrote:
> >
> >> Hi Holger
> >>
> >> Maybe you should try the fscat tools
> >> (http://oss.oracle.com/projects/fscat/) - which have fsls (to list) and
> >> fscp (to copy) directly from the device.
> >>
> >> I have not tried it yet, so good luck!
> >>
> >> Pieter Viljoen
> >>
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED]
> >> [mailto:[EMAIL PROTECTED] On Behalf Of Holger Brueckner
> >> Sent: Thursday, September 14, 2006 14:17
> >> To: [email protected]
> >> Subject: Re: [Ocfs2-users] self fencing and system panic problem after forced reboot
> >>
> >> side note: setting HEARTBEAT_THRESHOLD to 30 did not help either.
> >>
> >> could it be that the synchronization between the daemons does not work?
> >> (e.g. daemons think the fs is mounted on some nodes and try to synchronize,
> >> but actually the fs isn't mounted on any node?)
> >>
> >> i'm rather clueless now. finding a way to access the data and copy it to
> >> the non-shared partitions would help me a lot.
> >>
> >> thx
> >>
> >> holger brueckner
> >>
> >> On Thu, 2006-09-14 at 13:47 +0200, Holger Brueckner wrote:
> >>
> >>> hello,
> >>>
> >>> i'm running ocfs2 to provide a shared disk throughout a xen cluster.
> >>> this setup was working fine until today, when there was a power outage
> >>> and all xen nodes were forcefully shut down. whenever i try to
> >>> mount/access the ocfs2 partition the system panics and reboots:
> >>>
> >>> darks:~# fsck.ocfs2 -y -f /dev/sda4
> >>> (617,0):__dlm_print_nodes:377 Nodes in my domain ("5BA3969FC2714FFEAD66033486242B58"):
> >>> (617,0):__dlm_print_nodes:381 node 0
> >>> Checking OCFS2 filesystem in /dev/sda4:
> >>>   label:              <NONE>
> >>>   uuid:               5b a3 96 9f c2 71 4f fe ad 66 03 34 86 24 2b 58
> >>>   number of blocks:   35983584
> >>>   bytes per block:    4096
> >>>   number of clusters: 4497948
> >>>   bytes per cluster:  32768
> >>>   max slots:          4
> >>>
> >>> /dev/sda4 was run with -f, check forced.
> >>> Pass 0a: Checking cluster allocation chains
> >>> Pass 0b: Checking inode allocation chains
> >>> Pass 0c: Checking extent block allocation chains
> >>> Pass 1: Checking inodes and blocks.
> >>> [CLUSTER_ALLOC_BIT] Cluster 295771 is marked in the global cluster
> >>> bitmap but it isn't in use. Clear its bit in the bitmap? y
> >>> [CLUSTER_ALLOC_BIT] Cluster 2456870 is marked in the global cluster
> >>> bitmap but it isn't in use. Clear its bit in the bitmap? y
> >>> [CLUSTER_ALLOC_BIT] Cluster 2683096 is marked in the global cluster
> >>> bitmap but it isn't in use. Clear its bit in the bitmap? y
> >>> Pass 2: Checking directory entries.
> >>> Pass 3: Checking directory connectivity.
> >>> Pass 4a: checking for orphaned inodes
> >>> Pass 4b: Checking inodes link counts.
> >>> All passes succeeded.
> >>> darks:~# mount /data
> >>> (622,0):ocfs2_initialize_super:1326 max_slots for this device: 4
> >>> (622,0):ocfs2_fill_local_node_info:1019 I am node 0
> >>> (622,0):__dlm_print_nodes:377 Nodes in my domain ("5BA3969FC2714FFEAD66033486242B58"):
> >>> (622,0):__dlm_print_nodes:381 node 0
> >>> (622,0):ocfs2_find_slot:261 slot 2 is already allocated to this node!
> >>> (622,0):ocfs2_find_slot:267 taking node slot 2
> >>> (622,0):ocfs2_check_volume:1586 File system was not unmounted cleanly, recovering volume.
> >>> kjournald starting. Commit interval 5 seconds
> >>> ocfs2: Mounting device (8,4) on (node 0, slot 2) with ordered data mode.
> >>> (630,0):ocfs2_replay_journal:1181 Recovering node 2 from slot 0 on device (8,4)
> >>> darks:~# (4,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device sda4 after 12000 milliseconds
> >>> (4,0):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions.
> >>> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
> >>>
> >>> ocfs2-tools 1.2.1-1
> >>> kernel 2.6.16-xen (with the corresponding ocfs2 compiled into the kernel)
> >>>
> >>> i already tried the elevator=deadline scheduler option with no effect.
> >>> any further help debugging this issue is greatly appreciated. are there
> >>> any other possibilities to get access to the data from outside the
> >>> cluster (obviously while the partition isn't mounted)?
> >>>
> >>> thanks for your help
> >>>
> >>> holger brueckner
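On the heartbeat write timeout in the log above, a hedged sketch of how O2CB_HEARTBEAT_THRESHOLD relates to that window: the timeout is (threshold - 1) * 2000 ms, so the default threshold of 7 matches the 12000 ms in the message. The config path and init-script commands below are distribution-dependent assumptions, and as this thread shows, raising the threshold cannot help when the real cause is a failing drive:

```bash
# o2cb heartbeat tuning sketch. The write-timeout window is
# (O2CB_HEARTBEAT_THRESHOLD - 1) * 2000 ms, so the default of 7
# corresponds to the "after 12000 milliseconds" message above.
#
# Config location varies by distribution; /etc/default/o2cb (Debian-style)
# and /etc/sysconfig/o2cb (RHEL-style) are assumptions here.
grep O2CB_HEARTBEAT_THRESHOLD /etc/default/o2cb /etc/sysconfig/o2cb 2>/dev/null

# For a 60 s window, set O2CB_HEARTBEAT_THRESHOLD=31 in that file
# ((31 - 1) * 2000 ms = 60000 ms), then restart the cluster stack on
# every node so all nodes agree on the value:
/etc/init.d/o2cb offline
/etc/init.d/o2cb online
```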
