1.2.5 is more than a year old. I suggest you upgrade to 1.2.9. The oops is bizarre, to say the least. I notice you are using a xenU kernel. Are the 4 nodes VMs? Just trying to understand the layout.
Is it reproducible? Definitely upgrade to 1.2.9. If the issue reproduces, file a bugzilla with all the details. This is a dlm/cluster stack issue. fsck will only show on-disk issues, which this is not.
Sunil
--- Begin Message ---
Hi,

I have a cluster with 4 nodes, all of them with the same kernel:

Linux app19 2.6.9-48.ELxenU #1 SMP Sun Mar 4 19:50:03 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

and with

OCFS2 Node Manager 1.2.5 Tue Apr 10 12:29:33 EDT 2007 (build 9e5f332181e8ebfad464946bcc4888af)
OCFS2 DLM 1.2.5 Tue Apr 10 12:29:33 EDT 2007 (build e2556a71429f31033b275dff4b5594aa)
OCFS2 DLMFS 1.2.5 Tue Apr 10 12:29:33 EDT 2007 (build e2556a71429f31033b275dff4b5594aa)
OCFS2 User DLM kernel interface loaded

From one moment to the next, the ocfs2 filesystems froze:

/home/user
/usr

I rebooted one node (the one that had the higher load) and it kept rebooting over and over again with the following error:

(1768,0):dlm_convert_lock_handler:443 ERROR: Domain CACE9ABE4D474B04A3C06C944B7D616D not fully joined!
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dlmconvert:443
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: ocfs2(U) debugfs(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) sunrpc dm_mod xennet ext3 jbd xenblk
Pid: 1768, comm: o2net Not tainted 2.6.9-48.ELxenU
RIP: e030:[<ffffffffa00dcb8b>] <ffffffffa00dcb8b>{:ocfs2_dlm:dlm_convert_lock_handler+376}
RSP: e02b:ffffff807d419d88 EFLAGS: 00010292
RAX: 000000000000006a RBX: ffffff807e6bdf00 RCX: 00000000000013ba
RDX: 00000000000013ba RSI: 0000000000000000 RDI: ffffffff8032b9a0
RBP: ffffff8009669400 R08: 00000000000927bf R09: ffffff807e6bdf00
R10: ffffffff801eb0a8 R11: 0000ffff80346560 R12: ffffff807ed48000
R13: ffffff807e6bdf00 R14: 0000000000000000 R15: ffffff807ed48018
FS: 0000002a95563da0(0000) GS:ffffffff8041d700(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process o2net (pid: 1768, threadinfo ffffff807d418000, task ffffff807e562030)
Stack: ffffffffff5fd000 0000000000000000 0000000000000000 ffffff8009786c00
       0000000000000000 ffffff807e6bdf00 ffffff8009669400 ffffff807ed48000
       ffffff807e6bdf00 0000000000000000
Call Trace:<ffffffffa009dac6>{:ocfs2_nodemanager:o2net_process_message+1567}
       <ffffffffa009dd03>{:ocfs2_nodemanager:o2net_rx_until_empty+0}
       <ffffffffa009e5b6>{:ocfs2_nodemanager:o2net_rx_until_empty+2227}
       <ffffffff8014092e>{worker_thread+419}
       <ffffffff8012b177>{default_wake_function+0}
       <ffffffff8012b1c8>{__wake_up_common+67}
       <ffffffff8012b177>{default_wake_function+0}
       <ffffffff80144bd4>{keventd_create_kthread+0}
       <ffffffff8014078b>{worker_thread+0}
       <ffffffff80144bd4>{keventd_create_kthread+0}
       <ffffffff80144bab>{kthread+200}
       <ffffffff8010e092>{child_rip+8}
       <ffffffff80144bd4>{keventd_create_kthread+0}
       <ffffffff80144ae3>{kthread+0}
       <ffffffff8010e08a>{child_rip+0}

Code: 0f 0b 12 06 0f a0 ff ff ff ff bb 01 41 80 7f 0f 20 76 5c 48
RIP <ffffffffa00dcb8b>{:ocfs2_dlm:dlm_convert_lock_handler+376} RSP <ffffff807d419d88>
<0>Kernel panic - not syncing: Oops
Connection to xen3 closed.

I had to shut down all 4 nodes and start them one by one. I even checked with fsck.ocfs2 and it didn't report any errors.

Any clues?

Thanks
Nuno Fernandes
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
--- End Message ---
