Hello,
After 24 hours I saw TEST-MAIL2 reboot (possible kernel panic), but TEST-MAIL1 logged this in dmesg:

TEST-MAIL1 ~ # dmesg
[cut]
o2net: accepted connection from node TEST-MAIL2 (num 1) at 172.17.1.252:7777
o2dlm: Node 1 joins domain B24C4493BBC74FEAA3371E2534BB3611
o2dlm: Nodes in domain B24C4493BBC74FEAA3371E2534BB3611: 0 1
o2net: connection to node TEST-MAIL2 (num 1) at 172.17.1.252:7777 has been idle for 60.0 seconds, shutting it down.
(swapper,0,0):o2net_idle_timer:1562 Here are some times that might help debug the situation: (Timer: 33127732045, Now 33187808090, DataReady 33127732039, Advance 33127732051-33127732051, Key 0xebb9cd47, Func 506, FuncTime 33127732045-33127732048)
o2net: no longer connected to node TEST-MAIL2 (num 1) at 172.17.1.252:7777
(du,5099,12):dlm_do_master_request:1324 ERROR: link to 1 went down!
(du,5099,12):dlm_get_lock_resource:907 ERROR: status = -112
(dlm_thread,14321,1):dlm_send_proxy_ast_msg:484 ERROR: B24C4493BBC74FEAA3371E2534BB3611: res M000000000000000000000cf023ef70, error -112 send AST to node 1
(dlm_thread,14321,1):dlm_flush_asts:605 ERROR: status = -112
(dlm_thread,14321,1):dlm_send_proxy_ast_msg:484 ERROR: B24C4493BBC74FEAA3371E2534BB3611: res P000000000000000000000000000000, error -107 send AST to node 1
(dlm_thread,14321,1):dlm_flush_asts:605 ERROR: status = -107
(kworker/u:3,5071,0):o2net_connect_expired:1724 ERROR: no connection established with node 1 after 60.0 seconds, giving up and returning errors.
(o2hb-B24C4493BB,14310,0):o2dlm_eviction_cb:267 o2dlm has evicted node 1 from group B24C4493BBC74FEAA3371E2534BB3611
(ocfs2rec,5504,6):dlm_get_lock_resource:834 B24C4493BBC74FEAA3371E2534BB3611:M0000000000000000000015f023ef70: at least one node (1) to recover before lock mastery can begin
(ocfs2rec,5504,6):dlm_get_lock_resource:888 B24C4493BBC74FEAA3371E2534BB3611:M0000000000000000000015f023ef70: at least one node (1) to recover before lock mastery can begin
(du,5099,12):dlm_restart_lock_mastery:1213 ERROR: node down! 1
(du,5099,12):dlm_wait_for_lock_mastery:1030 ERROR: status = -11
(du,5099,12):dlm_get_lock_resource:888 B24C4493BBC74FEAA3371E2534BB3611:N000000000020924f: at least one node (1) to recover before lock mastery can begin
(dlm_reco_thread,14322,0):dlm_get_lock_resource:834 B24C4493BBC74FEAA3371E2534BB3611:$RECOVERY: at least one node (1) to recover before lock mastery can begin
(dlm_reco_thread,14322,0):dlm_get_lock_resource:868 B24C4493BBC74FEAA3371E2534BB3611: recovery map is not empty, but must master $RECOVERY lock now
(dlm_reco_thread,14322,0):dlm_do_recovery:523 (14322) Node 0 is the Recovery Master for the Dead Node 1 for Domain B24C4493BBC74FEAA3371E2534BB3611
(ocfs2rec,5504,6):ocfs2_replay_journal:1549 Recovering node 1 from slot 1 on device (253,0)
(ocfs2rec,5504,6):ocfs2_begin_quota_recovery:407 Beginning quota recovery in slot 1
(kworker/u:0,2909,0):ocfs2_finish_quota_recovery:599 Finishing quota recovery in slot 1
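For what it is worth, the "has been idle for 60.0 seconds" message matches the default O2CB network idle timeout, and the lines that follow are the surviving node evicting the dead peer and replaying its journal. A minimal way to check which timeouts the nodes are actually running with is sketched below; it assumes the default cluster name "ocfs2" and the usual configfs/sysconfig layout, so adjust the names and paths to your setup:

# o2cb timeouts as seen by the running kernel (cluster name "ocfs2" is an assumption)
cat /sys/kernel/config/cluster/ocfs2/idle_timeout_ms
cat /sys/kernel/config/cluster/ocfs2/keepalive_delay_ms
cat /sys/kernel/config/cluster/ocfs2/reconnect_delay_ms
cat /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold

# persistent settings (the file may live at /etc/default/o2cb on Debian-style systems)
grep ^O2CB_ /etc/sysconfig/o2cb

The values should match on both nodes; a mismatch between TEST-MAIL1 and TEST-MAIL2 is worth ruling out before digging deeper.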
And I tried these commands:

debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow
debugfs.ocfs2: Unable to write log mask "ENTRY": No such file or directory

debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off
debugfs.ocfs2: Unable to write log mask "ENTRY": No such file or directory

But they are not working... (a possible reason and workaround is sketched after the quoted message below).

-----Original Message-----
From: Srinivas Eeda
Sent: Wednesday, December 21, 2011 8:43 PM
To: Marek Królikowski
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

Those numbers look good. Basically, with the fixes backed out and another fix I gave, you are not seeing that many orphans hanging around and hence not seeing the stuck-process kernel stacks. You can run the test longer or, if you are satisfied, please enable quotas and re-run the test with the modified kernel. You might see a deadlock which needs to be fixed (I was not able to reproduce this yet).

If the system hangs, please capture the following and provide me the output:

1. echo t > /proc/sysrq-trigger
2. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow
3. wait for 10 minutes
4. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off
5. echo t > /proc/sysrq-trigger

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
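A hedged guess about the 'Unable to write log mask "ENTRY"' error: if I remember correctly, the ENTRY and EXIT masklog bits were removed from mainline ocfs2 around the 2.6.39/3.0 kernels, so on a recent kernel those two writes fail with exactly this "No such file or directory" message while the other bits still exist. This would be a property of the running kernel rather than a debugfs.ocfs2 bug. A possible way to proceed is to ask debugfs.ocfs2 which trace bits the kernel actually exposes and enable only those (the exact set may differ on your kernel; drop any bit that errors out):

# list the trace bits known to this kernel and their current state
debugfs.ocfs2 -l

# enable only the bits that are actually listed (ENTRY/EXIT left out here)
debugfs.ocfs2 -l DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow

# ... wait the 10 minutes / let the hang develop ...

debugfs.ocfs2 -l DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off

# the task dumps from "echo t > /proc/sysrq-trigger" go to the kernel ring
# buffer, so saving dmesg right after each trigger preserves them for the report
echo t > /proc/sysrq-trigger
dmesg > /tmp/sysrq-t-$(date +%s).txt

The /tmp file name above is just an example; any location works, as long as the dmesg capture happens before the ring buffer wraps.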