Hi All,
OK, here is my testing with cman/clvmd enabled at system startup and clvmd outside of pacemaker control. I still seem to be getting the clvmd hang/fail situation even when clvmd runs outside of pacemaker control. I cannot see offhand where the issue is occurring, but maybe it is related to what Vladislav was saying, where clvmd hangs if it is not running on a cluster node that has cman running. However, I have both cman and clvmd enabled to start at boot.

Here is a little synopsis of what appears to be happening:

[1] Everything is fine here, both nodes up and running:

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    444   2014-02-07 10:25:00  test01
   2   M    440   2014-02-07 10:25:00  test02

# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 2 joined 1 remove 0 failed 0 seq 1,1
members       1 2

[2] Here I "echo c > /proc/sysrq-trigger" on node 2 (test02). I can see crm_mon reporting that node 2 is in an unclean state, and fencing kicks in (reboot of node 2):

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    440   2014-02-07 10:27:58  test01
   2   X    444                        test02

# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 2 joined 1 remove 0 failed 0 seq 2,2
members       1 2
new change    member 1 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   1

[3] So the above looks fine so far, to my untrained eye: dlm is in the kern_stop state while waiting on a successful fence. The node then reboots and we have the following state:

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    440   2014-02-07 10:27:58  test01
   2   M    456   2014-02-07 10:35:42  test02

# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 2 joined 1 remove 0 failed 0 seq 4,4
members       1 2

So it looks like dlm and cman are working properly (again, I could be wrong, my untrained eye and all :) ). However, if I try to run any lvm status/clvm status commands, they still just hang.
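In case it helps anyone compare the three states above without eyeballing them, here is a rough Python sketch that parses "dlm_tool ls" output and lists the lockspaces still flagged kern_stop. It is only an illustration: the function names are made up, and the field names (name, flags, new status, and so on) are assumed from the output pasted in this mail, so other dlm versions may print something different.

```python
# Rough sketch: parse `dlm_tool ls` text and report lockspaces in kern_stop.
# Field names are assumed from the output pasted in this mail; the helper
# names (parse_dlm_ls, blocked_lockspaces) are hypothetical.

# Keys made of two words must be matched before the one-word split.
TWO_WORD_KEYS = ("new change", "new status", "new members")

SAMPLE = """\
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 2 joined 1 remove 0 failed 0 seq 2,2
members       1 2
new change    member 1 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   1
"""


def parse_dlm_ls(text):
    """Return one dict per lockspace, mapping field name to its raw value."""
    lockspaces = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "dlm lockspaces":
            continue
        key = next((k for k in TWO_WORD_KEYS if line.startswith(k + " ")), None)
        if key is None:
            key, _, value = line.partition(" ")
        else:
            value = line[len(key):]
        if key == "name":
            # A "name" line starts a new lockspace record.
            current = {}
            lockspaces.append(current)
        if current is not None:
            current[key] = value.strip()
    return lockspaces


def blocked_lockspaces(text):
    """Return the names of lockspaces whose flags include kern_stop."""
    return [
        ls["name"]
        for ls in parse_dlm_ls(text)
        if "kern_stop" in ls.get("flags", "").split()
    ]


if __name__ == "__main__":
    print(blocked_lockspaces(SAMPLE))
```

Fed the output from step [2] it reports the clvmd lockspace; fed the output from [1] or [3] it returns an empty list, which matches what I see by hand. On a live node the text could be captured with subprocess and passed to the same function.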
Could this be related to clvmd doing a check when cman is up and running but clvmd has not started yet (as I understand from Vladislav's previous email)? Or do I have something fundamentally wrong with my fencing configuration?

Here is a link to the "dlm_tool dump" taken at the time of the above "dlm_tool ls" (if it helps):
http://pastebin.com/KV6YZWrN

Again, thanks for all the info thus far.

Thanks
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org