Hi All,

We built a simple two-node cluster in a vSphere 5.1 environment.
The cluster consists of two ESXi servers and a shared disk; the guest VMs are placed on the shared disk.

Step 1) Build the cluster. (pgsr01 is the DC and the active node.)

============
Last updated: Mon May 13 14:16:09 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ pgsr01 pgsr02 ]

 Resource Group: test-group
     Dummy1    (ocf::pacemaker:Dummy): Started pgsr01
     Dummy2    (ocf::pacemaker:Dummy): Started pgsr01
 Clone Set: clnPingd
     Started: [ pgsr01 pgsr02 ]

Node Attributes:
* Node pgsr01:
    + default_ping_set                : 100
* Node pgsr02:
    + default_ping_set                : 100

Migration summary:
* Node pgsr01:
* Node pgsr02:

Step 2) Attach strace to the pengine process on the DC node.

[root@pgsr01 ~]# ps -ef | grep heartbeat
root      2072     1  0 13:56 ?        00:00:00 heartbeat: master control process
root      2075  2072  0 13:56 ?        00:00:00 heartbeat: FIFO reader
root      2076  2072  0 13:56 ?        00:00:00 heartbeat: write: bcast eth1
root      2077  2072  0 13:56 ?        00:00:00 heartbeat: read: bcast eth1
root      2078  2072  0 13:56 ?        00:00:00 heartbeat: write: bcast eth2
root      2079  2072  0 13:56 ?        00:00:00 heartbeat: read: bcast eth2
496       2082  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/ccm
496       2083  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/cib
root      2084  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/lrmd -r
root      2085  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/stonithd
496       2086  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/attrd
496       2087  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/crmd
496       2089  2087  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/pengine
root      2182     1  0 14:15 ?        00:00:00 /usr/lib64/heartbeat/pingd -D -p /var/run//pingd-default_ping_set -a default_ping_set -d 5s -m 100 -i 1 -h 192.168.101.254
root      2287  1973  0 14:16 pts/0    00:00:00 grep heartbea

[root@pgsr01 ~]# strace -p 2089
Process 2089 attached - interrupt to quit
restart_syscall(<...
resuming interrupted call ...>) = 0
times({tms_utime=5, tms_stime=6, tms_cutime=0, tms_cstime=0}) = 429527557
recvfrom(5, 0xa93ff7, 953, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
recvfrom(5, 0xa93ff7, 953, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
(snip)

Step 3) Disconnect the shared disk on which the active node is placed.

Step 4) Cut off pingd connectivity on the standby node. The pingd score is updated correctly, but pengine blocks in I/O.

~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1

(snip)
brk(0xd05000) = 0xd05000
brk(0xeed000) = 0xeed000
brk(0xf2d000) = 0xf2d000
fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f86a255a000
write(6, "BZh51AY&SY\327\373\370\203\0\t(_\200UPX\3\377\377%cT \277\377\377"..., 2243) = 2243
brk(0xb1d000) = 0xb1d000
fsync(6  ------------------------------> BLOCKED
(snip)

============
Last updated: Mon May 13 14:19:15 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ pgsr01 pgsr02 ]

 Resource Group: test-group
     Dummy1    (ocf::pacemaker:Dummy): Started pgsr01
     Dummy2    (ocf::pacemaker:Dummy): Started pgsr01
 Clone Set: clnPingd
     Started: [ pgsr01 pgsr02 ]

Node Attributes:
* Node pgsr01:
    + default_ping_set                : 100
* Node pgsr02:
    + default_ping_set                : 0     : Connectivity is lost

Migration summary:
* Node pgsr01:
* Node pgsr02:

Step 5) Restore pingd connectivity on the standby node. The pingd score is updated correctly, but pengine remains blocked.
~ # esxcfg-vswitch -M vmnic1 -p "ap-db" vSwitch1
~ # esxcfg-vswitch -M vmnic2 -p "ap-db" vSwitch1

============
Last updated: Mon May 13 14:19:40 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ pgsr01 pgsr02 ]

 Resource Group: test-group
     Dummy1    (ocf::pacemaker:Dummy): Started pgsr01
     Dummy2    (ocf::pacemaker:Dummy): Started pgsr01
 Clone Set: clnPingd
     Started: [ pgsr01 pgsr02 ]

Node Attributes:
* Node pgsr01:
    + default_ping_set                : 100
* Node pgsr02:
    + default_ping_set                : 100

Migration summary:
* Node pgsr01:
* Node pgsr02:

--------- pengine remains blocked ---------

Step 6) Cut off pingd connectivity on the active node. The pingd score is updated correctly, but pengine remains blocked.

~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1

============
Last updated: Mon May 13 14:20:32 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ pgsr01 pgsr02 ]

 Resource Group: test-group
     Dummy1    (ocf::pacemaker:Dummy): Started pgsr01
     Dummy2    (ocf::pacemaker:Dummy): Started pgsr01
 Clone Set: clnPingd
     Started: [ pgsr01 pgsr02 ]

Node Attributes:
* Node pgsr01:
    + default_ping_set                : 0     : Connectivity is lost
* Node pgsr02:
    + default_ping_set                : 100

Migration summary:
* Node pgsr01:
* Node pgsr02:

--------- pengine remains blocked ---------

After this, the resources are not moved to the standby node: because pengine remains blocked, no transition can be computed. In the vSphere environment the block is eventually released after a considerable time, and only then is a transition generated.

* The I/O blocking in pengine seems to occur repeatedly.
* Other processes may be blocked as well.
* It took more than one hour from the failure to failover completion.

This problem means that resource failover may not occur after a disk failure in a vSphere environment. Because our users want to run Pacemaker on vSphere, we need a solution to this problem.

Does anyone know of a case where a similar problem was solved on vSphere? If there is no known solution, we think the pengine block needs to be avoided, for example:

1. crmd supervises its requests to pengine with a timer.
2. pengine performs its writes under a timer and monitors its own processing.
...etc...

* This problem does not seem to occur on KVM.
* The difference may lie in the hypervisor.
* The problem also did not occur on a physical Linux machine.

Best Regards,
Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org