Hi,

On 23-10-12 13:13, Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Oct 22, 2012 at 11:06:07AM +0200, Robbert Muller wrote:
>> Hello,
>>
>> While testing a new cluster we found the following behavior which i
>> discussed on #linux-ha with "andreask" afterwards and we both agree the
>> behavior was wrong.
>>
>> bug scenario:
>> 3 node cluster, 1 standby just for having 3 nodes, 2 active nodes
>> when we did a power off of the machine ( similar to pulling the power
>> cable from a machine ) the cluster failed to failover to the next node.
>>
>> This is because the following setting:
>> RESETPOWERON was set to 0, so a machine powered off stays powered off
>
> Just to make sure: RESETPOWERON was set to 0 in the configuration?

Yes it is.
>
>> with the current code path, a machine in the state poweroff is
>> considered a failure for the stonith reset operation. which results in
>> no resources are started on the second node, and the machine stays in a
>> unclean state.
>>
>> The analogy with real hardware and a powerbar and imho correct behavior:
>> ---
>> If i pull the plug of node1, node 2 will fence it with the powerbar. The
>> power will powercycle the socket without any result, because i pulled
>> the plug. But the fencing operation is a success and all resources are
>> started on the second node
>> ---
>>
>> Patch to fix this with i hope a minimal change is attached.
>
> Thanks for the patch. But we'll need to rework it a bit.

Could you tell me what is wrong with it? I am currently testing it in
our customer's environment, and it seems to work as expected.

>
>> After finding this bug i got ill and have to stay at home for a few
>> days, so i don't have access to an environment to test this patch atm.
>
> Get better soon!

Thx, the antibiotics seem to have killed the infection, so I'm back at
work.

Regards,

Robbert