On Mon, 2014-10-13 at 12:51 +1100, Andrew Beekhof wrote:
>
> Even the same address can be a problem. That brief window where things were
> getting renewed can screw up corosync.

But as I proved, there was no renewal at all during the period of this entire pacemaker run, so the use of DHCP here is a red herring and does not explain the observed behaviour.

> Never ever use dhcp for a cluster node. Ever. Really, never.

Fair enough. But since this was not the cause of this problem, it's still unexplained. Is it a bug in pacemaker that it doesn't handle this mysterious third node's appearance/disappearance and fouls up the cluster?

> Yes. That is what nodeid's are calculated from.
> Different nodeid == different address

So your theory is that corosync on one of the nodes momentarily decided to change which interface it was binding to and ...

> localhost is the most common one

... bound to localhost? If so, I guess I should take this to the corosync list.

b.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org