On Mon, 2014-10-13 at 12:51 +1100, Andrew Beekhof wrote:
> 
> Even the same address can be a problem. That brief window where things were 
> getting renewed can screw up corosync.

But as I proved, there was no renewal at any point during this entire
pacemaker run, so the use of DHCP here is a red herring and does not
explain the observed behaviour.

> Never ever use dhcp for a cluster node. Ever. Really, never.

Fair enough.  But since this was not the cause of this problem, it's
still unexplained.  Is it a bug in pacemaker that it doesn't handle this
mysterious third node appearing and disappearing, and that this fouls up
the cluster?

> Yes. That is what nodeid's are calculated from.
> Different nodeid == different address

So your theory is that corosync on one of the nodes momentarily decided
to change which interface it was binding to and ...

> localhost is the most common one

... bound to localhost?  If so, I guess I should take this to the
corosync list.

b.




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
