On Tuesday, 17 December 2013, 09:17:31, ma...@nucleus.it wrote:
> Hi to all,
> I set up a 2-node cluster with a cross cable between the two nodes,
> without stonith; I know this is not the best way, but this is the
> scenario I needed at that time.
>
> I know the releases are old:
> corosync-1.2.7-1.2
> libcorosync-1.2.7-1.2
> pacemaker-1.0.10-1.4
> libpacemaker3-1.0.10-1.4
>
> Everything was OK for some days/months, but a few days ago, without
> any network interruption (no messages related to the ethernet modules,
> no errors in the network statistics, no notifications from the nagios
> ping checks), something went wrong between the two nodes.
>
> From what I can understand from the attached logs:
> Token Timeout (10000 ms) retransmit timeout (980 ms)
> token hold (774 ms) retransmits before loss (10 retrans)
>
> the 2 nodes lost a token and tried to resolve the situation, but
> node1 thinks node2 is up:
>
> Dec 7 05:01:41 node1 pengine: [1138]: info: determine_online_status: Node node2 is online
> Dec 7 05:01:41 node1 pengine: [1138]: info: determine_online_status: Node node1 is online
>
> and then lost:
>
> Dec 7 05:01:54 node1 corosync[1128]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node2 was not seen in the previous transition
> Dec 7 05:01:54 node1 corosync[1128]: [pcmk ] info: update_member: Node 33559980/node2 is now: lost
>
> while node2 thinks node1 was gone:
>
> Dec 7 05:01:34 node2 corosync[6356]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
> Dec 7 05:01:34 node2 corosync[6356]: [pcmk ] info: update_member: Node 16782764/node1 is now: lost
>
> Then they go into split brain.
> Any suggestion about why node1 saw node2 at first, while node2
> immediately declared node1 lost?
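As a quick check of the timing in the quoted logs, here is a minimal sketch that computes the gap between the two "lost" events and compares it with the token timeout. The timestamps and the 10000 ms value are copied from the quote; the year is an assumption (syslog lines omit it), and the variable names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the quoted log lines. Syslog omits the year,
# so 2013 is assumed from the date of this thread.
FMT = "%Y %b %d %H:%M:%S"
node2_marks_node1_lost = datetime.strptime("2013 Dec 7 05:01:34", FMT)
node1_marks_node2_lost = datetime.strptime("2013 Dec 7 05:01:54", FMT)

# Token timeout from the quoted totem settings (10000 ms).
TOKEN_TIMEOUT_S = 10

# Seconds between node2 declaring node1 lost and node1 declaring node2 lost.
gap_s = (node1_marks_node2_lost - node2_marks_node1_lost).total_seconds()

print(f"gap between the two 'lost' events: {gap_s:.0f} s")
print(f"expressed in token timeouts: {gap_s / TOKEN_TIMEOUT_S:.0f}")
```

With these values the gap is 20 seconds, i.e. two token timeouts, which is the figure the reply refers to.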
This depends on who initiates the round. Both nodes recognized the failure within 20 seconds. That is OK, especially if you allow 10 seconds for the token timeout.

Kind regards,

Michael Schwartzkopff

--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, Amtsgericht München: HRB 199263
Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org