Hi,

As I wrote in another post[1], I failed to upgrade a 2-node cluster to
pacemaker 1.1.8.

Before the upgrade, both nodes were running CentOS 6.3, corosync
1.4.1-7 and pacemaker 1.1.7.

I followed the rolling upgrade process: I stopped pacemaker and then
corosync on node1 and upgraded it to CentOS 6.4. The OS upgrade also
upgrades pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
The rpm upgrade went smoothly. I knew about the crmsh issue, so I had
made sure the crmsh rpm was available in my repos.
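
The per-node sequence described above can be sketched roughly as
follows. This is an illustration only: the helper name
rolling_upgrade_steps is hypothetical, and the exact commands (init
scripts via "service", a full "yum upgrade", and restarting corosync
before pacemaker) are assumptions about a typical CentOS 6 setup, not a
transcript of what was run.

```python
def rolling_upgrade_steps(node):
    """Return (node, command) pairs in the order they would be run.

    Hypothetical helper: the commands are assumed, not taken from the
    original report.
    """
    return [
        (node, "service pacemaker stop"),   # stop pacemaker first
        (node, "service corosync stop"),    # then stop corosync
        (node, "yum upgrade"),              # assumed way of moving 6.3 -> 6.4
        (node, "service corosync start"),   # assumed restart order
        (node, "service pacemaker start"),
    ]


if __name__ == "__main__":
    # Print the plan for node1 only; node2 stays untouched until
    # node1 has rejoined the cluster.
    for host, cmd in rolling_upgrade_steps("node1"):
        print(f"{host}: {cmd}")
```

The sketch only prints the plan; it does not execute anything on the
nodes.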

Corosync started without any problems and both nodes could see each
other[2]. But for some reason node2 never received a reply to its join
offer from node1, so node1 never joined the cluster. Node1 in turn
formed a new cluster of its own because it never got a reply from
node2, and I ended up with a split-brain situation.

Logs of node1 can be found here
https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
and of node2 here
https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log

I have found this thread[3], which could be related to my problem, but
the bug that caused the join failure in that case was fixed in 1.1.8.

Any ideas?

Cheers,
Pavlos





[1] Subject: "Different value on cluster-infrastructure between 2 nodes"
[2] https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/corosync.status
[3] http://comments.gmane.org/gmane.linux.highavailability.pacemaker/13185

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
