On 07/03/2017 02:34 AM, Cesar Hernandez wrote:
> Hi
>
> I have installed a pacemaker cluster with two nodes. The same type of
> installation has been done many times before, and the following error
> never appeared. The situation is the following:
>
> both nodes running cluster services
> stop pacemaker&corosync on node 1
> stop pacemaker&corosync on node 2
> start corosync&pacemaker on node 1
>
> Then node 1 starts, it sees node2 down, and it fences it, as expected.
> But the problem comes when node 2 is rebooted and starts cluster services:
> sometimes, it starts the corosync service but the pacemaker service starts
> and then stops. The syslog shows the following error in these cases:
>
> Jul 3 09:07:04 node2 pacemakerd[597]: warning: The crmd process (608) can no longer be respawned, shutting the cluster down.
> Jul 3 09:07:04 node2 pacemakerd[597]: notice: Shutting down Pacemaker
>
> Previous messages show some warnings that I'm not sure are related to
> the shutdown:
>
> Jul 3 09:07:04 node2 stonith-ng[604]: notice: Operation reboot of node2 by node1 for crmd.2413@node1.608d8118: OK
> Jul 3 09:07:04 node2 crmd[608]: crit: We were allegedly just fenced by node1 for node1!
> Jul 3 09:07:04 node2 corosync[585]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1471800, async-conn=0x1471800) left
>
> On node1, all resources become unrunnable and it stays that way until I
> manually start the pacemaker service on node2.
> As I said, the same type of installation has been done before on other
> servers and this never happened. The only difference is that in previous
> installations I configured corosync with multicast and now I have
> configured it with unicast (my current network environment doesn't allow
> multicast), but I think it's not related to that behaviour

Agreed, I don't think it's multicast vs unicast.

I can't see from this what's going wrong. Possibly node1 is trying to re-fence node2 when it comes back. Check that the fencing resources are configured correctly, and check whether node1 sees the first fencing succeed.

> Cluster software versions:
> corosync-1.4.8
> crmsh-2.1.5
> libqb-0.17.2
> Pacemaker-1.1.14
> resource-agents-3.9.6
>
> Can you help me?
>
> Thanks
>
> Cesar

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
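
P.S. One way to check whether node1 sees the first fencing succeed is to pull the stonith-ng result lines out of syslog on each node and compare them. A minimal sketch in Python, assuming the message format matches the stonith-ng line quoted above (it may differ in other Pacemaker versions). The names fencing_results and SAMPLE are hypothetical, for illustration only, and are not part of Pacemaker:

```python
import re

# Assumed format, taken from the stonith-ng line quoted in this thread:
# "... stonith-ng[604]: notice: Operation reboot of node2 by node1 for <origin>: OK"
FENCE_RE = re.compile(
    r"stonith-ng\[\d+\]:\s+\w+:\s+Operation (?P<action>\S+) of (?P<target>\S+) "
    r"by (?P<executor>\S+) for (?P<origin>\S+): (?P<result>.+)$"
)


def fencing_results(lines):
    """Return one dict per stonith-ng operation result found in the log lines."""
    results = []
    for line in lines:
        match = FENCE_RE.search(line.strip())
        if match:
            results.append(match.groupdict())
    return results


# Sample input: two of the log lines quoted in this thread, unwrapped.
SAMPLE = [
    "Jul 3 09:07:04 node2 stonith-ng[604]: notice: Operation reboot of node2 "
    "by node1 for crmd.2413@node1.608d8118: OK",
    "Jul 3 09:07:04 node2 crmd[608]: crit: We were allegedly just fenced by "
    "node1 for node1!",
]

for r in fencing_results(SAMPLE):
    print(f"{r['action']} of {r['target']} by {r['executor']}: {r['result']}")
```

Feeding it node1's syslog for the same time window should show whether node1 logged the same reboot operation with an OK result. A second reboot of node2 logged after node2 rejoined would point towards re-fencing.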