Re: [ClusterLabs] Problem with stonith and starting services

Ken Gaillot Mon, 03 Jul 2017 08:41:05 -0700

On 07/03/2017 02:34 AM, Cesar Hernandez wrote:
> Hi
> 
> I have installed a pacemaker cluster with two nodes. The same type of 
> installation has been done many times before, and the following error never 
> appeared. The situation is the following:
> 
> both nodes running cluster services
> stop pacemaker&corosync on node 1
> stop pacemaker&corosync on node 2
> start corosync&pacemaker on node 1
> 
> Node 1 then starts, sees node2 down, and fences it, as expected. 
> But the problem comes when node 2 is rebooted and starts cluster services: 
> sometimes it starts the corosync service, but the pacemaker service starts 
> and then stops. The syslog shows the following error in these cases:
> 
> Jul  3 09:07:04 node2 pacemakerd[597]:  warning: The crmd process (608) can no longer be respawned, shutting the cluster down.
> Jul  3 09:07:04 node2 pacemakerd[597]:   notice: Shutting down Pacemaker
> 
> Earlier messages show some warnings that I'm not sure are related to the 
> shutdown:
> 
> 
> Jul  3 09:07:04 node2 stonith-ng[604]:   notice: Operation reboot of node2 by node1 for crmd.2413@node1.608d8118: OK
> Jul  3 09:07:04 node2 crmd[608]:     crit: We were allegedly just fenced by node1 for node1!
> Jul  3 09:07:04 node2 corosync[585]:   [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x1471800, async-conn=0x1471800) left
> 
> 
> On node1, all resources become unrunnable, and it stays that way until I 
> manually start the pacemaker service on node2. 
> As I said, the same type of installation has been done before on other 
> servers and this never happened. The only difference is that in previous 
> installations I configured corosync with multicast, and now I have configured 
> it with unicast (my current network environment doesn't allow multicast), but 
> I don't think it's related to this behaviour.


Agreed, I don't think it's multicast vs unicast.

I can't see from this what's going wrong. Possibly node1 is trying to
re-fence node2 when it comes back. Check that the fencing resources are
configured correctly, and check whether node1 sees the first fencing
succeed.
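
One way to check what each node recorded about the fencing is to pull the
stonith-ng result lines out of its syslog. The following is only a rough
sketch, not a Pacemaker tool: the function name fencing_results is made up
for illustration, and the pattern is based solely on the message format
quoted above, assuming each message sits on a single line in the real log
file.

```python
import re

# Pattern modelled on the stonith-ng "Operation ... of ... by ... for ...: RESULT"
# line quoted in this thread. Assumption: other Pacemaker versions may word
# this message differently.
FENCE_RESULT = re.compile(
    r"stonith-ng\[\d+\]:\s+notice: Operation (?P<action>\S+) of (?P<target>\S+) "
    r"by (?P<executor>\S+) for (?P<origin>\S+): (?P<result>.+)$"
)


def fencing_results(lines, target):
    """Return (timestamp, action, executor, result) for each fencing result of target."""
    found = []
    for line in lines:
        match = FENCE_RESULT.search(line)
        if match and match.group("target") == target:
            # The first three whitespace-separated fields are the syslog timestamp.
            timestamp = " ".join(line.split()[:3])
            found.append(
                (
                    timestamp,
                    match.group("action"),
                    match.group("executor"),
                    match.group("result").strip(),
                )
            )
    return found


if __name__ == "__main__":
    # Sample input taken from the log excerpt in the original message.
    sample = [
        "Jul  3 09:07:04 node2 stonith-ng[604]:   notice: Operation reboot of "
        "node2 by node1 for crmd.2413@node1.608d8118: OK",
        "Jul  3 09:07:04 node2 pacemakerd[597]:   notice: Shutting down Pacemaker",
    ]
    for entry in fencing_results(sample, "node2"):
        print(entry)
```

Running this over the syslog from node1 and from node2 and comparing the
timestamps should show whether node1 logged the first fencing as OK, and
whether a second result for node2 turns up around the time node2 rejoins.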

> Cluster software versions:
> corosync-1.4.8
> crmsh-2.1.5
> libqb-0.17.2
> Pacemaker-1.1.14
> resource-agents-3.9.6
> 
> 
> 
> Can you help me?
> 
> Thanks
> 
> Cesar

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
