On Wed, May 19, 2010 at 6:42 PM, James Mackie <jmac...@ezp.net> wrote:
> I have had this small 2 node cluster running since February. This morning
> one of the servers (Node2) stopped responding on the external network
> interface. To remedy this the server was rebooted at the console. (Not by
> me). When the node came back up it was showing the other node offline, and
> tried to take over all the services. The Node that was online the whole time
> (Node1) had taken over the services of Node2 when it was rebooted, (the
> internal network on Node2 was still active and responding), Node1 shows
> Node2 offline, Node2 shows Node1 offline. I’ve put Node2 in standby using
> crm so it stopped trying to take back the services, since it was not
> co-ordinating with the other node.

Versions of corosync/pacemaker?

> How do I get the node back re-joined to the cluster properly? All my
> previous experience was that it just rejoined, and the services failed back
> over as expected. This is the first time that the expected behavior has not
> occurred.
> I read another mailing list post regarding something similar, having to do
> with nodeid changes. This is not the case here, I verified that the nodeid
> in the previous logs matches what the node currently has registered as its
> nodeid.
> That same post recommended deleting Node2 with crm on Node1 and restarting
> Node2, along with deleting all of /var/lib/heartbeat/* on Node2 to flush the
> CIB. My assumption is that this will sync to the cluster and update
> automatically.  Doesn’t sound like advice I’d prefer to take blindly, I hate
> assuming.

I'd not do that.

> Does anyone have any input that will point me in the right direction? Any
> input would be helpful. Thank you.

Easiest method is probably to set is-managed-default=false and restart
corosync on both hosts.
Then once they see each other, set it back to true.
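
A rough sketch of that sequence, in case it helps. The function names are made up for illustration, and the init script path is an assumption (use whatever your distribution ships). Since the two nodes currently do not see each other, I am also assuming the property has to be set on each node separately, because a change made on one side will not reach the other until they rejoin.

```shell
# Hypothetical helper names; nothing runs until you call a function by hand.

# Stop Pacemaker from starting or stopping resources while membership is fixed.
freeze_resources() {
    crm configure property is-managed-default=false
}

# Restart the cluster stack. Init script path is an assumption.
restart_stack() {
    /etc/init.d/corosync restart
}

# One-shot cluster status; both nodes should be listed as online.
show_status() {
    crm_mon -1
}

# Hand resource control back to the cluster.
unfreeze_resources() {
    crm configure property is-managed-default=true
}
```

Run freeze_resources and then restart_stack on each node, check show_status until both nodes report each other online, and only then run unfreeze_resources once.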