On 8 Nov 2013, at 4:45 am, Sean Lutner <s...@rentul.net> wrote:

> I have a confusing situation that I'm hoping to get help with. Last night 
> after configuring STONITH on my two node cluster, I suddenly have a "ghost" 
> node in my cluster. I'm looking to understand the best way to remove this 
> node from the config.
> 
> I'm using the fence_ec2 device for STONITH. I dropped the script on each 
> node, registered the device with stonith_admin -R -a fence_ec2 and confirmed 
> the registration with both
> 
> # stonith_admin -I
> # pcs stonith list
> 
> I then configured STONITH per the Clusters from Scratch doc
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_example.html
> 
> Here are my commands:
> # pcs cluster cib stonith_cfg
> # pcs -f stonith_cfg stonith create ec2-fencing fence_ec2 
> ec2-home="/opt/ec2-api-tools" pcmk_host_check="static-list" 
> pcmk_host_list="ip-10-50-3-122 ip-10-50-3-251" op monitor interval="300s" 
> timeout="150s" op start start-delay="30s" interval="0"
> # pcs -f stonith_cfg stonith
> # pcs -f stonith_cfg property set stonith-enabled=true
> # pcs -f stonith_cfg property
> # pcs cluster push cib stonith_cfg
> 
> After that I saw that STONITH appears to be functioning, but a new node is 
> listed in the pcs status output:

Do the EC2 instances have fixed IPs?
I didn't have much luck with EC2 because every time they came back up it was 
with a new name/address which confused corosync and created situations like 
this.

> 
> # pcs status
> Last updated: Thu Nov  7 17:41:21 2013
> Last change: Thu Nov  7 04:29:06 2013 via cibadmin on ip-10-50-3-122
> Stack: cman
> Current DC: ip-10-50-3-122 - partition with quorum
> Version: 1.1.8-7.el6-394e906
> 3 Nodes configured, unknown expected votes
> 11 Resources configured.
> 
> 
> Node ip-10-50-3-1251: UNCLEAN (offline)
> Online: [ ip-10-50-3-122 ip-10-50-3-251 ]
> 
> Full list of resources:
> 
> ClusterEIP_54.215.143.166      (ocf::pacemaker:EIP):   Started ip-10-50-3-122
> Clone Set: EIP-AND-VARNISH-clone [EIP-AND-VARNISH]
>     Started: [ ip-10-50-3-122 ip-10-50-3-251 ]
>     Stopped: [ EIP-AND-VARNISH:2 ]
> ec2-fencing    (stonith:fence_ec2):    Stopped 
> 
> I have no idea where the node that is marked UNCLEAN came from, though it's 
> clearly a typo of a proper cluster node name.
> 
> The only command I ran with the bad node ID was:
> 
> # crm_resource --resource ClusterEIP_54.215.143.166 --cleanup --node 
> ip-10-50-3-1251
> 
> Is there any possible way that could have caused the node to be added?
> 
> I tried running pcs cluster node remove ip-10-50-3-1251, but since there is 
> no such node and thus no pcsd to contact, that failed. Is there a way I can 
> safely remove this ghost node from the cluster? I can provide logs from 
> pacemaker or corosync as needed.
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
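
As for clearing the stale entry itself, here is a minimal sketch of one 
possible approach, assuming the ghost name exists only in Pacemaker's caches 
and CIB and not in cluster.conf. It relies on crm_node -R and cibadmin 
--delete, run from one of the live nodes. The DRY_RUN variable and the run 
and remove_ghost_node helpers are hypothetical names added for illustration; 
by default the script only prints what it would do. Whether this is enough 
on 1.1.8 with the cman stack is not confirmed here, so treat it as a 
starting point and check pcs status afterwards.

```shell
#!/bin/sh
# Sketch: clear a stale ("ghost") node entry from a Pacemaker cluster.
# GHOST is the mistyped node name from the pcs status output above.
# DRY_RUN is a hypothetical helper variable: it defaults to 1, so the
# commands are only printed. Set DRY_RUN=0 on a cluster node to run them.
GHOST="ip-10-50-3-1251"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

remove_ghost_node() {
    # Ask Pacemaker to drop the (offline) node from its caches.
    run crm_node -R "$1" --force
    # Delete the node's configuration entry and status entry from the CIB.
    run cibadmin --delete --xml-text "<node uname=\"$1\"/>"
    run cibadmin --delete --xml-text "<node_state uname=\"$1\"/>"
}

remove_ghost_node "$GHOST"
```

Review the dry-run output first, then repeat with DRY_RUN=0 as root on one 
node once the commands look right for your cluster.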

