On 8 Nov 2013, at 4:45 am, Sean Lutner <s...@rentul.net> wrote:

> I have a confusing situation that I'm hoping to get help with. Last night
> after configuring STONITH on my two node cluster, I suddenly have a "ghost"
> node in my cluster. I'm looking to understand the best way to remove this
> node from the config.
>
> I'm using the fence_ec2 device for STONITH. I dropped the script on each
> node, registered the device with stonith_admin -R -a fence_ec2 and confirmed
> the registration with both
>
> # stonith_admin -I
> # pcs stonith list
>
> I then configured STONITH per the Clusters from Scratch doc
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_example.html
>
> Here are my commands:
> # pcs cluster cib stonith_cfg
> # pcs -f stonith_cfg stonith create ec2-fencing fence_ec2
> ec2-home="/opt/ec2-api-tools" pcmk_host_check="static-list"
> pcmk_host_list="ip-10-50-3-122 ip-10-50-3-251" op monitor interval="300s"
> timeout="150s" op start start-delay="30s" interval="0"
> # pcs -f stonith_cfg stonith
> # pcs -f stonith_cfg property set stonith-enabled=true
> # pcs -f stonith_cfg property
> # pcs cluster push cib stonith_cfg
>
> After that I saw that STONITH appears to be functioning but a new node is
> listed in pcs status output:

Do the EC2 instances have fixed IPs?
I didn't have much luck with EC2 because every time they came back up it was with a new name/address, which confused corosync and created situations like this.

>
> # pcs status
> Last updated: Thu Nov 7 17:41:21 2013
> Last change: Thu Nov 7 04:29:06 2013 via cibadmin on ip-10-50-3-122
> Stack: cman
> Current DC: ip-10-50-3-122 - partition with quorum
> Version: 1.1.8-7.el6-394e906
> 3 Nodes configured, unknown expected votes
> 11 Resources configured.
>
>
> Node ip-10-50-3-1251: UNCLEAN (offline)
> Online: [ ip-10-50-3-122 ip-10-50-3-251 ]
>
> Full list of resources:
>
> ClusterEIP_54.215.143.166 (ocf::pacemaker:EIP): Started ip-10-50-3-122
> Clone Set: EIP-AND-VARNISH-clone [EIP-AND-VARNISH]
>     Started: [ ip-10-50-3-122 ip-10-50-3-251 ]
>     Stopped: [ EIP-AND-VARNISH:2 ]
> ec2-fencing (stonith:fence_ec2): Stopped
>
> I have no idea where the node that is marked UNCLEAN came from, though it's
> clearly a typo of a proper cluster node name.
>
> The only command I ran with the bad node ID was:
>
> # crm_resource --resource ClusterEIP_54.215.143.166 --cleanup --node
> ip-10-50-3-1251
>
> Is there any possible way that could have caused the node to be added?
>
> I tried running pcs cluster node remove ip-10-50-3-1251 but since there is no
> such node, and thus no pcsd, that failed. Is there a way I can safely remove
> this ghost node from the cluster? I can provide logs from pacemaker or
> corosync as needed.
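
Whether or not the mistyped --node argument is what introduced the entry (that is not confirmed here), a small guard can at least stop a misspelled name from reaching crm_resource. The following is only a minimal sketch: KNOWN_NODES, is_known_node and safe_cleanup are hypothetical names made up for illustration, the node list is copied by hand from the pcmk_host_list quoted above, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of pcs or pacemaker.
# Assumption: valid node names are maintained by hand in this list
# (values copied from the pcmk_host_list in the quoted message).
KNOWN_NODES="ip-10-50-3-122 ip-10-50-3-251"

# is_known_node NAME
# Succeeds only if NAME exactly matches one entry in KNOWN_NODES.
is_known_node() {
    for n in $KNOWN_NODES; do
        [ "$n" = "$1" ] && return 0
    done
    return 1
}

# safe_cleanup RESOURCE NODE
# Dry run: prints the cleanup command for a known node, refuses otherwise.
safe_cleanup() {
    if is_known_node "$2"; then
        echo crm_resource --resource "$1" --cleanup --node "$2"
    else
        echo "refusing: '$2' is not a known cluster node" >&2
        return 1
    fi
}
```

Called as safe_cleanup ClusterEIP_54.215.143.166 ip-10-50-3-1251, this prints a refusal on stderr and returns non-zero, because the exact-match comparison does not treat ip-10-50-3-1251 as ip-10-50-3-251.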
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org