On Nov 7, 2013, at 8:34 PM, Andrew Beekhof <and...@beekhof.net> wrote:

> 
> On 8 Nov 2013, at 4:45 am, Sean Lutner <s...@rentul.net> wrote:
> 
>> I have a confusing situation that I'm hoping to get help with. Last night 
>> after configuring STONITH on my two node cluster, I suddenly have a "ghost" 
>> node in my cluster. I'm looking to understand the best way to remove this 
>> node from the config.
>> 
>> I'm using the fence_ec2 device for STONITH. I dropped the script on each 
>> node, registered the device with stonith_admin -R -a fence_ec2 and confirmed 
>> the registration with both
>> 
>> # stonith_admin -I
>> # pcs stonith list
>> 
>> I then configured STONITH per the Clusters from Scratch doc
>> 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_example.html
>> 
>> Here are my commands:
>> # pcs cluster cib stonith_cfg
>> # pcs -f stonith_cfg stonith create ec2-fencing fence_ec2 
>> ec2-home="/opt/ec2-api-tools" pcmk_host_check="static-list" 
>> pcmk_host_list="ip-10-50-3-122 ip-10-50-3-251" op monitor interval="300s" 
>> timeout="150s" op start start-delay="30s" interval="0"
>> # pcs -f stonith_cfg stonith
>> # pcs -f stonith_cfg property set stonith-enabled=true
>> # pcs -f stonith_cfg property
>> # pcs cluster push cib stonith_cfg
>> 
>> After that I saw that STONITH appeared to be functioning, but a new node 
>> was listed in the pcs status output:
> 
> Do the EC2 instances have fixed IPs?
> I didn't have much luck with EC2 because every time they came back up it was 
> with a new name/address which confused corosync and created situations like 
> this.

The IPs persist across reboots as far as I can tell. I thought the problem was 
due to stonith being enabled but not working so I removed the stonith_id and 
disabled stonith. After that I restarted pacemaker and cman on both nodes and 
things started as expected, but the ghost node is still there.

Someone else working on the cluster exported the CIB, removed the node and then 
imported the CIB. They used this process 
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-config-updates.html

Even after that, the ghost node is still there. Would it work to run pcs 
cluster cib > /tmp/cib-temp.xml, edit the node out of the config, and then run 
pcs cluster push cib /tmp/cib-temp.xml?
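
To make the edit step concrete, here is a minimal sketch of what I mean by 
"editing the node out". It only filters a dumped CIB file; it does not dump or 
push anything itself. The function name remove_ghost_node and the sample XML 
are made up for illustration, and the sketch assumes the ghost's <node> and 
<node_state> entries are self-closing elements on a single line each, which 
may not be true of a real CIB.

```shell
#!/bin/sh
# Sketch only. Assumption: the ghost node's <node .../> and <node_state .../>
# entries each sit on one self-closing line. Entries with child elements
# (as a real CIB may contain) are NOT handled by this line-based filter.

# remove_ghost_node FILE NODE: print FILE without the ghost node's entries.
# The uname is matched with its surrounding quotes, so ip-10-50-3-1251
# cannot accidentally match ip-10-50-3-251 or ip-10-50-3-122.
remove_ghost_node() {
    cib_file=$1
    ghost=$2
    grep -Ev "<node(_state)? [^>]*uname=\"${ghost}\"[^>]*/>" "$cib_file"
}

# Hypothetical, heavily trimmed stand-in for the output of
#   pcs cluster cib > /tmp/cib-temp.xml
sample=$(mktemp)
cat > "$sample" <<'EOF'
<cib>
  <configuration>
    <nodes>
      <node id="ip-10-50-3-122" uname="ip-10-50-3-122"/>
      <node id="ip-10-50-3-251" uname="ip-10-50-3-251"/>
      <node id="ip-10-50-3-1251" uname="ip-10-50-3-1251"/>
    </nodes>
  </configuration>
  <status>
    <node_state id="ip-10-50-3-1251" uname="ip-10-50-3-1251"/>
  </status>
</cib>
EOF

# Write the filtered copy next to the sample and show it for review.
remove_ghost_node "$sample" ip-10-50-3-1251 > "$sample.edited"
cat "$sample.edited"
```

The edited file would then be the one handed to pcs cluster push cib. I would 
diff it against the original dump first, since a stale entry left in the 
status section might be enough for the node to keep showing up.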

I may have to go back to the drawing board on a fencing device for the nodes. 
Are there any other recommendations for a cluster on EC2 nodes?

Thanks very much

> 
>> 
>> # pcs status
>> Last updated: Thu Nov  7 17:41:21 2013
>> Last change: Thu Nov  7 04:29:06 2013 via cibadmin on ip-10-50-3-122
>> Stack: cman
>> Current DC: ip-10-50-3-122 - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 3 Nodes configured, unknown expected votes
>> 11 Resources configured.
>> 
>> 
>> Node ip-10-50-3-1251: UNCLEAN (offline)
>> Online: [ ip-10-50-3-122 ip-10-50-3-251 ]
>> 
>> Full list of resources:
>> 
>> ClusterEIP_54.215.143.166      (ocf::pacemaker:EIP):   Started ip-10-50-3-122
>> Clone Set: EIP-AND-VARNISH-clone [EIP-AND-VARNISH]
>>    Started: [ ip-10-50-3-122 ip-10-50-3-251 ]
>>    Stopped: [ EIP-AND-VARNISH:2 ]
>> ec2-fencing    (stonith:fence_ec2):    Stopped 
>> 
>> I have no idea where the node that is marked UNCLEAN came from, though its 
>> name is clearly a typo of a proper cluster node's name.
>> 
>> The only command I ran with the bad node ID was:
>> 
>> # crm_resource --resource ClusterEIP_54.215.143.166 --cleanup --node 
>> ip-10-50-3-1251
>> 
>> Is there any possible way that could have caused the node to be added?
>> 
>> I tried running pcs cluster node remove ip-10-50-3-1251, but since there is 
>> no such node, and thus no pcsd, that failed. Is there a way I can safely 
>> remove this ghost node from the cluster? I can provide logs from pacemaker 
>> or corosync as needed.
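
Whether the cleanup command is what created the entry is not confirmed in this 
thread. Either way, a small guard can keep a mistyped node name from reaching 
any cluster command. This is a rough sketch: the names KNOWN_NODES, 
is_known_node and run_for_node are hypothetical, and the node list is simply 
copied from the pcmk_host_list above.

```shell
#!/bin/sh
# Known node names, copied from the pcmk_host_list quoted in this thread.
KNOWN_NODES="ip-10-50-3-122 ip-10-50-3-251"

# Return 0 if $1 exactly matches one of the known names, 1 otherwise.
is_known_node() {
    for n in $KNOWN_NODES; do
        if [ "$n" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical wrapper: run the remaining arguments as a command only if
# the first argument is a known node name; otherwise refuse and return 1.
run_for_node() {
    node=$1
    shift
    if is_known_node "$node"; then
        "$@"
    else
        echo "refusing: unknown node '$node'" >&2
        return 1
    fi
}

# Demo with echo standing in for the real command.
run_for_node ip-10-50-3-122 echo "would run cleanup on ip-10-50-3-122"
run_for_node ip-10-50-3-1251 echo "would run cleanup" || true
```

In real use the echo would be replaced by the actual command, for example the 
crm_resource cleanup quoted above, so the typo is rejected before it runs.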
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
