> On Nov 12, 2013, at 6:01 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> 
> 
>> On 13 Nov 2013, at 6:10 am, Sean Lutner <s...@rentul.net> wrote:
>> 
>> The folks testing the cluster I've been building have run a script which 
>> blocks all traffic except SSH on one node of the cluster for 15 seconds to 
>> mimic a network failure. While the network is "down", pacemaker behaves 
>> oddly and ends up dying.
>> 
>> The cluster is two nodes and running four custom resources on EC2 instances. 
>> The OS is CentOS 6.4 with the config below:
>> 
>> I've attached the /var/log/messages and /var/log/cluster/corosync.log from 
>> the time period during the test. I'm having some difficulty in piecing 
>> together what happened and am hoping someone can shed some light on the 
>> problem. Any indications why pacemaker is dying on that node?
> 
> Because corosync is dying underneath it:
> 
> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: send_ais_text:    Sending message 28 via cpg: FAILED (rc=2): Library error: Connection timed out (110)
> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: pcmk_cpg_dispatch:    Connection to the CPG API failed: 2
> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: cib_ais_destroy:    Corosync connection lost!  Exiting.
> Nov 09 14:51:49 [942] ip-10-50-3-251        cib:     info: terminate_cib:    cib_ais_destroy: Exiting fast...

Is that the expected behavior? Is it because the DC was the other node?

I did notice that there was an attempted fence operation, but it did not 
appear to succeed.
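
For anyone else trying to piece together a timeline from corosync.log, here is 
a minimal sketch of how the error-level entries could be pulled out per daemon. 
It assumes each entry sits on a single line in the layout quoted above 
(timestamp, [pid], host, daemon, level, function, message); the names LINE_RE, 
Entry, parse and errors are made up for this example and are not part of 
pacemaker or corosync. Lines that do not match the layout are skipped.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Layout assumed from the quoted excerpt:
#   "Nov 09 14:51:49 [942] host   daemon:   level: function:   message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] (?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+(?P<level>\w+):\s+"
    r"(?P<func>\w+):\s*(?P<msg>.*)$"
)


class Entry(NamedTuple):
    ts: str
    host: str
    daemon: str
    level: str
    func: str
    msg: str


def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line that matches the assumed layout."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            yield Entry(m["ts"], m["host"], m["daemon"],
                        m["level"], m["func"], m["msg"].strip())


def errors(lines: Iterable[str]) -> List[Entry]:
    """Return only the entries logged at level 'error'."""
    return [e for e in parse(lines) if e.level == "error"]


# The four lines quoted in this thread, unwrapped to one entry per line.
SAMPLE = [
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: send_ais_text:    "
    "Sending message 28 via cpg: FAILED (rc=2): Library error: Connection timed out (110)",
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: pcmk_cpg_dispatch:    "
    "Connection to the CPG API failed: 2",
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:    error: cib_ais_destroy:    "
    "Corosync connection lost!  Exiting.",
    "Nov 09 14:51:49 [942] ip-10-50-3-251        cib:     info: terminate_cib:    "
    "cib_ais_destroy: Exiting fast...",
]

if __name__ == "__main__":
    for e in errors(SAMPLE):
        print(e.ts, e.daemon, e.func, "->", e.msg)
```

To run it against the real file, something like 
errors(open("/var/log/cluster/corosync.log")) should work, given the same 
layout assumption.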

> 
> 
>> 
>> 
>> [root@ip-10-50-3-122 ~]# pcs config
>> Corosync Nodes:
>> 
>> Pacemaker Nodes:
>> ip-10-50-3-122 ip-10-50-3-251 
>> 
>> Resources: 
>> Resource: ClusterEIP_54.215.143.166 (provider=pacemaker type=EIP class=ocf)
>> Attributes: first_network_interface_id=eni-e4e0b68c 
>> second_network_interface_id=eni-35f9af5d first_private_ip=10.50.3.191 
>> second_private_ip=10.50.3.91 eip=54.215.143.166 alloc_id=eipalloc-376c3c5f 
>> interval=5s 
>> Operations: monitor interval=5s
>> Clone: EIP-AND-VARNISH-clone
>> Group: EIP-AND-VARNISH
>>  Resource: Varnish (provider=redhat type=varnish.sh class=ocf)
>>   Operations: monitor interval=5s
>>  Resource: Varnishlog (provider=redhat type=varnishlog.sh class=ocf)
>>   Operations: monitor interval=5s
>>  Resource: Varnishncsa (provider=redhat type=varnishncsa.sh class=ocf)
>>   Operations: monitor interval=5s
>> Resource: ec2-fencing (type=fence_ec2 class=stonith)
>> Attributes: ec2-home=/opt/ec2-api-tools pcmk_host_check=static-list 
>> pcmk_host_list=HA01 HA02 
>> Operations: monitor start-delay=30s interval=0 timeout=150s
>> 
>> Location Constraints:
>> Ordering Constraints:
>> ClusterEIP_54.215.143.166 then Varnish
>> Varnish then Varnishlog
>> Varnishlog then Varnishncsa
>> Colocation Constraints:
>> Varnish with ClusterEIP_54.215.143.166
>> Varnishlog with Varnish
>> Varnishncsa with Varnishlog
>> 
>> Cluster Properties:
>> dc-version: 1.1.8-7.el6-394e906
>> cluster-infrastructure: cman
>> last-lrm-refresh: 1384196963
>> no-quorum-policy: ignore
>> stonith-enabled: true
>> 
>> <net-failure-messages-110913.out><net-failure-corosync-110913.out>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
