Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
On Saturday, April 20, 2013, Andreas Mock wrote:

> Hi Andrew,
>
> is the bug fix in 1.1.9 for RHEL6.4?

On clusterlabs.org? Yes.

> Have you an idea when 1.1.10 will be released?

1.1.10 will be available once people stop reporting bugs in the release candidates :-)
Rc1 is out now if you'd like to test it.

> Best regards
> Andreas Mock
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Saturday, 20 April 2013 12:04
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
>
> On 19/04/2013, at 11:28 AM, pavan tc <pavan...@gmail.com> wrote:
>
>>> Yes, but looking at the code it should be impossible. Would it be
>>> possible for you to add:
>>>
>>>     export PCMK_trace_functions=peer_update_callback
>>>
>>> to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
>>> probably in /var/log/pacemaker.log)?
>>
>> Sorry about the delay. I have put these in place and am running tests
>> now. The next time I hit this, I'll post the messages.
>
> Another user hit the same issue and was able to reproduce. You can see
> the resolution at https://bugzilla.redhat.com/show_bug.cgi?id=951340

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
On 19/04/2013, at 11:28 AM, pavan tc <pavan...@gmail.com> wrote:

>> Yes, but looking at the code it should be impossible. Would it be
>> possible for you to add:
>>
>>     export PCMK_trace_functions=peer_update_callback
>>
>> to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
>> probably in /var/log/pacemaker.log)?
>
> Sorry about the delay. I have put these in place and am running tests
> now. The next time I hit this, I'll post the messages.

Another user hit the same issue and was able to reproduce. You can see the resolution at https://bugzilla.redhat.com/show_bug.cgi?id=951340
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
Hi Andrew,

is the bug fix in 1.1.9 for RHEL6.4?
Have you an idea when 1.1.10 will be released?

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Saturday, 20 April 2013 12:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

On 19/04/2013, at 11:28 AM, pavan tc <pavan...@gmail.com> wrote:

>> Yes, but looking at the code it should be impossible. Would it be
>> possible for you to add:
>>
>>     export PCMK_trace_functions=peer_update_callback
>>
>> to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
>> probably in /var/log/pacemaker.log)?
>
> Sorry about the delay. I have put these in place and am running tests
> now. The next time I hit this, I'll post the messages.

Another user hit the same issue and was able to reproduce. You can see the resolution at https://bugzilla.redhat.com/show_bug.cgi?id=951340
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
> Another user hit the same issue and was able to reproduce. You can see
> the resolution at https://bugzilla.redhat.com/show_bug.cgi?id=951340

Thanks much for letting me know. I will watch the "Fixed in version" field and upgrade as necessary.

Pavan
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
> Yes, but looking at the code it should be impossible. Would it be
> possible for you to add:
>
>     export PCMK_trace_functions=peer_update_callback
>
> to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
> probably in /var/log/pacemaker.log)?

Sorry about the delay. I have put these in place and am running tests now. The next time I hit this, I'll post the messages.
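For anyone following along, the trace hook Andrew suggests can be scripted. A minimal sketch, assuming the RHEL/CentOS layout from the thread; it writes to a scratch copy (`/tmp/pacemaker.sysconfig.demo`, an illustrative path) instead of the live `/etc/sysconfig/pacemaker` so it is safe to dry-run:

```shell
# Sketch: enable function-level tracing for peer_update_callback.
# SYSCONFIG points at a scratch file here; on a real node it would be
# /etc/sysconfig/pacemaker, which the pacemaker init script sources.
SYSCONFIG="${SYSCONFIG:-/tmp/pacemaker.sysconfig.demo}"

# Append the trace directive only if it is not already present.
grep -q '^export PCMK_trace_functions=' "$SYSCONFIG" 2>/dev/null ||
    echo 'export PCMK_trace_functions=peer_update_callback' >> "$SYSCONFIG"

# The extra trace output lands in the detail log, typically
# /var/log/pacemaker.log, once pacemaker is restarted on that node.
grep 'PCMK_trace_functions' "$SYSCONFIG"
```

The setting only takes effect after the cluster stack is restarted; several functions can be traced at once by giving a comma-separated list.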
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
On 17/04/2013, at 3:19 AM, pavan tc <pavan...@gmail.com> wrote:

> On Fri, Apr 12, 2013 at 9:27 AM, pavan tc <pavan...@gmail.com> wrote:
>
>>> Absolutely none in the syslog. Only the regular monitor logs from my
>>> resource agent which continued to report as secondary.
>>
>> This is very strange, because the thing that caused the I_PE_CALC is
>> a timer that goes off every 15 minutes. Which would seem to imply
>> that there was a transition of some kind about when the failure
>> happened - but somehow it didn't go into the logs.
>>
>> Could you post the complete logs from 14:00 to 14:30?
>
> Sure. Here goes. Attached are two logs and corosync.conf:
>
> 1. syslog (edited; messages from other modules removed. I have not
>    touched the pacemaker/corosync related messages)
> 2. corosync.log (unedited)
> 3. corosync.conf
>
> Wanted to mention a couple of things:
>
> -- 14:06 is when the system was coming back up from a reboot. I have
>    started from the earliest message during boot to the point the
>    I_PE_CALC timer popped and a promote was called.
> -- I see the following during boot up. Does that mean pacemaker did
>    not start?

Not yet. But it looks like the pacemaker init script was also kicked off somehow (otherwise the crmd etc would not be running)

> Apr 10 14:06:26 corosync [pcmk ] info: process_ais_conf: Enabling MCP mode: Use the Pacemaker init script to complete Pacemaker startup
>
> Could that contribute to any of this behaviour?

No.

> I'll be glad to provide any other information.
>
> Did anybody get a chance to look at the information attached in the
> previous email?

Yes, but looking at the code it should be impossible. Would it be possible for you to add:

    export PCMK_trace_functions=peer_update_callback

to /etc/sysconfig/pacemaker and re-test (and send me the new logs - probably in /var/log/pacemaker.log)?

> Thanks,
> Pavan
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
On Fri, Apr 12, 2013 at 9:27 AM, pavan tc <pavan...@gmail.com> wrote:

>> Absolutely none in the syslog. Only the regular monitor logs from my
>> resource agent which continued to report as secondary.
>
> This is very strange, because the thing that caused the I_PE_CALC is a
> timer that goes off every 15 minutes. Which would seem to imply that
> there was a transition of some kind about when the failure happened -
> but somehow it didn't go into the logs.
>
> Could you post the complete logs from 14:00 to 14:30?

Sure. Here goes. Attached are two logs and corosync.conf:

1. syslog (edited; messages from other modules removed. I have not touched the pacemaker/corosync related messages)
2. corosync.log (unedited)
3. corosync.conf

Wanted to mention a couple of things:

-- 14:06 is when the system was coming back up from a reboot. I have started from the earliest message during boot to the point the I_PE_CALC timer popped and a promote was called.

-- I see the following during boot up. Does that mean pacemaker did not start?

Apr 10 14:06:26 corosync [pcmk ] info: process_ais_conf: Enabling MCP mode: Use the Pacemaker init script to complete Pacemaker startup

Could that contribute to any of this behaviour?

I'll be glad to provide any other information. Did anybody get a chance to look at the information attached in the previous email?

Thanks,
Pavan
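A side note for readers: the "timer that goes off every 15 minutes" Andrew mentions matches Pacemaker's `cluster-recheck-interval` property, which defaults to 15 minutes. When the membership event fails to trigger a transition, the next periodic recheck (the C_TIMER_POPPED input) is what finally runs the policy engine and issues the promote. The timestamps in the logs bear this out (GNU `date` assumed):

```shell
# Delay between the membership-loss messages (14:12:38) and the
# C_TIMER_POPPED state transition that finally promoted (14:26:46):
lost=$(date -u -d '14:12:38' +%s)
promote=$(date -u -d '14:26:46' +%s)
echo "delay: $(( promote - lost ))s"   # 848s, i.e. just under 15 minutes
```

Lowering `cluster-recheck-interval` (e.g. `crm configure property cluster-recheck-interval="2min"`) would shorten this worst-case delay, but it only masks the missed-event bug rather than fixing it.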
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
On 11/04/2013, at 8:15 AM, pavan tc <pavan...@gmail.com> wrote:

> Hi,
>
> [I did go through the mail thread titled: "RHEL6 and clones: CMAN
> needed anyway?", but was not sure about some answers there]
>
> I recently moved from pacemaker 1.1.7 to 1.1.8-7 on centos 6.2. I see
> the following in syslog:
>
> corosync[2966]: [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
> corosync[2966]: [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
>
> Does this mean that my current configuration is incorrect and will not
> work as it used to with pacemaker 1.1.7/Corosync?

It will continue to work until the Pacemaker plugin is removed from RHEL.

> I looked at the Clusters from Scratch instructions and it talks mostly
> about GFS2. I don't have any filesystem requirements. In that case,
> can I live with Pacemaker/Corosync?

Yes, but only until the Pacemaker plugin is removed from RHEL.

> I do understand that this config is not recommended, but the reason I
> ask is because I am hitting a weird problem with this setup which I
> will explain below. Just want to make sure that I don't start off with
> an erroneous setup.
>
> I have a two-node multi-state resource configured with the following
> config:
>
> [root@vsanqa4 ~]# crm configure show
> node vsanqa3
> node vsanqa4
> primitive vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e ocf:heartbeat:vgc-cm-agent.ocf \
>         params cluster_uuid="6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e" \
>         op monitor interval="30s" role="Master" timeout="100s" \
>         op monitor interval="31s" role="Slave" timeout="100s"
> ms ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
>         meta clone-max="2" globally-unique="false" target-role="Started"
> location ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
>         rule $id="ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes-rule" -inf: #uname ne vsanqa4 and #uname ne vsanqa3
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.8-7.el6-394e906" \
>         cluster-infrastructure="classic openais (with plugin)" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
> With this config, if I simulate a crash on the master with
> "echo c > /proc/sysrq-trigger", the slave does not get promoted for
> about 15 minutes. It does detect the peer going down, but does not
> seem to issue the promote immediately:
>
> Apr 10 14:12:32 vsanqa4 corosync[2966]: [TOTEM ] A processor failed, forming new configuration.
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 166060: memb=1, new=0, lost=1
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: memb: vsanqa4 1967394988
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: lost: vsanqa3 1950617772
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 166060: memb=1, new=0, lost=0
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: MEMB: vsanqa4 1967394988
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: ais_mark_unseen_peer_dead: Node vsanqa3 was not seen in the previous transition
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: update_member: Node 1950617772/vsanqa3 is now: lost
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: send_member_notification: Sending membership update 166060 to 2 children
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Apr 10 14:12:38 vsanqa4 cib[3386]: notice: ais_dispatch_message: Membership 166060: quorum lost
> Apr 10 14:12:38 vsanqa4 crmd[3391]: notice: ais_dispatch_message: Membership 166060: quorum lost
> Apr 10 14:12:38 vsanqa4 cib[3386]: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
> Apr 10 14:12:38 vsanqa4 crmd[3391]: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [CPG ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
> Apr 10 14:12:38 vsanqa4 corosync[2966]: [MAIN ] Completed service synchronization, ready to provide service.
>
> Then (after about 15 minutes), I see the following:

There were no logs at all in between?

> Apr 10 14:26:46 vsanqa4 crmd[3391]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: unpack_config: On loss of CCM Quorum: Ignore
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
Hi Andrew,

Thanks much for looking at this.

>> Then (after about 15 minutes), I see the following:
>
> There were no logs at all in between?

Absolutely none in the syslog. Only the regular monitor logs from my resource agent which continued to report as secondary.

I also checked /var/log/cluster/corosync.log. The only difference between this and the ones in syslog are the messages below:

From /var/log/cluster/corosync.log:
---
Apr 10 14:12:38 [3391] vsanqa4 crmd: notice: ais_dispatch_message: Membership 166060: quorum lost
Apr 10 14:12:38 [3386] vsanqa4 cib: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 [3391] vsanqa4 crmd: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 [3391] vsanqa4 crmd: info: peer_update_callback: vsanqa3 is now lost (was member)
Apr 10 14:12:38 corosync [CPG ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 corosync [MAIN ] Completed service synchronization, ready to provide service.
Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/62, version=0.668.12): OK (rc=0)
Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/64, version=0.668.14): OK (rc=0)
Apr 10 14:12:38 [3391] vsanqa4 crmd: info: crmd_ais_dispatch: Setting expected votes to 2
Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/66, version=0.668.15): OK (rc=0)

The first six out of the 10 messages above were seen on syslog too. Adding them here for context. The last four are the extra messages in corosync.log.

> Apr 10 14:26:46 vsanqa4 crmd[3391]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: LogActions: Promote vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
> Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2

Thanks,
Pavan
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
On 12/04/2013, at 12:11 PM, pavan tc <pavan...@gmail.com> wrote:

> Hi Andrew,
>
> Thanks much for looking at this.
>
>>> Then (after about 15 minutes), I see the following:
>>
>> There were no logs at all in between?
>
> Absolutely none in the syslog. Only the regular monitor logs from my
> resource agent which continued to report as secondary.

This is very strange, because the thing that caused the I_PE_CALC is a timer that goes off every 15 minutes. Which would seem to imply that there was a transition of some kind about when the failure happened - but somehow it didn't go into the logs.

Could you post the complete logs from 14:00 to 14:30?

> I also checked /var/log/cluster/corosync.log. The only difference
> between this and the ones in syslog are the messages below:
>
> From /var/log/cluster/corosync.log:
> ---
> Apr 10 14:12:38 [3391] vsanqa4 crmd: notice: ais_dispatch_message: Membership 166060: quorum lost
> Apr 10 14:12:38 [3386] vsanqa4 cib: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
> Apr 10 14:12:38 [3391] vsanqa4 crmd: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
> Apr 10 14:12:38 [3391] vsanqa4 crmd: info: peer_update_callback: vsanqa3 is now lost (was member)
> Apr 10 14:12:38 corosync [CPG ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
> Apr 10 14:12:38 corosync [MAIN ] Completed service synchronization, ready to provide service.
> Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/62, version=0.668.12): OK (rc=0)
> Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/64, version=0.668.14): OK (rc=0)
> Apr 10 14:12:38 [3391] vsanqa4 crmd: info: crmd_ais_dispatch: Setting expected votes to 2
> Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/66, version=0.668.15): OK (rc=0)
>
> The first six out of the 10 messages above were seen on syslog too.
> Adding them here for context. The last four are the extra messages in
> corosync.log.
>
> Apr 10 14:26:46 vsanqa4 crmd[3391]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: LogActions: Promote vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
> Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2
>
> Thanks,
> Pavan
[Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
Hi,

[I did go through the mail thread titled: "RHEL6 and clones: CMAN needed anyway?", but was not sure about some answers there]

I recently moved from pacemaker 1.1.7 to 1.1.8-7 on centos 6.2. I see the following in syslog:

corosync[2966]: [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
corosync[2966]: [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

Does this mean that my current configuration is incorrect and will not work as it used to with pacemaker 1.1.7/Corosync? I looked at the Clusters from Scratch instructions and it talks mostly about GFS2. I don't have any filesystem requirements. In that case, can I live with Pacemaker/Corosync?

I do understand that this config is not recommended, but the reason I ask is because I am hitting a weird problem with this setup which I will explain below. Just want to make sure that I don't start off with an erroneous setup.

I have a two-node multi-state resource configured with the following config:

[root@vsanqa4 ~]# crm configure show
node vsanqa3
node vsanqa4
primitive vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e ocf:heartbeat:vgc-cm-agent.ocf \
        params cluster_uuid="6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e" \
        op monitor interval="30s" role="Master" timeout="100s" \
        op monitor interval="31s" role="Slave" timeout="100s"
ms ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
        meta clone-max="2" globally-unique="false" target-role="Started"
location ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
        rule $id="ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes-rule" -inf: #uname ne vsanqa4 and #uname ne vsanqa3
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

With this config, if I simulate a crash on the master with "echo c > /proc/sysrq-trigger", the slave does not get promoted for about 15 minutes. It does detect the peer going down, but does not seem to issue the promote immediately:

Apr 10 14:12:32 vsanqa4 corosync[2966]: [TOTEM ] A processor failed, forming new configuration.
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 166060: memb=1, new=0, lost=1
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: memb: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: lost: vsanqa3 1950617772
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 166060: memb=1, new=0, lost=0
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: MEMB: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: ais_mark_unseen_peer_dead: Node vsanqa3 was not seen in the previous transition
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: update_member: Node 1950617772/vsanqa3 is now: lost
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: send_member_notification: Sending membership update 166060 to 2 children
Apr 10 14:12:38 vsanqa4 corosync[2966]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 10 14:12:38 vsanqa4 cib[3386]: notice: ais_dispatch_message: Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 crmd[3391]: notice: ais_dispatch_message: Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 cib[3386]: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 crmd[3391]: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 corosync[2966]: [CPG ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 vsanqa4 corosync[2966]: [MAIN ] Completed service synchronization, ready to provide service.

Then (after about 15 minutes), I see the following:

Apr 10 14:26:46 vsanqa4 crmd[3391]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: LogActions: Promote vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2

Thanks,
Pavan
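For anyone reproducing this: writing "c" to /proc/sysrq-trigger panics the kernel instantly, so the master disappears with no clean shutdown, which is exactly the hard-failure case being tested here. A guarded sketch (the `I_REALLY_WANT_TO_CRASH` variable is an added safety interlock, not part of the original test):

```shell
# Hard-crash the node via magic SysRq to simulate a master failure.
# Guarded so the snippet is inert unless explicitly armed.
if [ "${I_REALLY_WANT_TO_CRASH:-no}" = "yes" ]; then
    echo 1 > /proc/sys/kernel/sysrq   # make sure SysRq is enabled
    echo c > /proc/sysrq-trigger      # immediate kernel panic, no sync
else
    echo "refusing to crash: set I_REALLY_WANT_TO_CRASH=yes"
fi
```

On the surviving node, `crm_mon` should show the peer go OFFLINE within the totem failure-detection timeout (a few seconds); in a healthy cluster the slave is then promoted in the next transition, so the 15-minute wait reported here is the anomaly.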