Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-21 Thread Andrew Beekhof
On Saturday, April 20, 2013, Andreas Mock wrote:

 Hi Andrew,

 Is the bug fix included in 1.1.9 for RHEL 6.4?


On clusterlabs.org? Yes.


 Do you have an idea when 1.1.10 will be released?


1.1.10 will be available once people stop reporting bugs in the release
candidates :-)
1.1.10-rc1 is out now, if you'd like to test it.



 Best regards
 Andreas Mock


 -----Original Message-----
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: Saturday, 20 April 2013 12:04
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion
 issues


 On 19/04/2013, at 11:28 AM, pavan tc <pavan...@gmail.com> wrote:

  Yes, but looking at the code it should be impossible.
  Would it be possible for you to add:
 
  export PCMK_trace_functions=peer_update_callback
 
  to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
 probably in /var/log/pacemaker.log)?
 
 
  Sorry about the delay.
 
  I have put these in place and am running tests now. The next time I hit
 this, I'll post the messages.

 Another user hit the same issue and was able to reproduce.
 You can see the resolution at
 https://bugzilla.redhat.com/show_bug.cgi?id=951340





Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-20 Thread Andrew Beekhof

On 19/04/2013, at 11:28 AM, pavan tc <pavan...@gmail.com> wrote:

 Yes, but looking at the code it should be impossible.
 Would it be possible for you to add:
 
 export PCMK_trace_functions=peer_update_callback
 
 to /etc/sysconfig/pacemaker and re-test (and send me the new logs - probably 
 in /var/log/pacemaker.log)?
 
 
 Sorry about the delay.
 
 I have put these in place and am running tests now. The next time I hit this, 
 I'll post the messages.

Another user hit the same issue and was able to reproduce.
You can see the resolution at https://bugzilla.redhat.com/show_bug.cgi?id=951340




Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-20 Thread Andreas Mock
Hi Andrew,

Is the bug fix included in 1.1.9 for RHEL 6.4?
Do you have an idea when 1.1.10 will be released?

Best regards
Andreas Mock


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Saturday, 20 April 2013 12:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion
issues


On 19/04/2013, at 11:28 AM, pavan tc <pavan...@gmail.com> wrote:

 Yes, but looking at the code it should be impossible.
 Would it be possible for you to add:
 
 export PCMK_trace_functions=peer_update_callback
 
 to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
probably in /var/log/pacemaker.log)?
 
 
 Sorry about the delay.
 
 I have put these in place and am running tests now. The next time I hit
this, I'll post the messages.

Another user hit the same issue and was able to reproduce.
You can see the resolution at
https://bugzilla.redhat.com/show_bug.cgi?id=951340




Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-20 Thread pavan tc

 Another user hit the same issue and was able to reproduce.
 You can see the resolution at
 https://bugzilla.redhat.com/show_bug.cgi?id=951340


Thanks much for letting me know. I will watch the "Fixed in version" field
and upgrade as necessary.

Pavan


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-18 Thread pavan tc
 Yes, but looking at the code it should be impossible.

 Would it be possible for you to add:

 export PCMK_trace_functions=peer_update_callback

 to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
 probably in /var/log/pacemaker.log)?


Sorry about the delay.

I have put these in place and am running tests now. The next time I hit
this, I'll post the messages.


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-17 Thread Andrew Beekhof

On 17/04/2013, at 3:19 AM, pavan tc <pavan...@gmail.com> wrote:

 On Fri, Apr 12, 2013 at 9:27 AM, pavan tc <pavan...@gmail.com> wrote:
  Absolutely none in the syslog. Only the regular monitor logs from my 
  resource agent which continued to report as secondary.
 
  This is very strange, because the thing that caused the I_PE_CALC is a timer 
  that goes off every 15 minutes.
  Which would seem to imply that there was a transition of some kind about when 
  the failure happened - but somehow it didn't go into the logs.
 
 Could you post the complete logs from 14:00 to 14:30?
 
 
  Sure. Here goes. Attached are two logs and corosync.conf:
  1. syslog (edited: messages from other modules removed; I have not touched 
  the pacemaker/corosync-related messages)
  2. corosync.log (unedited)
  3. corosync.conf
 
 Wanted to mention a couple of things:
 -- 14:06 is when the system was coming back up from a reboot. I have started 
 from the earliest message during boot to the point the I_PE_CALC timer popped 
 and a promote was called.
 -- I see the following during boot up. Does that mean pacemaker did not start?

Not yet. But it looks like the pacemaker init script was also kicked off 
somehow (otherwise the crmd etc. would not be running).
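
One quick way to see what actually got started at boot (a sketch, assuming
the standard RHEL6/CentOS 6 service tooling):

# are both init scripts enabled?
chkconfig --list corosync
chkconfig --list pacemaker

# which cluster daemons are running, and under which parent process?
ps axf | egrep 'corosync|pacemakerd|crmd|cib'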

 Apr 10 14:06:26 corosync [pcmk  ] info: process_ais_conf: Enabling MCP mode: 
 Use the Pacemaker init script to complete Pacemaker startup
 
 Could that contribute to any of this behaviour?

No.

 
 I'll be glad to provide any other information.
 
 Did anybody get a chance to look at the information attached in the previous 
 email?

Yes, but looking at the code it should be impossible.
Would it be possible for you to add:

export PCMK_trace_functions=peer_update_callback 

to /etc/sysconfig/pacemaker and re-test (and send me the new logs - probably in 
/var/log/pacemaker.log)?
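
A minimal sketch of that setup, assuming the stock sysconfig file and the
default detail-log location (PCMK_trace_functions accepts a comma-separated
list if more functions are ever needed):

export PCMK_trace_functions=peer_update_callback   # add to /etc/sysconfig/pacemaker

# restart pacemaker so the daemons pick up the new environment,
# then follow the detail log for the trace output:
service pacemaker restart
tail -f /var/log/pacemaker.log | grep peer_update_callback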

 
 Thanks,
 Pavan
  
 
 Pavan
 
 


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-16 Thread pavan tc
On Fri, Apr 12, 2013 at 9:27 AM, pavan tc <pavan...@gmail.com> wrote:

  Absolutely none in the syslog. Only the regular monitor logs from my
 resource agent which continued to report as secondary.


 This is very strange, because the thing that caused the I_PE_CALC is a
 timer that goes off every 15 minutes.
 Which would seem to imply that there was a transition of some kind about
 when the failure happened - but somehow it didn't go into the logs.

 Could you post the complete logs from 14:00 to 14:30?


 Sure. Here goes. Attached are two logs and corosync.conf:
 1. syslog (edited: messages from other modules removed; I have not touched
 the pacemaker/corosync-related messages)
 2. corosync.log (unedited)
 3. corosync.conf

 Wanted to mention a couple of things:
 -- 14:06 is when the system was coming back up from a reboot. I have
 started from the earliest message during boot to the point the I_PE_CALC
 timer popped and a promote was called.
 -- I see the following during boot up. Does that mean pacemaker did not
 start?
 Apr 10 14:06:26 corosync [pcmk  ] info: process_ais_conf: Enabling MCP
 mode: Use the Pacemaker init script to complete Pacemaker startup

 Could that contribute to any of this behaviour?

 I'll be glad to provide any other information.


Did anybody get a chance to look at the information attached in the
previous email?

Thanks,
Pavan



 Pavan




Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-11 Thread Andrew Beekhof

On 11/04/2013, at 8:15 AM, pavan tc <pavan...@gmail.com> wrote:

 Hi,
 
 [I did go through the mail thread titled "RHEL6 and clones: CMAN needed 
 anyway?", but was not sure about some answers there]
 
 I recently moved from Pacemaker 1.1.7 to 1.1.8-7 on CentOS 6.2. I see the 
 following in syslog:
 
 corosync[2966]:   [pcmk  ] ERROR: process_ais_conf: You have configured a 
 cluster using the Pacemaker plugin for Corosync. The plugin is not supported 
 in this environment and will be removed very soon.
 corosync[2966]:   [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 
 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using 
 Pacemaker with CMAN
 
 Does this mean that my current configuration is incorrect and will not work 
 as it used to with pacemaker 1.1.7/Corosync?

It will continue to work until the Pacemaker plugin is removed from RHEL.

 
 I looked at the "Clusters from Scratch" instructions and it talks mostly 
 about GFS2. I don't have any filesystem requirements. In that case, can I 
 live with Pacemaker/Corosync?

Yes, but only until the Pacemaker plugin is removed from RHEL.

 
 I do understand that this config is not recommended, but the reason I ask is 
 because I am hitting a weird problem with this setup which I will explain 
 below. Just want to make sure that I don't start off with an erroneous setup.
 
 I have a two-node multi-state resource configured with the following config:
 
 [root@vsanqa4 ~]# crm configure show
 node vsanqa3
 node vsanqa4
 primitive vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
 ocf:heartbeat:vgc-cm-agent.ocf \
 params cluster_uuid="6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e" \
 op monitor interval="30s" role="Master" timeout="100s" \
 op monitor interval="31s" role="Slave" timeout="100s"
 ms ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
 vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
 meta clone-max="2" globally-unique="false" target-role="Started"
 location ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes \
 ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
 rule $id="ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes-rule" -inf: \
 #uname ne vsanqa4 and #uname ne vsanqa3
 property $id="cib-bootstrap-options" \
 dc-version="1.1.8-7.el6-394e906" \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes="2" \
 stonith-enabled="false" \
 no-quorum-policy="ignore"
 rsc_defaults $id="rsc-options" \
 resource-stickiness="100"
 
 With this config, if I simulate a crash on the master with 'echo c > 
 /proc/sysrq-trigger', the slave does not get promoted for about 15 minutes. 
 It does detect the peer going down, but does not seem to issue the promote 
 immediately:
 
 Apr 10 14:12:32 vsanqa4 corosync[2966]:   [TOTEM ] A processor failed, 
 forming new configuration.
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice: pcmk_peer_update: 
 Transitional membership event on ring 166060: memb=1, new=0, lost=1
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update: 
 memb: vsanqa4 1967394988
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update: 
 lost: vsanqa3 1950617772
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice: pcmk_peer_update: 
 Stable membership event on ring 166060: memb=1, new=0, lost=0
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update: 
 MEMB: vsanqa4 1967394988
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: 
 ais_mark_unseen_peer_dead: Node vsanqa3 was not seen in the previous 
 transition
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: update_member: Node 
 1950617772/vsanqa3 is now: lost
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: 
 send_member_notification: Sending membership update 166060 to 2 children
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [TOTEM ] A processor joined or left 
 the membership and a new membership was formed.
 Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: ais_dispatch_message: Membership 
 166060: quorum lost
 Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: ais_dispatch_message: 
 Membership 166060: quorum lost
 Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: crm_update_peer_state: 
 crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
 Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: crm_update_peer_state: 
 crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [CPG   ] chosen downlist: sender 
 r(0) ip(172.16.68.117) ; members(old:2 left:1)
 Apr 10 14:12:38 vsanqa4 corosync[2966]:   [MAIN  ] Completed service 
 synchronization, ready to provide service.
 
 Then (after about 15 minutes), I see the following:

There were no logs at all in between?

 
 Apr 10 14:26:46 vsanqa4 crmd[3391]:   notice: do_state_transition: State 
 transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
 origin=crm_timer_popped ]
 Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: unpack_config: On loss of 
 CCM Quorum: Ignore
 Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: LogActions: Promote 
 vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
 Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: process_pe_message: 
 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2

Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-11 Thread pavan tc
Hi Andrew,

Thanks much for looking at this.


 Then (after about 15 minutes), I see the following:

 There were no logs at all in between?


Absolutely none in the syslog. Only the regular monitor logs from my
resource agent which continued to report as secondary.
I also checked /var/log/cluster/corosync.log. The only difference between
this and the ones in syslog are the messages below:

From /var/log/cluster/corosync.log:
---
Apr 10 14:12:38 [3391] vsanqa4   crmd:   notice: ais_dispatch_message:  Membership 166060: quorum lost
Apr 10 14:12:38 [3386] vsanqa4    cib:   notice: crm_update_peer_state:  crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 [3391] vsanqa4   crmd:   notice: crm_update_peer_state:  crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 [3391] vsanqa4   crmd:     info: peer_update_callback:  vsanqa3 is now lost (was member)
Apr 10 14:12:38 corosync [CPG   ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 corosync [MAIN  ] Completed service synchronization, ready to provide service.

Apr 10 14:12:38 [3386] vsanqa4    cib:     info: cib_process_request:  Operation complete: op cib_modify for section nodes (origin=local/crmd/62, version=0.668.12): OK (rc=0)
Apr 10 14:12:38 [3386] vsanqa4    cib:     info: cib_process_request:  Operation complete: op cib_modify for section cib (origin=local/crmd/64, version=0.668.14): OK (rc=0)
Apr 10 14:12:38 [3391] vsanqa4   crmd:     info: crmd_ais_dispatch:  Setting expected votes to 2
Apr 10 14:12:38 [3386] vsanqa4    cib:     info: cib_process_request:  Operation complete: op cib_modify for section crm_config (origin=local/crmd/66, version=0.668.15): OK (rc=0)

The first six of the 10 messages above were seen in syslog too; I'm adding
them here for context. The last four are the extra messages in
corosync.log.

Pavan


 
  Apr 10 14:26:46 vsanqa4 crmd[3391]:   notice: do_state_transition: State
 transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
 origin=crm_timer_popped ]
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: unpack_config: On loss
 of CCM Quorum: Ignore
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: LogActions: Promote
 vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: process_pe_message:
 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2
 
  Thanks,
  Pavan


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-11 Thread Andrew Beekhof

On 12/04/2013, at 12:11 PM, pavan tc <pavan...@gmail.com> wrote:

 Hi Andrew,
 
 Thanks much for looking at this.
 
 
  Then (after about 15 minutes), I see the following:
 
 There were no logs at all in between?
 
 Absolutely none in the syslog. Only the regular monitor logs from my resource 
 agent which continued to report as secondary.

This is very strange, because the thing that caused the I_PE_CALC is a timer 
that goes off every 15 minutes.
Which would seem to imply that there was a transition of some kind about when 
the failure happened - but somehow it didn't go into the logs.

Could you post the complete logs from 14:00 to 14:30?
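
For what it's worth, that cadence matches Pacemaker's cluster-recheck-interval
property, which defaults to 15 minutes; if it really is that timer doing the
promoting, shortening it should shrink the delay correspondingly. A quick
sketch with the crm shell:

crm configure property cluster-recheck-interval="2min"   # down from the 15min default
crm configure show | grep cluster-recheck-interval       # confirm it took effect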

 I also checked /var/log/cluster/corosync.log. The only difference between 
 this and the ones in syslog are the messages below:
 
 From /var/log/cluster/corosync.log:
 ---
 Apr 10 14:12:38 [3391] vsanqa4   crmd:   notice: ais_dispatch_message:  Membership 166060: quorum lost
 Apr 10 14:12:38 [3386] vsanqa4    cib:   notice: crm_update_peer_state:  crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
 Apr 10 14:12:38 [3391] vsanqa4   crmd:   notice: crm_update_peer_state:  crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
 Apr 10 14:12:38 [3391] vsanqa4   crmd:     info: peer_update_callback:  vsanqa3 is now lost (was member)
 Apr 10 14:12:38 corosync [CPG   ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
 Apr 10 14:12:38 corosync [MAIN  ] Completed service synchronization, ready to provide service.
 
 Apr 10 14:12:38 [3386] vsanqa4    cib:     info: cib_process_request:  Operation complete: op cib_modify for section nodes (origin=local/crmd/62, version=0.668.12): OK (rc=0)
 Apr 10 14:12:38 [3386] vsanqa4    cib:     info: cib_process_request:  Operation complete: op cib_modify for section cib (origin=local/crmd/64, version=0.668.14): OK (rc=0)
 Apr 10 14:12:38 [3391] vsanqa4   crmd:     info: crmd_ais_dispatch:  Setting expected votes to 2
 Apr 10 14:12:38 [3386] vsanqa4    cib:     info: cib_process_request:  Operation complete: op cib_modify for section crm_config (origin=local/crmd/66, version=0.668.15): OK (rc=0)
 
 The first six of the 10 messages above were seen in syslog too; I'm adding 
 them here for context. The last four are the extra messages in corosync.log.
 
 Pavan
 
 
 
  Apr 10 14:26:46 vsanqa4 crmd[3391]:   notice: do_state_transition: State 
  transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
  origin=crm_timer_popped ]
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: unpack_config: On loss of 
  CCM Quorum: Ignore
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: LogActions: Promote 
  vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: process_pe_message: 
  Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2
 
  Thanks,
  Pavan


[Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-10 Thread pavan tc
Hi,

[I did go through the mail thread titled "RHEL6 and clones: CMAN needed
anyway?", but was not sure about some answers there]

I recently moved from Pacemaker 1.1.7 to 1.1.8-7 on CentOS 6.2. I see the
following in syslog:

corosync[2966]:   [pcmk  ] ERROR: process_ais_conf: You have configured a
cluster using the Pacemaker plugin for Corosync. The plugin is not
supported in this environment and will be removed very soon.
corosync[2966]:   [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8
of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on
using Pacemaker with CMAN

Does this mean that my current configuration is incorrect and will not work
as it used to with pacemaker 1.1.7/Corosync?

I looked at the "Clusters from Scratch" instructions and it talks mostly
about GFS2. I don't have any filesystem requirements. In that case, can I
live with Pacemaker/Corosync?

I do understand that this config is not recommended, but the reason I ask
is because I am hitting a weird problem with this setup which I will
explain below. Just want to make sure that I don't start off with an
erroneous setup.

I have a two-node multi-state resource configured with the following config:

[root@vsanqa4 ~]# crm configure show
node vsanqa3
node vsanqa4
primitive vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
ocf:heartbeat:vgc-cm-agent.ocf \
params cluster_uuid="6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e" \
op monitor interval="30s" role="Master" timeout="100s" \
op monitor interval="31s" role="Slave" timeout="100s"
ms ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
meta clone-max="2" globally-unique="false" target-role="Started"
location ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes \
ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
rule $id="ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes-rule" -inf: \
#uname ne vsanqa4 and #uname ne vsanqa3
property $id="cib-bootstrap-options" \
dc-version="1.1.8-7.el6-394e906" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

With this config, if I simulate a crash on the master with 'echo c >
/proc/sysrq-trigger', the slave does not get promoted for about 15 minutes.
It does detect the peer going down, but does not seem to issue the promote
immediately:

Apr 10 14:12:32 vsanqa4 corosync[2966]:   [TOTEM ] A processor failed,
forming new configuration.
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice:
pcmk_peer_update: Transitional membership event on ring 166060: memb=1,
new=0, lost=1
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
memb: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
lost: vsanqa3 1950617772
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice:
pcmk_peer_update: Stable membership event on ring 166060: memb=1, new=0,
lost=0
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
MEMB: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info:
ais_mark_unseen_peer_dead: Node vsanqa3 was not seen in the previous
transition
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: update_member:
Node 1950617772/vsanqa3 is now: lost
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info:
send_member_notification: Sending membership update 166060 to 2 children
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: ais_dispatch_message:
Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: ais_dispatch_message:
Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: crm_update_peer_state:
crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: crm_update_peer_state:
crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [CPG   ] chosen downlist: sender
r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [MAIN  ] Completed service
synchronization, ready to provide service.

Then (after about 15 minutes), I see the following:

Apr 10 14:26:46 vsanqa4 crmd[3391]:   notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: unpack_config: On loss of
CCM Quorum: Ignore
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: LogActions: Promote
vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2
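
For anyone wanting to reproduce this, the whole scenario reduces to the
commands below (a sketch; node roles are as in the config above, and crm_mon
is the standard cluster status tool):

# on the current master: force an immediate kernel panic
echo c > /proc/sysrq-trigger

# on the surviving node: watch whether and when the slave gets promoted
crm_mon -1
tail -f /var/log/cluster/corosync.log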

Thanks,
Pavan