Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins

2016-01-15 Thread Ken Gaillot
On 01/15/2016 11:08 AM, Ken Gaillot wrote:

>> Jan 13 19:33:00 [4291] orana cib: info:
>> cib_process_replace: Replacement 0.4.0 from kamet not applied to
>> 0.74.1: current epoch is greater than the replacement
>> Jan 13 19:33:00 [4291] orana cib:  warning:
>> cib_process_request: Completed cib_replace operation for section
>> 'all': Update was older than existing configuration (rc=-205,
>> origin=kamet/cibadmin/2, version=0.74.1)

I misread. Looking at it again, the above means that the old
configuration was indeed rejected for section "all". However:

>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
>> Diff: --- 0.74.1 2
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
>> Diff: +++ 0.75.0 (null)
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/nodes/node[@id='kamet']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/nodes/node[@id='orana']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='fence-uc-orana']/meta_attributes[@id='fence-uc-orana-meta_attributes']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='fence-uc-kamet']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='C-3']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='C-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='C-FLT2']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='E-3']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='MGMT-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='M-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='M-FLT2']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='S-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='S-FLT2']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-3-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-C-3-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-C-FLT-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT2-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-C-FLT2-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-E-3-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-E-3-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-MGMT-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-MGMT-FLT-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-M-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-M-FLT-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-M-FLT2-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-M-FLT2-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-S-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/c
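
For context on why that replacement was rejected: the CIB version is
the tuple admin_epoch.epoch.num_updates, compared field by field, so
0.4.0 from kamet loses to the existing 0.74.1. If you want to check
what version a node is currently holding, a minimal sketch, assuming
the stock pacemaker CLI (the output line is an example):

  # The opening <cib> tag carries the version attributes
  cibadmin --query | head -n 1
  # e.g. <cib admin_epoch="0" epoch="74" num_updates="1" ...>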

Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins

2016-01-15 Thread Ken Gaillot
On 01/15/2016 05:02 AM, Arjun Pandey wrote:
> Based on corosync logs from orana (the node that did the actual
> fencing and is the current master node):
> 
> I also tried looking at pengine outputs based on crm_simulate. Up
> until the fenced node rejoins, things look good.
> 
> [root@ucc1 orana]# crm_simulate -S --xml-file
> ./pengine/pe-input-1450.bz2  -u kamet
> Current cluster status:
> Node kamet: pending
> Online: [ orana ]

Above, "pending" means that the node has started to join the cluster,
but has not yet fully joined.
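
If you are watching this live rather than from saved pe-inputs, a
one-shot status view shows the same thing; a minimal sketch, assuming
the stock pacemaker tools:

  # One-shot cluster status; a node that has started but not finished
  # joining is listed as "pending"
  crm_mon -1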


> Jan 13 19:32:53 [4295] orana pengine: info: probe_resources:
> Action probe_complete-kamet on kamet is unrunnable (pending)

Any action on kamet is unrunnable until it finishes joining the cluster.


> Jan 13 19:32:59 [4292] orana stonith-ng: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online

The pacemaker daemons on orana each report when they see kamet come up
at the corosync level. Here, stonith-ng sees it.


> Jan 13 19:32:59 [4291] orana cib: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online

Now, the cib sees it.


> Jan 13 19:33:00 [4296] orana   crmd: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online

Now, crmd sees it.
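
Each daemon tracks CPG membership separately, which is why there is
one of these messages per daemon. To inspect the CPG groups and their
member processes directly at the corosync level, corosync ships a
small tool (a sketch; availability may vary by corosync version):

  # List corosync CPG groups (cib, crmd, stonith-ng, ...) and the
  # nodeid/PID pairs currently joined to each
  corosync-cpgtool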


> [Arjun] Why does pengine declare that the following monitor actions
> are now unrunnable?
> 
> Jan 13 19:33:00 [4295] orana pengine:  warning: custom_action:
> Action foo:0_monitor_0 on kamet is unrunnable (pending)

At this point, pengine still hasn't seen kamet join yet, so actions on
it are still unrunnable.


> Jan 13 19:33:00 [4296] orana   crmd: info: join_make_offer:
> join-2: Sending offer to kamet

Having seen kamet at the corosync level, crmd now offers cluster-level
membership to kamet.
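
The resulting membership can also be checked from the shell; a minimal
sketch, assuming pacemaker's bundled crm_node:

  # List the nodes pacemaker knows about, with their cluster IDs
  crm_node -l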


> Jan 13 19:33:00 [4291] orana cib: info:
> cib_process_replace: Replacement 0.4.0 from kamet not applied to
> 0.74.1: current epoch is greater than the replacement
> Jan 13 19:33:00 [4291] orana cib:  warning:
> cib_process_request: Completed cib_replace operation for section
> 'all': Update was older than existing configuration (rc=-205,
> origin=kamet/cibadmin/2, version=0.74.1)
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
> Diff: --- 0.74.1 2
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
> Diff: +++ 0.75.0 (null)
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/nodes/node[@id='kamet']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/nodes/node[@id='orana']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='fence-uc-orana']/meta_attributes[@id='fence-uc-orana-meta_attributes']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='fence-uc-kamet']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-3']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='E-3']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='MGMT-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='M-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='M-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='S-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='S-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-3-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-C-3-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-C-FLT-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT2-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: in
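
As an aside: if you ever need to guarantee that a particular
configuration wins this comparison, that is what admin_epoch is for.
The cluster never changes it on its own, so bumping it on the node
whose configuration should be authoritative makes any stale CIB from a
rejoining peer compare as older. A minimal sketch:

  # Raise admin_epoch on the authoritative node; a rejoining node's
  # 0.x.y configuration will then always be rejected as older
  cibadmin --modify --xml-text '<cib admin_epoch="1"/>'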

Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins

2016-01-15 Thread Arjun Pandey
Based on corosync logs from orana (the node that did the actual
fencing and is the current master node):

I also tried looking at pengine outputs based on crm_simulate. Up
until the fenced node rejoins, things look good.

[root@ucc1 orana]# crm_simulate -S --xml-file
./pengine/pe-input-1450.bz2  -u kamet
Current cluster status:
Node kamet: pending
Online: [ orana ]

 Master/Slave Set: foo-master [foo]
 Masters: [ orana ]
 Stopped: [ kamet ]
 fence-uc-orana (stonith:fence_ilo4): Started orana
 fence-uc-kamet (stonith:fence_ilo4): Started orana
 C-3G (ocf::pw:IPaddr): Started orana
 C-FLT (ocf::pw:IPaddr): Started orana
 C-FLT2 (ocf::pw:IPaddr): Started orana
 E-3G (ocf::pw:IPaddr): Started orana
 MGMT-FLT (ocf::pw:IPaddr): Started orana
 M-FLT (ocf::pw:IPaddr): Started orana
 M-FLT2 (ocf::pw:IPaddr): Started orana
 S-FLT (ocf::pw:IPaddr): Started orana
 S-FLT2 (ocf::pw:IPaddr): Started orana

Performing requested modifications
 + Bringing node kamet online

Transition Summary:
 * Start   foo:1 (kamet)

Executing cluster transition:
 * Resource action: foo monitor on kamet
 * Pseudo action:   foo-master_pre_notify_start_0
 * Resource action: fence-uc-orana monitor on kamet
 * Resource action: fence-uc-kamet monitor on kamet
 * Resource action: C-3G monitor on kamet
 * Resource action: C-FLT monitor on kamet
 * Resource action: C-FLT2 monitor on kamet
 * Resource action: E-3G monitor on kamet
 * Resource action: MGMT-FLT monitor on kamet
 * Resource action: M-FLT monitor on kamet
 * Resource action: M-FLT2 monitor on kamet
 * Resource action: S-FLT monitor on kamet
 * Resource action: S-FLT2 monitor on kamet
 * Pseudo action:   probe_complete
 * Resource action: foo notify on orana
 * Pseudo action:   foo-master_confirmed-pre_notify_start_0
 * Pseudo action:   foo-master_start_0
 * Resource action: foo start on kamet
 * Pseudo action:   foo-master_running_0
 * Pseudo action:   foo-master_post_notify_running_0
 * Resource action: foo notify on orana
 * Resource action: foo notify on kamet
 * Pseudo action:   foo-master_confirmed-post_notify_running_0
 * Resource action: foo monitor=11000 on kamet

Revised cluster status:
Online: [ kamet orana ]

 Master/Slave Set: foo-master [foo]
 Masters: [ orana ]
 Slaves: [ kamet ]
 fence-uc-orana (stonith:fence_ilo4): Started orana
 fence-uc-kamet (stonith:fence_ilo4): Started orana
 C-3G (ocf::pw:IPaddr): Started orana
 C-FLT (ocf::pw:IPaddr): Started orana
 C-FLT2 (ocf::pw:IPaddr): Started orana
 E-3G (ocf::pw:IPaddr): Started orana
 MGMT-FLT (ocf::pw:IPaddr): Started orana
 M-FLT (ocf::pw:IPaddr): Started orana
 M-FLT2 (ocf::pw:IPaddr): Started orana
 S-FLT (ocf::pw:IPaddr): Started orana
 S-FLT2 (ocf::pw:IPaddr): Started orana
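
Side note: when replaying saved pe-inputs like this, the show-scores
option can help explain placement decisions; a sketch, assuming the
same crm_simulate invocation as above:

  # Replay the saved policy-engine input, bringing kamet online (-u)
  # and also printing resource allocation scores (-s)
  crm_simulate -S --xml-file ./pengine/pe-input-1450.bz2 -u kamet -s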

Things run fine until the other node comes up, but as soon as the
other node joins I see the following pengine behaviour, marked
with [Arjun]:

Jan 13 19:32:44 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 [4296] orana   crmd: info:
cman_event_callback: Membership 7044: quorum retained
Jan 13 19:32:44 [4296] orana   crmd:   notice:
crm_update_peer_state: cman_event_callback: Node kamet[2] - state is
now member (was lost)
Jan 13 19:32:44 [4296] orana   crmd: info:
peer_update_callback: kamet is now member (was lost)
Jan 13 19:32:44 [4291] orana cib: info:
cib_process_request: Forwarding cib_modify operation for section
status to master (origin=local/crmd/2482)
Jan 13 19:32:44 [4296] orana   crmd: info: crm_cs_flush: Sent
0 CPG messages  (1 remaining, last=122): Try again (6)
Jan 13 19:32:44 [4291] orana cib: info: crm_cs_flush: Sent
0 CPG messages  (1 remaining, last=240): Try again (6)
Jan 13 19:32:44 [4296] orana   crmd: info:
cman_event_callback: Membership 7044: quorum retained
Jan 13 19:32:44 [4291] orana cib: info:
cib_process_request: Forwarding cib_modify operation for section nodes
to master (origin=local/crmd/2483)
Jan 13 19:32:44 [4291] orana cib: info:
cib_process_request: Forwarding cib_modify operation for section
status to master (origin=local/crmd/2484)
Jan 13 19:32:44 [4291] orana cib: info:
cib_process_request: Forwarding cib_modify operation for section nodes
to master (origin=local/crmd/2485)
Jan 13 19:32:44 [4291] orana cib: info:
cib_process_request: Forwarding cib_modify operation for section
status to master (origin=local/crmd/2486)
Jan 13 19:32:44 corosync [CPG   ] chosen downlist: sender r(0)
ip(7.7.7.1) ; members(old:1 left:0)
Jan 13 19:32:44 corosync [MAIN  ] Completed service synchronization,
ready to provide service.
Jan 13 19:32:44 [4291] orana cib: info: crm_cs_flush: Sent
5 CPG messages  (0 remaining, last=