Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins
On 01/15/2016 11:08 AM, Ken Gaillot wrote:
>> Jan 13 19:33:00 [4291] orana cib: info:
>> cib_process_replace: Replacement 0.4.0 from kamet not applied to
>> 0.74.1: current epoch is greater than the replacement
>> Jan 13 19:33:00 [4291] orana cib: warning:
>> cib_process_request: Completed cib_replace operation for section
>> 'all': Update was older than existing configuration (rc=-205,
>> origin=kamet/cibadmin/2, version=0.74.1)

I misread. Looking at it again, the above means that the old
configuration was indeed rejected for section "all". However:

>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
>> Diff: --- 0.74.1 2
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
>> Diff: +++ 0.75.0 (null)
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/nodes/node[@id='kamet']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/nodes/node[@id='orana']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='fence-uc-orana']/meta_attributes[@id='fence-uc-orana-meta_attributes']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='fence-uc-kamet']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='C-3']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='C-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='C-FLT2']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='E-3']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='MGMT-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='M-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='M-FLT2']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='S-FLT']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/resources/primitive[@id='S-FLT2']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-3-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-C-3-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-C-FLT-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT2-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-C-FLT2-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-E-3-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-E-3-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-MGMT-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-MGMT-FLT-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-M-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-M-FLT-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-M-FLT2-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_order[@id='order-M-FLT2-foo-master-mandatory']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/configuration/constraints/rsc_colocation[@id='colocation-S-FLT-foo-master-INFINITY']
>> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
>> /cib/c
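For anyone following along: the triple the cib compares here is
admin_epoch.epoch.num_updates from the <cib> element, so 0.74.1 means
admin_epoch="0" epoch="74" num_updates="1". A quick way to see what a
node believes its configuration version is (a rough sketch; cibadmin
ships with pacemaker, and the exact output layout varies by version):

    # Print the live CIB's opening tag, which carries the version
    # triple compared by cib_process_replace.
    cibadmin --query | head -n 1

    # A cib_replace is refused when the incoming triple sorts lower
    # than the live one; that is why kamet's stale 0.4.0 replacement
    # of section 'all' was rejected above.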
Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins
On 01/15/2016 05:02 AM, Arjun Pandey wrote:
> Based on corosync logs from orana (the node that did the actual
> fencing and is the current master node)
>
> I also tried looking at pengine outputs based on crm_simulate. Until
> the fenced node rejoins, things look good.
>
> [root@ucc1 orana]# crm_simulate -S --xml-file
> ./pengine/pe-input-1450.bz2 -u kamet
> Current cluster status:
> Node kamet: pending
> Online: [ orana ]

Above, "pending" means that the node has started to join the cluster,
but has not yet fully joined.

> Jan 13 19:32:53 [4295] orana pengine: info: probe_resources:
> Action probe_complete-kamet on kamet is unrunnable (pending)

Any action on kamet is unrunnable until it finishes joining the cluster.

> Jan 13 19:32:59 [4292] orana stonith-ng: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online

The pacemaker daemons on orana each report when they see kamet come up
at the corosync level. Here, stonith-ng sees it.

> Jan 13 19:32:59 [4291] orana cib: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online

Now, the cib sees it.

> Jan 13 19:33:00 [4296] orana crmd: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online

Now, crmd sees it.

> [Arjun] Why does pengine declare that the following monitor actions
> are now unrunnable?
>
> Jan 13 19:33:00 [4295] orana pengine: warning: custom_action:
> Action foo:0_monitor_0 on kamet is unrunnable (pending)

At this point, pengine still hasn't seen kamet join yet, so actions on
it are still unrunnable.

> Jan 13 19:33:00 [4296] orana crmd: info: join_make_offer:
> join-2: Sending offer to kamet

Having seen kamet at the corosync level, crmd now offers cluster-level
membership to kamet.
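If you want to watch that progression yourself, each daemon logs its
own crm_update_peer_proc line as it notices the peer, and crmd's join
offer comes last. A minimal sketch, assuming the cman-era default log
location (adjust the path to wherever corosync logs on your build):

    # Follow the rejoin from orana's point of view: one
    # crm_update_peer_proc line per daemon (stonith-ng, cib, crmd),
    # then join_make_offer once crmd can offer cluster membership.
    grep -E 'crm_update_peer_proc|join_make_offer' \
        /var/log/cluster/corosync.log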
> Jan 13 19:33:00 [4291] orana cib: info:
> cib_process_replace: Replacement 0.4.0 from kamet not applied to
> 0.74.1: current epoch is greater than the replacement
> Jan 13 19:33:00 [4291] orana cib: warning:
> cib_process_request: Completed cib_replace operation for section
> 'all': Update was older than existing configuration (rc=-205,
> origin=kamet/cibadmin/2, version=0.74.1)
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
> Diff: --- 0.74.1 2
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
> Diff: +++ 0.75.0 (null)
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/nodes/node[@id='kamet']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/nodes/node[@id='orana']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='fence-uc-orana']/meta_attributes[@id='fence-uc-orana-meta_attributes']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='fence-uc-kamet']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-3']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='E-3']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='MGMT-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='M-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='M-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='S-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='S-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-3-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-C-3-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-C-FLT-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT2-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: in
Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins
Based on corosync logs from orana (the node that did the actual
fencing and is the current master node).

I also tried looking at pengine outputs based on crm_simulate. Until
the fenced node rejoins, things look good.

[root@ucc1 orana]# crm_simulate -S --xml-file
./pengine/pe-input-1450.bz2 -u kamet

Current cluster status:
Node kamet: pending
Online: [ orana ]

 Master/Slave Set: foo-master [foo]
     Masters: [ orana ]
     Stopped: [ kamet ]
 fence-uc-orana (stonith:fence_ilo4): Started orana
 fence-uc-kamet (stonith:fence_ilo4): Started orana
 C-3G (ocf::pw:IPaddr): Started orana
 C-FLT (ocf::pw:IPaddr): Started orana
 C-FLT2 (ocf::pw:IPaddr): Started orana
 E-3G (ocf::pw:IPaddr): Started orana
 MGMT-FLT (ocf::pw:IPaddr): Started orana
 M-FLT (ocf::pw:IPaddr): Started orana
 M-FLT2 (ocf::pw:IPaddr): Started orana
 S-FLT (ocf::pw:IPaddr): Started orana
 S-FLT2 (ocf::pw:IPaddr): Started orana

Performing requested modifications
 + Bringing node kamet online

Transition Summary:
 * Start foo:1 (kamet)

Executing cluster transition:
 * Resource action: foo monitor on kamet
 * Pseudo action: foo-master_pre_notify_start_0
 * Resource action: fence-uc-orana monitor on kamet
 * Resource action: fence-uc-kamet monitor on kamet
 * Resource action: C-3G monitor on kamet
 * Resource action: C-FLT monitor on kamet
 * Resource action: C-FLT2 monitor on kamet
 * Resource action: E-3G monitor on kamet
 * Resource action: MGMT-FLT monitor on kamet
 * Resource action: M-FLT monitor on kamet
 * Resource action: M-FLT2 monitor on kamet
 * Resource action: S-FLT monitor on kamet
 * Resource action: S-FLT2 monitor on kamet
 * Pseudo action: probe_complete
 * Resource action: foo notify on orana
 * Pseudo action: foo-master_confirmed-pre_notify_start_0
 * Pseudo action: foo-master_start_0
 * Resource action: foo start on kamet
 * Pseudo action: foo-master_running_0
 * Pseudo action: foo-master_post_notify_running_0
 * Resource action: foo notify on orana
 * Resource action: foo notify on kamet
 * Pseudo action: foo-master_confirmed-post_notify_running_0
 * Resource action: foo monitor=11000 on kamet

Revised cluster status:
Online: [ kamet orana ]

 Master/Slave Set: foo-master [foo]
     Masters: [ orana ]
     Slaves: [ kamet ]
 fence-uc-orana (stonith:fence_ilo4): Started orana
 fence-uc-kamet (stonith:fence_ilo4): Started orana
 C-3G (ocf::pw:IPaddr): Started orana
 C-FLT (ocf::pw:IPaddr): Started orana
 C-FLT2 (ocf::pw:IPaddr): Started orana
 E-3G (ocf::pw:IPaddr): Started orana
 MGMT-FLT (ocf::pw:IPaddr): Started orana
 M-FLT (ocf::pw:IPaddr): Started orana
 M-FLT2 (ocf::pw:IPaddr): Started orana
 S-FLT (ocf::pw:IPaddr): Started orana
 S-FLT2 (ocf::pw:IPaddr): Started orana

I see things run fine till the other node comes up, and as soon as the
other node joins I see the following pengine behaviour, marked with
[Arjun]:

Jan 13 19:32:44 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 [4296] orana crmd: info: cman_event_callback:
Membership 7044: quorum retained
Jan 13 19:32:44 [4296] orana crmd: notice: crm_update_peer_state:
cman_event_callback: Node kamet[2] - state is now member (was lost)
Jan 13 19:32:44 [4296] orana crmd: info: peer_update_callback:
kamet is now member (was lost)
Jan 13 19:32:44 [4291] orana cib: info: cib_process_request:
Forwarding cib_modify operation for section status to master
(origin=local/crmd/2482)
Jan 13 19:32:44 [4296] orana crmd: info: crm_cs_flush:
Sent 0 CPG messages (1 remaining, last=122): Try again (6)
Jan 13 19:32:44 [4291] orana cib: info: crm_cs_flush:
Sent 0 CPG messages (1 remaining, last=240): Try again (6)
Jan 13 19:32:44 [4296] orana crmd: info: cman_event_callback:
Membership 7044: quorum retained
Jan 13 19:32:44 [4291] orana cib: info: cib_process_request:
Forwarding cib_modify operation for section nodes to master
(origin=local/crmd/2483)
Jan 13 19:32:44 [4291] orana cib: info: cib_process_request:
Forwarding cib_modify operation for section status to master
(origin=local/crmd/2484)
Jan 13 19:32:44 [4291] orana cib: info: cib_process_request:
Forwarding cib_modify operation for section nodes to master
(origin=local/crmd/2485)
Jan 13 19:32:44 [4291] orana cib: info: cib_process_request:
Forwarding cib_modify operation for section status to master
(origin=local/crmd/2486)
Jan 13 19:32:44 corosync [CPG ] chosen downlist: sender r(0) ip(7.7.7.1) ;
members(old:1 left:0)
Jan 13 19:32:44 corosync [MAIN ] Completed service synchronization,
ready to provide service.
Jan 13 19:32:44 [4291] orana cib: info: crm_cs_flush:
Sent 5 CPG messages (0 remaining, last=
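One way to narrow down where the deletions actually enter the
configuration is to replay the series of pe-input files around the
rejoin, not just pe-input-1450. A rough sketch, assuming the usual
pengine directory and that the relevant inputs are numbered near 1450
(both guesses based on the paths quoted above):

    # Replay consecutive policy-engine inputs and flag the first
    # transition that stops the IPaddr resources.
    for f in /var/lib/pacemaker/pengine/pe-input-14{49..55}.bz2; do
        echo "== $f =="
        crm_simulate -S --xml-file "$f" | grep -E 'IPaddr|Stop'
    done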