Unfortunately the config only tells half of the story; the really
important parts are in the status.
Do you still happen to have
/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2 on mu
around? That would have what we need.
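
If it is, you can also replay it locally first to see what the policy
engine decided from it -- a quick sketch, options per crm_simulate(8):

  # replay the saved PE input and show the resulting transition
  crm_simulate -S -x /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2

Adding -s (--show-scores) also dumps the allocation scores the PE used.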

On Wed, Feb 6, 2013 at 1:12 AM, James Guthrie <j...@open.ch> wrote:
> Hi all,
>
> As a follow-up to this, I realised that I needed to slightly change the way 
> the resource constraints are put together, but I'm still seeing the same 
> behaviour.
>
> Below are an excerpt from the logs on the host and the revised XML 
> configuration. In this case, I caused two failures on the host mu, which 
> forced the resources onto nu, and then I forced two failures on nu. Visible 
> in the logs are the two failures detected on nu (the "warning: 
> update_failcount:" lines). After the two failures on nu, the VIP is migrated 
> back to mu, but none of the "support" resources are promoted with it.
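>
> As a side note, the fail counts driving this can be checked directly with 
> the one-shot cluster monitor (if I recall the options correctly):
>
>   # print the cluster status once, including per-node fail counts
>   crm_mon -1 -f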
>
> Regards,
> James
>
> <1c>Feb  5 14:58:45 mu crmd[31482]:  warning: update_failcount: Updating 
> failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, 
> time=1360072725)
> <1d>Feb  5 14:58:45 mu crmd[31482]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: unpack_config: On loss of 
> CCM Quorum: Ignore
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: LogActions: Recover 
> sub-squid:0        (Master nu)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: process_pe_message: 
> Calculated Transition 64: 
> /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-152.bz2
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: unpack_config: On loss of 
> CCM Quorum: Ignore
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: LogActions: Recover 
> sub-squid:0        (Master nu)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: process_pe_message: 
> Calculated Transition 65: 
> /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-153.bz2
> <1d>Feb  5 14:58:48 mu crmd[31482]:   notice: run_graph: Transition 65 
> (Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-153.bz2): 
> Complete
> <1d>Feb  5 14:58:48 mu crmd[31482]:   notice: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> <1d>Feb  5 14:58:58 mu conntrack-tools[1677]: flushing kernel conntrack table 
> (scheduled)
> <1c>Feb  5 14:59:10 mu crmd[31482]:  warning: update_failcount: Updating 
> failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, 
> time=1360072750)
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: unpack_config: On loss of 
> CCM Quorum: Ignore
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
> conntrackd:1       (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
> condition:1        (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
> sub-ospfd:1        (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
> sub-ripd:1 (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
> sub-squid:0        (Master -> Stopped nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Move    
> eth1-0-192.168.1.10        (Started nu -> mu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: process_pe_message: 
> Calculated Transition 66: 
> /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-154.bz2
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: process_lrm_event: LRM 
> operation conntrackd_notify_0 (call=996, rc=0, cib-update=0, confirmed=true) 
> ok
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: run_graph: Transition 66 
> (Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6, 
> Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-154.bz2): Stopped
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: unpack_config: On loss of 
> CCM Quorum: Ignore
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  
> conntrackd:1       (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Stop    
> sub-squid:0        (nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Start   
> eth1-0-192.168.1.10        (mu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: process_pe_message: 
> Calculated Transition 67: 
> /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-155.bz2
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: process_lrm_event: LRM 
> operation conntrackd_notify_0 (call=1001, rc=0, cib-update=0, confirmed=true) 
> ok
> <1e>Feb  5 14:59:10 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: Adding inet 
> address 192.168.1.10/24 with broadcast address 192.168.1.255 to device eth1 
> (with label eth1:0)
> <1e>Feb  5 14:59:10 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: Bringing 
> device eth1 up
> <1e>Feb  5 14:59:11 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: 
> /opt/OSAGpcmk/resource-agents/lib/heartbeat/send_arp -i 200 -r 5 -p 
> /opt/OSAGpcmk/resource-agents/var/run/resource-agents/send_arp-192.168.1.10 
> eth1 192.168.1.10 auto not_used not_used
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM 
> operation eth1-0-192.168.1.10_start_0 (call=999, rc=0, cib-update=553, 
> confirmed=true) ok
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM 
> operation conntrackd_notify_0 (call=1005, rc=0, cib-update=0, confirmed=true) 
> ok
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: run_graph: Transition 67 
> (Complete=20, Pending=0, Fired=0, Skipped=3, Incomplete=0, 
> Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-155.bz2): Stopped
> <1d>Feb  5 14:59:12 mu pengine[31481]:   notice: unpack_config: On loss of 
> CCM Quorum: Ignore
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: unpack_rsc_op: Processing 
> failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: 
> Forcing master-squid away from nu after 2 failures (max=2)
> <1d>Feb  5 14:59:12 mu pengine[31481]:   notice: process_pe_message: 
> Calculated Transition 68: 
> /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM 
> operation eth1-0-192.168.1.10_monitor_10000 (call=1008, rc=0, cib-update=555, 
> confirmed=false) ok
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: run_graph: Transition 68 
> (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2): 
> Complete
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> <resources>
>   <!--resource for conntrackd-->
>   <master id="master-conntrackd">
>     <meta_attributes id="master-conntrackd-meta_attributes">
>       <nvpair id="master-conntrackd-meta_attributes-notify" name="notify" 
> value="true"/>
>       <nvpair id="master-conntrackd-meta_attributes-interleave" 
> name="interleave" value="true"/>
>       <nvpair id="master-conntrackd-meta_attributes-target-role" 
> name="target-role" value="Master"/>
>       <nvpair id="master-conndtrakd-meta_attributes-failure-timeout" 
> name="failure-timeout" value="600"/>
>       <nvpair id="master-conntrackd-meta_attributes-migration-threshold" 
> name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="conntrackd" class="ocf" provider="OSAG" type="conntrackd">
>       <operations>
>         <op id="conntrackd-slave-check" name="monitor" interval="60" 
> role="Slave" />
>         <op id="conntrackd-master-check" name="monitor" interval="61" 
> role="Master" />
>       </operations>
>     </primitive>
>   </master>
>
>   <!--resource for condition files-->
>   <master id="master-condition">
>     <meta_attributes id="master-condition-meta_attributes">
>       <nvpair id="master-condition-meta_attributes-notify" name="notify" 
> value="false"/>
>       <nvpair id="master-condition-meta_attributes-interleave" 
> name="interleave" value="true"/>
>       <nvpair id="master-condition-meta_attributes-target-role" 
> name="target-role" value="Master"/>
>       <nvpair id="master-condition-meta_attributes-failure-timeout" 
> name="failure-timeout" value="600"/>
>       <nvpair id="master-condition-meta_attributes-migration-threshold" 
> name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="condition" class="ocf" provider="OSAG" type="condition">
>       <instance_attributes id="condition-attrs">
>       </instance_attributes>
>       <operations>
>         <op id="condition-slave-check" name="monitor" interval="60" 
> role="Slave" />
>         <op id="condition-master-check" name="monitor" interval="61" 
> role="Master" />
>       </operations>
>     </primitive>
>   </master>
>
>   <!--resource for subsystem ospfd-->
>   <master id="master-ospfd">
>     <meta_attributes id="master-ospfd-meta_attributes">
>       <nvpair id="master-ospfd-meta_attributes-notify" name="notify" 
> value="false"/>
>       <nvpair id="master-ospfd-meta_attributes-interleave" name="interleave" 
> value="true"/>
>       <nvpair id="master-ospfd-meta_attributes-target-role" 
> name="target-role" value="Master"/>
>       <nvpair id="master-ospfd-meta_attributes-failure-timeout" 
> name="failure-timeout" value="600"/>
>       <nvpair id="master-ospfd-meta_attributes-migration-threshold" 
> name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="sub-ospfd" class="ocf" provider="OSAG" type="osaginit">
>       <instance_attributes id="ospfd-attrs">
>         <nvpair id="ospfd-script" name="script" value="ospfd.init"/>
>       </instance_attributes>
>       <operations>
>         <op id="ospfd-slave-check" name="monitor" interval="10" role="Slave" 
> />
>         <op id="ospfd-master-check" name="monitor" interval="11" 
> role="Master" />
>       </operations>
>     </primitive>
>   </master>
>   <!--resource for subsystem ripd-->
>   <master id="master-ripd">
>     <meta_attributes id="master-ripd-meta_attributes">
>       <nvpair id="master-ripd-meta_attributes-notify" name="notify" 
> value="false"/>
>       <nvpair id="master-ripd-meta_attributes-interleave" name="interleave" 
> value="true"/>
>       <nvpair id="master-ripd-meta_attributes-target-role" name="target-role" 
> value="Master"/>
>       <nvpair id="master-ripd-meta_attributes-failure-timeout" 
> name="failure-timeout" value="600"/>
>       <nvpair id="master-ripd-meta_attributes-migration-threshold" 
> name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="sub-ripd" class="ocf" provider="OSAG" type="osaginit">
>       <instance_attributes id="ripd-attrs">
>         <nvpair id="ripd-script" name="script" value="ripd.init"/>
>       </instance_attributes>
>       <operations>
>         <op id="ripd-slave-check" name="monitor" interval="10" role="Slave" />
>         <op id="ripd-master-check" name="monitor" interval="11" role="Master" 
> />
>       </operations>
>     </primitive>
>   </master>
>   <!--resource for subsystem squid-->
>   <master id="master-squid">
>     <meta_attributes id="master-squid-meta_attributes">
>       <nvpair id="master-squid-meta_attributes-notify" name="notify" 
> value="false"/>
>       <nvpair id="master-squid-meta_attributes-interleave" name="interleave" 
> value="true"/>
>       <nvpair id="master-squid-meta_attributes-target-role" 
> name="target-role" value="Master"/>
>       <nvpair id="master-squid-meta_attributes-failure-timeout" 
> name="failure-timeout" value="600"/>
>       <nvpair id="master-squid-meta_attributes-migration-threshold" 
> name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="sub-squid" class="ocf" provider="OSAG" type="osaginit">
>       <instance_attributes id="squid-attrs">
>         <nvpair id="squid-script" name="script" value="squid.init"/>
>       </instance_attributes>
>       <operations>
>         <op id="squid-slave-check" name="monitor" interval="10" role="Slave" 
> />
>         <op id="squid-master-check" name="monitor" interval="11" 
> role="Master" />
>       </operations>
>     </primitive>
>   </master>
>
>   <!--resource for interface checks -->
>   <clone id="clone-IFcheck">
>     <primitive id="IFcheck" class="ocf" provider="OSAG" type="ifmonitor">
>       <instance_attributes id="resIFcheck-attrs">
>         <nvpair id="IFcheck-interfaces" name="interfaces" value="eth0 eth1"/>
>         <nvpair id="IFcheck-multiplier" name="multiplier" value="200"/>
>         <nvpair id="IFcheck-dampen" name="dampen" value="16s" />
>       </instance_attributes>
>       <operations>
>         <op id="IFcheck-monitor" interval="8s" name="monitor"/>
>       </operations>
>     </primitive>
>   </clone>
>
>   <!--resource for ISP checks-->
>   <clone id="clone-ISPcheck">
>     <primitive id="ISPcheck" class="ocf" provider="OSAG" type="ispcheck">
>       <instance_attributes id="ISPcheck-attrs">
>         <nvpair id="ISPcheck-ipsec" name="ipsec-check" value="1" />
>         <nvpair id="ISPcheck-ping" name="ping-check" value="1" />
>         <nvpair id="ISPcheck-multiplier" name="multiplier" value="200"/>
>         <nvpair id="ISPcheck-dampen" name="dampen" value="60s"/>
>       </instance_attributes>
>       <operations>
>         <op id="ISPcheck-monitor" interval="30s" name="monitor"/>
>       </operations>
>     </primitive>
>   </clone>
>
>   <!--Virtual IP group-->
>   <group id="VIP-group">
>     <primitive id="eth1-0-192.168.1.10" class="ocf" provider="heartbeat" 
> type="IPaddr2">
>       <meta_attributes id="meta-VIP-1">
>         <nvpair id="VIP-1-failure-timeout" name="failure-timeout" value="60"/>
>         <nvpair id="VIP-1-migration-threshold" name="migration-threshold" 
> value="50"/>
>       </meta_attributes>
>       <instance_attributes id="VIP-1-instance_attributes">
>         <nvpair id="VIP-1-IP" name = "ip" value="192.168.1.10"/>
>         <nvpair id="VIP-1-nic" name="nic" value="eth1"/>
>         <nvpair id="VIP-1-cidr" name="cidr_netmask" value="24"/>
>         <nvpair id="VIP-1-iflabel" name="iflabel" value="0"/>
>         <nvpair id="VIP-1-arp-sender" name="arp_sender" value="send_arp"/>
>       </instance_attributes>
>       <operations>
>         <op id="VIP-1-monitor" interval="10s" name="monitor"/>
>       </operations>
>     </primitive>
>   </group>
> </resources>
>
> <!--resource constraints-->
> <constraints>
>   <!--set VIP location based on the following two rules-->
>   <rsc_location id="VIPs" rsc="VIP-group">
>     <!--prefer host with more interfaces-->
>     <rule id="VIP-prefer-connected-rule-1" score-attribute="ifcheck" >
>       <expression id="VIP-prefer-most-connected-1" attribute="ifcheck" 
> operation="defined"/>
>     </rule>
>     <!--prefer host with better ISP connectivity-->
>     <rule id="VIP-prefer-connected-rule-2" score-attribute="ispcheck">
>       <expression id="VIP-prefer-most-connected-2" attribute="ispcheck" 
> operation="defined"/>
>     </rule>
>   </rsc_location>
>
>   <!--conntrack master must run where the VIPs are-->
>   <rsc_colocation id="conntrack-master-with-VIPs" rsc="master-conntrackd" 
> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>   <!--condition master must run where the VIPs are-->
>   <rsc_colocation id="condition-master-with-VIPs" rsc="master-condition" 
> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>
>   <!--ospfd master must run where master-condition is master-->
>   <rsc_colocation id="ospfd-master-with-VIPs" rsc="master-ospfd" 
> with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" 
> score="INFINITY" />
>   <!--ripd master must run where master-condition is master-->
>   <rsc_colocation id="ripd-master-with-VIPs" rsc="master-ripd" 
> with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" 
> score="INFINITY" />
>   <!--squid master must run where master-condition is master-->
>   <rsc_colocation id="squid-master-with-VIPs" rsc="master-squid" 
> with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" 
> score="INFINITY" />
>
>   <!--prefer as master the following hosts, in ascending order of preference-->
>   <rsc_location id="VIP-master-xi" rsc="VIP-group" node="xi" score="0"/>
>   <rsc_location id="VIP-master-nu" rsc="VIP-group" node="nu" score="20"/>
>   <rsc_location id="VIP-master-mu" rsc="VIP-group" node="mu" score="40"/>
> </constraints>
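>
> As an untested aside, one variant I have been wondering about is making the 
> squid colocation advisory (a large finite score) rather than mandatory, so 
> that a master-squid that has failed everywhere cannot veto the placement of 
> the rest of the chain. A sketch only, with the score picked arbitrarily:
>
>   <rsc_colocation id="squid-master-advisory" rsc="master-squid" 
> with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" 
> score="1000" />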
>
> On Feb 5, 2013, at 11:13 AM, James Guthrie <j...@open.ch> wrote:
>
>> Hi Andrew,
>>
>> "The resource" in this case was master-squid.init. The resource agent serves 
>> as a master/slave OCF wrapper to a non-LSB init script. I forced the failure 
>> by manually stopping that init script on the host.
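>>
>> For context, the monitor side of that wrapper behaves roughly like this -- a 
>> simplified sketch, not the actual agent; the state-file check stands in for 
>> however the agent really tracks its promoted role:
>>
>>   # monitor: map the init script's state to OCF return codes; rc=9
>>   # (OCF_FAILED_MASTER) is what appears in the logs above
>>   wrapper_monitor() {
>>       if ! "$OCF_RESKEY_script" status >/dev/null 2>&1; then
>>           [ -e "$MASTER_STATE_FILE" ] && return $OCF_FAILED_MASTER  # rc=9
>>           return $OCF_NOT_RUNNING                                  # rc=7
>>       fi
>>       [ -e "$MASTER_STATE_FILE" ] && return $OCF_RUNNING_MASTER    # rc=8
>>       return $OCF_SUCCESS                                          # rc=0
>>   }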
>>
>> Regards,
>> James
>> On Feb 5, 2013, at 10:56 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>>
>>> On Thu, Jan 31, 2013 at 3:04 AM, James Guthrie <j...@open.ch> wrote:
>>>> Hi all,
>>>>
>>>> I'm having a bit of difficulty with the way that my cluster is behaving on 
>>>> failure of a resource.
>>>>
>>>> The objective of my clustering setup is to provide a virtual IP to which 
>>>> a number of other services are bound. The services are tied to the VIP 
>>>> with colocation constraints that force each service to run on the same 
>>>> host as the VIP.
>>>>
>>>> I have been testing the way that the cluster behaves if it is unable to 
>>>> start a resource. What I observe is the following: the cluster tries to 
>>>> start the resource on node 1,
>>>
>>> Can you define "the resource"?  You have a few and it matters :)
>>>
>>>> fails 10 times, reaches the migration threshold, moves the resource to the 
>>>> other host, fails 10 times, and reaches the migration threshold again. Now 
>>>> it has reached the migration threshold on all possible hosts. I was then 
>>>> expecting that it would stop the resource on all nodes and run all of the 
>>>> other resources as though nothing were wrong. What I see, though, is that 
>>>> the cluster demotes all master/slave resources, despite the fact that only 
>>>> one of them is failing.
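>>>>
>>>> For what it's worth, the allocation scores behind those demotions can be 
>>>> dumped from the live cluster -- assuming I have the crm_simulate options 
>>>> right:
>>>>
>>>>   # show the policy engine's allocation scores against the live CIB
>>>>   crm_simulate -s -L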
>>>>
>>>> I wasn't able to find a parameter which would dictate what the behaviour 
>>>> should be if the migration failed on all available hosts. I must therefore 
>>>> believe that the constraints configuration I'm using isn't doing quite 
>>>> what I hope it's doing.
>>>>
>>>> Below is the configuration xml I am using on the hosts (no crmsh config, 
>>>> sorry).
>>>>
>>>> I am using Corosync 2.3.0 and Pacemaker 1.1.8, built from source.
>>>>
>>>> Regards,
>>>> James
>>>>
>>>> <!-- Configuration file for pacemaker -->
>>>> <resources>
>>>> <!--resource for conntrackd-->
>>>> <master id="master-conntrackd">
>>>>   <meta_attributes id="master-conntrackd-meta_attributes">
>>>>     <nvpair id="master-conntrackd-meta_attributes-notify" name="notify" 
>>>> value="true"/>
>>>>     <nvpair id="master-conntrackd-meta_attributes-interleave" 
>>>> name="interleave" value="true"/>
>>>>     <nvpair id="master-conntrackd-meta_attributes-target-role" 
>>>> name="target-role" value="Master"/>
>>>>     <nvpair id="master-conndtrakd-meta_attributes-failure-timeout" 
>>>> name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-conntrackd-meta_attributes-migration-threshold" 
>>>> name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="conntrackd" class="ocf" provider="OSAG" type="conntrackd">
>>>>     <operations>
>>>>       <op id="conntrackd-slave-check" name="monitor" interval="60" 
>>>> role="Slave" />
>>>>       <op id="conntrackd-master-check" name="monitor" interval="61" 
>>>> role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-condition">
>>>>   <meta_attributes id="master-condition-meta_attributes">
>>>>     <nvpair id="master-condition-meta_attributes-notify" name="notify" 
>>>> value="false"/>
>>>>     <nvpair id="master-condition-meta_attributes-interleave" 
>>>> name="interleave" value="true"/>
>>>>     <nvpair id="master-condition-meta_attributes-target-role" 
>>>> name="target-role" value="Master"/>
>>>>     <nvpair id="master-condition-meta_attributes-failure-timeout" 
>>>> name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-condition-meta_attributes-migration-threshold" 
>>>> name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="condition" class="ocf" provider="OSAG" type="condition">
>>>>     <instance_attributes id="condition-attrs">
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="condition-slave-check" name="monitor" interval="10" 
>>>> role="Slave" />
>>>>       <op id="condition-master-check" name="monitor" interval="11" 
>>>> role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-ospfd.init">
>>>>   <meta_attributes id="master-ospfd-meta_attributes">
>>>>     <nvpair id="master-ospfd-meta_attributes-notify" name="notify" 
>>>> value="false"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-interleave" name="interleave" 
>>>> value="true"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-target-role" 
>>>> name="target-role" value="Master"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-failure-timeout" 
>>>> name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-migration-threshold" 
>>>> name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="ospfd" class="ocf" provider="OSAG" type="osaginit">
>>>>     <instance_attributes id="ospfd-attrs">
>>>>       <nvpair id="ospfd-script" name="script" value="ospfd.init"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="ospfd-slave-check" name="monitor" interval="10" role="Slave" 
>>>> />
>>>>       <op id="ospfd-master-check" name="monitor" interval="11" 
>>>> role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-ripd.init">
>>>>   <meta_attributes id="master-ripd-meta_attributes">
>>>>     <nvpair id="master-ripd-meta_attributes-notify" name="notify" 
>>>> value="false"/>
>>>>     <nvpair id="master-ripd-meta_attributes-interleave" name="interleave" 
>>>> value="true"/>
>>>>     <nvpair id="master-ripd-meta_attributes-target-role" 
>>>> name="target-role" value="Master"/>
>>>>     <nvpair id="master-ripd-meta_attributes-failure-timeout" 
>>>> name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-ripd-meta_attributes-migration-threshold" 
>>>> name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="ripd" class="ocf" provider="OSAG" type="osaginit">
>>>>     <instance_attributes id="ripd-attrs">
>>>>       <nvpair id="ripd-script" name="script" value="ripd.init"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="ripd-slave-check" name="monitor" interval="10" role="Slave" 
>>>> />
>>>>       <op id="ripd-master-check" name="monitor" interval="11" 
>>>> role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-squid.init">
>>>>   <meta_attributes id="master-squid-meta_attributes">
>>>>     <nvpair id="master-squid-meta_attributes-notify" name="notify" 
>>>> value="false"/>
>>>>     <nvpair id="master-squid-meta_attributes-interleave" name="interleave" 
>>>> value="true"/>
>>>>     <nvpair id="master-squid-meta_attributes-target-role" 
>>>> name="target-role" value="Master"/>
>>>>     <nvpair id="master-squid-meta_attributes-failure-timeout" 
>>>> name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-squid-meta_attributes-migration-threshold" 
>>>> name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="squid" class="ocf" provider="OSAG" type="osaginit">
>>>>     <instance_attributes id="squid-attrs">
>>>>       <nvpair id="squid-script" name="script" value="squid.init"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="squid-slave-check" name="monitor" interval="10" role="Slave" 
>>>> />
>>>>       <op id="squid-master-check" name="monitor" interval="11" 
>>>> role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>>
>>>> <!--resource for interface checks -->
>>>> <clone id="clone-IFcheck">
>>>>   <primitive id="IFcheck" class="ocf" provider="OSAG" type="ifmonitor">
>>>>     <instance_attributes id="resIFcheck-attrs">
>>>>       <nvpair id="IFcheck-interfaces" name="interfaces" value="eth0 eth1"/>
>>>>       <nvpair id="IFcheck-multiplier" name="multiplier" value="200"/>
>>>>       <nvpair id="IFcheck-dampen" name="dampen" value="6s" />
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="IFcheck-monitor" interval="3s" name="monitor"/>
>>>>     </operations>
>>>>   </primitive>
>>>> </clone>
>>>>
>>>> <!--resource for ISP checks-->
>>>> <clone id="clone-ISPcheck">
>>>>   <primitive id="ISPcheck" class="ocf" provider="OSAG" type="ispcheck">
>>>>     <instance_attributes id="ISPcheck-attrs">
>>>>       <nvpair id="ISPcheck-ipsec" name="ipsec-check" value="1" />
>>>>       <nvpair id="ISPcheck-ping" name="ping-check" value="1" />
>>>>       <nvpair id="ISPcheck-multiplier" name="multiplier" value="200"/>
>>>>       <nvpair id="ISPcheck-dampen" name="dampen" value="60s"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="ISPcheck-monitor" interval="30s" name="monitor"/>
>>>>     </operations>
>>>>   </primitive>
>>>> </clone>
>>>>
>>>> <!--Virtual IP group-->
>>>> <group id="VIP-group">
>>>>   <primitive id="eth1-0-192.168.1.10" class="ocf" provider="heartbeat" 
>>>> type="IPaddr2">
>>>>     <meta_attributes id="meta-VIP-1">
>>>>       <nvpair id="VIP-1-failure-timeout" name="failure-timeout" 
>>>> value="60"/>
>>>>       <nvpair id="VIP-1-migration-threshold" name="migration-threshold" 
>>>> value="50"/>
>>>>     </meta_attributes>
>>>>     <instance_attributes id="VIP-1-instance_attributes">
>>>>       <nvpair id="VIP-1-IP" name = "ip" value="192.168.1.10"/>
>>>>       <nvpair id="VIP-1-nic" name="nic" value="eth1"/>
>>>>       <nvpair id="VIP-1-cidr" name="cidr_netmask" value="24"/>
>>>>       <nvpair id="VIP-1-iflabel" name="iflabel" value="0"/>
>>>>       <nvpair id="VIP-1-arp-sender" name="arp_sender" value="send_arp"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="VIP-1-monitor" interval="10s" name="monitor"/>
>>>>     </operations>
>>>>   </primitive>
>>>> </group>
>>>> </resources>
>>>>
>>>> <!--resource constraints-->
>>>> <constraints>
>>>> <!--set VIP location based on the following two rules-->
>>>> <rsc_location id="VIPs" rsc="VIP-group">
>>>>   <!--prefer host with more interfaces-->
>>>>   <rule id="VIP-prefer-connected-rule-1" score-attribute="ifcheck" >
>>>>     <expression id="VIP-prefer-most-connected-1" attribute="ifcheck" 
>>>> operation="defined"/>
>>>>   </rule>
>>>>   <!--prefer host with better ISP connectivity-->
>>>>   <rule id="VIP-prefer-connected-rule-2" score-attribute="ispcheck">
>>>>     <expression id="VIP-prefer-most-connected-2" attribute="ispcheck" 
>>>> operation="defined"/>
>>>>   </rule>
>>>> </rsc_location>
>>>> <!--conntrack master must run where the VIPs are-->
>>>> <rsc_colocation id="conntrack-master-with-VIPs" rsc="master-conntrackd" 
>>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <rsc_colocation id="condition-master-with-VIPs" rsc="master-condition" 
>>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <!--services masters must run where the VIPs are-->
>>>> <rsc_colocation id="ospfd-master-with-VIPs" rsc="master-ospfd.init" 
>>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <rsc_colocation id="ripd-master-with-VIPs" rsc="master-ripd.init" 
>>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <rsc_colocation id="squid-master-with-VIPs" rsc="master-squid.init" 
>>>> with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <!--prefer as master the following hosts, in ascending order of preference-->
>>>> <rsc_location id="VIP-master-xi" rsc="VIP-group" node="xi" score="0"/>
>>>> <rsc_location id="VIP-master-nu" rsc="VIP-group" node="nu" score="20"/>
>>>> <rsc_location id="VIP-master-mu" rsc="VIP-group" node="mu" score="40"/>
>>>> </constraints>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
