Hi, sorry for the confusion.

Pacemaker 1.0.10  OK (group resource can fail over)
Pacemaker 1.0.11  NG (group resource just stops, cannot fail over)
Pacemaker 1.1 <- the latest hg (group resource just stops, cannot fail over)

By the way, your simulation showed dummy01 restarting on bl460g1n13 again,
but dummy01 failed on bl460g1n13, so dummy01 should move to bl460g1n14.

Current cluster status:
Online: [ bl460g1n13 bl460g1n14 ]

 Resource Group: grpDRBD
     dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
     dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
     dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
 Master/Slave Set: msDRBD [prmDRBD]
     Masters: [ bl460g1n13 ]
     Slaves: [ bl460g1n14 ]

Transition Summary:
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0 (Master bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1 (Slave bl460g1n14)

Executing cluster transition:
 * Executing action 14: dummy03_stop_0 on bl460g1n13
 * Executing action 12: dummy02_stop_0 on bl460g1n13
 * Executing action 2: dummy01_stop_0 on bl460g1n13
 * Executing action 11: dummy01_start_0 on bl460g1n13
 * Executing action 1: dummy01_monitor_10000 on bl460g1n13
 * Executing action 13: dummy02_start_0 on bl460g1n13
 * Executing action 3: dummy02_monitor_10000 on bl460g1n13
 * Executing action 15: dummy03_start_0 on bl460g1n13
 * Executing action 4: dummy03_monitor_10000 on bl460g1n13

Thanks,
Junko

2011/9/29 Andrew Beekhof <and...@beekhof.net>:
> On Tue, Sep 27, 2011 at 2:31 PM, Junko IKEDA <tsukishima...@gmail.com> wrote:
>> Hi,
>>
>>> Which version did you check?
>>
>> Pacemaker 1.0.11.
>
> I meant of 1.1 since you said:
>
> "Pacemaker 1.1 shows the same behavior."
>
>>
>>> The latest from git seems to work fine:
>>>
>>> Current cluster status:
>>> Online: [ bl460g1n13 bl460g1n14 ]
>>>
>>>  Resource Group: grpDRBD
>>>      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>>>      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>>>      dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>>>  Master/Slave Set: msDRBD [prmDRBD]
>>>      Masters: [ bl460g1n13 ]
>>>      Slaves: [ bl460g1n14 ]
>>>
>>> Transition Summary:
>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0 (Master bl460g1n13)
>>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1 (Slave bl460g1n14)
>>>
>>> Executing cluster transition:
>>>  * Executing action 14: dummy03_stop_0 on bl460g1n13
>>>  * Executing action 12: dummy02_stop_0 on bl460g1n13
>>>  * Executing action 2: dummy01_stop_0 on bl460g1n13
>>>  * Executing action 11: dummy01_start_0 on bl460g1n13
>>>  * Executing action 1: dummy01_monitor_10000 on bl460g1n13
>>>  * Executing action 13: dummy02_start_0 on bl460g1n13
>>>  * Executing action 3: dummy02_monitor_10000 on bl460g1n13
>>>  * Executing action 15: dummy03_start_0 on bl460g1n13
>>>  * Executing action 4: dummy03_monitor_10000 on bl460g1n13
>>
>> dummy01 got the fail-count,
>> so dummy01 should move from bl460g1n13 to bl460g1n14.
>> Why does it re-start on the failure node?
>>
>> I got the latest changeset from hg;
>>
>> # hg log | head -n 7
>> changeset:   15777:a15ead49e20f
>> branch:      stable-1.0
>> tag:         tip
>> user:        Andrew Beekhof <and...@beekhof.net>
>> date:        Thu Aug 25 16:49:59 2011 +1000
>> summary:     changeset: 15775:fe18a1ad46f8
>>
>> # crm
>> crm(live)# cib import pe-input-7.bz2
>> crm(pe-input-7)# configure ptest vvv
>> ptest[19194]: 2011/09/27_11:53:45 notice: unpack_config: On loss of CCM Quorum: Ignore
>> ptest[19194]: 2011/09/27_11:53:45 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> ptest[19194]: 2011/09/27_11:53:45 notice: group_print: Resource Group: grpDRBD
>> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
>> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>> ptest[19194]: 2011/09/27_11:53:45 notice: clone_print: Master/Slave Set: msDRBD
>> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Masters: [ bl460g1n13 ]
>> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Slaves: [ bl460g1n14 ]
>> ptest[19194]: 2011/09/27_11:53:45 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy01 (bl460g1n13)
>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy02 (bl460g1n13)
>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy03 (bl460g1n13)
>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:0 (Master bl460g1n13)
>> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:1 (Slave bl460g1n14)
>> INFO: install graphviz to see a transition graph
>> crm(pe-input-7)# quit
>>
>>
>> reverts to Pacemaker 1.0.11,
>>
>> # hg revert -a -r b2e39d318fda
>> # make install
>>
>> # crm
>> crm(live)# cib import pe-input-7.bz2
>> crm(pe-input-7)# configure ptest vvv
>> ptest[751]: 2011/09/27_11:57:50 notice: unpack_config: On loss of CCM Quorum: Ignore
>> ptest[751]: 2011/09/27_11:57:50 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> ptest[751]: 2011/09/27_11:57:50 notice: group_print: Resource Group: grpDRBD
>> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
>> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>> ptest[751]: 2011/09/27_11:57:50 notice: clone_print: Master/Slave Set: msDRBD
>> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Masters: [ bl460g1n13 ]
>> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Slaves: [ bl460g1n14 ]
>> ptest[751]: 2011/09/27_11:57:50 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy01 on bl460g1n14
>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy02 on bl460g1n14
>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy03 on bl460g1n14
>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
>> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)
>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy02 (Started bl460g1n13 -> bl460g1n14)
>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy03 (Started bl460g1n13 -> bl460g1n14)
>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)
>> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1 (Slave -> Master bl460g1n14)
>> INFO: install graphviz to see a transition graph
>>
>> Pacemaker 1.0.10 moved the failure resource to the other node.
>> It's the expected behavior.
>>
>> I attached the hb_report which includes the above pe-input-7.bz2.
>>
>> Thanks,
>> Junko
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker