Hi All,

I tested the behavior of Pacemaker 1.1.11 when a failure occurs on the Master side of a Master/Slave resource.
-------------------------------------
Step 1) Build the cluster.

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue Feb 18 18:07:24 2014
Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured

Online: [ srv01 srv02 ]

 vip-master     (ocf::heartbeat:Dummy): Started srv01
 vip-rep        (ocf::heartbeat:Dummy): Started srv01
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ srv01 ]
     Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
     Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
    + default_ping_set                : 100
    + master-pgsql                    : 10
* Node srv02:
    + default_ping_set                : 100
    + master-pgsql                    : 5

Migration summary:
* Node srv01:
* Node srv02:

Step 2) Cause a monitor error in vip-master.

[root@srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue Feb 18 18:07:58 2014
Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured

Online: [ srv01 srv02 ]

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ srv01 ]
     Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
     Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
    + default_ping_set                : 100
    + master-pgsql                    : 10
* Node srv02:
    + default_ping_set                : 100
    + master-pgsql                    : 5

Migration summary:
* Node srv01:
   vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 18 18:07:50 2014'
* Node srv02:

Failed actions:
    vip-master_monitor_10000 on srv01 'not running' (7): call=30, status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', queued=0ms, exec=0ms
-------------------------------------

However, the resource does not fail over, even though the fail-count has reached the migration-threshold. Yet when I check the CIB with crm_simulate at this point, the fail-over is calculated.
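For reference, a minimal sketch of how to confirm at this point that the failure really was recorded, and to keep a copy of the CIB for offline analysis (standard Pacemaker tools; the file path is only an example):

-------------------------------------
# query the fail-count recorded for vip-master (run on srv01)
[root@srv01 ~]# crm_failcount -G -r vip-master
# save the CIB of the stuck state; crm_simulate can then replay the
# pending transition offline, equivalent to the live check below
[root@srv01 ~]# cibadmin -Q > /tmp/cib-after-failure.xml
[root@srv01 ~]# crm_simulate -x /tmp/cib-after-failure.xml -s
-------------------------------------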
-------------------------------------
[root@srv01 ~]# crm_simulate -L -s

Current cluster status:
Online: [ srv01 srv02 ]

 vip-master     (ocf::heartbeat:Dummy): Stopped
 vip-rep        (ocf::heartbeat:Dummy): Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ srv01 ]
     Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
     Started: [ srv01 srv02 ]

Allocation scores:
clone_color: clnPingd allocation score on srv01: 0
clone_color: clnPingd allocation score on srv02: 0
clone_color: prmPingd:0 allocation score on srv01: INFINITY
clone_color: prmPingd:0 allocation score on srv02: 0
clone_color: prmPingd:1 allocation score on srv01: 0
clone_color: prmPingd:1 allocation score on srv02: INFINITY
native_color: prmPingd:0 allocation score on srv01: INFINITY
native_color: prmPingd:0 allocation score on srv02: 0
native_color: prmPingd:1 allocation score on srv01: -INFINITY
native_color: prmPingd:1 allocation score on srv02: INFINITY
clone_color: msPostgresql allocation score on srv01: 0
clone_color: msPostgresql allocation score on srv02: 0
clone_color: pgsql:0 allocation score on srv01: INFINITY
clone_color: pgsql:0 allocation score on srv02: 0
clone_color: pgsql:1 allocation score on srv01: 0
clone_color: pgsql:1 allocation score on srv02: INFINITY
native_color: pgsql:0 allocation score on srv01: INFINITY
native_color: pgsql:0 allocation score on srv02: 0
native_color: pgsql:1 allocation score on srv01: -INFINITY
native_color: pgsql:1 allocation score on srv02: INFINITY
pgsql:1 promotion score on srv02: 5
pgsql:0 promotion score on srv01: 1
native_color: vip-master allocation score on srv01: -INFINITY
native_color: vip-master allocation score on srv02: INFINITY
native_color: vip-rep allocation score on srv01: -INFINITY
native_color: vip-rep allocation score on srv02: INFINITY

Transition Summary:
 * Start   vip-master   (srv02)
 * Start   vip-rep      (srv02)
 * Demote  pgsql:0      (Master -> Slave srv01)
 * Promote pgsql:1      (Slave -> Master srv02)
-------------------------------------

In addition, the fail-over is carried out once "cluster-recheck-interval" expires. The fail-over is also carried out if I run cibadmin -B.

-------------------------------------
[root@srv01 ~]# cibadmin -B

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue Feb 18 18:21:15 2014
Last change: Tue Feb 18 18:21:00 2014 via cibadmin on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured

Online: [ srv01 srv02 ]

 vip-master     (ocf::heartbeat:Dummy): Started srv02
 vip-rep        (ocf::heartbeat:Dummy): Started srv02
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ srv02 ]
     Slaves: [ srv01 ]
 Clone Set: clnPingd [prmPingd]
     Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
    + default_ping_set                : 100
    + master-pgsql                    : 5
* Node srv02:
    + default_ping_set                : 100
    + master-pgsql                    : 10

Migration summary:
* Node srv01:
   vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 18 18:07:50 2014'
* Node srv02:

Failed actions:
    vip-master_monitor_10000 on srv01 'not running' (7): call=30, status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', queued=0ms, exec=0ms
-------------------------------------

The problem is that the fail-over is executed late. I think the cause of this delay between the error and the fail-over lies in Pacemaker.
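Until the cause is fixed, the window during which the cluster sits on the unprocessed failure can be narrowed by making the policy engine re-run sooner; a minimal sketch (the 60s value is only an example; the default of cluster-recheck-interval is 15 minutes):

-------------------------------------
# shorten the recheck timer so a pending transition is recomputed sooner
[root@srv01 ~]# crm_attribute --type crm_config --name cluster-recheck-interval --update 60s
# or bump the CIB once, as above, to trigger an immediate recomputation
[root@srv01 ~]# cibadmin -B
-------------------------------------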
I registered these contents and the log information in Bugzilla:

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5197

Best Regards,
Hideo Yamauchi.