In the logs on ha2, at the time of the crm node standby ha1 command, I see:

May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: 
Archived previous version as /var/lib/heartbeat/crm/cib-25.raw
May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: 
Wrote version 0.102.0 of the CIB to disk (digest: 
b445d9afde4b209981c3da08d4c24ecc)
May 18 10:32:54 ha2.iohost.com cib: [2378]: info: retrieveCib: Reading 
cluster configuration from: /var/lib/heartbeat/crm/cib.f5FXZH (digest: 
/var/lib/heartbeat/crm/cib.irSIZ7)
May 18 10:32:54 ha2.iohost.com cib: [7779]: info: Managed 
write_cib_contents process 2378 exited with return code 0.
May 18 10:33:11 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: 
flush message from ha1.iohost.com
May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: 
flush message from ha1.iohost.com
May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: 
flush message from ha1.iohost.com
May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: 
flush message from ha1.iohost.com
May 18 10:35:14 ha2.iohost.com cib: [7779]: info: cib_stats: Processed 
48 operations (13125.00us average, 0% utilization) in the last 10min
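
In case it helps anyone reproduce this: a rough way to watch the peer's
log while toggling standby, and to pull out just the minutes around the
command afterwards. This is only a sketch; /var/log/ha-log and the root
login are assumptions, so substitute your own log file if Heartbeat logs
via syslog on your setup.

# Watch ha2's log live while putting ha1 in standby
# (log path is an assumption; adjust to wherever Heartbeat writes):
ssh root@ha2.iohost.com 'tail -f /var/log/ha-log' &
crm node standby ha1.iohost.com

# Afterwards, grab just the window around the command from the local log:
grep 'May 18 10:3[2-5]:' /var/log/ha-log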

And on ha1:

May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - <cib admin_epoch="0" epoch="101" num_updates="23" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - <configuration >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - <nodes >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - <node id="b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - <instance_attributes 
id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - <nvpair value="off" 
id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - </instance_attributes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - </node>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - </nodes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - </configuration>
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: 
abort_transition_graph: need_abort:59 - Triggered transition abort 
(complete=1) : Non-status change
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: - </cib>
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: need_abort: Aborting 
on change to admin_epoch
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + <cib admin_epoch="0" epoch="102" num_updates="1" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + <configuration >
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: 
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + <nodes >
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: 
All 2 cluster nodes are eligible to run resources.
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + <node id="b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + <instance_attributes 
id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + <nvpair value="on" 
id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + </instance_attributes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + </node>
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke: Query 
337: Requesting the current CIB: S_POLICY_ENGINE
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + </nodes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + </configuration>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: 
cib:diff: + </cib>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: cib_process_request: 
Operation complete: op cib_modify for section nodes 
(origin=local/crm_attribute/4, version=0.102.1): ok (rc=0)
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: 
do_pe_invoke_callback: Invoking the PE: query=337, 
ref=pe_calc-dc-1305696782-441, seq=2, quorate=1
May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: 
Archived previous version as /var/lib/heartbeat/crm/cib-27.raw
May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: 
Wrote version 0.102.0 of the CIB to disk (digest: 
6014929506b4b9e2eccb8e741e6e2e6f)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_config: 
On loss of CCM Quorum: Ignore
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_config: 
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
May 17 22:33:02 ha1.iohost.com cib: [1591]: info: retrieveCib: Reading 
cluster configuration from: /var/lib/heartbeat/crm/cib.vRGjiM (digest: 
/var/lib/heartbeat/crm/cib.iJf2S7)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: 
Node ha1.iohost.com is in standby-mode
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: 
determine_online_status: Node ha1.iohost.com is standby
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: 
Node ha2.iohost.com is in standby-mode
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: 
determine_online_status: Node ha2.iohost.com is standby
May 17 22:33:02 ha1.iohost.com pengine: [8685]: WARN: unpack_status: 
Node ha1.iohost.com in status section no longer exists
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: 
Operation ip1arp_monitor_0 found resource ip1arp active on ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: 
Internally renamed drbd_webfs:1 on ha1.iohost.com to drbd_webfs:2 (ORPHAN)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: 
Internally renamed drbd_mysql:1 on ha1.iohost.com to drbd_mysql:2 (ORPHAN)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: 
Internally renamed drbd_mysql:0 on ha2.iohost.com to drbd_mysql:1
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: Managed 
write_cib_contents process 1591 exited with return code 0.
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: 
Operation ip1arp_monitor_0 found resource ip1arp active on ha2.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: 
Internally renamed drbd_webfs:0 on ha2.iohost.com to drbd_webfs:1
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: 
Node ha1.iohost.com is unknown
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: group_print:  
Resource Group: WebServices
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
native_print:      ip1  (ocf::heartbeat:IPaddr2):       Started 
ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
native_print:      ip1arp       (ocf::heartbeat:SendArp):       Started 
ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
native_print:      fs_webfs     (ocf::heartbeat:Filesystem):    Started 
ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
native_print:      fs_mysql     (ocf::heartbeat:Filesystem):    Started 
ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
native_print:      apache2      (lsb:httpd):    Started ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
native_print:      mysql        (ocf::heartbeat:mysql): Started 
ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print:  
Master/Slave Set: ms_drbd_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
short_print:      Masters: [ ha1.iohost.com ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
short_print:      Stopped: [ drbd_mysql:1 ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print:  
Master/Slave Set: ms_drbd_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
short_print:      Masters: [ ha1.iohost.com ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: 
short_print:      Stopped: [ drbd_webfs:1 ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ip1arp: Rolling back scores from fs_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ip1arp: Rolling back scores from ip1
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource ip1arp cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ip1: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ip1: Rolling back scores from ip1arp
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource ip1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_webfs: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_webfs: Rolling back scores from fs_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource drbd_webfs:0 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource drbd_webfs:1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_webfs: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_webfs: Rolling back scores from fs_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: master_color: 
ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
fs_webfs: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource fs_webfs cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_mysql: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_mysql: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_mysql: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource drbd_mysql:0 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource drbd_mysql:1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_mysql: Rolling back scores from apache2
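
What jumps out at me in the excerpt above is that the policy engine on
ha1 reports both ha1.iohost.com and ha2.iohost.com in standby-mode, so
every resource ends up with "cannot run anywhere". A quick way to check
what the CIB actually holds for each node (standard Pacemaker / crm shell
commands, nothing specific to this configuration; just a sketch):

# List each node with its attributes; the standby nvpair should read on/off:
crm node show

# One-shot status, as the current DC sees it:
crm_mon -1

# Dump the raw <nodes> section of the CIB to see the standby values directly:
cibadmin -Q -o nodes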



On 5/17/2011 9:28 PM, Randy Katz wrote:
> Hi,
>
> I'm relatively new to HA, though I have been using Xen and reading
> this list here and there; now I need some help:
>
> I have 2 physical nodes, let's call them node1/node2:
> In each I have a VM (Xen paravirt: ha1 & ha2). In each VM I have
> 2 LVs which are DRBD'd (r0 and r1, mysql data and html data). There is a
> VIP between them resolving the website, which is a simple
> WordPress blog (so it has a database), and it works well.
>
> When I start them (reboot the VMs) they come up fine: ha1 is
> online (primary) and ha2 is standby (secondary). If I:
>
> 1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes
> over, and sometimes I am left with both nodes in standby; not
> sure why.
> 2. If both nodes are in standby and I issue crm node online ha1.iohost.com,
> sometimes ha2 becomes active (as it should have when ha1
> went standby), sometimes ha1 becomes active, and sometimes both
> remain in standby; again, not sure why.
>
> Question: How do I test and debug this? What parameters in which config
> file affect this behavior?
>
> Thank you in advance,
> Randy
>

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
