If I do, on ha2: crm node online ha2.iohost.com, it starts the VIP (it will ping), but it does not do the DRBD mounts and does not start the web or mysql services. If I then issue crm node online ha1.iohost.com on ha1, it will make ha2 online with all services active! Then if I put ha2 in standby, ha1 will come online with all services, just fine!
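For anyone reproducing this, the test sequence boils down to the following crm-shell commands (a sketch assuming Pacemaker's crm shell; the hostnames are my two VMs, and a live cluster is of course required for these to do anything):

```shell
# Put the primary in standby; ha2 should take over the resources
crm node standby ha1.iohost.com

# One-shot status snapshot to see what actually happened
crm_mon -1

# The sequence that (unexpectedly) works: bring ha2 online first...
crm node online ha2.iohost.com

# ...then only after ha1 is also brought online does ha2 start everything
crm node online ha1.iohost.com
```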
Any insights will be greatly appreciated, thanks!

Randy

On 5/17/2011 9:44 PM, Randy Katz wrote:
> In the logs, on ha2, I see at the time crm node standby ha1:
>
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-25.raw
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Wrote version 0.102.0 of the CIB to disk (digest: b445d9afde4b209981c3da08d4c24ecc)
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.f5FXZH (digest: /var/lib/heartbeat/crm/cib.irSIZ7)
> May 18 10:32:54 ha2.iohost.com cib: [7779]: info: Managed write_cib_contents process 2378 exited with return code 0.
> May 18 10:33:11 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:35:14 ha2.iohost.com cib: [7779]: info: cib_stats: Processed 48 operations (13125.00us average, 0% utilization) in the last 10min
>
> And on ha1:
>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<cib admin_epoch="0" epoch="101" num_updates="23">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<configuration>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<nodes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<node id="b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<instance_attributes id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<nvpair value="off" id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</instance_attributes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</node>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</nodes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</configuration>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</cib>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: need_abort: Aborting on change to admin_epoch
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<cib admin_epoch="0" epoch="102" num_updates="1">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<configuration>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<nodes>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<node id="b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<instance_attributes id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<nvpair value="on" id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</instance_attributes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</node>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke: Query 337: Requesting the current CIB: S_POLICY_ENGINE
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</nodes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</configuration>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</cib>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crm_attribute/4, version=0.102.1): ok (rc=0)
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke_callback: Invoking the PE: query=337, ref=pe_calc-dc-1305696782-441, seq=2, quorate=1
> May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-27.raw
> May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: Wrote version 0.102.0 of the CIB to disk (digest: 6014929506b4b9e2eccb8e741e6e2e6f)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_config: On loss of CCM Quorum: Ignore
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> May 17 22:33:02 ha1.iohost.com cib: [1591]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.vRGjiM (digest: /var/lib/heartbeat/crm/cib.iJf2S7)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha1.iohost.com is in standby-mode
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: determine_online_status: Node ha1.iohost.com is standby
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha2.iohost.com is in standby-mode
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: determine_online_status: Node ha2.iohost.com is standby
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: WARN: unpack_status: Node ha1.iohost.com in status section no longer exists
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: Operation ip1arp_monitor_0 found resource ip1arp active on ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_webfs:1 on ha1.iohost.com to drbd_webfs:2 (ORPHAN)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_mysql:1 on ha1.iohost.com to drbd_mysql:2 (ORPHAN)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_mysql:0 on ha2.iohost.com to drbd_mysql:1
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: Managed write_cib_contents process 1591 exited with return code 0.
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: Operation ip1arp_monitor_0 found resource ip1arp active on ha2.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_webfs:0 on ha2.iohost.com to drbd_webfs:1
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha1.iohost.com is unknown
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: group_print: Resource Group: WebServices
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: ip1 (ocf::heartbeat:IPaddr2): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: ip1arp (ocf::heartbeat:SendArp): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: fs_webfs (ocf::heartbeat:Filesystem): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: fs_mysql (ocf::heartbeat:Filesystem): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: apache2 (lsb:httpd): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: mysql (ocf::heartbeat:mysql): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Masters: [ ha1.iohost.com ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Stopped: [ drbd_mysql:1 ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print: Master/Slave Set: ms_drbd_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Masters: [ ha1.iohost.com ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Stopped: [ drbd_webfs:1 ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1arp: Rolling back scores from fs_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1arp: Rolling back scores from ip1
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1arp cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1: Rolling back scores from ip1arp
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from fs_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_webfs:0 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_webfs:1 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from fs_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: fs_webfs: Rolling back scores from fs_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource fs_webfs cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from fs_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from fs_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:0 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:1 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from apache2
>
>
> On 5/17/2011 9:28 PM, Randy Katz wrote:
>> Hi,
>>
>> Relatively new to HA, though I have been using Xen and reading
>> this list here and there; now I need some help.
>>
>> I have 2 physical nodes; let's call them node1/node2.
>> On each I have VMs (Xen paravirt / ha1 & ha2). In each VM I have
>> 2 LVs which are DRBD'd (r0 and r1: mysql data and html data). There is a
>> VIP between them, resolving the website, which is a simple
>> WordPress blog (so it has a database), and it works well.
>>
>> When I start them (reboot the VMs) they come up fine: ha1 is
>> online (primary) and ha2 is standby (secondary). If I:
>>
>> 1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes
>> over, sometimes I am left with both nodes on standby; not sure why.
>> 2. If both nodes are in standby and I issue crm node online ha1.iohost.com,
>> sometimes ha2 will become active (as it should have when ha1
>> went standby), sometimes ha1 will become active, and sometimes they will
>> both remain standby; not sure why.
>>
>> Question: How do I test and debug this? What parameters in which config
>> file affect this behavior?
>>
>> Thank you in advance,
>> Randy
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
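On the "how do I test and debug this" question quoted above, a minimal sketch of the standard Pacemaker inspection commands (assuming the pacemaker CLI tools of that era are installed; exact flags vary slightly between versions, so check the man pages):

```shell
# Check the live configuration for constraint/syntax problems
crm_verify -L -V

# One-shot cluster status, including fail counts and failed actions
crm_mon -1 -f

# Show the placement scores the policy engine computed for the live CIB
# (ptest on older Pacemaker; crm_simulate replaces it in newer releases)
ptest -L -s

# The policy engine archives its inputs (pe-input-*.bz2, typically under
# /var/lib/pengine/), so a past transition can be replayed offline with
# ptest -x against one of those files.
```

Comparing the scores output before and after `crm node standby ha1.iohost.com` is usually the quickest way to see why a resource "cannot run anywhere", as the pengine lines in the logs above report.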