If I do, on ha2: crm node online ha2.iohost.com, it starts the VIP (it will ping), but it does not do the DRBD mounts and does not start the web or mysql services. If I then issue crm node online ha1.iohost.com on ha1, it will make ha2 online with all services active! Then if I put ha2 in standby, ha1 will come online with all services, just fine!
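For anyone reproducing this, the test sequence boils down to the following crm-shell commands (a sketch assuming Pacemaker's crm shell; the hostnames are my two VMs, and a live cluster is of course required for these to do anything):

```shell
# Put the primary in standby; ha2 should take over the resources
crm node standby ha1.iohost.com

# One-shot status snapshot to see what actually happened
crm_mon -1

# The sequence that (unexpectedly) works: bring ha2 online first...
crm node online ha2.iohost.com

# ...then only after ha1 is also brought online does ha2 start everything
crm node online ha1.iohost.com
```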
Any insights will be greatly appreciated, thanks!

Randy

On 5/17/2011 9:44 PM, Randy Katz wrote:
> In the logs, on ha2, I see at the time crm node standby ha1:
>
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-25.raw
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Wrote version 0.102.0 of the CIB to disk (digest: b445d9afde4b209981c3da08d4c24ecc)
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.f5FXZH (digest: /var/lib/heartbeat/crm/cib.irSIZ7)
> May 18 10:32:54 ha2.iohost.com cib: [7779]: info: Managed write_cib_contents process 2378 exited with return code 0.
> May 18 10:33:11 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
> May 18 10:35:14 ha2.iohost.com cib: [7779]: info: cib_stats: Processed 48 operations (13125.00us average, 0% utilization) in the last 10min
>
> And on ha1:
>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<cib admin_epoch="0" epoch="101" num_updates="23">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<configuration>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<nodes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<node id="b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<instance_attributes id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -<nvpair value="off" id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</instance_attributes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</node>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</nodes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</configuration>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: -</cib>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: need_abort: Aborting on change to admin_epoch
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<cib admin_epoch="0" epoch="102" num_updates="1">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<configuration>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<nodes>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<node id="b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<instance_attributes id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +<nvpair value="on" id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</instance_attributes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</node>
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke: Query 337: Requesting the current CIB: S_POLICY_ENGINE
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</nodes>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</configuration>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: +</cib>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crm_attribute/4, version=0.102.1): ok (rc=0)
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke_callback: Invoking the PE: query=337, ref=pe_calc-dc-1305696782-441, seq=2, quorate=1
> May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-27.raw
> May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: Wrote version 0.102.0 of the CIB to disk (digest: 6014929506b4b9e2eccb8e741e6e2e6f)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_config: On loss of CCM Quorum: Ignore
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> May 17 22:33:02 ha1.iohost.com cib: [1591]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.vRGjiM (digest: /var/lib/heartbeat/crm/cib.iJf2S7)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha1.iohost.com is in standby-mode
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: determine_online_status: Node ha1.iohost.com is standby
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha2.iohost.com is in standby-mode
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: determine_online_status: Node ha2.iohost.com is standby
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: WARN: unpack_status: Node ha1.iohost.com in status section no longer exists
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: Operation ip1arp_monitor_0 found resource ip1arp active on ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_webfs:1 on ha1.iohost.com to drbd_webfs:2 (ORPHAN)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_mysql:1 on ha1.iohost.com to drbd_mysql:2 (ORPHAN)
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_mysql:0 on ha2.iohost.com to drbd_mysql:1
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: Managed write_cib_contents process 1591 exited with return code 0.
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: Operation ip1arp_monitor_0 found resource ip1arp active on ha2.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_webfs:0 on ha2.iohost.com to drbd_webfs:1
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha1.iohost.com is unknown
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: group_print: Resource Group: WebServices
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: ip1 (ocf::heartbeat:IPaddr2): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: ip1arp (ocf::heartbeat:SendArp): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: fs_webfs (ocf::heartbeat:Filesystem): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: fs_mysql (ocf::heartbeat:Filesystem): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: apache2 (lsb:httpd): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: mysql (ocf::heartbeat:mysql): Started ha1.iohost.com
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Masters: [ ha1.iohost.com ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Stopped: [ drbd_mysql:1 ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print: Master/Slave Set: ms_drbd_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Masters: [ ha1.iohost.com ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Stopped: [ drbd_webfs:1 ]
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1arp: Rolling back scores from fs_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1arp: Rolling back scores from ip1
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1arp cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1: Rolling back scores from ip1arp
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from fs_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_webfs:0 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_webfs:1 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from fs_webfs
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: fs_webfs: Rolling back scores from fs_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource fs_webfs cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from apache2
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from fs_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from fs_mysql
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:0 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:1 cannot run anywhere
> May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from apache2
>
>
> On 5/17/2011 9:28 PM, Randy Katz wrote:
>> Hi,
>>
>> Relatively new to HA, though I have been using Xen and reading
>> this list here and there; now I need some help.
>>
>> I have 2 physical nodes; let's call them node1/node2.
>> On each I have VMs (Xen paravirt / ha1 & ha2). In each VM I have
>> 2 LVs which are DRBD'd (r0 and r1: mysql data and html data). There is a
>> VIP between them, resolving the website, which is a simple
>> WordPress blog (so it has a database), and it works well.
>>
>> When I start them (reboot the VMs) they come up fine: ha1 is
>> online (primary) and ha2 is standby (secondary). If I:
>>
>> 1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes
>> over, sometimes I am left with both nodes on standby; not sure why.
>> 2. If both nodes are in standby and I issue crm node online ha1.iohost.com,
>> sometimes ha2 will become active (as it should have when ha1
>> went standby), sometimes ha1 will become active, and sometimes they will
>> both remain standby; not sure why.
>>
>> Question: How do I test and debug this? What parameters in which config
>> file affect this behavior?
>>
>> Thank you in advance,
>> Randy
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
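On the "how do I test and debug this" question quoted above, a minimal sketch of the standard Pacemaker inspection commands (assuming the pacemaker CLI tools of that era are installed; exact flags vary slightly between versions, so check the man pages):

```shell
# Check the live configuration for constraint/syntax problems
crm_verify -L -V

# One-shot cluster status, including fail counts and failed actions
crm_mon -1 -f

# Show the placement scores the policy engine computed for the live CIB
# (ptest on older Pacemaker; crm_simulate replaces it in newer releases)
ptest -L -s

# The policy engine archives its inputs (pe-input-*.bz2, typically under
# /var/lib/pengine/), so a past transition can be replayed offline with
# ptest -x against one of those files.
```

Comparing the scores output before and after `crm node standby ha1.iohost.com` is usually the quickest way to see why a resource "cannot run anywhere", as the pengine lines in the logs above report.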