So I fixed the problem with the hostname in drbd.conf and with the node name from the cluster's point of view. I also configured and verified the fence_vmware agent and enabled STONITH. I changed the DRBD resource configuration to:
resource ovirt {
  disk {
    disk-flushes no;
    md-flushes no;
    fencing resource-and-stonith;
  }
  device minor 0;
  disk /dev/sdb;
  syncer {
    rate 30M;
    verify-alg md5;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}

I put

  <cman expected_votes="1" two_node="1"/>

in cluster.conf and restarted pacemaker and cman on both nodes. The service was active on ovirteng01.

I then provoked a power off of ovirteng01. The fencing agent worked correctly on ovirteng02 and rebooted ovirteng01. I stopped the boot of ovirteng01 at the grub prompt to simulate a problem during boot (for example, the system dropping to console mode due to a filesystem problem). In the meantime ovirteng02 became master of the DRBD resource, but it did not start the group.

This is in /var/log/messages:

Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: PingAck did not arrive in time.
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: asender terminated
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: Terminating drbd_a_ovirt
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: Connection closed
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: conn( NetworkFailure -> Unconnected )
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: receiver terminated
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: Restarting receiver thread
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: receiver (re)started
Mar 8 01:08:00 ovirteng02 kernel: drbd ovirt: conn( Unconnected -> WFConnection )
Mar 8 01:08:02 ovirteng02 corosync[12908]: [TOTEM ] A processor failed, forming new configuration.
Mar 8 01:08:04 ovirteng02 corosync[12908]: [QUORUM] Members[1]: 2
Mar 8 01:08:04 ovirteng02 corosync[12908]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 8 01:08:04 ovirteng02 corosync[12908]: [CPG ] chosen downlist: sender r(0) ip(192.168.33.46) ; members(old:2 left:1)
Mar 8 01:08:04 ovirteng02 corosync[12908]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 8 01:08:04 ovirteng02 kernel: dlm: closing connection to node 1
Mar 8 01:08:04 ovirteng02 crmd[13168]: notice: crm_update_peer_state: cman_event_callback: Node ovirteng01.localdomain.local[1] - state is now lost (was member)
Mar 8 01:08:04 ovirteng02 crmd[13168]: warning: reap_dead_nodes: Our DC node (ovirteng01.localdomain.local) left the cluster
Mar 8 01:08:04 ovirteng02 crmd[13168]: notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=reap_dead_nodes ]
Mar 8 01:08:04 ovirteng02 crmd[13168]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Mar 8 01:08:04 ovirteng02 fenced[12962]: fencing node ovirteng01.localdomain.local
Mar 8 01:08:04 ovirteng02 attrd[13166]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Mar 8 01:08:04 ovirteng02 attrd[13166]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-OvirtData (10000)
Mar 8 01:08:04 ovirteng02 attrd[13166]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Mar 8 01:08:04 ovirteng02 fence_pcmk[13733]: Requesting Pacemaker fence ovirteng01.localdomain.local (reset)
Mar 8 01:08:04 ovirteng02 stonith_admin[13734]: notice: crm_log_args: Invoked: stonith_admin --reboot ovirteng01.localdomain.local --tolerance 5s --tag cman
Mar 8 01:08:04 ovirteng02 stonith-ng[13164]: notice: handle_request: Client stonith_admin.cman.13734.5528351f wants to fence (reboot) 'ovirteng01.localdomain.local' with device '(any)'
Mar 8 01:08:04 ovirteng02 stonith-ng[13164]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ovirteng01.localdomain.local: 1e70a341-efbf-470a-bcaa-886a8acfa9d1 (0)
Mar 8 01:08:04 ovirteng02 stonith-ng[13164]: notice: can_fence_host_with_device: Fencing can fence ovirteng01.localdomain.local (aka. 'ovirteng01'): static-list
Mar 8 01:08:04 ovirteng02 stonith-ng[13164]: notice: can_fence_host_with_device: Fencing can fence ovirteng01.localdomain.local (aka. 'ovirteng01'): static-list
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: pe_fence_node: Node ovirteng01.localdomain.local will be fenced because the node is no longer part of the cluster
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: determine_online_status: Node ovirteng01.localdomain.local is unclean
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action ip_OvirtData_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action ip_OvirtData_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action lvm_ovirt_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action lvm_ovirt_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action fs_OvirtData_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action fs_OvirtData_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action pgsql_OvirtData_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action pgsql_OvirtData_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action p_lsb_nfs_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action p_lsb_nfs_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action p_exportfs_root_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action p_exportfs_root_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action p_exportfs_iso_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action p_exportfs_iso_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action ovirt-engine_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action ovirt-engine_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action ovirt-websocket-proxy_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action ovirt-websocket-proxy_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action httpd_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action httpd_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: custom_action: Action Fencing_stop_0 on ovirteng01.localdomain.local is unrunnable (offline)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: stage6: Scheduling Node ovirteng01.localdomain.local for STONITH
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Demote OvirtData:0#011(Master -> Stopped ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Promote OvirtData:1#011(Slave -> Master ovirteng02.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop ip_OvirtData#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop lvm_ovirt#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop fs_OvirtData#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop pgsql_OvirtData#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop p_lsb_nfs#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop p_exportfs_root#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop p_exportfs_iso#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop ovirt-engine#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop ovirt-websocket-proxy#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Stop httpd#011(ovirteng01.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: notice: LogActions: Move Fencing#011(Started ovirteng01.localdomain.local -> ovirteng02.localdomain.local)
Mar 8 01:08:05 ovirteng02 pengine[13167]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-5.bz2
Mar 8 01:08:05 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 1: cancel OvirtData_cancel_31000 on ovirteng02.localdomain.local (local)
Mar 8 01:08:05 ovirteng02 crmd[13168]: notice: te_fence_node: Executing reboot fencing operation (53) on ovirteng01.localdomain.local (timeout=60000)
Mar 8 01:08:05 ovirteng02 stonith-ng[13164]: notice: handle_request: Client crmd.13168.426620a0 wants to fence (reboot) 'ovirteng01.localdomain.local' with device '(any)'
Mar 8 01:08:05 ovirteng02 stonith-ng[13164]: notice: merge_duplicates: Merging stonith action reboot for node ovirteng01.localdomain.local originating from client crmd.13168.3f0b1143 with identical request from stonith_admin.cman.13734@ovirteng02.localdomain.local.1e70a341 (144s)
Mar 8 01:08:05 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 75: notify OvirtData_pre_notify_demote_0 on ovirteng02.localdomain.local (local)
Mar 8 01:08:05 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation OvirtData_notify_0 (call=82, rc=0, cib-update=0, confirmed=true) ok
Mar 8 01:08:17 ovirteng02 stonith-ng[13164]: notice: log_operation: Operation 'reboot' [13736] (call 2 from stonith_admin.cman.13734) for host 'ovirteng01.localdomain.local' with device 'Fencing' returned: 0 (OK)
Mar 8 01:08:17 ovirteng02 stonith-ng[13164]: notice: remote_op_done: Operation reboot of ovirteng01.localdomain.local by ovirteng02.localdomain.local for stonith_admin.cman.13734@ovirteng02.localdomain.local.1e70a341: OK
Mar 8 01:08:17 ovirteng02 stonith-ng[13164]: notice: remote_op_done: Operation reboot of ovirteng01.localdomain.local by ovirteng02.localdomain.local for crmd.13168@ovirteng02.localdomain.local.3f0b1143: OK
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: tengine_stonith_notify: Peer ovirteng01.localdomain.local was terminated (reboot) by ovirteng02.localdomain.local for ovirteng02.localdomain.local: OK (ref=1e70a341-efbf-470a-bcaa-886a8acfa9d1) by client stonith_admin.cman.13734
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: tengine_stonith_notify: Notified CMAN that 'ovirteng01.localdomain.local' is now fenced
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: tengine_stonith_callback: Stonith operation 2/53:0:0:c1041760-73fb-42e7-beda-7613fcf53fd6: OK (0)
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: tengine_stonith_notify: Peer ovirteng01.localdomain.local was terminated (reboot) by ovirteng02.localdomain.local for ovirteng02.localdomain.local: OK (ref=3f0b1143-0250-45ca-ab28-0fb18394d124) by client crmd.13168
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: tengine_stonith_notify: Notified CMAN that 'ovirteng01.localdomain.local' is now fenced
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: run_graph: Transition 0 (Complete=5, Pending=0, Fired=0, Skipped=32, Incomplete=13, Source=/var/lib/pacemaker/pengine/pe-warn-5.bz2): Stopped
Mar 8 01:08:17 ovirteng02 fenced[12962]: fence ovirteng01.localdomain.local success
Mar 8 01:08:17 ovirteng02 pengine[13167]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 8 01:08:17 ovirteng02 pengine[13167]: notice: LogActions: Promote OvirtData:0#011(Slave -> Master ovirteng02.localdomain.local)
Mar 8 01:08:17 ovirteng02 pengine[13167]: notice: LogActions: Start Fencing#011(ovirteng02.localdomain.local)
Mar 8 01:08:17 ovirteng02 pengine[13167]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-1081.bz2
Mar 8 01:08:17 ovirteng02 crmd[13168]: warning: destroy_action: Cancelling timer for action 1 (src=50)
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 36: start Fencing_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 57: notify OvirtData_pre_notify_promote_0 on ovirteng02.localdomain.local (local)
Mar 8 01:08:17 ovirteng02 stonith-ng[13164]: notice: stonith_device_register: Device 'Fencing' already existed in device list (1 active devices)
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation OvirtData_notify_0 (call=87, rc=0, cib-update=0, confirmed=true) ok
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 6: promote OvirtData_promote_0 on ovirteng02.localdomain.local (local)
Mar 8 01:08:17 ovirteng02 kernel: drbd ovirt: helper command: /sbin/drbdadm fence-peer ovirt
Mar 8 01:08:17 ovirteng02 crm-fence-peer.sh[13817]: invoked for ovirt
Mar 8 01:08:17 ovirteng02 cibadmin[13848]: notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="ms_OvirtData" id="drbd-fence-by-handler-ovirt-ms_OvirtData">#012 <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-ovirt-rule-ms_OvirtData">#012 <expression attribute="#uname" operation="ne" value="ovirteng02.localdomain.local" id="drbd-fence-by-handler-ovirt-expr-ms_OvirtData"/>#012 </rule>#012</rsc_location>
Mar 8 01:08:17 ovirteng02 stonith-ng[13164]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: Diff: --- 0.269.29
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: Diff: +++ 0.270.1 128fad6a0899ee7020947394d4e75449
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: -- <cib admin_epoch="0" epoch="269" num_updates="29"/>
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: ++ <rsc_location rsc="ms_OvirtData" id="drbd-fence-by-handler-ovirt-ms_OvirtData">
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: ++ <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-ovirt-rule-ms_OvirtData">
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: ++ <expression attribute="#uname" operation="ne" value="ovirteng02.localdomain.local" id="drbd-fence-by-handler-ovirt-expr-ms_OvirtData"/>
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: ++ </rule>
Mar 8 01:08:17 ovirteng02 cib[13163]: notice: cib:diff: ++ </rsc_location>
Mar 8 01:08:17 ovirteng02 crm-fence-peer.sh[13817]: INFO peer is fenced, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-ovirt-ms_OvirtData'
Mar 8 01:08:17 ovirteng02 kernel: drbd ovirt: helper command: /sbin/drbdadm fence-peer ovirt exit code 7 (0x700)
Mar 8 01:08:17 ovirteng02 kernel: drbd ovirt: fence-peer helper returned 7 (peer was stonithed)
Mar 8 01:08:17 ovirteng02 kernel: drbd ovirt: pdsk( DUnknown -> Outdated )
Mar 8 01:08:17 ovirteng02 kernel: block drbd0: role( Secondary -> Primary )
Mar 8 01:08:17 ovirteng02 kernel: block drbd0: new current UUID 588C3417F90691DB:8168E91059172F68:62F6D4ABA7053F86:62F5D4ABA7053F87
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation OvirtData_promote_0 (call=90, rc=0, cib-update=52, confirmed=true) ok
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 58: notify OvirtData_post_notify_promote_0 on ovirteng02.localdomain.local (local)
Mar 8 01:08:17 ovirteng02 stonith-ng[13164]: notice: stonith_device_register: Device 'Fencing' already existed in device list (1 active devices)
Mar 8 01:08:17 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation OvirtData_notify_0 (call=93, rc=0, cib-update=0, confirmed=true) ok
Mar 8 01:08:26 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation Fencing_start_0 (call=85, rc=0, cib-update=53, confirmed=true) ok
Mar 8 01:08:26 ovirteng02 crmd[13168]: notice: run_graph: Transition 1 (Complete=10, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1081.bz2): Stopped
Mar 8 01:08:26 ovirteng02 pengine[13167]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 8 01:08:26 ovirteng02 pengine[13167]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-1082.bz2
Mar 8 01:08:26 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 8: monitor OvirtData_monitor_29000 on ovirteng02.localdomain.local (local)
Mar 8 01:08:26 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 39: monitor Fencing_monitor_600000 on ovirteng02.localdomain.local (local)
Mar 8 01:08:26 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation OvirtData_monitor_29000 (call=97, rc=8, cib-update=55, confirmed=false) master
Mar 8 01:08:26 ovirteng02 crmd[13168]: notice: process_lrm_event: ovirteng02.localdomain.local-OvirtData_monitor_29000:97 [ \n ]
Mar 8 01:08:33 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation Fencing_monitor_600000 (call=99, rc=0, cib-update=56, confirmed=false) ok
Mar 8 01:08:33 ovirteng02 crmd[13168]: notice: run_graph: Transition 2 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1082.bz2): Complete
Mar 8 01:08:33 ovirteng02 crmd[13168]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

The situation then remains:

Last updated: Sat Mar 8 01:08:33 2014
Last change: Sat Mar 8 01:08:17 2014 via cibadmin on ovirteng02.localdomain.local
Stack: cman
Current DC: ovirteng02.localdomain.local - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
13 Resources configured

Online: [ ovirteng02.localdomain.local ]
OFFLINE: [ ovirteng01.localdomain.local ]

Master/Slave Set: ms_OvirtData [OvirtData]
    Masters: [ ovirteng02.localdomain.local ]
    Stopped: [ ovirteng01.localdomain.local ]
Fencing (stonith:fence_vmware): Started ovirteng02.localdomain.local

I have to manually run (ovirt is the name of my group):

# pcs resource clear ovirt

and only then do I get:

# crm_mon -1
Last updated: Sat Mar 8 01:19:52 2014
Last change: Sat Mar 8 01:19:18 2014 via crm_resource on ovirteng02.localdomain.local
Stack: cman
Current DC: ovirteng02.localdomain.local - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
13 Resources configured

Online: [ ovirteng02.localdomain.local ]
OFFLINE: [ ovirteng01.localdomain.local ]

Master/Slave Set: ms_OvirtData [OvirtData]
    Masters: [ ovirteng02.localdomain.local ]
    Stopped: [ ovirteng01.localdomain.local ]
Resource Group: ovirt
    ip_OvirtData (ocf::heartbeat:IPaddr2): Started ovirteng02.localdomain.local
    lvm_ovirt (ocf::heartbeat:LVM): Started ovirteng02.localdomain.local
    fs_OvirtData (ocf::heartbeat:Filesystem): Started ovirteng02.localdomain.local
    pgsql_OvirtData (lsb:postgresql): Started ovirteng02.localdomain.local
    p_lsb_nfs (lsb:nfs): Started ovirteng02.localdomain.local
    p_exportfs_root (ocf::heartbeat:exportfs): Started ovirteng02.localdomain.local
    p_exportfs_iso (ocf::heartbeat:exportfs): Started ovirteng02.localdomain.local
    ovirt-engine (lsb:ovirt-engine): Started ovirteng02.localdomain.local
    ovirt-websocket-proxy (lsb:ovirt-websocket-proxy): Started ovirteng02.localdomain.local
    httpd (ocf::heartbeat:apache): Started ovirteng02.localdomain.local
Fencing (stonith:fence_vmware): Started ovirteng02.localdomain.local

So where am I going wrong, given that the group doesn't start automatically on ovirteng02?
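As an aside, when a group stays stopped until a manual "pcs resource clear", one thing worth checking is a leftover location constraint in the CIB, for example a "cli-ban-*" rule created by an earlier "pcs resource move" or "pcs resource ban". On a live node one would look at "pcs constraint --full" or "cibadmin --query --scope constraints"; below is a minimal offline sketch of the same check, using a hypothetical sample constraints section written in the shape pacemaker 1.1 uses (the sample XML and the /tmp path are illustrative, not taken from the cluster above):

```shell
# Hypothetical sample of a CIB constraints section; on a live cluster
# this would come from: cibadmin --query --scope constraints
cat > /tmp/constraints.xml <<'EOF'
<constraints>
  <rsc_location id="cli-ban-ovirt-on-ovirteng02.localdomain.local"
                rsc="ovirt" role="Started"
                node="ovirteng02.localdomain.local" score="-INFINITY"/>
</constraints>
EOF

# List ids of ban constraints left behind by pcs resource move/ban;
# each match is a constraint that `pcs resource clear <resource>` would remove.
grep -o 'id="cli-ban[^"]*"' /tmp/constraints.xml
```

Running the grep on the sample prints the one ban id, id="cli-ban-ovirt-on-ovirteng02.localdomain.local"; an empty result would mean no cli-ban leftovers in the dump.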
These are the lines on ovirteng02 right after the clear command (ovirteng01 is still at the grub prompt).

Last output before the clear:

Mar 8 01:08:33 ovirteng02 crmd[13168]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

After the clear:

Mar 8 01:19:18 ovirteng02 crmd[13168]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar 8 01:19:18 ovirteng02 stonith-ng[13164]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 8 01:19:18 ovirteng02 cib[13163]: notice: cib:diff: Diff: --- 0.270.5
Mar 8 01:19:18 ovirteng02 cib[13163]: notice: cib:diff: Diff: +++ 0.271.1 936dd803304ac6abd83dd63717139bb7
Mar 8 01:19:18 ovirteng02 cib[13163]: notice: cib:diff: -- <rsc_location id="cli-ban-ovirt-on-ovirteng02.localdomain.local" rsc="ovirt" role="Started" node="ovirteng02.localdomain.local" score="-INFINITY"/>
Mar 8 01:19:18 ovirteng02 cib[13163]: notice: cib:diff: ++ <cib admin_epoch="0" cib-last-written="Sat Mar 8 01:19:18 2014" crm_feature_set="3.0.7" epoch="271" have-quorum="1" num_updates="1" update-client="crm_resource" update-origin="ovirteng02.localdomain.local" validate-with="pacemaker-1.2" dc-uuid="ovirteng02.localdomain.local"/>
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start ip_OvirtData#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start lvm_ovirt#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start fs_OvirtData#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start pgsql_OvirtData#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start p_lsb_nfs#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start p_exportfs_root#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start p_exportfs_iso#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start ovirt-engine#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start ovirt-websocket-proxy#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: LogActions: Start httpd#011(ovirteng02.localdomain.local)
Mar 8 01:19:18 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 34: start ip_OvirtData_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:18 ovirteng02 pengine[13167]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-1083.bz2
Mar 8 01:19:18 ovirteng02 stonith-ng[13164]: notice: stonith_device_register: Device 'Fencing' already existed in device list (1 active devices)
Mar 8 01:19:18 ovirteng02 IPaddr2(ip_OvirtData)[14528]: INFO: Adding inet address 192.168.33.47/24 with broadcast address 192.168.33.255 to device eth0
Mar 8 01:19:18 ovirteng02 IPaddr2(ip_OvirtData)[14528]: INFO: Bringing device eth0 up
Mar 8 01:19:18 ovirteng02 IPaddr2(ip_OvirtData)[14528]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.33.47 eth0 192.168.33.47 auto not_used not_used
Mar 8 01:19:18 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation ip_OvirtData_start_0 (call=103, rc=0, cib-update=58, confirmed=true) ok
Mar 8 01:19:18 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 35: monitor ip_OvirtData_monitor_60000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:18 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 36: start lvm_ovirt_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:18 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation ip_OvirtData_monitor_60000 (call=106, rc=0, cib-update=59, confirmed=false) ok
Mar 8 01:19:19 ovirteng02 LVM(lvm_ovirt)[14599]: INFO: Activating volume group VG_OVIRT
Mar 8 01:19:19 ovirteng02 LVM(lvm_ovirt)[14599]: INFO: Reading all physical volumes. This may take a while... Found volume group "rootvg" using metadata type lvm2 Found volume group "VG_OVIRT" using metadata type lvm2
Mar 8 01:19:19 ovirteng02 LVM(lvm_ovirt)[14599]: INFO: 1 logical volume(s) in volume group "VG_OVIRT" now active
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation lvm_ovirt_start_0 (call=108, rc=0, cib-update=60, confirmed=true) ok
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 37: monitor lvm_ovirt_monitor_60000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 38: start fs_OvirtData_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation lvm_ovirt_monitor_60000 (call=112, rc=0, cib-update=61, confirmed=false) ok
Mar 8 01:19:19 ovirteng02 Filesystem(fs_OvirtData)[14693]: INFO: Running start for /dev/VG_OVIRT/LV_OVIRT on /shared
Mar 8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): warning: maximal mount count reached, running e2fsck is recommended
Mar 8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): 1 orphan inode deleted
Mar 8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): recovery complete
Mar 8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts:
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation fs_OvirtData_start_0 (call=114, rc=0, cib-update=62, confirmed=true) ok
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 39: monitor fs_OvirtData_monitor_60000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 40: start pgsql_OvirtData_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:19 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation fs_OvirtData_monitor_60000 (call=118, rc=0, cib-update=63, confirmed=false) ok
Mar 8 01:19:21 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation pgsql_OvirtData_start_0 (call=120, rc=0, cib-update=64, confirmed=true) ok
Mar 8 01:19:21 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 41: monitor pgsql_OvirtData_monitor_30000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:21 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 42: start p_lsb_nfs_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:21 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation pgsql_OvirtData_monitor_30000 (call=124, rc=0, cib-update=65, confirmed=false) ok
Mar 8 01:19:21 ovirteng02 rpc.mountd[14884]: Version 1.2.3 starting
Mar 8 01:19:22 ovirteng02 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Mar 8 01:19:22 ovirteng02 kernel: NFSD: starting 90-second grace period
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation p_lsb_nfs_start_0 (call=126, rc=0, cib-update=66, confirmed=true) ok
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 43: monitor p_lsb_nfs_monitor_30000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 44: start p_exportfs_root_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO: Directory /shared/var/lib/exports is not exported to 0.0.0.0/0.0.0.0 (stopped).
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO: Exporting file system ...
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO: exporting 0.0.0.0/0.0.0.0:/shared/var/lib/exports
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation p_lsb_nfs_monitor_30000 (call=130, rc=0, cib-update=67, confirmed=false) ok
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO: File system exported
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation p_exportfs_root_start_0 (call=132, rc=0, cib-update=68, confirmed=true) ok
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 45: monitor p_exportfs_root_monitor_30000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 46: start p_exportfs_iso_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14990]: INFO: Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0 (started).
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO: Directory /shared/var/lib/exports/iso is not exported to 0.0.0.0/0.0.0.0 (stopped).
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation p_exportfs_root_monitor_30000 (call=136, rc=0, cib-update=69, confirmed=false) ok
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO: Exporting file system ...
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO: exporting 0.0.0.0/0.0.0.0:/shared/var/lib/exports/iso
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO: File system exported
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation p_exportfs_iso_start_0 (call=138, rc=0, cib-update=70, confirmed=true) ok
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 47: monitor p_exportfs_iso_monitor_30000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 48: start ovirt-engine_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[15034]: INFO: Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0 (started).
Mar 8 01:19:22 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation p_exportfs_iso_monitor_30000 (call=142, rc=0, cib-update=71, confirmed=false) ok
Mar 8 01:19:27 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation ovirt-engine_start_0 (call=144, rc=0, cib-update=72, confirmed=true) ok
Mar 8 01:19:27 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 49: monitor ovirt-engine_monitor_300000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:27 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 50: start ovirt-websocket-proxy_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:27 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation ovirt-engine_monitor_300000 (call=148, rc=0, cib-update=73, confirmed=false) ok
Mar 8 01:19:33 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation ovirt-websocket-proxy_start_0 (call=150, rc=0, cib-update=74, confirmed=true) ok
Mar 8 01:19:33 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 51: monitor ovirt-websocket-proxy_monitor_30000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:33 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 52: start httpd_start_0 on ovirteng02.localdomain.local (local)
Mar 8 01:19:33 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation ovirt-websocket-proxy_monitor_30000 (call=154, rc=0, cib-update=75, confirmed=false) ok
Mar 8 01:19:33 ovirteng02 apache(httpd)[15268]: INFO: apache not running
Mar 8 01:19:33 ovirteng02 apache(httpd)[15268]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
Mar 8 01:19:35 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation httpd_start_0 (call=156, rc=0, cib-update=76, confirmed=true) ok
Mar 8 01:19:35 ovirteng02 crmd[13168]: notice: te_rsc_command: Initiating action 53: monitor httpd_monitor_5000 on ovirteng02.localdomain.local (local)
Mar 8 01:19:35 ovirteng02 crmd[13168]: notice: process_lrm_event: LRM operation httpd_monitor_5000 (call=160, rc=0, cib-update=77, confirmed=false) ok
Mar 8 01:19:35 ovirteng02 crmd[13168]: notice: run_graph: Transition 3 (Complete=22, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1083.bz2): Complete
Mar 8 01:19:35 ovirteng02 crmd[13168]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 8 01:19:52 ovirteng02 exportfs(p_exportfs_root)[15621]: INFO: Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0 (started).
Mar 8 01:19:52 ovirteng02 exportfs(p_exportfs_iso)[15639]: INFO: Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0 (started).
Mar 8 01:20:22 ovirteng02 exportfs(p_exportfs_root)[16072]: INFO: Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0 (started).
Mar 8 01:20:22 ovirteng02 exportfs(p_exportfs_iso)[16094]: INFO: Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0 (started).
Mar 8 01:20:52 ovirteng02 exportfs(p_exportfs_root)[16444]: INFO: Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0 (started).
Mar 8 01:20:52 ovirteng02 exportfs(p_exportfs_iso)[16455]: INFO: Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0 (started).

Thanks in advance for any clarification.
Gianluca

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org