Setting no-quorum-policy to "ignore" and disabling stonith at the same time is not a good
idea. You're sort of inviting the cluster to do screwed up things: with neither quorum
nor fencing, nothing stops each node from acting as if it were the only one left.
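As a sketch of a safer baseline, assuming IPMI-capable hardware (the fence_ipmilan
agent and every parameter below are placeholders for whatever fencing your nodes
actually support):

    # one fence device per node; BMC addresses and credentials are placeholders
    crm configure primitive fence-soalaba56 stonith:fence_ipmilan \
        params pcmk_host_list="soalaba56" ipaddr="<bmc_ip_56>" \
            login="<user>" passwd="<password>" \
        op monitor interval="60s"
    crm configure primitive fence-soalaba63 stonith:fence_ipmilan \
        params pcmk_host_list="soalaba63" ipaddr="<bmc_ip_63>" \
            login="<user>" passwd="<password>" \
        op monitor interval="60s"
    # keep each fence device off the node it is meant to kill
    crm configure location l-fence-56 fence-soalaba56 -inf: soalaba56
    crm configure location l-fence-63 fence-soalaba63 -inf: soalaba63
    crm configure property stonith-enabled="true"

With working fencing, a node whose stop operation fails gets fenced instead of
leaving the resource stuck in the (unmanaged) FAILED state shown below.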
On 10/24/2011 08:23 AM, ihjaz Mohamed wrote:
Hi All,
I've Pacemaker running with Corosync. Following is my CRM configuration:
node soalaba56
node soalaba63
primitive FloatingIP ocf:heartbeat:IPaddr2 \
params ip="<floating_ip>" nic="eth0:0"
primitive acestatus lsb:acestatus
primitive pingd ocf:pacemaker:ping \
params host_list="<gateway_ip>" multiplier="100" \
op monitor interval="15s" timeout="5s"
group HAService FloatingIP acestatus \
meta target-role="Started"
clone pingdclone pingd \
meta globally-unique="false"
location ip1_location FloatingIP \
rule $id="ip1_location-rule" pingd: defined pingd
property $id="cib-bootstrap-options" \
dc-version="1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1305736421"
----------------------------------------------------------------------
When I reboot both nodes together, the cluster goes into an
(unmanaged) FAILED state, as shown below.
============
Last updated: Mon Oct 24 08:10:42 2011
Stack: openais
Current DC: soalaba63 - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ soalaba56 soalaba63 ]
Resource Group: HAService
     FloatingIP (ocf::heartbeat:IPaddr2): Started (unmanaged) FAILED [ soalaba63 soalaba56 ]
     acestatus  (lsb:acestatus): Stopped
Clone Set: pingdclone [pingd]
     Started: [ soalaba56 soalaba63 ]
Failed actions:
    FloatingIP_stop_0 (node=soalaba63, call=7, rc=1, status=complete): unknown error
    FloatingIP_stop_0 (node=soalaba56, call=7, rc=1, status=complete): unknown error
------------------------------------------------------------------------------
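(rc=1 from the stop operation is OCF_ERR_GENERIC, i.e. the agent itself reported a
generic failure. One way to see the real error, assuming the stock resource-agents
layout on these nodes, is to run the agent's stop action by hand with the same
parameters as the primitive:

    # run as root on one node; ip/nic values mirror the FloatingIP primitive
    export OCF_ROOT=/usr/lib/ocf
    export OCF_RESKEY_ip="<floating_ip>"
    export OCF_RESKEY_nic="eth0:0"
    bash -x /usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop; echo "rc=$?"

The -x trace usually points at the exact command inside the agent that fails.)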
This happens only when both nodes are rebooted simultaneously; if the
reboots are staggered, the problem does not appear. Looking into the
logs, I see that when the nodes come up the resources are started on
both nodes; the cluster then tries to stop the duplicate instances,
and the stop fails.
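(Note that a failed stop with fencing disabled leaves the resource unmanaged until
the failure is cleared by hand. Assuming the crm shell, the immediate way out of
the FAILED state is:

    # clear the recorded stop failure so the policy engine re-evaluates FloatingIP
    crm resource cleanup FloatingIP

That only clears the symptom, though; the double-start on simultaneous boot is the
thing that needs fixing.)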
I've attached the logs.
--
Alan Robertson <al...@unix.sh>
"Openness is the foundation and preservative of friendship... Let me claim from you
at all times your undisguised opinions." - William Wilberforce
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker