On 20 Feb 2014, at 8:29 am, Aggarwal, Ajay <aaggar...@verizon.com> wrote:
> I suspected monitor action myself, based on error messages. But the script is
> either returning OCF_SUCCESS (0) or OCF_NOT_RUNNING (7) for monitor action. I
> ran it manually too to confirm.

I'm going to have to say otherwise:

> Feb 04 11:27:38 [45168] gol-5-7-0 crmd: warning: status_from_rc:
> Action 8 (GOL-HA_monitor_0) on gol-5-7-6 failed (target: 7 vs. rc: 1): Error

This indicates the agent returned an error (1) for the initial probe, where
the cluster expected OCF_NOT_RUNNING (7).

> ________________________________________
> From: Andrew Beekhof [and...@beekhof.net]
> Sent: Monday, February 17, 2014 6:46 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] resource is too active problem in a 2-node cluster
>
> On 18 Feb 2014, at 5:33 am, Ajay Aggarwal <aaggar...@verizon.com> wrote:
>
>> Thanks Andrew for pointing towards the OCF resource agent's list of "must
>> implement" actions. I noticed that our OCF script only implements start,
>> stop and monitor. It does not implement meta-data and validate-all. Could
>> this error be a result of these un-implemented actions?
>
> Unlikely. More likely the monitor action is not correctly returning
> OCF_NOT_RUNNING if run before the resource is running.
>
>> On 02/16/2014 09:15 PM, Andrew Beekhof wrote:
>>> On 12 Feb 2014, at 1:39 am, Ajay Aggarwal <aaggar...@verizon.com>
>>> wrote:
>>>
>>>> Yes, we have cman (version: cman-3.0.12.1-49). We use manual fencing (I
>>>> know it is not recommended). There is an external monitoring and fencing
>>>> service that we use (our own).
>>>>
>>>> Perhaps the subject line "resource is too active problem in a 2-node
>>>> cluster" was misleading. The real problem is that the resource is *NOT*
>>>> too active, but pacemaker thinks it is.
>>>>
>>> It only thinks what the resource agent tells us.
>>> Sounds like script.sh isn't OCF compliant.
>>>
>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_actions.html
>>>
>>>> Which leads to an undesirable recovery procedure. See log lines below.
>>>>
>>>> Feb 04 11:27:38 [45167] gol-5-7-0 pengine: warning: unpack_rsc_op:
>>>> Processing failed op monitor for GOL-HA on gol-5-7-0: unknown error (1)
>>>> Feb 04 11:27:38 [45167] gol-5-7-0 pengine: warning: unpack_rsc_op:
>>>> Processing failed op monitor for GOL-HA on gol-5-7-6: unknown error (1)
>>>> Feb 04 11:27:38 [45167] gol-5-7-0 pengine: error:
>>>> native_create_actions: Resource GOL-HA (ocf::script.sh) is active on 2
>>>> nodes attempting recovery
>>>>
>>>> On 02/10/2014 09:43 PM, Digimer wrote:
>>>>
>>>>> On 10/02/14 09:13 PM, Aggarwal, Ajay wrote:
>>>>>
>>>>>> I have a 2 node cluster with no-quorum-policy=ignore. I call these
>>>>>> nodes node-0 and node-1. In addition, I have two cluster resources in
>>>>>> a group: an IP address and an OCF script.
>>>>>>
>>>>> Turning off quorum on a 2-node cluster is fine; in fact, it's required.
>>>>> However, that makes stonith all the more important. Without stonith, in
>>>>> any cluster but in particular on two-node clusters, things will not work
>>>>> right.
>>>>>
>>>>> First and foremost: configure stonith and test to make sure it works.
>>>>>
>>>>>> Pacemaker version: 1.1.10
>>>>>> Corosync version: 1-4.1-15
>>>>>> OS: CentOS 6.4
>>>>>>
>>>>> With CentOS/RHEL 6, you need cman as well. Please be sure to also
>>>>> configure fence_pcmk in cluster.conf to "hook" it into pacemaker's real
>>>>> fencing.
>>>>>
>>>>>> What am I doing wrong?
>>>>>>
>>>>> <snip>
>>>>>
>>>>>> <nvpair id="cib-bootstrap-options-stonith-enabled"
>>>>>> name="stonith-enabled" value="false"/>
>>>>>>
>>>>> That. :)
>>>>>
>>>>> Once you have stonith working, see if the problem remains.
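
For reference, a minimal sketch of what a probe-safe monitor action could look like. This is not the poster's script.sh, which the thread never shows. PIDFILE, its default path, and the gol_* function names are hypothetical, and the sketch assumes the service records its PID in a file. The return codes are the standard OCF values, set by hand here so the sketch runs standalone; a real agent would normally get them from ocf-shellfuncs.

```shell
#!/bin/sh
# Sketch only: PIDFILE and the gol_* names are hypothetical.

# Standard OCF return codes, defined by hand so this runs standalone.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Assumed location where the service writes its PID.
PIDFILE="${PIDFILE:-/var/run/gol-ha.pid}"

gol_monitor() {
    # No PID file: the service was never started on this node. The initial
    # probe (monitor_0) on an idle node has to end up here and return 7.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$PIDFILE" 2>/dev/null)

    # Empty PID file or dead process: this sketch reports "not running"
    # instead of letting a failed command leak out as rc 1.
    [ -n "$pid" ] || return $OCF_NOT_RUNNING
    kill -0 "$pid" 2>/dev/null || return $OCF_NOT_RUNNING

    return $OCF_SUCCESS
}

gol_dispatch() {
    # Map the action name passed by the cluster to a handler.
    case "$1" in
        monitor) gol_monitor ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

A real agent would end with something like `gol_dispatch "$1"; exit $?`. The case to check by hand is the one the log complains about: run the monitor action on a node where the resource has never been started and look at `$?`. Anything other than 7 there is reported as a failed probe, and the cluster then treats the resource as active on that node.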
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org