Re: [Pacemaker] resource is too active problem in a 2-node cluster

Andrew Beekhof Wed, 19 Feb 2014 15:18:07 -0800

On 20 Feb 2014, at 8:29 am, Aggarwal, Ajay <aaggar...@verizon.com> wrote:


> I suspected the monitor action myself, based on the error messages. But the 
> script returns either OCF_SUCCESS (0) or OCF_NOT_RUNNING (7) for the monitor 
> action. I ran it manually too to confirm.

I'm going to have to say otherwise:

> Feb 04 11:27:38 [45168] gol-5-7-0       crmd:  warning: status_from_rc: Action 8 (GOL-HA_monitor_0) on gol-5-7-6 failed (target: 7 vs. rc: 1): Error

This indicates the agent returned an error (1) for the probe on gol-5-7-6,
where OCF_NOT_RUNNING (7) was expected.
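
For illustration only, here is a rough sketch of a monitor that behaves the way
a probe (the monitor_0 operation in the log above) expects. This is not your
script: the function name gol_monitor and the pid file path are made up, and it
assumes the service records its pid in a file. The point is that every
"resource is simply not running here" case returns 7, never 1.

```shell
#!/bin/sh
# Sketch of an OCF-style monitor. The function name and PIDFILE path are
# hypothetical; adapt the liveness check to however the real service works.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1   # reserve this for real failures, not "cleanly stopped"
OCF_NOT_RUNNING=7

# Hypothetical pid file location; can be overridden from the environment.
PIDFILE="${PIDFILE:-/var/run/gol-ha.pid}"

gol_monitor() {
    # No pid file: the service was never started on this node, which is
    # the normal situation when Pacemaker probes a node.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$PIDFILE" 2>/dev/null)
    # Empty or unreadable pid file: treat it as stopped rather than failed.
    [ -n "$pid" ] || return $OCF_NOT_RUNNING

    # kill -0 sends no signal; it only checks that the process exists
    # (assumes the agent runs as root, as it does under the cluster).
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Stale pid file: the process is gone, so report "not running".
    return $OCF_NOT_RUNNING
}
```

When checking by hand, run "script.sh monitor; echo $?" on the node where the
resource is stopped and look for 7. Keep in mind that the cluster runs the
agent with its own environment, so a manual run from a login shell may not
take the same code path.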

> ________________________________________
> From: Andrew Beekhof [and...@beekhof.net]
> Sent: Monday, February 17, 2014 6:46 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] resource is too active problem in a 2-node cluster
> 
> On 18 Feb 2014, at 5:33 am, Ajay Aggarwal <aaggar...@verizon.com> wrote:
> 
>> Thanks Andrew for pointing towards the OCF resource agent's list of "must 
>> implement" actions. I noticed that our OCF script only implements start, 
>> stop and monitor. It does not implement meta-data and validate-all.  Could 
>> this error be a result of these un-implemented actions?
> 
> Unlikely. More likely the monitor action is not correctly returning 
> OCF_NOT_RUNNING if run before the resource is running.
> 
>> On 02/16/2014 09:15 PM, Andrew Beekhof wrote:
>>> On 12 Feb 2014, at 1:39 am, Ajay Aggarwal <aaggar...@verizon.com>
>>> wrote:
>>> 
>>> 
>>>> Yes, we have cman (version: cman-3.0.12.1-49). We use manual fencing (I 
>>>> know it is not recommended). There is an external monitoring and fencing 
>>>> service that we use (our own).
>>>> 
>>>> Perhaps the subject line "resource is too active problem in a 2-node 
>>>> cluster" was misleading. The real problem is that the resource is *NOT* 
>>>> too active, but pacemaker thinks it is.
>>>> 
>>> It only thinks what the resource agent tells us.
>>> Sounds like script.sh isn't OCF compliant.
>>> 
>>> 
>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_actions.html
>>> 
>>> 
>>> 
>>>> This leads to an undesirable recovery procedure. See the log lines below:
>>>> 
>>>> Feb 04 11:27:38 [45167] gol-5-7-0    pengine:  warning: unpack_rsc_op: Processing failed op monitor for GOL-HA on gol-5-7-0: unknown error (1)
>>>> Feb 04 11:27:38 [45167] gol-5-7-0    pengine:  warning: unpack_rsc_op: Processing failed op monitor for GOL-HA on gol-5-7-6: unknown error (1)
>>>> Feb 04 11:27:38 [45167] gol-5-7-0    pengine:    error: native_create_actions: Resource GOL-HA (ocf::script.sh) is active on 2 nodes attempting recovery
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 02/10/2014 09:43 PM, Digimer wrote:
>>>> 
>>>>> On 10/02/14 09:13 PM, Aggarwal, Ajay wrote:
>>>>> 
>>>>>> I have a 2-node cluster with no-quorum-policy=ignore. I call these nodes 
>>>>>> node-0 and node-1. In addition, I have two cluster resources in a 
>>>>>> group: an IP address and an OCF script.
>>>>>> 
>>>>> Turning off quorum on a 2-node cluster is fine; in fact, it's required. 
>>>>> However, that makes stonith all the more important. Without stonith, in 
>>>>> any cluster but in particular on two-node clusters, things will not work 
>>>>> right.
>>>>> 
>>>>> First and foremost: configure stonith and test it to make sure it works.
>>>>> 
>>>>> 
>>>>>>   Pacemaker version: 1.1.10
>>>>>>   Corosync version: 1-4.1-15
>>>>>>   OS: CentOS 6.4
>>>>>> 
>>>>> With CentOS/RHEL 6, you need cman as well. Please be sure to also 
>>>>> configure fence_pcmk in cluster.conf to "hook" it into pacemaker's real 
>>>>> fencing.
>>>>> 
>>>>> 
>>>>>> What am I doing wrong?
>>>>>> 
>>>>> <snip>
>>>>> 
>>>>>>        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>>>>> 
>>>>> That. :)
>>>>> 
>>>>> Once you have stonith working, see if the problem remains.
>>>>> 
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> Pacemaker mailing list:
>>>> Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> 
>>>> Project Home:
>>>> http://www.clusterlabs.org
>>>> 
>>>> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> 
>>>> Bugs:
>>>> http://bugs.clusterlabs.org
>>> 
>>> 
>> 
> 
> 


