Re: [Pacemaker] OCF Resource agent monitor activity failed due to temporary error

Kulovits Christian - OS ITSC Thu, 19 Apr 2012 07:52:01 -0700

>You want pacemaker to ignore monitor errors on all unknown return values
>and go on with monitoring until a resource "heals" itself?

Definitely not. I do not want Pacemaker to ignore all unknown return values.
I always thought that Pacemaker is a tool for HA.

>.... please rethink ... it is a resource agent's job to reliably tell
>pacemaker the definite resource state -- and "uhm, hm, don't know now
>please try later" can be anything -- and how to find that out is very
>specific to the resource. IMHO it makes no sense at all to
>let the cluster manager do this work.

I do not want the cluster manager to do this work. Instead, a way to retry
an RA monitor action in the next interval should be provided.

In this specific case a whole application became unavailable only because the
external command that checks the resource state was temporarily unavailable. The
resource itself was available until Pacemaker restarted it. Retrying the
command until it succeeds is an option, but only until the specified timeout
expires; beyond that the RA has no way to avoid the restart. I think it would
be a nice feature to let the RA return a value for on-fail. If the RA could
return on-fail=block (don't perform any further operations on the resource) and
Pacemaker then set the resource unmanaged, the resource would stay available.
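
As a rough illustration of the retry-until-timeout option mentioned above, a
monitor action could look something like the sketch below. It is not code from
any existing RA: `probe`, `budget_s` and `delay_s` are made-up names, and the
budget is assumed to be set somewhat below the monitor operation timeout
configured in Pacemaker, so that the script returns on its own before it is
killed. Only the three exit codes are the standard OCF values.

```python
import time
from typing import Callable, Optional

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_with_retry(
    probe: Callable[[], Optional[bool]],
    budget_s: float,
    delay_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry `probe` until it gives a definite answer or the budget runs out.

    `probe` is a hypothetical wrapper around the external status command:
    True = running, False = not running, None = "try later".
    """
    deadline = clock() + budget_s
    while True:
        state = probe()
        if state is True:
            return OCF_SUCCESS
        if state is False:
            return OCF_NOT_RUNNING
        # Status is temporarily unknown: retry only if another attempt fits.
        if clock() + delay_s >= deadline:
            return OCF_ERR_GENERIC
        sleep(delay_s)
```

Returning a generic error once the budget is used up still triggers recovery,
so this only narrows the window in which a temporary "don't know" leads to a
restart. It does not give the on-fail=block behaviour proposed here.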

>There may be cases where a "degraded" resource state may be a nice
>feature and is already a topic here on the list ... from time to time.

There may be sufficient reasons to ignore topics on the list .... from time to
time. But our goal is HA and there is no reason not to talk about it, is there?

Christian

-----Original Message-----
From: Andreas Kurz [mailto:andr...@hastexo.com]
Sent: Thursday, 19 April 2012 14:36
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent monitor activity failed due to 
temporary error

On 04/19/2012 01:59 PM, Kulovits Christian - OS ITSC wrote:
> Hi Andreas,
> Exactly this is what I want Pacemaker to do when my RA is not able to
> determine the resource's state, but without running into a timeout and restart.
> It's the method for displaying the resource's state that is unavailable, not
> the resource itself. This typical approach must be coded in every RA instead
> of once in Pacemaker.

You want pacemaker to ignore monitor errors on all unknown return values
and go on with monitoring until a resource "heals" itself?

.... please rethink ... it is a resource agent's job to reliably tell
pacemaker the definite resource state -- and "uhm, hm, don't know now
please try later" can be anything -- and how to find that out is very
specific to the resource. IMHO it makes no sense at all to
let the cluster manager do this work.

There may be cases where a "degraded" resource state may be a nice
feature and is already a topic here on the list ... from time to time.

Regards,
Andreas

> Christian
>
> -----Original Message-----
> From: Andreas Kurz [mailto:andr...@hastexo.com]
> Sent: Thursday, 19 April 2012 13:51
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] OCF Resource agent monitor activity failed due to 
> temporary error
>
> Hi Christian,
>
> On 04/19/2012 01:38 PM, Kulovits Christian - OS ITSC wrote:
>> Hi, Andreas
>>
>> What if the RA gets a response from an external command in the form:
>> "display currently unavailable, try later"? The RA has 3 possible states
>> available: "Running", "Not Running", "Failed". But in this situation it
>> would say "don't know". When I set "on-fail=ignore", this error will be
>> ignored in the same way as a "not running" response, and the resource will
>> never be restarted.
>> Christian
>
> A typical approach is to wait a little bit and retry the monitor
> command until it succeeds in delivering a valid status (running/not
> running) or the RA monitor operation times out and the script is killed,
> which triggers resource recovery.
>
> Regards,
> Andreas
>

--
Need help with Pacemaker?
http://www.hastexo.com/now
