>You want pacemaker to ignore monitor errors on all unknown return values
>and go on with monitoring until a resource "heals" itself?

Definitely not. I do not want pacemaker to ignore all unknown return values. I always thought that pacemaker is a tool for HA.

>.... please rethink ... it is a resource agent's work to reliably tell
>pacemaker the definite resource state -- and "uhm, hm, don't know now
>please try later" can be everything -- and how to find that out is very
>specific depending on the resource. IMHO that makes no sense at all to
>let the cluster manager do this work.

I do not want the cluster manager to do this work. Instead, a method for retrying an RA monitor activity in the next interval should be provided. In this specific case a whole application became unavailable only because the external command to check the resource state was temporarily unavailable. The resource itself was available until pacemaker did a restart. Retrying the command until it succeeds is an option only until the specified timeout occurs. The RA has no way to avoid this. I think it could be a nice feature to give the RA the option to return a value for on-fail. If the RA could return on-fail=block (don't perform any further operations on the resource) and pacemaker would then set it unmanaged, the resource would be HA.

>There may be cases where a "degraded" resource state may be a nice
>feature and is already a topic here on the list ... from time to time.

There may be sufficient reasons to ignore topics on the list .... from time to time. But our goal is HA and there is no reason not to talk about it, is there?

Christian

-----Original Message-----
From: Andreas Kurz [mailto:andr...@hastexo.com]
Sent: Thursday, 19 April 2012 14:36
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent monitor activity failed due to temporary error

On 04/19/2012 01:59 PM, Kulovits Christian - OS ITSC wrote:
> Hi Andreas,
> This is exactly what I want pacemaker to do when my RA is not able to
> determine the resource's state. But without running into timeout and restart.
> It's the method to display the resource's state that is unavailable, not the
> resource itself. This typical approach must be coded in every RA instead of
> once in pacemaker.

You want pacemaker to ignore monitor errors on all unknown return values
and go on with monitoring until a resource "heals" itself?

.... please rethink ... it is a resource agent's work to reliably tell
pacemaker the definite resource state -- and "uhm, hm, don't know now
please try later" can be everything -- and how to find that out is very
specific depending on the resource. IMHO that makes no sense at all to
let the cluster manager do this work.

There may be cases where a "degraded" resource state may be a nice
feature and is already a topic here on the list ... from time to time.

Regards,
Andreas

> Christian
>
> -----Original Message-----
> From: Andreas Kurz [mailto:andr...@hastexo.com]
> Sent: Thursday, 19 April 2012 13:51
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] OCF Resource agent monitor activity failed due to
> temporary error
>
> Hi Christian,
>
> On 04/19/2012 01:38 PM, Kulovits Christian - OS ITSC wrote:
>> Hi, Andreas
>>
>> What if the RA gets a response from an external command in the form:
>> "display currently unavailable, try later". The RA has 3 possible states
>> available, "Running", "Not Running", "Failed". But in this situation it
>> would say "don't know". When I set "on-fail=ignore" this error will be
>> ignored the same way as when the response is "not running" and the
>> resource will never be restarted.
>> Christian
>
> A typical approach is to wait a little bit and retry the monitor
> command until it succeeds in delivering a valid status (running/not
> running) or the RA monitor operation times out and the script is killed,
> including resource recovery.
>
> Regards,
> Andreas
>
--
Need help with Pacemaker?
http://www.hastexo.com/now
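
For readers following the thread, the "wait a little bit and retry" approach Andreas describes could look roughly like the sketch below. It is only an illustration under assumptions, not code from any existing resource agent: `monitor_with_retry`, `probe`, `retry_delay` and `give_up_after` are hypothetical names, and the sketch gives up on its own shortly before a deadline instead of waiting for pacemaker to kill the script. The exit codes are the standard OCF values (0, 1, 7).

```python
import time
from typing import Callable, Optional

# Standard OCF exit codes returned by a resource agent's monitor action.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_with_retry(
    probe: Callable[[], Optional[bool]],
    retry_delay: float = 2.0,
    give_up_after: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Retry a status probe until it yields a definite answer.

    probe() is a hypothetical callable supplied by the RA author:
    True means "running", False means "not running", and None means
    "don't know, try later" (e.g. the display command is unavailable).
    """
    deadline = clock() + give_up_after
    while True:
        state = probe()
        if state is True:
            return OCF_SUCCESS
        if state is False:
            return OCF_NOT_RUNNING
        # Indeterminate answer: retry only while another attempt still
        # fits before the deadline, otherwise report a generic error.
        if clock() + retry_delay > deadline:
            return OCF_ERR_GENERIC
        sleep(retry_delay)
```

In such a setup `give_up_after` would have to stay below the timeout configured for the monitor operation, otherwise pacemaker kills the script first, which is exactly the timeout-and-restart case Christian describes. The sketch does not remove that limit; it only shows where the retry logic sits inside the RA.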