Hi David,
Thanks for the comments!

On 01/24/13 00:36, David Vossel wrote:
> 
> 
> ----- Original Message -----
>> From: "Yan Gao" <y...@suse.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Monday, January 21, 2013 11:28:40 PM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>>
>> Hi,
>> Here's the code for supporting nagios plugins in lrmd:
>>
>> https://github.com/gao-yan/pacemaker/commits/nagios
>>
>> A new resource class "nagios" is introduced.
>>
>> Actions:
>>
>> - probe: A resource defined for a resource container is not probed. (We
>> can also add a condition in pengine to just avoid probing a nagios class
>> resource.)
> 
> Yeah, I think the pengine should know to never probe a nagios script 
> regardless if it is involved in a container or not.
> 
>> - start: Invokes the nagios plugin with the specified parameters (maps the
>> instance attributes to the long options of the nagios plugin). If it
>> returns non-OK, re-invokes it after some delay (delay = start_timeout / 10)
>> until it returns OK or exceeds the start timeout.
> 
> I made a comment about this on the patch.  Shouldn't the cmd->timeout value 
> be updated each time it is re-scheduled to account for time already spent?
Ah, you are right! I've changed it; the update is still at
https://github.com/gao-yan/pacemaker/commits/nagios
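
To make the timeout accounting concrete, here is a minimal sketch in Python rather than the actual lrmd C code. The function `start_with_retries` and the callable `run_plugin` are hypothetical names for illustration only. The point is that each re-invocation gets only the time still remaining before the start timeout, not the full timeout again:

```python
import time

STATE_OK = 0  # nagios plugin exit code for success


def start_with_retries(run_plugin, start_timeout, now=time.monotonic, sleep=time.sleep):
    """Re-invoke a plugin until it returns STATE_OK or start_timeout is used up.

    run_plugin(timeout) is a hypothetical callable that runs the plugin with
    the given timeout (in seconds) and returns its exit code.
    Returns (succeeded, number_of_attempts).
    """
    delay = start_timeout / 10.0      # delay between attempts, as in the proposal
    deadline = now() + start_timeout  # fixed point in time; never pushed back
    attempts = 0
    while True:
        remaining = deadline - now()  # account for time already spent
        if remaining <= 0:
            return False, attempts
        attempts += 1
        if run_plugin(remaining) == STATE_OK:
            return True, attempts
        # Give up if waiting for the delay would leave no time for another try.
        if deadline - now() <= delay:
            return False, attempts
        sleep(delay)
```

The `now` and `sleep` parameters exist only so the loop can be driven by a fake clock when testing; a real caller would leave them at their defaults.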

> 
>>
>> - monitor: Recurring invocation of the nagios plugin with the specified
>> parameters.
>>
>> - stop: Nothing special is done. The recurring monitor is canceled
>> anyway.
>>
>> - metadata: Reads the corresponding metadata from an XML file in
>> NAGIOS_METADATA_DIR.
>>
>> (As we know, nagios plugins don't support metadata. The current plan is
>> to generate the corresponding metadata according to the help of the
>> plugins, and put them into NAGIOS_METADATA_DIR for use -- Dejan already
>> has progress on this. Thanks, Dejan!)
>>
>>
>> For nagios plugins, the exit codes are:
>>
>> STATE_OK        = 0,
>> STATE_WARNING   = 1,
>> STATE_CRITICAL  = 2,
>> STATE_UNKNOWN   = 3,
>> STATE_DEPENDENT = 4,
>>
>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
>> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code to
>> express "NOT_RUNNING" in nagios plugins. I think it should be fine, since
>> there's no probe.
>>
>> Any suggestions are appreciated!
> 
> This mostly looks like what I expected.  I'm letting the whole re-scheduling 
> of the start operation roll around in my head a bit.  It almost seems like 
> that functionality belongs in the service library...  retry executing this 
> action until either the timeout is hit or some target return code is 
> encountered.  Any thoughts on that?
The handling mainly focuses on an "lrmd_cmd_t": resetting some of its
variables, adding it to the resource's pending operations, and
triggering. It doesn't seem necessary to put it in the service library.
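
For reference, the two mappings from the quoted proposal (instance attributes to long options, and plugin exit codes to PCMK_EXECRA_* codes) can be sketched as below. This is an illustration in Python, not the patch itself. The helper names `build_argv` and `map_exit_code`, the plugin path, and the exact argument layout ("--name" followed by the value) are assumptions. The numeric PCMK_EXECRA_* values are also assumed to follow the usual OCF convention:

```python
STATE_OK = 0  # nagios plugin exit code for success

# Assumed numeric values, following the OCF convention (0 = OK, 1 = generic error).
PCMK_EXECRA_OK = 0
PCMK_EXECRA_UNKNOWN_ERROR = 1


def build_argv(plugin_path, params):
    """Turn instance attributes into long options: {"port": 22} -> --port 22."""
    argv = [plugin_path]
    for name, value in params.items():
        argv.append("--" + name)
        argv.append(str(value))
    return argv


def map_exit_code(rc):
    """STATE_OK maps to OK; WARNING, CRITICAL, UNKNOWN and DEPENDENT all map to an error."""
    return PCMK_EXECRA_OK if rc == STATE_OK else PCMK_EXECRA_UNKNOWN_ERROR
```

For example, `build_argv("/usr/lib/nagios/plugins/check_tcp", {"hostname": "192.168.122.10", "port": 22})` yields the plugin path followed by `--hostname 192.168.122.10 --port 22`.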

Regards,
  Gao,Yan
-- 
Gao,Yan <y...@suse.com>
Software Engineer
China Server Team, SUSE.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
