Hi David,

Thanks for the comments!

On 01/24/13 00:36, David Vossel wrote:
>
>
> ----- Original Message -----
>> From: "Yan Gao" <y...@suse.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Monday, January 21, 2013 11:28:40 PM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>>
>> Hi,
>> Here's the code for supporting nagios plugins in lrmd:
>>
>> https://github.com/gao-yan/pacemaker/commits/nagios
>>
>> A new resource class "nagios" is introduced.
>>
>> Actions:
>>
>> - probe: A resource defined for a resource container is not probed.
>> (We can also add a condition in pengine to just avoid probing a nagios
>> class resource.)
>
> Yeah, I think the pengine should know to never probe a nagios script
> regardless if it is involved in a container or not.
>
>> - start: Invokes the nagios plugin with specified parameters (Maps the
>> instance attributes to the long options of the nagios plugin). If it
>> returns non-OK, re-invokes it after some delay (delay = start_timeout /
>> 10), until it returns OK or exceeds the start timeout.
>
> I made a comment about this on the patch. Shouldn't the cmd->timeout value
> be updated each time it is re-scheduled to account for time already spent?

Ah, you are right! Changed, still in
https://github.com/gao-yan/pacemaker/commits/nagios
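For illustration only, here is a rough Python sketch of the retry behaviour discussed above. It is not the lrmd C code from the branch. The names start_with_retry and run_plugin, and the injectable clock and sleep arguments, are made up for this example. The point it shows is that each re-invocation is given only the time still left before the start timeout expires, instead of the full timeout again.

```python
import time

STATE_OK = 0  # nagios plugin exit code for success


def start_with_retry(run_plugin, start_timeout,
                     clock=time.monotonic, sleep=time.sleep):
    """Re-invoke run_plugin until it returns STATE_OK or the timeout is used up.

    run_plugin(timeout) -> exit code. This is a hypothetical stand-in for
    executing the nagios plugin with the given timeout in seconds.
    Returns a tuple (succeeded, attempts).
    """
    delay = start_timeout / 10.0          # delay between attempts, as proposed
    deadline = clock() + start_timeout    # fixed once, at the first attempt
    attempts = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False, attempts
        attempts += 1
        # Each attempt only gets the time that is still left.
        if run_plugin(remaining) == STATE_OK:
            return True, attempts
        # Never sleep past the deadline.
        remaining = deadline - clock()
        if remaining <= 0:
            return False, attempts
        sleep(min(delay, remaining))
```

With a 20 second start timeout and a plugin that takes about 1 second per run, the successive attempts would get roughly 20, 17, 14, ... seconds, since each failed run plus the 2 second delay is subtracted from what is left.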
>
>>
>> - monitor: Recurring invocation of the nagios plugin with specified
>> parameters.
>>
>> - stop: Nothing special is done. The recurring monitor is canceled
>> anyway.
>>
>> - metadata: Reads the corresponding metadata from an XML file in
>> NAGIOS_METADATA_DIR.
>>
>> (As we know, nagios plugins don't support metadata. The current plan is
>> to generate the corresponding metadata according to the help of the
>> plugins, and put them into NAGIOS_METADATA_DIR for use -- Dejan already
>> has progress on this. Thanks, Dejan!)
>>
>>
>> For nagios plugins, the exit codes are:
>>
>> STATE_OK = 0,
>> STATE_WARNING = 1,
>> STATE_CRITICAL = 2,
>> STATE_UNKNOWN = 3,
>> STATE_DEPENDENT = 4,
>>
>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
>> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
>> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
>> since there's no probe.
>>
>> Any suggestions are appreciated!
>
> This mostly looks like what I expected. I'm letting the whole re-scheduling
> of the start operation roll around in my head a bit. It almost seems like
> that functionality belongs in the service library... retry executing this
> action until either the timeout is hit or some target return code is
> encountered. Any thoughts on that?

The handling mainly operates on a "lrmd_cmd_t": it resets some of its
variables, adds it to the resource's pending operations and triggers it.
It does not seem necessary to put this in the service library.

Regards,
  Gao,Yan
--
Gao,Yan <y...@suse.com>
Software Engineer
China Server Team, SUSE.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org