Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-11T15:04:55, Alan Robertson al...@unix.sh wrote:

>> Depends. Pacemaker may still care about the status of these agents.
> If it can't start or stop them, what can it do with them?

The status from these agents may feed into operations on other resources that are fully managed.

Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/12/2012 05:14 AM, Lars Marowsky-Bree wrote:
> On 2012-09-11T15:04:55, Alan Robertson al...@unix.sh wrote:
>>> Depends. Pacemaker may still care about the status of these agents.
>> If it can't start or stop them, what can it do with them?
> The status from these agents may feed into operations on other resources that are fully managed.

Understood. I believe it will care about those other agents - not these. It shouldn't know about these, AFAIK. The fact that the other agents might call these is an implementation detail, not something it should care about directly.

Resource agents should rely only on what the OCF RA spec says is provided. In the same way, consumers of those agents (like Pacemaker) shouldn't expect anything from resource agents, or observe anything about them, beyond what the spec defines. Or at least that's how it seems to me.

It's still my intent to have the exit codes, argument passing, etc. be fully compliant with the OCF RA specification. The only exception I plan is that there will be no start or stop (or reload, etc.) actions. They will implement the meta-data, monitor, and validate-all actions. I'm not sure whether validate-all makes sense for them or not (?). I'll think about that...

-- 
Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce
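Alan's plan has three parts: full OCF exit codes and argument passing, only meta-data, monitor and validate-all, and no start or stop. That can be sketched as a shell agent skeleton. Everything in the sketch is hypothetical:

* the agent name;
* its single `url` parameter (read as `OCF_RESKEY_url`);
* the use of `curl` for the probe.

The numeric exit codes are the ones the OCF RA spec defines. A real agent would normally take them from ocf-shellfuncs rather than define them itself.

```shell
#!/bin/sh
# Sketch of a monitor-only, OCF-style agent (hypothetical "http-monitor-only").
# It implements meta-data, monitor and validate-all; start, stop, reload,
# etc. are deliberately left unimplemented.

# OCF exit codes as defined by the OCF RA spec (normally from ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_ERR_CONFIGURED=6

meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="http-monitor-only">
  <version>1.0</version>
  <shortdesc lang="en">Monitor-only HTTP check (sketch)</shortdesc>
  <parameters>
    <parameter name="url" required="1">
      <shortdesc lang="en">URL to probe</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="validate-all" timeout="20s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}

ra_validate_all() {
    # Everything must come from parameters; no config files are read.
    [ -n "${OCF_RESKEY_url:-}" ] || return $OCF_ERR_CONFIGURED
    return $OCF_SUCCESS
}

ra_monitor() {
    ra_validate_all || return $?
    # -f: fail on HTTP errors; -s: no progress output.
    if curl -fs -o /dev/null --max-time 10 "$OCF_RESKEY_url"; then
        return $OCF_SUCCESS
    fi
    # The agent cannot know the service was cleanly stopped, so a
    # failed probe is a generic error rather than "not running".
    return $OCF_ERR_GENERIC
}

ra_dispatch() {
    case "$1" in
        meta-data)    meta_data ;;
        monitor)      ra_monitor ;;
        validate-all) ra_validate_all ;;
        # start, stop, reload, ...: intentionally not implemented
        *)            return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

An actual agent script would end by calling `ra_dispatch "$1"` and exiting with its status. Start and stop then report OCF_ERR_UNIMPLEMENTED (3), which keeps the agent within the OCF exit-code vocabulary.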
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-12T09:01:05, Alan Robertson al...@unix.sh wrote:
>> The status from these agents may feed into operations on other resources that are fully managed.
> Understood. I believe it will care about those other agents - not these. It shouldn't know about these, AFAIK.

I guess then you're talking about a different effort from what Dejan, Yan, and I are investigating. (We need that status so that Pacemaker can, for example, restart the VM if needed.)

(Our goal is also to reuse existing probes from other monitoring frameworks, not rewrite them.)

Regards,
Lars
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/08/2012 02:53 PM, Lars Marowsky-Bree wrote:
> On 2012-09-07T13:46:27, Alan Robertson al...@unix.sh wrote:
>> Well, I presume that one would not tell pacemaker about such agents, as they would not be useful to pacemaker. From the point of view of the crm command, you wouldn't consider them as valid resource agents to put in a configuration for pacemaker.
> Depends. Pacemaker may still care about the status of these agents.

If it can't start or stop them, what can it do with them? And presuming it can't do anything with them, it doesn't make sense to include them in a configuration. Am I missing something here?

-- 
Alan Robertson
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-07T13:46:27, Alan Robertson al...@unix.sh wrote:
> Well, I presume that one would not tell pacemaker about such agents, as they would not be useful to pacemaker. From the point of view of the crm command, you wouldn't consider them as valid resource agents to put in a configuration for pacemaker.

Depends. Pacemaker may still care about the status of these agents.

Regards,
Lars
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-05T15:25:44, Dejan Muhamedagic de...@suse.de wrote:

BTW, FWIW:

> monocf may be just like ocf, sans start and stop operations. That would make all ocf RA eligible for this use.

Thinking about this, not entirely. We'd have to fake start/stop at least (in particular, start).

For probes, start has to do at least

    while ! monitor ; do sleep 1 ; done

to wait until the service is up before going on. Otherwise, we might immediately report a failure to Pacemaker and trigger recovery. (Unless we want to mess with start-delay, which I dislike and which also doesn't provide such nice reporting.)

stop is tricky from the PE perspective: we don't actually want to stop the probes, only the VM (which implies stopping the services it provides). And if we can, we'd love to keep showing the probes' state while the VM shuts down, to show the admin what's going on. But, of course, without triggering a recovery.

So, we'd want start-up to be:

    VM -> (probes)

That's easy, as a group or as a resource set. Shutdown would be the same, though:

    VM -> (probes)

and not the inverse of the above.

I wonder how much this would suck, or whether we should just suck it up, destroy the probes and then stop the VM (giving up this added transparency)?

Regards,
Lars
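Lars's faked start amounts to polling the probe's monitor until it succeeds. This is a minimal sketch, assuming the probe's monitor is available as a shell command. `wait_for_monitor` and its interval and attempt-limit arguments are hypothetical. The attempt limit is an addition of this sketch; Lars's one-liner relies on the operation timeout alone.

```shell
#!/bin/sh
# Faked "start" for a probe-only resource: nothing is started, we only
# wait until the probe's monitor reports success.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Usage: wait_for_monitor INTERVAL MAX_TRIES MONITOR_CMD [ARGS...]
wait_for_monitor() {
    local interval="$1" max_tries="$2" tries=0
    shift 2
    # Same idea as "while ! monitor ; do sleep 1 ; done", but bounded.
    while ! "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return $OCF_ERR_GENERIC
        fi
        sleep "$interval"
    done
    return $OCF_SUCCESS
}
```

In a real agent the interval would likely stay at 1 second. The attempt limit would be chosen so the loop finishes inside the start operation's timeout, so the failure is reported by the agent rather than by a killed operation.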
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/05/2012 03:32 AM, Dejan Muhamedagic wrote:
>> This would be for my new monitoring project, of course ;-). But it could then be called by all the HTTP resource agents - or used directly - for example by the Assimilation project. This would be a slight but useful bending of OCF resource agent APIs. We could create some new metadata to document it, and also not put start and stop into the actions in the operations section. Or just the latter. What do you think?
> Right now, there's a bunch of resource agents faking the state (e.g. ping), that is, pretending to be able to start and stop. If we could somehow do without it, that would obviously be beneficial. Not sure if/how the pacemaker could deal with such agents.

Well, I presume that one would not tell pacemaker about such agents, as they would not be useful to pacemaker. From the point of view of the crm command, you wouldn't consider them as valid resource agents to put in a configuration for pacemaker. People would instead use the nginx or apache agents that _do_ know how to start and stop things.

-- 
Alan Robertson
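The "faking the state" that Dejan mentions usually works like this: start and stop only create and remove a state file, and monitor reports whether that file exists. This is a minimal sketch of that pattern, similar in spirit to the Dummy agent. `STATE_FILE` and the function names are hypothetical. Agents like ping put their real work (such as setting a node attribute) on top of it.

```shell
#!/bin/sh
# "Faked state": there is nothing to start, so start/stop only manage a
# state file and monitor reports whether that file exists.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7
STATE_FILE="${STATE_FILE:-/tmp/fake-state-example.state}"  # hypothetical path

fake_start() {
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

fake_stop() {
    rm -f "$STATE_FILE"
    return $OCF_SUCCESS
}

fake_monitor() {
    # "Running" only means start was called and stop was not.
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

The monitor result says nothing about any real service. That is exactly the pretence a monitor-only agent would avoid.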
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
Hi,

On Tue, Sep 04, 2012 at 07:20:23PM -0600, Alan Robertson wrote:
> Hi Dejan,
>
> If the resource agent is not running correctly, it needs to be restarted. My memory says that OCF_ERR_GENERIC will not cause that behavior. I believe the spec says you should exit with "not running" if it is not functioning correctly. (But I didn't check it, and my memory isn't that clear in this case.)

From the OCF standard:

    1   generic or unspecified error (current practice)
        The monitor operation shall return this for a crashed, hung or
        otherwise non-functional resource.
    ...
    7   program is not running
        Note: This is not the error code to be returned by a successful
        stop operation. A successful stop operation shall return 0.
        The monitor action shall return this value only for a _cleanly_
        stopped resource. If in doubt, it should return 1.

It also sounds OK to me.

> I will likely write a monitor-only resource agent for web servers. What would you think about calling it from the other web resource agents?

There's a bit of code in http-mon.sh, extracted from apache. It offers some extended testing of web servers. The features are described in README.webapps.

> This resource agent will not look at any config files, and will require everything explicitly in parameters, and will not know how to start or stop anything.

Somebody wanted to do a ping-like RA, i.e. one that sets an attribute based on the HTTP results. Unfortunately, one contributor gave up and another wanted to do everything from scratch, thus duplicating parts of the code. What I'd like to see is a script handling just CRM attributes. Then it would be easy to put together a Dummy-like RA that makes use of that one and, say, http-mon.sh.

> This would be for my new monitoring project, of course ;-). But it could then be called by all the HTTP resource agents - or used directly - for example by the Assimilation project. This would be a slight but useful bending of OCF resource agent APIs. We could create some new metadata to document it, and also not put start and stop into the actions in the operations section. Or just the latter. What do you think?

Right now, there's a bunch of resource agents faking the state (e.g. ping), that is, pretending to be able to start and stop. If we could somehow do without it, that would obviously be beneficial. Not sure if/how the pacemaker could deal with such agents.

Cheers,

Dejan
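The OCF text Dejan quotes sets two rules for monitor:

* A content check on a server already known to be running should return 1 (OCF_ERR_GENERIC) on failure.
* 7 (OCF_NOT_RUNNING) is reserved for a cleanly stopped resource.

The fix discussed in this thread for the apache agent is to return that code explicitly rather than passing grep's exit status through. Passing it through is fragile because grep exits with 2 on errors, and an OCF consumer would read 2 as OCF_ERR_ARGS. This is a minimal sketch; `check_body` is a hypothetical helper, not the apache agent's code.

```shell
#!/bin/sh
# Map a content check to an explicit OCF code instead of reusing grep's
# exit status (grep: 0 = match, 1 = no match, >1 = error).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Usage: <http client> URL | check_body REGEX
check_body() {
    if grep -Eiq -- "$1"; then
        return $OCF_SUCCESS
    fi
    # No match, or grep itself failed: the server was already found to be
    # running, so this is a generic error, never "not running" (7).
    return $OCF_ERR_GENERIC
}
```

For example, `curl -fs "$STATUSURL" | check_body "$TESTREGEX"` can only yield 0 or 1. A pipeline's status is that of its last command, here `check_body`.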
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-04T19:20:23, Alan Robertson al...@unix.sh wrote:
> I will likely write a monitor-only resource agent for web servers. What would you think about calling it from the other web resource agents?

Sharing code (in this case, the monitor-via-network of the http agents) seems to make sense, yes.

> This resource agent will not look at any config files, and will require everything explicitly in parameters, and will not know how to start or stop anything.
>
> This would be for my new monitoring project, of course ;-). But it could then be called by all the HTTP resource agents - or used directly - for example by the Assimilation project. This would be a slight but useful bending of OCF resource agent APIs.

I am not sure I'd go as far as making this an OCF RA. We've - for other reasons, like monitoring the services within a VM - started to look at wrapping up the icinga/nagios probes so that they can be configured and called by the cluster. My current thinking is that they might be best handled via a new resource agent class.

(Pseudo-configuration:

    primitive vm1 ocf:heartbeat:VirtualDomain
    primitive vm1-httpd icinga:httpd \
        params ip=192.168.2.1 port=80
    group vm1-service vm1 vm1-httpd

With some special code in the PE to make it understand that it can't just restart vm1-httpd, but would need to tackle the whole group atomically, etc.)

I'm curious, have you looked into re-using those probes already? I admit we're still at the evaluation stage, so we might have missed problems with the approach.

Regards,
Lars
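Much of wrapping icinga/nagios probes comes down to translating their exit status. Nagios-compatible plugins exit with:

* 0 (OK)
* 1 (WARNING)
* 2 (CRITICAL)
* 3 (UNKNOWN)

The sketch below maps those onto OCF monitor results. The function names are hypothetical. Counting WARNING as success is an assumption of this sketch, not something decided in this thread.

```shell
#!/bin/sh
# Translate a Nagios-style plugin status into an OCF monitor result.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

nagios_to_ocf() {
    case "$1" in
        0|1) return $OCF_SUCCESS ;;      # OK, WARNING (assumed healthy)
        *)   return $OCF_ERR_GENERIC ;;  # CRITICAL, UNKNOWN, anything else
    esac
}

# Run the plugin given as arguments; its one-line status text stays on
# stdout so an admin can still see what the probe reported.
run_plugin_as_monitor() {
    local plugin_rc=0
    "$@" || plugin_rc=$?
    nagios_to_ocf "$plugin_rc"
}
```

For the vm1-httpd example, a wrapper might run something like `run_plugin_as_monitor check_http -I 192.168.2.1 -p 80`. The plugin path and options depend on the installed plugin package.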
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
Hi Lars,

On Wed, Sep 05, 2012 at 11:41:17AM +0200, Lars Marowsky-Bree wrote:
> I am not sure I'd go as far as making this an OCF RA. We've - for other reasons, like monitoring the services within a VM - started to look at wrapping up the icinga/nagios probes so that they can be configured and called by the cluster. My current thinking is that they might be best handled via a new resource agent class.
>
> (Pseudo-configuration:
>
>     primitive vm1 ocf:heartbeat:VirtualDomain
>     primitive vm1-httpd icinga:httpd \
>         params ip=192.168.2.1 port=80
>     group vm1-service vm1 vm1-httpd
>
> With some special code in the PE to make it understand that it can't just restart vm1-httpd, but would need to tackle the whole group atomically, etc.)

How about a new element? Something like:

    primitive vm1 ocf:heartbeat:VirtualDomain
    require vm1 web-test dns-test
    primitive web-test monocf:heartbeat:http-mon \
        params ip=192.168.2.1 port=80
    primitive dns-test monocf:heartbeat:named ...

The require would imply that the resource vm1 requires the monitors of web-test and dns-test to succeed, in addition to its own monitor (if defined). Monitor ops of web-test and dns-test will run only on the node where vm1 is started. They could also get the environment (parameters) of vm1.

monocf may be just like ocf, sans start and stop operations. That would make all OCF RAs eligible for this use. We could derive more classes from monocf, i.e. wrappers for various monitoring solutions. I suppose that this would be relatively straightforward to implement.

Thanks,

Dejan
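The semantics Dejan proposes for `require` can be illustrated with a small aggregate check. vm1 counts as healthy only if its own monitor and every required probe succeed. `require` is a proposal here, not an existing Pacemaker construct. `combined_monitor` is a hypothetical helper: it takes the monitor commands as arguments and returns the first non-zero OCF code.

```shell
#!/bin/sh
# Illustrates the proposed "require vm1 web-test dns-test" semantics:
# healthy only if every listed monitor returns OCF_SUCCESS (0).
combined_monitor() {
    local mon mon_rc
    for mon in "$@"; do
        mon_rc=0
        "$mon" || mon_rc=$?
        if [ "$mon_rc" -ne 0 ]; then
            echo "$mon failed with OCF code $mon_rc" >&2
            return "$mon_rc"
        fi
    done
    return 0
}
```

Returning the first failing code unchanged keeps a probe's own distinction (for example 1 versus 7) visible to whatever decides on recovery.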
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-05T15:25:44, Dejan Muhamedagic de...@suse.de wrote:
> How about a new element? Something like
>
>     primitive vm1 ocf:heartbeat:VirtualDomain
>     require vm1 web-test dns-test

How we map this into Pacemaker's dependency scheme is obviously open to discussion.

> The require would imply that the resource vm1 requires the monitors of web-test and dns-test to succeed, in addition to its own monitor (if defined).

Perhaps. But an "as-a-whole" attribute for groups' restart handling might already be enough. We would want the system to eventually stabilize in the same state it reaches today (that is, with the group brought up to the last non-failing resource); otherwise, admins couldn't log in to the VM to fix the problem.

> Monitor ops of web-test and dns-test will run only on the node where vm1 is started. They could also get the environment (parameters) of vm1.

That's implicit in the group. Internally, this could indeed map to a "symmetric" (or whatever) aspect of the order dependency; yes, that could be set for the whole group.

> monocf may be just like ocf, sans start and stop operations. That would make all ocf RA eligible for this use.

None of the current resource agents would be able to cope with the use case I suggested, because they expect to run in the OS image where the service is provided. The idea of using the icinga/nagios plugins is exactly that they don't have this requirement, and thus can monitor the VM externally.

For OCF agents, this sort of already exists: meta is-managed=false.

Regards,
Lars
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 06/09/2012, at 12:30 AM, Lars Marowsky-Bree l...@suse.com wrote:
> On 2012-09-05T15:25:44, Dejan Muhamedagic de...@suse.de wrote:
>> How about a new element? Something like
>>
>>     primitive vm1 ocf:heartbeat:VirtualDomain
>>     require vm1 web-test dns-test
>
> How we map this into Pacemaker's dependency scheme is obviously open to discussion.
>
>> The require would imply that the resource vm1 requires the monitors of web-test and dns-test to succeed, in addition to its own monitor (if defined).
>
> Perhaps. But an "as-a-whole" attribute for groups' restart handling might already be enough. We would want the system to eventually stabilize in the same state it reaches today (that is, with the group brought up to the last non-failing resource); otherwise, admins couldn't log in to the VM to fix the problem.

Those two requirements seem at odds with each other. I doubt it would end well. I suspect you really want the "restart everything" trigger to be attached to the monitor-only resource (at the end).
[Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
Hi Dejan,

If the resource agent is not running correctly, it needs to be restarted. My memory says that OCF_ERR_GENERIC will not cause that behavior. I believe the spec says you should exit with "not running" if it is not functioning correctly. (But I didn't check it, and my memory isn't that clear in this case.)

I will likely write a monitor-only resource agent for web servers. What would you think about calling it from the other web resource agents?

This resource agent will not look at any config files, and will require everything explicitly in parameters, and will not know how to start or stop anything.

This would be for my new monitoring project, of course ;-). But it could then be called by all the HTTP resource agents - or used directly - for example by the Assimilation project. This would be a slight but useful bending of OCF resource agent APIs. We could create some new metadata to document it, and also not put start and stop into the actions in the operations section. Or just the latter. What do you think?

On 08/29/2012 05:31 AM, Dejan Muhamedagic wrote:
> Hi Alan,
>
> On Mon, Aug 27, 2012 at 10:51:15AM -0600, Alan Robertson wrote:
>> Hi,
>>
>> I was recently using the Apache resource agent, and discovered a few problems:
>>
>> The exit code from grep was used directly as an OCF exit code. It is NOT an OCF exit code, and should not be directly used in this way.
> I guess you mean the greps in monitor_apache_extended and monitor_apache_basic? These lines:
>
> 267 $whattorun "$test_url" | grep -Ei "$test_regex" > /dev/null
> 277 ${ourhttpclient}_func "$STATUSURL" | grep -Ei "$TESTREGEX" > /dev/null
>
>> This caused a "not running" error to become a generic error.
> These lines are invoked _only_ in case it was previously established that the apache server is running. So, they should return OCF_ERR_GENERIC if the test fails. grep exits with code 1, which matches OCF_ERR_GENERIC. But indeed the OCF error code should be returned explicitly.
>> Pacemaker reacts very differently to the two kinds of errors. This code occurred in two places.
>>
>> The resource agent used OCF_CHECK_LEVEL improperly. The specification says that if you receive an OCF_CHECK_LEVEL which you do not support, you are required to interpret it as the next lower supported value for OCF_CHECK_LEVEL. In effect, there are no invalid OCF_CHECK_LEVEL values. The Apache agent declared all values but one to be errors. This is not the correct behavior.
> OK. That somehow slipped while I had been reading the OCF standard.
>
> BTW, it'd be great if nginx shared some code with apache. The latter has already been split into three scripts.
>
> Cheers,
>
> Dejan

-- 
Alan Robertson
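The OCF_CHECK_LEVEL rule Alan describes fits in a few lines: an unsupported level is treated as the next lower supported one, so no value is invalid. This is a minimal sketch, assuming a hypothetical agent that supports levels 0, 10 and 20. `SUPPORTED_LEVELS` and `effective_check_level` are illustrative names.

```shell
#!/bin/sh
# Levels this hypothetical agent actually implements.
SUPPORTED_LEVELS="0 10 20"

# Print the highest supported level that does not exceed the requested one.
effective_check_level() {
    local requested="${1:-0}" best=0 lvl
    # Empty or non-numeric input falls back to the basic check.
    case "$requested" in
        ''|*[!0-9]*) requested=0 ;;
    esac
    for lvl in $SUPPORTED_LEVELS; do
        if [ "$lvl" -le "$requested" ] && [ "$lvl" -ge "$best" ]; then
            best=$lvl
        fi
    done
    echo "$best"
}
```

A monitor would then branch on `"$(effective_check_level "$OCF_CHECK_LEVEL")"` instead of rejecting values it does not recognize. For example, a requested level of 15 runs the level-10 checks.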