Re: [Linux-ha-dev] RA trace facility
Hi,

On Tue, Nov 27, 2012 at 08:28:04AM +0100, Dejan Muhamedagic wrote:
> On Wed, Nov 21, 2012 at 07:06:35PM +0100, Lars Marowsky-Bree wrote:
> > On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote:
> > > > What would you think of OCF_RESKEY_RA_TRACE ?
> > > A meta attribute perhaps? That wouldn't cause a resource restart.
> > Point, but - meta attributes so far were mostly for the
> > PE/pacemaker, this would be for the RA.
> Not exactly for the RA itself. The RA execution would just be
> observed. The attribute is consumed by others. Whether it is PE or
> lrmd or something else makes less of a difference. It is up to these
> subsystems to sort the meta attributes out.

It turns out that pacemaker won't export meta attributes which were
not recognized. At any rate, we can go with OCF_RESKEY_trace_ra. The
good thing is that it can be specified per operation
(op start trace_ra=1). The interface is simple and it's described in
ocf-shellfuncs. It would get support in the UI.

> > Would a changed definition for a resource we're trying to trace be
> > an actual problem? I mean, tracing clearly means you want to trace
> > a resource action, so one would put the attribute on the resource
> > before triggering that. (It can also be put on in maintenance mode,
> > avoiding the restart.)
> > > > Our include script could enable that; it's unlikely that the
> > > > problem occurs prior to that.
> > > > - never (default): Does nothing
> > > > - always: Always trace, write to
> > > >   $(which path?)/raname.rscid.$timestamp
> > > bash has a way to send trace to a separate FD, but that feature
> > > is available with version >= 4.x. Otherwise, it could be messy to
> > > separate the trace from the other stderr output. Of course, one
> > > could just redirect stderr in this case. I suppose that that
> > > would work too.
> > I assume that'd be easiest. (And people not using bash can write
> > their own implementation for this. ;-)
> > > > - on-error: always trace, but delete on successful exit
> > > Good idea.

This is not implemented right now.

The patch is attached. It's planned for the release 3.9.5.

Thanks,

Dejan

> > > > hb_report/history explorer could gather this too.
> > > Right.
> > > > (And yes I know this introduces a fake parameter that doesn't
> > > > really exist. But it'd be so helpful.)
> > > > Sorry. Maybe I'm getting carried away ;-)
> > > Good points. I didn't really think much (yet) about how to
> > > further facilitate the feature, just had a vague idea that
> > > somehow lrmd should set the environment variable.
> > Sure. LRM is another obvious entry point for increased
> > tracing/logging. That could also work.
> > > Perhaps we could do something like this:
> > >
> > >   # crm resource trace rsc_id [action] [when-to-trace]
> > >
> > > This would set the appropriate meta attribute for the resource
> > > which would trickle down to the RA. ocf-shellfuncs would then do
> > > whatever's necessary to set up the trace. The file management
> > > could get tricky though, as we don't have a single point of exit
> > > (and trap is already used elsewhere).
> > The file/log management would be easier to do in the LRM - and also
> > handle the timeout situation; that could also make use of the
> > redirect trace elsewhere if the shell is new enough.
> Indeed. Until then, ocf-shellfuncs can fall back to some well known
> location.
>
> Thanks,
>
> Dejan
>
> > Regards,
> > Lars
> >
> > --
> > Architect Storage/HA
> > SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
> > Imendörffer, HRB 21284 (AG Nürnberg)
> > Experience is the name everyone gives to their mistakes.
> > -- Oscar Wilde

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

From edad4f4f7ef39da0243c1b3444bb8630443a8c38 Mon Sep 17 00:00:00 2001
From: Dejan Muhamedagic de...@suse.de
Date: Wed, 23 Jan 2013 17:36:08 +0100
Subject: [PATCH] Medium: ocf-shellfuncs: RA tracing

---
 doc/dev-guides/ra-dev-guide.txt | 6 +++
 heartbeat/ocf-shellfuncs.in | 82 +
 tools/ocf-tester.8 | 5 ++-
 tools/ocf-tester.in | 4 +-
 4 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/doc/dev-guides/ra-dev-guide.txt b/doc/dev-guides/ra-dev-guide.txt
index af5e3b1..11e9a5d 100644
--- a/doc/dev-guides/ra-dev-guide.txt
+++ b/doc/dev-guides/ra-dev-guide.txt
@@ -1623,6 +1623,12 @@ Beginning tests for /home/johndoe/ra-dev/foobar...
 /home/johndoe/ra-dev/foobar passed all tests
 --
+If the resource agent exhibits some difficult to grasp behaviour,
+which is typically the case with just developed software, there
+are +-v+ and +-d+ options to dump more output. If that
Re: [Linux-ha-dev] RA trace facility
Hi,

2012/11/27 Dejan Muhamedagic de...@suse.de:
(...)
> > It might also be helpful if it had a kind of 'hook' functionality
> > that allows you to execute an arbitrary script for collecting
> > runtime information such as CPU usage, memory status, I/O status or
> > the list of running processes etc. for diagnosis.
> Yes. I guess that one could run such a hook in background. Did you
> mean that?

I first thought that it would simply run a one-shot hook at the
invocation of the RA instance, but it would be great if it could run
in the background while an RA operation is running.

> Or once the RA instance exited? This is a bit different feature
> though.

It would also be good if it could run a hook when the RA times out or
when a command in the RA gets stuck for some reason.

Thanks,

--
Keisuke MORI
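A background hook of the kind discussed in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name run_with_hook and the variable TRACE_HOOK are hypothetical and are not part of ocf-shellfuncs or of the attached patch. The timeout case mentioned above is not covered.

```shell
#!/bin/bash
# Hypothetical sketch: run_with_hook and TRACE_HOOK are illustrative
# names only; nothing here is taken from ocf-shellfuncs.
#
# TRACE_HOOK: path to an executable script which collects runtime
#             information (CPU, memory, I/O, process list, ...).

run_with_hook() {
    local hook=${TRACE_HOOK:-}
    local hook_pid= rc

    if [ -n "$hook" ] && [ -x "$hook" ]; then
        # Start the collector in the background; it is expected to
        # write to its own log file.
        "$hook" &
        hook_pid=$!
    fi

    # Run the actual operation (a command or a shell function).
    "$@"
    rc=$?

    if [ -n "$hook_pid" ]; then
        # Stop the collector once the operation has finished.
        kill "$hook_pid" 2>/dev/null
        wait "$hook_pid" 2>/dev/null
    fi
    return "$rc"
}
```

The exit code of the operation is passed through unchanged, so the hook cannot alter the result which the cluster sees. A hook which should survive the operation (the "once the RA instance exited" case) would need a different design.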
Re: [Linux-ha-dev] RA trace facility
On Wed, Nov 21, 2012 at 07:06:35PM +0100, Lars Marowsky-Bree wrote:
> On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote:
> > > What would you think of OCF_RESKEY_RA_TRACE ?
> > A meta attribute perhaps? That wouldn't cause a resource restart.
> Point, but - meta attributes so far were mostly for the PE/pacemaker,
> this would be for the RA.

Not exactly for the RA itself. The RA execution would just be
observed. The attribute is consumed by others. Whether it is PE or
lrmd or something else makes less of a difference. It is up to these
subsystems to sort the meta attributes out.

> Would a changed definition for a resource we're trying to trace be an
> actual problem? I mean, tracing clearly means you want to trace a
> resource action, so one would put the attribute on the resource
> before triggering that. (It can also be put on in maintenance mode,
> avoiding the restart.)
> > > Our include script could enable that; it's unlikely that the
> > > problem occurs prior to that.
> > > - never (default): Does nothing
> > > - always: Always trace, write to
> > >   $(which path?)/raname.rscid.$timestamp
> > bash has a way to send trace to a separate FD, but that feature is
> > available with version >= 4.x. Otherwise, it could be messy to
> > separate the trace from the other stderr output. Of course, one
> > could just redirect stderr in this case. I suppose that that would
> > work too.
> I assume that'd be easiest. (And people not using bash can write
> their own implementation for this. ;-)
> > > - on-error: always trace, but delete on successful exit
> > Good idea.
> > > hb_report/history explorer could gather this too.
> > Right.
> > > (And yes I know this introduces a fake parameter that doesn't
> > > really exist. But it'd be so helpful.)
> > > Sorry. Maybe I'm getting carried away ;-)
> > Good points. I didn't really think much (yet) about how to further
> > facilitate the feature, just had a vague idea that somehow lrmd
> > should set the environment variable.
> Sure. LRM is another obvious entry point for increased
> tracing/logging. That could also work.
> > Perhaps we could do something like this:
> >
> >   # crm resource trace rsc_id [action] [when-to-trace]
> >
> > This would set the appropriate meta attribute for the resource
> > which would trickle down to the RA. ocf-shellfuncs would then do
> > whatever's necessary to set up the trace. The file management could
> > get tricky though, as we don't have a single point of exit (and
> > trap is already used elsewhere).
> The file/log management would be easier to do in the LRM - and also
> handle the timeout situation; that could also make use of the
> redirect trace elsewhere if the shell is new enough.

Indeed. Until then, ocf-shellfuncs can fall back to some well known
location.

Thanks,

Dejan
Re: [Linux-ha-dev] RA trace facility
Hi Keisuke-san,

On Thu, Nov 22, 2012 at 06:27:59PM +0900, Keisuke MORI wrote:
> Hi,
>
> 2012/11/22 Dejan Muhamedagic de...@suse.de:
> > Hi Lars,
> >
> > On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote:
> > > On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
> > > > Hi,
> > > >
> > > > This is a little something which could help while debugging
> > > > resource agents. Setting the environment variable
> > > > __OCF_TRACE_RA would cause the resource agent run to be traced
> > > > (as in set -x). PS4 is set accordingly (that's a bash feature,
> > > > don't know if other shells support it). ocf-tester got an
> > > > option (-X) to turn the feature on. The agent itself can also
> > > > turn on/off tracing via ocf_start_trace/ocf_stop_trace.
> > > >
> > > > Do you find anything amiss?
> > > I *really* like this. But I'd like a different way to turn it on
> > > - a standard one that is available via the CIB configuration,
> > > without modifying the script.
> > I don't really want the script to be modified either. The above
> > instructions are for people developing a new RA.
> I like this, too. It would be useful when you need to diagnose in the
> production environment if you can enable / disable it without any
> modifications to RAs.

Of course.

> It might also be helpful if it had a kind of 'hook' functionality
> that allows you to execute an arbitrary script for collecting runtime
> information such as CPU usage, memory status, I/O status or the list
> of running processes etc. for diagnosis.

Yes. I guess that one could run such a hook in background. Did you
mean that? Or once the RA instance exited? This is a bit different
feature though.

Thanks,

Dejan

> --
> Keisuke MORI
Re: [Linux-ha-dev] RA trace facility
Hi,

2012/11/22 Dejan Muhamedagic de...@suse.de:
> Hi Lars,
>
> On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote:
> > On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
> > > Hi,
> > >
> > > This is a little something which could help while debugging
> > > resource agents. Setting the environment variable __OCF_TRACE_RA
> > > would cause the resource agent run to be traced (as in set -x).
> > > PS4 is set accordingly (that's a bash feature, don't know if
> > > other shells support it). ocf-tester got an option (-X) to turn
> > > the feature on. The agent itself can also turn on/off tracing via
> > > ocf_start_trace/ocf_stop_trace.
> > >
> > > Do you find anything amiss?
> > I *really* like this. But I'd like a different way to turn it on -
> > a standard one that is available via the CIB configuration, without
> > modifying the script.
> I don't really want the script to be modified either. The above
> instructions are for people developing a new RA.

I like this, too. It would be useful when you need to diagnose in the
production environment if you can enable / disable it without any
modifications to RAs.

It might also be helpful if it had a kind of 'hook' functionality
that allows you to execute an arbitrary script for collecting runtime
information such as CPU usage, memory status, I/O status or the list
of running processes etc. for diagnosis.

--
Keisuke MORI
Re: [Linux-ha-dev] RA trace facility
On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
> Hi,
>
> This is a little something which could help while debugging resource
> agents. Setting the environment variable __OCF_TRACE_RA would cause
> the resource agent run to be traced (as in set -x). PS4 is set
> accordingly (that's a bash feature, don't know if other shells
> support it). ocf-tester got an option (-X) to turn the feature on.
> The agent itself can also turn on/off tracing via
> ocf_start_trace/ocf_stop_trace.
>
> Do you find anything amiss?

I *really* like this. But I'd like a different way to turn it on - a
standard one that is available via the CIB configuration, without
modifying the script.

What would you think of OCF_RESKEY_RA_TRACE ?

Our include script could enable that; it's unlikely that the problem
occurs prior to that.

- never (default): Does nothing
- always: Always trace, write to
  $(which path?)/raname.rscid.$timestamp
- on-error: always trace, but delete on successful exit

hb_report/history explorer could gather this too.

(And yes I know this introduces a fake parameter that doesn't really
exist. But it'd be so helpful.)

Sorry. Maybe I'm getting carried away ;-)

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde
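The three modes proposed in this message (never, always, on-error) could be sketched as a small wrapper. This is a minimal illustration under stated assumptions, not the implementation from the patch: run_traced, TRACE_MODE and TRACE_DIR are hypothetical names, and /tmp/ra_trace is an arbitrary stand-in for the destination path which the message leaves open.

```shell
#!/bin/bash
# Hypothetical sketch: run_traced, TRACE_MODE and TRACE_DIR are
# illustrative names and do not exist in ocf-shellfuncs.
#
# TRACE_MODE: never | always | on-error (the values proposed above)
# TRACE_DIR : assumed directory for the trace files

run_traced() {
    local ra_name=$1 rsc_id=$2
    shift 2
    local mode=${TRACE_MODE:-never}
    local dir=${TRACE_DIR:-/tmp/ra_trace}
    local file rc

    if [ "$mode" = "never" ]; then
        "$@"
        return $?
    fi

    mkdir -p "$dir" || return 1
    file="$dir/$ra_name.$rsc_id.$(date +%Y%m%d-%H%M%S)"

    # Run the action in a subshell so that "set -x" and the stderr
    # redirection do not leak into the caller.
    ( exec 2>"$file"; set -x; "$@" )
    rc=$?

    # on-error: keep the trace only if the action failed.
    if [ "$mode" = "on-error" ] && [ "$rc" -eq 0 ]; then
        rm -f "$file"
    fi
    return "$rc"
}
```

Only shell functions passed as the action are traced line by line; an external command shows up as a single trace line. Doing the cleanup in a wrapper rather than inside the agent sidesteps the need for an exit trap in the agent itself.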
Re: [Linux-ha-dev] RA trace facility
Hi Lars,

On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote:
> On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
> > Hi,
> >
> > This is a little something which could help while debugging
> > resource agents. Setting the environment variable __OCF_TRACE_RA
> > would cause the resource agent run to be traced (as in set -x). PS4
> > is set accordingly (that's a bash feature, don't know if other
> > shells support it). ocf-tester got an option (-X) to turn the
> > feature on. The agent itself can also turn on/off tracing via
> > ocf_start_trace/ocf_stop_trace.
> >
> > Do you find anything amiss?
> I *really* like this. But I'd like a different way to turn it on - a
> standard one that is available via the CIB configuration, without
> modifying the script.

I don't really want the script to be modified either. The above
instructions are for people developing a new RA.

> What would you think of OCF_RESKEY_RA_TRACE ?

A meta attribute perhaps? That wouldn't cause a resource restart.

> Our include script could enable that; it's unlikely that the problem
> occurs prior to that.
>
> - never (default): Does nothing
> - always: Always trace, write to
>   $(which path?)/raname.rscid.$timestamp

bash has a way to send trace to a separate FD, but that feature is
available with version >= 4.x. Otherwise, it could be messy to
separate the trace from the other stderr output. Of course, one could
just redirect stderr in this case. I suppose that that would work
too.

> - on-error: always trace, but delete on successful exit

Good idea.

> hb_report/history explorer could gather this too.

Right.

> (And yes I know this introduces a fake parameter that doesn't really
> exist. But it'd be so helpful.)
>
> Sorry. Maybe I'm getting carried away ;-)

Good points. I didn't really think much (yet) about how to further
facilitate the feature, just had a vague idea that somehow lrmd
should set the environment variable.

Perhaps we could do something like this:

  # crm resource trace rsc_id [action] [when-to-trace]

This would set the appropriate meta attribute for the resource which
would trickle down to the RA. ocf-shellfuncs would then do whatever's
necessary to set up the trace. The file management could get tricky
though, as we don't have a single point of exit (and trap is already
used elsewhere).

Cheers,

Dejan
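The two ways of capturing the trace mentioned in this message (a separate file descriptor on newer bash, plain stderr redirection otherwise) could be sketched as below. The sketch assumes that the separate-FD feature is the BASH_XTRACEFD variable, available as far as I know from bash 4.1, that file descriptor 9 is free, and that the hypothetical names sketch_start_trace and sketch_stop_trace stand in for whatever ocf-shellfuncs actually does. The PS4 value is also just an example.

```shell
#!/bin/bash
# Hypothetical sketch, not the ocf_start_trace/ocf_stop_trace code.
# Assumption: file descriptor 9 is not used by the calling script.

sketch_start_trace() {
    local file=$1
    case ${BASH_VERSION:-} in
        4.[1-9]*|[5-9].*|[1-9][0-9].*)
            # Newer bash: send only the xtrace output to the file and
            # leave the normal stderr output alone.
            exec 9>>"$file"
            BASH_XTRACEFD=9
            # Example prefix: function name (if any) and line number.
            PS4='+ ${FUNCNAME[0]:+${FUNCNAME[0]}:}${LINENO}: '
            ;;
        *)
            # Older bash or another shell: redirect all of stderr, so
            # trace and error messages end up in the same file.
            exec 2>>"$file"
            ;;
    esac
    set -x
}

sketch_stop_trace() {
    set +x
    # bash closes the trace descriptor when BASH_XTRACEFD is unset.
    # In the fallback case stderr stays redirected until exit.
    unset BASH_XTRACEFD
}
```

A typical use would be to call sketch_start_trace with a file name early in the agent and let the process exit normally; sketch_stop_trace is only needed when tracing a part of the run.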
Re: [Linux-ha-dev] RA trace facility
On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote:
> > What would you think of OCF_RESKEY_RA_TRACE ?
> A meta attribute perhaps? That wouldn't cause a resource restart.

Point, but - meta attributes so far were mostly for the PE/pacemaker,
this would be for the RA.

Would a changed definition for a resource we're trying to trace be an
actual problem? I mean, tracing clearly means you want to trace a
resource action, so one would put the attribute on the resource
before triggering that. (It can also be put on in maintenance mode,
avoiding the restart.)

> > Our include script could enable that; it's unlikely that the
> > problem occurs prior to that.
> >
> > - never (default): Does nothing
> > - always: Always trace, write to
> >   $(which path?)/raname.rscid.$timestamp
> bash has a way to send trace to a separate FD, but that feature is
> available with version >= 4.x. Otherwise, it could be messy to
> separate the trace from the other stderr output. Of course, one could
> just redirect stderr in this case. I suppose that that would work
> too.

I assume that'd be easiest. (And people not using bash can write
their own implementation for this. ;-)

> > - on-error: always trace, but delete on successful exit
> Good idea.
> > hb_report/history explorer could gather this too.
> Right.
> > (And yes I know this introduces a fake parameter that doesn't
> > really exist. But it'd be so helpful.)
> >
> > Sorry. Maybe I'm getting carried away ;-)
> Good points. I didn't really think much (yet) about how to further
> facilitate the feature, just had a vague idea that somehow lrmd
> should set the environment variable.

Sure. LRM is another obvious entry point for increased
tracing/logging. That could also work.

> Perhaps we could do something like this:
>
>   # crm resource trace rsc_id [action] [when-to-trace]
>
> This would set the appropriate meta attribute for the resource which
> would trickle down to the RA. ocf-shellfuncs would then do whatever's
> necessary to set up the trace. The file management could get tricky
> though, as we don't have a single point of exit (and trap is already
> used elsewhere).

The file/log management would be easier to do in the LRM - and also
handle the timeout situation; that could also make use of the
redirect trace elsewhere if the shell is new enough.

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde