Re: [ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures

2021-10-19 Thread Ken Gaillot
On Tue, 2021-10-19 at 18:11, Walker, Chris wrote:
> That looks great … is that a string that an RA can set on failure? 
> I’d love to be able to communicate RA-specific failure reasons back
> to crm_mon consumers…
> Thanks!
> Chris

Yes! If you're using ocf-shellfuncs, call ocf_exit_reason "str" to set
"str" as the exit reason. All it does is output "ocf-exit-reason:str"
to stderr, so you can do that directly if not using ocf-shellfuncs.
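
A minimal sketch of the direct approach, for an agent that does not
source ocf-shellfuncs. The names set_exit_reason and my_monitor and the
PID-file check are made up for illustration; only the
"ocf-exit-reason:" line on stderr comes from the description above.

```shell
#!/bin/sh
# Hypothetical helper for an agent that does not source ocf-shellfuncs.
# It writes "ocf-exit-reason:<text>" to stderr, which is the line
# Pacemaker picks up as the exit reason.
set_exit_reason() {
    printf 'ocf-exit-reason:%s\n' "$1" >&2
}

# Hypothetical monitor action: say why it failed, then return 1
# (OCF_ERR_GENERIC, displayed as 'error' in the status output).
my_monitor() {
    if [ ! -e "$1" ]; then
        set_exit_reason "PID file $1 not found"
        return 1
    fi
    return 0
}
```

With ocf-shellfuncs sourced, the body of set_exit_reason would simply
be a call to ocf_exit_reason instead.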

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures

2021-10-19 Thread Walker, Chris
That looks great … is that a string that an RA can set on failure?  I’d love to 
be able to communicate RA-specific failure reasons back to crm_mon consumers…
Thanks!
Chris




[ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures

2021-10-19 Thread Ken Gaillot
Hi all,

I hope to get the first release candidate for Pacemaker 2.1.2 out in a
couple of weeks.

One improvement will be in status displays (crm_mon, and the
crm_resource --force-* options) for failed actions.

OCF resource agents already have the ability to output an "exit reason"
for failures. These are displayed in the status, to give more detailed
information than just "error".

Now, Pacemaker will set exit reasons for internal failures as well.
This includes problems such as an agent or systemd unit not being
installed, timeouts in Pacemaker communication as opposed to the agent
itself, an agent process being killed by a signal, etc.

As an example, sending a kill -9 to a running agent monitor would
previously result in status with no explanation, requiring some log
diving to figure it out:

 * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms,
exec=0ms

Now, the exit reason will plainly say what happened:

 * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24
14:45:02 2021', queued=0ms, exec=0ms
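
For anyone scripting against this display, here is a rough sketch (not
something Pacemaker provides) of pulling the exit reason out of one
failed-action line with sed. The function name extract_exit_reason is
hypothetical, and the pattern assumes the whole entry is on a single
line and that the reason itself contains no single quote. The text
layout is meant for people and may change, so treat this as
illustration only.

```shell
#!/bin/sh
# Hypothetical helper: read status text on stdin and print the value of
# exitreason='...' for each line that has one.
# Assumes the reason text contains no single quote.
extract_exit_reason() {
    sed -n "s/.*exitreason='\([^']*\)'.*/\1/p"
}
```

Piping the second example line above through extract_exit_reason
should print the reason text by itself.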

-- 
Ken Gaillot 
