Re: [ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures
On Tue, 2021-10-19 at 18:11 +, Walker, Chris wrote:
> That looks great … is that a string that an RA can set on failure?
> I’d love to be able to communicate RA-specific failure reasons back
> to crm_mon consumers…
> Thanks!
> Chris

Yes! If you're using ocf-shellfuncs, call ocf_exit_reason "str" to set
"str" as the exit reason. All it does is output "ocf-exit-reason:str"
to stderr, so you can do that directly if not using ocf-shellfuncs.

-- 
Ken Gaillot

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
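A minimal sketch, assuming a POSIX shell agent that does not source ocf-shellfuncs: it writes the "ocf-exit-reason:" line to stderr itself, as described in the reply above, and returns a standard OCF return code. The helper names report_exit_reason and check_binary, the dispatch on validate-all, and the OCF_RESKEY_binary parameter are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Minimal sketch (assumption): an agent reporting its own exit reason
# WITHOUT ocf-shellfuncs, by writing "ocf-exit-reason:<text>" to stderr.
# Everything except that prefix and the OCF return codes is hypothetical.

OCF_SUCCESS=0
OCF_ERR_INSTALLED=5   # standard OCF code: required software is missing

# Hypothetical stand-in for ocf_exit_reason: one prefixed line on stderr.
report_exit_reason() {
    printf 'ocf-exit-reason:%s\n' "$*" >&2
}

# Hypothetical check: fail with a readable reason if a binary is not in PATH.
check_binary() {
    if ! command -v "$1" >/dev/null 2>&1; then
        report_exit_reason "Required binary $1 is not installed"
        return "$OCF_ERR_INSTALLED"
    fi
    return "$OCF_SUCCESS"
}

# Act only when called with an action. OCF_RESKEY_binary is a made-up
# parameter; agents receive their parameters as OCF_RESKEY_<name> variables.
case "${1:-}" in
    validate-all)
        check_binary "${OCF_RESKEY_binary:-mydaemon}"
        exit $?
        ;;
esac
```

Run as `OCF_RESKEY_binary=nonexistent ./agent validate-all`, the sketch exits with 5 and prints `ocf-exit-reason:Required binary nonexistent is not installed` on stderr, which is the string that would then show up as exitreason='...' in the status display.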
Re: [ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures
That looks great … is that a string that an RA can set on failure? I’d love to be able to communicate RA-specific failure reasons back to crm_mon consumers…

Thanks!
Chris

From: Users
Date: Tuesday, October 19, 2021 at 1:17 PM
To: users@clusterlabs.org
Subject: [ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures
[ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures
Hi all,

I hope to get the first release candidate for Pacemaker 2.1.2 out in a couple of weeks.

One improvement will be in status displays (crm_mon, and the crm_resource --force-* options) for failed actions.

OCF resource agents already have the ability to output an "exit reason" for failures. These are displayed in the status, to give more detailed information than just "error".

Now, Pacemaker will set exit reasons for internal failures as well. This includes problems such as an agent or systemd unit not being installed, timeouts in Pacemaker communication as opposed to the agent itself, an agent process being killed by a signal, etc.

As an example, sending a kill -9 to a running agent monitor would previously result in a status line with no explanation, requiring some log diving to figure it out:

  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error', exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms, exec=0ms

Now, the exit reason will plainly say what happened:

  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error', exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms, exec=0ms

-- 
Ken Gaillot
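For consumers of crm_mon output, the exit reason sits in the exitreason='...' field of each failed-action line. A rough sketch, assuming a script that reads those human-readable lines: get_exit_reason is a hypothetical helper, and the sample line is copied from the announcement above.

```shell
#!/bin/sh
# Sketch (assumption): extract the exitreason field from one failed-action
# line of crm_mon's human-readable output. That text is meant for people,
# not scripts, so this parsing is fragile (e.g. a reason containing a
# single quote would be cut short).

# Hypothetical helper: prints the exit reason (possibly empty) found in "$1".
get_exit_reason() {
    printf '%s\n' "$1" | sed -n "s/.*exitreason='\([^']*\)'.*/\1/p"
}

# Sample line copied from the announcement.
line="* rsc1_monitor_6 on node1 'error' (1): call=188, status='Error', exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms, exec=0ms"

get_exit_reason "$line"
```

Run as shown, it prints `Process interrupted by signal`; for the pre-2.1.2 style line with exitreason='' it prints an empty line. For anything beyond quick checks, crm_mon's XML output (--output-as=xml) is likely a sturdier source than scraping the text display.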