On 5/31/21 10:53 AM, Emil Penchev wrote:
Hi all,

I'm writing about an issue reported to us by a Pacemaker user concerning resource agent (RA) timeouts. Some users have hit a timeout from an RA script or program, and this led to a major outage for them. As is typical in these cases, there is no additional information to explain why it happened. The user has proposed a solution: a proof of concept (POC) that instruments Pacemaker directly and adds a way to trigger further debugging through an external callout program. For example, one can set an environment variable such as *PCMK_timeout_prog* that points to an external program or script, which is then executed to gather more useful debug information.

Here is the proposed POC change, with some minor modifications:
https://github.com/tickbg/pacemaker/compare/master...a453d30
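To make the proposal concrete, here is a minimal Python sketch of the hook idea. It is an illustration only, not the POC (Pacemaker itself is written in C). It runs an agent command with a timeout and, if *PCMK_timeout_prog* is set, runs that program before killing the agent, so the hook can still inspect the hung process. The helper name `run_action_with_timeout_hook` and the hook's arguments (the agent's PID followed by its command line) are hypothetical assumptions, not the interface of the actual patch.

```python
import os
import subprocess
import sys


def run_action_with_timeout_hook(cmd, timeout_s, hook_timeout_s=30,
                                 env_var="PCMK_timeout_prog"):
    """Run an agent command; on timeout, call the debug hook named by env_var.

    Hypothetical helper: returns ("done", rc) or ("timeout", None).
    """
    proc = subprocess.Popen(cmd)
    try:
        return ("done", proc.wait(timeout=timeout_s))
    except subprocess.TimeoutExpired:
        hook = os.environ.get(env_var)
        if hook:
            # Assumed hook interface: <hook> <pid> <agent command...>.
            # It runs before the agent is killed so it can still inspect
            # the hung process (stack, open files, children, ...).
            try:
                subprocess.run([hook, str(proc.pid), *cmd],
                               timeout=hook_timeout_s, check=False)
            except (OSError, subprocess.TimeoutExpired):
                # A missing or hung hook must not block timeout handling.
                pass
        proc.kill()
        proc.wait()
        return ("timeout", None)


if __name__ == "__main__":
    # Demo: a command that sleeps longer than its 1 s timeout.
    print(run_action_with_timeout_hook(
        [sys.executable, "-c", "import time; time.sleep(5)"], 1))
```

Running the hook synchronously, before the kill, is the point of the design: debug data such as stack traces is only available while the timed-out process still exists. The separate `hook_timeout_s` keeps a misbehaving hook from stalling recovery.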

If you create a pull request directly, we can use GitHub for the discussion.


In Pacemaker we already have the alerts feature, which allows calling scripts on various occasions, one of which is resource actions. So it might make sense to consider extending that feature to cover your case as well.

At the moment, the return code of the RA is passed to your script. I'm actually unsure what happens in case of a timeout.

To have the script called only in case of a timeout, additional filtering on the Pacemaker side might be handy, as it would reduce the load generated when the filtering is done in the script. A synchronous-call flag could also be useful, since at the moment alerts are called more in a fire-and-forget manner so as not to throttle Pacemaker actions.
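With today's alerts, the timeout filtering has to happen in the agent itself. A minimal sketch of such an alert agent in Python: Pacemaker passes event details to alert agents through `CRM_alert_*` environment variables, so the agent can return immediately for every event that is not a resource timeout. The status value treated as a timeout (`"2"`) is an assumption based on Pacemaker's operation-status enum and should be checked against the version in use. The function names are hypothetical and the diagnostics step is only a placeholder.

```python
import os
import sys

# Assumption: numeric operation status Pacemaker reports for a timed-out
# action (the timeout entry of its op-status enum); verify per version.
OP_STATUS_TIMEOUT = "2"


def is_resource_timeout(env):
    """Return True if the alert describes a resource action that timed out."""
    return (env.get("CRM_alert_kind") == "resource"
            and env.get("CRM_alert_status") == OP_STATUS_TIMEOUT)


def main(env=None):
    """Handle one alert; return True if debug collection was triggered."""
    env = os.environ if env is None else env
    # Filter early: the agent is spawned for every alert, so keep the
    # common (non-timeout) path as cheap as possible.
    if not is_resource_timeout(env):
        return False
    rsc = env.get("CRM_alert_rsc", "?")
    task = env.get("CRM_alert_task", "?")
    node = env.get("CRM_alert_node", "?")
    # Placeholder for real debug collection (process lists, logs, ...).
    print(f"timeout: {rsc} {task} on {node}", file=sys.stderr)
    return True


if __name__ == "__main__":
    main()
```

Because alerts are currently fire-and-forget, the timed-out process may already have been killed when this agent runs, so it can likely only gather post-mortem information. A synchronous-call flag would address that gap.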


Klaus


Emil.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/developers

ClusterLabs home: https://www.clusterlabs.org/
