On 5/31/21 10:53 AM, Emil Penchev wrote:
Hi all,
I'm writing about an issue we received from a Pacemaker user
concerning a resource agent (RA) timeout.
Some users have hit a timeout in an RA script/program, and this
led to a major outage for them.
As is typical in these cases, there is no additional useful
information to explain why it happened.
There is a proposed solution, a POC from the user, that instruments
Pacemaker directly and adds a way to trigger further debugging
via an external callout program.
One can set an environment variable, for example *PCMK_timeout_prog*,
that points to an external program or script to be executed on
timeout to collect more useful debug information.
Here is the proposed POC with minor modifications:
https://github.com/tickbg/pacemaker/compare/master...a453d30
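To make the idea concrete, here is a minimal sketch of a callout program
that *PCMK_timeout_prog* could point to. How the POC actually invokes the
program and what arguments it passes are not stated here, so the sketch
assumes none. The command list, the function names, and the RA_TIMEOUT_LOG
variable are hypothetical conventions made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical callout: snapshot system state when an RA times out."""
import os
import subprocess
import sys
import time

# Illustrative choice of commands whose output may help explain a hung RA.
DEFAULT_COMMANDS = [
    ["ps", "-eo", "pid,ppid,stat,wchan:32,etime,args"],
    ["uptime"],
]


def run_command(cmd, timeout=5):
    """Run one command and return its output; never raise, never block long."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
        return proc.stdout + proc.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"failed to run {cmd!r}: {exc}\n"


def build_report(commands=DEFAULT_COMMANDS):
    """Concatenate the output of all commands under simple text headers."""
    parts = [f"RA timeout snapshot at {time.strftime('%Y-%m-%d %H:%M:%S')}\n"]
    for cmd in commands:
        parts.append(f"--- {' '.join(cmd)} ---\n{run_command(cmd)}")
    return "".join(parts)


if __name__ == "__main__":
    # Assumption: RA_TIMEOUT_LOG is our own convention, not part of the POC.
    log_path = os.environ.get("RA_TIMEOUT_LOG")
    if log_path:
        with open(log_path, "a") as log:
            log.write(build_report())
    else:
        sys.stdout.write(build_report())
```

Each command runs with its own short timeout so that the callout cannot
itself hang and make the outage worse.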
If you create a pull request directly, we would be able
to use GitHub for discussion.
In Pacemaker we already have the alerts feature, which
allows calling scripts on various occasions.
One of those is resource actions.
So it might make sense to consider extending
that feature to cover your case here as well.
At the moment you would get the return code of the RA passed
to your script. I'm actually unsure what happens in
case of a timeout.
To be called only in case of a timeout, additional filtering
on the Pacemaker side might be handy, as it would reduce the load
generated when the filtering is done in the script. A synchronous-call
flag could be useful as well (at the moment alerts are called in more
of a fire-and-forget manner so as not to throttle Pacemaker actions).
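
For illustration, this is roughly what filtering in the script looks like
with the current alerts feature, i.e. the approach that generates the extra
load, since the agent is still started for every alert. It is only a sketch:
the CRM_alert_* variables are the ones alert agents receive, but the status
value 2 for a timed-out action is an assumption that should be checked
against the Pacemaker version in use, and the function names are made up.

```python
#!/usr/bin/env python3
"""Sketch of an alert agent that reacts only to timed-out resource actions."""
import os
import sys

# Assumption: the execution-status code for "timed out" is 2.
# Verify this against the Pacemaker version in use.
TIMEOUT_STATUS = "2"


def is_resource_timeout(env):
    """Return True if the alert environment describes a timed-out action."""
    return (env.get("CRM_alert_kind") == "resource"
            and env.get("CRM_alert_status") == TIMEOUT_STATUS)


def describe(env):
    """Build a one-line summary, tolerating missing variables."""
    return "{task} of {rsc} on {node} timed out".format(
        task=env.get("CRM_alert_task", "?"),
        rsc=env.get("CRM_alert_rsc", "?"),
        node=env.get("CRM_alert_node", "?"),
    )


if __name__ == "__main__":
    if is_resource_timeout(os.environ):
        # Placeholder action: a real agent would collect debug data here.
        print(describe(os.environ), file=sys.stderr)
```

All other alerts fall through without any action, but the process is still
spawned each time, which is the cost that filtering before the call would
avoid.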
Klaus
Emil.
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/developers
ClusterLabs home: https://www.clusterlabs.org/