On 11.7.2014 01:30, Giuseppe Ragusa wrote:
 > Date: Thu, 10 Jul 2014 10:24:49 +0200
 > From: tojel...@redhat.com
 > To: pacemaker@oss.clusterlabs.org
 > Subject: Re: [Pacemaker] Creating a safe cluster-node shutdown script
(for when UPS goes OnBattery+LowBattery)
 >
 > On 10.7.2014 03:17, Giuseppe Ragusa wrote:
 > > On Thu, Jul 10, 2014, at 00:06, Andrew Beekhof wrote:
 > >>
 > >> On 9 Jul 2014, at 10:28 pm, Giuseppe Ragusa
<giuseppe.rag...@hotmail.com> wrote:
 > >>
 > >>> On Tue, Jul 8, 2014, at 02:59, Andrew Beekhof wrote:
 > >>>>
 > >>>> On 4 Jul 2014, at 3:16 pm, Giuseppe Ragusa
<giuseppe.rag...@hotmail.com> wrote:
 > >>>>
 > >>>>> Hi all,
 > >>>>> I'm trying to create a script as per subject (on CentOS 6.5,
CMAN+Pacemaker, only DRBD+KVM active/passive resources; SNMP-UPS
monitored by NUT).
 > >>>>>
 > >>>>> Ideally I think that each node should stop (disable) all
locally-running VirtualDomain resources (doing so cleanly demotes and
then downs the DRBD resources underneath), then put itself in standby and
finally shut down.
 > >>>>
 > >>>> Since the end goal is shutdown, why not just run 'pcs cluster
stop'?
 > >>>
 > >>> I thought that this action would cause a communication interruption
(since Corosync would not be responding to the peer) and so cause the
other node to stonith us;
 > >>
 > >> No. Shutdown is a globally co-ordinated process.
 > >> We don't fence nodes we know shut down cleanly.
 > >
 > > Thanks for the clarification.
 > > Now that you've said it, it also seems logical and even obvious ;>
 > >
 > >>> I know that ideally the other node too should perform "pcs
cluster stop" shortly afterwards, since the same UPS powers both, but I worry
about timing issues (and "races") in UPS monitoring, since it is a large
Enterprise UPS monitored by SNMP.
 > >>>
 > >>> Furthermore I do not know what happens to running resources at
"pcs cluster stop": I infer from your suggestion that resources are
brought down and not migrated to the other node, correct?
 > >>
 > >> If the other node is shutting down too, they'll simply be stopped.
 > >> Otherwise we'll try to move them.
 > >
 > > It's the "moving" that worries me :)
 > >
 > >>>> Possibly with 'pcs cluster standby' first if you're worried that
stopping the resources might take too long.
 > >
 > > I forgot to ask: in which way would a previous standby make the
resources stop sooner?
 > >
 > >>> I thought that "pcs cluster standby" would usually migrate the
resources to the other node (I actually tried it and confirmed the
expected behaviour); so this would risk to become a race with the timing
of the other node standby,
 > >>
 > >> Not really, at the point the second node runs 'standby' we'll stop
trying to migrate services and just stop them everywhere.
 > >> Again, this is a centrally controlled process, timing isn't a problem.
 > >
 > > I understand that, "eventually", timing won't be a problem and
resources will "eventually" stop, but from your description I'm afraid
that some delay could creep into the total shutdown process, arising
from possibly unsynchronized UPS notifications on the nodes (first node
starts standby, resources start to move, THEN second node starts standby).
 > >
 > > So now I'm taking your advice and I'll modify the script to use
cluster stop but, with the aim of avoiding the aforementioned delay (if
it actually represents a possibility), I would like to ask you three
questions:
 > >
 > Hi,
 > "pcs cluster stop --all" does not work on 6.5 with pcs-0.9.90 which I
 > believe is shipped with 6.5. You need to install current pcs version
 > from https://github.com/feist/pcs and run pcsd service on all nodes.

Hi,
thanks for the information.
From reading the discussion under various Pacemaker-related bugzilla
entries (on the Red Hat bugzilla) I suspect that the package will be
rebased/updated soon (possibly for RHEL 6.6), so I could either wait or
temporarily recompile/create an updated package myself.

 > > *) if I simply issue a "pcs cluster stop --all" from the first node
that gets notified of the UPS critical status, do I risk any adverse effects
when the other node asynchronously issues the same command some time
later (before/after the whole cluster stop sequence completes)?
 > It just runs "service pacemaker stop" and "service cman stop" on every
 > node. It should have no effect once the services are already stopped.

OK, good to know there are no adverse effects, but the error detection could be
"difficult" (see your warning on exit codes below):
the command could exit with a code != 0 if the other node was already powered off
or otherwise inoperative, right?
And what would the exit code be if the cluster is already off on this
node (because the other one caught the UPS signal first)? I suppose that
would be an exit code != 0 too.

If pcs is able to connect to pcsd on all nodes and run "service pacemaker stop" and "service cman stop" on them, and both of these commands exit with code 0 on all nodes, then pcs exits with code 0. The pcsd service itself is not stopped by this, so you can run "pcs cluster stop --all" repeatedly and it will still exit with 0, because stopping the already-stopped pacemaker and cman services exits with 0.

If pcs is not able to connect to pcsd on some node, or cannot run the cluster stop command there, or the service stop exits with a non-zero code, then pcs exits with a non-zero code.

So you can run "pcs cluster stop --all" on all nodes and it will exit with 0. Once you power off a node or stop its pcsd service, it will exit with a non-zero code.
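
For example, a small wrapper along these lines could record the result before halting (an untested sketch; the log tag and the choice to shut down anyway are only illustrative):

#!/bin/bash
# Stop the whole cluster via pcsd on all nodes, then halt this node.
pcs cluster stop --all
rc=$?
if [ "$rc" -ne 0 ]; then
    # Non-zero: some node was unreachable (already powered off, pcsd
    # stopped, ...); log it, but proceed since the UPS is about to die.
    logger -t ups-shutdown "pcs cluster stop --all returned ${rc}"
fi
/sbin/shutdown -h +0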

Regards,
Tomas


 > > *) does the aforementioned "pcs cluster stop --all" command return
only after the cluster stop sequence has actually/completely ended (so
as to safely issue a "shutdown -h now" immediately afterwards)?
 > Yes. You need to check the return code; a non-zero return code means some
 > error has occurred and some nodes haven't been stopped.
 > When you shut down one node and then try to run "pcs cluster stop --all"
 > on the other one it will fail and return a non-zero return code
(obviously).

I suppose I could simply log the exit code and assume that everything
will go reasonably well anyway, but all the possible different
conditions/results make me a bit nervous when thinking of a
logic/algorithm to use, and they remind me of the hint that Digimer gave at
the beginning of this thread: beware of bugs when developing a script
for such a delicate purpose (programmatically bringing down an otherwise
healthy cluster).

 > > *) is the "pcs cluster stop --all" command known to work reliably
on current CentOS 6.5? (I ask since I found some discussion around "pcs
cluster start" related bugs)
 > See above.

So I'm thinking of a different strategy in order to stick to the advice
of keeping things simple and straight to the point; these are my
conclusions:

*) I'll get rid of my overly elaborate/peculiar loops, detections and timeouts

*) no "pcs cluster stop" (with or without --all) since it does not meet
my goal (i.e. bring resources, cluster and then nodes down with the
absolute minimum fuss and resources ping-pong should the nodes catch the
UPS status with some delay between them)

*) no "pcs cluster standby" for the same reason given above

*) the cluster coming up fully working by itself at the eventual restart
(power restored and UPS battery at least minimally recharged) would be
desirable, but I will settle for a little manual intervention once power
stability has been assured by humans ;>

So this is the strategy I came up with after pondering all your
advice/suggestions/explanations:

"pcs property set maintenance-mode=true" given by each node
unconditionally (I think this should work when given again by the other
node, independently of any delay the other node has in starting this
shutdown-by-UPS procedure) followed (maybe with a little fixed sleep) by
a simple local "shutdown -h +0"
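
A minimal sketch of what I have in mind (untested; the sleep value is just a placeholder):

#!/bin/bash
# Called by NUT when the UPS reaches OnBattery+LowBattery.
# Freeze the cluster: all resources become unmanaged, so Pacemaker will
# not stop, start or migrate anything while the nodes go down. Safe to
# run again by the other node, in any order.
pcs property set maintenance-mode=true

# Small fixed delay to let the property settle in the CIB
# (10 seconds is an arbitrary placeholder).
sleep 10

# Plain OS shutdown; the halt/reboot initscripts will bring down the
# (now unmanaged) resources as usual.
/sbin/shutdown -h +0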

given that all the resources (unmanaged, at this point) that I'm using
have an initscript counterpart which is registered for system
halt/reboot runlevels (but obviously not for system boot), it follows
that they should come down cleanly as if they were not cluster-controlled
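
(One quick way to double-check that registration; "drbd" here is only an example of a service backing the resources:)

# K-scripts must exist for runlevels 0 (halt) and 6 (reboot)
chkconfig --list drbd
ls /etc/rc0.d/K*drbd* /etc/rc6.d/K*drbd*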

at the eventual restart (power restored and UPS battery at least
minimally recharged) an interactively issued "pcs property unset
maintenance-mode" should bring everything online again (since pacemaker
and cman start automatically at system boot)
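
Something like this at restore time, then (again just a sketch; "pcs property set maintenance-mode=false" should be an equivalent alternative):

# on any one node, once humans are satisfied that power is stable:
pcs property unset maintenance-mode
# then verify that Pacemaker has taken the resources back under management:
crm_mon -1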

What do you think? Is this correct/sensible?

Many thanks again for all your suggestions/information.

Regards,
Giuseppe

 > Regards,
 > Tomas
 > >
 > > Many thanks again for your invaluable help and insight.
 > >
 > > Regards,
 > > Giuseppe
 > >
 > >>> so this is why I took the trouble of explicitly stopping, in an
orderly fashion, all locally-running resources in my script BEFORE putting the
local node in standby.
 > >>>
 > >>>> Pacemaker will stop everything in the required order and stop
the node when done... problem solved?
 > >>>
 > >>> I thought that after a "pcs cluster standby" a regular "shutdown
-h" of the operating system would cleanly bring down the cluster too,
 > >>
 > >> It should do
 > >>
 > >>> without the need for a "pcs cluster stop", given that both
Pacemaker and CMAN are correctly configured for automatic
startup/shutdown as operating system services (SysV initscripts
controlled by CentOS 6.5 Upstart, in my case).
 > >>>
 > >>> Many thanks again for your always thought-provoking and
informative answers!
 > >>>
 > >>> Regards,
 > >>> Giuseppe
 > >>>
 > >>>>>
 > >>>>> On further startup, manual intervention would be required to
unstandby all nodes and enable resources (nodes already in standby and
resources already disabled before blackout should be manually
distinguished).
 > >>>>>
 > >>>>> Is this strategy conceptually safe?
 > >>>>>
 > >>>>> Unfortunately, various searches have turned up no "prior art" :)
 > >>>>>
 > >>>>> This is my tentative script (consider it in the public domain):
 > >>>>>
 > >>>>>
------------------------------------------------------------------------------------------------------------------------------------
 > >>>>> #!/bin/bash
 > >>>>>
 > >>>>> # Note: "pcs cluster status" still has a small bug vs. CMAN-controlled Corosync and would always return != 0
 > >>>>> pcs status > /dev/null 2>&1
 > >>>>> STATUS=$?
 > >>>>>
 > >>>>> # Detect if cluster is running at all on local node
 > >>>>> # TODO: detect node already in standby and bypass this
 > >>>>> if [ "${STATUS}" = 0 ]; then
 > >>>>>     local_node="$(cman_tool status | grep -i 'Node[[:space:]]*name:' | sed -e 's/^.*Node\s*name:\s*\([^[:space:]]*\).*$/\1/i')"
 > >>>>>     for local_resource in $(pcs status 2>/dev/null | grep "ocf::heartbeat:VirtualDomain.*${local_node}\\s*\$" | awk '{print $1}'); do
 > >>>>>         pcs resource disable "${local_resource}"
 > >>>>>     done
 > >>>>>     # TODO: each resource disabling above may return without waiting for complete stop - wait here for "no more resources active"? (but avoid endless loops)
 > >>>>>     pcs cluster standby "${local_node}"
 > >>>>> fi
 > >>>>>
 > >>>>> # Shut down gracefully anyway at the end
 > >>>>> /sbin/shutdown -h +0
 > >>>>>
 > >>>>>
------------------------------------------------------------------------------------------------------------------------------------
 > >>>>>
 > >>>>> Comments/suggestions/improvements are more than welcome.
 > >>>>>
 > >>>>> Many thanks in advance.
 > >>>>>
 > >>>>> Regards,
 > >>>>> Giuseppe
 > >>>>>
 > >>>>
 > >>> --
 > >>> Giuseppe Ragusa
 > >>> giuseppe.rag...@fastmail.fm


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
