Re: [Nut-upsuser] FSD sequence: Waiting for bigger and slower clients before cutting power

2023-10-31 Thread Jim Klimov via Nut-upsuser
Hello again :)

> Aren't the drivers daemonized and isn't it the drivers that kill power
> already?

Historically, a mixed bag: they are daemonized while monitoring, then they
exit (or are killed) as part of shutdown, and then a late-shutdown script
(historically a patch to rc.halt or similar init-script - this integration
depends a lot on the OS and its service management framework, whether we
can inject late-shutdown logic at all) would check that we are in FSD mode
and then would call the driver binary again which connects to the device
again and tells it to power off.

With recent work on master branch, a possibility was introduced to use the
running driver daemon to send such a command to the UPS; however it would
need some rework of OS integration to not kill the daemon, retain the
communication sockets available, etc. - so probably not something instantly
usable for the out-of-the-box shutdown routine today.

> The manual makes a significant point of the idea that when an FSD event
> occurs, everything should be shut down and the power cut so that the UPS
> later will power everything back up again, whereas if some consumers are
> shut down early, they won't automatically come back up.

To my knowledge, that stance remains. It is generally complicated to
orchestrate revival of a rack full of servers from partially-off state,
with some databases depending on storage and apps depending on databases,
etc. - and especially with legacy systems lacking some cross-server service
health and dependency tracking. So the safe approach is to power everything
off and then start up consistently and predictably. After all, this chain
of events is something tested by every power-on of the data room, unlike
any other random situations :) Of course, if a particular deployment can
take advantage of their local tools (say, everything is a single Kubernetes
farm and all nodes know each other's health), solutions specific to such
use-cases can differ from NUT's generally suggested safe default behavior.
If blueprints to such optimized solutions can be shared as part of NUT FAQ
or Wiki - so much the better :)

Regarding power coming back after our shutdown has begun: if we cannot
power-cycle the UPS when this routine ends (e.g. poor hardware support),
then besides WoL packets, the late-shutdown integration noted above can
also be used on the NUT-client systems (where the OS allows it). If an
upsmon client has caused an FSD shutdown, the client system can just sleep
for a long time and then reboot programmatically. If the battery dies during
this time - no loss. If the UPS stays up, servers come back after some time.
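
As a rough illustration of that decision (not actual NUT code), the
late-shutdown hook could be sketched like this. The function name, the step
names and the one-hour default are all made up for the example; a real
integration would call the OS-specific tools instead of returning strings.

```python
def late_shutdown_steps(fsd_active, is_primary, can_cut_ups_power, sleep_s=3600):
    """Return the (hypothetical) steps a late-shutdown hook would take."""
    if not fsd_active:
        # Ordinary shutdown requested by an admin: nothing UPS-related to do.
        return ["halt"]
    if is_primary and can_cut_ups_power:
        # The primary asks the UPS to cut (and later restore) power.
        return ["kill-ups-power"]
    # Otherwise wait it out: if the battery dies, nothing is lost;
    # if wall power returned in the meantime, the reboot brings us back.
    return [f"sleep {sleep_s}", "reboot"]
```

For example, a secondary during FSD would get `["sleep 3600", "reboot"]`.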

> I figure it'd be better if the primary could shut down as soon as it's
> safe to do so rather than waiting for a fixed amount of time.

Here we balance opposing goals :) If you want the heavy-weight secondary
servers to take time for a shut down (assuming battery can handle it),
before the NUT primary is green-lit to proceed with its shut-down and
complete it by cutting the UPS power, then you wait for all secondary
upsmons to log off. As soon as they do, it is considered safe for the
primary to proceed (it does not wait "for a fixed time" beyond that).

Currently the primary waits for at most a (configurable) fixed time, and
then proceeds with its shutdown even if the secondaries did not log off.
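
To make the two exit conditions explicit, here is a minimal sketch of such a
wait loop. It is not taken from upsmon.c; the function and parameter names
are invented for illustration, and `count_secondaries` stands in for whatever
the primary uses to learn how many clients are still logged in.

```python
import time
from typing import Callable


def wait_for_secondaries(
    count_secondaries: Callable[[], int],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if all secondaries logged off, False if the timeout expired."""
    deadline = clock() + timeout_s
    while True:
        if count_secondaries() == 0:
            return True   # safe to proceed right away, no extra fixed delay
        if clock() >= deadline:
            return False  # give up waiting and proceed anyway
        sleep(poll_s)
```

The clock and sleep functions are parameters only so that the loop can be
exercised without really waiting.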

It makes sense for such a primary to not have long-stopping services of its
own, which would rely on some large-file consistency etc. - the battery
would be running on fumes during its shutdown and power might disappear any
time. Something lightweight (service-wise) like a firewall machine or a
dedicated Raspberry might be a good choice. Perhaps some integration via
upssched (or further development work in upsmon; or maybe using a separate
secondary upsmon client with a special SHUTDOWNCMD on the same machine)
could ensure that the NUT primary host would stop its heavy services (if
any) as soon as it tells other clients to stop themselves. This way there
would be nothing important running except NUT on this primary system by the
time that the primary upsmon proceeds to shut down its host server.

As part of this discussion, https://github.com/networkupstools/nut/pull/2133
was merged and added a way for the secondary upsmons to NOT exit as soon as
they called their SHUTDOWNCMD - now alternatives exist for upsmon to wait
for a specified time and then exit, or to never exit of its own accord (but
to honour e.g. a SIGTERM when its OS service is finally told to go down, up
to the sysadmins to make sure this happens after all the heavy services are
safely parked). This is available on master branch now, and should be part
of NUT v2.8.1 release soon.

Hope this helps,
Jim Klimov


On Tue, Oct 31, 2023 at 11:27 AM Magnus Holmgren <
magnus.holmg...@milientsoftware.com> wrote:

> On Friday, 27 October 2023 at 20:07:58 CET, Jim Klimov wrote:
> > Hi, this does sound like a useful idea - although for the principle of
> > least surprise and for variation in deployments, I'd 

Re: [Nut-upsuser] FSD sequence: Waiting for bigger and slower clients before cutting power

2023-10-31 Thread Magnus Holmgren
On Friday, 27 October 2023 at 20:07:58 CET, Jim Klimov wrote:
> Hi, this does sound like a useful idea - although for the principle of
> least surprise and for variation in deployments, I'd rather have it as a
> (non-default state of a) configuration toggle that can be set via
> `upsmon.conf`: whether this particular client exits after processing FSD or
> not. The onus for the rest would be on general systems integration - e.g.
> ensure that init scripts `K*`ill the long-running services before they go
> after upsmon and upsd, or add a drop-in systemd config snippet for
> nut-monitor to not-conflict with "shutdown.target" (and half a dozen of its
> equivalents for halt/reboot/poweroff/...), and possibly to break the
> shutdown-dependency between nut-monitor/nut-server/nut-driver units.

Yes, I figure this is up to the distributions to get right.

> On a related note - there was lately work to allow daemonized drivers to
> kill power of the UPS (may be useful especially for devices with long
> protocol init times), with a safety switch to flip about this and actually
> allow the driver to issue killpower commands. So stopping driver daemons
> might eventually be not needed - but I'm not sure any OS integrations took
> note of this possibility yet. It was not officially released so far, just
> is in master branch.

I don't understand what you're saying here. Aren't the drivers daemonized and 
isn't it the drivers that kill power already?

> Note however that typically FSD happens when the power is critical.
> Definitions of that vary, as well as ability or not to set certain
> thresholds for when the device would emit (and a driver would relay) the
> low-battery condition. You might not physically have those 2 minutes worth
> of remaining battery charge to shut down the VMs or other long-stopping
> services (e.g. app servers to flush in-flight operations, and only later
> their databases) - more so with the probable storage I/O and power-draw
> burst to flush out databases or hibernate those VMs.

Obviously I'll have to set the LB threshold to give sufficient margin, but I
figure it'd be better if the primary could shut down as soon as it's safe to 
do so rather than waiting for a fixed amount of time. Though I'd still need to 
ensure that the battery can last for as long as the primary will wait.

> In this case fiddling with upssched or setting up dummy-ups relays with an
> override for defining earlier trigger of critical state (usually by battery
> charge or time remaining) may fare better: your NUT primary server would
> seem to serve several UPSes (the "real" device and a few dummies with
> different "criticality" levels), and various secondary hosts would MONITOR
> the suitable dummy to begin their shutdown earlier into the outage. This
> approach may also be useful for Dan's post :)

Ah, that seems like a workable way of shutting down different machines based 
on different battery levels. One caveat though: The manual makes a significant 
point of the idea that when an FSD event occurs, everything should be shut 
down and the power cut so that the UPS later will power everything back up 
again, whereas if some consumers are shut down early, they won't automatically 
come back up (unless you arrange for WoL packets to be sent out, or 
something).


Re: [Nut-upsuser] FSD sequence: Waiting for bigger and slower clients before cutting power

2023-10-27 Thread Jim Klimov via Nut-upsuser
Check it out now,
my NUT project braza!..

https://github.com/networkupstools/nut/pull/2133
https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests

Jim



Re: [Nut-upsuser] FSD sequence: Waiting for bigger and slower clients before cutting power

2023-10-27 Thread Jim Klimov via Nut-upsuser
Hi, this does sound like a useful idea - although for the principle of
least surprise and for variation in deployments, I'd rather have it as a
(non-default state of a) configuration toggle that can be set via
`upsmon.conf`: whether this particular client exits after processing FSD or
not. The onus for the rest would be on general systems integration - e.g.
ensure that init scripts `K*`ill the long-running services before they go
after upsmon and upsd, or add a drop-in systemd config snippet for
nut-monitor to not-conflict with "shutdown.target" (and half a dozen of its
equivalents for halt/reboot/poweroff/...), and possibly to break the
shutdown-dependency between nut-monitor/nut-server/nut-driver units.

On a related note - there was lately work to allow daemonized drivers to
kill power of the UPS (may be useful especially for devices with long
protocol init times), with a safety switch to flip about this and actually
allow the driver to issue killpower commands. So stopping driver daemons
might eventually be not needed - but I'm not sure any OS integrations took
note of this possibility yet. It was not officially released so far, just
is in master branch.

Note however that typically FSD happens when the power is critical.
Definitions of that vary, as well as ability or not to set certain
thresholds for when the device would emit (and a driver would relay) the
low-battery condition. You might not physically have those 2 minutes worth
of remaining battery charge to shut down the VMs or other long-stopping
services (e.g. app servers to flush in-flight operations, and only later
their databases) - more so with the probable storage I/O and power-draw
burst to flush out databases or hibernate those VMs.

In this case fiddling with upssched or setting up dummy-ups relays with an
override for defining earlier trigger of critical state (usually by battery
charge or time remaining) may fare better: your NUT primary server would
seem to serve several UPSes (the "real" device and a few dummies with
different "criticality" levels), and various secondary hosts would MONITOR
the suitable dummy to begin their shutdown earlier into the outage. This
approach may also be useful for Dan's post :)
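
To illustrate the idea only (this is neither dummy-ups configuration nor NUT
code), the effect of such a relay can be modelled as a function that reports
the low-battery flag earlier than the real device does. The function name and
the threshold parameter are invented for the example; `OL`, `OB` and `LB` are
the usual NUT status tokens.

```python
def relay_status(real_status: str, battery_charge: float, early_low_charge: float) -> str:
    """Status a hypothetical 'early critical' relay would publish to its clients."""
    tokens = real_status.split()
    on_battery = "OB" in tokens
    if on_battery and "LB" not in tokens and battery_charge <= early_low_charge:
        # Declare low battery sooner than the real UPS would.
        tokens.append("LB")
    return " ".join(tokens)
```

A secondary that MONITORs a relay with a 60% threshold would then see `OB LB`
at 55% charge, while clients of the real device still see plain `OB`.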

Jim


Re: [Nut-upsuser] FSD sequence: Waiting for bigger and slower clients before cutting power

2023-10-27 Thread Dan Langille via Nut-upsuser
On Fri, Oct 27, 2023, at 10:25 AM, Magnus Holmgren wrote:
> Hi, and thanks for this great piece of free software! I've been meaning to 
> sort this out for some time, but we don't get power outages that often, 
> fortunately...
>
> So, correct me if I'm wrong, but from the documentation at
> https://networkupstools.org/docs/user-manual.chunked/Configuration_notes.html#UPS_shutdown,
> and also reading upsmon.c, when a UPS goes OB LB (assuming we have a single
> UPS connected to a primary and supplying power to the primary and some
> number of secondaries), the primary notifies the secondaries, the
> secondaries wait for FINALDELAY and then execute SHUTDOWNCMD immediately
> followed by exiting, thereby disconnecting from the primary, and the
> primary, after seeing all secondaries disconnect, proceeds with its shutdown
> (only waiting for FINALDELAY), which ends with telling the UPS to cut the
> power (without delay too, right?).
>
> Again, correct me if I'm wrong, but am I the only one who finds this a bit
> flawed? I would like for the secondaries to stay connected until they shut
> down. We have a server with a bunch of virtual machines on it, and they can
> take a couple of minutes to shut down. Otherwise the primary can easily cut
> the power prematurely. Avoiding this, it seems, could pretty easily be
> accomplished by having upsmon wait, perhaps in a separate loop, for the
> INT/TERM/QUIT signal (it would still be necessary to configure the service
> manager such that upsmon is terminated as late as possible). The primary
> could start shutting down its services in the meantime, but upsmon would
> hold the poweroff until the secondaries have disconnected (or HOSTSYNC
> expires).

I'm not talking directly to your point, however it is a related area.

What I want to do, and have not yet:

* shut down the primary servers first (i.e. the two Dell R730 in the basement)
* leave the gateway device (small box, very little power) and the switches
running
* when the battery gets down a bit farther, shut down the rest of the gear

Power outages aren't common for me, so I might be able to keep my home internet 
running for another 40 minutes or so, which might be a good thing.

-- 
  Dan Langille
  d...@langille.org

___
Nut-upsuser mailing list
Nut-upsuser@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser


[Nut-upsuser] FSD sequence: Waiting for bigger and slower clients before cutting power

2023-10-27 Thread Magnus Holmgren
Hi, and thanks for this great piece of free software! I've been meaning to 
sort this out for some time, but we don't get power outages that often, 
fortunately...

So, correct me if I'm wrong, but from the documentation at
https://networkupstools.org/docs/user-manual.chunked/Configuration_notes.html#UPS_shutdown,
and also reading upsmon.c, when a UPS goes OB LB (assuming we have a single
UPS connected to a primary and supplying power to the primary and some number
of secondaries), the primary notifies the secondaries, the secondaries wait
for FINALDELAY and then execute SHUTDOWNCMD immediately followed by exiting,
thereby disconnecting from the primary, and the primary, after seeing all
secondaries disconnect, proceeds with its shutdown (only waiting for
FINALDELAY), which ends with telling the UPS to cut the power (without delay
too, right?).

Again, correct me if I'm wrong, but am I the only one who finds this a bit
flawed? I would like for the secondaries to stay connected until they shut
down. We have a server with a bunch of virtual machines on it, and they can
take a couple of minutes to shut down. Otherwise the primary can easily cut
the power prematurely. Avoiding this, it seems, could pretty easily be
accomplished by having upsmon wait, perhaps in a separate loop, for the
INT/TERM/QUIT signal (it would still be necessary to configure the service
manager such that upsmon is terminated as late as possible). The primary
could start shutting down its services in the meantime, but upsmon would hold
the poweroff until the secondaries have disconnected (or HOSTSYNC expires).
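
Roughly what I have in mind, as a Python sketch rather than a patch to
upsmon.c (all names here are made up; `run_shutdown_cmd` stands in for
executing SHUTDOWNCMD): start the shutdown, but keep the process and thus
its connection to the primary alive until the service manager signals it.

```python
import signal
import time

WAIT_SIGNALS = (signal.SIGINT, signal.SIGTERM, signal.SIGQUIT)
_received = []


def _on_signal(signum, frame):
    # Only record the signal; the main loop below notices it.
    _received.append(signum)


def run_shutdown_then_wait(run_shutdown_cmd, poll_s=0.2):
    """Start the shutdown, then stay alive until INT/TERM/QUIT arrives."""
    for sig in WAIT_SIGNALS:
        signal.signal(sig, _on_signal)
    run_shutdown_cmd()
    while not _received:
        time.sleep(poll_s)
    return _received[0]
```

The function returns the number of the first signal it saw; only then would
the client exit and drop its connection.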

Surely this would be better than cranking up FINALDELAY on the primary and
always waiting for a fixed period of time, as suggested in
https://alioth-lists.debian.net/pipermail/nut-upsuser/2012-April/007550.html?
I guess I could try writing a SHUTDOWNCMD script that doesn't exit until most
other services have also done so, taking care not to create a deadlock
situation.

Another option would be to use upssched to shut down the "big rig" earlier. It 
just seems unsatisfying to me that upssched is entirely time-based. It would 
be nice if it were easier to trigger off battery.charge or battery.runtime 
going below arbitrary values instead of just the on battery and low battery 
statuses.
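
Something along these lines is what I mean, as a sketch outside of upssched.
It assumes the text comes from `upsc`, which prints one `name: value` pair
per line; the two function names and the threshold parameters are my own
invention, and battery.runtime is taken to be in seconds.

```python
def parse_upsc(text: str) -> dict:
    """Turn 'name: value' lines into a dictionary of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def should_start_early_shutdown(values: dict, min_charge: float, min_runtime_s: float) -> bool:
    """True when on battery and charge or runtime fell below the given limits."""
    if "OB" not in values.get("ups.status", "").split():
        return False
    # A missing reading is treated as "plenty left" rather than as an alarm.
    charge = float(values.get("battery.charge", "inf"))
    runtime = float(values.get("battery.runtime", "inf"))
    return charge < min_charge or runtime < min_runtime_s
```

A small script could poll this and start shutting down the "big rig" once it
returns true, well before the UPS itself reports LB.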

How do others solve this?

-- 
Magnus Holmgren
./¯\_/¯\. Milient
(also holmg...@debian.org)


