Thanks for the info!
First, regarding the later detail about starting `usbhid-ups`: the
"Resource busy" error indicates that an earlier instance of the NUT
driver (in its own service unit) is likely still running and holding the
device. In a worse case, some other program might treat it as a generic
HID device (same class family as keyboards and mice) and grab it, or the
device gets passed through virtualization/containers so that a different
OS actually holds it (the guest, if you're running NUT on the
hypervisor, or vice versa). Make sure you stop those too, whichever
applies.
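To see who holds the device node, something like this may help (the
bus/device path below is a hypothetical example - take the real numbers
from `lsusb` or `dmesg`):

```shell
# Hypothetical device node - substitute bus/device numbers from lsusb/dmesg
DEV=/dev/bus/usb/001/004
# List any processes holding the node open (fuser is in the psmisc package)
fuser -v "$DEV" 2>/dev/null || echo "no process currently holds $DEV"
```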
Second, looking at dmesg and other logs: it seems the device number gets
re-enumerated during reconnection (lost #3, got #4) - some OSes do that,
others do not. This can confuse a driver's attempts to re-attach while
relying on details learned at the initial connection. That said, due to
an issue in NUT v2.8.0 and earlier, the drivers might not have tracked
the "device" number at all (seen as "unknown" here); that got fixed in
2.8.1, but since 2.8.2 `nut-scanner` again does not suggest it as a
config option by default, precisely because it is unreliable across
reconnections...
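If re-enumeration is the culprit, matching the device by stable
identifiers rather than the volatile bus/device numbers may help. A
hedged `ups.conf` sketch (the section name and serial value are made
up - take real values from `nut-scanner -U` or `lsusb -v`):

```
[Eaton]
	driver = usbhid-ups
	port = auto
	# Match by stable identifiers instead of the volatile "device" number;
	# 0463 is the MGE/Eaton USB vendor ID
	vendorid = 0463
	serial = "ABC123456"
```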
There are quite a few nuances that might differ between "vanilla" NUT
systemd integration (which also evolved over time in git sources) and that
packaged by distros, I haven't looked deep into Debian recipes lately.
Maybe there's some unfortunate interaction of service dependencies that
follows from the disconnections. Some educated guesswork below:
My current guess would be that the driver gives up reconnecting with
remembered data that no longer matches the device, and perhaps exits to
be restarted by systemd (the journal history of
`nut-driver@Eaton.service`, if that one got generated by
`nut-driver-enumerator` on your system, might confirm or dispel that
guess; otherwise check for a plain old `nut-driver.service` monolith).
Or it exits/crashes for some other reason, such as manual runs of
`usbhid-ups`: if the daemonized instance saved a PID file, a
command-line spawned instance may find it and use it to kill off the
"presumed-frozen" competitor (in the log above I don't see direct
indications of that, though).
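A quick way to check for such restart loops (the unit name here is an
assumption - list yours with `systemctl list-units 'nut-driver*'`):

```shell
# Hypothetical unit name - adjust to what systemctl lists on your system
UNIT=nut-driver@Eaton.service
journalctl -u "$UNIT" -b --no-pager 2>/dev/null \
  | grep -E 'Starting|Started|Stopped|Failed' \
  || echo "no journal entries for $UNIT"
```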
* Found an example of when a new driver process kills its older "self"
using the PID files - just in case you see such indications in your
logs:

    Duplicate driver instance detected (PID file
    /run/nut/nutdrv_qx-tecnowaremansarda.pid exists)! Terminating other driver!
Further, what I guess could follow is that if your only
`nut-driver*.service` exits and restarts, the systemd dependency it
provides for `nut-server.service` (maybe via `nut-driver.target`)
flickers and causes the data server to restart. (I'm not sure at the
moment whether it is a weak Wants or a harder Requires type of
dependency there; technically `upsd` should run fine without drivers and
report that the device is unknown or its data is stale.)
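You can check which it is with `systemctl show -p Wants -p Requires
nut-server.service`. If it does turn out to be a hard Requires, a
drop-in override could relax it - a hypothetical sketch (the path and
unit names may differ on Debian):

```
# /etc/systemd/system/nut-server.service.d/relax-driver-dep.conf (hypothetical)
[Unit]
# An empty assignment resets any hard Requires= from the packaged unit;
# the weaker Wants= lets upsd survive driver restarts and report stale data.
Requires=
Wants=nut-driver.target
```

followed by `systemctl daemon-reload` and a restart of
`nut-server.service`.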
You can probably correlate the event trails from `dmesg` and
`journalctl -lx` as suggested earlier, with a bumped `debug_min` in
`ups.conf`, `upsd.conf` and maybe `upsmon.conf`, to check whether the
real events and service state changes confirm this theorized chain of
events:
* USB reconnection (HW/FW reasons can vary a lot);
* `nut-driver@Eaton.service` tries to reconnect but fails, or for older
versions just aborts due to loss of link (even without OS USB
re-enumeration, there can be a few seconds when the newly created devfs
node is owned by `root` and `udev` has not yet had time to hand it off
to `nut`, so the NUT driver cannot re-attach);
* a failure of `nut-driver*` causes `nut-server` to stop (it shouldn't,
but might, happen... at least that fits the symptoms you posted
earlier);
* the `nut-driver` is resuscitated by systemd after some RestartSec timeout
* dependency for `nut-server` is healthy so it is started up again
* thinking of it, the last couple of steps may be almost concurrent: as
soon as systemd launches the driver process, its unit is considered
healthy, so the data server starts; however, the driver then takes some
time to do the initial walk of the device and only then begins talking
to `upsd` - maybe this is when `upsmon` asks to log into the
not-yet-recognized UPS, so the data server denies it, as per your
original post. With NUT v2.8.1+ the systemd integration is tighter, so
daemons can notify the service manager when they are actually ready to
serve, and only then can the dependents start - if (packaging)
build-time configuration options enable this mode.
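For reference, the `debug_min` bump mentioned above can be set like this
(the section name and level are just an example; higher levels are more
verbose):

```
# ups.conf - per-driver section (or globally above all sections)
[Eaton]
	debug_min = 3

# upsd.conf and upsmon.conf - global directive
DEBUG_MIN 3
```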
Given the revised integrations since NUT v2.8.0 release, you might have
better luck with current master-branch codebase - see
https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests
- and by confirming that it works or by uncovering more edge cases that are
not well handled, help improve an upcoming NUT v2.8.2 release :)
Hope this helps,
Jim Klimov
On Sat, Jan 20, 2024 at 1:07 PM Stefan Schumacher <
> stefanschumacheratw...@gmail.com> wrote:
>