>> You may not be aware of it, but this is where the root of the problem
>> lies. Normally, when a UPS is told to shutdown, two things can happen:
>>
>> 1) The input power is gone and the UPS powers off and switches back on
>> when the power returns.
>>
>> 2) Input power is still available and the UPS cycles the power so that
>> the systems that receive power from it, can restart.
>
> In this case, strictly speaking, neither of those two conditions were a
> cause.

But it *is* the reason why NUT isn't working as expected. You would never
have noticed anything abnormal if the UPSes had restarted automatically,
as they are supposed to after being sent a forced shutdown. In this
respect, the apc-hid subdriver is *very* broken. With a real power outage,
you would see the same problem of systems not getting back online, so this
is not limited to setups where NUT monitors many different UPSes.

>> Apparently, #2 is not happening for you and from looking at the driver,
>> I can understand why. The shutdown sequence in this driver is not doing
>> what it is supposed to do. This needs fixing, but since I don't have an
>> APC UPS, I can't do that for you. We'll have to track down the developer
>> that wrote this driver, to correct this.
>
> I have pulled the UPS in question out of production and I should have time
> next week to attach a spare PC to it and get nut talking to it. For
> starters, I'll want to look into why the Smart-UPS's aren't starting back
> up a few seconds after a forced shutdown.

See the above. You will need to look into the shutdown function in
'apc-hid.c' to correct this. If you can pull the development version of
usbhid-ups from the SVN trunk, I can help with that. The stable version
doesn't provide enough debugging information to do that remotely.

>> There is definitely something broken here too. Under no circumstance
>> should a driver indicate both 'on battery' and 'low battery' if the power
>> is not actually out.
>
> I don't know for certain if this is applicable to the APC Smart-UPS 1500,
> but I think there is a hardware limitation with many UPSs in that they are
> not able to communicate when an on-battery condition is due to a self-test
> or mains failure.

That isn't a problem, since it wouldn't report anything.
:-) Generally speaking though, most of the time this is caused by mapping
the UPS variables to the incorrect NUT values, or by the UPS indicating
that the AC failed while a battery test is running. We can correct the
first and work around the second if needed, so this is the second thing
that is currently broken in the apc-hid subdriver.

[...]

> While I see where you are coming from, I maintain that this isn't proper
> behavior. If the power never went out to the UPS in question, then it's
> safer to assume that power will not go out to that UPS (and risk an
> unclean shutdown of those hosts) than to assume that that UPS must be shut
> down (and guarantee the unavailability of those hosts).

The latter should not happen, or at least it should not persist. What I'm
worried about is that what you suggest opens a race condition that doesn't
exist in the present setup. We support configurations where many different
UPSes work in parallel (various models and even from different vendors).
Not all have near-instant notifications if the power goes out. Some
devices/drivers will take tens of seconds to notice that. Had the NUT
server assumed they were still on line power when it went down, that would
lead to a nasty surprise later on.

[...]

> If a nut server simply isn't running, then client hosts won't refuse to
> mount their disks read-write on startup. I don't see any reason to be so
> paranoid in one situation but not another.

It would be possible to do this though, and if memory serves, there are
actually people using this. We even suggest something similar in the
documentation, to delay startup until the batteries have recharged
sufficiently to allow the systems to shut down cleanly in the event power
fails again.

>> In a single NUT server, multiple UPS system it is impossible to deal
>> with situations where some of the UPS'es monitored receive power
>> from the mains and the one powering the NUT server is not.
>
> I agree, it is impossible to do perfectly.
> There are pitfalls to both
> approaches. I believe the current approach violates the principle of least
> surprise.

The surprise you see is that the systems didn't restart; that is the #1
problem in the apc-hid subdriver. What we should also warn about more
clearly is that the NUT client-server architecture isn't robust in setups
where some UPSes may be receiving power and some are not. In the case of
three-phase mains, this might mean that you need three NUT servers (one
for each phase) if you use single-phase UPSes.

Best regards, Arjen