On Apr 10, 2017, at 10:42 AM, Jon Bendtsen wrote:
> 
> Actually maybe it is within NUT's control. Maybe NUT should only claim that a 
> UPS is ONLINE if ONLINE is the only thing it is?

The problem is that a lot of the UPS status values are more of a de-facto 
standard, but they have been generally defined in such a way that simpler UPSes 
only need to report basic status. We don't have an equivalent of "no ALARM", 
just the absence of "ALARM" in the status line.
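
To illustrate that last point, here is a minimal sketch, assuming ups.status 
is the usual space-separated list of flags (e.g. "OL CHRG"). The function 
names are made up for illustration and are not part of NUT:

```python
def parse_status(status_line):
    """Split a ups.status string such as "OL CHRG" into a set of flags."""
    return set(status_line.split())


def has_alarm(status_line):
    """There is no explicit "no ALARM" flag; the only signal is whether
    the ALARM token is present in the status line at all."""
    return "ALARM" in parse_status(status_line)


print(has_alarm("OL CHRG"))   # ALARM token absent
print(has_alarm("OL ALARM"))  # ALARM token present
```

Matching whole tokens rather than substrings avoids accidental hits if a 
longer flag name ever happens to contain a shorter one.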

Also, I think you are reading more into the OL status than is intended. OL 
simply means that AC power is available and passing through the UPS (contrast 
with OB and OFF). Again, because of the de-facto nature of this, we would have 
to consult the ups.type value to accurately reflect whether the UPS is truly an 
online (double-conversion) system, or "offline" with the relay feeding power 
directly from line to load (different from an online UPS in bypass). 
Unfortunately, ups.type is marked as opaque, and is not available everywhere.

The short answer here is that if a monitoring system wants to represent the 
overall health of the system, ALARM needs to be taken into account. (Never mind 
the fact that a basic Back-UPS LS 500 uses a more common HID PDC Usage that 
maps to RB when its battery test fails...) I think we have established that the 
monitoring in upsmon was not sufficient, but by extension, that means the 
Nagios plugin probably needs a change to expose the ALARM bit and message.
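
As a rough sketch of what such a check could look like (not the actual 
plugin code): the function name nagios_state and the severity ordering below 
are assumptions chosen for illustration, and only the exit codes follow the 
usual Nagios plugin convention. The alarm text is assumed to come from 
ups.alarm.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_state(status_line, alarm_message=""):
    """Map a ups.status line (plus optional alarm text) to (code, message).

    The severity ordering is an illustrative assumption, not NUT policy.
    """
    flags = set(status_line.split())
    if not flags:
        return UNKNOWN, "no status reported"
    if "OB" in flags and "LB" in flags:
        return CRITICAL, "on battery, battery low"
    if "ALARM" in flags:
        # Expose both the ALARM bit and its message, even when OL is set.
        return CRITICAL, "ALARM: " + (alarm_message or "no message")
    if "RB" in flags:
        return WARNING, "battery needs replacement"
    if "OB" in flags:
        return WARNING, "on battery"
    return OK, "status " + status_line


code, message = nagios_state("OL ALARM", "No battery installed")
print(code, message)
```

The point is only that "OL" on its own does not imply a healthy UPS; how 
ALARM should rank against RB or OB is a policy decision left open here.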

Not sure if this got answered already, but is the "No battery installed" alarm 
accurate, or is it just an old battery? If old, does the battery.runtime value 
get adjusted downwards after a battery test? Either way, we would need to 
establish which reading should take priority, and I don't think this is 
straightforward.

I almost think we need another layer of logic to handle prioritization like 
this, as well as scale values. It irks me that we add scale values to the 
driver without knowing the extent of the error (is it only for one firmware 
revision? for a whole line?). Such a layer would offer some hope of being able 
to silence false alarms (I vaguely remember some "life cycle alarm" in one UPS 
that contradicted another, more direct, status bit). But this is the sort of 
thing that should be designed, rather than slapped together, and it shouldn't 
get in the way of a UPS that behaves predictably. And I think it should be a 
separate layer so that we can always go directly to the driver to see the raw 
values that the UPS is returning.
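
Purely as a thought experiment, and not a design proposal, a separate layer 
along these lines might look like the sketch below. Everything in it is 
hypothetical: the ScaleRule type, the apply_rules function, and the example 
model and firmware strings. The only property it tries to show is that the 
raw driver values are left untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScaleRule:
    model: str               # must match ups.model exactly
    firmware: Optional[str]  # None means "any firmware revision"
    variable: str            # e.g. "battery.runtime"
    factor: float            # multiplier applied to the raw reading


def apply_rules(raw, rules):
    """Return an adjusted copy of the readings; `raw` is never modified."""
    adjusted = dict(raw)
    for rule in rules:
        if raw.get("ups.model") != rule.model:
            continue
        if rule.firmware is not None and raw.get("ups.firmware") != rule.firmware:
            continue
        if rule.variable in raw:
            adjusted[rule.variable] = float(raw[rule.variable]) * rule.factor
    return adjusted


# Hypothetical readings and rule, for illustration only.
raw = {"ups.model": "Example UPS", "ups.firmware": "1.0",
       "battery.runtime": "1200"}
rules = [ScaleRule("Example UPS", "1.0", "battery.runtime", 0.5)]
adjusted = apply_rules(raw, rules)
print(raw["battery.runtime"], adjusted["battery.runtime"])
```

Scoping each rule to a model and, optionally, a firmware revision makes the 
extent of the correction explicit, and the raw values stay available for 
inspection.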
_______________________________________________
Nut-upsuser mailing list
Nut-upsuser@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/nut-upsuser
