Re: [Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-12-05 Thread Lee Damon
> How close do the EPEL RPM's systemd configuration files look to the
> ones in the NUT tree?

The only difference between /lib/systemd/system/nut-driver.service and
the one at
https://github.com/networkupstools/nut/blob/v2.7.2/scripts/systemd/nut-driver.service.in
(not counting variable substitution for paths) is an additional line in
the [Service] section provided by EPEL:

ExecStartPre=-/usr/bin/systemd-tmpfiles --create /etc/tmpfiles.d/nut-run.conf

> The "StopWhenUnneeded=no" part is odd, in my mind, although the idea
> is for "upsdrvctl start" to start the drivers, let them fork, then
> exit. (Were you running "upsdrvctl -D start" at the command line or
> under systemd? I would not expect the latter to work without
> modifications.)

I've left all systemd files intact except for having to add
StopWhenUnneeded=no (along with a change to TimeoutSec) to
/etc/systemd/system/nut-driver.service.d/nut-driver.conf. I presume the
default is to run it without -D.

Even with the extended timeouts I've seen upsdrvctl start take 4 or 5
tries to start a driver (sometimes it takes multiple tries to start
multiple drivers). I've even seen it take multiple tries for the very
first driver. Even with the extended timeouts it has failed to start on
one reboot out of three.

I suspect there's a deeper problem here. This morning I came in to 22
emails about COMMBAD and 22 emails about COMMOK from upsmon on the master
host and a bunch more such messages from upsmon on my test clients. The
time between BAD and OK is almost always 5 seconds (I presume this is
the retry timer setting). Oddly enough, the complaint times on the test
clients don't always match up with the complaint times on the master.
Sometimes they both complain at the same time, sometimes the master
complains but the client doesn't, and sometimes the client complains
but the master doesn't. I don't remember seeing this at all when I was
using SNMPv1, but I didn't use that for very long, so it's possible I just
didn't see it because it hadn't happened yet. I think I'm going to
switch the config to SNMPv1 for a while and see if the problem persists.
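
To see how these notifications are distributed across UPSes and hosts,
the upsmon log lines can be tallied per UPS. What follows is a minimal
Python sketch, not part of NUT. It assumes upsmon is still using its
default notification texts ("Communications with UPS ... lost" and
"... established") and that the log lines are available as strings; the
function name tally_comm_events is made up for illustration.

```python
import re
from collections import Counter

# Assumed default upsmon texts for COMMBAD ("lost") and COMMOK
# ("established"); adjust the pattern if NOTIFYMSG has been customized.
EVENT_RE = re.compile(r"Communications with UPS (\S+) (lost|established)")


def tally_comm_events(lines):
    """Count 'lost' and 'established' notifications per UPS name."""
    counts = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            ups, state = match.groups()
            counts.setdefault(ups, Counter())[state] += 1
    return counts
```

Running this over the logs from the master and from each client would
show whether one UPS accounts for most of the events.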

nomad

___
Nut-upsuser mailing list
Nut-upsuser@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/nut-upsuser

Re: [Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-12-05 Thread Lee Damon
One thing I just noticed: all of the COMMBAD and COMMOK messages are about
the same UPS; the other two seem to be fine.

nomad

On 12/5/17 08:24 , Lee Damon wrote:
...
> I suspect there's a deeper problem here. This morning I came in to 22
> emails about COMMBAD and 22 emails about COMMOK from upsmon on the master
> host and a bunch more such messages from upsmon on my test clients. The
> time between BAD and OK is almost always 5 seconds (I presume this is
> the retry timer setting). Oddly enough, the complaint times on the test
> clients don't always match up with the complaint times on the master.
> Sometimes they both complain at the same time, sometimes the master
> complains but the client doesn't, and sometimes the client complains
> but the master doesn't. I don't remember seeing this at all when I was
> using SNMPv1 but I didn't use that for very long so it's possible I just
> didn't see it because it hadn't happened yet. I think I'm going to
> switch the config to SNMPv1 for a while and see if the problem persists.




Re: [Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-12-05 Thread Charles Lepple
On Dec 4, 2017, at 2:43 PM, Lee Damon wrote:
> 
> I have it "working" with the following settings changes from the default
> in the RPM. By "working" I mean it starts after reboot and after issuing
> 'sudo systemctl restart nut-driver' but, as expected, it takes quite a
> while to finish startup.

Yeah, this doesn't sound ideal.

How close do the EPEL RPM's systemd configuration files look to the ones in the 
NUT tree? e.g. 
https://github.com/networkupstools/nut/blob/v2.7.2/scripts/systemd/nut-driver.service.in

If the EPEL RPMs have a bug tracker, it might be worth pinging them about it. 
The "StopWhenUnneeded=no" part is odd, in my mind, although the idea is for 
"upsdrvctl start" to start the drivers, let them fork, then exit. (Were you 
running "upsdrvctl -D start" at the command line or under systemd? I would not 
expect the latter to work without modifications.)

There is also a proposal for reworking the NUT driver startup under systemd, in 
case problems crop up later, and you want to revisit this:

https://github.com/networkupstools/nut/pull/330

As I understand it, the drivers would each get their own systemd unit, so that 
might be easier for isolating failures.





Re: [Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-12-04 Thread Lee Damon
I have it "working" with the following settings changes from the default
in the RPM. By "working" I mean it starts after reboot and after issuing
'sudo systemctl restart nut-driver' but, as expected, it takes quite a
while to finish startup.

/etc/systemd/system/nut-driver.service.d/nut-driver.conf
[Unit]
StopWhenUnneeded=no

[Service]
TimeoutSec=600



/etc/ups/ups.conf
...
maxretry = 10
retrydelay = 15
...


As I said, I suspect the bulk of these are cargo cult "it worked, not
messing with it" settings.

nomad



Re: [Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-12-04 Thread Lee Damon
I've told systemd to wait up to 10 minutes for startup but it's still
failing. Sometimes it fails with "timeout exceeded" and sometimes with
the ASN error previously reported.

I went back to running upsdrvctl -D to see if I could see anything. On
the third run I saw messages about "startup timer elapsed" for ups2-1
and then it tried again to launch that one (killing the existing one).
It did this multiple times. When starting under systemd I see a bunch of
zombie processes that used to be snmp-ups for one of ups2-1 or ups3-1
(depending on the run).
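
For spotting those leftover processes, here is a minimal Python sketch
that filters the output of `ps -eo pid,ppid,stat,comm` for zombies whose
command name starts with snmp-ups. The function name find_zombies and
the idea of feeding it captured ps output are illustrative assumptions,
not something from NUT.

```python
def find_zombies(ps_output, command="snmp-ups"):
    """Return (pid, ppid) pairs for zombie processes named `command`.

    Expects the text printed by `ps -eo pid,ppid,stat,comm`, including
    its header line. Zombies have a STAT value beginning with "Z".
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z") and comm.startswith(command):
            zombies.append((int(pid), int(ppid)))
    return zombies
```

The parent PID in each pair shows which process has not yet reaped the
dead driver (for example, whether it is upsdrvctl).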

I'm trying again with
 /etc/systemd/system/nut-driver.service.d/nut-driver.conf set to:
  [Service]
  TimeoutSec=600
and
 /etc/ups/ups.conf having
  maxretry = 10
  retrydelay = 15

This time ps shows three apparently happy snmp-ups processes but
upsdrvctl start is still showing up. ... and then everything vanishes
but systemd doesn't complain. journalctl -xe shows a good startup then
claims nut-driver.service isn't needed anymore and shuts it down:

Dec 04 10:17:52 [redacted] snmp-ups[14578]: Startup successful
Dec 04 10:17:52 [redacted] upsdrvctl[14556]: Network UPS Tools - UPS driver controller 2.7.2
Dec 04 10:17:52 [redacted] systemd[1]: Started Network UPS Tools - power device driver controller.
-- Subject: Unit nut-driver.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nut-driver.service has finished starting up.
--
-- The start-up result is done.
Dec 04 10:17:52 [redacted] systemd[1]: Unit nut-driver.service is not needed anymore. Stopping.

I have a feeling that if I do get this working by setting huge timers
I'm just cargo-culting a "fix".

nomad

On 12/4/17 09:38 , Lee Damon wrote:
> Hi Charles,
> 
> Running upsdrvctl -D start on the host with all three configured to
> SNMPv3 kicks out 56 different "unhandled ASN 0x81 from ..." lines on all
> three but importantly they all start up and keep running.
> 
> I'm starting to suspect the startup is taking so long that systemd is
> timing out and killing it. time -p reports real time of 133.27. I'm not
> a fan of systemd but it's a thing I have to deal with. I'm going to see
> if I can find a way to tell it to give startup more time.
> 
> I'm attaching a -D startup in case anyone is curious (but I doubt anyone
> will be. :)
> 
> nomad



Re: [Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-12-04 Thread Lee Damon
Hi Charles,

Running upsdrvctl -D start on the host with all three configured to
SNMPv3 kicks out 56 different "unhandled ASN 0x81 from ..." lines on all
three but importantly they all start up and keep running.

I'm starting to suspect the startup is taking so long that systemd is
timing out and killing it. time -p reports real time of 133.27. I'm not
a fan of systemd but it's a thing I have to deal with. I'm going to see
if I can find a way to tell it to give startup more time.
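
One way to test that suspicion is to compare the measured 133.27 seconds
with the unit's effective start timeout. The sketch below is a minimal
Python illustration under a few assumptions: the timeout is read from
`systemctl show nut-driver.service -p TimeoutStartUSec`, which prints a
line such as "TimeoutStartUSec=1min 30s" (90 seconds is systemd's stock
default unless the distribution or a drop-in overrides it), and only the
h/min/s/ms/us units appear in the value. The function names are
hypothetical.

```python
import re

# Seconds per unit for the time-span units this sketch understands.
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}


def parse_timeout(show_line):
    """Convert e.g. 'TimeoutStartUSec=1min 30s' to seconds.

    Returns None when the value is 'infinity' (no timeout).
    """
    value = show_line.strip().split("=", 1)[-1]
    if value == "infinity":
        return None
    total = 0.0
    for number, unit in re.findall(r"(\d+(?:\.\d+)?)\s*([a-z]+)", value):
        total += float(number) * _UNIT_SECONDS[unit]
    return total


def startup_exceeds_timeout(startup_seconds, show_line):
    """True if the measured startup time is longer than the unit's timeout."""
    limit = parse_timeout(show_line)
    return limit is not None and startup_seconds > limit
```

With the default of 1min 30s, a 133.27 second startup would exceed the
limit, which is consistent with systemd killing the unit before all
three drivers are up.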

I'm attaching a -D startup in case anyone is curious (but I doubt anyone
will be. :)

nomad

On 12/3/17 14:34 , Charles Lepple wrote:
> On Nov 30, 2017, at 6:31 PM, Lee Damon wrote:
>>
>> SNMPv3 works fine when I have any one of the three configured in
>> ups.conf while the other two are configured with SNMPv1. It doesn't
>> matter which _one_ UPS is configured for SNMPv3 in ups.conf, they all
>> work individually (see below example).
>>
>> However, if I configure any two or all three of them to use SNMPv3 then
>> on startup upsdrvctl complains about "unhandled ASN 0x81 received from
>> .1.3.6.1.4.1.318.1.1.1.9.3.3.1.7.1.1.3" and fails to start.
>>
> Thanks for the detailed bug report. It's an interesting failure mode.
> 
> I created an issue on GitHub with a link back to the mail archive because the 
> developers who are most involved with SNMP are more likely to see it there 
> (and so it doesn't get lost; we're attempting to put together a new release 
> at the moment): https://github.com/networkupstools/nut/issues/508 (feel free 
> to reply to the list, or post there).
> 
> I suspect that the message you are seeing is from here in the snmp-ups driver:
> 
>    https://github.com/networkupstools/nut/blob/v2.7.2/drivers/snmp-ups.c#L706
> 
> (There are two other "unhandled ASN ..." messages, but due to the use of 
> upsdebugx(), they probably wouldn't show up the way that upsdrvctl starts the 
> drivers.)
> 
> To get more context on what is happening before this error, you can start the 
> driver directly with a few "-D" flags. The command line can be found from a 
> running driver, or you can pass "-D" to upsdrvctl itself (I think; there have 
> been some changes to that code over the years) and it will show the whole 
> command line.
> 
> Another option (which papers over the problem, but may help in the near term) 
> is to see if this is just a startup issue. You can specify one UPS with 
> "upsdrvctl start ups1", and then see if the other drivers still start if you 
> wait until the first driver has completed its SNMPv3 handshake. You can 
> automate this a bit with the "maxretry" and "retrydelay" options in ups.conf: 
> http://networkupstools.org/docs/man/ups.conf.html
> 

Script started on Mon 04 Dec 2017 09:32:50 AM PST
: || nomad@nomaddev ups [1001] ; time -p sudo /usr/sbin/upsdrvctl -D start && exit
Network UPS Tools - UPS driver controller 2.7.2
   0.00 Starting UPS: ups1-1
Network UPS Tools - Generic SNMP UPS driver 0.72 (2.7.2)
Detected Smart-UPS 5000 on host ups1-1.[redacted] (mib: apcc 1.2)
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.3.1.1.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.3.1.1.2
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.3.1.1.3
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.4.1.1.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.4.1.1.2
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.4.1.1.3
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.5.1.1.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.5.1.1.2
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.5.1.1.3
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.6.1.1.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.6.1.1.2
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.6.1.1.3
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.7.1.1.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.7.1.1.2
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.7.1.1.3
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.8.1.1.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.8.1.1.2
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.3.1.8.1.1.3
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.2.2.1.4.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.2.3.5.0
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.2.3.6.0
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.3.2.1.4.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.3.3.1.3.1.1.1
[ups1-1] unhandled ASN 0x81 received from .1.3.6.1.4.1.318.1.1.1.9.3.3.1.3.1.1.2

Re: [Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-12-03 Thread Charles Lepple
On Nov 30, 2017, at 6:31 PM, Lee Damon wrote:
> 
> SNMPv3 works fine when I have any one of the three configured in
> ups.conf while the other two are configured with SNMPv1. It doesn't
> matter which _one_ UPS is configured for SNMPv3 in ups.conf, they all
> work individually (see below example).
> 
> However, if I configure any two or all three of them to use SNMPv3 then
> on startup upsdrvctl complains about "unhandled ASN 0x81 received from
> .1.3.6.1.4.1.318.1.1.1.9.3.3.1.7.1.1.3" and fails to start.
> 
Thanks for the detailed bug report. It's an interesting failure mode.

I created an issue on GitHub with a link back to the mail archive because the 
developers who are most involved with SNMP are more likely to see it there (and 
so it doesn't get lost; we're attempting to put together a new release at the 
moment): https://github.com/networkupstools/nut/issues/508 (feel free to reply 
to the list, or post there).

I suspect that the message you are seeing is from here in the snmp-ups driver:

   https://github.com/networkupstools/nut/blob/v2.7.2/drivers/snmp-ups.c#L706

(There are two other "unhandled ASN ..." messages, but due to the use of 
upsdebugx(), they probably wouldn't show up the way that upsdrvctl starts the 
drivers.)

To get more context on what is happening before this error, you can start the 
driver directly with a few "-D" flags. The command line can be found from a 
running driver, or you can pass "-D" to upsdrvctl itself (I think; there have 
been some changes to that code over the years) and it will show the whole 
command line.

Another option (which papers over the problem, but may help in the near term) 
is to see if this is just a startup issue. You can specify one UPS with 
"upsdrvctl start ups1", and then see if the other drivers still start if you 
wait until the first driver has completed its SNMPv3 handshake. You can 
automate this a bit with the "maxretry" and "retrydelay" options in ups.conf: 
http://networkupstools.org/docs/man/ups.conf.html
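
The one-at-a-time test could be scripted roughly as follows. This is a
minimal Python sketch, not a NUT tool: it only wraps the
"upsdrvctl start NAME" command mentioned above, and the function name,
the retry count, and the delay between attempts are illustrative
assumptions (the defaults here merely echo the idea behind "maxretry"
and "retrydelay").

```python
import subprocess
import time


def start_drivers_one_by_one(names, attempts=3, delay=15,
                             run=subprocess.run, sleep=time.sleep):
    """Start each UPS driver in turn with `upsdrvctl start NAME`.

    A driver is retried up to `attempts` times, sleeping `delay` seconds
    between tries. Returns the list of names that never started.
    `run` and `sleep` are injectable so the logic can be tested without
    touching real drivers.
    """
    failed = []
    for name in names:
        for attempt in range(attempts):
            result = run(["upsdrvctl", "start", name])
            if result.returncode == 0:
                break  # this driver is up; move on to the next one
            if attempt + 1 < attempts:
                sleep(delay)
        else:
            failed.append(name)  # every attempt returned non-zero
    return failed
```

For example, start_drivers_one_by_one(["ups1", "ups2", "ups3"]), run as
root, would show whether the second and third SNMPv3 drivers still fail
once the first has finished starting.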


[Nut-upsuser] SNMPv3 fails when more than one UPS is configured in ups.conf

2017-11-30 Thread Lee Damon
New NUT install dealing with three APC Smart-UPS 5000s, two of which
have AP9617 cards and the third an AP9619 card.

The SNMPv3 configurations on all three are exactly the same. This is
confirmed by snmpget calls that work just fine:
snmpget -Cf -v3 -u [...] -l authPriv -A '[...]' -X '[...]' -a MD5 -x DES [upsname] .1.3.6.1.4.1.318.1.1.1.1.1.1.0

snmp-ups can query all three UPSs with SNMPv1 with no problems. However,
querying with SNMPv3 doesn't work if I try to query more than one of them.

SNMPv3 works fine when I have any one of the three configured in
ups.conf while the other two are configured with SNMPv1. It doesn't
matter which _one_ UPS is configured for SNMPv3 in ups.conf, they all
work individually (see below example).

However, if I configure any two or all three of them to use SNMPv3 then
on startup upsdrvctl complains about "unhandled ASN 0x81 received from
.1.3.6.1.4.1.318.1.1.1.9.3.3.1.7.1.1.3" and fails to start.

My current [redacted] ups.conf follows. The only differences between the
values are on the port and desc lines (ignoring #, of course).

[ups1]
desc = "APC Smart-UPS 5000 SNMP in rack 1 - AP9617"
driver = snmp-ups
mibs = apcc
port = [ups1...]
#snmp_version = v1
#community = [xxx]
snmp_version = v3
secLevel = authPriv
secName = [xxx]
authPassword = [xxx]
authProtocol = MD5
privPassword = [xxx]
privProtocol = DES

[ups2]
desc = "APC Smart-UPS 5000 SNMP in rack 2 - AP9619"
driver = snmp-ups
mibs = apcc
port = [ups2...]
snmp_version = v1
community = [xxx]
#snmp_version = v3
#secLevel = authPriv
#secName = [xxx]
#authPassword = [xxx]
#authProtocol = MD5
#privPassword = [xxx]
#privProtocol = DES

[ups3]
desc = "APC Smart-UPS 5000 SNMP in rack 3 - AP9617"
driver = snmp-ups
mibs = apcc
port = [ups3...]
snmp_version = v1
community = [xxx]
#snmp_version = v3
#secLevel = authPriv
#secName = [xxx]
#authPassword = [xxx]
#authProtocol = MD5
#privPassword = [xxx]
#privProtocol = DES


OS: CentOS Linux release 7.4.1708 (Core)
Kernel: 3.10.0-693.5.2.el7.x86_64
NUT installed from: nut-2.7.2-3.el7.x86_64 (EPEL RPM)

SNMP-related RPMs installed on the host:
net-snmp-devel-5.7.2-28.el7.x86_64
net-snmp-utils-5.7.2-28.el7.x86_64
nagios-plugins-snmp-2.2.1-4git.el7.x86_64
net-snmp-libs-5.7.2-28.el7.x86_64
net-snmp-agent-libs-5.7.2-28.el7.x86_64
net-snmp-5.7.2-28.el7.x86_64


Any suggestions on where I should look next would be greatly appreciated.

thanks,
nomad
