On Nov 21, 2014, at 7:53 PM, Kristian Rasmussen 
<kristian_rasmus...@fastmail.co.uk> wrote:

> When this happens, reloading the usbhid-ups driver and even unplugging
> and re-inserting the USB cable does not fix the problem, and lsusb does
> not list the UPS at all. On one system, an x86_64 server running kernel
> 3.17.3, the following can be seen in the log:
> 
> Nov 21 23:24:07 test-svr1 kernel: [111233.920039] ohci-pci 0000:00:02.0:
> frame counter not updating; disabled
> Nov 21 23:24:07 test-svr1 kernel: [111233.920047] ohci-pci 0000:00:02.0:
> HC died; cleaning up
> 
> Unloading and re-inserting the ohci kernel modules (ohci_hcd and
> ohci_pci) does temporarily resolve the issue on this host, but after a
> few hours the problem appears again.

The "frame counter not updating" message seems like it could indicate an IRQ 
issue. While these were much more common before the advent of PCI, there is 
still potential for mismatches between the hardware routing of the PCI 
interrupt lines and the interrupt tables that the BIOS/EFI passes to the 
kernel.
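
To see how often the controller dies and whether it is always the same PCI 
function, it may help to pull the ohci-pci lines out of the kernel log. Here 
is a minimal sketch in Python; the function name and regex are my own, and it 
assumes each log entry is on one unwrapped line in the format shown in your 
excerpt:

```python
import re

# Matches kernel lines such as:
#   [111233.920047] ohci-pci 0000:00:02.0: HC died; cleaning up
# Assumption: the bracketed uptime stamp and "ohci-pci <PCI address>:" prefix
# look like the excerpt quoted above; other kernels may format this differently.
OHCI_LINE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+ohci-pci\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]):\s+(?P<msg>.*)"
)


def find_ohci_events(lines):
    """Return (uptime_seconds, pci_address, message) for each ohci-pci line."""
    events = []
    for line in lines:
        match = OHCI_LINE.search(line)
        if match:
            events.append(
                (
                    float(match.group("uptime")),
                    match.group("pci"),
                    match.group("msg").strip(),
                )
            )
    return events
```

Feeding it the lines of your syslog file and comparing the uptime values 
between successive "HC died" events would show whether the failures are 
periodic or tied to load.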

Alternatively, you might have load-related problems where another hardware 
device is not properly sharing the interrupt with the OHCI controller. If I 
remember correctly, OHCI can only handle two or four physical USB ports per PCI 
device (or per function? it's been a while), so if you check /proc/interrupts 
and find another group of ports on a different OHCI device and IRQ, you might 
have better luck with those.
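
For the /proc/interrupts check, something like the following Python sketch 
could list which IRQ each OHCI controller is on and what else shares it. The 
function name is hypothetical, and it assumes the usual layout where device 
names appear comma-separated at the end of each line:

```python
def ohci_irqs(text):
    """Map IRQ -> list of device names for every line that mentions ohci_hcd."""
    result = {}
    for line in text.splitlines():
        if "ohci_hcd" not in line or ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        # Device names are comma-separated at the end of the line. The first
        # piece still carries the per-CPU counters and the controller type,
        # so keep only the last whitespace-separated token of each piece.
        devices = [piece.split()[-1] for piece in rest.split(",") if piece.strip()]
        result[irq.strip()] = devices
    return result


def report(path="/proc/interrupts"):
    """Print one line per OHCI IRQ, flagging IRQs shared with other devices."""
    with open(path) as handle:
        for irq, devices in ohci_irqs(handle.read()).items():
            shared = [d for d in devices if not d.startswith("ohci_hcd")]
            note = " (shared)" if shared else ""
            print(f"IRQ {irq}: {', '.join(devices)}{note}")
```

Calling report() on the affected host would show whether the UPS port's 
controller shares its IRQ with, say, a disk or network controller.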

Are these ports also USB2 or USB3? Does the addition of a USB 2.0 hub change 
the symptoms?

Also, are you using non-default values for pollinterval or pollfreq?
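
If it is easier than reading through the file by hand, a small Python sketch 
like this one could pull any pollinterval or pollfreq settings out of 
ups.conf. The function name is my own, and the parsing is deliberately naive: 
it assumes simple "name = value" lines, "[section]" headers, and "#" comments, 
and it does not know the defaults, so compare the result against the ups.conf 
and usbhid-ups man pages:

```python
def poll_settings(conf_text):
    """Return {section: {name: value}} for pollinterval/pollfreq entries.

    Settings that appear before the first [section] header are reported
    under the key "(global)".
    """
    found = {}
    section = "(global)"
    for raw in conf_text.splitlines():
        # Drop trailing comments; this naive split would also cut a value
        # containing '#', which is not expected for these numeric settings.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name in ("pollinterval", "pollfreq"):
            found.setdefault(section, {})[name] = value.strip().strip('"')
    return found
```

An empty result would mean neither setting is overridden in that file.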

I have no idea if this matters, but are you using libusb-0.1, or libusb-1.0.x 
with libusb-compat?

> On the other affected system, a 32-bit VIA router running kernel 3.12.6,
> the issue causes a kernel panic and nothing regarding the kernel USB
> drivers is logged. I haven't yet had the opportunity to set up serial
> console logging to conclusively verify that the panic occurs in the ohci
> module, but it does seem likely; if I catch the problem in time, after
> the warning about "stale data" from upsd but before the actual panic
> occurs, I can unload and reload the ohci drivers and prevent the crash.

Same comments apply, but without the exact crash info, I'd say it's a little 
early to draw conclusions.

> Both systems are running nut-2.7.1 compiled from source. The UPS units
> involved are not identical; one is an older MGE Pulsar while the other
> is a newer Eaton, but both use the same USB identifier (0463:ffff, "MGE
> UPS Systems UPS"). The USB chipsets involved are quite dissimilar
> (Nvidia vs. VIA).

Given that the OHCI drivers are typically the same for the different chipsets, 
it could well be OHCI-related. I am also not aware of any changes in behavior 
for the chip(s) behind the 0463:ffff ID between MGE and Eaton, although that ID 
is used on a wide array of their USB devices.

> Is this likely to be a bug in nut, or has the nut usbhid-ups driver
> perhaps triggered an underlying kernel bug in the USB driver subsystem?
> Anything I can do to narrow down what's causing this?

I think it's the latter. While we have had our share of USB-related bugs in the 
NUT userspace code, they typically don't cause kernel crashes across the board. 
(I know it doesn't help much in your case, but we have many counterexamples of 
kernel/hardware/NUT versions that do work long-term; maybe we should be 
recording those to find patterns.) The usual NUT USB problem manifests itself 
as an inscrutable errno value from the kernel, which doesn't bring the rest of 
the system down with it.

Unfortunately, we don't often hear back when we recommend that users take the 
issue up with the Linux USB lists[1], so I don't know if these bugs are getting 
fixed, or if people are just moving to other hardware (UPS or motherboard).

[1] http://www.linux-usb.org/mailing.html

The bottom line is that we are happy to try to work with you to debug this, 
but so far there are more questions than answers in this particular problem 
space.

-- 
Charles Lepple
clepple@gmail




_______________________________________________
Nut-upsuser mailing list
Nut-upsuser@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/nut-upsuser
