On Sat, Sep 17, 2022 at 11:22:41AM -0700, Stephan Somogyi wrote:

> Starting with arm64 snapshot kernel 1818 and continuing to 1822, the latest
> snapshot, I've been experiencing a persistent problem with the RPi3's USB
> bus locking up in a way that requires physical access to power cycle, and
> is thus a fairly serious regression. This system has been running
> continuously on -current since about 6.8-current without anything even
> remotely like this happening.
> 
> The 100bT interface is at smsc0 on the usb bus. Initially, it looked like
> there may have been a weird race condition since I also had a USB-based
> flash drive plugged in, but moving that drive around to the other ports,
> and eventual complete removal, hasn't stopped the hanging.
> 
> The hang is visible in dmesg as follows:
> 
> usbd_start_next: error=5
> usbd_start_next: error=5
> usbd_free_xfer: xfer=0xffffff8004e74a20 not free
> smsc0: warning: Failed to read register 0x114
> smsc0: warning: MII is busy
> 
> Searching around, I find references to some of these errors in FreeBSD and
> OpenBSD going back at least to 2014, but no clear resolution. It's
> _possible_ that I have some kind of creeping hardware failure, but it
> doesn't seem likely.
> 
> Once the error messages appear, I can no longer access the system over the
> network. I've since connected the serial console. If I try to reboot while
> it's in this state, the system will hang hard and not even respond to the
> console. If I try `ifconfig smsc0 down` it hangs in the same way.
> 
> While the USB drive was still part of the repro configuration, attempting
> to sync or otherwise access the drive also resulted in the hard hang,
> leading me to conclude this is a USB issue rather than either a mass
> storage or an ethernet issue.
> 
> I've also done the usual variable elimination by using different USB
> drives, different ethernet cables, different port & different switch, etc.
> I don't appear to be able to isolate this any further myself.
> 
> My only recourse once it's in this state is to hard power cycle.
> 
> I'm happy to try and help debug further; I strongly prefer that
> 7.2-release/-stable doesn't include this behavior.
> 
> s.

We had some changes recently in dwctwo(4).  I currently think that your
issue might be related to the last commit to dwc2.c (revision 1.67).  I'll
prepare a diff and send it to you for testing by tomorrow.  We might
need some iterations.  Worst case, we can revert that commit.
