On Sat, Sep 17, 2022 at 11:22:41AM -0700, Stephan Somogyi wrote:
> Starting with arm64 snapshot kernel 1818 and continuing to 1822, the latest
> snapshot, I've been experiencing a persistent problem with the RPi3's USB
> bus locking up in a way that requires physical access to power cycle, and
> is thus a fairly serious regression. This system has been running
> continuously on -current since about 6.8-current without anything even
> remotely like this happening.
>
> The 100bT interface is at smsc0 on the usb bus. Initially, it looked like
> there may have been a weird race condition since I also had a USB-based
> flash drive plugged in, but moving that drive around to the other ports,
> and eventual complete removal, hasn't stopped the hanging.
>
> The hang is visible in dmesg as follows:
>
> usbd_start_next: error=5
> usbd_start_next: error=5
> usbd_free_xfer: xfer=0xffffff8004e74a20 not free
> smsc0: warning: Failed to read register 0x114
> smsc0: warning: MII is busy
>
> Searching around, I find references to some of these errors in FreeBSD and
> OpenBSD going back at least to 2014, but no clear resolution. It's
> _possible_ that I have some kind of creeping hardware failure, but it
> doesn't seem likely.
>
> Once the error messages appear, I can no longer access the system over the
> network. I've since connected the serial console. If I try to reboot while
> it's in this state, the system will hang hard and not even respond to the
> console. If I try `ifconfig smsc0 down` it hangs in the same way.
>
> While the USB drive was still part of the repro configuration, attempting
> to sync or otherwise access the drive also resulted in the hard hang,
> leading me to conclude this is a USB issue rather than either a mass
> storage or an ethernet issue.
>
> I've also done the usual variable elimination by using different USB
> drives, different ethernet cables, different port & different switch, etc.
> I no longer appear to be able to isolate this further myself.
>
> My only recourse once it's in this state is to hard power cycle.
>
> I'm happy to try and help debug further; I strongly prefer that
> 7.2-release/-stable doesn't include this behavior.
>
> s.

We had some changes recently in dwctwo(4). I currently think that your
issue might be related to the last commit to dwc2.c, revision 1.67. I'll
prepare a diff and send it to you for testing by tomorrow. We might need
some iterations. In the worst case, we can try reverting that commit.