I won't know for certain for a few more hours, but initially this looks to have fixed it. Thank you.
So far it seems that the fix has also caused a small performance decrease (~5-10% as measured via several sftp transfers), but this measurement was very unscientific. When I first booted with the new kernel I did an sftp transfer where the throughput dropped by 90% and then very slowly recovered to ~50% of normal, without hanging or interrupting the transfer. It hasn't happened since, so I'm not sure what to make of this.

s.

On Sun, Sep 18, 2022 at 12:44 AM Marcus Glocker <mar...@nazgul.ch> wrote:

> On Sat, Sep 17, 2022 at 10:40:29PM +0200, Marcus Glocker wrote:
>
> > On Sat, Sep 17, 2022 at 11:22:41AM -0700, Stephan Somogyi wrote:
> >
> > > Starting with arm64 snapshot kernel 1818 and continuing to 1822, the latest
> > > snapshot, I've been experiencing a persistent problem with the RPi3's USB
> > > bus locking up in a way that requires physical access to power cycle, and
> > > is thus a fairly serious regression. This system has been running
> > > continuously on -current since about 6.8-current without anything even
> > > remotely like this happening.
> > >
> > > The 100bT interface is at smsc0 on the usb bus. Initially, it looked like
> > > there may have been a weird race condition since I also had a USB-based
> > > flash drive plugged in, but moving that drive around to the other ports,
> > > and eventual complete removal, hasn't stopped the hanging.
> > >
> > > The hang is visible in dmesg as follows:
> > >
> > > usbd_start_next: error=5
> > > usbd_start_next: error=5
> > > usbd_free_xfer: xfer=0xffffff8004e74a20 not free
> > > smsc0: warning: Failed to read register 0x114
> > > smsc0: warning: MII is busy
> > >
> > > Searching around, I find references to some of these errors in FreeBSD and
> > > OpenBSD going back at least to 2014, but no clear resolution. It's
> > > _possible_ that I have some kind of creeping hardware failure, but it
> > > doesn't seem likely.
> > >
> > > Once the error messages appear, I can no longer access the system over the
> > > network. I've since connected the serial console. If I try to reboot while
> > > it's in this state, the system will hang hard and not even respond to the
> > > console. If I try `ifconfig smsc0 down` it hangs in the same way.
> > >
> > > While the USB drive was still part of the repro configuration, attempting
> > > to sync or otherwise access the drive also resulted in the hard hang,
> > > leading me to conclude this is a USB issue rather than either a mass
> > > storage or an ethernet issue.
> > >
> > > I've also done the usual variable elimination by using different USB
> > > drives, different ethernet cables, different port & different switch, etc.
> > > I no longer appear to be able to isolate this further myself.
> > >
> > > My only recourse once it's in this state is to hard power cycle.
> > >
> > > I'm happy to try and help debug further; I strongly prefer that
> > > 7.2-release/-stable doesn't include this behavior.
> > >
> > > s.
> >
> > We had some changes recently in dwctwo(4). I currently think that your
> > issue might be related to the last commit to dwc2.c revision 1.67. I'll
> > prepare a diff and send it to you for testing by tomorrow. We might
> > need some iterations. Worst case we can try to revert that commit.
>
> Does this diff fix your issue?
>
>
> Index: sys/dev/usb/dwc2/dwc2.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/dwc2/dwc2.c,v
> retrieving revision 1.67
> diff -u -p -u -p -r1.67 dwc2.c
> --- sys/dev/usb/dwc2/dwc2.c	10 Sep 2022 08:13:16 -0000	1.67
> +++ sys/dev/usb/dwc2/dwc2.c	18 Sep 2022 07:41:24 -0000
> @@ -242,7 +242,6 @@ dwc2_allocx(struct usbd_bus *bus)
>  void
>  dwc2_freex(struct usbd_bus *bus, struct usbd_xfer *xfer)
>  {
> -	struct dwc2_xfer *dxfer = DWC2_XFER2DXFER(xfer);
>  	struct dwc2_softc *sc = DWC2_BUS2SC(bus);
>
>  	DPRINTFN(10, "\n");
> @@ -255,7 +254,6 @@ dwc2_freex(struct usbd_bus *bus, struct
>  	xfer->busy_free = XFER_FREE;
>  #endif
>  	DWC2_EVCNT_INCR(sc->sc_ev_xferpoolput);
> -	dwc2_hcd_urb_free(sc->sc_hsotg, dxfer->urb, xfer->nframes);
>  	pool_put(&sc->sc_xferpool, xfer);
>  }
>
> Index: sys/dev/usb/dwc2/dwc2_hcd.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/dwc2/dwc2_hcd.c,v
> retrieving revision 1.28
> diff -u -p -u -p -r1.28 dwc2_hcd.c
> --- sys/dev/usb/dwc2/dwc2_hcd.c	9 Sep 2022 21:16:54 -0000	1.28
> +++ sys/dev/usb/dwc2/dwc2_hcd.c	18 Sep 2022 07:41:24 -0000
> @@ -4312,6 +4312,7 @@ void dwc2_host_complete(struct dwc2_hsot
>  		    xfer);
>  	}
>
> +	dwc2_hcd_urb_free(sc->sc_hsotg, dxfer->urb, xfer->nframes);
>  	qtd->urb = NULL;
>  	timeout_del(&xfer->timeout_handle);
>  	usb_rem_task(xfer->device, &xfer->abort_task);
>