I won't know for certain for a few more hours, but initially this looks to
have fixed it. Thank you.

So far the fix also seems to have caused a small throughput decrease of
roughly 5-10%, measured across several sftp transfers, but this measurement
was very unscientific.
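
A more repeatable comparison could look something like the sketch below. It
only summarizes throughput samples collected by hand from repeated transfers
on each kernel. The function names and the sample values are hypothetical
placeholders for illustration, not measurements from this system.

```python
import statistics


def summarize(samples_mb_s):
    """Return (mean, sample standard deviation) for throughput samples in MB/s."""
    mean = statistics.mean(samples_mb_s)
    # A single sample has no spread to estimate, so report 0.0 in that case.
    stdev = statistics.stdev(samples_mb_s) if len(samples_mb_s) > 1 else 0.0
    return mean, stdev


def relative_change(before, after):
    """Percent change of mean throughput going from `before` to `after`."""
    mean_before, _ = summarize(before)
    mean_after, _ = summarize(after)
    return (mean_after - mean_before) / mean_before * 100.0


if __name__ == "__main__":
    # Hypothetical numbers only; replace with real per-transfer rates.
    old_kernel = [10.0, 10.2, 9.8, 10.0]
    new_kernel = [9.2, 9.4, 9.0, 9.2]

    for label, samples in (("old", old_kernel), ("new", new_kernel)):
        mean, stdev = summarize(samples)
        print(f"{label}: mean {mean:.2f} MB/s, stdev {stdev:.2f} MB/s")
    print(f"change: {relative_change(old_kernel, new_kernel):.1f}%")
```

If the spread within each set is comparable to the difference between the
means, the 5-10% figure is probably noise rather than a real regression.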

When I first booted the new kernel, I did an sftp transfer in which
throughput dropped by 90% and then very slowly recovered to ~50% of normal,
without hanging or interrupting the transfer. It hasn't happened since, so
I'm not sure what to make of it.

s.


On Sun, Sep 18, 2022 at 12:44 AM Marcus Glocker <mar...@nazgul.ch> wrote:

> On Sat, Sep 17, 2022 at 10:40:29PM +0200, Marcus Glocker wrote:
>
> > On Sat, Sep 17, 2022 at 11:22:41AM -0700, Stephan Somogyi wrote:
> >
> > > Starting with arm64 snapshot kernel 1818 and continuing to 1822, the
> > > latest snapshot, I've been experiencing a persistent problem with the
> > > RPi3's USB bus locking up in a way that requires physical access to
> > > power cycle, and is thus a fairly serious regression. This system has
> > > been running continuously on -current since about 6.8-current without
> > > anything even remotely like this happening.
> > >
> > > The 100bT interface is at smsc0 on the usb bus. Initially, it looked
> > > like there may have been a weird race condition since I also had a
> > > USB-based flash drive plugged in, but moving that drive around to the
> > > other ports, and eventual complete removal, hasn't stopped the hanging.
> > >
> > > The hang is visible in dmesg as follows:
> > >
> > > usbd_start_next: error=5
> > > usbd_start_next: error=5
> > > usbd_free_xfer: xfer=0xffffff8004e74a20 not free
> > > smsc0: warning: Failed to read register 0x114
> > > smsc0: warning: MII is busy
> > >
> > > Searching around, I find references to some of these errors in FreeBSD
> > > and OpenBSD going back at least to 2014, but no clear resolution. It's
> > > _possible_ that I have some kind of creeping hardware failure, but it
> > > doesn't seem likely.
> > >
> > > Once the error messages appear, I can no longer access the system over
> > > the network. I've since connected the serial console. If I try to
> > > reboot while it's in this state, the system will hang hard and not even
> > > respond to the console. If I try `ifconfig smsc0 down` it hangs in the
> > > same way.
> > >
> > > While the USB drive was still part of the repro configuration,
> > > attempting to sync or otherwise access the drive also resulted in the
> > > hard hang, leading me to conclude this is a USB issue rather than
> > > either a mass storage or an ethernet issue.
> > >
> > > I've also done the usual variable elimination by using different USB
> > > drives, different ethernet cables, different port & different switch,
> > > etc. I no longer appear to be able to isolate this further myself.
> > >
> > > My only recourse once it's in this state is to hard power cycle.
> > >
> > > I'm happy to try and help debug further; I strongly prefer that
> > > 7.2-release/-stable doesn't include this behavior.
> > >
> > > s.
> >
> > We had some changes recently in dwctwo(4).  I currently think that your
> > issue might be related to the last commit to dwc2.c revision 1.67.  I'll
> > prepare a diff and send it to you for testing by tomorrow.  We might
> > need some iterations.  Worst case we can try to revert that commit.
>
> Does this diff fix your issue?
>
>
> Index: sys/dev/usb/dwc2/dwc2.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/dwc2/dwc2.c,v
> retrieving revision 1.67
> diff -u -p -u -p -r1.67 dwc2.c
> --- sys/dev/usb/dwc2/dwc2.c     10 Sep 2022 08:13:16 -0000      1.67
> +++ sys/dev/usb/dwc2/dwc2.c     18 Sep 2022 07:41:24 -0000
> @@ -242,7 +242,6 @@ dwc2_allocx(struct usbd_bus *bus)
>  void
>  dwc2_freex(struct usbd_bus *bus, struct usbd_xfer *xfer)
>  {
> -       struct dwc2_xfer *dxfer = DWC2_XFER2DXFER(xfer);
>         struct dwc2_softc *sc = DWC2_BUS2SC(bus);
>
>         DPRINTFN(10, "\n");
> @@ -255,7 +254,6 @@ dwc2_freex(struct usbd_bus *bus, struct
>         xfer->busy_free = XFER_FREE;
>  #endif
>         DWC2_EVCNT_INCR(sc->sc_ev_xferpoolput);
> -       dwc2_hcd_urb_free(sc->sc_hsotg, dxfer->urb, xfer->nframes);
>         pool_put(&sc->sc_xferpool, xfer);
>  }
>
> Index: sys/dev/usb/dwc2/dwc2_hcd.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/dwc2/dwc2_hcd.c,v
> retrieving revision 1.28
> diff -u -p -u -p -r1.28 dwc2_hcd.c
> --- sys/dev/usb/dwc2/dwc2_hcd.c 9 Sep 2022 21:16:54 -0000       1.28
> +++ sys/dev/usb/dwc2/dwc2_hcd.c 18 Sep 2022 07:41:24 -0000
> @@ -4312,6 +4312,7 @@ void dwc2_host_complete(struct dwc2_hsot
>                                         xfer);
>         }
>
> +       dwc2_hcd_urb_free(sc->sc_hsotg, dxfer->urb, xfer->nframes);
>         qtd->urb = NULL;
>         timeout_del(&xfer->timeout_handle);
>         usb_rem_task(xfer->device, &xfer->abort_task);
>
