On Sun, Apr 12, 2020 at 01:58:18PM +0000, Mikolaj Kucharski wrote:
> On Sun, Apr 12, 2020 at 03:22:01PM +0200, Patrick Wildt wrote:
> > Maybe, make that printf conditional on something weird, like the
> > following.  Since, if it runs off the DMA buffers map, maybe actlen
> > is broken.
> > 
> > if (xfer->length < xfer->actlen)
> >     printf("ehci_idon: ....
> 
> Bingo. Kernel panic on first boot.
> 
> ehci_idone: len=4, actlen=-15996, cerr=3, status=0x8c00 type=0x0
> 

At the end is my debugging patch, so it's easier (hopefully) to
understand dmesg output just before kernel panic. Please be aware of
wrong indentations, but I wanted to make my changes against existing
code minimal. Below diff is with latest Patrick's changes in this
thread.

So, with below diff crash looks as follows (transcribed by hand):

urtwn0: RUN -> INIT
urtwn0: INIT -> SCAN
urtwn0: begin active scan
urtwn0: SCAN -> SCAN
urtwn0: sending probe_req to ff:ff:ff:ff:ff:ff on channel 2 mode auto
urtwn0: SCAN -> INIT
XXX ehci_idone: len=1, actlen=4294951305, xf_actlen=0, status=0x8c00
XXX ehci_idone: x_actlen=4294951304, x1=16000, x2=8
XXX ehci_idone: x_actlen=4294951305, x1=0, x2=1
XXX ehci_idone: x_actlen=4294951305, x1=0, x2=0
ehci_idone: len=1, actlen=4294951305, x_actlen=4294951305, cerr=3, 
status=0x8c00 type=0x0
panic: _dmamap_sync: ran off map!

and the usual panic trace with ehci_idone() starts from that moment.

With above output I'm following this code path:

908          if (xfer->length < actlen) {
909                  printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, "
910                          "status=0x%x\n", xfer->length, actlen, 
xfer->actlen, status);
911          x_actlen = 0;
912          for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) {
913                  usb_syncmem(&sqtd->dma, sqtd->offs, sizeof(sqtd->qtd),
914                      BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
915                  nstatus = letoh32(sqtd->qtd.qtd_status);
916                  if (nstatus & EHCI_QTD_ACTIVE)
917                          break;
918  
919                  status = nstatus;
920                  /* halt is ok if descriptor is last, and complete */
921                  if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) &&
922                      EHCI_QTD_GET_BYTES(status) == 0)
923                          status &= ~EHCI_QTD_HALTED;
924                  if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) {
925                          x1 = EHCI_QTD_GET_BYTES(status);
926                          x2 = sqtd->len;
927                          x_actlen += x2 - x1;
928                          printf("XXX ehci_idone: x_actlen=%u, x1=%u, 
x2=%u\n", x_actlen, x1, x2);
929                  }
930          }
931          }

On the first iteration of above for loop x_actlen ends up as 4294951304,
in line 927 because EHCI_QTD_GET_BYTES(status) returns 16000, line 925
and sqtd->len returns 8 line 926. So EHCI_QTD_GET_BYTES(status) is
bigger than sqtd->len and I don't know is this expected, can this
happened. From that moment x_actlen will have large value and that
will lead to kernel panic.

I actually have seen this before also with urtwn(4), but don't
remember was that Pinebook or amd64, but dmesg says:

usb_transfer_complete: actlen > len 4294951172 > 4

That was couple of days, or weeks before recent changes to USB stack.
That message comes from usb_transfer_complete() file dev/usb/usbdi.c
around lines 738 to 744.

I'm writing this, so maybe someone can push me in the correct direction
how to help debug this further.


Index: sys/dev/usb/ehci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.210
diff -u -p -u -r1.210 ehci.c
--- sys/dev/usb/ehci.c  3 Apr 2020 20:11:47 -0000       1.210
+++ sys/dev/usb/ehci.c  13 Apr 2020 18:04:51 -0000
@@ -869,7 +869,9 @@ ehci_idone(struct usbd_xfer *xfer)
        struct ehci_xfer *ex = (struct ehci_xfer *)xfer;
        struct ehci_soft_qtd *sqtd;
        u_int32_t status = 0, nstatus = 0;
-       int actlen, cerr;
+       u_int32_t actlen, x_actlen;
+       int cerr;
+       u_int32_t x1, x2;
 
 #ifdef DIAGNOSTIC
        {
@@ -903,8 +905,33 @@ ehci_idone(struct usbd_xfer *xfer)
                        actlen += sqtd->len - EHCI_QTD_GET_BYTES(status);
        }
 
+       if (xfer->length < actlen) {
+               printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, "
+                       "status=0x%x\n", xfer->length, actlen, xfer->actlen, 
status);
+       x_actlen = 0;
+       for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) {
+               usb_syncmem(&sqtd->dma, sqtd->offs, sizeof(sqtd->qtd),
+                   BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
+               nstatus = letoh32(sqtd->qtd.qtd_status);
+               if (nstatus & EHCI_QTD_ACTIVE)
+                       break;
+
+               status = nstatus;
+               /* halt is ok if descriptor is last, and complete */
+               if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) &&
+                   EHCI_QTD_GET_BYTES(status) == 0)
+                       status &= ~EHCI_QTD_HALTED;
+               if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) {
+                       x1 = EHCI_QTD_GET_BYTES(status);
+                       x2 = sqtd->len;
+                       x_actlen += x2 - x1;
+                       printf("XXX ehci_idone: x_actlen=%u, x1=%u, x2=%u\n", 
x_actlen, x1, x2);
+               }
+       }
+       }
+
        cerr = EHCI_QTD_GET_CERR(status);
-       DPRINTFN(/*10*/2, ("ehci_idone: len=%d, actlen=%d, cerr=%d, "
+       DPRINTFN(/*10*/2, ("ehci_idone: len=%u, actlen=%u, cerr=%d, "
            "status=0x%x\n", xfer->length, actlen, cerr, status));
        xfer->actlen = actlen;
        if ((status & EHCI_QTD_HALTED) != 0) {
@@ -912,13 +939,20 @@ ehci_idone(struct usbd_xfer *xfer)
                        xfer->status = USBD_STALLED;
                else
                        xfer->status = USBD_IOERROR; /* more info XXX */
-       } else
+       } else {
+               if (xfer->actlen) {
+                       if (xfer->length < xfer->actlen) {
+                       printf("ehci_idone: len=%u, actlen=%u, x_actlen=%u, 
cerr=%d, "
+                           "status=0x%x type=0x%x\n", xfer->length, actlen, 
xfer->actlen, cerr,
+                           status, 
UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes));
+                       }
+                       usb_syncmem(&xfer->dmabuf, 0, xfer->actlen,
+                           usbd_xfer_isread(xfer) ?
+                           BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
+               }
                xfer->status = USBD_NORMAL_COMPLETION;
+       }
 
-       if (xfer->actlen)
-               usb_syncmem(&xfer->dmabuf, 0, xfer->actlen,
-                   usbd_xfer_isread(xfer) ?
-                   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
        usb_transfer_complete(xfer);
        DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex));
 }

-- 
Regards,
 Mikolaj

Reply via email to