On Sun, Apr 12, 2020 at 01:58:18PM +0000, Mikolaj Kucharski wrote: > On Sun, Apr 12, 2020 at 03:22:01PM +0200, Patrick Wildt wrote: > > Maybe, make that printf conditional on something weird, like the > > following. Since, if it runs off the DMA buffers map, maybe actlen > > is broken. > > > > if (xfer->length < xfer->actlen) > > printf("ehci_idon: .... > > Bingo. Kernel panic on first boot. > > ehci_idone: len=4, actlen=-15996, cerr=3, status=0x8c00 type=0x0 >
At the end is my debugging patch, so it's easier (hopefully) to understand dmesg output just before kernel panic. Please be aware of wrong indentations, but I wanted to make my changes against existing code minimal. Below diff is with latest Patrick's changes in this thread. So, with below diff crash looks as follows (transcribed by hand): urtwn0: RUN -> INIT urtwn0: INIT -> SCAN urtwn0: begin active scan urtwn0: SCAN -> SCAN urtwn0: sending probe_req to ff:ff:ff:ff:ff:ff on channel 2 mode auto urtwn0: SCAN -> INIT XXX ehci_idone: len=1, actlen=4294951305, xf_actlen=0, status=0x8c00 XXX ehci_idone: x_actlen=4294951304, x1=16000, x2=8 XXX ehci_idone: x_actlen=4294951305, x1=0, x2=1 XXX ehci_idone: x_actlen=4294951305, x1=0, x2=0 ehci_idone: len=1, actlen=4294951305, x_actlen=4294951305, cerr=3, status=0x8c00 type=0x0 panic: _dmamap_sync: ran off map! and the usual panic trace with ehci_idone() starts from that moment. With above output I'm following this code path: 908 if (xfer->length < actlen) { 909 printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " 910 "status=0x%x\n", xfer->length, actlen, xfer->actlen, status); 911 x_actlen = 0; 912 for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { 913 usb_syncmem(&sqtd->dma, sqtd->offs, sizeof(sqtd->qtd), 914 BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); 915 nstatus = letoh32(sqtd->qtd.qtd_status); 916 if (nstatus & EHCI_QTD_ACTIVE) 917 break; 918 919 status = nstatus; 920 /* halt is ok if descriptor is last, and complete */ 921 if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && 922 EHCI_QTD_GET_BYTES(status) == 0) 923 status &= ~EHCI_QTD_HALTED; 924 if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) { 925 x1 = EHCI_QTD_GET_BYTES(status); 926 x2 = sqtd->len; 927 x_actlen += x2 - x1; 928 printf("XXX ehci_idone: x_actlen=%u, x1=%u, x2=%u\n", x_actlen, x1, x2); 929 } 930 } 931 } On the first iteration of above for loop x_actlen ends up as 4294951304, in line 927 because EHCI_QTD_GET_BYTES(status) returns 16000, line 925 and sqtd->len returns 8 line 926. So EHCI_QTD_GET_BYTES(status) is bigger than sqtd->len and I don't know is this expected, can this happened. From that moment x_actlen will have large value and that will lead to kernel panic. I actually have seen this before also with urtwn(4), but don't remember was that Pinebook or amd64, but dmesg says: usb_transfer_complete: actlen > len 4294951172 > 4 That was couple of days, or weeks before recent changes to USB stack. That message comes from usb_transfer_complete() file dev/usb/usbdi.c around lines 738 to 744. I'm writing this, so maybe someone can push me in the correct direction how to help debug this further. Index: sys/dev/usb/ehci.c =================================================================== RCS file: /cvs/src/sys/dev/usb/ehci.c,v retrieving revision 1.210 diff -u -p -u -r1.210 ehci.c --- sys/dev/usb/ehci.c 3 Apr 2020 20:11:47 -0000 1.210 +++ sys/dev/usb/ehci.c 13 Apr 2020 18:04:51 -0000 @@ -869,7 +869,9 @@ ehci_idone(struct usbd_xfer *xfer) struct ehci_xfer *ex = (struct ehci_xfer *)xfer; struct ehci_soft_qtd *sqtd; u_int32_t status = 0, nstatus = 0; - int actlen, cerr; + u_int32_t actlen, x_actlen; + int cerr; + u_int32_t x1, x2; #ifdef DIAGNOSTIC { @@ -903,8 +905,33 @@ ehci_idone(struct usbd_xfer *xfer) actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); } + if (xfer->length < actlen) { + printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " + "status=0x%x\n", xfer->length, actlen, xfer->actlen, status); + x_actlen = 0; + for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { + usb_syncmem(&sqtd->dma, sqtd->offs, sizeof(sqtd->qtd), + BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); + nstatus = letoh32(sqtd->qtd.qtd_status); + if (nstatus & EHCI_QTD_ACTIVE) + break; + + status = nstatus; + /* halt is ok if descriptor is last, and complete */ + if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && + EHCI_QTD_GET_BYTES(status) == 0) + status &= ~EHCI_QTD_HALTED; + if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) { + x1 = EHCI_QTD_GET_BYTES(status); + x2 = sqtd->len; + x_actlen += x2 - x1; + printf("XXX ehci_idone: x_actlen=%u, x1=%u, x2=%u\n", x_actlen, x1, x2); + } + } + } + cerr = EHCI_QTD_GET_CERR(status); - DPRINTFN(/*10*/2, ("ehci_idone: len=%d, actlen=%d, cerr=%d, " + DPRINTFN(/*10*/2, ("ehci_idone: len=%u, actlen=%u, cerr=%d, " "status=0x%x\n", xfer->length, actlen, cerr, status)); xfer->actlen = actlen; if ((status & EHCI_QTD_HALTED) != 0) { @@ -912,13 +939,20 @@ ehci_idone(struct usbd_xfer *xfer) xfer->status = USBD_STALLED; else xfer->status = USBD_IOERROR; /* more info XXX */ - } else + } else { + if (xfer->actlen) { + if (xfer->length < xfer->actlen) { + printf("ehci_idone: len=%u, actlen=%u, x_actlen=%u, cerr=%d, " + "status=0x%x type=0x%x\n", xfer->length, actlen, xfer->actlen, cerr, + status, UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes)); + } + usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); + } xfer->status = USBD_NORMAL_COMPLETION; + } - if (xfer->actlen) - usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, - usbd_xfer_isread(xfer) ? - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); usb_transfer_complete(xfer); DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); } -- Regards, Mikolaj