Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi Martin, On Wed, Sep 19, 2018 at 10:36:41AM -0300, Martin Pieuchot wrote: > At that moment you don't need to plug back the phone. The ugen(4) > driver has a flawed logic where it waits 1min for all the transfers > to finish. So just wait until you see: > > usb_detach_wait: ugen1 didn't detach > > What would help a lot is the output of: > > # ps -lAk > > After unplugging the phone but before seeing "usb_detach_wait". > > If you can also get this output after seeing the message & before > killing the server that would also help. > > Cheers, > Martin OK I managed to capture the ps from before usb_detach_wait and after. Here's the dmesg-related stuff: 1) Plug in phone: xhci0: port=7 change=0x80 xhci0: port=7 change=0x80 xhci0: xhci_cmd_slot_control xhci0: dev 3, input=0xff00766a6000 slot=0xff00766a6020 ep0=0xff00766a6040 xhci0: dev 3, setting DCBAA to 0x766a7000 xhci_pipe_init: pipe=0x8154f000 addr=0 depth=1 port=7 speed=3 dev 3 dci 1 (epAddr=0x0) xhci0: xhci_cmd_set_address BSR=1 xhci0: xhci_cmd_set_address BSR=0 xhci0: dev 3 addr 3 ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4 ugen_set_config: ugen1 to configno 1, sc=0x815c8000 ugen_set_config: ifaceno 0 ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x815c8468 ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x815c8310 2) Start adb: ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=1 endpt=0 ugenioctl: cmd=80045572 ugenioctl: cmd=c020556f xhci0: short xfer 0xff047d8afc30 for 48 ugenioctl: cmd=80045572 ugenioctl: cmd=c020556f xhci0: short xfer 0xff047d8afc30 for 51 ugenopen: flag=3, mode=8192, unit=1 endpt=1 ugenopen: sc=0x815c8000, endpt=1, dir=0, sce=0x815c8310 xhci_pipe_init: pipe=0x81586000 addr=4 depth=1 port=7 speed=3 dev 3 dci 2 (epAddr=0x1) xhci0: xhci_cmd_configure_ep dev 3 ugenopen: flag=3, mode=8192, unit=1 endpt=1 ugenopen: sc=0x815c8000, endpt=1, dir=0, sce=0x815c8310 xhci_pipe_init: pipe=0x815cf000 addr=4 depth=1 port=7 speed=3 dev 3 dci 2 (epAddr=0x1) xhci0: xhci_cmd_configure_ep dev 3 ugenopen: sc=0x815c8000, endpt=1, dir=1, sce=0x815c8468 xhci_pipe_init: pipe=0x815d addr=4 depth=1 port=7 speed=3 dev 3 dci 3 (epAddr=0x81) xhci0: xhci_cmd_configure_ep dev 3 ugenioctl: cmd=80045572 ugen1: ugenwrite: 1 ugenwrite: transfer 24 bytes ugenopen: sc=0x815c8000, endpt=1, dir=1, sce=0x815c8468 xhci_pipe_init: pipe=0x815d1000 addr=4 depth=1 port=7 speed=3 dev 3 dci 3 (epAddr=0x81) xhci0: xhci_cmd_configure_ep dev 3 xhci0: wrong trb index (4294967040) max is 255 ugenioctl: cmd=80045572 ugenioctl: cmd=80045571 ugen1: ugenread: 1 ugenread: start transfer 24 bytes ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 3) Full dmesg OpenBSD 6.4-beta (CUSTOM) #6: Wed Sep 26 21:00:22 BST 2018 tom@freya.pertho.local:/usr/src/sys/arch/amd64/compile/CUSTOM real mem = 17040445440 (16251MB) avail mem = 16514740224 (15749MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries) bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017 bios0: PC Specialist LTD N13xWU acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT DBGP DBG2 DMAR BGRT ASF! WSMT acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) PXSX(S4) RP24(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.76 MHz, 06-8e-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 24MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,N
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hello Tom, On 12/09/18(Wed) 18:58, Tom Murphy wrote: > On Wed, Sep 12, 2018 at 12:55:01PM -0300, Martin Pieuchot wrote: > > On 08/09/18(Sat) 12:07, Tom Murphy wrote: > > > On Thu, Sep 06, 2018 at 01:06:50PM -0300, Martin Pieuchot wrote: > > > > Tom, as I said previously you've found a race in the ugen(4) driver. > > > > > > > > That's the symptom: > > > > > > > > > [...] > > > > > usb_detach_wait: ugen1 didn't detach > > > > > > > > To be able to understand which race we are chasing, could you rebuild a > > > > kernel with UGEN_DEBUG defined and set `ugendebug' to 6? > > > > > > OK here's the output per each step. Below that will be the dmesg and the > > > backtrace. > > > > Thanks a lot, but I need the same outputs with both UGEN_DEBUG and > > XHCI_DEBUG, and of course `ugendebug' set to 6 :) > > > > The interaction between ugen(4) the stack and xhci(4) is what will tell > > us where is the use-after-free :) > > Oh sorry about that I had replaced XHCI_DEBUG with UGEN_DEBUG. > > Anyway here's testing under kernel with both turned on and the ugendebug set > to 6: > > 1. Plugging in phone > > xhci0: port=7 change=0x80 > xhci0: port=7 change=0x80 > xhci0: xhci_cmd_slot_control > xhci0: dev 3, input=0xff0077164000 slot=0xff0077164020 > ep0=0xff0077164040 > xhci0: dev 3, setting DCBAA to 0x77165000 > xhci_pipe_init: pipe=0x81596000 addr=0 depth=1 port=7 speed=3 dev 3 > dci 1 (epAddr=0x0) > xhci0: xhci_cmd_set_address BSR=1 > xhci0: xhci_cmd_set_address BSR=0 > xhci0: dev 3 addr 3 > ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4 > ugen_set_config: ugen1 to configno 1, sc=0x81534000 > ugen_set_config: ifaceno 0 > ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81534468 > ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81534310 > > 2. Starting adb > > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=1 endpt=0 > ugenioctl: cmd=80045572 > ugenioctl: cmd=c020556f > xhci0: short xfer 0xff047d8afe10 for 48 > ugenioctl: cmd=80045572 > ugenioctl: cmd=c020556f > xhci0: short xfer 0xff047d8afe10 for 51 > ugenopen: flag=3, mode=8192, unit=1 endpt=1 > ugenopen: sc=0x81534000, endpt=1, dir=0, sce=0x81534310 > xhci_pipe_init: pipe=0x8155 addr=4 depth=1 port=7 speed=3 dev 3 > dci 2 (epAddr=0x1) > xhci0: xhci_cmd_configure_ep dev 3 > ugenopen: flag=3, mode=8192, unit=1 endpt=1 > ugenopen: sc=0x81534000, endpt=1, dir=0, sce=0x81534310 > xhci_pipe_init: pipe=0x815e2000 addr=4 depth=1 port=7 speed=3 dev 3 > dci 2 (epAddr=0x1) > xhci0: xhci_cmd_configure_ep dev 3 > ugenopen: sc=0x81534000, endpt=1, dir=1, sce=0x81534468 > xhci_pipe_init: pipe=0x815e3000 addr=4 depth=1 port=7 speed=3 dev 3 > dci 3 (epAddr=0x81) > xhci0: xhci_cmd_configure_ep dev 3 > ugenioctl: cmd=80045572 > ugenioctl: cmd=80045571 > ugen1: ugenread: 1 > ugenread: start transfer 24 bytes > ugenopen: sc=0x81534000, endpt=1, dir=1, sce=0x81534468 > xhci_pipe_init: pipe=0x815e4000 addr=4 depth=1 port=7 speed=3 dev 3 > dci 3 (epAddr=0x81) > xhci0: xhci_cmd_configure_ep dev 3 > ugenioctl: cmd=80045572 > ugen1: ugenwrite: 1 > ugenwrite: transfer 24 bytes > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > > 3. Unplugged phone > > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > ugenopen: flag=3, mode=8192, unit=0 endpt=0 > xhci0: port=7 change=0x80 > ugen_detach: sc=0x81534000 flags=1 > xhci_abort_xfer: xfer=0xff047d8afc30 status=IN_PROGRESS err=CANCELLED > actlen=0 len=24 idx=0 > xhci0: xhci_cmd_stop_ep dev 3 dci 2 > xhci0: wrong trb index (4294967040) max is 255 > xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 2 > > 4. Plug back in phone At that moment you don't need to plug back the phone. The ugen(4) driver has a flawed logic where it waits 1min for all the transfers to finish. So just wait until you see: usb_detach_wait: ugen1 didn't detach What would help a lot is the output of: # ps -lAk After unplugging the phone but before seeing "usb_detach_wait". If you can also get this output after seeing the message & before killing the server that would also help. Cheers, Martin
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Wed, Sep 12, 2018 at 12:55:01PM -0300, Martin Pieuchot wrote: > Hello Tom, > > On 08/09/18(Sat) 12:07, Tom Murphy wrote: > > On Thu, Sep 06, 2018 at 01:06:50PM -0300, Martin Pieuchot wrote: > > > Tom, as I said previously you've found a race in the ugen(4) driver. > > > > > > That's the symptom: > > > > > > > [...] > > > > usb_detach_wait: ugen1 didn't detach > > > > > > To be able to understand which race we are chasing, could you rebuild a > > > kernel with UGEN_DEBUG defined and set `ugendebug' to 6? > > > > OK here's the output per each step. Below that will be the dmesg and the > > backtrace. > > Thanks a lot, but I need the same outputs with both UGEN_DEBUG and > XHCI_DEBUG, and of course `ugendebug' set to 6 :) > > The interaction between ugen(4) the stack and xhci(4) is what will tell > us where is the use-after-free :) Oh sorry about that I had replaced XHCI_DEBUG with UGEN_DEBUG. Anyway here's testing under kernel with both turned on and the ugendebug set to 6: 1. Plugging in phone xhci0: port=7 change=0x80 xhci0: port=7 change=0x80 xhci0: xhci_cmd_slot_control xhci0: dev 3, input=0xff0077164000 slot=0xff0077164020 ep0=0xff0077164040 xhci0: dev 3, setting DCBAA to 0x77165000 xhci_pipe_init: pipe=0x81596000 addr=0 depth=1 port=7 speed=3 dev 3 dci 1 (epAddr=0x0) xhci0: xhci_cmd_set_address BSR=1 xhci0: xhci_cmd_set_address BSR=0 xhci0: dev 3 addr 3 ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4 ugen_set_config: ugen1 to configno 1, sc=0x81534000 ugen_set_config: ifaceno 0 ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81534468 ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81534310 2. Starting adb ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=1 endpt=0 ugenioctl: cmd=80045572 ugenioctl: cmd=c020556f xhci0: short xfer 0xff047d8afe10 for 48 ugenioctl: cmd=80045572 ugenioctl: cmd=c020556f xhci0: short xfer 0xff047d8afe10 for 51 ugenopen: flag=3, mode=8192, unit=1 endpt=1 ugenopen: sc=0x81534000, endpt=1, dir=0, sce=0x81534310 xhci_pipe_init: pipe=0x8155 addr=4 depth=1 port=7 speed=3 dev 3 dci 2 (epAddr=0x1) xhci0: xhci_cmd_configure_ep dev 3 ugenopen: flag=3, mode=8192, unit=1 endpt=1 ugenopen: sc=0x81534000, endpt=1, dir=0, sce=0x81534310 xhci_pipe_init: pipe=0x815e2000 addr=4 depth=1 port=7 speed=3 dev 3 dci 2 (epAddr=0x1) xhci0: xhci_cmd_configure_ep dev 3 ugenopen: sc=0x81534000, endpt=1, dir=1, sce=0x81534468 xhci_pipe_init: pipe=0x815e3000 addr=4 depth=1 port=7 speed=3 dev 3 dci 3 (epAddr=0x81) xhci0: xhci_cmd_configure_ep dev 3 ugenioctl: cmd=80045572 ugenioctl: cmd=80045571 ugen1: ugenread: 1 ugenread: start transfer 24 bytes ugenopen: sc=0x81534000, endpt=1, dir=1, sce=0x81534468 xhci_pipe_init: pipe=0x815e4000 addr=4 depth=1 port=7 speed=3 dev 3 dci 3 (epAddr=0x81) xhci0: xhci_cmd_configure_ep dev 3 ugenioctl: cmd=80045572 ugen1: ugenwrite: 1 ugenwrite: transfer 24 bytes ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 3. Unplugged phone ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 xhci0: port=7 change=0x80 ugen_detach: sc=0x81534000 flags=1 xhci_abort_xfer: xfer=0xff047d8afc30 status=IN_PROGRESS err=CANCELLED actlen=0 len=24 idx=0 xhci0: xhci_cmd_stop_ep dev 3 dci 2 xhci0: wrong trb index (4294967040) max is 255 xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 2 4. Plug back in phone xhci0: port=7 change=0x80 usb_detach_wait: ugen1 didn't detach ugenclose: close control ugenclose: endpt=1 dir=0 sce=0x81534310 xhci0: xhci_cmd_configure_ep dev 3 ugenclose: endpt=1 dir=1 sce=0x81534468 xhci0: xhci_cmd_configure_ep dev 3 ugen1 detached xhci0: xhci_cmd_configure_ep dev 3 xhci0: xhci_cmd_slot_control xhci0: port=7 change=0x80 xhci0: xhci_cmd_slot_control xhci0: dev 4, input=0xff00764c5000 slot=0xff00764c5020 ep0=0xff00764c5040 xhci0: dev 4, setting DCBAA to 0x77164000 xhci_pipe_init: pipe=0x815db000 addr=0 depth=1 port=7 speed=3 dev 4 dci 1 (epAddr=0x0) xhci0: xhci_cmd_set_address BSR=1 xhci0: xhci_cmd_set_address BSR=0 xhci0: dev 4 addr 4 ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4 ugen_set_config: ugen1 to configno 1, sc=0x81534000 ugen_set_config: ifaceno 0 ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81534468 ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81534310 ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=0 endpt=0 5. Run 'adb kill-server' xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED a
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hello Tom, On 08/09/18(Sat) 12:07, Tom Murphy wrote: > On Thu, Sep 06, 2018 at 01:06:50PM -0300, Martin Pieuchot wrote: > > Tom, as I said previously you've found a race in the ugen(4) driver. > > > > That's the symptom: > > > > > [...] > > > usb_detach_wait: ugen1 didn't detach > > > > To be able to understand which race we are chasing, could you rebuild a > > kernel with UGEN_DEBUG defined and set `ugendebug' to 6? > > OK here's the output per each step. Below that will be the dmesg and the > backtrace. Thanks a lot, but I need the same outputs with both UGEN_DEBUG and XHCI_DEBUG, and of course `ugendebug' set to 6 :) The interaction between ugen(4) the stack and xhci(4) is what will tell us where is the use-after-free :)
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi Martin, On Thu, Sep 06, 2018 at 01:06:50PM -0300, Martin Pieuchot wrote: > Tom, as I said previously you've found a race in the ugen(4) driver. > > That's the symptom: > > > [...] > > usb_detach_wait: ugen1 didn't detach > > To be able to understand which race we are chasing, could you rebuild a > kernel with UGEN_DEBUG defined and set `ugendebug' to 6? OK here's the output per each step. Below that will be the dmesg and the backtrace. 1. Plug in phone: ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4 ugen_set_config: ugen1 to configno 1, sc=0x81533000 ugen_set_config: ifaceno 0 ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81533468 ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81533310 2. Run adb start-server: ugenopen: flag=3, mode=8192, unit=0 endpt=0 ugenopen: flag=3, mode=8192, unit=1 endpt=0 ugenioctl: cmd=80045572 ugenioctl: cmd=c020556f ugenioctl: cmd=80045572 ugenioctl: cmd=c020556f ugenopen: flag=3, mode=8192, unit=1 endpt=1 ugenopen: sc=0x81533000, endpt=1, dir=0, sce=0x81533310 ugenopen: flag=3, mode=8192, unit=1 endpt=1 ugenopen: sc=0x81533000, endpt=1, dir=0, sce=0x81533310 ugenopen: sc=0x81533000, endpt=1, dir=1, sce=0x81533468 ugenioctl: cmd=80045572 ugenioctl: cmd=80045571 ugen1: ugenread: 1 ugenread: start transfer 24 bytes This message starts repeating over and over: ugenopen: flag=3, mode=8192, unit=0 endpt=0 3. Disconnect phone: ugen_detach: sc=0x81533000 flags=1 xhci0: wrong trb index (4294967040) max is 255 4. Run adb kill-server usb_detach_wait: ugen1 didn't detach ugenclose: close control ugenclose: endpt=1 dir=0 sce=0x81533310 ugenclose: endpt=1 dir=1 sce=0x81533468 ugen1 detached ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4 ugen_set_config: ugen1 to configno 1, sc=0x81533000 ugen_set_config: ifaceno 0 ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81533468 ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81533310 Right after the above ugen messages, the kernel protection fault trap: kernel: protection fault trap, code=0 Stopped at xhci_abort_xfer+0x57:cmpb $0,0x471(%r14) ddb{1}> bt xhci_abort_xfer(8f61ab7a60c70bc6,4) at xhci_abort_xfer+0x57 usbd_transfer(4369088e1033fbfc) at usbd_transfer+0x24d ugen_do_read(ba86656ba01aa6cf,800031bf8570,81533000,ff047e7b2a20) at ugen_do_read+0x3eb ugenread(a7bc233819d8f0fa,800031bf8570,800031bf8470) at ugenread+0x47 spec_read(6cdfcf35dbaa936b) at spec_read+0xab VOP_READ(bbd6b618d0407947,e27a94ce45ee2e22,ff047e7b2a20,ff03) at VOP_READ+0x49 dofilereadv(408a5e296dd9b5f,30,8000fffe79f8,3,800031bf86a0) at dofilereadv+0xe0 sys_read(bbd6b618d0080a08,e27a94ce45ee2e22,18) at sys_read+0x5c syscall(6cdfcf35db6c51e6) at syscall+0x32a Xsyscall(6,3,17bc55b14750,3,1,17bc84df5a00) at Xsyscall+0x128 end of kernel end trace frame: 0x17bc9ef24890, count: -11 Dmesg: OpenBSD 6.4-beta (CUSTOM) #3: Sat Sep 8 10:43:09 BST 2018 tom@freya.pertho.local:/sys/arch/amd64/compile/CUSTOM real mem = 17040445440 (16251MB) avail mem = 16514748416 (15749MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries) bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017 bios0: PC Specialist LTD N13xWU acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT DBGP DBG2 DMAR BGRT ASF! WSMT acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) PXSX(S4) RP24(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.76 MHz, 06-8e-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 24MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,S
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 05/09/18(Wed) 21:32, Tom Murphy wrote: > Hi Martin, Tom, as I said previously you've found a race in the ugen(4) driver. That's the symptom: > [...] > usb_detach_wait: ugen1 didn't detach To be able to understand which race we are chasing, could you rebuild a kernel with UGEN_DEBUG defined and set `ugendebug' to 6? > [...] > Thanks so much for looking into this and spending the time and the patches. > I think we're getting closer! Indeed, thanks for your help! Martin
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi Martin, On Wed, Sep 05, 2018 at 11:19:56AM -0300, Martin Pieuchot wrote: > On 05/09/18(Wed) 09:51, Tom Murphy wrote: > > [...] > > I physically unplug the phone and the kernel starts generating the xhci0: > > timeout aborting transfer messages in a loop. > > Aah! So the messages appear *after* you unplugged the device. That > makes sense. Does the diff below help? I just committed the previous > one so make sure you have current sources. > > Index: xhci.c > === > RCS file: /cvs/src/sys/dev/usb/xhci.c,v > retrieving revision 1.88 > diff -u -p -r1.88 xhci.c > --- xhci.c5 Sep 2018 14:03:28 - 1.88 > +++ xhci.c5 Sep 2018 14:17:39 - > @@ -2055,8 +2055,13 @@ xhci_abort_xfer(struct usbd_xfer *xfer, > xp->aborted_xfer = xfer; > > /* Stop the endpoint and wait until the hardware says so. */ > - if (xhci_cmd_stop_ep(sc, xp->slot, xp->dci)) > + if (xhci_cmd_stop_ep(sc, xp->slot, xp->dci)) { > DPRINTF(("%s: error stopping endpoint\n", DEVNAME(sc))); > + /* Assume the device is gone. */ > + xfer->status = status; > + usb_transfer_complete(xfer); > + return; > + } > > /* >* The transfer was already completed when we stopped the I've just tested the patch. It does fix the behavior when unplugging the device, however, I've gotten the kernel to crash when I do the following: 1) Start adb 2) Connect the phone 3) Try connecting to phone with adb shell (it doesn't work.. says device is offline) 4) Disconnect phone 5) Reconnect phone 6) run: 'adb kill-server' Then kernel protection fault happens. Dmesg and backtrace and XHCI debug entries below. dmesg: OpenBSD 6.4-beta (CUSTOM) #1: Wed Sep 5 19:44:41 BST 2018 tom@freya.pertho.local:/sys/arch/amd64/compile/CUSTOM real mem = 17040445440 (16251MB) avail mem = 16514736128 (15749MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries) bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017 bios0: PC Specialist LTD N13xWU acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT DBGP DBG2 DMAR BGRT ASF! WSMT acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) PXSX(S4) RP24(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.47 MHz, 06-8e-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 24MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 6 (application processor) cpu3: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,S
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 05/09/18(Wed) 09:51, Tom Murphy wrote: > [...] > I physically unplug the phone and the kernel starts generating the xhci0: > timeout aborting transfer messages in a loop. Aah! So the messages appear *after* you unplugged the device. That makes sense. Does the diff below help? I just committed the previous one so make sure you have current sources. Index: xhci.c === RCS file: /cvs/src/sys/dev/usb/xhci.c,v retrieving revision 1.88 diff -u -p -r1.88 xhci.c --- xhci.c 5 Sep 2018 14:03:28 - 1.88 +++ xhci.c 5 Sep 2018 14:17:39 - @@ -2055,8 +2055,13 @@ xhci_abort_xfer(struct usbd_xfer *xfer, xp->aborted_xfer = xfer; /* Stop the endpoint and wait until the hardware says so. */ - if (xhci_cmd_stop_ep(sc, xp->slot, xp->dci)) + if (xhci_cmd_stop_ep(sc, xp->slot, xp->dci)) { DPRINTF(("%s: error stopping endpoint\n", DEVNAME(sc))); + /* Assume the device is gone. */ + xfer->status = status; + usb_transfer_complete(xfer); + return; + } /* * The transfer was already completed when we stopped the
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Tue, Sep 04, 2018 at 03:24:19PM -0300, Martin Pieuchot wrote: > On 31/08/18(Fri) 09:19, Tom Murphy wrote: > > [...] > > Here's the dmesg from a XHCI_DEBUG kernel before it crashes (it appears > > to loop quite a few times before the kernel protection fault kicks in.) > > You're now hitting a bug a ugen(4). Because you're detaching a device > while ugen_do_read() is sleeping for a transfer to finish. This is > definitively something that needs to be fixed. > > However... > > > xhci0: timeout aborting transfer > > xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED > > actlen=0 len=24 idx=0 > > How do you generate this message? I plug in the Android phone, I start up adb with 'adb start-server'. For whatever reason, the phone comes back as error: device offline (I can only actually connect via adb when the phone is booted into recovery mode, so not sure if this is a bug with LineageOS, anyway, that's not really relevant to the bug here) I physically unplug the phone and the kernel starts generating the xhci0: timeout aborting transfer messages in a loop. After a while the protection fault drops me into ddb. I don't think the transfer ever finishes because the device doesn't respond to it in the first place. So unplugging the device is what causes it to error. If I keep the phone plugged in, the kernel protection fault doesn't show up until I unplug it.
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 31/08/18(Fri) 09:19, Tom Murphy wrote: > [...] > Here's the dmesg from a XHCI_DEBUG kernel before it crashes (it appears > to loop quite a few times before the kernel protection fault kicks in.) You're now hitting a bug a ugen(4). Because you're detaching a device while ugen_do_read() is sleeping for a transfer to finish. This is definitively something that needs to be fixed. However... > xhci0: timeout aborting transfer > xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED > actlen=0 len=24 idx=0 How do you generate this message?
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi Martin, On Thu, Aug 30, 2018 at 03:38:25PM -0300, Martin Pieuchot wrote: > On 30/08/18(Thu) 18:15, Tom Murphy wrote: > > On Thu, Aug 30, 2018 at 10:30:04AM -0300, Martin Pieuchot wrote: > > > On 30/08/18(Thu) 14:00, Tom Murphy wrote: > > > > On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote: > > > > > On 28/08/18(Tue) 22:22, Tom Murphy wrote: > > > > > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > > > > > > > Hello Tom, > > > > > > > > > > > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > > > > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > > > > > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > > > > > > > I've narrowed it down. > > > > > > > > > > > > > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > > > > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > > > > > > > > > > > The real problem is in the xhci(4) driver. When a command with a > > > > > > > timeout is submitted we should ensure no other command is enqueued > > > > > > > before continuing. Sadly the driver did not include any mechanism > > > > > > > to serialize command submissions. Diff below does that and should > > > > > > > fix your problem. > > > > > > > > > > > > > > Can you try it on top of -current? Make sure you have no diff > > > > > > > reverted. > > > > > > > > > > > > Hi, > > > > > > > > > > > > I think I spoke a little too soon. I found a case where it > > > > > > started printing xhci0: command timeout over and over until > > > > > > eventually the kernel panics with a protection fault. I couldn't > > > > > > catch the backtrace properly, but it looked around the same area > > > > > > as this original bug report. > > > > > > > > > > Without backtrace I can't make progress. > > > > > > > > Apologies for the delay. Just found time to reproduce this. Here's > > > > a backtrace: > > > > > > Almost, can you send the full dmesg with the backtrace at the end? > > > > Hi, Sorry, here's the dmesg with the backtrace. > > Is it the live dmesg? I don't see any 'xhci0: command timeout'. Btw > this message doesn't exist so I can't understand which code path is > triggering the problem. Could you build a kernel with XHCI_DEBUG > enabled, reproduce the page fault and send the dmesg (at least the last > 10 lines before crashing) + the trace? Here's the dmesg from a XHCI_DEBUG kernel before it crashes (it appears to loop quite a few times before the kernel protection fault kicks in.) xhci0: timeout aborting transfer xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED actlen=0 len=24 idx=0 xhci0: xhci_cmd_stop_ep dev 3 dci 3 xhci0: event error code=19, result=33 trb=0x800031d7bc58 (0x684400b0 0x1300 0x3008401) xhci0: error stopping endpoint xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3 xhci0: timeout aborting transfer xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED actlen=0 len=24 idx=0 xhci0: xhci_cmd_stop_ep dev 3 dci 3 xhci0: event error code=19, result=33 trb=0x800031d7bc58 (0x684400d0 0x1300 0x3008401) xhci0: error stopping endpoint xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3 xhci0: timeout aborting transfer xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED actlen=0 len=24 idx=0 xhci0: xhci_cmd_stop_ep dev 3 dci 3 xhci0: event error code=19, result=33 trb=0x800031d7bc58 (0x6844 0x1300 0x3008401) xhci0: error stopping endpoint xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3 xhci0: timeout aborting transfer xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED actlen=0 len=24 idx=0 xhci0: xhci_cmd_stop_ep dev 3 dci 3 xhci0: event error code=19, result=33 trb=0x800031d7bc58 (0x68440020 0x1300 0x3008401) xhci0: error stopping endpoint xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3 xhci0: timeout aborting transfer xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED actlen=0 len=24 idx=0 xhci0: xhci_cmd_stop_ep dev 3 dci 3 xhci0: event error code=19, result=33 trb=0x800031d7bc58 (0x68440040 0x1300 0x3008401) xhci0: error stopping endpoint xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3 xhci0: timeout aborting transfer xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED actlen=0 len=24 idx=0 xhci0: xhci_cmd_stop_ep dev 3 dci 3 xhci0: event error code=19, result=33 trb=0x800031d7bc58 (0x68440060 0x1300 0x3008401) xhci0: error stopping endpoint xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3 Full dmesg boot output: OpenBSD 6.4-beta (CUSTOM) #0: Fri Aug 31 08:20:38 BST 2018 tom@freya.pertho.local:/usr/src/sys/arch/amd64/compile/CUSTOM real mem = 17040445440 (16251MB) avail mem = 16514748416 (15749MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMB
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 30/08/18(Thu) 18:15, Tom Murphy wrote: > On Thu, Aug 30, 2018 at 10:30:04AM -0300, Martin Pieuchot wrote: > > On 30/08/18(Thu) 14:00, Tom Murphy wrote: > > > On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote: > > > > On 28/08/18(Tue) 22:22, Tom Murphy wrote: > > > > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > > > > > > Hello Tom, > > > > > > > > > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > > > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > > > > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > > > > > > I've narrowed it down. > > > > > > > > > > > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > > > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > > > > > > > > > The real problem is in the xhci(4) driver. When a command with a > > > > > > timeout is submitted we should ensure no other command is enqueued > > > > > > before continuing. Sadly the driver did not include any mechanism > > > > > > to serialize command submissions. Diff below does that and should > > > > > > fix your problem. > > > > > > > > > > > > Can you try it on top of -current? Make sure you have no diff > > > > > > reverted. > > > > > > > > > > Hi, > > > > > > > > > > I think I spoke a little too soon. I found a case where it > > > > > started printing xhci0: command timeout over and over until > > > > > eventually the kernel panics with a protection fault. I couldn't > > > > > catch the backtrace properly, but it looked around the same area > > > > > as this original bug report. > > > > > > > > Without backtrace I can't make progress. > > > > > > Apologies for the delay. Just found time to reproduce this. Here's > > > a backtrace: > > > > Almost, can you send the full dmesg with the backtrace at the end? > > Hi, Sorry, here's the dmesg with the backtrace. Is it the live dmesg? I don't see any 'xhci0: command timeout'. Btw this message doesn't exist so I can't understand which code path is triggering the problem. Could you build a kernel with XHCI_DEBUG enabled, reproduce the page fault and send the dmesg (at least the last 10 lines before crashing) + the trace?
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Thu, Aug 30, 2018 at 10:30:04AM -0300, Martin Pieuchot wrote: > On 30/08/18(Thu) 14:00, Tom Murphy wrote: > > Hi Martin, > > > > On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote: > > > On 28/08/18(Tue) 22:22, Tom Murphy wrote: > > > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > > > > > Hello Tom, > > > > > > > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > > > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > > > > > I've narrowed it down. > > > > > > > > > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > > > > > > > The real problem is in the xhci(4) driver. When a command with a > > > > > timeout is submitted we should ensure no other command is enqueued > > > > > before continuing. Sadly the driver did not include any mechanism > > > > > to serialize command submissions. Diff below does that and should > > > > > fix your problem. > > > > > > > > > > Can you try it on top of -current? Make sure you have no diff > > > > > reverted. > > > > > > > > Hi, > > > > > > > > I think I spoke a little too soon. I found a case where it > > > > started printing xhci0: command timeout over and over until > > > > eventually the kernel panics with a protection fault. I couldn't > > > > catch the backtrace properly, but it looked around the same area > > > > as this original bug report. > > > > > > Without backtrace I can't make progress. > > > > Apologies for the delay. Just found time to reproduce this. Here's > > a backtrace: > > Almost, can you send the full dmesg with the backtrace at the end? Hi, Sorry, here's the dmesg with the backtrace. OpenBSD 6.4-beta (GENERIC.MP) #1: Tue Aug 28 21:00:29 BST 2018 tom@freya.pertho.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 17040445440 (16251MB) avail mem = 16514805760 (15749MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries) bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017 bios0: PC Specialist LTD N13xWU acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT DBGP DBG2 DMAR BGRT ASF! WSMT acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) PXSX(S4) RP24(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.75 MHz, 06-8e-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 24MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 6 (application processor) cpu3: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz,
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 30/08/18(Thu) 14:00, Tom Murphy wrote: > Hi Martin, > > On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote: > > On 28/08/18(Tue) 22:22, Tom Murphy wrote: > > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > > > > Hello Tom, > > > > > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > > > > I've narrowed it down. > > > > > > > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > > > > > The real problem is in the xhci(4) driver. When a command with a > > > > timeout is submitted we should ensure no other command is enqueued > > > > before continuing. Sadly the driver did not include any mechanism > > > > to serialize command submissions. Diff below does that and should > > > > fix your problem. > > > > > > > > Can you try it on top of -current? Make sure you have no diff > > > > reverted. > > > > > > Hi, > > > > > > I think I spoke a little too soon. I found a case where it > > > started printing xhci0: command timeout over and over until > > > eventually the kernel panics with a protection fault. I couldn't > > > catch the backtrace properly, but it looked around the same area > > > as this original bug report. > > > > Without backtrace I can't make progress. > > Apologies for the delay. Just found time to reproduce this. Here's > a backtrace: Almost, can you send the full dmesg with the backtrace at the end?
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi Martin, On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote: > On 28/08/18(Tue) 22:22, Tom Murphy wrote: > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > > > Hello Tom, > > > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > > > I've narrowed it down. > > > > > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > > > The real problem is in the xhci(4) driver. When a command with a > > > timeout is submitted we should ensure no other command is enqueued > > > before continuing. Sadly the driver did not include any mechanism > > > to serialize command submissions. Diff below does that and should > > > fix your problem. > > > > > > Can you try it on top of -current? Make sure you have no diff > > > reverted. > > > > Hi, > > > > I think I spoke a little too soon. I found a case where it > > started printing xhci0: command timeout over and over until > > eventually the kernel panics with a protection fault. I couldn't > > catch the backtrace properly, but it looked around the same area > > as this original bug report. > > Without backtrace I can't make progress. Apologies for the delay. Just found time to reproduce this. Here's a backtrace: kernel: protection fault trap, code=0 Stopped at xhci_abort_xfer+0x57: cmpb $0,0x471(%r14) ddb{1}> bt xhci_abort_xfer(9cbfd0450eaf1f9c,4) at xhci_abort_xfer+0x57 usbd_transfer(b41ac578436f71d1) at usb_transfer+0x24d ugen_do_read(7fd9672719de8ccd,800031d0,8153, ff047e7b2ba0) at ugen_do_read+0x347 ugenread(d30250dfbb30abef,800031d0,800031d66560) at ugenread+0x47 spec_read(ae30accb64c92cb4) at spec_read+0xab VOP_READ(1be36585d83bbddf,2d04c5610d671db7,ff047e7b2ba0, ff04) at VOP_READ+0x49 vn_read(f8ef5df9c7a62c48,ff03ff215788,10) at vn_read+0xf5 dofilereadv(1dd31abb163e61f3,30,8000fffea028,3, 800031d66790) at dofilereadv+0xe0 sys_read(cc3368a684a7a8f,2d04c5610d671db7,18) at sys_read+0x5c syscall(eec44dfe2a55ecc1) at syscall+0x32a Xsyscall(6,3,2b360799d50,3,1,2b3d0a18e00) at Xsyscall+0x128 end of kernel end trace frame: 0x2b39d0546f0, count: -11 I managed to trigger it by disconnecting the phone, reconnecting it, then running adb kill-server to re-run adb start-server to see if it picked up on the phone and the panic happened right after 'adb kill-server'. Please let me know if there's anything more you'd like me to test. Thanks, Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 28/08/18(Tue) 22:22, Tom Murphy wrote: > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > > Hello Tom, > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > > I've narrowed it down. > > > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > The real problem is in the xhci(4) driver. When a command with a > > timeout is submitted we should ensure no other command is enqueued > > before continuing. Sadly the driver did not include any mechanism > > to serialize command submissions. Diff below does that and should > > fix your problem. > > > > Can you try it on top of -current? Make sure you have no diff > > reverted. > > Hi, > > I think I spoke a little too soon. I found a case where it > started printing xhci0: command timeout over and over until > eventually the kernel panics with a protection fault. I couldn't > catch the backtrace properly, but it looked around the same area > as this original bug report. Without backtrace I can't make progress.
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > Hello Tom, > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > I've narrowed it down. > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > The real problem is in the xhci(4) driver. When a command with a > timeout is submitted we should ensure no other command is enqueued > before continuing. Sadly the driver did not include any mechanism > to serialize command submissions. Diff below does that and should > fix your problem. > > Can you try it on top of -current? Make sure you have no diff > reverted. Hi, I think I spoke a little too soon. I found a case where it started printing xhci0: command timeout over and over until eventually the kernel panics with a protection fault. I couldn't catch the backtrace properly, but it looked around the same area as this original bug report. Thanks, Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi Martin, On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote: > Hello Tom, > > On 28/08/18(Tue) 11:10, Tom Murphy wrote: > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > > I've narrowed it down. > > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > The real problem is in the xhci(4) driver. When a command with a > timeout is submitted we should ensure no other command is enqueued > before continuing. Sadly the driver did not include any mechanism > to serialize command submissions. Diff below does that and should > fix your problem. > > Can you try it on top of -current? Make sure you have no diff > reverted. I've just tested this and was able to send some files to the phone with adb over USB without any issues. The rwlock diff did the trick. Many thanks for this! Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hello Tom, On 28/08/18(Tue) 11:10, Tom Murphy wrote: > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > > I've narrowed it down. > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 The real problem is in the xhci(4) driver. When a command with a timeout is submitted we should ensure no other command is enqueued before continuing. Sadly the driver did not include any mechanism to serialize command submissions. Diff below does that and should fix your problem. Can you try it on top of -current? Make sure you have no diff reverted. Index: xhci.c === RCS file: /cvs/src/sys/dev/usb/xhci.c,v retrieving revision 1.87 diff -u -p -r1.87 xhci.c --- xhci.c 16 Jul 2018 07:48:17 - 1.87 +++ xhci.c 28 Aug 2018 19:12:11 - @@ -25,6 +25,7 @@ #include #include #include +#include #include @@ -344,6 +345,7 @@ xhci_init(struct xhci_softc *sc) return (ENOMEM); /* Setup command ring. */ + rw_init(&sc->sc_cmd_lock, "xhcicmd"); error = xhci_ring_alloc(sc, &sc->sc_cmd_ring, XHCI_MAX_CMDS, XHCI_CMDS_RING_ALIGN); if (error) { @@ -1663,7 +1665,7 @@ xhci_command_submit(struct xhci_softc *s return (0); } - assertwaitok(); + rw_assert_wrlock(&sc->sc_cmd_lock); s = splusb(); sc->sc_cmd_trb = trb; @@ -1729,6 +1731,7 @@ int xhci_cmd_configure_ep(struct xhci_softc *sc, uint8_t slot, uint64_t addr) { struct xhci_trb trb; + int error; DPRINTF(("%s: %s dev %u\n", DEVNAME(sc), __func__, slot)); @@ -1738,13 +1741,17 @@ xhci_cmd_configure_ep(struct xhci_softc XHCI_TRB_SET_SLOT(slot) | XHCI_CMD_CONFIG_EP ); - return (xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT)); + rw_enter_write(&sc->sc_cmd_lock); + error = xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT); + rw_exit_write(&sc->sc_cmd_lock); + return (error); } int xhci_cmd_stop_ep(struct xhci_softc *sc, uint8_t slot, uint8_t dci) { struct xhci_trb trb; + int error; DPRINTF(("%s: %s dev %u dci %u\n", DEVNAME(sc), __func__, slot, dci)); @@ -1754,7 +1761,10 @@ xhci_cmd_stop_ep(struct xhci_softc *sc, XHCI_TRB_SET_SLOT(slot) | XHCI_TRB_SET_EP(dci) | XHCI_CMD_STOP_EP ); - return (xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT)); + rw_enter_write(&sc->sc_cmd_lock); + error = xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT); + rw_exit_write(&sc->sc_cmd_lock); + return (error); } void @@ -1794,6 +1804,7 @@ int xhci_cmd_slot_control(struct xhci_softc *sc, uint8_t *slotp, int enable) { struct xhci_trb trb; + int error; DPRINTF(("%s: %s\n", DEVNAME(sc), __func__)); @@ -1806,7 +1817,10 @@ xhci_cmd_slot_control(struct xhci_softc XHCI_TRB_SET_SLOT(*slotp) | XHCI_CMD_DISABLE_SLOT ); - if (xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT)) + rw_enter_write(&sc->sc_cmd_lock); + error = xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT); + rw_exit_write(&sc->sc_cmd_lock); + if (error != 0) return (EIO); if (enable) @@ -1820,6 +1834,7 @@ xhci_cmd_set_address(struct xhci_softc * uint32_t bsr) { struct xhci_trb trb; + int error; DPRINTF(("%s: %s BSR=%u\n", DEVNAME(sc), __func__, bsr ? 1 : 0)); @@ -1829,13 +1844,17 @@ xhci_cmd_set_address(struct xhci_softc * XHCI_TRB_SET_SLOT(slot) | XHCI_CMD_ADDRESS_DEVICE | bsr ); - return (xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT)); + rw_enter_write(&sc->sc_cmd_lock); + error = xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT); + rw_exit_write(&sc->sc_cmd_lock); + return (error); } int xhci_cmd_evaluate_ctx(struct xhci_softc *sc, uint8_t slot, uint64_t addr) { struct xhci_trb trb; + int error; DPRINTF(("%s: %s dev %u\n", DEVNAME(sc), __func__, slot)); @@ -1845,7 +1864,10 @@ xhci_cmd_evaluate_ctx(struct xhci_softc XHCI_TRB_SET_SLOT(slot) | XHCI_CMD_EVAL_CTX ); - return (xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT)); + rw_enter_write(&sc->sc_cmd_lock); + error = xhci_command_submit(sc, &trb, XHCI_CMD_TIMEOUT); + rw_exit_write(&sc->sc_cmd_lock); + return (error); } #ifdef XHCI_DEBUG @@ -1853,6 +1875,7 @@ int xhci_cmd_noop(struct xhci_softc *sc) { struct xhci_trb trb; + int error; DPRINTF(("%s: %s\n", DEVNAME(sc), __func__)); @@ -1860,7 +1883,10 @@ xhci_cmd_noop(struct xhci_softc *sc) trb.trb_status = 0; trb.trb_flags = htole32(XHCI_CMD_NOOP); -
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote: > [CCing visa@ in] > > On 2018-08-25 21:40:57, Tom Murphy wrote: > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > > I've narrowed it down. > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > > > [...] > > > > > > I'm going to look at the commits next. > > > > > >-Tom > > > > I can verify that this commit is what makes the kernel panic when adb is > > run and an Android device is connected to the machine with ADB enabled: > > > > https://marc.info/?l=openbsd-cvs&m=152996258723362&w=2 > > > > CVSROOT:/cvs > > Module name:src > > Changes by: v...@cvs.openbsd.org2018/06/25 10:06:27 > > > > Modified files: > > sys/kern : vfs_syscalls.c > > lib/libc/sys : dup.2 > > > > Log message: > > During open(2), release the fdp lock before calling vn_open(9). > > This lets other threads of the process modify the file descriptor > > table even if the vn_open(9) call blocks. > > > > The change has an effect on dup2(2) and dup3(2). If the new descriptor > > is the same as the one reserved by an unfinished open(2), the system > > call will fail with error EBUSY. The accept(2) system call already > > behaves like this. > > > > Issue pointed out by art@ via mpi@ > > > > Tested in a bulk build by ajacoutot@ > > OK mpi@ > > > > * * * > > > > I tested kernels compiled just before that commit and right after, and that > > commit makes the kernel panic. > > > > I can also confirm that reverting this patch fixes the kernel > panics when launching ADB for me as well. I'm currently syncing > my phone to my HDD as I type this. > > I'm still building against kernel sources from here: > OpenBSD 6.3-current (GENERIC.MP) #163: Mon Jul 30 12:45:31 MDT 2018 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > So fair warning, my tree is still a little bit out of date (I'm > planning on upgrading to a newer snap this weekend if I have the > time) but, as stated above, I can at least confirm that reverting > this patch fixes the panics for me. > > -- > Bryan Hi, I've checked out the src from -current, and reverted the 25th June 2018 commit (just to src/sys/kern/vfs_syscalls.c to test) and I'm able to transfer files to and from my Android phone over USB now without any panics. This is the diff to revert. I don't know what kind of an effect this will have overall on the kernel as I see in the commit messages that there is some refactoring going on (if I read it correctly)? Index: sys/kern/vfs_syscalls.c === RCS file: /cvs/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.304 diff -u -p -u -p -r1.304 vfs_syscalls.c --- sys/kern/vfs_syscalls.c 20 Aug 2018 16:00:22 - 1.304 +++ sys/kern/vfs_syscalls.c 28 Aug 2018 10:07:56 - @@ -1007,8 +1007,6 @@ doopenat(struct proc *p, int fd, const c fdplock(fdp); if ((error = falloc(p, &fp, &indx)) != 0) goto out; - fdpunlock(fdp); - flags = FFLAGS(oflags); if (flags & FREAD) { ni_pledge |= PLEDGE_RPATH; @@ -1035,7 +1033,6 @@ doopenat(struct proc *p, int fd, const c flags &= ~O_TRUNC; /* Must do truncate ourselves */ } if ((error = vn_open(&nd, flags, cmode)) != 0) { - fdplock(fdp); if (error == ENODEV && p->p_dupfd >= 0 && /* XXX from fdopen */ (error = @@ -1070,7 +1067,6 @@ doopenat(struct proc *p, int fd, const c VOP_UNLOCK(vp); error = VOP_ADVLOCK(vp, (caddr_t)fp, F_SETLK, &lf, type); if (error) { - fdplock(fdp); /* closef will vn_close the file for us. */ fdremove(fdp, indx); closef(fp, p); @@ -1093,7 +1089,6 @@ doopenat(struct proc *p, int fd, const c } if (error) { VOP_UNLOCK(vp); - fdplock(fdp); /* closef will close the file for us. */ fdremove(fdp, indx); closef(fp, p); @@ -1102,7 +1097,6 @@ doopenat(struct proc *p, int fd, const c } VOP_UNLOCK(vp); *retval = indx; - fdplock(fdp); fdinsert(fdp, indx, cloexec, fp); FRELE(fp, p); out: Kind regards, Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
[CCing visa@ in] On 2018-08-25 21:40:57, Tom Murphy wrote: > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > > I've narrowed it down. > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018 > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > > > [...] > > > > I'm going to look at the commits next. > > > >-Tom > > I can verify that this commit is what makes the kernel panic when adb is > run and an Android device is connected to the machine with ADB enabled: > > https://marc.info/?l=openbsd-cvs&m=152996258723362&w=2 > > CVSROOT: /cvs > Module name: src > Changes by: v...@cvs.openbsd.org2018/06/25 10:06:27 > > Modified files: > sys/kern : vfs_syscalls.c > lib/libc/sys : dup.2 > > Log message: > During open(2), release the fdp lock before calling vn_open(9). > This lets other threads of the process modify the file descriptor > table even if the vn_open(9) call blocks. > > The change has an effect on dup2(2) and dup3(2). If the new descriptor > is the same as the one reserved by an unfinished open(2), the system > call will fail with error EBUSY. The accept(2) system call already > behaves like this. > > Issue pointed out by art@ via mpi@ > > Tested in a bulk build by ajacoutot@ > OK mpi@ > > * * * > > I tested kernels compiled just before that commit and right after, and that > commit makes the kernel panic. > I can also confirm that reverting this patch fixes the kernel panics when launching ADB for me as well. I'm currently syncing my phone to my HDD as I type this. I'm still building against kernel sources from here: OpenBSD 6.3-current (GENERIC.MP) #163: Mon Jul 30 12:45:31 MDT 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP So fair warning, my tree is still a little bit out of date (I'm planning on upgrading to a newer snap this weekend if I have the time) but, as stated above, I can at least confirm that reverting this patch fixes the panics for me. -- Bryan
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote: > I've narrowed it down. > >Last kernel where adb works: June 24 09:59:46 MDT 2018 >1st Kernel where adb panics: June 25 13:10:32 MDT 2018 > > I did notice when my phone's booted into LineageOS and I have ADB >turned on, when I connect the phone via USB I get: > >ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 8 > > However, I'm not able to actually connect to it with adb shell or >anything else. It says: Error device offline or something. > > When I boot the phone into recovery mode, the phone shows up like >this when I plug it in: > > ugen1 at uhub0 port 7 "Motorola Moto G LTE" rev 2.00/2.28 addr 4 > > (different name!) and I am able use adb shell, adb push/pull, etc.. > > I think there's some issue with LineageOS' ADB mode, but that's not >really relevant here (it's a separate issue and outside of OpenBSD >perhaps though I'll have to test with Linux or some other OS.) > > I'm going to look at the commits next. > >-Tom I can verify that this commit is what makes the kernel panic when adb is run and an Android device is connected to the machine with ADB enabled: https://marc.info/?l=openbsd-cvs&m=152996258723362&w=2 CVSROOT:/cvs Module name:src Changes by: v...@cvs.openbsd.org2018/06/25 10:06:27 Modified files: sys/kern : vfs_syscalls.c lib/libc/sys : dup.2 Log message: During open(2), release the fdp lock before calling vn_open(9). This lets other threads of the process modify the file descriptor table even if the vn_open(9) call blocks. The change has an effect on dup2(2) and dup3(2). If the new descriptor is the same as the one reserved by an unfinished open(2), the system call will fail with error EBUSY. The accept(2) system call already behaves like this. Issue pointed out by art@ via mpi@ Tested in a bulk build by ajacoutot@ OK mpi@ * * * I tested kernels compiled just before that commit and right after, and that commit makes the kernel panic. -Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Bryan Linton wrote: > So I found some time to try to bisect this, but was hampered by my > phone being somewhat temperamental. > > Everything up to July 3rd was fine. No crashes occurred. > > On a July 15th checkout, my system panicked when trying to run adb > with my phone connected. > > Unfortunately when I tried to bisect this further, my phone began > refusing to connect to my computer. I get a generic > "uhub0: device problem, disabling port 2" > error and cannot get my phone to attach to my computer even if I > reboot it, plug/unplug it, etc. > > I'll see if I can try to bisect this further once I figure out > what the problem is with my phone, but in the meantime, I wanted > to at least update the bugs@ list with my findings so far. > > I see a few potential commits in that time-frame that could be > responsible, so I'm going to see if I can manage to narrow this > down even further. > > -- > Bryan Hi Bryan, I've narrowed it down. Last kernel where adb works: June 24 09:59:46 MDT 2018 1st Kernel where adb panics: June 25 13:10:32 MDT 2018 I did notice when my phone's booted into LineageOS and I have ADB turned on, when I connect the phone via USB I get: ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 8 However, I'm not able to actually connect to it with adb shell or anything else. It says: Error device offline or something. When I boot the phone into recovery mode, the phone shows up like this when I plug it in: ugen1 at uhub0 port 7 "Motorola Moto G LTE" rev 2.00/2.28 addr 4 (different name!) and I am able use adb shell, adb push/pull, etc.. I think there's some issue with LineageOS' ADB mode, but that's not really relevant here (it's a separate issue and outside of OpenBSD perhaps though I'll have to test with Linux or some other OS.) I'm going to look at the commits next. -Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Bryan Linton wrote: > On 2018-08-21 17:00:50, Bryan Linton wrote: > > On 2018-08-18 23:02:24, Tom Murphy wrote: > > > >Synopsis:Kernel panic in xhci.c when Android phone with ADB > > > > > > [...] > > > > > > >How-To-Repeat: > > > I can reproduce this every time just by enabling ADB on the > > > phone and connecting it to the OpenBSD machine. > > > > > > > I've been seeing the same panics for about a month or two now. > > > > I hadn't filed a bug report because I was hoping I'd be able to > > find the time to bisect when the panics first started happening. > > > > So I found some time to try to bisect this, but was hampered by my > phone being somewhat temperamental. > > Everything up to July 3rd was fine. No crashes occurred. > > On a July 15th checkout, my system panicked when trying to run adb > with my phone connected. > > Unfortunately when I tried to bisect this further, my phone began > refusing to connect to my computer. I get a generic > "uhub0: device problem, disabling port 2" > error and cannot get my phone to attach to my computer even if I > reboot it, plug/unplug it, etc. > > I'll see if I can try to bisect this further once I figure out > what the problem is with my phone, but in the meantime, I wanted > to at least update the bugs@ list with my findings so far. > > I see a few potential commits in that time-frame that could be > responsible, so I'm going to see if I can manage to narrow this > down even further. > > -- > Bryan Hi Bryan, Thanks for having a look at this! I've been up to my eyeballs in work lately so haven't had a chance to play with it much more. I'll have a look at the xhci stuff that's gone in and see what could be doing it this weekend. -Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 2018-08-21 17:00:50, Bryan Linton wrote: > On 2018-08-18 23:02:24, Tom Murphy wrote: > > >Synopsis: Kernel panic in xhci.c when Android phone with ADB > > > > [...] > > > > >How-To-Repeat: > > I can reproduce this every time just by enabling ADB on the > > phone and connecting it to the OpenBSD machine. > > > > I've been seeing the same panics for about a month or two now. > > I hadn't filed a bug report because I was hoping I'd be able to > find the time to bisect when the panics first started happening. > So I found some time to try to bisect this, but was hampered by my phone being somewhat temperamental. Everything up to July 3rd was fine. No crashes occurred. On a July 15th checkout, my system panicked when trying to run adb with my phone connected. Unfortunately when I tried to bisect this further, my phone began refusing to connect to my computer. I get a generic "uhub0: device problem, disabling port 2" error and cannot get my phone to attach to my computer even if I reboot it, plug/unplug it, etc. I'll see if I can try to bisect this further once I figure out what the problem is with my phone, but in the meantime, I wanted to at least update the bugs@ list with my findings so far. I see a few potential commits in that time-frame that could be responsible, so I'm going to see if I can manage to narrow this down even further. -- Bryan
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
On 2018-08-18 23:02:24, Tom Murphy wrote: > >Synopsis:Kernel panic in xhci.c when Android phone with ADB > > [...] > > >How-To-Repeat: > I can reproduce this every time just by enabling ADB on the > phone and connecting it to the OpenBSD machine. > I've been seeing the same panics for about a month or two now. I hadn't filed a bug report because I was hoping I'd be able to find the time to bisect when the panics first started happening. Since I'm not sure if I'll have the time to do this anytime soon, I'd like to at least add the following information. According to the timestamps in the filesystem, I was able to successfuly use ADB to sync my phone to my computer on June 24th. So the panics must have begun some time after that. If you have the time to bisect this, first, try a build from before the above date and confirm that it works for you. Then, try a build from around July 24th to see if it happened before or after that. Then repeat the above until you can narrow down the time frame as much as possible. If you can find the exact commit that broke this, it'll be much easier to fix it. Failing that, being able to say "It worked up until the Nth, but started panicking on the N+1th" it will still make it significantly easier find out what changed and thus, to fix this. -- Bryan
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi, I managed to get XHCI_DEBUG working. Here are the messages and I connect the Android device: xhci0: xhci_cmd_configure_ep dev 1 xhci0: xhci_cmd_slot_control xhci0: port=7 change=0x80 xhci0: port=7 change=0x80 xhci0: xhci_cmd_slot_control xhci0: dev 6, input=0xff00687b9000 slot=0xff00687b9020 ep0=0xff00687b9040 xhci0: dev 6, setting DCBAA to 0x687ba000 xhci_pipe_init: pipe=0x81557000 addr=0 depth=1 port=7 speed=3 dev 6 dci 1 (epAddr=0x0) xhci0: xhci_cmd_set_address BSR=1 xhci0: xhci_cmd_set_address BSR=0 xhci0: dev 6 addr 6 ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 2 xhci0: short xfer 0xff047d8ad960 for 48 xhci0: short xfer 0xff047d8ad960 for 51 Then, when I start adb with 'adb wait-for-devices': pxahnciic_:p ikpee_init: pipe=0x80c57000 addr=2 depth=1 port=7 speed=3 dev 6 dci 3 (epAddr=0x81) xhci0: xhci_cmd_configure_ep dev 6 Then the system hangs (doesn't drop into ddb) requiring a hard reset. When I was in ddb before, I tried 'show struct sc' but that didn't print anything helpful. Thanks, Tom
Re: Plugging in ADB-enabled Android device makes kernel panic with xhci
Hi, I would like to amend the above bug report. When reproducing I originally said it could be reproduced when the Android device was just plugged in. After going over my notes, and doing more testing, I found it only occurs when 'adb start-server' is running AND the device is plugged in via USB. If adb is not started, plugging in the device registers it in the kernel but does nothing. Starting adb then triggers the panic. I recompiled the kernel with option XHCI_debug but could not see any extra messages. Thanks, Tom
Plugging in ADB-enabled Android device makes kernel panic with xhci
>Synopsis: Kernel panic in xhci.c when Android phone with ADB > connected >Category: kernel >Environment: System : OpenBSD 6.4 Details : OpenBSD 6.4-beta (GENERIC.MP) #224: Fri Aug 17 23:42:30 MDT 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP Architecture: OpenBSD.amd64 Machine : amd64 >Description: When plugging in an Android phone into USB when it has the Android Debug Bridge (ADB) enabled causes a kernel panic. No other application needs to be running (not even devel/adb needs to be running.) ddb{2}> show panic kernel diagnostic assertion "timeout == 0 || sc->sc_cmd_trb == NULL" failed file "/usr/src/sys/dev/usb/xhci.c", line 1643 ddb{2}> trace db_enter() at db_enter+0x12 panic() at panic+0x120 __assert(817bfa54,800031e1a680,806ab000,800031e1a6e8) at __assert+0x24 xhci_command_submit(893f1452e29dc378,806ab000,8178a068) at xhci_command_submit+0x220 xhci_pipe_init(2517ab0354ba223c,8178a000) at xhci_pipe_init+0x1b9 xhci_pipe_open(c71aa3f3774c08e5) at xhci_pipe_open+0x11b usbd_setup_pipe(12ef9866b0b08d0e,6,3,38e55c0cf670c5f1) at usbd_open_pipe+0xa1 ugenopen(fdb4584eb53365ed,800031e1a948,800031bc8968,15a8) at ugenopen+0x1b4 spec_open(0d8608c87102661b) at spec_open+0xe1 VOP_OPEN(f6c6c903b309846,800031e1aad8,38e55c0cf670c5f1,800031bc8968) at VOP_OPEN+0x58 vn_open(3e77a72b92ba5b4b,800031bc8968,1) at vn_open+0x28b doopenat(6e2d8c17e8ec19d2,50,800031bc8968,5,800031e1ad70,426cbc483d8) at doopenat+0x1d3 syscall(739d180e92889b4d) at syscall+0x32a Xsyscall(6,5,42690c29c00,5,1,1) at Xsyscall+0x128 end of kernel end of trace frame: 0x426cbc484c0, count: -15 There appears to be a similar bugs reported here: https://marc.info/?l=openbsd-bugs&m=147306540522232&w=2 https://marc.info/?l=openbsd-bugs&m=143456823814213&w=2 except I get the above panic on xhci, not ehci. The panic occurs as soon as the phone is physically connected. My machine only uses xhci. If I disable xhci(4) in the kernel, USB stops working, so I cannot test with ehci(4) (there is also no BIOS option to disable USB3) >How-To-Repeat: I can reproduce this every time just by enabling ADB on the phone and connecting it to the OpenBSD machine. dmesg: OpenBSD 6.4-beta (GENERIC.MP) #224: Fri Aug 17 23:42:30 MDT 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 17040445440 (16251MB) avail mem = 16514805760 (15749MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries) bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017 bios0: PC Specialist LTD N13xWU acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT DBGP DBG2 DMAR BGRT ASF! WSMT acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) PXSX(S4) RP24(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.76 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 23MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.04 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.04 MHz cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,x