Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion
Hi back from my vacation. On 27.07.2017 04:04, Jason A. Donenfeld wrote: Hey Mathias, While the oops has been fixed in the latest kernels, as of 4.12.3, I'm still having issues where USB "stops working" after a while. If I restart, then after a long while, it stops working again. If I simply rmmod all the USB-related modules and modprobe xhci_pci, things work again, but pretty soon after -- maybe 15ish minutes -- it's broken again. If it's of any help, I'm often plugging and unplugging a USB hub with quite a few devices on it. The message in my dmesg this time isn't an BUG, like in the previous kernels, but rather this more subdued set of messages: [529153.583731] xhci_hcd :00:14.0: Timeout while waiting for setup device command [529159.215731] xhci_hcd :00:14.0: ERROR mismatched command completion event [529159.215768] xhci_hcd :00:14.0: Timeout while waiting for setup device command Odd, looks like something could still messed up with the comma queue, or events are received out of order. Or maybe we don't get the interrupt for command completion event. Any chance you could take traces of this? It would show more details about the queued commands and completions. xhci traces can be taken with: mount -t debugfs none /sys/kernel/debug echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable then after issue triggers send me the content of /sys/kernel/debug/tracing/trace (it will be huge) and dmesg Thanks Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion
Hey Mathias, While the oops has been fixed in the latest kernels, as of 4.12.3, I'm still having issues where USB "stops working" after a while. If I restart, then after a long while, it stops working again. If I simply rmmod all the USB-related modules and modprobe xhci_pci, things work again, but pretty soon after -- maybe 15ish minutes -- it's broken again. If it's of any help, I'm often plugging and unplugging a USB hub with quite a few devices on it. The message in my dmesg this time isn't an BUG, like in the previous kernels, but rather this more subdued set of messages: [529153.583731] xhci_hcd :00:14.0: Timeout while waiting for setup device command [529159.215731] xhci_hcd :00:14.0: ERROR mismatched command completion event [529159.215768] xhci_hcd :00:14.0: Timeout while waiting for setup device command [529159.423840] usb 2-2: device not accepting address 6, error -62 [529161.263641] xhci_hcd :00:14.0: Timeout while waiting for stop endpoint command [529166.895725] xhci_hcd :00:14.0: Command completion event does not match command [529166.895729] xhci_hcd :00:14.0: ERROR mismatched command completion event [529173.551626] xhci_hcd :00:14.0: ERROR mismatched command completion event [529178.671242] xhci_hcd :00:14.0: xHCI host not responding to stop endpoint command . [529178.671260] xhci_hcd :00:14.0: xHCI host controller not responding, assume dead [529178.671325] xhci_hcd :00:14.0: HC died; cleaning up [529178.671369] xhci_hcd :00:14.0: Timeout waiting for reset device command [529178.671411] usb 1-2: USB disconnect, device number 16 [529178.671415] usb 1-2.2: USB disconnect, device number 17 [529178.671938] usb 1-2.3: USB disconnect, device number 18 [529178.671943] usb 1-2.3.4: USB disconnect, device number 20 [529178.673143] usb 1-2.4: USB disconnect, device number 19 [529178.673150] usb 1-2.4.1: USB disconnect, device number 21 [529178.756412] usb 1-2.4.2: USB disconnect, device number 22 [529178.830334] usb 1-2.4.3: USB disconnect, device number 23 Is this sufficient information? Or should I try to gather more somehow? Regards, Jason -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion
On 23.05.2017 12:50, Mathias Nyman wrote: On 22.05.2017 18:42, Greg KH wrote: On Fri, May 19, 2017 at 02:53:20PM +0200, Jason A. Donenfeld wrote: I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens after a while of ordinary USB use. Afterwards, USB does not work. If I rmmod all the modules and reinsert them, it works for a while, but even sooner after it stops working again. [48112.422418] [ cut here ] [48112.422441] WARNING: CPU: 0 PID: 14420 at drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00 [xhci_hcd] [48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc videobuf2_memops [48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup device command [48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport 8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables algif_skcipher joydev mousedev snd_hda_codec_realtek snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class rfkill tpm_tis tpm_tis_core evdev [48112.422556] tpm sch_fq_codel [48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O 4.11.1-gentoo #1 [48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET65W (1.38 ) 02/09/2017 [48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd] [48112.422580] Call Trace: [48112.422583] [48112.422589] ? dump_stack+0x46/0x5e [48112.422595] ? __warn+0xb9/0xe0 [48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd] [48112.422609] ? try_to_wake_up+0x22e/0x390 [48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd] [48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0 [48112.422629] ? expire_timers+0x6e/0xe0 [48112.422634] ? __handle_irq_event_percpu+0x36/0x190 [48112.422637] ? handle_irq_event_percpu+0x1b/0x50 [48112.422640] ? handle_irq_event+0x22/0x40 [48112.422644] ? handle_edge_irq+0x65/0x120 [48112.422649] ? handle_irq+0x11/0x20 [48112.422653] ? do_IRQ+0x3c/0xc0 [48112.422658] ? common_interrupt+0x7f/0x7f [48112.422660] [48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10 [48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd] [48112.422684] ? process_one_work+0x1d9/0x450 [48112.422689] ? worker_thread+0x42/0x4b0 [48112.422695] ? process_one_work+0x450/0x450 [48112.422698] ? kthread+0x112/0x130 [48112.422702] ? kthread_create_on_node+0x40/0x40 [48112.422705] ? ret_from_fork+0x23/0x30 [48112.422709] ---[ end trace eb9505885b6e349e ]--- [48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop endpoint command. [48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host. [48113.446348] xhci_hcd :00:14.0: HC died; cleaning up Ugh, that does not look good, Mathias, anything Jason can do here to help debug this? both traces and dynamic debugging could be useful here mount -t debugfs none /sys/kernel/debug echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable then send output of both dmesg and /sys/kernel/debug/tracing/trace Actually these symptoms match the xhci command ring stop regression in 4.11, 4.12-rc2 contains a fix, does that work? fix is: commit 604d02a2a66ab7f93fd3b2bde3698c29ef057b65 -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion
On 22.05.2017 18:42, Greg KH wrote: On Fri, May 19, 2017 at 02:53:20PM +0200, Jason A. Donenfeld wrote: I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens after a while of ordinary USB use. Afterwards, USB does not work. If I rmmod all the modules and reinsert them, it works for a while, but even sooner after it stops working again. [48112.422418] [ cut here ] [48112.422441] WARNING: CPU: 0 PID: 14420 at drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00 [xhci_hcd] [48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc videobuf2_memops [48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup device command [48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport 8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables algif_skcipher joydev mousedev snd_hda_codec_realtek snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class rfkill tpm_tis tpm_tis_core evdev [48112.422556] tpm sch_fq_codel [48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O 4.11.1-gentoo #1 [48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET65W (1.38 ) 02/09/2017 [48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd] [48112.422580] Call Trace: [48112.422583] [48112.422589] ? dump_stack+0x46/0x5e [48112.422595] ? __warn+0xb9/0xe0 [48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd] [48112.422609] ? try_to_wake_up+0x22e/0x390 [48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd] [48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0 [48112.422629] ? expire_timers+0x6e/0xe0 [48112.422634] ? __handle_irq_event_percpu+0x36/0x190 [48112.422637] ? handle_irq_event_percpu+0x1b/0x50 [48112.422640] ? handle_irq_event+0x22/0x40 [48112.422644] ? handle_edge_irq+0x65/0x120 [48112.422649] ? handle_irq+0x11/0x20 [48112.422653] ? do_IRQ+0x3c/0xc0 [48112.422658] ? common_interrupt+0x7f/0x7f [48112.422660] [48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10 [48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd] [48112.422684] ? process_one_work+0x1d9/0x450 [48112.422689] ? worker_thread+0x42/0x4b0 [48112.422695] ? process_one_work+0x450/0x450 [48112.422698] ? kthread+0x112/0x130 [48112.422702] ? kthread_create_on_node+0x40/0x40 [48112.422705] ? ret_from_fork+0x23/0x30 [48112.422709] ---[ end trace eb9505885b6e349e ]--- [48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop endpoint command. [48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host. [48113.446348] xhci_hcd :00:14.0: HC died; cleaning up Ugh, that does not look good, Mathias, anything Jason can do here to help debug this? both traces and dynamic debugging could be useful here mount -t debugfs none /sys/kernel/debug echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable then send output of both dmesg and /sys/kernel/debug/tracing/trace -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion
On Fri, May 19, 2017 at 02:53:20PM +0200, Jason A. Donenfeld wrote: > I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens > after a while of ordinary USB use. Afterwards, USB does not work. If I > rmmod all the modules and reinsert them, it works for a while, but > even sooner after it stops working again. > > [48112.422418] [ cut here ] > [48112.422441] WARNING: CPU: 0 PID: 14420 at > drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00 > [xhci_hcd] > [48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel > udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial > cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc > videobuf2_memops > [48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup > device command > [48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb > btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4 > iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport > 8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables > algif_skcipher joydev mousedev snd_hda_codec_realtek > snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core > intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp > kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec > crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel > snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac > input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse > edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class > rfkill tpm_tis tpm_tis_core evdev > [48112.422556] tpm sch_fq_codel > [48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O > 4.11.1-gentoo #1 > [48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS > N1EET65W (1.38 ) 02/09/2017 > [48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd] > [48112.422580] Call Trace: > [48112.422583] > [48112.422589] ? dump_stack+0x46/0x5e > [48112.422595] ? __warn+0xb9/0xe0 > [48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd] > [48112.422609] ? try_to_wake_up+0x22e/0x390 > [48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd] > [48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0 > [48112.422629] ? expire_timers+0x6e/0xe0 > [48112.422634] ? __handle_irq_event_percpu+0x36/0x190 > [48112.422637] ? handle_irq_event_percpu+0x1b/0x50 > [48112.422640] ? handle_irq_event+0x22/0x40 > [48112.422644] ? handle_edge_irq+0x65/0x120 > [48112.422649] ? handle_irq+0x11/0x20 > [48112.422653] ? do_IRQ+0x3c/0xc0 > [48112.422658] ? common_interrupt+0x7f/0x7f > [48112.422660] > [48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10 > [48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd] > [48112.422684] ? process_one_work+0x1d9/0x450 > [48112.422689] ? worker_thread+0x42/0x4b0 > [48112.422695] ? process_one_work+0x450/0x450 > [48112.422698] ? kthread+0x112/0x130 > [48112.422702] ? kthread_create_on_node+0x40/0x40 > [48112.422705] ? ret_from_fork+0x23/0x30 > [48112.422709] ---[ end trace eb9505885b6e349e ]--- > [48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop > endpoint command. > [48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host. > [48113.446348] xhci_hcd :00:14.0: HC died; cleaning up Ugh, that does not look good, Mathias, anything Jason can do here to help debug this? thanks greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion
I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens after a while of ordinary USB use. Afterwards, USB does not work. If I rmmod all the modules and reinsert them, it works for a while, but even sooner after it stops working again. [48112.422418] [ cut here ] [48112.422441] WARNING: CPU: 0 PID: 14420 at drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00 [xhci_hcd] [48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc videobuf2_memops [48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup device command [48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport 8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables algif_skcipher joydev mousedev snd_hda_codec_realtek snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class rfkill tpm_tis tpm_tis_core evdev [48112.422556] tpm sch_fq_codel [48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O 4.11.1-gentoo #1 [48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET65W (1.38 ) 02/09/2017 [48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd] [48112.422580] Call Trace: [48112.422583] [48112.422589] ? dump_stack+0x46/0x5e [48112.422595] ? __warn+0xb9/0xe0 [48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd] [48112.422609] ? try_to_wake_up+0x22e/0x390 [48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd] [48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0 [48112.422629] ? expire_timers+0x6e/0xe0 [48112.422634] ? __handle_irq_event_percpu+0x36/0x190 [48112.422637] ? handle_irq_event_percpu+0x1b/0x50 [48112.422640] ? handle_irq_event+0x22/0x40 [48112.422644] ? handle_edge_irq+0x65/0x120 [48112.422649] ? handle_irq+0x11/0x20 [48112.422653] ? do_IRQ+0x3c/0xc0 [48112.422658] ? common_interrupt+0x7f/0x7f [48112.422660] [48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10 [48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd] [48112.422684] ? process_one_work+0x1d9/0x450 [48112.422689] ? worker_thread+0x42/0x4b0 [48112.422695] ? process_one_work+0x450/0x450 [48112.422698] ? kthread+0x112/0x130 [48112.422702] ? kthread_create_on_node+0x40/0x40 [48112.422705] ? ret_from_fork+0x23/0x30 [48112.422709] ---[ end trace eb9505885b6e349e ]--- [48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop endpoint command. [48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host. [48113.446348] xhci_hcd :00:14.0: HC died; cleaning up -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html