Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion

2017-08-07 Thread Mathias Nyman

Hi

back from my vacation.

On 27.07.2017 04:04, Jason A. Donenfeld wrote:

Hey Mathias,

While the oops has been fixed in the latest kernels, as of 4.12.3, I'm
still having issues where USB "stops working" after a while. If I
restart, then after a long while, it stops working again. If I simply
rmmod all the USB-related modules and modprobe xhci_pci, things work
again, but pretty soon after -- maybe 15ish minutes -- it's broken
again. If it's of any help, I'm often plugging and unplugging a USB
hub with quite a few devices on it. The message in my dmesg this time
isn't an BUG, like in the previous kernels, but rather this more
subdued set of messages:

[529153.583731] xhci_hcd :00:14.0: Timeout while waiting for setup
device command
[529159.215731] xhci_hcd :00:14.0: ERROR mismatched command completion event
[529159.215768] xhci_hcd :00:14.0: Timeout while waiting for setup
device command


Odd, looks like something could still messed up with the comma queue,
or events are received out of order. Or maybe we don't get the interrupt for 
command completion event.

Any chance you could take traces of this?
It would show more details about the queued commands and completions.

xhci traces can be taken with:

mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

then after issue triggers send me the content of
/sys/kernel/debug/tracing/trace   (it will be huge)
and dmesg

Thanks
Mathias


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion

2017-07-26 Thread Jason A. Donenfeld
Hey Mathias,

While the oops has been fixed in the latest kernels, as of 4.12.3, I'm
still having issues where USB "stops working" after a while. If I
restart, then after a long while, it stops working again. If I simply
rmmod all the USB-related modules and modprobe xhci_pci, things work
again, but pretty soon after -- maybe 15ish minutes -- it's broken
again. If it's of any help, I'm often plugging and unplugging a USB
hub with quite a few devices on it. The message in my dmesg this time
isn't an BUG, like in the previous kernels, but rather this more
subdued set of messages:

[529153.583731] xhci_hcd :00:14.0: Timeout while waiting for setup
device command
[529159.215731] xhci_hcd :00:14.0: ERROR mismatched command completion event
[529159.215768] xhci_hcd :00:14.0: Timeout while waiting for setup
device command
[529159.423840] usb 2-2: device not accepting address 6, error -62
[529161.263641] xhci_hcd :00:14.0: Timeout while waiting for stop
endpoint command
[529166.895725] xhci_hcd :00:14.0: Command completion event does
not match command
[529166.895729] xhci_hcd :00:14.0: ERROR mismatched command
completion event
[529173.551626] xhci_hcd :00:14.0: ERROR mismatched command completion event
[529178.671242] xhci_hcd :00:14.0: xHCI host not responding to
stop endpoint command
.
[529178.671260] xhci_hcd :00:14.0: xHCI host controller not
responding, assume dead
[529178.671325] xhci_hcd :00:14.0: HC died; cleaning up
[529178.671369] xhci_hcd :00:14.0: Timeout waiting for reset
device command
[529178.671411] usb 1-2: USB disconnect, device number 16
[529178.671415] usb 1-2.2: USB disconnect, device number 17
[529178.671938] usb 1-2.3: USB disconnect, device number 18
[529178.671943] usb 1-2.3.4: USB disconnect, device number 20
[529178.673143] usb 1-2.4: USB disconnect, device number 19
[529178.673150] usb 1-2.4.1: USB disconnect, device number 21
[529178.756412] usb 1-2.4.2: USB disconnect, device number 22
[529178.830334] usb 1-2.4.3: USB disconnect, device number 23

Is this sufficient information? Or should I try to gather more somehow?

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion

2017-05-23 Thread Mathias Nyman

On 23.05.2017 12:50, Mathias Nyman wrote:

On 22.05.2017 18:42, Greg KH wrote:

On Fri, May 19, 2017 at 02:53:20PM +0200, Jason A. Donenfeld wrote:

I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens
after a while of ordinary USB use. Afterwards, USB does not work. If I
rmmod all the modules and reinsert them, it works for a while, but
even sooner after it stops working again.

[48112.422418] [ cut here ]
[48112.422441] WARNING: CPU: 0 PID: 14420 at
drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00
[xhci_hcd]
[48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel
udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial
cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc
videobuf2_memops
[48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup
device command
[48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb
btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport
8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables
algif_skcipher joydev mousedev snd_hda_codec_realtek
snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core
intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec
crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel
snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac
input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse
edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class
rfkill tpm_tis tpm_tis_core evdev
[48112.422556] tpm sch_fq_codel
[48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O
4.11.1-gentoo #1
[48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS
N1EET65W (1.38 ) 02/09/2017
[48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd]
[48112.422580] Call Trace:
[48112.422583] 
[48112.422589] ? dump_stack+0x46/0x5e
[48112.422595] ? __warn+0xb9/0xe0
[48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd]
[48112.422609] ? try_to_wake_up+0x22e/0x390
[48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd]
[48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0
[48112.422629] ? expire_timers+0x6e/0xe0
[48112.422634] ? __handle_irq_event_percpu+0x36/0x190
[48112.422637] ? handle_irq_event_percpu+0x1b/0x50
[48112.422640] ? handle_irq_event+0x22/0x40
[48112.422644] ? handle_edge_irq+0x65/0x120
[48112.422649] ? handle_irq+0x11/0x20
[48112.422653] ? do_IRQ+0x3c/0xc0
[48112.422658] ? common_interrupt+0x7f/0x7f
[48112.422660] 
[48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10
[48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd]
[48112.422684] ? process_one_work+0x1d9/0x450
[48112.422689] ? worker_thread+0x42/0x4b0
[48112.422695] ? process_one_work+0x450/0x450
[48112.422698] ? kthread+0x112/0x130
[48112.422702] ? kthread_create_on_node+0x40/0x40
[48112.422705] ? ret_from_fork+0x23/0x30
[48112.422709] ---[ end trace eb9505885b6e349e ]---
[48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop
endpoint command.
[48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host.
[48113.446348] xhci_hcd :00:14.0: HC died; cleaning up


Ugh, that does not look good, Mathias, anything Jason can do here to
help debug this?



both traces and dynamic debugging could be useful here

mount -t debugfs none /sys/kernel/debug
echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

then send output of both dmesg and /sys/kernel/debug/tracing/trace



Actually these symptoms match the xhci command ring stop regression in 4.11,
4.12-rc2 contains a fix, does that work?

fix is:
commit 604d02a2a66ab7f93fd3b2bde3698c29ef057b65

-Mathias



--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion

2017-05-23 Thread Mathias Nyman

On 22.05.2017 18:42, Greg KH wrote:

On Fri, May 19, 2017 at 02:53:20PM +0200, Jason A. Donenfeld wrote:

I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens
after a while of ordinary USB use. Afterwards, USB does not work. If I
rmmod all the modules and reinsert them, it works for a while, but
even sooner after it stops working again.

[48112.422418] [ cut here ]
[48112.422441] WARNING: CPU: 0 PID: 14420 at
drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00
[xhci_hcd]
[48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel
udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial
cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc
videobuf2_memops
[48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup
device command
[48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb
btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport
8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables
algif_skcipher joydev mousedev snd_hda_codec_realtek
snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core
intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec
crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel
snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac
input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse
edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class
rfkill tpm_tis tpm_tis_core evdev
[48112.422556] tpm sch_fq_codel
[48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O
4.11.1-gentoo #1
[48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS
N1EET65W (1.38 ) 02/09/2017
[48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd]
[48112.422580] Call Trace:
[48112.422583] 
[48112.422589] ? dump_stack+0x46/0x5e
[48112.422595] ? __warn+0xb9/0xe0
[48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd]
[48112.422609] ? try_to_wake_up+0x22e/0x390
[48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd]
[48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0
[48112.422629] ? expire_timers+0x6e/0xe0
[48112.422634] ? __handle_irq_event_percpu+0x36/0x190
[48112.422637] ? handle_irq_event_percpu+0x1b/0x50
[48112.422640] ? handle_irq_event+0x22/0x40
[48112.422644] ? handle_edge_irq+0x65/0x120
[48112.422649] ? handle_irq+0x11/0x20
[48112.422653] ? do_IRQ+0x3c/0xc0
[48112.422658] ? common_interrupt+0x7f/0x7f
[48112.422660] 
[48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10
[48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd]
[48112.422684] ? process_one_work+0x1d9/0x450
[48112.422689] ? worker_thread+0x42/0x4b0
[48112.422695] ? process_one_work+0x450/0x450
[48112.422698] ? kthread+0x112/0x130
[48112.422702] ? kthread_create_on_node+0x40/0x40
[48112.422705] ? ret_from_fork+0x23/0x30
[48112.422709] ---[ end trace eb9505885b6e349e ]---
[48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop
endpoint command.
[48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host.
[48113.446348] xhci_hcd :00:14.0: HC died; cleaning up


Ugh, that does not look good, Mathias, anything Jason can do here to
help debug this?



both traces and dynamic debugging could be useful here

mount -t debugfs none /sys/kernel/debug
echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

then send output of both dmesg and /sys/kernel/debug/tracing/trace

-Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion

2017-05-22 Thread Greg KH
On Fri, May 19, 2017 at 02:53:20PM +0200, Jason A. Donenfeld wrote:
> I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens
> after a while of ordinary USB use. Afterwards, USB does not work. If I
> rmmod all the modules and reinsert them, it works for a while, but
> even sooner after it stops working again.
> 
> [48112.422418] [ cut here ]
> [48112.422441] WARNING: CPU: 0 PID: 14420 at
> drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00
> [xhci_hcd]
> [48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel
> udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial
> cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc
> videobuf2_memops
> [48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup
> device command
> [48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb
> btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4
> iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport
> 8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables
> algif_skcipher joydev mousedev snd_hda_codec_realtek
> snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core
> intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec
> crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel
> snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac
> input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse
> edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class
> rfkill tpm_tis tpm_tis_core evdev
> [48112.422556] tpm sch_fq_codel
> [48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O
> 4.11.1-gentoo #1
> [48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS
> N1EET65W (1.38 ) 02/09/2017
> [48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd]
> [48112.422580] Call Trace:
> [48112.422583] 
> [48112.422589] ? dump_stack+0x46/0x5e
> [48112.422595] ? __warn+0xb9/0xe0
> [48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd]
> [48112.422609] ? try_to_wake_up+0x22e/0x390
> [48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd]
> [48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0
> [48112.422629] ? expire_timers+0x6e/0xe0
> [48112.422634] ? __handle_irq_event_percpu+0x36/0x190
> [48112.422637] ? handle_irq_event_percpu+0x1b/0x50
> [48112.422640] ? handle_irq_event+0x22/0x40
> [48112.422644] ? handle_edge_irq+0x65/0x120
> [48112.422649] ? handle_irq+0x11/0x20
> [48112.422653] ? do_IRQ+0x3c/0xc0
> [48112.422658] ? common_interrupt+0x7f/0x7f
> [48112.422660] 
> [48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10
> [48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd]
> [48112.422684] ? process_one_work+0x1d9/0x450
> [48112.422689] ? worker_thread+0x42/0x4b0
> [48112.422695] ? process_one_work+0x450/0x450
> [48112.422698] ? kthread+0x112/0x130
> [48112.422702] ? kthread_create_on_node+0x40/0x40
> [48112.422705] ? ret_from_fork+0x23/0x30
> [48112.422709] ---[ end trace eb9505885b6e349e ]---
> [48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop
> endpoint command.
> [48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host.
> [48113.446348] xhci_hcd :00:14.0: HC died; cleaning up

Ugh, that does not look good, Mathias, anything Jason can do here to
help debug this?

thanks

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion

2017-05-19 Thread Jason A. Donenfeld
I'm having this issue on kernel 4.11.0 and 4.11.1. It usually happens
after a while of ordinary USB use. Afterwards, USB does not work. If I
rmmod all the modules and reinsert them, it works for a while, but
even sooner after it stops working again.

[48112.422418] [ cut here ]
[48112.422441] WARNING: CPU: 0 PID: 14420 at
drivers/usb/host/xhci-ring.c:1390 handle_cmd_completion+0xb17/0xc00
[xhci_hcd]
[48112.422446] Modules linked in: xt_hashlimit ip6_udp_tunnel
udp_tunnel rfcomm pl2303 hid_lenovo bnep cdc_mbim cdc_ncm qcserial
cdc_wdm usb_wwan usbnet usbserial mii uvcvideo videobuf2_vmalloc
videobuf2_memops
[48112.422480] xhci_hcd :00:14.0: Timeout while waiting for setup
device command
[48112.422481] videobuf2_v4l2 videobuf2_core cdc_acm videodev btusb
btintel usbhid bluetooth af_packet nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport
8021q xt_conntrack nf_conntrack ip6table_filter ip6_tables
algif_skcipher joydev mousedev snd_hda_codec_realtek
snd_hda_codec_generic arc4 iwlmvm mac80211 rtsx_pci_sdmmc mmc_core
intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm iwlwifi snd_hda_intel ahci irqbypass snd_hda_codec
crc32_pclmul snd_hwdep xhci_pci xhci_hcd libahci crc32c_intel
snd_hda_core mei_me cfg80211 usbcore snd_pcm rtsx_pci ie31200_edac
input_leds mfd_core e1000e libata usb_common mei snd_timer psmouse
edac_core intel_pch_thermal thinkpad_acpi snd soundcore led_class
rfkill tpm_tis tpm_tis_core evdev
[48112.422556] tpm sch_fq_codel
[48112.422565] CPU: 0 PID: 14420 Comm: kworker/0:7 Tainted: P W O
4.11.1-gentoo #1
[48112.422567] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS
N1EET65W (1.38 ) 02/09/2017
[48112.422577] Workqueue: events xhci_handle_command_timeout [xhci_hcd]
[48112.422580] Call Trace:
[48112.422583] 
[48112.422589] ? dump_stack+0x46/0x5e
[48112.422595] ? __warn+0xb9/0xe0
[48112.422603] ? handle_cmd_completion+0xb17/0xc00 [xhci_hcd]
[48112.422609] ? try_to_wake_up+0x22e/0x390
[48112.422617] ? xhci_irq+0x38f/0x1460 [xhci_hcd]
[48112.422624] ? run_timer_softirq.part.2+0x4c/0xa0
[48112.422629] ? expire_timers+0x6e/0xe0
[48112.422634] ? __handle_irq_event_percpu+0x36/0x190
[48112.422637] ? handle_irq_event_percpu+0x1b/0x50
[48112.422640] ? handle_irq_event+0x22/0x40
[48112.422644] ? handle_edge_irq+0x65/0x120
[48112.422649] ? handle_irq+0x11/0x20
[48112.422653] ? do_IRQ+0x3c/0xc0
[48112.422658] ? common_interrupt+0x7f/0x7f
[48112.422660] 
[48112.422664] ? _raw_spin_unlock_irqrestore+0x5/0x10
[48112.422671] ? xhci_handle_command_timeout+0xf4/0x1b0 [xhci_hcd]
[48112.422684] ? process_one_work+0x1d9/0x450
[48112.422689] ? worker_thread+0x42/0x4b0
[48112.422695] ? process_one_work+0x450/0x450
[48112.422698] ? kthread+0x112/0x130
[48112.422702] ? kthread_create_on_node+0x40/0x40
[48112.422705] ? ret_from_fork+0x23/0x30
[48112.422709] ---[ end trace eb9505885b6e349e ]---
[48113.446247] xhci_hcd :00:14.0: xHCI host not responding to stop
endpoint command.
[48113.446250] xhci_hcd :00:14.0: Assuming host is dying, halting host.
[48113.446348] xhci_hcd :00:14.0: HC died; cleaning up
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html