Hello,

I'm debugging some issues on a couple of different USB TV tuners which
boil down to the following error:

xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.

This is followed by the xHCI driver disconnecting *all* USB devices
from the controller.

I've done a bit of debugging, and the root of the issue appears to be
an intermittent control message timing out.  When the timeout expires,
the call to usb_kill_urb() that occurs inside of usb_control_msg() is
what causes the disconnect.  Specifically, it would appear that
xhci_urb_dequeue() tries to stop the endpoint using
xhci_queue_stop_endpoint(): the command gets queued, but the IRQ that
would run the TRB_STOP_RING completion code never fires.  The function
xhci_stop_endpoint_command_watchdog() then fires after five seconds
and tears down the entire driver.

Below is the dmesg output with the xhci_hcd debugging enabled.  The
dump_stack() call is something I added (i.e. it's not an OOPS) so I
could see which code path was making the usb_kill_urb() call that was
failing.  Note that the caller is using usb_control_msg() with a 1000ms
timeout, and we can see from the timestamps that the timer expires,
which is what causes the call to usb_kill_urb().
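
For anyone who wants to sanity-check those intervals, here is a small
Python sketch that computes the gaps between four of the timestamps in
the log below.  It is only an illustration: the helper names (parse,
deltas) are made up, the log lines are copied by hand with the syslog
prefix removed, and reading the "Ding dong!" line as the moment the
stop endpoint command was issued is an assumption on my part.

```python
import re

# Kernel timestamps (seconds since boot) copied from the dmesg output,
# with the syslog prefix stripped and the wrapped lines rejoined.
LOG = """\
[ 9119.647249] au0828: au0828_writereg(0x0100, 0x00)
[ 9120.645091] xhci_hcd 0000:00:14.0: Cancel URB ffff8802543c36c0, dev 1, ep 0x0, starting at offset 0x25c358940
[ 9120.664207] xhci_hcd 0000:00:14.0: // Ding dong!
[ 9125.671045] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
"""

# Matches "[ 1234.567890] message" and captures the timestamp and message.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse(log):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def deltas(events):
    """Return (gap_seconds, message) for each event after the first."""
    return [(round(b[0] - a[0], 6), b[1]) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    for gap, msg in deltas(parse(LOG)):
        print(f"+{gap:.6f}s  {msg}")
```

The first gap comes out at about 0.998 seconds (the register write to
the URB cancel, matching the 1000ms timeout), and the last at about
5.007 seconds (matching the five second watchdog).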

I would imagine that explicitly killing URBs is a pretty uncommon task
for control endpoint messages (compared to ISOC or BULK endpoints
where it's done regularly).  Is it possible an exception case has been
missed?

Independent of the usb_kill_urb() killing the entire stack, I still
don't understand why the control message failed in the first place.
This is a well-exercised code path in the au0828 driver (related to
I2C transfers) and I've never seen this when using the EHCI driver.
My assumption is one of two things: either the HCD is getting sick,
which both causes the control message to fail and puts the controller
into an inconsistent state such that we never get the TRB_STOP_RING
IRQ; or we've got two separate bugs, with the control message failing
for some "legitimate" reason (i.e. I screwed something up in my au0828
driver), followed by the usb_kill_urb() path simply not handling the
killing of URBs on a control endpoint (which causes the entire stack
to go down).

Thoughts/suggestions/recommendations are welcome.

Thanks in advance,

Devin

Jan 18 14:04:05 devin-MacBookPro kernel: [ 9119.647249] au0828:
au0828_writereg(0x0100, 0x00)
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645091] xhci_hcd
0000:00:14.0: Cancel URB ffff8802543c36c0, dev 1, ep 0x0, starting at
offset 0x25c358940
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645365] djh dequeue
pending=0 ep_index=0
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645632] CPU: 1 PID:
2782 Comm: tvtime Tainted: P           OE  3.18.0-rc4djh+ #33
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645921] Hardware name:
Apple Inc. MacBookPro11,1/Mac-189A3D4F975D5FFC, BIOS
MBP111.88Z.0138.B11.1408291433 08/29/2014
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.646236]
ffff88025b9b0000 ffff88023ea2f9d8 ffffffff817445c1 0000000000000000
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.646570]
ffff8802543c36c0 ffff88023ea2fa58 ffffffffa0080b2e 000000025c358940
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.646909]
ffff88023ea2fa48 000000003ea2fa18 ffff88023e8a22a0 0000000000000000
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.647256] Call Trace:
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.647605]
[<ffffffff817445c1>] dump_stack+0x46/0x58
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.647981]
[<ffffffffa0080b2e>] xhci_urb_dequeue+0x28e/0x420 [xhci_hcd]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.648357]
[<ffffffff8157f1fd>] unlink1+0x2d/0x130
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.648743]
[<ffffffff810cbd80>] ? internal_add_timer+0xb0/0xb0
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.649133]
[<ffffffff814aaa67>] ? get_device+0x17/0x30
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.649526]
[<ffffffff8158129d>] usb_hcd_unlink_urb+0x5d/0xf0
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.649928]
[<ffffffff8158281a>] usb_kill_urb+0x3a/0xa0
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.650334]
[<ffffffff81093890>] ? wake_up_state+0x20/0x20
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.650746]
[<ffffffff81583068>] usb_start_wait_urb+0xc8/0x150
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.651166]
[<ffffffff811b4065>] ? __kmalloc+0x55/0x190
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.651586]
[<ffffffff815831b5>] usb_control_msg+0xc5/0x110
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.652011]
[<ffffffffa05362a9>] au0828_writereg+0x79/0xf0 [au0828]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.652447]
[<ffffffff810ccfaf>] ? try_to_del_timer_sync+0x4f/0x70
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.652894]
[<ffffffffa053b929>] au0828_analog_stream_disable+0x29/0x50 [au0828]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.653354]
[<ffffffffa053ba29>] vidioc_streamoff+0xd9/0x1c0 [au0828]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.653823]
[<ffffffffa0b5a98a>] v4l_streamoff+0x1a/0x20 [videodev]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.654298]
[<ffffffffa0b5d484>] __video_do_ioctl+0x274/0x2f0 [videodev]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.654786]
[<ffffffff811c4dcf>] ? commit_charge+0x1f/0x80
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.655278]
[<ffffffffa0b5fdbe>] video_usercopy+0x21e/0x580 [videodev]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.655784]
[<ffffffffa0b5d210>] ? v4l_dqevent+0x20/0x20 [videodev]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.656295]
[<ffffffff8105623c>] ? __do_page_fault+0x20c/0x550
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.656815]
[<ffffffffa0b60135>] video_ioctl2+0x15/0x20 [videodev]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.657359]
[<ffffffffa0b596bf>] v4l2_ioctl+0x10f/0x150 [videodev]
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.657895]
[<ffffffff8111c2dc>] ? acct_account_cputime+0x1c/0x20
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.658440]
[<ffffffff811e3040>] do_vfs_ioctl+0x2e0/0x4c0
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.658993]
[<ffffffff81097374>] ? vtime_account_user+0x54/0x60
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.659551]
[<ffffffff81022097>] ? syscall_trace_enter_phase2+0xa7/0x1d0
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.660116]
[<ffffffff811e32a1>] SyS_ioctl+0x81/0xa0
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.660690]
[<ffffffff8174cc24>] tracesys_phase2+0xd8/0xdd
Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.664207] xhci_hcd
0000:00:14.0: // Ding dong!
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.671045] xhci_hcd
0000:00:14.0: xHCI host not responding to stop endpoint command.
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.671679] xhci_hcd
0000:00:14.0: Assuming host is dying, halting host.
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.672317] xhci_hcd
0000:00:14.0: // Halt the HC
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.672950] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 1, ep index 0
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.673602] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 1, ep index 2
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.674246] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 1, ep index 3
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.674900] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 2, ep index 0
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.676020] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 2, ep index 2
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.677146] xhci_hcd
0000:00:14.0: get port status, actual port 2 status  = 0x1203
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.678271] xhci_hcd
0000:00:14.0: Get port status returned 0x203
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.679400] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 3, ep index 0
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.680567] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 3, ep index 2
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.681736] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 3, ep index 6
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.682908] xhci_hcd
0000:00:14.0: set port reset, actual port 2 status  = 0x1311
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.684078] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 3, ep index 8
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.684776] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 0
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.685464] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 2
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.686158] Bluetooth:
hci0 urb ffff88025b997b40 failed to resubmit (22)
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.686863] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 3
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.687576] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 4
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.688298] Bluetooth:
hci0 urb ffff88025b997240 failed to resubmit (22)
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.689024] Bluetooth:
hci0 urb ffff88025b9973c0 failed to resubmit (22)
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.689752] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 5
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.690491] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 6
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.691239] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 7
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.691983] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 6, ep index 8
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.692725] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 7, ep index 0
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.693470] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 7, ep index 2
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.694214] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 7, ep index 4
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.694981] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 7, ep index 6
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.695728] xhci_hcd
0000:00:14.0: Killing URBs for slot ID 7, ep index 8
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.696477] xhci_hcd
0000:00:14.0: Calling usb_hc_died()
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.697229] xhci_hcd
0000:00:14.0: HC died; cleaning up
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.697985] xhci_hcd
0000:00:14.0: xHCI host controller is dead.
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.698750] xhci_hcd
0000:00:14.0: Command timeout on stopped ring
Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.699521] xhci_hcd
0000:00:14.0: Turn aborted command ffff88025c227350 to no-op


-- 
Devin J. Heitmueller - Kernel Labs
http://www.kernellabs.com