On Wed, Dec 12, 2012 at 12:47:24PM +0100, Javier Martinez Canillas wrote:
> Hello,
> 
> We have an issue when trying to use USB cameras on a particular machine using
> the latest mainline Linux 3.7 kernel. This is not a regression since the same
> issue is present with older kernels (i.e: 3.5).
> 
> The cameras work fine when plugged to an USB2.0 port (using the EHCI HCD host
> controller driver) but they don't when using the USB3.0 port (using the xHCI
> HCD host controller driver).
> 
> The machine's USB3.0 host controller is a NEC Corporation uPD720200 USB 3.0
> Host Controller (rev 04).
> 
> When enabling trace on the uvcvideo driver I see that most frames are lost:
> 
> Dec 12 11:07:58 thinclient kernel: [ 4965.597637] uvcvideo: USB isochronous frame lost (-18).
> Dec 12 11:07:58 thinclient kernel: [ 4965.597642] uvcvideo: USB isochronous frame lost (-18).
> Dec 12 11:07:58 thinclient kernel: [ 4965.597647] uvcvideo: Marking buffer as bad (error bit set).
> Dec 12 11:07:58 thinclient kernel: [ 4965.597651] uvcvideo: Frame complete (EOF found).
> Dec 12 11:07:58 thinclient kernel: [ 4965.597655] uvcvideo: EOF in empty payload.
> Dec 12 11:07:58 thinclient kernel: [ 4965.597661] uvcvideo: Dropping payload (out of sync).
> Dec 12 11:07:58 thinclient kernel: [ 4965.813294] uvcvideo: frame 486 stats: 0/2/8 packets, 0/0/8 pts
> 
> The uvcvideo checks if urb->iso_frame_desc[i].status < 0 on the
> uvc_video_decode_isoc() function (drivers/media/usb/uvc/uvc_video.c).
> 
> I checked on the xhci driver and the only place where this error code (-EXDEV)
> is assigned to frame->status is inside the skip_isoc_td() function
> (drivers/usb/host/xhci-ring.c).
> 
> At this point I'm not sure if this is a bug on the xhci driver, another quirk
> needed by the XHCI_NEC_HOST, a camera misconfiguration on the USB Video Class
> driver or a firmware/hardware bug.

It's a known performance issue, although it's not clear whether it's on
the xHCI driver side or the host controller side.  When an interface
setting is enabled where the isochronous endpoint requires two
additional transfers per service interval, the NEC host controller
starts reporting many missed service intervals.  The xHCI driver then
finds all the frame buffers that were skipped and marks them with the
-EXDEV status.
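
For reference, the -18 in the uvcvideo trace is -EXDEV. The following is a
minimal user-space sketch in Python (not kernel code) of the kind of check
uvcvideo applies to each isochronous packet status. The function name and the
sample status list are made up for illustration.

```python
import errno

def count_lost_frames(statuses):
    """Count negative per-packet statuses, grouped by errno name.

    `statuses` stands in for the values a driver would read from
    urb->iso_frame_desc[i].status; the list itself is hypothetical.
    """
    lost = {}
    for status in statuses:
        if status < 0:
            # Kernel status codes are negated errno values.
            name = errno.errorcode.get(-status, "unknown")
            lost[name] = lost.get(name, 0) + 1
    return lost

# Two skipped packets out of four, as in the log excerpt (-18 is -EXDEV).
print(count_lost_frames([0, -errno.EXDEV, -errno.EXDEV, 0]))
```

Every packet the xHCI driver skipped after a Missed Service Interval event
would show up in such a count under EXDEV.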

An error status of Missed Service Interval means the host controller
could not access the transfer memory fast enough through the PCI bus to
service the endpoint in time.  It could be a host hardware issue, or it
could be software slowing down the system to a crawl.  I lean towards a
software issue since, as you said, the Windows driver works fine.
(Although who knows what NEC quirks the Windows driver is working
around...)

The NEC xHCI host controller is a 0.96 revision, which doesn't support
the Block Event Interrupt (BEI) flag, which cuts down on the number of
interrupts per URB submitted.  So the xHCI driver's interrupt routine
gets called on every single service interval, rather than being called
once per URB.
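
To put rough numbers on that difference, here is a back-of-the-envelope
sketch. The figures are assumptions, not measurements from this machine: a
125 us service interval (one microframe) and 32 packets per URB. The real
values depend on the endpoint's bInterval and on how the class driver sizes
its URBs.

```python
def interrupts_per_second(service_interval_us, packets_per_urb, per_urb):
    """Estimate the interrupt rate for one isochronous endpoint.

    per_urb=False models one interrupt per service interval (no BEI);
    per_urb=True models one interrupt per completed URB.
    """
    intervals_per_second = 1_000_000 / service_interval_us
    if per_urb:
        return intervals_per_second / packets_per_urb
    return intervals_per_second

# Assumed values: 125 us interval, 32 packets per URB.
print(interrupts_per_second(125, 32, per_urb=False))  # 8000.0
print(interrupts_per_second(125, 32, per_urb=True))   # 250.0
```

Under those assumptions the handler runs 32 times more often than it would
with one interrupt per URB.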

Since the Linux xHCI driver isn't really optimized for performance yet,
the interrupt handler is probably pretty slow and could cause delays in
submitting future URBs.  The high number of interrupts is probably
causing other systems to be starved, possibly leading to the xHCI host
controller not being able to access memory fast enough to service the
endpoint.
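
One way to check whether the interrupt rate really is that high is to sample
/proc/interrupts while the camera is streaming. The following is a rough
sketch, assuming the controller's line in that file contains the string
"xhci_hcd" (check the file on your machine). The helper names are
hypothetical.

```python
import time

def irq_total(interrupts_text, name="xhci_hcd"):
    """Sum the per-CPU counters of every /proc/interrupts line matching name."""
    lines = interrupts_text.splitlines()
    if not lines:
        return 0
    # The header row lists one column per CPU: CPU0 CPU1 ...
    ncpu = sum(1 for tok in lines[0].split() if tok.startswith("CPU"))
    total = 0
    for line in lines[1:]:
        if name not in line:
            continue
        fields = line.split()
        # fields[0] is the IRQ label ("16:"); the next ncpu fields are counts.
        for tok in fields[1:1 + ncpu]:
            if tok.isdigit():
                total += int(tok)
    return total

def irq_rate(path="/proc/interrupts", name="xhci_hcd", seconds=1.0):
    """Return interrupts per second for matching lines over a sample window."""
    with open(path) as f:
        before = irq_total(f.read(), name)
    time.sleep(seconds)
    with open(path) as f:
        after = irq_total(f.read(), name)
    return (after - before) / seconds
```

Calling irq_rate() once with the camera idle and once while it streams gives
a rough idea of how many extra interrupts the isochronous endpoint generates.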

> The cameras are reported to work on the same machine but using another
> operating system (Windows).

Windows probably uses Event Data TRBs to cut the interrupts down to one
per URB.  It would take a major effort to make the xHCI driver use Event
Data TRBs.

> I was wondering if you can give me some pointers on how to be sure what's the
> issue or if this rings any bells to you.

I don't have time to work on performance issues right now, as I have
several other critical bugs (mostly around failed S3/S4).  However, if
you want to try to fix this issue yourself, I suggest you run perf and
see where the bottlenecks in the xHCI interrupt handler are.

I suspect that part of it is that the interrupt handler reads the xHCI
status register.  That PCI register read is pretty costly, and it's not
necessary since 99% of the time the host controller is going to report
an OK status.  And there's no guarantee that the host will set a bad
status when it does have an error.

But without an analysis by perf, we won't really know where the
bottlenecks are.

Sarah Sharp