Re: XHCI Clear halt issue

2014-04-08 Thread Eric Gross
Thanks for your help, Mathias! See my comments inline below:

Mathias Nyman wrote on 04/08/2014 10:26:43 AM:
> The issue we currently have is that the xHCI (both driver and hw) 
> refuses to reset an endpoint if it's not halted.
> SetFeature(ENDPOINT_HALT) will set the device to halted state, but it 
> requires some additional transfer that returns STALL until xHCI will see
> the endpoint as halted.
> 
> So in this case the situation is:
> Abort pending urbs
> SetFeature(ENDPOINT_HALT)
>   - ep halted on device side, xHCI doesn't consider ep halted.
> usb_clear_halt()
>   - ClearFeature(ENDPOINT_HALT) -> device resets its ep toggle/sequence
>   - call hcd->driver->endpoint_reset(), but the xhci .endpoint_reset()
> callback can't reset an endpoint it doesn't consider halted.
> xhci host side toggle/sequence are not reset -> mismatch.

OK. But there shouldn't be any way for user code to get the two out of
sync, right? This is a layer below what the user should be able to
interact with. Maybe that is what you are saying?
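
To check my understanding of the quoted sequence, here is a minimal model of it. This is only a sketch: the names (Side, send_packet, clear_halt, run) are made up for illustration and are not kernel or libusb APIs, and a single toggle bit stands in for the SuperSpeed sequence number as well.

```python
from dataclasses import dataclass


@dataclass
class Side:
    """One side's view of the endpoint (hypothetical model, not driver code)."""
    toggle: int = 0       # data toggle; stands in for the sequence number too
    halted: bool = False


def send_packet(host: Side, device: Side) -> str:
    """Host sends one packet; the device accepts only the toggle it expects."""
    if device.halted:
        host.halted = True    # the host only learns of the halt from a STALL
        return "STALL"
    if host.toggle != device.toggle:
        return "IGNORED"      # mismatch: the device drops the packet
    host.toggle ^= 1
    device.toggle ^= 1
    return "ACK"


def set_feature_halt(device: Side) -> None:
    """SetFeature(ENDPOINT_HALT): halts the device side only."""
    device.halted = True


def clear_halt(host: Side, device: Side) -> bool:
    """Model of the clear-halt path; returns True if the host toggle was reset."""
    device.halted = False
    device.toggle = 0         # ClearFeature(ENDPOINT_HALT) resets the device side
    if not host.halted:
        return False          # "Endpoint x not halted, refusing to reset."
    host.halted = False
    host.toggle = 0
    return True


def run(stall_seen_first: bool) -> str:
    host, device = Side(), Side()
    send_packet(host, device)         # one good packet: both toggles are now 1
    set_feature_halt(device)
    if stall_seen_first:
        send_packet(host, device)     # a transfer returns STALL to the host
    clear_halt(host, device)
    return send_packet(host, device)  # first packet after the clear


if __name__ == "__main__":
    print("no STALL seen before clear:", run(stall_seen_first=False))
    print("STALL seen before clear:   ", run(stall_seen_first=True))
```

In this model, the first packet after the clear is dropped only when no transfer has returned STALL beforehand, which is the case my driver hits.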

> With dynamic debugging enabled for xhci you should probably see:
> "Endpoint x not halted, refusing to reset."

I'll try to get a kernel installed with this enabled. Right now it is a
bit tricky to update kernels on our systems because there is a whole
hierarchy of dependencies that needs to be rebuilt with it. If there are
specific things to test that I can lump together, I can rebuild it all at
once.

> Discussion threads touching this topic:
> http://marc.info/?l=linux-usb&m=134922286125585&w=2
> http://marc.info/?l=linux-usb&m=134852269014614&w=2
> http://marc.info/?l=linux-usb&m=139025060301432&w=2

Thanks for consolidating those messages. Those were the ones I had seen
previously, but I wasn't sure what to conclude from them.

> I'm focusing on this issue right now, and I appreciate if you are able 
> to run some test with your setup once I get something ready.

Great! I can help as needed.

> The main thing that needs to be done is what the xHCI spec states
> in an additional Note added to section 4.6.8:
> "If software wishes to reset the Data Toggle or Sequence Number of an
> endpoint that isn't in the Halted state, then software may issue a
> Configure Endpoint Command with the Drop and Add bits set for the
> target endpoint." But some other tweaking to how the xhci driver handles
> STALL and clears halted endpoints is also needed.

Since the bus trace looks the same on Windows as on Linux (minus the
incorrect sequence number and the failure), I assume this must be how it
is done there?
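
For reference, this is how I read the "Drop and Add bits" in the quoted note. It is a sketch of the bit arithmetic only, assuming the usual Device Context Index rule (endpoint number * 2 + direction, with 1 for IN and 0 for OUT). The function names are hypothetical and this is not the driver's code.

```python
def device_context_index(ep_address: int) -> int:
    """Device Context Index for a non-control endpoint address (e.g. 0x81)."""
    ep_num = ep_address & 0x0F
    direction = 1 if ep_address & 0x80 else 0   # 1 = IN, 0 = OUT
    if ep_num == 0:
        # Assumption: the default control endpoint is not dropped/added here.
        raise ValueError("endpoint 0 is not handled in this sketch")
    return ep_num * 2 + direction


def drop_add_flags(ep_address: int) -> tuple[int, int]:
    """Return (drop_flags, add_flags) with only the target endpoint's bit set."""
    bit = 1 << device_context_index(ep_address)
    return bit, bit


if __name__ == "__main__":
    for addr in (0x81, 0x02):
        drop, add = drop_add_flags(addr)
        print(f"ep 0x{addr:02x}: drop=0x{drop:08x} add=0x{add:08x}")
```

Setting the same bit in both masks asks the controller to drop and re-add that one endpoint in a single Configure Endpoint command, which is what the note describes for resetting the toggle or sequence number.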

Eric Gross


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


XHCI Clear halt issue

2014-04-07 Thread Eric Gross
Hi all,

I am implementing a driver (currently libusb-based, but may change to 
kernel-based eventually) for a USB standard class type that makes use of 
endpoint stalling as a synchronization mechanism to recover after error 
conditions between device and host (the reasons for needing it are a bit 
complex). The driver code I have been using works beautifully on Windows, 
some embedded OSes with proprietary USB stacks, and Linux via the EHCI 
driver. However, I ran into problems as soon as we started using this 
driver on XHCI systems (based off the 3.10 kernel).

The sequence the driver typically does when encountering an error (or 
thinking it needs to resync) is:
- Abort any pending URBs (may be several queued to the EP)
- Set Feature(HALT)
- Clear EP Stall
- Continue
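
To make the two control requests in that sequence concrete, here is a sketch of the 8-byte setup packets involved, using the standard USB chapter 9 request codes. The helper name halt_request is made up for illustration. In practice the clear step goes through libusb_clear_halt() (or usb_clear_halt() in the kernel) rather than a hand-built packet.

```python
import struct

# Standard request codes and feature selector from USB chapter 9.
CLEAR_FEATURE = 0x01
SET_FEATURE = 0x03
ENDPOINT_HALT = 0x00

# bmRequestType: host-to-device, standard request, recipient = endpoint.
REQTYPE_STANDARD_ENDPOINT_OUT = 0x02


def halt_request(b_request: int, ep_address: int) -> bytes:
    """Build the setup packet for Set/Clear Feature(ENDPOINT_HALT).

    Fields are little-endian: bmRequestType, bRequest, wValue, wIndex, wLength.
    wIndex carries the endpoint address, including the direction bit (0x80).
    """
    return struct.pack(
        "<BBHHH",
        REQTYPE_STANDARD_ENDPOINT_OUT,
        b_request,
        ENDPOINT_HALT,
        ep_address,
        0,                      # no data stage
    )


if __name__ == "__main__":
    ep = 0x02                   # hypothetical bulk OUT endpoint address
    print("Set Feature(HALT):  ", halt_request(SET_FEATURE, ep).hex(" "))
    print("Clear Feature(HALT):", halt_request(CLEAR_FEATURE, ep).hex(" "))
```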

What we saw with a bus analyzer was that, independent of host controller 
used (tested Intel and Renesas), the sequence number of the next outgoing 
packet (or toggle bit when in High Speed mode) was incorrect after 
clearing the stall. The device resets its expected sequence/toggle after 
un-stalling the EP and hence it ignores the next packet with the incorrect 
one. Interestingly, some devices are actually tolerant of this behavior 
and accept the incorrect sequence id, but any devices based on the Cypress 
FX3 (a large number of devices implementing this class type) fail.

When researching this issue I saw a number of previous posts hinting at 
known issues like this, but I have not seen a firm conclusion. It seems 
that some of the early responses by Sarah Sharp indicate that it is 
working this way by design (I admit I am not an expert in the XHCI spec). 
I see some newer posts referencing a "clear halt bug", but I have been
unable to find what this definitively refers to. Based on my
experience with how every other stack appears to work (including the Linux
EHCI driver) and how the device is supposed to behave when it gets the
clear stall request, I can't help but think that the current behavior
is wrong.

I can provide any additional information (bus traces, testing results,
etc.) as needed. If this is a known issue, I would also appreciate a pointer
to the bugzilla entry (I have been unsuccessful finding one) or to any
previous discussion threads I may have missed.

Thanks,
Eric Gross
National Instruments
