Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Sarah Sharp

On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
I believe I am seeing a polling livelock scenario as described by Julius.

Julius was talking about what happens when the host controller itself
gets reset (and therefore remembers nothing about any device) whereas
the device still thinks it is in U3. Is that the scenario you're
encountering? I thought you were working on normal runtime PM.

When you turn the power back on for a port, it should start out in
RxDetect and switch to Polling as it detects Rx terminations. If the
downstream device is unhappy for any reason (e.g. in SS.Inactive or
still in U3) and sends no or wrong responses to the LFPS Polling, the
hub's port will either move to ComplianceMode or keep cycling back and
forth between RxDetect and Polling.

The latter is especially dangerous
because it's hard to detect (if you just sample the port status you
might see RxDetect, which would also be expected if there is nothing
connected at all), so I'm thinking an unconditional warm reset might
be unavoidable.
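
To illustrate why a single sample of the port status is not enough, here is a rough sketch in plain C. It is not kernel code: the enum values and function names are made up for this example. The idea is that one RxDetect sample looks the same for an empty port and a flapping one, but a short history of samples shows the repeated fallback from Polling to RxDetect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link states, named after the states discussed above. */
enum link_state {
    LS_RXDETECT,
    LS_POLLING,
    LS_U0,
    LS_COMPLIANCE,
};

/* Count how often consecutive samples fall back from Polling to RxDetect. */
size_t count_polling_fallbacks(const enum link_state *s, size_t n)
{
    size_t fallbacks = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (s[i - 1] == LS_POLLING && s[i] == LS_RXDETECT)
            fallbacks++;
    return fallbacks;
}

/*
 * Treat the port as livelocked if it never reached U0 during the
 * sampling window and fell back at least 'threshold' times.
 */
bool looks_livelocked(const enum link_state *s, size_t n, size_t threshold)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (s[i] == LS_U0)
            return false;
    return count_polling_fallbacks(s, n) >= threshold;
}
```

An empty port yields only RxDetect samples and zero fallbacks, so it is not flagged. The catch is that the sampling has to be frequent enough to actually observe the Polling phases, which is part of why an unconditional warm reset looks attractive.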

That is why we proposed to go that route for the
Synopsys controller, and I think the same will apply to this situation
(since I think turning off a PortPower bit in XHCI will make the
controller forget a previous U3 state and return to RxDetect when
you turn it back on again, even though the actual VBUS line to the
device may not have been disabled after all).

Julius, are you sure the Synopsys host will actually power off the
ports? The Intel hosts need some special ACPI methods, so I'm not sure
if Dan's issue with ports after power on could even be seen on the
Synopsys host.

The Synopsys issue (as I remember it, please correct me if I'm wrong)
would only be triggered if the host lost power during S3, and was halted
and reset after a register restore failure. I think the solution we
agreed to was to implement a Synopsys host quirk that would warm reset
all ports unconditionally on register restore failure. Was that right?
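
As a rough sketch of what such a quirk could look like (standalone C, not the xhci driver; the flag name, the struct, and the function are all hypothetical and only mirror the description above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical quirk bit; not an existing xhci quirk flag. */
#define QUIRK_WARM_RESET_ON_RESTORE_FAIL (1u << 0)

/* Hypothetical per-port bookkeeping. */
struct port {
    bool needs_warm_reset;
};

/*
 * After resume: if the register restore failed and the host has the
 * quirk set, mark every port for a warm reset, unconditionally.
 * Returns the number of ports marked.
 */
size_t mark_ports_after_restore(struct port *ports, size_t n,
                                unsigned int quirks, bool restore_failed)
{
    assert(ports != NULL || n == 0);
    if (!restore_failed || !(quirks & QUIRK_WARM_RESET_ON_RESTORE_FAIL))
        return 0;
    for (size_t i = 0; i < n; i++)
        ports[i].needs_warm_reset = true;
    return n;
}
```

Hosts without the quirk, or resumes where the restore succeeded, leave the ports untouched, so the extra warm reset latency is only paid in the failure case.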

Then there's Dan's issue. Dan, does the port go into SS.Inactive before
the host starts to cycle between RxDetect and Polling and U0 for this
case?

Hans also ran into this issue (or at least a variation of it), and
proposed a patch to fix it.

https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streams&id=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747

Can you take a look at it, and see if it would address your issue? I
think it will catch the case where we transition from SS.Inactive ->
RxDetect -> Polling.

One other thought (I don't know if it is the right thing to do) is that
we might _always_ perform a warm reset after powering-on a SuperSpeed
port, without bothering to call hub_port_debounce_be_connected.

I'm leaning in that direction. However, the decision comes down to
the relative occurrence frequency of devices that fall into this trap
vs those that successfully recover and would suffer the additional
latency of a warm reset.

Is a warm reset significantly longer than an ordinary reset? We have
to do some kind of reset in any case. After all, the power session
_has_ been interrupted. (Assuming the power switching worked...)

USB 3.0 ports don't need to be reset on connect as a matter of course.
They should usually just start training themselves and eventually
become ready as soon as the wires touch. An extra warm reset would add
80-120ms delay to the port resume. (In comparison, a hot reset should
not take more than 12ms, probably even much less.)

I would rather avoid warm reset unconditionally on connect whenever
possible, because 80-120ms is too long of a delay for some
embedded/tablet systems that come into and out of S3 very often.

With this in place we may want to consider reducing the timeout and
relying on warm reset for recovery.

Why? I'm not familiar with the intricacies of USB-3 link state
changes, but there seem to be only two possibilities:

Either PORT_LS_POLLING is a valid state to be in while
trying to establish a SuperSpeed connection, in which case
we don't want to reduce the timeout,

Or it isn't a valid state, in which case we should abort
the debounce immediately.

It is a valid transitional state, unfortunately, but in a working case
it should resolve itself within a few milliseconds (probably less than
10). Maybe we should try to differentiate between USB 2.0 and 3.0
devices in hub_port_debounce()? I think due to the built-in link
training in USB 3.0, the classic debouncing doesn't really make sense
anymore (and wastes a lot of time since SuperSpeed links can train
really fast when they work).
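
One way to picture a SuperSpeed-specific debounce is sketched below in standalone C. It is only an illustration under stated assumptions: the names are invented, this is not how hub_port_debounce() is written, and the 10 ms budget is just the "probably less than 10" estimate from above turned into a constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one debounce check on a SuperSpeed port. */
enum debounce_verdict {
    DEBOUNCE_CONNECTED,     /* link trained, port is usable */
    DEBOUNCE_KEEP_WAITING,  /* still within budget, or nothing attached */
    DEBOUNCE_WARM_RESET,    /* stuck in Polling too long, go recover */
};

/* Assumed budget for a working link to leave Polling, in ms. */
#define SS_POLLING_BUDGET_MS 10u

/*
 * Polling is a valid transitional state, so it is tolerated for a short
 * budget. Past the budget we stop waiting and ask for a warm reset
 * instead of running out the full classic debounce timeout.
 */
enum debounce_verdict ss_debounce_step(bool link_in_u0, bool link_in_polling,
                                       unsigned int elapsed_ms,
                                       unsigned int budget_ms)
{
    assert(!(link_in_u0 && link_in_polling)); /* states are exclusive */
    if (link_in_u0)
        return DEBOUNCE_CONNECTED;
    if (link_in_polling && elapsed_ms > budget_ms)
        return DEBOUNCE_WARM_RESET;
    return DEBOUNCE_KEEP_WAITING;
}
```

USB 2.0 ports would keep the classic debounce. Only the SuperSpeed path gets the short budget, which fits the point that working SuperSpeed links train quickly.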

Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Felipe Balbi

Hi,

+ Paul as he might have better details about the Synopsys core host-side
implementation

On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote:
On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
I believe I am seeing a polling livelock scenario as described by
Julius.