Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Sarah Sharp
On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
  I believe I am seeing a polling livelock scenario as described by Julius.
 
  Julius was talking about what happens when the host controller itself
  gets reset (and therefore remembers nothing about any device) whereas
  the device still thinks it is in U3.  Is that the scenario you're
  encountering?  I thought you were working on normal runtime PM.
 
 When you turn the power back on for a port, it should start out in
 RxDetect and switch to Polling as it detects Rx terminations. If the
 downstream device is unhappy for any reason (e.g. in SS.Inactive or
 still in U3) and sends no or wrong responses to the LFPS Polling, the
 hub's port will either move to ComplianceMode or keep cycling back and
 forth between RxDetect and Polling.

 The latter is especially dangerous
 because it's hard to detect (if you just sample the port status you
 might see RxDetect, which would also be expected if there is nothing
 connected at all), so I'm thinking an unconditional warm reset might
 be unavoidable.

 That is why we proposed to go that route for the
 Synopsys controller, and I think the same will apply to this situation
 (since I think turning off a PortPower bit in XHCI will make the
 controller forget a previous U3 state and return to RxDetect when
 you turn it back on again, even though the actual VBUS line to the
 device may not have been disabled after all).

Julius, are you sure the Synopsys host will actually power off the
ports?  The Intel hosts need some special ACPI methods, so I'm not sure
if Dan's issue with ports after power on could even be seen on the
Synopsys host.

The Synopsys issue (as I remember it, please correct me if I'm wrong)
would only be triggered if the host lost power during S3, and was halted
and reset after a register restore failure.  I think the solution we
agreed to was to implement a Synopsys host quirk that would warm reset
all ports unconditionally on register restore failure.  Was that right?

Then there's Dan's issue.  Dan, does the port go into SS.Inactive before
the host starts to cycle between RxDetect and Polling and U0 for this
case?

Hans also ran into this issue (or at least a variation of it), and
proposed a patch to fix it.

https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747

Can you take a look at it, and see if it would address your issue?  I
think it will catch the case where we transition from SS.Inactive -
RxDetect - Polling.

   One other thought (I don't know if it is the right thing to do) is that
   we might _always_ perform a warm reset after powering-on a SuperSpeed
   port, without bothering to call hub_port_debounce_be_connected.
 
  I'm leaning in that direction.  However, the decision comes down to
  the relative occurrence frequency of devices that fall into this trap
  vs those that successfully recover and would suffer the additional
  latency of a warm reset.
 
  Is a warm reset significantly longer than an ordinary reset?  We have
  to do some kind of reset in any case.  After all, the power session
  _has_ been interrupted.  (Assuming the power switching worked...)
 
 USB 3.0 ports don't need to be reset on connect as a matter of course.
 The should usually just start training themselves and eventually
 become ready as soon as the wires touch. An extra warm reset would add
 80-120ms delay to the port resume. (In comparison, a hot reset should
 not take more than 12ms, probably even much less.)

I would rather avoid warm reset unconditionally on connect whenever
possible, because 80-120ms is too long of a delay for some
embedded/tablet systems that come into and out of S3 very often.

   With this in place we may want to consider reducing the timeout and
   relying on warm reset for recovery.
  
   Why?  I'm not familiar with the intricacies of USB-3 link state
   changes, but there seem to be only two possibilities:
  
   Either PORT_LS_POLLING is a valid state to be in while
   trying to establish a SuperSpeed connection, in which case
   we don't want to reduce the timeout,
  
   Or it isn't a valid state, in which case we should abort
   the debounce immediately.
 
 It is a valid transitional state, unfortunately, but in a working case
 it should resolve itself within a few milliseconds (probably less than
 10). Maybe we should try to differentiate between USB 2.0 and 3.0
 devices in hub_port_debounce()? I think due to the built-in link
 training in USB 3.0, the classic debouncing doesn't really make sense
 anymore (and wastes a lot of time since SuperSpeed links can train
 really fast when they work).
 
 As for this patch, I think the best approach would be to wait for the
 device to come back in usb_port_runtime_resume() (through
 hub_port_debounce() or something else), and if it doesn't show up
 always set the bit to warm reset the port (regardless of 

Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Felipe Balbi
Hi,

+ Paul as he might have better details about the Synopsys core host-side
implementation

On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote:
 On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
   I believe I am seeing a polling livelock scenario as described by 
   Julius.
  
   Julius was talking about what happens when the host controller itself
   gets reset (and therefore remembers nothing about any device) whereas
   the device still thinks it is in U3.  Is that the scenario you're
   encountering?  I thought you were working on normal runtime PM.
  
  When you turn the power back on for a port, it should start out in
  RxDetect and switch to Polling as it detects Rx terminations. If the
  downstream device is unhappy for any reason (e.g. in SS.Inactive or
  still in U3) and sends no or wrong responses to the LFPS Polling, the
  hub's port will either move to ComplianceMode or keep cycling back and
  forth between RxDetect and Polling.
 
  The latter is especially dangerous
  because it's hard to detect (if you just sample the port status you
  might see RxDetect, which would also be expected if there is nothing
  connected at all), so I'm thinking an unconditional warm reset might
  be unavoidable.
 
  That is why we proposed to go that route for the
  Synopsys controller, and I think the same will apply to this situation
  (since I think turning off a PortPower bit in XHCI will make the
  controller forget a previous U3 state and return to RxDetect when
  you turn it back on again, even though the actual VBUS line to the
  device may not have been disabled after all).
 
 Julius, are you sure the Synopsys host will actually power off the
 ports?  The Intel hosts need some special ACPI methods, so I'm not sure
 if Dan's issue with ports after power on could even be seen on the
 Synopsys host.
 
 The Synopsys issue (as I remember it, please correct me if I'm wrong)
 would only be triggered if the host lost power during S3, and was halted
 and reset after a register restore failure.  I think the solution we
 agreed to was to implement a Synopsys host quirk that would warm reset
 all ports unconditionally on register restore failure.  Was that right?
 
 Then there's Dan's issue.  Dan, does the port go into SS.Inactive before
 the host starts to cycle between RxDetect and Polling and U0 for this
 case?
 
 Hans also ran into this issue (or at least a variation of it), and
 proposed a patch to fix it.
 
 https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747
 
 Can you take a look at it, and see if it would address your issue?  I
 think it will catch the case where we transition from SS.Inactive -
 RxDetect - Polling.
 
One other thought (I don't know if it is the right thing to do) is that
we might _always_ perform a warm reset after powering-on a SuperSpeed
port, without bothering to call hub_port_debounce_be_connected.
  
   I'm leaning in that direction.  However, the decision comes down to
   the relative occurrence frequency of devices that fall into this trap
   vs those that successfully recover and would suffer the additional
   latency of a warm reset.
  
   Is a warm reset significantly longer than an ordinary reset?  We have
   to do some kind of reset in any case.  After all, the power session
   _has_ been interrupted.  (Assuming the power switching worked...)
  
  USB 3.0 ports don't need to be reset on connect as a matter of course.
  The should usually just start training themselves and eventually
  become ready as soon as the wires touch. An extra warm reset would add
  80-120ms delay to the port resume. (In comparison, a hot reset should
  not take more than 12ms, probably even much less.)
 
 I would rather avoid warm reset unconditionally on connect whenever
 possible, because 80-120ms is too long of a delay for some
 embedded/tablet systems that come into and out of S3 very often.
 
With this in place we may want to consider reducing the timeout and
relying on warm reset for recovery.
   
Why?  I'm not familiar with the intricacies of USB-3 link state
changes, but there seem to be only two possibilities:
   
Either PORT_LS_POLLING is a valid state to be in while
trying to establish a SuperSpeed connection, in which case
we don't want to reduce the timeout,
   
Or it isn't a valid state, in which case we should abort
the debounce immediately.
  
  It is a valid transitional state, unfortunately, but in a working case
  it should resolve itself within a few milliseconds (probably less than
  10). Maybe we should try to differentiate between USB 2.0 and 3.0
  devices in hub_port_debounce()? I think due to the built-in link
  training in USB 3.0, the classic debouncing doesn't really make sense
  anymore (and wastes a lot of time since SuperSpeed links can train
  really fast when they work).
  
  As 

RE: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Paul Zimmerman
 From: Felipe Balbi [mailto:ba...@ti.com]
 Sent: Tuesday, February 18, 2014 1:05 PM
 
 Hi,
 
 + Paul as he might have better details about the Synopsys core host-side
 implementation
 
 On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote:
  On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
I believe I am seeing a polling livelock scenario as described by 
Julius.
   
Julius was talking about what happens when the host controller itself
gets reset (and therefore remembers nothing about any device) whereas
the device still thinks it is in U3.  Is that the scenario you're
encountering?  I thought you were working on normal runtime PM.
  
   When you turn the power back on for a port, it should start out in
   RxDetect and switch to Polling as it detects Rx terminations. If the
   downstream device is unhappy for any reason (e.g. in SS.Inactive or
   still in U3) and sends no or wrong responses to the LFPS Polling, the
   hub's port will either move to ComplianceMode or keep cycling back and
   forth between RxDetect and Polling.
 
   The latter is especially dangerous
   because it's hard to detect (if you just sample the port status you
   might see RxDetect, which would also be expected if there is nothing
   connected at all), so I'm thinking an unconditional warm reset might
   be unavoidable.
 
   That is why we proposed to go that route for the
   Synopsys controller, and I think the same will apply to this situation
   (since I think turning off a PortPower bit in XHCI will make the
   controller forget a previous U3 state and return to RxDetect when
   you turn it back on again, even though the actual VBUS line to the
   device may not have been disabled after all).
 
  Julius, are you sure the Synopsys host will actually power off the
  ports?  The Intel hosts need some special ACPI methods, so I'm not sure
  if Dan's issue with ports after power on could even be seen on the
  Synopsys host.
 
  The Synopsys issue (as I remember it, please correct me if I'm wrong)
  would only be triggered if the host lost power during S3, and was halted
  and reset after a register restore failure.  I think the solution we
  agreed to was to implement a Synopsys host quirk that would warm reset
  all ports unconditionally on register restore failure.  Was that right?
 
  Then there's Dan's issue.  Dan, does the port go into SS.Inactive before
  the host starts to cycle between RxDetect and Polling and U0 for this
  case?
 
  Hans also ran into this issue (or at least a variation of it), and
  proposed a patch to fix it.
 
  https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747
 
  Can you take a look at it, and see if it would address your issue?  I
  think it will catch the case where we transition from SS.Inactive -
  RxDetect - Polling.
 
 One other thought (I don't know if it is the right thing to do) is 
 that
 we might _always_ perform a warm reset after powering-on a SuperSpeed
 port, without bothering to call hub_port_debounce_be_connected.
   
I'm leaning in that direction.  However, the decision comes down to
the relative occurrence frequency of devices that fall into this trap
vs those that successfully recover and would suffer the additional
latency of a warm reset.
   
Is a warm reset significantly longer than an ordinary reset?  We have
to do some kind of reset in any case.  After all, the power session
_has_ been interrupted.  (Assuming the power switching worked...)
  
   USB 3.0 ports don't need to be reset on connect as a matter of course.
   The should usually just start training themselves and eventually
   become ready as soon as the wires touch. An extra warm reset would add
   80-120ms delay to the port resume. (In comparison, a hot reset should
   not take more than 12ms, probably even much less.)
 
  I would rather avoid warm reset unconditionally on connect whenever
  possible, because 80-120ms is too long of a delay for some
  embedded/tablet systems that come into and out of S3 very often.
 
 With this in place we may want to consider reducing the timeout and
 relying on warm reset for recovery.

 Why?  I'm not familiar with the intricacies of USB-3 link state
 changes, but there seem to be only two possibilities:

 Either PORT_LS_POLLING is a valid state to be in while
 trying to establish a SuperSpeed connection, in which case
 we don't want to reduce the timeout,

 Or it isn't a valid state, in which case we should abort
 the debounce immediately.
  
   It is a valid transitional state, unfortunately, but in a working case
   it should resolve itself within a few milliseconds (probably less than
   10). Maybe we should try to differentiate between USB 2.0 and 3.0
   devices in hub_port_debounce()? I think due to the built-in link
   

Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Julius Werner
 Julius, are you sure the Synopsys host will actually power off the
 ports?  The Intel hosts need some special ACPI methods, so I'm not sure
 if Dan's issue with ports after power on could even be seen on the
 Synopsys host.

 The Synopsys issue (as I remember it, please correct me if I'm wrong)
 would only be triggered if the host lost power during S3, and was halted
 and reset after a register restore failure.  I think the solution we
 agreed to was to implement a Synopsys host quirk that would warm reset
 all ports unconditionally on register restore failure.  Was that right?

 Then there's Dan's issue.  Dan, does the port go into SS.Inactive before
 the host starts to cycle between RxDetect and Polling and U0 for this
 case?

Sorry, I didn't mean to pull the Synopsys issue into this thread... we
should probably keep that separate in
https://lkml.org/lkml/2013/12/9/336 , or this will get too confusing.
Some aspects of that issue are definitely different, e.g. there's no
port power being turned off there (on the contrary, the problem is
that the power session stays alive during suspend but the xHC gets
reset and forgets about it). I just wanted to point out that both
issues can lead to the same state (by different means) where the port
status cycles between RxDetect and Polling, because they both reset
the host controller's port state to RxDetect in a way that the device
might not notice (or not react correctly to). I think whenever the
host port state gets forced back to RxDetect while the device is in
SS.Inactive (or anything that can lead to SS.Inactive after a few
unexpected packets) you will get into this state, and you will only
get back out of there through a warm reset (or by physically cutting
off VBUS).

 Hans also ran into this issue (or at least a variation of it), and
 proposed a patch to fix it.

 https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747

 Can you take a look at it, and see if it would address your issue?  I
 think it will catch the case where we transition from SS.Inactive -
 RxDetect - Polling.

I don't think that's targeting the same problem. Hans seems to be
describing a situation where the device connects again even though he
doesn't want it to -- in our case, the device doesn't connect even
though we want that. His patch shouldn't do anything for our issue
since he checks for PORT_STAT_CONNECTION, and that bit will be unset
when the host port is stuck in RxDetect/Polling.

  It is a valid transitional state, unfortunately, but in a working case
  it should resolve itself within a few milliseconds (probably less than
  10). Maybe we should try to differentiate between USB 2.0 and 3.0
  devices in hub_port_debounce()? I think due to the built-in link
  training in USB 3.0, the classic debouncing doesn't really make sense
  anymore (and wastes a lot of time since SuperSpeed links can train
  really fast when they work).
 
  As for this patch, I think the best approach would be to wait for the
  device to come back in usb_port_runtime_resume() (through
  hub_port_debounce() or something else), and if it doesn't show up
  always set the bit to warm reset the port (regardless of LTSSM state,
  since even if it says RxDetect I wouldn't be sure that there is really
  nothing connected). We could then also use those bits in the lost
  power case of xhci_resume() to try and work around the problems with
  that Synopsys controller.

 That's a lot of changes to the hub core.  Would an xHCI quirk be
 simpler?  Is there some scenario I'm missing that the S3 resume quirk
 wouldn't handle?

We don't need to change hub_port_debounce() right away... that was
just an additional suggestion to make things more efficient for
SuperSpeed devices in general. I think for now (in order to solve
Dan's problem), it would be best if he just calls hub_port_debounce()
in usb_port_runtime_resume() (which should bring a connected device
back in the majority of cases), and always queues up a warm reset if
that fails. (This is essentially what his patch is already doing, just
get rid of the extra check for USB_SS_PORT_LS_POLLING in the error
path which I think might be a little too specific and overlook cases
where the RxDetect/Polling cycle just happens to be at RxDetect in
that moment.)
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Dan Williams
On Tue, Feb 18, 2014 at 2:40 PM, Julius Werner jwer...@chromium.org wrote:
 Can you take a look at it, and see if it would address your issue?  I
 think it will catch the case where we transition from SS.Inactive -
 RxDetect - Polling.

 I don't think that's targeting the same problem. Hans seems to be
 describing a situation where the device connects again even though he
 doesn't want it to -- in our case, the device doesn't connect even
 though we want that. His patch shouldn't do anything for our issue
 since he checks for PORT_STAT_CONNECTION, and that bit will be unset
 when the host port is stuck in RxDetect/Polling.

Yes, agreed.


  It is a valid transitional state, unfortunately, but in a working case
  it should resolve itself within a few milliseconds (probably less than
  10). Maybe we should try to differentiate between USB 2.0 and 3.0
  devices in hub_port_debounce()? I think due to the built-in link
  training in USB 3.0, the classic debouncing doesn't really make sense
  anymore (and wastes a lot of time since SuperSpeed links can train
  really fast when they work).
 
  As for this patch, I think the best approach would be to wait for the
  device to come back in usb_port_runtime_resume() (through
  hub_port_debounce() or something else), and if it doesn't show up
  always set the bit to warm reset the port (regardless of LTSSM state,
  since even if it says RxDetect I wouldn't be sure that there is really
  nothing connected). We could then also use those bits in the lost
  power case of xhci_resume() to try and work around the problems with
  that Synopsys controller.

 That's a lot of changes to the hub core.  Would an xHCI quirk be
 simpler?  Is there some scenario I'm missing that the S3 resume quirk
 wouldn't handle?

 We don't need to change hub_port_debounce() right away... that was
 just an additional suggestion to make things more efficient for
 SuperSpeed devices in general. I think for now (in order to solve
 Dan's problem), it would be best if he just calls hub_port_debounce()
 in usb_port_runtime_resume() (which should bring a connected device
 back in the majority of cases), and always queues up a warm reset if
 that fails. (This is essentially what his patch is already doing, just
 get rid of the extra check for USB_SS_PORT_LS_POLLING in the error
 path which I think might be a little too specific and overlook cases
 where the RxDetect/Polling cycle just happens to be at RxDetect in
 that moment.)

That's the plan, and I also want to test a usb device quirk flag to
unconditionally warm reset on power-session loss since it does seem to
be device specifc in my case.
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Julius Werner
 We don't need to change hub_port_debounce() right away... that was
 just an additional suggestion to make things more efficient for
 SuperSpeed devices in general. I think for now (in order to solve
 Dan's problem), it would be best if he just calls hub_port_debounce()
 in usb_port_runtime_resume() (which should bring a connected device
 back in the majority of cases), and always queues up a warm reset if
 that fails. (This is essentially what his patch is already doing, just
 get rid of the extra check for USB_SS_PORT_LS_POLLING in the error
 path which I think might be a little too specific and overlook cases
 where the RxDetect/Polling cycle just happens to be at RxDetect in
 that moment.)

 That's the plan, and I also want to test a usb device quirk flag to
 unconditionally warm reset on power-session loss since it does seem to
 be device specifc in my case.

For what it's worth, I think you might not need a quirk flag since you
are not really loosing anything in that case. If the first
hub_port_debounce() fails to reconnect, the device is either quirky or
there is no device attached at all. In the first case we want the warm
reset, and in the second case the warm reset is pointless but also
doesn't cost us anything (assuming it runs totally asynchronous, which
I think your patch guarantees). The only case where we really need to
avoid wasting those 100ms is on ports which are connected and already
usable, but those should be caught by hub_port_debounce() already.
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-18 Thread Dan Williams
On Tue, Feb 18, 2014 at 3:39 PM, Julius Werner jwer...@chromium.org wrote:
 We don't need to change hub_port_debounce() right away... that was
 just an additional suggestion to make things more efficient for
 SuperSpeed devices in general. I think for now (in order to solve
 Dan's problem), it would be best if he just calls hub_port_debounce()
 in usb_port_runtime_resume() (which should bring a connected device
 back in the majority of cases), and always queues up a warm reset if
 that fails. (This is essentially what his patch is already doing, just
 get rid of the extra check for USB_SS_PORT_LS_POLLING in the error
 path which I think might be a little too specific and overlook cases
 where the RxDetect/Polling cycle just happens to be at RxDetect in
 that moment.)

 That's the plan, and I also want to test a usb device quirk flag to
 unconditionally warm reset on power-session loss since it does seem to
 be device specifc in my case.

 For what it's worth, I think you might not need a quirk flag since you
 are not really loosing anything in that case. If the first
 hub_port_debounce() fails to reconnect, the device is either quirky or
 there is no device attached at all. In the first case we want the warm
 reset, and in the second case the warm reset is pointless but also
 doesn't cost us anything (assuming it runs totally asynchronous, which
 I think your patch guarantees).

No, it's not asynchronous.

 The only case where we really need to
 avoid wasting those 100ms is on ports which are connected and already
 usable, but those should be caught by hub_port_debounce() already.

The quirk would stop the the implementation from wasting its time on
the 2 second reconnect timeout on devices known to have problems
recovering from a logical power session loss and simply issue the
warm-reset immediately.  That said, I should also look to see how long
devices take to reconnect on average and if that is more than 120ms
maybe it would make sense to unconditionally warm reset.
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-11 Thread Alan Stern
On Mon, 10 Feb 2014, Dan Williams wrote:

 I believe I am seeing a polling livelock scenario as described by Julius.

Julius was talking about what happens when the host controller itself
gets reset (and therefore remembers nothing about any device) whereas
the device still thinks it is in U3.  Is that the scenario you're 
encountering?  I thought you were working on normal runtime PM.

  With this in place we may want to consider reducing the timeout and
  relying on warm reset for recovery.
 
  Why?  I'm not familiar with the intricacies of USB-3 link state
  changes, but there seem to be only two possibilities:
 
  Either PORT_LS_POLLING is a valid state to be in while
  trying to establish a SuperSpeed connection, in which case
  we don't want to reduce the timeout,
 
  Or it isn't a valid state, in which case we should abort
  the debounce immediately.
 
  One other thought (I don't know if it is the right thing to do) is that
  we might _always_ perform a warm reset after powering-on a SuperSpeed
  port, without bothering to call hub_port_debounce_be_connected.
 
 I'm leaning in that direction.  However, the decision comes down to
 the relative occurrence frequency of devices that fall into this trap
 vs those that successfully recover and would suffer the additional
 latency of a warm reset.

Is a warm reset significantly longer than an ordinary reset?  We have
to do some kind of reset in any case.  After all, the power session
_has_ been interrupted.  (Assuming the power switching worked...)

  There's just no way to know if the device on
 the other side is legitimately causing a polling condition or whether
 this is a result of the aforementioned live lock.  So far I only have
 one USB3 device that requires this, our favorite ax88179_178a net
 adapter.
 
 The spec says that the only way to reliably sync the state machines is
 to remove power from the device, but we have no real way from the
 kernel to force and know a port is physically powered off.  I'll look
 and see how imposing latency-wise it would be to always warm reset,
 but we may want to just quirk temperamental devices and hosts as we
 find them and use the timeout as a backstop.
 
 
   Other xHCs that fail to propagate
  warm resets on hub resume may want to trigger this behavior via a quirk.
 
  What do you mean by other xHCs?  Other than what?
 
 
 Other xHCs referring again to that warm reset thread and the
 hypothesis that the Synopsys xHC is not propagating warm resets on
 host resume.

It looks like there were two separate considerations: Whether the host
controller issues a reset signal over the bus when it gets reset
itself, and whether the host issues a reset signal to a port if it
doesn't believe any device is attached.

  I don't want to go over this patch in detail, because it's pretty
  confusing and the code is messy.  Still, it seems odd to add all those
  port status manipulations in usb_port_runtime_resume, when
  hub_port_debounce_be_connected is already doing them.
 
  And why do we need another special flag to indicate that a warm reset
  is needed?  Can't check_port_resume_type figure that out from the port
  status?  That routine was meant for exactly this sort of thing.
 
 
 check_port_resume_type() does not have the context to make the
 determination.  LS_POLLING is a valid state, we only know that a warm
 reset is required when it has been in this state for too long.

If the port is in that state when check_port_resume_type runs then it
certainly has been too long.  According to section 7.5 of the spec,
ports don't go into LS_POLLING when the power session wasn't
interrupted.

 Unfortunately, the timeout needs to consider that the device is coming
 from physically powered off condition (rather than just logical) so it
 at least needs to be 2 seconds for a connection (per commit ad493e5
 usb: add usb port auto power off mechanism).

Okay, that's reasonable.  Unfortunate, but reasonable.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-11 Thread Julius Werner
 I believe I am seeing a polling livelock scenario as described by Julius.

 Julius was talking about what happens when the host controller itself
 gets reset (and therefore remembers nothing about any device) whereas
 the device still thinks it is in U3.  Is that the scenario you're
 encountering?  I thought you were working on normal runtime PM.

When you turn the power back on for a port, it should start out in
RxDetect and switch to Polling as it detects Rx terminations. If the
downstream device is unhappy for any reason (e.g. in SS.Inactive or
still in U3) and sends no or wrong responses to the LFPS Polling, the
hub's port will either move to ComplianceMode or keep cycling back and
forth between RxDetect and Polling. The latter is especially dangerous
because it's hard to detect (if you just sample the port status you
might see RxDetect, which would also be expected if there is nothing
connected at all), so I'm thinking an unconditional warm reset might
be unavoidable. That is why we proposed to go that route for the
Synopsys controller, and I think the same will apply to this situation
(since I think turning off a PortPower bit in XHCI will make the
controller forget a previous U3 state and return to RxDetect when
you turn it back on again, even though the actual VBUS line to the
device may not have been disabled after all).

  One other thought (I don't know if it is the right thing to do) is that
  we might _always_ perform a warm reset after powering-on a SuperSpeed
  port, without bothering to call hub_port_debounce_be_connected.

 I'm leaning in that direction.  However, the decision comes down to
 the relative occurrence frequency of devices that fall into this trap
 vs those that successfully recover and would suffer the additional
 latency of a warm reset.

 Is a warm reset significantly longer than an ordinary reset?  We have
 to do some kind of reset in any case.  After all, the power session
 _has_ been interrupted.  (Assuming the power switching worked...)

USB 3.0 ports don't need to be reset on connect as a matter of course.
The should usually just start training themselves and eventually
become ready as soon as the wires touch. An extra warm reset would add
80-120ms delay to the port resume. (In comparison, a hot reset should
not take more than 12ms, probably even much less.)

  With this in place we may want to consider reducing the timeout and
  relying on warm reset for recovery.
 
  Why?  I'm not familiar with the intricacies of USB-3 link state
  changes, but there seem to be only two possibilities:
 
  Either PORT_LS_POLLING is a valid state to be in while
  trying to establish a SuperSpeed connection, in which case
  we don't want to reduce the timeout,
 
  Or it isn't a valid state, in which case we should abort
  the debounce immediately.

It is a valid transitional state, unfortunately, but in a working case
it should resolve itself within a few milliseconds (probably less than
10). Maybe we should try to differentiate between USB 2.0 and 3.0
devices in hub_port_debounce()? I think due to the built-in link
training in USB 3.0, the classic debouncing doesn't really make sense
anymore (and wastes a lot of time since SuperSpeed links can train
really fast when they work).

As for this patch, I think the best approach would be to wait for the
device to come back in usb_port_runtime_resume() (through
hub_port_debounce() or something else), and if it doesn't show up
always set the bit to warm reset the port (regardless of LTSSM state,
since even if it says RxDetect I wouldn't be sure that there is really
nothing connected). We could then also use those bits in the lost
power case of xhci_resume() to try and work around the problems with
that Synopsys controller.
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-10 Thread Alan Stern
On Fri, 31 Jan 2014, Dan Williams wrote:

 Resuming a powered down port sometimes results in the port state being
 stuck in USB_SS_PORT_LS_POLLING:
 
  hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0
  port1: can't get reconnection after setting port  power on, status -110
  hub 3-0:1.0: port 1 status .02e0 after resume, -19
  usb 3-1: can't resume, status -19
  hub 3-0:1.0: logical disconnect on port 1

It's not obvious that this illustrates your point.  Are we supposed to 
know offhand that 0x2e0 means USB_SS_PORT_LS_POLLING?

 In the case above we wait 2 seconds for the port to reconnect and for
 that entire time the port remained in the polling state.  A warm reset
 triggers the device to reconnect and resume as normal.  With this patch
 we get:
 
  hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0
  usb usb3: port1 usb_port_runtime_resume requires warm reset
  hub 3-0:1.0: port 1 not warm reset yet, waiting 50ms
  usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd

Could this be improved?  We still spent 2 seconds waiting for a port
that remained in the polling state.

 With this in place we may want to consider reducing the timeout and
 relying on warm reset for recovery.

Why?  I'm not familiar with the intricacies of USB-3 link state
changes, but there seem to be only two possibilities:

Either PORT_LS_POLLING is a valid state to be in while
trying to establish a SuperSpeed connection, in which case
we don't want to reduce the timeout,

Or it isn't a valid state, in which case we should abort
the debounce immediately.

One other thought (I don't know if it is the right thing to do) is that 
we might _always_ perform a warm reset after powering-on a SuperSpeed 
port, without bothering to call hub_port_debounce_be_connected.

  Other xHCs that fail to propagate
 warm resets on hub resume may want to trigger this behavior via a quirk.

What do you mean by other xHCs?  Other than what?

I don't want to go over this patch in detail, because it's pretty
confusing and the code is messy.  Still, it seems odd to add all those
port status manipulations in usb_port_runtime_resume, when 
hub_port_debounce_be_connected is already doing them.

And why do we need another special flag to indicate that a warm reset 
is needed?  Can't check_port_resume_type figure that out from the port 
status?  That routine was meant for exactly this sort of thing.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

2014-02-10 Thread Dan Williams
On Mon, Feb 10, 2014 at 1:26 PM, Alan Stern st...@rowland.harvard.edu wrote:
 On Fri, 31 Jan 2014, Dan Williams wrote:

 Resuming a powered down port sometimes results in the port state being
 stuck in USB_SS_PORT_LS_POLLING:

  hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0
  port1: can't get reconnection after setting port  power on, status -110
  hub 3-0:1.0: port 1 status .02e0 after resume, -19
  usb 3-1: can't resume, status -19
  hub 3-0:1.0: logical disconnect on port 1

 It's not obvious that this illustrates your point.  Are we supposed to
 know offhand that 0x2e0 means USB_SS_PORT_LS_POLLING?

Hmm, no, I'll make the clearer in the revised change log.


 In the case above we wait 2 seconds for the port to reconnect and for
 that entire time the port remained in the polling state.  A warm reset
 triggers the device to reconnect and resume as normal.  With this patch
 we get:

  hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0
  usb usb3: port1 usb_port_runtime_resume requires warm reset
  hub 3-0:1.0: port 1 not warm reset yet, waiting 50ms
  usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd

 Could this be improved?  We still spent 2 seconds waiting for a port
 that remained in the polling state.

So this and at least one other question is why this cc list includes
the participants from the thread [PATCH] USB: core: Add warm reset
while reset-resuming SuperSpeed HUBs:
http://marc.info/?l=linux-usbm=138678842000703w=2

I believe I am seeing a polling livelock scenario as described by Julius.


 With this in place we may want to consider reducing the timeout and
 relying on warm reset for recovery.

 Why?  I'm not familiar with the intricacies of USB-3 link state
 changes, but there seem to be only two possibilities:

 Either PORT_LS_POLLING is a valid state to be in while
 trying to establish a SuperSpeed connection, in which case
 we don't want to reduce the timeout,

 Or it isn't a valid state, in which case we should abort
 the debounce immediately.

 One other thought (I don't know if it is the right thing to do) is that
 we might _always_ perform a warm reset after powering-on a SuperSpeed
 port, without bothering to call hub_port_debounce_be_connected.

I'm leaning in that direction.  However, the decision comes down to
the relative occurrence frequency of devices that fall into this trap
vs those that successfully recover and would suffer the additional
latency of a warm reset.  There's just no way to know if the device on
the other side is legitimately causing a polling condition or whether
this is a result of the aforementioned live lock.  So far I only have
one USB3 device that requires this, our favorite ax88179_178a net
adapter.

The spec says that the only way to reliably sync the state machines is
to remove power from the device, but we have no real way from the
kernel to force and know a port is physically powered off.  I'll look
and see how imposing latency-wise it would be to always warm reset,
but we may want to just quirk temperamental devices and hosts as we
find them and use the timeout as a backstop.


  Other xHCs that fail to propagate
 warm resets on hub resume may want to trigger this behavior via a quirk.

 What do you mean by other xHCs?  Other than what?


Other xHCs referring again to that warm reset thread and the
hypothesis that the Synopsys xHC is not propagating warm resets on
host resume.

 I don't want to go over this patch in detail, because it's pretty
 confusing and the code is messy.  Still, it seems odd to add all those
 port status manipulations in usb_port_runtime_resume, when
 hub_port_debounce_be_connected is already doing them.

 And why do we need another special flag to indicate that a warm reset
 is needed?  Can't check_port_resume_type figure that out from the port
 status?  That routine was meant for exactly this sort of thing.


check_port_resume_type() does not have the context to make the
determination.  LS_POLLING is a valid state, we only know that a warm
reset is required when it has been in this state for too long.
Unfortunately, the timeout needs to consider that the device is coming
from physically powered off condition (rather than just logical) so it
at least needs to be 2 seconds for a connection (per commit ad493e5
usb: add usb port auto power off mechanism).
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html