Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote: I believe I am seeing a polling livelock scenario as described by Julius. Julius was talking about what happens when the host controller itself gets reset (and therefore remembers nothing about any device) whereas the device still thinks it is in U3. Is that the scenario you're encountering? I thought you were working on normal runtime PM. When you turn the power back on for a port, it should start out in RxDetect and switch to Polling as it detects Rx terminations. If the downstream device is unhappy for any reason (e.g. in SS.Inactive or still in U3) and sends no or wrong responses to the LFPS Polling, the hub's port will either move to ComplianceMode or keep cycling back and forth between RxDetect and Polling. The latter is especially dangerous because it's hard to detect (if you just sample the port status you might see RxDetect, which would also be expected if there is nothing connected at all), so I'm thinking an unconditional warm reset might be unavoidable. That is why we proposed to go that route for the Synopsys controller, and I think the same will apply to this situation (since I think turning off a PortPower bit in XHCI will make the controller forget a previous U3 state and return to RxDetect when you turn it back on again, even though the actual VBUS line to the device may not have been disabled after all). Julius, are you sure the Synopsys host will actually power off the ports? The Intel hosts need some special ACPI methods, so I'm not sure if Dan's issue with ports after power on could even be seen on the Synopsys host. The Synopsys issue (as I remember it, please correct me if I'm wrong) would only be triggered if the host lost power during S3, and was halted and reset after a register restore failure. I think the solution we agreed to was to implement a Synopsys host quirk that would warm reset all ports unconditionally on register restore failure. Was that right? Then there's Dan's issue. Dan, does the port go into SS.Inactive before the host starts to cycle between RxDetect and Polling and U0 for this case? Hans also ran into this issue (or at least a variation of it), and proposed a patch to fix it. https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747 Can you take a look at it, and see if it would address your issue? I think it will catch the case where we transition from SS.Inactive - RxDetect - Polling. One other thought (I don't know if it is the right thing to do) is that we might _always_ perform a warm reset after powering-on a SuperSpeed port, without bothering to call hub_port_debounce_be_connected. I'm leaning in that direction. However, the decision comes down to the relative occurrence frequency of devices that fall into this trap vs those that successfully recover and would suffer the additional latency of a warm reset. Is a warm reset significantly longer than an ordinary reset? We have to do some kind of reset in any case. After all, the power session _has_ been interrupted. (Assuming the power switching worked...) USB 3.0 ports don't need to be reset on connect as a matter of course. The should usually just start training themselves and eventually become ready as soon as the wires touch. An extra warm reset would add 80-120ms delay to the port resume. (In comparison, a hot reset should not take more than 12ms, probably even much less.) I would rather avoid warm reset unconditionally on connect whenever possible, because 80-120ms is too long of a delay for some embedded/tablet systems that come into and out of S3 very often. With this in place we may want to consider reducing the timeout and relying on warm reset for recovery. Why? I'm not familiar with the intricacies of USB-3 link state changes, but there seem to be only two possibilities: Either PORT_LS_POLLING is a valid state to be in while trying to establish a SuperSpeed connection, in which case we don't want to reduce the timeout, Or it isn't a valid state, in which case we should abort the debounce immediately. It is a valid transitional state, unfortunately, but in a working case it should resolve itself within a few milliseconds (probably less than 10). Maybe we should try to differentiate between USB 2.0 and 3.0 devices in hub_port_debounce()? I think due to the built-in link training in USB 3.0, the classic debouncing doesn't really make sense anymore (and wastes a lot of time since SuperSpeed links can train really fast when they work). As for this patch, I think the best approach would be to wait for the device to come back in usb_port_runtime_resume() (through hub_port_debounce() or something else), and if it doesn't show up always set the bit to warm reset the port (regardless of
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
Hi, + Paul as he might have better details about the Synopsys core host-side implementation On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote: On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote: I believe I am seeing a polling livelock scenario as described by Julius. Julius was talking about what happens when the host controller itself gets reset (and therefore remembers nothing about any device) whereas the device still thinks it is in U3. Is that the scenario you're encountering? I thought you were working on normal runtime PM. When you turn the power back on for a port, it should start out in RxDetect and switch to Polling as it detects Rx terminations. If the downstream device is unhappy for any reason (e.g. in SS.Inactive or still in U3) and sends no or wrong responses to the LFPS Polling, the hub's port will either move to ComplianceMode or keep cycling back and forth between RxDetect and Polling. The latter is especially dangerous because it's hard to detect (if you just sample the port status you might see RxDetect, which would also be expected if there is nothing connected at all), so I'm thinking an unconditional warm reset might be unavoidable. That is why we proposed to go that route for the Synopsys controller, and I think the same will apply to this situation (since I think turning off a PortPower bit in XHCI will make the controller forget a previous U3 state and return to RxDetect when you turn it back on again, even though the actual VBUS line to the device may not have been disabled after all). Julius, are you sure the Synopsys host will actually power off the ports? The Intel hosts need some special ACPI methods, so I'm not sure if Dan's issue with ports after power on could even be seen on the Synopsys host. The Synopsys issue (as I remember it, please correct me if I'm wrong) would only be triggered if the host lost power during S3, and was halted and reset after a register restore failure. I think the solution we agreed to was to implement a Synopsys host quirk that would warm reset all ports unconditionally on register restore failure. Was that right? Then there's Dan's issue. Dan, does the port go into SS.Inactive before the host starts to cycle between RxDetect and Polling and U0 for this case? Hans also ran into this issue (or at least a variation of it), and proposed a patch to fix it. https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747 Can you take a look at it, and see if it would address your issue? I think it will catch the case where we transition from SS.Inactive - RxDetect - Polling. One other thought (I don't know if it is the right thing to do) is that we might _always_ perform a warm reset after powering-on a SuperSpeed port, without bothering to call hub_port_debounce_be_connected. I'm leaning in that direction. However, the decision comes down to the relative occurrence frequency of devices that fall into this trap vs those that successfully recover and would suffer the additional latency of a warm reset. Is a warm reset significantly longer than an ordinary reset? We have to do some kind of reset in any case. After all, the power session _has_ been interrupted. (Assuming the power switching worked...) USB 3.0 ports don't need to be reset on connect as a matter of course. The should usually just start training themselves and eventually become ready as soon as the wires touch. An extra warm reset would add 80-120ms delay to the port resume. (In comparison, a hot reset should not take more than 12ms, probably even much less.) I would rather avoid warm reset unconditionally on connect whenever possible, because 80-120ms is too long of a delay for some embedded/tablet systems that come into and out of S3 very often. With this in place we may want to consider reducing the timeout and relying on warm reset for recovery. Why? I'm not familiar with the intricacies of USB-3 link state changes, but there seem to be only two possibilities: Either PORT_LS_POLLING is a valid state to be in while trying to establish a SuperSpeed connection, in which case we don't want to reduce the timeout, Or it isn't a valid state, in which case we should abort the debounce immediately. It is a valid transitional state, unfortunately, but in a working case it should resolve itself within a few milliseconds (probably less than 10). Maybe we should try to differentiate between USB 2.0 and 3.0 devices in hub_port_debounce()? I think due to the built-in link training in USB 3.0, the classic debouncing doesn't really make sense anymore (and wastes a lot of time since SuperSpeed links can train really fast when they work). As
RE: [PATCH v4 13/14] usb: force warm reset to break resume livelock
From: Felipe Balbi [mailto:ba...@ti.com] Sent: Tuesday, February 18, 2014 1:05 PM Hi, + Paul as he might have better details about the Synopsys core host-side implementation On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote: On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote: I believe I am seeing a polling livelock scenario as described by Julius. Julius was talking about what happens when the host controller itself gets reset (and therefore remembers nothing about any device) whereas the device still thinks it is in U3. Is that the scenario you're encountering? I thought you were working on normal runtime PM. When you turn the power back on for a port, it should start out in RxDetect and switch to Polling as it detects Rx terminations. If the downstream device is unhappy for any reason (e.g. in SS.Inactive or still in U3) and sends no or wrong responses to the LFPS Polling, the hub's port will either move to ComplianceMode or keep cycling back and forth between RxDetect and Polling. The latter is especially dangerous because it's hard to detect (if you just sample the port status you might see RxDetect, which would also be expected if there is nothing connected at all), so I'm thinking an unconditional warm reset might be unavoidable. That is why we proposed to go that route for the Synopsys controller, and I think the same will apply to this situation (since I think turning off a PortPower bit in XHCI will make the controller forget a previous U3 state and return to RxDetect when you turn it back on again, even though the actual VBUS line to the device may not have been disabled after all). Julius, are you sure the Synopsys host will actually power off the ports? The Intel hosts need some special ACPI methods, so I'm not sure if Dan's issue with ports after power on could even be seen on the Synopsys host. The Synopsys issue (as I remember it, please correct me if I'm wrong) would only be triggered if the host lost power during S3, and was halted and reset after a register restore failure. I think the solution we agreed to was to implement a Synopsys host quirk that would warm reset all ports unconditionally on register restore failure. Was that right? Then there's Dan's issue. Dan, does the port go into SS.Inactive before the host starts to cycle between RxDetect and Polling and U0 for this case? Hans also ran into this issue (or at least a variation of it), and proposed a patch to fix it. https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747 Can you take a look at it, and see if it would address your issue? I think it will catch the case where we transition from SS.Inactive - RxDetect - Polling. One other thought (I don't know if it is the right thing to do) is that we might _always_ perform a warm reset after powering-on a SuperSpeed port, without bothering to call hub_port_debounce_be_connected. I'm leaning in that direction. However, the decision comes down to the relative occurrence frequency of devices that fall into this trap vs those that successfully recover and would suffer the additional latency of a warm reset. Is a warm reset significantly longer than an ordinary reset? We have to do some kind of reset in any case. After all, the power session _has_ been interrupted. (Assuming the power switching worked...) USB 3.0 ports don't need to be reset on connect as a matter of course. The should usually just start training themselves and eventually become ready as soon as the wires touch. An extra warm reset would add 80-120ms delay to the port resume. (In comparison, a hot reset should not take more than 12ms, probably even much less.) I would rather avoid warm reset unconditionally on connect whenever possible, because 80-120ms is too long of a delay for some embedded/tablet systems that come into and out of S3 very often. With this in place we may want to consider reducing the timeout and relying on warm reset for recovery. Why? I'm not familiar with the intricacies of USB-3 link state changes, but there seem to be only two possibilities: Either PORT_LS_POLLING is a valid state to be in while trying to establish a SuperSpeed connection, in which case we don't want to reduce the timeout, Or it isn't a valid state, in which case we should abort the debounce immediately. It is a valid transitional state, unfortunately, but in a working case it should resolve itself within a few milliseconds (probably less than 10). Maybe we should try to differentiate between USB 2.0 and 3.0 devices in hub_port_debounce()? I think due to the built-in link
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
Julius, are you sure the Synopsys host will actually power off the ports? The Intel hosts need some special ACPI methods, so I'm not sure if Dan's issue with ports after power on could even be seen on the Synopsys host. The Synopsys issue (as I remember it, please correct me if I'm wrong) would only be triggered if the host lost power during S3, and was halted and reset after a register restore failure. I think the solution we agreed to was to implement a Synopsys host quirk that would warm reset all ports unconditionally on register restore failure. Was that right? Then there's Dan's issue. Dan, does the port go into SS.Inactive before the host starts to cycle between RxDetect and Polling and U0 for this case? Sorry, I didn't mean to pull the Synopsys issue into this thread... we should probably keep that separate in https://lkml.org/lkml/2013/12/9/336 , or this will get too confusing. Some aspects of that issue are definitely different, e.g. there's no port power being turned off there (on the contrary, the problem is that the power session stays alive during suspend but the xHC gets reset and forgets about it). I just wanted to point out that both issues can lead to the same state (by different means) where the port status cycles between RxDetect and Polling, because they both reset the host controller's port state to RxDetect in a way that the device might not notice (or not react correctly to). I think whenever the host port state gets forced back to RxDetect while the device is in SS.Inactive (or anything that can lead to SS.Inactive after a few unexpected packets) you will get into this state, and you will only get back out of there through a warm reset (or by physically cutting off VBUS). Hans also ran into this issue (or at least a variation of it), and proposed a patch to fix it. https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streamsid=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747 Can you take a look at it, and see if it would address your issue? I think it will catch the case where we transition from SS.Inactive - RxDetect - Polling. I don't think that's targeting the same problem. Hans seems to be describing a situation where the device connects again even though he doesn't want it to -- in our case, the device doesn't connect even though we want that. His patch shouldn't do anything for our issue since he checks for PORT_STAT_CONNECTION, and that bit will be unset when the host port is stuck in RxDetect/Polling. It is a valid transitional state, unfortunately, but in a working case it should resolve itself within a few milliseconds (probably less than 10). Maybe we should try to differentiate between USB 2.0 and 3.0 devices in hub_port_debounce()? I think due to the built-in link training in USB 3.0, the classic debouncing doesn't really make sense anymore (and wastes a lot of time since SuperSpeed links can train really fast when they work). As for this patch, I think the best approach would be to wait for the device to come back in usb_port_runtime_resume() (through hub_port_debounce() or something else), and if it doesn't show up always set the bit to warm reset the port (regardless of LTSSM state, since even if it says RxDetect I wouldn't be sure that there is really nothing connected). We could then also use those bits in the lost power case of xhci_resume() to try and work around the problems with that Synopsys controller. That's a lot of changes to the hub core. Would an xHCI quirk be simpler? Is there some scenario I'm missing that the S3 resume quirk wouldn't handle? We don't need to change hub_port_debounce() right away... that was just an additional suggestion to make things more efficient for SuperSpeed devices in general. I think for now (in order to solve Dan's problem), it would be best if he just calls hub_port_debounce() in usb_port_runtime_resume() (which should bring a connected device back in the majority of cases), and always queues up a warm reset if that fails. (This is essentially what his patch is already doing, just get rid of the extra check for USB_SS_PORT_LS_POLLING in the error path which I think might be a little too specific and overlook cases where the RxDetect/Polling cycle just happens to be at RxDetect in that moment.) -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
On Tue, Feb 18, 2014 at 2:40 PM, Julius Werner jwer...@chromium.org wrote: Can you take a look at it, and see if it would address your issue? I think it will catch the case where we transition from SS.Inactive - RxDetect - Polling. I don't think that's targeting the same problem. Hans seems to be describing a situation where the device connects again even though he doesn't want it to -- in our case, the device doesn't connect even though we want that. His patch shouldn't do anything for our issue since he checks for PORT_STAT_CONNECTION, and that bit will be unset when the host port is stuck in RxDetect/Polling. Yes, agreed. It is a valid transitional state, unfortunately, but in a working case it should resolve itself within a few milliseconds (probably less than 10). Maybe we should try to differentiate between USB 2.0 and 3.0 devices in hub_port_debounce()? I think due to the built-in link training in USB 3.0, the classic debouncing doesn't really make sense anymore (and wastes a lot of time since SuperSpeed links can train really fast when they work). As for this patch, I think the best approach would be to wait for the device to come back in usb_port_runtime_resume() (through hub_port_debounce() or something else), and if it doesn't show up always set the bit to warm reset the port (regardless of LTSSM state, since even if it says RxDetect I wouldn't be sure that there is really nothing connected). We could then also use those bits in the lost power case of xhci_resume() to try and work around the problems with that Synopsys controller. That's a lot of changes to the hub core. Would an xHCI quirk be simpler? Is there some scenario I'm missing that the S3 resume quirk wouldn't handle? We don't need to change hub_port_debounce() right away... that was just an additional suggestion to make things more efficient for SuperSpeed devices in general. I think for now (in order to solve Dan's problem), it would be best if he just calls hub_port_debounce() in usb_port_runtime_resume() (which should bring a connected device back in the majority of cases), and always queues up a warm reset if that fails. (This is essentially what his patch is already doing, just get rid of the extra check for USB_SS_PORT_LS_POLLING in the error path which I think might be a little too specific and overlook cases where the RxDetect/Polling cycle just happens to be at RxDetect in that moment.) That's the plan, and I also want to test a usb device quirk flag to unconditionally warm reset on power-session loss since it does seem to be device specifc in my case. -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
We don't need to change hub_port_debounce() right away... that was just an additional suggestion to make things more efficient for SuperSpeed devices in general. I think for now (in order to solve Dan's problem), it would be best if he just calls hub_port_debounce() in usb_port_runtime_resume() (which should bring a connected device back in the majority of cases), and always queues up a warm reset if that fails. (This is essentially what his patch is already doing, just get rid of the extra check for USB_SS_PORT_LS_POLLING in the error path which I think might be a little too specific and overlook cases where the RxDetect/Polling cycle just happens to be at RxDetect in that moment.) That's the plan, and I also want to test a usb device quirk flag to unconditionally warm reset on power-session loss since it does seem to be device specifc in my case. For what it's worth, I think you might not need a quirk flag since you are not really loosing anything in that case. If the first hub_port_debounce() fails to reconnect, the device is either quirky or there is no device attached at all. In the first case we want the warm reset, and in the second case the warm reset is pointless but also doesn't cost us anything (assuming it runs totally asynchronous, which I think your patch guarantees). The only case where we really need to avoid wasting those 100ms is on ports which are connected and already usable, but those should be caught by hub_port_debounce() already. -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
On Tue, Feb 18, 2014 at 3:39 PM, Julius Werner jwer...@chromium.org wrote: We don't need to change hub_port_debounce() right away... that was just an additional suggestion to make things more efficient for SuperSpeed devices in general. I think for now (in order to solve Dan's problem), it would be best if he just calls hub_port_debounce() in usb_port_runtime_resume() (which should bring a connected device back in the majority of cases), and always queues up a warm reset if that fails. (This is essentially what his patch is already doing, just get rid of the extra check for USB_SS_PORT_LS_POLLING in the error path which I think might be a little too specific and overlook cases where the RxDetect/Polling cycle just happens to be at RxDetect in that moment.) That's the plan, and I also want to test a usb device quirk flag to unconditionally warm reset on power-session loss since it does seem to be device specifc in my case. For what it's worth, I think you might not need a quirk flag since you are not really loosing anything in that case. If the first hub_port_debounce() fails to reconnect, the device is either quirky or there is no device attached at all. In the first case we want the warm reset, and in the second case the warm reset is pointless but also doesn't cost us anything (assuming it runs totally asynchronous, which I think your patch guarantees). No, it's not asynchronous. The only case where we really need to avoid wasting those 100ms is on ports which are connected and already usable, but those should be caught by hub_port_debounce() already. The quirk would stop the the implementation from wasting its time on the 2 second reconnect timeout on devices known to have problems recovering from a logical power session loss and simply issue the warm-reset immediately. That said, I should also look to see how long devices take to reconnect on average and if that is more than 120ms maybe it would make sense to unconditionally warm reset. -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
On Mon, 10 Feb 2014, Dan Williams wrote: I believe I am seeing a polling livelock scenario as described by Julius. Julius was talking about what happens when the host controller itself gets reset (and therefore remembers nothing about any device) whereas the device still thinks it is in U3. Is that the scenario you're encountering? I thought you were working on normal runtime PM. With this in place we may want to consider reducing the timeout and relying on warm reset for recovery. Why? I'm not familiar with the intricacies of USB-3 link state changes, but there seem to be only two possibilities: Either PORT_LS_POLLING is a valid state to be in while trying to establish a SuperSpeed connection, in which case we don't want to reduce the timeout, Or it isn't a valid state, in which case we should abort the debounce immediately. One other thought (I don't know if it is the right thing to do) is that we might _always_ perform a warm reset after powering-on a SuperSpeed port, without bothering to call hub_port_debounce_be_connected. I'm leaning in that direction. However, the decision comes down to the relative occurrence frequency of devices that fall into this trap vs those that successfully recover and would suffer the additional latency of a warm reset. Is a warm reset significantly longer than an ordinary reset? We have to do some kind of reset in any case. After all, the power session _has_ been interrupted. (Assuming the power switching worked...) There's just no way to know if the device on the other side is legitimately causing a polling condition or whether this is a result of the aforementioned live lock. So far I only have one USB3 device that requires this, our favorite ax88179_178a net adapter. The spec says that the only way to reliably sync the state machines is to remove power from the device, but we have no real way from the kernel to force and know a port is physically powered off. I'll look and see how imposing latency-wise it would be to always warm reset, but we may want to just quirk temperamental devices and hosts as we find them and use the timeout as a backstop. Other xHCs that fail to propagate warm resets on hub resume may want to trigger this behavior via a quirk. What do you mean by other xHCs? Other than what? Other xHCs referring again to that warm reset thread and the hypothesis that the Synopsys xHC is not propagating warm resets on host resume. It looks like there were two separate considerations: Whether the host controller issues a reset signal over the bus when it gets reset itself, and whether the host issues a reset signal to a port if it doesn't believe any device is attached. I don't want to go over this patch in detail, because it's pretty confusing and the code is messy. Still, it seems odd to add all those port status manipulations in usb_port_runtime_resume, when hub_port_debounce_be_connected is already doing them. And why do we need another special flag to indicate that a warm reset is needed? Can't check_port_resume_type figure that out from the port status? That routine was meant for exactly this sort of thing. check_port_resume_type() does not have the context to make the determination. LS_POLLING is a valid state, we only know that a warm reset is required when it has been in this state for too long. If the port is in that state when check_port_resume_type runs then it certainly has been too long. According to section 7.5 of the spec, ports don't go into LS_POLLING when the power session wasn't interrupted. Unfortunately, the timeout needs to consider that the device is coming from physically powered off condition (rather than just logical) so it at least needs to be 2 seconds for a connection (per commit ad493e5 usb: add usb port auto power off mechanism). Okay, that's reasonable. Unfortunate, but reasonable. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
I believe I am seeing a polling livelock scenario as described by Julius. Julius was talking about what happens when the host controller itself gets reset (and therefore remembers nothing about any device) whereas the device still thinks it is in U3. Is that the scenario you're encountering? I thought you were working on normal runtime PM. When you turn the power back on for a port, it should start out in RxDetect and switch to Polling as it detects Rx terminations. If the downstream device is unhappy for any reason (e.g. in SS.Inactive or still in U3) and sends no or wrong responses to the LFPS Polling, the hub's port will either move to ComplianceMode or keep cycling back and forth between RxDetect and Polling. The latter is especially dangerous because it's hard to detect (if you just sample the port status you might see RxDetect, which would also be expected if there is nothing connected at all), so I'm thinking an unconditional warm reset might be unavoidable. That is why we proposed to go that route for the Synopsys controller, and I think the same will apply to this situation (since I think turning off a PortPower bit in XHCI will make the controller forget a previous U3 state and return to RxDetect when you turn it back on again, even though the actual VBUS line to the device may not have been disabled after all). One other thought (I don't know if it is the right thing to do) is that we might _always_ perform a warm reset after powering-on a SuperSpeed port, without bothering to call hub_port_debounce_be_connected. I'm leaning in that direction. However, the decision comes down to the relative occurrence frequency of devices that fall into this trap vs those that successfully recover and would suffer the additional latency of a warm reset. Is a warm reset significantly longer than an ordinary reset? We have to do some kind of reset in any case. After all, the power session _has_ been interrupted. (Assuming the power switching worked...) USB 3.0 ports don't need to be reset on connect as a matter of course. The should usually just start training themselves and eventually become ready as soon as the wires touch. An extra warm reset would add 80-120ms delay to the port resume. (In comparison, a hot reset should not take more than 12ms, probably even much less.) With this in place we may want to consider reducing the timeout and relying on warm reset for recovery. Why? I'm not familiar with the intricacies of USB-3 link state changes, but there seem to be only two possibilities: Either PORT_LS_POLLING is a valid state to be in while trying to establish a SuperSpeed connection, in which case we don't want to reduce the timeout, Or it isn't a valid state, in which case we should abort the debounce immediately. It is a valid transitional state, unfortunately, but in a working case it should resolve itself within a few milliseconds (probably less than 10). Maybe we should try to differentiate between USB 2.0 and 3.0 devices in hub_port_debounce()? I think due to the built-in link training in USB 3.0, the classic debouncing doesn't really make sense anymore (and wastes a lot of time since SuperSpeed links can train really fast when they work). As for this patch, I think the best approach would be to wait for the device to come back in usb_port_runtime_resume() (through hub_port_debounce() or something else), and if it doesn't show up always set the bit to warm reset the port (regardless of LTSSM state, since even if it says RxDetect I wouldn't be sure that there is really nothing connected). We could then also use those bits in the lost power case of xhci_resume() to try and work around the problems with that Synopsys controller. -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
On Fri, 31 Jan 2014, Dan Williams wrote: Resuming a powered down port sometimes results in the port state being stuck in USB_SS_PORT_LS_POLLING: hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0 port1: can't get reconnection after setting port power on, status -110 hub 3-0:1.0: port 1 status .02e0 after resume, -19 usb 3-1: can't resume, status -19 hub 3-0:1.0: logical disconnect on port 1 It's not obvious that this illustrates your point. Are we supposed to know offhand that 0x2e0 means USB_SS_PORT_LS_POLLING? In the case above we wait 2 seconds for the port to reconnect and for that entire time the port remained in the polling state. A warm reset triggers the device to reconnect and resume as normal. With this patch we get: hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0 usb usb3: port1 usb_port_runtime_resume requires warm reset hub 3-0:1.0: port 1 not warm reset yet, waiting 50ms usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd Could this be improved? We still spent 2 seconds waiting for a port that remained in the polling state. With this in place we may want to consider reducing the timeout and relying on warm reset for recovery. Why? I'm not familiar with the intricacies of USB-3 link state changes, but there seem to be only two possibilities: Either PORT_LS_POLLING is a valid state to be in while trying to establish a SuperSpeed connection, in which case we don't want to reduce the timeout, Or it isn't a valid state, in which case we should abort the debounce immediately. One other thought (I don't know if it is the right thing to do) is that we might _always_ perform a warm reset after powering-on a SuperSpeed port, without bothering to call hub_port_debounce_be_connected. Other xHCs that fail to propagate warm resets on hub resume may want to trigger this behavior via a quirk. What do you mean by other xHCs? Other than what? I don't want to go over this patch in detail, because it's pretty confusing and the code is messy. Still, it seems odd to add all those port status manipulations in usb_port_runtime_resume, when hub_port_debounce_be_connected is already doing them. And why do we need another special flag to indicate that a warm reset is needed? Can't check_port_resume_type figure that out from the port status? That routine was meant for exactly this sort of thing. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock
On Mon, Feb 10, 2014 at 1:26 PM, Alan Stern st...@rowland.harvard.edu wrote: On Fri, 31 Jan 2014, Dan Williams wrote: Resuming a powered down port sometimes results in the port state being stuck in USB_SS_PORT_LS_POLLING: hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0 port1: can't get reconnection after setting port power on, status -110 hub 3-0:1.0: port 1 status .02e0 after resume, -19 usb 3-1: can't resume, status -19 hub 3-0:1.0: logical disconnect on port 1 It's not obvious that this illustrates your point. Are we supposed to know offhand that 0x2e0 means USB_SS_PORT_LS_POLLING? Hmm, no, I'll make the clearer in the revised change log. In the case above we wait 2 seconds for the port to reconnect and for that entire time the port remained in the polling state. A warm reset triggers the device to reconnect and resume as normal. With this patch we get: hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0 usb usb3: port1 usb_port_runtime_resume requires warm reset hub 3-0:1.0: port 1 not warm reset yet, waiting 50ms usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd Could this be improved? We still spent 2 seconds waiting for a port that remained in the polling state. So this and at least one other question is why this cc list includes the participants from the thread [PATCH] USB: core: Add warm reset while reset-resuming SuperSpeed HUBs: http://marc.info/?l=linux-usbm=138678842000703w=2 I believe I am seeing a polling livelock scenario as described by Julius. With this in place we may want to consider reducing the timeout and relying on warm reset for recovery. Why? I'm not familiar with the intricacies of USB-3 link state changes, but there seem to be only two possibilities: Either PORT_LS_POLLING is a valid state to be in while trying to establish a SuperSpeed connection, in which case we don't want to reduce the timeout, Or it isn't a valid state, in which case we should abort the debounce immediately. One other thought (I don't know if it is the right thing to do) is that we might _always_ perform a warm reset after powering-on a SuperSpeed port, without bothering to call hub_port_debounce_be_connected. I'm leaning in that direction. However, the decision comes down to the relative occurrence frequency of devices that fall into this trap vs those that successfully recover and would suffer the additional latency of a warm reset. There's just no way to know if the device on the other side is legitimately causing a polling condition or whether this is a result of the aforementioned live lock. So far I only have one USB3 device that requires this, our favorite ax88179_178a net adapter. The spec says that the only way to reliably sync the state machines is to remove power from the device, but we have no real way from the kernel to force and know a port is physically powered off. I'll look and see how imposing latency-wise it would be to always warm reset, but we may want to just quirk temperamental devices and hosts as we find them and use the timeout as a backstop. Other xHCs that fail to propagate warm resets on hub resume may want to trigger this behavior via a quirk. What do you mean by other xHCs? Other than what? Other xHCs referring again to that warm reset thread and the hypothesis that the Synopsys xHC is not propagating warm resets on host resume. I don't want to go over this patch in detail, because it's pretty confusing and the code is messy. Still, it seems odd to add all those port status manipulations in usb_port_runtime_resume, when hub_port_debounce_be_connected is already doing them. And why do we need another special flag to indicate that a warm reset is needed? Can't check_port_resume_type figure that out from the port status? That routine was meant for exactly this sort of thing. check_port_resume_type() does not have the context to make the determination. LS_POLLING is a valid state, we only know that a warm reset is required when it has been in this state for too long. Unfortunately, the timeout needs to consider that the device is coming from physically powered off condition (rather than just logical) so it at least needs to be 2 seconds for a connection (per commit ad493e5 usb: add usb port auto power off mechanism). -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html