On Thu, 16 Mar 2017, Guenter Roeck wrote:
> While stress testing a usb controller using a bind/unbind looop, the
> following error loop was observed.
>
> usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
> usb 7-1.2: hub failed to enable device, error -108
> usb 7-1-port2: cannot disable (err = -22)
> usb 7-1-port2: couldn't allocate usb_device
> usb 7-1-port2: cannot disable (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> ** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22
> ** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err
> = -22)
>
> This continues forever. After adding tracebacks into the code,
> the call sequence leading to this is found to be as follows.
>
> [] hub_activate+0x368/0x7b8
> [] hub_resume+0x2c/0x3c
> [] usb_resume_interface.isra.6+0x128/0x158
> [] usb_suspend_both+0x1e8/0x288
> [] usb_runtime_suspend+0x3c/0x98
> [] __rpm_callback+0x48/0x7c
> [] rpm_callback+0xa8/0xd4
> [] rpm_suspend+0x84/0x758
> [] rpm_idle+0x2c8/0x498
> [] __pm_runtime_idle+0x60/0xac
> [] usb_autopm_put_interface+0x6c/0x7c
> [] hub_event+0x10ac/0x12ac
> [] process_one_work+0x390/0x6b8
> [] worker_thread+0x480/0x610
> [] kthread+0x164/0x178
> [] ret_from_fork+0x10/0x40
>
> kick_hub_wq() is called from hub_activate() even after failures to
> communicate with the hub. This results in an endless sequence of
> hub event -> hub activate -> wq trigger -> hub event -> ...
>
> Provide two solutions for the problem.
>
> - Only trigger the hub event queue if communication with the hub
> was successful.
> - After a suspend failure, only resume already suspended interfaces
> if the communication with the device is still possible.
>
> Each of the changes fixes the observed problem. Use both to improve
> robustness.
>
> Signed-off-by: Guenter Roeck
> ---
> v3: In hub.c, abort immediately if hub_port_status() returns an error.
> Since hub_port_status() already logs the error, don't log it again.
> In device,c, log the error return value from usb_suspend_device()
> if usb_get_status() failed as well.
> Don't check if the hub is still accessible if the error returned
> from hub_port_status() is -EBUSY.
> v2: Instead of not triggering the hub wq after an error to submit an urb,
> implement a more complex error detection and handling. Do it in two
> places. Marked as RFC to determine if one (or both) of those solutions
> are viable.
>
> drivers/usb/core/driver.c | 18 ++
> drivers/usb/core/hub.c| 5 -
> 2 files changed, 22 insertions(+), 1 deletion(-)
Acked-by: Alan Stern
> diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
> index cdee5130638b..7ebdf2a4e8fe 100644
> --- a/drivers/usb/core/driver.c
> +++ b/drivers/usb/core/driver.c
> @@ -1331,6 +1331,24 @@ static int usb_suspend_both(struct usb_device *udev,
> pm_message_t msg)
>*/
> if (udev->parent && !PMSG_IS_AUTO(msg))
> status = 0;
> +
> + /*
> + * If the device is inaccessible, don't try to resume
> + * suspended interfaces and just return the error.
> + */
> + if (status && status != -EBUSY) {
> + int err;
> + u16 devstat;
> +
> + err = usb_get_status(udev, USB_RECIP_DEVICE, 0,
> + );
> + if (err) {
> + dev_err(>dev,
> + "Failed to suspend device, error %d\n",
> + status);
> + goto done;
> + }
> + }
> }
>
> /* If the