> -----Original Message----- > From: Mathias Nyman [mailto:mathias.ny...@linux.intel.com] > Sent: Thursday, March 31, 2016 8:07 PM > To: Rajesh Bhagat <rajesh.bha...@nxp.com> > Cc: gre...@linuxfoundation.org; linux-...@vger.kernel.org; linux- > ker...@vger.kernel.org; Sriram Dash <sriram.d...@nxp.com> > Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI > commmand timeout > > On 31.03.2016 06:51, Rajesh Bhagat wrote: > > > > > > > > Hello Mathias, > > > > Thanks a lot for your support. > > > > Attached patch is working and allows the completion handler to be > > called. And resume operation completes and shell comes back (but with lot of > errors). > > > > Now, some other points which needs our attention here are: > > 1. usb core hub code is not checking the error status of hcd->driver- > >reset_device(xhci_discover_or_reset_device) > > and continues considering reset_device was successful (and causes > > other issues). > Hence, a check is needed on return > > value of reset_device and proper action is required on it. > > 2. Even if above return value is checked and reset_device status is used. > > USB core > hub retries for N times which is not > > required in this case as adding to the resume operation time. But if > > in this case > we return -ENOTCONN instead of -EINVAL > > from hcd->driver->reset_device(xhci_discover_or_reset_device), code > > returns > early and doesn't retry. > > > > Proposed Solution for above with your suggested patches is as below: > > > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index > > 7cad1fa..efeba31 100644 > > --- a/drivers/usb/core/hub.c > > +++ b/drivers/usb/core/hub.c > > @@ -2807,13 +2807,17 @@ done: > > struct usb_hcd *hcd = bus_to_hcd(udev->bus); > > > > update_devnum(udev, 0); > > - /* The xHC may think the device is already reset, > > - * so ignore the status. > > + /* > > + * check the status of resetting the device and > > update > > + * the udev status. > > */ > > if (hcd->driver->reset_device) > > - hcd->driver->reset_device(hcd, udev); > > + status = > > + hcd->driver->reset_device(hcd, udev); > > > > - usb_set_device_state(udev, USB_STATE_DEFAULT); > > + if (status == 0) > > + usb_set_device_state(udev, > > USB_STATE_DEFAULT); > > + else > > + usb_set_device_state(udev, > > + USB_STATE_NOTATTACHED); > > } > > } else { > > if (udev) > > diff --git a/drivers/usb/host/xhci-ring.c > > b/drivers/usb/host/xhci-ring.c index b3a0a22..9427175f 100644 > > --- a/drivers/usb/host/xhci-ring.c > > +++ b/drivers/usb/host/xhci-ring.c > > @@ -310,6 +310,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) > > return -ESHUTDOWN; > > } > > > > + /* writing the CMD_RING_ABORT bit should create a command completion > > + * event, add a command completion timeout for it as well > > + */ > > + mod_timer(&xhci->cmd_timer, jiffies + > > + XHCI_CMD_DEFAULT_TIMEOUT); > > return 0; > > } > > > > @@ -1243,6 +1247,7 @@ void xhci_handle_command_timeout(unsigned long data) > > int ret; > > unsigned long flags; > > u64 hw_ring_state; > > + u32 old_status; > > struct xhci_command *cur_cmd = NULL; > > xhci = (struct xhci_hcd *) data; > > > > @@ -1250,6 +1255,7 @@ void xhci_handle_command_timeout(unsigned long data) > > spin_lock_irqsave(&xhci->lock, flags); > > if (xhci->current_cmd) { > > cur_cmd = xhci->current_cmd; > > + old_status = cur_cmd->status; > > cur_cmd->status = COMP_CMD_ABORT; > > } > > > > @@ -1272,7 +1278,15 @@ void xhci_handle_command_timeout(unsigned long > data) > > } > > /* command timeout on stopped ring, ring can't be aborted */ > > xhci_dbg(xhci, "Command timeout on stopped ring\n"); > > - xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); > > + > > + /* is this the second timeout for the same command? ring wont > > restart */ > > + if (old_status == COMP_CMD_ABORT) { > > + xhci_err(xhci, "Abort timeout, Fail to restart cmd ring\n"); > > + xhci_cleanup_command_queue(xhci); > > + /* everything else here to halt, cleanup, free etc kill xhc */ > > + } else > > + xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); > > + > > spin_unlock_irqrestore(&xhci->lock, flags); > > return; > > } > > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index > > c502c22..bd18966 100644 > > --- a/drivers/usb/host/xhci.c > > +++ b/drivers/usb/host/xhci.c > > @@ -3450,7 +3450,7 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, > struct usb_device *udev) > > if (ret == 1) > > return 0; > > else > > - return -EINVAL; > > + return -ENOTCONN; /* Don't retry */ > > } > > > > if (virt_dev->tt_info) > > > > Sample output after above patch (timer is set as wakeup source): > > > > root@phoenix:~# echo mem > /sys/power/state > > PM: Syncing filesystems ... done. > > Freezing user space processes ... (elapsed 0.001 seconds) done. > > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > PM: suspend of devices complete after 155.658 msecs > > PM: late suspend of devices complete after 1.594 msecs > > PM: noirq suspend of devices complete after 1.496 msecs > > PM: noirq resume of devices complete after 1.290 msecs > > PM: early resume of devices complete after 1.250 msecs usb usb1: root > > hub lost power or was reset usb usb2: root hub lost power or was reset > > xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring > > xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID > > xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is > > 127. > > xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring > > xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID > > xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is > > 127. > > PM: resume of devices complete after 25436.366 msecs <= resume time is > increased a lot even after using -ENOTCONN > > Restarting tasks ... > > usb 1-1: USB disconnect, device number 2 usb 1-1.2: USB disconnect, > > device number 3 usb 2-1: USB disconnect, device number 2 done. > > root@phoenix:~# > > > > Please share your opinion on other changes for patch submission as well as > > resume > time. > >
> > I think more effort should be put into investigating why this happens in the > first place. > What is the root cause? why doesn't xhci start properly after resume in this > case. > Hello Mathias, I understand your point and would surely debug the root cause of the issue. But implementing the fallback mechanism in SW, if HW does not respond well seems to a better solution to me. Just to add, code prior to common implementation of xhci_handle_command_timeout was handling the above case and was __not__ dependent on HW. Please comment on the possibility of above fallback mechanism in XHCI SW and any side effects that you can foresee. > Optimizing resume time and error paths should be secondary here, resuming > faster > isn't really a matter when xhci is completely stuck. > I agree on above point. - Rajesh Bhagat > -Mathias