On Fri, Oct 11, 2019 at 03:58:42PM +0300, Mathias Nyman wrote:
> commit ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer") schedules work
> to clear TT buffer, but causes a use-after-free regression at the same time
>
> Make sure hub_tt_work finishes before endpoint is disabled, otherwise
> the work will dereference already freed endpoint and device related
> pointers.
>
> This was triggered when usb core failed to read the configuration
> descriptor of a FS/LS device during enumeration.
> xhci driver queued clear_tt_work while usb core freed and reallocated
> a new device for the next enumeration attempt.
>
> EHCI driver implents ehci_endpoint_disable() that makes sure
> clear_tt_work has finished before it returns, but xhci lacks this support.
> usb core will call hcd->driver->endpoint_disable() callback before
> disabling endpoints, so we want this in xhci as well.
>
> The added xhci_endpoint_disable() is based on ehci_endpoint_disable()
>
> Fixes: ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer")
> Cc: <[email protected]> # v5.3
> Reported-by: Johan Hovold <[email protected]>
> Signed-off-by: Mathias Nyman <[email protected]>
> ---
> drivers/usb/host/xhci.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 5cfbf9a04494..6e817686d04f 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -3071,6 +3071,48 @@ void xhci_cleanup_stalled_ring(struct xhci_hcd *xhci,
> unsigned int ep_index,
> }
> }
>
> +static void xhci_endpoint_disable(struct usb_hcd *hcd,
> + struct usb_host_endpoint *host_ep)
> +{
> + struct xhci_hcd *xhci;
> + struct xhci_virt_device *vdev;
> + struct xhci_virt_ep *ep;
> + struct usb_device *udev;
> + unsigned long flags;
> + unsigned int ep_index;
> +
> + xhci = hcd_to_xhci(hcd);
> +rescan:
> + spin_lock_irqsave(&xhci->lock, flags);
> +
> + udev = (struct usb_device *)host_ep->hcpriv;
> + if (!udev || !udev->slot_id)
> + goto done;
> +
> + vdev = xhci->devs[udev->slot_id];
> + if (!vdev)
> + goto done;
> +
> + ep_index = xhci_get_endpoint_index(&host_ep->desc);
> + ep = &vdev->eps[ep_index];
> + if (!ep)
> + goto done;
> +
> + /* wait for hub_tt_work to finish clearing hub TT */
> + if (ep->ep_state & EP_CLEARING_TT) {
> + spin_unlock_irqrestore(&xhci->lock, flags);
> + schedule_timeout_uninterruptible(1);
> + goto rescan;
> + }
> +
> + if (ep->ep_state)
> + xhci_dbg(xhci, "endpoint disable with ep_state 0x%x\n",
> + ep->ep_state);
> +done:
> + host_ep->hcpriv = NULL;
> + spin_unlock_irqrestore(&xhci->lock, flags);
> +}
> +
I used essentially the same reproducer as you did for debugging this
after I first hit it with an actually stalled control endpoint, and this
patch works also with my fault-injection hack.
I've reviewed the code and it looks good to me except for one mostly
theoretical issue. You need to check ep->hc_priv while holding the
xhci->lock in xhci_clear_tt_buffer_complete() or you could end up having
xhci_endpoint_disable() reschedule indefinitely while waiting for
EP_CLEARING_TT to be cleared on a sufficiently weakly ordered
system.
Since cfbb8a84c2d2 ("xhci: Fix NULL pointer dereference in
xhci_clear_tt_buffer_complete()") isn't needed anymore and is slightly
misleading, I suggest amending the patch with the following:
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 9b1e15fe2c8e..6c17e3fe181a 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -5280,20 +5280,13 @@ static void xhci_clear_tt_buffer_complete(struct
usb_hcd *hcd,
unsigned int ep_index;
unsigned long flags;
- /*
- * udev might be NULL if tt buffer is cleared during a failed device
- * enumeration due to a halted control endpoint. Usb core might
- * have allocated a new udev for the next enumeration attempt.
- */
-
xhci = hcd_to_xhci(hcd);
+
+ spin_lock_irqsave(&xhci->lock, flags);
udev = (struct usb_device *)ep->hcpriv;
- if (!udev)
- return;
slot_id = udev->slot_id;
ep_index = xhci_get_endpoint_index(&ep->desc);
- spin_lock_irqsave(&xhci->lock, flags);
xhci->devs[slot_id]->eps[ep_index].ep_state &= ~EP_CLEARING_TT;
xhci_ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
spin_unlock_irqrestore(&xhci->lock, flags);
Feel free to add my:
Suggested-by: Johan Hovold <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
Tested-by: Johan Hovold <[email protected]>
Johan