The attached patch series makes several improvements to the usbdi(9)
API (the USB device driver interface) and to the kernel's USB host
controller interface API, and fixes bugs in xhci(4) abort/stall.
I'm not quite ready to commit this because (a) I want to make some
more improvements to the host controller interface API, and (b) I need
help testing and auditing drivers for a change to usbd_abort_pipe.
Review, testing, and comments welcome!
Summary of changes to usbdi(9):
- The following functions which previously returned a usbd_status but
never failed in practice now return void:
. usbd_close_pipe
. usbd_abort_pipe
. usbd_abort_defaut_pipe
- usbd_transfer no longer races with usbd_abort_pipe to detect when
the pipe is currently being aborted.
- usbd_abort_pipe is now persistent, not transient: new xfers will
continue to fail with USBD_CANCELLED even after it returns.
Previously, usbd_abort_pipe would abort any pending xfers, cause new
xfers to fail with USDB_CANCELLED _while aborting_, wait for the
pending xfers to complete, and then allow new xfers to be submitted
again.
This causes a race leading to deadlock and/or use-after-free with
synchronous xfers, such as usbd_sync_transfer or usbd_bulk_transfer.
Many drivers have logic where one thread (e.g., in read or write on
a /dev node) is doing I/O and another thread (e.g., in close or
revoke, or when the device is detached) is trying to abort the I/O:
/* I/O thread */
if (sc->sc_dying)
return EIO;
// (*)
usbd_bulk_transfer(...);
/* abort thread */
sc->sc_dying = true;
usbd_abort_pipe(...);
wait_for_io_thread(...);
usbd_destroy_xfer(...);
If the abort logic starts, and finishes, concurrently with the I/O
thread at the line marked (*), then there is a race: the I/O thread
doesn't notice sc_dying has been set, and can start to issue a new
xfer on the pipe; and the abort thread doesn't notice that a new
xfer may have been issued so it either deadlocks waiting for the I/O
thread (now waiting indefinitely for the xfer) or proceeds to free
data structures like the xfer itself which the I/O thread is trying
to use.
With this change, usbd_abort_pipe is persistent -- and mandatory
before usbd_close_pipe, detected by KASSERT.
This fixes the example above. I haven't found any other generic way
to fix this example without changing the API -- it is tempting, for
instance, to put a lock around both threads, but this deadlocks
because usbd_bulk_transfer is synchronous. We could put an
interlock into the usbd_bulk_transfer which it would release once it
has the pipe lock, but that's a lot more complexity for little gain.
CAVEAT: There may be some (badly-written) drivers that rely on
usbd_abort_pipe to be transient. For example, it looks like ubt(4)
might do this. So it may have to be fixed -- need to close the pipe
and reopen it later, like usbnet(9) does.
Summary of changes to host controller interface API:
- struct usbd_pipe_methods::upm_transfer is no longer responsible for
calling usb_insert_transfer to insert the transfer in the pipe's
queue. usbd_transfer takes care of that itself.
This was necessary to fix the usbd_transfer / usbd_abort_pipe race
mentioned above.
XXX This part isn't done yet -- I also want to keep the pipe's queue
locked across .upm_transfer _and_ .upm_start, because right now,
except for dwc2, every hci driver abuses access to the pipe's queue
without the lock in .upm_transfer. This change actually makes dwc2
slightly worse, but once the lock can cover both .upm_transfer and
.upm_start, it will be back to what it was before, and all the other
hci drivers will be better off.
- struct usbd_bus_methods::ubm_abortx is no longer responsible for
calling usb_transfer_complete -- usbd_xfer_abort now does that.
Note that .ubm_abortx MUST NOT release the pipe (bus) lock; this was
a bug in xhci(4) abort.
>From 7b3c6d322dbbc91fcee42bc66776f1827f81d619 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell
Date: Sat, 8 Jan 2022 13:54:12 +
Subject: [PATCH 01/13] usb: Factor usb_insert_transfer out of upm_transfer and
make private.
Almost every upm_transfer function starts with:
mutex_enter(>sc_lock);
err = usb_insert_transfer(xfer);
mutex_exit(>sc_lock);
if (err)
return err;
Some of them have debug messages sprinkled in here too, or assert
that err == USBD_NORMAL_COMPLETION (alternative is USBD_IN_PROGRESS,
only for pipes with up_running or up_serialise, presumably not
applicable for these types of pipes). Some of them also assert
xfer->ux_status == USBD_NOT_STARTED, which is guaranteed on entry and
preserved by usb_insert_transer.
Exceptions:
- arch/mips/adm5120/dev/ahci.c ahci_device_isoc_transfer just returns
USBD_NORMAL_COMPLETION, but I'm pretty sure this is and always has
been broken