Hi Mathias,
We have run into a problem with a USB printer which we're quite confident is a
bug in the Linux xHCI driver. There is no problem when the same printer is
plugged into a port managed by the EHCI driver.
The core problem is that xhci_reset_endpoint() doesn't do anything, and more
specifically does not reset the xHC's data toggle/sequence number. That is not
normally an issue, because the reset does happen in response to a STALL; in our
scenario, there is no STALL or any other error. That can lead to the data
toggle getting out of sync and the host dropping a packet sent by the device.
Now a detailed problem description. We have a USB printer passed through to a
VM. The VM runs Windows 8.1 or 10 (other versions may be affected too), and
uses Microsoft's standard usbprint.sys to talk to the printer. The vendor
printer driver tries to query the printer's configuration, using the control
endpoint, one OUT endpoint, and one IN endpoint. The query always times out
and fails when the printer is plugged into a port managed by xHCI, yet works
on EHCI ports.
The usbprint.sys driver is a bit funny and in many cases (though not always)
queues up URBs on the IN endpoint in advance, and once it decides that it has
received the entire response, cancels the last URB and resets the IN endpoint
(issuing ClearFeature(ENDPOINT_HALT)). After much head scratching, we realized, and
later confirmed with a USB analyzer, that the next IN packet that the printer
sends is not seen by the host's USB stack at all, let alone the guest OS. Other
packets arrive just fine, but the guest OS keeps waiting for more data to
arrive, eventually loses patience and fails.
We cannot observe the data toggle state of the xHC, but we are fairly certain
that things go wrong when the data toggle is set (on both ends) prior to the
endpoint reset. ClearFeature(ENDPOINT_HALT) resets the toggle on the device,
but not on the host. We do know for a fact that the device sends a packet
(with data toggle 0) which the host USB stack never sees, and a data toggle
mismatch explains that quite well.
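To make the suspected sequence concrete, here is a toy model of it. It is only
a sketch: all names are made up for illustration, and it assumes the USB 2.0
bulk rule that a receiver which sees an unexpected toggle ACKs the packet and
discards the data.

```c
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, none of it is xHCI or
 * kernel code. One bulk IN endpoint, one toggle bit on each side. */
struct toggle_pair {
    unsigned host;   /* toggle the host expects next (0 or 1) */
    unsigned device; /* toggle the device sends next (0 or 1) */
};

/* The device sends one packet; returns true if the host accepts the data. */
static bool transfer_in(struct toggle_pair *t)
{
    bool accepted = (t->device == t->host);

    if (accepted)
        t->host ^= 1;   /* host advances only on a matching toggle */
    /* On a mismatch the host assumes a retransmission: it ACKs and drops
     * the data. Either way the device sees an ACK and advances. */
    t->device ^= 1;
    return accepted;
}

/* Endpoint reset that reaches the device but not the host controller. */
static void reset_device_side_only(struct toggle_pair *t)
{
    t->device = 0;
}
```

Starting from both toggles at 0, one accepted packet leaves both at 1. After
reset_device_side_only() the next transfer_in() is rejected and the one after
that is accepted again, which matches a single lost packet followed by normal
traffic.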
We are using USBFS to talk to the printer, but that shouldn't matter much. I
will note that the available documentation [1] explicitly says that
USBDEVFS_RESETEP and USBDEVFS_CLEAR_HALT both reset the data toggle. That is
indeed the case for the Linux EHCI driver but not xHCI. Both of the USBFS
IOCTLs call into xhci_reset_endpoint() which does nothing.
We believe that xhci_reset_endpoint() needs to reset the data toggle/sequence
number to match the documentation and for compatibility with the EHCI driver.
We tried but failed to find a workaround which would reset the data toggle
without side effects (e.g. USBDEVFS_SETINTERFACE does reset the toggle on the
IN endpoint, but also resets it on the OUT endpoint and talks to the device, so
that's no good).
The data toggle management is not terribly well documented in the xHCI spec,
so we hope you know more about it than we do. Based on our understanding of
the xHCI specification, xhci_reset_endpoint() should issue either a Reset
Endpoint command with TSP=0 or a dummy Configure Endpoint command
dropping/re-adding the specified endpoint (as the xHCI 1.1 spec suggests at
the end of 4.6.8). Could you confirm whether that would solve the problem?
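For reference, this is how we read the control word of a Reset Endpoint
Command TRB. It is a standalone sketch, not driver code: the macro and
function names are ours, and the bit positions are our reading of the xHCI
spec, so please treat them as assumptions to verify.

```c
#include <stdint.h>

/* DWORD 3 of a Reset Endpoint Command TRB, as we understand the layout.
 * Names are hypothetical; bit positions should be checked against the spec. */
#define TRB_CYCLE          (1u << 0)
#define TRB_TSP            (1u << 9)   /* Transfer State Preserve */
#define TRB_TYPE(t)        ((uint32_t)(t) << 10)
#define TRB_TYPE_RESET_EP  14u
#define TRB_EP_ID(dci)     (((uint32_t)(dci) & 0x1fu) << 16)
#define TRB_SLOT_ID(s)     (((uint32_t)(s) & 0xffu) << 24)

/* Device Context Index of a non-control endpoint: 2 * number + direction. */
static unsigned ep_dci(unsigned ep_num, int is_in)
{
    return ep_num * 2 + (is_in ? 1u : 0u);
}

/* Build the control word. preserve_state = 0 means TSP=0, i.e. the xHC is
 * asked to reset the data toggle/sequence number as well. */
static uint32_t reset_ep_trb_dw3(unsigned slot_id, unsigned dci,
                                 int preserve_state, int cycle)
{
    uint32_t dw3 = TRB_TYPE(TRB_TYPE_RESET_EP) |
                   TRB_EP_ID(dci) | TRB_SLOT_ID(slot_id);

    if (preserve_state)
        dw3 |= TRB_TSP;
    if (cycle)
        dw3 |= TRB_CYCLE;
    return dw3;
}
```

One caveat, again as far as we understand the spec: Reset Endpoint appears to
be meant for endpoints in the Halted state, which would be why 4.6.8 mentions
the Configure Endpoint alternative for endpoints that are not halted.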
We don't know how many devices this problem affects. We suspect it affects
many USB printers and could in theory affect more or less any device, but few
drivers reset endpoints when there are no errors. The problem scenario can
probably be reproduced artificially with more or less any USB device: when
the data toggle is set, issue USBDEVFS_CLEAR_HALT and see whether the next
packet arrives at its destination.
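A rough usbfs sketch of that reproduction follows. The helper names are
ours. It assumes fd is an open /dev/bus/usb/BBB/DDD node with the interface
already claimed, that ep_in is a bulk IN endpoint address (e.g. 0x81), that
the device has been prompted to send data before each read, and that the
first read leaves the toggle set (an odd number of packets).

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/usbdevice_fs.h>

/* One synchronous bulk read via usbfs; returns bytes read, or -1 with
 * errno set. */
static int bulk_in(int fd, unsigned ep, void *buf, unsigned len,
                   unsigned timeout_ms)
{
    struct usbdevfs_bulktransfer xfer = {
        .ep = ep,
        .len = len,
        .timeout = timeout_ms,
        .data = buf,
    };

    return ioctl(fd, USBDEVFS_BULK, &xfer);
}

/* Clear the halt condition on an endpoint via usbfs. */
static int clear_halt(int fd, unsigned ep)
{
    return ioctl(fd, USBDEVFS_CLEAR_HALT, &ep);
}

/* Returns 1 if the read after the clear-halt timed out (packet presumably
 * lost), 0 if it arrived, -1 on any other error. */
static int packet_lost_after_clear_halt(int fd, unsigned ep_in, void *buf,
                                        unsigned len, unsigned timeout_ms)
{
    if (bulk_in(fd, ep_in, buf, len, timeout_ms) < 0)
        return -1;
    if (clear_halt(fd, ep_in) < 0)
        return -1;
    if (bulk_in(fd, ep_in, buf, len, timeout_ms) >= 0)
        return 0;
    return errno == ETIMEDOUT ? 1 : -1;
}
```

A timeout alone does not prove a toggle mismatch, so the result is best
cross-checked with a USB analyzer or by running the same test on an EHCI
port.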
Regards,
Michal
[1] https://www.kernel.org/doc/htmldocs/usb/usbfs-ioctl.html