Re: [RFC 03/10] xhci: Handle command stalls.

Sarah Sharp Fri, 08 Mar 2013 10:55:13 -0800

On Fri, Mar 08, 2013 at 11:14:38AM -0500, Alan Stern wrote:
> On Thu, 7 Mar 2013, Sarah Sharp wrote:
> 
> > "One Ring to rule them all, One Ring to find them,
> > One Ring to bring them all and in the darkness bind them"
> >  -  J. R. R. Tolkien
> > 
> > There is only one command ring for each xHCI host, and all commands flow
> > through that ring.  However, when a tricksy little USB device fails to
> > respond to the Set Address command, the entire ring hangs.
> > 
> > If a Stop Endpoint command had been queued at the same time, the
> > watchdog timer will fire shortly after we start aborting the Set Address
> > command.  The watchdog timer will assume the host is dying and kill it,
> > disconnecting all USB devices under the host.  It's basically a USB DoS
> > attack.
> > 
> > The fix is for all commands to check whether they are the currently
> > executing command on the command ring when they timeout.  If not, they
> > will wait on the command completion again, and the Stop Endpoint
> > watchdog timer will re-queue itself.  If they are the currently
> > executing command, they will cancel themselves, or the Stop Endpoint
> > watchdog timer will assume the host is dying.
> 
> You might want to go farther than this.  I haven't read the driver code 
> enough to be sure how you have this organized, but it seems logical 
> that you shouldn't set a command's expiration time until that command 
> begins executing.
> 
> Suppose the ring contains commands A and B and you set B's expiration 
> time while A is still running.  Then A could continue running until B's 
> time is almost up.  When B does start, the watchdog timer will expire 
> before B has a chance to do anything.


Yes, it's true.  With this patch, the watchdog timer will re-queue
itself, of course.  I do agree that it's inefficient to have multiple
timers running when we could only have one timer running.

> By not setting a command's expiration time until that command begins
> running, you remove the need for the check added by this patch.  It
> will not be possible for the command's watchdog timer to expire before
> the command becomes the one currently executing.

In general, all the commands wait (interruptibly) on a completion, with
a 5 second timeout.  However, the Stop Endpoint command doesn't have a
completion, since it's part of the asynchronous call to cancel an URB.
The Stop Endpoint command is the only one with a watchdog timer.

I started out with an idea to yours, of having a global command list.
Commands would be queued to the FIFO, submitted to the command ring, and
then their submitters would call wait_for_completion_interruptible
(without a timeout), or simply return, in the case of the Stop Endpoint
command.  The watchdog timer routine would be gutted to be a global
command queue manager that timed the currently executing command, and
would cancel it if necessary.

In the end, I decided it would be better for the stable kernels to keep
to the simpler (if less efficient) solution.  But I would like to do the
"right thing" later on.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 03/10] xhci: Handle command stalls.

Reply via email to