On Wed, 20 Feb 2008, David Brownell wrote:

> >     CPU 0                           CPU 1
> >     -----                           -----
> >     Watchdog timer expires
> >     Timer routine acquires spinlock
> >                                     IAA IRQ arrives
> >                                     ehci_irq tries to acquire 
> >                                             spinlock...

The following comment refers to the "Timer routine either sets" below, 
right?

> There's another condition here, and
> another action.  The condition is
> that ehci->reclaim must first be set;
> the action is to clear STS_IAA (and,
> given the previous patch, maybe IAAD).
> 
> And this "either" is more concisely
> written as "call end_unlink_async()"
> (point made just for clarity).

Correct on both counts.  I had forgotten that the watchdog routine 
clears STS_IAA.

> >     Timer routine either sets
> >             ehci->reclaim to NULL 
> >             or else starts a new
> >             IAA cycle
> >     Timer routine releases spinlock
> >             and returns
> >                                     ehci_irq acquires spinlock
> >                                             and sees IAA is set
> 
>                                       Can only happen if a new IAA
>                                       cycle was started by CPU0, and
>                                       the IAA condition triggered
>                                       that quickly.
> 
> >                                     Call end_unlink_async()!

Okay, so this isn't as bad as it seemed.  I don't have a copy of your 
most recent patch, but it seems clear that the watchdog routine must:

        First remove the circumstances that would cause the controller 
        to set IAA.  I guess that means clearing IAAD; it's not
        entirely clear from the spec whether this will do what we 
        want.

        Then clear IAA (if it happens to be set).

This is the only way to avoid the race, and I know that my original
version of the routine does these steps in the wrong order (if at all).  
That should be fixed.  Given sufficiently bizarre hardware we can't be
certain that things won't still go wrong on occasion, but this is the
best we can do for now -- weird hardware can be handled as it arises.

The other change to make (which you have already anticipated) is to 
guard against ehci->reclaim == NULL in end_unlink_async().  There's no 
real need for a warning or stack dump; it should just return silently 
when this happens.  If there is a warning, maybe it should be placed at 
the site of the caller (for example, in ehci_irq() when STS_IAA is 
detected).

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to