On Thu, Jan 23, 2014 at 10:35:56AM +0000, David Laight wrote:
> From: Sarah Sharp
> > Hi David,
> > 
> > I've been thinking about this some more, and I'd like to propose a much
> > simpler solution.
> > 
> > The TD fragment rules didn't go into the xHCI specification until the
> > 1.0 revision.  The code that follows those rules only seems to trigger
> > issues for 0.96 ASMedia hosts, while Intel 1.0 hosts benefit from the
> > code when USB ethernet devices are attached.  The patch that changed the
> > link TRB behavior (commit 6c12db90f19727c76990e7f4801c67a148b30111) was
> > submitted in 2010, and the xHCI 1.0 spec didn't come out until later
> > that year, so it's unlikely that Synopsys host is a 1.0 host either.
> 
> Ah - so I was right in thinking that a LINK TRB mustn't point to
> a TRB that is still owned by the driver.

For that *particular* piece of hardware.  This is not an issue for other
xHCI hosts; this is not something that's spec mandated.

> The thing is, that patch has a timing bug, and I think that is what Walt
> is hitting. It is there regardless of my NOP change.

That does not make sense.  Walt confirmed that your new patch does not
help him.  Therefore either your patch doesn't close the race condition,
or Walt's ASMedia host does not suffer from it.  Either way, I have no
hard proof your patch is necessary and correct.

> In order to hit the bug I've fixed you need two things:
> 1) An xhci controller that doesn't like 'dangling' LINK TRB.
>    (These will be the ones that made the 6c12 fix needed.)
> and:
> 2a) A host system where the timings of the PCIe transfers (especially
>     the latency) are such that the controller manages to read a
>     LINK TRB that prepare_transfer() has updated while queueing a
>     new transfer in response to a transfer completion event.
> or:
> 2b) To setup a second transfer just as the last transfer is completing.
>     I'm not sure this can happen for disks, but it could happen for
>     network traffic.

Neither of us has deep technical knowledge of how this host hardware
works.  We can argue about theoretical race conditions and link TRB
pre-fetches and packet pending flags, but in the end, we don't know how
this buggy host is designed.  We don't know if the timings you mentioned
in #2a are actually a problem.

In fact, looking back at the discussion around this patch, I brought up
the same issue you did about link TRB activation, and the original patch
submitter confirmed it was fine:

http://marc.info/?l=linux-usb&m=127325508724339&w=2

> So this fix is actually just a version of the 6c12 patch that
> actually works!

The original patch works, and the theoretical race condition this
patchset was supposed to fix is not an issue, based on that link.  If
your new commit causes issues with Synopsys hosts, we'll address it when
someone actually reports the issue.

In the meantime, I'm going to go with the simplest fix, and disable the
no-op TRB code for pre-1.0 hosts.  You are welcome to resubmit this
patchset without the patches that rely on the first patch, if you like.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to