On Sunday 17 April 2005 8:09 pm, Alan Stern wrote:
> On Sun, 17 Apr 2005, David Brownell wrote:
> 
> > I just had a thought:  maybe one of the reasons Microsoft has such big
> > per-request latencies is that they're using something analogous to tasklets.
> > It's always puzzled me why they go to such effort to batch networking
> > requests, for example, especially when the handful of Linux-vs-Windows
> > comparisons I've seen on the same hardware show that Linux gets better
> > throughput _without_ needing batching.  It could just be more evidence
> > that TCP-offload architectures aren't wins ... or it could be just a
> > not-so-good consequence of a particular USB design tradeoff.
> 
> Does MS really have such large per-request latencies?  How does one get
> hard numbers?

If one were more of a ms-windows hacker, one could just measure.  I'm
having to go by reports from folk who are comparing "usbnet" throughput.

The last number I got had Linux faster (by what I recall as a bit more
than 10% on a 100BaseT highspeed link) ... with Linux just using basic
URB queueing, and Windows using some fancy batching scheme to cope with
the higher latencies.  Lots of Windows drivers do that -- as does RNDIS,
though there's a "one packet at a time" mode too.  Batching always
bothered me, since it's not free to do (or undo) and there are some
nasty implications (like, how long to wait before wrapping up a given
batch and sending it along) that could just as well be handled by the
HC pushing out whatever data's available.
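
For reference, by "basic URB queueing" I just mean keeping several URBs
in flight on the endpoint so the HC always has work.  A minimal sketch,
not usbnet's actual code -- RX_QUEUE_LEN, RX_BUF_SIZE, my_rx_complete,
and my_queue_rx_urbs are all made-up names:

#include <linux/usb.h>
#include <linux/slab.h>

#define RX_QUEUE_LEN    4       /* hypothetical queue depth */
#define RX_BUF_SIZE     1514    /* one ethernet frame */

static void my_rx_complete(struct urb *urb, struct pt_regs *regs)
{
        /* consume urb->transfer_buffer, then resubmit ... */
}

static int my_queue_rx_urbs(struct usb_device *udev, unsigned pipe)
{
        int i, status;

        for (i = 0; i < RX_QUEUE_LEN; i++) {
                struct urb *urb = usb_alloc_urb(0, GFP_KERNEL);
                void *buf = urb ? kmalloc(RX_BUF_SIZE, GFP_KERNEL) : NULL;

                if (!buf) {
                        usb_free_urb(urb);      /* NULL-safe */
                        return -ENOMEM;
                }
                usb_fill_bulk_urb(urb, udev, pipe, buf, RX_BUF_SIZE,
                                my_rx_complete, NULL);
                status = usb_submit_urb(urb, GFP_KERNEL);
                if (status) {
                        kfree(buf);
                        usb_free_urb(urb);
                        return status;
                }
        }
        return 0;
}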


> > > I don't see any way around it, though.  Yes, I would like to take a
> > > lot of the stuff now done with interrupts disabled (which isn't quite
> > > the same as under the spinlock) and let it run with interrupts
> > > enabled.  Remember, however, that ->enqueue() can be called with
> > > interrupts already disabled!
> > 
> > So?  We've talked about other ways to reduce the amount of time spent
> > with IRQs disabled.  (Including not using memory poisoning, and other
> > reductions of TD allocation costs.  As well as less invasive ways to
> > reduce TD allocations.)
> 
> I'm convinced to the extent that I'll hold off on the tasklet until after 
> these other changes have been made.  If they can improve the timings 
> enough that there's no need for a tasklet, then so much the better.

OK, then I'll be happy.  For a while, anyway! :)


> > So at any rate, interrupt transfers would be incurring TWO additional
> > tasklet-induced latencies, which could be driver-visible.  One between
> > IRQ and tasklet running, for the completion.
> 
> I don't believe this delay would be very long.  Aren't tasklets run every
> time the processor returns from an interrupt?  That's not going to take
> much time.  Only the time needed to run higher-priority tasklets, which 
> seems reasonable to me.

Or other IRQs, etc.  I punted the detailed analysis, but briefly:

  * OHCI typically gets a completion (WDH) IRQ at the start of a frame,
    then if there's anything on the control or bulk queues it'll do
    that for 10% of a frame and then start periodic processing.  So it's
    got about 100 usec ... and when I last measured, it had no problem
    processing an interrupt transfer completion and reissuing it so that
    a 1/msec poll rate would work.  (Clearly, load dependent.)

  * EHCI typically gets completion IRQs every microframe (as needed, but
    this is a tuning thing) which means it usually has up to 850 usec to
    satisfy the 1/msec polling rate (again with a single non-queued URB).

As I understand what UHCI hardware does, it won't do as well in either of
those cases.  But certainly for OHCI, that 100 usec available to handle
the IRQ (at least so far as getting the URB back onto the schedule goes)
is a bit tight; anything that stretches out the latency would hurt the
ability to do back-to-back transfers with a resubmit in between.
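
Concretely, the pattern at risk is the usual one where the completion
handler resubmits the same URB inline.  A sketch, with the 2.6-era
completion signature (my_intr_complete is a hypothetical name):

#include <linux/usb.h>

static void my_intr_complete(struct urb *urb, struct pt_regs *regs)
{
        int status;

        if (urb->status)        /* unlinked, babble, stall, ... */
                return;

        /* ... process urb->transfer_buffer ... */

        /* resubmit right away, from interrupt context, so the URB
         * makes it back onto the schedule inside that ~100 usec
         * window and the 1/msec poll rate holds up
         */
        status = usb_submit_urb(urb, GFP_ATOMIC);
        if (status)
                printk(KERN_ERR "intr resubmit failed, %d\n", status);
}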


> > Although I've always recommended that periodic transfers maintain queues
> > on the endpoint if they're going to defend against increased latencies,
> > usually it's just ISO drivers doing that.  Any driver that potentially
> > uses interrupt transfers with fast transfer intervals could notice the
> > difference if both completion and submit processing go to tasklets, which
> > increases software-visible latencies.
> 
> The delay for resubmission by a completion handler also won't be very 
> long.  After all, the completion handler is called _by the tasklet_, so 
> the tasklet is already running.  There won't be any overhead for starting 
> it up.

As I said, that depends on how it's coded; you're assuming some merging.
But you're also not denying there'd be additional latency added on
the resubmit paths.
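
To be concrete about what I understand is being proposed -- roughly this
sketch, where my_hcd_irq and my_finish_urbs are hypothetical and only
the tasklet calls are real API:

#include <linux/interrupt.h>

static void my_finish_urbs(unsigned long data);
static DECLARE_TASKLET(my_done_tasklet, my_finish_urbs, 0);

static irqreturn_t my_hcd_irq(int irq, void *_hcd, struct pt_regs *regs)
{
        /* ack the controller and collect finished TDs onto a done
         * list, but don't give back any URBs here ...
         */
        tasklet_schedule(&my_done_tasklet);
        return IRQ_HANDLED;
}

static void my_finish_urbs(unsigned long data)
{
        /* URB givebacks (and any inline resubmits) now run one
         * softirq later than they used to -- that's the added
         * latency on both the completion and resubmit paths
         */
}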


>         There's no reason the
> driver can't make its own copy of the struct pt_regs and pass a pointer to
> the copy to a completion handler later on.  (Not a copy of the original
> pointer; that would be useless, as you said.)

Nothing except the fact that it'd be invalid (i.e. not current) at that time.
Better to just pass a null pointer.
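
That is, once the giveback has moved to a tasklet, the honest thing
looks like this sketch (my_giveback is a hypothetical name):

#include <linux/usb.h>

static void my_giveback(struct urb *urb)
{
        /* any pt_regs saved at hard-IRQ time would describe CPU state
         * that's long gone by softirq time, so don't pretend otherwise
         */
        urb->complete(urb, NULL);
}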


> > > ... why should the driver care if the queue is still running?  
> > > How could it even tell? 
> > 
> > By data stream corruption when the hardware keeps processing the
> > requests.  Yes, tricky to detect directly, and not common ... the
> > point of the current (pre-patch) API was to preclude such trouble
> 
> But if the HCD works correctly there won't be any data stream corruption.  
> Certainly processing URBs 1-14 won't cause corruption, if URB 15 was 
> the one that got unlinked.

Corruption would be 1-14 followed by 16, with 15 missing ... possible
if the queue wasn't stopped.


> > > All it cares about is that the queue should stop 
> > > before the hardware reaches URB 16.  If the HCD can make that guarantee
> > > without stopping the queue, why shouldn't it?
> > 
> > "If the queue can be stopped without stopping the queue..." ????
> 
> The idea I'm trying to get across here is that the queue doesn't have to 
> be stopped _at the time the completion handler runs_.  It suffices to stop 
> the queue any time before the hardware reaches the unlinked URB.  If the 
> HCD can do so safely, and if that time doesn't occur until after the 
> completion handler has finished, then the handler can run while the queue 
> is active.

That sounds more sensible, though ...


> That's what my documentation change was intended to convey.  The old text 
> said that the queue would be stopped when the handler runs.  The new text 
> gives the HCD more leeway; it doesn't have to stop the queue ASAP but only 
> when stopping becomes necessary.

... I guess I still don't see how the HCD could guarantee that
without first stopping the queue ... if it's running, it could end
up running past that point.


> > > That's what I said: Don't report the fault completion right away when an
> > > URB is unlinked; hold it back until the queue stops and all the unlinked
> > > URBs are returned.  Only the "don't report right away" part is new.  The
> > > existing code already takes care of completion reports when the queue
> > > stops; no additional mechanism is needed there (contrary to what you
> > > said).
> > 
> > But there _is_ a new mechanism you want to require, as you said.
> > Holding back all fault completions iff an unlink is pending.
> 
> That's right.  Your case 3b needs something like this if it is to follow 
> the API's guidelines.  The way the HCDs currently handle it doesn't agree 
> with either the old or the new documentation.

Which was being addressed separately ... not entirely to my
satisfaction, but safely for now.  I think we agree that some
changes closer to usbcore/hcd are needed for that case.  Just
what those changes should be is a different issue.  :)
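
If it helps pin down the disagreement, the mechanism I understood you
to be proposing looks something like this sketch (all names here are
hypothetical, not current HCD code):

#include <linux/usb.h>
#include <linux/list.h>

struct my_ep_state {
        struct list_head        held_urbs;      /* faulted, giveback deferred */
        unsigned                unlink_pending:1;
        unsigned                queue_stopped:1;
};

static void my_urb_done(struct my_ep_state *ep, struct urb *urb)
{
        /* hold back fault completions while an unlink is pending,
         * until the queue has actually stopped
         */
        if (urb->status && ep->unlink_pending && !ep->queue_stopped) {
                list_add_tail(&urb->urb_list, &ep->held_urbs);
                return;
        }
        /* ... normal giveback path; flush held_urbs once the queue
         * stops ...
         */
}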

- Dave


