On Friday 15 April 2005 1:58 pm, Alan Stern wrote:
> On Fri, 15 Apr 2005, David Brownell wrote:
> > 
> > Could you summarize what tools you used to generate those numbers?
> > Like what kind of driver(s) were active, with what kind of loads.
> > Audio?  Storage?  Networking?  How about other statistics, like
> > minimum, mean, and standard deviation?
> 
> No special measures were taken.  This was done on two ordinary
> workstations.  Networking was up on the P4 but not on the P2.  No user
> programs running other than the shell and the normal background daemons,
> none of which did any USB activity (in particular haldaemon was off).  I
> used usb-storage (with debugging turned off, although that shouldn't
> matter much).  The P4 has EHCI controllers but ehci-hcd wasn't loaded --
> otherwise the test device wouldn't have used uhci-hcd!

So basically this was a usb-storage measurement.  That's probably a
worst case from the HCD perspective, since virtually everything else
uses only very short queues.  (The nearest exceptions are "usbnet",
which at full speed still queues less than that, and usb audio, which
usually keeps only two ISO transfers of a handful of msec each.)  Only
usb-storage routinely queues more than a dozen KBytes or so (at full
speed).

 
> (A lot of kernel debugging features, like cache poisoning, were turned on 
> since that's how I normally do my development.  They may have had a 
> significant impact.)

Cache/memory poisoning certainly would; I've seen it.  Of course, I
normally leave that on too!  Though thankfully it's been ages since
I've had to chase a bug where an HC was using memory after it was freed
by an HCD.  Remember back a few years when that was sadly common?  :)

 
> I didn't keep any statistics other than what you see above, and I only
> ran the test a few times.  It's possible that the numbers are incorrect 
> because, as I realized later, I stored the initial timer value immediately 
> before calling spin_lock_irqsave instead of immediately after.  I can do 
> it over again if you want.
> 
> > It'd also be interesting to compare them for OHCI and EHCI.  I'd
> > expect UHCI would be worse, because of the TD-per-packet thing,
> > but also having some common baselines would be good.
> 
> Would you like to see my test code?  I'll send it to you off-list if you 
> want -- not because it's big but because it's so ugly.  It should be easy 
> enough to adapt it to OHCI and EHCI.

Sure, please do.  It'd be worth gathering statistics at the usbcore
level, IMO, for numbers that are directly comparable.


> > Heck, even just the usbcore/hcd hooks to let the HCDs cache a list of TDs
> > onto the URB would help, without needing any new API... so the invasive
> > changes could be invisible (at first) to device drivers.  TDs could be freed
> > to the per-urb list, and on some architectures (like x86) the re-enqueue
> > path might well be able to use cache-hot memory.
> 
> I'm not sure what would be the best/easiest approach.  Preallocating TDs 
> may not be good if the URB is going to live for a long time.

It'd be "good" in the sense of "when that URB is used, it'll have TDs
available".  The "not good" would be limited to memory from that dma_pool
not being easily shared ... a non-issue unless urbs sit idle.
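
Concretely, something like this rough sketch is what I had in mind --
note that the "td_cache" list hanging off the urb is hypothetical (no
such field exists today), and the uhci_td layout is trimmed way down:

        /* Hypothetical per-urb TD cache:  instead of dma_pool_free()
         * on completion, park the TD on the urb so the next submit
         * through that urb can reuse it without touching the pool.
         */
        struct uhci_td {
                /* hardware fields omitted */
                struct list_head        list;
                dma_addr_t              dma_handle;
        };

        /* would live in struct urb (or behind urb->hcpriv) */
        struct list_head        td_cache;       /* free TDs for this urb */

        static void td_put(struct urb *urb, struct uhci_td *td)
        {
                /* LIFO:  the most recently completed TD goes on the
                 * front, so it's the first one handed back out.
                 */
                list_add(&td->list, &urb->td_cache);
        }

Presumably the main invasive bits would be teaching usbcore to init
that list when the urb is created and to drain it back into the
dma_pool when the urb is destroyed; device drivers wouldn't see any
of it.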


> And it's not  
> clear how much of the time for enqueue is spent _allocating_ the TDs as 
> opposed to _preparing_ them.

With memory/cache poisoning, every allocate and free writes over the
whole TD.  So the cost of allocating a new one from a dma_pool will be
more than the cost of initializing one with data (since the pool also
has to find a TD to allocate).

Ergo my observation that a freelist would be quicker.  It can also be
used to prioritize allocating TDs that are already cache-hot.
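
Continuing that sketch, the allocation side would just pop the front of
the list and only fall back to the dma_pool when the cache is empty
(again, the names are illustrative):

        static struct uhci_td *td_get(struct uhci_hcd *uhci, struct urb *urb)
        {
                struct uhci_td *td;
                dma_addr_t dma;

                if (!list_empty(&urb->td_cache)) {
                        /* cheap, and this TD is probably still cache-hot */
                        td = list_entry(urb->td_cache.next,
                                        struct uhci_td, list);
                        list_del(&td->list);
                        return td;
                }

                /* slow path:  the pool has to hunt for a free TD, and
                 * with poisoning enabled it writes the whole thing too
                 */
                td = dma_pool_alloc(uhci->td_pool, GFP_ATOMIC, &dma);
                if (td)
                        td->dma_handle = dma;
                return td;
        }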


> > Alternatively, a per-endpoint cache of TDs might be even better ... less
> > invasive to usbcore.  That wouldn't help with urb-private data, but for
> > HCDs that need those it'd still just be a single kmalloc/free per submit.
> > That might facilitate addressing the UHCI-specific "lots of TDs" issue.
> > (By a scheme I once sketched:  only URBs to the front of the queue would
> > need TDs allocated, and as TDs get freed they could be mapped onto URBs
> > towards the end.  That'd put a ceiling on the enqueue costs, which is a
> > fine thing from real-time perspectives...)
> 
> This is one of those changes I mentioned earlier.  It shouldn't be
> necessary to have more than, say, 500 TDs allocated for an endpoint at any
> time.  That's about 26 ms worth, or 31 KB of data.  So long as a
> completion interrupt is issued every 200 TDs, it should work fine.

I think this approach is the one I'd take if I had time to do that kind
of work.  I suspect it'd be a much bigger win with UHCI though!
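
If someone does pick it up, I'd expect the core of the refill logic to
look roughly like this.  Pure sketch:  TD_LIMIT, "struct uhci_ep", and
all the helpers below are made up for illustration, with Alan's 500/200
numbers just plugged in.

        /* Keep at most TD_LIMIT TDs queued per endpoint; as completions
         * free TDs, map more of the pending urbs' data onto new ones.
         * Setting the IOC bit every TD_IOC_INTERVAL TDs keeps the
         * refilling going.
         */
        #define TD_LIMIT                500     /* ~26 ms / ~31 KB of full speed bulk */
        #define TD_IOC_INTERVAL         200

        static void ep_refill_tds(struct uhci_hcd *uhci, struct uhci_ep *ep)
        {
                struct urb *urb;
                struct uhci_td *td;

                while (ep->td_count < TD_LIMIT &&
                                (urb = next_unmapped_urb(ep)) != NULL) {
                        td = td_get(uhci, urb);
                        if (!td)
                                break;
                        map_next_packet(urb, td);       /* buffer + token bits */
                        if (++ep->td_count % TD_IOC_INTERVAL == 0)
                                td_set_ioc(td);         /* request an interrupt */
                        link_td(ep, td);                /* hand it to the HC */
                }
        }

        /* called from the completion path after unlinking a finished TD */
        static void ep_td_done(struct uhci_hcd *uhci, struct uhci_ep *ep,
                        struct urb *urb, struct uhci_td *td)
        {
                ep->td_count--;
                td_put(urb, td);                /* back onto the per-urb cache */
                ep_refill_tds(uhci, ep);        /* keep the queue topped up */
        }

That's where the ceiling on enqueue cost would come from:  submit never
maps more than TD_LIMIT TDs' worth, and the rest gets mapped
incrementally from the completion path.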

- Dave


