On Tue, May 22, 2001 at 09:35:52AM -0700, David Brownell wrote:
> So in many typical cases, multiple endpoints are involved.

Yes.  But the number of endpoints involved varies, depending on the
device.

> > Data is always passed via bulk endpoint.  The SCSI layer will allocate the
> > scatter-gather segments for me, and those can vary significantly.  Initial
> > performance tests suggest that memory allocation is _much_ faster (read 4-6
> > times) when smaller segments are used.  That stat is based on the current
> > codebase -- when the number of segments is increased (and thus the size of
> > each decreases), total throughput jumps dramatically. A "big" segment is
> > 4K, a small segment is 512 bytes.
> 
> That jump being because the SCSI layer spent less time allocating?  Or is
> that "8 times the data for 4-6 times the cost"?

The jump is from the allocation.  It's the same amount of data either
way; the larger segments just cost 4-6 times as much to allocate.

> > Currently, since I need to maintain the synchronization between endpoints
> > manually, I handle each URB individually.
> 
> Are C/B* devices required to ignore (NAK) bulk transfers until they get
> the control command?  If so, software synch there can be avoided...
> Otherwise I think it'll be hard to avoid that ... the HC will have to
> finish transferring the command before anything can submit the data.

There is no such requirement.  Some devices will crash if you sequence
things that way.

My hope was to exploit the ->next pointer: the data stage URB doesn't
get submitted until _after_ the command stage URB has completed.
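
Something like this is what I have in mind -- a sketch against my memory
of the 2.4 API, with "setup", "cmd_buf", "bulk_in_ep", "us", and
"last_urb_complete" all stand-ins:

    struct urb *cmd_urb  = usb_alloc_urb(0);
    struct urb *data_urb = usb_alloc_urb(0);

    /* command stage: goes out over the default control pipe */
    FILL_CONTROL_URB(cmd_urb, dev, usb_sndctrlpipe(dev, 0),
                     setup, cmd_buf, cmd_len,
                     NULL, NULL);          /* no handler needed here */

    /* data stage: bulk-in, carrying the only handler in the chain */
    FILL_BULK_URB(data_urb, dev, usb_rcvbulkpipe(dev, bulk_in_ep),
                  data_buf, data_len, last_urb_complete, us);

    /* the point of ->next: data_urb shouldn't hit the wire until
     * cmd_urb has retired */
    cmd_urb->next = data_urb;

    usb_submit_urb(cmd_urb);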

> >      This works well when I've only
> > got one command to deal with at a time, but I'd like to be able to handle
> > multiple commands in the queue to improve performance.  This only makes
> > sense when I can use my CPU time to construct the URB chains ahead of time,
> > and submit them all at once, letting the DMA hardware take care of that
> > series.  Note that I don't actually need any completion handler code except
> > for the last (and possibly next-to-last in _some_ data-in cases) URB.  The
> > URBs really do take care of themselves.  The problem is, they're not all
> > bulk transfers.
> 
> But any one of them can get an error ... I don't quite see how the URBs can
> take care of themselves that much.

Right... but what errors?  The endpoint can stall in the data stage,
retiring all the data stage URBs and accelerating us to the status stage.
The device can be removed, which retires all the URBs.  Short packets won't
happen, and if they do, we're so hosed anyway that killing the device is
the only option.... 
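
In code, that's about all the one handler on the final URB would have to
distinguish (names invented; the error codes are the usual conventions
as I understand them):

    static void last_urb_complete(struct urb *urb)
    {
        /* "us" is the usual per-device struct -- made up here */
        struct us_data *us = (struct us_data *) urb->context;

        switch (urb->status) {
        case 0:          /* whole chain retired cleanly */
            break;
        case -EPIPE:     /* a STALL retired the rest of the chain;
                          * clear the halt, fail the command */
            break;
        case -ENOENT:    /* we unlinked the chain ourselves */
        case -ENODEV:    /* device gone; every URB gets retired */
            break;
        default:         /* short packet or worse -- hosed anyway,
                          * kill the device */
            break;
        }

        up(&us->notify); /* wake whoever queued the command */
    }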

> I _do_ see how you can take one SCSI command, create all its URBs,
> and start processing those all at once ... reducing today's scheduling
> overheads by (A) doing more inter-transfer work in completion handlers,
> and (B) using bulk queuing for the "data" stage of the transaction.

Here's a question then:  Using QUEUE_BULK, what happens when the endpoint
STALLs?

> > Here's what I'd like in my dream world:  I allocate a largeish pool of URBs
> > at init time.  I then take a command off the command queue and allocate
> > URBs to handle it, and submit them. 
> 
> I can see all of that working with today's URB submission API, as sketched
> above.  Though I'm thinking the "submit" would be to a small state machine
> engine used only by usb-storage, which talks to usbcore as needed.  (Those
> synchronous control/bulk submission routines would vanish ...)

Doesn't it seem a bit silly to bury a custom state machine in
usb-storage, when more drivers might be able to use it?

Yes, it could be done your way (assuming that STALLs affect things in a
sane way, that submitting URBs from a completion handler is okay, and
that more than two URBs can be queued on a bulk endpoint).
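
To make the dream concrete (everything here invented, and leaning on
exactly those assumptions):

    #define US_URB_POOL_SIZE 64

    static struct urb *urb_pool[US_URB_POOL_SIZE];

    /* init time: allocate once, never in the I/O path */
    static int us_pool_init(void)
    {
        int i;

        for (i = 0; i < US_URB_POOL_SIZE; i++) {
            urb_pool[i] = usb_alloc_urb(0);
            if (!urb_pool[i])
                return -ENOMEM;
        }
        return 0;
    }

    /* per command: chain the whole command/data/status sequence up
     * front, then make a single submission and let the hardware
     * run the series */
    static void us_queue_command(struct urb *cmd_urb,
                                 struct urb **data_urbs, int n,
                                 struct urb *status_urb)
    {
        int i;

        cmd_urb->next = data_urbs[0];
        for (i = 0; i < n - 1; i++)      /* one URB per S-G segment */
            data_urbs[i]->next = data_urbs[i + 1];
        data_urbs[n - 1]->next = status_urb;

        status_urb->complete = last_urb_complete;  /* the only handler */

        usb_submit_urb(cmd_urb);
    }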

Matt

-- 
Matthew Dharm                              Home: [EMAIL PROTECTED] 
Maintainer, Linux USB Mass Storage Driver

Oh great modem, why hast thou forsaken me?
                                        -- Dust Puppy
User Friendly, 3/2/1998
