On Tue, Nov 12, 2013 at 06:59:02PM +0100, Martin Sperl wrote:

> First, sorry for the bad formatting in the last email - it looked ok when 
> typing 
> it with Thunderbird... I even used the recommended plugin to make it wrap 
> correctly (still format=flowed).

As I said, I'm still reading that; a couple of initial thoughts, though...

> As for other interesting measurements a single example with 5 transfers:
> Interrupt to __spi_async:     19us
> __spi_async sanity start/end:  2us
> __SPI_ASYNC to DMA_PREPARE:   99us
> dma_prepare start/end:        40us
> dma_prepare_end to CS DOWN:    4us
> CS DOWN to CS UP:             16us (real transfer)

This is making me question the use of DMA at all here; this looks like
the situation in a lot of drivers where they switch to PIO mode for
small transfers, since the cost of managing DMA is too great.  I'm also
curious which parts of the DMA preparation are expensive - is it the
building of the data structure for DMA, or is it dealing with the
coherency issues for the DMA controller?  The dmaengine API currently
requires transfers to be rebuilt each time, I believe...

Also how does this scale for larger messages?

I appreciate that you want to push the entire queue down into hardware;
I'm partly thinking of the cost for drivers that don't do any of the
precooking here.

> This is actually one of the biggest "factors" to latency on its own.
> And that is the scheduling delay of the message-pump!

Right, which is why I'm much more interested in refactoring that code to
support kicking off transfers without bouncing up into the thread - the
refactoring I was talking about, which prompted your comments about
checksumming and so on.

> So the "threaded" approach - even though nice on paper - is actually
> introducing long latencies. And that is why i would like to see how 
> a fully DMA pipelined driver would fare - my guess is that the delays
> would be much smaller. It will not change the delays interrupt to
> running the handler, but the scheduling of the message pump would go
> away immediately.

Exactly, but this is largely orthogonal to having the client drivers
precook the messages.  You can't just discard the thread, since some
things like clock reprogramming may need to sleep, but we should be
using it less (and some of the work that does need the thread can run in
parallel with the transfers of other messages if we build up a queue).

> So I hope that these are some hard enough facts for you to say, that 
> "preparing" messages, that do not "change in structure" in a device
> driver _does_ bring an advantage.

The 40us figure is definitely interesting, though I'd like to know how
that compares with PIO too.
