On Tue, Nov 12, 2013 at 06:59:02PM +0100, Martin Sperl wrote:

> First, sorry for the bad formatting in the last email - it looked OK when
> typing it with Thunderbird... I even used the recommended plugin to make
> it wrap correctly (still format=flowed).
Like I say, I'm still reading that, but a couple of initial thoughts:

> As for other interesting measurements, a single example with 5 transfers:
>
>   Interrupt to __spi_async:      19us
>   __spi_async sanity start/end:   2us
>   __spi_async to dma_prepare:    99us
>   dma_prepare start/end:         40us
>   dma_prepare end to CS down:     4us
>   CS down to CS up:              16us (real transfer)

This is making me question the use of DMA at all here - this looks like
the situation a lot of drivers are in where they switch to PIO mode for
small transfers since the cost of managing DMA is too great.

I'm also curious which parts of the DMA preparation are expensive - is it
building the data structures for the DMA or is it dealing with the
coherency issues for the DMA controller?  The dmaengine API currently
needs transfers rebuilding each time, I believe...  Also, how does this
scale for larger messages?

I appreciate that you want to push the entire queue down into hardware;
I'm partly thinking of the costs for drivers that don't go and do any of
the precooking here.

> This is actually one of the biggest "factors" in latency on its own,
> and that is the scheduling delay of the message pump!

Right, which is why I'm much more interested in refactoring that code to
support kicking off transfers without bouncing up to the thread - the
stuff I was talking about which made you mention checksumming and so on.

> So the "threaded" approach - even though nice on paper - actually
> introduces long latencies.  That is why I would like to see how a fully
> DMA-pipelined driver would fare - my guess is that the delays would be
> much smaller.  It will not change the delay from interrupt to running
> the handler, but the scheduling of the message pump would go away
> immediately.

Exactly, but this is largely orthogonal to having the client drivers
precook the messages.
You can't just discard the thread, since some things like clock
reprogramming can sleep, but we should be using it less (and some of the
use that is needed can run in parallel with the transfers of other
messages if we build up a queue).

> So I hope that these are hard enough facts for you to say that
> "preparing" messages that do not "change in structure" in a device
> driver _does_ bring an advantage.

The 40us is definitely somewhat interesting, though I'd be interested to
know how that compares with PIO too.
