Re: Depreciated spi_master.transfer and "prepared spi messages" for an optimized pipelined-SPI-DMA-driver

Martin Sperl Sun, 10 Nov 2013 08:41:46 -0800

On 10.11.2013, at 12:05, Mark Brown wrote:
> An example of the sort of thing I'm concerned about: if I understand
> correctly one of the things the prepare and unprepare operations in the
> driver do is the DMA mapping.  This will mean that if you prepare in the
> caller the system will be able to do the DMA mapping while a prior
> transfer is in progress which should help performance.  That's a good
> thing to do but it should be something that can be done as part of the
> existing API without needing drivers using SPI to be updated.


yes - but that comes back to my argument that this is feasible to do 
transparently within the framework, but the cost of doing so would
eat away most of the possible performance gains unnecessarily and make the
situation worse for situations where there IS a change.

As explained - you would need to make an educated guess if a spi_transfer 
chain has changed or not (and I talk about the need to check the chain 
itself, speed, transferlength, rx_buff,tx_buff,... have not changed).

Doing this sort of checking will still take a few usecs per message and call.
One could do some hashing (with the risk of collisions) or keeping a second
copy of the data and do a byte wise compare of spi_message and the spi_transfer
chain. Or some other.

But still this requires processing you need to run on each spi_async call, 
independently if it has been prepared via the framework preciously or not.

And the risk of finding a false positive is non-0 (in case of hashing) and 
then you still spend spinning CPU cycles on it (not to mention the debugging
nightmare if such a situation occurs on a production system)

So you waste CPU cycles unnecessarily for something where you are unsure.

So assuming you need 1us per message for this kind of "test" if the structure 
has changed or not, and you run 3000 such SPI messages/s, then you are wasting
3ms of time.

And (again to repeat myself) you have on top of that a memory management
issue, because you then would need to garbage collect your prepared resources
for all those cases where the message just got allocated on the heap and 
discarded afterwards - there is no means in the current API to release the 
memory when releasing the memory of the spi_message. 
So to make things efficient here you would need to extend the API - but this
time it would be a mandatory change for all drivers (and I doubt you will find
EVERY instance resulting in bugs to fix down the road)

Also with this approach the "message" would possibly need to get prepared in
IRQ context (because spi_async can get called from interrupt handler) 
which would require allocation of memory in GFP_ATOMIC context just to play 
it safe in all situations. Not so good either.

I am mostly seeing this "spi_prepare_message" as a method for the driver to 
give some optimization hints to the engine to minimize "parsing" of the 
structures. 

And I would also prefer - if possible - to see the API to be able
to allocate memory from "blocking" memory pools, so the spi_prepare_message
would be allowed to run only in threaded context and may not get called from
interrupt contexts. Anyway those "typical" calls can easily get prepared 
during the module initialization phase and to avoid memory allocation in 
interrupt handlers and callbacks...

Ok - I have tried to explain my concerns again.

So if you still think you can do the things requested via the framework, then

Here a "quick" example where the "prepared messages" would reduce CPU overhead
immediately without any bus-driver specific implemention:

static int __spi_async(struct spi_device *spi, struct spi_message *message)
{
        struct spi_master *master = spi->master;
        struct spi_transfer *xfer;
+       /* run the below computations only if our message has not been changed 
*/
+       if (__spi_message_has_changed(spi,message)) {
                /* Half-duplex links include original MicroWire, and ones with
                 * only one data pin like SPI_3WIRE (switches direction) or 
where
                 * either MOSI or MISO is missing.  They can also be caused by
                 * software limitations.
                 */
                if ((master->flags & SPI_MASTER_HALF_DUPLEX)
                                || (spi->mode & SPI_3WIRE)) {
                        unsigned flags = master->flags;
        
                        list_for_each_entry(xfer, &message->transfers, 
transfer_list) {
                                if (xfer->rx_buf && xfer->tx_buf)
                                        return -EINVAL;
                                if ((flags & SPI_MASTER_NO_TX) && xfer->tx_buf)
                                        return -EINVAL;
                                if ((flags & SPI_MASTER_NO_RX) && xfer->rx_buf)
                                        return -EINVAL;
                        }
                }

                /**
                 * Set transfer bits_per_word and max speed as spi device 
default if
                 * it is not set for this transfer.
                 */
                list_for_each_entry(xfer, &message->transfers, transfer_list) {
                        if (!xfer->bits_per_word)
                                xfer->bits_per_word = spi->bits_per_word;
                        if (!xfer->speed_hz)
                                xfer->speed_hz = spi->max_speed_hz;
                        if (master->bits_per_word_mask) {
                                /* Only 32 bits fit in the mask */
                                if (xfer->bits_per_word > 32)
                                        return -EINVAL;
                                if (!(master->bits_per_word_mask &
                                                BIT(xfer->bits_per_word - 1)))
                                        return -EINVAL;
                        }
                }
+               /* call spi_prepare_message */
+               if (master->prepare_message) {
+                       ret=master->prepare_message(master,message);
+                       if (ret)
+                               return ret;
+               }
+       }
        message->spi = spi;
        message->status = -EINPROGRESS;
        return master->transfer(spi, message);
}

So how would you implement the __spi_message_has_changed function in a manner 
that is performance efficient and worth called in a "hot-path"?

And how would you handle the garbage collection required for the "transparent 
case"
you are trying to get.

Please stop wishfull thinking, instead provide alternative ideas how to solve 
the issue.

Please provide a solution that works as simple as:
/* in the driver init code - if you want to "improve" the latency of spi_async 
*/
spi_prepare_message(spi,message);
/* in the driver release code - if you want to "improve" the latency of 
spi_async */
spi_unprepare_message(spi,message);

/* and the "check" itself */
static int __spi_message_has_changed(spi,message) 
{ 
        return (! message->is_driver_prepared); 
}

So what about the performance measurements I have mentioned?
Do you have any concerns with the described approach and the metrics proposed?
You want me to measure and publish those results?

Martin--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Depreciated spi_master.transfer and "prepared spi messages" for an optimized pipelined-SPI-DMA-driver

Reply via email to