Hi Mark!

The problem is that you intend to do prepare/unprepare transparently.

OK - but I see problems with the heuristics for deciding when it is safe to
assume that a message has not changed since the last call (either the
transfer list or the tx/rx_buf pointers). Only if it has not changed can we
reuse the cached computations from the previous call (equivalent to having
it prepared).

Figuring out whether the message has changed consumes too much CPU, and
that is - as far as I understand - the main reason we want the option to
explicitly prepare/unprepare a message: saving CPU cycles.

See my "quick&dirty" modification to __spi_async for how prepare could help
avoid spending CPU cycles even in "normal" cases on HW where preparing
brings no benefit other than avoiding the sanity checks on the message
every time.

As said: running a single transfer 10000 times/s will execute the following code each time:
        /* Half-duplex links include original MicroWire, and ones with
         * only one data pin like SPI_3WIRE (switches direction) or where
         * either MOSI or MISO is missing.  They can also be caused by
         * software limitations.
         */
        if ((master->flags & SPI_MASTER_HALF_DUPLEX)
                        || (spi->mode & SPI_3WIRE)) {
                unsigned flags = master->flags;

                list_for_each_entry(xfer, &message->transfers, transfer_list) {
                        if (xfer->rx_buf && xfer->tx_buf)
                                return -EINVAL;
                        if ((flags & SPI_MASTER_NO_TX) && xfer->tx_buf)
                                return -EINVAL;
                        if ((flags & SPI_MASTER_NO_RX) && xfer->rx_buf)
                                return -EINVAL;
                }
        }

        /**
         * Set transfer bits_per_word and max speed as spi device default if
         * it is not set for this transfer.
         */
        list_for_each_entry(xfer, &message->transfers, transfer_list) {
                if (!xfer->bits_per_word)
                        xfer->bits_per_word = spi->bits_per_word;
                if (!xfer->speed_hz)
                        xfer->speed_hz = spi->max_speed_hz;
                if (master->bits_per_word_mask) {
                        /* Only 32 bits fit in the mask */
                        if (xfer->bits_per_word > 32)
                                return -EINVAL;
                        if (!(master->bits_per_word_mask &
                                        BIT(xfer->bits_per_word - 1)))
                                return -EINVAL;
                }
        }

10000 times/s.

With a message "prepared" by the driver instead, the message would get
prepared (in this case better called: sanity-checked) once, and from then on
this piece of code would not run at all, which would reduce CPU load immediately.
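To illustrate the idea, here is a minimal standalone sketch (NOT the kernel API - `fake_msg`, `fake_validate` and `fake_async` are made-up names): a "prepared" flag on the message lets the per-transfer sanity checks run exactly once instead of on every submission.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch, not kernel code: a "prepared" flag on the
 * message makes the expensive sanity checks run exactly once. */
struct fake_msg {
	int bits_per_word;
	bool prepared;		/* set once the message has been validated */
};

static int validations_run;	/* counts how often the checks execute */

static int fake_validate(struct fake_msg *m)
{
	validations_run++;
	if (m->bits_per_word > 32)
		return -1;	/* would be -EINVAL in the kernel */
	return 0;
}

/* Submission path: skip the checks for an already-prepared message. */
static int fake_async(struct fake_msg *m)
{
	if (!m->prepared) {
		int ret = fake_validate(m);

		if (ret)
			return ret;
		m->prepared = true;
	}
	/* ...hand the message to the controller queue... */
	return 0;
}
```

Submitting the same message 10000 times would then run the validation loop once instead of 10000 times.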

But if you have to loop over the message to see whether it has changed (by
calculating a checksum or comparing against something stored earlier), then
you have lost the opportunity to optimize away the above code...
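A sketch of that alternative (again hypothetical, not kernel code - `fake_xfer` and `fake_checksum` are invented for illustration): detecting "has the message changed?" via a checksum still walks every transfer in the list, i.e. the same O(n) loop the cached validation was supposed to save.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical change detection by checksumming the transfer list.
 * Note the full list walk on EVERY call - the cost we wanted to avoid. */
struct fake_xfer {
	const void *tx_buf;
	void *rx_buf;
	uint32_t speed_hz;
	struct fake_xfer *next;
};

static uintptr_t fake_checksum(const struct fake_xfer *x)
{
	uintptr_t sum = 0;

	for (; x; x = x->next) {	/* O(n) on every submission */
		sum ^= (uintptr_t)x->tx_buf;
		sum ^= (uintptr_t)x->rx_buf;
		sum += x->speed_hz;
	}
	return sum;
}
```

So per-message the change check costs roughly as much as just re-running the sanity checks.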

Is the above-described scenario of 10k SPI messages/s realistic?

Yes - definitely.

The CAN driver for which I started this investigation processes about 3200
CAN messages/second. Each generates an interrupt - so 3200 interrupts, and
each sends 3 spi_messages with multiple transfers in sequence (all of which
can be - and already are - prepared...)

So we are essentially at the example above with 9600 SPI messages/s and the
code getting run 9600 times unnecessarily.

Obviously this is not so important for drivers that run maybe 10 SPI
messages/s, but it becomes an important factor at higher volumes.

That is the reason why I am so persistent about this subject and trying to get
a solution that does not require a lot of CPU cycles.

Ciao,
        Martin

P.S.: My guess is that avoiding this sanity check alone would reduce the
system load for my test case by 2-3% - down from 77% to 74% CPU.
(And I am sure I can bring it down further by moving away from the
scheduling overhead that is inherent in the "transfer_one_message" worker
interface.) Preparing the DMA structures already reduced the system load by
about 7-8%.

I can try to quantify this number if you really want, by NOT using spi_async
but instead calling
  message->spi = spi;
  message->status = -EINPROGRESS;
  status = spi->master->transfer(spi, message);
directly.

But not tonight...
