Re: Continuous streaming SPI transfer

2012-10-04 Thread Mark Brown
On Fri, Sep 28, 2012 at 01:09:25AM +0300, Nuutti Kotivuori wrote:

 There seems to be no way to prevent the deactivation and reactivation of
 the clock and everything between separate transfers - and a single
 transfer is bounded in size and no progress is reported for it. Even
 within a single transfer, it would seem that an earlier transfer is

This sounds more like you've got an IIO application (or audio) than a
SPI one - the hardware is the same but you're thinking about it in a
completely different way and so a separate subsystem makes sense.

___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: Continuous streaming SPI transfer

2012-09-29 Thread Nuutti Kotivuori
Ned Forrester nforres...@whoi.edu writes:
 On 09/27/2012 06:09 PM, Nuutti Kotivuori wrote:
 I would like to use SPI in a streaming fashion, with a transfer being
 active all the time. This seems very difficult with the current kernel
 drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but
 I have also tried to find similar features from other drivers.

 You don't say whether you mean to transfer into or out of the external
 device (or both). Nor do you say what clock rate you want to use (what's
 the average bit/byte rate?).

I mean to transfer both in and out, but the clock rate I need is not very
high, 100 kHz should be fine, 250 kHz should be plenty.

 I am not familiar with the hardware you want to use, but I have done
 this on a PXA255 processor, at 11Mbit/sec from the external device to
 the PXA processor.  For this I had to extensively modify the pxa2xx_spi
 controller driver (that was with kernel 2.6.20, I think the name has
 changed in recent kernels), and, of course, to write my own protocol
 driver (the glue between the kernel side of Clib and the controller
 driver).  I did not have to make any changes to the SPI core.

 NOTE WELL: Please don't ask for this code, I cannot share it.

That sounds like a *lot* of effort, something beyond what I'm willing to
put in.

 To overcome this problem, I ran the PXA processor as a slave, with the
 clocks supplied from external hardware.  The problem then becomes
 keeping the receive FIFO from overflowing.  That was solved by enabling
 chained DMA transfers (at enormous driver-writing expense; I'd never
 written a device driver before).

[...]

 I don't think you can achieve what you want without a dedicated
 effort, because the SPI model is not intended for this type of
 application.

I agree, this is so for your data rate. However, for my 250 kHz, things
are much simpler.

Because of my own investigations and your answer, I decided to forgo the
kernel route altogether. Instead, I used the bcm2835 library from
userspace as the peripherals are simply mapped to a fixed address space.

With this, my main loop is simply

  while (!exit) {
    uint32_t state = bcm2835_peri_read(spi_cs);
    if (state & BCM2835_SPI0_CS_RXF)
      fprintf(stderr, "RX FIFO full!\n");
    if (state & BCM2835_SPI0_CS_DONE)
      fprintf(stderr, "TX FIFO empty!\n");
    if (state & BCM2835_SPI0_CS_TXD)
      bcm2835_peri_write_nb(spi_fifo, producebyte());
    if (state & BCM2835_SPI0_CS_RXD)
      processbyte(bcm2835_peri_read_nb(spi_fifo));
    if (!(state & (BCM2835_SPI0_CS_TXD | BCM2835_SPI0_CS_RXD)))
      nanosleep(&loop_wait, NULL);
  }

To explain, I simply loop in a userspace (but realtime scheduled)
process, writing to the TX FIFO if the TX FIFO can accept more input and
reading from the RX FIFO if the RX FIFO has any bytes to read. If there
is neither, I sleep for 300 microseconds.

This piece of code can easily keep both RX and TX FIFOs well tended at
250 kHz - and I managed to get it stable at even 2 MHz while testing. The
solution does not use interrupts for anything (as it can't), but since
the data rate is fixed a simple timer is not much worse. The 300
microsecond sleep version uses about 7% of the CPU, and that is easily
acceptable for me.

I see no fundamental reason why a similar mode of operation could not be
supported for /dev/spidev or inside the kernel - much more efficiently -
and I would very much like to see something like this implemented.

However, for my use case I have total control of the runtime environment
and I can easily make sure no other software touches the SPI or the pins
I need - and I stand to gain nothing by implementing this inside the
kernel - so I will stick with the userland mmap() /dev/mem hack for the
time being.

Thank you for the answer, it was very enlightening.

-- Naked



Re: Continuous streaming SPI transfer

2012-09-29 Thread Ned Forrester
On 09/29/2012 04:20 PM, Nuutti Kotivuori wrote:
 Ned Forrester nforres...@whoi.edu writes:
 On 09/27/2012 06:09 PM, Nuutti Kotivuori wrote:
 I would like to use SPI in a streaming fashion, with a transfer being
 active all the time. This seems very difficult with the current kernel
 drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but
 I have also tried to find similar features from other drivers.

 You don't say whether you mean to transfer into or out of the external
 device (or both). Nor do you say what clock rate you want to use (what's
 the average bit/byte rate?).
 
 I mean to transfer both in and out, but the clock rate I need is not very
 high, 100 kHz should be fine, 250 kHz should be plenty.
 
 I am not familiar with the hardware you want to use, but I have done
 this on a PXA255 processor, at 11Mbit/sec from the external device to
 the PXA processor.  For this I had to extensively modify the pxa2xx_spi
 controller driver (that was with kernel 2.6.20, I think the name has
 changed in recent kernels), and, of course, to write my own protocol
 driver (the glue between the kernel side of Clib and the controller
 driver).  I did not have to make any changes to the SPI core.

 NOTE WELL: Please don't ask for this code, I cannot share it.
 
 That sounds like a *lot* of effort, something beyond what I'm willing to
 put in.
 
 To overcome this problem, I ran the PXA processor as a slave, with the
 clocks supplied from external hardware.  The problem then becomes
 keeping the receive FIFO from overflowing.  That was solved by enabling
 chained DMA transfers (at enormous driver-writing expense; I'd never
 written a device driver before).
 
 [...]
 
 I don't think you can achieve what you want without a dedicated
 effort, because the SPI model is not intended for this type of
 application.
 
 I agree, this is so for your data rate. However, for my 250 kHz, things
 are much simpler.
 
 Because of my own investigations and your answer, I decided to forgo the
 kernel route altogether. Instead, I used the bcm2835 library from
 userspace as the peripherals are simply mapped to a fixed address space.
 
 With this, my main loop is simply
 
   while (!exit) {
     uint32_t state = bcm2835_peri_read(spi_cs);
     if (state & BCM2835_SPI0_CS_RXF)
       fprintf(stderr, "RX FIFO full!\n");
     if (state & BCM2835_SPI0_CS_DONE)
       fprintf(stderr, "TX FIFO empty!\n");
     if (state & BCM2835_SPI0_CS_TXD)
       bcm2835_peri_write_nb(spi_fifo, producebyte());
     if (state & BCM2835_SPI0_CS_RXD)
       processbyte(bcm2835_peri_read_nb(spi_fifo));
     if (!(state & (BCM2835_SPI0_CS_TXD | BCM2835_SPI0_CS_RXD)))
       nanosleep(&loop_wait, NULL);
   }
 
 To explain, I simply loop in a userspace (but realtime scheduled)
 process, writing to the TX FIFO if the TX FIFO can accept more input and
 reading from the RX FIFO if the RX FIFO has any bytes to read. If there
 is neither, I sleep for 300 microseconds.
 
 This piece of code can easily keep both RX and TX FIFOs well tended at
 250 kHz - and I managed to get it stable at even 2 MHz while testing. The
 solution does not use interrupts for anything (as it can't), but since
 the data rate is fixed a simple timer is not much worse. The 300
 microsecond sleep version uses about 7% of the CPU, and that is easily
 acceptable for me.
 
 I see no fundamental reason why a similar mode of operation could not be
 supported for /dev/spidev or inside the kernel - much more efficiently -
 and I would very much like to see something like this implemented.
 
 However, for my use case I have total control of the runtime environment
 and I can easily make sure no other software touches the SPI or the pins
 I need - and I stand to gain nothing by implementing this inside the
 kernel - so I will stick with the userland mmap() /dev/mem hack for the
 time being.
 
 Thank you for the answer, it was very enlightening.
 
 -- Naked

Most certainly simpler at lower clock rate.  I'm glad you found
something that works.

-- 
Ned Forrester   nforres...@whoi.edu
Oceanographic Systems Lab   508-289-2226 Office
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution  Woods Hole, MA 02543, USA
http://www.whoi.edu/
http://www.whoi.edu/page.do?pid=29856
http://www.whoi.edu/hpb/Site.do?id=1532




Continuous streaming SPI transfer

2012-09-27 Thread Nuutti Kotivuori
Hello,

I would like to use SPI in a streaming fashion, with a transfer being
active all the time. This seems very difficult with the current kernel
drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but
I have also tried to find similar features from other drivers.

There seems to be no way to prevent the deactivation and reactivation of
the clock and everything between separate transfers - and a single
transfer is bounded in size and no progress is reported for it. Even
within a single transfer, it would seem that the earlier transfer must
receive a DONE signal (TX FIFO empty, RX FIFO has everything) before a
new transfer is started, so there's always a small gap between
transfers. (The RPi hardware stops sending SCLK whenever there is no
byte ready in the FIFO to be sent, so if a DONE signal is received,
SCLK has already been stopped.)

Keeping a transfer active constantly would need some buffering on writes
and reads to keep the TX FIFO always filled and RX FIFO not
full. Otherwise I don't see any direct problems with it, at least on the
hardware level.

So, my questions are:

 1) Is there a fundamental problem in keeping an SPI transfer active for
extended periods of time (several hours)?
 2) Is this possible with the kernel API somehow?
 3) Is this possible from userland somehow?
 4) If it is not possible, what would be a good API for such a case?
 5) Would patches for such functionality be accepted?

Thank you in advance,
-- Naked



Re: Continuous streaming SPI transfer

2012-09-27 Thread Ned Forrester
On 09/27/2012 06:09 PM, Nuutti Kotivuori wrote:
 Hello,
 
 I would like to use SPI in a streaming fashion, with a transfer being
 active all the time. This seems very difficult with the current kernel
 drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but
 I have also tried to find similar features from other drivers.

You don't say whether you mean to transfer into or out of the external
device (or both). Nor do you say what clock rate you want to use (what's
the average bit/byte rate?).

I am not familiar with the hardware you want to use, but I have done
this on a PXA255 processor, at 11Mbit/sec from the external device to
the PXA processor.  For this I had to extensively modify the pxa2xx_spi
controller driver (that was with kernel 2.6.20, I think the name has
changed in recent kernels), and, of course, to write my own protocol
driver (the glue between the kernel side of Clib and the controller
driver).  I did not have to make any changes to the SPI core.

NOTE WELL: Please don't ask for this code, I cannot share it.

 There seems to be no way to prevent the deactivation and reactivation of
 the clock and everything between separate transfers - and a single
 transfer is bounded in size and no progress is reported for it. Even
 within a single transfer, it would seem that the earlier transfer must
 receive a DONE signal (TX FIFO empty, RX FIFO has everything) before a
 new transfer is started, so there's always a small gap between
 transfers. (The RPi hardware stops sending SCLK whenever there is no
 byte ready in the FIFO to be sent, so if a DONE signal is received,
 SCLK has already been stopped.)

To overcome this problem, I ran the PXA processor as a slave, with the
clocks supplied from external hardware.  The problem then becomes
keeping the receive FIFO from overflowing.  That was solved by enabling
chained DMA transfers (at enormous driver-writing expense; I'd never
written a device driver before).  DMA chaining allows the DMA hardware
to fetch its own next set of DMA parameters (addresses and byte count)
at the end of each DMA, thus avoiding the interrupt latency between each
buffer of data.  Using 256 buffers of 4096 bytes, interrupt latencies of
hundreds of milliseconds become acceptable.

I used this to stream data from an external device, but I don't know of
any reason why it could not be used to stream data to a device, or to
both transmit and receive.  I have no idea whether your hardware has any
equivalent of DMA chaining, nor whether you have the experience/budget
for writing device drivers.  I don't think you can achieve what you want
without a dedicated effort, because the SPI model is not intended for
this type of application.

 Keeping a transfer active constantly would need some buffering on writes
 and reads to keep the TX FIFO always filled and RX FIFO not
 full. Otherwise I don't see any direct problems with it, at least on the
 hardware level.
 
 So, my questions are:
 
  1) Is there a fundamental problem in keeping an SPI transfer active for
 extended periods of time (several hours)?

It depends on the clock rate, the features of the hardware you intend to
use, and your willingness to write your own driver.

  2) Is this possible with the kernel API somehow?

The SPI core assumes a series of messages containing transfers.  It is
intended to support multiple chips attached to one (or more) bus(es).
Each chip has a protocol driver with corresponding user-space interface
provided by the Clib, and each protocol driver passes messages in mixed
fashion to the controller driver, which actually manipulates the bus and
chip selects.

What you want can be done through the SPI core only if there is a single
device on the bus (or if data to/from multiple devices is completely
predictable, e.g. sequential).  I wrote my driver as a modification of
pxa2xx_spi, with all existing functionality intact and continuing to use
all the message mechanics of the SPI core.  That was probably a mistake,
as it would have taken me far less time to write a dedicated, single
purpose driver.

  3) Is this possible from userland somehow?

Ultimately everything has to pass to userland to be useful.  Do you
mean: can this be done without writing a device driver?  I doubt it.

  4) If it is not possible, what would be a good API for such a case?

Not sure.  Somehow you need to be able to execute a read() from user
space and get data.  I chose to read() a block of data equal to the
length of each DMA transfer (4096 bytes), so that each buffer could be
freed as it is read.  It should also work to model the device as
continuously streaming and then to read a number of bytes, either
blocking (returns only when byte count is satisfied) or non-blocking
(returns available bytes).

I am thinking of a device that streams data to the processor.  Note that
the chained DMA scheme involves considerable delay between when the
device actually sends data and when