Re: Continuous streaming SPI transfer
On Fri, Sep 28, 2012 at 01:09:25AM +0300, Nuutti Kotivuori wrote:

> There seems to be no way to prevent the deactivation and reactivation of the clock and everything between separate transfers - and a single transfer is bounded in size and no progress is reported for it. Even within a single transfer, it would seem that an earlier transfer is

This sounds more like you've got an IIO application (or audio) than a SPI one - the hardware is the same but you're thinking about it in a completely different way and so a separate subsystem makes sense.
Re: Continuous streaming SPI transfer
Ned Forrester nforres...@whoi.edu writes:

> On 09/27/2012 06:09 PM, Nuutti Kotivuori wrote:
>> I would like to use SPI in a streaming fashion, with a transfer being active all the time. This seems very difficult with the current kernel drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but I have also tried to find similar features from other drivers.
>
> You don't say whether you mean to transfer into or out of the external device (or both). Nor do you say what clock rate you want to use (what's the average bit/byte rate?).

I mean to transfer both in and out, but the clock rate I need is not very high, 100 kHz should be fine, 250 kHz should be plenty.

> I am not familiar with the hardware you want to use, but I have done this on a PXA255 processor, at 11Mbit/sec from the external device to the PXA processor. For this I had to extensively modify the pxa2xx_spi controller driver (that was with kernel 2.6.20, I think the name has changed in recent kernels), and, of course, to write my own protocol driver (the glue between the kernel side of Clib and the controller driver). I did not have to make any changes to the SPI core. NOTE WELL: Please don't ask for this code, I cannot share it.

That sounds like a *lot* of effort, something beyond what I'm willing to put in.

> To overcome this problem, I ran the PXA processor as a slave, with the clocks supplied from external hardware. The problem then becomes keeping the receive FIFO from overflowing. That was solved by enabling chained DMA transfers (at enormous driver-writing expense; I'd never written a device driver before).

[...]

> I don't think you can achieve what you want without a dedicated effort, because the SPI model is not intended for this type of application.

I agree, this is so for your data rate. However, for my 250 kHz, things are much simpler. Because of my own investigations and your answer, I decided to forgo the kernel route altogether.
Instead, I used the bcm2835 library from userspace as the peripherals are simply mapped to a fixed address space. With this, my main loop is simply

    while (!exit) {
        /* Read the SPI0 control/status register once per iteration */
        uint32_t state = bcm2835_peri_read(spi_cs);

        if (state & BCM2835_SPI0_CS_RXF)
            fprintf(stderr, "RX FIFO full!\n");
        if (state & BCM2835_SPI0_CS_DONE)
            fprintf(stderr, "TX FIFO empty!\n");
        /* TX FIFO can accept data: feed it one byte */
        if (state & BCM2835_SPI0_CS_TXD)
            bcm2835_peri_write_nb(spi_fifo, producebyte());
        /* RX FIFO contains data: drain one byte */
        if (state & BCM2835_SPI0_CS_RXD)
            processbyte(bcm2835_peri_read_nb(spi_fifo));
        /* Nothing to write and nothing to read: sleep */
        if (!(state & (BCM2835_SPI0_CS_TXD | BCM2835_SPI0_CS_RXD)))
            nanosleep(&loop_wait, NULL);
    }

To explain, I simply loop in a userspace (but realtime scheduled) process, writing to the TX FIFO if the TX FIFO can accept more input and reading from the RX FIFO if the RX FIFO has any bytes to read. If there is neither, I sleep for 300 microseconds.

This piece of code can easily keep both RX and TX FIFOs well tended at 250 kHz - and I managed to get it stable at even 2 MHz while testing. The solution does not use interrupts for anything (as it can't), but since the data rate is fixed a simple timer is not much worse. The 300 microsecond sleep version uses about 7% of the CPU, and that is easily acceptable for me.

I see no fundamental reason why a similar mode of operation could not be supported for /dev/spidev or inside the kernel - much more efficiently - and I would very much like to see something like this implemented. However, for my use case I have total control of the runtime environment and I can easily make sure no other software touches the SPI or the pins I need - and I stand to gain nothing by implementing this inside the kernel - so I will stick with the userland mmap() /dev/mem hack for the time being.

Thank you for the answer, it was very enlightening.

-- 
Naked
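As a rough check on the sleep interval used in the polling loop above, the following sketch estimates how many bytes the bus moves while the process is asleep and compares that with the FIFO depth. It assumes 8 clock cycles per byte with no gaps between bytes, and that the sleep lasts exactly as long as requested (in practice nanosleep() can return late, so this is a lower bound). The function names and the FIFO depth passed in are hypothetical; the depth is a parameter here and is not taken from the BCM2835 documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes shifted on the bus during one sleep interval, rounded up.
 * Assumption: 8 SCLK cycles per byte, back-to-back bytes. */
static uint32_t bytes_per_sleep(uint32_t sclk_hz, uint32_t sleep_us)
{
    /* bits = sclk_hz * sleep_us / 1e6, bytes = bits / 8 */
    uint64_t scaled = (uint64_t)sclk_hz * sleep_us;
    uint64_t denom = 8ULL * 1000000ULL;
    return (uint32_t)((scaled + denom - 1) / denom);
}

/* Nonzero if a FIFO holding fifo_depth bytes (hypothetical value supplied
 * by the caller) can absorb everything moved during one sleep interval. */
static int sleep_is_safe(uint32_t sclk_hz, uint32_t sleep_us,
                         uint32_t fifo_depth)
{
    assert(fifo_depth > 0);
    return bytes_per_sleep(sclk_hz, sleep_us) <= fifo_depth;
}
```

For example, `bytes_per_sleep(250000, 300)` gives 10 bytes per 300 microsecond sleep, so the check passes for any FIFO that holds at least that many bytes. Whether a given clock rate is safe on real hardware depends on the actual FIFO depth and on scheduling jitter, neither of which this sketch models.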
Re: Continuous streaming SPI transfer
On 09/29/2012 04:20 PM, Nuutti Kotivuori wrote:

> [...]
> I agree, this is so for your data rate. However, for my 250 kHz, things are much simpler. Because of my own investigations and your answer, I decided to forgo the kernel route altogether.
>
> Instead, I used the bcm2835 library from userspace as the peripherals are simply mapped to a fixed address space.
> [...]

Most certainly simpler at lower clock rate. I'm glad you found something that works.

-- 
Ned Forrester nforres...@whoi.edu
Oceanographic Systems Lab 508-289-2226 Office
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA
http://www.whoi.edu/
http://www.whoi.edu/page.do?pid=29856
http://www.whoi.edu/hpb/Site.do?id=1532
Continuous streaming SPI transfer
Hello,

I would like to use SPI in a streaming fashion, with a transfer being active all the time. This seems very difficult with the current kernel drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but I have also tried to find similar features from other drivers.

There seems to be no way to prevent the deactivation and reactivation of the clock and everything between separate transfers - and a single transfer is bounded in size and no progress is reported for it. Even within a single transfer, it would seem that an earlier transfer must receive a DONE signal (TX FIFO empty, RX FIFO has everything) before a new transfer is started, so there is always a small gap between transfers. (The RPi hardware stops sending SCLK whenever there is no byte ready in the FIFO to be sent, so if a DONE signal is received, SCLK has already been stopped.)

Keeping a transfer active constantly would need some buffering on writes and reads to keep the TX FIFO always filled and the RX FIFO not full. Otherwise I don't see any direct problems with it, at least on the hardware level.

So, my questions are:

1) Is there a fundamental problem in keeping an SPI transfer active for extended periods of time (several hours)?
2) Is this possible with the kernel API somehow?
3) Is this possible from userland somehow?
4) If it is not possible, what would be a good API for such a case?
5) Would patches for such functionality be accepted?

Thank you in advance,

-- 
Naked
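The buffering on writes and reads mentioned in the message above could, for instance, take the form of a byte ring buffer sitting between the data source and whatever code feeds the TX FIFO (and a second one on the RX side). The following is only a minimal sketch of that idea: the names `ring_t`, `ring_put`, `ring_get` and the size `RING_SIZE` are hypothetical, and it has no locking or memory barriers, so as written it is only suitable when producer and consumer run in the same thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u /* hypothetical size; must be a power of two */

typedef struct {
    uint8_t data[RING_SIZE];
    size_t head; /* total number of bytes ever written */
    size_t tail; /* total number of bytes ever read */
} ring_t;

/* Number of bytes currently stored; unsigned wraparound keeps this valid. */
static size_t ring_count(const ring_t *r)
{
    return r->head - r->tail;
}

/* Store one byte; returns 1 on success, 0 if the buffer is full. */
static int ring_put(ring_t *r, uint8_t b)
{
    if (ring_count(r) == RING_SIZE)
        return 0;
    r->data[r->head++ & (RING_SIZE - 1)] = b;
    return 1;
}

/* Fetch the oldest byte; returns 1 on success, 0 if the buffer is empty. */
static int ring_get(ring_t *r, uint8_t *out)
{
    if (ring_count(r) == 0)
        return 0;
    *out = r->data[r->tail++ & (RING_SIZE - 1)];
    return 1;
}
```

A feeder would call `ring_get()` whenever the TX FIFO can accept a byte and fall back to some idle pattern if it returns 0; the buffer size then determines how long the producer may stall before the stream is interrupted.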
Re: Continuous streaming SPI transfer
On 09/27/2012 06:09 PM, Nuutti Kotivuori wrote:

> Hello,
>
> I would like to use SPI in a streaming fashion, with a transfer being active all the time. This seems very difficult with the current kernel drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but I have also tried to find similar features from other drivers.

You don't say whether you mean to transfer into or out of the external device (or both). Nor do you say what clock rate you want to use (what's the average bit/byte rate?).

I am not familiar with the hardware you want to use, but I have done this on a PXA255 processor, at 11Mbit/sec from the external device to the PXA processor. For this I had to extensively modify the pxa2xx_spi controller driver (that was with kernel 2.6.20, I think the name has changed in recent kernels), and, of course, to write my own protocol driver (the glue between the kernel side of Clib and the controller driver). I did not have to make any changes to the SPI core. NOTE WELL: Please don't ask for this code, I cannot share it.

> There seems to be no way to prevent the deactivation and reactivation of the clock and everything between separate transfers - and a single transfer is bounded in size and no progress is reported for it. Even within a single transfer, it would seem that an earlier transfer must receive a DONE signal (TX FIFO empty, RX FIFO has everything) before a new transfer is started, so there is always a small gap between transfers. (The RPi hardware stops sending SCLK whenever there is no byte ready in the FIFO to be sent, so if a DONE signal is received, SCLK has already been stopped.)

To overcome this problem, I ran the PXA processor as a slave, with the clocks supplied from external hardware. The problem then becomes keeping the receive FIFO from overflowing. That was solved by enabling chained DMA transfers (at enormous driver-writing expense; I'd never written a device driver before).
DMA chaining allows the DMA hardware to fetch its own next set of DMA parameters (addresses and byte count) at the end of each DMA, thus avoiding the interrupt latency between each buffer of data. Using 256 buffers of 4096 bytes, interrupt latency of hundreds of milliseconds becomes acceptable. I used this to stream data from an external device, but I don't know of any reason why it could not be used to stream data to a device, or to both transmit and receive.

I have no idea whether your hardware has any equivalent of DMA chaining, nor whether you have the experience/budget for writing device drivers. I don't think you can achieve what you want without a dedicated effort, because the SPI model is not intended for this type of application.

> Keeping a transfer active constantly would need some buffering on writes and reads to keep the TX FIFO always filled and the RX FIFO not full. Otherwise I don't see any direct problems with it, at least on the hardware level.
>
> So, my questions are:
>
> 1) Is there a fundamental problem in keeping an SPI transfer active for extended periods of time (several hours)?

It depends on the clock rate, the features of the hardware you intend to use, and your willingness to write your own driver.

> 2) Is this possible with the kernel API somehow?

The SPI core assumes a series of messages containing transfers. It is intended to support multiple chips attached to one (or more) bus(es). Each chip has a protocol driver with corresponding user-space interface provided by the Clib, and each protocol driver passes messages in mixed fashion to the controller driver, which actually manipulates the bus and chip selects.

What you want can be done through the SPI core only if there is a single device on the bus (or if data to/from multiple devices is completely predictable, e.g. sequential). I wrote my driver as a modification of pxa2xx_spi, with all existing functionality intact and continuing to use all the message mechanics of the SPI core.
That was probably a mistake, as it would have taken me far less time to write a dedicated, single-purpose driver.

> 3) Is this possible from userland somehow?

Ultimately everything has to pass to userland to be useful. Do you mean: can this be done without writing a device driver? I doubt it.

> 4) If it is not possible, what would be a good API for such a case?

Not sure. Somehow you need to be able to execute a read() from user space and get data. I chose to read() a block of data equal to the length of each DMA transfer (4096 bytes), so that each buffer could be freed as it is read. It should also work to model the device as continuously streaming and then to read a number of bytes, either blocking (returns only when byte count is satisfied) or non-blocking (returns available bytes). I am thinking of a device that streams data to the processor.

Note that the chained DMA scheme involves considerable delay between when the device actually sends data and when