Re: [ARTIQ] Sustained RTIO output switching speed

2014-10-17 Thread Slichter, Daniel H.
 1 MHz does not sound like an unreasonable performance target for software
 and RTIO/processor communication optimizations. And we can have the
 large block RAM FIFO as you said, as a plan B that is easy to roll out.

If you think this is doable, then please go ahead with the further 
optimizations necessary.  I think it would be nice to have the option of the 
large block RAM FIFO with user-selectable depth (at bitstream compile time) 
anyway, since it sounds not too hard to implement.

 With the current code, events that cannot be put into the FIFO because it is
 full are silently discarded. I can have the RTIO driver raise an exception in
 those cases. I propose that the exception be raised when the user attempts
 to read from an input FIFO that has overflowed.

Sounds good.

 The FIFO can also be disabled when the RTIO input is in gateware counter
 mode (and no overflow exceptions will be raised in this mode).

Sounds good.
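
A minimal Python model of the behaviour agreed above, for illustration only. It is not the ARTIQ driver: the names RTIOOverflow, InputFIFO, push and read are hypothetical, and clearing the overflow flag once the exception has been raised is an assumption that the thread does not settle.

```python
from collections import deque


class RTIOOverflow(Exception):
    """Hypothetical exception name: input events were lost."""


class InputFIFO:
    """Toy model of one RTIO input channel (not the real gateware/driver)."""

    def __init__(self, depth, counter_mode=False):
        self.depth = depth
        self.counter_mode = counter_mode
        self.events = deque()
        self.overflowed = False
        self.count = 0  # only used in counter mode

    def push(self, timestamp):
        """Called for each input event."""
        if self.counter_mode:
            # FIFO disabled: only count, never overflow.
            self.count += 1
            return
        if len(self.events) >= self.depth:
            # Event is dropped, but the loss is remembered
            # instead of being silent.
            self.overflowed = True
            return
        self.events.append(timestamp)

    def read(self):
        """Return the oldest timestamp, or report a previous overflow."""
        if self.overflowed:
            # Assumption: the flag is cleared once it has been reported.
            self.overflowed = False
            raise RTIOOverflow("input FIFO overflowed; events were lost")
        return self.events.popleft()
```

In this model the user learns about lost events on the next read after the overflow, and the events that did fit in the FIFO remain readable afterwards.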

Daniel
___
ARTIQ mailing list
https://ssl.serverraum.org/lists/listinfo/artiq


Re: [ARTIQ] Sustained RTIO output switching speed

2014-10-16 Thread Slichter, Daniel H.
Hi,

 If we go the DMA route, maybe a good option is to have DRAM backing of
 the RTIO FIFOs that is implemented all in gateware and is fully transparent
 for the software. Then we can have FIFOs with hundreds of megabytes of
 storage (note that loading/unloading them will take some time). But this is
 not straightforward to do, and wasting the abundance of resources that the
 K7 FPGA has on large on-chip FIFO memories is much easier.

Given your comments, it seems to make the most sense just to make the FIFO 
memories deeper for right now, and defer discussion of DMA for the time being.  
I think the only place where this is really an issue would be on some 
special-case inputs, such as the Penning trap.  Unless others disagree, I think 
a 64-event FIFO queue seems plenty for outputs.  For inputs, what is the 
deepest FIFO one could make given the resources on the Kintex7?  Could one do a 
65k FIFO?  This would address the Penning needs if they care about timestamps 
as well.  Is the depth of the FIFOs easily reconfigurable (i.e. modify and 
build a new bitstream), or does it end up being more involved for a FIFO of 
this size?  In other words, how easy would it be for individual experiments to 
tweak FIFO depth and recompile the bitstream for their KC705 depending on their 
particular needs?  At what FIFO depth would this sort of thing not be possible 
anymore?


Re: [ARTIQ] Sustained RTIO output switching speed

2014-10-16 Thread Sébastien Bourdeauducq
On 10/16/2014 11:12 PM, Slichter, Daniel H. wrote:
 For inputs, what is the deepest FIFO one could make given the
 resources on the Kintex7?  Could one do a 65k FIFO?

Yes, with 64-bit timestamps (a pessimistic estimate) that would make a
~4Mbit memory. The FPGA on KC705 has ~16Mbit of block RAM and we're not
using much for other things. Changing the size of on-chip FIFOs can be
made as simple as changing numbers in a Python file (or we can add
command line parameters instead) and recompiling the bitstream.
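
The arithmetic behind that estimate, as a short Python sketch. The function names are made up for illustration. The 64-bit entry width and the ~16 Mbit block RAM figure are the rough numbers quoted above, and the sketch ignores any block RAM used by the rest of the design.

```python
def fifo_bits(depth, timestamp_bits=64):
    """Memory needed for a FIFO holding `depth` timestamps."""
    return depth * timestamp_bits


def max_depth(block_ram_bits, timestamp_bits=64):
    """Deepest FIFO that fits if all of the block RAM were available."""
    return block_ram_bits // timestamp_bits


KC705_BLOCK_RAM_BITS = 16_000_000  # approximate figure from this message

bits = fifo_bits(65536)                 # 4194304 bits, i.e. ~4 Mbit
fraction = bits / KC705_BLOCK_RAM_BITS  # roughly a quarter of the block RAM
print(bits, round(fraction, 2), max_depth(KC705_BLOCK_RAM_BITS))
```

So a single 65k-deep input FIFO fits with room to spare, but only a few channels of that depth could coexist.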

Sébastien


Re: [ARTIQ] Sustained RTIO output switching speed

2014-10-15 Thread Slichter, Daniel H.
  Should I make it a priority to optimize this (the CPU/RTIO
  communication via CSR looks like a good suspect), or is it going to be
  enough for the near future?
 
 Good enough IMHO. My guess would be that the sum of tweaks like CSR
 data width will at most yield a factor of 3-5, probably not worth doing
 here. The next logical step would be to do DMA, I presume.

I agree with Robert that DMA would be the most sensible next step here, rather 
than trying other tweaks.  I don't really know how difficult this would be to 
implement, though.  It would be a handy feature if one wanted to stream 
pre-programmed data to a high speed DAC, for example, or record data points 
from a high speed ADC -- useful if one is interested in implementing the kind 
of extensible hardware Joe has championed.  

In general, having DMA for the RTIO core solves not just this issue but also 
the following one, of storing timestamps for counts that come in at a high rate 
(e.g. in the Penning experiment).  

For folks on the list who didn't understand all of this, DMA (direct memory 
access) would mean that the DDR3 RAM on the FPGA board could be used to store 
pre-compiled digital pulse patterns with arbitrary timing patterns and 
essentially arbitrary length (since there is 1 GB of space on the KC705), which 
could then just be played back in real time, without needing to involve the 
soft processor.  Likewise, DMA would allow input events and their timestamps to 
be streamed directly into RAM, allowing the acquisition of data at rates much 
higher than the soft processor is able to handle in real time.  Right now, the 
longest patterns are limited by the depth of the FIFO queue for each RTIO 
(real-time input/output) channel, which is 64 entries deep per channel 
(Sébastien, can this be made deeper on the KC705?).  If you want more patterns 
than that, you may run into limitations in terms of how fast the soft CPU on 
the FPGA can calculate them and add them to the FIFO on the fly.  The tests 
Sébastien has run indicate that you can add one pulse instruction in roughly 1.1 
us on the KC705 board we are using.  
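
To put numbers on that limit, here is a rough Python sketch. It treats the FIFO as a simple fluid model: it starts full, the CPU refills it at one event per 1.1 us, and the output drains it at a fixed event period. The function names and the model itself are illustrative assumptions, not measurements.

```python
def sustained_rate_hz(t_sw):
    """Highest event rate the CPU can sustain indefinitely."""
    return 1.0 / t_sw


def burst_events(depth, t_sw, t_out):
    """Events that can be output with period t_out before the FIFO underflows.

    Starting from a full FIFO of `depth` entries, the FIFO drains at
    1/t_out and is refilled at 1/t_sw, so it empties after
    depth / (1 - t_out / t_sw) output events.
    """
    if t_out >= t_sw:
        return float("inf")  # the CPU keeps up; no underflow
    return depth / (1.0 - t_out / t_sw)


T_SW = 1.1e-6  # seconds per event, from the tests mentioned above
DEPTH = 64     # current FIFO depth per channel

print(sustained_rate_hz(T_SW))             # a bit over 900 kHz
print(burst_events(DEPTH, T_SW, 0.55e-6))  # about 128 events
```

Under these assumptions, a pattern running at twice the sustained rate can only be about twice the FIFO depth in length, which is why a deeper FIFO or DMA is needed for long, fast patterns.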

  Sustained input performance should be roughly comparable, though I
  have not actually tested it yet. I have noted that the Penning lab
  needs better than 0.6us ~ 2us of event processing time with up to 30k
  event for PMT pulses (note that one pulse is only one event, since the
  RTIO core can filter by edge type for inputs). But if only the count
  is important (not the timestamps of individual pulses), it is easy to
  do some count-specific software optimizations or even put the counter
  in gateware.
 
 Yes. Or a single deep FIFO on the PMT.

I think it's safe to say that we might want timestamps as well as counts, so 
would it be possible to make one or two input channels with very deep FIFOs, 
and the rest with standard ones?  As above, DMA for the RTIO core would solve 
this problem.  