Thank you--that's very useful. I didn't know the DSP slices could do 5 ns multiplies.
Ultimately, what I'm trying to estimate is how many filter taps I can reasonably support on a 5 ns clock, with new data words arriving on every clock, questions of available chip resources aside. If I understand this correctly, even with new data arriving on every 5 ns clock, ROACH should (up to practical considerations) be able to operate as many taps as can fit on the FPGA. Is this right?

-Alex

On Mon, Sep 17, 2012 at 11:45 PM, Jason Manley <jman...@ska.ac.za> wrote:

> The latency through an FPGA will be high relative to a CPU/GPU, because
> the FPGA's clock rate is lower (1/200MHz=5ns). But these operations can be
> pipelined so that you can do a DSP operation on every clock cycle. ROACH 1
> and ROACH 2 will both run at 200MHz very easily.
>
> Considering ROACH-1, it has 640 DSP slices and you can do up to an 18 bit
> x 25 bit multiply in a single DSP slice. So you can do 640 multiply (and/or
> addition operation) operations every 1/200MHz=5ns.
>
> But then you can also start using the 14720 slices for multipliers or
> adders so you can get many more operations per second. And then, if you're
> doing low resolution operations, you can fill the 244 BRAMs with lookup
> tables and just lookup the product for a given input vector to do even more
> operations on every clock cycle.
>
> If you wanted to throw the whole FPGA at DSP operations, you could easily
> say that a ROACH-1 board is capable of over 2 TeraOps/s for 4-bit
> operations (common in radio astronomy). But this is an unrealistic figure
> of merit because it ignores things like pipelining registers and data
> routing requirements, memory controllers and the like which would all be
> needed in a practical design.
>
> Jason
>
> On 18 Sep 2012, at 05:20, Alex Zahn wrote:
>
> > I've been browsing the xilinx literature, but I just can't seem to get
> > any idea how long one can usually expect addition and multiplication
> > operations to take. I realize this depends on a lot of factors in the
> > design, but does anyone know if it's reasonable to multiply two 16 bit
> > numbers in a single clock with a clock rate of 200 MHz? I would test this
> > on my ROACH out to find out, but I'm away from lab for a while, and thus
> > rendered rather helpless for the time being.
> >
> > Unrelated, is there any online documentation on the new snapshot block?
> >
> > -Alex Zahn
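
For anyone following the thread, here is a rough back-of-envelope sketch of the tap estimate being discussed. It uses only the figures quoted above (640 DSP slices on ROACH-1, a 200 MHz clock) and assumes a fully parallel FIR in which each tap needs one multiply per input sample. The function names and the `symmetric` option are hypothetical, added for illustration, and the result is an upper bound that ignores routing, pipelining registers, and the extra adders a symmetric design would need.

```python
# Back-of-envelope FIR tap budget. Figures are the ones quoted in the thread;
# the model (one DSP-slice multiply per tap per sample) is an assumption.

ROACH1_DSP_SLICES = 640      # quoted in the thread for ROACH-1
CLOCK_HZ = 200_000_000       # 200 MHz, i.e. a 5 ns clock period


def max_parallel_taps(dsp_slices, samples_per_clock=1, symmetric=False):
    """Upper bound on FIR taps if every tap costs one multiply per sample.

    samples_per_clock: input samples arriving per FPGA clock (1 in this thread).
    symmetric: hypothetical option; a symmetric filter can add paired samples
               before multiplying, so each multiplier serves two taps.
    """
    taps = dsp_slices // samples_per_clock
    if symmetric:
        taps *= 2
    return taps


def multiplies_per_second(dsp_slices, clock_hz):
    """Total multiply throughput if every DSP slice is busy on every clock."""
    return dsp_slices * clock_hz


if __name__ == "__main__":
    print("taps (general):  ", max_parallel_taps(ROACH1_DSP_SLICES))
    print("taps (symmetric):", max_parallel_taps(ROACH1_DSP_SLICES, symmetric=True))
    print("multiplies/s:    ", multiplies_per_second(ROACH1_DSP_SLICES, CLOCK_HZ))
```

Under these assumptions the DSP slices alone bound a one-sample-per-clock filter at 640 taps; using fabric slices or BRAM lookup tables for further multipliers, as Jason describes, would raise that bound, at the cost of the practical overheads he mentions.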