Thank you--that's very useful. I didn't know the DSP slices could do 5 ns
multiplies.

Ultimately, what I'm getting at here is trying to estimate how many filter
taps I can reasonably support on a 5 ns clock, with a new data word arriving
on every clock, setting aside the question of available chip resources.

If I understand this correctly, even with new data arriving on every 5 ns
clock, ROACH should (up to practical considerations) be able to run as many
taps as can fit on the FPGA. Is this right?

-Alex
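A rough sketch of the arithmetic behind this question follows. It assumes each tap costs one real multiply per input sample, that each DSP slice completes one multiply per clock, and that routing and other resources can be ignored. Under those assumptions the tap budget is the multiplier count times the number of clocks available per sample. The function name `max_fir_taps` and its symmetric-folding option are hypothetical illustrations, not part of any CASPER library.

```python
def max_fir_taps(n_multipliers: int, clock_hz: float, sample_rate_hz: float,
                 symmetric: bool = False) -> int:
    """Rough upper bound on FIR taps for one real-valued input stream.

    Assumptions (not from any toolflow): one multiply per tap per sample,
    and each multiplier (e.g. one DSP slice) finishes one multiply per clock.
    """
    if sample_rate_hz > clock_hz:
        raise ValueError("this sketch assumes at most one sample per clock")
    # Clocks each multiplier gets per input sample (time multiplexing).
    clocks_per_sample = int(clock_hz // sample_rate_hz)
    taps = n_multipliers * clocks_per_sample
    if symmetric:
        # Linear-phase (symmetric) coefficients let mirrored samples be
        # pre-added, so one multiply serves two taps. The pre-add needs an
        # extra adder per multiplier (assumed available in fabric).
        taps *= 2
    return taps


# One new sample on every 200 MHz clock, 640 DSP slices:
print(max_fir_taps(640, 200e6, 200e6))                  # 640
print(max_fir_taps(640, 200e6, 200e6, symmetric=True))  # 1280
# If samples arrived only every 4th clock, each DSP could serve 4 taps:
print(max_fir_taps(640, 200e6, 50e6))                   # 2560
```

With a new sample on every clock there is no time multiplexing, so the DSP-only bound is one tap per slice; fabric multipliers would raise it further.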

On Mon, Sep 17, 2012 at 11:45 PM, Jason Manley <jman...@ska.ac.za> wrote:

> The latency through an FPGA will be high relative to a CPU/GPU, because
> the FPGA's clock rate is lower (1/200MHz=5ns). But these operations can be
> pipelined so that you can do a DSP operation on every clock cycle. ROACH 1
> and ROACH 2 will both run at 200MHz very easily.
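To illustrate the latency versus throughput distinction, here is a minimal cycle-level model of a multiplier with a given number of pipeline register stages. It is a hypothetical sketch, not CASPER or Xilinx code: the first product appears only after the pipeline fills, and then one product emerges on every clock.

```python
from collections import deque


def simulate_pipeline(inputs, stages):
    """Cycle-by-cycle model of a multiplier followed by `stages` registers.

    Each clock one new (a, b) pair enters and every in-flight product moves
    one stage forward. The multiply is modeled as happening on entry; the
    registers only add delay. Returns (cycle, product) for each output.
    """
    if stages < 1:
        raise ValueError("need at least one pipeline stage")
    pipe = deque([None] * stages)            # register contents, oldest first
    feed = list(inputs) + [None] * stages    # extra idle cycles flush the pipe
    outputs = []
    for cycle, pair in enumerate(feed):
        out = pipe.popleft()
        if out is not None:
            outputs.append((cycle, out))
        pipe.append(None if pair is None else pair[0] * pair[1])
    return outputs


print(simulate_pipeline([(1, 2), (3, 4), (5, 6)], stages=3))
# [(3, 2), (4, 12), (5, 30)]
```

With three stages on a 5 ns clock, the first result arrives after 15 ns, and another arrives every 5 ns after that, which is why throughput rather than latency sets the tap budget.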
>
> Considering ROACH-1, it has 640 DSP slices, and you can do up to an 18-bit
> x 25-bit multiply in a single DSP slice. So you can do 640 multiply (and/or
> add) operations every 1/200 MHz = 5 ns.
>
> But then you can also start using the 14720 slices for multipliers or
> adders so you can get many more operations per second. And then, if you're
> doing low-resolution operations, you can fill the 244 BRAMs with lookup
> tables and just look up the product for a given input vector to do even more
> operations on every clock cycle.
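Here is a minimal sketch of the lookup-table idea, assuming signed 4-bit two's-complement inputs. The two input codes are concatenated into an address, so a 4 x 4 multiplier becomes a 256-entry table. Every product of such inputs lies in -56..64, so 8-bit entries suffice and the whole table is 2 Kbit, a small fraction of one 36 Kbit Virtex-5 block RAM. The helper names are hypothetical.

```python
def build_product_lut(bits: int = 4) -> list:
    """Signed products for every pair of `bits`-bit two's-complement inputs.

    Address layout (an illustrative choice): (a_code << bits) | b_code.
    """
    size = 1 << bits

    def to_signed(code: int) -> int:
        # Interpret an unsigned code as a two's-complement value.
        return code - size if code >= size // 2 else code

    return [to_signed(addr >> bits) * to_signed(addr & (size - 1))
            for addr in range(size * size)]


def lut_multiply(lut: list, a: int, b: int, bits: int = 4) -> int:
    """Multiply two signed `bits`-bit values by a single table lookup."""
    mask = (1 << bits) - 1
    return lut[((a & mask) << bits) | (b & mask)]


lut = build_product_lut(4)
print(len(lut))                  # 256
print(lut_multiply(lut, -3, 5))  # -15
```

In hardware the table would be the BRAM contents and the concatenated inputs would drive its address port; the Python version only checks that the addressing and sign handling are right.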
>
> If you wanted to throw the whole FPGA at DSP operations, you could easily
> say that a ROACH-1 board is capable of over 2 TeraOps/s for 4-bit
> operations (common in radio astronomy). But this is an unrealistic figure
> of merit because it ignores things like pipelining registers and data
> routing requirements, memory controllers and the like which would all be
> needed in a practical design.
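To put the 2 TeraOps/s figure in per-clock terms at 200 MHz, the arithmetic below is derived only from the numbers in the paragraph above:

```latex
\frac{2\times10^{12}\ \text{ops/s}}{200\times10^{6}\ \text{clocks/s}}
  = 10^{4}\ \text{ops per clock} \approx 15.6 \times 640
```

So the 640 DSP slices alone supply only about 6% of that figure; the rest would have to come from fabric slices and BRAM lookup tables.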
>
> Jason
>
> On 18 Sep 2012, at 05:20, Alex Zahn wrote:
>
> > I've been browsing the Xilinx literature, but I just can't seem to get
> > any idea how long one can usually expect addition and multiplication
> > operations to take. I realize this depends on a lot of factors in the
> > design, but does anyone know if it's reasonable to multiply two 16-bit
> > numbers in a single clock with a clock rate of 200 MHz? I would test this
> > out on my ROACH to find out, but I'm away from the lab for a while, and
> > thus rendered rather helpless for the time being.
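A quick bit-width check shows why a 16 x 16 multiply fits a single DSP slice whose multiplier takes 18-bit and 25-bit operands. This sketch assumes signed two's-complement operands and the Virtex-5 DSP48E's 48-bit output, which is not stated in the thread. The function names are hypothetical.

```python
def signed_range(width: int) -> tuple:
    """Inclusive (min, max) of a `width`-bit two's-complement number."""
    return -(1 << (width - 1)), (1 << (width - 1)) - 1


def product_width(a_bits: int, b_bits: int) -> int:
    """Smallest signed width that holds every a_bits x b_bits product."""
    lo_a, hi_a = signed_range(a_bits)
    lo_b, hi_b = signed_range(b_bits)
    extremes = [x * y for x in (lo_a, hi_a) for y in (lo_b, hi_b)]
    width = 1
    while not all(signed_range(width)[0] <= p <= signed_range(width)[1]
                  for p in extremes):
        width += 1
    return width


def fits_one_dsp48e(a_bits: int, b_bits: int,
                    port_a: int = 25, port_b: int = 18) -> bool:
    """True if the operands fit the 25 x 18 multiplier in either orientation."""
    return ((a_bits <= port_a and b_bits <= port_b)
            or (a_bits <= port_b and b_bits <= port_a))


print(fits_one_dsp48e(16, 16))  # True
print(product_width(16, 16))    # 32
print(product_width(18, 25))    # 43, within an assumed 48-bit output
```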
> >
> > Unrelated, is there any online documentation on the new snapshot block?
> >
> > -Alex Zahn
>
>
