Hey all,
Luke Madden asked me today what's going on inside fft_direct.
I'm pretty sure we have basically zero documentation on this lying around,
so it's a good time to fix that.  I'm going to share what I know, but I'd
appreciate it if other people could add to or correct this as needed.

So, you can split the CASPER FFTs into streaming and parallel FFTs:

streaming: <fft_biplex, fft_biplex_real, fft_biplex_real_4x>
These FFTs have several independent ports.  Each of these ports is fed with
normal-order, serial time-domain data and produces normal-order, serial
frequency-domain data.  If you know something about how pipelined FFTs
work, you'll probably call it a "Radix 2, Delay-Commutator FFT", or R2DC.
In <fft_biplex>, we follow the R2DC FFT with an
inverse-delay-commutator stage to unscramble the data (the CASPER
implementation doesn't have the same structure as an
inverse-delay-commutator, but it does the same job).  In
<fft_biplex_real>, we use the same R2DC FFT, but we treat the real and
imaginary parts as separate real inputs, making four inputs.
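
To make the "real and imag as separate inputs" idea concrete, here is a
minimal sketch of the textbook trick it relies on: pack two real streams
into one complex FFT, then separate the two spectra afterwards using
conjugate symmetry.  The names dft and two_real_ffts are made up for this
example, and the naive DFT just stands in for one complex FFT port.  This is
not taken from the CASPER block itself, which may order or scale its outputs
differently.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, standing in for one complex FFT port."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def two_real_ffts(a, b):
    """Spectra of two real sequences from a single complex transform.

    a goes in on the real part, b on the imaginary part (an assumption made
    for this illustration, not a statement about the CASPER port mapping).
    """
    n = len(a)
    z = dft([complex(ar, br) for ar, br in zip(a, b)])
    spec_a, spec_b = [], []
    for k in range(n):
        mirrored = z[-k % n].conjugate()        # conj(Z[N-k]), with Z[N] = Z[0]
        spec_a.append((z[k] + mirrored) / 2)    # conjugate-symmetric part -> FFT of a
        spec_b.append((z[k] - mirrored) / 2j)   # conjugate-antisymmetric part -> FFT of b
    return spec_a, spec_b
```

With two complex ports, the same separation applied to each port gives four
real spectra.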

parallel: <fft_direct>
If map_tail is not set, then the fft_direct block accepts all the inputs
for an FFT on *each clock cycle*.  Natural order in, natural order out.
If map_tail *is* set, it's a bit more complicated.  In that case, the block
is being used together with a number of streaming FFTs to build a wideband
FFT.  Imagine a standard DIT FFT.  The early stages of the FFT only use a few
coefficients.  In fact, each of them is an FFT in its own right, only on a
subset of the data.  The streaming FFTs are just that:  for as long as we
can still process the data in a serial fashion, we process each sample
sequentially.  Then, we do the last 1-4 (typically) stages in a massively
parallel fashion.  Here, the same structure is drawn as in the <map_tail=0>
fft_direct... but the coefficients now change from clock to clock
(specifically, their phases are incrementing).
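
The decomposition described above can be sketched in a few lines.  This is
the textbook decimation-in-time split, not a transcription of the CASPER
blocks: wideband_fft, n_parallel and dft are hypothetical names, the naive
DFT stands in for both the streaming FFTs and the parallel tail, and the real
blocks may differ in output ordering, scaling and shift schedule.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, used here in place of real FFT hardware."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def wideband_fft(x, n_parallel):
    """N-point DFT built from n_parallel streaming FFTs plus a parallel tail.

    Assumes len(x) is a multiple of n_parallel.
    """
    n = len(x)
    m = n // n_parallel
    # Streaming part: input p sees every n_parallel-th sample and runs its
    # own m-point FFT, one sample per clock.
    sub = [dft(x[p::n_parallel]) for p in range(n_parallel)]
    out = [0j] * n
    for k in range(m):  # one "clock cycle" of the parallel tail
        # Twiddle phases advance with k: this is the "incrementing phase"
        # that distinguishes the tail from a plain n_parallel-point FFT.
        column = [sub[p][k] * cmath.exp(-2j * cmath.pi * p * k / n)
                  for p in range(n_parallel)]
        tail = dft(column)  # small all-parallel FFT across the streams
        for q in range(n_parallel):
            out[k + q * m] = tail[q]
    return out
```

For a 16-point input with n_parallel = 4, this matches a direct 16-point DFT
to within rounding error.  Each pass through the loop uses a different set of
twiddle factors, which fits the description of coefficients that change over
time.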

This is where my understanding gets a bit hazy, but it looks like the last
stages of the FFT are being literally enumerated here.  *If someone wants
to chime in, here is the place to do it*.

In any case, you could actually do these "mixed streaming/parallel FFTs"
(which are <fft, fft_wideband_real>) in a different fashion, by re-casting
them as a split-radix FFT (look it up).  Doing this is computationally
about the same, but saves resources and memory... and is simpler if the
size of <fft_direct> is greater than 2^2.


I hope this helps, Luke (and everyone else)!


--Ryan Monroe
