Hey all, Luke Madden was asking me about what's going on in <fft_direct> today. I'm pretty sure we have basically zero documentation on this lying around, so it's a good time to fix that. I'm going to share what I know, but I'd appreciate it if other people could add to or correct this as needed...
So, you can split the CASPER FFTs into streaming and parallel FFTs.

Streaming: <fft_biplex, fft_biplex_real, fft_biplex_real_4x>

These FFTs have several independent ports. Each port is fed with natural-order, serial time-domain data and produces natural-order, serial frequency-domain data. If you know something about how pipelined FFTs work, you'll probably recognize this as a "Radix-2 Delay-Commutator FFT", or R2DC. In <fft_biplex>, we follow the R2DC FFT with an inverse-delay-commutator stage to unscramble the data (the CASPER implementation doesn't have the same structure as an inverse delay commutator, but it does the same thing). In <fft_biplex_real>, we do the same R2DC FFT, but we treat real and imag as separate inputs, making four inputs.

Parallel: <fft_direct>

If map_tail is not set, the fft_direct block accepts all the inputs for an FFT on *each clock cycle*. Natural order in, natural order out.

If map_tail *is* set, it's a bit more complicated. In that case, this block is being used together with a number of streaming FFTs to build a wideband FFT. Imagine a standard DIT FFT. The early stages of the FFT only use a few coefficients. In fact, they are each FFTs in their own right, just on a subset of the data. The streaming FFTs are exactly that: for as long as we can still process the data serially, we process each sample sequentially. Then we do the last 1-4 (typically) stages in a massively parallel format. Here, the same structure is drawn as in the <map_tail=0> fft_direct... but the coefficients now change (specifically, their phases increment from one clock to the next). This is where my understanding gets a bit hazy, but it looks like the last stages of the FFT are literally enumerated here. *If someone wants to chime in, here is the place to do it*.

In any case, you could actually implement these "mixed streaming/parallel FFTs" (which are <fft, fft_wideband_real>) differently, by recasting them as a split-radix FFT (look it up).
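To make the map_tail case a bit more concrete, here is a small pure-Python sketch of the textbook math behind a mixed streaming/parallel FFT: a Cooley-Tukey decimation-in-time split of an N-point FFT into P streams of M = N/P points. This is NOT the CASPER implementation (and not the split-radix recasting either). The function names `dft` and `wideband_fft`, the demux order (stream p gets samples p, p+P, p+2P, ...), and the use of a naive DFT in place of the streaming biplex FFTs are all assumptions for illustration. The part worth looking at is the twiddle W_N^(p*k1): for a fixed stream p, its phase steps by 2*pi*p/N every time the bin index k1 advances, which appears to match the "incrementing phase" coefficients described above for fft_direct with map_tail set.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, used for the per-stream FFTs and as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def wideband_fft(x, p_streams):
    """Hypothetical helper: N-point DFT computed as P streams of M = N/P points.

    Assumes N is a multiple of P. Returns the N-point DFT in natural order.
    """
    n = len(x)
    if n % p_streams:
        raise ValueError("N must be a multiple of the number of streams P")
    m = n // p_streams

    # Demux (assumed ordering): stream p gets samples p, p+P, p+2P, ...
    streams = [x[p::p_streams] for p in range(p_streams)]

    # "Streaming" part: an independent M-point FFT on each stream.
    spectra = [dft(s) for s in streams]

    out = [0j] * n
    # "Parallel" part: one step per bin k1 (one clock cycle in hardware).
    for k1 in range(m):
        # Twiddle: stream p is rotated by W_N^(p*k1), so its phase steps as k1 advances.
        twiddled = [
            spectra[p][k1] * cmath.exp(-2j * cmath.pi * p * k1 / n)
            for p in range(p_streams)
        ]
        # P-point DFT across the streams yields bins k1 + M*k2, k2 = 0..P-1.
        for k2, val in enumerate(dft(twiddled)):
            out[k1 + m * k2] = val
    return out


if __name__ == "__main__":
    # 32-point test signal split across 4 parallel streams of 8 points each.
    sig = [complex(t % 5, (3 * t) % 7) for t in range(32)]
    err = max(abs(a - b) for a, b in zip(dft(sig), wideband_fft(sig, 4)))
    print(f"max |error| vs direct DFT: {err:.2e}")  # expect round-off level
```

With P = 1 this collapses to a single streaming FFT, and with P = N it collapses to a fully parallel DFT, so the streaming/parallel split is just a choice of where the boundary sits.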
Recasting the mixed FFTs as a split-radix FFT is computationally about the same, but it saves resources and memory, and it is simpler if the size of <fft_direct> is greater than 2^2.

I hope this helps, Luke (and everyone else)!

--Ryan Monroe