On Thu, 7 Jan 1999, Bill Daly wrote:

[ snip very nice explanation ]

> This is only a part of Crandall's method. He also takes advantage of a
> method of packing the number to be multiplied into complex coefficients,
> which reduces the order of the FFT by a factor of 2. Also, he uses
> something called a "split-radix" FFT, which in some way takes advantage
> of special properties of the square root of -1 to further reduce the
> number of multiplies required. I haven't been able to work out how this
> is done, so if someone has an explanation, maybe they could post it
> here.
> 

The split-radix formulation is easy to visualize. Draw yourself
a radix-2 decimation in frequency FFT, and pick out four of the
butterflies (two in one "pass" and two in the next):

a0 ------- a0 --- a0
   \     /     X 
a1   \ /   a1 --- a1
  \   X   /
    /   \
  /   X   \
a2   / \   a2 --- a2
   /     \     X
a3 ------  a3 --- a3


           1)     2)


In pass 1, a2 is multiplied by some constant "w^j" (where w is the
root of unity you're using) and a3 is multiplied by i*w^j. In pass
2, a1 and a3 are multiplied by w^(2j).

If you take these four butterflies and collapse them into one giant
butterfly, you get the building block of the radix-4 FFT. Note that
combining everything together reduces the number of complex multiplies
from 4 to 3, for what looks like a big savings of effort (though the
arithmetic isn't reduced by 25%, more like 10-15%). 

Take two of these radix-4 butterflies and interleave them, then add a
third pass, and you have a radix-8 FFT (I won't even attempt to draw it).

The point is, take the picture above and get rid of the top right
butterfly, and you get the building block of the split-radix FFT.
This gives two complex multiplies for three butterflies, slightly
better than the radix-4 version. There are pluses and minus to rearranging
a radix-2 FFT this way. on the good side, the operation 
count turns out to be lower than even a radix-8 FFT, and 20-25% better
than the radix-2. It also has a much easier time of fitting into the
Pentium's 8 FPU registers than the radix-4, because by the time you have
to get to the complex multiplies the top two complex numbers have gone
away already. But on the bad side, one good reason to use a large radix 
is that it subdivides the dataset into cache-size chunks faster, and
the split-radix doesn't do that as well as even a radix-4 FFT. Also, you
can't take care of all the data in a single pass like you can with the
more conventional FFTs. In fact, the easiest way to do a split radix FFT
is recursively:

split_radix(n, size) {
   if size==4
      radix4(n)
   if size==8
      radix8(n)           /* base cases of recursion */
   else
      funky_split_radix_butterfly(n, size)
      split_radix( n, size/2)
      split_radix( n+2*size/4, size/4 )
      split_radix( n+3*size/4, size/4 )
   end if
}

You can build a decimation in time split-radix from the DIT radix-2
FFT in the same way.

Hope this helps,
jasonp

Reply via email to