Hey David! Small fixed-size FFTs are fun, until you need to implement one with a prime factor of 23 :)
Generally, the trick is more or less this: you can decompose any N-point DFT
into a problem akin to a 2D DFT of size M×(N/M), with no twiddle factors
between the two stages, but only if M and N/M are coprime (the "prime-factor"
or Good-Thomas decomposition). That's cool: for example, if M is the highest
power of 2 dividing N, you can use a Radix-2 FFT for the columns, and in the
worst case you fall back to classical DFTs on the rows, if you cannot
decompose N/M into further factors for which you have e.g. a Radix-3 FFT.
(I've appended a rough sketch of that index mapping below the quoted mail.)

With something of size 23, I don't think there's a known implementation
that'd be faster than the straightforward matrix-multiplication DFT (also
sketched below), but I'd love to be proven wrong!

Cheers,
Marcus

On 17.02.21 17:18, David Winter wrote:
> Hey,
>
>> Larger n yields greater efficiencies, right? Doing lots of small calls
>> isn't necessarily as efficient as doing fewer large calls, so long as you
>> can handle the latency and the processor can stay fed with data. I figure
>> ~4000 samples is a good compromise, but maybe larger ones work out
>> better, too.
>
> Keep in mind that the FFT has an algorithmic complexity of O(n*log n), so
> smaller batches **should** in theory be easier on your CPU, down to the
> point where call overheads become a problem.
>
> Keeping your FFT size near the capacity of your registers / L1 cache also
> can't hurt ^^
>
> Obviously you can still wait until you have accumulated a large batch of
> samples (a couple k sounds good?) and only then start running lots of
> small FFTs. If your samples are in a contiguous block of memory, your CPU
> should be able to prefetch those just fine.
>
> Ultimately you might just have to benchmark the two approaches and compare
> them - I'm not sure how well libraries like fftw deal with this use case
> and what the call overhead is.
>
> Should fftw not work out, you could also try writing a small fixed-size
> FFT yourself and inlining that; the code isn't too bad (there are enough C
> examples online).
>
> David
>
> *From:* Discuss-gnuradio <discuss-gnuradio-bounces+dastw=gmx....@gnu.org>
> on behalf of Brian Padalino <bpadal...@gmail.com>
> *Date:* Wednesday, February 17, 2021 at 16:58
> *To:* Marcus Müller <muel...@kit.edu>
> *Cc:* GNURadio Discussion List <discuss-gnuradio@gnu.org>
> *Subject:* Re: Resampling radio data
>
> On Wed, Feb 17, 2021 at 10:05 AM Marcus Müller <muel...@kit.edu> wrote:
>
>> Oh, sorry, didn't mean to imply that! FFT interpolation might work well
>> (if you can live with the sinc sidelobes).
>
> If the bandwidth is already constrained to 20MHz/23MHz, then there would
> be no sidelobes - correct?
>
>> I do have a question, though: Why do you go for a 4600-FFT? Why not
>> simply a 23-point FFT?
>
> Larger n yields greater efficiencies, right? Doing lots of small calls
> isn't necessarily as efficient as doing fewer large calls, so long as you
> can handle the latency and the processor can stay fed with data. I figure
> ~4000 samples is a good compromise, but maybe larger ones work out better,
> too.
>
> For reference, here are some performance benchmarks for FFTW (randomly
> chosen processor):
>
> http://www.fftw.org/speed/Pentium4-3.06GHz/
>
> Single precision, non-power-of-2, 1d transforms are some good benchmarks
> to look at for something like this. Obviously some numbers are better than
> others, but the general trend seems to skew towards the mid-thousands as a
> decent size to target for maximizing throughput.
>
> Brian
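P.S.: To make the decomposition above concrete, here is a rough, untested
sketch in C. The factor choice N1 = 8, N2 = 23 (so N = 184) and all names
are mine, purely for illustration; both stages use naive DFTs here, where a
real implementation would substitute a radix-2 kernel for the length-8
columns.

/* Coprime ("prime factor" / Good-Thomas) decomposition of an N-point DFT
 * with N = N1*N2 and gcd(N1, N2) = 1 into a pure 2D DFT of size N1 x N2,
 * with no twiddle factors between the stages. Illustrative, unoptimized. */
#include <complex.h>
#include <math.h>
#include <stdio.h>

#define N1 8    /* e.g. the highest power of 2 dividing N */
#define N2 23   /* the awkward prime factor               */
#define N  (N1 * N2)

/* naive ("matrix-multiplication") DFT of len points, stride apart */
static void dft(double complex *buf, int len, int stride)
{
    double complex tmp[N];
    for (int k = 0; k < len; k++) {
        double complex acc = 0;
        for (int n = 0; n < len; n++)
            acc += buf[n * stride] * cexp(-2.0 * I * M_PI * n * k / len);
        tmp[k] = acc;
    }
    for (int k = 0; k < len; k++)
        buf[k * stride] = tmp[k];
}

/* x^-1 mod m by brute force (m is tiny); -1 if x, m aren't coprime */
static int inv_mod(int x, int m)
{
    for (int i = 1; i < m; i++)
        if ((x * i) % m == 1)
            return i;
    return -1;
}

static void dft_good_thomas(const double complex *in, double complex *out)
{
    double complex grid[N1][N2];

    /* input map: n = (N2*n1 + N1*n2) mod N, a bijection by the CRT */
    for (int n1 = 0; n1 < N1; n1++)
        for (int n2 = 0; n2 < N2; n2++)
            grid[n1][n2] = in[(N2 * n1 + N1 * n2) % N];

    /* length-N1 DFTs down the columns, then length-N2 DFTs along the
     * rows; note: no twiddle multiplications in between */
    for (int n2 = 0; n2 < N2; n2++)
        dft(&grid[0][n2], N1, N2);
    for (int n1 = 0; n1 < N1; n1++)
        dft(&grid[n1][0], N2, 1);

    /* output map: k = (N2*q2*k1 + N1*q1*k2) mod N, with q2 = N2^-1 mod N1
     * and q1 = N1^-1 mod N2, so that k = k1 mod N1 and k = k2 mod N2 */
    int q2 = inv_mod(N2 % N1, N1);
    int q1 = inv_mod(N1 % N2, N2);
    for (int k1 = 0; k1 < N1; k1++)
        for (int k2 = 0; k2 < N2; k2++)
            out[(N2 * q2 * k1 + N1 * q1 * k2) % N] = grid[k1][k2];
}

int main(void)
{
    double complex x[N], X[N];
    for (int i = 0; i < N; i++)
        x[i] = cos(0.3 * i) + 0.1 * I * i;
    dft_good_thomas(x, X);

    /* cross-check bin 0 against the direct sum over all samples */
    double complex ref = 0;
    for (int i = 0; i < N; i++)
        ref += x[i];
    printf("X[0] = %g%+gi, direct sum = %g%+gi\n",
           creal(X[0]), cimag(X[0]), creal(ref), cimag(ref));
    return 0;
}

Because gcd(N1, N2) = 1, the Chinese remainder theorem makes both index maps
bijections, and the cross terms that would otherwise show up as twiddle
factors between the two stages vanish modulo N; that is exactly where the
coprimality condition comes from.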
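Likewise, the "straightforward matrix-multiplication DFT" for the stubborn
factor of 23, again as an untested sketch with names of my own choosing:
precompute the 23x23 twiddle matrix once, after which each transform is a
plain matrix-vector product.

#include <complex.h>
#include <math.h>
#include <stdio.h>

#define P 23

static double complex W[P][P];  /* W[k][n] = exp(-2*pi*i*k*n/23) */

static void dft23_init(void)
{
    for (int k = 0; k < P; k++)
        for (int n = 0; n < P; n++)
            W[k][n] = cexp(-2.0 * I * M_PI * ((k * n) % P) / P);
}

/* one 23-point DFT: out = W * in, i.e. 23*23 complex multiply-adds */
static void dft23(const double complex *in, double complex *out)
{
    for (int k = 0; k < P; k++) {
        double complex acc = 0;
        for (int n = 0; n < P; n++)
            acc += W[k][n] * in[n];
        out[k] = acc;
    }
}

int main(void)
{
    double complex x[P], X[P];
    dft23_init();
    for (int n = 0; n < P; n++)              /* pure tone in bin 3 ...    */
        x[n] = cexp(2.0 * I * M_PI * 3 * n / P);
    dft23(x, X);
    for (int k = 0; k < P; k++)              /* ... so X[3] = 23, rest ~ 0 */
        printf("X[%2d] = %6.2f %+6.2fi\n", k, creal(X[k]), cimag(X[k]));
    return 0;
}

That is 529 complex multiply-adds per transform with a table that fits
easily into L1 cache, and the fixed trip counts make it easy for the
compiler to unroll and vectorize, which is part of why this is hard to beat
at such a small size.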
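And on David's point about call overhead with batches of small transforms:
FFTW does have an interface for exactly this pattern, fftw_plan_many_dft
(here in its single-precision fftwf_ variant), which plans a whole batch of
identical small transforms as one plan and runs them with a single execute
call, so the per-call overhead is paid once per batch rather than once per
23-point FFT. Sketch only; the batch size of 200 (200 * 23 = 4600 samples)
is an arbitrary choice.

/* batched small FFTs with FFTW; link with -lfftw3f -lm */
#include <complex.h>
#include <fftw3.h>

int main(void)
{
    const int n = 23;         /* size of each transform                */
    const int howmany = 200;  /* transforms per batch: 200 * 23 = 4600 */

    fftwf_complex *buf = fftwf_alloc_complex((size_t)n * howmany);

    /* one plan for `howmany` contiguous in-place 23-point DFTs:
     * transform j starts at buf + j*n (idist = odist = n), unit stride */
    fftwf_plan plan = fftwf_plan_many_dft(
        1, &n, howmany,
        buf, NULL, 1, n,
        buf, NULL, 1, n,
        FFTW_FORWARD, FFTW_MEASURE);

    /* ... fill buf with a batch of 4600 samples here, then: */
    fftwf_execute(plan);      /* all 200 transforms in one library call */

    fftwf_destroy_plan(plan);
    fftwf_free(buf);
    return 0;
}

One caveat: planning with FFTW_MEASURE overwrites the buffer contents, so
create the plan first and copy the samples in afterwards. Whether this beats
one large 4600-point transform is exactly the kind of thing you'd have to
benchmark, as David says.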