Hey David!

Small fixed-size FFTs are fun, until you need to implement one with a prime 
factor of 23 :)

Generally, the trick is that you can decompose any N-point DFT into a
problem akin to a 2D DFT of size M×(N/M). If M and N/M are coprime,
the Good-Thomas (prime-factor) decomposition makes it exactly a 2D
DFT, with no twiddle factors between the two passes. That's cool: for
example, if M is the highest power of 2 by which N is divisible, you
can use a Radix-2 FFT for the columns, and in the worst case you do
classical DFTs on the rows, if you cannot decompose N/M into further
factors for which you have e.g. a Radix-3 FFT.
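
To make the index gymnastics concrete, here's a toy sketch in C
(naive O(n^2) DFTs stand in for real radix kernels; all names are
mine, not from any particular library):

/* N-point DFT via the Good-Thomas (prime-factor) decomposition,
 * assuming N = N1*N2 with gcd(N1, N2) == 1. */
#include <complex.h>
#include <math.h>

/* naive DFT with configurable stride, used for rows and columns */
static void dft(const double complex *in, double complex *out,
                int n, int stride)
{
    const double pi = acos(-1.0);
    for (int k = 0; k < n; k++) {
        double complex acc = 0;
        for (int j = 0; j < n; j++)
            acc += in[j * stride] * cexp(-2.0 * I * pi * j * k / n);
        out[k * stride] = acc;
    }
}

/* brute-force modular inverse, fine for toy sizes */
static int modinv(int a, int m)
{
    for (int x = 1; x < m; x++)
        if ((a * x) % m == 1)
            return x;
    return -1;
}

static void pfa_dft(const double complex *x, double complex *X,
                    int N1, int N2)
{
    const int N = N1 * N2;
    double complex a[N], b[N];

    /* input mapping: n = (n1*N2 + n2*N1) mod N */
    for (int n1 = 0; n1 < N1; n1++)
        for (int n2 = 0; n2 < N2; n2++)
            a[n1 * N2 + n2] = x[(n1 * N2 + n2 * N1) % N];

    /* N2-point DFTs along the rows ... */
    for (int n1 = 0; n1 < N1; n1++)
        dft(&a[n1 * N2], &b[n1 * N2], N2, 1);
    /* ... and N1-point DFTs down the columns */
    for (int n2 = 0; n2 < N2; n2++)
        dft(&b[n2], &a[n2], N1, N2);

    /* output mapping via CRT: k == k1 (mod N1) and k == k2 (mod N2) */
    const int u1 = modinv(N2 % N1, N1), u2 = modinv(N1 % N2, N2);
    for (int k1 = 0; k1 < N1; k1++)
        for (int k2 = 0; k2 < N2; k2++)
            X[(k1 * N2 * u1 + k2 * N1 * u2) % N] = a[k1 * N2 + k2];
}

The coprimality is what buys you the clean index mappings: unlike
plain Cooley-Tukey, there are no twiddle factors between the passes.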

For a prime size like 23, Rader's algorithm would turn the DFT into a
length-22 cyclic convolution, but at that size I don't think there's
a known implementation that'd be faster than the straightforward
matrix-multiplication DFT. I'd love to be proven wrong!
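
For reference, the matrix-multiplication version really is tiny; a
minimal sketch (function names made up):

/* Direct matrix-multiplication DFT for N = 23: precompute the 23x23
 * twiddle matrix once, then every transform is a plain matrix-vector
 * product. */
#include <complex.h>
#include <math.h>

#define N 23

static double complex W[N][N];   /* W[k][n] = exp(-2*pi*I*k*n/N) */

void dft23_init(void)
{
    const double pi = acos(-1.0);
    for (int k = 0; k < N; k++)
        for (int n = 0; n < N; n++)
            W[k][n] = cexp(-2.0 * I * pi * k * n / N);
}

void dft23(const double complex in[N], double complex out[N])
{
    for (int k = 0; k < N; k++) {
        double complex acc = 0;
        for (int n = 0; n < N; n++)   /* 23 complex MACs per bin */
            acc += W[k][n] * in[n];
        out[k] = acc;
    }
}

That's 529 complex multiply-accumulates per transform from a table
computed once, which a compiler can unroll and vectorize quite well.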

Cheers,
Marcus

On 17.02.21 17:18, David Winter wrote:
> Hey,
> 
>> Larger n yields greater efficiencies, right?  Doing lots of small
>> calls isn't necessarily as efficient as doing fewer large calls, so
>> long as you can handle the latency and the processor can stay fed
>> with data.  I figure ~4000 samples is a good compromise, but maybe
>> larger ones work out better, too.
> 
> Keep in mind that the FFT has an algorithmic complexity of
> O(n log n), so smaller batches *should* in theory be easier on your
> CPU, down to the point where call overheads become a problem.
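
Back of the envelope, ignoring constant factors: pushing 4600 samples
through one 4600-point FFT costs about 4600 * log2(4600) ≈ 56e3
operations, while nine 512-point FFTs over the same samples cost
about 4600 * log2(512) ≈ 41e3. The catch in this thread: 23 is prime,
so a size-23 transform done as a direct DFT costs 23^2 = 529
operations, and 200 of those are ~106e3, more than the single big
FFT, unless the direct form's trivial inlining and vectorization win
it back.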
> 
> Keeping your FFT size near the capacity of your registers / L1 cache
> also can't hurt ^^
> 
> Obviously you can still wait until you have accumulated a large
> batch of samples (a couple k sounds good?) and only then start
> running lots of small FFTs. If your samples are in a contiguous
> block of memory, your CPU should be able to prefetch those just
> fine.
> 
> Ultimately you might just have to benchmark the two approaches and
> compare them - I'm not sure how well libraries like FFTW deal with
> this use case and what the call overhead is.
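
FWIW, that's cheap to measure. A minimal harness along these lines
(fftwf_plan_dft_1d / fftwf_execute are the real single-precision
FFTW API; the sizes, repetition counts and timing scaffold are just
my sketch):

/* Toy call-overhead benchmark: push the same total number of samples
 * through a size-23 and a size-4600 plan. Build with -lfftw3f -lm. */
#include <complex.h>  /* before fftw3.h, so fftwf_complex is C99 complex */
#include <fftw3.h>
#include <stdio.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    const int sizes[] = { 23, 4600 };
    for (int s = 0; s < 2; s++) {
        const int n = sizes[s];
        const int reps = (4600 / n) * 10000;   /* equal total samples */
        fftwf_complex *buf = fftwf_malloc(sizeof(fftwf_complex) * n);
        fftwf_plan p = fftwf_plan_dft_1d(n, buf, buf, FFTW_FORWARD,
                                         FFTW_MEASURE);
        for (int i = 0; i < n; i++)
            buf[i] = 1.0f;   /* planning clobbered buf; avoid denormals */
        double t0 = now_sec();
        for (int r = 0; r < reps; r++)
            fftwf_execute(p);                  /* re-run the same plan */
        printf("n=%4d: %6.1f Msamples/s\n", n,
               (double)n * reps / (now_sec() - t0) / 1e6);
        fftwf_destroy_plan(p);
        fftwf_free(buf);
    }
    return 0;
}

Planning with FFTW_MEASURE is done outside the timed loop, so the
loop only sees execution time plus whatever per-call overhead there
is.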
> 
> Should FFTW not work out, you could also try writing a small
> fixed-size FFT yourself and inlining that; the code isn't too bad
> (there are enough C examples online).
> 
> David
> 
> *From: *Discuss-gnuradio <discuss-gnuradio-bounces+dastw=gmx....@gnu.org>
> on behalf of Brian Padalino <bpadal...@gmail.com>
> *Date: *Wednesday, 17 February 2021 at 16:58
> *To: *Marcus Müller <muel...@kit.edu>
> *Cc: *GNURadio Discussion List <discuss-gnuradio@gnu.org>
> *Subject: *Re: Resampling radio data
> 
> On Wed, Feb 17, 2021 at 10:05 AM Marcus Müller <muel...@kit.edu> wrote:
> 
>     Oh, sorry, didn't mean to imply that! FFT interpolation might
>     work well (if you can live with the sinc sidelobes).
> 
> If the bandwidth is already constrained to 20 MHz/23 MHz, then there
> would be no sidelobes - correct?
> 
>     I do have a question, though: why do you go for a 4600-point
>     FFT? Why not simply a 23-point FFT?
> 
> Larger n yields greater efficiencies, right?  Doing lots of small
> calls isn't necessarily as efficient as doing fewer large calls, so
> long as you can handle the latency and the processor can stay fed
> with data.  I figure ~4000 samples is a good compromise, but maybe
> larger ones work out better, too.
> 
> For reference, here are some performance benchmarks for FFTW
> (randomly chosen processor):
> 
>   http://www.fftw.org/speed/Pentium4-3.06GHz/
> 
> Single precision, non-power-of-two, 1D transforms are good
> benchmarks to look at for something like this. Obviously some
> numbers are better than others, but the general trend seems to skew
> towards the mid-thousands as a decent size to target for maximizing
> throughput.
> 
> Brian
> 
