Tom Rondeau wrote:
> Martin Dvh wrote:
>> Eric Blossom wrote:
>>> On Tue, Dec 11, 2007 at 03:41:46PM -0800, Eugene Grayver wrote:
>>>> Please see answers in-line.
>>>> Thanks!
>>>>       General curiosity questions:
>>>>   Are you using oprofile to measure performance?
>>>> I am a bit of a maverick, and for various reasons am using a pure
>>>> C++ environment.  I hacked my own 'connect_block' function (can't
>>>> wait for v3.2, where these will be part of native gr).
>>> The trunk contains C++ code for connect, hier_block2, etc.  Some of
>>> the pieces that are still missing include C++ support for the USRP
>>> daughterboards, but Johnathan Corgan is working on that now.
>>>> I am measuring the performance using a custom block (gr_throughput)
>>>> that simply reports the average number of samples processed per
>>>> second.
>>>>         What h/w platform are you running on / tuning for?
>>>> The platform is currently Intel Xeon or Core2 Duo.
>>>>   You're not trying to run your app on a cache-crippled machine like a
>>>>   Celeron, are you?  ;)
>>>> No, very high end.
>>>>   Which blocks are causing you the biggest problem?
>>>> I got a 2x improvement on all the filtering blocks.
>>> If these are FIR filters, were you using gr_fft_filter_{fff,ccc}
>>> or the gr_fir_filter* blocks?  The FFT ones are _much_ faster with a
>>> break-even point around 16 taps IIRC.
>>>> About a 40% improvement for sine/cosine generation blocks.  This
>>>> includes gr_expj, gr_rotate.
>>> No surprise there, and that's a great example of SIMD code that should
>>> be in GNU Radio.
>>>>   Are your problems caused primarily by lack of CPU cycles, cache
>>>>   misses or mis-predicted branches?
>>>> I am not sure, since I am not at all a software expert (mostly
>>>> dsp/comm). My guess is that the SSE instructions are not being used
>>>> (or not used to a full extent).  Even the 'multiply' block is VERY
>>>> slow compared to a vector x vector multiplication in the Intel library.
>>> OK.
>>>> Some of the gr_blocks process each sample using a separate function
>>>> call, e.g.
>>>>     for (n=0; n<noutput_samples; n++)
>>>>         scale(in[n]);
>>>> Replacing this with a single vectorized function call is much faster.
>>> OK.
>>>>> We would not accept the changes.
>>>> That's what I expected.  We'll try to contribute the more
>>>> dsp-centric blocks such as demodulators.       
>>> That would be great!  Or if you want to code up an SSE Taylor series
>>> expansion for sine/cosine good to 23-bits or so, we'd love that too ;)
>> I am working on this in the little spare time I have.
>> I already have an SSE Taylor series for atan2 working in gnuradio.
>> The atan2 needs some code cleanup and wrapper code to switch
>> implementations (if the processor is x86 and supports SSE2, use the
>> optimized version, else the generic one).
>> The sin/cos is far from ready.
>> Greetings,
>> Martin
> Martin,
> Bob put in a fast atan function (general/ about a year
> ago. Have you looked at this, and is the Taylor performance better?
The Taylor performance is much better when you get (a multiple of) 4 atan2s at
a time, because the SSE Taylor series works with SIMD in blocks of 4.
When you only get one at a time, the performance is still better but not by 
The Taylor series is also more precise than
I don't have the numbers at hand, but I also wrote qa and benchmark code so 
exact numbers on precision and speed can be determined.
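
For illustration only, here is a minimal sketch of the blocks-of-4 idea. It is
not the actual gnuradio code: the names atan2_block, atan2_approx and
atan2_approx4 are hypothetical, and a short 5-term polynomial for atan on
[0, 1] (accurate to roughly 1e-5) stands in for the Taylor series. Full groups
of four go through SSE, and leftover samples fall back to the scalar path,
which is why a multiple of 4 is the best case.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Assumed stand-in polynomial for atan(z), 0 <= z <= 1 (NOT the Taylor series
// from the mail): atan(z) ~ z*(A1 + A3*z^2 + A5*z^4 + A7*z^6 + A9*z^8).
static const float A1 = 0.9998660f, A3 = -0.3302995f, A5 = 0.1801410f,
                   A7 = -0.0851330f, A9 = 0.0208351f;
static const float PI_F = 3.14159265f, HALF_PI_F = 1.57079633f;

// Scalar path: used for the tail and on builds without SSE.
static float atan2_approx(float y, float x) {
    float ax = std::fabs(x), ay = std::fabs(y);
    // Clamp the divisor so that (0, 0) yields 0 instead of NaN.
    float z = std::min(ax, ay) / std::max(std::max(ax, ay), 1e-30f);
    float z2 = z * z;
    float p = z * (A1 + z2 * (A3 + z2 * (A5 + z2 * (A7 + z2 * A9))));
    if (ay > ax) p = HALF_PI_F - p;   // undo the min/max swap
    if (x < 0.0f) p = PI_F - p;       // left half plane
    return std::copysign(p, y);       // lower half plane
}

#if defined(__SSE__)
// Branch-free select: lanes where mask is set take b, the others take a.
static __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

// Same steps as atan2_approx, on four (y, x) pairs at once.
static __m128 atan2_approx4(__m128 y, __m128 x) {
    const __m128 sign = _mm_set1_ps(-0.0f);  // only the sign bit set
    __m128 ax = _mm_andnot_ps(sign, x), ay = _mm_andnot_ps(sign, y);
    __m128 mx = _mm_max_ps(_mm_max_ps(ax, ay), _mm_set1_ps(1e-30f));
    __m128 z = _mm_div_ps(_mm_min_ps(ax, ay), mx);
    __m128 z2 = _mm_mul_ps(z, z);
    __m128 p = _mm_set1_ps(A9);              // Horner evaluation
    p = _mm_add_ps(_mm_mul_ps(p, z2), _mm_set1_ps(A7));
    p = _mm_add_ps(_mm_mul_ps(p, z2), _mm_set1_ps(A5));
    p = _mm_add_ps(_mm_mul_ps(p, z2), _mm_set1_ps(A3));
    p = _mm_add_ps(_mm_mul_ps(p, z2), _mm_set1_ps(A1));
    p = _mm_mul_ps(p, z);
    p = select_ps(_mm_cmpgt_ps(ay, ax), p,
                  _mm_sub_ps(_mm_set1_ps(HALF_PI_F), p));
    p = select_ps(_mm_cmplt_ps(x, _mm_setzero_ps()), p,
                  _mm_sub_ps(_mm_set1_ps(PI_F), p));
    return _mm_or_ps(p, _mm_and_ps(sign, y)); // copy the sign of y
}
#endif

// Processes n samples: groups of four via SSE, the remainder one at a time.
void atan2_block(const float* y, const float* x, float* out, std::size_t n) {
    assert(y && x && out);
    std::size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i,
                      atan2_approx4(_mm_loadu_ps(y + i), _mm_loadu_ps(x + i)));
#endif
    for (; i < n; ++i) out[i] = atan2_approx(y[i], x[i]);
}
```

With n = 7, for example, atan2_block would run one SSE iteration and then
three scalar ones.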

As a side note:
I have also been working on a new version of the FFT FIR filter.
This one is more efficient when decimating.
It works very well when the decimation is 2^n, and it also works well for most
other decimation factors, EXCEPT when the decimation is a big prime.

This means the theoretical maximum speed improvement is a factor of two (when
the decimation is infinite).
But when you want multiple parts of the spectrum, the speed improvement is
much better than using a FIR filter per spectrum part.
Then you can use a single forward FFT with multiple inverse FFTs.
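
A rough way to see where the factor of two comes from, as a sketch under
assumptions that are not spelled out in this mail: an N-point FFT costs about
N*log2(N) operations, a conventional FFT filter needs one forward and one
full-size inverse FFT per block, and the decimating version replaces the
inverse FFT with one of size N/D. The function names are hypothetical, and the
frequency-domain multiply is ignored.

```cpp
#include <cassert>
#include <cmath>

// Assumed cost model: an n-point FFT costs about n*log2(n) operations.
double fft_cost(double n) { return n > 1.0 ? n * std::log2(n) : 0.0; }

// Estimated speedup of one shared forward FFT plus `channels` inverse FFTs of
// size n/decim, compared with `channels` separate FFT filters that each do a
// forward and a full-size inverse FFT.
double estimated_speedup(double n, double decim, int channels) {
    assert(n >= 1.0 && decim >= 1.0 && channels >= 1);
    double conventional = channels * 2.0 * fft_cost(n);
    double shared = fft_cost(n) + channels * fft_cost(n / decim);
    return conventional / shared;
}
```

Under this model a single channel approaches, but never exceeds, a factor of
two as the decimation grows, while several channels sharing one forward FFT
can exceed it.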


> We really need a faster sin/cos. Glad to hear you're working on it.
> Tom


Discuss-gnuradio mailing list
