Please see answers in-line.

Thanks!
________________________

From: Eric Blossom <[EMAIL PROTECTED]>
Date: 12/11/2007 02:31 PM
To: Eugene Grayver <[EMAIL PROTECTED]>
Cc: discuss-gnuradio@gnu.org
Subject: Re: [Discuss-gnuradio] Re-writing blocks using intel libraries

On Tue, Dec 11, 2007 at 10:13:32AM -0800, Eugene Grayver wrote:
> Hello,
> 
> We are working on some systems that require high sampling rates.  I am 
> already using the Intel C++ compiler at the highest optimization ratio, 
> but a lot of the blocks are very slow still.  It appears that intel C++ 
> does not properly vectorize <complex> data type. 

General curiosity questions:

  Are you using oprofile to measure performance?

I am a bit of a maverick, and for various reasons am using a pure C++ 
environment.  I hacked my own 'connect_block' function (can't wait for 
v3.2, where these will be part of native gr).  I am measuring the 
performance using a custom block (gr_throughput) that simply reports the 
average number of samples processed per second.
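
For reference, a throughput probe of the kind described above could look
roughly like the sketch below.  This is not the actual gr_throughput block
(its source is not shown in this thread); the class name throughput_meter
and its methods are made up for illustration, and it is a plain C++ helper
rather than a GNU Radio block.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (NOT the gr_throughput block from this thread):
// accumulates sample counts and reports the average samples per second.
class throughput_meter {
public:
    typedef std::chrono::steady_clock clock;

    throughput_meter() : d_start(clock::now()), d_total(0) {}

    // Call once per processing pass with the number of samples handled.
    void add(std::size_t nsamples) { d_total += nsamples; }

    // Total number of samples seen so far.
    std::uint64_t total() const { return d_total; }

    // Average rate since construction, in samples per second.
    double items_per_second() const {
        std::chrono::duration<double> elapsed = clock::now() - d_start;
        return rate(d_total, elapsed.count());
    }

    // Pure arithmetic helper, kept separate so it can be checked
    // without depending on wall-clock timing.
    static double rate(std::uint64_t nsamples, double seconds) {
        assert(seconds >= 0.0);
        return seconds > 0.0 ? static_cast<double>(nsamples) / seconds : 0.0;
    }

private:
    clock::time_point d_start;  // when measurement began
    std::uint64_t d_total;      // samples accumulated so far
};
```

A block's work function would call add() with the number of samples it
just processed and print items_per_second() every so often.  Averaging over
the whole run, as here, smooths out scheduler jitter but hides short-term
dips.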

  What h/w platform are you running on / tuning for?

The platform is currently Intel Xeon or Core2 Duo.

  You're not trying to run your app on a cache-crippled machine like a
  Celeron, are you?  ;)

No, very high end.

  Which blocks are causing you the biggest problem?

After switching to the IPP versions, I got a 2x improvement on all the 
filtering blocks and about a 40% improvement for the sine/cosine 
generation blocks, including gr_expj and gr_rotate.

  Are your problems caused primarily by lack of CPU cycles, cache
  misses or mis-predicted branches?

I am not sure, since I am not at all a software expert (mostly dsp/comm). 
My guess is that the SSE instructions are not being used (or not used to 
their full extent).  Even the 'multiply' block is VERY slow compared to a 
vector x vector multiplication in the Intel library.  Some of the 
gr_blocks process each sample using a separate function call, e.g.:

    for (n = 0; n < noutput_samples; n++)
        scale(in[n]);

Replacing this with a single vectorized function call is much faster.
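
To make the contrast concrete, here is a minimal sketch of the two styles
in plain C++.  It does not use IPP or any GNU Radio code; the names
sample_t, scale_one, scale_per_sample and scale_block are invented for the
example.  The block version treats the complex array as interleaved floats
and uses one simple loop, which is the shape compilers tend to
auto-vectorize most readily; whether a given compiler actually emits SSE
for it is an assumption that has to be checked in the generated code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> sample_t;  // stand-in for the complex sample type

// Per-sample style: one function call per item.  If scale_one is defined
// in another translation unit, the compiler usually cannot inline it.
sample_t scale_one(sample_t x, float k) { return x * k; }

void scale_per_sample(const sample_t* in, sample_t* out,
                      std::size_t n, float k) {
    assert(n == 0 || (in && out));
    for (std::size_t i = 0; i < n; i++)
        out[i] = scale_one(in[i], k);
}

// Block style: a single call handles the whole buffer.  std::complex<float>
// is laid out as {re, im}, so n complex samples are 2*n contiguous floats.
void scale_block(const sample_t* in, sample_t* out,
                 std::size_t n, float k) {
    assert(n == 0 || (in && out));
    const float* fin = reinterpret_cast<const float*>(in);
    float* fout = reinterpret_cast<float*>(out);
    for (std::size_t i = 0; i < 2 * n; i++)
        fout[i] = fin[i] * k;
}
```

Both functions produce the same output; only the loop structure differs.
A library call that operates on the whole vector plays the same role as
scale_block here, with hand-tuned SIMD inside.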

> I have been replacing almost every low level block with a functionally 
> equivalent using the intel performance libraries (IPP).  These libraries 
> are not GPL, but are free for noncommercial use under Linux ($200 
> otherwise).  At some point, I would like to contribute our work back to 
> gnuradio.  Would this fit with the gr philosophy?  How should we 
> structure the code?  (i.e. have a separate set of files, use #defines, 
> or ...)?
> 
> Eugene

We would not accept the changes.  Part of what we're up to is building
an ever expanding universe of free code.  Instead of using the
non-free IPP code, please consider using a free library such as ATLAS,
or help us find and fix performance challenges in a way that doesn't
require non-free code.  Also, are you sure that your performance
issues can't be better addressed with an algorithmic change?  If
you're using a lot of very low-level blocks (e.g., add, multiply,
etc.) you're probably better off writing a block that aggregates some
of the operations into a single block.
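
As an illustration of the aggregation idea, the sketch below compares a
multiply stage followed by an add stage against one fused loop.  The
function names are invented for the example and none of this is GNU Radio
code; in a real flow graph the intermediate buffer would be the buffer
between two connected blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two separate stages: the product is written to a temporary buffer and
// read back by the add stage, so every sample crosses memory twice.
void multiply_then_add(const float* a, const float* b, const float* c,
                       float* out, std::size_t n) {
    assert(n == 0 || (a && b && c && out));
    std::vector<float> tmp(n);
    for (std::size_t i = 0; i < n; i++)
        tmp[i] = a[i] * b[i];        // "multiply" block
    for (std::size_t i = 0; i < n; i++)
        out[i] = tmp[i] + c[i];      // "add" block
}

// Aggregated version: one pass, no intermediate buffer.
void fused_multiply_add(const float* a, const float* b, const float* c,
                        float* out, std::size_t n) {
    assert(n == 0 || (a && b && c && out));
    for (std::size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

The fused loop does the same arithmetic with fewer loads and stores and no
per-block scheduling overhead, which is often where chains of very small
blocks lose their time.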

That's what I expected.  We'll try to contribute the more dsp-centric 
blocks such as demodulators. 



Eric

_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio
