Please see answers in-line. Thanks!
________________________
Eric Blossom <[EMAIL PROTECTED]>
12/11/2007 02:31 PM
To: Eugene Grayver <[EMAIL PROTECTED]>
cc: discuss-gnuradio@gnu.org
Subject: Re: [Discuss-gnuradio] Re-writing blocks using intel libraries

> On Tue, Dec 11, 2007 at 10:13:32AM -0800, Eugene Grayver wrote:
> > Hello,
> >
> > We are working on some systems that require high sampling rates. I am
> > already using the Intel C++ compiler at the highest optimization level,
> > but a lot of the blocks are still very slow. It appears that Intel C++
> > does not properly vectorize the <complex> data type.
>
> General curiosity questions:
>
> Are you using oprofile to measure performance?

I am a bit of a maverick, and for various reasons am using a pure C++ environment. I hacked together my own 'connect_block' function (can't wait for v3.2, where this will be part of native gr). I am measuring performance with a custom block (gr_throughput) that simply reports the average number of samples processed per second.

> What h/w platform are you running on / tuning for?

The platform is currently Intel Xeon or Core2 Duo.

> You're not trying to run your app on a cache-crippled machine like a
> Celeron, are you? ;)

No, very high end.

> Which blocks are causing you the biggest problem?

I got a 2x improvement on all the filtering blocks, and about a 40% improvement on the sine/cosine generation blocks, including gr_expj and gr_rotate.

> Are your problems caused primarily by lack of CPU cycles, cache misses
> or mis-predicted branches?

I am not sure, since I am not at all a software expert (mostly DSP/comm). My guess is that the SSE instructions are not being used (or not used to their full extent). Even the 'multiply' block is VERY slow compared to a vector-by-vector multiplication in the Intel library. Some of the gr_blocks process each sample with a separate function call (e.g. for (n=0; n<noutput_samples; n++) scale(in[n]);). Replacing this with a single vectorized function call over the whole buffer is much faster.
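A minimal sketch of the difference described above, assuming a simple scaling operation on std::complex<float> samples. The names scale_one, scale_per_sample and scale_buffer are hypothetical and are not GNU Radio or IPP API. The buffer version handles the whole block in one call and treats the samples as interleaved floats (the C++ standard guarantees std::complex<float> is laid out as two floats), which hands the compiler a plain contiguous loop it can often auto-vectorize with SSE. Whether it actually does so depends on the compiler, flags and inlining; if the per-sample helper gets inlined, a good compiler may vectorize both.

```cpp
#include <complex>
#include <cstddef>

// Per-sample style: one function call per sample (hypothetical helper).
std::complex<float> scale_one(std::complex<float> x, float k) {
    return x * k;  // complex times real scalar: scales both parts
}

void scale_per_sample(const std::complex<float>* in, std::complex<float>* out,
                      std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = scale_one(in[i], k);  // call overhead may block vectorization
}

// Buffer style: one call for the whole buffer, viewed as 2*n floats.
void scale_buffer(const std::complex<float>* in, std::complex<float>* out,
                  std::size_t n, float k) {
    const float* fin = reinterpret_cast<const float*>(in);
    float* fout = reinterpret_cast<float*>(out);
    for (std::size_t i = 0; i < 2 * n; ++i)
        fout[i] = fin[i] * k;  // simple loop over contiguous floats
}
```

Both functions compute the same result; only the shape of the loop differs, which is what the gr_throughput-style measurement would expose.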
> > I have been replacing almost every low-level block with a functionally
> > equivalent one using the Intel performance libraries (IPP). These libraries
> > are not GPL, but are free for noncommercial use under Linux ($200
> > otherwise). At some point, I would like to contribute our work back to
> > gnuradio. Would this fit with the gr philosophy? How should we structure
> > the code (i.e. have a separate set of files, use #defines, or ...)?
> >
> > Eugene
>
> We would not accept the changes. Part of what we're up to is building an
> ever-expanding universe of free code. Instead of using the non-free IPP
> code, please consider using a free library such as ATLAS, or help us find
> and fix performance challenges in a way that doesn't require non-free code.
>
> Also, are you sure that your performance issues can't be better addressed
> with an algorithmic change? If you're using a lot of very low-level blocks
> (e.g., add, multiply, etc.) you're probably better off writing a block that
> aggregates some of the operations into a single block.

That's what I expected. We'll try to contribute the more DSP-centric blocks, such as demodulators.

> Eric
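Eric's suggestion to aggregate low-level operations into one block can be sketched as follows, assuming a multiply stage followed by an add stage on float samples. All function names here are hypothetical stand-ins, not GNU Radio blocks. The chained version makes two passes over memory and writes an intermediate buffer; the fused version computes in[i] * k + c in a single pass, which cuts memory traffic and per-stage overhead.

```cpp
#include <cstddef>
#include <vector>

// Two separate low-level stages, as with a multiply block feeding an add block.
void multiply_stage(const float* in, float* out, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * k;
}

void add_stage(const float* in, float* out, std::size_t n, float c) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] + c;
}

// Chained: two passes plus an intermediate buffer.
std::vector<float> chained(const std::vector<float>& x, float k, float c) {
    std::vector<float> tmp(x.size()), y(x.size());
    multiply_stage(x.data(), tmp.data(), x.size(), k);
    add_stage(tmp.data(), y.data(), x.size(), c);
    return y;
}

// Fused: one pass, no intermediate buffer.
void multiply_add_fused(const float* in, float* out, std::size_t n,
                        float k, float c) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * k + c;
}
```

In an actual flow graph the fused loop would sit inside a single block's work() function, replacing two connected blocks.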
_______________________________________________ Discuss-gnuradio mailing list Discuss-gnuradio@gnu.org http://lists.gnu.org/mailman/listinfo/discuss-gnuradio