I haven't used VOLK with the OMAP processor, but from my experience with the E100, every multiplication and/or division in your flowgraph counts. When I was working on my C64x+ DSP-based FM receiver on the E100, I moved individual blocks one by one from the GPP to the DSP, and almost every multiplication/division on the GPP caused a buffer overflow. My impression, at least, is that if you're going for a pure GPP implementation you need to make use of NEON vector operations, and if you're using a DSP-based solution you'll need to find a way to speed up the GPP/DSP buffers, which is something I'm hoping to have more time to look into.
Almohanad Fayez
alfa...@aol.com

-----Original Message-----
From: Evan Merewether <e...@syndetix.com>
To: discuss-gnuradio <discuss-gnuradio@gnu.org>
Sent: Tue, Jan 24, 2012 1:22 pm
Subject: Re: [Discuss-gnuradio] Try to improve E100's performance at high sample rate

Has anybody looked at using the CORDIC approximation for atan2? Depending on the required accuracy, this may dramatically improve performance in your C code. Ultimately, you can implement the CORDIC functions in the FPGA (quasi math-coprocessor style), which would then give you the fastest possible computation speed.

Evan

-----Original Message-----
From: discuss-gnuradio-bounces+evan=syndetix....@gnu.org [mailto:discuss-gnuradio-bounces+evan=syndetix....@gnu.org] On Behalf Of ziyang
Sent: Tuesday, January 24, 2012 10:56 AM
To: Nick Foster
Cc: discuss-gnuradio@gnu.org
Subject: Re: [Discuss-gnuradio] Try to improve E100's performance at high sample rate

On 01/19/2012 07:13 PM, Nick Foster wrote:
> Optimizing an algorithm is a hard and sometimes counterintuitive
> process. You might benchmark the following:
>
> - Gnuradio's atan2 WITHOUT any Volk multiplications (just comment out
> the volk mults in your block)
> - The Volk multiplications WITHOUT Gnuradio's atan2 (just comment out
> the atan2 in your block)
>
> This will let you determine where the bottleneck is. In addition, try
> running over a MUCH larger dataset. The clock resolution at <1ms is
> not very good and the scheduler will have a correspondingly larger
> effect at smaller timescales.
>
> I think you'll find the atan2 part takes vastly longer than the
> multiplications do, and that will be where you have to look for
> performance improvements.
>
> --n

Hi Nick,

I have been doing some tests on the demodulation module. As you said, the atan2 part takes much longer than the multiplication.
So, to maximize the performance improvement that VOLK could bring to the processing, I took a division and a multiplication out of atan2 and used a VOLK-ified divider and multiplier instead. Then I ran tests using a much larger dataset. However, the test results showed no performance improvement; in fact, the average processing time even increased a little. So I am wondering whether what I did is simply not a good way to improve performance.

Another issue is that when I ran CMake to build GNU Radio on the E100, it reported this:

-- Available arches: generic;neon
-- Available machines: generic;neon
-- Did not find liborc and orcc, disabling orc support...

But "opkg list-installed | grep orc" shows that both orc and liborc are installed. Could this lack of Orc support be part of the reason why my implementation did not show a performance improvement? I would appreciate it if you could give me a hand with this.

Thanks.

Best Regards,
Terry

_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio