Forgot to include the link to my benchmarking tool:
https://github.com/marcusmueller/table_vs_volk
Had to look intensely for your mail:
Trek, please don't "hijack" other threads by replying to them with a
completely unrelated topic. If starting a new topic, simply send an
email to the mailing list, without using the "reply" functionality, or
else, most people won't even see it, because it's buried in a discussion
thread irrelevant to them.

Best regards,
Marcus
On 07.04.2016 11:40, Marcus Müller wrote:
> Hi Trek,
>
> as Martin noted, yes, if you search the GNU Radio source tree for that
> file name, you'll find it. And also, yes, GNU Radio is Free Software,
> and one of the main credos of that is that you should be able to use
> everything from it for your own purposes (as long as you adhere to the
> freeness that the part you're using demands; for GNU Radio, that's
> GPLv3). However, to be honest, a linear approximation-based 8kB sine
> table might or might not be the right tool for your problem – usually,
> one would just think about what one needs and generate the sine table
> oneself, matching exactly the requirements at hand.
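
A minimal sketch of generating such a table oneself, in Python purely to show the arithmetic, not the speed. The 1024-segment size, the 32-bit phase format and every name below are assumptions for illustration; this is not the GNU Radio code.

```python
import math

TABLE_BITS = 10                      # assumption: 1024 linear segments per period
TABLE_SIZE = 1 << TABLE_BITS
PHASE_BITS = 32                      # assumption: phase is an unsigned 32-bit fraction of a full turn
FRAC_BITS = PHASE_BITS - TABLE_BITS  # low bits select the position inside a segment
PHASE_MASK = (1 << PHASE_BITS) - 1


def build_table():
    """Return one (slope, offset) pair per segment, so sin ~= slope * frac + offset."""
    table = []
    for i in range(TABLE_SIZE):
        y0 = math.sin(2.0 * math.pi * i / TABLE_SIZE)
        y1 = math.sin(2.0 * math.pi * (i + 1) / TABLE_SIZE)
        table.append((y1 - y0, y0))
    return table


SINE_TABLE = build_table()


def radians_to_phase(x):
    """Scale an angle in radians to the fixed-point phase format (wraps modulo 2*pi)."""
    return int(round(x / (2.0 * math.pi) * (1 << PHASE_BITS))) & PHASE_MASK


def table_sin(phase):
    """Look up the segment from the top bits and interpolate linearly with the low bits."""
    index = phase >> FRAC_BITS
    frac = (phase & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)
    slope, offset = SINE_TABLE[index]
    return slope * frac + offset
```

Stored as float32, 1024 (slope, offset) pairs would occupy 1024 * 2 * 4 bytes = 8 kB. With this segment count the worst-case interpolation error is about (2*pi/1024)^2 / 8, i.e. roughly 4.7e-6; changing TABLE_BITS trades memory against accuracy.
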
>
> Us being DSP nerds, I guess some of us are curious: what is your fixed
> point application? Are you planning to use this on some
> microcontroller, or some programmable logic device, or do you need a
> sin where you transform fixed point values (e.g. from an ADC) to
> floating point values? What is the algorithm you're building with that?
>
> However, are you /sure/ a sine table is the optimum for your specific
> problem?
> I'm not an overly big fan of uniform sine tables (they make a lot of
> sense on e.g. microcontrollers that don't have advanced math
> functions, and if you don't need the accuracy), but if you look at
> VOLK, you'll find things that are comparably fast, or in my case, even
> faster. Using a benchmarking stub I've got lying around (no compiler
> optimization flags specified, so gcc does not optimize):
> Doing 100000000 operations.
> fixed point
>  0.781710s wall, 0.780000s user + 0.000000s system = 0.780000s CPU (99.8%)
> standard libc float32 sin
>  2.700463s wall, 2.700000s user + 0.000000s system = 2.700000s CPU (100.0%)
> VOLK float32 sin
> Using Volk machine: avx2_64_mmx_orc
>  0.331708s wall, 0.330000s user + 0.000000s system = 0.330000s CPU (99.5%)
> dummy memory bandwidth test: copy out- to input
>  0.404707s wall, 0.400000s user + 0.000000s system = 0.400000s CPU (98.8%)
> dummy memory bandwidth test: copy in- to output
>  0.406990s wall, 0.410000s user + 0.000000s system = 0.410000s CPU (100.7%)
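
The wall-clock figures above can be turned into per-operation cost and relative speed. The following is only a sketch of that arithmetic in Python; the numbers are copied from the output above, and the dictionary keys and function names are made up for illustration.

```python
N_OPS = 100_000_000  # "Doing 100000000 operations."

# Wall-clock seconds as reported in the benchmark output above.
WALL_SECONDS = {
    "fixed point": 0.781710,
    "libc float32 sin": 2.700463,
    "VOLK float32 sin": 0.331708,
    "copy out- to input": 0.404707,
    "copy in- to output": 0.406990,
}


def ns_per_op(name):
    """Average time per operation in nanoseconds."""
    return WALL_SECONDS[name] / N_OPS * 1e9


def speedup(baseline, other):
    """How many times faster 'other' ran compared to 'baseline'."""
    return WALL_SECONDS[baseline] / WALL_SECONDS[other]


if __name__ == "__main__":
    for name in WALL_SECONDS:
        print(f"{name:20s} {ns_per_op(name):6.2f} ns/op")
    print("table vs libc sin:", round(speedup("libc float32 sin", "fixed point"), 2))
    print("VOLK vs table:    ", round(speedup("fixed point", "VOLK float32 sin"), 2))
```

On these figures the table comes out roughly 3.5 times faster than libc sin, and VOLK roughly 2.4 times faster than the table.
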
>
> Volk of course only makes sense if you can arrange your algorithms so
> that you get a lot of sin input values continuously in memory.
>
> Four observations:
>
>  1. This sine-table implementation is but three times faster than the
>     standard libc sin, not even counting the fact that you'd have to
>     first come up with the proper input scaling. Unless your program
>     is really dominated by sin() performance, this might not be even
>     worth considering. A general hint: run "perf record -a
>     yourprogram"; "perf report" to find out where your PC spent its
>     time. Well, at least without compiler optimizations.
>  2. The VOLK routine is twice as fast as the fixed point
>     implementation and, being a six-summand Taylor series
>     approximation, probably more accurate.
>  3. Enabling compiler optimizations (CFLAGS=-Ofast make) will probably
>     double the speed of sin (my experience), and severely cut the
>     time that the fixed point implementation takes, probably slightly
>     below the time of Volk (which will not change measurably). That's
>     because the compiler will inline everything in the fixed point
>     routine. Whether that slight advantage then will be worth the
>     accuracy loss is up to you.
>  4. VOLK's sin is faster than float-wise copy (here, without compiler
>     optimizations); what seems paradoxical shows that making extensive use
>     of memory alignment and SIMD brings you much closer to the memory
>     bandwidth barrier. Knowing my machine, I now have a guess for the
>     performance of the fixed point sin table approach under heavy
>     compiler optimization: it will take around ¼ of the time one of
>     the dummy copies takes; that's how fast you get with 4-float32
>     SIMD here, assuming this is really only bandwidth-limited. Trying
>     this verifies my suspicion!
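
To give a feeling for the accuracy of a six-summand Taylor series (observation 2), here is a scalar sketch in Python. It is not VOLK's code: the range reduction to [-pi/2, pi/2] and the function name are assumptions, and a SIMD implementation would be structured quite differently.

```python
import math


def taylor_sin(x):
    """Approximate sin(x) with the first six Taylor terms (up to x**11 / 11!)."""
    # Reduce to [-pi, pi]; math.remainder needs Python 3.7 or newer.
    x = math.remainder(x, 2.0 * math.pi)
    # Fold into [-pi/2, pi/2], where the truncated series converges quickly:
    # sin(pi - x) == sin(x) and sin(-pi - x) == sin(x).
    if x > math.pi / 2:
        x = math.pi - x
    elif x < -math.pi / 2:
        x = -math.pi - x
    x2 = x * x
    # Horner form of x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11!
    return x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42 * (1 - x2 / 72 * (1 - x2 / 110)))))
```

After this range reduction, the first omitted term, (pi/2)^13 / 13!, bounds the truncation error at roughly 6e-8, which is already close to float32 resolution for values near 1.
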
>
> As you can see, the question of which approach is fast really depends on
> what your compiler does, what SIMD instructions you can make use of
> (VOLK's sin only has optimizations for SSE4.1, I think) and how your
> data lies in memory.
>
> Best regards,
> Marcus
>
> On 07.04.2016 05:26, Trek Liu wrote:
>> What is the purpose of this file? There is zero documentation in this
>> file; is it ever used?
>> I am looking for a sin/cos table for speed optimization, is there one
>> inside gnuradio?
>>
>> Thanks. 
>>
>>
>> _______________________________________________
>> Discuss-gnuradio mailing list
>> Discuss-gnuradio@gnu.org
>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>
