To answer your question however, he's misguided. Fftw and volk both have
methods (volk_profile, fftw-wisdom) to profile and determine the best
instructions to use for cases where they have multiple options, it's not
going to get noticeably slower by compiling the extra stuff in.
On Mar 9, 2016 8:47 AM, "devin kelly" <dwwke...@gmail.com> wrote:

> Thanks for the help, I don't think I could have figured this out on my own.
>
> This is because I'm on RHEL7 (argh!).  My libfftw.so doesn't contain any
> references to AVX. For me there are a couple of options for fixing this:
>
> 1) Use Nathan's branch.
> 2) Rebuild fftw with AVX support
> 3) Rebuild GR and Volk without AVX.
>
> I tried 2) first and noticed this in the spec file that was in the source
> RPM I was trying to rebuild:
>
> %ifarch %{ix86} x86_64
> # Enable SSE2 support for x86 and x86_64
> # (no avx as it is claimed to drastically slower)
> for((i=0;i<2;i++)); do
>  prec_flags[i]+=" --enable-sse2"
> done
> %endif
>
> Is the spec file author right?  Now I'm a little confused about the
> approach I should take.  I'll probably just go with 1) in the mean time.
>
> Thanks again Nathan,
> Devin
>
> On Wed, Mar 9, 2016 at 1:06 AM, West, Nathan <n...@ostatemail.okstate.edu>
> wrote:
>
>> The a and c vectors come from gr:fft objects' internal buffers. These are
>> internally created with fftwf_malloc (lines 152/156 of gr-fft/lib/fft.cc).
>> fftwf_malloc is obviously not generating buffers with proper alignment so
>> you're seeing a 50% (per buffer) that this segfaults. I'll note that this
>> is also only an issue with fftwf buffers when fftwf isn't built with AVX
>> support (and therefore nothing in fftwf requires  a 32-byte aligned buffer).
>>
>> Andy Walls (thanks!) pointed out on IRC that we had a similar issue years
>> ago with a QT sink.
>>
>> I have a branch that should fix this (
>> https://github.com/n-west/gnuradio/tree/fft-avx-alignment). I also
>> suggest you look in to getting a version of fftwf built with AVX. I don't
>> know if there's a good way to tell, but if I run readelf -a on my
>> libfftw3.so I see some functions with avx in the name.
>>
>> Cheers,
>> nw
>>
>>
>> On Tue, Mar 8, 2016 at 1:31 PM, devin kelly <dwwke...@gmail.com> wrote:
>>
>>> OK, here's my C program:
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <volk/volk.h>
>>> #include <stdint.h>
>>>
>>> int main() {
>>>
>>>     size_t alignment = volk_get_alignment();
>>>
>>>     uint8_t* ptr;
>>>
>>>     ptr = (uint8_t*)volk_malloc(1000 * sizeof(uint8_t), alignment);
>>>     printf("alignment = %lu, ptr = %x, *ptr = %u\n", alignment, ptr,
>>> *ptr);
>>>     volk_free((void*)ptr);
>>>     ptr = NULL;
>>>
>>>
>>>     return 0;
>>> }
>>>
>>>
>>> Compile:
>>>
>>> $ gcc volk_test.c -o volk_test -lvolk -L/local_disk/gr_3.7.9_debug/lib
>>>
>>> It's output:
>>>
>>> $ ./volk_test
>>> Using Volk machine: avx2_64_mmx_orc
>>> alignment = 32, ptr = 151b040, *ptr = 00
>>>
>>> Also, I've attached the output from the preprocessor, this command:
>>>
>>> $ /usr/bin/cc  -DHAVE_AVX_CVTPI32_PS -DHAVE_CPUID_H -DHAVE_DLFCN_H
>>> -DHAVE_FENV_H -DHAVE_POSIX_MEMALIGN -DHAVE_XGETBV -Wall -fvisibility=hidden
>>> -g -I/local_disk/gr_3.7.9_src/volk/build_debug/include
>>> -I/local_disk/gr_3.7.9_src/volk/include
>>> -I/local_disk/gr_3.7.9_src/volk/kernels
>>> -I/local_disk/gr_3.7.9_src/volk/build_debug/lib
>>> -I/local_disk/gr_3.7.9_src/volk/lib -I/usr/include/orc-0.4  -E  -fPIC -o
>>> volk_malloc_preprocessed   -c
>>> /local_disk/gr_3.7.9_src/volk/lib/volk_malloc.c
>>>
>>> I just found the compiler step from from doing 'VERBOSE=1 make' then
>>> changed the output and added -E.  I attached volk_malloc_preprocessed as
>>> well.
>>>
>>> It looks like this is my volk_malloc():
>>>
>>>
>>> void *volk_malloc(size_t size, size_t alignment)
>>> {
>>>   void *ptr;
>>>
>>>
>>>
>>>
>>>   if (alignment == 1)
>>>     return malloc(size);
>>>
>>>   int err = posix_memalign(&ptr, alignment, size);
>>>   if(err == 0) {
>>>     return ptr;
>>>   }
>>>   else {
>>>     fprintf(stderr,
>>>             "VOLK: Error allocating memory "
>>>             "(posix_memalign: error %d: %s)\n", err, strerror(err));
>>>     return ((void *)0);
>>>   }
>>> }
>>>
>>>
>>>
>>> Devin
>>>
>>>
>>>
>>> On Tue, Mar 8, 2016 at 11:37 AM, West, Nathan <
>>> n...@ostatemail.okstate.edu> wrote:
>>>
>>>>
>>>> On Tue, Mar 8, 2016 at 10:58 AM, devin kelly <dwwke...@gmail.com>
>>>> wrote:
>>>>
>>>>> Calling 'info variables' (or args or locals) the last few frames
>>>>> didn't give me any real info so I built a copy of GR/Volk with debug
>>>>> symbols.  I ran the FG again, this time from GDB, here's my back trace.  
>>>>> In
>>>>> this backtrace you can see the arguments passed in each call.  I have an
>>>>> i7-5600U CPU @ 2.60GHz, the volk_profile is appended at the bottom.
>>>>>
>>>>
>>>> Excellent. Thanks for going through that extra step. It really helps.
>>>>
>>>>
>>>>>
>>>>> Here's are the links for the relevant code:
>>>>>
>>>>>
>>>>> https://github.com/gnuradio/volk/blob/f0b722392950bf7ede7b32f5ff60019bce7a8592/kernels/volk/volk_32fc_x2_multiply_32fc.h#L232
>>>>>
>>>>> https://github.com/gnuradio/gnuradio/blob/master/gr-filter/lib/fft_filter.cc#L323
>>>>>
>>>>> https://github.com/gnuradio/gnuradio/blob/222e0003f9797a1b92d64855bd2b93f0d9099f93/gr-digital/lib/corr_est_cc_impl.cc#L214
>>>>>
>>>>> Could the problem be that nitems is 257 and num_points is 512?  Or
>>>>> should nitems really be 256 and not 257?
>>>>>
>>>>
>>>> I don't think so. I'm not familiar with the details of the fft_filter
>>>> implementations, but usually these things will take in some history if they
>>>> don't have enough points to operate on (in this case 512).
>>>>
>>>> The much more worrying thing is your vector addresses.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Devin
>>>>>
>>>>> (gdb) bt
>>>>> #0  0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
>>>>> (__P=0x3b051b0)
>>>>>     at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
>>>>> #1  0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>
>>>>
>>>> 0x3b1f770 % 32 = 16 (bad)
>>>> 0x3b051b0 % 32 = 16 (bad)
>>>> 0x3b240e0 % 32 = 0 (good)
>>>>
>>>> Unfortunately it looks like volk_get_alignment is returning the wrong
>>>> thing or there's a bug in volk_malloc. Can you tell us what
>>>> volk_get_alignment returns? The easiest thing is probably to write a simple
>>>> C program that prints out the result (hmm, I should add that to
>>>> volk-config-info). I'd also like to know which volk_malloc implementation
>>>> you're using. Unfortunately I don't think we have an easy way to discover
>>>> that (hmm, something else that should be added to volk-config-info). I
>>>> think the best way might be to look at volk_malloc.c intermediate files
>>>> after the preprocessor has done its work.
>>>>
>>>> If you want to move on while we figure this out then you can edit
>>>> ~/.volk/volk_config and replace the avx2_fma with sse3 on the line that has
>>>> this kernel name on it.
>>>>
>>>>
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
>>>>> #2  0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>     at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
>>>>> #3  0x00007fffd3f8e360 in
>>>>> gr::filter::kernel::fft_filter_ccc::filter(int, std::complex<float> 
>>>>> const*,
>>>>> std::complex<float>*) (this=0x3b02f40, nitems=nitems@entry=257,
>>>>> input=input@entry=0x7fffc9cc7000, output=output@entry=0x3b36460)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
>>>>> #4  0x00007fffd42910df in gr::digital::corr_est_cc_impl::work(int,
>>>>> std::vector<void const*, std::allocator<void const*> >&, 
>>>>> std::vector<void*,
>>>>> std::allocator<void*> >&) (this=0x3b01560, noutput_items=257,
>>>>> input_items=..., output_items=std::vector of length 1, capacity 1 = {...})
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gr-digital/lib/corr_est_cc_impl.cc:237
>>>>> #5  0x00007fffdd064907 in gr::sync_block::general_work(int,
>>>>> std::vector<int, std::allocator<int> >&, std::vector<void const*,
>>>>> std::allocator<void const*> >&, std::vector<void*, std::allocator<void*>
>>>>> >&) (this=0x3b015b8, noutput_items=<optimized out>, ninput_items=...,
>>>>> input_items=..., output_items=...) at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/sync_block.cc:66
>>>>> #6  0x00007fffdd02f70f in gr::block_executor::run_one_iteration()
>>>>> (this=this@entry=0x7fff83ffedb0)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/block_executor.cc:438
>>>>> #7  0x00007fffdd06da8a in
>>>>> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, int)
>>>>> (this=0x7fff83ffedb0, block=..., max_noutput_items=<optimized out>) at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/tpb_thread_body.cc:122
>>>>> #8  0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/scheduler_tpb.cc:44
>>>>> #9  0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
>>>>> #10 0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&)
>>>>> (function_obj_ptr=...) at
>>>>> /usr/include/boost/function/function_template.hpp:153
>>>>> #11 0x00007fffdd016cd0 in
>>>>> boost::detail::thread_data<boost::function0<void> >::run() 
>>>>> (this=<optimized
>>>>> out>)
>>>>>     at /usr/include/boost/function/function_template.hpp:767
>>>>> #12 0x00007fffdd016cd0 in
>>>>> boost::detail::thread_data<boost::function0<void> >::run() 
>>>>> (this=<optimized
>>>>> out>)
>>>>>     at /usr/include/boost/thread/detail/thread.hpp:117
>>>>> #13 0x00007fffdbe4f24a in thread_proxy () at
>>>>> /lib64/libboost_thread-mt.so.1.53.0
>>>>> #14 0x00007ffff7800dc5 in start_thread () at /lib64/libpthread.so.0
>>>>> #15 0x00007ffff6e2528d in clone () at /lib64/libc.so.6
>>>>>
>>>>> Here are the locals on the last few frames:
>>>>>
>>>>> (gdb) f 0
>>>>> #0  0x00007fffdcaccb57 in _mm256_load_ps (__P=0x3b051b0) at
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
>>>>> 835       return *(__m256 *)__P;
>>>>> (gdb) info locals
>>>>> No locals.
>>>>> (gdb) f 1
>>>>> #1  volk_32fc_x2_multiply_32fc_a_avx2_fma (cVector=0x3b1f770,
>>>>> aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
>>>>> 242         const __m256 x = _mm256_load_ps((float*)a); // Load the ar
>>>>> + ai, br + bi as ar,ai,br,bi
>>>>> (gdb) info locals
>>>>> y = {-4.87433296e+17, 4.59163468e-41, -3.92813517e+17, 4.59163468e-41,
>>>>> 5.15677835e-43, 0, 5.26888223e-43, 0}
>>>>> tmp2x = {6.389921e-43, 0, -512.314453, 4.59163468e-41, 1.26116862e-44,
>>>>> 0, -4.87433296e+17, 4.59163468e-41}
>>>>> x = {-512.314453, 4.59163468e-41, 0, 0, 2.76102662, -3.64918089,
>>>>> -4.92134571, -1.06491208}
>>>>> yl = {4.14784345e-43, 0, 1.26116862e-44, 0, -4.87442367e+17,
>>>>> 4.59163468e-41, -4.87439343e+17, 4.59163468e-41}
>>>>> yh = {-1674752, 4.59163468e-41, 0, 0, -1.50397414e-36, 4.59163468e-41,
>>>>> -3.31452625e+17, 4.59163468e-41}
>>>>> tmp2 = {6.72623263e-44, 1.2751816e-43, 2.24207754e-44, 0,
>>>>> 7.17464814e-43, 0, -3.31440427e+17, 4.59163468e-41}
>>>>> z = {0.794147611, 0, 0.263988227, 0, -0.380019426, 0, -0.953325868, 0}
>>>>> number = 0
>>>>> quarterPoints = 128
>>>>> c = 0x3b1f770
>>>>> a = 0x3b051b0
>>>>> b = 0x3b240e0
>>>>> (gdb) f 2
>>>>> #2  0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>     at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
>>>>> 7010        volk_32fc_x2_multiply_32fc_a(cVector, aVector, bVector,
>>>>> num_points);
>>>>> (gdb) info locals
>>>>> No locals.
>>>>> (gdb) f 3
>>>>> #3  0x00007fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter
>>>>> (this=0x3b02f40, nitems=nitems@entry=257,
>>>>>     input=input@entry=0x7fffc9cc7000, output=output@entry=0x3b36460)
>>>>> at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
>>>>> 323               volk_32fc_x2_multiply_32fc_a(c, a, b, d_fftsize);
>>>>> (gdb) info locals
>>>>> a = <optimized out>
>>>>> b = <optimized out>
>>>>> c = <optimized out>
>>>>> i = 0
>>>>> dec_ctr = 0
>>>>> j = <optimized out>
>>>>> ninput_items = 257
>>>>>
>>>>> My volk profile results:
>>>>>
>>>>> $  volk_profile -R 32fc_x2_multiply
>>>>> Using Volk machine: avx2_64_mmx_orc
>>>>> RUN_VOLK_TESTS: volk_32fc_x2_multiply_32fc(131071,1987)
>>>>> u_avx2_fma completed in 220ms
>>>>> u_avx completed in 220ms
>>>>> u_sse3 completed in 240ms
>>>>> generic completed in 2810ms
>>>>> a_avx2_fma completed in 200ms
>>>>> a_avx completed in 220ms
>>>>> a_sse3 completed in 230ms
>>>>> a_generic completed in 2810ms
>>>>> u_orc completed in 280ms
>>>>> Best aligned arch: a_avx2_fma
>>>>> Best unaligned arch: u_avx2_fma
>>>>> RUN_VOLK_TESTS: volk_32fc_x2_multiply_conjugate_32fc(131071,1987)
>>>>> u_avx completed in 230ms
>>>>> u_sse3 completed in 230ms
>>>>> generic completed in 2790ms
>>>>> a_avx completed in 220ms
>>>>> a_sse3 completed in 230ms
>>>>> a_generic completed in 2800ms
>>>>> Best aligned arch: a_avx
>>>>> Best unaligned arch: u_avx
>>>>> Writing "/home/devin/.volk/volk_config"...
>>>>>
>>>>>
>>>> Well I'm both jealous and happy that AVX2 is actually an improvement on
>>>> newer processors. Also matches the folklore that these new technologies are
>>>> usually not faster in the first silicon products that they come out in.
>>>>
>>>
>>>
>>> _______________________________________________
>>> Discuss-gnuradio mailing list
>>> Discuss-gnuradio@gnu.org
>>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>>>
>>>
>>
>
> _______________________________________________
> Discuss-gnuradio mailing list
> Discuss-gnuradio@gnu.org
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>
>
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

Reply via email to