Thanks for the hint, Nathan. I added puppets for both kernels. Both need 'num_bytes / 8'; otherwise they either read or write unallocated memory. They now pass their respective tests.
Now I'm working on a polar encoder for VOLK. So far I have a generic version which passes all tests. However, when I run 'ctest -V' in 'build/volk', the output contains the following warning, even for the generic implementation:

'Volk warning: no arch found, returning generic impl'

This is the function signature:

    static inline void
    volk_8u_x3_encodepolar_8u_generic(unsigned char* frame,
                                      unsigned char* temp,
                                      const unsigned char* frozen_bit_mask,
                                      const unsigned char* frozen_bits,
                                      const unsigned char* info_bits,
                                      unsigned int frame_size,
                                      unsigned int info_bit_size)

Also, when I try to add an AVX version, the compiler emits a warning:

'build/volk/lib/volk_machine_avx_64_mmx.c:627:5: warning: initialization from incompatible pointer type [enabled by default]
{volk_8u_x3_encodepolarpuppet_8u_generic, volk_8u_x3_encodepolarpuppet_8u_u_avx},'

The AVX version never gets called because of the first VOLK warning. Something is obviously failing here, but I couldn't figure out what it is. I'm working on an Intel Core i7-3630QM, and VOLK reports that AVX is available. I'm using GCC 4.8.4 on Ubuntu 14.04.2.

I suspect that the function signature is causing problems. A clarification of what happens with signatures and how they are interpreted would be helpful.

Cheers
Johannes

PS: A better subject would be helpful.

On 24.07.2015 18:13, West, Nathan wrote:
> On Fri, Jul 24, 2015 at 11:44 AM, Johannes Demel
> <uf...@student.kit.edu> wrote:
>
>> Hey community,
>>
>> after last week's success with channel construction, this week is
>> calmer. It involves a steep learning curve for SIMD. Still, I was
>> able to create my first VOLK kernels [3]. There are two new
>> kernels for 8-bit packing and unpacking. In case someone wants to
>> pack 8 bytes with the LSB active into one byte, there's a new
>> VOLK kernel to do this for you. At first, I thought this was as
>> simple as doing a load+movemask operation. Unfortunately,
>> endianness stopped me from doing so. Thus it involves shuffling
>> and AND, COMPARE operations too.
>> Without shuffling it should have
>> worked with SSE2, but since shuffling is involved, SSSE3 is
>> required. I'm reading through all the docs and websites that
>> target SIMD and find new ways to do things all the time. So I
>> guess it is a long way to go until I have decent knowledge
>> about SIMD instructions. Still, I achieved a 7x speedup for
>> packing bits compared to the generic implementation. I also
>> created a kernel for unpacking, but I wasn't very successful there:
>> the SSSE3 implementation is slower than the generic one for now.
>> Maybe someone can give me a hint on what is going wrong here. I
>> named these two new kernels 'volk_8u_pack8_8u' and
>> 'volk_8u_unpack8_8u'. I hope the names explain their operation.
>> Suggestions for alternative names are welcome. I tried to
>> integrate my VOLK kernels into VOLK's test framework, but that is
>> quite tough. It seems it doesn't expect any rate-changing
>> kernels.
>>
>> My aim for next week is to come up with a kernel for polar code
>> encoding. This will involve interleaving a lot of bits, which is
>> the actual issue to overcome.
>>
>> More info and current project progress can be found in [1], [2]
>> and [3].
>>
>> Cheers
>> Johannes
>>
>> [1] https://github.com/jdemel/gnuradio
>> [2] https://github.com/jdemel/socis-proposal
>> [3] https://github.com/jdemel/volk
>>
> Hi Johannes,
>
> This is pretty neat. Nice work!
>
> You'll probably need to use a puppet. The VOLK QA creates input and
> output buffers that are itemsize * num_points for every input and
> output. I think this is fine for the packer, but as you've
> discovered, it will not work for the unpacker. A puppet lets you wrap
> your actual kernel in a way that works nicely with the VOLK QA.
> In this case I suspect you want something like the following:
>
> static inline void
> volk_8u_unpack8puppet_8u_generic(unsigned char* out, unsigned char* in,
>                                  unsigned int num_points)
> {
>     volk_8u_unpack8_8u_generic(out, in, num_points / 8);
> }
>
> You'll get 8x as much buffer space and a bunch of inputs that
> you'll never operate on, which is OK. This obviously isn't critical
> for your GSoC project, but we'll want to do this at some point
> since this looks really useful.
>
> Nathan
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio