On Sun, Jun 14, 2020 at 07:06, Stephen McDowell <svenevs....@gmail.com> wrote:
> Hi Alexei,
>
> It's only __builtin_shuffle that's a problem.  I'm a simd novice at best
> hehe.  I played around for a good long while trying to find an equivalent
> shuffle intrinsic, for now I was just working off of the GCC examples for
> __builtin_shuffle: https://godbolt.org/z/gPiZQL
>
> It's technically successful, but with a big caveat that in order for me to
> try and translate this to the freetype code I need help understanding how
> the mask={0,1,1,3} gets transformed into 212 in emitted
> `pshufd xmm0, xmm0, 212` from the gcc __builtin_shuffle call.  Look for
> `#define MAGIC` in the example, anything stick out as to how that value is
> created?  If we know how that is done, I can begin looking into shorts
> (v82 type used in freetype code) rather than int in the example code.
>

212 decimal is 0xD4 in hex, which is 11010100 in binary, or 11_01_01_00
when split into 2-bit fields. Read from the least significant field
upwards, those fields are 0, 1, 1, 3, which is the {0, 1, 1, 3} mask in
little-endian order.

For more details, see
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#cats=Swizzle&text=shuffle_epi32&expand=5144
which explains how the second argument to _mm_shuffle_epi32 is interpreted
by the CPU.

> I'm game to push a little further on it, but to be honest adding in
> conditional trickery for intel will make this code more confusing.  It's
> going to have to convert between v82 and one of the _mXXXi vector types and
> shuffle splitting (can't call _mm_shuffle* with v82 type).  In other words,
> while intel users may not get the fastest possible code, previously none of
> this code was vectorized anyway so it's kind of a wash.  That said, I
> totally understand the desire to vectorize it if we can :)
>

I think it makes sense to first disable the vectorized code path so that
the source builds properly with the Intel compiler. A second patch could
then try to optimize the code using Intel intrinsics on x86 and x86_64;
this would probably be portable to more compilers. I'm not sure it is
worth it, though.
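
To make the encoding above concrete, here is a minimal sketch in plain C.
The helper names shuffle_imm8 and shuffle4_ref are made up for this mail
(they are not FreeType, GCC, or Intel APIs), and the scalar loop is only a
reference model of what the shuffle does to four 32-bit lanes, not the
vectorized code itself.

```c
#include <stdint.h>

/* Hypothetical helper: pack four lane indices (each 0..3) into the
 * 8-bit immediate used by pshufd.  Lane 0 goes into bits 1:0,
 * lane 1 into bits 3:2, lane 2 into bits 5:4, lane 3 into bits 7:6. */
static unsigned shuffle_imm8(unsigned m0, unsigned m1,
                             unsigned m2, unsigned m3)
{
    return (m0 & 3u) | ((m1 & 3u) << 2) | ((m2 & 3u) << 4) | ((m3 & 3u) << 6);
}

/* Hypothetical scalar reference: dst[i] = src[field i of imm8].
 * src and dst must not alias. */
static void shuffle4_ref(const int32_t src[4], unsigned imm8, int32_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm8 >> (2 * i)) & 3u];
}
```

With the mask {0, 1, 1, 3}, shuffle_imm8(0, 1, 1, 3) evaluates to
0 + 4 + 16 + 192 = 212. As far as I know, the _MM_SHUFFLE macro from
<xmmintrin.h> builds the same value, but takes its arguments in the
opposite order: _MM_SHUFFLE(3, 1, 1, 0).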
>
> Let me know your thoughts!
>
> -Stephen
>
> On Sat, Jun 13, 2020 at 2:36 PM Alexei Podtelezhnikov <apodt...@gmail.com>
> wrote:
>
>> On Fri, Jun 12, 2020 at 8:07 AM Stephen McDowell <svenevs....@gmail.com>
>> wrote:
>> > I help maintain the spack package manager when I can, currently users
>> > with intel compilers cannot build / install any version after 2.7.1 due to
>> > the usage of __builtin_shuffle (for some reason Intel still doesn't support
>> > this).
>>
>> Is there by any chance an equivalent intrinsic?
>>
>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#cats=Bit%20Manipulation
>> What about __builtin_clz that FreeType also uses?
>>