On Sun, Jun 14, 2020 at 07:06, Stephen McDowell <svenevs....@gmail.com>
wrote:

> Hi Alexei,
>
> It's only __builtin_shuffle that's a problem.  I'm a SIMD novice at best,
> hehe.  I played around for a good long while trying to find an equivalent
> shuffle intrinsic; for now I've just been working off of the GCC examples
> for __builtin_shuffle: https://godbolt.org/z/gPiZQL
>
> It's technically successful, but with a big caveat: in order to translate
> this to the FreeType code, I need help understanding how the mask
> {0,1,1,3} gets transformed into the 212 in the emitted
> `pshufd xmm0, xmm0, 212` from the GCC __builtin_shuffle call.  Look for
> `#define MAGIC` in the example; does anything stick out as to how that
> value is created?  If we know how that is done, I can begin looking into
> shorts (the v82 type used in the FreeType code) rather than the ints in
> the example code.
>

212 decimal is 0xD4 hex, which is 11010100 in binary, or 11_01_01_00 when
separated into 2-bit values. Read from the least significant pair upward
(00, 01, 01, 11), these are exactly the {0, 1, 1, 3} mask entries.
For more details, see
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#cats=Swizzle&text=shuffle_epi32&expand=5144
which explains how the CPU interprets the second argument to
_mm_shuffle_epi32.
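To make the encoding concrete, here is a small C sketch that packs a 4-lane mask into the pshufd immediate and then applies that immediate in scalar code. The helper names shuffle_imm8 and shuffle_by_imm8 are hypothetical, not FreeType or Intel APIs. The `_MM_SHUFFLE(z, y, x, w)` macro from xmmintrin.h builds the same value with the fields listed from high to low, so `_MM_SHUFFLE(3, 1, 1, 0)` also yields 212.

```c
/* Pack a 4-lane selector mask into the 8-bit immediate used by
   pshufd / _mm_shuffle_epi32.  Destination lane i takes its source
   index from bits 2*i+1..2*i, so mask[0] goes in the lowest two bits. */
unsigned shuffle_imm8(const unsigned mask[4])
{
    unsigned imm = 0;

    for (int i = 0; i < 4; i++)
        imm |= (mask[i] & 3u) << (2 * i);
    return imm;
}

/* Scalar model of pshufd: decode each 2-bit field of the immediate
   and copy the selected source element into destination lane i. */
void shuffle_by_imm8(const int src[4], unsigned imm, int dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3u];
}
```

With mask {0, 1, 1, 3}, shuffle_imm8 returns 212 (0xD4), and shuffle_by_imm8 with that immediate turns {10, 20, 30, 40} into {10, 20, 20, 40}, the same result __builtin_shuffle gives for that mask.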


> I'm game to push a little further on it, but to be honest, adding
> conditional trickery for Intel will make this code more confusing.  It's
> going to have to convert between v82 and one of the _mXXXi vector types and
> split the shuffles (_mm_shuffle* can't be called with the v82 type).  In
> other words, while Intel users may not get the fastest possible code,
> previously none of this code was vectorized anyway, so it's kind of a wash.
> That said, I totally understand the desire to vectorize it if we can :)
>

I think it makes sense to first disable the vectorized code path so that the
source builds properly with the Intel compiler.
A second patch could try to optimize the code using Intel intrinsics on x86
and x86_64; this would probably be portable to more compilers. Not sure
this is worth it, though.
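Both steps can be sketched in one function. This is purely illustrative: dup_odd_lanes and the mask {1,1,3,3,5,5,7,7} are hypothetical, not taken from FreeType, and plain uint16_t lanes stand in for v82. GCC builds keep __builtin_shuffle; the Intel compiler (which also defines __GNUC__, hence the explicit __INTEL_COMPILER check) and Clang fall through to SSE2 intrinsics. There, an 8x16-bit shuffle has to be split into _mm_shufflelo_epi16 for lanes 0-3 and _mm_shufflehi_epi16 for lanes 4-7. Any other target gets a scalar loop.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _MM_SHUFFLE */
#endif

/* Hypothetical example: copy the odd 16-bit lane of each pair into both
   lanes, i.e. __builtin_shuffle(v, (v8u16){1, 1, 3, 3, 5, 5, 7, 7}). */
void dup_odd_lanes(const uint16_t in[8], uint16_t out[8])
{
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
    /* Real GCC: generic vectors with __builtin_shuffle. */
    typedef uint16_t v8u16 __attribute__((vector_size(16)));
    const v8u16 mask = {1, 1, 3, 3, 5, 5, 7, 7};
    v8u16 v, r;

    memcpy(&v, in, sizeof v);
    r = __builtin_shuffle(v, mask);
    memcpy(out, &r, sizeof r);
#elif defined(__SSE2__)
    /* No single 8x16-bit shuffle with an immediate, so split it in two.
       Within each half, {1, 1, 3, 3} is _MM_SHUFFLE(3, 3, 1, 1) = 0xF5. */
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(3, 3, 1, 1));  /* lanes 0-3 */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(3, 3, 1, 1));  /* lanes 4-7 */
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Scalar fallback for targets without either option. */
    for (int i = 0; i < 8; i += 2)
        out[i] = out[i + 1] = in[i + 1];
#endif
}
```

With in = {0, 1, 2, 3, 4, 5, 6, 7}, every path should produce {1, 1, 3, 3, 5, 5, 7, 7}. Dropping the #elif branch gives the "just disable it" first patch.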

>
>
> Let me know your thoughts!
>
> -Stephen
>
>
> On Sat, Jun 13, 2020 at 2:36 PM Alexei Podtelezhnikov <apodt...@gmail.com>
> wrote:
>
>> On Fri, Jun 12, 2020 at 8:07 AM Stephen McDowell <svenevs....@gmail.com>
>> wrote:
>> > I help maintain the Spack package manager when I can.  Currently, users
>> > with Intel compilers cannot build / install any version after 2.7.1 due
>> > to the use of __builtin_shuffle (for some reason Intel still doesn't
>> > support this).
>>
>> Is there by any chance an equivalent intrinsic?
>>
>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#cats=Bit%20Manipulation
>> What about __builtin_clz that FreeType also uses?
>>
>
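On the __builtin_clz question: where a compiler lacks the builtin, a portable fallback is short. This is a minimal sketch assuming 32-bit values; clz32 and clz32_portable are hypothetical names. Unlike __builtin_clz, whose result is undefined for 0, both return 32 for a zero input.

```c
#include <limits.h>
#include <stdint.h>

/* Portable count-leading-zeros by binary search over the high bits. */
unsigned clz32_portable(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;  /* defined here, unlike __builtin_clz(0) */
    if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFFu) { n += 8;  x <<= 8;  }
    if (x <= 0x0FFFFFFFu) { n += 4;  x <<= 4;  }
    if (x <= 0x3FFFFFFFu) { n += 2;  x <<= 2;  }
    if (x <= 0x7FFFFFFFu) { n += 1; }
    return n;
}

/* Use the builtin when unsigned int is 32 bits wide and the compiler
   claims GCC compatibility; otherwise use the portable version. */
unsigned clz32(uint32_t x)
{
#if defined(__GNUC__) && UINT_MAX == 0xFFFFFFFFu
    return x ? (unsigned)__builtin_clz(x) : 32u;
#else
    return clz32_portable(x);
#endif
}
```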
