https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104394

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2022-02-05
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We are missing a lot of vector vs scalar optimizations really.

There are really two optimizations we are missing here.
First is:
#define bit 1u<<3

typedef int32_t v4i32 __attribute__((vector_size(16)));

v4i32 get_cmpmask(v4i32 mask)
{
    v4i32 signmask{(int32_t)bit, (int32_t)bit, (int32_t)bit, (int32_t)bit};
    return ((signmask & mask) == signmask);
}

This is not optimized to -((mask >> log2(bit)) & 1), which is similar to:

        pslld   xmm0, 28
        psrad   xmm0, 31
on x86_64.
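
To see why the shift pair matches the compare, here is a rough per-lane scalar model. The helper names and kBit are made up for illustration and are not part of the report; the signed conversion and the right shift of a negative value rely on GCC's documented behaviour (modular conversion, arithmetic shift), which is only guaranteed by the standard from C++20 on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; kBit mirrors the 3 in "1u<<3" above.
constexpr int kBit = 3;

// Reference form: all-ones when bit kBit of x is set, zero otherwise.
static int32_t lane_compare(int32_t x) {
    const int32_t m = int32_t(1) << kBit;
    return (x & m) == m ? -1 : 0;
}

// One-lane model of "pslld 28; psrad 31".
static int32_t lane_shift_pair(int32_t x) {
    // Logical left shift moves bit kBit into the sign position (bit 31).
    uint32_t up = static_cast<uint32_t>(x) << (31 - kBit);
    // Arithmetic right shift then smears the sign bit across the lane.
    // Assumes modular conversion and arithmetic shift, as GCC provides.
    return static_cast<int32_t>(up) >> 31;
}

// Spot-check that both forms agree on a few values, including the extremes.
static void check_sample_lanes() {
    const int32_t samples[] = {0, 7, 8, 15, -1, INT32_MIN, INT32_MAX};
    for (int32_t x : samples)
        assert(lane_compare(x) == lane_shift_pair(x));
}
```

Calling check_sample_lanes() from main is enough to exercise it; the shift count 28 in the asm is just 31 - 3.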


Here is a full example which shows what needs to be done:

#include <stdint.h>

#define bit 3

typedef int32_t v4i32 __attribute__((vector_size(16)));

v4i32 get_cmpmask(v4i32 mask)
{
    v4i32 signmask{(int32_t)1u<<bit, (int32_t)1u<<bit,
                   (int32_t)1u<<bit, (int32_t)1u<<bit};
    return ((signmask & mask) == signmask);
}

v4i32 get_cmpmask1(v4i32 mask)
{
    mask >>= bit;
    mask &= 1;
    mask = -mask;
    return mask;
}
v4i32 get_cmpmask2(v4i32 mask)
{
    mask <<= 31-bit;
    mask >>= 31;
    return mask;
}

---- CUT ---
Note that clang does not optimize get_cmpmask1 either.
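
A small sketch for checking that the three forms above agree lane by lane, using the same GCC vector extension. The function names, kBit and same_lanes are placeholders chosen here, not anything from GCC; the bodies are copies of get_cmpmask, get_cmpmask1 and get_cmpmask2 with the macro replaced by a constant.

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t v4i32 __attribute__((vector_size(16)));

// Hypothetical constant standing in for the "bit" macro above.
constexpr int kBit = 3;

// Same as get_cmpmask: compare against the single-bit mask.
static v4i32 cmpmask_compare(v4i32 mask) {
    v4i32 signmask = {1 << kBit, 1 << kBit, 1 << kBit, 1 << kBit};
    return (signmask & mask) == signmask;  // lanes are -1 or 0
}

// Same as get_cmpmask1: shift the bit down, isolate it, negate.
static v4i32 cmpmask_shift_negate(v4i32 mask) {
    mask >>= kBit;
    mask &= 1;
    return -mask;
}

// Same as get_cmpmask2: move the bit to the sign position, then smear it.
static v4i32 cmpmask_shift_pair(v4i32 mask) {
    mask <<= 31 - kBit;
    mask >>= 31;
    return mask;
}

// Lane-wise equality; vector subscripting is a GCC/clang extension.
static bool same_lanes(v4i32 a, v4i32 b) {
    for (int i = 0; i < 4; ++i)
        if (a[i] != b[i])
            return false;
    return true;
}

// Check one input whose lanes cover clear, set, all-ones and low-bits-only.
static void check_vector_forms() {
    v4i32 in = {0, 8, -1, 7};
    v4i32 expected = {0, -1, -1, 0};
    assert(same_lanes(cmpmask_compare(in), expected));
    assert(same_lanes(cmpmask_shift_negate(in), expected));
    assert(same_lanes(cmpmask_shift_pair(in), expected));
}
```

This only checks that the results match; whether the compiler emits the same code for all three is what this report is about, and comparing the output of -O2 -S is the way to look at that.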


GCC does get the scalar version correct though (but not at the gimple level):

int get_cmpmasks(int mask)
{
    mask >>= bit;
    mask &= 1;
    mask = -mask;
    return mask;
}
