https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360

            Bug ID: 104360
           Summary: Failure to optimize abs pattern on vector types
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int16_t v8i16 __attribute__((vector_size(16)));

v8i16 abs_i16(v8i16 x)
{
    auto isN = x < v8i16{};

    x ^= isN;
    return x - isN;
}

This can be optimized to use an abs instruction where one is available (such
as `pabsw` on x86-64, or `abs` on aarch64). I think v8i16 could be replaced
with any integer vector type and it would still work.

PS: this doesn't even necessarily require an abs instruction. On standard
x86-64 with -O3, GCC emits this:

abs_i16(short __vector(8)):
  pxor xmm1, xmm1
  pcmpgtw xmm1, xmm0
  pxor xmm0, xmm1
  psubw xmm0, xmm1
  ret

whereas LLVM outputs this:

abs_i16(short __vector(8)):
  pxor xmm1, xmm1
  psubw xmm1, xmm0
  pmaxsw xmm0, xmm1
  ret

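which I'm pretty sure is better: it is one instruction shorter, computing
max(x, 0 - x) instead of the compare/xor/subtract sequence.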
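
For reference, here is a rough source-level sketch of the two forms, assuming
the GCC/Clang vector extensions used in the testcase above. The names
abs_xor_sub, abs_max_neg and same_lanes are made up for illustration and are
not part of GCC or LLVM. The second function only mirrors the shape of the
pxor/psubw/pmaxsw sequence; it is not a claim about how either compiler
represents the pattern internally.

```cpp
#include <stdint.h>

typedef int16_t v8i16 __attribute__((vector_size(16)));

// Form from the report: isN is -1 in negative lanes and 0 elsewhere.
// (Hypothetical name, for illustration only.)
v8i16 abs_xor_sub(v8i16 x)
{
    v8i16 isN = x < v8i16{};
    x ^= isN;        // ~x in negative lanes, x unchanged elsewhere
    return x - isN;  // ~x + 1 == -x in negative lanes, x - 0 elsewhere
}

// max(x, 0 - x) form, written as a mask blend so that it does not rely
// on vector ?: support. (Hypothetical name, for illustration only.)
v8i16 abs_max_neg(v8i16 x)
{
    v8i16 neg = v8i16{} - x;
    v8i16 gt = x > neg;              // -1 where x is the larger value
    return (x & gt) | (neg & ~gt);   // per-lane max(x, neg)
}

// Helper for comparing results lane by lane. (Hypothetical name.)
bool same_lanes(v8i16 a, v8i16 b)
{
    for (int i = 0; i < 8; ++i)
        if (a[i] != b[i])
            return false;
    return true;
}
```

Both functions should agree for every lane value other than INT16_MIN, which I
have left out because negating it overflows a signed 16-bit lane.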
