https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124174

            Bug ID: 124174
           Summary: aarch64: NEON vadd should be well-defined on overflow
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

Consider the following testcase:

#include <arm_neon.h>
uint32x4_t ubtest_neon(int32x4_t v)
{
    return vcltq_s32(vaddq_s32(v, vdupq_n_s32(1)), v);
}

GCC currently miscompiles this to:

ubtest_neon:
        movi    v0.4s, 0
        ret

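It is wrong because a NEON vadd should be well-defined on overflow (i.e. it
should wrap around). The problem is that it is implemented using GCC vector
extensions and open-coded with the + operator directly on the signed vector
types. Signed overflow on those types is undefined, just as for scalar int,
which lets GCC fold (v + 1) < v to false. The definition in arm_neon.h is: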

__extension__ extern __inline int32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vaddq_s32 (int32x4_t __a, int32x4_t __b)
{
  return __a + __b;
}
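To see why folding the result to zero is wrong, here is a scalar model of one lane of ubtest_neon. It assumes the intended semantics are a wrapping 32-bit add followed by a compare that yields all-ones (true) or zero (false). The name ubtest_lane is hypothetical and not part of arm_neon.h. Under that assumption the comparison is true exactly when the lane holds INT32_MAX, so the vector result is not identically zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one lane of ubtest_neon, assuming the
   add wraps modulo 2^32 and the compare yields all-ones or zero. */
static uint32_t ubtest_lane(int32_t v)
{
    /* Do the addition in uint32_t, where overflow is well-defined. */
    uint32_t sum_bits = (uint32_t)v + 1u;
    int32_t sum;
    /* Reinterpret the bits as signed, avoiding an implementation-defined
       out-of-range conversion. */
    memcpy(&sum, &sum_bits, sizeof sum);
    return sum < v ? UINT32_MAX : 0u;
}
```

For example, ubtest_lane(INT32_MAX) wraps to INT32_MIN, which compares less than INT32_MAX, so that lane is all-ones.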

LLVM generates:

ubtest_neon:
        mvni    v1.4s, #128, lsl #24
        cmeq    v0.4s, v0.4s, v1.4s
        ret

for the above testcase, which sets a lane to all-ones exactly when it holds
INT32_MAX (0x7fffffff, the constant materialized by the mvni).  This is a
long-standing issue; I suspect it has been the case since arm_neon.h was
introduced.  I wonder if we couldn't just implement the addition by casting to
unsigned vector types, doing the addition as unsigned, and then casting back to
signed vectors.
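A minimal sketch of the cast-to-unsigned approach, written with generic GCC vector types so it builds on any target rather than only AArch64. The type names v4si/v4su and the functions vaddq_s32_wrapping and ubtest_generic are hypothetical stand-ins for illustration; this is not the actual arm_neon.h change. Casts between vector types of the same size reinterpret the bits, so the only arithmetic is done on unsigned lanes, where overflow wraps.

```c
#include <stdint.h>

/* Generic GCC vector types standing in for int32x4_t / uint32x4_t
   (hypothetical names, not from arm_neon.h). */
typedef int32_t  v4si __attribute__ ((vector_size (16)));
typedef uint32_t v4su __attribute__ ((vector_size (16)));

/* Hypothetical wrapping add: cast to unsigned, add, cast back.
   The casts are bit reinterpretations, so no signed overflow occurs. */
static inline v4si vaddq_s32_wrapping (v4si a, v4si b)
{
  return (v4si) ((v4su) a + (v4su) b);
}

/* The testcase rewritten on top of the wrapping add.  A vector
   comparison yields -1 (all bits set) in true lanes and 0 elsewhere. */
static inline v4su ubtest_generic (v4si v)
{
  const v4si one = { 1, 1, 1, 1 };
  return (v4su) (vaddq_s32_wrapping (v, one) < v);
}
```

With this formulation the compiler can no longer assume v + 1 > v, so the best it can do is the equality test against INT32_MAX that LLVM emits.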

Ideally this would be handled inside the compiler rather than in an
always_inline function in arm_neon.h, but I suppose that refactoring could be a
separate step.
