[Bug target/98119] New: SVE: Wrong code with -O1 -ftree-vectorize -msve-vector-bits=512 -mtune=thunderx

acoplan at gcc dot gnu.org via Gcc-bugs Thu, 03 Dec 2020 04:20:28 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98119


            Bug ID: 98119
           Summary: SVE: Wrong code with -O1 -ftree-vectorize
                    -msve-vector-bits=512 -mtune=thunderx
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

AArch64 GCC miscompiles the following testcase:

_Bool a[34];
int main() {
  for (long b = 0; b < 2; ++b)
    for (long c = 0; c < 17; ++c)
      a[b * 2 + c] = 1;
  for (long c = 0; c < 7; ++c)
    if (!a[2 + c])
      __builtin_abort();
}

at -O1 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=512
-mtune=thunderx.

Removing any one of these flags makes the issue go away. Obviously, this is not
a sensible choice of -mtune given that we're asking for SVE, but it seems that
the choice of tuning should not result in a miscompile.

Looking at a snippet of the broken code:

main:
.LFB0:
        .cfi_startproc
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        and     w3, w2, 63
        and     x0, x2, -64    // align x2 down
        add     w1, w3, 17
        whilelo p0.d, wzr, w1
        whilelo p1.d, wzr, w3
        not     p0.b, p0/z, p1.b
        mov     z0.b, #1
        st1b    z0.d, p0, [x0] // no-op (p0 all 0s)
        mov     w3, 8
        whilelo p0.d, w3, w1
        b.none  .L2
        add     x4, x0, 8
        st1b    z0.d, p0, [x4] // stores out-of-bounds
        add     x0, x0, 16
        mov     w3, 16
        whilelo p0.d, w3, w1
        b.none  .L2
        st1b    z0.d, p0, [x0]

We initially compute the address of our array (a) in x2, and then align this
down to the nearest 64-byte-aligned address, storing the result in x0. We then
add 8 to this, and store a vector to this address. But this address can be
out-of-bounds: if a is only 16-byte aligned, x0 can be 16 bytes below a, so
x0 + 8 lies before the start of the array. So things have already started to
go downhill by this point.
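
To make the arithmetic above concrete, here is a rough C model of the three
stores in the snippet. It is only a sketch: the helper names (whilelo,
bytes_below, count_oob_bytes) are made up for illustration, the array address
is a hypothetical value passed in by the caller, and the predicate handling is
a simplified reading of the instructions, not anything taken from GCC.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 8 };  /* 512-bit vector / 64-bit .d elements */

/* Simplified model of WHILELO Pd.D, Wn, Wm:
   lane i is active iff (lo + i) < hi, compared as unsigned. */
static unsigned whilelo(uint32_t lo, uint32_t hi)
{
    unsigned mask = 0;
    for (unsigned i = 0; i < LANES; ++i)
        if ((uint64_t)lo + i < hi)
            mask |= 1u << i;
    return mask;
}

/* Simplified model of ST1B Zt.D, Pg, [base]: active lane i writes one
   byte at base + i.  Returns how many of those bytes land below a. */
static int bytes_below(uintptr_t base, unsigned mask, uintptr_t a)
{
    int n = 0;
    for (unsigned i = 0; i < LANES; ++i)
        if (((mask >> i) & 1) && base + i < a)
            ++n;
    return n;
}

/* Replays the three stores from the snippet for an array placed at the
   (hypothetical) address a, counting bytes written before the array. */
int count_oob_bytes(uintptr_t a)
{
    uint32_t  w3 = (uint32_t)(a & 63);     /* and w3, w2, 63  */
    uintptr_t x0 = a & ~(uintptr_t)63;     /* and x0, x2, -64 */
    uint32_t  w1 = w3 + 17;                /* add w1, w3, 17  */
    int oob = 0;

    assert(w3 < 64);

    /* whilelo p0, wzr, w1; whilelo p1, wzr, w3; not p0.b, p0/z, p1.b */
    unsigned p0 = whilelo(0, w1) & ~whilelo(0, w3);
    oob += bytes_below(x0, p0, a);         /* st1b z0.d, p0, [x0] */

    p0 = whilelo(8, w1);                   /* whilelo p0.d, w3 (= 8), w1 */
    if (p0 == 0)                           /* b.none .L2 */
        return oob;
    oob += bytes_below(x0 + 8, p0, a);     /* st1b z0.d, p0, [x4] */

    p0 = whilelo(16, w1);                  /* whilelo p0.d, w3 (= 16), w1 */
    if (p0 == 0)                           /* b.none .L2 */
        return oob;
    oob += bytes_below(x0 + 16, p0, a);    /* st1b z0.d, p0, [x0] */

    return oob;
}
```

Under this model, count_oob_bytes(0x1010) (a 16-byte aligned but not 64-byte
aligned) reports 8 bytes written before the array, all from the second store,
while count_oob_bytes(0x1000) (a 64-byte aligned) reports none.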
