[Bug rtl-optimization/117012] New: [15 Regression] incorrect RTL simplification around vector AND and shifts

tnfchris at gcc dot gnu.org via Gcc-bugs Tue, 08 Oct 2024 01:02:05 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117012


            Bug ID: 117012
           Summary: [15 Regression] incorrect RTL simplification around
                    vector AND and shifts
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following example:

#include <arm_neon.h>
#include <stdint.h>

uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x101));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}

compiled at -O3 gives the following:

f:
        ushr    v0.16b, v0.16b, 7
        ret

This is incorrect as it assumes that every lane of the AND mask is 0x1, whereas
in fact (on little-endian) only the two low byte lanes of each 64-bit element
are; the remaining lanes of the mask are zero.
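
To make the mismatch concrete, here is a minimal scalar sketch of the two
computations. It is only a model for illustration: the helper names
(mask_lane, f_expected, f_emitted) are hypothetical and not part of GCC or
arm_neon.h, and little-endian lane numbering is assumed.

```c
#include <stdint.h>

/* Byte lane i of a 64-bit value replicated across a 128-bit vector.
   Assumption: little-endian lane numbering.  */
static uint8_t mask_lane(uint64_t mask64, int i)
{
  return (uint8_t)(mask64 >> (8 * (i % 8)));
}

/* Intended semantics of f(): (x < 0 ? 0xff : 0x00) & mask, per byte lane.  */
static void f_expected(const uint8_t x[16], uint64_t mask64, uint8_t out[16])
{
  for (int i = 0; i < 16; i++) {
    uint8_t cmp = (x[i] & 0x80) ? 0xff : 0x00;  /* models vcltq_s8 (x, 0) */
    out[i] = cmp & mask_lane(mask64, i);        /* models vandq_u8 */
  }
}

/* What the emitted "ushr v0.16b, v0.16b, 7" computes per byte lane.  */
static void f_emitted(const uint8_t x[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = x[i] >> 7;
}
```

With every input byte negative and the mask 0x101, f_expected leaves lanes 2
to 7 of each 64-bit half zero, while f_emitted sets every lane to 1.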

combine is matching this incorrect pattern:

Trying 7, 6 -> 8:
    7: r108:V16QI=const_vector
    6: r107:V16QI=r109:V16QI>>const_vector
      REG_DEAD r109:V16QI
    8: r106:V16QI=r107:V16QI&r108:V16QI
      REG_DEAD r108:V16QI
      REG_DEAD r107:V16QI
      REG_EQUAL r107:V16QI&const_vector
Successfully matched this instruction:
(set (reg:V16QI 106 [ _5 ])
    (lshiftrt:V16QI (reg:V16QI 109 [ xD.22802 ])
        (const_vector:V16QI [
                (const_int 7 [0x7]) repeated x16
            ])))

The optimization seems to only look at the bottom lane of the vector:

#include <arm_neon.h>
#include <stdint.h>

uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x301));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}

also generates incorrect code, but changing the bottom lane:

#include <arm_neon.h>
#include <stdint.h>

uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x102));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}

gives the right result.
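
The three masks above can be compared lane by lane with a small sketch. This
models only the observation in this report, not the actual logic in combine
or simplify-rtx; lane_of and shift_fold_is_valid are hypothetical names, and
little-endian lane numbering is assumed.

```c
#include <stdint.h>

/* Byte lane i (0..7) of the replicated 64-bit mask.
   Assumption: little-endian lane numbering.  */
static uint8_t lane_of(uint64_t mask64, int i)
{
  return (uint8_t)(mask64 >> (8 * i));
}

/* Returns 1 when replacing (x < 0 ? 0xff : 0) & mask with x >> 7 is valid
   for every byte lane, i.e. when every mask byte is exactly 0x01.  */
static int shift_fold_is_valid(uint64_t mask64)
{
  for (int i = 0; i < 8; i++)
    if (lane_of(mask64, i) != 0x01)
      return 0;
  return 1;
}
```

For 0x101 and 0x301 the bottom lane is 0x01 but other lanes differ, so the
fold is not valid even though it was applied. For 0x102 the bottom lane is
0x02, which is consistent with the fold not being attempted. Only a mask such
as 0x0101010101010101 would make the shift a correct replacement.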
