https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117012
Bug ID: 117012
Summary: [15 Regression] incorrect RTL simplification around vector AND and shifts
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*
The following example:
#include <arm_neon.h>
#include <stdint.h>
uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x101));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}
compiled at -O3 gives the following:
f:
        ushr    v0.16b, v0.16b, 7
        ret
This is incorrect: it assumes that every lane of the AND mask is 0x1, whereas
the mask is the 64-bit value 0x101 duplicated, so only two of the eight byte
lanes in each 64-bit element are 0x1 and the rest are 0.
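The difference can be sketched with a portable scalar model of the two computations. This is only an illustration: it assumes little-endian lane order (lane 0 is the low byte of each 64-bit element), and the helper names mask_bytes, expected and miscompiled are hypothetical, not part of GCC or arm_neon.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte lanes of vreinterpretq_u8_u64(vdupq_n_u64(v)),
   assuming little-endian lane order.  */
static void mask_bytes(uint64_t v, uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(v >> (8 * (i % 8)));
}

/* What the source asks for: (x < 0 ? 0xff : 0) & mask, per byte lane.  */
static void expected(const uint8_t x[16], uint64_t mask64, uint8_t out[16])
{
    uint8_t mask[16];
    mask_bytes(mask64, mask);
    for (int i = 0; i < 16; i++) {
        uint8_t cmp = (x[i] & 0x80) ? 0xff : 0x00; /* models vcltq_s8 (x, 0) */
        out[i] = cmp & mask[i];                     /* models vandq_u8 */
    }
}

/* What the generated code computes: ushr v0.16b, v0.16b, 7.  */
static void miscompiled(const uint8_t x[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = x[i] >> 7;
}
```

With every input byte set to 0xff and the mask 0x101, expected leaves lanes 2 to 7 and 10 to 15 at 0, while miscompiled produces 1 in all sixteen lanes.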
The combine pass matches this incorrect pattern:
Trying 7, 6 -> 8:
    7: r108:V16QI=const_vector
    6: r107:V16QI=r109:V16QI>>const_vector
      REG_DEAD r109:V16QI
    8: r106:V16QI=r107:V16QI&r108:V16QI
      REG_DEAD r108:V16QI
      REG_DEAD r107:V16QI
      REG_EQUAL r107:V16QI&const_vector
Successfully matched this instruction:
(set (reg:V16QI 106 [ _5 ])
    (lshiftrt:V16QI (reg:V16QI 109 [ xD.22802 ])
        (const_vector:V16QI [
                (const_int 7 [0x7]) repeated x16
            ])))
The optimization seems to look only at the bottom lane of the constant vector.
For example:
#include <arm_neon.h>
#include <stdint.h>
uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x301));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}
also generates incorrect code, but changing the bottom lane:
#include <arm_neon.h>
#include <stdint.h>
uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x102));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}
gives the right result.
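
To make the three masks easier to compare, here is a small scalar sketch of their byte lanes and of the condition under which replacing the compare-and-AND with a logical shift right by 7 would be valid at the source level. It assumes little-endian lane order, the names mask_lane and fold_is_valid are hypothetical, and it does not describe the actual check in GCC's RTL simplifier.

```c
#include <stdint.h>

/* Byte lane i (0..15) of vreinterpretq_u8_u64(vdupq_n_u64(v)),
   assuming little-endian lane order.  */
static uint8_t mask_lane(uint64_t v, int i)
{
    return (uint8_t)(v >> (8 * (i % 8)));
}

/* (x < 0 ? 0xff : 0) & mask equals x >> 7 (logical) for every input
   only if every mask lane is exactly 0x01.  Returns 1 in that case.  */
static int fold_is_valid(uint64_t v)
{
    for (int i = 0; i < 16; i++)
        if (mask_lane(v, i) != 0x01)
            return 0;
    return 1;
}
```

Under these assumptions, 0x101 and 0x301 both have 0x01 in lane 0 and 0x102 has 0x02 there, while none of the three is a splat of 0x01, so the fold is not valid for any of them.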