https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118062
Bug ID: 118062
Summary: [15 regression] c-c++-common/torture/vector-compare-1.c fails on arm / MVE after gcc-15-5317-gf40010c198f
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: clyon at gcc dot gnu.org
Reporter: clyon at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: arm
After commit gcc-15-5317-gf40010c198f we have noticed that vector-compare-1.c
fails at execution when using the MVE vector extension on arm:
FAIL: c-c++-common/torture/vector-compare-1.c -O0 execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O1 execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O3 -g execution test
FAIL: c-c++-common/torture/vector-compare-1.c -Os execution test
on GCC target arm-none-eabi configured with
--disable-multilib --with-mode=thumb --with-arch=armv8.1-m.main+mve.fp+fp.dp
--with-float=hard
running the testsuite with
-mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
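For reference, here is a reduced sketch of the kind of check that fails. It is not the testsuite file itself: the function name count_mismatches is hypothetical, and the values are those of the float case discussed in this report. It relies on the GCC vector extensions (vector_size, element-wise comparison, subscripting).

```c
/* Four-lane vector types built with the GCC vector extension. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Compare two float vectors element-wise with the vector operator, then
   re-check every lane with a scalar comparison. Returns the number of
   lanes where the vector result disagrees with the scalar one.
   Hypothetical helper, not part of the GCC testsuite. */
int count_mismatches(int argc)
{
    v4sf f0 = {(float)argc, 1.0f, 2.0f, 10.0f};
    v4sf f1 = {0.0f, 3.0f, 2.0f, -23.0f};
    v4si res = f0 > f1;   /* each lane is -1 if true, 0 if false */
    int bad = 0;

    for (int i = 0; i < 4; i++) {
        int expected = f0[i] > f1[i] ? -1 : 0;
        if (res[i] != expected)
            bad++;
    }
    return bad;
}
```

With a correct compiler, count_mismatches(1) should return 0.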
The problem occurs when comparing floats or doubles. For floats, for instance,
the generated code looks like this
(the input vectors are f0=(argc, 1, 2, 10) and f1=(0, 3, 2, -23)):
vmov s15, r0 @ int # move argc (==1) into s15
vcvt.f32.s32 s15, s15 # convert it into floating-point
vcmpe.f32 s15, #0 # compare against 0
movs r1, #0
vmrs APSR_nzcv, FPSCR
push {r4, r5, lr}
it gt
movgt r2, #-1 # r2 = -1 (0xffffffff) if argc > 0
mov lr, #4
it le
movle r2, r1
mov r4, #0 @ movhi
lsl r2, r2, lr
asr r2, r2, lr
bfi r4, r2, #0, #4 # r4 = 0x0000000f
vldr.64 d2, .L8
vldr.64 d3, .L8+8 # d2/d3 (=q1 register) = {1, 1, 2, 10}
vmov.i32 q2, #0xffffffff @ v4si # q2 = { -1, -1, -1, -1}
vmov.i32 q0, #0 @ v4si # q0 = { 0, 0, 0, 0}
vmov r5, s15
vmsr p0, r4 @ movhi # p0 (predicate register) = 0x000f (only 16 bits, 1 per byte)
vpush.64 {d8, d9}
vmov.32 q1[0], r5 # insert argc as q1[0], so q1 = {argc, 1, 2, 10}
vldr.64 d8, .L8+16
vldr.64 d9, .L8+24 # d8/d9 (=q4 register) = {0, 3, 2, -23}
vpsel q2, q2, q0 # select q2 = p0 (q2, q0) = (-1, 0, 0, 0) = (argc > 0 ? -1 : 0, 0, 0, 0)
This is followed by a loop which compares the element pairs one by one:
1 > 3 ? -> 0
2 > 2 ? -> 0
10 > -23 ? -> -1
and checks each result against the corresponding element of q2.
The check fails on elem #3 because q2[3] = 0 but 10 > -23, so we expect -1.
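The effect of the predicate value can be reproduced with a small host-side model. This is only a sketch: vpsel_model and first_mismatch are hypothetical names, the model assumes VPSEL selects byte by byte from the 16-bit predicate (1 bit per byte, as noted above), and argc is taken to be 1.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise model of VPSEL: bit i of the 16-bit predicate selects byte i
   from a (bit set) or from b (bit clear). Hypothetical helper, not GCC code. */
void vpsel_model(uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
                 uint16_t p0)
{
    for (int byte = 0; byte < 16; byte++)
        out[byte] = ((p0 >> byte) & 1) ? a[byte] : b[byte];
}

/* Returns the first lane whose selected value differs from the scalar
   result f0[i] > f1[i] ? -1 : 0, or -1 if all four lanes agree. */
int first_mismatch(uint16_t p0)
{
    const float f0[4] = {1.0f, 1.0f, 2.0f, 10.0f};   /* assumes argc == 1 */
    const float f1[4] = {0.0f, 3.0f, 2.0f, -23.0f};
    uint8_t ones[16], zeros[16], sel[16];
    int32_t res[4];

    memset(ones, 0xff, sizeof ones);    /* q2 = { -1, -1, -1, -1 } */
    memset(zeros, 0x00, sizeof zeros);  /* q0 = { 0, 0, 0, 0 } */
    vpsel_model(sel, ones, zeros, p0);
    memcpy(res, sel, sizeof res);

    for (int i = 0; i < 4; i++) {
        int32_t expected = f0[i] > f1[i] ? -1 : 0;
        if (res[i] != expected)
            return i;
    }
    return -1;
}
```

In this model, first_mismatch(0x000f) returns 3, matching the failure on elem #3, while a predicate that also covers lane 3 (0xf00f) returns -1, meaning no mismatch.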
In vector-compare-1.c.192t.loopdone we have:
<bb 2> [local count: 215091964]:
_1 = (float) argc_12(D);
_2 = {_1, 1.0e+0, 2.0e+0, 1.0e+1};
f0 = _2;
f1 = { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
_3 = _2 > { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
_4 = VEC_COND_EXPR <_3, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
ifres = _4;
and in vector-compare-1.c.196t.veclower21 we have:
<bb 2> [local count: 215091964]:
_1 = (float) argc_12(D);
_2 = {_1, 1.0e+0, 2.0e+0, 1.0e+1};
f0 = _2;
f1 = { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
_28 = _1 > 0.0;
_29 = (<unnamed-signed:4>) _28;
_30 = -_29;
_31 = (<signed-boolean:4>) _30;
_3 = {_31};
_4 = VEC_COND_EXPR <_3, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
ifres = _4;
which seems to forget about comparing elements 1, 2 and 3 of f0/f1?
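To make the difference concrete, here is a sketch contrasting the predicate one would expect for all four lanes with what the lowered statements compute. Both function names are hypothetical, and mapping the <signed-boolean:4> value onto the low 4 bits of the 16-bit MVE predicate is my reading of the dump (it matches the bfi r4, r2, #0, #4 in the assembly), not something taken from the GCC sources.

```c
#include <stdint.h>

/* Predicate covering all four 32-bit lanes: 4 predicate bits per lane,
   all set when f0[lane] > f1[lane]. Hypothetical helper. */
uint16_t full_predicate(const float f0[4], const float f1[4])
{
    uint16_t p = 0;
    for (int lane = 0; lane < 4; lane++)
        if (f0[lane] > f1[lane])
            p |= (uint16_t)(0xFu << (4 * lane));
    return p;
}

/* What the veclower21 statements compute: only element 0 is compared. */
uint16_t lowered_predicate(const float f0[4], const float f1[4])
{
    int b28 = f0[0] > f1[0];       /* _28 = _1 > 0.0 */
    int m30 = -b28;                /* _30 = -_29, i.e. 0 or -1 */
    return (uint16_t)(m30 & 0xF);  /* _31: 4-bit value; lanes 1..3 never set */
}
```

For f0 = (1, 1, 2, 10) and f1 = (0, 3, 2, -23), full_predicate gives 0xf00f whereas lowered_predicate gives 0x000f, which is the value loaded into p0 above.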