https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117383
Bug ID: 117383
Summary: gcc relies on RISC-V vcompress instruction undefined behaviour
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: anton at ozlabs dot org
Target Milestone: ---
I think gcc is relying on undefined behaviour with the vcompress instruction.
This thread explains how vcompress differs from other instructions: its tail
starts right after the last mask-selected element, not at vl:
https://github.com/riscvarchive/riscv-v-spec/issues/796
There was a bug in QEMU that I just fixed that prevented the all 1s tail
agnostic option (rvv_ta_all_1s) from poisoning these bits:
https://lists.nongnu.org/archive/html/qemu-riscv/2024-10/msg00561.html
With that fix, the test case below produces wrong results until I change the
preceding vsetvli from ta to tu. I think 9aabf81f40f0 ("RISC-V: Optimize
permutation codegen with compress") is one place where we need to set tail
undisturbed.
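To make the tail behaviour concrete, here is a minimal scalar sketch of how I
understand vcompress to treat its destination. The names model_vcompress and
enum tail_policy are made up for illustration and do not come from QEMU, gcc or
any header. The all-1s fill only mimics what rvv_ta_all_1s does in QEMU; real
hardware with ta may leave other values in the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative tail policies; these names are assumptions, not a real API. */
enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC_ALL_1S };

/*
 * Scalar model of vcompress on 8-bit elements.
 * Packs src[i], for every i < vl whose mask[i] is non-zero, into the low
 * elements of dest and returns how many elements were packed. Everything
 * from that count up to vl is treated as tail.
 */
size_t model_vcompress(int8_t *dest, const int8_t *src, const uint8_t *mask,
                       size_t vl, enum tail_policy policy)
{
    size_t packed = 0;

    /* The instruction does not allow dest to overlap the source. */
    assert(dest != src);

    for (size_t i = 0; i < vl; i++)
        if (mask[i])
            dest[packed++] = src[i];

    /* Tail undisturbed keeps the old dest values; the poisoning mode
       overwrites them with all 1s (that is, -1 for int8_t). */
    if (policy == TAIL_AGNOSTIC_ALL_1S)
        for (size_t i = packed; i < vl; i++)
            dest[i] = -1;

    return packed;
}
```

With TAIL_UNDISTURBED the elements past the packed ones keep whatever dest
held before, which is what the generated code appears to depend on. With
TAIL_AGNOSTIC_ALL_1S they all become -1, matching the second output listing.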
Build with:
gcc -march=rv64gcv -mabi=lp64d -mrvv-vector-bits=zvl -O3
QEMU without all 1s tail agnostic poisoning:
-1
-2
-3
-5
-7
-9
-10
-11
-12
-14
-15
-17
-19
-21
-22
-23
-26
-28
-30
-31
-37
-38
-41
-46
-47
-53
-54
-55
-60
-61
-62
-63
52
53
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
QEMU with all 1s tail agnostic poisoning:
-1
-2
-3
-5
-7
-9
-10
-11
-12
-14
-15
-17
-19
-21
-22
-23
-26
-28
-30
-31
-37
-38
-41
-46
-47
-53
-54
-55
-60
-61
-62
-63
52
53
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
Not sure where the 52/53 values are coming from either.
#include <stdio.h>
#include <stdint.h>
typedef int8_t vnx64i __attribute__ ((vector_size (64)));
#define MASK_64 \
  1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
  37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \
  82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \
  100, 101, 102, 103, 104, 105, 106, 107
void __attribute__ ((noinline, noclone))
test_1 (int8_t *x, int8_t *y, int8_t *out)
{
  vnx64i v1 = *(vnx64i*)x;
  vnx64i v2 = *(vnx64i*)y;
  /* Indices 0..63 select from v1, indices 64..127 select from v2. */
  vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
  *(vnx64i*)out = v3;
}

int main(void)
{
  int8_t x[64];
  int8_t y[64];
  int8_t out[64];

  /* x holds 0, -1, ..., -63 and y holds 0, 1, ..., 63. */
  for (int i = 0; i < 64; i++) {
    x[i] = -i;
    y[i] = i;
  }

  test_1(x, y, out);

  for (int i = 0; i < 64; i++) {
    printf("%d\n", out[i]);
  }
  return 0;
}
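
For cross-checking the listings above, a scalar reference for the shuffle may
help. This is only a sketch: ref_shuffle is a made-up helper name, and it
assumes the usual __builtin_shufflevector numbering in which indices below n
pick from the first vector and indices n and above pick from the second.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar reference for a two-input shuffle of n-element int8_t vectors.
 * idx[i] < n selects a[idx[i]]; idx[i] >= n selects b[idx[i] - n].
 * Callers must keep every idx[i] below 2 * n.
 */
void ref_shuffle(int8_t *out, const int8_t *a, const int8_t *b,
                 const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = idx[i] < n ? a[idx[i]] : b[idx[i] - n];
}
```

Fed the x, y and MASK_64 values from the test case with n = 64, this gives 12
and 13 at positions 32 and 33 (from indices 76 and 77), followed by 14 to 43,
so the 52/53 values look wrong in both listings.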