https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110430
Bug ID: 110430
Summary: Fail to CSE for LEN_MASK_STORE
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
Consider the following case:
void __attribute__((noinline,noclone))
foo (int *out, int *res)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 16; ++i)
    {
      if (mask[i])
        out[i] = i;
    }
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}
Compile options: -O3 -march=rv64gcv_zvl512b --param riscv-autovec-preference=fixed-vlmax
Current RVV auto-vectorization codegen:
foo:
lui a5,%hi(.LANCHOR0)
vsetivli zero,16,e32,m1,ta,ma
addi a5,a5,%lo(.LANCHOR0)
vid.v v1
vlm.v v0,0(a5)
vsetvli a5,zero,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
lw a2,0(a0)
lw a3,28(a0)
lw a4,56(a0)
lw a5,60(a0)
sw a2,0(a1)
sw a3,8(a1)
sw a4,16(a1)
sw a5,24(a1)
ret
However, with this patch:
https://patchwork.sourceware.org/project/gcc/patch/[email protected]/
we end up with better codegen, thanks to CSE:
foo:
lui a5,%hi(.LANCHOR0)
vsetivli zero,16,e32,m1,ta,ma
addi a5,a5,%lo(.LANCHOR0)
vid.v v1
vlm.v v0,0(a5)
vsetvli a5,zero,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
lw a4,0(a0)
lw a5,56(a0)
sw a4,0(a1)
sw a5,16(a1)
li a4,7
li a5,15
sw a4,8(a1)
sw a5,24(a1)
ret
Two of the "lw" loads (out[7] and out[15]) are CSE'd into "li" instructions. GIMPLE IR with the patch:
.LEN_MASK_STORE (out_10(D), 32B, 16, { 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0,
-1, 0, -1, 0, -1 }, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
0);
o0_11 = *out_10(D);
o14_13 = MEM[(int *)out_10(D) + 56B];
*res_15(D) = o0_11;
MEM[(int *)res_15(D) + 8B] = 7;
MEM[(int *)res_15(D) + 16B] = o14_13;
MEM[(int *)res_15(D) + 24B] = 15;
mask ={v} {CLOBBER(eol)};
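To make the expected CSE concrete, here is a minimal scalar sketch of what the store in the dump does, and of which later loads can be replaced by constants. It is only a model, not GCC code: the names len_mask_store_model, lane_forwardable and check_bug_example are hypothetical, and it assumes the operands in the dump are read as pointer, alignment, length (16), mask, stored values, with the trailing 0 ignored.

```c
#include <assert.h>

enum { NLANES = 16 };

/* Scalar model (an assumption, not GCC code): lane i is written only
   when i < len and mask[i] is nonzero; other lanes keep old memory.  */
static void
len_mask_store_model (int *dst, int len, const int *mask, const int *val)
{
  for (int i = 0; i < NLANES && i < len; ++i)
    if (mask[i])
      dst[i] = val[i];
}

/* Return 1 and set *known when a load of lane idx directly after the
   store must see the stored constant; return 0 when the lane is
   inactive and the load still has to read memory.  */
static int
lane_forwardable (int idx, int len, const int *mask, const int *val,
                  int *known)
{
  if (idx >= 0 && idx < NLANES && idx < len && mask[idx])
    {
      *known = val[idx];
      return 1;
    }
  return 0;
}

/* Replays the four loads from the test case against the model.  */
static void
check_bug_example (void)
{
  int mask[NLANES], val[NLANES], out[NLANES];
  for (int i = 0; i < NLANES; ++i)
    {
      mask[i] = (i & 1) ? -1 : 0;  /* odd lanes active, as in the dump */
      val[i] = i;
      out[i] = 1000 + i;           /* arbitrary prior contents */
    }
  len_mask_store_model (out, 16, mask, val);

  int k = -1;
  /* out[7] and out[15] are active lanes: the loads become 7 and 15.  */
  assert (lane_forwardable (7, 16, mask, val, &k) && k == 7 && out[7] == 7);
  assert (lane_forwardable (15, 16, mask, val, &k) && k == 15 && out[15] == 15);
  /* out[0] and out[14] are inactive lanes: the loads must stay.  */
  assert (!lane_forwardable (0, 16, mask, val, &k) && out[0] == 1000);
  assert (!lane_forwardable (14, 16, mask, val, &k) && out[14] == 1014);
}
```

In this model the decision depends on the length and the mask both being compile-time constants, which holds for the fixed-length case shown here.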
After discussion with Richi, it turns out that the currently proposed fix can
only handle VLS (fixed-length) vectors; it cannot handle VLA (variable-length)
vectors.
It is hard to write a C test case that exposes a CSE opportunity for
variable-length vectors.
So I am opening this bug for now so that I don't forget the issue.
I will enhance CSE for LEN_MASK_STORE after I have finished all of the RVV
auto-vectorization support.