https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117818
--- Comment #5 from Steven Munroe <munroesj at gcc dot gnu.org> ---
I expected compiling for -mcpu=power9 to do a better job generating splats for
small constants.
Given new instructions like VSX Vector Splat Immediate Byte (xxspltib) and
Vector Extend Sign Byte To Word/Doubleword (vextsb2w/vextsb2d), the compiler
should have an easier time generating vec_splats(). It would seem that
xxspltib would be the perfect way to generate the shift count for a constant
shift quadword left/right.
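To spell out why one splatted byte is enough for the shift pair, here is a rough scalar C model. It is a sketch, not GCC or library code, and the names u128_model, shl_bits, model_vslo, model_vsl and model_slqi are invented for illustration. As I read the Power ISA, vslo takes its byte count from bits 121:124 of the shift operand and vsl takes its bit count from bits 125:127, so a splatted byte n should give a total shift of n & 127 bits.

```c
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128_model;

/* Shift a 128-bit value left by n bits, n in 0..127. */
static u128_model shl_bits(u128_model v, unsigned n)
{
    u128_model r;
    if (n == 0)
        return v;
    if (n >= 64) {
        r.hi = v.lo << (n - 64);
        r.lo = 0;
    } else {
        r.hi = (v.hi << n) | (v.lo >> (64 - n));
        r.lo = v.lo << n;
    }
    return r;
}

/* vslo model: whole-byte shift, count = bits 3..6 of the low-order byte. */
static u128_model model_vslo(u128_model v, uint8_t splat)
{
    return shl_bits(v, ((splat >> 3) & 0xFu) * 8u);
}

/* vsl model: bit shift of 0..7, count = low 3 bits of the low-order byte. */
static u128_model model_vsl(u128_model v, uint8_t splat)
{
    return shl_bits(v, splat & 7u);
}

/* The vslo + vsl pair, both fed the same splatted byte. */
static u128_model model_slqi(u128_model v, uint8_t splat)
{
    return model_vsl(model_vslo(v, splat), splat);
}
```

With a splat of 18 the model shifts by 2 bytes and then 2 bits, which is why a single xxspltib of the shift count looks sufficient.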
But that is not what I am seeing. First, note that there is no direct intrinsic
for xxspltib. It is sometimes generated for vec_splat_u8(0-15) and
vec_splats((unsigned char) x). But sometimes it gets weird.
For example:
vui128_t
test_slqi_char_18_V3 (vui128_t vra)
{
vui8_t result;
vui8_t tmp = vec_splats((unsigned char)18);
result = vec_vslo ((vui8_t) vra, tmp);
return (vui128_t) vec_vsl (result, tmp);
}
Which I would expect to generate:
xxspltib 32,18
vslo 2,2,0
vsl 2,2,0
But generates:
vspltisb 0,9
vadduwm 0,0,0
vslo 2,2,0
vsl 2,2,0
It recognizes that it can't generate 18 with vspltisb (5-bit signed immediate,
-16 to 15) and uses the 18 = 9 * 2 pattern. It also, seemingly erroneously,
generates vector add word (vadduwm) rather than add byte (vaddubm). Seems like
GCC is reusing the old pattern and ignoring the new instructions.
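As a sanity check on the 18 = 9 * 2 sequence, the sketch below models one 32-bit lane in plain C. The helper names are hypothetical and nothing here is taken from GCC. It suggests the word add produces the same bytes as a byte add in this case, because 9 + 9 never carries out of a byte lane; the two only diverge when a lane overflows, as with 0xF0 (the byte that vspltisb -16 would splat).

```c
#include <stdint.h>

/* vspltisb takes a 5-bit signed immediate. */
static int fits_vspltisb(int v) { return v >= -16 && v <= 15; }

/* xxspltib takes an 8-bit immediate, so any unsigned byte value fits. */
static int fits_xxspltib(int v) { return v >= 0 && v <= 255; }

/* Replicate one byte into all four bytes of a 32-bit word. */
static uint32_t splat_byte_in_word(uint8_t b) { return 0x01010101u * b; }

/* One word lane of a modulo word add (vadduwm-like). */
static uint32_t add_word(uint32_t a, uint32_t b) { return a + b; }

/* Four independent modulo byte adds packed in a word (vaddubm-like). */
static uint32_t add_bytes_packed(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t s = ((a >> (8 * i)) & 0xFFu) + ((b >> (8 * i)) & 0xFFu);
        r |= (s & 0xFFu) << (8 * i);
    }
    return r;
}
```

For a splat of 9, both add_word and add_bytes_packed give 0x12121212, so the vadduwm looks odd rather than wrong for this particular constant.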
This is weird because:
vui8_t
test_splat6_char_18 ()
{
vui8_t tmp = vec_splat_u8(9);
return vec_add (tmp, tmp);
}
Generates:
xxspltib 34,9
vaddubm 2,2,2
But:
vui8_t
test_splat6_char_31 ()
{
// 31 = (16+15) = (15 - (-16))
vui8_t v16 = vec_splat_u8(-16);
vui8_t tmp = vec_splat_u8(15);
return vec_sub (tmp, v16);
}
Generates:
xxspltib 34,31
Which seems like a miracle. Is this constant propagation?
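If this is constant folding, the per-element arithmetic involved is a modulo-256 subtraction. A tiny C check of that arithmetic follows; fold_sub_u8 is a name invented here, and the sketch says nothing about which GCC pass performs the folding.

```c
#include <stdint.h>

/* Subtract two splat immediates the way an unsigned char vector lane would:
   both operands wrap to 8 bits, and so does the result. */
static uint8_t fold_sub_u8(int a, int b)
{
    uint8_t ua = (uint8_t)a;   /* 15  -> 0x0F */
    uint8_t ub = (uint8_t)b;   /* -16 -> 0xF0 */
    return (uint8_t)(ua - ub); /* 0x0F - 0xF0 wraps to 0x1F */
}
```

fold_sub_u8(15, -16) evaluates to 31, which fits the 8-bit immediate of xxspltib.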
But:
vui8_t
test_slqi_char_31_V0 (vui8_t vra)
{
vui8_t result;
// 31 = (16+15) = (15 - (-16))
vui8_t v16 = vec_splat_u8(-16);
vui8_t tmp = vec_splat_u8(15);
tmp = vec_sub (tmp, v16);
result = vec_slo (vra, tmp);
return vec_sll (result, tmp);
}
Generates:
addis 9,2,.LC0@toc@ha
addi 9,9,.LC0@toc@l
lxv 32,0(9)
vslo 2,2,0
vsl 2,2,0
OK, I thought I could fix this with:
vui8_t
test_slqi_char_31_V3 (vui8_t vra)
{
vui8_t result;
vui8_t tmp = vec_splats((unsigned char)31);
result = vec_slo (vra, tmp);
return vec_sll (result, tmp);
}
But no, it still generates:
addis 9,2,.LC0@toc@ha
addi 9,9,.LC0@toc@l
lxv 32,0(9)
vslo 2,2,0
vsl 2,2,0
Which is all very confusing.