https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118480
--- Comment #1 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Strangely, the tricks that seem to work for positive immediate values (see
test_slqi_char_18_V3 above) fail (generate a .rodata load) for negative
values. For example, take the shift count 110 (110-128 = -18). The splat by
itself generates the expected xxspltib:
vui8_t
test_splat1_char_110_V2 ()
{
return vec_splats ((unsigned char)110);
}
test_splat1_char_110_V2:
xxspltib 34,110
blr
But it fails when the vec_splats result is passed to vec_slo/vec_sll:
vui128_t
test_slqi_char_110_V3 (vui128_t vra)
{
vui8_t result;
vui8_t tmp = vec_splats((unsigned char)110);
result = vec_vslo ((vui8_t) vra, tmp);
return (vui128_t) vec_vsl (result, tmp);
}
test_slqi_char_110_V3:
addis 9,2,.LC9@toc@ha
addi 9,9,.LC9@toc@l
lxv 32,0(9)
vslo 2,2,0
vsl 2,2,0
blr
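The reason 110 and -18 are interchangeable as shift counts is that the
vslo/vsl pair together consumes only the low 7 bits of the count byte: 4 bits
of octet count for vslo and 3 bits of bit count for vsl. A minimal scalar
sketch of that arithmetic follows, in plain C with no vector intrinsics. The
helper names are hypothetical and do not come from altivec.h or any other
header:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: model how the vslo/vsl pair reads one
   shift-count byte.  Only the low 7 bits of the byte matter.  */
static unsigned
octet_count (uint8_t sh)	/* the 4 bits used by vslo */
{
  return (sh >> 3) & 0xF;
}

static unsigned
bit_count (uint8_t sh)		/* the 3 bits used by vsl */
{
  return sh & 0x7;
}

/* Check that 110 and -18 (0xEE as a byte) select the same shift.  */
static void
check_110_vs_minus_18 (void)
{
  uint8_t pos = 110;
  uint8_t neg = (uint8_t) -18;	/* wraps to 238 */

  assert ((pos & 0x7F) == (neg & 0x7F));
  assert (octet_count (pos) == 13 && octet_count (neg) == 13);
  assert (bit_count (pos) == 6 && bit_count (neg) == 6);
}
```

Calling check_110_vs_minus_18 () from a test driver should pass silently: both
encodings mean 13 octets plus 6 bits.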
Strangely, GCC plays along with the even (but negative) numbers trick. For
example:
vui8_t
test_splat7_char_110_V0 ()
{ // 110-128 = -18
// (-18 / 2) + (-18 / 2)
// (-9) + (-9)
vui8_t tmp = vec_splat_u8(-9);
return vec_add (tmp, tmp);
}
test_splat7_char_110_V0:
xxspltib 34,247
vaddubm 2,2,2
blr
But it fails when this value is passed to vec_slo/vec_sll:
vui128_t
test_slqi_char_110_V2 (vui128_t vra)
{
vui8_t result;
vui8_t tmp = vec_splat_u8(-9);
tmp = vec_vaddubm (tmp, tmp);
result = vec_vslo ((vui8_t) vra, tmp);
return (vui128_t) vec_vsl (result, tmp);
}
test_slqi_char_110_V2:
addis 9,2,.LC11@toc@ha
addi 9,9,.LC11@toc@l
lxv 32,0(9)
vslo 2,2,0
vsl 2,2,0
blr
Stranger yet, when the vaddubm is replaced with a shift left by 1, GCC converts
the shift back to xxspltib/vaddubm:
vui8_t
test_splat7_char__110_V4 ()
{ // 110 - 128 = -18
// -18 = (-9 * 2) = (-9 << 1)
vui8_t v1 = vec_splat_u8(1);
vui8_t tmp = vec_splat_u8(-9);
return vec_sl (tmp, v1);
}
test_splat7_char__110_V4:
.LFB34:
.cfi_startproc
xxspltib 34,247
vaddubm 2,2,2
blr
When this is passed to vec_slo/vec_sll, GCC avoids the conversion to .rodata,
but converts the shift back to xxspltib/vaddubm. This is slightly better but
generates an extra (and unnecessary) instruction:
vui8_t
test_slqi_char_110_V4 (vui8_t vra)
{
vui8_t result;
// 110 - 128 = -18 = (-9 * 2) = (-9 << 1)
vui8_t v1 = vec_splat_u8(1);
vui8_t tmp = vec_splat_u8(-9);
tmp = vec_sl (tmp, v1);
result = vec_slo (vra, tmp);
return vec_sll (result, tmp);
}
test_slqi_char_110_V4:
.LFB41:
.cfi_startproc
xxspltib 32,247
vaddubm 0,0,0
vslo 2,2,0
vsl 2,2,0
blr
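For reference, the byte values in these listings can be reproduced with a
lane-wise model in plain C. This is only a sketch: the model_* functions are
hypothetical stand-ins for vec_splat_u8, vec_add and vec_sl on 16 unsigned
bytes, not the real built-ins. It shows that -9 splats as 247 (the xxspltib
immediate above), and that doubling it by add or by shift left 1 gives the same
byte, whose low 7 bits are 110:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* Hypothetical lane-wise models using modulo-256 byte arithmetic.  */
static void
model_splat_s8 (uint8_t v[LANES], int imm)
{
  for (int i = 0; i < LANES; i++)
    v[i] = (uint8_t) imm;
}

static void
model_add_u8 (uint8_t r[LANES], const uint8_t a[LANES],
	      const uint8_t b[LANES])
{
  for (int i = 0; i < LANES; i++)
    r[i] = (uint8_t) (a[i] + b[i]);
}

static void
model_sl_u8 (uint8_t r[LANES], const uint8_t a[LANES],
	     const uint8_t b[LANES])
{
  /* Each lane shifts by the low 3 bits of the matching count lane.  */
  for (int i = 0; i < LANES; i++)
    r[i] = (uint8_t) (a[i] << (b[i] & 7));
}

static void
check_double_minus_9 (void)
{
  uint8_t m9[LANES], one[LANES], by_add[LANES], by_sl[LANES];

  model_splat_s8 (m9, -9);	/* 0xF7 = 247 in every lane */
  model_splat_s8 (one, 1);
  model_add_u8 (by_add, m9, m9);
  model_sl_u8 (by_sl, m9, one);

  for (int i = 0; i < LANES; i++)
    {
      assert (m9[i] == 247);
      assert (by_add[i] == 238 && by_sl[i] == 238);
      assert ((by_add[i] & 0x7F) == 110);
    }
}
```

Since both forms produce the same vector, rewriting the shift as vaddubm is a
valid transformation; the open question in this report is only why the
constant is not folded further.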
Perhaps we are on to something!
- Avoid negative values
- Use explicit shift instead of add
So one last example, generating the 7-bit shift count as an octet count (times
8) plus a bit count, and using only positive values:
vui8_t
test_splat7_char_110_V1 ()
{
// 110 = (13 * 8) + 6
vui8_t v3 = vec_splat_u8(3);
vui8_t tmp = vec_splat_u8(13);
vui8_t tmp2 = vec_splat_u8(6);
tmp = vec_sl (tmp, v3);
return vec_add (tmp, tmp2);
}
test_splat7_char_110_V1:
xxspltib 34,110
blr
And:
vui8_t
test_slqi_char_110_V5 (vui8_t vra)
{
vui8_t result;
// 110 = (13 * 8) + 6
vui8_t v3 = vec_splat_u8(3);
vui8_t tmp = vec_splat_u8(13);
vui8_t tmp2 = vec_splat_u8(6);
tmp = vec_sl (tmp, v3);
tmp = vec_add (tmp, tmp2);
result = vec_slo (vra, tmp);
return vec_sll (result, tmp);
}
test_slqi_char_110_V5:
xxspltib 32,110
vslo 2,2,0
vsl 2,2,0
blr
Finally we have a reasonable result that should have been possible with a
simple vec_splats((unsigned char)110)!
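The same octet-plus-bits split applies to any 7-bit shift count, and both
parts always fit the 5-bit signed literal range (-16..15) that vec_splat_u8
accepts. The sketch below only illustrates the arithmetic; the struct and
function names are hypothetical, and whether GCC folds the resulting sequence
to a single xxspltib for counts other than 110 is an assumption that has not
been checked here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two immediates used in the
   example above, so that count == (octets << 3) + bits.  */
struct shift_parts
{
  int octets;	/* 0..15, the value that gets shifted left by 3 */
  int bits;	/* 0..7, the value that gets added afterwards */
};

static struct shift_parts
split_shift_count (unsigned count)
{
  struct shift_parts p;

  p.octets = (int) ((count >> 3) & 0xF);
  p.bits = (int) (count & 0x7);
  return p;
}

/* Range of the 5-bit signed literal taken by vec_splat_u8.  */
static int
fits_splat_immediate (int v)
{
  return v >= -16 && v <= 15;
}
```

For 110 this yields octets = 13 and bits = 6, matching the constants in
test_slqi_char_110_V5. Note that vec_splat_u8 needs a compile-time literal, so
in real code the split has to be done by hand or by a macro, not at run time.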
Note: this looks like a possible workaround for generating vectors splatted
with positive constants. The problem with negative constants still appears to
persist.
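For anyone checking results, here is a scalar sketch of what the vslo/vsl pair
in these examples is expected to compute: a 128-bit shift left by the low 7
bits of the count byte. It assumes the count is splatted, so every byte holds
the same value. The u128_model type and the model_* names are hypothetical,
and the model represents the quadword as two 64-bit halves:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Shift left by n bits, 0 <= n <= 127.  */
static u128_model
shl128 (u128_model v, unsigned n)
{
  u128_model r;

  if (n == 0)
    return v;
  if (n >= 64)
    {
      r.hi = v.lo << (n - 64);
      r.lo = 0;
    }
  else
    {
      r.hi = (v.hi << n) | (v.lo >> (64 - n));
      r.lo = v.lo << n;
    }
  return r;
}

/* Model of vslo: shift left by whole octets (4-bit octet count).  */
static u128_model
model_vslo (u128_model v, uint8_t sh)
{
  return shl128 (v, 8 * ((sh >> 3) & 0xF));
}

/* Model of vsl: shift left by 0..7 bits (low 3 bits of the count).  */
static u128_model
model_vsl (u128_model v, uint8_t sh)
{
  return shl128 (v, sh & 0x7);
}

/* The pair used in the test_slqi_* examples, sharing one count.  */
static u128_model
model_slqi (u128_model v, uint8_t sh)
{
  return model_vsl (model_vslo (v, sh), sh);
}
```

With this model, a count of 110 and a count of (uint8_t) -18 both move bit 0 to
bit 110, which is why either constant would be acceptable in the generated
code.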