https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119912
Bug ID: 119912
Summary: PPC: Inefficient vector immediate shifts
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Shifts by <element bit width>-1 should use an all-ones (0xFF..FF) constant as
the shift amount: PPC vector shifts take the shift amount modulo the element
bit width, and generating 0xFF..FF requires just 1 instruction.
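The modulo behaviour can be illustrated with a scalar model. This is only a sketch: the helper names vslw_lane and effective_shift are hypothetical and model a single lane, assuming the shift instruction reads just the low log2(element bit width) bits of the shift-amount lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of vslw:
   only the low 5 bits of the shift-amount lane are used. */
static uint32_t vslw_lane(uint32_t value, uint32_t amount)
{
    return value << (amount & 31u);
}

/* Effective shift amount for a power-of-two element width:
   the amount is reduced modulo the element bit width. */
static unsigned effective_shift(unsigned amount, unsigned element_bits)
{
    /* element_bits must be a non-zero power of two (8, 16, 32, 64). */
    assert(element_bits != 0 && (element_bits & (element_bits - 1)) == 0);
    return amount & (element_bits - 1);
}
```

Under this model an all-ones lane shifts by 31 for int, 15 for short, 7 for byte and 63 for long long, which is why a single splat of -1 covers every <element bit width>-1 case.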
On Power9, always use a byte mask for the shift amount so that xxspltib can be
used.
On Power8, use vspltisb for the following shift amounts:
- value range 0..15 and 16..31 for int
- value range 0..15 for short
- value range 0..7 for byte
- value range 0..15 and 48..63 for long long.
Shift left by 1 implemented as an add is already done by gcc.
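The Power8 value ranges above follow from the 5-bit signed immediate of vspltisb (-16..15) combined with the modulo shift. A minimal sketch that checks this, assuming the shift instruction reads the low log2(element bit width) bits from the least-significant byte of each element; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `shift` can be produced for elements of `element_bits`
   bits by splatting a vspltisb immediate (-16..15) into every byte. */
static int reachable_with_vspltisb(unsigned shift, unsigned element_bits)
{
    assert(element_bits == 8 || element_bits == 16 ||
           element_bits == 32 || element_bits == 64);
    for (int simm = -16; simm <= 15; ++simm) {
        uint8_t byte = (uint8_t)simm;  /* value splatted into each byte */
        if ((byte & (element_bits - 1)) == shift)
            return 1;
    }
    return 0;
}

/* xxspltib takes a full 8-bit immediate, so every in-range shift
   amount can be splatted directly. */
static int reachable_with_xxspltib(unsigned shift, unsigned element_bits)
{
    return shift < element_bits;
}
```

For long long the negative immediates map to 48..63 because only the low 6 bits of 0xF0..0xFF are used, leaving 16..47 unreachable with vspltisb.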
Sample:
#include <altivec.h>
vector unsigned int shl31(vector unsigned int in)
{
return vec_sl(in, (vector unsigned int)vec_splats((unsigned char)31));
}
Today on Power8/9:
shl31(unsigned int __vector(4)):
.LCF0:
0: addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
addis 9,2,.LC0@toc@ha
addi 9,9,.LC0@toc@l
lxv 32,0(9)
vslw 2,2,0
blr
Should be done by:
Power8:
vspltisw 0,-1
vslw 2,2,0
Power9:
xxspltib 32,31
vslw 2,2,0