On Tue, Apr 14, 2020 at 07:39:44AM -0500, Segher Boessenkool wrote:
> Hi,
>
> On Mon, Apr 13, 2020 at 10:24:39PM -0400, Michael Meissner wrote:
> > This patch fixes PR target/94557, on PowerPC. It was a regression on the
> > GCC 9
> > branch caused by my recent patch for PR target/93932.
> >
> > The patch for 93932 was a back port from the master branch of a fix for the
> > vec_extract built-in function. This patch fixes the case where we are
> > optimizing a vector extract from an vector in memory to be a scalar load
> > instead of loading the vector to a register and then doing an extract.
>
> What does "this patch" mean? The backport?
>
> > This patch adds in the masking of the vector index that is in the master
> > branch. I re-implemented the change for GCC 9, since the changes on the
> > master
> > branch are more extensive, and include PC-relative support that is not in
> > GCC
> > 9.
>
> Hrm.
>
> > 2020-04-13 Michael Meissner
> >
> > PR target/94557
> > * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Mask
> > variable vector extract index so it does not go beyond the vector
> > when extracting a vector element from memory. This change is a
> > simplification of the 2020-02-03 patch that went into the trunk.
>
> You have no patches go into trunk at that date.
Lets see if I can clarify things.
One of the examples comes from vsx-builtin-10a.c. It has the following code:
#define CONST0 (0)
#define CONST1 (1)
#define CONST2 (2)
#define CONST3 (3)
#define CONST4 (4)
#define CONST5 (5)
#define CONST6 (6)
#define CONST7 (7)
__attribute__((noinline))
short mci (vector short *vp, int i)
{
return __builtin_vec_ext_v8hi (*vp, i);
}
int main (int argc, short *argv[]) {
vector short sv = {
CONST0, CONST1, CONST2, CONST3, CONST4, CONST5, CONST6, CONST7 };
short s;
/* ... */
s = mci (&sv, 25);
if (s != CONST1)
abort ();
}
The current trunk generates the correct code for both -mcpu=power8 and power9
power8: power9:
mci:mci:
rldicl 9,4,0,61 rldicl 9,4,0,61
rldicr 3,3,0,59 sldi 9,9,1
sldi 9,9,1 lhzx 4,3,9
lhzx 4,3,9 extsh 3,4
extsh 3,4 blr
blr
The 'rldicr 3,3,0,59' is presumably due to alignment issues in vectors between
p8 and p9. The important line is the 'rldicl 9,4,0,61' which truncates the
variable vector index to be between 0..7. The test that fails has an index of
25 (i.e. out of bounds).
The GCC 9 -mcpu=power9 code does the optimization converting vec_extract from a
vector in memory to be a normal load, but the GCC 9 -mcpu=power8 code does not
do this optimization.
power8: power9:
mci:mci:
rldicl 9,4,0,61 sldi 9,4,1
lvx 0,0,3 lhzx 4,3,9
subfic 9,9,7extsh 3,4
sldi 9,9,4 blr
mtvsrd 33,9
xxpermdi 33,33,33,0
vslo 1,0,1
mfvsrd 3,33
srdi 3,3,48
extsh 3,3
blr
Notice that even though the optimization of using a scalar load is not done for
-mcpu=power8, it does do the masking. Thus it will access the correct element.
The code for -mcpu=power9 is missing the masking. So when it is given an index
of 25, it randomly indexes outside of the vector, picking up whatever 2 byte
value is is after the vector in question. The test in particular checks to see
that the element #1 is return, and fails at execution time.
The patch for PR target/94557, provides this masking when the optimization is
done.
Now, what happened with the backport of the 93932 patch, is without the 93932
patch, it always called rs6000_split_vec_extract_var which does the masking.
The master branch now splits this into 2 cases, extracting from a vector in a
register calls rs6000_split_vec_extract_var, but extracting a scalar from a
vector in memory calls the lower level function rs6000_adjust_vec_address. In
the master branch, as part of the PC-relative support, the
rs6000_adjust_vec_address function now does the masking. In GCC 9, we were
relying on the masking being done in rs6000_split_vec_extract_var before it
calls rs6000_adjust_vec_address.
And if you remember, part of the issue with master patches in that area is you
pointed out that we were modifying an operand to do the masking.
> > --- /tmp/4XFFqK_rs6000.c2020-04-13 15:28:33.514011024 -0500
> > +++ gcc/config/rs6000/rs6000.c 2020-04-13 14:24:01.296932921 -0500
> > @@ -7047,18 +7047,25 @@ rs6000_adjust_vec_address (rtx scalar_re
> > element_offset = GEN_INT (INTVAL (element) * scalar_size);
> >else
> > {
> > + /* Mask the element to make sure the element number is betw