On Tue, Apr 14, 2020 at 07:39:44AM -0500, Segher Boessenkool wrote:
> Hi,
> 
> On Mon, Apr 13, 2020 at 10:24:39PM -0400, Michael Meissner wrote:
> > This patch fixes PR target/94557, on PowerPC.  It was a regression on the 
> > GCC 9
> > branch caused by my recent patch for PR target/93932.
> > 
> > The patch for 93932 was a back port from the master branch of a fix for the
> > vec_extract built-in function.  This patch fixes the case where we are
> > optimizing a vector extract from an vector in memory to be a scalar load
> > instead of loading the vector to a register and then doing an extract.
> 
> What does "this patch" mean?  The backport?
> 
> > This patch adds in the masking of the vector index that is in the master
> > branch.  I re-implemented the change for GCC 9, since the changes on the 
> > master
> > branch are more extensive, and include PC-relative support that is not in 
> > GCC
> > 9.
> 
> Hrm.
> 
> > 2020-04-13  Michael Meissner  <meiss...@linux.ibm.com>
> > 
> >     PR target/94557
> >     * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Mask
> >     variable vector extract index so it does not go beyond the vector
> >     when extracting a vector element from memory.  This change is a
> >     simplification of the 2020-02-03 patch that went into the trunk.
> 
> You have no patches go into trunk at that date.

Lets see if I can clarify things.

One of the examples comes from vsx-builtin-10a.c.  It has the following code:

#define CONST0          (0)
#define CONST1          (1)
#define CONST2          (2)
#define CONST3          (3)
#define CONST4          (4)
#define CONST5          (5)
#define CONST6          (6)
#define CONST7          (7)

__attribute__((noinline))
short mci (vector short *vp, int i)
{
  return __builtin_vec_ext_v8hi (*vp, i);
}

int main (int argc, short *argv[]) {
  vector short sv = {
    CONST0, CONST1, CONST2, CONST3, CONST4, CONST5, CONST6, CONST7 };
  short s;

  /* ... */

  s = mci (&sv, 25);
  if (s != CONST1)
    abort ();
}

The current trunk generates the correct code for both -mcpu=power8 and power9

        power8:                         power9:
mci:                            mci:
        rldicl 9,4,0,61                 rldicl 9,4,0,61
        rldicr 3,3,0,59                 sldi 9,9,1
        sldi 9,9,1                      lhzx 4,3,9
        lhzx 4,3,9                      extsh 3,4
        extsh 3,4                       blr
        blr

The 'rldicr 3,3,0,59' is presumably due to alignment issues in vectors between
p8 and p9.  The important line is the 'rldicl 9,4,0,61' which truncates the
variable vector index to be between 0..7.  The test that fails has an index of
25 (i.e. out of bounds).

The GCC 9 -mcpu=power9 code does the optimization converting vec_extract from a
vector in memory to be a normal load, but the GCC 9 -mcpu=power8 code does not
do this optimization.

        power8:                         power9:
mci:                            mci:                    
        rldicl 9,4,0,61                 sldi 9,4,1
        lvx 0,0,3                       lhzx 4,3,9
        subfic 9,9,7                    extsh 3,4
        sldi 9,9,4                      blr
        mtvsrd 33,9
        xxpermdi 33,33,33,0
        vslo 1,0,1
        mfvsrd 3,33
        srdi 3,3,48
        extsh 3,3
        blr

Notice that even though the optimization of using a scalar load is not done for
-mcpu=power8, it does do the masking.  Thus it will access the correct element.

The code for -mcpu=power9 is missing the masking.  So when it is given an index
of 25, it randomly indexes outside of the vector, picking up whatever 2 byte
value is is after the vector in question.  The test in particular checks to see
that the element #1 is return, and fails at execution time.

The patch for PR target/94557, provides this masking when the optimization is
done.

Now, what happened with the backport of the 93932 patch, is without the 93932
patch, it always called rs6000_split_vec_extract_var which does the masking.

The master branch now splits this into 2 cases, extracting from a vector in a
register calls rs6000_split_vec_extract_var, but extracting a scalar from a
vector in memory calls the lower level function rs6000_adjust_vec_address.  In
the master branch, as part of the PC-relative support, the
rs6000_adjust_vec_address function now does the masking.  In GCC 9, we were
relying on the masking being done in rs6000_split_vec_extract_var before it
calls rs6000_adjust_vec_address.

And if you remember, part of the issue with master patches in that area is you
pointed out that we were modifying an operand to do the masking.

> > --- /tmp/4XFFqK_rs6000.c    2020-04-13 15:28:33.514011024 -0500
> > +++ gcc/config/rs6000/rs6000.c      2020-04-13 14:24:01.296932921 -0500
> > @@ -7047,18 +7047,25 @@ rs6000_adjust_vec_address (rtx scalar_re
> >      element_offset = GEN_INT (INTVAL (element) * scalar_size);
> >    else
> >      {
> > +      /* Mask the element to make sure the element number is between 0 and 
> > the
> > +    maximum number of elements - 1 so that we don't generate an address
> > +    outside the vector.  */
> 
> The patch introducing this to trunk was reverted.

There were several patches.  One part was reverted but not the part that did
the masking.  Here is the git blame code:

e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6745) 
get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6746) {
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6747)   if 
(CONST_INT_P (element))
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6748)     return 
GEN_INT (INTVAL (element) * scalar_size);
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6749) 
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6750)   /* All insns 
should use the 'Q' constraint (address is a single register) if
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6751)      the 
element number is not a constant.  */
d10cddeaad (Michael Meissner    2020-02-05 16:45:05 -0500  6752)   gcc_assert 
(satisfies_constraint_Q (mem));
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6753) 
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6754)   /* Mask the 
element to make sure the element number is between 0 and the
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6755)      maximum 
number of elements - 1 so that we don't generate an address
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6756)      outside 
the vector.  */
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6757)   rtx 
num_ele_m1 = GEN_INT (GET_MODE_NUNITS (GET_MODE (mem)) - 1);
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6758)   rtx and_op = 
gen_rtx_AND (Pmode, element, num_ele_m1);
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6759)   emit_insn 
(gen_rtx_SET (base_tmp, and_op));
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6760) 
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6761)   /* Shift the 
element to get the byte offset from the element number.  */
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6762)   int shift = 
exact_log2 (scalar_size);
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6763)   gcc_assert 
(shift >= 0);
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6764) 
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6765)   if (shift > 
0)
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6766)     {
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6767)       rtx 
shift_op = gen_rtx_ASHIFT (Pmode, base_tmp, GEN_INT (shift));
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6768)       
emit_insn (gen_rtx_SET (base_tmp, shift_op));
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6769)     }
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6770) 
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6771)   return 
base_tmp;
e7f3e07528 (Michael Meissner    2020-02-03 17:57:57 -0500  6772) }

The part that got replaced was the assertion which had used the wrong argument.
But the part that did the masking was not reverted.

Because we don't support the PC-relative addressing in GCC 9, I made the
decision to do the masking locally in rs6000_adjust_vec_address, rather than
replicate the get_vector_offset function and do the mask.  But I used the
temporary register for the masking, rather than modifying the parameter marked
as input-only (i.e. fixing the bug you pointed out).  The reason for doing the
masking did come from the master branch, even if I didn't use the code
directly.

> 
> 
> I don't get the warm and fuzzies from this patch.  How can I expect that
> it won't cause more problems than it solves, like this?  Reverts and new
> implementations, and also mislabeled?
> 
> Please explain this better?
> 
> 
> Segher

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Reply via email to