On Tue, Apr 14, 2020 at 07:39:44AM -0500, Segher Boessenkool wrote: > Hi, > > On Mon, Apr 13, 2020 at 10:24:39PM -0400, Michael Meissner wrote: > > This patch fixes PR target/94557, on PowerPC. It was a regression on the > > GCC 9 > > branch caused by my recent patch for PR target/93932. > > > > The patch for 93932 was a back port from the master branch of a fix for the > > vec_extract built-in function. This patch fixes the case where we are > > optimizing a vector extract from an vector in memory to be a scalar load > > instead of loading the vector to a register and then doing an extract. > > What does "this patch" mean? The backport? > > > This patch adds in the masking of the vector index that is in the master > > branch. I re-implemented the change for GCC 9, since the changes on the > > master > > branch are more extensive, and include PC-relative support that is not in > > GCC > > 9. > > Hrm. > > > 2020-04-13 Michael Meissner <meiss...@linux.ibm.com> > > > > PR target/94557 > > * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Mask > > variable vector extract index so it does not go beyond the vector > > when extracting a vector element from memory. This change is a > > simplification of the 2020-02-03 patch that went into the trunk. > > You have no patches go into trunk at that date.
Lets see if I can clarify things. One of the examples comes from vsx-builtin-10a.c. It has the following code: #define CONST0 (0) #define CONST1 (1) #define CONST2 (2) #define CONST3 (3) #define CONST4 (4) #define CONST5 (5) #define CONST6 (6) #define CONST7 (7) __attribute__((noinline)) short mci (vector short *vp, int i) { return __builtin_vec_ext_v8hi (*vp, i); } int main (int argc, short *argv[]) { vector short sv = { CONST0, CONST1, CONST2, CONST3, CONST4, CONST5, CONST6, CONST7 }; short s; /* ... */ s = mci (&sv, 25); if (s != CONST1) abort (); } The current trunk generates the correct code for both -mcpu=power8 and power9 power8: power9: mci: mci: rldicl 9,4,0,61 rldicl 9,4,0,61 rldicr 3,3,0,59 sldi 9,9,1 sldi 9,9,1 lhzx 4,3,9 lhzx 4,3,9 extsh 3,4 extsh 3,4 blr blr The 'rldicr 3,3,0,59' is presumably due to alignment issues in vectors between p8 and p9. The important line is the 'rldicl 9,4,0,61' which truncates the variable vector index to be between 0..7. The test that fails has an index of 25 (i.e. out of bounds). The GCC 9 -mcpu=power9 code does the optimization converting vec_extract from a vector in memory to be a normal load, but the GCC 9 -mcpu=power8 code does not do this optimization. power8: power9: mci: mci: rldicl 9,4,0,61 sldi 9,4,1 lvx 0,0,3 lhzx 4,3,9 subfic 9,9,7 extsh 3,4 sldi 9,9,4 blr mtvsrd 33,9 xxpermdi 33,33,33,0 vslo 1,0,1 mfvsrd 3,33 srdi 3,3,48 extsh 3,3 blr Notice that even though the optimization of using a scalar load is not done for -mcpu=power8, it does do the masking. Thus it will access the correct element. The code for -mcpu=power9 is missing the masking. So when it is given an index of 25, it randomly indexes outside of the vector, picking up whatever 2 byte value is is after the vector in question. The test in particular checks to see that the element #1 is return, and fails at execution time. The patch for PR target/94557, provides this masking when the optimization is done. Now, what happened with the backport of the 93932 patch, is without the 93932 patch, it always called rs6000_split_vec_extract_var which does the masking. The master branch now splits this into 2 cases, extracting from a vector in a register calls rs6000_split_vec_extract_var, but extracting a scalar from a vector in memory calls the lower level function rs6000_adjust_vec_address. In the master branch, as part of the PC-relative support, the rs6000_adjust_vec_address function now does the masking. In GCC 9, we were relying on the masking being done in rs6000_split_vec_extract_var before it calls rs6000_adjust_vec_address. And if you remember, part of the issue with master patches in that area is you pointed out that we were modifying an operand to do the masking. > > --- /tmp/4XFFqK_rs6000.c 2020-04-13 15:28:33.514011024 -0500 > > +++ gcc/config/rs6000/rs6000.c 2020-04-13 14:24:01.296932921 -0500 > > @@ -7047,18 +7047,25 @@ rs6000_adjust_vec_address (rtx scalar_re > > element_offset = GEN_INT (INTVAL (element) * scalar_size); > > else > > { > > + /* Mask the element to make sure the element number is between 0 and > > the > > + maximum number of elements - 1 so that we don't generate an address > > + outside the vector. */ > > The patch introducing this to trunk was reverted. There were several patches. One part was reverted but not the part that did the masking. Here is the git blame code: e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6745) get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6746) { e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6747) if (CONST_INT_P (element)) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6748) return GEN_INT (INTVAL (element) * scalar_size); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6749) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6750) /* All insns should use the 'Q' constraint (address is a single register) if e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6751) the element number is not a constant. */ d10cddeaad (Michael Meissner 2020-02-05 16:45:05 -0500 6752) gcc_assert (satisfies_constraint_Q (mem)); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6753) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6754) /* Mask the element to make sure the element number is between 0 and the e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6755) maximum number of elements - 1 so that we don't generate an address e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6756) outside the vector. */ e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6757) rtx num_ele_m1 = GEN_INT (GET_MODE_NUNITS (GET_MODE (mem)) - 1); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6758) rtx and_op = gen_rtx_AND (Pmode, element, num_ele_m1); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6759) emit_insn (gen_rtx_SET (base_tmp, and_op)); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6760) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6761) /* Shift the element to get the byte offset from the element number. */ e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6762) int shift = exact_log2 (scalar_size); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6763) gcc_assert (shift >= 0); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6764) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6765) if (shift > 0) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6766) { e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6767) rtx shift_op = gen_rtx_ASHIFT (Pmode, base_tmp, GEN_INT (shift)); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6768) emit_insn (gen_rtx_SET (base_tmp, shift_op)); e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6769) } e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6770) e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6771) return base_tmp; e7f3e07528 (Michael Meissner 2020-02-03 17:57:57 -0500 6772) } The part that got replaced was the assertion which had used the wrong argument. But the part that did the masking was not reverted. Because we don't support the PC-relative addressing in GCC 9, I made the decision to do the masking locally in rs6000_adjust_vec_address, rather than replicate the get_vector_offset function and do the mask. But I used the temporary register for the masking, rather than modifying the parameter marked as input-only (i.e. fixing the bug you pointed out). The reason for doing the masking did come from the master branch, even if I didn't use the code directly. > > > I don't get the warm and fuzzies from this patch. How can I expect that > it won't cause more problems than it solves, like this? Reverts and new > implementations, and also mislabeled? > > Please explain this better? > > > Segher -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797