Re: [PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support
On Fri, May 13, 2016 at 08:23:16PM -0500, Segher Boessenkool wrote: > On Fri, May 13, 2016 at 07:25:43PM -0400, Michael Meissner wrote: > > This patch adds support for the 32-bit word splat instructions, the byte > > immediate splat instructions, and the vector sign extend instructions to GCC > > 7.0. > > > > In addition to the various splat instructions, since I was modifying the > > vector > > moves, I took the opportunity to reorganize the vector move instructions > > with > > several changes I've wanted to do: > > It is much easier to review, and for regression searches later, if one > patch does one thing. No need to change this patch, but please keep > it in mind for later patches. Yes and no. Sometimes, you need to make larger changes. > > Are these patches ok to apply to the GCC 7.0 trunk? > > Changelog is missing. Sorry about that. I was focused on including the patches this time, that I forgot to include the ChangeLog entries. Unfortunately, I'm finding some regressions that seem to show up with more recent trunk changes, particularly with reload than LRA. I'm going to have to spend some time debugging why reload is failing on big endian systems, and resubmit the patches when it is debugged. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support
On Fri, May 13, 2016 at 07:25:43PM -0400, Michael Meissner wrote: > This patch adds support for the 32-bit word splat instructions, the byte > immediate splat instructions, and the vector sign extend instructions to GCC > 7.0. > > In addition to the various splat instructions, since I was modifying the > vector > moves, I took the opportunity to reorganize the vector move instructions with > several changes I've wanted to do: It is much easier to review, and for regression searches later, if one patch does one thing. No need to change this patch, but please keep it in mind for later patches. > I replaced the single move that handled all vector types with separate 32-bit > and 64-bit moves. I also combined the movti_ moves (when -mvsx-timode is > in effect) into the main vector moves. > > I eliminated separate moves for , where the preferred register > class > () is listed first, and the secondary register class () is listed > second with a '?' to discourage use. > > Prefer loading 0/-1 in any VSX register for ISA 3.0, and Altivec registers for > ISA 2.06/2.07 (PR target/70915) so that if the register was involved in a slow > operation, the clear/set operation does not wait for the slow operation to > finish. > > I adjusted the length attributes for 32-bit mode. > > I changed the 32-bit move to use rs6000_output_move_128bit and drop the use of > the string instructions for 32-bit movti when -mvsx-timode is in effect. > > I used spacing so that the alternatives and attributes don't generate long > lines, and put things in columns, so that it is easier to match up the > operands > and attributes with the insn alternatives. > > I did a spec 2006 run comparing these changes to the trunk, and there were no > significant differences in terms of performances. > > Are these patches ok to apply to the GCC 7.0 trunk? Changelog is missing. > +;; Generate the XXORC instruction which was added in ISA 2.07 > +(define_constraint "wM" > + "vector constant with all 1's, ISA 2.07 and above" > + (and (match_test "TARGET_P8_VECTOR") > + (match_operand 0 "all_ones_constant"))) That's not a very good comment; a constraint doesn't generate anything. "Used for", maybe? > +(define_predicate "xxspltib_constant_split" > + (match_code "const_vector,vec_duplicate,const_int") > +{ > + int value = 256; > + int num_insns = -1; No tabs please, just a single space will do. > + rtx cvt = ((TARGET_XSCVDPSPN) > + ? gen_vsx_xscvdpspn_scalar (freg, sreg) > + : gen_vsx_xscvdpsp_scalar (freg, sreg)); No parentheses around TARGET_XSCVDPSPN. > +(define_insn "xxspltib_v16qi" > + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") > + (vec_duplicate:V16QI (match_operand:SI 1 "s8bit_cint_operand" "i")))] > + "TARGET_P9_VECTOR" > { > - return rs6000_output_move_128bit (operands); > + operands[2] = GEN_INT (INTVAL (operands[1]) & 0xff); > + return "xxspltib %x0,%2"; > } Please use "n" instead of "i"? It shouldn't matter here, but at least it's good documentation. > +;; VSX store VSX load > +;; VSX movedirect move > +;; direct move GPR store > +;; GPR loadGPR move > +;; P9 const. AVX const. > +;; P9 const. 0 > +;; -1 GPR const. > +;; AVX store AVX load "AVX"? Heh. AltiVec or VMX. > +(define_insn "*vsx_mov_64bit" > + [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=wOZ,, > + , *r, > + *wo, $Y, > + ??r, ??r, > + wo, v, > + v, ?, > + ?wh, ??r, > + wZ, v") This is more readable, excellent. Would four or so columns work even better? > +/* { dg-do compile { target { powerpc64le-*-* } } } */ Does the test not work on BE? Okay for trunk, with a changelog and the nits fixed. Thanks, Segher
[PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support
This patch adds support for the 32-bit word splat instructions, the byte immediate splat instructions, and the vector sign extend instructions to GCC 7.0. In addition to the various splat instructions, since I was modifying the vector moves, I took the opportunity to reorganize the vector move instructions with several changes I've wanted to do: I replaced the single move that handled all vector types with separate 32-bit and 64-bit moves. I also combined the movti_ moves (when -mvsx-timode is in effect) into the main vector moves. I eliminated separate moves for , where the preferred register class () is listed first, and the secondary register class () is listed second with a '?' to discourage use. Prefer loading 0/-1 in any VSX register for ISA 3.0, and Altivec registers for ISA 2.06/2.07 (PR target/70915) so that if the register was involved in a slow operation, the clear/set operation does not wait for the slow operation to finish. I adjusted the length attributes for 32-bit mode. I changed the 32-bit move to use rs6000_output_move_128bit and drop the use of the string instructions for 32-bit movti when -mvsx-timode is in effect. I used spacing so that the alternatives and attributes don't generate long lines, and put things in columns, so that it is easier to match up the operands and attributes with the insn alternatives. I did a spec 2006 run comparing these changes to the trunk, and there were no significant differences in terms of performances. Are these patches ok to apply to the GCC 7.0 trunk? -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236136) +++ gcc/config/rs6000/constraints.md(.../gcc/config/rs6000) (working copy) @@ -140,6 +140,10 @@ (define_constraint "wD" (and (match_code "const_int") (match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)"))) +(define_constraint "wE" + "Vector constant that can be loaded with the XXSPLTIB instruction." + (match_test "xxspltib_constant_nosplit (op, mode)")) + ;; Extended fusion store (define_memory_constraint "wF" "Memory operand suitable for power9 fusion load/stores" @@ -156,6 +160,12 @@ (define_constraint "wL" (and (match_test "TARGET_DIRECT_MOVE_128") (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)" +;; Generate the XXORC instruction which was added in ISA 2.07 +(define_constraint "wM" + "vector constant with all 1's, ISA 2.07 and above" + (and (match_test "TARGET_P8_VECTOR") + (match_operand 0 "all_ones_constant"))) + ;; ISA 3.0 vector d-form addresses (define_memory_constraint "wO" "Memory operand suitable for the ISA 3.0 vector d-form instructions." @@ -166,6 +176,10 @@ (define_memory_constraint "wQ" "Memory operand suitable for the load/store quad instructions" (match_operand 0 "quad_memory_operand")) +(define_constraint "wS" + "Vector constant that can be loaded with XXSPLTIB & sign extension." + (match_test "xxspltib_constant_split (op, mode)")) + ;; Altivec style load/store that ignores the bottom bits of the address (define_memory_constraint "wZ" "Indexed or indirect memory operand, ignoring the bottom 4 bits" Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236136) +++ gcc/config/rs6000/predicates.md (.../gcc/config/rs6000) (working copy) @@ -565,6 +565,38 @@ (define_predicate "easy_fp_constant" } }) +;; Return 1 if the operand is a CONST_VECTOR or VEC_DUPLICATE of a constant +;; that can loaded with a XXSPLTIB instruction and then a VUPKHSB, VECSB2W or +;; VECSB2D instruction. + +(define_predicate "xxspltib_constant_split" + (match_code "const_vector,vec_duplicate,const_int") +{ + int value= 256; + int num_insns= -1; + + if (!xxspltib_constant_p (op, mode, &num_insns, &value)) +return false; + + return num_insns > 1; +}) + + +;; Return 1 if the operand is a CONST_VECTOR that can loaded directly with a +;; XXSPLTIB instruction. + +(define_predicate "xxspltib_constant_nosplit" + (match_code "const_vector,vec_duplicate,const_int") +{ + int value= 256; + int num_insns= -1; + + if (!xxspltib_constant_p (op, mode, &num_insns, &value)) +return false; + + return num_insns == 1; +}) + ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a ;; vector register without using memory. (define_predicate "easy_vector_constant" @@ -583,7 +615,14 @@ (define_predicate "easy_vector_constant" if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)) { - if (zero_constant (op, mode))