Re: [PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support

2016-05-16 Thread Michael Meissner
On Fri, May 13, 2016 at 08:23:16PM -0500, Segher Boessenkool wrote:
> On Fri, May 13, 2016 at 07:25:43PM -0400, Michael Meissner wrote:
> > This patch adds support for the 32-bit word splat instructions, the byte
> > immediate splat instructions, and the vector sign extend instructions to GCC
> > 7.0.
> > 
> > In addition to the various splat instructions, since I was modifying the 
> > vector
> > moves, I took the opportunity to reorganize the vector move instructions 
> > with
> > several changes I've wanted to do:
> 
> It is much easier to review, and for regression searches later, if one
> patch does one thing.  No need to change this patch, but please keep
> it in mind for later patches.

Yes and no.  Sometimes, you need to make larger changes.

> > Are these patches ok to apply to the GCC 7.0 trunk?
> 
> Changelog is missing.

Sorry about that.  I was focused on including the patches this time, that I
forgot to include the ChangeLog entries.

Unfortunately, I'm finding some regressions that seem to show up with more
recent trunk changes, particularly with reload than LRA.  I'm going to have to
spend some time debugging why reload is failing on big endian systems, and
resubmit the patches when it is debugged.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support

2016-05-13 Thread Segher Boessenkool
On Fri, May 13, 2016 at 07:25:43PM -0400, Michael Meissner wrote:
> This patch adds support for the 32-bit word splat instructions, the byte
> immediate splat instructions, and the vector sign extend instructions to GCC
> 7.0.
> 
> In addition to the various splat instructions, since I was modifying the 
> vector
> moves, I took the opportunity to reorganize the vector move instructions with
> several changes I've wanted to do:

It is much easier to review, and for regression searches later, if one
patch does one thing.  No need to change this patch, but please keep
it in mind for later patches.

> I replaced the single move that handled all vector types with separate 32-bit
> and 64-bit moves.  I also combined the movti_ moves (when -mvsx-timode is
> in effect) into the main vector moves.
> 
> I eliminated separate moves for  , where the preferred register 
> class
> () is listed first, and the secondary register class () is listed
> second with a '?' to discourage use.
> 
> Prefer loading 0/-1 in any VSX register for ISA 3.0, and Altivec registers for
> ISA 2.06/2.07 (PR target/70915) so that if the register was involved in a slow
> operation, the clear/set operation does not wait for the slow operation to
> finish.
> 
> I adjusted the length attributes for 32-bit mode.
> 
> I changed the 32-bit move to use rs6000_output_move_128bit and drop the use of
> the string instructions for 32-bit movti when -mvsx-timode is in effect.
> 
> I used spacing so that the alternatives and attributes don't generate long
> lines, and put things in columns, so that it is easier to match up the 
> operands
> and attributes with the insn alternatives.
> 
> I did a spec 2006 run comparing these changes to the trunk, and there were no
> significant differences in terms of performances.
> 
> Are these patches ok to apply to the GCC 7.0 trunk?

Changelog is missing.

> +;; Generate the XXORC instruction which was added in ISA 2.07
> +(define_constraint "wM"
> +  "vector constant with all 1's, ISA 2.07 and above"
> +  (and (match_test "TARGET_P8_VECTOR")
> +   (match_operand 0 "all_ones_constant")))

That's not a very good comment; a constraint doesn't generate anything.
"Used for", maybe?

> +(define_predicate "xxspltib_constant_split"
> +  (match_code "const_vector,vec_duplicate,const_int")
> +{
> +  int value  = 256;
> +  int num_insns  = -1;

No tabs please, just a single space will do.

> +   rtx cvt  = ((TARGET_XSCVDPSPN)
> +   ? gen_vsx_xscvdpspn_scalar (freg, sreg)
> +   : gen_vsx_xscvdpsp_scalar (freg, sreg));

No parentheses around TARGET_XSCVDPSPN.

> +(define_insn "xxspltib_v16qi"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> + (vec_duplicate:V16QI (match_operand:SI 1 "s8bit_cint_operand" "i")))]
> +  "TARGET_P9_VECTOR"
>  {
> -  return rs6000_output_move_128bit (operands);
> +  operands[2] = GEN_INT (INTVAL (operands[1]) & 0xff);
> +  return "xxspltib %x0,%2";
>  }

Please use "n" instead of "i"?  It shouldn't matter here, but at least it's
good documentation.

> +;;  VSX store   VSX load
> +;;   VSX movedirect move
> +;;   direct move GPR store
> +;;   GPR loadGPR move
> +;;   P9 const.   AVX const.
> +;;  P9 const.   0
> +;;   -1  GPR const.
> +;;  AVX store   AVX load

"AVX"?  Heh.  AltiVec or VMX.

> +(define_insn "*vsx_mov_64bit"
> +  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=wOZ,,
> +   ,   *r,
> +   *wo, $Y,
> +   ??r, ??r,
> +   wo,  v,
> +   v,   ?,
> +   ?wh, ??r,
> +   wZ,  v")

This is more readable, excellent.  Would four or so columns work even
better?

> +/* { dg-do compile { target { powerpc64le-*-* } } } */

Does the test not work on BE?

Okay for trunk, with a changelog and the nits fixed.  Thanks,


Segher


[PATCH] Add PowerPC ISA 3.0 word splat and byte immediate splat support

2016-05-13 Thread Michael Meissner
This patch adds support for the 32-bit word splat instructions, the byte
immediate splat instructions, and the vector sign extend instructions to GCC
7.0.

In addition to the various splat instructions, since I was modifying the vector
moves, I took the opportunity to reorganize the vector move instructions with
several changes I've wanted to do:

I replaced the single move that handled all vector types with separate 32-bit
and 64-bit moves.  I also combined the movti_ moves (when -mvsx-timode is
in effect) into the main vector moves.

I eliminated separate moves for  , where the preferred register class
() is listed first, and the secondary register class () is listed
second with a '?' to discourage use.

Prefer loading 0/-1 in any VSX register for ISA 3.0, and Altivec registers for
ISA 2.06/2.07 (PR target/70915) so that if the register was involved in a slow
operation, the clear/set operation does not wait for the slow operation to
finish.

I adjusted the length attributes for 32-bit mode.

I changed the 32-bit move to use rs6000_output_move_128bit and drop the use of
the string instructions for 32-bit movti when -mvsx-timode is in effect.

I used spacing so that the alternatives and attributes don't generate long
lines, and put things in columns, so that it is easier to match up the operands
and attributes with the insn alternatives.

I did a spec 2006 run comparing these changes to the trunk, and there were no
significant differences in terms of performances.

Are these patches ok to apply to the GCC 7.0 trunk?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236136)
+++ gcc/config/rs6000/constraints.md(.../gcc/config/rs6000) (working copy)
@@ -140,6 +140,10 @@ (define_constraint "wD"
   (and (match_code "const_int")
(match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)")))
 
+(define_constraint "wE"
+  "Vector constant that can be loaded with the XXSPLTIB instruction."
+  (match_test "xxspltib_constant_nosplit (op, mode)"))
+
 ;; Extended fusion store
 (define_memory_constraint "wF"
   "Memory operand suitable for power9 fusion load/stores"
@@ -156,6 +160,12 @@ (define_constraint "wL"
(and (match_test "TARGET_DIRECT_MOVE_128")
(match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"
 
+;; Generate the XXORC instruction which was added in ISA 2.07
+(define_constraint "wM"
+  "vector constant with all 1's, ISA 2.07 and above"
+  (and (match_test "TARGET_P8_VECTOR")
+   (match_operand 0 "all_ones_constant")))
+
 ;; ISA 3.0 vector d-form addresses
 (define_memory_constraint "wO"
   "Memory operand suitable for the ISA 3.0 vector d-form instructions."
@@ -166,6 +176,10 @@ (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
   (match_operand 0 "quad_memory_operand"))
 
+(define_constraint "wS"
+  "Vector constant that can be loaded with XXSPLTIB & sign extension."
+  (match_test "xxspltib_constant_split (op, mode)"))
+
 ;; Altivec style load/store that ignores the bottom bits of the address
 (define_memory_constraint "wZ"
   "Indexed or indirect memory operand, ignoring the bottom 4 bits"
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236136)
+++ gcc/config/rs6000/predicates.md (.../gcc/config/rs6000) (working copy)
@@ -565,6 +565,38 @@ (define_predicate "easy_fp_constant"
 }
 })
 
+;; Return 1 if the operand is a CONST_VECTOR or VEC_DUPLICATE of a constant
+;; that can loaded with a XXSPLTIB instruction and then a VUPKHSB, VECSB2W or
+;; VECSB2D instruction.
+
+(define_predicate "xxspltib_constant_split"
+  (match_code "const_vector,vec_duplicate,const_int")
+{
+  int value= 256;
+  int num_insns= -1;
+
+  if (!xxspltib_constant_p (op, mode, &num_insns, &value))
+return false;
+
+  return num_insns > 1;
+})
+
+
+;; Return 1 if the operand is a CONST_VECTOR that can loaded directly with a
+;; XXSPLTIB instruction.
+
+(define_predicate "xxspltib_constant_nosplit"
+  (match_code "const_vector,vec_duplicate,const_int")
+{
+  int value= 256;
+  int num_insns= -1;
+
+  if (!xxspltib_constant_p (op, mode, &num_insns, &value))
+return false;
+
+  return num_insns == 1;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -583,7 +615,14 @@ (define_predicate "easy_vector_constant"
 
   if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode))
 {
-  if (zero_constant (op, mode))