Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-08-07 Thread Segher Boessenkool
On Mon, Aug 07, 2017 at 09:18:30AM -0400, Michael Meissner wrote:
> > I don't like using NULL as a magic value at all; it does not simplify
> > this interface, it complicates it instead.
> > 
> > Can you move the "which half is high" decision to the callers?
> 
> I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do
> the calculation of the XXPERMDI mask in each of the vsx_concat__{1,2,3}
> insns.  Just to be sure I got things correct, I wrote a new executable test
> that tests various methods of creating/inserting 2 element vectors with double
> word elements, and tested in BE, LE -maltivec=be, and LE, and the results 
> match
> previous compilers.
> 
> I have done bootstrap/build checks on a big endian power7, a little endian
> power8 system, and I have done a non-bootstrap/check on a power9 prototype (I
> have script issues that prevents a bootstrap build on power9 that I need to
> look into).  There are no regressions in the tests and the new tests were run
> on each of the systems.  Can I check this into the trunk?
> 
> I would also like to backport it to all open branches (particularly GCC 7, but
> GCC 6 if possible).  Note, the patch will need a slight tweak on the older
> systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to 
> adjust
> the constraints to accomidate this, and under GCC 6 DImode not being allowed 
> in
> traditional Altivec registers.

Thanks!  Okay for trunk.  The 7 branch is frozen; okay for 7 after the
release, and for 6 too.


Segher


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-08-07 Thread Michael Meissner
On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > > I think calling this with the rtx elementN args makes this only more
> > > complicated (the function comment doesn't say what they are or what
> > > NULL means, btw).
> 
> You didn't handle the first part of this as far as I see?  It's the
> big complicating issue here.
> 
> > +   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number 
> > to
> > +   use for extracting the 64-bit double word from ARG1.
> > +
> > +   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number 
> > to
> > +   use for extracting the 64-bit double word from ARG2.
> > +
> > +   The element number is based on the user element ordering, set by the
> > +   endianess and by the -maltivec={le,be} options.  */
> 
> ("endianness", two n's).
> 
> I don't like using NULL as a magic value at all; it does not simplify
> this interface, it complicates it instead.
> 
> Can you move the "which half is high" decision to the callers?

I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do
the calculation of the XXPERMDI mask in each of the vsx_concat__{1,2,3}
insns.  Just to be sure I got things correct, I wrote a new executable test
that tests various methods of creating/inserting 2 element vectors with double
word elements, and tested in BE, LE -maltivec=be, and LE, and the results match
previous compilers.

I have done bootstrap/build checks on a big endian power7, a little endian
power8 system, and I have done a non-bootstrap/check on a power9 prototype (I
have script issues that prevents a bootstrap build on power9 that I need to
look into).  There are no regressions in the tests and the new tests were run
on each of the systems.  Can I check this into the trunk?

I would also like to backport it to all open branches (particularly GCC 7, but
GCC 6 if possible).  Note, the patch will need a slight tweak on the older
systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to adjust
the constraints to accomidate this, and under GCC 6 DImode not being allowed in
traditional Altivec registers.

[gcc]
2017-08-07  Michael Meissner  

PR target/81593
* config/rs6000/vsx.md (vsx_concat_, VSX_D): Cleanup
constraints since the -mupper-regs-* switches have been
eliminated.
(vsx_concat__1): New combiner insns to recognize inserting
into a vector from a double word element that was extracted from
another vector, and eliminate extra XXPERMDI instructions.
(vsx_concat__2): Likewise.
(vsx_concat__3): Likewise.
(vsx_set_, VSX_D): Rewrite vector set in terms of vector
concat to allow optimizing inserts from previous extracts.

[gcc/testsuite]
2017-08-07  Michael Meissner  

PR target/81593
* gcc.target/powerpc/vec-setup.h: New tests to test various
combinations of setting up vectors of 2 double word elements.
* gcc.target/powerpc/vec-setup-long.c: Likewise.
* gcc.target/powerpc/vec-setup-double.c: Likewise.
* gcc.target/powerpc/vec-setup-be-long.c: Likewise.
* gcc.target/powerpc/vec-setup-be-double.c: Likewise.
* gcc.target/powerpc/vsx-extract-6.c: New tests for optimzing
vector inserts from vector extracts.
* gcc.target/powerpc/vsx-extract-7.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 250858)
+++ gcc/config/rs6000/vsx.md(.../gcc/config/rs6000) (working copy)
@@ -2364,10 +2364,10 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_"
-  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
(vec_concat:VSX_D
-(match_operand: 1 "gpc_reg_operand" ",b")
-(match_operand: 2 "gpc_reg_operand" ",b")))]
+(match_operand: 1 "gpc_reg_operand" "wa,b")
+(match_operand: 2 "gpc_reg_operand" "wa,b")))]
   "VECTOR_MEM_VSX_P (mode)"
 {
   if (which_alternative == 0)
@@ -2385,6 +2385,80 @@ (define_insn "vsx_concat_"
 }
   [(set_attr "type" "vecperm")])
 
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; word element in a vector register.
+(define_insn "*vsx_concat__1"
+  [(set 

Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-08-03 Thread Michael Meissner
On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > > I think calling this with the rtx elementN args makes this only more
> > > complicated (the function comment doesn't say what they are or what
> > > NULL means, btw).
> 
> You didn't handle the first part of this as far as I see?  It's the
> big complicating issue here.

I am not sure exactly what you are asking for.  This is like the other output
functions that take the rtx insns.

> > +   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number 
> > to
> > +   use for extracting the 64-bit double word from ARG1.
> > +
> > +   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number 
> > to
> > +   use for extracting the 64-bit double word from ARG2.
> > +
> > +   The element number is based on the user element ordering, set by the
> > +   endianess and by the -maltivec={le,be} options.  */
> 
> ("endianness", two n's).
> 
> I don't like using NULL as a magic value at all; it does not simplify
> this interface, it complicates it instead.
>
> Can you move the "which half is high" decision to the callers?

And then essentially there is no need for the function, since each of the 4
concat variants have to have the logic to support big endian, -maltivec=be, and
-maltivec=le.

Let me see what I can do about it.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-08-03 Thread Segher Boessenkool
Hi Mike,

On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > I think calling this with the rtx elementN args makes this only more
> > complicated (the function comment doesn't say what they are or what
> > NULL means, btw).

You didn't handle the first part of this as far as I see?  It's the
big complicating issue here.

> +   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
> +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> +   use for extracting the 64-bit double word from ARG1.
> +
> +   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
> +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> +   use for extracting the 64-bit double word from ARG2.
> +
> +   The element number is based on the user element ordering, set by the
> +   endianess and by the -maltivec={le,be} options.  */

("endianness", two n's).

I don't like using NULL as a magic value at all; it does not simplify
this interface, it complicates it instead.

Can you move the "which half is high" decision to the callers?


Segher


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-08-02 Thread Michael Meissner
On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?

...

> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).

...

> In this and the other testcase, should you test no other insns at all
> are generated?

Here are the revised patches.  I tested on a little endian power8 system and a
big endian power7 system.  Are these patches ok for the trunk?

[gcc]
2017-08-02  Michael Meissner  

PR target/81593
* config/rs6000/rs6000-protos.h (rs6000_output_xxpermdi): New
declaration.
* config/rs6000/rs6000.c (rs6000_output_xxpermdi): New function to
emit XXPERMDI accessing either double word in either vector
register inputs.
* config/rs6000/vsx.md (vsx_concat_, VSX_D iterator):
Rewrite VEC_CONCAT insn to call rs6000_output_xxpermdi.  Simplify
the constraints with the removal of the -mupper-regs-* switches.
(vsx_concat__1): New combiner insns to optimize CONCATs
where either register might have come from VEC_SELECT.
(vsx_concat__2): Likewise.
(vsx_concat__3): Likewise.
(vsx_set_, VSX_D iterator): Rewrite insn to generate a
VEC_CONCAT rather than use an UNSPEC to specify the option.

[gcc/testsuite]
2017-08-02  Michael Meissner  

PR target/81593
* gcc.target/powerpc/vsx-extract-6.c: New test.
* gcc.target/powerpc/vsx-extract-7.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 250793)
+++ gcc/config/rs6000/rs6000-protos.h   (.../gcc/config/rs6000) (working copy)
@@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre
   const char *label);
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 const char *label);
+extern const char *rs6000_output_xxpermdi (rtx, rtx, rtx, rtx, rtx);
 
 /* Declare functions in rs6000-c.c */
 
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 250793)
+++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000) (working copy)
@@ -39007,6 +39007,60 @@ rs6000_optab_supported_p (int op, machin
   return true;
 }
 }
+
+
+/* Output a xxpermdi instruction that sets a 128-bit vector DEST combining two
+   inputs SRC1 and SRC2.
+
+   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
+   non-NULL, it is a 0 or 1 constant that gives the vector element number to
+   use for extracting the 64-bit double word from ARG1.
+
+   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
+   non-NULL, it is a 0 or 1 constant that gives the vector element number to
+   use for extracting the 64-bit double word from ARG2.
+
+   The element number is based on the user element ordering, set by the
+   endianess and by the -maltivec={le,be} options.  */
+
+const char *
+rs6000_output_xxpermdi (rtx dest,
+   rtx src1,
+   rtx src2,
+   rtx element1,
+   rtx element2)
+{
+  int op1_dword = (!element1) ? 0 : INTVAL (element1);
+  int op2_dword = (!element2) ? 0 : INTVAL (element2);
+  rtx xops[10];
+  const char *insn_string;
+
+  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
+  xops[0] = dest;
+  xops[1] = src1;
+  xops[2] = src2;
+
+  if (BYTES_BIG_ENDIAN)
+{
+  xops[3] = GEN_INT (2*op1_dword + op2_dword);
+  insn_string = "xxpermdi %x0,%x1,%x2,%3";
+}
+  else
+{
+  if (element1)
+   op1_dword = 1 - op1_dword;
+
+  if (element2)
+   op2_dword = 1 - op2_dword;
+
+  xops[3] = GEN_INT (op1_dword + 2*op2_dword);
+  insn_string = "xxpermdi %x0,%x2,%x1,%3";
+}
+
+  output_asm_insn (insn_string, xops);
+  return "";
+}
+
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 250793)
+++ 

Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-31 Thread Bill Schmidt

> On Jul 31, 2017, at 12:40 PM, Michael Meissner  
> wrote:
> 
> On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
 (define_insn "vsx_concat_"
 -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
 +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
(vec_concat:VSX_D
 -   (match_operand: 1 "gpc_reg_operand" ",b")
 -   (match_operand: 2 "gpc_reg_operand" ",b")))]
 +   (match_operand: 1 "gpc_reg_operand" "wa,b")
 +   (match_operand: 2 "gpc_reg_operand" "wa,b")))]
  "VECTOR_MEM_VSX_P (mode)"
 {
  if (which_alternative == 0)
 -return (BYTES_BIG_ENDIAN
 -  ? "xxpermdi %x0,%x1,%x2,0"
 -  : "xxpermdi %x0,%x2,%x1,0");
 +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
 
  else if (which_alternative == 1)
 -return (BYTES_BIG_ENDIAN
 +return (VECTOR_ELT_ORDER_BIG
? "mtvsrdd %x0,%1,%2"
: "mtvsrdd %x0,%2,%1");
>>> 
>>> This one could be
>>> 
>>> {
>>> if (!BYTES_BIG_ENDIAN)
>> 
>> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general 
>> bitrot associated with -maltivec=be that needs to be addressed, or we need 
>> to give up on it altogether.  Still of two minds about this.)
>> 
>> Bill
> 
> In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
> (I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
> to look at it again.

Hi Mike,

You misunderstand me, I think you had it right (you did move to 
VECTOR_ELT_ORDER_BIG here)
but I just wanted to clarify that Segher's suggestion would also need to use 
VECTOR_ELT_ORDER_BIG.

Thanks,
Bill

> 
> -- 
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-31 Thread Michael Meissner
On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
> >> (define_insn "vsx_concat_"
> >> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
> >> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> >>(vec_concat:VSX_D
> >> -   (match_operand: 1 "gpc_reg_operand" ",b")
> >> -   (match_operand: 2 "gpc_reg_operand" ",b")))]
> >> +   (match_operand: 1 "gpc_reg_operand" "wa,b")
> >> +   (match_operand: 2 "gpc_reg_operand" "wa,b")))]
> >>   "VECTOR_MEM_VSX_P (mode)"
> >> {
> >>   if (which_alternative == 0)
> >> -return (BYTES_BIG_ENDIAN
> >> -  ? "xxpermdi %x0,%x1,%x2,0"
> >> -  : "xxpermdi %x0,%x2,%x1,0");
> >> +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >> 
> >>   else if (which_alternative == 1)
> >> -return (BYTES_BIG_ENDIAN
> >> +return (VECTOR_ELT_ORDER_BIG
> >>? "mtvsrdd %x0,%1,%2"
> >>: "mtvsrdd %x0,%2,%1");
> > 
> > This one could be
> > 
> > {
> >  if (!BYTES_BIG_ENDIAN)
> 
> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general 
> bitrot associated with -maltivec=be that needs to be addressed, or we need to 
> give up on it altogether.  Still of two minds about this.)
> 
> Bill

In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
(I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
to look at it again.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-31 Thread Michael Meissner
On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> > This patches optimizes the PowerPC vector set operation for 64-bit doubles 
> > and
> > longs where the elements in the vector set may have been extracted from 
> > another
> > vector (PR target/81593):
> 
> > * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> > emit XXPERMDI accessing either double word in either vector
> > register inputs.
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?

Yeah, I should have used rs6000_output_xxpermdi or similar (or output_xxpermdi,
etc.), which is what the other functions used.

> > +/* Emit a XXPERMDI instruction that can extract from either double word of 
> > the
> > +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 
> > giving
> > +   which double word to be used for the operand.  */
> > +
> > +const char *
> > +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> > +{
> > +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
> > +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
> > +
> > +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> > +
> > +  if (BYTES_BIG_ENDIAN)
> > +{
> > +  operands[3] = GEN_INT (2*op1_dword + op2_dword);
> > +  return "xxpermdi %x0,%x1,%x2,%3";
> > +}
> > +  else
> > +{
> > +  if (element1)
> > +   op1_dword = 1 - op1_dword;
> > +
> > +  if (element2)
> > +   op2_dword = 1 - op2_dword;
> > +
> > +  operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> > +  return "xxpermdi %x0,%x2,%x1,%3";
> > +}
> > +}
> 
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).

Ok, let me think on it.

> 
> >  (define_insn "vsx_concat_"
> > -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
> > +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> > (vec_concat:VSX_D
> > -(match_operand: 1 "gpc_reg_operand" ",b")
> > -(match_operand: 2 "gpc_reg_operand" ",b")))]
> > +(match_operand: 1 "gpc_reg_operand" "wa,b")
> > +(match_operand: 2 "gpc_reg_operand" "wa,b")))]
> >"VECTOR_MEM_VSX_P (mode)"
> >  {
> >if (which_alternative == 0)
> > -return (BYTES_BIG_ENDIAN
> > -   ? "xxpermdi %x0,%x1,%x2,0"
> > -   : "xxpermdi %x0,%x2,%x1,0");
> > +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >  
> >else if (which_alternative == 1)
> > -return (BYTES_BIG_ENDIAN
> > +return (VECTOR_ELT_ORDER_BIG
> > ? "mtvsrdd %x0,%1,%2"
> > : "mtvsrdd %x0,%2,%1");
> 
> This one could be
> 
> {
>   if (!BYTES_BIG_ENDIAN)
> std::swap (operands[1], operands[2]);
> 
>   switch (which_alternative)
> {
> case 0:
>   return "xxpermdi %x0,%x1,%x2,0";
> case 1:
>   return "mtvsrdd %x0,%1,%2";
> default:
>   gcc_unreachable ();
> }
> }

> (Could/should we use xxmrghd instead?  Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)

For me no, xxpermdi is clearer.  But if you want xxmrghd, I can do it.

> > --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
> > (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)
> >  (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
> > (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)  (revision 250640)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +vector double
> > +test_vpasted (vector double high, vector double low)
> > +{
> > +  vector double res;
> > +  res[1] = high[1];
> > +  res[0] = low[0];
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
> 
> In this and the other testcase, should you test no other insns at all
> are generated?

It is kind of hard to test a negative, without trying to guess what possible
instructions could be generated.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-30 Thread Bill Schmidt

> On Jul 28, 2017, at 4:08 PM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
>> This patches optimizes the PowerPC vector set operation for 64-bit doubles 
>> and
>> longs where the elements in the vector set may have been extracted from 
>> another
>> vector (PR target/81593):
> 
>>  * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>>  emit XXPERMDI accessing either double word in either vector
>>  register inputs.
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?
> 
>> +/* Emit a XXPERMDI instruction that can extract from either double word of 
>> the
>> +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 
>> giving
>> +   which double word to be used for the operand.  */
>> +
>> +const char *
>> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
>> +{
>> +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
>> +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
>> +
>> +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
>> +
>> +  if (BYTES_BIG_ENDIAN)
>> +{
>> +  operands[3] = GEN_INT (2*op1_dword + op2_dword);
>> +  return "xxpermdi %x0,%x1,%x2,%3";
>> +}
>> +  else
>> +{
>> +  if (element1)
>> +op1_dword = 1 - op1_dword;
>> +
>> +  if (element2)
>> +op2_dword = 1 - op2_dword;
>> +
>> +  operands[3] = GEN_INT (op1_dword + 2*op2_dword);
>> +  return "xxpermdi %x0,%x2,%x1,%3";
>> +}
>> +}
> 
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).
> 
>> (define_insn "vsx_concat_"
>> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>>  (vec_concat:VSX_D
>> - (match_operand: 1 "gpc_reg_operand" ",b")
>> - (match_operand: 2 "gpc_reg_operand" ",b")))]
>> + (match_operand: 1 "gpc_reg_operand" "wa,b")
>> + (match_operand: 2 "gpc_reg_operand" "wa,b")))]
>>   "VECTOR_MEM_VSX_P (mode)"
>> {
>>   if (which_alternative == 0)
>> -return (BYTES_BIG_ENDIAN
>> -? "xxpermdi %x0,%x1,%x2,0"
>> -: "xxpermdi %x0,%x2,%x1,0");
>> +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>> 
>>   else if (which_alternative == 1)
>> -return (BYTES_BIG_ENDIAN
>> +return (VECTOR_ELT_ORDER_BIG
>>  ? "mtvsrdd %x0,%1,%2"
>>  : "mtvsrdd %x0,%2,%1");
> 
> This one could be
> 
> {
>  if (!BYTES_BIG_ENDIAN)

!VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general 
bitrot associated with -maltivec=be that needs to be addressed, or we need to 
give up on it altogether.  Still of two minds about this.)

Bill

>std::swap (operands[1], operands[2]);
> 
>  switch (which_alternative)
>{
>case 0:
>  return "xxpermdi %x0,%x1,%x2,0";
>case 1:
>  return "mtvsrdd %x0,%1,%2";
>default:
>  gcc_unreachable ();
>}
> }
> 
> (Could/should we use xxmrghd instead?  Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)
> 
>> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c 
>> (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)
>>  (revision 0)
>> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c 
>> (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)  (revision 250640)
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile { target { powerpc*-*-* } } } */
>> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
>> +/* { dg-options "-O2 -mvsx" } */
>> +
>> +vector double
>> +test_vpasted (vector double high, vector double low)
>> +{
>> +  vector double res;
>> +  res[1] = high[1];
>> +  res[0] = low[0];
>> +  return res;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
> 
> In this and the other testcase, should you test no other insns at all
> are generated?
> 
> 
> Segher
> 



Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-28 Thread Segher Boessenkool
Hi!

On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from 
> another
> vector (PR target/81593):

>   * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>   emit XXPERMDI accessing either double word in either vector
>   register inputs.

"emit" is not a good name for this: that is generally used for something
that does emit_insn, i.e. put an insn in the instruction stream.  This
function returns a string a define_insn can return.  For the rl* insns
I called the similar functions rs6000_insn_for_*, maybe something like
that is better here?

> +/* Emit a XXPERMDI instruction that can extract from either double word of 
> the
> +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 
> giving
> +   which double word to be used for the operand.  */
> +
> +const char *
> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> +{
> +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
> +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
> +
> +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> +
> +  if (BYTES_BIG_ENDIAN)
> +{
> +  operands[3] = GEN_INT (2*op1_dword + op2_dword);
> +  return "xxpermdi %x0,%x1,%x2,%3";
> +}
> +  else
> +{
> +  if (element1)
> + op1_dword = 1 - op1_dword;
> +
> +  if (element2)
> + op2_dword = 1 - op2_dword;
> +
> +  operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> +  return "xxpermdi %x0,%x2,%x1,%3";
> +}
> +}

I think calling this with the rtx elementN args makes this only more
complicated (the function comment doesn't say what they are or what
NULL means, btw).

>  (define_insn "vsx_concat_"
> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>   (vec_concat:VSX_D
> -  (match_operand: 1 "gpc_reg_operand" ",b")
> -  (match_operand: 2 "gpc_reg_operand" ",b")))]
> +  (match_operand: 1 "gpc_reg_operand" "wa,b")
> +  (match_operand: 2 "gpc_reg_operand" "wa,b")))]
>"VECTOR_MEM_VSX_P (mode)"
>  {
>if (which_alternative == 0)
> -return (BYTES_BIG_ENDIAN
> - ? "xxpermdi %x0,%x1,%x2,0"
> - : "xxpermdi %x0,%x2,%x1,0");
> +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>  
>else if (which_alternative == 1)
> -return (BYTES_BIG_ENDIAN
> +return (VECTOR_ELT_ORDER_BIG
>   ? "mtvsrdd %x0,%1,%2"
>   : "mtvsrdd %x0,%2,%1");

This one could be

{
  if (!BYTES_BIG_ENDIAN)
std::swap (operands[1], operands[2]);

  switch (which_alternative)
{
case 0:
  return "xxpermdi %x0,%x1,%x2,0";
case 1:
  return "mtvsrdd %x0,%1,%2";
default:
  gcc_unreachable ();
}
}

(Could/should we use xxmrghd instead?  Do all supported assemblers know
that extended mnemonic, is it actually more readable?)

> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c  
> (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)
>  (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c  
> (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)  (revision 250640)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +vector double
> +test_vpasted (vector double high, vector double low)
> +{
> +  vector double res;
> +  res[1] = high[1];
> +  res[0] = low[0];
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */

In this and the other testcase, should you test no other insns at all
are generated?


Segher


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-28 Thread Michael Meissner
On Fri, Jul 28, 2017 at 09:51:30AM +0200, Richard Biener wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
>  wrote:
> > This patches optimizes the PowerPC vector set operation for 64-bit doubles 
> > and
> > longs where the elements in the vector set may have been extracted from 
> > another
> > vector (PR target/81593):
> >
> > Here an an example:
> >
> > vector double
> > test_vpasted (vector double high, vector double low)
> > {
> >   vector double res;
> >   res[1] = high[1];
> >   res[0] = low[0];
> >   return res;
> > }
> 
> Interesting.  We expand from
> 
>[100.00%] [count: INV]:
>   _1 = BIT_FIELD_REF ;
>   res_6 = BIT_INSERT_EXPR ;
>   _2 = BIT_FIELD_REF ;
>   res_8 = BIT_INSERT_EXPR ;
>   return res_8;
> 
> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.
> 
> So a match.pd rule would work as well here - your ppc backend patterns
> are v2df specific, right?

Both V2DF and V2DI.

While it would be great to have a machine independent optimization, my patches
would also work for PowerPC specific built-ins for vector extract and vector
insert.

Also my patches replaces an UNSPEC to create the vector with VEC_CONCAT.

Thus work going on in for machine independent support should not preclude
this patch from being accepted in the PowerPC backend.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-28 Thread Marc Glisse

On Fri, 28 Jul 2017, Andrew Pinski wrote:


For the vector case, can't we write it as:
_1 = BIT_FIELD_REF ;
_2 = BIT_FIELD_REF ;
res_8 = {_1, _2};

And then have some match.pd patterns (which might get complex), to
rewrite that into VEC_PERM_EXPR?


For this last part, we have simplify_vector_constructor in 
tree-ssa-forwprop.c, which currently only recognizes VEC_PERM_EXPR of a 
single vector, but I guess it could be extended to 2 vectors. Not as good 
as a bswap revamp (which will be needed anyway at some point), but less 
work.


--
Marc Glisse


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-28 Thread Richard Biener
On Fri, Jul 28, 2017 at 10:02 AM, Andrew Pinski  wrote:
> On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
>  wrote:
>> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
>>  wrote:
>>> This patches optimizes the PowerPC vector set operation for 64-bit doubles 
>>> and
>>> longs where the elements in the vector set may have been extracted from 
>>> another
>>> vector (PR target/81593):
>>>
>>> Here an an example:
>>>
>>> vector double
>>> test_vpasted (vector double high, vector double low)
>>> {
>>>   vector double res;
>>>   res[1] = high[1];
>>>   res[0] = low[0];
>>>   return res;
>>> }
>>
>> Interesting.  We expand from
>>
>>[100.00%] [count: INV]:
>>   _1 = BIT_FIELD_REF ;
>>   res_6 = BIT_INSERT_EXPR ;
>>   _2 = BIT_FIELD_REF ;
>>   res_8 = BIT_INSERT_EXPR ;
>>   return res_8;
>>
>> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
>> pass looks like the canonical pass for this even though it's quite awkward
>> to fill this in.
>
> I was thinking about this exactly.  Though for the scale use of
> BIT_INSERT_EXPR/BIT_FIELD_REF.
> I have a case where someone writes (this shows up in GCC too):
> a->b = c->b;
> a->d = c->d;
> a->e = c->e;
> a->f = c->f;
> a->g = c->g;
> a->h = c->h;
>
> Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I 
> have:
> _1 = BIT_FIELD_REF ;
> _8 = BIT_INSERT_EXPR ;
> _2 = BIT_FIELD_REF ;
> _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
> 
>
> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF ;
> _2 = BIT_FIELD_REF ;
> res_8 = {_1, _2};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?
> The reason why I ask that is because say someone who wrote:
> vector double
> test_vpasted (vector double high, vector double low)
> {
>   vector double res = { high[1], low[0] };
>   return res;
> }

I still believe a proper pass is better than match.pd patterns (which are
awkward when dealing with "variable operand number" cases).

I believe in the end we want to "unify" SRA, bswap and store-merging
at least.  Analyze memory/component accesses, their flow and then
pattern-match the result.  bswap is good with the flow stuff but its
memory/component access analysis is too ad-hoc.

"unify" in the sense of using common infrastructure.

Richard.

>
> Thanks,
> Andrew Pinski
>
>>
>> So a match.pd rule would work as well here - your ppc backend patterns
>> are v2df specific, right?
>>
>>> Previously it would generate:
>>>
>>> xxpermdi 12,34,34,2
>>> vspltisw 2,0
>>> xxlor 0,35,35
>>> xxpermdi 34,34,12,0
>>> xxpermdi 34,0,34,1
>>>
>>> and with these patches, it now generates:
>>>
>>> xxpermdi 34,35,34,1
>>>
>>> I have tested it on a little endian power8 system and a big endian power7
>>> system with the usual bootstrap and make checks with no regressions.  Can I
>>> check this into the trunk?
>>>
>>> I also built Spec 2006 with the compiler, and saw no changes in the code
>>> generated.  This isn't surprising because it isn't something that auto
>>> vectorization might generate by default.
>>>
>>> [gcc]
>>> 2017-07-27  Michael Meissner  
>>>
>>> PR target/81593
>>> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>>> declaration.
>>> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>>> emit XXPERMDI accessing either double word in either vector
>>> register inputs.
>>> * config/rs6000/vsx.md (vsx_concat_, VSX_D iterator):
>>> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>>> the constraints with the removal of the -mupper-regs-* switches.
>>> (vsx_concat__1): New combiner insns to optimize CONCATs
>>> where either register might have come from VEC_SELECT.
>>> (vsx_concat__2): Likewise.
>>> (vsx_concat__3): Likewise.
>>> (vsx_set_, VSX_D iterator): Rewrite insn to generate a
>>> VEC_CONCAT rather than use an UNSPEC to specify the option.
>>>
>>> [gcc/testsuite]
>>> 2017-07-27  Michael Meissner  
>>>
>>> PR target/81593
>>> * gcc.target/powerpc/vsx-extract-6.c: New test.
>>> * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>>>
>>> --
>>> Michael Meissner, IBM
>>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
>>> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-28 Thread Andrew Pinski
On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
 wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
>  wrote:
>> This patches optimizes the PowerPC vector set operation for 64-bit doubles 
>> and
>> longs where the elements in the vector set may have been extracted from 
>> another
>> vector (PR target/81593):
>>
>> Here an an example:
>>
>> vector double
>> test_vpasted (vector double high, vector double low)
>> {
>>   vector double res;
>>   res[1] = high[1];
>>   res[0] = low[0];
>>   return res;
>> }
>
> Interesting.  We expand from
>
>[100.00%] [count: INV]:
>   _1 = BIT_FIELD_REF ;
>   res_6 = BIT_INSERT_EXPR ;
>   _2 = BIT_FIELD_REF ;
>   res_8 = BIT_INSERT_EXPR ;
>   return res_8;
>
> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.

I was thinking about this exactly.  Though for the scale use of
BIT_INSERT_EXPR/BIT_FIELD_REF.
I have a case where someone writes (this shows up in GCC too):
a->b = c->b;
a->d = c->d;
a->e = c->e;
a->f = c->f;
a->g = c->g;
a->h = c->h;

Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have:
_1 = BIT_FIELD_REF ;
_8 = BIT_INSERT_EXPR ;
_2 = BIT_FIELD_REF ;
_9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;


For the vector case, can't we write it as:
_1 = BIT_FIELD_REF ;
_2 = BIT_FIELD_REF ;
res_8 = {_1, _2};

And then have some match.pd patterns (which might get complex), to
rewrite that into VEC_PERM_EXPR?
The reason why I ask that is because say someone who wrote:
vector double
test_vpasted (vector double high, vector double low)
{
  vector double res = { high[1], low[0] };
  return res;
}


Thanks,
Andrew Pinski

>
> So a match.pd rule would work as well here - your ppc backend patterns
> are v2df specific, right?
>
>> Previously it would generate:
>>
>> xxpermdi 12,34,34,2
>> vspltisw 2,0
>> xxlor 0,35,35
>> xxpermdi 34,34,12,0
>> xxpermdi 34,0,34,1
>>
>> and with these patches, it now generates:
>>
>> xxpermdi 34,35,34,1
>>
>> I have tested it on a little endian power8 system and a big endian power7
>> system with the usual bootstrap and make checks with no regressions.  Can I
>> check this into the trunk?
>>
>> I also built Spec 2006 with the compiler, and saw no changes in the code
>> generated.  This isn't surprising because it isn't something that auto
>> vectorization might generate by default.
>>
>> [gcc]
>> 2017-07-27  Michael Meissner  
>>
>> PR target/81593
>> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>> declaration.
>> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>> emit XXPERMDI accessing either double word in either vector
>> register inputs.
>> * config/rs6000/vsx.md (vsx_concat_, VSX_D iterator):
>> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>> the constraints with the removal of the -mupper-regs-* switches.
>> (vsx_concat__1): New combiner insns to optimize CONCATs
>> where either register might have come from VEC_SELECT.
>> (vsx_concat__2): Likewise.
>> (vsx_concat__3): Likewise.
>> (vsx_set_, VSX_D iterator): Rewrite insn to generate a
>> VEC_CONCAT rather than use an UNSPEC to specify the option.
>>
>> [gcc/testsuite]
>> 2017-07-27  Michael Meissner  
>>
>> PR target/81593
>> * gcc.target/powerpc/vsx-extract-6.c: New test.
>> * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>>
>> --
>> Michael Meissner, IBM
>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
>> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-28 Thread Richard Biener
On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
 wrote:
> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from 
> another
> vector (PR target/81593):
>
> Here an an example:
>
> vector double
> test_vpasted (vector double high, vector double low)
> {
>   vector double res;
>   res[1] = high[1];
>   res[0] = low[0];
>   return res;
> }

Interesting.  We expand from

   [100.00%] [count: INV]:
  _1 = BIT_FIELD_REF ;
  res_6 = BIT_INSERT_EXPR ;
  _2 = BIT_FIELD_REF ;
  res_8 = BIT_INSERT_EXPR ;
  return res_8;

but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
pass looks like the canonical pass for this even though it's quite awkward
to fill this in.

So a match.pd rule would work as well here - your ppc backend patterns
are v2df specific, right?

> Previously it would generate:
>
> xxpermdi 12,34,34,2
> vspltisw 2,0
> xxlor 0,35,35
> xxpermdi 34,34,12,0
> xxpermdi 34,0,34,1
>
> and with these patches, it now generates:
>
> xxpermdi 34,35,34,1
>
> I have tested it on a little endian power8 system and a big endian power7
> system with the usual bootstrap and make checks with no regressions.  Can I
> check this into the trunk?
>
> I also built Spec 2006 with the compiler, and saw no changes in the code
> generated.  This isn't surprising because it isn't something that auto
> vectorization might generate by default.
>
> [gcc]
> 2017-07-27  Michael Meissner  
>
> PR target/81593
> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
> declaration.
> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> emit XXPERMDI accessing either double word in either vector
> register inputs.
> * config/rs6000/vsx.md (vsx_concat_, VSX_D iterator):
> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
> the constraints with the removal of the -mupper-regs-* switches.
> (vsx_concat__1): New combiner insns to optimize CONCATs
> where either register might have come from VEC_SELECT.
> (vsx_concat__2): Likewise.
> (vsx_concat__3): Likewise.
> (vsx_set_, VSX_D iterator): Rewrite insn to generate a
> VEC_CONCAT rather than use an UNSPEC to specify the option.
>
> [gcc/testsuite]
> 2017-07-27  Michael Meissner  
>
> PR target/81593
> * gcc.target/powerpc/vsx-extract-6.c: New test.
> * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>
> --
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797


[PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-27 Thread Michael Meissner
This patches optimizes the PowerPC vector set operation for 64-bit doubles and
longs where the elements in the vector set may have been extracted from another
vector (PR target/81593):

Here an an example:

vector double
test_vpasted (vector double high, vector double low)
{
  vector double res;
  res[1] = high[1];
  res[0] = low[0];
  return res;
}

Previously it would generate:

xxpermdi 12,34,34,2
vspltisw 2,0
xxlor 0,35,35
xxpermdi 34,34,12,0
xxpermdi 34,0,34,1

and with these patches, it now generates:

xxpermdi 34,35,34,1

I have tested it on a little endian power8 system and a big endian power7
system with the usual bootstrap and make checks with no regressions.  Can I
check this into the trunk?

I also built Spec 2006 with the compiler, and saw no changes in the code
generated.  This isn't surprising because it isn't something that auto
vectorization might generate by default.

[gcc]
2017-07-27  Michael Meissner  

PR target/81593
* config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
declaration.
* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
emit XXPERMDI accessing either double word in either vector
register inputs.
* config/rs6000/vsx.md (vsx_concat_, VSX_D iterator):
Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
the constraints with the removal of the -mupper-regs-* switches.
(vsx_concat__1): New combiner insns to optimize CONCATs
where either register might have come from VEC_SELECT.
(vsx_concat__2): Likewise.
(vsx_concat__3): Likewise.
(vsx_set_, VSX_D iterator): Rewrite insn to generate a
VEC_CONCAT rather than use an UNSPEC to specify the option.

[gcc/testsuite]
2017-07-27  Michael Meissner  

PR target/81593
* gcc.target/powerpc/vsx-extract-6.c: New test.
* gcc.target/powerpc/vsx-extract-7.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   
(svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000-protos.h)
(revision 250577)
+++ gcc/config/rs6000/rs6000-protos.h   (.../gcc/config/rs6000/rs6000-protos.h) 
(working copy)
@@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre
   const char *label);
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 const char *label);
+extern const char *rs6000_emit_xxpermdi (rtx[], rtx, rtx);
 
 /* Declare functions in rs6000-c.c */
 
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  
(svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000.c)   
(revision 250577)
+++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000/rs6000.c)
(working copy)
@@ -39167,6 +39167,38 @@ rs6000_optab_supported_p (int op, machin
   return true;
 }
 }
+
+
+/* Emit a XXPERMDI instruction that can extract from either double word of the
+   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
+   which double word to be used for the operand.  */
+
+const char *
+rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
+{
+  int op1_dword = (!element1) ? 0 : INTVAL (element1);
+  int op2_dword = (!element2) ? 0 : INTVAL (element2);
+
+  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
+
+  if (BYTES_BIG_ENDIAN)
+{
+  operands[3] = GEN_INT (2*op1_dword + op2_dword);
+  return "xxpermdi %x0,%x1,%x2,%3";
+}
+  else
+{
+  if (element1)
+   op1_dword = 1 - op1_dword;
+
+  if (element2)
+   op2_dword = 1 - op2_dword;
+
+  operands[3] = GEN_INT (op1_dword + 2*op2_dword);
+  return "xxpermdi %x0,%x2,%x1,%3";
+}
+}
+
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md
(svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/vsx.md) 
(revision 250577)
+++ gcc/config/rs6000/vsx.md(.../gcc/config/rs6000/vsx.md)  (working copy)
@@ -2366,19 +2366,17 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_"
-  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
(vec_concat:VSX_D
-(match_operand: 1 "gpc_reg_operand" ",b")
-(match_operand: 2 "gpc_reg_operand" ",b")))]
+