Hi,

I discovered recently that, with -mcpu=power9, an attempt to generate a 
vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  
This is semantically correct, but the extra instruction is unnecessary.  I 
found that xxspltib_constant_p has logic to special-case const_vector with 
small constants, but not vec_duplicate with small constants.  This patch 
duplicates that logic for vec_duplicate so we can generate the single 
instruction when possible.
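
For illustration only, here is a standalone sketch of the decision the new 
check makes.  It is not the GCC code: the enum and function names are 
hypothetical stand-ins, and it models only the range tests visible in the 
hunk below, assuming the usual 5-bit signed immediate range (-16..15) for 
the vspltis* instructions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC machine modes involved.  */
enum vec_mode { MODE_V16QI, MODE_V8HI, MODE_V4SI };

/* The vspltisb/h/w immediate is a 5-bit signed value.  */
static bool
fits_vspltis_immediate (long value)
{
  return value >= -16 && value <= 15;
}

/* Return true if a splat of VALUE in MODE should still go through
   xxspltib (plus a sign extension for the wider modes), false if a
   single vspltish/vspltisw is the cheaper choice or if VALUE does
   not fit xxspltib's 8-bit signed immediate at all.  */
static bool
prefer_xxspltib (long value, enum vec_mode mode)
{
  /* xxspltib can only splat an 8-bit signed value.  */
  if (value < -128 || value > 127)
    return false;

  /* 0 and -1 stay with xxspltib so that any VSX register can be
     used; other small values are left to vspltish/vspltisw.  */
  if (value != 0 && value != -1
      && fits_vspltis_immediate (value)
      && (mode == MODE_V4SI || mode == MODE_V8HI))
    return false;

  return true;
}
```

Under these assumptions, prefer_xxspltib (5, MODE_V8HI) is false, so the 
splat in the new test is left to vspltish, while a value such as 100 still 
takes the xxspltib path.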

When I did this, I ran into a problem with an existing test case.  We end up 
matching the *vsx_splat_v4si_internal pattern instead of falling back to the 
altivec_vspltisw pattern, and the constraints of the former don't match for 
constant input.  To avoid this, I added a pattern ahead of 
*vsx_splat_v4si_internal that matches for VMX output registers and produces 
the vspltisw as desired.  This corrects the failing test and produces the 
expected code.

I've added a test case to demonstrate that we now generate a single vspltish, 
with no xxspltib, in the usual case.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  OK for trunk, and 
for 6.2 after suitable burn-in?

Thanks!

Bill


[gcc]

2016-06-21  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        * config/rs6000/rs6000.c (xxspltib_constant_p): Prefer vspltisw/h
        for vec_duplicate when this is cheaper.
        * config/rs6000/vsx.md (*vsx_splat_v4si_altivec): New define_insn.

[gcc/testsuite]

2016-06-21  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        * gcc.target/powerpc/splat-p9-1.c: New test.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 237619)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6329,6 +6329,13 @@ xxspltib_constant_p (rtx op,
       value = INTVAL (element);
       if (!IN_RANGE (value, -128, 127))
        return false;
+
+      /* See if we could generate vspltisw/vspltish directly instead of
+        xxspltib + sign extend.  Special case 0/-1 to allow getting
+         any VSX register instead of an Altivec register.  */
+      if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
+         && (mode == V4SImode || mode == V8HImode))
+       return false;
     }
 
   /* Handle (const_vector [...]).  */
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md    (revision 237619)
+++ gcc/config/rs6000/vsx.md    (working copy)
@@ -2400,6 +2400,17 @@
     operands[1] = force_reg (<VS_scalar>mode, operands[1]);
 })
 
+;; The pattern following this one hides altivec_vspltisw, which we
+;; prefer to match when possible, so duplicate that here for
+;; TARGET_P9_VECTOR.
+(define_insn "*vsx_splat_v4si_altivec"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+        (vec_duplicate:V4SI
+        (match_operand:QI 1 "s5bit_cint_operand" "i")))]
+  "TARGET_P9_VECTOR"
+  "vspltisw %0,%1"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "*vsx_splat_v4si_internal"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
        (vec_duplicate:V4SI
Index: gcc/testsuite/gcc.target/powerpc/splat-p9-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/splat-p9-1.c       (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/splat-p9-1.c       (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-maltivec -mcpu=power9" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-final { scan-assembler "vspltish" } } */
+/* { dg-final { scan-assembler-not "xxspltib" } } */
+
+/* Make sure we don't use an inefficient sequence for small integer splat.  */
+
+#include <altivec.h>
+
+vector short
+foo ()
+{
+  return vec_splat_s16 (5);
+}


