As part of my work in adding support for -mcpu=future, I noticed an insn that
would never match.

Here is the insn:

(define_insn_and_split "*vsx_extract_<mode>_<VS_scalar>mode_var"
  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
        (zero_extend:<VS_scalar>
         (unspec:<VSX_EXTRACT_I:VS_scalar>
          [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
           (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
          UNSPEC_VSX_EXTRACT)))
   (clobber (match_scratch:DI 3 "=r,r,&b"))
   (clobber (match_scratch:V2DI 4 "=X,&v,X"))]
  "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I:MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  machine_mode smode = <VS_scalar>mode;
  rs6000_split_vec_extract_var (gen_rtx_REG (smode, REGNO (operands[0])),
                                operands[1], operands[2],
                                operands[3], operands[4]);
  DONE;
}
  [(set_attr "isa" "p9v,*,*")])

It will never match, because the zero_extend result is the same mode as the
input, so the machine independent parts of the compiler would never insert the
zero extend.  Instead, an explicit extend is generated to convert the type to
DImode.  In addition, there is the need to split the insn into two parts, one
that handles the register and the other the memory optimization such as was
done for the other 3 variable insns as part of PR target/93932.

There is a wider issue to optimize all cases of vec_extract to do the sign,
zero, and float extension automatically when we are loading from memory, which
is PR target/93230.  I have patches for all of the cases for 93230, but they
will need to wait until GCC 11 opens up.

But it irks me to have this pattern that mostly was correct, but it would never
match.  As I see it, there are 4 options:

1) Delete the insn completely, since it doesn't match, and then put in code
later to cover the case when PR target/93230 is done.

2) Ignore the patterns in the source code, accepting that they are useless, and
will be fixed some day.

3) Patch the existing insns so that they do match, but don't add all of the
other options that could be added (adding sign extension, adding the ability to
load the value into vector registers with ISA 2.07, optimizing vec_extract
being cnverted to floating point to avoid direct moves, etc.).

4) Do all of the possibilities now.  The trouble is due to the number of
special cases, this can add up to a number of new insns (for example, dealing
with HImode/QImode also being zero extended to SImode in addition to DImode,
dealing with QImode not having a sign extending load, dealing with SImode that
can load into the vector registers at ISA 2.05/2.06 which HI/QI modes need
2.07, etc.).

Given we are in stage 4, I think #4 is not appropriate (but if you want, I can
do the patches).

I would prefer to either delete the patterns (#1) or fix them in a limited
extent (#3).  I would prefer not to leave them unchanged in vsx.md, but
obviously that is the simplest approach.

This patch implements #3, fixing the insns as written, but not extending them
to handle all of the other special cases.

I have built compilers on both big endian and little endian PowerPC Linux
systems.  These patches fix the instruction counts of the three tests that now
eliminate the zero_extend operation.  Can I check these patches into the master
branch?

[gcc]
2020-02-28  Michael Meissner  <meiss...@linux.ibm.com>

        PR target/93937
        * config/rs6000/vsx.md (vsx_extract_<mode>_<VS_scalar>mode_var):
        Delete, replace with vsx_extract_<mode>_uns_di_var.
        (vsx_extract_<mode>_uns_di_var): Replacement insn that will match
        zero extensions properly.  Restrict the vector to be in a
        register.
        (vsx_extract_<mode>_uns_di_var_load): New insn to handle variable
        vector extract from memory and combine it with a zero extend to
        DImode.

[gcc/testsuite]
2020-02-28  Michael Meissner  <meiss...@linux.ibm.com>

        PR target/93937
        * gcc.target/powerpc/fold-vec-extract-char.p8.c: Update
        instruction counts.
        * gcc.target/powerpc/fold-vec-extract-int.p8.c: Update
        instruction counts.
        * gcc.target/powerpc/fold-vec-extract-short.p8.c: Update
        instruction counts.

--- /tmp/DHFA3L_vsx.md  2020-02-27 23:48:34.871654766 -0500
+++ gcc/config/rs6000/vsx.md    2020-02-27 16:26:51.538724961 -0500
@@ -3749,15 +3749,16 @@ (define_insn_and_split "*vsx_extract_<mo
 }
   [(set_attr "type" "load")])
 
-(define_insn_and_split "*vsx_extract_<mode>_<VS_scalar>mode_var"
-  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
-       (zero_extend:<VS_scalar>
-        (unspec:<VSX_EXTRACT_I:VS_scalar>
-         [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
-          (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+;; Variable V16QI/V8HI/V4SI extract from a register and zero extend to DImode.
+(define_insn_and_split "*vsx_extract_<mode>_uns_di_var"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+       (zero_extend:DI
+        (unspec:<VS_scalar>
+         [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v")
+          (match_operand:DI 2 "gpc_reg_operand" "r,r")]
          UNSPEC_VSX_EXTRACT)))
-   (clobber (match_scratch:DI 3 "=r,r,&b"))
-   (clobber (match_scratch:V2DI 4 "=X,&v,X"))]
+   (clobber (match_scratch:DI 3 "=r,r"))
+   (clobber (match_scratch:V2DI 4 "=X,&v"))]
   "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I:MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3769,7 +3770,29 @@ (define_insn_and_split "*vsx_extract_<mo
                                operands[3], operands[4]);
   DONE;
 }
-  [(set_attr "isa" "p9v,*,*")])
+  [(set_attr "isa" "p9v,*")])
+
+;; Variable V16QI/V8HI/V4SI extract from memory and zero extend to DImode.
+(define_insn_and_split "*vsx_extract_<mode>_uns_di_var_load"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+       (zero_extend:DI
+        (unspec:<VS_scalar>
+         [(match_operand:VSX_EXTRACT_I 1 "memory_operand" "Q")
+          (match_operand:DI 2 "gpc_reg_operand" "r")]
+         UNSPEC_VSX_EXTRACT)))
+   (clobber (match_scratch:DI 3 "=&b"))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+       (zero_extend:DI (match_dup 4)))]
+{
+  machine_mode smode = <VS_scalar>mode;
+  rtx reg_small = gen_rtx_REG (smode, reg_or_subregno (operands[0]));
+  operands[4] = rs6000_adjust_vec_address (reg_small, operands[1], operands[2],
+                                          operands[3], <VS_scalar>mode);
+}
+  [(set_attr "type" "load")])
 
 ;; VSX_EXTRACT optimizations
 ;; Optimize double d = (double) vec_extract (vi, <n>)
--- /tmp/4yG7Ho_fold-vec-extract-char.p8.c      2020-02-27 23:48:34.882654847 
-0500
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c 2020-02-27 
19:55:47.011685661 -0500
@@ -6,9 +6,9 @@
 /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
 
 // six tests total. Targeting P8LE / P8BE.
-// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, sradi, rlwinm, (extsb)
+// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, sradi, (extsb)
 // P8 LE constant offset: vspltb, mfvsrd, rlwinm, (extsb)
-// P8 BE variable offset:                 sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, sradi, rlwinm, (extsb)
+// P8 BE variable offset:                 sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, sradi, (extsb)
 // P8 BE constant offset: vspltb, mfvsrd, rlwinm, (extsb)
 
 /* { dg-final { scan-assembler-times {\mrldicl\M} 3 { target { le } } } } */
@@ -21,7 +21,7 @@
 /* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "extsb" 2 } } */
 /* { dg-final { scan-assembler-times {\mvspltb\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrlwinm\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mrlwinm\M} 2 { target lp64 } } } */
 
 /* multiple codegen variations for -m32. */
 /* { dg-final { scan-assembler-times {\mrlwinm\M} 3 { target ilp32 } } } */
--- /tmp/JCT8n1_fold-vec-extract-int.p8.c       2020-02-27 23:48:34.889654898 
-0500
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c  2020-02-27 
23:45:39.924371415 -0500
@@ -7,14 +7,14 @@
 
 // Targeting P8 (LE) and (BE).  6 tests total.
 // P8 LE constant:  vspltw, mfvsrwz, (1:extsw/2:rldicl)
-// P8 LE variables: subfic,  sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, 
(1:extsw/5:rldicl))
+// P8 LE variables: subfic,  sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, 
(1:extsw/3:rldicl))
 // P8 BE constant:  vspltw, mfvsrwz, (1:extsw/2:rldicl)
 // P8 BE variables:                  sldi, mtvsrd, xxpermdi, vslo, mfvsrd, 
sradi, (1:extsw/2:rldicl))
 
 /* { dg-final { scan-assembler-times {\mvspltw\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mmfvsrwz\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 7 { target { le } } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 4 { target { lp64 && be } } 
} } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 5 { target { le } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 2 { target { lp64 && be } } 
} } */
 /* { dg-final { scan-assembler-times {\msubfic\M} 3 { target { le } } } } */
 /* { dg-final { scan-assembler-times {\msldi\M} 3  { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
--- /tmp/bVbx5D_fold-vec-extract-short.p8.c     2020-02-27 23:48:34.896654950 
-0500
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c        
2020-02-27 23:44:15.883754923 -0500
@@ -6,10 +6,10 @@
 /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
 
 // six tests total. Targeting P8, both LE and BE.
-// p8 (le) variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, srdi, (1:extsh/2:rlwinm)
-// p8 (le) const offset:                          mtvsrd,                      
          (1:extsh/2:rlwinm)
-// p8 (be) var offset:                      sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, srdi, (1:extsh:2:rlwinm)
-// p8 (be) const offset:    vsplth,               mfvsrd,                      
          (1:extsh/2:rlwinm)
+// p8 (le) variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, srdi, (1:extsh/1:rlwinm)
+// p8 (le) const offset:                          mtvsrd,                      
          (1:extsh/1:rlwinm)
+// p8 (be) var offset:                      sldi, mtvsrd, xxpermdi, vslo, 
mfvsrd, srdi, (1:extsh:1:rlwinm)
+// p8 (be) const offset:    vsplth,               mfvsrd,                      
          (1:extsh/1:rlwinm)
 
 // * - each of the above will have an extsh if the argument is signed.
 // * - bool and unsigned tests also have an rlwinm.
@@ -24,7 +24,7 @@
 /* { dg-final { scan-assembler-times "mfvsrd" 6 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "srdi" 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "extsh" 2 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "rlwinm" 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "rlwinm" 2 { target lp64 } } } */
 
 /* -m32 codegen tests. */
 /* { dg-final { scan-assembler-times {\mli\M} 6 { target ilp32 } } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Reply via email to