On Thu, Oct 15, 2020 at 4:14 PM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
> >
> > Hi!
> >
> > On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> > > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> > > <seg...@kernel.crashing.org> wrote:
> > > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > > > >   For rtx like
> > > > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > > > >                    (parallel [(const_int 0) (const_int 1)]))
> > > > >  it could be simplified as inner.
> > > >
> > > > You could even simplify any vec_select of a subreg of X to just a
> > > > vec_select of X, by changing the selection vector a bit (well, only do
> > >
> > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to 
> > > selection.
> >
> > Exactly.
> >
> > > > this if that is a constant vector, I suppose).  Not just for paradoxical
> > > > subregs either, just for *all* subregs.
> > >
> > > Yes, and only when X has the same inner mode and more elements.
> >
> > No, for *all*.  The mode of the first argument of vec_select does not
> > have to equal its result mode.
> >
> > Any (constant indices) vec_select of a subreg can be written as just a
> > vec_select.
> >
> > > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > > +          when available.  */
> >
> > What does "when available" mean here?
> >
>
> When X has same component mode as vec_select, i'll add this to comments.
>
> > > +       int l2;
> > > +       if (GET_CODE (trueop0) == SUBREG
> > > +           && (GET_MODE_INNER (mode)
> > > +               == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
> >
> > Don't use unnecessary parens here please, it makes it harder to read
> > (there are quite enough parens already :-) )
> >
>
> Yes.
>
> > > +           gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> > > +                                        GET_MODE_SIZE (GET_MODE_INNER 
> > > (mode)),
> > > +                                        &subreg_offset));
> >
> > Why is this needed?
>
> I only found this interface for poly_uint64 division to get subreg_offset.
>
> >
> > > +           bool success = true;
> > > +           for (int i = 0;i != l1; i++)
> >
> > (space after ; )
> >
> > > +             {
> > > +               rtx j = XVECEXP (trueop1, 0, i);
> >
> > (i and j and k ususally are integers, not rtx)
> >
> > > +               if (!CONST_INT_P (j)
> > > +                   || known_ge (UINTVAL (j), l2 - subreg_offset))
> > > +                 {
> > > +                   success = false;
> > > +                   break;
> > > +                 }
> > > +             }
> >
> > You don't have to test if the input RTL is valid.  You can assume it is.
> >
>
> This test is for something like (vec_select:v2di (subreg:v4di
> (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])).
> const_int 2 here is out of range. Are you meaning the upper rtx wouldn't 
> exist?
>

Remove the test.

> > > +           if (success)
> > > +             {
> > > +               rtx par = trueop1;
> > > +               if (subreg_offset)
> > > +                 {
> > > +                   rtvec vec = rtvec_alloc (l1);
> > > +                   for (int i = 0; i < l1; i++)
> > > +                     RTVEC_ELT (vec, i)
> > > +                       = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> > > +                                          + subreg_offset));
> > > +                   par = gen_rtx_PARALLEL (VOIDmode, vec);
> > > +                 }
> > > +               return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
> > > +             }
> > > +         }
> >
> > subreg_offset will differ in meaning if big-endian; is this correct
> Yes.
> > there, do all the stars align so this code works out fine there as well?
> >
>
> i found it's a bit tricky to adjust selection index for target
> BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN.
> Especially for component mode smaller than word, Any interface to handle this?
>

Use subreg_lsb to get offset from least significant bits, suppose this
could be independent of big/little-endian.

> > Looks fine otherwise, thanks :-)
> >
> >
> > Segher
>
>
>
> --
> BR,
> Hongtao

Update Patch.

-- 
BR,
Hongtao
From 2b56fed24fe7ef7f49bcac131bbc7f51f36b0a6a Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao....@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of a subreg of X to a vec_select of X when
	available.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 43 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..ff831bcec8f 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,49 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* Simplify vec_select of a subreg of X to just a vec_select of X
+	     when X has same component mode as vec_select.  */
+	  int l2;
+	  if (GET_CODE (trueop0) == SUBREG
+	      && GET_MODE_INNER (mode)
+		 == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
+	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
+		  .is_constant (&l2)
+	      && known_le (l1, l2))
+	    {
+	      unsigned HOST_WIDE_INT subreg_offset = 0;
+	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	      gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
+					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
+					   &subreg_offset));
+	      bool success = true;
+	      for (int i = 0; i != l1; i++)
+		{
+		  rtx idx  = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (idx))
+		    {
+		      success = false;
+		      break;
+		    }
+		}
+	      if (success)
+		{
+		  rtx par = trueop1;
+		  if (subreg_offset)
+		    {
+		      rtvec vec = rtvec_alloc (l1);
+		      for (int i = 0; i < l1; i++)
+			RTVEC_ELT (vec, i)
+			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
+					     + subreg_offset));
+		      par = gen_rtx_PARALLEL (VOIDmode, vec);
+		    }
+		  return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
+		}
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1

Reply via email to