Hi!
On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> For rtx like
> (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> (parallel [(const_int 0) (const_int 1)]))
> it could be simplified as inner.
You could even simplify any vec_select of a subreg of X to just a
vec_select of X, by adjusting the selection vector a bit (well, only do
this if that is a constant vector, I suppose). And not just for
paradoxical subregs either, but for *all* subregs.
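To make that concrete, here is a rough standalone model of the index
remapping. This is not GCC code: remap_selection and is_identity are
made-up names, and the model assumes the subreg keeps the element mode,
that lanes are numbered in memory order starting at the subreg byte
offset, and it ignores endianness subtleties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a GCC function).  Rewrite a constant selection
   SEL of N_SEL lanes, taken from a subreg at byte offset SUBREG_BYTE of a
   vector with N_INNER lanes of ELT_SIZE bytes each, into a selection OUT
   on the inner vector itself.  Return false if that is not possible.  */
bool
remap_selection (const unsigned *sel, size_t n_sel, unsigned subreg_byte,
                 unsigned elt_size, unsigned n_inner, unsigned *out)
{
  assert (sel != NULL && out != NULL);

  /* The subreg has to start on a lane boundary of the inner vector.  */
  if (elt_size == 0 || subreg_byte % elt_size != 0)
    return false;

  unsigned first = subreg_byte / elt_size;
  if (first >= n_inner)
    return false;

  for (size_t i = 0; i < n_sel; i++)
    {
      /* Lanes past the end of the inner vector (the extra part of a
         paradoxical subreg) have no defined value, so give up.  */
      if (sel[i] >= n_inner - first)
        return false;
      out[i] = first + sel[i];
    }
  return true;
}

/* Return true if SEL selects every lane of an N_INNER-lane vector in
   order, in which case the vec_select is just the inner vector.  */
bool
is_identity (const unsigned *sel, size_t n_sel, unsigned n_inner)
{
  if (n_sel != n_inner)
    return false;
  for (size_t i = 0; i < n_sel; i++)
    if (sel[i] != i)
      return false;
  return true;
}
```

With the example from the patch (a V4SI subreg at byte 0 of a V2SI inner,
selecting lanes 0 and 1) the remapped selection is [0, 1], which is the
identity on inner. Selecting lanes 2 and 3 instead reads the undefined
part of the paradoxical subreg, so the model refuses.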
> gcc/ChangeLog
> PR rtl-optimization/97249
> * simplify-rtx.c (simplify_binary_operation_1): Simplify
> vec_select of paradoxical subreg.
>
> gcc/testsuite/ChangeLog
>
> * gcc.target/i386/pr97249-1.c: New test.
> + /* For cases like
> + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> + (parallel [(const_int 0) (const_int 1)])).
> + return inner directly. */
> + if (GET_CODE (trueop0) == SUBREG
> + && paradoxical_subreg_p (trueop0)
> + && mode == GET_MODE (XEXP (trueop0, 0))
> + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> + && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> + && l0 % l1 == 0)
Why this? Why does the number of elements of the input have to divide
that of the output?
> + {
> + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> + unsigned HOST_WIDE_INT sel = 0;
> + int i = 0;
> + for (;i != l1; i++)
Please declare the counter in the loop itself (and drop the separate
declaration above):
  for (int i = 0; i != l1; i++)
> + {
> + rtx j = XVECEXP (trueop1, 0, i);
> + if (!CONST_INT_P (j))
> + break;
> + sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> + }
> + /* ??? Need to simplify XEXP (trueop0, 0) here. */
> + if (sel == expect)
> + return XEXP (trueop0, 0);
> + }
> }
If you just handle the much more generic case, all the other vec_select
simplifications can be done as well, not just this one.
> +/* PR target/97249 */
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
I don't know enough about the x86 backend to know if this is exactly
what you need in the testsuite. I do know a case of backslashitis when
I see one though -- you might want to use {} instead of "", and perhaps
\m and \M and \s etc. And to make sure things are on one line, don't do
all that nastiness with [^\n], just start the RE with (?n) :-)
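Something like this, as an untested sketch of the first directive only.
The exact pattern is my guess at what you want to match, so please check
it against the real assembler output:

```c
/* { dg-final { scan-assembler-times {(?n)\mvpmovzxbw\s+\(.*%xmm[0-9](?:$|\s+#)} 2 } } */
```

With {} Tcl does no backslash substitution, so the RE engine sees \( and
\s as written, and with (?n) the .* cannot run past the end of the line.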
Segher