RFC: 2->2 combine patch (was: Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961))

Segher Boessenkool Thu, 16 Jun 2016 17:07:54 -0700

On Fri, Jun 10, 2016 at 11:20:22AM +0200, Richard Biener wrote:
> With the proposed cost change for vector construction we will end up
> vectorizing the testcase in PR68961 again (on x86_64 and likely
> on ppc64le as well after that target gets adjustments).  Currently
> we can't optimize that away again noticing the direct overlap of
> argument and return registers.  The obstackle is
> 
> (insn 7 4 8 2 (set (reg:V2DF 93)
>         (vec_concat:V2DF (reg/v:DF 91 [ a ])
>             (reg/v:DF 92 [ aa ]))) 
> ...
> (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
>         (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
>         (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> 
> which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> 
> First of all simplify_subreg doesn't handle the subregs of a vec_concat
> (easy fix below).
> 
> Then combine doesn't like to simplify the multi-use (it tries some
> parallel it seems).


Combine will not do a 2->2 combination currently.  Say it is combining
A with a later B into C, and the result of A is used again later, then
it tries a parallel of A with C.  That usually does not match an insn for
the target.

If this were a 3->2 (or 4->2) combination, or A or C are no-op moves
(so that they will disappear later in combines), combine will break the
parallel into two and see if that matches.  We can in fact do that for
2->2 combinations as well: this removes a log_link (from A to B), so
combine cannot get into an infinite loop, even though it does not make
the number of RTL insns lower.

So I tried out the patch below.  It decreases code size on most targets
(mostly fixed length insn targets), and increases it a small bit on some
variable length insn targets (doing an op twice, instead of doing it once
and doing a move).  It looks to be all good there too, but there are so
many changes that it is almost impossible to really check.

So: can people try this out with their favourite benchmarks, please?


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index 6b5d000..2c99b4e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3933,8 +3933,6 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
           && XVECLEN (newpat, 0) == 2
           && GET_CODE (XVECEXP (newpat, 0, 0)) == SET
           && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
-          && (i1 || set_noop_p (XVECEXP (newpat, 0, 0))
-                 || set_noop_p (XVECEXP (newpat, 0, 1)))
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT

RFC: 2->2 combine patch (was: Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961))

Reply via email to