On 24/11/15 00:15, Segher Boessenkool wrote:
On Thu, Nov 19, 2015 at 03:20:22PM +0000, Kyrill Tkachov wrote:
Hmmm, so the answer to that is a bit further down the validate_replacement:
path.
It's the code after the big comment:
   /* See if this is a PARALLEL of two SETs where one SET's destination is
      a register that is unused and this isn't marked as an instruction that
      might trap in an EH region.  In that case, we just need the other SET.
      We prefer this over the PARALLEL.

      This can occur when simplifying a divmod insn.  We *must* test for this
      case here because the code below that splits two independent SETs
      doesn't
      handle this case correctly when it updates the register status.

      It's pointless doing this if we originally had two sets, one from
      i3, and one from i2.  Combining then splitting the parallel results
      in the original i2 again plus an invalid insn (which we delete).
      The net effect is only to move instructions around, which makes
      debug info less accurate.  */

The code extracts all the valid sets inside the PARALLEL and calls
recog_for_combine on them
individually, ignoring the clobber.
Before I made this use is_parallel_of_n_reg_sets the code used to test
if it is a parallel of two sets, and no clobbers allowed.  So it would
never allow a clobber of zero.  But now it does.  I'll fix this in
is_parallel_of_n_reg_sets.

Thanks for finding the problem!

Thanks for fixing the wrong-code issue.
As I mentioned on IRC, this patch improves codegen on aarch64 as well.
I've re-checked SPEC2006 and it seems to improve codegen around 
multiply-extend-accumulate
instructions. For example the sequence:
    mov    w4, 64
    mov    x1, 24
    smaddl    x1, w9, w4, x1 // multiply-sign-extend-accumulate
    add    x1, x3, x1

becomes something like this:
    mov    w3, 64
    smaddl    x1, w9, w3, x0
    add    x1, x1, 24  // constant 24 propagated into the add

Another was transforming the muliply-extend into something cheaper:
    mov    x0, 40
    mov    w22, 32
    umaddl    x22, w21, w22, x0 // multiply-zero-extend-accumulate

changed becomes:
    ubfiz    x22, x21, 5, 32 // ASHIFT+extend
    add    x22, x22, 40

which should be always beneficial.
From what I can see we don't lose any of the multiply-extend-accumulate
 opportunities that we gained from the original combine patch.

So can we take this patch in as well?

Thanks,
Kyrill

Segher


Reply via email to