[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

law at redhat dot com Fri, 05 Apr 2019 14:58:37 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763


Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at kernel dot 
crashing.org

--- Comment #39 from Jeffrey A. Law <law at redhat dot com> ---
One option here is to actually throttle back combining for stuff like this
(ud_dce dump):

(insn 2 4 3 2 (set (reg/v:DI 92 [ a ])
        (reg:DI 0 x0 [ a ])) "j.c":18:1 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 0 x0 [ a ])
        (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:DI 93)
        (const_int 3 [0x3])) "j.c":21:10 47 {*movdi_aarch64}
     (nil))
(insn 7 6 13 2 (set (zero_extract:DI (reg/v:DI 92 [ a ])
            (const_int 8 [0x8])
            (const_int 0 [0]))
        (reg:DI 93)) "j.c":21:10 764 {*insv_regdi}
     (expr_list:REG_DEAD (reg:DI 93)
        (nil)))
(insn 13 7 14 2 (set (reg/i:DI 0 x0)
        (reg/v:DI 92 [ a ])) "j.c":22:1 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg/v:DI 92 [ a ])
        (nil)))

We've actually got an RMW insn when combine starts.  But...
Trying 17, 7 -> 13:
   17: r92:DI=r95:DI
      REG_DEAD r95:DI
    7: zero_extract(r92:DI,0x8,0)=r93:DI
      REG_DEAD r93:DI
   13: x0:DI=r92:DI
      REG_DEAD r92:DI
Failed to match this instruction:
(set (reg/i:DI 0 x0)
    (ior:DI (and:DI (reg:DI 95)
            (const_int -256 [0xffffffffffffff00]))
        (reg:DI 93)))

We've torn the damn thing apart via expand_field_assignment.  Worse yet, that
naturally splits and results in:

Successfully matched this instruction:
(set (reg/v:DI 92 [ a ])
    (and:DI (reg:DI 95)
        (const_int -256 [0xffffffffffffff00])))
Successfully matched this instruction:
(set (reg/i:DI 0 x0)
    (ior:DI (reg/v:DI 92 [ a ])
        (reg:DI 93)))
allowing combination of insns 17, 7 and 13
original costs 4 + 4 + 4 = 12
replacement costs 4 + 4 = 8


So we think the split is cheaper.    And at this point we've lost.  We won't do
further combinations into the second insn (destination is a hard reg and source
isn't a reg).  Costing could clearly be improved here.  Two copies and a zero
extract are cheaper than two logicals -- largely because the copies often go
away.  But we can't know that at this point.


We could throttle attempts to combine into insn 13 if the source is not a
register and that was moderately successful.   But it seems to me like we're
better off making try_combine smarter.

It's not hard at all to see something like
(set (reg/i:DI 0 x0)
    (ior:DI (and:DI (reg:DI 95)
            (const_int -256 [0xffffffffffffff00]))
        (reg:DI 93)))


And realize that with a copy we can turn this into a RMW.  In the case where
the destination is a hard register we can do something like this

(set (reg 95) (ior (and (reg 95) ...)))
(set (reg 0) (reg 95))

Of course that assumes reg95 dies, but that's easy enough to check.  And that
has a good shot at being identified as a bitfield insertion.

Another example:

(set (reg 92))
    (ior:DI (and:DI (reg 0)
            (const_int -256 [0xffffffffffffff00]))
        (reg:DI 93)))


I'm not sure this happens anymore, but it can be addressed in a similar way,
but with a copy before the insn, so

(set (reg 92) (reg 0))
(set (reg 92) (ior (and (reg 92) ...)))

Which again can often be identified as a bitfield insertion.

The all pseudo case can be handled with similar transformations.

This is actually pretty easy to wire into try_combine (rather than
make_field_assignment) just before we do splitting.  If we wanted to be really
thorough, we can look at make_field_assignment and create a routine that
handles all the cases found there.

So we identify the cases we want to handle, use subst to make the change so
that it looks like an RMW (subst eventually calls make_field_assignment). 
Assuming recognition succeeds, then we emit the copy before/after I3 and let
normal processing continue.  If recognition fails we can undo_all and return.

That also seems to handle the combine_bfi regression (though I haven't tested
on Steve's more thorough tests).

It doesn't handle the lsl_ar_sbfix case, but AFAICT that was failing at 265398
as well.

Note this doesn't require major surgery in make_field_extraction or its friends
which is kind of nice given the way they're called out of subst.

Thoughts?

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

Reply via email to