https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763
Jeffrey A. Law <law at redhat dot com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |segher at kernel dot crashing.org --- Comment #39 from Jeffrey A. Law <law at redhat dot com> --- One option here is to actually throttle back combining for stuff like this (ud_dce dump): (insn 2 4 3 2 (set (reg/v:DI 92 [ a ]) (reg:DI 0 x0 [ a ])) "j.c":18:1 47 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 0 x0 [ a ]) (nil))) (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) (insn 6 3 7 2 (set (reg:DI 93) (const_int 3 [0x3])) "j.c":21:10 47 {*movdi_aarch64} (nil)) (insn 7 6 13 2 (set (zero_extract:DI (reg/v:DI 92 [ a ]) (const_int 8 [0x8]) (const_int 0 [0])) (reg:DI 93)) "j.c":21:10 764 {*insv_regdi} (expr_list:REG_DEAD (reg:DI 93) (nil))) (insn 13 7 14 2 (set (reg/i:DI 0 x0) (reg/v:DI 92 [ a ])) "j.c":22:1 47 {*movdi_aarch64} (expr_list:REG_DEAD (reg/v:DI 92 [ a ]) (nil))) We've actually got an RMW insn when combine starts. But... Trying 17, 7 -> 13: 17: r92:DI=r95:DI REG_DEAD r95:DI 7: zero_extract(r92:DI,0x8,0)=r93:DI REG_DEAD r93:DI 13: x0:DI=r92:DI REG_DEAD r92:DI Failed to match this instruction: (set (reg/i:DI 0 x0) (ior:DI (and:DI (reg:DI 95) (const_int -256 [0xffffffffffffff00])) (reg:DI 93))) We've torn the damn thing apart via expand_field_assignment. Worse yet, that naturally splits and results in: Successfully matched this instruction: (set (reg/v:DI 92 [ a ]) (and:DI (reg:DI 95) (const_int -256 [0xffffffffffffff00]))) Successfully matched this instruction: (set (reg/i:DI 0 x0) (ior:DI (reg/v:DI 92 [ a ]) (reg:DI 93))) allowing combination of insns 17, 7 and 13 original costs 4 + 4 + 4 = 12 replacement costs 4 + 4 = 8 So we think the split is cheaper. And at this point we've lost. We won't do further combinations into the second insn (destination is a hard reg and source isn't a reg). Costing could clearly be improved here. Two copies and a zero extract are cheaper than two logicals -- largely because the copies often go away. But we can't know that at this point. We could throttle attempts to combine into insn 13 if the source is not a register and that was moderately successful. But it seems to me like we're better off making try_combine smarter. It's not hard at all to see something like (set (reg/i:DI 0 x0) (ior:DI (and:DI (reg:DI 95) (const_int -256 [0xffffffffffffff00])) (reg:DI 93))) And realize that with a copy we can turn this into a RMW. In the case where the destination is a hard register we can do something like this (set (reg 95) (ior (and (reg 95) ...))) (set (reg 0) (reg 95)) Of course that assumes reg95 dies, but that's easy enough to check. And that has a good shot at being identified as a bitfield insertion. Another example: (set (reg 92)) (ior:DI (and:DI (reg 0) (const_int -256 [0xffffffffffffff00])) (reg:DI 93))) I'm not sure this happens anymore, but it can be addressed in a similar way, but with a copy before the insn, so (set (reg 92) (reg 0)) (set (reg 92) (ior (and (reg 92) ...))) Which again can often be identified as a bitfield insertion. The all pseudo case can be handled with similar transformations. This is actually pretty easy to wire into try_combine (rather than make_field_assignment) just before we do splitting. If we wanted to be really thorough, we can look at make_field_assignment and create a routine that handles all the cases found there. So we identify the cases we want to handle, use subst to make the change so that it looks like an RMW (subst eventually calls make_field_assignment). Assuming recognition succeeds, then we emit the copy before/after I3 and let normal processing continue. If recognition fails we can undo_all and return. That also seems to handle the combine_bfi regression (though I haven't tested on Steve's more thorough tests). It doesn't handle the lsl_ar_sbfix case, but AFAICT that was failing at 265398 as well. Note this doesn't require major surgery in make_field_extraction or its friends which is kind of nice given the way they're called out of subst. Thoughts?