Segher Boessenkool <seg...@kernel.crashing.org> writes: > On Thu, Nov 21, 2019 at 08:32:14PM +0000, Richard Sandiford wrote: >> Segher Boessenkool <seg...@kernel.crashing.org> writes: >> > It's not great if every pass invents its own version of some common >> > infrastructure thing because that common one is not suitable. >> > >> > I.e., can this be fixed somehow? Maybe just by having a restricted DU >> > chains df problem? >> >> Well, it'd probably make sense to make fwprop.c's approach available >> as a "proper" df interface at some point. Hopefully if anyone wants the >> same thing as fwprop.c, they'd do that rather than copy the code. :-) > >> >> * Updating a full, ordered def-use chain after a move is a linear-time >> >> operation, so whatever happens, we'd need to apply some kind of limit >> >> on the number of uses we maintain, with something like that integer >> >> point range for the rest. > > Yeah. > >> >> * Once we've analysed the insn and built its def-use chains, we don't >> >> look at the df_refs again until we update the chains after a successful >> >> combination. So it should be more efficient to maintain a small array >> >> of insn_info_rec pointers alongside the numerical range, rather than >> >> walk and pollute chains of df_refs and then link back the insn uids >> >> to the pass-local info. >> > >> > So you need something like combine's LOG_LINKS? Not that handling those >> > is not quadratic in the worst case, but in practice it works well. And >> > it *could* be made linear. >> >> Not sure why what I've used isn't what I need though :-) > > I am wondering the other way around :-) Is what you do for combine2 > something that would be more generally applicable/useful? That's what > I'm trying to find out :-) > > What combine does could use some improvement, if you want to hear a > more direct motivations. LOG_LINKS just skip references we cannot > handle (and some more), so we always have to do modified_between etc., > which hurts.
The trade-offs behind the choice of representation are very specific to the pass. You'd only pick this if you wanted both to propagate definitions into uses and to move insns around. You'd also only pick it if you were happy with tracking a small number of named uses per definition. I can't think of any other passes that would prefer this over what they already use. (Combine itself is an exception, since the new pass started out as a deliberate attempt to start from scratch.) >> >> Target Tests Delta Best Worst Median >> >> avr-elf 1341 -111401 -13824 680 -10 >> > >> > Things like this are kind of suspicious :-) >> >> Yeah. This mostly seems to come from mopping up the extra moves created >> by make_more_copies. So we have combinations like: >> >> 58: r70:SF=r94:SF >> REG_DEAD r94:SF >> 60: r22:SF=r70:SF >> REG_DEAD r70:SF > > Why didn't combine do this? A target problem? Seems to be because combine rejects hard-reg destinations whose classes are likely spilled (cant_combine_insn_p). This SF argument register happens to overlap POINTER_X_REGS and POINTER_Y_REGS and so we reject the combination based on POINTER_X_REGS being likely spilled. I think the same thing could happen on other targets, e.g. for TAILCALL_ADDR_REGS on aarch64. >> So there's only one case in which it isn't a win, but the number of >> tests is tiny. So I agree there's no justification for trying this in >> combine proper as things stand (and I wasn't arguing otherwise FWIW). >> I'd still like to keep it in the new pass because it does help >> *sometimes* and there's no sign yet that it has a noticeable >> compile-time cost. > > So when does it help? I can only think of cases where there are > problems elsewhere. The full list of affected tests (all at -O2 -ftree-vectorize) are: arc-elf gcc.c-torture/compile/pr67506.c avr-elf gcc.dg/torture/pr77916.c bpf-elf gcc.dg/torture/vshuf-v8hi.c bpf-elf gcc.dg/torture/vshuf-v4si.c bfin-elf gcc.dg/torture/vshuf-v8qi.c c6x-elf gcc.c-torture/execute/991118-1.c cr16-elf gcc.c-torture/compile/pr82052.c epiphany-elf gcc.c-torture/execute/991118-1.c epiphany-elf gcc.dg/pr77664.c epiphany-elf gcc.dg/vect/vect-mult-pattern-2.c epiphany-elf gcc.dg/torture/vshuf-v8hi.c epiphany-elf gcc.dg/tree-ssa/pr77664.c epiphany-elf gcc.dg/tree-ssa/negneg-3.c fr30-elf gcc.dg/torture/vshuf-v4hi.c fr30-elf gcc.dg/torture/vshuf-v8hi.c frv-linux-gnu gcc.dg/torture/vshuf-v4hi.c frv-linux-gnu gcc.dg/torture/vshuf-v8hi.c h8300-elf gcc.c-torture/execute/20000422-1.c h8300-elf gcc.dg/torture/pr77916.c ia64-linux-gnu gcc.c-torture/execute/ieee/pr30704.c ia64-linux-gnu gcc.dg/vect/pr49478.c ia64-linux-gnu gcc.dg/tree-ssa/ldist-16.c i686-apple-darwin gcc.dg/vect/vect-mult-pattern-2.c m32r-elf gcc.dg/store_merging_8.c m32r-elf gcc.dg/torture/vshuf-v4hi.c m32r-elf gcc.dg/torture/vshuf-v8hi.c m32r-elf gcc.dg/tree-ssa/vrp61.c mcore-elf gcc.c-torture/execute/991118-1.c mcore-elf gcc.dg/torture/vshuf-v4hi.c mcore-elf gcc.dg/torture/vshuf-v8hi.c mcore-elf gcc.dg/torture/vshuf-v8qi.c mmix gcc.dg/torture/20181024-1.c mn10300-elf g++.dg/warn/Warray-bounds-6.C moxie-rtems gcc.c-torture/execute/930718-1.c moxie-rtems gcc.c-torture/compile/pr70263-1.c moxie-rtems gcc.dg/graphite/scop-5.c moxie-rtems g++.dg/pr80707.C nds32le-elf gcc.dg/torture/vshuf-v16qi.c nios2-linux-gnu gcc.dg/torture/vshuf-v8qi.c or1k-elf gcc.dg/torture/vshuf-v4hi.c or1k-elf gcc.dg/torture/vshuf-v8hi.c or1k-elf gcc.dg/tree-ssa/vrp61.c powerpc-ibm-aix7.0 g++.dg/warn/Wunused-3.C powerpc-ibm-aix7.0 g++.dg/lto/pr88049_0.C powerpc-ibm-aix7.0 g++.dg/other/cxa-atexit1.C s390-linux-gnu gcc.c-torture/compile/20020304-1.c s390-linux-gnu gcc.dg/atomic-op-1.c s390-linux-gnu gcc.dg/atomic/stdatomic-op-1.c s390-linux-gnu gcc.dg/atomic/c11-atomic-exec-2.c s390-linux-gnu gcc.dg/atomic/c11-atomic-exec-3.c s390-linux-gnu gcc.dg/ubsan/float-cast-overflow-atomic.c s390x-linux-gnu gcc.c-torture/compile/20020304-1.c sh-linux-gnu gcc.c-torture/execute/991118-1.c sh-linux-gnu gcc.dg/torture/vshuf-v8qi.c sparc-linux-gnu gcc.dg/pr56890-2.c sparc-linux-gnu gcc.dg/torture/vshuf-v4hi.c sparc-linux-gnu gcc.dg/torture/vshuf-v8hi.c sparc-linux-gnu gcc.dg/torture/20181024-1.c sparc64-linux-gnu gcc.dg/torture/20181024-1.c xstormy16-elf gcc.c-torture/execute/strlen-5.c xstormy16-elf gcc.c-torture/execute/20080424-1.c xstormy16-elf gcc.c-torture/compile/pr60655-1.c xstormy16-elf gcc.c-torture/compile/pr60655-2.c xstormy16-elf gcc.dg/Wrestrict-9.c xstormy16-elf gcc.dg/graphite/scop-15.c xstormy16-elf gcc.dg/guality/pr43051-1.c xstormy16-elf gcc.dg/torture/pr68955.c xstormy16-elf gcc.dg/torture/pr58955-2.c xstormy16-elf gcc.dg/tree-ssa/builtin-sprintf-warn-23.c The s390x-linux-gnu test is one in which we have: 116: {r167:DI=r86:DI-0x1000;clobber %cc:CC;} REG_DEAD r86:DI REG_UNUSED %cc:CC 118: %r2:DI=[r167:DI+r155:DI+0x5] REG_DEAD r167:DI REG_DEAD r155:DI REG_EQUAL [r167:DI+0x1005] and so the 0x1000s cancel each other out. And yeah, you could definitely argue that it's a problem elsewhere. :-) Expand has: ;; _32 = BGl_equalzf3zf3zz__r4_equivalence_6_2z00 (_31, 2B); (insn 113 112 114 (set (reg:DI 164) (const_int -4096 [0xfffffffffffff000])) "gcc.c-torture/compile/20020304-1.c":161:9 -1 (nil)) (insn 114 113 115 (set (reg:DI 165) (reg:DI 164)) "gcc.c-torture/compile/20020304-1.c":161:9 -1 (nil)) (insn 115 114 116 (set (reg:DI 166) (const_int 4096 [0x1000])) "gcc.c-torture/compile/20020304-1.c":161:9 -1 (nil)) (insn 116 115 117 (parallel [ (set (reg:DI 167) (plus:DI (reg:DI 86 [ BgL_cdrzd21994zd2_959.10_27 ]) (reg:DI 165))) (clobber (reg:CC 33 %cc)) ]) "gcc.c-torture/compile/20020304-1.c":161:9 -1 (nil)) (insn 117 116 118 (set (reg:DI 3 %r3) (const_int 2 [0x2])) "gcc.c-torture/compile/20020304-1.c":161:9 -1 (nil)) (insn 118 117 119 (set (reg:DI 2 %r2) (mem/f/j:DI (plus:DI (plus:DI (reg:DI 167) (reg:DI 166)) (const_int 5 [0x5])) [2 _30->pair_t.cdr+0 S8 A64])) "gcc.c-torture/compile/20020304-1.c":161:9 -1 (nil)) >> It might also be interesting to see how much difference it makes for >> run-combine=4 (e.g. to see how much it makes up for the current 2-insn >> limit)... > > Numbers are good :-) FWIW, it does make more of a difference there, but not massively: Target Tests Delta Best Worst Median ====== ===== ===== ==== ===== ====== aarch64-linux-gnu 5 -15 -5 -1 -3 aarch64_be-linux-gnu 4 -14 -5 -2 -4 arc-elf 1 -4 -4 -4 -4 arm-linux-gnueabi 4 -22 -10 -2 -8 arm-linux-gnueabihf 4 -22 -10 -2 -8 avr-elf 1 -1 -1 -1 -1 bfin-elf 25 -592 -223 3 -5 bpf-elf 47 -508 -95 -1 -3 c6x-elf 26 -388 -74 1 -4 cr16-elf 18 -142 -82 -1 -2 csky-elf 5 -10 -4 -1 -2 epiphany-elf 30 -514 -155 -1 -4 fr30-elf 28 -416 -140 -1 -3 frv-linux-gnu 45 -1274 -209 -1 -4 ft32-elf 7 -17 -6 -1 -2 h8300-elf 3 -7 -5 -1 -1 hppa64-hp-hpux11.23 1 -1 -1 -1 -1 i686-apple-darwin 1 -3 -3 -3 -3 ia64-linux-gnu 8 -86 -26 -5 -10 iq2000-elf 1 -2 -2 -2 -2 m32r-elf 78 -1692 -308 -2 -4 mcore-elf 58 -1117 -174 3 -5 mipsel-linux-gnu 7 -26 -8 -2 -3 mipsisa64-linux-gnu 30 -136 -18 -2 -3 mmix 5 -7 -2 -1 -1 mn10300-elf 1 -2 -2 -2 -2 moxie-rtems 11 -35 -5 -2 -3 msp430-elf 1 -1 -1 -1 -1 nds32le-elf 15 -142 -88 -1 -2 nios2-linux-gnu 22 -259 -110 -1 -4 nvptx-none 2 -8 -4 -4 -4 or1k-elf 34 -592 -160 -1 -3 powerpc64le-linux-gnu 1 -8 -8 -8 -8 riscv32-elf 4 -11 -6 -1 -2 riscv64-elf 2 -7 -6 -1 -6 rl78-elf 1 -7 -7 -7 -7 rx-elf 1 -2 -2 -2 -2 s390-linux-gnu 35 708 -12 292 -1 s390x-linux-gnu 15 -53 -6 -2 -3 sh-linux-gnu 38 -741 -141 2 -6 sparc-linux-gnu 26 -478 -156 -1 -7 sparc64-linux-gnu 10 -86 -28 -2 -4 vax-netbsdelf 1 -4 -4 -4 -4 visium-elf 30 -467 -159 -1 -4 x86_64-darwin 7 -24 -10 -1 -2 x86_64-linux-gnu 7 -26 -12 -1 -2 xstormy16-elf 15 -70 -45 2 -2 xtensa-elf 26 -682 -226 -2 -4 Thanks, Richard