> From: Hans-Peter Nilsson <h...@axis.com>
> Date: Thu, 11 May 2023 17:05:40 +0200
> Next, I'll turn around completely, and try defaulting to
> -fsplit-wide-types-early, which sounds more promising. :)
> I don't like throwing defaults around randomly, but trying
> out a promising idea this way is easy.

Absolutely nothing changed compared to the default (not counting now running "subreg2" and generating a dump file). Besides coremark and local micro-benchmarks, I inspected the compilation of arith-rand-ll.c at -O2, briefly stepping through the passes with gdb: the costs guiding the splits are fine and do enable them, but not all DImode registers are naturally "splittable"; it looks like the ones used in non-decomposable operations remain. It seems all splittable opportunities are dealt with by the first pass ("subreg1"). I guess this pass has the most impact for targets that have few or no DImode operations at all.

But why is the option called -fsplit-wide-types-early when what it does is enable a "subreg2" pass, with "subreg1" and "subreg3" being the ones enabled by -fsplit-wide-types? It should rather be called -fsplit-wide-types-second! :) Looking at its placement in passes.def makes me wonder what magic properties targets have that benefit from it.

Anyway, Roger mentioned that the clobbers emitted by the lower-subreg passes were apparently damaging, so I'll try this out "for fun", on the assumption that they're actually unnecessary. I don't think actually removing them has been attempted? The patch below seems to substantially lower register pressure for arith-rand-ll for CRIS, but I've only inspected the assembly source (not even compared the result to the reload version).
I'm quoting it for reference only; if it "works" (passes regtest for cris-elf and x86-64-linux), I think I'll resubmit it as a proper patch:

--- lower-subreg.cc.orig 2023-04-29 02:53:39.000000000 +0200
+++ lower-subreg.cc 2023-05-12 15:35:25.574668930 +0200
@@ -1086,9 +1086,6 @@ resolve_simple_move (rtx set, rtx_insn *
     {
       unsigned int i;
 
-      if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-        emit_clobber (dest);
-
       for (i = 0; i < words; ++i)
         {
           rtx t = simplify_gen_subreg_concatn (word_mode, dest,

brgds, H-P
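To look at the "not all DImode registers are splittable" observation on a smaller scale than arith-rand-ll.c, a pair of probe functions like the ones below could be compiled for a 32-bit target with -O2 and -fdump-rtl-subreg1 (assuming the usual -fdump-rtl-<pass> naming) to see which DImode pseudos get decomposed. The function names are hypothetical, and which values actually get split depends on the target's DImode patterns and on the costs lower-subreg computes; this is a sketch for inspection, not a measured result.

```c
#include <stdint.h>

/* Hypothetical probe: only a copy plus bitwise AND, XOR and NOT.  Each
   32-bit word of the result depends only on the same word of the inputs,
   so these values are plausible candidates for word-by-word decomposition.  */
uint64_t probe_bitwise (uint64_t a, uint64_t b)
{
  uint64_t t = a;            /* plain double-word move */
  return (t & b) ^ ~b;       /* AND, XOR, NOT all work per word */
}

/* Hypothetical probe: addition carries from the low word into the high
   word.  If the target implements this as a single DImode insn, the
   operands are used in a non-decomposable operation and may stay whole.  */
uint64_t probe_add (uint64_t a, uint64_t b)
{
  return a + b;
}
```

For example, probe_add (0x00000000ffffffff, 1) yields 0x0000000100000000: the carry out of the low word changes the high word, which is the kind of cross-word dependency that a plain per-word split cannot express.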