> From: "Roger Sayle" <ro...@nextmovesoftware.com> > Date: Tue, 2 May 2023 00:37:14 +0100
> Jeff Law wrote: > > This patch converts the xstormy16 patch to LRA. It introduces a code > > quality regression in the shiftsi testcase, but it also fixes numerous > > aborts/errors. IMHO it's a good tradeoff. > > I've investigated the shiftsi regression on xstormy16 and the underlying > cause > appears to be an interaction between lower-subreg's "subreg3" pass and the > new LRA. Previously, reload was not phased by the "clobbers" that are > introduced by the decompose_multiword_subregs function, but they appear > to interfere with LRA's register assignments. > > combine's make_extra_copies introduces a new pseudo-to-pseudo move, > but when subreg3 inserts a naked clobber between the original and the > new move, LRA is recombine theses pseudos back to the same allocno. > > The shiftsi.cc regression on xstormy16 is fixed by adding > -fno-split-wide-types. > In fact, if all the regression tests pass, I'd suggest that > flag_split_wide-types = false > should be the default on xstormy16 now that we've moved to LRA. And if this > works for xstormy16, it might be useful to other targets for the LRA > transition; > it's a difference in behaviour between reload and LRA that could potentially > affect multiple targets. > > For reference, xstormy16 has a post-reload define_insn_and_split for movsi > (i.e. a multi-word move). If this insn was split during split1 (i.e. before > subreg3) > there wouldn't be a problem (no clobber), but alas the target's > xstormy16_split_move > function has several asserts insisting this only get called when > reload_completed. > > I hope this is useful. > Cheers, > Roger Yes, very interesting. Thank you for sharing this. I've seen regressions with LRA for CRIS too, for "double-register-sized" types, which for CRIS, a 32-bit target, translates to 64-bit types (DFmode and DImode), and where LRA does a much worse job than reload; spills a lot more often to stack, even after trying every register-allocation-related hook I found (and also an LRA patch which helped only by a fraction, but regressed results on x86_64-linux, so let's quickly forget it again). No fix or nicely stated bug entry yet, but at least a different observation: Coremark for cris-elf built with -O2 -march=v10, when going from reload to LRA is slightly faster but a bit bigger (for example before/after Jeffs r14-383-gfaf8bea79b6256, 5090593 to 5090567 cycles and 48887 to 48901 bytes), a relative observation which has not changed much since February when I started working on an LRA transition for CRIS. But, the case for code with heavy use of "double-register- sized" types is much worse; up to several percent slower. My favorite sharable example is gcc/testsuite/gcc.c-torture/execute/arith-rand-ll.c (with a few unimportant local tweaks not suitable for upstreaming but which I'm happy to share with anyone asking) which around that commit goes from 1295021 to 1317531 cycles (101.74%) and one percent larger; 4008 to 4048 bytes. Your suggestion to default to -fno-split-wide-types seemed too good to be true, and though worth a try, unfortunately it was. I'm seeing *horrible* regressions for double-register codes with the patch below on top of LRA. Coremark numbers suffer too (different baseline here than above; closer to today's sources) from 5078989 to 5081968 cycles and from 48537 to 50145 bytes. But, arith-rand-ll suffers much more: from 1317530 to 2182080 cycles (yes, 165.62%) and from 4044 to 4174 bytes. (With reload, it's bad too, but "only" regressing 143.67% by speed.) Next, I'll turn around completely, and try defaulting to -fsplit-wide-types-early, which sounds more promising. :) I don't like throwing defaults around randomly, but trying out a promising idea this way is easy. So because of the numbers above, this will never be committed, just passed for reference. I believe this is the correct way to default to -fno-split-wide-types: -- >8 -- [PATCH] CRIS: Default to -fno-split-wide-types * common/config/cris/cris-common.cc (cris_option_optimization_table): New. Default to -fno-split-wide-types. (TARGET_OPTION_OPTIMIZATION_TABLE): Define. --- gcc/common/config/cris/cris-common.cc | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/gcc/common/config/cris/cris-common.cc b/gcc/common/config/cris/cris-common.cc index b08d6014102d..cf00c1414651 100644 --- a/gcc/common/config/cris/cris-common.cc +++ b/gcc/common/config/cris/cris-common.cc @@ -26,6 +26,14 @@ along with GCC; see the file COPYING3. If not see #include "opts.h" #include "flags.h" +/* Implement TARGET_OPTION_OPTIMIZATION_TABLE. */ + +static const struct default_options cris_option_optimization_table[] = + { + { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 0 }, + { OPT_LEVELS_NONE, 0, NULL, 0 } + }; + /* TARGET_HANDLE_OPTION worker. We just store the values into local variables here. Checks for correct semantics are in cris_option_override. */ @@ -90,5 +98,7 @@ cris_handle_option (struct gcc_options *opts, #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT | CRIS_SUBTARGET_DEFAULT) #undef TARGET_HANDLE_OPTION #define TARGET_HANDLE_OPTION cris_handle_option +#undef TARGET_OPTION_OPTIMIZATION_TABLE +#define TARGET_OPTION_OPTIMIZATION_TABLE cris_option_optimization_table struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; -- 2.30.2