On 22/07/2020 13:24, Richard Biener via Gcc-patches wrote:
> On Wed, Jul 22, 2020 at 12:03 PM Andrea Corallo <andrea.cora...@arm.com>
> wrote:
>>
>> Hi all,
>>
>> I'd like to submit the following two patches implementing a new AArch64
>> specific back-end pass that helps optimize branch-dense code, which can
>> be a bottleneck for performance on some Arm cores. This is achieved by
>> padding out the branch-dense sections of the instruction stream with
>> nops.
>>
>> The original patch was already posted some time ago:
>>
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg200721.html
>>
>> This follows up splitting as suggested in two patches, rebasing on
>> master and implementing the suggestions of the first code review.
>>
>> This first patch implements the addition of a new RTX instruction class
>> FILLER_INSN, which has been white listed to allow placement of NOPs
>> outside of a basic block. This is to allow padding after unconditional
>> branches. This is favorable so that any performance gained from
>> diluting branches is not paid straight back via excessive eating of
>> nops.
>>
>> It was deemed that a new RTX class was less invasive than modifying
>> behavior in regards to standard UNSPEC nops.
>>
>> 1/2 is requirement for 2/2. Please see this the cover letter of this last
>> for more details on the pass itself.
>
> I wonder if such effect of instructions on the pipeline can be modeled
> in the DFA and thus whether the scheduler could issue (always ready)
> NOPs?
>
> I also wonder whether such optimization is better suited for the assembler
> which should know instruction lengths and alignment in a more precise
> way and also would know whether extra nops make immediates too large
> for pc relative things like short branches or section anchor accesses
> (or whatever else)?

No, the assembler should never spontaneously insert instructions. That
breaks the branch range calculations that the compiler relies upon.

R.

>
> Richard.
>
>> Regards
>>
>> Andrea
>>
>> gcc/ChangeLog
>>
>> 2020-07-17 Andrea Corallo <andrea.cora...@arm.com>
>> Carey Williams <carey.willi...@arm.com>
>>
>> * cfgbuild.c (inside_basic_block_p): Handle FILLER_INSN.
>> * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
>> basic blocks.
>> * coretypes.h: New rtx class.
>> * emit-rtl.c (emit_filler_after): New function.
>> * rtl.def (FILLER_INSN): New rtl define.
>> * rtl.h (rtx_filler_insn): Define new structure.
>> (FILLER_INSN_P): New macro.
>> (is_a_helper <rtx_filler_insn *>::test): New test helper for
>> rtx_filler_insn.
>> (emit_filler_after): New extern.
>> * target-insns.def: Add target insn definition.
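
To make the branch-range point in the reply above concrete, here is a
minimal sketch in C. It is not GCC code: the helper names
branch_in_range and disp_after_padding are made up for illustration.
The only facts assumed are from the A64 encoding: instructions are 4
bytes, TBZ/TBNZ carry a 14-bit signed immediate scaled by 4 (roughly
+/-32KiB), and B.cond/CBZ carry a 19-bit one (roughly +/-1MiB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): return true if a byte
   displacement fits a PC-relative branch whose signed immediate is
   IMM_BITS wide and scaled by 4, as in the A64 encodings.  */
static bool
branch_in_range (int64_t disp_bytes, unsigned imm_bits)
{
  assert (disp_bytes % 4 == 0);	/* A64 instructions are 4 bytes.  */
  /* Signed IMM_BITS-bit immediate, times 4: [-limit, limit) bytes.  */
  int64_t limit = (int64_t) 1 << (imm_bits - 1 + 2);
  return disp_bytes >= -limit && disp_bytes < limit;
}

/* Hypothetical helper: displacement of a forward branch after N_NOPS
   4-byte nops have been inserted between the branch and its target.  */
static int64_t
disp_after_padding (int64_t disp_bytes, unsigned n_nops)
{
  return disp_bytes + 4 * (int64_t) n_nops;
}
```

With these, branch_in_range (32764, 14) holds, but
branch_in_range (disp_after_padding (32764, 1), 14) does not: one nop
added behind the compiler's back pushes a TBZ-style branch out of
range, while the same displacement would still fit the 19-bit form.
If the padding is done in the compiler, the insn lengths are known when
the branch form is chosen.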