On 22/07/2020 13:24, Richard Biener via Gcc-patches wrote:
> On Wed, Jul 22, 2020 at 12:03 PM Andrea Corallo <andrea.cora...@arm.com> wrote:
>>
>> Hi all,
>>
>> I'd like to submit the following two patches implementing a new AArch64
>> specific back-end pass that helps optimize branch-dense code, which can
>> be a bottleneck for performance on some Arm cores.  This is achieved by
>> padding out the branch-dense sections of the instruction stream with
>> nops.
>>
>> The original patch was already posted some time ago:
>>
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg200721.html
>>
>> This follow-up splits the work into two patches as suggested, rebases
>> on master, and implements the suggestions from the first code review.
>>
>> This first patch adds a new RTX instruction class, FILLER_INSN, which
>> has been whitelisted to allow placement of NOPs outside of a basic
>> block.  This allows padding after unconditional branches, where the
>> nops are not on an executed path.  That is favorable because any
>> performance gained from diluting branches is not paid straight back
>> by executing the extra nops.
>>
>> It was deemed that a new RTX class was less invasive than modifying
>> the behavior of standard UNSPEC nops.
>>
>> 1/2 is a requirement for 2/2.  Please see the cover letter of the latter
>> for more details on the pass itself.
> 
> I wonder if such an effect of instructions on the pipeline can be modeled
> in the DFA and thus whether the scheduler could issue (always ready)
> NOPs?
> 
> I also wonder whether such optimization is better suited for the assembler
> which should know instruction lengths and alignment in a more precise
> way and also would know whether extra nops make immediates too large
> for pc relative things like short branches or section anchor accesses
> (or whatever else)?

No, the assembler should never spontaneously insert instructions.  That
breaks the branch range calculations that the compiler relies upon.

R.

> 
> Richard.
> 
>> Regards
>>
>>   Andrea
>>
>> gcc/ChangeLog
>>
>> 2020-07-17  Andrea Corallo  <andrea.cora...@arm.com>
>>             Carey Williams  <carey.willi...@arm.com>
>>
>>         * cfgbuild.c (inside_basic_block_p): Handle FILLER_INSN.
>>         * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
>>         basic blocks.
>>         * coretypes.h: New rtx class.
>>         * emit-rtl.c (emit_filler_after): New function.
>>         * rtl.def (FILLER_INSN): New rtl define.
>>         * rtl.h (rtx_filler_insn): Define new structure.
>>         (FILLER_INSN_P): New macro.
>>         (is_a_helper <rtx_filler_insn *>::test): New test helper for
>>         rtx_filler_insn.
>>         (emit_filler_after): New extern.
>>         * target-insns.def: Add target insn definition.
