Re: [PATCH 2/2] Aarch64: Add branch diluter pass

Andrea Corallo Fri, 24 Jul 2020 00:02:18 -0700

Segher Boessenkool <seg...@kernel.crashing.org> writes:

> On Wed, Jul 22, 2020 at 09:45:08PM +0200, Andrea Corallo wrote:
>> > Should that actually be a sliding window, or should there actually just
>> > not be more than N branches per aligned block of machine code?  Like,
>> > per fetch group.
>> >
>> > Can you not use ASM_OUTPUT_ALIGN_WITH_NOP (or ASM_OUTPUT_MAX_SKIP_ALIGN
>> > even) then?  GCC has infrastructure for that, already.
>> 
>> Correct, it's a sliding window only because the real load address is not
>> known to the compiler and the algorithm is conservative.  I believe we
>> could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each function to (at
>> least) the granule size, then we should be able to insert 'nop aligned
>> labels' precisely.
>
> Yeah, we have similar issues on Power...  Our "granule" (fetch group
> size, in our terminology) is 32 typically, but we align functions to
> just 16.  This is causing some problems, but aligning to bigger
> boundaries isn't a very happy alternative either.  WIP...


Interesting, I was expecting other CPUs to have a similar mechanism.
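
To make the difference between the two counting schemes concrete, here is a
minimal sketch (not the code from the patch).  The function names and the
example branch offsets are made up for illustration, and it assumes
fixed-width 4-byte instructions.  It compares the conservative sliding-window
count with the per-aligned-block count for a given load offset:

```python
def max_branches_sliding(branch_offsets, granule):
    """Max number of branches whose start addresses fit in any window of
    `granule` bytes, whatever the window's start address is."""
    offs = sorted(branch_offsets)
    best = 0
    lo = 0
    for hi, off in enumerate(offs):
        # Advance the left edge until the window spans less than `granule` bytes.
        while off - offs[lo] >= granule:
            lo += 1
        best = max(best, hi - lo + 1)
    return best


def max_branches_aligned(branch_offsets, granule, load_offset):
    """Max number of branches in any granule-aligned block, assuming the
    function starts `load_offset` bytes past a granule boundary."""
    counts = {}
    for off in branch_offsets:
        block = (load_offset + off) // granule
        counts[block] = counts.get(block, 0) + 1
    return max(counts.values(), default=0)


# Hypothetical function: four back-to-back branches at these byte offsets.
branches = [24, 28, 32, 36]

print(max_branches_sliding(branches, 32))      # alignment-independent bound
print(max_branches_aligned(branches, 32, 0))   # function starts on a 32-byte boundary
print(max_branches_aligned(branches, 32, 16))  # function is only 16-byte aligned
```

With a 32-byte granule and functions aligned to just 16, the per-block count
depends on which of the two possible start offsets the function ends up with,
while the sliding-window count is an upper bound for both.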

> (We don't have this exact same problem, because our non-ancient cores
> can just predict *all* branches in the same cycle).
>
>> My main fear is that, given new cores tend to have big granules, code
>> size would blow up.  One advantage of the implemented algorithm is that,
>> even if slightly conservative, it impacts code size only where a high
>> branch density shows up.
>
> What is "big granules" for you?

On N1 the granule is 8 instructions, so 32 bytes as well.  I guess this
may grow further (my speculation).
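
For reference, the arithmetic behind the granule size and the code-size
concern can be sketched as below.  This is only an illustration: the helper
names are hypothetical, and it assumes 4-byte AArch64 instructions and that
the padding is inserted right after the end of the previous function.

```python
INSN_BYTES = 4  # AArch64 instructions are fixed-width, 4 bytes each


def granule_bytes(insns_per_granule):
    """Granule size in bytes for a granule of `insns_per_granule` instructions."""
    return insns_per_granule * INSN_BYTES


def alignment_padding(end_address, alignment):
    """Bytes of padding needed after `end_address` to reach the next
    `alignment`-byte boundary (0 if already aligned)."""
    return -end_address % alignment


def worst_case_padding(alignment):
    """Largest padding a single alignment can cost with 4-byte instructions."""
    return alignment - INSN_BYTES


granule = granule_bytes(8)             # 8 instructions per granule
print(granule)                         # granule size in bytes
print(alignment_padding(36, granule))  # previous function ended at byte 36
print(worst_case_padding(granule))     # upper bound per aligned function
```

The worst case grows linearly with the granule size, which is why aligning
every function to a larger granule gets more expensive as granules grow.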

  Andrea
