Re: [PATCH 2/2] Aarch64: Add branch diluter pass

Segher Boessenkool Thu, 23 Jul 2020 15:48:25 -0700

On Wed, Jul 22, 2020 at 09:45:08PM +0200, Andrea Corallo wrote:
> > Should that actually be a sliding window, or should there actually just
> > not be more than N branches per aligned block of machine code?  Like,
> > per fetch group.
> >
> > Can you not use ASM_OUTPUT_ALIGN_WITH_NOP (or ASM_OUTPUT_MAX_SKIP_ALIGN
> > even) then?  GCC has infrastructure for that, already.
> 
> Correct, it's a sliding window only because the real load address is not
> known to the compiler and the algorithm is conservative.  I believe we
> could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each function to (at
> least) the granule size, then we should be able to insert 'nop aligned
> labels' precisely.


Yeah, we have similar issues on Power...  Our "granule" (fetch group
size, in our terminology) is 32 typically, but we align functions to
just 16.  This is causing some problems, but aligning to bigger
boundaries isn't a very happy alternative either.  WIP...

(We don't have this exact same problem, because our non-ancient cores
can just predict *all* branches in the same cycle).
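
A minimal sketch of the alignment point above, not code from the patch.
It models why a function aligned to less than the granule forces a
conservative branch count: the compiler cannot tell at which offset
inside a granule the function starts.  The values 32 and 16 come from
the text above; treating them as bytes, the 4-byte instruction size,
the branch offsets, and all function names are assumptions made only
for illustration.

```python
def possible_offsets(func_align, granule):
    """Offsets within a granule at which a function start may land."""
    if granule % func_align != 0:
        raise ValueError("granule must be a multiple of the function alignment")
    return list(range(0, granule, func_align))


def max_branches_per_granule(branch_addrs, granule, start_offset):
    """Highest branch count in any aligned granule, for one start offset.

    branch_addrs are byte offsets relative to the function start.
    """
    counts = {}
    for addr in branch_addrs:
        block = (start_offset + addr) // granule
        counts[block] = counts.get(block, 0) + 1
    return max(counts.values(), default=0)


def worst_case(branch_addrs, func_align, granule):
    """Branch density the compiler must assume, over all possible placements."""
    return max(
        max_branches_per_granule(branch_addrs, granule, off)
        for off in possible_offsets(func_align, granule)
    )


# Hypothetical function with four consecutive 4-byte branches.
branches = [24, 28, 32, 36]

print(worst_case(branches, func_align=16, granule=32))  # 4
print(worst_case(branches, func_align=32, granule=32))  # 2
```

With functions aligned to the granule, the only possible placement
splits these branches two and two.  With 16-byte alignment the function
may also start mid-granule, where all four land in one granule, so that
case has to be assumed.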

> My main fear is that, given new cores tend to have big granules, code
> size would blow up.  One advantage of the implemented algorithm is that,
> even if slightly conservative, it impacts code size only where a high
> branch density shows up.

What is "big granules" for you?


Segher
