Segher Boessenkool <seg...@kernel.crashing.org> writes:

> On Wed, Jul 22, 2020 at 09:45:08PM +0200, Andrea Corallo wrote:
>> > Should that actually be a sliding window, or should there actually just
>> > not be more than N branches per aligned block of machine code? Like,
>> > per fetch group.
>> >
>> > Can you not use ASM_OUTPUT_ALIGN_WITH_NOP (or ASM_OUTPUT_MAX_SKIP_ALIGN
>> > even) then? GCC has infrastructure for that, already.
>>
>> Correct, it's a sliding window only because the real load address is not
>> known to the compiler and the algorithm is conservative. I believe we
>> could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each function to (at
>> least) the granule size, then we should be able to insert 'nop aligned
>> labels' precisely.
>
> Yeah, we have similar issues on Power... Our "granule" (fetch group
> size, in our terminology) is 32 typically, but we align functions to
> just 16. This is causing some problems, but aligning to bigger
> boundaries isn't a very happy alternative either. WIP...

Interesting, I was expecting other CPUs to have a similar mechanism.

> (We don't have this exact same problem, because our non-ancient cores
> can just predict *all* branches in the same cycle).
>
>> My main fear is that given new cores tend to have big granules code size
>> would blow up. One advantage of the implemented algorithm is that even if
>> slightly conservative it's impacting code size only where a high branch
>> density shows up.
>
> What is "big granules" for you?

N1 is 8 instructions, so 32 bytes as well. I guess this may grow further
(my speculation).

  Andrea
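
To make the sliding-window versus aligned-block distinction above concrete, here is a minimal sketch in C. It is not the code from the patch: the function names are hypothetical, the branch limit MAX_BRANCHES is an assumed value (the thread only says "N"), and the granule is the 8 fixed-size 4-byte instructions quoted for N1.

```c
#include <stdbool.h>
#include <stddef.h>

/* GRANULE_INSNS and INSN_BYTES follow the N1 figures in the thread
   (8 instructions, 32 bytes).  MAX_BRANCHES is an assumed limit,
   chosen only for illustration.  */
enum { GRANULE_INSNS = 8, INSN_BYTES = 4, MAX_BRANCHES = 4 };

/* Conservative check: the load address is unknown, so every run of
   GRANULE_INSNS consecutive instructions must respect the limit.  */
static bool sliding_window_ok(const bool *is_branch, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += is_branch[i];
        /* Drop the instruction that just left the window.  */
        if (i >= GRANULE_INSNS)
            count -= is_branch[i - GRANULE_INSNS];
        if (count > MAX_BRANCHES)
            return false;
    }
    return true;
}

/* Precise check, valid only under the assumption that instruction 0
   sits on a granule boundary (function aligned to the granule size).  */
static bool aligned_blocks_ok(const bool *is_branch, size_t n)
{
    for (size_t start = 0; start < n; start += GRANULE_INSNS) {
        size_t count = 0;
        for (size_t i = start; i < n && i < start + GRANULE_INSNS; i++)
            count += is_branch[i];
        if (count > MAX_BRANCHES)
            return false;
    }
    return true;
}
```

With branches at instructions 5 to 9 of a 16-instruction sequence, the sliding-window check rejects the sequence (five branches fall inside one 8-instruction window), while the aligned check accepts it, since the branches split 3 + 2 across the granule boundary. That gap is the padding the conservative approach pays when the function start is not known to be granule-aligned.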