Segher Boessenkool <seg...@kernel.crashing.org> writes: > Hi!
Hi Segher, > [ Btw, the mailing list archive will not show your attachments (just lets > me download them); naming the files *.txt probably works, but you can > also use a sane mime type (like, text/plain) ]. [ Sure can do it np, I'm just less sure if text/x-diff is such and insane mime type for a mailing list software :) ] > On Wed, Jul 22, 2020 at 12:09:08PM +0200, Andrea Corallo wrote: >> this second patch implements the AArch64 specific back-end pass >> 'branch-dilution' controllable by the followings command line options: >> >> -mbranch-dilution >> >> --param=aarch64-branch-dilution-granularity={num} >> >> --param=aarch64-branch-dilution-max-branches={num} > > That sounds like something that would be useful generically, even? > Targets that do not want to use it can just skip it (that probably should > be the default then), via setting the granularity to 0 for example. > >> Observed performance improvements on Neoverse N1 SPEC CPU 2006 where >> up to ~+3% (xalancbmk) and ~+1.5% (sjeng). Average code size increase >> for all the testsuite proved to be ~0.4%. > > Can you share a geomean improvement as well? Also something like 0.4% > is sounds like, or is it more? After my first measure I was suggestted by a colleague a less noisy system to benchmark on and a more reproducable methodology. I repeated the tests on N1 with the following results: | Benchmark | Est. Peak Rate ration | | | diluted / baseline | |----------------+-----------------------| | 400.perlbench | 1.018x | | 401.bzip2 | 1.004x | | 403.gcc | 0.987x | | 429.mcf | 1.000x | | 445.gobmk | 0.998x | | 456.hmmer | 1.000x | | 458.sjeng | 1.008x | | 462.libquantum | 1.014x | | 464.h264ref | 1.004x | | 471.omnetpp | 1.017x | | 473.astar | 1.007x | | 483.xalancbmk | 0.998x | I was explained xalanc tend to be very noisy being memory bound so this explains the difference, not sure why sjeng looks less good. The overall ratio comparing spec rates is +~0.44%. >> - Unconditional branches must be the end of a basic block, and nops >> cannot be outside of a basic block. Thus the need for FILLER_INSN, >> which allows placement outside of a basic block > > But the new rtx class is recognised in just one location, so it could > recognise it on any other characteristic easily. Make sense, unless in the future this changes, but probably is not worth taking this in account now. >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/branch-dilution-off.c >> @@ -0,0 +1,57 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O1 -mcpu=cortex-a72 --param case-values-threshold=50" } */ >> +/* { dg-final { scan-assembler-not "\\s*b.*\n\\snop\n" } } */ > > You can write this simpler as > > /* { dg-final { scan-assembler-not {\s*b.*\n\snop\n} } } */ > > which shows a problem in this more clearly: . will match newlines as > well. Also, starting a RE with (anything)* does nothing, "anything" is > allowed to match 0 times after all. You probably meant the "b" should > start a mnemonic? > > /* { dg-final { scan-assembler-not {(?n)\mb.*\n\snop\n} } } */ > > (\m is a zero-width start-of-word match, like \< in grep; (?n) means . > does not match newlines (if you know Perl, it turns /m on and /s off -- > the opposite of the defaults for Tcl). > > (or you could do [^\n]* or even just \S* , no (?n) needed then). > > > Segher Thanks for the feedback! Andrea