Re: [PATCH 2/2] Aarch64: Add branch diluter pass

Andrea Corallo Tue, 28 Jul 2020 11:56:21 -0700

Segher Boessenkool <seg...@kernel.crashing.org> writes:

> Hi!


Hi Segher,

> [ Btw, the mailing list archive will not show your attachments (just lets
> me download them); naming the files *.txt probably works, but you can
> also use a sane mime type (like, text/plain) ].

[ Sure can do it np, I'm just less sure if text/x-diff is such and
insane mime type for a mailing list software :) ]

> On Wed, Jul 22, 2020 at 12:09:08PM +0200, Andrea Corallo wrote:
>> this second patch implements the AArch64 specific back-end pass
>> 'branch-dilution' controllable by the followings command line options:
>> 
>> -mbranch-dilution
>> 
>> --param=aarch64-branch-dilution-granularity={num}
>> 
>> --param=aarch64-branch-dilution-max-branches={num}
>
> That sounds like something that would be useful generically, even?
> Targets that do not want to use it can just skip it (that probably should
> be the default then), via setting the granularity to 0 for example.
> 
>> Observed performance improvements on Neoverse N1 SPEC CPU 2006 where
>> up to ~+3% (xalancbmk) and ~+1.5% (sjeng).  Average code size increase
>> for all the testsuite proved to be ~0.4%.
>
> Can you share a geomean improvement as well?  Also something like 0.4%
> is sounds like, or is it more?

After my first measure I was suggestted by a colleague a less noisy
system to benchmark on and a more reproducable methodology.  I repeated
the tests on N1 with the following results:

| Benchmark      | Est. Peak Rate ration |
|                |    diluted / baseline |
|----------------+-----------------------|
| 400.perlbench  |                1.018x |
| 401.bzip2      |                1.004x |
| 403.gcc        |                0.987x |
| 429.mcf        |                1.000x |
| 445.gobmk      |                0.998x |
| 456.hmmer      |                1.000x |
| 458.sjeng      |                1.008x |
| 462.libquantum |                1.014x |
| 464.h264ref    |                1.004x |
| 471.omnetpp    |                1.017x |
| 473.astar      |                1.007x |
| 483.xalancbmk  |                0.998x |

I was explained xalanc tend to be very noisy being memory bound so this
explains the difference, not sure why sjeng looks less good.  The
overall ratio comparing spec rates is +~0.44%.

>> - Unconditional branches must be the end of a basic block, and nops
>>   cannot be outside of a basic block.  Thus the need for FILLER_INSN,
>>   which allows placement outside of a basic block
>
> But the new rtx class is recognised in just one location, so it could
> recognise it on any other characteristic easily.

Make sense, unless in the future this changes, but probably is not worth
taking this in account now.

>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/branch-dilution-off.c
>> @@ -0,0 +1,57 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -mcpu=cortex-a72 --param case-values-threshold=50" } */
>> +/* { dg-final { scan-assembler-not  "\\s*b.*\n\\snop\n" } } */
>
> You can write this simpler as
>
> /* { dg-final { scan-assembler-not {\s*b.*\n\snop\n} } } */
>
> which shows a problem in this more clearly: . will match newlines as
> well.  Also, starting a RE with (anything)* does nothing, "anything" is
> allowed to match 0 times after all.  You probably meant the "b" should
> start a mnemonic?
>
> /* { dg-final { scan-assembler-not {(?n)\mb.*\n\snop\n} } } */
>
> (\m is a zero-width start-of-word match, like \< in grep; (?n) means .
> does not match newlines (if you know Perl, it turns /m on and /s off --
> the opposite of the defaults for Tcl).
>
> (or you could do [^\n]* or even just \S* , no (?n) needed then).
>
>
> Segher

Thanks for the feedback!

  Andrea

Re: [PATCH 2/2] Aarch64: Add branch diluter pass

Reply via email to