Segher Boessenkool <[email protected]> writes:
> Hi!
Hi Segher,
> [ Btw, the mailing list archive will not show your attachments (just lets
> me download them); naming the files *.txt probably works, but you can
> also use a sane mime type (like, text/plain) ].
[ Sure can do it np, I'm just less sure if text/x-diff is such and
insane mime type for a mailing list software :) ]
> On Wed, Jul 22, 2020 at 12:09:08PM +0200, Andrea Corallo wrote:
>> this second patch implements the AArch64 specific back-end pass
>> 'branch-dilution' controllable by the followings command line options:
>>
>> -mbranch-dilution
>>
>> --param=aarch64-branch-dilution-granularity={num}
>>
>> --param=aarch64-branch-dilution-max-branches={num}
>
> That sounds like something that would be useful generically, even?
> Targets that do not want to use it can just skip it (that probably should
> be the default then), via setting the granularity to 0 for example.
>
>> Observed performance improvements on Neoverse N1 SPEC CPU 2006 where
>> up to ~+3% (xalancbmk) and ~+1.5% (sjeng). Average code size increase
>> for all the testsuite proved to be ~0.4%.
>
> Can you share a geomean improvement as well? Also something like 0.4%
> is sounds like, or is it more?
After my first measure I was suggestted by a colleague a less noisy
system to benchmark on and a more reproducable methodology. I repeated
the tests on N1 with the following results:
| Benchmark | Est. Peak Rate ration |
| | diluted / baseline |
|----------------+-----------------------|
| 400.perlbench | 1.018x |
| 401.bzip2 | 1.004x |
| 403.gcc | 0.987x |
| 429.mcf | 1.000x |
| 445.gobmk | 0.998x |
| 456.hmmer | 1.000x |
| 458.sjeng | 1.008x |
| 462.libquantum | 1.014x |
| 464.h264ref | 1.004x |
| 471.omnetpp | 1.017x |
| 473.astar | 1.007x |
| 483.xalancbmk | 0.998x |
I was explained xalanc tend to be very noisy being memory bound so this
explains the difference, not sure why sjeng looks less good. The
overall ratio comparing spec rates is +~0.44%.
>> - Unconditional branches must be the end of a basic block, and nops
>> cannot be outside of a basic block. Thus the need for FILLER_INSN,
>> which allows placement outside of a basic block
>
> But the new rtx class is recognised in just one location, so it could
> recognise it on any other characteristic easily.
Make sense, unless in the future this changes, but probably is not worth
taking this in account now.
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/branch-dilution-off.c
>> @@ -0,0 +1,57 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -mcpu=cortex-a72 --param case-values-threshold=50" } */
>> +/* { dg-final { scan-assembler-not "\\s*b.*\n\\snop\n" } } */
>
> You can write this simpler as
>
> /* { dg-final { scan-assembler-not {\s*b.*\n\snop\n} } } */
>
> which shows a problem in this more clearly: . will match newlines as
> well. Also, starting a RE with (anything)* does nothing, "anything" is
> allowed to match 0 times after all. You probably meant the "b" should
> start a mnemonic?
>
> /* { dg-final { scan-assembler-not {(?n)\mb.*\n\snop\n} } } */
>
> (\m is a zero-width start-of-word match, like \< in grep; (?n) means .
> does not match newlines (if you know Perl, it turns /m on and /s off --
> the opposite of the defaults for Tcl).
>
> (or you could do [^\n]* or even just \S* , no (?n) needed then).
>
>
> Segher
Thanks for the feedback!
Andrea