https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #28 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
(In reply to Ilya Leoshkevich from comment #27)
> With
> 
> -DSPEC_CPU -DNDEBUG -DPERL_CORE   -O3 -save-temps=obj
> -fopt-info-vec-optimized       -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64
> -fgnu89-inline
> 
> on gcc113 I can see 2% slowdown:
> 
> r277511 (without this fix): 880.09s
> r277515 (with this fix):    897.85s
> 
> The function that degraded the most is indeed S_regmatch:
> 
> $ perf diff perf-9760321.data perf-44b2b4c.data
>     32.24%           exe                [.] S_regmatch
>      8.92%           exe                [.] S_find_byclass.isra.0
>      6.80%   +0.28%  libc-2.19.so       [.] 0x000000000007dec0
>      5.20%           exe                [.] S_regtry
> 
> However, the "shape" of S_regmatch did not change, that is, when all
> offsets and register numbers are replaced with "x" in the objdump
> output, the old and the new versions are identical.  This hints at some
> microarchitectural effect - aliasing in the branch predictor maybe?
> 
> From my perspective, this happens too often, so I use the following test
> to rule this out: just add a nop at the beginning of the problematic
> function. This changes all the offsets and makes the aliasing situation
> completely different.  And indeed, by adding a single nop to S_regmatch,
> I get wildly different results (for now this is just 1 repeat, I will
> run best-of-3 overnight):
> 
> r277511 (without this fix): 929.1s
> r277515 (with this fix):    931.48s

Hi Ilya,

Thanks for the analysis.  It doesn't seem like we can do anything useful about
this regression.
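
For reference, the timings quoted above can be turned into relative slowdowns
with a few lines of Python.  This is only an arithmetic sketch: the helper name
slowdown_percent is made up for illustration, and the inputs are the
single-run numbers from comment #27, so run-to-run noise is not accounted for.

```python
def slowdown_percent(before_s, after_s):
    """Relative slowdown of after_s versus before_s, in percent."""
    return (after_s - before_s) / before_s * 100.0

# Timings quoted from comment #27, in seconds.
plain = slowdown_percent(880.09, 897.85)     # r277511 vs r277515, unmodified
with_nop = slowdown_percent(929.1, 931.48)   # same revisions, nop in S_regmatch

print(f"plain:    {plain:.2f}%")     # prints "plain:    2.02%"
print(f"with nop: {with_nop:.2f}%")  # prints "with nop: 0.26%"
```

With the nop added, the gap between the two revisions shrinks from about 2%
to about 0.3%, although both runs are slower in absolute terms.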

[For completeness, I see the same 5% slowdown with "-O3 -funroll-loops" as with
plain -O3 on Cortex-A57.]
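
The "shape" comparison described in comment #27 could be scripted roughly as
follows.  This is a sketch under assumptions, not the script that was actually
used: it assumes x86-64 AT&T-syntax lines as printed by
"objdump -d --no-show-raw-insn", it replaces whole register names rather than
only register numbers, and the regular expressions, function names and sample
lines are all hypothetical.

```python
import re

# Hypothetical patterns for x86-64 AT&T-syntax disassembly lines.
ADDR_PREFIX = re.compile(r"^\s*[0-9a-f]+:\s*")   # leading "  4004d6:<tab>"
TARGET = re.compile(r"\b[0-9a-f]+ <[^>]+>")      # "4004f0 <S_regmatch+0x1a>"
REGISTER = re.compile(r"%[a-z][a-z0-9]*")        # "%rdi", "%r12", "%xmm0"
HEX_NUMBER = re.compile(r"\$?-?0x[0-9a-f]+")     # "0x18", "-0x8", "$0x1"

def normalize(line):
    """Replace addresses, registers and offsets in one line with 'x'."""
    line = ADDR_PREFIX.sub("", line)
    line = TARGET.sub("x", line)
    line = REGISTER.sub("x", line)
    line = HEX_NUMBER.sub("x", line)
    return " ".join(line.split())  # collapse runs of whitespace

def same_shape(old_lines, new_lines):
    """True if both listings are identical after normalization."""
    return [normalize(l) for l in old_lines] == [normalize(l) for l in new_lines]

# Made-up sample lines; only addresses, registers and offsets differ.
old = ["  4004d6:\tmov    0x18(%rdi),%rax",
       "  4004da:\tjne    4004f0 <S_regmatch+0x1a>"]
new = ["  4005e2:\tmov    0x20(%rsi),%rdx",
       "  4005e6:\tjne    400600 <S_regmatch+0x1e>"]

print(normalize(old[0]))     # prints "mov x(x),x"
print(same_shape(old, new))  # prints "True"
```

If same_shape() returns True for the two builds of a function, any timing
difference cannot come from the instruction sequence itself, which is what
points towards layout or alignment effects.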
