On Wed, May 29, 2024 at 7:13 PM 赵海峰 via Gcc <gcc@gcc.gnu.org> wrote:
>
> Dear Sir/Madam,
>
>
> We found that running on intel SPR UnixBench compiled with gcc 10.3 performs 
> worse than with gcc 8.5 for dhry2reg benchmark.
>
>
> I found it related with -fcommon option which is disabled in 10.3 by default. 
> Fcommon will make global variables addresses in special order in bss 
> section(watching by nm -n) whatever they are defined in source code.
>
>
> We are wondering if fcommon has some special performance optimization process?
>
>
> (I also post the subject to gcc-help. Hope to get some suggestion in this 
> mail list. Sorry for bothering.)

This was already filed as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532 . But someone
needs to go in and do more analysis of what is going wrong. The
biggest difference for x86_64 is how the variables are laid out and by
who (the compiler or the linker).  There is some notion that
-fno-common increases the number of L1-dcache-load-misses and that
points to the layout of the variable differences causing the
difference. But nobody has gone and seen which variables are laid out
differently and why. I am suspecting that small changes in the
code/variables would cause layout differences which will cause the
cache misses which can cause the performance which is almost all by
accident.
I suspect adding -fdata-sections will cause another performance
difference here too. And there is not much GCC can do about this since
data layout is "hard" to do to get the best performance always.

Thanks,
Andrew Pinski

>
>
> Best regards.
>
>
> Clark Zhao

Reply via email to