https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87528
Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
It seems that the machine does not like the newly generated calls into
libgcc for popcount. The profile of r262486 (_slow variant) and the
one immediately preceding it (the _fast variant) is:

$ perf report -n --percent-limit=2 | cat
# Overhead       Samples  Command           Shared Object  Symbol
# ........  ............  ...............   .............  .................
#
     6.15%        187930  deepsjeng_r_slow  deepsjeng_r    feval
     5.88%        179434  deepsjeng_r_fast  deepsjeng_r    feval
     5.56%        169734  deepsjeng_r_fast  deepsjeng_r    search
     5.42%        165581  deepsjeng_r_slow  deepsjeng_r    search
     5.19%        158575  deepsjeng_r_slow  deepsjeng_r    ProbeTT
     5.16%        157546  deepsjeng_r_fast  deepsjeng_r    ProbeTT
     4.74%        144696  deepsjeng_r_slow  deepsjeng_r    qsearch
     4.72%        144193  deepsjeng_r_fast  deepsjeng_r    qsearch
     2.76%         84389  deepsjeng_r_slow  libgcc_s.so    __popcountdi2
     2.75%         83936  deepsjeng_r_fast  deepsjeng_r    see
     2.73%         83307  deepsjeng_r_slow  deepsjeng_r    see
     2.67%         81614  deepsjeng_r_slow  deepsjeng_r    order_moves
     2.62%         80077  deepsjeng_r_fast  deepsjeng_r    order_moves
     2.49%         76087  deepsjeng_r_slow  deepsjeng_r    FindFirstRemove
     2.47%         75346  deepsjeng_r_fast  deepsjeng_r    FindFirstRemove
     2.03%         61888  deepsjeng_r_fast  deepsjeng_r    make
     2.03%         61861  deepsjeng_r_slow  deepsjeng_r    make

The profile for r262864 (marked again as _slow below) and its
immediate predecessor (marked _fast) is:

# Overhead       Samples  Command           Shared Object  Symbol
# ........  ............  ...............   .............  .................
#
     5.87%        192681  deepsjeng_r_slow  deepsjeng_r    feval
     5.74%        188254  deepsjeng_r_fast  deepsjeng_r    feval
     5.48%        179850  deepsjeng_r_slow  libgcc_s.so    __popcountdi2
     5.17%        169671  deepsjeng_r_slow  deepsjeng_r    search
     5.04%        165438  deepsjeng_r_fast  deepsjeng_r    search
     4.83%        158368  deepsjeng_r_fast  deepsjeng_r    ProbeTT
     4.82%        158096  deepsjeng_r_slow  deepsjeng_r    ProbeTT
     4.44%        145659  deepsjeng_r_fast  deepsjeng_r    qsearch
     4.39%        144117  deepsjeng_r_slow  deepsjeng_r    qsearch
     2.56%         84085  deepsjeng_r_fast  libgcc_s.so    __popcountdi2
     2.55%         83853  deepsjeng_r_slow  deepsjeng_r    see
     2.55%         83653  deepsjeng_r_fast  deepsjeng_r    see
     2.54%         83383  deepsjeng_r_fast  deepsjeng_r    order_moves
     2.44%         80246  deepsjeng_r_slow  deepsjeng_r    order_moves
     2.31%         75966  deepsjeng_r_fast  deepsjeng_r    FindFirstRemove
     2.30%         75575  deepsjeng_r_slow  deepsjeng_r    FindFirstRemove

Again, let me emphasize this is all about generic march/mtune, native
march/mtune is almost 3% faster than GCC 8.
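
For illustration only, here is a minimal sketch of the kind of code that
can end up calling __popcountdi2. It is not taken from the deepsjeng
sources; the function names count_bits_loop and count_bits_builtin are
hypothetical, and whether the revisions above transform exactly this
loop shape is an assumption. What is certain is that, when the selected
target has no popcount instruction enabled (as with generic x86_64
march/mtune), GCC expands __builtin_popcountll to a call to
__popcountdi2 in libgcc, whereas with -mpopcnt or a suitable
-march=native it can emit a single instruction.

```c
#include <stdint.h>

/* Hypothetical example: count set bits by clearing the lowest set bit
   on every iteration.  The loop runs once per set bit, so its trip
   count equals the population count of b. */
int count_bits_loop(uint64_t b)
{
    int n = 0;
    while (b) {
        b &= b - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* The same result requested explicitly through the GCC builtin.
   Without a hardware popcount instruction enabled for the target,
   this compiles to a call to __popcountdi2 in libgcc. */
int count_bits_builtin(uint64_t b)
{
    return __builtin_popcountll(b);
}
```

Comparing the output of "gcc -O2 -S" with and without -mpopcnt on such
a file should show whether a libgcc call or an inline instruction is
generated for a given compiler revision.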