https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87528

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
It seems that the machine does not like the newly generated calls into
libgcc for popcount.
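
For context, here is a minimal C sketch of the kind of code that can end up calling __popcountdi2. It assumes (this is not stated above) that the calls come from 64-bit population counts, either written with __builtin_popcountll or recognized by the compiler from a bit-clearing loop. The function names are hypothetical and are not taken from deepsjeng. When the target has no popcount instruction enabled, as with generic march/mtune on x86-64, GCC typically lowers the builtin to a libgcc call.

```c
#include <stdint.h>

/* Hypothetical example: count set bits by clearing the lowest set bit
   on each iteration. A compiler may recognize this idiom as a popcount. */
int popcount_loop(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* The same operation expressed through the GCC builtin. */
int popcount_builtin(uint64_t x)
{
    return __builtin_popcountll(x);
}
```

On an x86 target, compiling this with "gcc -O2 -S" and again with -mpopcnt added, then comparing the assembly, should show whether a library call or an instruction is emitted.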

The profiles of r262486 (the _slow variant) and of the revision
immediately preceding it (the _fast variant) are:

$ perf report -n --percent-limit=2 | cat

# Overhead       Samples  Command          Shared Object  Symbol                
# ........  ............  ...............  .............  .................
#
     6.15%        187930  deepsjeng_r_slow  deepsjeng_r   feval
     5.88%        179434  deepsjeng_r_fast  deepsjeng_r   feval
     5.56%        169734  deepsjeng_r_fast  deepsjeng_r   search
     5.42%        165581  deepsjeng_r_slow  deepsjeng_r   search
     5.19%        158575  deepsjeng_r_slow  deepsjeng_r   ProbeTT
     5.16%        157546  deepsjeng_r_fast  deepsjeng_r   ProbeTT
     4.74%        144696  deepsjeng_r_slow  deepsjeng_r   qsearch
     4.72%        144193  deepsjeng_r_fast  deepsjeng_r   qsearch
     2.76%         84389  deepsjeng_r_slow  libgcc_s.so   __popcountdi2
     2.75%         83936  deepsjeng_r_fast  deepsjeng_r   see
     2.73%         83307  deepsjeng_r_slow  deepsjeng_r   see
     2.67%         81614  deepsjeng_r_slow  deepsjeng_r   order_moves
     2.62%         80077  deepsjeng_r_fast  deepsjeng_r   order_moves
     2.49%         76087  deepsjeng_r_slow  deepsjeng_r   FindFirstRemove
     2.47%         75346  deepsjeng_r_fast  deepsjeng_r   FindFirstRemove
     2.03%         61888  deepsjeng_r_fast  deepsjeng_r   make
     2.03%         61861  deepsjeng_r_slow  deepsjeng_r   make


The profiles for r262864 (again marked _slow below) and its
immediate predecessor (marked _fast) are:


# Overhead       Samples  Command          Shared Object  Symbol                
# ........  ............  ...............  .............  .................
#    
     5.87%        192681  deepsjeng_r_slow  deepsjeng_r   feval
     5.74%        188254  deepsjeng_r_fast  deepsjeng_r   feval
     5.48%        179850  deepsjeng_r_slow  libgcc_s.so   __popcountdi2
     5.17%        169671  deepsjeng_r_slow  deepsjeng_r   search
     5.04%        165438  deepsjeng_r_fast  deepsjeng_r   search
     4.83%        158368  deepsjeng_r_fast  deepsjeng_r   ProbeTT
     4.82%        158096  deepsjeng_r_slow  deepsjeng_r   ProbeTT
     4.44%        145659  deepsjeng_r_fast  deepsjeng_r   qsearch
     4.39%        144117  deepsjeng_r_slow  deepsjeng_r   qsearch
     2.56%         84085  deepsjeng_r_fast  libgcc_s.so   __popcountdi2
     2.55%         83853  deepsjeng_r_slow  deepsjeng_r   see
     2.55%         83653  deepsjeng_r_fast  deepsjeng_r   see
     2.54%         83383  deepsjeng_r_fast  deepsjeng_r   order_moves
     2.44%         80246  deepsjeng_r_slow  deepsjeng_r   order_moves
     2.31%         75966  deepsjeng_r_fast  deepsjeng_r   FindFirstRemove
     2.30%         75575  deepsjeng_r_slow  deepsjeng_r   FindFirstRemove

Again, let me emphasize that this is all about generic march/mtune;
with native march/mtune the result is almost 3% faster than with GCC 8.
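
One way to check which case a given build falls into, sketched under the assumption of an x86 target (the report does not name the architecture): GCC defines the __POPCNT__ macro when the popcount instruction is enabled, for example by -mpopcnt or by a -march value that implies it. The function name is hypothetical.

```c
/* Returns 1 if this translation unit was compiled with the x86 POPCNT
   instruction enabled, 0 otherwise (assumption: x86 target, GCC or Clang). */
int popcnt_instruction_enabled(void)
{
#ifdef __POPCNT__
    return 1;
#else
    return 0;
#endif
}
```

With generic march/mtune on x86-64 this would be expected to return 0, which is the configuration where popcount falls back to a libgcc call.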
