https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641

            Bug ID: 113641
           Summary: 510.parest_r with PGO at O2 slower than GCC 12 (7% on
                    Zen 3&2, 4% on CascadeLake) since
                    r13-4272-g8caf155a3d6e23
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux-gnu
            Target: x86_64-linux-gnu

During the development of GCC 13, the run-time of 510.parest_r regressed on
x86_64 when built with profile guided optimization at plain -O2; the benchmark
is now slower with master than with GCC 12.  The difference is not big but
fairly clear cut, about 7.6% on Zen3:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=740.457.0&plot.1=892.457.0&plot.2=694.457.0&;

and about 7.2% on Zen2:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=777.457.0&plot.1=932.457.0&plot.2=687.457.0&;

The graphs above show use of both LTO and PGO, but LTO is not necessary to
reproduce the regression.

I was able to bisect the regression to commit r13-4272-g8caf155a3d6e23 (i386:
Only enable small loop unrolling in backend [PR 107692]).  parest_r is also
about 4% slower when compiled with this revision than with the previous one on
Intel CascadeLake.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
