https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122585

            Bug ID: 122585
           Summary: [16 Regression] 4-5% slowdown of 433.milc on AMD zen4
                    since r16-4469-gd6986e06db5eeb
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1109.70.0

there was a 4-5% execution-time slowdown of the 433.milc benchmark from SPEC
CPU 2006 when compiled with -O2 -march=x86-64-v3 -flto -fprofile-use on an AMD
Zen4 machine. I bisected it to r16-4469-gd6986e06db5eeb.

commit d6986e06db5eeb797344d86bd3ef9b0654606bbd
Author:     Richard Biener
AuthorDate: Fri Oct 17 15:12:11 2025 +0200
Commit:     Richard Biener
CommitDate: Fri Oct 17 16:03:28 2025 +0200

    tree-optimization/122308 - apply LIM after unroll-and-jam

    Just like with loop interchange, unroll-and-jam can leave invariant
    stmts in the inner loop from outer loop stmts in between the two
    inner loop copies.  Do a per-function invariant motion when we
    applied unroll-and-jam.  This avoids failed dataref analysis
    and fallback to gather/scatter during vectorization.

            PR tree-optimization/122308
            * gimple-loop-jam.cc (tree_loop_unroll_and_jam): Do LIM
            after applying unroll-and-jam.

            * gcc.dg/vect/vect-pr122308.c: New testcase.
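
The commit message describes a specific mechanism. When the outer loop is unrolled by 2 and the two inner-loop copies are fused, an outer-loop statement for iteration i + 1 that sat between the copies ends up inside the fused inner loop, even though it is invariant there. The sketch below illustrates this at the source level. It is a hand-written illustration, not GCC's actual GIMPLE output, and it may not match what vect-pr122308.c tests. It assumes the outer-loop statement is a row-pointer load. The sizes N and M and all function names (nest_original, nest_jammed_no_lim, nest_jammed_lim, run_variant, versions_agree) are hypothetical. In the no-LIM form, the base of r1[j] is loaded inside the j loop. That is plausibly the kind of access that defeats dataref analysis and leads to the gather/scatter fallback the commit mentions.

```c
#include <assert.h>
#include <string.h>

#define N 4   /* hypothetical outer trip count */
#define M 8   /* hypothetical inner trip count */

/* This sketch unrolls by 2 with no epilogue loop, so N must be even. */
static_assert(N % 2 == 0, "N must be even for unroll-by-2 without epilogue");

/* Original nest: the outer loop loads a row pointer, then runs the inner loop. */
static void nest_original(double *const rows[N], const double b[M])
{
    for (int i = 0; i < N; i++) {
        double *r = rows[i];              /* outer-loop statement */
        for (int j = 0; j < M; j++)
            r[j] += 2.0 * b[j];
    }
}

/* After unroll-and-jam by 2, before invariant motion (illustration only):
   the row load for i + 1 sat between the two inner-loop copies and is now
   inside the fused j loop, although it does not depend on j. */
static void nest_jammed_no_lim(double *const rows[N], const double b[M])
{
    for (int i = 0; i < N; i += 2) {
        double *r0 = rows[i];
        for (int j = 0; j < M; j++) {
            r0[j] += 2.0 * b[j];
            double *r1 = rows[i + 1];     /* invariant in j, but inside the loop */
            r1[j] += 2.0 * b[j];
        }
    }
}

/* The same nest after loop invariant motion hoists the r1 load out of the
   j loop, so both accesses use bases that are invariant in the inner loop. */
static void nest_jammed_lim(double *const rows[N], const double b[M])
{
    for (int i = 0; i < N; i += 2) {
        double *r0 = rows[i];
        double *r1 = rows[i + 1];         /* hoisted by LIM */
        for (int j = 0; j < M; j++) {
            r0[j] += 2.0 * b[j];
            r1[j] += 2.0 * b[j];
        }
    }
}

/* Runs one variant on freshly initialized data and copies the result to out. */
static void run_variant(void (*variant)(double *const[N], const double[M]),
                        double out[N][M])
{
    double data[N][M];
    double b[M];
    double *rows[N];
    for (int j = 0; j < M; j++)
        b[j] = (double)(j + 1);           /* small integers keep the math exact */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++)
            data[i][j] = (double)(i * M + j);
        rows[i] = data[i];
    }
    variant(rows, b);
    memcpy(out, data, sizeof data);
}

/* Returns 1 if all three forms of the nest compute identical arrays. */
int versions_agree(void)
{
    double x[N][M], y[N][M], z[N][M];
    run_variant(nest_original, x);
    run_variant(nest_jammed_no_lim, y);
    run_variant(nest_jammed_lim, z);
    return memcmp(x, y, sizeof x) == 0 && memcmp(x, z, sizeof x) == 0;
}
```

A driver can call versions_agree(), which should return 1. This only checks that the hand-written transformations preserve results. It says nothing about how GCC vectorizes these functions or about the milc slowdown itself, since at the source level GCC's own passes may already hoist the load.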


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
