https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119960
Bug ID: 119960
Summary: Regression of code generation
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arseny.kapoulkine at gmail dot com
Target Milestone: ---
Created attachment 61208
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61208&action=edit
perf assembly
Starting with gcc 15, the index decoder benchmark in
https://github.com/zeux/meshoptimizer shows a substantial regression at -O2/-O3.
On a Zen 4 (7950X) CPU:
gcc14 O2: 4.80 GB/s
gcc14 O3: 7.10 GB/s
gcc15 O2: 5.40 GB/s
gcc15 O3: 4.50 GB/s
clang20 O2: 6.10 GB/s
clang20 O3: 6.10 GB/s
To reproduce, clone the project and run:
make config=release codecbench && ./codecbench -l
You can also set the `CXX=` variable to override the compiler.
The function that regressed is meshopt_decodeIndexBuffer in src/indexcodec.cpp.
I've bisected the regression to:
commit 5ab3f091b3eb42795340d3c9cea8aaec2060693c (HEAD)
Author: Richard Biener <[email protected]>
Date: Mon Dec 2 11:07:46 2024 +0100
tree-optimization/116352 - SLP scheduling and stmt order
I've attached the hot loop as run under perf, using gcc 15 both just before
the referenced commit and at the referenced commit.