[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-03-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
Jeffrey A. Law changed:
What     | Removed | Added
Priority | P3      | P2
CC       |

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-02-13 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
Filip Kastl changed:
What | Removed | Added
CC   |         | pheeck at gcc dot gnu.org
--- Comment #7

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
--- Comment #6 from Hongtao Liu ---
My guess is that the explicit .REDUC_PLUS, used instead of the original VEC_PERM_EXPR, somehow impacts the store split decision.
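For readers unfamiliar with the two forms named in comment #6, the following is a minimal sketch in plain C, not GCC's internal representation. It models a 4-lane vector as a struct and computes the same lane sum twice: once as a single "sum all lanes" operation, and once as a sequence of permute-and-add steps. The type `v4si` and the functions `reduc_plus`, `permute`, and `reduc_by_permute` are hypothetical names made up for this illustration. It says nothing about how either form affects store splitting.

```c
#include <assert.h>

/* Hypothetical model of a 4-lane integer vector (not a GCC type). */
typedef struct { int lane[4]; } v4si;

/* Direct reduction: one operation that sums every lane. */
int reduc_plus(v4si v)
{
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}

/* Lane permutation: result lane i takes source lane sel[i]. */
static v4si permute(v4si v, const int sel[4])
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = v.lane[sel[i]];
    return r;
}

/* The same sum expressed as two permute-and-add steps. */
int reduc_by_permute(v4si v)
{
    static const int swap_halves[4] = { 2, 3, 0, 1 };
    static const int swap_pairs[4]  = { 1, 0, 3, 2 };

    v4si t = permute(v, swap_halves);
    for (int i = 0; i < 4; i++)
        v.lane[i] += t.lane[i];

    t = permute(v, swap_pairs);
    for (int i = 0; i < 4; i++)
        v.lane[i] += t.lane[i];

    return v.lane[0];   /* every lane now holds the full sum */
}

/* Sanity check: both formulations agree on a sample input. */
void check_reductions_agree(void)
{
    v4si v = { { 1, 2, 3, 4 } };
    assert(reduc_plus(v) == reduc_by_permute(v));
}
```

Both functions return the same value. They differ only in how the computation is expressed, which is the kind of difference the comment speculates about.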

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
--- Comment #5 from Hongtao Liu ---
It looks like x264_pixel_satd_16x16 consumes more time after my commit. An extracted case is below. Note that there's no __attribute__((always_inline)) in the original x264_pixel_satd_8x4; it's added to force

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
--- Comment #4 from Martin Jambor ---
(In reply to Hongtao Liu from comment #2)
> A patch is posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640276.html
>
> Would you give a try to see if it fixes the regression, I don't

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
--- Comment #3 from Richard Biener ---
I'll note that two-lane reductions especially (or, in general, two-lane BB vectorization) are hardly profitable on modern x86 uarchs unless the vectorized code is interleaved with other non-vectorized code that can
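To make the term "two-lane" in comment #3 concrete, here is a small sketch in plain C. It is an illustration only, not the x264 code and not compiler output. `sum_pairs_scalar` is the straight-line scalar form. `sum_pairs_two_lane` mimics what a two-lane basic-block vectorization would do: pack each pair into a 2-lane value, do one lane-wise add, then reduce across the two lanes. The type `v2si` and all function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a 2-lane integer vector (not a GCC type). */
typedef struct { int lane[2]; } v2si;

/* Scalar form: two independent adds, then one combining add. */
int sum_pairs_scalar(const int a[2], const int b[2])
{
    int s0 = a[0] + b[0];
    int s1 = a[1] + b[1];
    return s0 + s1;
}

/* Pack two adjacent scalars into one 2-lane value. */
static v2si load2(const int p[2])
{
    v2si v = { { p[0], p[1] } };
    return v;
}

/* Two-lane form: one lane-wise add followed by a cross-lane reduction. */
int sum_pairs_two_lane(const int a[2], const int b[2])
{
    v2si va = load2(a);
    v2si vb = load2(b);
    v2si vs = { { va.lane[0] + vb.lane[0], va.lane[1] + vb.lane[1] } };
    return vs.lane[0] + vs.lane[1];
}

/* Sanity check: both forms compute the same result. */
void check_two_lane_matches_scalar(void)
{
    const int a[2] = { 1, 2 };
    const int b[2] = { 10, 20 };
    assert(sum_pairs_scalar(a, b) == sum_pairs_two_lane(a, b));
}
```

With only two lanes, the vector form saves a single add and pays for packing and for the final cross-lane step. That is one way to read the comment's doubt about profitability. The sketch itself measures nothing.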

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
--- Comment #2 from Hongtao Liu ---
A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640276.html
Could you give it a try to see if it fixes the regression? I don't currently have a znver4 machine for testing.

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
--- Comment #1 from Hongtao Liu ---
I guess it's the same issue as PR112879?

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
Andrew Pinski changed:
What             | Removed | Added
Target Milestone | ---     | 14.0
Keywords         |