On Tue, Jul 8, 2025 at 12:48 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> aba3b9d3a48a0703fd565f7c5f0caf604f59970b is the first bad commit
> commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
> Author: H.J. Lu <hjl.to...@gmail.com>
> Date:   Fri May 9 07:17:07 2025 +0800
>
>     x86: Extend the remove_redundant_vector pass
>
> which removed non all 0s/1s redundant vector loads, caused SPEC CPU 2017
> 519.lbm_r and 470.lbm performance regressions on AMD znverN processors.
> Add a tuning option to keep non all 0s/1s redundant vector loads on AMD
> znverN processors.

Do we know what actually happens here or is this basically reverting the change
based on a new tunable and the reported regression?

If I read the pass correctly it might insert broadcasts on paths where
not originally
computed (it inserts after the scalar def, which might be far away).
ix86_broadcast_inner
suggests it replaces extracts from a broadcast with the original
broadcast value/register
which means it might increase lifetime of the broadcast register.

Both shouldn't be causing specifically regressions on Zen2, but can be
bad.   I think
we need to understand better what the pass does (it's written without
much commentary,
so I tried to quickly reverse engineer it), and improve it, avoiding
cases where it
obviously increases register lifetime.

> gcc/
>
> PR target/120941
> * config/i386/i386-features.cc (ix86_broadcast_inner): Keep
> non all 0s/1s redundant vector loads if asked.
> * config/i386/x86-tune.def (X86_TUNE_KEEP_REDUNDANT_VECTOR_LOAD):
> New tuning.
>
> gcc/testsuite/
>
> PR target/120941
> * gcc.target/i386/pr120941-1a.c: New test.
> * gcc.target/i386/pr120941-1b.c: Likewise.
> * gcc.target/i386/pr120941-1c.c: Likewise.
> * gcc.target/i386/pr120941-1d.c: Likewise.
>
> OK for master?
>
> Thanks.
>
> --
> H.J.

Reply via email to