On Tue, Jul 8, 2025 at 7:26 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Tue, Jul 8, 2025 at 12:48 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > aba3b9d3a48a0703fd565f7c5f0caf604f59970b is the first bad commit
> > commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
> > Author: H.J. Lu <hjl.to...@gmail.com>
> > Date:   Fri May 9 07:17:07 2025 +0800
> >
> >     x86: Extend the remove_redundant_vector pass
> >
> > which removed non all 0s/1s redundant vector loads, caused SPEC CPU 2017
> > 519.lbm_r and 470.lbm performance regressions on AMD znverN processors.
> > Add a tuning option to keep non all 0s/1s redundant vector loads on AMD
> > znverN processors.
>
> Do we know what actually happens here or is this basically reverting the 
> change
> based on a new tunable and the reported regression?
>
> If I read the pass correctly it might insert broadcasts on paths where
> not originally
> computed (it inserts after the scalar def, which might be far away).
> ix86_broadcast_inner
> suggests it replaces extracts from a broadcast with the original
> broadcast value/register
> which means it might increase lifetime of the broadcast register.
>
> Both shouldn't be causing specifically regressions on Zen2, but can be
> bad.   I think
> we need to understand better what the pass does (it's written without
> much commentary,
> so I tried to quickly reverse engineer it), and improve it, avoiding
> cases where it
> obviously increases register lifetime.

The regression doesn't show up on Intel processors.  This regression is specific
to AMD processors.  If there is a small testcase, I will find a
different way to fix
it.

> > gcc/
> >
> > PR target/120941
> > * config/i386/i386-features.cc (ix86_broadcast_inner): Keep
> > non all 0s/1s redundant vector loads if asked.
> > * config/i386/x86-tune.def (X86_TUNE_KEEP_REDUNDANT_VECTOR_LOAD):
> > New tuning.
> >
> > gcc/testsuite/
> >
> > PR target/120941
> > * gcc.target/i386/pr120941-1a.c: New test.
> > * gcc.target/i386/pr120941-1b.c: Likewise.
> > * gcc.target/i386/pr120941-1c.c: Likewise.
> > * gcc.target/i386/pr120941-1d.c: Likewise.
> >
> > OK for master?
> >
> > Thanks.
> >
> > --
> > H.J.



-- 
H.J.

Reply via email to