On Fri, Jan 16, 2026 at 10:23 PM Richard Biener <[email protected]> wrote:
>
> On Fri, 16 Jan 2026, Liu, Hongtao wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: Friday, January 16, 2026 6:23 PM
> > > To: [email protected]
> > > Cc: Liu, Hongtao <[email protected]>
> > > Subject: [PATCH] target/123603 - add --param ix86-vect-compare-costs
> > >
> > > The following allows switching the x86 target to use the vectorizer's cost
> > > comparison mechanism to select between vectorizations using different
> > > vector modes. The default is still not to do this, but this allows an
> > > opt-in.
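As a rough illustration of what "comparing costs" means here: per the patch, with --param ix86-vect-compare-costs=1 the hook ix86_autovectorize_vector_modes returns VECT_COMPARE_COSTS instead of 0, which, as far as I understand, asks the vectorizer to analyze the loop with each candidate mode and keep the cheapest instead of generally settling for the first mode that works. The C below is a greatly simplified, hypothetical sketch of that selection step, not GCC's implementation (the real comparison is more involved). The struct vect_candidate and the function pick_cheapest_mode are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one analyzed vectorization attempt.  */
struct vect_candidate
{
  unsigned vector_bytes;   /* vector size in bytes, e.g. 8, 16, 32 */
  unsigned scalar_iters;   /* scalar iterations covered per vector iteration */
  unsigned cost_per_iter;  /* estimated cost of one vector iteration */
};

/* Return the index of the candidate with the lowest estimated cost per
   scalar iteration.  The ratios are compared by cross-multiplying, so no
   division or rounding is involved; on a tie the earlier candidate wins.  */
size_t
pick_cheapest_mode (const struct vect_candidate *c, size_t n)
{
  assert (n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    {
      /* c[i] is cheaper iff cost_i / iters_i < cost_best / iters_best.  */
      unsigned long long lhs
        = (unsigned long long) c[i].cost_per_iter * c[best].scalar_iters;
      unsigned long long rhs
        = (unsigned long long) c[best].cost_per_iter * c[i].scalar_iters;
      if (lhs < rhs)
        best = i;
    }
  return best;
}
```

For example, with invented costs of 20, 8 and 6 per vector iteration for 32-, 16- and 8-byte int vectors (covering 8, 4 and 2 elements), the 16-byte mode wins at 2 cost units per element, even though it is neither the widest nor the cheapest per vector iteration.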
> > >
> >
> > The patch LGTM.
> >
> > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > >
> > > For the next stage1 I'll probably propose flipping the switch (or not
> > > adding the switch at all). I'll follow up with a report on how SPEC CPU
> > > 2017 behaves with this on vs.
> >
> > If possible, we should run the next SPEC CPU benchmarks (with more
> > vectorization) to decide whether to switch it on.
> > I ran similar tests on SPEC CPU 2017 two years ago and saw no clear
> > benefits but longer compile times, probably due to the crude cost model.
> >
> > > off before asking whether we want this switch for GCC 16 or not
> > > (for example, if it only has overly negative effects).
> >
> > It would be quite interesting if we could find that some benchmarks do show
> > benefits.
>
> On SPEC CPU 2017 for -Ofast -march=znver4 this shows 2463 out of
> 39706 vectorized loops (about 6.2%) changing mode. In 503 out of 12378
> cases (about 4.1%) we decided not to use masked epilogues. Compile time
> increases by ~1% overall.
> With a quick single run there do not seem to be any effects beyond noise
> for INT with this particular optimization, target option combination
> and hardware. For FP, 549.fotonik3d_r improves by 6%
> (confirmed with two runs).
Interesting.
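For context on the masked-epilogue statistic above: when the trip count is not a multiple of the vectorization factor, the leftover iterations can be handled either by a scalar epilogue loop or by one extra vector iteration whose out-of-range lanes are disabled by a mask (as AVX-512 mask registers allow). The plain C below is only a hypothetical sketch of the two shapes for an 8-lane loop: the inner lane loops stand in for single vector instructions, and VF, add10_scalar_epilog and add10_masked_epilog are invented names.

```c
#include <assert.h>

#define VF 8  /* hypothetical vectorization factor: 8 lanes */

/* Main vector loop followed by a scalar epilogue for the remainder.  */
void
add10_scalar_epilog (int *a, int n)
{
  assert (n >= 0);
  int i = 0;
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < VF; lane++)  /* one full vector op */
      a[i + lane] += 10;
  for (; i < n; i++)                        /* scalar epilogue */
    a[i] += 10;
}

/* Main vector loop followed by a single masked vector iteration.  */
void
add10_masked_epilog (int *a, int n)
{
  assert (n >= 0);
  int i = 0;
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < VF; lane++)  /* one full vector op */
      a[i + lane] += 10;
  if (i < n)
    {
      /* One vector op; lanes at or beyond n are masked off.  */
      for (int lane = 0; lane < VF; lane++)
        if (i + lane < n)
          a[i + lane] += 10;
    }
}
```

Both variants compute the same result, so choosing between them is purely a cost decision, which is the kind of decision the 503 out of 12378 figure counts.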
>
> This was triggered by PR123190 and PR123603, which have cases where
> comparing costs would have resulted in the faster vector size being
> used. Both were reported for -O2 -march=x86-64-v3 -flto with PGO.
> The 548.exchange2_r regression recorded in PR123603 with these flags
> is resolved with the --param (performance improves by 13%). I don't
> have SPEC 2006 on that machine, so I did not verify the PR123190
> 433.milc regression, but that one has been improved by the two earlier
> patches. The --param has no effect on the testcase in that PR.
>
> I do expect that some of our tricks in the x86 cost model to make
> larger vector sizes unprofitable will become obsolete, or even
> counter-productive, with cost comparison turned on.
>
> I think the above shows having the knob is useful, if only to
> gather more data.
I will test this separately on Intel P-cores and E-cores
(theoretically, the cost comparison should be
architecture-independent, but more testing might expose issues with
the current cost model or limitations of cost comparison itself). If
there are no negative results, and given that the compile-time
overhead is relatively small, we can indeed enable this in the
next stage1.
>
> In case there's no negative feedback I plan to merge this early
> next week.
>
> Thanks,
> Richard.
>
> > >
> > > PR target/123603
> > > * config/i386/i386.opt (-param=ix86-vect-compare-costs=): Add.
> > > * config/i386/i386.cc (ix86_autovectorize_vector_modes): Honor it.
> > > * doc/invoke.texi (ix86-vect-compare-costs): Document.
> > >
> > > * gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c: New testcase.
> > > ---
> > > gcc/config/i386/i386.cc | 2 +-
> > > gcc/config/i386/i386.opt | 4 ++++
> > > gcc/doc/invoke.texi | 3 +++
> > > .../vect/costmodel/x86_64/costmodel-pr123603.c | 15 +++++++++++++++
> > > 4 files changed, 23 insertions(+), 1 deletion(-)
> > > create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 6bf4af8bbe3..a3d0f7cb649 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -25700,7 +25700,7 @@ ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
> > >    if (TARGET_SSE2)
> > >      modes->safe_push (V4QImode);
> > >
> > > -  return 0;
> > > +  return ix86_vect_compare_costs ? VECT_COMPARE_COSTS : 0;
> > >  }
> > >
> > >  /* Implemenation of targetm.vectorize.get_mask_mode. */
> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > > index 99bb674812b..ef9efabcff6 100644
> > > --- a/gcc/config/i386/i386.opt
> > > +++ b/gcc/config/i386/i386.opt
> > > @@ -1249,6 +1249,10 @@ Enable conservative small loop unrolling.
> > > Target Joined UInteger Var(ix86_vect_unroll_limit) Init(4) Param
> > > Limit how much the autovectorizer may unroll a loop.
> > >
> > > +-param=ix86-vect-compare-costs=
> > > +Target Joined UInteger Var(ix86_vect_compare_costs) Init(0) IntegerRange(0, 1) Param Optimization
> > > +Whether x86 vectorizer cost modeling compares costs of different vector sizes.
> > > +
> > > mlam=
> > > Target RejectNegative Joined Enum(lam_type) Var(ix86_lam_type) Init(lam_none)
> > > -mlam=[none|u48|u57] Instrument meta data position in user data pointers.
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index b703b531d75..5092e4ba9ad 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -18213,6 +18213,9 @@ the discovery is aborted.
> > > @item ix86-vect-unroll-limit
> > > Limit how much the autovectorizer may unroll a loop.
> > >
> > > +@item ix86-vect-compare-costs
> > > +Whether x86 vectorizer cost modeling compares costs of different vector sizes.
> > > +
> > > @end table
> > >
> > > @end table
> > > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> > > new file mode 100644
> > > index 00000000000..c074176a7e4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "--param ix86-vect-compare-costs=1" } */
> > > +
> > > +void foo (int *block)
> > > +{
> > > +  for (int i = 0; i < 3; ++i)
> > > +    {
> > > +      int a = block[i*9];
> > > +      int b = block[i*9+1];
> > > +      block[i*9] = a + 10;
> > > +      block[i*9+1] = b + 10;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "optimized: loop vectorized using 8 byte vectors" "vect" } } */
> > > --
> > > 2.51.0
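On why the new testcase expects 8-byte vectors, one plausible reading: each iteration of foo touches exactly two adjacent ints (block[i*9] and block[i*9+1]), and consecutive iterations are 36 bytes apart, so a two-lane int vector covers one iteration's pair with no gaps, while wider vectors would have to combine elements from different iterations. Purely as an illustration, here is a hand-written 8-byte-vector counterpart of the loop using GCC's generic vector extension; foo_v2si and v2si are hypothetical names, and this is not the code the vectorizer actually emits.

```c
#include <assert.h>
#include <string.h>

/* Two 32-bit ints in one 8-byte vector (GCC vector extension).  */
typedef int v2si __attribute__ ((vector_size (8)));

/* Hypothetical hand-written counterpart of foo () from the testcase:
   each iteration loads the pair block[i*9], block[i*9+1] as one 8-byte
   vector, adds { 10, 10 } and stores the pair back.  */
void
foo_v2si (int *block)
{
  assert (block != NULL);
  const v2si ten = { 10, 10 };
  for (int i = 0; i < 3; ++i)
    {
      v2si v;
      memcpy (&v, &block[i * 9], sizeof v);  /* load two adjacent ints */
      v += ten;                              /* lane-wise add */
      memcpy (&block[i * 9], &v, sizeof v);  /* store them back */
    }
}
```

Called on a 27-element array, foo_v2si changes exactly elements 0, 1, 9, 10, 18 and 19, just like the scalar foo.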
> >
>
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
--
BR,
Hongtao