On Fri, 16 Jan 2026, Liu, Hongtao wrote:
>
>
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Friday, January 16, 2026 6:23 PM
> > To: [email protected]
> > Cc: Liu, Hongtao <[email protected]>
> > Subject: [PATCH] target/123603 - add --param ix86-vect-compare-costs
> >
> > The following allows switching the x86 target to use the vectorizer's
> > cost comparison mechanism to select between different vector mode
> > variants of a vectorization. The default is still to not do this, but
> > this allows an opt-in.
> >
>
> The patch LGTM.
>
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > For next stage1 I'll probably propose flipping the switch (or not add the
> > switch
> > at all). I'll follow up with a report on how CPU 2017 behaves with this on
> > vs.
>
> If possible, we should run the next SPEC CPU benchmarks (with more
> vectorization) to decide whether to switch it on.
> I did similar tests on SPEC CPU 2017 two years ago - no clear benefits and
> longer compile times, probably due to the crude cost model.
>
> > off before considering to ask whether we want this switch for GCC 16 or not
> > (like if it only has overly negative effects).
>
> It would be quite interesting if we could find that some benchmarks do show
> benefits.

On SPEC CPU 2017 with -Ofast -march=znver4, 2463 out of 39706
vectorized loops change mode. In 503 out of 12378 cases we decided
not to use masked epilogs. Compile time increases by ~1% overall.

With a quick 1-run there do not seem to be any effects beyond noise
for INT, at least for this particular combination of optimization and
target options and the hardware I ran on. For FP, 549.fotonik3d_r
improves by 6% (confirmed with a 2-run).

This was triggered by PR123190 and PR123603, which have cases where
comparing costs would have resulted in the faster vector size being
used. Both were reported for -O2 -march=x86-64-v3 -flto and with PGO.
The 548.exchange2_r regression recorded in PR123603 with these flags
is resolved with the new --param enabled (performance improves by 13%).
I don't have SPEC 2006 on that machine, so I did not verify the
PR123190 433.milc regression, but that has been improved by the two
earlier patches. The --param has no effect on the testcase in the PR.
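
To illustrate why comparing costs can end up with a different vector
size, here is a minimal sketch. It is not GCC code: the struct, the
function names and the cost numbers are all made up, and the only
assumption taken from the vectorizer is that without VECT_COMPARE_COSTS
the first candidate mode that vectorizes wins, while with it the
candidates are compared by cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one candidate vector mode.  */
struct mode_candidate
{
  const char *name;   /* label only, e.g. "32-byte" */
  int can_vectorize;  /* nonzero if analysis succeeded in this mode */
  unsigned cost;      /* made-up cost estimate, lower is better */
};

/* Without cost comparison: take the first mode, in the target's
   preference order, for which analysis succeeds.  Returns the index
   of the chosen candidate or -1 if none vectorizes.  */
int
pick_first (const struct mode_candidate *c, size_t n)
{
  assert (c != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (c[i].can_vectorize)
      return (int) i;
  return -1;
}

/* With cost comparison: look at every mode and keep the cheapest one.
   On equal cost the earlier (preferred) mode is kept.  */
int
pick_cheapest (const struct mode_candidate *c, size_t n)
{
  int best = -1;
  assert (c != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (c[i].can_vectorize && (best < 0 || c[i].cost < c[best].cost))
      best = (int) i;
  return best;
}

/* Made-up candidates in preference order, largest vector size first.  */
const struct mode_candidate example_modes[] = {
  { "32-byte", 1, 40 },
  { "16-byte", 1, 28 },
  { "8-byte",  1, 35 },
};
```

With these made-up numbers pick_first (example_modes, 3) returns 0
(the 32-byte mode) and pick_cheapest (example_modes, 3) returns 1
(the 16-byte mode). Looking at every candidate instead of stopping at
the first success is also where extra compile time would come from.
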

I do expect that some of the tricks in the x86 cost model that make
larger vector sizes unprofitable will be obsolete or counter-productive
with cost comparison turned on.

I think the above shows that having the knob is useful, if only to
gather more data.
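
For anyone who wants to try the knob outside the testsuite, here is a
sketch of a standalone variant of the new testcase. The loop is the one
from the patch; the check_foo helper and the command line in the comment
are my additions and only assumptions about a convenient setup. The
--param exists only in a compiler with this patch applied.

```c
/* Possible command line, assuming a GCC with the patch applied:
     gcc -O2 -ftree-vectorize --param ix86-vect-compare-costs=1 \
         -fopt-info-vec -c foo.c
   The testcase in the patch scans the vect dump for
   "optimized: loop vectorized using 8 byte vectors".  */

/* Loop from costmodel-pr123603.c in the patch.  */
void
foo (int *block)
{
  for (int i = 0; i < 3; ++i)
    {
      int a = block[i*9];
      int b = block[i*9+1];
      block[i*9] = a + 10;
      block[i*9+1] = b + 10;
    }
}

/* Hypothetical helper, not part of the patch: returns 1 if foo added
   10 to exactly the elements 0, 1, 9, 10, 18 and 19 and left the rest
   of the array alone, 0 otherwise.  */
int
check_foo (void)
{
  int block[27];
  for (int k = 0; k < 27; ++k)
    block[k] = k;
  foo (block);
  for (int k = 0; k < 27; ++k)
    {
      int touched = (k % 9 == 0 || k % 9 == 1);
      if (block[k] != (touched ? k + 10 : k))
        return 0;
    }
  return 1;
}
```

Calling check_foo from a small main with the --param set to 0 and to 1
gives a quick sanity check that both variants compute the same result.
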

If there's no negative feedback, I plan to merge this early next week.

Thanks,
Richard.

> >
> > PR target/123603
> > * config/i386/i386.opt (-param=ix86-vect-compare-costs=): Add.
> > * config/i386/i386.cc (ix86_autovectorize_vector_modes): Honor it.
> > * doc/invoke.texi (ix86-vect-compare-costs): Document.
> >
> > * gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c: New
> > testcase.
> > ---
> >  gcc/config/i386/i386.cc                        |  2 +-
> >  gcc/config/i386/i386.opt                       |  4 ++++
> >  gcc/doc/invoke.texi                            |  3 +++
> >  .../vect/costmodel/x86_64/costmodel-pr123603.c | 15 +++++++++++++++
> >  4 files changed, 23 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 6bf4af8bbe3..a3d0f7cb649 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -25700,7 +25700,7 @@ ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
> > if (TARGET_SSE2)
> > modes->safe_push (V4QImode);
> >
> > - return 0;
> > + return ix86_vect_compare_costs ? VECT_COMPARE_COSTS : 0;
> > }
> >
> > /* Implemenation of targetm.vectorize.get_mask_mode. */
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index 99bb674812b..ef9efabcff6 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -1249,6 +1249,10 @@ Enable conservative small loop unrolling.
> > Target Joined UInteger Var(ix86_vect_unroll_limit) Init(4) Param
> > Limit how much the autovectorizer may unroll a loop.
> >
> > +-param=ix86-vect-compare-costs=
> > +Target Joined UInteger Var(ix86_vect_compare_costs) Init(0) IntegerRange(0, 1) Param Optimization
> > +Whether x86 vectorizer cost modeling compares costs of different vector sizes.
> > +
> > mlam=
> > Target RejectNegative Joined Enum(lam_type) Var(ix86_lam_type) Init(lam_none)
> > -mlam=[none|u48|u57] Instrument meta data position in user data pointers.
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index b703b531d75..5092e4ba9ad 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -18213,6 +18213,9 @@ the discovery is aborted.
> > @item ix86-vect-unroll-limit
> > Limit how much the autovectorizer may unroll a loop.
> >
> > +@item ix86-vect-compare-costs
> > +Whether x86 vectorizer cost modeling compares costs of different vector sizes.
> > +
> > @end table
> >
> > @end table
> > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> > new file mode 100644
> > index 00000000000..c074176a7e4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "--param ix86-vect-compare-costs=1" } */
> > +
> > +void foo (int *block)
> > +{
> > + for (int i = 0; i < 3; ++i)
> > + {
> > + int a = block[i*9];
> > + int b = block[i*9+1];
> > + block[i*9] = a + 10;
> > + block[i*9+1] = b + 10;
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "optimized: loop vectorized using 8 byte vectors" "vect" } } */
> > --
> > 2.51.0
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)