On Fri, 16 Jan 2026, Liu, Hongtao wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Friday, January 16, 2026 6:23 PM
> > To: [email protected]
> > Cc: Liu, Hongtao <[email protected]>
> > Subject: [PATCH] target/123603 - add --param ix86-vect-compare-costs
> > 
> > The following allows to switch the x86 target to use the vectorizer cost
> > comparison mechanic to select between different vector mode variants of
> > vectorizations.  The default is still to not do this but this allows an 
> > opt-in.
> > 
> 
> The patch LGTM.
> 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > For next stage1 I'll probably propose flipping the switch (or not add the 
> > switch
> > at all).  I'll follow up with a report on how CPU 2017 behaves with this on 
> > vs.
> 
> If possible, we should run the next SPEC CPU benchmarks (with more 
> vectorization) to decide whether to switch it on.
> I did similar tests on SPEC CPU 2017 two years ago - no clear benefits and 
> longer compile times, probably due to the crude cost model.
> 
> > off before considering to ask whether we want this switch for GCC 16 or not
> > (like if it only has overly negative effects).
> 
> It would be quite interesting if we could find that some benchmarks do show 
> benefits.

On SPEC CPU 2017 with -Ofast -march=znver4 this shows 2463 out of
39706 vectorized loops changing mode.  In 503 out of 12378 cases
we decided not to use masked epilogs.  Compile time increases by ~1%
overall.
A quick 1-run shows no effects beyond noise for INT with this
particular combination of optimization and target options on the
hardware I ran on.  For FP, 549.fotonik3d_r improves by 6%
(confirmed with a 2-run).

This was triggered by PR123190 and PR123603, which have cases where
comparing costs would have resulted in the faster vector size being
used.  Both were reported for -O2 -march=x86-64-v3 -flto with PGO.
The 548.exchange2_r regression recorded in PR123603 with these flags
is resolved with the --param enabled (performance improves by 13%).
I don't have SPEC 2006 on that machine, so I did not verify the
PR123190 433.milc regression, but that has been improved by the two
earlier patches.
The --param has no effect on the testcase in the PR.
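
To illustrate what comparing costs changes, here is a toy model in C.
It is not GCC code: the struct, the function names and the cost numbers
are all hypothetical, and the real comparison in the vectorizer is more
involved.  It only contrasts "take the first vector mode that works"
with "try all candidate modes and keep the cheapest".

```c
#include <stddef.h>

/* Hypothetical description of one candidate vector mode.  The cost is a
   made-up number per scalar iteration, not a real GCC cost.  */
struct candidate
{
  int vector_bytes;      /* vector size of this candidate */
  int ok;                /* nonzero if vectorization succeeded */
  double cost_per_iter;  /* assumed cost, lower is better */
};

/* Without cost comparison: return the index of the first candidate
   that works, or -1 if none does.  */
int pick_first (const struct candidate *c, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (c[i].ok)
      return (int) i;
  return -1;
}

/* With cost comparison: look at every working candidate and return
   the index of the cheapest one, or -1 if none works.  */
int pick_cheapest (const struct candidate *c, size_t n)
{
  int best = -1;
  for (size_t i = 0; i < n; ++i)
    if (c[i].ok
        && (best < 0 || c[i].cost_per_iter < c[best].cost_per_iter))
      best = (int) i;
  return best;
}
```

When the candidates are ordered from the largest vector size down, the
first function always prefers the largest working size, while the
second can settle on a smaller size if the assumed cost favors it.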

I do expect that some of our tricks in the x86 cost model that make
larger vector sizes unprofitable will become obsolete or even
counter-productive with cost comparison turned on.

I think the above shows having the knob is useful, if only to
gather more data.
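
For anyone who wants to gather such data, a minimal sketch of a
standalone variant of the testcase from the patch follows.  The body of
foo is taken from costmodel-pr123603.c; the noinline attribute and the
count_updated helper are additions for illustration only.  As an
assumed invocation with the patch applied, something like
"gcc -O3 -march=x86-64-v3 --param ix86-vect-compare-costs=1
-fopt-info-vec -c" should report which vector size was chosen; compare
against --param ix86-vect-compare-costs=0.

```c
/* Loop from the patch's testcase; noinline is an addition so that the
   loop is compiled on its own rather than folded into the caller.  */
__attribute__ ((noinline))
void foo (int *block)
{
  for (int i = 0; i < 3; ++i)
    {
      int a = block[i*9];
      int b = block[i*9+1];
      block[i*9] = a + 10;
      block[i*9+1] = b + 10;
    }
}

/* Hypothetical helper, not part of the patch: run foo on a zeroed
   buffer and count how many elements were updated to 10.  */
int count_updated (void)
{
  int block[27] = { 0 };
  int n = 0;

  foo (block);
  for (int i = 0; i < 27; ++i)
    n += block[i] == 10;
  return n;
}
```

The helper only guards against miscompilation; the interesting output
is the vectorizer's report for the two settings of the --param.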

If there is no negative feedback, I plan to merge this early
next week.

Thanks,
Richard.

> > 
> >     PR target/123603
> >     * config/i386/i386.opt (-param=ix86-vect-compare-costs=): Add.
> >     * config/i386/i386.cc (ix86_autovectorize_vector_modes): Honor it.
> >     * doc/invoke.texi (ix86-vect-compare-costs): Document.
> > 
> >     * gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c: New
> > testcase.
> > ---
> >  gcc/config/i386/i386.cc                           |  2 +-
> >  gcc/config/i386/i386.opt                          |  4 ++++
> >  gcc/doc/invoke.texi                               |  3 +++
> >  .../vect/costmodel/x86_64/costmodel-pr123603.c    | 15
> > +++++++++++++++
> >  4 files changed, 23 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> > 
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index
> > 6bf4af8bbe3..a3d0f7cb649 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -25700,7 +25700,7 @@ ix86_autovectorize_vector_modes
> > (vector_modes *modes, bool all)
> >    if (TARGET_SSE2)
> >      modes->safe_push (V4QImode);
> > 
> > -  return 0;
> > +  return ix86_vect_compare_costs ? VECT_COMPARE_COSTS : 0;
> >  }
> > 
> >  /* Implemenation of targetm.vectorize.get_mask_mode.  */ diff --git
> > a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index
> > 99bb674812b..ef9efabcff6 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -1249,6 +1249,10 @@ Enable conservative small loop unrolling.
> >  Target Joined UInteger Var(ix86_vect_unroll_limit) Init(4) Param  Limit how
> > much the autovectorizer may unroll a loop.
> > 
> > +-param=ix86-vect-compare-costs=
> > +Target Joined UInteger Var(ix86_vect_compare_costs) Init(0)
> > +IntegerRange(0, 1) Param Optimization Whether x86 vectorizer cost
> > modeling compares costs of different vector sizes.
> > +
> >  mlam=
> >  Target RejectNegative Joined Enum(lam_type) Var(ix86_lam_type)
> > Init(lam_none)  -mlam=[none|u48|u57] Instrument meta data position in
> > user data pointers.
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> > b703b531d75..5092e4ba9ad 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -18213,6 +18213,9 @@ the discovery is aborted.
> >  @item ix86-vect-unroll-limit
> >  Limit how much the autovectorizer may unroll a loop.
> > 
> > +@item ix86-vect-compare-costs
> > +Whether x86 vectorizer cost modeling compares costs of different vector
> > sizes.
> > +
> >  @end table
> > 
> >  @end table
> > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-
> > pr123603.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-
> > pr123603.c
> > new file mode 100644
> > index 00000000000..c074176a7e4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "--param ix86-vect-compare-costs=1" } */
> > +
> > +void foo (int *block)
> > +{
> > +  for (int i = 0; i < 3; ++i)
> > +    {
> > +      int a = block[i*9];
> > +      int b = block[i*9+1];
> > +      block[i*9] = a + 10;
> > +      block[i*9+1] = b + 10;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "optimized: loop vectorized using 8 byte
> > +vectors" "vect" } } */
> > --
> > 2.51.0
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
