https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617

--- Comment #27 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #26)
> (In reply to Hongtao Liu from comment #25)
> > in vectorization, store shouldn't have such high cost because once the
> > address is computed, the store instruction doesn't stall the pipeline
> > (except for memory dependencies that are hard for compilers to detect). I'm
> > experimenting with adjusting store costs: setting integer stores to
> > COST_N_INSNS (1) - 1 (minus 1 since integer can benefit from renaming and
> > have lower STLF costs), while keeping vector/floating-point stores at
> > COST_N_INSNS (1). This approach should discourage vectorization for integer
> > vector construction + vector store patterns, while still promoting
> > vectorization for floating-point operations in vector construct + vector
> > store scenarios. 
> > 
> > 
> > I'm testing below patch to see if there's any surprise.
> 
> Sth like this was also on my TODO list.  I'd have gone further and made
> stores zero cost as we generally do not model AGU latency (also a zero
> would more likely show up issues).

We still want to prefer bigger stores because of STLF, so COSTS_N_INSNS (1)
may be more profitable than zero.

> 
> > 
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 52f82185e32..420104a04b2 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -25363,8 +25363,7 @@ ix86_default_vector_cost (enum vect_cost_for_stmt
> > type_of_cost,
> >                               : ix86_cost->int_load [2]) / 2;
> > 
> >        case scalar_store:
> > -        return COSTS_N_INSNS (fp ? ix86_cost->sse_store[0]
> > -                             : ix86_cost->int_store [2]) / 2;
> > +       return fp ? COSTS_N_INSNS (1) : COSTS_N_INSNS (1) - 1;
> 
> why is FP more costly than int?
> 

1) Many processors support memory renaming for integer but not for fp/sse.
2) For integer there's a potential extra integer_to_sse move that the cost
model won't catch, as in this PR.

That's why I reduce the integer scalar_store cost by 1.
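
To make the numbers concrete, here is a minimal standalone sketch of the old and
proposed scalar_store formulas from the diff above. It is not the GCC source:
COSTS_N_INSNS is redefined locally on the assumption that it matches GCC's
rtl.h definition (instruction count times 4), and the helper names
old_scalar_store_cost and new_scalar_store_cost are hypothetical. The table
entries that GCC reads from ix86_cost are passed in as plain parameters.

```c
#include <stdbool.h>

/* Assumption: mirrors GCC's COSTS_N_INSNS from rtl.h, which scales an
   instruction count by 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Formula removed by the patch.  In GCC the two inputs come from
   ix86_cost->sse_store[0] and ix86_cost->int_store[2]; here they are
   ordinary parameters so the sketch stays self-contained.  */
static int
old_scalar_store_cost (bool fp, int sse_store0, int int_store2)
{
  return COSTS_N_INSNS (fp ? sse_store0 : int_store2) / 2;
}

/* Formula proposed by the patch: a fixed cost, with integer stores one
   unit cheaper than fp stores.  */
static int
new_scalar_store_cost (bool fp)
{
  return fp ? COSTS_N_INSNS (1) : COSTS_N_INSNS (1) - 1;
}
```

Under that assumption the proposed costs are 4 for an fp scalar store and 3 for
an integer scalar store, independent of the per-processor cost tables.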
