[Bug target/106022] [12/13 Regression] Enable vectorizer generates extra load

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 22 Jun 2022 23:20:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106022


--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #9)
> (In reply to Richard Biener from comment #8)
> > (In reply to H.J. Lu from comment #6)
> > > Created attachment 53169 [details]
> > > A patch
> > > 
> > > This patch multiplies the vector store cost by the number of scalar
> > > elements in a word to properly compare scalar store cost against vector
> > > store cost.
> > 
> > But that's not "properly" but "wrong" ...
> > 
> > Note we already cost the vector load from the constant pool so the vector
> > side costing is correct.
> > 
> > What's eventually imprecise is the scalar cost where you could anticipate
> > store merging, but adjusting the vector cost side is just wrong.
> 
> I tried to adjust the scalar cost.  When the scalar cost of storing a byte
> is 6, dividing it by 8 (the number of scalar elements in a word) becomes 0.
> Will it work?

No, I think you would need to pattern match an actual store sequence,
for example by looking at

 if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
     && pow2p_hwi (DR_GROUP_STORE_COUNT (stmt_info)))
   /* cost a possibly merged store only once (but with larger mode?) */
   if (DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info)
     ...

That is, cost the whole sequence of scalar stores a single time, with an
adjusted (wider) mode.

Store merging also handles non-QImode stores, btw.
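
To make the arithmetic concrete, here is a toy sketch in plain C of the two
approaches discussed above. It is not GCC code: struct toy_store,
toy_scalar_store_cost and toy_group_cost are hypothetical names, and it
assumes the merged wider store costs the same as one scalar store, which a
real cost hook would have to decide based on the adjusted mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one scalar store in a store group.
   This is not a GCC data structure.  */
struct toy_store
{
  unsigned group_size;  /* number of stores in the group */
  bool is_group_leader; /* true only for the first store of the group */
};

static bool
toy_pow2p (unsigned x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Cost one scalar store.  A group with a power-of-two store count is
   assumed to be mergeable, so only its leader is charged; the other
   members cost nothing.  Assumption: the merged store costs as much
   as one scalar store.  */
static unsigned
toy_scalar_store_cost (const struct toy_store *s, unsigned per_store_cost)
{
  if (s->group_size > 1 && toy_pow2p (s->group_size))
    return s->is_group_leader ? per_store_cost : 0;
  return per_store_cost;
}

/* Sum the scalar cost over a whole group of stores.  */
static unsigned
toy_group_cost (unsigned group_size, unsigned per_store_cost)
{
  assert (group_size > 0);
  unsigned total = 0;
  for (unsigned i = 0; i < group_size; i++)
    {
      struct toy_store s = { group_size, i == 0 };
      total += toy_scalar_store_cost (&s, per_store_cost);
    }
  return total;
}
```

With a per-store cost of 6 and 8 byte stores, dividing each store's cost
gives 8 * (6 / 8) = 0 in integer arithmetic, so the scalar side would look
free. Costing the group once via toy_group_cost (8, 6) gives 6, while a
group of 3 stores, which this sketch treats as not mergeable, still costs
18.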

[Bug target/106022] [12/13 Regression] Enable vectorizer generates extra load
