https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106022
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #9)
> (In reply to Richard Biener from comment #8)
> > (In reply to H.J. Lu from comment #6)
> > > Created attachment 53169 [details]
> > > A patch
> > >
> > > This patch multiplies the vector store cost by the number of scalar
> > > elements in a word to properly compare scalar store cost against
> > > vector store cost.
> >
> > But that's not "properly" but "wrong" ...
> >
> > Note we already cost the vector load from the constant pool so the vector
> > side costing is correct.
> >
> > What's eventually imprecise is the scalar cost where you could anticipate
> > store merging, but adjusting the vector cost side is just wrong.
>
> I tried to adjust the scalar cost.  When the scalar cost of storing a byte
> is 6, dividing it by 8 (the number of scalar elements in a word) becomes 0.
> Will it work?

No, I think you would need to pattern match an actual store sequence,
for example by looking at

  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
      && pow2p_hwi (DR_GROUP_STORE_COUNT (stmt_info)))
    /* cost a possibly merged store only once (but with larger mode?) */
    if (DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info)
      ...

So costing the whole sequence of scalar stores a single time, with adjusted
mode.  store-merging also handles non-QImode stores btw.
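
The suggestion above can be illustrated with a small standalone model. This is only a sketch and not GCC code: `scalar_store`, `scalar_store_cost` and `cost_of_group` are hypothetical names, the three fields merely stand in for what `STMT_VINFO_GROUPED_ACCESS`, `DR_GROUP_FIRST_ELEMENT` and the count tested with `pow2p_hwi` would report, and the "larger mode" adjustment is left out (the merged store is assumed to cost the same as one scalar store).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the vectorizer's per-statement info.
struct scalar_store
{
  bool grouped;        // models STMT_VINFO_GROUPED_ACCESS (stmt_info)
  bool group_leader;   // models DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info
  unsigned group_size; // models the count tested with pow2p_hwi in the comment
};

static bool
is_pow2 (unsigned x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Sum the scalar store cost, charging a possibly merged group only once
// (at its first element) instead of once per store.
static unsigned
scalar_store_cost (const std::vector<scalar_store> &stores, unsigned store_cost)
{
  unsigned total = 0;
  for (const scalar_store &s : stores)
    {
      if (s.grouped && is_pow2 (s.group_size))
        {
          // Assumption: the merged store costs the same as one scalar store.
          if (s.group_leader)
            total += store_cost;
          continue;
        }
      // Not a merge candidate: every store is charged individually.
      total += store_cost;
    }
  return total;
}

// Helper for experimenting: cost of one group of COUNT adjacent stores.
static unsigned
cost_of_group (unsigned count, unsigned store_cost)
{
  assert (count > 0);
  std::vector<scalar_store> stores;
  for (unsigned i = 0; i < count; ++i)
    stores.push_back ({true, i == 0, count});
  return scalar_store_cost (stores, store_cost);
}
```

With a byte store cost of 6, `cost_of_group (8, 6)` gives 6 instead of 48, whereas dividing the per-store cost as in comment #9 (`6 / 8`) truncates to 0. A group of 3 stores is not a power of two in this model and still costs 18.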