https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104912

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another thing to note is that the loop performs no contiguous vector
loads/stores at all; all of them are strided.  If we improved SLP analysis,
we could get equivalent (but VF==1) basic-block vectorization, with the
caveat of having to deal with the possible aliasing of XPQKL(MPQ,MKL) and
XPQKL(MRS,MKL).  Still, in cases where there's no aliasing, BB vectorization
could eventually be the better solution.

That said, an x86 backend-specific approach could be to count the number of
vector loads/stores as well as the number of strided loads/stores, and to
apply the biasing based on those counts at finish_cost time rather than case
by case.  We can also count the number of "other" stmts in the loop body so
as to weigh the ratio between them.  For gamess it's 10 vector stmts vs. 6
strided loads + 2 strided stores.  We could simply sum the vector stmts
(including vector loads and stores), subtract the "emulated scalar" ones
(perhaps weighting the variably strided cases by a factor of two), and
require the result to be > 0 for vectorization to be worthwhile.  Possibly
the finish_cost hook should also get a bool result to indicate that,
independent of the cost of the scalar loop, we do not want this
vectorization (that's nicer than, for example, returning an arbitrarily
high cost).
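A minimal sketch of the proposed counting heuristic, written in Python only to show the arithmetic. LoopStmtCounts and vectorization_worthwhile are hypothetical names, not GCC's actual cost-model API. The sketch assumes the strided accesses are tallied separately from the vector stmts, and it applies the suggested factor of two to variably strided accesses. Whether the gamess strides are constant or variable is not stated above, so both cases are shown.

```python
from dataclasses import dataclass


@dataclass
class LoopStmtCounts:
    """Hypothetical per-loop tally collected while costing a vectorized body."""
    vector_stmts: int           # vector stmts, incl. contiguous vector loads/stores
    strided_const: int = 0      # emulated (elementwise) accesses with constant step
    strided_variable: int = 0   # emulated accesses whose step is only known at runtime


def vectorization_worthwhile(c: LoopStmtCounts, variable_weight: int = 2) -> bool:
    """Finish-cost-time check: vector work minus weighted emulated-scalar accesses.

    Returns False as an outright veto, independent of the scalar loop cost,
    instead of signalling rejection through an arbitrarily high cost value.
    """
    balance = (c.vector_stmts
               - c.strided_const
               - variable_weight * c.strided_variable)
    return balance > 0


# gamess: 10 vector stmts vs. 6 strided loads + 2 strided stores (8 accesses).
# Assumption: all 8 accesses share the same stride kind in each example.
print(vectorization_worthwhile(LoopStmtCounts(10, strided_const=8)))     # 10 - 8 = 2 -> True
print(vectorization_worthwhile(LoopStmtCounts(10, strided_variable=8)))  # 10 - 16 = -6 -> False
```

With the factor of two, the same loop flips from "worthwhile" to "not worthwhile" depending only on whether its strides are compile-time constants. This is the kind of loop-level decision that per-statement biasing cannot make.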
