https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441
--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #9)
> So on SVE the change is cost modelling.
> 
> Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed
> the compiler's defaults to using the new throughput matched cost modelling
> used be newer cores.
> 
> It looks like this changes which mode the compiler picks for when using a
> fixed register size.
> 
> This is because the new cost model (correctly) models the costs for FMAs and
> promotions.
> 
> Before:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue
> int) _2 1 times scalar_stmt costs 1 in prologue
> 
> after:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue
> (int) _2 1 times scalar_stmt costs 0 in prologue
> 
> and the cost goes from:
> 
> Vector inside of loop cost: 125
> 
> to
> 
> Vector inside of loop cost: 83
> 
> so far, nothing sticks out, and in fact the profitability for VNx4QI drops
> from
> 
> Calculated minimum iters for profitability: 5
> 
> to
> 
> Calculated minimum iters for profitability: 3
> 
> This causes a clash, as this is now exactly the same cost as VNx2QI which
> used to be what it preferred before.
> 
> Which then leads it to pick the higher VF.
> 
> In the end smaller VF shows:
> 
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 1.
> 
> and now we get:
> 
> Vectorization factor 16 seems too large for profile prevoiusly believed to
> be consistent; reducing.
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 0.
> ;; Scaling loop 4 with scale 66.6% (guessed) to reach upper bound 0
> 
> which I guess is the big difference.
> 
> There is a weird costing going on in the PHI nodes though:
> 
> m_108 = PHI <m_92(16), m_111(5)> 1 times vector_stmt costs 0 in body
> m_108 = PHI <m_92(16), m_111(5)> 2 times scalar_to_vec costs 0 in prologue
> 
> they have collapsed to 0. which can't be right..

I don't think this change causes the regression, since the regression happens not only on ARM SVE but also on RVV. It should be a middle-end issue. I think you'd be better off using -fno-vect-cost-model.
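
For anyone who wants to compare the vectorizer's decisions with and without the cost model, here is a minimal sketch. The kernel is hypothetical and is not the PR's test case; it only has the same general shape (a promotion from a narrow type followed by a multiply-add). The file name kernel.c and the function name dot_u8 are made up for illustration, and the compile lines in the comment assume an AArch64 toolchain with SVE enabled.

```c
/* kernel.c (hypothetical file name, not taken from the PR).
 *
 * Possible way to compare the two dumps, assuming an AArch64 GCC:
 *   gcc -O3 -march=armv8-a+sve -c kernel.c -fdump-tree-vect-details
 *   gcc -O3 -march=armv8-a+sve -c kernel.c -fdump-tree-vect-details \
 *       -fno-vect-cost-model
 * The "Vector inside of loop cost" and "Calculated minimum iters for
 * profitability" lines quoted above come from this kind of dump.
 */
#include <stddef.h>

/* Hypothetical kernel: widens unsigned char elements to int and
 * accumulates their products. */
int dot_u8(const unsigned char *a, const unsigned char *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int)a[i] * (int)b[i];  /* promotion, then multiply-add */
    return sum;
}
```

If the chosen mode or VF differs between the two dumps on both SVE and RVV, that would point at the middle-end rather than at a single target's cost model.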