[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

juzhe.zhong at rivai dot ai via Gcc-bugs Mon, 22 Jan 2024 14:17:01 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441


--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #9)
> So on SVE the change is cost modelling.
> 
> Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed
> the compiler's defaults to using the new throughput matched cost modelling
> used by newer cores.
> 
> It looks like this changes which mode the compiler picks for when using a
> fixed register size.
> 
> This is because the new cost model (correctly) models the costs for FMAs and
> promotions.
> 
> Before:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue
> (int) _2 1 times scalar_stmt costs 1 in prologue
> 
> after:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue 
> (int) _2 1 times scalar_stmt costs 0 in prologue 
> 
> and the cost goes from:
> 
> Vector inside of loop cost: 125
> 
> to
> 
> Vector inside of loop cost: 83 
> 
> so far, nothing sticks out, and in fact the profitability for VNx4QI drops
> from
> 
> Calculated minimum iters for profitability: 5
> 
> to
> 
> Calculated minimum iters for profitability: 3
> 
> This causes a clash, as this is now exactly the same cost as VNx2QI which
> used to be what it preferred before.
> 
> Which then leads it to pick the higher VF.
> 
> In the end smaller VF shows:
> 
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 1.
> 
> and now we get:
> 
> Vectorization factor 16 seems too large for profile prevoiusly believed to
> be consistent; reducing.  
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 0.
> ;; Scaling loop 4 with scale 66.6% (guessed) to reach upper bound 0
> 
> which I guess is the big difference.
> 
> There is a weird costing going on in the PHI nodes though:
> 
> m_108 = PHI <m_92(16), m_111(5)> 1 times vector_stmt costs 0 in body 
> m_108 = PHI <m_92(16), m_111(5)> 2 times scalar_to_vec costs 0 in prologue
> 
> They have collapsed to 0, which can't be right.

I don't think this change causes the regression, since the regression happens
not only on ARM SVE but also on RVV.
It should be a middle-end issue.

I believe you'd be better off using -fno-vect-cost-model.
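
To make that concrete, here is a rough sketch of how one might compare the
two configurations. The loop below is a hypothetical stand-in and NOT the
testcase attached to this PR; the file name test.c, the function name
last_element and the array size are assumptions for illustration only. The
idea is simply to build the same source with and without the cost model and
see whether the last-element fold differs in both cases.

```c
/* Hypothetical stand-in, not the PR's testcase.
   Assumed file name: test.c

   Build twice and compare the vectorizer dumps:

     gcc -O3 -c -fdump-tree-vect-details test.c
     gcc -O3 -c -fdump-tree-vect-details -fno-vect-cost-model test.c

   Add the target options for the port under test
   (for SVE, something like -march=armv8.2-a+sve).  */

#define N 64

unsigned char array1[1][N];

int last_element(void)
{
    int m = 0;

    /* First loop: fill the row with known values 1..N.  */
    for (int i = 0; i < N; i++)
        array1[0][i] = (unsigned char)(i + 1);

    /* Second loop: only the value from the final iteration is live
       after the loop, so ideally this reduces to one load.  */
    for (int i = 0; i < N; i++)
        m = array1[0][i];

    return m;
}
```

If the missed fold shows up in both builds, the cost model is unlikely to be
the root cause on its own.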
