https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
For this particular costing there's also the issue that we perform costing at vectorizable_reduction time, but at that point we don't yet know whether we will use partial vectors in the end. We try to apply the costs of using partial vectors in vect_estimate_min_profitable_iters, but we don't have any good way to account for extra costs that arise because some operations expand differently with partial vectors than without them.

The only way would be to separate the costing of operations from the analysis phase, or alternatively to record multiple cost variants during analysis and pick the correct one later. I think the former, separating costing from analysis, might be the better way in the end.

Note that currently we cost

  _4 + sum_13 8 times vec_to_scalar costs 64 in body
  _4 + sum_13 8 times scalar_stmt costs 96 in body
  *_3 1 times unaligned_load (misalign -1) costs 12 in body
  t.c:9:21: note: operating on partial vectors.
  <unknown> 2 times vector_stmt costs 8 in prologue
  <unknown> 2 times vector_stmt costs 8 in body
  t.c:9:21: note: Cost model analysis:
    Vector inside of loop cost: 180
    Vector prologue cost: 8
    Vector epilogue cost: 0
    Scalar iteration cost: 24
    Scalar outside cost: 0
    Vector outside cost: 8
    prologue iterations: 0
    epilogue iterations: 0
    Minimum number of vector iterations: 1
    Calculated minimum iters for profitability: 8
  t.c:9:21: note: Runtime profitability threshold = 8
  t.c:9:21: note: Static estimate profitability threshold = 8
  t.c:9:21: note: ***** Analysis succeeded with vector mode V8DF

The vector body plus overhead (180 + 8 = 188) is thus cheaper than the scalar version (8 × 24 = 192), but that assumes we'd actually run a full round of VF scalar iterations!
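To make the arithmetic concrete, here is a minimal Python sketch of a simplified cost model. It sums the per-statement body costs from the dump and then searches for the smallest scalar iteration count n at which a fully masked vector loop (ceil(n / VF) vector iterations, no scalar epilogue) is cheaper than the scalar loop. The helper names (vector_cost, scalar_cost, min_profitable_iters) are hypothetical, and the model itself is an assumption: it is not the formula vect_estimate_min_profitable_iters implements, although with these numbers it lands on the same threshold of 8.

```python
import math

# Per-statement body costs copied from the dump (V8DF, so VF = 8).
BODY_COSTS = {
    "vec_to_scalar x8": 64,
    "scalar_stmt x8": 96,
    "unaligned_load x1": 12,
    "vector_stmt x2 (partial-vector overhead)": 8,
}
VF = 8
VEC_INSIDE = sum(BODY_COSTS.values())  # vector inside-of-loop cost
VEC_OUTSIDE = 8       # vector prologue 8 + epilogue 0
SCALAR_ITER = 24      # one scalar iteration (load + add)
SCALAR_OUTSIDE = 0


def vector_cost(n, vf, inside, outside):
    """Cost of n scalar iterations run as a fully masked vector loop.

    Assumption: with partial vectors there is no scalar epilogue, and the
    final, partially filled vector iteration is charged the full body cost.
    """
    return outside + math.ceil(n / vf) * inside


def scalar_cost(n, iter_cost, outside):
    """Cost of n iterations of the original scalar loop."""
    return outside + n * iter_cost


def min_profitable_iters(vf, vec_inside, vec_outside, scalar_iter,
                         scalar_outside, limit=10_000):
    """Smallest n in [1, limit] where the vector loop is strictly cheaper, else None."""
    for n in range(1, limit + 1):
        if (vector_cost(n, vf, vec_inside, vec_outside)
                < scalar_cost(n, scalar_iter, scalar_outside)):
            return n
    return None


if __name__ == "__main__":
    print("vector body cost:", VEC_INSIDE)
    for n in (1, 7, 8):
        print(n, vector_cost(n, VF, VEC_INSIDE, VEC_OUTSIDE),
              scalar_cost(n, SCALAR_ITER, SCALAR_OUTSIDE))
    print("min profitable iters:",
          min_profitable_iters(VF, VEC_INSIDE, VEC_OUTSIDE,
                               SCALAR_ITER, SCALAR_OUTSIDE))
```

In this model the vector loop is charged 188 for every n from 1 to 8, while the scalar loop costs 24 per iteration, so the vector version only wins once all 8 lanes are doing useful work.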
If we added 7 times vec_to_scalar + scalar_stmt as epilogue cost (7 × (8 + 12) = 140), we'd raise the requirement considerably, because the difference between scalar (192) and vector (180) is already quite small. I wonder whether we need to adjust our formula for static profitability to account for partial vectors?

When using a variable upper loop bound we still see

  t.c:9:21: note: Cost model analysis:
    Vector inside of loop cost: 180
    Vector prologue cost: 8
    Vector epilogue cost: 0
    Scalar iteration cost: 24
    Scalar outside cost: 32
    Vector outside cost: 8
    prologue iterations: 0
    epilogue iterations: 0
    Minimum number of vector iterations: 1
    Calculated minimum iters for profitability: 7
  t.c:9:21: note: Runtime profitability threshold = 7
  t.c:9:21: note: Static estimate profitability threshold = 32
  t.c:9:21: note: no need for a runtime choice between the scalar and vector loops
  t.c:9:21: note: ***** Analysis succeeded with vector mode V8DF

but if the scalar loop iterated only once we'd have a cost of 24 there.
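Under the same kind of simplified model (an assumption, not GCC's actual formula), the sketch below reproduces the runtime threshold of 7 from the variable-bound dump (scalar outside cost 32, vector outside cost 8). It also estimates what charging the hypothetical extra epilogue of 7 × (vec_to_scalar 8 + scalar_stmt 12) = 140 would do to the first, fixed-bound case. The helper names are hypothetical, and the 140 is only the what-if epilogue discussed above, not a cost GCC currently records.

```python
import math


def vector_cost(n, vf, inside, outside):
    """Fully masked vector loop: ceil(n / vf) body executions, no scalar epilogue (assumed)."""
    return outside + math.ceil(n / vf) * inside


def scalar_cost(n, iter_cost, outside):
    """Original scalar loop: one iteration cost per scalar iteration."""
    return outside + n * iter_cost


def min_profitable_iters(vf, vec_inside, vec_outside, scalar_iter,
                         scalar_outside, limit=10_000):
    """Smallest n in [1, limit] where the vector loop is strictly cheaper, else None."""
    for n in range(1, limit + 1):
        if (vector_cost(n, vf, vec_inside, vec_outside)
                < scalar_cost(n, scalar_iter, scalar_outside)):
            return n
    return None


VF, VEC_INSIDE, SCALAR_ITER = 8, 180, 24

# Variable upper bound: scalar outside cost 32, vector outside cost 8.
variable_bound = min_profitable_iters(VF, VEC_INSIDE, 8, SCALAR_ITER, 32)

# Hypothetical: charge 7 x (vec_to_scalar 8 + scalar_stmt 12) as vector
# epilogue cost on top of the fixed-bound case (scalar outside cost 0).
extra_epilogue = 7 * (8 + 12)
with_epilogue = min_profitable_iters(VF, VEC_INSIDE, 8 + extra_epilogue,
                                     SCALAR_ITER, 0)

if __name__ == "__main__":
    print("variable bound threshold:", variable_bound)
    print("extra epilogue cost:", extra_epilogue)
    print("threshold with epilogue:", with_epilogue)
```

In this model each full vector iteration saves only 192 − 180 = 12 over the scalar code, so covering a vector outside cost of 148 takes thirteen full vector iterations, and the threshold jumps from 8 to 104.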