https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92464
Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-11-12
     Ever confirmed|0                           |1

--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
Before the regressed commit, the cost view looks like:

0x13135eb0 ic[i_35] 2 times vector_stmt costs 2 in prologue
0x13135eb0 ic[i_35] 1 times vector_stmt costs 1 in prologue
0x13135eb0 ic[i_35] 1 times vector_load costs 1 in body
0x13135eb0 ic[i_35] 1 times vec_perm costs 3 in body
0x13135eb0 _5 1 times vector_store costs 1 in body
.c:21:3: note: not using a fully-masked loop.
cost model: prologue peel iters set to vf/2.
cost model: epilogue peel iters set to vf/2 because peeling for alignment is unknown.
0x13135eb0 <unknown> 1 times cond_branch_taken costs 3 in prologue
0x13135eb0 <unknown> 1 times cond_branch_not_taken costs 1 in prologue
0x13135eb0 <unknown> 1 times cond_branch_taken costs 3 in epilogue
0x13135eb0 <unknown> 1 times cond_branch_not_taken costs 1 in epilogue
0x13135eb0 ic[i_35] 2 times scalar_load costs 2 in prologue
0x13135eb0 ic[i_35] 2 times scalar_load costs 2 in epilogue
0x13135eb0 _5 2 times scalar_store costs 2 in prologue
0x13135eb0 _5 2 times scalar_store costs 2 in epilogue
.c:21:3: note: Cost model analysis:
  Vector inside of loop cost: 5
  Vector prologue cost: 11
  Vector epilogue cost: 8
  Scalar iteration cost: 2
  Scalar outside cost: 0
  Vector outside cost: 19
  prologue iterations: 2
  epilogue iterations: 2
  Calculated minimum iters for profitability: 19

With the commit, the cost view is changed to:

0x13135eb0 ic[i_35] 2 times vector_stmt costs 2 in prologue
0x13135eb0 ic[i_35] 1 times vector_stmt costs 1 in prologue
0x13135eb0 ic[i_35] 1 times vector_load costs 2 in body
0x13135eb0 ic[i_35] 1 times vec_perm costs 3 in body
0x13135eb0 _5 1 times vector_store costs 1 in body
.c:21:3: note: not using a fully-masked loop.
cost model: prologue peel iters set to vf/2.
cost model: epilogue peel iters set to vf/2 because peeling for alignment is unknown.
0x13135eb0 <unknown> 1 times cond_branch_taken costs 3 in prologue
0x13135eb0 <unknown> 1 times cond_branch_not_taken costs 1 in prologue
0x13135eb0 <unknown> 1 times cond_branch_taken costs 3 in epilogue
0x13135eb0 <unknown> 1 times cond_branch_not_taken costs 1 in epilogue
0x13135eb0 ic[i_35] 2 times scalar_load costs 4 in prologue
0x13135eb0 ic[i_35] 2 times scalar_load costs 4 in epilogue
0x13135eb0 _5 2 times scalar_store costs 2 in prologue
0x13135eb0 _5 2 times scalar_store costs 2 in epilogue
.c:21:3: note: Cost model analysis:
  Vector inside of loop cost: 6
  Vector prologue cost: 13
  Vector epilogue cost: 10
  Scalar iteration cost: 3
  Scalar outside cost: 0
  Vector outside cost: 23
  prologue iterations: 2
  epilogue iterations: 2
  Calculated minimum iters for profitability: 12

The cost changes are expected: both the scalar load and the vector load now
cost more. As a result, the minimum profitable iteration count becomes smaller
(19 before the commit, 12 after it).

I ran both the before and after executables with 100000 invocations, 10 times
each. The measured times are very close, and both averages are 65.23s. This
means the cost adjustment doesn't make this case worse.

One fix idea is to adjust the test case's iteration count to 11, which is lower
than the current minimum profitable iteration count of 12.
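
To make the arithmetic behind the two "Calculated minimum iters for
profitability" values easier to follow, here is a minimal sketch in C. It is
only an illustration and not the vectorizer's actual code: the helper name
min_profitable_iters is hypothetical, the formula is a simplified form of the
kind of calculation the cost model performs, and the vectorization factor of 4
is an assumption inferred from "peel iters set to vf/2" together with
"prologue iterations: 2". Under those assumptions it reproduces both numbers
from the dumps.

```c
#include <assert.h>

/* Hypothetical helper: estimate the smallest iteration count at which the
   vectorized loop is expected to beat the scalar loop.  Returns -1 when the
   vector body is not cheaper than vf scalar iterations, i.e. vectorization
   would never pay off under this simplified model.  */
int min_profitable_iters(int scalar_iter_cost, int scalar_outside_cost,
                         int vec_inside_cost, int vec_outside_cost,
                         int prologue_iters, int epilogue_iters, int vf)
{
    assert(vf > 0);

    /* Cost saved by one vector iteration compared with vf scalar ones.  */
    int saving = scalar_iter_cost * vf - vec_inside_cost;
    if (saving <= 0)
        return -1;

    /* Extra one-time overhead of the vector version (prologue, epilogue).  */
    int extra_outside = vec_outside_cost - scalar_outside_cost;

    /* Integer division, as in a compiler working on integer costs.  */
    int n = (extra_outside * vf
             - vec_inside_cost * prologue_iters
             - vec_inside_cost * epilogue_iters) / saving;

    /* If the scalar loop is still no more expensive at n iterations,
       one more iteration is needed before vectorization wins.  */
    if (scalar_iter_cost * vf * n <= vec_inside_cost * n + extra_outside * vf)
        n++;

    return n;
}
```

With the values from the first dump (scalar iteration cost 2, vector inside
cost 5, vector outside cost 19, two prologue and two epilogue iterations,
assumed vf of 4) the sketch yields 19. With the values from the second dump
(3, 6 and 23) it yields 12. The scalar iteration cost grows by 50% (2 to 3)
while the vector inside cost grows by only 20% (5 to 6), so the per-iteration
saving doubles from 3 to 6 and the threshold drops.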