https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649
--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I found the reason why the vectorizer gets the vectorized epilogue profile scales wrong. scale_profile_for_vect_loop uses niter_for_unrolled_loop, which does not understand that if the iteration count is not divisible, the epilogue (unless the loop is masked) will use the count. The upper bound computation is actually right in the update of loop_info, so we can just use it directly instead of relying on niter_for_unrolled_loop.

The wrong profile in:

;; basic block 14, loop depth 2, count 13764235 (guessed, freq 1.9247), maybe hot
;; Invalid sum of incoming counts 25234431 (guessed, freq 3.5286), should be 13764235 (guessed, freq 1.9247)

is caused by loop peeling. The unrolled loop is peeled 4 times, which seems like a reasonable idea, but I am not sure why the profile is not updated correctly here.
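
To illustrate the divisibility point, here is a minimal sketch of how the iterations of an unmasked loop might be split between the vector body and the epilogue. It is not GCC code: split_iterations and struct iter_split are hypothetical names, and it assumes "not divisible" refers to the vectorization factor (vf here) and that the iteration count is known exactly.

```c
#include <assert.h>

/* Hypothetical helper, not a GCC function: shows how N scalar iterations
   are divided when a loop is vectorized by factor VF without masking.  */
struct iter_split
{
  unsigned long vector_iters;   /* iterations of the vector loop body */
  unsigned long epilogue_iters; /* leftover scalar iterations */
};

static struct iter_split
split_iterations (unsigned long n, unsigned long vf)
{
  assert (vf > 0);

  struct iter_split s;
  /* The vector body handles VF scalar iterations at a time.  */
  s.vector_iters = n / vf;
  /* Whatever does not divide evenly is left for the epilogue; an
     estimate that ignores this remainder mis-scales the epilogue.  */
  s.epilogue_iters = n % vf;
  return s;
}
```

With n = 10 and vf = 4, this gives 2 vector iterations and 2 epilogue iterations. With n = 8, the epilogue gets none.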