https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
--- Comment #10 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #9)
> The original case is a little different from the one in the PR.

But the issue is similar: after cunrolli, GCC failed to vectorize the outer loop.

The interesting part is in estimated_unrolled_size. The original unr_insns is 288, which is bigger than param_max_completely_peeled_insns (200), but unr_insns is then decreased by 1/3 because of:

  /* Loop body is likely going to simplify further, this is difficult
     to guess, we just decrease the result by 1/3.  */

In practice, this loop body does not simplify by anywhere near 1/3 of its instructions.

Given that the unroll factor is 16 and unr_insns is still large (192) after the discount, I was wondering whether we could add a heuristic to avoid complete unrolling here, because for such a big loop neither the loop vectorizer nor the BB vectorizer usually performs well.