https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325

--- Comment #10 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #9)
> The original case is a little different from the one in PR.
But the issue is similar: after cunrolli, GCC fails to vectorize the outer
loop.

The interesting thing is that in estimated_unrolled_size, the original
unr_insns is 288, which is bigger than param_max_completely_peeled_insns (200),
but unr_insns is then decreased by 1/3 due to

  /* Loop body is likely going to simplify further, this is difficult
     to guess, we just decrease the result by 1/3.  */

In practice, this loop body is not simplified by 1/3 of its instructions.
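
To make the arithmetic concrete, here is a minimal sketch of the size check as
described above. It is an illustration, not the actual GCC source: the function
names and the macro are made up for this example, and only the numbers (288,
200, the 1/3 reduction) come from this comment.

```c
#include <stdbool.h>

/* Value of param_max_completely_peeled_insns quoted in this comment.
   The macro name is an assumption made for this sketch.  */
#define MAX_COMPLETELY_PEELED_INSNS 200

/* Apply the "decrease the result by 1/3" discount to a raw size estimate.
   Hypothetical helper, not a GCC function.  */
long
discounted_unrolled_size (long raw_insns)
{
  long unr_insns = raw_insns * 2 / 3;
  /* Keep the estimate positive.  */
  return unr_insns > 0 ? unr_insns : 1;
}

/* True if an estimate is within the complete-peeling limit.  */
bool
fits_peel_limit (long insns)
{
  return insns <= MAX_COMPLETELY_PEELED_INSNS;
}
```

With the numbers from this testcase, fits_peel_limit (288) is false, but
discounted_unrolled_size (288) gives 192, which is within the limit, so the
loop is still considered small enough to unroll completely.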

Considering the unroll factor is 16 and unr_insns is still large (192), I was
wondering if we could add some heuristic to avoid complete loop unrolling,
because for such a big loop, both the loop and BB vectorizers usually do not
perform well.
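
One possible shape for such a heuristic, purely as a hypothetical sketch: skip
the 1/3 discount once the unroll factor is large. The cutoff of 16, the macro
names and the function name are all assumptions chosen for illustration. This
is not a proposed patch and not existing GCC code.

```c
#include <stdbool.h>

/* Limit quoted in this comment (param_max_completely_peeled_insns).  */
#define PEEL_LIMIT 200

/* Hypothetical cutoff, chosen only to match the unroll factor seen here.  */
#define LARGE_UNROLL_FACTOR 16

/* Decide whether to unroll completely.  The 1/3 discount is applied only
   when the unroll factor is below the hypothetical cutoff.  */
bool
should_completely_unroll (long raw_insns, long unroll_factor)
{
  long estimate = raw_insns;

  if (unroll_factor < LARGE_UNROLL_FACTOR)
    estimate = raw_insns * 2 / 3;

  return estimate <= PEEL_LIMIT;
}
```

Under these assumptions, should_completely_unroll (288, 16) returns false, so
the loop would be left for the vectorizer, while the same body with an unroll
factor of 8 would still be unrolled.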
