https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #19 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #18)
> The duffs device doesn't need to be done with computed jump, it can be done
> with 3 conditional branches + 3 comparisons too. The advantage of doing
> that is especially if the iter isn't really very small, by doing it that way
> you don't need those 4 unrolled iterations + one scalar loop.

While that is better than an indirect branch, it still branches into the loop,
so you don't benefit from optimization across the loop. So using a trailing
loop is better in most cases.

> Of course, if iter is very short, it might be easier/more efficient to
> duplicate iter more times than 4 and do something else.

If iter is a small constant then peeling makes sense, as you can completely
remove the loop counter and branches. E.g. for N=3 you end up with something
like this:

  if (n >= 2)
    iter1
    iter2
    n -= 2
  if (n != 0)
    iter3
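
To make the two shapes concrete, here is a minimal C sketch. It assumes "iter" is a simple accumulation over an int array; the function names (sum_plain, sum_unrolled_trailing, sum_peeled3) are invented for illustration and do not come from the PR or from GCC. It only shows the source-level structure being discussed, not what any particular GCC version emits.

```c
#include <stddef.h>

/* Reference: plain loop, one "iter" per element. */
static int sum_plain(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4 with a trailing scalar loop. Control only ever enters
   the unrolled body at its top, so the four iterations can be
   optimized together. */
static int sum_unrolled_trailing(const int *a, size_t n)
{
    int s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* 0 to 3 leftover iterations */
        s += a[i];
    return s;
}

/* Peeled form for a trip count of at most 3 (assumption: the caller
   guarantees n <= 3). No loop counter or back edge remains. */
static int sum_peeled3(const int *a, size_t n)
{
    int s = 0;
    if (n >= 2) {
        s += a[0];              /* iter1 */
        s += a[1];              /* iter2 */
        a += 2;
        n -= 2;
    }
    if (n != 0)
        s += a[0];              /* iter3 */
    return s;
}
```

In the peeled version the two tests cover every trip count from 0 to 3, which is why the counter and the loop branch disappear entirely.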