https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092
--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> ---

Does profile feedback (so that we have an idea of the loop count) make any difference? It seems clear that, for a loop that in practice just copies one long, having to arrange the arguments, make a function call, test for alignment, etc. is a lot of overhead. What to do about it, though...
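
As an illustration only, here is a hypothetical loop of the kind described (it is not the testcase from this PR, and the name copy_words is made up). The assumption is that loop distribution recognizes the body as a copy pattern and emits a memcpy call; when n is 1 at run time, the call setup and the checks inside memcpy would cost more than the single load and store they replace.

```c
#include <stddef.h>

/* Hypothetical example, not taken from the PR.
   With -ftree-loop-distribute-patterns the compiler may replace this
   loop with a call to memcpy(dst, src, n * sizeof(long)).
   If n is almost always 1 in practice, the inline loop is likely cheaper.
   Building once with -fprofile-generate, running a typical workload, and
   rebuilding with -fprofile-use gives the compiler an estimate of n;
   whether that changes the decision here is the open question. */
void copy_words(long *restrict dst, const long *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Comparing the assembly from -O2 with and without -fno-tree-loop-distribute-patterns should show whether the call is emitted for a given GCC version.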