https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097
Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> Hmm, so the difference is that we use loop vect for 'foo' but fail to do
> that for 'bar' and BB vect succeeds.  Disabling loop vect but enabling BB
> vect also produces optimal code for 'foo' (unrolling happens before):
>
> foo:
> .LFB0:
>         .cfi_startproc
>         vpmovzxwd       (%rsi), %ymm0
>         vpmovzxwd       (%rdi), %ymm1
>         vpaddd  %ymm1, %ymm0, %ymm0
>         vmovdqu %ymm0, (%rdx)
>         vzeroupper
>
> the key difference in the vectorizer is that BB vect supports different
> vector sizes in the same instance but the loop vectorizer can only use
> a single vector size.

Is there any plan for extending loop vectorizer to handle different vector
sizes?
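
For readers without the original testcase at hand, here is a minimal sketch of the kind of code the quoted assembly would correspond to. It is an assumption reconstructed from the assembly, not the testcase from the bug report: the function names `widen_add_loop` and `widen_add_unrolled` are hypothetical stand-ins for 'foo' and 'bar'. Each input is eight 16-bit elements (128 bits) and the output is eight 32-bit elements (256 bits), which is where the "different vector sizes in the same instance" come from.

```c
#include <stddef.h>

/* Hypothetical stand-in for 'foo': a rolled loop over 8 elements.
   Inputs are 8 x 16-bit (128 bits), the result is 8 x 32-bit (256 bits). */
void widen_add_loop(const unsigned short *a, const unsigned short *b,
                    int *restrict c)
{
    for (size_t i = 0; i != 8; i++)
        c[i] = a[i] + b[i];   /* operands are promoted to int before the add */
}

/* Hypothetical stand-in for 'bar': the same computation written out by hand,
   so there is no loop left and only basic-block (SLP) vectorization applies. */
void widen_add_unrolled(const unsigned short *a, const unsigned short *b,
                        int *restrict c)
{
    c[0] = a[0] + b[0];
    c[1] = a[1] + b[1];
    c[2] = a[2] + b[2];
    c[3] = a[3] + b[3];
    c[4] = a[4] + b[4];
    c[5] = a[5] + b[5];
    c[6] = a[6] + b[6];
    c[7] = a[7] + b[7];
}
```

Compiling both with something like `gcc -O3 -mavx2 -S`, with and without `-fno-tree-loop-vectorize`, is one way to compare what the loop vectorizer and BB vect produce; the exact output will depend on the GCC version.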