https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #13 from Jan Hubicka <hubicka at ucw dot cz> ---
> So is this option still helping with the latest microcode? Not in this case at
> least.

It is on my TODO list to re-benchmark 256-bit vectorization for Zen. I do not
think microcode makes a big difference here. Using 256-bit vectors has the
advantage of exposing more parallelism but also the disadvantage of requiring
a more involved setup. So for loops that vectorize naturally (like matrix
multiplication) it can be a win, while for loops that are difficult to
vectorize it is a loss. The early benchmarks did not look consistent, and
that is why the 128-bit mode was introduced.

It is not that different from vectorizing for K8, which had split SSE
registers in a similar fashion, or for kabylake, which splits 512-bit
operations.

While rewriting the cost model I tried to keep this in mind and to model the
split operations more accurately, so it may be possible to switch to 256 by
default. Ideally the vectorizer should make a decision whether 128 or 256 is
a win for a particular loop, but it doesn't seem to have the infrastructure
to do so.

My plan is to split the current flag into two (prefer 128-bit, and assume that
registers are internally split) and see if that is enough to get a consistent
win for 256-bit vectorization.

Richi may know better.

Honza
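
For anyone who wants to repeat the 128-bit versus 256-bit comparison, here is a minimal sketch of the two loop shapes mentioned above. It is not taken from the bug report: the function names `matmul_add` and `strided_sum` are hypothetical, and the compile lines in the comment assume a GCC with `-march=znver1` support (the width preference is spelled `-mprefer-avx128` in GCC 7 and `-mprefer-vector-width=128` or `=256` from GCC 8 on). Whether either width wins on a given machine has to be measured.

```c
#include <stddef.h>

/*
 * Possible compile lines for comparing the two widths (assumed setup):
 *   gcc -O3 -march=znver1 -mprefer-vector-width=128 -fopt-info-vec -c loops.c
 *   gcc -O3 -march=znver1 -mprefer-vector-width=256 -fopt-info-vec -c loops.c
 */

/* A loop nest that vectorizes naturally: C += A * B for n x n row-major
   matrices. The innermost loop walks contiguous elements of B and C. */
void matmul_add(size_t n, const float *restrict a,
                const float *restrict b, float *restrict c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            float aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

/* A loop that is harder to vectorize: a strided floating-point reduction.
   Non-contiguous loads need extra setup, and without -ffast-math the
   summation order may not be changed. */
float strided_sum(size_t n, size_t stride, const float *x)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i * stride];
    return sum;
}
```

Comparing the `-fopt-info-vec` output and the run times of both builds shows which loops the vectorizer handled at each width.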