https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110222
Bug ID: 110222
Summary: Inefficient fully masked loop vectorization with AVX512
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

With -march=znver4 --param vect-partial-vector-usage=2, gfortran.dg/matmul_2.f90 shows the innermost loop of c(:,1:7:2) = matmul(a,b(:,1:7:2)) vectorized with

note: vectorization_factor = 16, niters = 2

Here niters is known at compile time, so the loop mask is statically known, and niters is even a power of two. This loop should instead be vectorized without masking, using V2SImode vectors.

Similarly, for a hypothetical niters = 3, or generally niters < 4, the vectorizer should use a smaller (but masked) vector mode rather than the target's preferred 512-bit size. The x86 target currently does not compare costs across different vector modes, but in cases like these a better mode should be selectable statically.