https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110485
Bug ID: 110485
Summary: vectorizing simd clone calls without loop masking applied
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

#include <math.h>

double a[1024];
double b[1024];

void foo (int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = pow (b[i], 71.2);
}

with -Ofast -march=znver4 --param vect-partial-vector-usage=1 gets us
the following OK main loop

.L4:
        vmovapd b(%rbx), %zmm0
        vmovapd -112(%rbp), %zmm1
        addq    $64, %rbx
        call    _ZGVeN8vv_pow
        vmovapd %zmm0, a-64(%rbx)
        cmpq    %r13, %rbx
        jne     .L4

but the following vectorized masked epilogue:

        movl    %r12d, %eax
        andl    $-8, %eax
        testb   $7, %r12b
        je      .L13
.L3:
        subl    %eax, %r12d
        movl    %eax, %edx
        vmovapd -112(%rbp), %zmm1
        vpbroadcastw    %r12d, %xmm0
        leaq    0(,%rdx,8), %rbx
        vpcmpuw $6, .LC2(%rip), %xmm0, %k1
        vmovapd b(,%rdx,8), %zmm0{%k1}{z}
        kmovb   %k1, -113(%rbp)
        call    _ZGVeN8vv_pow
        kmovb   -113(%rbp), %k1
        vmovapd %zmm0, a(%rbx){%k1}

so we simply call _ZGVeN8vv_pow without any masking applied. That's
possibly OK since we use zero-masking and thus actual masked argument
lanes are zero but it seems this isn't the expected behavior for
vectorizable_simd_clone_call. Instead it should probably unconditionally
set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) to false?

Is there a way to query which SIMD clone is "happy" with zero arguments
and thus for example with -ffast-math would be OK to run unmasked?