https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-10-13
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
For AVX2 and simdlen(8) we get

  <bb 2> [local count: 1073741824]:
  vect__35.98_51 = VIEW_CONVERT_EXPR<vector(8) unsigned int>(simd.25_11(D));
  vect__36.99_52 = VIEW_CONVERT_EXPR<vector(8) unsigned int>(simd.26_13(D));
  vect__37.100_53 = vect__35.98_51 + vect__36.99_52;
  vect__3.101_54 = VIEW_CONVERT_EXPR<vector(8) int>(vect__37.100_53);
  mask__38.102_55 = mask.27_15(D) != { 0, 0, 0, 0, 0, 0, 0, 0 };
  if (mask__38.102_55 == { 0, 0, 0, 0, 0, 0, 0, 0 })
    goto <bb 4>; [20.00%]
  else
    goto <bb 3>; [80.00%]

  <bb 3> [local count: 858993419]:
  .MASK_STORE (&retval.21, 256B, mask__38.102_55, vect__3.101_54);

  <bb 4> [local count: 1073741824]:
  _8 = VIEW_CONVERT_EXPR<vector(8) int>(retval.21);
  return _8;

which is much more reasonable.  I'm not sure whether the compare against
zero is required, or whether the ABI guarantees either 0 or -1 in the mask
elements.  That we end up going through memory because of the .MASK_STORE
unfortunately persists into the final assembly:

_ZGVdM8vv_foo:
.LFB3:
        .cfi_startproc
        vpaddd  %ymm1, %ymm0, %ymm0
        vpxor   %xmm1, %xmm1, %xmm1
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vpcmpeqd        %ymm1, %ymm2, %ymm2
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        vpcmpeqd        %ymm1, %ymm2, %ymm2
        vptest  %ymm2, %ymm2
        je      .L12
        vpmaskmovd      %ymm0, %ymm2, -32(%rsp)
.L12:
        vmovdqa -32(%rsp), %ymm0
        leave
        .cfi_def_cfa 7, 8
        ret

There is an opportunity to rewrite the .MASK_STORE to a VEC_COND_EXPR and
promote retval.21 to SSA.  If-conversion may already be able to see this,
given that retval.21 is a local automatic and not address-taken.
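(The comment does not include the testcase itself.  The clone name
_ZGVdM8vv_foo encodes, per the x86 vector function ABI mangling, an AVX2
("d") masked ("M") clone with simdlen 8 and two vector arguments ("vv"),
which suggests a declare-simd function of roughly the following shape.
This is a hypothetical reconstruction for illustration, not the actual PR
testcase:)

```c
/* Hypothetical reconstruction of the source behind _ZGVdM8vv_foo.
   With -O2 -mavx2 -fopenmp-simd GCC emits, among others, a masked
   simdlen(8) AVX2 clone of this function.  */
#pragma omp declare simd simdlen(8)
int
foo (int x, int y)
{
  return x + y;
}
```

The masked ("inbranch") clone is the variant whose epilogue shows the
.MASK_STORE-through-the-stack pattern discussed above.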
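(The suggested .MASK_STORE -> VEC_COND_EXPR rewrite amounts to doing a
register-only select instead of a masked store to the stack followed by a
reload.  A minimal source-level sketch of that select, written with GNU C
vector extensions; foo_masked is a made-up name, and zeroing the masked-off
lanes is one valid choice since those lanes are never consumed:)

```c
/* Eight 32-bit ints in one 256-bit vector.  */
typedef int v8si __attribute__ ((vector_size (32)));

/* Register-only equivalent of the masked clone's body: compute the sum,
   then blend with the mask instead of .MASK_STORE-ing to a stack slot.  */
v8si
foo_masked (v8si x, v8si y, v8si mask)
{
  v8si sum = x + y;
  v8si m = mask != 0;   /* each lane becomes 0 or -1 */
  return sum & m;       /* VEC_COND_EXPR-style select, no memory traffic */
}
```

Compiled with -mavx2 this stays entirely in ymm registers (vpaddd,
vpcmpeqd, vpandn or a blend), with no aligned stack slot and hence no
frame setup.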