https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-10-13
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
For AVX2 and simdlen(8) we get

  <bb 2> [local count: 1073741824]:
  vect__35.98_51 = VIEW_CONVERT_EXPR<vector(8) unsigned int>(simd.25_11(D));
  vect__36.99_52 = VIEW_CONVERT_EXPR<vector(8) unsigned int>(simd.26_13(D));
  vect__37.100_53 = vect__35.98_51 + vect__36.99_52;
  vect__3.101_54 = VIEW_CONVERT_EXPR<vector(8) int>(vect__37.100_53);
  mask__38.102_55 = mask.27_15(D) != { 0, 0, 0, 0, 0, 0, 0, 0 };
  if (mask__38.102_55 == { 0, 0, 0, 0, 0, 0, 0, 0 })
    goto <bb 4>; [20.00%]
  else
    goto <bb 3>; [80.00%]

  <bb 3> [local count: 858993419]:
  .MASK_STORE (&retval.21, 256B, mask__38.102_55, vect__3.101_54);

  <bb 4> [local count: 1073741824]:
  _8 = VIEW_CONVERT_EXPR<vector(8) int>(retval.21);
  return _8;

which is much more reasonable.  I'm not sure whether the compare against
zero is required, or whether the ABI guarantees either 0 or -1 in the mask
elements.  That we end up going through memory because of the .MASK_STORE
unfortunately persists into the final assembly:

_ZGVdM8vv_foo:
.LFB3:
        .cfi_startproc
        vpaddd  %ymm1, %ymm0, %ymm0
        vpxor   %xmm1, %xmm1, %xmm1
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vpcmpeqd        %ymm1, %ymm2, %ymm2
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        vpcmpeqd        %ymm1, %ymm2, %ymm2
        vptest  %ymm2, %ymm2
        je      .L12
        vpmaskmovd      %ymm0, %ymm2, -32(%rsp)
.L12:
        vmovdqa -32(%rsp), %ymm0
        leave
        .cfi_def_cfa 7, 8
        ret

There is an opportunity to rewrite the .MASK_STORE to a VEC_COND_EXPR and
promote retval.21 to SSA.  If-conversion may already be able to see this,
given that retval.21 is a local automatic and not address-taken.
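(The comment does not include the testcase itself.  The clone name
_ZGVdM8vv_foo encodes, per the x86 vector function ABI mangling, an AVX2
("d") masked ("M") clone with simdlen 8 and two vector arguments ("vv"),
which suggests a declare-simd function of roughly the following shape.
This is a hypothetical reconstruction for illustration, not the actual PR
testcase:)

```c
/* Hypothetical reconstruction of the source behind _ZGVdM8vv_foo.
   With -O2 -mavx2 -fopenmp-simd GCC emits, among others, a masked
   simdlen(8) AVX2 clone of this function.  */
#pragma omp declare simd simdlen(8)
int
foo (int x, int y)
{
  return x + y;
}
```

The masked ("inbranch") clone is the variant whose epilogue shows the
.MASK_STORE-through-the-stack pattern discussed above.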
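(The suggested .MASK_STORE -> VEC_COND_EXPR rewrite amounts to doing a
register-only select instead of a masked store to the stack followed by a
reload.  A minimal source-level sketch of that select, written with GNU C
vector extensions; foo_masked is a made-up name, and zeroing the masked-off
lanes is one valid choice since those lanes are never consumed:)

```c
/* Eight 32-bit ints in one 256-bit vector.  */
typedef int v8si __attribute__ ((vector_size (32)));

/* Register-only equivalent of the masked clone's body: compute the sum,
   then blend with the mask instead of .MASK_STORE-ing to a stack slot.  */
v8si
foo_masked (v8si x, v8si y, v8si mask)
{
  v8si sum = x + y;
  v8si m = mask != 0;   /* each lane becomes 0 or -1 */
  return sum & m;       /* VEC_COND_EXPR-style select, no memory traffic */
}
```

Compiled with -mavx2 this stays entirely in ymm registers (vpaddd,
vpcmpeqd, vpandn or a blend), with no aligned stack slot and hence no
frame setup.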