[Bug target/100929] gcc fails to optimize less to min for SIMD code

glisse at gcc dot gnu.org via Gcc-bugs Sun, 03 Apr 2022 05:56:49 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929


--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
(blend is now lowered in gimple)

For the integer case, the mix of vector(int) and vector(char) obfuscates things
a bit. We have:

__m256i if_else_int (__m256i x, __m256i y)
{
  vector(32) char _4;
  vector(32) char _5;
  vector(32) char _6;
  vector(32) <signed-boolean:8> _7;
  vector(32) char _8; 
  vector(4) long long int _9;
  vector(8) int _10;
  vector(8) int _11;
  vector(8) <signed-boolean:32> _12;
  vector(8) int _13;

  <bb 2> [local count: 1073741824]: 
  _10 = VIEW_CONVERT_EXPR<vector(8) int>(x_2(D));
  _11 = VIEW_CONVERT_EXPR<vector(8) int>(y_3(D));
  _12 = _10 > _11;
  _13 = VEC_COND_EXPR <_12, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _5 = VIEW_CONVERT_EXPR<vector(32) char>(_13);
  _4 = VIEW_CONVERT_EXPR<vector(32) char>(y_3(D));
  _6 = VIEW_CONVERT_EXPR<vector(32) char>(x_2(D));
  _7 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = VEC_COND_EXPR <_7, _4, _6>;
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
  return _9;
}

A first step would be to teach gcc that it can use VEC_COND_EXPR <_12, _11,
_10> directly, with fewer VIEW_CONVERT_EXPRs (maybe by following the definition
chain of the condition through trivial ops like <0, view_convert or ?-1:0 until
we find a real comparison such as _10 > _11, which determines the right element
size?).
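
To make the equivalence concrete, here is a rough scalar model of the dump
above, not actual compiler code. It assumes the source was something like
_mm256_blendv_epi8 (x, y, _mm256_cmpgt_epi32 (x, y)), which is only inferred
from the GIMPLE. The function names and the use of memcpy to stand in for
VIEW_CONVERT_EXPR are illustrative choices of mine.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8            /* vector(8) int   */
#define BYTES (LANES * 4)  /* vector(32) char */

/* Lane-by-lane model of the dump: compare as int, blend as bytes. */
static void blend_via_bytes(const int32_t x[LANES], const int32_t y[LANES],
                            int32_t out[LANES])
{
    int32_t mask[LANES];
    int8_t m[BYTES], xb[BYTES], yb[BYTES], r[BYTES];

    for (int i = 0; i < LANES; i++)
        mask[i] = (x[i] > y[i]) ? -1 : 0;      /* _12 and _13 */

    /* VIEW_CONVERT_EXPR is modelled with memcpy (an assumption of this sketch). */
    memcpy(m, mask, BYTES);                    /* _5 */
    memcpy(yb, y, BYTES);                      /* _4 */
    memcpy(xb, x, BYTES);                      /* _6 */

    for (int i = 0; i < BYTES; i++)
        r[i] = (m[i] < 0) ? yb[i] : xb[i];     /* _7 and _8 */

    memcpy(out, r, BYTES);                     /* _9 */
}

/* What the first step would give: VEC_COND_EXPR <_12, _11, _10>. */
static void blend_direct(const int32_t x[LANES], const int32_t y[LANES],
                         int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = (x[i] > y[i]) ? y[i] : x[i];  /* i.e. min (x, y) */
}
```

Every byte of a mask lane is either 0xff or 0x00, so the byte-wise test <0
picks the same source for all four bytes of a lane. That is why the two
functions agree, and why the result is a per-lane signed min.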

Other steps:

* Move (or at least partially copy) fold_cond_expr_with_comparison to match.pd
so we can recognize min/max.

* Lower __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17) to GIMPLE for the float
case, if that's a valid thing to do (NaN, etc).
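
On the NaN question, a small scalar sketch of what one lane of the float
version computes may help. It assumes the mask from the cmpps call (predicate
17 is _CMP_LT_OQ, an ordered, non-signaling less-than) is used to select y
where y < x and x otherwise; that blend order is my assumption, and the helper
names are made up for illustration.

```c
#include <math.h>   /* only for the NAN macro; no libm function is called */

/* One lane: mask = (y < x); result = mask ? y : x. */
static float select_lt(float x, float y)
{
    return (y < x) ? y : x;
}

/* NaN test that does not depend on libm (assumes no -ffast-math). */
static int is_nan(float v)
{
    return v != v;
}
```

With a NaN in either operand the comparison is false and the result is x, so
select_lt (NAN, 1.0f) is NaN while select_lt (1.0f, NAN) is 1.0f. Any min
recognized from this pattern has to keep that asymmetry, which rules out a
plain fmin-style operation. Separately, C's < may raise the invalid exception
on a quiet NaN where the OQ predicate does not, which is presumably part of
the "valid thing to do" question.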
