[Bug target/100929] gcc fails to optimize less to min for SIMD code

2024-07-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-07-19
     Ever confirmed|0                           |1

--- Comment #7 from Andrew Pinski  ---
if_else_int is optimized starting in GCC 14:
  _4 = MIN_EXPR <_6, _7>;


if_else_float is still not:
```

  _7 = __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17);
  _5 = VIEW_CONVERT_EXPR(_7);
  _4 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0 };
  _6 = .VCOND_MASK (_4, y_2(D), x_3(D));
```

But that is a target issue.

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2022-04-03 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #6 from Marc Glisse  ---
(blend is now lowered in gimple)

For the integer case, the mix of vector(int) and vector(char) obfuscates things
a bit, we have

__m256i if_else_int (__m256i x, __m256i y)
{
  vector(32) char _4;
  vector(32) char _5;
  vector(32) char _6;
  vector(32)  _7;
  vector(32) char _8; 
  vector(4) long long int _9;
  vector(8) int _10;
  vector(8) int _11;
  vector(8)  _12;
  vector(8) int _13;

   [local count: 1073741824]: 
  _10 = VIEW_CONVERT_EXPR(x_2(D));
  _11 = VIEW_CONVERT_EXPR(y_3(D));
  _12 = _10 > _11;
  _13 = VEC_COND_EXPR <_12, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _5 = VIEW_CONVERT_EXPR(_13);
  _4 = VIEW_CONVERT_EXPR(y_3(D));
  _6 = VIEW_CONVERT_EXPR(x_2(D));
  _7 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = VEC_COND_EXPR <_7, _4, _6>;
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
  return _9;
}

A first step would be to teach gcc that it can do a VEC_COND_EXPR<_12, _11,
_10> with fewer VIEW_CONVERT_EXPR (maybe follow the definition chain of the
condition through trivial ops like <0, view_convert or ?-1:0 until we find a
real comparison _10 > _11, to determine the right size?).
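
To illustrate why the byte-wise blend can be treated as a per-int select here, below is a minimal scalar sketch in portable C. It is only a model, not GCC code: the `*_model` functions and `check_int_models` are hypothetical names, and the arrays stand in for the eight 32-bit lanes of a `__m256i`. Because each lane of the comparison result is either all ones or all zeros, every byte of a lane carries the same top bit, so the 32-way byte select picks whole lanes and matches a signed 32-bit min.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* eight 32-bit lanes, as in a 256-bit vector */

/* Model of the compare: all-ones lane where x > y, zero lane otherwise. */
static void cmpgt_epi32_model(const int32_t *x, const int32_t *y, int32_t *mask) {
  for (int i = 0; i < LANES; ++i)
    mask[i] = (x[i] > y[i]) ? -1 : 0;
}

/* Model of the byte blend: for each of the 32 bytes, take the byte of y
   when the top bit of the mask byte is set, otherwise the byte of x. */
static void blendv_epi8_model(const int32_t *x, const int32_t *y,
                              const int32_t *mask, int32_t *out) {
  unsigned char xb[4 * LANES], yb[4 * LANES], mb[4 * LANES], ob[4 * LANES];
  memcpy(xb, x, sizeof xb);
  memcpy(yb, y, sizeof yb);
  memcpy(mb, mask, sizeof mb);
  for (int i = 0; i < 4 * LANES; ++i)
    ob[i] = (mb[i] & 0x80) ? yb[i] : xb[i];
  memcpy(out, ob, sizeof ob);
}

/* Model of if_else_int: compare on int lanes, then blend on bytes. */
static void if_else_int_model(const int32_t *x, const int32_t *y, int32_t *out) {
  int32_t mask[LANES];
  cmpgt_epi32_model(x, y, mask);
  blendv_epi8_model(x, y, mask, out);
}

/* Model of min_int: plain signed minimum per 32-bit lane. */
static void min_int_model(const int32_t *x, const int32_t *y, int32_t *out) {
  for (int i = 0; i < LANES; ++i)
    out[i] = (x[i] < y[i]) ? x[i] : y[i];
}

/* Returns 1 when both models produce identical lanes for the inputs. */
static int int_models_agree(const int32_t *x, const int32_t *y) {
  int32_t a[LANES], b[LANES];
  if_else_int_model(x, y, a);
  min_int_model(x, y, b);
  return memcmp(a, b, sizeof a) == 0;
}

/* Small self-check over a few mixed-sign and boundary values. */
static void check_int_models(void) {
  const int32_t x[LANES] = {1, -5, 100, 0, INT32_MIN, INT32_MAX, 7, -1};
  const int32_t y[LANES] = {2, -6, 100, -1, 0, -1, 7, 256};
  assert(int_models_agree(x, y));
  assert(int_models_agree(y, x));
}
```

Calling `check_int_models()` from a `main` exercises both operand orders. The sketch says nothing about how the fold would be written in GCC; it only shows that the two forms compute the same lanes.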

Other steps:

* Move (or at least partially copy) fold_cond_expr_with_comparison to match.pd
so we can recognize min/max.

* Lower __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17) to GIMPLE for the float
case, if that's a valid thing to do (NaN, etc).
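
On the NaN question for the float case, a rough single-lane model in C may help. It assumes the documented x86 MINPS behavior, where the result is the first operand only if it compares less than the second, and the second operand otherwise (which covers NaN inputs and equal zeros). The function names are hypothetical, and the model ignores floating-point exception flags, which is what -ftrapping-math is concerned with. Under those assumptions the compare-and-blend in if_else_float behaves like a min with the operands in (y, x) order, and can differ from min(x, y) for NaN and signed zeros.

```c
#include <assert.h>
#include <math.h>

/* One lane of if_else_float: the blend takes y where the ordered compare
   (y < x) is true, and x otherwise. */
static float if_else_float_lane(float x, float y) {
  return (y < x) ? y : x;
}

/* One lane of MINPS as assumed above: the first operand only when it is
   strictly less than the second, otherwise the second operand. */
static float minps_lane(float a, float b) {
  return (a < b) ? a : b;
}

/* Self-check of the cases that matter for the fold. */
static void check_float_lanes(void) {
  const float nan = NAN;

  /* Ordinary values: same result either way. */
  assert(if_else_float_lane(1.0f, 2.0f) == minps_lane(2.0f, 1.0f));
  assert(if_else_float_lane(1.0f, 2.0f) == minps_lane(1.0f, 2.0f));

  /* x is NaN: the select keeps x, and so does min(y, x)... */
  assert(isnan(if_else_float_lane(nan, 1.0f)));
  assert(isnan(minps_lane(1.0f, nan)));
  /* ...but min(x, y) returns y instead. */
  assert(minps_lane(nan, 1.0f) == 1.0f);

  /* x = +0, y = -0: the select keeps +0, min(x, y) yields -0. */
  assert(!signbit(if_else_float_lane(0.0f, -0.0f)));
  assert(!signbit(minps_lane(-0.0f, 0.0f)));
  assert(signbit(minps_lane(0.0f, -0.0f)));
}
```

Calling `check_float_lanes()` from a `main` runs the checks. This is only a value-level argument; whether the transformation is valid in GCC also depends on the exception behavior that the model leaves out.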

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2021-06-07 Thread denis.yaroshevskij at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #5 from Denis Yaroshevskiy  ---
x86  (https://godbolt.org/z/zPWbnqfPY)

Options: -O3 -mavx2
```
#include <immintrin.h>

__m256 if_else_float(__m256 x, __m256 y) {
  __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
  return _mm256_blendv_ps(x, y, mask);
}

__m256 min_float(__m256 x, __m256 y) {
  return _mm256_min_ps(x, y);
}

__m256i if_else_int(__m256i x, __m256i y) {
  __m256i mask = _mm256_cmpgt_epi32(x, y);
  return _mm256_blendv_epi8(x, y, mask);
}

__m256i min_int(__m256i x, __m256i y) {
  return _mm256_min_epi32(x, y);
}
```

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2021-06-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #4 from Marc Glisse  ---
(In reply to Denis Yaroshevskiy from comment #3)
> Is what @Andrew Pinski copied enough?

I think so (it is missing the command line), although one example with an
integer type could also help in case floats turn out to have a different issue.

> -ftrapping-math causes clang to stop doing this optimisation.

Note that -ftrapping-math is on by default with gcc (PR 54192), but
-fno-trapping-math wouldn't solve your problem, we are missing other things.

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2021-06-06 Thread denis.yaroshevskij at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #3 from Denis Yaroshevskiy  ---
> Please attach your testcases to the bug report.

Is what @Andrew Pinski copied enough? I can attach the same code as a file.

> I don't know if there would be issues for comparisons (with -ftrapping-math 
> for instance?).

-ftrapping-math causes clang to stop doing this optimisation.

I can see that clang does it, so I assume `nans` are OK without this flag. For
ints this is for sure OK.

> Note the other testcase is using eve which I have no idea what it is coming 
> from.

Using eve was just much easier than writing this with intrinsics.

The point was:

vpcmpgtd    ymm2, ymm0, ymm1
vpblendvb   ymm0, ymm0, ymm1, ymm2

should become

vpminsd ymm0, ymm1, ymm0

And on arm:

cmgt    v2.4s, v0.4s, v1.4s
bit v0.16b, v1.16b, v2.16b

should become
smin    v0.4s, v1.4s, v0.4s

And
fcmgt   v2.4s, v0.4s, v1.4s
bit v0.16b, v1.16b, v2.16b

should become
fmin    v0.4s, v1.4s, v0.4s


I don't really know how it is done in `gcc` - but all these examples look like
the same issue. If it is very helpful to write all of them as intrinsics, I
can.

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2021-06-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #2 from Andrew Pinski  ---
Original x86_64 testcase:

#include <immintrin.h>

__m256 if_else(__m256 x, __m256 y) {
  __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
  return _mm256_blendv_ps(x, y, mask);
}

__m256 min(__m256 x, __m256 y) {
  return _mm256_min_ps(x, y);
}

---- CUT ----
Note the other testcase is using eve which I have no idea what it is coming
from.

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2021-06-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Marc Glisse  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|og10 (devel/omp/gcc-10)     |11.1.0
           Keywords|                            |missed-optimization
          Component|c++                         |target
           Severity|normal                      |enhancement
             Target|                            |x86_64-*-*