[Bug target/100929] gcc fails to optimize less to min for SIMD code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-07-19
     Ever confirmed|0                           |1

--- Comment #7 from Andrew Pinski ---
if_else_int is optimized starting in GCC 14:
  _4 = MIN_EXPR <_6, _7>;

if_else_float is still not:
```
  _7 = __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17);
  _5 = VIEW_CONVERT_EXPR(_7);
  _4 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0 };
  _6 = .VCOND_MASK (_4, y_2(D), x_3(D));
```

But that is a target issue.
[Bug target/100929] gcc fails to optimize less to min for SIMD code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #6 from Marc Glisse ---
(blend is now lowered in gimple)

For the integer case, the mix of vector(int) and vector(char) obfuscates things a bit, we have

    __m256i if_else_int (__m256i x, __m256i y)
    {
      vector(32) char _4;
      vector(32) char _5;
      vector(32) char _6;
      vector(32) _7;
      vector(32) char _8;
      vector(4) long long int _9;
      vector(8) int _10;
      vector(8) int _11;
      vector(8) _12;
      vector(8) int _13;

      [local count: 1073741824]:
      _10 = VIEW_CONVERT_EXPR(x_2(D));
      _11 = VIEW_CONVERT_EXPR(y_3(D));
      _12 = _10 > _11;
      _13 = VEC_COND_EXPR <_12, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
      _5 = VIEW_CONVERT_EXPR(_13);
      _4 = VIEW_CONVERT_EXPR(y_3(D));
      _6 = VIEW_CONVERT_EXPR(x_2(D));
      _7 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
      _8 = VEC_COND_EXPR <_7, _4, _6>;
      _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
      return _9;
    }

A first step would be to teach gcc that it can do a VEC_COND_EXPR<_12, _11, _10> with fewer VIEW_CONVERT_EXPR (maybe follow the definition chain of the condition through trivial ops like <0, view_convert or ?-1:0 until we find a real comparison _10 > _11, to determine the right size?).

Other steps:
* Move (or at least partially copy) fold_cond_expr_with_comparison to match.pd so we can recognize min/max.
* Lower __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17) to GIMPLE for the float case, if that's a valid thing to do (NaN, etc).
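For readers following the vector(int) / vector(char) mix above, here is a scalar sketch of a single 32-bit lane. It is only a model and not GCC code: the helper names (cmpgt_lane, blendv_bytes_lane, if_else_int_lane, min_lane) are made up for illustration. It assumes the usual per-byte rule for blendv_epi8 (the top bit of each mask byte picks the second operand) and shows why a byte-wise blend driven by an all-ones/all-zeros 32-bit mask behaves like a whole-lane select, and hence like a signed min.

```c
#include <stdint.h>
#include <string.h>

/* One lane of the 32-bit signed "greater than" compare:
   all bits set when a > b, all bits clear otherwise. */
int32_t cmpgt_lane(int32_t a, int32_t b) { return a > b ? -1 : 0; }

/* The four bytes of one lane under a byte-wise blend (assumed rule):
   each result byte comes from y when the top bit of the matching
   mask byte is set, and from x otherwise. */
int32_t blendv_bytes_lane(int32_t x, int32_t y, int32_t mask) {
    uint8_t xb[4], yb[4], mb[4], rb[4];
    int32_t r;
    memcpy(xb, &x, 4);
    memcpy(yb, &y, 4);
    memcpy(mb, &mask, 4);
    for (int i = 0; i < 4; i++)
        rb[i] = (mb[i] & 0x80) ? yb[i] : xb[i];
    memcpy(&r, rb, 4);
    return r;
}

/* Compare then blend, lane-wise: the shape of if_else_int. */
int32_t if_else_int_lane(int32_t x, int32_t y) {
    return blendv_bytes_lane(x, y, cmpgt_lane(x, y));
}

/* Plain signed minimum, the shape of MIN_EXPR on one lane. */
int32_t min_lane(int32_t x, int32_t y) { return x < y ? x : y; }
```

Because every byte of the mask lane carries the same top bit, the four byte selections always agree, so if_else_int_lane(x, y) and min_lane(x, y) return the same value for any pair of inputs. That is the property the "follow the definition chain back to the real comparison" step would have to establish.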
[Bug target/100929] gcc fails to optimize less to min for SIMD code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #5 from Denis Yaroshevskiy ---
x86 (https://godbolt.org/z/zPWbnqfPY)

Options: -O3 -mavx2

```
#include <immintrin.h>

__m256 if_else_float(__m256 x, __m256 y) {
    __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
    return _mm256_blendv_ps(x, y, mask);
}

__m256 min_float(__m256 x, __m256 y) {
    return _mm256_min_ps(x, y);
}

__m256i if_else_int(__m256i x, __m256i y) {
    __m256i mask = _mm256_cmpgt_epi32(x, y);
    return _mm256_blendv_epi8(x, y, mask);
}

__m256i min_int(__m256i x, __m256i y) {
    return _mm256_min_epi32(x, y);
}
```
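As a rough aid for the NaN question raised in the thread, the float pair above can be modelled one lane at a time in scalar C. This is a sketch under assumptions, not a statement about what GCC does: the helper names (select_lt_lane, minps_lane, bits, select_matches_swapped_min) are hypothetical, and the MINPS lane rule "a if a < b, otherwise b" is taken from the instruction's documented behaviour. Floating-point exception flags, which are what -ftrapping-math is about, are not modelled at all.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* One lane of if_else_float: the mask is (y < x) with an ordered
   compare, so it is false when either input is NaN; the blend then
   takes y where the mask is set and x elsewhere. */
float select_lt_lane(float x, float y) { return (y < x) ? y : x; }

/* One lane of MINPS(a, b) under the assumed rule: a when a < b,
   otherwise b (so b is returned for NaN inputs and for equal zeros). */
float minps_lane(float a, float b) { return (a < b) ? a : b; }

/* Raw bit pattern, so that NaN and signed zero can be compared. */
uint32_t bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns 1 when the select agrees bit-for-bit with MINPS taken with
   its operands swapped, over a small sample that includes NaN,
   infinities and both zeros. */
int select_matches_swapped_min(void) {
    const float s[] = { -1.0f, 0.0f, -0.0f, 2.5f, INFINITY, -INFINITY, NAN };
    const int n = (int)(sizeof s / sizeof s[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (bits(select_lt_lane(s[i], s[j])) != bits(minps_lane(s[j], s[i])))
                return 0;
    return 1;
}
```

Under this model the compare-and-blend in if_else_float lines up with a min whose operands are in the order (y, x). With the order (x, y), as written in min_float, the two differ on NaN inputs and on +0.0 versus -0.0, so operand order matters when matching the pattern to a min instruction.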
[Bug target/100929] gcc fails to optimize less to min for SIMD code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #4 from Marc Glisse ---
(In reply to Denis Yaroshevskiy from comment #3)
> Is what @Andrew Pinski copied enough?

I think so (it is missing the command line), although one example with an integer type could also help in case floats turn out to have a different issue.

> -ftrapping-math causes clang to stop doing this optimisation.

Note that -ftrapping-math is on by default with gcc (PR 54192), but -fno-trapping-math wouldn't solve your problem, we are missing other things.
[Bug target/100929] gcc fails to optimize less to min for SIMD code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #3 from Denis Yaroshevskiy ---
> Please attach your testcases to the bug report.

Is what @Andrew Pinski copied enough? I can attach the same code as a file.

> I don't know if there would be issues for comparisons (with -ftrapping-math
> for instance?).

-ftrapping-math causes clang to stop doing this optimisation. I can see that clang does it, so I assume `nans` are OK without this flag. For ints this is for sure OK.

> Note the other testcase is using eve which I have no idea what it is coming
> from.

Using eve was just much easier than writing this with intrinsics. The point was:

    vpcmpgtd  ymm2, ymm0, ymm1
    vpblendvb ymm0, ymm0, ymm1, ymm2

should become

    vpminsd ymm0, ymm1, ymm0

And on arm:

    cmgt v2.4s, v0.4s, v1.4s
    bit  v0.16b, v1.16b, v2.16b

should become

    smin v0.4s, v1.4s, v0.4s

And

    fcmgt v2.4s, v0.4s, v1.4s
    bit   v0.16b, v1.16b, v2.16b

should become

    fmin v0.4s, v1.4s, v0.4s

I don't really know how it is done in `gcc`, but all these examples look like the same issue. If it is very helpful to write all of them as intrinsics, I can.
[Bug target/100929] gcc fails to optimize less to min for SIMD code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #2 from Andrew Pinski ---
Original x86_64 testcase:

    #include <immintrin.h>

    __m256 if_else(__m256 x, __m256 y) {
        __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
        return _mm256_blendv_ps(x, y, mask);
    }

    __m256 min(__m256 x, __m256 y) {
        return _mm256_min_ps(x, y);
    }

CUT - Note the other testcase is using eve which I have no idea what it is coming from.
[Bug target/100929] gcc fails to optimize less to min for SIMD code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|og10 (devel/omp/gcc-10)     |11.1.0
           Keywords|                            |missed-optimization
          Component|c++                         |target
           Severity|normal                      |enhancement
             Target|                            |x86_64-*-*