[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #29 from g.peterh...@t-online.de ---
(In reply to Jakub Jelinek from comment #28)
> As long as the scale is a power of two or 1.0 / power of two, I don't see
> why any version wouldn't be inaccurate.

Yes, but the constant scale_up is incorrectly selected.
scale_up = std::exp2(Type(limits::max_exponent-1)) --> ok
scale_up = std::exp2(Type(limits::max_exponent/2)) --> error
scale_up = prev_power2(sqrt_max) --> error

scale_down = std::exp2(Type(limits::min_exponent-1)) also seems to me to be more favorable.

PS: There seems to be a problem with random numbers and std::float16_t, which is why I use std::uniform_real_distribution. I have not yet found out exactly where the error lies.

thx
Gero

template <typename Type>
inline constexpr Type hypot_exp(Type x, Type y, Type z) noexcept
{
    using limits = std::numeric_limits<Type>;

    constexpr Type zero = 0;

    x = std::abs(x);
    y = std::abs(y);
    z = std::abs(z);

    if (!(std::isnormal(x) && std::isnormal(y) && std::isnormal(z))) [[unlikely]]
    {
        if (std::isinf(x) | std::isinf(y) | std::isinf(z)) return limits::infinity();
        else if (std::isnan(x) | std::isnan(y) | std::isnan(z)) return limits::quiet_NaN();
        else
        {
            const bool xz{x == zero}, yz{y == zero}, zz{z == zero};

            if (xz)
            {
                if (yz) return zz ? zero : z;
                else if (zz) return y;
            }
            else if (yz && zz) return x;
        }
    }

    // only the largest value has to end up in z
    if (x > z) std::swap(x, z);
    if (y > z) std::swap(y, z);

    int exp;
    z = std::frexp(z, &exp);
    y = std::ldexp(y, -exp);
    x = std::ldexp(x, -exp);

    return std::ldexp(std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z), exp);
}

template <typename Type>
inline constexpr Type hypot_gp(Type x, Type y, Type z) noexcept
{
    using limits = std::numeric_limits<Type>;

    constexpr Type
        sqrt_min   = std::sqrt(limits::min()),
        sqrt_max   = std::sqrt(limits::max()),
        scale_up   = std::exp2(Type(limits::max_exponent-1)),
        scale_down = std::exp2(Type(limits::min_exponent-1)),
        zero       = 0;

    x = std::abs(x);
    y = std::abs(y);
    z = std::abs(z);

    if (!(std::isnormal(x) && std::isnormal(y) && std::isnormal(z))) [[unlikely]]
    {
        if (std::isinf(x) | std::isinf(y) | std::isinf(z)) return limits::infinity();
        else if (std::isnan(x) | std::isnan(y) | std::isnan(z)) return limits::quiet_NaN();
        else
        {
            const bool xz{x == zero}, yz{y == zero}, zz{z == zero};

            if (xz)
            {
                if (yz) return zz ? zero : z;
                else if (zz) return y;
            }
            else if (yz && zz) return x;
        }
    }

    if (x > z) std::swap(x, z);
    if (y > z) std::swap(y, z);

    if (const bool b{z>=sqrt_min}; b && z<=sqrt_max) [[likely]]
    {
        // no scale
        return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z);
    }
    else
    {
        const Type scale = b ? scale_down : scale_up;

        x *= scale;
        y *= scale;
        z *= scale;
        return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z) / scale;
    }
}

template <typename Type>
void test(const size_t count, const Type min, const Type max, const Type factor)
{
    std::random_device rd{};
    std::mt19937 gen{rd()};
    std::uniform_real_distribution dis{min, max};

    auto rnd = [&]() noexcept -> Type { return Type(dis(gen) * factor); };

    for (size_t i=0; i [...];
    test(1024*1024, 0.5, 1, limits::max());
    test(1024*1024, 0, 1, limits::min());
    return EXIT_SUCCESS;
}
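To make the frexp/ldexp scaling idea easier to follow, here is a minimal double-only sketch. The name hypot3_sketch is hypothetical, the special-case handling is simplified compared with hypot_exp above, and the parenthesised sum stands in for __builtin_assoc_barrier. It is meant as an illustration under these assumptions, not as a proposed implementation.

```cpp
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical double-only sketch of exponent-based scaling for a
// three-argument hypot. Not the libstdc++ implementation.
inline double hypot3_sketch(double x, double y, double z) noexcept
{
    x = std::fabs(x);
    y = std::fabs(y);
    z = std::fabs(z);

    // Infinity takes precedence over NaN, as for std::hypot.
    if (std::isinf(x) || std::isinf(y) || std::isinf(z))
        return std::numeric_limits<double>::infinity();
    if (std::isnan(x) || std::isnan(y) || std::isnan(z))
        return std::numeric_limits<double>::quiet_NaN();

    // Move the largest value into z; x and y may stay unordered.
    if (x > z) std::swap(x, z);
    if (y > z) std::swap(y, z);
    if (z == 0.0) return 0.0;

    // Scale by a power of two so that z lands in [0.5, 1).
    int exp = 0;
    z = std::frexp(z, &exp);
    x = std::ldexp(x, -exp);
    y = std::ldexp(y, -exp);

    // Add the two smaller squares first, then undo the scaling.
    const double sum = (x * x + y * y) + z * z;
    return std::ldexp(std::sqrt(sum), exp);
}
```

Because the scale is a power of two, hypot3_sketch(3.0, 4.0, 12.0) goes through the scaled computation without any rounding, and arguments near the largest or smallest finite double do not overflow or underflow in the squares.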
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #27 from g.peterh...@t-online.de ---
Hi Matthias,
thanks for your benchmark. I still have 2 questions:

1) Accuracy
The frexp/ldexp variant seems to be the most accurate; is that correct? Then other constants would have to be used in hypot_gp:
scale_up = std::exp2(Type(limits::max_exponent-1))
scale_down = std::exp2(Type(limits::min_exponent-1))

2) Speed
Your benchmark outputs several columns (Δ)Latency/(Δ)Throughput/Speedup. What exactly do the values stand for, and which of them should be optimized for?

thx
Gero
[Bug libquadmath/114623] sqrtq and std::numeric_limits<__float128>::max()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114623 --- Comment #4 from g.peterh...@t-online.de --- That is precisely the design error of C/C++/etc. There should be no float/double/long double/__float128/etc, but *only* floatN_t. Then there wouldn't be these discrepancies (if necessary you have to emulate by SW). But that's just my humble opinion ... and now we have to face reality and make the best of it. One step might be to put std::float128_t and __float128 on a common/uniform code base :-) cu Gero
[Bug libquadmath/114623] sqrtq and std::numeric_limits<__float128>::max()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114623 --- Comment #2 from g.peterh...@t-online.de ---
#include <array>
#include <charconv>
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <stdfloat>
#include <string_view>

void print_hex(const std::float128_t value)
{
    std::array<char, 64> buffer{};   // buffer size is an assumption
    const std::to_chars_result result{std::to_chars(buffer.data(), buffer.data()+buffer.size(), value, std::chars_format::hex)};
    std::cout << std::string_view{buffer.data(), result.ptr} << std::endl;
}

template <typename Type>
void print_sqrt_max_hex()
{
    using limits = std::numeric_limits<Type>;
    print_hex(std::sqrt(limits::max()));
}

int main()
{
    print_sqrt_max_hex<std::float128_t>();   // first type argument assumed from context
    print_sqrt_max_hex<__float128>();
    return EXIT_SUCCESS;
}

gets
1.p+8191
1p+8192
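The two results differ in whether sqrt(max) stays below the next power of two. As a rough analogue in double precision (the helper name is hypothetical, and double is used only because it is available everywhere), the expected relation can be checked like this:

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true if sqrt(max) stays below 2^(max_exponent/2).
// For double that bound is 2^512; the square root of the largest finite
// value is expected to round to the representable value just below it.
inline bool sqrt_max_below_power_of_two() noexcept
{
    using limits = std::numeric_limits<double>;
    const double root  = std::sqrt(limits::max());
    const double bound = std::ldexp(1.0, limits::max_exponent / 2);
    return root < bound;
}
```

By the same reasoning, a correctly rounded square root of the largest finite 128-bit value should stay below 2^8192, which is why the second output line above stands out.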
[Bug libquadmath/114623] New: sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114623 Bug ID: 114623 Summary: sqrt Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libquadmath Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello, sqrt does not work for std::numeric_limits<__float128>::max(). I have not checked other (special) values, perhaps the problem also occurs there. Please see https://godbolt.org/z/bx8or94v7 regards Gero
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #26 from g.peterh...@t-online.de --- Correction: the return statement in the scaling branch must of course be "... / scale". How can I still edit posts?
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #25 from g.peterh...@t-online.de ---
Hi Matthias,
to get good results on average (all FP types: (B)FP16..FP128, scalar/vectorized (SIMD)/parallel/...), this algorithm seems to me (so far) to be suitable:

template <typename Type>
inline constexpr Type hypot_gp(Type x, Type y, Type z) noexcept
{
    using limits = std::numeric_limits<Type>;

    constexpr Type
        sqrt_min   = std::sqrt(limits::min()),
        sqrt_max   = std::sqrt(limits::max()),
        scale_up   = std::exp2( Type(limits::max_exponent/2)),
        scale_down = std::exp2(-Type(limits::max_exponent/2)),
        zero       = 0;

    x = std::abs(x);
    y = std::abs(y);
    z = std::abs(z);

    if (!(std::isnormal(x) && std::isnormal(y) && std::isnormal(z))) [[unlikely]]
    {
        if (std::isinf(x) | std::isinf(y) | std::isinf(z)) return limits::infinity();
        else if (std::isnan(x) | std::isnan(y) | std::isnan(z)) return limits::quiet_NaN();
        else
        {
            const bool xz{x == zero}, yz{y == zero}, zz{z == zero};

            if (xz)
            {
                if (yz) return zz ? zero : z;
                else if (zz) return y;
            }
            else if (yz && zz) return x;
        }
    }

    if (x > z) std::swap(x, z);
    if (y > z) std::swap(y, z);

    if ((z >= sqrt_min) && (z <= sqrt_max)) [[likely]]
    {
        // no scale
        return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z);
    }
    else
    {
        const Type scale = (z >= sqrt_min) ? scale_down : scale_up;

        x *= scale;
        y *= scale;
        z *= scale;
        return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z);
    }
}
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #23 from g.peterh...@t-online.de ---
Hello Matthias,
you've given me new ideas. I think we agree on implementing hypot3 using a scaling factor. But the correct value is not yet implemented here either; do you have a suggestion?

A version here: https://godbolt.org/z/Gd53cG9YG
I've intentionally broken hypot_gp into small pieces so that you can play around with it. This is of course unnecessary for a final version.

General
* The function must of course work efficiently with all FP types.

Questions
* Sorting: It is theoretically sufficient to sort the values x,y,z only to the extent that the condition x,y <= z is fulfilled (HYPOT_SORT_FULL).
* Accuracy: This is better with fma (HYPOT_FMA).
* How do you create the benchmarks? Then I could do this myself without getting on your nerves.

thx
Gero
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #19 from g.peterh...@t-online.de ---
> So, no need to use frexp/ldexp, just comparisons of hi above against sqrt of
> (max finite / 3), in that case scale by multiplying all 3 args by some
> appropriate scale constant, and similarly otherwise if lo1 is too small by
> some large scale.

I don't really know. With frexp/ldexp you probably get the highest accuracy (even if it is probably slower) compared with doing it manually. The problem is to determine suitable scaling factors and to adjust the (return) values accordingly. I have implemented both cases.

Error
* In the case (x==y && y==z), x*std::sqrt(T(3)) must not simply be returned, as this can lead to an overflow (inf).

Generally
* Instead of using fmin/fmax to determine the values hi,lo1,lo0, it is better to sort x,y,z. This is faster and clearer, and no additional variables need to be introduced.
* It also makes sense to consider the case (x==0 && y==0 && z==0).

Optimizations
* You were probably wondering why I wrote "if (std::isinf(x) | std::isinf(y) | std::isinf(z))", for example. This is intentional. The problem is that gcc almost always produces branch code for logical operations, so *a lot* of conditional jumps. By using arithmetic operations, so | and & instead of || and &&, I can get it to generate only the conditional jumps or cmoves that are actually necessary. Branch-free code is always better.

template <typename T>
constexpr T hypot3_exp(T x, T y, T z) noexcept
{
    using limits = std::numeric_limits<T>;

    constexpr T zero = 0;

    x = std::abs(x);
    y = std::abs(y);
    z = std::abs(z);

    if (std::isinf(x) | std::isinf(y) | std::isinf(z)) [[unlikely]]
        return limits::infinity();
    if (std::isnan(x) | std::isnan(y) | std::isnan(z)) [[unlikely]]
        return limits::quiet_NaN();
    if ((x==zero) & (y==zero) & (z==zero)) [[unlikely]]
        return zero;
    if ((y==zero) & (z==zero)) [[unlikely]]
        return x;
    if ((x==zero) & (z==zero)) [[unlikely]]
        return y;
    if ((x==zero) & (y==zero)) [[unlikely]]
        return z;

    auto sort = [](T& a, T& b, T& c) constexpr noexcept -> void
    {
        if (a > b) std::swap(a, b);
        if (b > c) std::swap(b, c);
        if (a > b) std::swap(a, b);
    };

    sort(x, y, z); // x <= y <= z

    int exp = 0;

    z = std::frexp(z, &exp);
    y = std::ldexp(y, -exp);
    x = std::ldexp(x, -exp);

    T sum = x*x + y*y;

    sum += z*z;
    return std::ldexp(std::sqrt(sum), exp);
}

template <typename T>
constexpr T hypot3_scale(T x, T y, T z) noexcept
{
    using limits = std::numeric_limits<T>;

    auto prev_power2 = [](const T value) constexpr noexcept -> T
    {
        return std::exp2(std::floor(std::log2(value)));
    };

    constexpr T
        sqrtmax    = std::sqrt(limits::max()),
        scale_up   = prev_power2(sqrtmax),
        scale_down = T(1) / scale_up,
        zero       = 0;

    x = std::abs(x);
    y = std::abs(y);
    z = std::abs(z);

    if (std::isinf(x) | std::isinf(y) | std::isinf(z)) [[unlikely]]
        return limits::infinity();
    if (std::isnan(x) | std::isnan(y) | std::isnan(z)) [[unlikely]]
        return limits::quiet_NaN();
    if ((x==zero) & (y==zero) & (z==zero)) [[unlikely]]
        return zero;
    if ((y==zero) & (z==zero)) [[unlikely]]
        return x;
    if ((x==zero) & (z==zero)) [[unlikely]]
        return y;
    if ((x==zero) & (y==zero)) [[unlikely]]
        return z;

    auto sort = [](T& a, T& b, T& c) constexpr noexcept -> void
    {
        if (a > b) std::swap(a, b);
        if (b > c) std::swap(b, c);
        if (a > b) std::swap(a, b);
    };

    sort(x, y, z); // x <= y <= z

    const T scale = (z > sqrtmax) ? scale_down : (z < 1) ? scale_up : 1;

    x *= scale;
    y *= scale;
    z *= scale;

    T sum = x*x + y*y;

    sum += z*z;
    return std::sqrt(sum) / scale;
}

regards
Gero
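The remark about | versus || can be illustrated with a small double-only sketch. Both helper names are hypothetical. The two functions return the same value; whether the bitwise form really produces fewer conditional jumps depends on compiler, flags and target and has to be checked in the generated assembly.

```cpp
#include <cmath>

// Hypothetical helper: combine the three tests with bitwise '|', so all
// operands are always evaluated and no short-circuit is required.
inline bool any_inf_bitwise(double x, double y, double z) noexcept
{
    return bool(int(std::isinf(x)) | int(std::isinf(y)) | int(std::isinf(z)));
}

// The same test written with logical '||', which short-circuits.
inline bool any_inf_logical(double x, double y, double z) noexcept
{
    return std::isinf(x) || std::isinf(y) || std::isinf(z);
}
```

Since std::isinf has no side effects, evaluating all three operands unconditionally is safe here; the same rewrite would change behaviour for operands that do have side effects.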
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #13 from g.peterh...@t-online.de ---
Thanks for the suggestions:

template <typename _Tp>
constexpr _Tp __hypot3(_Tp __x, _Tp __y, _Tp __z) noexcept
{
    if (std::isinf(__x) | std::isinf(__y) | std::isinf(__z)) [[__unlikely__]]
        return _Tp(INFINITY);

    __x = std::fabs(__x);
    __y = std::fabs(__y);
    __z = std::fabs(__z);

    const _Tp __max = std::fmax(std::fmax(__x, __y), __z);

    if (__max == _Tp{}) [[__unlikely__]]
        return __max;

    // normalize by the largest value so the squares cannot overflow
    __x /= __max;
    __y /= __max;
    __z /= __max;
    return std::sqrt(__x*__x + __y*__y + __z*__z) * __max;
}

The functions are then set to constexpr/noexcept.

regards
Gero
[Bug c/114181] issubnormal is a macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181 g.peterh...@t-online.de changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #12 from g.peterh...@t-online.de --- If this comes into the C++ standard I would have to rewrite it anyway. Why not now that I have reported this error? Are there already plans how to deal with the https://en.cppreference.com/w/c/experimental/fpext1 https://en.cppreference.com/w/c/experimental/fpext4 regarding C++? thx Gero
[Bug c/114181] issubnormal is a macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181 g.peterh...@t-online.de changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #10 from g.peterh...@t-online.de ---
That is exactly what does not work, because issubnormal is a simple macro. It only works if "#undef issubnormal" is placed before the implementation: https://godbolt.org/z/z3PG3hYev

That is incorrect.
* I have no idea what happens if math.h is already included somewhere and I subsequently undefine issubnormal.
* It is not my job to program around any (compiler-specific) problems. Either the compiler does it right or it does not support it at all.

Therefore issubnormal must be provided as a "real" function or via a builtin.
[Bug c/114181] issubnormal is a macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181 g.peterh...@t-online.de changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #8 from g.peterh...@t-online.de --- Of course, std::issubnormal is not yet available at the moment. To be able to implement this at all, issubnormal from math.h must not be a macro!
[Bug c/114181] issubnormal is a macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181 g.peterh...@t-online.de changed: What|Removed |Added Resolution|MOVED |FIXED --- Comment #5 from g.peterh...@t-online.de --- > If you are implementing a cmath for a C++ implementation, you need to a > similar thing and `#undef` it. > The math.h that defines issubnormal comes from glibc. That's what I mean. See also e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77925 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77926
[Bug c/114181] issubnormal is a macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181 --- Comment #3 from g.peterh...@t-online.de --- Of course issubnormal is defined in math.h (in my case line 1088, gcc 13.2).
[Bug c/114181] New: issubnormal is a macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181 Bug ID: 114181 Summary: issubnormal is a macro Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- issubnormal is a macro and therefore not a (builtin)function. This is incorrect, as no further issubnormal functions can be implemented, e.g. for C++ namespace std { bool issubnormal(...); } thx Gero
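The clash can be reproduced with a stand-in macro. In the sketch below, demo_issubnormal is a hypothetical function-like macro playing the role of the one from math.h, so the example does not depend on which C library is installed. A function of the same name can only be declared and called when the name is parenthesised, because a function-like macro is expanded only when its name is directly followed by an opening parenthesis.

```cpp
#include <cmath>

// Hypothetical stand-in for a function-like macro such as the one in math.h.
#define demo_issubnormal(x) (std::fpclassify(x) == FP_SUBNORMAL)

namespace demo {

// Writing "bool demo_issubnormal(double x)" here would be rewritten by the
// preprocessor and fail to compile. Parenthesising the name suppresses the
// macro expansion.
inline bool (demo_issubnormal)(double x) noexcept
{
    return std::fpclassify(x) == FP_SUBNORMAL;
}

} // namespace demo
```

A call has to use the same trick, for example (demo::demo_issubnormal)(value); the plain spelling demo::demo_issubnormal(value) would expand the macro in the middle of the qualified name.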
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 g.peterh...@t-online.de changed: What|Removed |Added CC| |g.peterh...@t-online.de --- Comment #11 from g.peterh...@t-online.de ---
Would this be a good implementation for hypot3 in cmath?

#define GCC_UNLIKELY(x) __builtin_expect(x, 0)
#define GCC_LIKELY(x) __builtin_expect(x, 1)

namespace __detail
{
    template <typename _Tp>
    inline _GLIBCXX_CONSTEXPR typename enable_if<[...]::value, bool>::type
    __isinf3(const _Tp __x, const _Tp __y, const _Tp __z) noexcept
    {
        return bool(int(std::isinf(__x)) | int(std::isinf(__y)) | int(std::isinf(__z)));
    }

    template <typename _Tp>
    inline _GLIBCXX_CONSTEXPR typename enable_if<[...]::value, _Tp>::type
    __hypot3(_Tp __x, _Tp __y, _Tp __z) noexcept
    {
        __x = std::fabs(__x);
        __y = std::fabs(__y);
        __z = std::fabs(__z);

        const _Tp __max = std::fmax(std::fmax(__x, __y), __z);

        if (GCC_UNLIKELY(__max == _Tp{}))
        {
            return __max;
        }
        else
        {
            __x /= __max;
            __y /= __max;
            __z /= __max;
            return std::sqrt(__x*__x + __y*__y + __z*__z) * __max;
        }
    }
} // __detail

template <typename _Tp>
inline _GLIBCXX_CONSTEXPR typename enable_if<[...]::value, _Tp>::type
__hypot3(const _Tp __x, const _Tp __y, const _Tp __z) noexcept
{
    return (GCC_UNLIKELY(__detail::__isinf3(__x, __y, __z)))
        ? numeric_limits<_Tp>::infinity()
        : __detail::__hypot3(__x, __y, __z);
}

#undef GCC_UNLIKELY
#undef GCC_LIKELY

How does it work?
* Basically, I first pull out the special case INFINITY (see https://en.cppreference.com/w/cpp/numeric/math/hypot).
* As an additional safety measure (to prevent misuse) the functions are constrained with enable_if.

constexpr
* The hypot3 functions can thus be defined as _GLIBCXX_CONSTEXPR.

Questions
* To get better runtime behavior I define GCC_(UN)LIKELY. Are there already such macros (which I have overlooked)?
* The functions are noexcept. Does that make sense? If yes: why are the math functions not noexcept?

thx
Gero
[Bug libquadmath/114140] different results for std::fmin/std::fmax and quadmath fminq/fmaxq if one argument=signaling_NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114140 --- Comment #13 from g.peterh...@t-online.de --- > The cppreference page is wrong. But then *all* of your implementations for fmin/fmax (float, double, long double, std::floatN_t) would be wrong, because they give exactly the results as described on cppreference. Is this really the case (which I don't believe)? And if so, that still doesn't solve the original problem: std::math-functions and quadmath-functions *must* of course return the same results - no matter which implementation is correct.
[Bug middle-end/114140] different results for std::fmin/std::fmax and quadmath fminq/fmaxq if one argument=signaling_NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114140 --- Comment #7 from g.peterh...@t-online.de --- I think there is a misunderstanding. The problem is that std::fmin/std::fmax and quadmath fminq/fmaxq give different results when only *one* argument is signaling_NaN. The standard (https://en.cppreference.com/w/cpp/numeric/math/fmin + https://en.cppreference.com/w/cpp/numeric/math/fmax) says: * If one of the two arguments is NaN, the value of the other argument is returned * Only if both arguments are NaN, NaN is returned quadmath fminq/fmaxq also return NaN if only *one* argument is signaling_NaN.
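The two quoted rules can be checked for quiet NaNs with a short sketch. The helper name is hypothetical, and the example deliberately uses a quiet NaN only: how a signaling NaN should be treated is the open question in this report, so the sketch makes no claim about it.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: std::fmin where exactly one argument is a quiet NaN.
inline double fmin_with_one_quiet_nan(double other) noexcept
{
    const double qnan = std::numeric_limits<double>::quiet_NaN();
    return std::fmin(qnan, other);   // expected to return 'other'
}

// Hypothetical helper: std::fmax where both arguments are quiet NaNs.
inline double fmax_with_two_quiet_nans() noexcept
{
    const double qnan = std::numeric_limits<double>::quiet_NaN();
    return std::fmax(qnan, qnan);    // expected to return NaN
}
```

The same two checks with std::numeric_limits<double>::signaling_NaN() in place of the quiet NaN would be the direct way to reproduce the difference described above.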
[Bug target/50597] printf_fp.o: relocation R_X86_64_PC32 against `hack_digit.6607' can not be used when making a shared object; recompile with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50597 g.peterh...@t-online.de changed: What|Removed |Added CC||g.peterh...@t-online.de --- Comment #2 from g.peterh...@t-online.de --- I think there is a misunderstanding. The problem is that std::fmin/std::fmax and quadmath fminq/fmaxq give different results when only *one* argument is signaling_NaN. The standard (https://en.cppreference.com/w/cpp/numeric/math/fmin + https://en.cppreference.com/w/cpp/numeric/math/fmax) says: * If one of the two arguments is NaN, the value of the other argument is returned * Only if both arguments are NaN, NaN is returned quadmath fminq/fmaxq also return NaN if only *one* argument is signaling_NaN. thx Gero
[Bug libquadmath/114140] New: quadmath fminq/fmaxq with signaling_NaN not work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114140 Bug ID: 114140 Summary: quadmath fminq/fmaxq with signaling_NaN not work Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libquadmath Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- please see https://godbolt.org/z/T4W8Mejxz Notes: * std::numeric_limits<__float128> (from boost) does not work properly, so I fall back to builtins. * std::fmin/fmax for __float128 calls fminq/fmaxq (boost, this works) thx Gero
[Bug libgcc/114131] New: std::isinf(std::float128_t) generates superfluous nan-checks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114131 Bug ID: 114131 Summary: std::isinf(std::float128_t) generates superfluous nan-checks Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- please see https://godbolt.org/z/djc9q1vcv test1(default): includes nan-checks (__unordtf2) test2: no nan-checks, but calls __eqtf2 test3: only checks for inf (via bit_cast); no additional function calls + branchfree. Of course, this only works if (unsigned) __int128 is available. thx Gero
[Bug libstdc++/114018] New: std::nexttoward is not implemented for C++23-FP-Types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114018 Bug ID: 114018 Summary: std::nexttoward is not implemented for C++23-FP-Types Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- please see https://godbolt.org/z/EoKnEE8eT thx Gero
[Bug libstdc++/113260] missing from_chars/to_chars for __float128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260 --- Comment #7 from g.peterh...@t-online.de --- Thank you. That was my question whether these two functions could be added. At the moment I'm using boost.charconv https://github.com/cppalliance/charconv https://develop.charconv.cpp.al (not official yet) - but it's still completely buggy.
[Bug libstdc++/113260] missing from_chars/to_chars for __float128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260 --- Comment #5 from g.peterh...@t-online.de --- ??? I asked for std::from_chars/std::to_chars - which of course doesn't work: https://godbolt.org/z/n34dTajoc
[Bug libquadmath/113259] quadmath::nanq not support payload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113259 --- Comment #2 from g.peterh...@t-online.de --- I'm currently fiddling around with a library for/with boost. I don't need this kind of incompatibility.
[Bug libstdc++/113260] missing from_chars/to_chars for __float128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260 --- Comment #3 from g.peterh...@t-online.de --- My problem is that I need from_chars/to_chars for __float128 also for older C++ standards that do not yet support _Float128/std::float128_t.
[Bug libquadmath/113260] New: missing from_chars/to_chars for __float128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260 Bug ID: 113260 Summary: missing from_chars/to_chars for __float128 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libquadmath Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello, can you add this? thx Gero
[Bug libquadmath/113259] New: quadmath::nanq not support payload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113259 Bug ID: 113259 Summary: quadmath::nanq not support payload Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libquadmath Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello, in https://github.com/gcc-mirror/gcc/blob/master/libquadmath/math/nanq.c there is only a comment that payloads are not supported. So it is incompatible with the standard. Will this be fixed? thx Gero
[Bug c++/109924] missing __builtin_nanf16b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109924 --- Comment #3 from g.peterh...@t-online.de --- But in your documentation https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html it is stated that the __builtin's would be available for all FP types. For upcoming standards https://en.cppreference.com/w/c/experimental/fpext1 this is needed anyway (setpayload etc.) thx Gero
[Bug c++/109928] New: std::abs(long/long long) are not constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109928 Bug ID: 109928 Summary: std::abs(long/long long) are not constexpr Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- see std_abs.h regards Gero
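What the report asks for can be illustrated with a hand-written function. The namespace and name below are hypothetical and this is not the code from std_abs.h; it only shows that an absolute value for long long can be evaluated in a constant expression.

```cpp
namespace demo {

// Hypothetical constexpr absolute value for long long.
// As with std::abs, the most negative value must not be passed in,
// because its negation is not representable.
constexpr long long constexpr_abs(long long v) noexcept
{
    return v < 0 ? -v : v;
}

static_assert(constexpr_abs(-5LL) == 5LL, "usable in a constant expression");
static_assert(constexpr_abs(7LL) == 7LL, "positive values are returned unchanged");

} // namespace demo
```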
[Bug c++/109924] New: missing __builtin_nanf16b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109924 Bug ID: 109924 Summary: missing __builtin_nanf16b Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- like __builtin_nansf16b regards Gero
[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884 --- Comment #3 from g.peterh...@t-online.de --- But these are different types (even if they are mathematically/behaviorally equivalent) std::is_same_v --> false
[Bug c++/109884] New: __builtin_Xq returns _Float128 instead of __float128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884 Bug ID: 109884 Summary: __builtin_Xq returns _Float128 instead of __float128 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- #include #include #include #include #include template inline std::string nameof() { return boost::core::demangle(typeid(Type).name()); } int main() { std::cout << nameof() << std::endl; std::cout << nameof() << std::endl; std::cout << nameof() << std::endl; } compiled with 13 returns the incorrect type _Float128 _Float128 _Float128 with 12 or older gives the correct type __float128 __float128 __float128 regards Gero
[Bug libstdc++/109758] std::abs(__float128) doesn't support NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758 --- Comment #7 from g.peterh...@t-online.de --- 1) Can you please still submit a proposal to the STD/ISO committee so that abs (besides copysign/signbit) ALWAYS works ? 2) What do you think about my proposal for a C++ interface quadmath.hpp ?
[Bug libstdc++/109758] std::abs(__float128) doesn't support NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758 --- Comment #5 from g.peterh...@t-online.de --- >> Again, what do you mean by "quadmath"? __float128 https://github.com/gcc-mirror/gcc/tree/master/libquadmath This is not to be confused with C++23 std::float128_t.
[Bug libstdc++/109758] std::abs(__float128) doesn't support NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758 --- Comment #3 from g.peterh...@t-online.de --- >> libstdc++ doesn't depend on libquadmath and the __float128 support is there >> very limited. Yes, exactly. There should be nothing of quadmath in the std implementations of C/C++. But in bits/std_abs.h this is the case. >> Use std::float128_t instead (in GCC 13.1)? std::float128_t can only be used from C++23 on, but quadmath can also be used with older standard/compiler versions.
[Bug libquadmath/109758] New: quadmath abs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758 Bug ID: 109758 Summary: quadmath abs Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libquadmath Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: ---
Hello gcc-team,

Problem:

#include <cmath>
#include <iostream>
#include <limits>
#include <quadmath.h>

using T = __float128;

int main()
{
    const T neg_nan_v = -std::numeric_limits<T>::quiet_NaN();

    std::cout << neg_nan_v << std::endl;
    std::cout << "std::abs " << std::abs(neg_nan_v) << std::endl;
    std::cout << "fabsq " << fabsq(neg_nan_v) << std::endl;
    std::cout << "builtin " << __builtin_fabsf128(neg_nan_v) << std::endl;
}

Output:
-nan
std::abs -nan
fabsq nan
builtin nan

The problem can be found in bits/std_abs.h:

#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
  __extension__ inline _GLIBCXX_CONSTEXPR
  __float128
  abs(__float128 __x)
  { return __x < 0 ? -__x : __x; }
#endif

Is this actually correct? If I compile with -U__STRICT_ANSI__ or remove/comment out abs in bits/std_abs.h, abs falls back to fabsq, which then also works. With std::abs(float/double/...) this problem does not occur.

Wouldn't it make sense in principle to also provide a C++ header (quadmath.hpp)?

#include <quadmath.h>

namespace std
{
    math-functions
    to_string/to_wstring
    to_chars/from_chars
    operator<<
    operator>>
    ...
}

thx
Gero
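The effect can be reproduced with double, without libquadmath. In the sketch below, abs_by_compare has the same shape as the __float128 overload quoted above, and negative_nan is a hypothetical helper that builds a NaN with the sign bit set. Default floating-point semantics are assumed (no -ffast-math or similar).

```cpp
#include <cmath>
#include <limits>

// Comparison-based absolute value, same shape as the quoted overload.
// Every comparison with NaN is false, so a negative NaN takes the
// "else" path and keeps its sign bit.
inline double abs_by_compare(double x) noexcept
{
    return x < 0 ? -x : x;
}

// Hypothetical helper: a quiet NaN with the sign bit set.
inline double negative_nan() noexcept
{
    return std::copysign(std::numeric_limits<double>::quiet_NaN(), -1.0);
}
```

Applied to negative_nan(), abs_by_compare returns a value whose sign bit is still set, whereas std::fabs clears it, which matches the "-nan" versus "nan" lines in the output above.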
[Bug c++/109378] new builtin like __builtin_sqrt but does not set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378 g.peterh...@t-online.de changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #11 from g.peterh...@t-online.de --- Ok, in detail: std::sqrt/__builtin_sqrt performs the check for nan in the calling context. This causes the following problems: * the calling context contains error handling/conditional jumps, which have nothing to do there but have to be handled in the error handling of std::sqrt * Because this does NOT happen in your implementation of std::sqrt, the code gets bloated, at the latest when a function contains more than one std::sqrt. Therefore * do complete error handling in std::sqrt/__builtin_sqrt * so there is only one exact call for std::sqrt, which can/must be vectorized.
[Bug c++/109378] new builtin like __builtin_sqrt but does not set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378 --- Comment #8 from g.peterh...@t-online.de --- But I don't want and can't use a version of std::sqrt that requires compiler specific flags/options/__builtins and injects internals of std::sqrt/__builtin_sqrt into the calling context/function. I just want to have a very dumb std::sqrt that does its error handling internally. Sorry, but is that too much to ask?
[Bug c++/109378] improve __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378 --- Comment #4 from g.peterh...@t-online.de --- Hm. Maybe we misunderstood each other or I don't understand. I don't want to set -fno-math-errno or any other compiler-specific flag. My intention is that __builtin_sqrt doesn't "contaminate" the calling context with internals of __builtin_sqrt, but simply returns the result.
[Bug c++/109378] improve __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378 --- Comment #2 from g.peterh...@t-online.de --- But this is of no use if I want to compile something "normally" without compiler specific options.
[Bug c++/109379] New: improve __builtin_fmal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109379 Bug ID: 109379 Summary: improve __builtin_fmal Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello gcc team, __builtin_fmal generates quite a lot of overhead. Can you please optimize this or make it an inline function? thx Gero
[Bug c++/109378] New: improve __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378 Bug ID: 109378 Summary: improve __builtin_sqrt Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello gcc team, https://godbolt.org/z/Wa1rfxrPo when I write a function that contains std::sqrt, it always contains the nan(?) tests for the argument. E.g. sqrtf64. If I use my_sqrt the tests are done inside sqrt and not in the calling function - clear (because noinline). Wouldn't it be better to rewrite __builtin_sqrt so that these tests are done inside __builtin_sqrt and not already in the calling context? This would have the advantage that std::sqrt would not "contaminate" the calling function with conditional jumps and thus inflate it. I can make this clear with foo vs. bar. And of course __builtin_sqrt must be able to be vectorized automatically and must be inline for certain contexts (e.g. __FAST_MATH__). regards Gero
[Bug c++/109029] std::signbit(double) generates very inefficient code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109029 --- Comment #1 from g.peterh...@t-online.de --- Ok in english std::signbit(double) generates very inefficient code and thus cannot be vectorized (https://godbolt.org/z/se6Ea8bo9).
[Bug c++/109029] New: std::signbit(double) generiert sehr ineffizienten code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109029 Bug ID: 109029 Summary: std::signbit(double) generiert sehr ineffizienten code Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hallo, std::signbit(double) generiert sehr ineffizienten code und kann somit nicht vektorisiert werden (https://godbolt.org/z/se6Ea8bo9). thx Gero -std=c++20 -march=x86-64-v3 -O3 -mno-vzeroupper #include #include #include #include static constexpr size_t Size = 1024; using float80_t = long double; using float64_t = double; using float32_t = float; template inline constexpr bool foo(const Type x) noexcept { return std::signbit(x); } template inline constexpr Type bar(const Type x) noexcept { return std::signbit(x) ? std::numbers::pi_v : 0; } template inline constexpr void for_all(Container& cnt, Function&& f) noexcept { std::transform(cnt.begin(), cnt.end(), cnt.begin(), f); } template inline constexpr void for_all(ContainerRes& res, const ContainerArg& arg, Function&& f) noexcept { std::transform(arg.begin(), arg.end(), res.begin(), f); } float64_t foo64(const float64_t x) noexcept { return foo(x); } float32_t foo32(const float32_t x) noexcept { return foo(x); } float64_t bar64(const float64_t x) noexcept { return bar(x); } float32_t bar32(const float32_t x) noexcept { return bar(x); } void foos64(std::array& res, const std::array& arg) noexcept { for_all(res, arg, foo); } void foos32(std::array& res, const std::array& arg) noexcept { for_all(res, arg, foo); } void bars64(std::array& cnt) noexcept { for_all(cnt, bar); } void bars32(std::array& cnt) noexcept { for_all(cnt, bar); }
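For comparison, a sign test can also be written directly on the bit pattern. The following is a minimal sketch, not code from the report: the names `signbit_bits` and `signbits` are hypothetical, the code assumes an IEEE-754 binary64 `double`, and whether a particular GCC version vectorizes the loop has not been verified here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if the sign bit of x is set.
// Assumption: double is IEEE-754 binary64, so the sign is bit 63.
inline bool signbit_bits(double x) noexcept {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // portable way to read the bit pattern
    return (bits >> 63) != 0;
}

// Hypothetical element-wise loop over n values, the kind of loop
// one would like the autovectorizer to handle.
inline void signbits(const double* in, bool* out, std::size_t n) noexcept {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = signbit_bits(in[i]);
    }
}
```

Like `std::signbit`, this distinguishes `-0.0` from `0.0`, which a plain `x < 0` comparison does not.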
[Bug target/109028] fcmov will not be generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109028 --- Comment #2 from g.peterh...@t-online.de ---
> X87 code generation is definitely not as optimized as other code really.
Ok
> Also fcmov is newish.
New? fcmov was introduced with the PentiumPro (1995) - that's 27 years ago. :-)
[Bug target/109028] New: fcmov will not be generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109028 Bug ID: 109028 Summary: fcmov will not be generated Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: ---

Hello, fcmov instructions are generated only very rarely (https://godbolt.org/z/qE6f76Gda)

thx Gero

#include <array>
#include <cmath>
#include <numbers>

static constexpr size_t Size = 1024;

using float80_t = long double;
using float64_t = double;
using float32_t = float;

// select between two constants depending on a comparison
template <typename Type>
inline constexpr Type foo(const Type x) noexcept
{
    return (x > 42) ? std::numbers::pi_v<Type> : std::numbers::e_v<Type>;
}

// select between a constant and zero depending on the sign bit
template <typename Type>
inline constexpr Type bar(const Type x) noexcept
{
    return std::signbit(x) ? std::numbers::pi_v<Type> : 0;
}

// transfer the sign of x to a constant
template <typename Type>
inline constexpr Type baz(const Type x) noexcept
{
    return std::copysign(std::numbers::pi_v<Type>, x);
}

// apply f to every element in place
template <typename Container, typename Function>
inline constexpr void for_all(Container& cnt, Function&& f) noexcept
{
    for (auto& val : cnt)
    {
        val = f(val);
    }
}

float80_t foo80(const float80_t x) noexcept { return foo(x); }
float80_t bar80(const float80_t x) noexcept { return bar(x); }
float80_t baz80(const float80_t x) noexcept { return baz(x); }

void foos80(std::array<float80_t, Size>& cnt) noexcept { for_all(cnt, foo<float80_t>); }
void bars80(std::array<float80_t, Size>& cnt) noexcept { for_all(cnt, bar<float80_t>); }
void bazs80(std::array<float80_t, Size>& cnt) noexcept { for_all(cnt, baz<float80_t>); }
[Bug target/108902] Conversions std::float16_t<->float with FP16C are not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108902 --- Comment #5 from g.peterh...@t-online.de --- add test case (https://godbolt.org/z/q65cWKhWx) void inc_builtin(array_t& arr)noexcept { auto load_cvt = [](const std::float16_t*const ptr) noexcept { return __builtin_convertvector(*((const __m128h*const)ptr), __m256); }; auto save_cvt = [](std::float16_t* ptr, const __m256 arg)noexcept { *((__m128h*)ptr) = __builtin_convertvector(arg, __m128h); }; for (std::size_t i=0; i
[Bug c++/108902] New: Conversions std::float16_t<->float with FP16C are not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108902 Bug ID: 108902 Summary: Conversions std::float16_t<->float with FP16C are not vectorized Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Please see https://godbolt.org/z/dGn4qhPef thx Gero
[Bug c++/107458] New: std::fma generates slow scalar-call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107458 Bug ID: 107458 Summary: std::fma generates slow scalar-call Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Please see https://godbolt.org/z/bxxc9ezeM thx Gero
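A minimal sketch of the kind of loop this concerns; the function name `axpy` is hypothetical and not taken from the Godbolt link. `std::fma` must round only once, so when no FMA instructions are enabled for the target (for example plain x86-64 without `-mfma` or a suitable `-march`), the compiler generally has to call the library `fma` for every element. With hardware FMA enabled, the same loop can use the fused instruction.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical example: y[i] = a * x[i] + y[i] with a single rounding
// per element. Compare the assembly with and without -mfma.
void axpy(double a, const double* x, double* y, std::size_t n) noexcept {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = std::fma(a, x[i], y[i]);
    }
}
```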
[Bug target/107432] __builtin_convertvector generates inefficient code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432 --- Comment #2 from g.peterh...@t-online.de --- Another example. I want to convert an array to array. There are basically 3 options:
- Copy
- Test (b2f64_default)
- optimized version (b2f64_manually)

gcc12.2 + gcc trunk
- convertSIZE_copy only generates scalar code (_mm_cvtsi64_sd)
- convertSIZE_default always generates conditional jumps

convertSIZE_manually
gcc trunk
- always generates branch-free scalar code
gcc12.2
- convert1024_manually generates vector code, but does not use HW conversion int8->int64 (_mm(256)_cvtepi8_epi64) and converts int8->int16->int32->int64 manually
- convert8_manually generates branch-free scalar code
- convert4_manually generates vector code and uses HW conversion int8->int64

NONE of these conversions are transformed/optimized to the extent that always
- all available intrinsics are used
- no "normal" registers are used
- branch-free code is generated

https://godbolt.org/z/f74vK79of

thx Gero
[Bug c++/107432] New: __builtin_convertvector generates inefficient code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432 Bug ID: 107432 Summary: __builtin_convertvector generates inefficient code Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: ---

Example: conversion int64_t -> int32_t
- avx512f + avx512vl: HW conversions are available.
- avx2: There is a correctly working 32-bit permutation (_mm256_permutevar8x32_epi32/vpermd) that can be used.

I have not (yet) evaluated whether other conversions (larger int -> smaller int) are also affected.

PS: On x86 it's already hell to optimize all cases depending on the instruction set.
PPS: What about -march=znver4 ?

https://godbolt.org/z/3s79bnh7v

thx Gero
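For readers unfamiliar with the builtin, here is a minimal sketch of an int64_t -> int32_t conversion written with GCC/Clang vector extensions. The type names `v4i64` and `v4i32` and the function `narrow` are hypothetical and not taken from the Godbolt link; the sketch only shows what is being converted, not which instructions a given compiler version emits.

```cpp
#include <cstdint>

// GCC/Clang vector extensions: four 64-bit lanes and four 32-bit lanes.
typedef std::int64_t v4i64 __attribute__((vector_size(32)));
typedef std::int32_t v4i32 __attribute__((vector_size(16)));

// Lane-wise truncating conversion int64_t -> int32_t.
// The argument is taken by reference to avoid ABI warnings when
// AVX is not enabled on the command line.
inline v4i32 narrow(const v4i64& v) noexcept {
    return __builtin_convertvector(v, v4i32);
}
```

On an AVX2 target one would hope for a single `vpermd`-style shuffle here, and with AVX512F + AVX512VL for the dedicated down-conversion instruction, which is what the report asks for.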
[Bug tree-optimization/107283] conversions u/int64_t to float64/32_t are not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107283 --- Comment #2 from g.peterh...@t-online.de --- That will be right. I had reported something similar many years ago - but it was not fixed. thx Gero
[Bug c++/107283] New: conversions u/int64_t to float64/32_t are not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107283 Bug ID: 107283 Summary: conversions u/int64_t to float64/32_t are not vectorized Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- The conversions u/int64_t to float64/32_t are not vectorized if no HW support (e.g. AVX512) is available. But we can do that manually: https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx In the case u/int64_t -> float32_t I first convert to float64_t and then to float32_t. There might be a better way to implement this. With HW support the standard implementation is of course faster. https://godbolt.org/z/WTa663PrK thx Gero
[Bug c++/107281] New: comparisations with u/int64_t constants not generate vector-result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107281 Bug ID: 107281 Summary: comparisations with u/int64_t constants not generate vector-result Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- If no 64-bit vector comparisons are available, no vectorized results are produced for the cases <=, >=, <, and >. The cases == and != work. The comparisons themselves are then carried out individually, but the result is combined with unpcklqdq. It would be better if this worked with all comparisons, so that such code can be (auto)vectorized better. It might be possible to optimize this further so that no scalar comparisons are necessary, especially for the frequent case constant=0. https://godbolt.org/z/cj8n9TenK thx Gero
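A minimal sketch of the kind of loop meant here: a 64-bit comparison against a constant whose result is wanted as a per-element mask. The function name `greater_mask` is hypothetical and not taken from the Godbolt link, and no claim is made about what a particular GCC version generates for it.

```cpp
#include <cstddef>
#include <cstdint>

// Writes an all-ones mask (-1) where in[i] > c and 0 elsewhere.
// Negating the 0/1 comparison result avoids any branch in the source.
void greater_mask(const std::int64_t* in, std::int64_t* out,
                  std::size_t n, std::int64_t c) noexcept {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = -static_cast<std::int64_t>(in[i] > c);
    }
}
```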
[Bug libquadmath/104695] different bit patterns in __builtin_nans and libquadmath::nanq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104695 --- Comment #2 from g.peterh...@t-online.de --- Yes, that is very vaguely worded. However, the std functions or builtins must always return the same values on the same platform.

quiet nan:
- libquadmath::nanq != __builtin_nanf128

signaling nan:
- __builtin_nansf64x != __builtin_nansl
- __builtin_nansf64 != __builtin_nans
- __builtin_nansf32 != __builtin_nansf
[Bug c++/104695] New: different bit patterns in __builtin_nans and libquadmath::nanq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104695 Bug ID: 104695 Summary: different bit patterns in __builtin_nans and libquadmath::nanq Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello gcc-Team, the related __builtin_nans return different values and libquadmath::nanq ignores the parameter. Please see my test case https://godbolt.org/z/fda5vevPe regards Gero
[Bug target/100627] missing optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627 --- Comment #2 from g.peterh...@t-online.de --- Hello, I found a better solution here https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx and ported it to "normal" C++ code (no intrinsics): https://godbolt.org/z/scjEdze99.

This has these advantages:
- constexpr
- flexible
- can be vectorized (autovectorization)

These implementations require C++20 (std::bit_cast and constexpr std::exp2), but can easily be implemented with older C++ versions. Possibly this trick can also be used for s/uint64 -> float32, which would save the detour s/uint64 -> float64 -> float32.

However, I have found:
- with -march=skylake-avx512 no AVX512 code is generated
- that only works with -march=skylake-avx512 -mprefer-vector-width=512 or -mavx512f -mavx512dq -mavx512vl
- for s/uint64 -> float32 no correct AVX512 code is generated either (_mm512_cvtepi64_ps, _mm512_cvtepu64_ps)

thx Gero
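The following is a minimal sketch of the widely known "magic constant" technique for a branch-free uint64_t -> double conversion. It is not the code behind the Godbolt link: the names `from_bits` and `u64_to_f64` are hypothetical, `std::memcpy` stands in for C++20 `std::bit_cast` so the sketch also builds with older standards, and an IEEE-754 binary64 `double` with round-to-nearest is assumed.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a 64-bit pattern as a double (assumes IEEE-754 binary64).
inline double from_bits(std::uint64_t b) noexcept {
    double d;
    std::memcpy(&d, &b, sizeof d);
    return d;
}

// Branch-free uint64_t -> double.
// The upper and lower 32-bit halves are placed into the mantissas of
// 2^84 and 2^52, where they are represented exactly.
inline double u64_to_f64(std::uint64_t v) noexcept {
    const double hi = from_bits((v >> 32) | 0x4530000000000000ULL);         // 2^84 + hi32 * 2^32
    const double lo = from_bits((v & 0xffffffffULL) | 0x4330000000000000ULL); // 2^52 + lo32
    const double bias = from_bits(0x4530000000100000ULL);                   // 2^84 + 2^52
    // (hi - bias) is exact; the final addition performs the single rounding.
    return (hi - bias) + lo;
}
```

Because the body is straight-line integer and floating-point arithmetic, a loop over `u64_to_f64` contains no branches that would block autovectorization.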
[Bug tree-optimization/100627] New: missing optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627 Bug ID: 100627 Summary: missing optimization Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello gcc team, I think I wrote something like this a long time ago, but I'm not sure. I think the standard conversion uint64_t -> float/double is inefficient when AVX512 is not available. At least on x86, but with SVE or other CPUs this may not be the case.

Problems:
- a lot of conditional jumps are generated, not BPU-friendly
- and therefore not branch-free
- larger code size

I briefly implemented a few conversions for SSE/SSE2 (https://godbolt.org/z/n63WedKT9).

Advantages:
- branch-free
- mostly smaller code size
- faster

Wouldn't it make sense to implement the standard conversion in this way (including for AVX/AVX2)?

thx Gero
[Bug c++/100171] New: autovectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100171 Bug ID: 100171 Summary: autovectorizer Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello gcc team, I wrote a small test case to show the problems with the autovectorizer: https://godbolt.org/z/xs35P45MM . In particular, the += operator is not vectorized, while the + operator works in the same context. I do not understand that. If you reduce the array size in foo from 2 to 1, it doesn't work at all anymore: scalar operations are always generated for ARR_2x. In general, my experience is that the autovectorizer kicks in much too late. It should always vectorize from 2 values upward, even if these are much smaller than a SIMD register. This also saves a lot of memory accesses, especially when the data is contiguous in memory (as in the example). Usually, however, vectorization is only carried out when the data is at least as large as a SIMD register, and often only when it is twice or even four times as large. I think you should urgently update/optimize the autovectorizer. thx & regards Gero
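A minimal sketch of the two operator forms being compared. The type `Vec` is hypothetical and is not the test case behind the Godbolt link; it only illustrates an in-place `+=` next to a value-returning `+` on a very small fixed-size array.

```cpp
#include <cstddef>

// Hypothetical fixed-size value type; N = 2 mirrors the smallest
// case discussed in the report.
template <typename T, std::size_t N>
struct Vec {
    T v[N];

    // In-place addition: reads and writes the same storage.
    Vec& operator+=(const Vec& o) noexcept {
        for (std::size_t i = 0; i < N; ++i) v[i] += o.v[i];
        return *this;
    }

    // Value-returning addition: writes to a fresh object.
    friend Vec operator+(const Vec& a, const Vec& b) noexcept {
        Vec r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }
};
```

Both operators perform the same element-wise arithmetic, so one would expect the same SIMD instructions for both.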
[Bug c++/99841] (temporary) refs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99841 --- Comment #2 from g.peterh...@t-online.de --- That is not the problem. I only used using type = ... and type(x) in the ctor calls so that I can test different types. You are welcome to remove that; it has no influence.
[Bug c++/99841] New: (temporary) refs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99841 Bug ID: 99841 Summary: (temporary) refs Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Please see https://godbolt.org/z/Ez1K7eofr gcc gives different (wrong?) results than clang/icc. With -O0, or with no -O option at all, gcc gives the same results.
[Bug target/99228] blend/shuffle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228 --- Comment #5 from g.peterh...@t-online.de --- Here is a better test case. https://godbolt.org/z/3Gq783

I've found:

sgn_complex
- always inefficient code, TYPE and SIZE do not matter, even with -Ofast or -ffast-math

for TYPE=double
SIZE=1
- abs/mul/div/pow2_complex ok
- zero_complex not vectorized, also with -Ofast or -ffast-math
SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized
SIZE=4 and larger
- abs/mul/div/pow2/zero_complex ok

for TYPE=float
SIZE=1
- abs/mul/pow2_complex ok
- div/zero_complex not vectorized, also with -Ofast or -ffast-math
SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized
SIZE=4
- abs/pow2/zero_complex ok
- mul_complex inefficient, xmm instead of ymm, also with -Ofast or -ffast-math
- div_complex ok with -O3, but with -Ofast/-ffast-math only xmm instead of ymm
SIZE=8 and larger
- abs/mul/div/pow2_complex ok
[Bug target/99228] blend/shuffle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228 --- Comment #2 from g.peterh...@t-online.de --- I only use the types of boost here. You can remove boost and use: using float80_t = long double; using float64_t = double; using float32_t = float;
[Bug c++/99228] New: blend/shuffle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228 Bug ID: 99228 Summary: blend/shuffle Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: g.peterh...@t-online.de Target Milestone: --- Hello gcc team, the compiler generates very inefficient code for the sgn functions (scalar and complex arguments) https://godbolt.org/z/zvE3Mf

scalar
- float32/64: 2 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle

complex
- float32/64: 4 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle

For testing I have 3 versions each:
v1: total disaster
v2: better, only half of the jumps each time, but clang can't really handle that
v3: like v2, and clang seems to handle it too. If you remove [[likely]] from conditional_move, it behaves like v1.

regards Gero
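For reference, a minimal sketch of a scalar sign function written without any branch in the source. This is the common comparison-difference idiom, not one of the versions v1 to v3 from the report, and the name `sgn` is only assumed to match the report's terminology.

```cpp
// Branch-free sign function: returns -1, 0 or +1 as a value of type T.
// Each comparison yields 0 or 1; their difference is the sign.
template <typename T>
constexpr T sgn(T x) noexcept {
    return static_cast<T>((T(0) < x) - (x < T(0)));
}
```

For floating-point types this returns 0 for both `0.0` and `-0.0`, and also for NaN, since both comparisons are false in that case.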