[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #18 from GCC Commits ---
The master branch has been updated by Roger Sayle:

https://gcc.gnu.org/g:589865a8e4f6bd26c622ea0ee0a38565a0d42e80

commit r15-1752-g589865a8e4f6bd26c622ea0ee0a38565a0d42e80
Author: Roger Sayle
Date: Mon Jul 1 12:21:20 2024 +0100

    testsuite: Fix -m32 gcc.target/i386/pr102464-vrndscaleph.c on RedHat.

    This patch fixes the 4 FAILs of gcc.target/i386/pr102464-vrndscaleph.c with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this AVX512 test includes the system math.h, and on older systems this provides inline versions of floor, ceil and rint (for the 387). The workaround is to define __NO_MATH_INLINES before #include <math.h> (or alternatively use __builtin_floor, __builtin_ceil, etc.).

    2024-07-01 Roger Sayle

    gcc/testsuite/ChangeLog
        PR middle-end/102464
        * gcc.target/i386/pr102464-vrndscaleph.c: Define __NO_MATH_INLINES to resolve FAILs with -m32 on older RedHat systems.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #17 from jbeulich at suse dot com --- Largely the same is actually true for the RNDSCALEPH test added for the PR here.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464

jbeulich at suse dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jbeulich at suse dot com

--- Comment #16 from jbeulich at suse dot com ---
(In reply to Hongtao.liu from comment #15)
> Fixed in GCC12.

Only almost - the new FMA testcase there fails for i?86-*-*. I don't think even the few uses of VFMA* actually match the expectations. The majority of the operations are carried out in the FPU anyway, despite -mfpmath=sse.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Hongtao.liu ---
Fixed in GCC12.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #14 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:b879d40a17ec0409f1a2cd9ab6134bb28f53eea8

commit r12-5079-gb879d40a17ec0409f1a2cd9ab6134bb28f53eea8
Author: liuhongt
Date: Thu Nov 4 16:05:45 2021 +0800

    Simplify (trunc)MAX/MIN((extend)a, (extend)b) to MAX/MIN(a,b)

    a and b have the same type as the truncation type and less precision than the extended type.

    gcc/ChangeLog:
        PR target/102464
        * match.pd: Simplify (trunc)fmax/fmin((extend)a, (extend)b) to MAX/MIN(a,b)

    gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr102464-maxmin.c: New test.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #13 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:a1f7ead09cd41d32e2fe902eb32e587c36e7

commit r12-4985-ga1f7ead09cd41d32e2fe902eb32e587c36e7
Author: liuhongt
Date: Mon Nov 8 09:32:17 2021 +0800

    Add !HONOR_SNANS to simplification: (trunc)copysign((extend)a, (extend)b) to copysign (a, b).

    > Note that this is not safe with -fsignaling-nans, so needs to be disabled
    > for that option (if there isn't already logic somewhere with that effect),
    > because the extend will convert a signaling NaN to quiet (raising
    > "invalid"), but copysign won't, so this transformation could result in a
    > signaling NaN being wrongly returned when the original code would never
    > have returned a signaling NaN.
    >
    > --
    > Joseph S. Myers
    > jos...@codesourcery.com

    gcc/ChangeLog
        PR target/102464
        * match.pd (Simplification (trunc)copysign((extend)a, (extend)b) to .COPYSIGN (a, b)): Add !HONOR_SNANS.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #12 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:2ad1e8081f4797a99a96b513ffe14c7305e9b3d8

commit r12-4984-g2ad1e8081f4797a99a96b513ffe14c7305e9b3d8
Author: liuhongt
Date: Mon Nov 8 09:19:29 2021 +0800

    [Gimple] Simplify (trunc)fma ((extend)a, (extend)b, (extend)c) to IFN_FMA (a,b, c).

    a, b and c have the same type as the truncation type and less precision than the extended type. The optimization is guarded by flag_unsafe_math_optimizations.

    gcc/ChangeLog:
        PR target/102464
        * match.pd: Simplify (trunc)fma ((extend)a, (extend)b, (extend)c) to IFN_FMA (a, b, c) under flag_unsafe_math_optimizations.

    gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr102464-fma.c: New test.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #11 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:22ce7382fccc15ce2355306b3f5be7afc00f81f4

commit r12-4881-g22ce7382fccc15ce2355306b3f5be7afc00f81f4
Author: liuhongt
Date: Wed Nov 3 16:07:34 2021 +0800

    Simplify (trunc)copysign((extend)a, (extend)b) to .COPYSIGN (a,b).

    a and b have the same type as the truncation type and less precision than the extended type.

    gcc/ChangeLog:
        PR target/102464
        * match.pd: Simplify (trunc)copysign((extend)a, (extend)b) to .COPYSIGN (a,b) when a and b have the same type as the truncation type and less precision than the extended type.

    gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr102464-copysign-1.c: New test.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #10 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:84bcefdaf6d95e08cd980965098961289215

commit r12-4780-g84bcefdaf6d95e08cd980965098961289215
Author: liuhongt
Date: Mon Oct 25 15:20:35 2021 +0800

    Enable vectorization for _Float16 floor/ceil/trunc/nearbyint/rint operations.

    gcc/ChangeLog:
        PR target/102464
        * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New function type.
        (V16HF_FTYPE_V16HF): Ditto.
        (V32HF_FTYPE_V32HF): Ditto.
        (V8HF_FTYPE_V8HF_ROUND): Ditto.
        (V16HF_FTYPE_V16HF_ROUND): Ditto.
        (V32HF_FTYPE_V32HF_ROUND): Ditto.
        * config/i386/i386-builtin.def (IX86_BUILTIN_FLOORPH, IX86_BUILTIN_CEILPH, IX86_BUILTIN_TRUNCPH, IX86_BUILTIN_FLOORPH256, IX86_BUILTIN_CEILPH256, IX86_BUILTIN_TRUNCPH256, IX86_BUILTIN_FLOORPH512, IX86_BUILTIN_CEILPH512, IX86_BUILTIN_TRUNCPH512): New builtin.
        * config/i386/i386-builtins.c (ix86_builtin_vectorized_function): Enable vectorization for HFmode FLOOR/CEIL/TRUNC operation.
        * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtins.
        * config/i386/sse.md (rint<mode>2, nearbyint<mode>2): Extend to vector HFmodes.

    gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr102464-vrndscaleph.c: New test.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #9 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:1a07bc9cda77b1211e95ae295b30e46c0d9ee222

commit r12-4651-g1a07bc9cda77b1211e95ae295b30e46c0d9ee222
Author: liuhongt
Date: Mon Oct 25 10:51:33 2021 +0800

    Simplify (_Float16) sqrtf((float) a) to .SQRT(a) when a is a _Float16 value.

    Similar for sqrt/sqrtl.

    gcc/ChangeLog:
        PR target/102464
        * match.pd: Simplify (_Float16) sqrtf((float) a) to .SQRT(a) when direct_internal_fn_supported_p, similar for sqrt/sqrtl.

    gcc/testsuite/ChangeLog:
        PR target/102464
        * gcc.target/i386/pr102464-sqrtph.c: New test.
        * gcc.target/i386/pr102464-sqrtsh.c: New test.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #8 from Hongtao.liu ---
(In reply to Richard Biener from comment #3)
> There's related optimizations in convert () which should ideally move to
> match.pd

When I try to move the convert () logic to match.pd, I find some mismatches. There are 3 cases:

1. math functions are transformed under condition "optimize"
2. math functions are transformed under condition "optimize && flag_unsafe_math_optimizations"
3. math functions are transformed under condition "optimize && flag_unsafe_math_optimizations && !flag_errno_math"

logb is case 1, which means it can be transformed without !flag_errno_math, but according to

    DEF_C99_BUILTIN (BUILT_IN_LOGB, "logb", BT_FN_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)

!flag_errno_math is needed, and the transformation will be prevented by gimple-match-head.c:maybe_push_res_to_seq:

    /* We can't and should not emit calls to non-const functions.  */
    if (!(flags_from_decl_or_type (decl) & ECF_CONST))
      return NULL;

    /* fabsl (extend(x)) -> extend(fabsf(x)), etc., if x is a float.  */
    (for froms (BUILT_IN_FABS BUILT_IN_FABSL BUILT_IN_LOGB BUILT_IN_LOGBL)
         tos (BUILT_IN_FABSF BUILT_IN_FABSF BUILT_IN_LOGBF BUILT_IN_LOGBF)
     (simplify
      (froms (convert float_value_p@0))
      (if (optimize && canonicalize_math_p ()
           && mathfn_built_in (TREE_TYPE (@0), froms))
       (convert (tos @0)))))
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #7 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:613196462a62a28de8414b9023ec2be9a29ac3dc

commit r12-4242-g613196462a62a28de8414b9023ec2be9a29ac3dc
Author: liuhongt
Date: Fri Sep 24 19:17:42 2021 +0800

    Simplify (_Float16) ceil ((double) x) to .CEIL (x) when available.

    gcc/ChangeLog:
        PR target/102464
        * config/i386/i386.c (ix86_optab_supported_p): Return true for HFmode.
        * match.pd: Simplify (_Float16) ceil ((double) x) to __builtin_ceilf16 (x) when x is _Float16 type and direct_internal_fn_supported_p.

    gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr102464.c: New test.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #6 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #5)
> (gdb) p direct_internal_fn_supported_p (IFN_CEIL, type, OPTIMIZE_FOR_BOTH)
> $110 = false
>
> (gdb) p direct_internal_fn_supported_p (IFN_SQRT, type, OPTIMIZE_FOR_BOTH)
> $111 = true
>
> hmm, why?

Hmm, because in ix86_optab_supported_p, we have

    case rint_optab:
      if (SSE_FLOAT_MODE_P (mode1)
          && TARGET_SSE_MATH
          && !flag_trapping_math
          && !TARGET_SSE4_1)
        return opt_type == OPTIMIZE_FOR_SPEED;
      return true;

    case floor_optab:
    case ceil_optab:
    case btrunc_optab:
      if (SSE_FLOAT_MODE_P (mode1)
          && TARGET_SSE_MATH
          && !flag_trapping_math
          && TARGET_SSE4_1)
        return true;
      return opt_type == OPTIMIZE_FOR_SPEED;
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #5 from Hongtao.liu ---
(gdb) p direct_internal_fn_supported_p (IFN_CEIL, type, OPTIMIZE_FOR_BOTH)
$110 = false

(gdb) p direct_internal_fn_supported_p (IFN_SQRT, type, OPTIMIZE_FOR_BOTH)
$111 = true

hmm, why?
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #4 from joseph at codesourcery dot com --- Note that for fma this would only be valid for -funsafe-math-optimizations.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #3 from Richard Biener --- There's related optimizations in convert () which should ideally move to match.pd
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-23
           Keywords|                            |internal-improvement

--- Comment #2 from Andrew Pinski ---
Confirmed. I don't think fabs and fma need to be internal functions, as there are already tree codes for them.
[Bug middle-end/102464] Miss optimization for (_Float16) sqrtf ((float) f16)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102464 --- Comment #1 from Hongtao.liu ---
A similar optimization also applies to fma, fmax/fmin, fabs, ldexp, ceil, floor, trunc, round, rint, nearbyint and copysign, since AVX512-FP16 has corresponding instructions.