Hi All,

I sent an initial RFC of this patch[1] some time ago. I believe it is
now in a complete state, with test cases added for the builtins.

The builtin optimizations presented here were originally in glibc, but
were removed, with the suggestion that they would be a good fit as gcc
builtins[2].

So, this patch adds new rs6000 expand optimizations for fegetround and
for some calls to feclearexcept and feraiseexcept, all of them C99
functions from fenv.h.

I replicated the semantics of the optimizations almost as-is from the
glibc version, with the notable exception that for feclearexcept and
feraiseexcept the glibc version did not filter for the valid flags, so
it had undefined behavior for values out of range. For the gcc builtin
I thought it best to explicitly filter the valid ones, as the builtin
does not return any error (same as the original) and there are only 4
valid flags.

To check the FE_* flags used in the feclearexcept and feraiseexcept
expands I decided to copy the definitions verbatim from glibc instead of
using the macros, which would mean including fenv.h somewhere to get
them.

Still on feclearexcept and feraiseexcept, I am not sure I used
exact_log2_cint_operand correctly, because in my tests it kept accepting
feclearexcept(0) and it should not. In any case, because I decided to
test for all valid flags, this is not a problem for correct generation,
but I thought I should mention it.

Tested on top of master (808f4dfeb3a95f50f15e71148e5c1067f90a126d)
on the following platforms with no regression:
  powerpc64le-linux-gnu (Power 9)
  powerpc64le-linux-gnu (Power 8)

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548998.html
[2] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html
    https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html

---- 8< ----

These optimizations were originally in glibc, but were removed, with the
suggestion that they would be a good fit as gcc builtins[1].
The associated bugreport: PR target/94193

[1] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html
    https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html

2020-08-13  Raoni Fassina Firmino  <ra...@linux.ibm.com>

gcc/ChangeLog:

        * builtins.c (expand_builtin_fegetround): New function.
        (expand_builtin_feclear_feraise_except): New function.
        (expand_builtin): Add cases for BUILT_IN_FEGETROUND,
        BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT.
        * config/rs6000/rs6000.md (fegetroundsi): New pattern.
        (feclearexceptsi): New pattern.
        (feraiseexceptsi): New pattern.
        * optabs.def (fegetround_optab): New optab.
        (feclearexcept_optab): New optab.
        (feraiseexcept_optab): New optab.

gcc/testsuite/ChangeLog:

        * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New test.
        * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New test.
        * gcc.target/powerpc/builtin-fegetround.c: New test.

Signed-off-by: Raoni Fassina Firmino <ra...@linux.ibm.com>
---
 gcc/builtins.c                                |  75 ++++++++++
 gcc/config/rs6000/rs6000.md                   |  82 +++++++++++
 gcc/optabs.def                                |   4 +
 .../builtin-feclearexcept-feraiseexcept-1.c   |  64 +++++++++
 .../builtin-feclearexcept-feraiseexcept-2.c   | 130 ++++++++++++++++++
 .../gcc.target/powerpc/builtin-fegetround.c   |  30 ++++
 6 files changed, 385 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index beb56e06d8a..de6f34e0225 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -115,6 +115,8 @@ static rtx expand_builtin_mathfn_3 (tree, rtx, rtx);
 static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx);
 static rtx expand_builtin_interclass_mathfn (tree, rtx);
 static rtx expand_builtin_sincos (tree);
+static rtx expand_builtin_fegetround (tree, rtx, machine_mode);
+static rtx expand_builtin_feclear_feraise_except (tree, rtx, machine_mode, optab);
 static rtx expand_builtin_cexpi (tree, rtx);
 static rtx expand_builtin_int_roundingfn (tree, rtx);
 static rtx expand_builtin_int_roundingfn_2 (tree, rtx);
@@ -2577,6 +2579,59 @@ expand_builtin_sincos (tree exp)
   return const0_rtx;
 }
 
+/* Expand call EXP to the fegetround builtin (from C99 fenv.h), returning the
+   result and setting it in TARGET.  Otherwise return NULL_RTX on failure.  */
+static rtx
+expand_builtin_fegetround (tree exp, rtx target, machine_mode target_mode)
+{
+  if (!validate_arglist (exp, VOID_TYPE))
+    return NULL_RTX;
+
+  insn_code icode = direct_optab_handler (fegetround_optab, SImode);
+  if (icode == CODE_FOR_nothing)
+    return NULL_RTX;
+
+  if (target == 0
+      || GET_MODE (target) != target_mode
+      || ! (*insn_data[icode].operand[0].predicate) (target, target_mode))
+    target = gen_reg_rtx (target_mode);
+
+  rtx pat = GEN_FCN (icode) (target);
+  if (! pat)
+    return NULL_RTX;
+  emit_insn (pat);
+
+  return target;
+}
+
+/* Expand call EXP to either feclearexcept or feraiseexcept builtins (from C99
+   fenv.h), returning the result and setting it in TARGET.  Otherwise return
+   NULL_RTX on failure.  */
+static rtx
+expand_builtin_feclear_feraise_except (tree exp, rtx target,
+                                       machine_mode target_mode, optab op_optab)
+{
+  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
+    return NULL_RTX;
+  rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
+
+  insn_code icode = direct_optab_handler (op_optab, SImode);
+  if (icode == CODE_FOR_nothing)
+    return NULL_RTX;
+
+  if (target == 0
+      || GET_MODE (target) != target_mode
+      || ! (*insn_data[icode].operand[0].predicate) (target, target_mode))
+    target = gen_reg_rtx (target_mode);
+
+  rtx pat = GEN_FCN (icode) (target, op0);
+  if (! pat)
+    return NULL_RTX;
+  emit_insn (pat);
+
+  return target;
+}
+
 /* Expand a call to the internal cexpi builtin to the sincos math function.
    EXP is the expression that is a call to the builtin function; if convenient,
    the result should be placed in TARGET.  */
@@ -8087,6 +8142,26 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
         return target;
       break;
 
+    case BUILT_IN_FEGETROUND:
+      target = expand_builtin_fegetround (exp, target, target_mode);
+      if (target)
+        return target;
+      break;
+
+    case BUILT_IN_FECLEAREXCEPT:
+      target = expand_builtin_feclear_feraise_except (exp, target, target_mode,
+                                                      feclearexcept_optab);
+      if (target)
+        return target;
+      break;
+
+    case BUILT_IN_FERAISEEXCEPT:
+      target = expand_builtin_feclear_feraise_except (exp, target, target_mode,
+                                                      feraiseexcept_optab);
+      if (target)
+        return target;
+      break;
+
     case BUILT_IN_APPLY_ARGS:
       return expand_builtin_apply_args ();
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620ae1c0..55ab0196334 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6565,6 +6565,88 @@
   [(set_attr "type" "fpload")
    (set_attr "length" "8")
    (set_attr "isa" "*,p8v,p8v")])
+
+;; int __builtin_fegetround()
+(define_expand "fegetroundsi"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT"
+{
+  rtx tmp_df = gen_reg_rtx (DFmode);
+  emit_insn (gen_rs6000_mffsl (tmp_df));
+
+  rtx tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
+  rtx tmp_di_2 = gen_reg_rtx (DImode);
+  emit_insn (gen_anddi3 (tmp_di_2, tmp_di, GEN_INT (3)));
+  rtx tmp_si = gen_reg_rtx (SImode);
+  tmp_si = simplify_gen_subreg (SImode, tmp_di_2, DImode, 0);
+  emit_move_insn (operands[0], tmp_si);
+  DONE;
+})
+
+;; int feclearexcept(int excepts)
+;;
+;; This expansion for the C99 function only works when excepts is a
+;; constant known at compile time and specifying only one of the
+;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
+;; It doesn't handle values out of range, and always returns 0.
+;; Note that FE_INVALID is unsupported because it maps to more than
+;; one bit of the FPSCR register.
+;; Because of these restrictions, this only expands on the desired cases.
+(define_expand "feclearexceptsi"
+  [(use (match_operand:SI 1 "exact_log2_cint_operand" "N"))
+   (set (match_operand:SI 0 "gpc_reg_operand")
+        (const_int 0))]
+  "TARGET_HARD_FLOAT"
+{
+  switch (INTVAL (operands[1]))
+    {
+    case (1 << (31 - 6)): /* FE_INEXACT */
+    case (1 << (31 - 5)): /* FE_DIVBYZERO */
+    case (1 << (31 - 4)): /* FE_UNDERFLOW */
+    case (1 << (31 - 3)): /* FE_OVERFLOW */
+      break;
+    default:
+      FAIL;
+    }
+
+  rtx tmp = gen_rtx_CONST_INT (SImode, __builtin_clz (INTVAL (operands[1])));
+  emit_insn (gen_rs6000_mtfsb0 (tmp));
+  emit_move_insn (operands[0], GEN_INT (0));
+  DONE;
+})
+
+;; int feraiseexcept(int excepts)
+;;
+;; This expansion for the C99 function only works when excepts is a
+;; constant known at compile time and specifying only one of the
+;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
+;; It doesn't handle values out of range, and always returns 0.
+;; Note that FE_INVALID is unsupported because it maps to more than
+;; one bit of the FPSCR register.
+;; Because of these restrictions, this only expands on the desired cases.
+(define_expand "feraiseexceptsi"
+  [(use (match_operand:SI 1 "exact_log2_cint_operand" "N"))
+   (set (match_operand:SI 0 "gpc_reg_operand")
+        (const_int 0))]
+  "TARGET_HARD_FLOAT"
+{
+  switch (INTVAL (operands[1]))
+    {
+    case (1 << (31 - 6)): /* FE_INEXACT */
+    case (1 << (31 - 5)): /* FE_DIVBYZERO */
+    case (1 << (31 - 4)): /* FE_UNDERFLOW */
+    case (1 << (31 - 3)): /* FE_OVERFLOW */
+      break;
+    default:
+      FAIL;
+    }
+
+  rtx tmp = gen_rtx_CONST_INT (SImode, __builtin_clz (INTVAL (operands[1])));
+  emit_insn (gen_rs6000_mtfsb1 (tmp));
+  emit_move_insn (operands[0], GEN_INT (0));
+  DONE;
+})
+
 ;; Define the TImode operations that can be done in a small number
 ;; of instructions.  The & constraints are to prevent the register
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa1453..987ee0f79dc 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -318,6 +318,10 @@ OPTAB_D (sinh_optab, "sinh$a2")
 OPTAB_D (tan_optab, "tan$a2")
 OPTAB_D (tanh_optab, "tanh$a2")
 
+OPTAB_D (fegetround_optab, "fegetround$a")
+OPTAB_D (feclearexcept_optab, "feclearexcept$a")
+OPTAB_D (feraiseexcept_optab, "feraiseexcept$a")
+
 /* C99 implementations of fmax/fmin.  */
 OPTAB_D (fmax_optab, "fmax$a3")
 OPTAB_D (fmin_optab, "fmin$a3")
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c b/gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c
new file mode 100644
index 00000000000..959715aea7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c
@@ -0,0 +1,64 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+/* { dg-options "-lm -fno-builtin" } */
+
+/* This testcase ensures that the builtins expand with the matching arguments
+ * or otherwise fall back gracefully to a function call, and don't ICE during
+ * compilation.  */
+
+#include <fenv.h>
+
+/* We use the __builtin_* version to avoid the function being replaced by
+ * glibc, which may have an inline optimization for it in fenv.h.  */
+int main ()
+{
+  int   rsi = 0;
+  long  rsl = 0;
+  short rss = 0;
+  char  rsc = 0;
+
+  unsigned int   rui = 0;
+  unsigned long  rul = 0;
+  unsigned short rus = 0;
+  unsigned char  ruc = 0;
+
+  int e = FE_DIVBYZERO;
+
+  __builtin_feclearexcept(e);                          // CALL
+  __builtin_feclearexcept(0);                          // CALL
+  __builtin_feclearexcept(FE_ALL_EXCEPT);              // CALL
+  __builtin_feclearexcept(FE_INVALID);                 // CALL
+  __builtin_feclearexcept(FE_INEXACT | FE_DIVBYZERO);  // CALL
+  __builtin_feclearexcept(FE_INEXACT);                 // EXPAND
+  __builtin_feclearexcept(FE_DIVBYZERO);               // EXPAND
+  __builtin_feclearexcept(FE_UNDERFLOW);               // EXPAND
+  __builtin_feclearexcept(FE_OVERFLOW);                // EXPAND
+
+  rsi = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+  rsl = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+  rss = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+  rsc = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+  rui = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+  rul = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+  rus = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+  ruc = __builtin_feclearexcept(FE_DIVBYZERO);  // EXPAND
+
+
+  __builtin_feraiseexcept(e);                          // CALL
+  __builtin_feraiseexcept(0);                          // CALL
+  __builtin_feraiseexcept(FE_ALL_EXCEPT);              // CALL
+  __builtin_feraiseexcept(FE_INVALID);                 // CALL
+  __builtin_feraiseexcept(FE_INEXACT | FE_DIVBYZERO);  // CALL
+  __builtin_feraiseexcept(FE_INEXACT);                 // EXPAND
+  __builtin_feraiseexcept(FE_DIVBYZERO);               // EXPAND
+  __builtin_feraiseexcept(FE_UNDERFLOW);               // EXPAND
+  __builtin_feraiseexcept(FE_OVERFLOW);                // EXPAND
+
+  rsi = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+  rsl = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+  rss = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+  rsc = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+  rui = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+  rul = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+  rus = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+  ruc = __builtin_feraiseexcept(FE_DIVBYZERO);  // EXPAND
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c b/gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c
new file mode 100644
index 00000000000..8ae0f9d0e43
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c
@@ -0,0 +1,130 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+/* { dg-options "-lm -fno-builtin" } */
+
+/* This testcase ensures that the builtins are correctly expanded and match the
+ * expected result.  */
+
+#include <fenv.h>
+
+#ifdef DEBUG
+#include <stdio.h>
+#define INFO(...) printf(__VA_ARGS__)
+#define FAIL(v, e, x, s, f) \
+  printf("ERROR [l %d] testing %s(%x): %s returned %x," \
+         " expected %x\n", __LINE__, s, x, f, v, e)
+#else
+void abort (void);
+#define INFO(...)
+#define FAIL(v, e, x, s, f) abort()
+#endif
+
+/* We use the __builtin_* version to avoid the function being replaced by
+ * glibc, which may have an inline optimization for it in fenv.h.  */
+int main ()
+{
+  char *s = 0;
+  int e = 0;
+  int raised = 0;
+
+  s = "FE_ALL_EXCEPT";
+  e = FE_ALL_EXCEPT;
+  INFO("test: %s(%x)\n", s, e);
+
+  feclearexcept(FE_ALL_EXCEPT);
+  __builtin_feraiseexcept(FE_ALL_EXCEPT);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != e)
+    FAIL(raised, e, e, s, "__builtin_feraiseexcept");
+
+  feraiseexcept(FE_ALL_EXCEPT);
+  __builtin_feclearexcept(FE_ALL_EXCEPT);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != (FE_ALL_EXCEPT & ~e))
+    FAIL(raised, FE_ALL_EXCEPT & ~e, e, s, "__builtin_feclearexcept");
+
+
+  s = "FE_DIVBYZERO";
+  e = FE_DIVBYZERO;
+  INFO("test: %s(%x)\n", s, e);
+
+  feclearexcept(FE_ALL_EXCEPT);
+  __builtin_feraiseexcept(FE_DIVBYZERO);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != e)
+    FAIL(raised, e, e, s, "__builtin_feraiseexcept");
+
+  feraiseexcept(FE_ALL_EXCEPT);
+  __builtin_feclearexcept(FE_DIVBYZERO);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != (FE_ALL_EXCEPT & ~e))
+    FAIL(raised, FE_ALL_EXCEPT & ~e, e, s, "__builtin_feclearexcept");
+
+
+  s = "FE_INEXACT";
+  e = FE_INEXACT;
+  INFO("test: %s(%x)\n", s, e);
+
+  feclearexcept(FE_ALL_EXCEPT);
+  __builtin_feraiseexcept(FE_INEXACT);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != e)
+    FAIL(raised, e, e, s, "__builtin_feraiseexcept");
+
+  feraiseexcept(FE_ALL_EXCEPT);
+  __builtin_feclearexcept(FE_INEXACT);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != (FE_ALL_EXCEPT & ~e))
+    FAIL(raised, FE_ALL_EXCEPT & ~e, e, s, "__builtin_feclearexcept");
+
+
+  s = "FE_OVERFLOW";
+  e = FE_OVERFLOW;
+  INFO("test: %s(%x)\n", s, e);
+
+  feclearexcept(FE_ALL_EXCEPT);
+  __builtin_feraiseexcept(FE_OVERFLOW);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != e)
+    FAIL(raised, e, e, s, "__builtin_feraiseexcept");
+
+  feraiseexcept(FE_ALL_EXCEPT);
+  __builtin_feclearexcept(FE_OVERFLOW);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != (FE_ALL_EXCEPT & ~e))
+    FAIL(raised, FE_ALL_EXCEPT & ~e, e, s, "__builtin_feclearexcept");
+
+
+  s = "FE_UNDERFLOW";
+  e = FE_UNDERFLOW;
+  INFO("test: %s(%x)\n", s, e);
+
+  feclearexcept(FE_ALL_EXCEPT);
+  __builtin_feraiseexcept(FE_UNDERFLOW);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != e)
+    FAIL(raised, e, e, s, "__builtin_feraiseexcept");
+
+  feraiseexcept(FE_ALL_EXCEPT);
+  __builtin_feclearexcept(FE_UNDERFLOW);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != (FE_ALL_EXCEPT & ~e))
+    FAIL(raised, FE_ALL_EXCEPT & ~e, e, s, "__builtin_feclearexcept");
+
+
+  s = "FE_INVALID";
+  e = FE_INVALID;
+  INFO("test: %s(%x)\n", s, e);
+
+  feclearexcept(FE_ALL_EXCEPT);
+  __builtin_feraiseexcept(FE_INVALID);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != e)
+    FAIL(raised, e, e, s, "__builtin_feraiseexcept");
+
+  feraiseexcept(FE_ALL_EXCEPT);
+  __builtin_feclearexcept(FE_INVALID);
+  raised = fetestexcept(FE_ALL_EXCEPT);
+  if (raised != (FE_ALL_EXCEPT & ~e))
+    FAIL(raised, FE_ALL_EXCEPT & ~e, e, s, "__builtin_feclearexcept");
+
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c b/gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c
new file mode 100644
index 00000000000..0ff33aef007
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+/* { dg-options "-lm -fno-builtin" } */
+
+/* This testcase ensures that the builtin is correctly expanded and matches
+ * the expected result from the standard function.  */
+
+#include <fenv.h>
+
+#ifdef DEBUG
+#include <stdio.h>
+#define FAIL(v, e) printf("ERROR, __builtin_fegetround() returned %d," \
+                          " not the expected value %d\n", v, e);
+#else
+void abort (void);
+#define FAIL(v, e) abort()
+#endif
+
+/* We use the __builtin_* version to avoid the function being replaced by
+ * glibc, which may have an inline optimization for it in fenv.h.  */
+int main ()
+{
+  int rm[] = {FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD};
+  for (int i = 0; i < sizeof(rm) / sizeof(rm[0]); i++) {
+    fesetround(rm[i]);
+    int rounding = __builtin_fegetround();
+    int expected = fegetround();
+    if (rounding != expected)
+      FAIL(rounding, expected);
+  }
+}
-- 
2.26.2
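
As an aside on the flag handling in the feclearexceptsi and
feraiseexceptsi expanders: the operand given to mtfsb0/mtfsb1 is an
FPSCR bit number, which the expanders obtain by counting the leading
zeros of the flag value. Below is a minimal standalone sketch of that
arithmetic, not code from the patch. The PPC_FE_* macros and
fpscr_bit_for_flag are hypothetical names used only for illustration;
the values are the ones in the expanders' switch statements, and a
32-bit unsigned int plus the GCC/Clang __builtin_clz are assumed.

```c
/* Hypothetical names; the values are copied from the switch statements
   in the feclearexceptsi/feraiseexceptsi expanders.
   Assumes a 32-bit unsigned int.  */
#define PPC_FE_INEXACT   (1u << (31 - 6))
#define PPC_FE_DIVBYZERO (1u << (31 - 5))
#define PPC_FE_UNDERFLOW (1u << (31 - 4))
#define PPC_FE_OVERFLOW  (1u << (31 - 3))

/* Return the FPSCR bit number (counted from the most significant bit)
   for a single supported flag, or -1 for any other value, which is the
   case where the expanders FAIL and a library call is emitted.  */
static int
fpscr_bit_for_flag (unsigned int excepts)
{
  switch (excepts)
    {
    case PPC_FE_INEXACT:
    case PPC_FE_DIVBYZERO:
    case PPC_FE_UNDERFLOW:
    case PPC_FE_OVERFLOW:
      /* Exactly one bit is set, so the leading-zero count is that
         bit's index counted from the most significant bit.  */
      return __builtin_clz (excepts);
    default:
      return -1;
    }
}
```

With these values, fpscr_bit_for_flag (PPC_FE_DIVBYZERO) gives 5, while
0 or an OR of several flags gives -1, which matches the EXPAND and CALL
annotations in builtin-feclearexcept-feraiseexcept-1.c.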