Hi Carl, on 2023/7/12 02:06, Carl Love wrote: > GCC maintainers: > > Ver 4, Removed extra space in subject line. Added comment to commit > log comments about new __SET_FPSCR_RN_RETURNS_FPSCR__ define. Changed > Added to Add and Renamed to Rename in ChangeLog. Updated define_expand > "rs6000_set_fpscr_rn" per Peter's comments to use new temporary > register for output value. Also, comments from Kewen about moving rtx > tmp_di1 close to use. Renamed tmp_di2 as orig_df_in_di. Additionally, > changed the name of tmp_di3 to tmp_di2 so the numbering is > sequential. Moved the new rtx tmp_di2 = gen_reg_rtx (DImode); right > before its use to be consistent with previous move request. Fixed tabs > in comment. Remove -std=c99 from test_fpscr_rn_builtin_1.c. Cleaned up > comment and removed abort from test_fpscr_rn_builtin_2.c. > > Fixed a couple of additional issues with the ChangeLog per feedback > from git gcc-verify. > > Retested updated patch on Power 8, 9 and 10 to verify changes. > > Ver 3, Renamed the patch per comments on ver 2. Previous subject line > was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value". > Fixed spelling mistakes and formatting. Updated define_expand > "rs6000_set_fpscr_rn to have the rs6000_get_fpscr_fields and > rs6000_update_fpscr_rn_field define expands inlined. Optimized the > code and fixed use of temporary register values. Updated the test file > dg-do run arguments and dg-options. Removed the check for > __SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the > overloaded built-in with double argument. Fixed up the documentation > file. Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power > 10 LE. > > Ver 2, Went back thru the requirements and emails. Not sure where I > came up with the requirement for an overloaded version with double > argument. Removed the overloaded version with the double argument. > Added the macro to announce if the __builtin_set_fpscr_rn returns a > void or a double with the FPSCR bits. Updated the documentation file. > Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE. Redid the test > file. Per request, the original test file functionality was not > changed. Just changed the name from test_fpscr_rn_builtin.c to > test_fpscr_rn_builtin_1.c. Put new tests for the return values into a > new test file, test_fpscr_rn_builtin_2.c. > > The GLibC team requested a builtin to replace the mffscrn and > mffscrniinline asm instructions in the GLibC code. Previously there > was discussion on adding builtins for the mffscrn instructions. > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html > > In the end, it was felt that it would be to extend the existing > __builtin_set_fpscr_rn builtin to return a double instead of a void > type. The desire is that we could have the functionality of the > mffscrn and mffscrni instructions on older ISAs. The two instructions > were initially added in ISA 3.0. The __builtin_set_fpscr_rn has the > needed functionality to set the RN field using the mffscrn and mffscrni > instructions if ISA 3.0 is supported or fall back to using logical > instructions to mask and set the bits for earlier ISAs. The > instructions return the current value of the FPSCR fields DRN, VE, OE, > UE, ZE, XE, NI, RN bit positions then update the RN bit positions with > the new RN value provided. > > The current __builtin_set_fpscr_rn builtin has a return type of void. > So, changing the return type to double and returning the FPSCR fields > DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the > functionally equivalent of the mffscrn and mffscrni instructions. Any > current uses of the builtin would just ignore the return value yet any > new uses could use the return value. So the requirement is for the > change to the __builtin_set_fpscr_rn builtin to be backwardly > compatible and work for all ISAs. > > The following patch changes the return type of the > __builtin_set_fpscr_rn builtin from void to double. The return value > is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE, > XE, NI, RN bit positions when the builtin is called. The builtin then > updated the RN field with the new value provided as an argument to the > builtin. The patch adds new testcases to test_fpscr_rn_builtin.c to > check that the builtin returns the current value of the FPSCR fields > and then updates the RN field. > > The GLibC team has reviewed the patch to make sure it met their needs > as a drop in replacement for the inline asm mffscr and mffscrni > statements in the GLibC code. T > > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 > LE. > > Please let me know if the patch is acceptable for mainline. Thanks. > > Carl > > ----------------------------------------- > rs6000, Add return value to __builtin_set_fpscr_rn > > Change the return value from void to double for __builtin_set_fpscr_rn. > The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, > RN bit positions. A new test file, test powerpc/test_fpscr_rn_builtin_2.c, > is added to test the new return value for the built-in. > > The value __SET_FPSCR_RN_RETURNS_FPSCR__ is defined if > __builtin_set_fpscr_rn returns a double. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update > built-in definition return type. > * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check, > define __SET_FPSCR_RN_RETURNS_FPSCR__ macro. > * config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Add return > argument to return FPSCR fields. > * doc/extend.texi (__builtin_set_fpscr_rn): Update description for > the return value. Add description for > __SET_FPSCR_RN_RETURNS_FPSCR__ macro. > > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/test_fpscr_rn_builtin.c: Rename to > test_fpscr_rn_builtin_1.c. Add comment. > * gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the > return value of __builtin_set_fpscr_rn builtin. > --- > gcc/config/rs6000/rs6000-builtins.def | 2 +- > gcc/config/rs6000/rs6000-c.cc | 5 + > gcc/config/rs6000/rs6000.md | 55 ++++--- > gcc/doc/extend.texi | 23 ++- > ...rn_builtin.c => test_fpscr_rn_builtin_1.c} | 12 +- > .../powerpc/test_fpscr_rn_builtin_2.c | 148 ++++++++++++++++++ > 6 files changed, 213 insertions(+), 32 deletions(-) > rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c => > test_fpscr_rn_builtin_1.c} (89%) > create mode 100644 gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 289a37998b1..28788b69c7d 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -237,7 +237,7 @@ > const __ibm128 __builtin_pack_ibm128 (double, double); > PACK_IF packif {ibm128} > > - void __builtin_set_fpscr_rn (const int[0,3]); > + double __builtin_set_fpscr_rn (const int[0,3]); > SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft} > > const double __builtin_unpack_ibm128 (__ibm128, const int<1>); > diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc > index 8555174d36e..07a32badb54 100644 > --- a/gcc/config/rs6000/rs6000-c.cc > +++ b/gcc/config/rs6000/rs6000-c.cc > @@ -604,6 +604,11 @@ rs6000_target_modify_macros (bool define_p, > HOST_WIDE_INT flags) > /* Tell the user -mrop-protect is in play. */ > if (rs6000_rop_protect) > rs6000_define_or_undefine_macro (define_p, "__ROP_PROTECT__"); > + /* Tell the user __builtin_set_fpscr_rn now returns the FPSCR fields > + in a double. Originally the built-in returned void. */ > + if ((flags & OPTION_MASK_SOFT_FLOAT) == 0) > + rs6000_define_or_undefine_macro (define_p, > + "__SET_FPSCR_RN_RETURNS_FPSCR__"); > } > > void > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index b0db8ae508d..641ab2dfcfe 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6441,7 +6441,8 @@ (define_insn "rs6000_mffscdrn" > [(set_attr "type" "fp")]) > > (define_expand "rs6000_set_fpscr_rn" > - [(match_operand:SI 0 "reg_or_cint_operand")] > + [(use (match_operand:DF 0 "gpc_reg_operand")) > + (use (match_operand:SI 1 "reg_or_cint_operand"))] > "TARGET_HARD_FLOAT" > { > rtx tmp_df = gen_reg_rtx (DFmode); > @@ -6450,49 +6451,63 @@ (define_expand "rs6000_set_fpscr_rn" > new rounding mode bits from operands[0][62:63] into FPSCR[62:63]. */ > if (TARGET_P9_MISC) > { > - if (const_0_to_3_operand (operands[0], VOIDmode)) > - emit_insn (gen_rs6000_mffscrni (tmp_df, operands[0])); > + if (const_0_to_3_operand (operands[1], VOIDmode)) > + emit_insn (gen_rs6000_mffscrni (tmp_df, operands[1])); > else > { > - rtx op0 = convert_to_mode (DImode, operands[0], false); > - rtx src_df = simplify_gen_subreg (DFmode, op0, DImode, 0); > + rtx op1 = convert_to_mode (DImode, operands[1], false); > + rtx src_df = simplify_gen_subreg (DFmode, op1, DImode, 0); > emit_insn (gen_rs6000_mffscrn (tmp_df, src_df)); > } > - DONE; > + emit_move_insn (operands[0], tmp_df); > + DONE; > } > > - if (CONST_INT_P (operands[0])) > + /* Emulate the behavior of the mffscrni, mffscrn instructions for earlier > + ISAs. Return bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, > + RN) from the FPSCR. Set the RN field based on the value in operands[1]. > + */ > + > + /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63 (VE, OE, > UE, > + ZE, XE, NI, RN) from the FPSCR and return them. */ > + > + emit_insn (gen_rs6000_mffs (tmp_df)); > + rtx orig_df_in_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > + rtx tmp_di1 = gen_reg_rtx (DImode); > + emit_insn (gen_anddi3 (tmp_di1, orig_df_in_di, > + GEN_INT (0x00000007000000FFULL)));
Nit: The line with GEN_INT should indent more to tmp_di1, that is: emit_insn (gen_anddi3 (tmp_di1, orig_df_in_di, GEN_INT (0x00000007000000FFULL))); > + rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0); > + emit_move_insn (operands[0], tmp_rtn); > + > + if (CONST_INT_P (operands[1])) > { > - if ((INTVAL (operands[0]) & 0x1) == 0x1) > + if ((INTVAL (operands[1]) & 0x1) == 0x1) > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (31))); > else > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (31))); > > - if ((INTVAL (operands[0]) & 0x2) == 0x2) > + if ((INTVAL (operands[1]) & 0x2) == 0x2) > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (30))); > else > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (30))); > } > else > { > - rtx tmp_rn = gen_reg_rtx (DImode); > - rtx tmp_di = gen_reg_rtx (DImode); > - > /* Extract new RN mode from operand. */ > - rtx op0 = convert_to_mode (DImode, operands[0], false); > - emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3))); > + rtx op1 = convert_to_mode (DImode, operands[1], false); > + rtx tmp_rn = gen_reg_rtx (DImode); > + emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3))); > > - /* Insert new RN mode into FSCPR. */ > - emit_insn (gen_rs6000_mffs (tmp_df)); > - tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > - emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4))); > - emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn)); > + /* Insert the new RN value from tmp_rn into FPSCR bit [62:63]. */ > + emit_insn (gen_anddi3 (tmp_di1, orig_df_in_di, GEN_INT (-4))); Nit: This tmp_di1 has been used as destination pseudo above, as Peter's comments, it's better to use one new pseudo here. OK for trunk with these nits fixed and bootstrapped & regress-tested as before. Thanks! Kewen