Hi Carl,
on 2023/7/12 02:06, Carl Love wrote:
> GCC maintainers:
>
> Ver 4, Removed extra space in subject line. Added comment to commit
> log comments about new __SET_FPSCR_RN_RETURNS_FPSCR__ define. Changed
> Added to Add and Renamed to Rename in ChangeLog. Updated define_expand
> "rs6000_set_fpscr_rn" per Peter's comments to use new temporary
> register for output value. Also, comments from Kewen about moving rtx
> tmp_di1 close to use. Renamed tmp_di2 as orig_df_in_di. Additionally,
> changed the name of tmp_di3 to tmp_di2 so the numbering is
> sequential. Moved the new rtx tmp_di2 = gen_reg_rtx (DImode); right
> before its use to be consistent with previous move request. Fixed tabs
> in comment. Remove -std=c99 from test_fpscr_rn_builtin_1.c. Cleaned up
> comment and removed abort from test_fpscr_rn_builtin_2.c.
>
> Fixed a couple of additional issues with the ChangeLog per feedback
> from git gcc-verify.
>
> Retested updated patch on Power 8, 9 and 10 to verify changes.
>
> Ver 3, Renamed the patch per comments on ver 2. Previous subject line
> was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value".
> Fixed spelling mistakes and formatting. Updated define_expand
> "rs6000_set_fpscr_rn to have the rs6000_get_fpscr_fields and
> rs6000_update_fpscr_rn_field define expands inlined. Optimized the
> code and fixed use of temporary register values. Updated the test file
> dg-do run arguments and dg-options. Removed the check for
> __SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the
> overloaded built-in with double argument. Fixed up the documentation
> file. Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power
> 10 LE.
>
> Ver 2, Went back thru the requirements and emails. Not sure where I
> came up with the requirement for an overloaded version with double
> argument. Removed the overloaded version with the double argument.
> Added the macro to announce if the __builtin_set_fpscr_rn returns a
> void or a double with the FPSCR bits. Updated the documentation file.
> Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE. Redid the test
> file. Per request, the original test file functionality was not
> changed. Just changed the name from test_fpscr_rn_builtin.c to
> test_fpscr_rn_builtin_1.c. Put new tests for the return values into a
> new test file, test_fpscr_rn_builtin_2.c.
>
> The GLibC team requested a builtin to replace the mffscrn and
> mffscrniinline asm instructions in the GLibC code. Previously there
> was discussion on adding builtins for the mffscrn instructions.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html
>
> In the end, it was felt that it would be to extend the existing
> __builtin_set_fpscr_rn builtin to return a double instead of a void
> type. The desire is that we could have the functionality of the
> mffscrn and mffscrni instructions on older ISAs. The two instructions
> were initially added in ISA 3.0. The __builtin_set_fpscr_rn has the
> needed functionality to set the RN field using the mffscrn and mffscrni
> instructions if ISA 3.0 is supported or fall back to using logical
> instructions to mask and set the bits for earlier ISAs. The
> instructions return the current value of the FPSCR fields DRN, VE, OE,
> UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
> the new RN value provided.
>
> The current __builtin_set_fpscr_rn builtin has a return type of void.
> So, changing the return type to double and returning the FPSCR fields
> DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
> functionally equivalent of the mffscrn and mffscrni instructions. Any
> current uses of the builtin would just ignore the return value yet any
> new uses could use the return value. So the requirement is for the
> change to the __builtin_set_fpscr_rn builtin to be backwardly
> compatible and work for all ISAs.
>
> The following patch changes the return type of the
> __builtin_set_fpscr_rn builtin from void to double. The return value
> is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
> XE, NI, RN bit positions when the builtin is called. The builtin then
> updated the RN field with the new value provided as an argument to the
> builtin. The patch adds new testcases to test_fpscr_rn_builtin.c to
> check that the builtin returns the current value of the FPSCR fields
> and then updates the RN field.
>
> The GLibC team has reviewed the patch to make sure it met their needs
> as a drop in replacement for the inline asm mffscr and mffscrni
> statements in the GLibC code. T
>
> The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> LE.
>
> Please let me know if the patch is acceptable for mainline. Thanks.
>
> Carl
>
> -----------------------------------------
> rs6000, Add return value to __builtin_set_fpscr_rn
>
> Change the return value from void to double for __builtin_set_fpscr_rn.
> The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI,
> RN bit positions. A new test file, test powerpc/test_fpscr_rn_builtin_2.c,
> is added to test the new return value for the built-in.
>
> The value __SET_FPSCR_RN_RETURNS_FPSCR__ is defined if
> __builtin_set_fpscr_rn returns a double.
>
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
> built-in definition return type.
> * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check,
> define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> * config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Add return
> argument to return FPSCR fields.
> * doc/extend.texi (__builtin_set_fpscr_rn): Update description for
> the return value. Add description for
> __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/test_fpscr_rn_builtin.c: Rename to
> test_fpscr_rn_builtin_1.c. Add comment.
> * gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
> return value of __builtin_set_fpscr_rn builtin.
> ---
> gcc/config/rs6000/rs6000-builtins.def | 2 +-
> gcc/config/rs6000/rs6000-c.cc | 5 +
> gcc/config/rs6000/rs6000.md | 55 ++++---
> gcc/doc/extend.texi | 23 ++-
> ...rn_builtin.c => test_fpscr_rn_builtin_1.c} | 12 +-
> .../powerpc/test_fpscr_rn_builtin_2.c | 148 ++++++++++++++++++
> 6 files changed, 213 insertions(+), 32 deletions(-)
> rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c =>
> test_fpscr_rn_builtin_1.c} (89%)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def
> b/gcc/config/rs6000/rs6000-builtins.def
> index 289a37998b1..28788b69c7d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -237,7 +237,7 @@
> const __ibm128 __builtin_pack_ibm128 (double, double);
> PACK_IF packif {ibm128}
>
> - void __builtin_set_fpscr_rn (const int[0,3]);
> + double __builtin_set_fpscr_rn (const int[0,3]);
> SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
>
> const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 8555174d36e..07a32badb54 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -604,6 +604,11 @@ rs6000_target_modify_macros (bool define_p,
> HOST_WIDE_INT flags)
> /* Tell the user -mrop-protect is in play. */
> if (rs6000_rop_protect)
> rs6000_define_or_undefine_macro (define_p, "__ROP_PROTECT__");
> + /* Tell the user __builtin_set_fpscr_rn now returns the FPSCR fields
> + in a double. Originally the built-in returned void. */
> + if ((flags & OPTION_MASK_SOFT_FLOAT) == 0)
> + rs6000_define_or_undefine_macro (define_p,
> + "__SET_FPSCR_RN_RETURNS_FPSCR__");
> }
>
> void
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b0db8ae508d..641ab2dfcfe 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6441,7 +6441,8 @@ (define_insn "rs6000_mffscdrn"
> [(set_attr "type" "fp")])
>
> (define_expand "rs6000_set_fpscr_rn"
> - [(match_operand:SI 0 "reg_or_cint_operand")]
> + [(use (match_operand:DF 0 "gpc_reg_operand"))
> + (use (match_operand:SI 1 "reg_or_cint_operand"))]
> "TARGET_HARD_FLOAT"
> {
> rtx tmp_df = gen_reg_rtx (DFmode);
> @@ -6450,49 +6451,63 @@ (define_expand "rs6000_set_fpscr_rn"
> new rounding mode bits from operands[0][62:63] into FPSCR[62:63]. */
> if (TARGET_P9_MISC)
> {
> - if (const_0_to_3_operand (operands[0], VOIDmode))
> - emit_insn (gen_rs6000_mffscrni (tmp_df, operands[0]));
> + if (const_0_to_3_operand (operands[1], VOIDmode))
> + emit_insn (gen_rs6000_mffscrni (tmp_df, operands[1]));
> else
> {
> - rtx op0 = convert_to_mode (DImode, operands[0], false);
> - rtx src_df = simplify_gen_subreg (DFmode, op0, DImode, 0);
> + rtx op1 = convert_to_mode (DImode, operands[1], false);
> + rtx src_df = simplify_gen_subreg (DFmode, op1, DImode, 0);
> emit_insn (gen_rs6000_mffscrn (tmp_df, src_df));
> }
> - DONE;
> + emit_move_insn (operands[0], tmp_df);
> + DONE;
> }
>
> - if (CONST_INT_P (operands[0]))
> + /* Emulate the behavior of the mffscrni, mffscrn instructions for earlier
> + ISAs. Return bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI,
> + RN) from the FPSCR. Set the RN field based on the value in operands[1].
> + */
> +
> + /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63 (VE, OE,
> UE,
> + ZE, XE, NI, RN) from the FPSCR and return them. */
> +
> + emit_insn (gen_rs6000_mffs (tmp_df));
> + rtx orig_df_in_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> + rtx tmp_di1 = gen_reg_rtx (DImode);
> + emit_insn (gen_anddi3 (tmp_di1, orig_df_in_di,
> + GEN_INT (0x00000007000000FFULL)));
Nit: The line with GEN_INT should indent more to tmp_di1, that is:
emit_insn (gen_anddi3 (tmp_di1, orig_df_in_di,
GEN_INT (0x00000007000000FFULL)));
> + rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0);
> + emit_move_insn (operands[0], tmp_rtn);
> +
> + if (CONST_INT_P (operands[1]))
> {
> - if ((INTVAL (operands[0]) & 0x1) == 0x1)
> + if ((INTVAL (operands[1]) & 0x1) == 0x1)
> emit_insn (gen_rs6000_mtfsb1 (GEN_INT (31)));
> else
> emit_insn (gen_rs6000_mtfsb0 (GEN_INT (31)));
>
> - if ((INTVAL (operands[0]) & 0x2) == 0x2)
> + if ((INTVAL (operands[1]) & 0x2) == 0x2)
> emit_insn (gen_rs6000_mtfsb1 (GEN_INT (30)));
> else
> emit_insn (gen_rs6000_mtfsb0 (GEN_INT (30)));
> }
> else
> {
> - rtx tmp_rn = gen_reg_rtx (DImode);
> - rtx tmp_di = gen_reg_rtx (DImode);
> -
> /* Extract new RN mode from operand. */
> - rtx op0 = convert_to_mode (DImode, operands[0], false);
> - emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3)));
> + rtx op1 = convert_to_mode (DImode, operands[1], false);
> + rtx tmp_rn = gen_reg_rtx (DImode);
> + emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3)));
>
> - /* Insert new RN mode into FSCPR. */
> - emit_insn (gen_rs6000_mffs (tmp_df));
> - tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> - emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4)));
> - emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
> + /* Insert the new RN value from tmp_rn into FPSCR bit [62:63]. */
> + emit_insn (gen_anddi3 (tmp_di1, orig_df_in_di, GEN_INT (-4)));
Nit: This tmp_di1 has been used as destination pseudo above, as Peter's
comments, it's better to use one new pseudo here.
OK for trunk with these nits fixed and bootstrapped & regress-tested
as before. Thanks!
Kewen