On Fri, Jan 31, 2025 at 08:04:53AM +0100, Richard Biener wrote:
> On Fri, Jan 31, 2025 at 3:55 AM Michael Meissner <[email protected]>
> wrote:
> >
> > Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.
> >
> > In bug PR target/118541 on power9, power10, and power11 systems, for the
> > function:
> >
> > extern double __ieee754_acos (double);
> >
> > double
> > __acospi (double x)
> > {
> > double ret = __ieee754_acos (x) / 3.14;
> > return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
> > }
> >
> > GCC currently generates the following code:
> >
> > Power9 Power10 and Power11
> > ====== ===================
> > bl __ieee754_acos bl __ieee754_acos@notoc
> > nop plfd 0,.LC0@pcrel
> > addis 9,2,.LC2@toc@ha xxspltidp 12,1065353216
> > addi 1,1,32 addi 1,1,32
> > lfd 0,.LC2@toc@l(9) ld 0,16(1)
> > addis 9,2,.LC0@toc@ha fdiv 0,1,0
> > ld 0,16(1) mtlr 0
> > lfd 12,.LC0@toc@l(9) xscmpgtdp 1,0,12
> > fdiv 0,1,0 xxsel 1,0,12,1
> > mtlr 0 blr
> > xscmpgtdp 1,0,12
> > xxsel 1,0,12,1
> > blr
> >
> > This is because ifcvt.c optimizes the conditional floating point move to
> > use the
> > XSCMPGTDP instruction.
> >
> > However, the XSCMPGTDP instruction will generate an interrupt if one of the
> > arguments is a signalling NaN and signalling NaNs can generate an interrupt.
> > The IEEE comparison functions (isgreater, etc.) require that the comparison
> > not
> > raise an interrupt.
> >
> > The following patch changes the PowerPC back end so that ifcvt.c will not
> > change
> > the if/then test and move into a conditional move if the comparison is one
> > of
> > the comparisons that do not raise an error with signalling NaNs and -Ofast
> > is
> > not used. If a normal comparison is used or -Ofast is used, GCC will
> > continue
> > to generate XSCMPGTDP and XXSEL.
> >
> > For the following code:
> >
> > double
> > ordered_compare (double a, double b, double c, double d)
> > {
> > return __builtin_isgreater (a, b) ? c : d;
> > }
> >
> > /* Verify normal > does generate xscmpgtdp. */
> >
> > double
> > normal_compare (double a, double b, double c, double d)
> > {
> > return a > b ? c : d;
> > }
> >
> > with the following patch, GCC generates the following for power9, power10,
> > and
> > power11:
> >
> > ordered_compare:
> > fcmpu 0,1,2
> > fmr 1,4
> > bnglr 0
> > fmr 1,3
> > blr
> >
> > normal_compare:
> > xscmpgtdp 1,1,2
> > xxsel 1,4,3,1
> > blr
> >
> > I have built bootstrap compilers on big endian power9 systems and little
> > endian
> > power9/power10 systems and there were no regressions. Can I check this
> > patch
> > into the GCC trunk, and after a waiting period, can I check this into the
> > active
> > older branches?
> >
> > 2025-01-30 Michael Meissner <[email protected]>
> >
> > gcc/
> >
> > PR target/118541
> > * config/rs6000/rs6000-protos.h (REVERSE_COND_ORDERED_OK): New
> > macro.
> > (REVERSE_COND_NO_ORDERED): Likewise.
> > (rs6000_reverse_condition): Add argument.
> > * config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
> > ordered comparisons to be reversed for floating point cmoves.
> > (rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
> > * config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
> > * config/rs6000/rs6000.md (reverse_branch_comparison): Name insn.
> > Adjust rs6000_reverse_condition call.
> >
> > gcc/testsuite/
> >
> > PR target/118541
> > * gcc.target/powerpc/pr118541.c: New test.
> > ---
> > gcc/config/rs6000/rs6000-protos.h | 6 ++-
> > gcc/config/rs6000/rs6000.cc | 23 ++++++++---
> > gcc/config/rs6000/rs6000.h | 10 ++++-
> > gcc/config/rs6000/rs6000.md | 24 +++++++-----
> > gcc/testsuite/gcc.target/powerpc/pr118541.c | 43 +++++++++++++++++++++
> > 5 files changed, 89 insertions(+), 17 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541.c
> >
> > diff --git a/gcc/config/rs6000/rs6000-protos.h
> > b/gcc/config/rs6000/rs6000-protos.h
> > index 4619142d197..112332660d3 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -114,8 +114,12 @@ extern const char *rs6000_sibcall_template (rtx *,
> > unsigned int);
> > extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
> > extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
> > extern const char *rs6000_pltseq_template (rtx *, int);
> > +
> > +#define REVERSE_COND_ORDERED_OK false
> > +#define REVERSE_COND_NO_ORDERED true
> > +
> > extern enum rtx_code rs6000_reverse_condition (machine_mode,
> > - enum rtx_code);
> > + enum rtx_code, bool);
> > extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
> > extern rtx rs6000_emit_fp_cror (rtx_code, machine_mode, rtx);
> > extern void rs6000_emit_sCOND (machine_mode, rtx[]);
> > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> > index f9f9a0b931d..eaf79435ec3 100644
> > --- a/gcc/config/rs6000/rs6000.cc
> > +++ b/gcc/config/rs6000/rs6000.cc
> > @@ -15360,15 +15360,25 @@ rs6000_print_patchable_function_entry (FILE *file,
> > }
> >
> > enum rtx_code
> > -rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
> > +rs6000_reverse_condition (machine_mode mode,
> > + enum rtx_code code,
> > + bool no_ordered)
> > {
> > /* Reversal of FP compares takes care -- an ordered compare
> > - becomes an unordered compare and vice versa. */
> > + becomes an unordered compare and vice versa.
> > +
> > + However, this is not safe for ordered comparisons (i.e. for isgreater,
> > + etc.) starting with the power9 because ifcvt.cc will want to create
> > a fp
> > + cmove, and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if
> > one of
> > + the arguments is a signalling NaN. */
> > +
> > if (mode == CCFPmode
> > && (!flag_finite_math_only
>
> I fail to see where -Ofast is allowed, but only see the pre-existing
> flag check above
> which checks for no NaN/Inf rather than sNaN - the latter would be checked
> with
> HONOR_SNANS (mode).
The !flag_finite_math_only will allow the optimization to be compiled, but
using HONOR_NANS is probably a better way. I'll rewrite the test. Thanks.
For:
double
ordered_compare (double a, double b, double c, double d)
{
return __builtin_isgreater (a, b) ? c : d;
}
Here is what -Ofast generates:
xscmpgedp 1,2,1
xxsel 1,3,4,1
Here is what -O2 generates:
fcmpu 0,1,2
fmr 1,4
bnglr 0
fmr 1,3
blr
--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: [email protected]