[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-28 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #33 from H.J. Lu  ---
(In reply to Uroš Bizjak from comment #20)
> (In reply to Jakub Jelinek from comment #16)
> 
> > More questionable is the #Z case, where Table 8-11 just talks about
> >   Condition:       Divide or reverse divide operation with a 0 divisor.
> >   Masked Response: Returns an ∞ signed with the exclusive OR of the sign
> >                    of the two operands to the destination operand.
> > but FPREM does division too, so I hope it is covered too (but not listed
> > explicitly).
> 
> FYI, the table 3-30 (and 3-31) is wrong. Executing fprem when st(0) == 1.0
> and st(1) == 0.0 results in IA exception, not Z exception.

Thanks for bringing it up.  It will be fixed in the next revision for Intel
SDM.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Uroš Bizjak  changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED

--- Comment #32 from Uroš Bizjak  ---
Fixed by reverting g:4f2611b6e872.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #31 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:8020c9c42349f51f75239b9d35a2be41848a97bd

commit r13-6361-g8020c9c42349f51f75239b9d35a2be41848a97bd
Author: Uros Bizjak 
Date:   Mon Feb 27 22:10:01 2023 +0100

i386: Do not constrain fmod and remainder patterns with
flag_finite_math_only [PR108922]

According to Intel ISA manual, fprem and fprem1 return NaN when invalid
arithmetic exception is generated. This is documented in Table 8-10 of the
ISA manual and makes these two instructions fully IEEE compatible.

The reverted patch was based on the data from table 3-30 and 3-31 of the
Intel ISA manual, where results in case of st(0) being infinity or
st(1) being 0 are not specified.

2023-02-27  Uroš Bizjak  

gcc/ChangeLog:

PR target/108922
Revert:
* config/i386/i386.md (fmodxf3): Enable for flag_finite_math_only
only.
(fmod3): Ditto.
(fpremxf4_i387): Ditto.
(reminderxf3): Ditto.
(reminder3): Ditto.
(fprem1xf4_i387): Ditto.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #30 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #29)
> Note, fmod_optab is only used on i?86 (where because of the commit mentioned
> here it was limited to finite math only) and rs6000 (which guards it on
> unsafe math optimizations), so both in the fast-math related area only.
> Therefore it might be very well possible it got broken because of those
> changes without anyone noticing.  Most of the builtins for which ranges are
> tested are single operand and pow which has 2 has special handling...

Looking at r6-4983-g883cabdecdb052865f, fmod is handled here:

+/* Return true if CALL can produce a domain error (EDOM) but can never
+   produce a pole, range overflow or range underflow error (all ERANGE).
+   This means that we can tell whether a function would have set errno
+   by testing whether the result is a NaN.  */
+
+static bool
+edom_only_function (gcall *call)
+{
+  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (call)))
+    {
+    CASE_FLT_FN (BUILT_IN_ACOS):
+    CASE_FLT_FN (BUILT_IN_ASIN):
+    CASE_FLT_FN (BUILT_IN_ATAN):
+    CASE_FLT_FN (BUILT_IN_COS):
+    CASE_FLT_FN (BUILT_IN_SIGNIFICAND):
+    CASE_FLT_FN (BUILT_IN_SIN):
+    CASE_FLT_FN (BUILT_IN_SQRT):
+    CASE_FLT_FN (BUILT_IN_FMOD):
+    CASE_FLT_FN (BUILT_IN_REMAINDER):
+      return true;
+
+    default:
+      return false;
+    }
+}

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #29 from Jakub Jelinek  ---
Note, fmod_optab is only used on i?86 (where because of the commit mentioned
here it was limited to finite math only) and rs6000 (which guards it on unsafe
math optimizations), so both in the fast-math related area only.
Therefore it might be very well possible it got broken because of those changes
without anyone noticing.  Most of the builtins for which ranges are tested are
single operand and pow which has 2 has special handling...

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Uroš Bizjak  changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
   Target Milestone|---                           |13.0
             Status|WAITING                       |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com

--- Comment #28 from Uroš Bizjak  ---
I think that we cleared all questions here. I'll prepare the revert later
today.

On a related note, it would be nice if Intel corrected Tables 3-30 and 3-31
w.r.t. the Z exception.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Jakub Jelinek  changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |rsandifo at gcc dot gnu.org

--- Comment #27 from Jakub Jelinek  ---
CCing Richard S. who has removed expand_errno_check in
r6-4983-g883cabdecdb052865f.
From what I can see, can_test_argument_range doesn't handle FMOD (could it test
for x infinite or y zero?), edom_only_function does though.
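
The argument-range test suggested in the parenthesis could look roughly like the sketch below. It is an assumption about what such a check would compute, not code from GCC; the name fmod_args_are_domain_error is hypothetical. It mirrors the condition glibc's wrapper uses (x infinite or y zero, with neither argument a NaN).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* True when fmod (x, y) is a domain error: x infinite or y zero,
   with neither argument a NaN (NaN inputs propagate without EDOM).  */
static bool
fmod_args_are_domain_error (double x, double y)
{
  if (isnan (x) || isnan (y))
    return false;
  return isinf (x) || y == 0.0;
}
```

A caller could evaluate this predicate before the operation and take the library path only when it is true, instead of inspecting the result afterwards.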

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #26 from Uroš Bizjak  ---
(In reply to Jan Kratochvil from comment #23)
> Created attachment 54542 [details]
> fmoderrno.cpp
> 
> (In reply to Uroš Bizjak from comment #21)
> > When g:93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 is reverted, current
> > mainline does not emit anything that would handle errno (even with
> > -fmath-errno flag explicitly set at command line).
> 
> With 4341106354c6a463ce3628a4ef9c1a1d37193b59 (=2023-02-25),
> 93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 reverted and fmoderrno.cpp I get:
> fmod(1.0D, 0.0D)
> g++ -o fmoderrno fmoderrno.C -O3 -Wall; ./fmoderrno
> -nan errno=33=Numerical argument out of domain

Ah, the compilation is different if the compiler finds "errno" mentioned in the
source.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #25 from Jakub Jelinek  ---
Note, the 215740 change has been backported to 4.8 branch in r215773 (I've been
wondering why I can't reproduce it on 4.8; and also to 4.9 branch).
Anyway, in 4.7 I see fmodl being called in the *.optimized dump, and
expand_builtin_mathfn_2 used to add the expand_errno_check.
I bet starting with GCC 6 fmod etc. are handled through internal functions
instead and maybe the errno stuff in there is missing.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread jkratochvil at azul dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #24 from Jan Kratochvil  ---
(In reply to Alexander Monakov from comment #22)
> Strange, comment #8 claims the opposite (unless Jan tested the revert not on
> trunk, but on some branch).

The testsuite ran on 4341106354c6a463ce3628a4ef9c1a1d37193b59 (=2023-02-25),
93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 reverted.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread jkratochvil at azul dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #23 from Jan Kratochvil  ---
Created attachment 54542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54542&action=edit
fmoderrno.cpp

(In reply to Uroš Bizjak from comment #21)
> When g:93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 is reverted, current
> mainline does not emit anything that would handle errno (even with
> -fmath-errno flag explicitly set at command line).

With 4341106354c6a463ce3628a4ef9c1a1d37193b59 (=2023-02-25),
93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 reverted and fmoderrno.cpp I get:
fmod(1.0D, 0.0D)
g++ -o fmoderrno fmoderrno.C -O3 -Wall; ./fmoderrno
-nan errno=33=Numerical argument out of domain

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #22 from Alexander Monakov  ---
Strange, comment #8 claims the opposite (unless Jan tested the revert not on
trunk, but on some branch).

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #21 from Uroš Bizjak  ---
(In reply to Alexander Monakov from comment #19)
> I get the feeling that you're ignoring me, but gcc-4.8.3 was already
> emitting a helper fmod call for setting errno without any flag_errno_math
> checks in i386.md, i.e. it was already in the middle-end. As was mentioned
> in comment #9.

When g:93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 is reverted, current mainline
does not emit anything that would handle errno (even with -fmath-errno flag
explicitly set at command line).

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #20 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #16)

> More questionable is the #Z case, where Table 8-11 just talks about
>   Condition:       Divide or reverse divide operation with a 0 divisor.
>   Masked Response: Returns an ∞ signed with the exclusive OR of the sign
>                    of the two operands to the destination operand.
> but FPREM does division too, so I hope it is covered too (but not listed
> explicitly).

FYI, the table 3-30 (and 3-31) is wrong. Executing fprem when st(0) == 1.0 and
st(1) == 0.0 results in IA exception, not Z exception.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #19 from Alexander Monakov  ---
I get the feeling that you're ignoring me, but gcc-4.8.3 was already emitting a
helper fmod call for setting errno without any flag_errno_math checks in
i386.md, i.e. it was already in the middle-end. As was mentioned in comment #9.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #18 from Jakub Jelinek  ---
(In reply to Uroš Bizjak from comment #17)
> So, based on the above finding, should insn condition be changed to
> !flag_errno_math?

I'd say that it shouldn't be the business of backends to check flag_errno_math;
that should be done in the middle-end.  The middle-end can either ignore the
fmod optab in that case (but aren't hypot and others a similar case?), or it
could do the sqrt trick: use the optab inline even for flag_errno_math, then
use a comparison to detect whether it is one of the exceptional cases and call
the library function only in that case.

Of course, that is probably GCC 14 material and so a hack on the backend side
would be acceptable too.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #17 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #16)
> Doesn't the SDM guarantee the right behavior though?

Indeed, this is what is missing from Table 3-31.

> It is true that the FPREM results table says * and ** in certain spots
> (Table 3-31 in my copy), but then in the Invalid Arithmetic Operand
> Exception (#IA) chapter (8.5.1.2 for me) I see Table 8-10 Invalid Arithmetic
> Operations and the Masked Responses to Them
> and in there:
>   Condition:       Remainder instructions FPREM, FPREM1: modulus (divisor)
>                    is 0 or dividend is ∞.
>   Masked Response: Return the QNaN floating-point indefinite; clear
>                    condition code flag C2 to 0.
> More questionable is the #Z case, where Table 8-11 just talks about
>   Condition:       Divide or reverse divide operation with a 0 divisor.
>   Masked Response: Returns an ∞ signed with the exclusive OR of the sign
>                    of the two operands to the destination operand.
> but FPREM does division too, so I hope it is covered too (but not listed
> explicitly).

Table C-2 says that FPREM{,1} do not generate #Z exception.

So, based on the above finding, should insn condition be changed to
!flag_errno_math?

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #16 from Jakub Jelinek  ---
Doesn't the SDM guarantee the right behavior though?
It is true that the FPREM results table says * and ** in certain spots (Table
3-31 in my copy), but then in the Invalid Arithmetic Operand Exception (#IA)
chapter (8.5.1.2 for me) I see Table 8-10 Invalid Arithmetic Operations and the
Masked Responses to Them
and in there:
  Condition:       Remainder instructions FPREM, FPREM1: modulus (divisor)
                   is 0 or dividend is ∞.
  Masked Response: Return the QNaN floating-point indefinite; clear
                   condition code flag C2 to 0.
More questionable is the #Z case, where Table 8-11 just talks about
  Condition:       Divide or reverse divide operation with a 0 divisor.
  Masked Response: Returns an ∞ signed with the exclusive OR of the sign
                   of the two operands to the destination operand.
but FPREM does division too, so I hope it is covered too (but not listed
explicitly).

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #15 from Alexander Monakov  ---
That is the fancy-error-handling path that is reached under _LIB_VERSION !=
_IEEE_. Before glibc-2.27, linking with -lieee would set _LIB_VERSION = _IEEE_,
and then glibc would use the fprem[1] instruction without any special-casing.

musl libc does not implement errno setting for math functions, and always uses
fprem directly; likewise for Apple libm:

https://github.com/apple-oss-distributions/Libm/blob/17a5f9daa3f5679f7536b26f133b40cc078753c3/Source/Intel/fmod.s

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #14 from Uroš Bizjak  ---
(In reply to Jan Kratochvil from comment #13)
> The question is whether gcc can rely on the undocumented Intel behavior as
> described in Comment 7. glibc already relies on it anyway.

I don't think this is true, please see Comment #11.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread jkratochvil at azul dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #13 from Jan Kratochvil  ---
(In reply to Uroš Bizjak from comment #12)
> (In reply to Jan Kratochvil from comment #8)
> 
> > The revert makes it 13x faster. But the produced code still falls back to
> > calling glibc fmod() as shown in the disassembly in Comment 0.
> > If I use the "fprem" instruction directly it gets 15x faster - but I did not
> > figure out some (easy) way for me how to patch GCC to no longer produce the
> > call to fmod() at all and produce only the "fprem" instruction.
> 
> Use -ffinite-math-only option:
> 
> -ffinite-math-only
>Allow optimizations for floating-point arithmetic that assume that
> arguments and results are not NaNs or +-Infs.

That works for the Comment 0 reproducer, but I consider -ffinite-math-only
incorrect for the OpenJDK codebase as a whole because of its other
calculations: Java code is documented to support infinite values, so the option
could produce invalid results.

To fix the performance fully (no "call fmod" at all) I find it better to use
-fno-math-errno. Nothing in OpenJDK should rely on errno from math operations.
But that option still requires reverting your patch.

The question is whether gcc can rely on the undocumented Intel behavior as
described in Comment 7. glibc already relies on it anyway.

I have submitted this revert proposal only for the benefit of GCC. Neither I
nor my employer depend on it, as I have already submitted a fix for OpenJDK
using an asm "fprem" expression. Relying on a fix in GCC would not be
acceptable for OpenJDK, as it is still going to be built with old/existing
OSes/compilers for years: https://github.com/openjdk/jdk/pull/12508/files

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #12 from Uroš Bizjak  ---
(In reply to Jan Kratochvil from comment #8)

> The revert makes it 13x faster. But the produced code still falls back to
> calling glibc fmod() as shown in the disassembly in Comment 0.
> If I use the "fprem" instruction directly it gets 15x faster - but I did not
> figure out some (easy) way for me how to patch GCC to no longer produce the
> call to fmod() at all and produce only the "fprem" instruction.

Use -ffinite-math-only option:

-ffinite-math-only
   Allow optimizations for floating-point arithmetic that assume that arguments
and results are not NaNs or +-Infs.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #11 from Uroš Bizjak  ---
(In reply to Jan Kratochvil from comment #8)

> It is true replacing fmod() with fmodl() makes it 5x faster (but only 5x).
> There is still some infinity check and I haven't found any real
> justification in glibc sources for it:
>   if (__builtin_expect (isinf (x) || y == 0.0L, 0)
>       && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
>     /* fmod(+-Inf,y) or fmod(x,0) */
>     return __kernel_standard_l (x, y, 227);

Using the following test:

--cut here--
#include <stdio.h>
#include <math.h>

long double
__attribute__((noinline))
test (long double x, long double y)
{
  return fmodl (x, y);
}

int
main ()
{
  long double x = INFINITY, y = 1.0;

  printf ("%Lf\n", test (x, y));
  return 0;
}
--cut here--

execution ends in:

case 227:
  /* fmod(x,0) */
  exc.type = DOMAIN;
  exc.name = CSTR ("fmod");
  if (_LIB_VERSION == _SVID_)
    exc.retval = x;
  else
    exc.retval = zero/zero;
  if (_LIB_VERSION == _POSIX_)
    __set_errno (EDOM);
  else if (!matherr(&exc)) {
    if (_LIB_VERSION == _SVID_) {
      (void) WRITE2("fmod:  DOMAIN error\n", 20);
    }
    __set_errno (EDOM);
  }
  break;

So, it doesn't execute fprem, but returns early with NaN.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread jkratochvil at azul dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #10 from Jan Kratochvil  ---
(In reply to Alexander Monakov from comment #9)
> You just need to pass -fno-math-errno (the call is for setting errno,
> similar to how gcc emits the sqrt() sequence).

True, thanks.


So I think the patch should be reverted, right? I expect the revert should have
a testcase nowadays.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #9 from Alexander Monakov  ---
(In reply to Jan Kratochvil from comment #8)
> The revert makes it 13x faster. But the produced code still falls back to
> calling glibc fmod() as shown in the disassembly in Comment 0.
> If I use the "fprem" instruction directly it gets 15x faster - but I did not
> figure out some (easy) way for me how to patch GCC to no longer produce the
> call to fmod() at all and produce only the "fprem" instruction.

You just need to pass -fno-math-errno (the call is for setting errno, similar
to how gcc emits the sqrt() sequence).


> (In reply to Alexander Monakov from comment #4)
> > Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64,
> 
> It is true replacing fmod() with fmodl() makes it 5x faster (but only 5x).
> There is still some infinity check and I haven't found any real
> justification in glibc sources for it:
>   if (__builtin_expect (isinf (x) || y == 0.0L, 0)
>       && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
>     /* fmod(+-Inf,y) or fmod(x,0) */
>     return __kernel_standard_l (x, y, 227);

This is for legacy/fancy error handling beyond setting IEEE exception flags.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread jkratochvil at azul dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #8 from Jan Kratochvil  ---
(In reply to Andrew Pinski from comment #2)
> So the simple test is to run the full GCC bootstrap/test with all languages
> and check if the testcase fails or not. I suspect it will.

It does not. Tested on Fedora 36 x86-64.

I did test only a revert of:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098

The revert makes it 13x faster. But the produced code still falls back to
calling glibc fmod() as shown in the disassembly in Comment 0.
If I use the "fprem" instruction directly it gets 15x faster - but I did not
figure out some (easy) way for me how to patch GCC to no longer produce the
call to fmod() at all and produce only the "fprem" instruction.

(In reply to Alexander Monakov from comment #4)
> Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64,

It is true replacing fmod() with fmodl() makes it 5x faster (but only 5x).
There is still some infinity check and I haven't found any real justification
in glibc sources for it:
  if (__builtin_expect (isinf (x) || y == 0.0L, 0)
      && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
    /* fmod(+-Inf,y) or fmod(x,0) */
    return __kernel_standard_l (x, y, 227);

> The ieee_2.f90 testcase attempts to change rounding mode. In 2014 it
> probably just was "miscompiled".

The testsuite run did include "gfortran.dg/ieee/ieee_2.f90" and it has no
regression.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #7 from Alexander Monakov  ---
I saw that. That's why I'm pointing out that Glibc (and musl) use the
instruction without any additional checks: real CPUs produce the expected
result in st(0), despite the documentation making no promise about the content
of st(0).

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #6 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Alexander Monakov from comment #3)
> > I guess Uros' claim was based on what Intel and AMD manuals specify rather
> > than observed behavior of CPUs.
> 
> As a "committer", I really don't remember the reason to disable the
> patterns, but there is some analysis in the corresponding e-mail.

Please see Table 3-31 (and Table 3-32) in SDM [1]. If 'x' (AKA st(0)) is
infinity, no return is specified, since invalid arith operand exception is
generated.

In the above case, the SDM declares output as *undefined*, but c99 specifies
NaN.

[1]
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #5 from Uroš Bizjak  ---
(In reply to Alexander Monakov from comment #3)
> I guess Uros' claim was based on what Intel and AMD manuals specify rather
> than observed behavior of CPUs.

As a "committer", I really don't remember the reason to disable the patterns,
but there is some analysis in the corresponding e-mail.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #4 from Alexander Monakov  ---
Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64, as well as
for {fmod,remainder,remquo}{,f,l} on i386 without any branches for corner
cases. So in practice CPUs apparently implement the expected behavior even
though the manual doesn't promise so.

The ieee_2.f90 testcase attempts to change rounding mode. In 2014 it probably
just was "miscompiled".

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Alexander Monakov  changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
> But my attached testcase tests that all the corner cases do have both
> the same result value and the same exceptions generated.

It seems you forgot to attach that testcase (bench.cpp does not cover corner
cases).

I guess Uros' claim was based on what Intel and AMD manuals specify rather than
observed behavior of CPUs.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

Andrew Pinski  changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2023-02-25
             Status|UNCONFIRMED                   |WAITING

--- Comment #2 from Andrew Pinski  ---
So the simple test is to run the full GCC bootstrap/test with all languages and
check if the testcase fails or not. I suspect it will.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #1 from Andrew Pinski  ---
> The committer also claims "fixes ieee_2.f90 testsuite failure" but I have no
> idea where to find this testsuite.


./testsuite/gfortran.dg/ieee/ieee_2.f90