[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #24 from Jan Kratochvil ---
(In reply to Alexander Monakov from comment #22)
> Strange, comment #8 claims the opposite (unless Jan tested the revert not on
> trunk, but on some branch).

The testsuite ran on 4341106354c6a463ce3628a4ef9c1a1d37193b59 (=2023-02-25),
93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 reverted.
[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #23 from Jan Kratochvil ---
Created attachment 54542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54542=edit
fmoderrno.cpp

(In reply to Uroš Bizjak from comment #21)
> When g:93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 is reverted, current
> mainline does not emit anything that would handle errno (even with
> -fmath-errno flag explicitly set at command line).

With 4341106354c6a463ce3628a4ef9c1a1d37193b59 (=2023-02-25),
93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 reverted and fmoderrno.cpp I get:

fmod(1.0D, 0.0D)
g++ -o fmoderrno fmoderrno.C -O3 -Wall; ./fmoderrno
-nan
errno=33=Numerical argument out of domain
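
The attachment itself is not reproduced in this thread. As a rough illustration of the kind of check described above, the following is a minimal sketch only; the names FmodErrno, fmod_with_errno and describe are hypothetical and are not taken from fmoderrno.cpp. It assumes a libm that reports math errors through errno (as glibc does) and uses volatile copies so that the call is not folded at compile time.

```cpp
#include <cerrno>
#include <cmath>
#include <cstring>
#include <string>

// Hypothetical helper type: the fmod() result plus the errno value it left behind.
struct FmodErrno {
    double value;
    int err;
};

// Calls fmod() and captures errno right after the call.
FmodErrno fmod_with_errno(double x, double y) {
    // volatile copies: keep the compiler from evaluating the call at compile time
    volatile double vx = x;
    volatile double vy = y;
    errno = 0;
    double r = std::fmod(vx, vy);
    int saved = errno;  // read errno before anything else can overwrite it
    return {r, saved};
}

// Formats the outcome as "<value> errno=<number>=<message>".
std::string describe(const FmodErrno& r) {
    return std::to_string(r.value) + " errno=" + std::to_string(r.err) +
           "=" + std::strerror(r.err);
}
```

Whether errno is set at all depends on the C library and on flags such as -fno-math-errno, so a caller should consult math_errhandling before relying on it.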
[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #13 from Jan Kratochvil ---
(In reply to Uroš Bizjak from comment #12)
> (In reply to Jan Kratochvil from comment #8)
>
> > The revert makes it 13x faster. But the produced code still falls back to
> > calling glibc fmod() as shown in the disassembly in Comment 0.
> > If I use the "fprem" instruction directly it gets 15x faster - but I did not
> > figure out some (easy) way for me how to patch GCC to no longer produce the
> > call to fmod() at all and produce only the "fprem" instruction.
>
> Use -ffinite-math-only option:
>
> -ffinite-math-only
>     Allow optimizations for floating-point arithmetic that assume that
>     arguments and results are not NaNs or +-Infs.

That works for the Comment 0 reproducer, but I find -ffinite-math-only
incorrect to use because of other calculations in the whole OpenJDK codebase.
Using infinite values is documented for Java code, and with that option such
calculations may produce invalid results.

To fully fix the performance (so that no "call fmod" remains) I find it better
to use -fno-math-errno. Nothing in OpenJDK should rely on errno from math
operations. But that option still requires reverting your patch. The question
is whether GCC can rely on the undocumented Intel behavior described in
Comment 7. glibc already relies on it anyway.

I have submitted this revert proposal only for the benefit of GCC. Neither I
nor my employer depend on it, as I have already submitted a fix for OpenJDK
using an asm "fprem" expression. Relying on a fix in GCC would not be
acceptable for OpenJDK, as it is still going to be built by old/existing
OSes/compilers for years:
https://github.com/openjdk/jdk/pull/12508/files
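
For readers unfamiliar with the asm "fprem" approach mentioned above, here is a minimal sketch of the general idea. It is not the code from the OpenJDK pull request; the function name fmod_fprem is hypothetical, and it assumes GCC-style extended asm on x86 (other targets fall back to std::fmod). The loop repeats "fprem" while the C2 bit of the x87 status word reports a partial remainder, the same check as the "test $0x4,%ah" in the disassembly from Comment 0.

```cpp
#include <cmath>

// Hypothetical helper: fmod(x, y) computed with the x87 "fprem" instruction.
double fmod_fprem(double x, double y) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    long double r = x;        // dividend, kept in st(0) by the "t" constraint
    const long double d = y;  // divisor, passed in st(1) by the "u" constraint
    unsigned short status;    // x87 status word, stored to %ax by fnstsw
    do {
        __asm__ __volatile__(
            "fprem\n\t"
            "fnstsw %%ax"
            : "=t"(r), "=a"(status)
            : "0"(r), "u"(d)
            : "cc");
        // C2 (bit 10) set means the reduction is incomplete; run fprem again.
    } while (status & 0x0400);
    return static_cast<double>(r);
#else
    // Assumption: on non-x86 targets simply defer to the library.
    return std::fmod(x, y);
#endif
}
```

This sketch sets no errno and does no special handling of infinities or a zero divisor beyond what the instruction itself does, which is the trade-off being debated in this thread.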
[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #10 from Jan Kratochvil ---
(In reply to Alexander Monakov from comment #9)
> You just need to pass -fno-math-errno (the call is for setting errno,
> similar to how gcc emits the sqrt() sequence).

True, thanks. So I think the patch should be reverted, right?
I expect the revert should have a testcase nowadays.
[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

--- Comment #8 from Jan Kratochvil ---
(In reply to Andrew Pinski from comment #2)
> So the simple test is run the full GCC bootstrap/test with all languages and
> check if the testcase fails or not. I suspect it will.

It does not. Tested on Fedora 36 x86-64.
I tested only a revert of:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098

The revert makes it 13x faster. But the produced code still falls back to
calling glibc fmod() as shown in the disassembly in Comment 0.
If I use the "fprem" instruction directly it gets 15x faster - but I did not
figure out an (easy) way to patch GCC so that it no longer produces the call
to fmod() at all and produces only the "fprem" instruction.

(In reply to Alexander Monakov from comment #4)
> Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64,

It is true that replacing fmod() with fmodl() makes it 5x faster (but only 5x).
There is still some infinity check, and I haven't found any real justification
for it in the glibc sources:

  if (__builtin_expect (isinf (x) || y == 0.0L, 0)
      && _LIB_VERSION != _IEEE_ && !isnan (y) && !isnan (x))
    /* fmod(+-Inf,y) or fmod(x,0) */
    return __kernel_standard_l (x, y, 227);

> The ieee_2.f90 testcase attempts to change rounding mode. In 2014 it
> probably just was "miscompiled".

The testsuite run did include "gfortran.dg/ieee/ieee_2.f90" and it has no
regression.
[Bug target/108922] New: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem" and calling fmod()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922

            Bug ID: 108922
           Summary: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem" and calling fmod()
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jkratochvil at azul dot com
  Target Milestone: ---

Created attachment 54528
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54528=edit
bench.cpp

This performance regression is since:
[PATCH, i386]: Enable reminder{sd,df,xf} and fmod{sf,df,xf} only for flag_finite_math_only.
https://gcc.gnu.org/pipermail/gcc-patches/2014-September/400104.html
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098

Reproducible with attached "bench.cpp":
g++ (GCC) 4.8.3 20140517 (prerelease)
real 0m0.329s
g++ (GCC) 4.9.3 20150207 (prerelease)
real 0m4.396s

The committer claims "do not return NaN for infinities, but generate
invalid-arithmetic-operand exception.". But my attached testcase tests that all
the corner cases do have both the same result value and the same exceptions
generated.
The committer also claims "fixes ieee_2.f90 testsuite failure" but I have no
idea where to find this testsuite.

g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
/home/azul/t/zuc1182/fmod.C:7
  4005f8:  dd 44 24 30             fldl    0x30(%rsp)
  4005fc:  dd 44 24 38             fldl    0x38(%rsp)
  400600:  d9 c1                   fld     %st(1)
  400602:  d9 c1                   fld     %st(1)
  400604:  d9 f8                   fprem
  400606:  df e0                   fnstsw  %ax
  400608:  f6 c4 04                test    $0x4,%ah
  40060b:  75 f7                   jne     400604
  40060d:  dd d9                   fstp    %st(1)
  40060f:  dd 5c 24 18             fstpl   0x18(%rsp)
  400613:  f2 0f 10 44 24 18       movsd   0x18(%rsp),%xmm0
  400619:  66 0f 2e c0             ucomisd %xmm0,%xmm0
^^^ Here it tests that the result is finite; if it is not, it falls back to
calling fmod(). But I do not find even that needed; one could just use the
"fprem" result.
  40061d:  7a 06                   jp      400625
  40061f:  74 2f                   je      400650
  400621:  d9 c9                   fxch    %st(1)
  400623:  eb 0b                   jmp     400630
  400625:  d9 c9                   fxch    %st(1)
  400627:  66 0f 1f 84 00 00 00    nopw    0x0(%rax,%rax,1)
  40062e:  00 00
  400630:  dd 5c 24 08             fstpl   0x8(%rsp)
  400634:  f2 0f 10 4c 24 08       movsd   0x8(%rsp),%xmm1
  40063a:  dd 5c 24 08             fstpl   0x8(%rsp)
  40063e:  f2 0f 10 44 24 08       movsd   0x8(%rsp),%xmm0
  400644:  e8 6f fe ff ff          callq   4004b8
  400649:  eb 09                   jmp     400654
  40064b:  0f 1f 44 00 00          nopl    0x0(%rax,%rax,1)
  400650:  dd d8                   fstp    %st(0)
  400652:  dd d8                   fstp    %st(0)
  400654:  83 c3 01                add     $0x1,%ebx
  400657:  f2 0f 11 44 24 28       movsd   %xmm0,0x28(%rsp)
/home/azul/t/zuc1182/fmod.C:6
  40065d:  81 fb 00 e1 f5 05       cmp     $0x5f5e100,%ebx
  400663:  75 93                   jne     4005f8

A similar issue may exist with drem() (=remainder()) vs. the "fprem1"
instruction. I expect the same issue also affects fmodf(), dremf() and
remainderf().

Another topic is why the glibc fmod() implementation does not simply use
"fprem" on the i686/x86_64 architectures.
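
The attached bench.cpp is not reproduced here. To illustrate what "the same result value and the same exceptions" can mean in practice, the following is a minimal sketch under stated assumptions: the names FmodOutcome, fmod_outcome and same_outcome are hypothetical, only the FE_INVALID flag is inspected, and the implementation under test is passed in as a plain function pointer so that two candidates (for example the library fmod() and an "fprem"-based variant) can be compared on corner cases such as fmod(Inf, y) and fmod(x, 0).

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical record of one call: the result and whether FE_INVALID was raised.
struct FmodOutcome {
    double value;
    bool invalid;
};

// Runs one implementation on (x, y) and records the result and the FE_INVALID flag.
FmodOutcome fmod_outcome(double (*impl)(double, double), double x, double y) {
    // volatile copies: keep the compiler from folding or reordering the call
    volatile double vx = x;
    volatile double vy = y;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double r = impl(vx, vy);
    bool invalid = std::fetestexcept(FE_INVALID) != 0;
    return {r, invalid};
}

// Two outcomes match if both are NaN or both are the same signed value,
// and both agree on whether FE_INVALID was raised.
bool same_outcome(const FmodOutcome& a, const FmodOutcome& b) {
    bool both_nan = std::isnan(a.value) && std::isnan(b.value);
    bool same_value = a.value == b.value &&
                      std::signbit(a.value) == std::signbit(b.value);
    return (both_nan || same_value) && a.invalid == b.invalid;
}
```

A caller would loop over pairs of corner-case inputs and compare same_outcome() for the two implementations; the sketch deliberately ignores the other exception flags and errno.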