https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
Bug ID: 108922
Summary: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem" and calling fmod()
Product: gcc
Version: 12.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jkratochvil at azul dot com
Target Milestone: ---

Created attachment 54528
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54528&action=edit
bench.cpp

This performance regression dates from:
[PATCH, i386]: Enable reminder{sd,df,xf} and fmod{sf,df,xf} only for flag_finite_math_only.
https://gcc.gnu.org/pipermail/gcc-patches/2014-September/400104.html
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098

Reproducible with attached "bench.cpp":
g++ (GCC) 4.8.3 20140517 (prerelease)
real 0m0.329s
g++ (GCC) 4.9.3 20150207 (prerelease)
real 0m4.396s

The committer claims "do not return NaN for infinities, but generate invalid-arithmetic-operand exception.". But my attached testcase checks that all the corner cases produce both the same result value and the same exceptions.

The committer also claims "fixes ieee_2.f90 testsuite failure" but I have no idea where to find this testsuite.

g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
/home/azul/t/zuc1182/fmod.C:7
  4005f8:  dd 44 24 30           fldl    0x30(%rsp)
  4005fc:  dd 44 24 38           fldl    0x38(%rsp)
  400600:  d9 c1                 fld     %st(1)
  400602:  d9 c1                 fld     %st(1)
  400604:  d9 f8                 fprem
  400606:  df e0                 fnstsw  %ax
  400608:  f6 c4 04              test    $0x4,%ah
  40060b:  75 f7                 jne     400604 <main+0x34>
  40060d:  dd d9                 fstp    %st(1)
  40060f:  dd 5c 24 18           fstpl   0x18(%rsp)
  400613:  f2 0f 10 44 24 18     movsd   0x18(%rsp),%xmm0
  400619:  66 0f 2e c0           ucomisd %xmm0,%xmm0
^^^ Here it tests whether the result is NaN (ucomisd of a register against itself reports "unordered" only for NaN); if it is, it falls back to calling fmod(). But I do not think even that is needed; one could just use the "fprem" result.
  40061d:  7a 06                 jp      400625 <main+0x55>
  40061f:  74 2f                 je      400650 <main+0x80>
  400621:  d9 c9                 fxch    %st(1)
  400623:  eb 0b                 jmp     400630 <main+0x60>
  400625:  d9 c9                 fxch    %st(1)
  400627:  66 0f 1f 84 00 00 00  nopw    0x0(%rax,%rax,1)
  40062e:  00 00
  400630:  dd 5c 24 08           fstpl   0x8(%rsp)
  400634:  f2 0f 10 4c 24 08     movsd   0x8(%rsp),%xmm1
  40063a:  dd 5c 24 08           fstpl   0x8(%rsp)
  40063e:  f2 0f 10 44 24 08     movsd   0x8(%rsp),%xmm0
  400644:  e8 6f fe ff ff        callq   4004b8 <fmod@plt>
  400649:  eb 09                 jmp     400654 <main+0x84>
  40064b:  0f 1f 44 00 00        nopl    0x0(%rax,%rax,1)
  400650:  dd d8                 fstp    %st(0)
  400652:  dd d8                 fstp    %st(0)
  400654:  83 c3 01              add     $0x1,%ebx
  400657:  f2 0f 11 44 24 28     movsd   %xmm0,0x28(%rsp)
/home/azul/t/zuc1182/fmod.C:6
  40065d:  81 fb 00 e1 f5 05     cmp     $0x5f5e100,%ebx
  400663:  75 93                 jne     4005f8 <main+0x28>

A similar issue may exist with drem() (=remainder()) vs. the "fprem1" instruction. I expect the same issue also affects fmodf(), dremf() and remainderf().

Another topic is why the glibc fmod() implementation does not simply use "fprem" on the i686/x86_64 architectures.
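
For readers who want to try the comparison themselves, here is a minimal sketch of the kind of check described above. It is not the attached bench.cpp, and every name in it (fprem_loop, inline_fmod, libm_fmod, probe, same_outcome, check_corner_cases) is made up for illustration. It assumes GCC-style inline asm on x86 and falls back to std::fmod() elsewhere. It also assumes the C library's fetestexcept() reports x87 status flags. It compares only the result value and the FE_INVALID flag for a handful of corner cases. The extra "memory" clobber is there only to discourage reordering around the fenv calls.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <limits>

// x87 partial-remainder loop, modelled on the GCC 4.4 listing:
// repeat fprem until the C2 status bit (0x4 in %ah) is clear.
double fprem_loop(double x, double y) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    double r;
    __asm__ __volatile__(
        "1:\n\t"
        "fprem\n\t"
        "fnstsw %%ax\n\t"
        "testb $0x4, %%ah\n\t"
        "jne 1b"
        : "=t"(r)             // result is left in st(0)
        : "0"(x), "u"(y)      // x in st(0), y in st(1)
        : "ax", "cc", "memory");
    return r;
#else
    return std::fmod(x, y);   // assumption: no x87 here, use the library
#endif
}

// Inline path plus the NaN fallback seen in the listing.
double inline_fmod(double x, double y) {
    double r = fprem_loop(x, y);
    if (r != r) {             // NaN: defer to the library call
        return std::fmod(x, y);
    }
    return r;
}

// Plain wrapper so std::fmod can be passed as a function pointer.
double libm_fmod(double x, double y) { return std::fmod(x, y); }

struct Outcome {
    double value;
    bool invalid_raised;
};

using BinaryFn = double (*)(double, double);

// Evaluate f(x, y) with cleared flags; volatile blocks constant folding.
Outcome probe(BinaryFn f, double x, double y) {
    volatile double vx = x, vy = y;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double r = f(vx, vy);
    bool invalid = std::fetestexcept(FE_INVALID) != 0;
    return Outcome{r, invalid};
}

// True if both paths give the same value (NaN matches NaN, zero signs
// must match) and the same FE_INVALID state.
bool same_outcome(double x, double y) {
    Outcome a = probe(inline_fmod, x, y);
    Outcome b = probe(libm_fmod, x, y);
    bool same_value =
        (std::isnan(a.value) && std::isnan(b.value)) ||
        (a.value == b.value &&
         std::signbit(a.value) == std::signbit(b.value));
    return same_value && a.invalid_raised == b.invalid_raised;
}

// Asserts agreement on a few illustrative corner cases.
void check_corner_cases() {
    const double inf = std::numeric_limits<double>::infinity();
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double cases[][2] = {
        {5.5, 2.0}, {-5.5, 2.0}, {1e300, 3.0}, {-0.0, 2.0},
        {3.0, inf}, {inf, 2.0},  {3.0, 0.0},   {nan, 2.0},
    };
    for (const auto& c : cases) {
        assert(same_outcome(c[0], c[1]));
    }
}
```

Calling check_corner_cases() from main() runs the comparison. A real benchmark would additionally time a loop over inline_fmod() against one over libm_fmod(), and would cover the float and remainder() variants mentioned above.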