[Bug target/108922] New: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem" and calling fmod()

jkratochvil at azul dot com via Gcc-bugs Fri, 24 Feb 2023 04:22:54 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922


            Bug ID: 108922
           Summary: fmod() 13x slowdown in gcc 4.8->4.9 dropping "fprem"
                    and calling fmod()
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jkratochvil at azul dot com
  Target Milestone: ---

Created attachment 54528
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54528&action=edit
bench.cpp

This performance regression was introduced by:

[PATCH, i386]: Enable reminder{sd,df,xf} and fmod{sf,df,xf} only for
flag_finite_math_only.
https://gcc.gnu.org/pipermail/gcc-patches/2014-September/400104.html

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098

Reproducible with the attached "bench.cpp":
g++ (GCC) 4.8.3 20140517 (prerelease)
real    0m0.329s
g++ (GCC) 4.9.3 20150207 (prerelease)
real    0m4.396s
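A timing loop of this kind can be sketched as follows. It is not the attached bench.cpp, whose contents are not reproduced in this report. It is a hypothetical stand-in modeled on the disassembly later in this report, which runs 100000000 (0x5f5e100) iterations and reloads the operands from the stack on each pass. The name time_fmod is made up for illustration. The sketch also assumes that volatile operands are enough to keep the compiler from hoisting or deleting the fmod() call.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical stand-in for the attached bench.cpp (not the real file).
// Calls fmod() `iterations` times on volatile operands so the loads and the
// call stay inside the loop, stores the last result through `last`, and
// returns the elapsed wall-clock time in seconds.
static double time_fmod(long iterations, double x, double y, double* last) {
    volatile double vx = x, vy = y;
    volatile double result = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        result = std::fmod(vx, vy);
    const auto stop = std::chrono::steady_clock::now();
    *last = result;
    return std::chrono::duration<double>(stop - start).count();
}
```

One way to see the gap is to build the same file with g++ -O2 under 4.8 and 4.9 and time something like time_fmod(100000000, 5.5, 2.0, &last). Judging from the patch title, adding -ffinite-math-only may restore the inline expansion on newer GCC. That is an untested assumption.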

The committer claims "do not return NaN for infinities, but generate
invalid-arithmetic-operand exception.". However, my attached testcase checks
that, for all the corner cases, both code paths produce the same result value
and raise the same exceptions.
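A corner-case comparison like the one described above can be sketched as follows. The sketch assumes GCC-style x87 inline asm on x86/x86-64 and glibc, whose fetestexcept() reports both the x87 and the SSE exception flags. The helpers fmod_fprem, run and same_behavior are hypothetical and are not taken from the attached testcase. The fprem loop has the same shape as the compiler-generated code later in this report: fprem is repeated while C2 (bit 2 of AH after fnstsw) signals an incomplete reduction.

```cpp
#include <cfenv>
#include <cmath>

// x87 "fprem" remainder (quotient truncated toward zero, as in fmod()).
// fprem may reduce only partially when the exponents are far apart; it then
// sets C2, so loop until C2 is clear. Assumes GCC-style asm on x86/x86-64;
// other targets fall back to fmod() so the sketch still builds.
static double fmod_fprem(double x, double y) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    long double r;
    __asm__ __volatile__("1:\n\t"
                         "fprem\n\t"
                         "fnstsw %%ax\n\t"
                         "testb $4, %%ah\n\t"
                         "jnz 1b"
                         : "=t"(r)
                         : "0"((long double)x), "u"((long double)y)
                         : "ax", "cc", "memory");  // "memory": keep order vs. fenv calls
    return (double)r;
#else
    return std::fmod(x, y);
#endif
}

struct Outcome {
    double value;
    int flags;  // exception flags raised while computing value
};

// Runs f(x, y) with all exception flags cleared beforehand and returns the
// result plus the flags it raised. The volatile copies stop the compiler
// from folding the call at compile time.
template <typename F>
static Outcome run(F f, double x, double y) {
    volatile double vx = x, vy = y;
    std::feclearexcept(FE_ALL_EXCEPT);
    double v = f(vx, vy);
    int flags = std::fetestexcept(FE_ALL_EXCEPT);
    return {v, flags};
}

// True when the fprem loop and the library fmod() give the same value (any
// NaN matches any NaN, signs of zero must match) and raise the same flags.
static bool same_behavior(double x, double y) {
    Outcome a = run(fmod_fprem, x, y);
    Outcome b = run([](double p, double q) { return std::fmod(p, q); }, x, y);
    bool both_nan = std::isnan(a.value) && std::isnan(b.value);
    bool same_value = both_nan || (a.value == b.value &&
                                   std::signbit(a.value) == std::signbit(b.value));
    return same_value && a.flags == b.flags;
}
```

Cases worth feeding to same_behavior include an infinite dividend, a zero divisor, an infinite divisor, a quiet NaN and a huge quotient such as fmod(1e300, 3.0), which needs several fprem passes.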

The committer also claims that the patch "fixes ieee_2.f90 testsuite failure",
but I do not know where to find this testsuite.

Code generated by g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18):
/home/azul/t/zuc1182/fmod.C:7
  4005f8:       dd 44 24 30             fldl   0x30(%rsp)
  4005fc:       dd 44 24 38             fldl   0x38(%rsp)
  400600:       d9 c1                   fld    %st(1)
  400602:       d9 c1                   fld    %st(1)
  400604:       d9 f8                   fprem
  400606:       df e0                   fnstsw %ax
  400608:       f6 c4 04                test   $0x4,%ah
  40060b:       75 f7                   jne    400604 <main+0x34>
  40060d:       dd d9                   fstp   %st(1)
  40060f:       dd 5c 24 18             fstpl  0x18(%rsp)
  400613:       f2 0f 10 44 24 18       movsd  0x18(%rsp),%xmm0
  400619:       66 0f 2e c0             ucomisd %xmm0,%xmm0
^^^
Here it tests whether the result is NaN (ucomisd of a value with itself is
unordered only for NaN); if it is, it falls back to calling fmod().
But I do not think even that is needed; one could just use the "fprem" result.
  40061d:       7a 06                   jp     400625 <main+0x55>
  40061f:       74 2f                   je     400650 <main+0x80>
  400621:       d9 c9                   fxch   %st(1)
  400623:       eb 0b                   jmp    400630 <main+0x60>
  400625:       d9 c9                   fxch   %st(1)
  400627:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40062e:       00 00
  400630:       dd 5c 24 08             fstpl  0x8(%rsp)
  400634:       f2 0f 10 4c 24 08       movsd  0x8(%rsp),%xmm1
  40063a:       dd 5c 24 08             fstpl  0x8(%rsp)
  40063e:       f2 0f 10 44 24 08       movsd  0x8(%rsp),%xmm0
  400644:       e8 6f fe ff ff          callq  4004b8 <fmod@plt>
  400649:       eb 09                   jmp    400654 <main+0x84>
  40064b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  400650:       dd d8                   fstp   %st(0)
  400652:       dd d8                   fstp   %st(0)
  400654:       83 c3 01                add    $0x1,%ebx
  400657:       f2 0f 11 44 24 28       movsd  %xmm0,0x28(%rsp)
/home/azul/t/zuc1182/fmod.C:6
  40065d:       81 fb 00 e1 f5 05       cmp    $0x5f5e100,%ebx
  400663:       75 93                   jne    4005f8 <main+0x28>
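In C++ terms, the control flow of the GCC 4.4 code above is roughly the following sketch. The names are hypothetical, and the x87 part assumes GCC-style inline asm on x86/x86-64. The NaN check most likely comes from GCC's errno handling under the default -fmath-errno. A NaN result signals a domain error, so the library fmod() is called to set errno to EDOM. With -fno-math-errno that fallback should not be needed, but this is an assumption about GCC internals rather than something verified here.

```cpp
#include <cmath>

// x87 "fprem" loop: repeat while C2 (bit 2 of AH after fnstsw) reports an
// incomplete reduction. Assumes GCC-style inline asm on x86/x86-64.
static double fprem_loop(double x, double y) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    long double r;
    __asm__ __volatile__("1:\n\t"
                         "fprem\n\t"
                         "fnstsw %%ax\n\t"
                         "testb $4, %%ah\n\t"
                         "jnz 1b"
                         : "=t"(r)
                         : "0"((long double)x), "u"((long double)y)
                         : "ax", "cc");
    return (double)r;
#else
    return std::fmod(x, y);  // portable stand-in on other targets
#endif
}

// Rough equivalent of the disassembly: the fast path returns the fprem
// result directly; only a NaN result takes the slow path through libm.
static double fmod_inline_with_fallback(double x, double y) {
    double r = fprem_loop(x, y);
    if (r != r)                  // like ucomisd r,r: unordered only for NaN
        return std::fmod(x, y);  // library call, which can also set errno
    return r;
}
```

For ordinary finite operands such as (5.5, 2.0) only the fast path runs. An infinite dividend or a zero divisor produces NaN and takes the library call.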

A similar issue may affect drem() (= remainder()) vs. the "fprem1" instruction.
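For drem()/remainder(), the analogous instruction is fprem1. It rounds the quotient to the nearest integer (ties to even) instead of truncating it, so it yields the IEEE remainder rather than the fmod() result. Below is a minimal sketch under the same assumption of GCC-style x87 inline asm on x86/x86-64. The name remainder_fprem1 is hypothetical.

```cpp
#include <cmath>

// IEEE remainder via the x87 "fprem1" instruction: quotient rounded to the
// nearest integer, ties to even. Like fprem, it may reduce only partially
// and then sets C2, so loop until C2 is clear. Other targets fall back to
// std::remainder() so the sketch still builds.
static double remainder_fprem1(double x, double y) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    long double r;
    __asm__ __volatile__("1:\n\t"
                         "fprem1\n\t"
                         "fnstsw %%ax\n\t"
                         "testb $4, %%ah\n\t"
                         "jnz 1b"
                         : "=t"(r)
                         : "0"((long double)x), "u"((long double)y)
                         : "ax", "cc");
    return (double)r;
#else
    return std::remainder(x, y);
#endif
}
```

For example, fmod(5.0, 3.0) is 2.0, while remainder(5.0, 3.0) and this sketch both give -1.0, because the quotient 1.67 rounds to 2.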

I expect the same issue also affects fmodf(), dremf() and remainderf().

Another question is why the glibc fmod() implementation does not simply use
"fprem" on the i686/x86_64 architectures.

