https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570
--- Comment #4 from kargls at comcast dot net ---
>
> movq %rbx, %rdi
> movq %r14, %rsi
> callq __for_ieee_next_after_k4_@PLT
> movss %xmm0, 12(%rsp)
> ucomiss 16(%rsp), %xmm0
> jne .LBB0_1
> jp .LBB0_1
We don't know what Intel is doing within __for_ieee_next_after_k4_.
I've quoted the Fortran standard's requirements; in short:
1) On entry to a procedure, save the current exception flags
2) Quiet all exception flags
3) Execute the procedure
4) On exit, restore the exception flags saved on entry
5) Signal any exceptions that occurred during execution
Those requirements force
> whereas gfortran does
>
> movq %rbx, %rdi
> movss %xmm0, 8(%rsp)
> call _gfortran_ieee_procedure_entry
this call ...
> movss 8(%rsp), %xmm0
> pxor %xmm1, %xmm1
> movss %xmm1, 12(%rsp)
> call nextafterf
> movq %rbx, %rdi
> movss %xmm0, 20(%rsp)
> movss %xmm0, 8(%rsp)
> call _gfortran_ieee_procedure_exit
and this call. But, see below ...
> movss 8(%rsp), %xmm0
> ucomiss 12(%rsp), %xmm0
>
> I cannot look at what ifx's __for_ieee_next_after_k4_ does, but
> a separate, more optimized implementation for ieee_next_after might
> be faster also for gfortran.
The only thing one might be able to do is inline the _entry
and _exit procedures to avoid the function call overhead. Intel has
the luxury of dealing only with Intel/AMD CPUs. gfortran has seven
different config files, each of which would need its own inlined
version: fpu-387.h, fpu-aix.h, fpu-generic.h, fpu-glibc.h,
fpu-sysv.h, fpu-aarch64.h, and fpu-macppc.h.
> For example, it could check its argument
> if the operation will raise an exception, and branch in that event
> (which could be marked as unlikely, and after a few iterations, would
> be marked as unlikely to be taken by the CPU).
>
> Confirmed as an enhancement request.
... here. If it can be assumed that ieee_next_after, which is
mapped to nextafter (on x86_64-*-freebsd), is already IEEE-754
compliant, then the calls to _entry and _exit are redundant,
so gfortran need not emit them. That is,
subroutine foo(x, y)
use ieee_arithmetic
real x
x = ieee_next_after(x, 10.)
end subroutine foo
is currently translated to the following (comments added); the
_entry and _exit calls are the ones that could be dropped:
__attribute__((fn spec (". w w ")))
void foo (real(kind=4) & restrict x, real(kind=4) & restrict y)
{
c_char fpstate.0[33];
try
{
// Needed for 1 and 2 above on entry into foo
_gfortran_ieee_procedure_entry ((void *) &fpstate.0);
{
// This is 3 above, i.e., execution of procedure
*x = __builtin_nextafterf (*x, 1.0e+1);
}
}
finally
{
// Needed for 4 and 5 above on exit from foo
_gfortran_ieee_procedure_exit ((void *) &fpstate.0);
}
}