https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122967

            Bug ID: 122967
           Summary: isfinite() assembly for 32-bit float
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

This issue was discussed for Clang
(https://github.com/llvm/llvm-project/issues/169270),
and I think GCC developers should know about it so that the generated code can
be improved.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
bool func(float x) {
    return isfinite(x);
}
bool func2(float x) {
    // Can't make this transform if issignaling(x)
    return isgreaterequal((x - x), (x - x));
}
bool func3(float x) {
    union { float f; uint32_t u; } v, inf;
    inf.f = INFINITY;
    v.f = x;
    v.u = (uint32_t)(v.u << 1);
    return v.u < inf.u * 2;
}
bool func4(float x) {
    union { float f; uint32_t u; } v, inf;
    inf.f = INFINITY;
    v.f = x;
    v.u = ~v.u;
    __asm__ ("" : "+r" (v.u));
    return (v.u & inf.u) == 0;
}
```

```assembly
# x86-64
func2:
        subss   %xmm0, %xmm0
        ucomiss %xmm0, %xmm0
        setnp   %al
        ret
func3:
        movd    %xmm0, %eax
        addl    %eax, %eax
        cmpl    $-16777217, %eax
        setbe   %al
        ret
func4:
        movd    %xmm0, %eax
        notl    %eax
        testl   $2139095040, %eax
        sete    %al
        ret
```

```assembly
# AArch64
func2:
        fsub    s0, s0, s0
        fcmp    s0, s0
        cset    w0, vc
        ret
func3:
        fmov    w1, s0
        mov     w0, -16777217
        cmp     w0, w1, lsl 1
        cset    w0, cs
        ret
func4:
        fmov    w0, s0
        mvn     w0, w0
        tst     w0, 2139095040
        cset    w0, eq
        ret
```

https://godbolt.org/z/1cxc4oWej

The original reporter there (Karl Meakin) suggested the "func2" approach, while
I suggested "func3" and "func4".

"func2" can give the smallest code for '-Os' optimization, but it comes with a
caveat: it raises a floating-point exception on a signaling NaN (that is, it
cannot be used with '-fsignaling-nans'). "func3" and "func4" are slightly larger
but, AFAIK, do not raise exceptions.
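
To make the integer-only approaches easier to check, here is a minimal sketch that compares them against isfinite() on a few edge cases. It assumes a 32-bit IEEE 754 float. The names float_bits, finite_shift, finite_mask and count_mismatches are hypothetical helpers for illustration and are not part of the functions above. Note that "func4" as posted compares the masked value with `== 0`, which holds when all exponent bits of x are set (Inf or NaN); the sketch uses `!= 0` so that the result agrees with isfinite().

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32. */
static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float expected");

/* Hypothetical helper: fetch the bit pattern without touching the FPU state. */
static uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* "func3" style: shift out the sign bit, then compare with (+Inf bits) << 1. */
static bool finite_shift(float x) {
    return (uint32_t)(float_bits(x) << 1) < 0xFF000000u;
}

/* "func4" style mask test: finite when at least one exponent bit is clear. */
static bool finite_mask(float x) {
    return (~float_bits(x) & 0x7F800000u) != 0;
}

/* Counts samples where either variant disagrees with isfinite(). */
static int count_mismatches(void) {
    const float samples[] = {
        0.0f, -0.0f, 1.0f, -1.0f, 0x1p-149f, FLT_MIN,
        FLT_MAX, -FLT_MAX, INFINITY, -INFINITY, NAN
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        bool expected = isfinite(samples[i]) != 0;
        if (finite_shift(samples[i]) != expected) mismatches++;
        if (finite_mask(samples[i]) != expected) mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main() should return 0 on such a target. The sketch only checks return values; it does not observe floating-point exception flags.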

I have not benchmarked any of these.
