[Bug target/93985] New: Sub-optimal assembly for %st(0) constant loading with SSE enabled (x86_64)

sagebar at web dot de Sun, 01 Mar 2020 05:29:45 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93985


            Bug ID: 93985
           Summary: Sub-optimal assembly for %st(0) constant loading with
                    SSE enabled (x86_64)
           Product: gcc
           Version: 9.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sagebar at web dot de
  Target Milestone: ---

Created attachment 47939
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47939&action=edit
Inefficient code generation, similar working cases & work-around

With GCC for x86_64, where SSE is enabled by default, there are still
situations in which legacy (x87 FPU) math instructions, and their operand
constraints, have to be used.
x86 offers a handful of instructions that load specific floating-point
constants more efficiently (including `1.0`).
It therefore seems reasonable to implement a function `atan()` as:
```c
double atan(double x) {
        double result;
        __asm__("fpatan"
                : "=t" (result)
                : "0" (1.0)
                , "u" (x)
                : "st(1)");
        return result;
}
```

This code works perfectly on i386, where it compiles to:
```asm
        fldl    4(%esp)           # push(x)
        fld1                      # push(1.0)
        fpatan
        ret
```

However, on x86_64 it is compiled as:
```asm
        movsd   .LC0(%rip), %xmm1
        movsd   %xmm1, -8(%rsp)
        fldl    -8(%rsp)          # push(1.0)
        movsd   %xmm0, -8(%rsp)
        fldl    -8(%rsp)          # push(x)
        fxch    %st(1)            # { x, 1.0 } -> { 1.0, x }
        fpatan
        fstpl   -8(%rsp)
        movsd   -8(%rsp), %xmm0
        ret
        ...
.LC0:
        .long   0                 # SSE constant: 1.0
        .long   1072693248        # ...
```
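For reference, the two `.long` words at `.LC0` are simply the low and high
32-bit halves of the IEEE 754 double `1.0`. The following is only a small
sketch to check that reading; the function name `lc0_as_double` is made up for
illustration, and the code assumes a platform where `double` is IEEE 754
binary64:
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reassemble the two 32-bit words emitted at .LC0.
 * x86 is little-endian, so the first .long is the low half. */
double lc0_as_double(void) {
        uint32_t lo = 0u;           /* .long 0 */
        uint32_t hi = 1072693248u;  /* .long 1072693248 == 0x3FF00000 */
        uint64_t bits = ((uint64_t)hi << 32) | lo;
        double value;
        memcpy(&value, &bits, sizeof value);  /* bit copy, no conversion */
        return value;
}
```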

whereas the optimal code would be:
```asm
        movsd   %xmm0, -8(%rsp)
        fldl    -8(%rsp)          # push(x)
        fld1                      # push(1.0)
        fpatan
        fstpl   -8(%rsp)
        movsd   -8(%rsp), %xmm0
        ret
```

GCC _is_ nevertheless aware of the `fld1` instruction when loading inline
assembly operands, even with SSE enabled (see `atan_reverse()` in the
attached file).
So this is presumably either a problem with how GCC weighs the cost of
different ways of loading the constant, or a problem with how GCC decides
whether a given way is possible at all. The latter seems quite likely: the
x86_64 version also contains an `fxch` instruction that would be unnecessary
if GCC had loaded the `1.0` _after_ pushing `x`, even leaving aside that `1.0`
can be pushed with `fld1`.
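One way to side-step the operand loading entirely is to emit `fld1` from
within the asm template, so that GCC only has to place `x` in `%st(0)`. The
following is only a sketch and not necessarily the work-around from the
attached file; the name `atan_fld1` is made up, and the `"st(1)"` clobber is
an assumption about how to tell GCC that the template briefly needs one more
x87 stack slot:
```c
/* Hypothetical work-around: push 1.0 by hand instead of passing it
 * as an operand. x86 / x86_64 with GCC-style inline asm only. */
double atan_fld1(double x) {
        double result;
        __asm__("fld1\n\t"   /* push(1.0): st(0) = 1.0, st(1) = x */
                "fpatan"     /* st(1) = atan(st(1) / st(0)), then pop */
                : "=t" (result)
                : "0" (x)
                : "st(1)");  /* assumed: marks the slot used by fld1 */
        return result;
}
```

This hides the constant from the compiler, so it gives up constant folding for
calls such as `atan_fld1(1.0)`; it only avoids the round trip through `.LC0`
and the `fxch`.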

NOTE: The attached file should be compiled as `gcc -O3 -S attached-file.c`
