https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93985
Bug ID: 93985
Summary: Sub-optimal assembly for %st(0) constant loading with SSE enabled (x86_64)
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sagebar at web dot de
Target Milestone: ---

Created attachment 47939
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47939&action=edit
Inefficient code generation, similar working cases & work-around

With gcc for x86_64, where SSE is enabled by default, there are still situations where the legacy (FPU) math instructions (and their constraints) have to be used. x86 offers a handful of instructions that load specific floating-point constants more efficiently (including `1.0`). As such, it would stand to reason to implement a function `atan()` as:

```c
double atan(double x) {
	double result;
	/* fpatan computes atan2(st(1), st(0)) and pops the stack once,
	   so st(0) = 1.0 and st(1) = x yields atan(x) in st(0). */
	__asm__("fpatan"
	        : "=t" (result)
	        : "0" (1.0), "u" (x)
	        : "st(1)");
	return result;
}
```

This code works perfectly on i386, where it compiles to:

```asm
fldl    4(%esp)            # push(x)
fld1                       # push(1.0)
fpatan
ret
```

However, on x86_64 it is compiled as:

```asm
movsd   .LC0(%rip), %xmm1
movsd   %xmm1, -8(%rsp)
fldl    -8(%rsp)           # push(1.0)
movsd   %xmm0, -8(%rsp)
fldl    -8(%rsp)           # push(x)
fxch    %st(1)             # { x, 1.0 } -> { 1.0, x }
fpatan
fstpl   -8(%rsp)
movsd   -8(%rsp), %xmm0
ret
...
.LC0:
.long   0                  # SSE constant: 1.0
.long   1072693248         # ...
```

The optimal code, by contrast, would look like:

```asm
movsd   %xmm0, -8(%rsp)
fldl    -8(%rsp)           # push(x)
fld1                       # push(1.0)
fpatan
fstpl   -8(%rsp)
movsd   -8(%rsp), %xmm0
ret
```

Still, it appears that GCC _is_ aware of the `fld1` instruction for encoding inline assembly operands, even when SSE is enabled (see also `atan_reverse()` within the attached file).
So it would stand to reason that this is either a problem with how GCC weights different encoding schemes, or a problem with how GCC decides whether certain encoding schemes are possible at all. The latter seems quite likely, especially since the x86_64 version contains an `fxch` instruction that would also be unnecessary if GCC had loaded the `1.0` _after_ pushing `x` (even setting aside the fact that `1.0` can be pushed using `fld1`).

NOTE: The attached file should be compiled as `gcc -O3 -S attached-file.c`
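
For reference, the two `.long` words at `.LC0` in the x86_64 listing are simply the low and high 32-bit halves of the IEEE-754 double `1.0`, which is the same value `fld1` would load without touching memory. The following is a minimal sketch that checks this; `decode_double` is a hypothetical helper that is not part of the report or the attachment, and it assumes `double` is IEEE-754 binary64 with the same byte order as 64-bit integers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the report): rebuild a double from the two
   32-bit words GCC emits for it in the constant pool. On little-endian x86
   the low word is listed first, the high word second. */
double decode_double(uint32_t low_word, uint32_t high_word) {
    /* Assumption: double is IEEE-754 binary64 and shares the byte order
       of uint64_t on the machine running this code. */
    uint64_t bits = ((uint64_t)high_word << 32) | low_word;
    double value;
    memcpy(&value, &bits, sizeof value);  /* well-defined type punning */
    return value;
}
```

Calling `decode_double(0, 1072693248)` returns `1.0`: 1072693248 is 0x3FF00000, i.e. sign bit 0, biased exponent 0x3FF, and an all-zero mantissa.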