On 5/30/2022 4:06 AM, Roger Sayle wrote:
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.

For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }

GCC with -O2 currently generates:

re:     movq    %xmm0, -8(%rsp)
         movss   -8(%rsp), %xmm0
         ret
im:     movq    %xmm0, -8(%rsp)
         movss   -4(%rsp), %xmm0
         ret

With this patch, we now generate:

re:     ret
im:     movq    %xmm0, %rax
         shrq    $32, %rax
         movd    %eax, %xmm0
         ret

[Technically, this shift could be performed on %xmm0 in a single
instruction, but the backend would first need to be taught to do
that; the important point is that the SCmode argument is no longer
written to the stack.]

The patch itself modifies emit_group_store: just before RTL
expansion commits to writing to the stack, we check whether the
store group consists of a single scalar integer register that holds
a complex mode value (on x86_64, SCmode arguments are passed in
DImode registers).  If so, we can use a SUBREG to "view_convert"
the integer to the equivalent complex mode.

An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes.  This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this.  This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
which was almost (but not quite) approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591139.html
Yea, sorry.  Too much to do at the new job.  Trying to work my way through queued up stuff now...



This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
         PR rtl-optimization/7061
         * expr.cc (emit_group_store): For groups that consist of a single
         scalar integer register that holds a complex mode value, use
         gen_lowpart to generate a SUBREG to "view_convert" to the complex
         mode.  For modes of different sizes, first convert to an integer
         mode of the appropriate size.

gcc/testsuite/ChangeLog
         PR rtl-optimization/7061
         * gcc.target/i386/pr7061-1.c: New test case.
         * gcc.target/i386/pr7061-2.c: New test case.
OK
jeff
