[Bug c/77491] New: Suboptimal code produced with unnecessary moving of values on/off stack

dhowells at redhat dot com Mon, 05 Sep 2016 09:01:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77491


            Bug ID: 77491
           Summary: Suboptimal code produced with unnecessary moving of
                    values on/off stack
           Product: gcc
           Version: 6.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhowells at redhat dot com
  Target Milestone: ---

Created attachment 39567
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39567&action=edit
Test source

The attached program produces unnecessary instructions that move values between
registers and the stack.  Compiled with gcc -Os using Fedora 24 gcc-6.1.1-3
20160621, the first function disassembles to:

0000000000000000 <jump>:
   0:   9c                      pushfq 
   1:   59                      pop    %rcx
   2:   fa                      cli    
   3:   8b 07                   mov    (%rdi),%eax
   5:   89 44 24 fc             mov    %eax,-0x4(%rsp)
   9:   8b 54 24 fc             mov    -0x4(%rsp),%edx
   d:   83 fa 17                cmp    $0x17,%edx
  10:   0f 94 c0                sete   %al
  13:   75 06                   jne    1b <jump+0x1b>
  15:   c7 07 2b 00 00 00       movl   $0x2b,(%rdi)
  1b:   51                      push   %rcx
  1c:   9d                      popfq  
  1d:   8b 54 24 fc             mov    -0x4(%rsp),%edx
  21:   89 16                   mov    %edx,(%rsi)
  23:   c3                      retq   

The instruction at 9 is unnecessary - either the value in EAX could be moved
directly to EDX, or the comparison at d could be made against EAX.

The instructions at 5, 1d and 21 could be combined to store the loaded value
directly to (%rsi) rather than shuffling it on and off the stack.
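
For reference, the behaviour of the first function as read back from the
listing above can be sketched in C.  This is a reconstruction from the
disassembly, not the attached test source: the name jump_sketch, the parameter
names, the bool return type and the assert are assumptions, and the flag
save/restore around the access (pushfq/cli ... popfq) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reconstruction of what <jump> computes, inferred from the
 * disassembly only.  The interrupt-flag save/restore is omitted. */
bool jump_sketch(int *ptr, int *old)
{
    assert(ptr && old);      /* assumption: both pointers are valid */

    int cur = *ptr;          /* 3:  mov (%rdi),%eax */
    bool hit = (cur == 23);  /* d:  cmp $0x17,%edx / 10: sete %al */

    if (hit)
        *ptr = 43;           /* 15: movl $0x2b,(%rdi) */

    *old = cur;              /* 21: mov %edx,(%rsi) */
    return hit;
}
```

Nothing in this sketch needs cur to live in memory, so a single register could
carry it from the load at 3 to the store at 21.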

Looking at the second function:

0000000000000024 <jump2>:
  24:   9c                      pushfq 
  25:   58                      pop    %rax
  26:   fa                      cli    
  27:   8b 17                   mov    (%rdi),%edx
  29:   89 54 24 fc             mov    %edx,-0x4(%rsp)
  2d:   8b 54 24 fc             mov    -0x4(%rsp),%edx
  31:   83 fa 17                cmp    $0x17,%edx
  34:   75 06                   jne    3c <jump2+0x18>
  36:   c7 07 2b 00 00 00       movl   $0x2b,(%rdi)
  3c:   50                      push   %rax
  3d:   9d                      popfq  
  3e:   8b 44 24 fc             mov    -0x4(%rsp),%eax
  42:   89 44 24 f8             mov    %eax,-0x8(%rsp)
  46:   8b 44 24 f8             mov    -0x8(%rsp),%eax
  4a:   c3                      retq   

It would be better if the flags were stashed in RCX rather than RAX, as happens
in the first function.  The instruction at 27 could then load the value
straight into EAX, the return register, and the comparison at 31 could be made
against EAX directly.  The instructions at 29, 2d, 3e, 42 and 46 are all
redundant.
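
The second function can be sketched the same way.  Again this is read back
from the listing rather than taken from the attachment: the name jump2_sketch,
the parameter name and the assert are assumptions, and the flag save/restore is
omitted.

```c
#include <assert.h>

/* Hypothetical reconstruction of what <jump2> computes, inferred from the
 * disassembly only.  The interrupt-flag save/restore is omitted. */
int jump2_sketch(int *ptr)
{
    assert(ptr);             /* assumption: the pointer is valid */

    int cur = *ptr;          /* 27: mov (%rdi),%edx */

    if (cur == 23)           /* 31: cmp $0x17,%edx */
        *ptr = 43;           /* 36: movl $0x2b,(%rdi) */

    return cur;              /* value ends up in %eax at 46 */
}
```

Here the loaded value is also the return value, which is why loading it into
EAX in the first place would remove the need for any stack slot.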

Changing the #if in the code to disable the inline asm doesn't improve either
function much.  It also allows the code to be built for aarch64, which shows
similar unnecessary stack shuffling:

0000000000000000 <jump>:
   0:   d10043ff        sub     sp, sp, #0x10
   4:   b9400002        ldr     w2, [x0]
   8:   b9000fe2        str     w2, [sp,#12]
   c:   b9400fe2        ldr     w2, [sp,#12]
  10:   71005c5f        cmp     w2, #0x17
  14:   1a9f17e3        cset    w3, eq
  18:   54000061        b.ne    24 <jump+0x24>
  1c:   52800562        mov     w2, #0x2b                       // #43
  20:   b9000002        str     w2, [x0]
  24:   b9400fe0        ldr     w0, [sp,#12]
  28:   b9000020        str     w0, [x1]
  2c:   2a0303e0        mov     w0, w3
  30:   910043ff        add     sp, sp, #0x10
  34:   d65f03c0        ret

0000000000000038 <jump2>:
  38:   d10043ff        sub     sp, sp, #0x10
  3c:   b9400001        ldr     w1, [x0]
  40:   b9000fe1        str     w1, [sp,#12]
  44:   b9400fe1        ldr     w1, [sp,#12]
  48:   71005c3f        cmp     w1, #0x17
  4c:   54000061        b.ne    58 <jump2+0x20>
  50:   52800561        mov     w1, #0x2b                       // #43
  54:   b9000001        str     w1, [x0]
  58:   b9400fe0        ldr     w0, [sp,#12]
  5c:   b9000be0        str     w0, [sp,#8]
  60:   b9400be0        ldr     w0, [sp,#8]
  64:   910043ff        add     sp, sp, #0x10
  68:   d65f03c0        ret
