https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

            Bug ID: 70825
           Summary: x86_64: __atomic_compare_exchange_n() accesses stack
                    unnecessarily
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhowells at redhat dot com
  Target Milestone: ---

Created attachment 38349
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38349&action=edit
Example code

__atomic_compare_exchange_n() stores the 'expected' value and, if not
discarded, the old value on the stack rather than in a register and then may
immediately retrieve the old value from there back into the register from
whence it just came.

For example:

   static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
   {
      int cur = old;
      if (__atomic_compare_exchange_n(&v->counter, &cur, new, false,
                  __ATOMIC_SEQ_CST,
                  __ATOMIC_RELAXED))
         return cur;
      return cur;
   }

   void test_atomic_cmpxchg(atomic_t *counter)
   {
      atomic_cmpxchg(counter, 23, 42);
   }

produces:

   0:   ba 2a 00 00 00          mov    $0x2a,%edx
   5:   b8 17 00 00 00          mov    $0x17,%eax
   a:   c7 44 24 fc 17 00 00    movl   $0x17,-0x4(%rsp)
  11:   00 
  12:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
  16:   c3                      retq   

Note line a where 0x17 is stored at -04(%rsp) unnecessarily (the value is
dicarded).  Note also that this value is in %eax at the time and could be saved
from there rather than using an immediate operand.

And then:

   int test_atomic_cmpxchg_2(atomic_t *counter)
   {
      return atomic_cmpxchg(counter, 23, 42);
   }

where the value is returned, we get something like:

  20:   ba 2a 00 00 00          mov    $0x2a,%edx
  25:   b8 17 00 00 00          mov    $0x17,%eax
  2a:   c7 44 24 fc 17 00 00    movl   $0x17,-0x4(%rsp)
  31:   00 
  32:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
  36:   74 04                   je     3c <test_atomic_cmpxchg_2+0x1c>
  38:   89 44 24 fc             mov    %eax,-0x4(%rsp)
  3c:   8b 44 24 fc             mov    -0x4(%rsp),%eax
  40:   c3                      retq   

In this case, the return value is being constructed on the stack whereas the
value we actually want to return is in %eax at the conclusion of the CMPXCHG,
so lines 2a & 36-3c are redundant.

Changing atomic_cmpxchg() slightly:

  static __always_inline int atomic_cmpxchg_B(atomic_t *v, int old, int new)
   {
      int cur = old;
      if (__atomic_compare_exchange_n(&v->counter, &cur, new, false,
                  __ATOMIC_SEQ_CST,
                  __ATOMIC_RELAXED))
-->      return old;
      return cur;
   }

does generate code that's somewhat better:

  a0:   ba 2a 00 00 00          mov    $0x2a,%edx
  a5:   b8 17 00 00 00          mov    $0x17,%eax
  aa:   c7 44 24 fc 17 00 00    movl   $0x17,-0x4(%rsp)
  b1:   00 
  b2:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
  b6:   ba 17 00 00 00          mov    $0x17,%edx
  bb:   0f 45 d0                cmovne %eax,%edx
  be:   89 d0                   mov    %edx,%eax
  c0:   c3                      retq   

However, given the fact that old == cur must hold true if the CMPXCHG
succeeded,  lines b6-be are redundant. Note that the unnecessary store (line
aa) is still there.

Note that clang does somewhat better than gcc:

  50:   b9 2a 00 00 00          mov    $0x2a,%ecx
  55:   b8 17 00 00 00          mov    $0x17,%eax
  5a:   f0 0f b1 0f             lock cmpxchg %ecx,(%rdi)
  5e:   c3                      retq   
  5f:   90                      nop

See the attached example code which should be compiled to a .s file.

This is the case for all of:

    gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)
    gcc version 6.0.0 20160219 (Red Hat Cross 6.0.0-0.1) (GCC)
    gcc version 4.8.5 20150623 (Red Hat 4.8.5-2.x) (GCC)

Reply via email to