[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily

2016-06-20 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #6 from Andrew Pinski  ---
Dup of bug 66867.

*** This bug has been marked as a duplicate of bug 66867 ***

[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily

2016-05-19 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #5 from ktkachov at gcc dot gnu.org ---
I've looked at RTL dce a bit and the reason it doesn't remove the store is
because the MEM rtx used in the atomic instruction pattern is volatile and also
has the alias set ALIAS_SET_MEMORY_BARRIER associated with it.

When the dse pass sees either of those it inserts a "wild read" into its
calculations, indicating that a memory read may have happened from potentially
any location, thus the stack store is potentially not dead and can't be
eliminated.

I've confirmed this by hacking get_builtin_sync_mem in builtins.c (which
creates that MEM rtx) so that it neither sets MEM_VOLATILE nor gives the MEM
the ALIAS_SET_MEMORY_BARRIER alias set. With those changes I see the stack
store being eliminated by dse.

So the same information that's used to prevent the compiler from moving memory
instructions across these atomic operations also prevents it from eliminating
the preceding stack store.

[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily

2016-05-14 Thread ghalliday at hpccsystems dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

ghalliday at hpccsystems dot com changed:

   What|Removed |Added

 CC||ghalliday at hpccsystems dot com

--- Comment #4 from ghalliday at hpccsystems dot com ---
I have also hit this problem on x86 and Power8, and am adding a comment about
its significance.  Although it seems a minor bug, it can have a very
significant effect on performance.

I have code which uses __sync_bool_compare_and_swap() to implement a lock free
linked list (inside a memory manager).  Replacing it with
__atomic_compare_exchange_n should allow better performance - by avoiding
reloading the expected value (and also selecting a less restrictive memory
order).
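As a sketch of the difference described above (the node type and function
names are hypothetical, not from the reporter's memory manager): with the
__sync builtin the expected value must be reloaded explicitly on every retry,
while __atomic_compare_exchange_n writes the observed value back into the
expected-value variable on failure and lets the failure path use a relaxed
memory order.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node { struct node *next; } node_t;

/* Old-style: __sync_bool_compare_and_swap only reports success or
   failure, so the head must be reloaded at the top of every retry,
   and every attempt carries full-barrier semantics. */
static void push_sync(node_t **head, node_t *n)
{
    node_t *old;
    do {
        old = *head;                    /* explicit reload each iteration */
        n->next = old;
    } while (!__sync_bool_compare_and_swap(head, old, n));
}

/* New-style: on failure the builtin stores the value it observed into
   `old`, so no reload is needed, and the failure order can be relaxed. */
static void push_atomic(node_t **head, node_t *n)
{
    node_t *old = __atomic_load_n(head, __ATOMIC_RELAXED);
    do {
        n->next = old;
    } while (!__atomic_compare_exchange_n(head, &old, n, true,
                                          __ATOMIC_RELEASE,
                                          __ATOMIC_RELAXED));
}
```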

However on one example test query, the new code using
__atomic_compare_exchange_n is over 40% *slower* on x86.  It is also slower on
Power8 despite using a lwsync instead of sync in the generated code.

[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily

2016-05-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Bill Schmidt  changed:

   What|Removed |Added

 Target|x86_64, aarch64 |x86_64, aarch64, powerpc64*

--- Comment #3 from Bill Schmidt  ---
Also on powerpc64:

.file   "gorp.c"
.abiversion 2
.section ".text"
.align 2
.p2align 4,,15
.globl test_atomic_cmpxchg
.type   test_atomic_cmpxchg, @function
test_atomic_cmpxchg:
li 9,23
stw 9,-16(1)  <-- unnecessary
sync
.L2:
lwarx 9,0,3
cmpwi 0,9,23
bne 0,.L3
li 9,42
stwcx. 9,0,3
bne 0,.L2
isync
.L3:
blr

[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily

2016-04-28 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Ramana Radhakrishnan  changed:

   What|Removed |Added

 Target||x86_64, aarch64
 CC||ramana at gcc dot gnu.org
  Component|target  |rtl-optimization

--- Comment #1 from Ramana Radhakrishnan  ---
There is an unnecessary store to the stack regardless of the architecture. I
suspect that's just because of a combination of the specification of the
intrinsic and DSE being unable to remove such stores.

For example, on aarch64 with:

 #include <stdbool.h>

#define __always_inline inline __attribute__((always_inline))

typedef struct {
int counter;
} atomic_t;


   static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
   {
      int cur = old;
      if (__atomic_compare_exchange_n(&v->counter, &cur, new, false,
                                      __ATOMIC_SEQ_CST,
                                      __ATOMIC_RELAXED))
         return cur;
      return cur;
   }

   void test_atomic_cmpxchg(atomic_t *counter)
   {
      atomic_cmpxchg(counter, 23, 42);
   }

we get:

sub sp, sp, #16
mov w1, 23
mov w2, 42
str w1, [sp, 12] ---> unneeded
.L3:
ldaxr   w3, [x0]
cmp w3, w1
bne .L4
stlxr   w4, w2, [x0]
cbnz w4, .L3
.L4:
add sp, sp, 16
ret
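A possible source-level workaround (my sketch, not suggested in the report):
__sync_val_compare_and_swap returns the observed value in a register rather
than writing it through a pointer, so no local has to be made addressable and
no stack slot is created. The trade-off is that the __sync builtin implies a
full barrier, giving up the weaker failure ordering that motivated the switch.

```c
typedef struct {
    int counter;
} atomic_t;

/* Workaround sketch: the old value comes back by return value instead
   of through a pointer-to-local, so nothing forces a stack slot. */
static inline int atomic_cmpxchg_noslot(atomic_t *v, int old, int new)
{
    return __sync_val_compare_and_swap(&v->counter, old, new);
}

void test_atomic_cmpxchg_noslot(atomic_t *counter)
{
    atomic_cmpxchg_noslot(counter, 23, 42);
}
```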

[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily

2016-04-28 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Ramana Radhakrishnan  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-04-28
 Ever confirmed|0   |1

--- Comment #2 from Ramana Radhakrishnan  ---
Confirmed then.