https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc64                   |
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-15
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |jsm28 at gcc dot gnu.org,
                   |                            |mpolacek at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org
            Summary|inefficient code for an     |inefficient code for
                   |atomic preincrement on      |_Atomic operations
                   |powerpc64le                 |
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Doesn't seem to be ppc64le specific in any way, and doesn't affect just
preincrement.  Try:

typedef _Atomic int AI;
AI i;

void fn1 (AI * ai) { ++*ai; }
void fn2 (AI * ai) { (*ai)++; }
void fn3 (AI * ai) { *ai += 6; }
void fn4 (void) { ++i; }
void fn5 (void) { i++; }
void fn6 (void) { i += 2; }

and you'll see even on x86_64-linux that all the sequences use the generic
CAS instructions instead of __atomic_fetch_add etc.  The comment above
build_atomic_assign even says this: "Also note that the compiler is simply
issuing the generic form of the atomic operations."

So, the question is: should we add smarts to the FE to optimize these cases
already when emitting them (this would be similar to what omp-low.c does when
expanding #pragma omp atomic, see:

  /* When possible, use specialized atomic update functions.  */
  if ((INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
      && store_bb == single_succ (load_bb)
      && expand_omp_atomic_fetch_op (load_bb, addr,
                                     loaded_val, stored_val, index))
    return;

), or should we add some pattern matching in some pass that would try to
detect these rather complicated patterns like:

  <bb 2>:
  _5 = __atomic_load_4 (ai_3(D), 5);
  _6 = (int) _5;
  D.1768 = _6;

  <bb 3>:
  # prephitmp_17 = PHI <_6(2), pretmp_16(4)>
  _9 = prephitmp_17 + 1;
  _10 = (unsigned int) _9;
  _12 = __atomic_compare_exchange_4 (ai_3(D), &D.1768, _10, 0, 5, 5);
  if (_12 != 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  pretmp_16 = D.1768;
  goto <bb 3>;

(with the casts in there optional) and convert those to the more efficient
__atomic_* calls if possible?  Note one issue is that the pattern involves
non-SSA loads/stores (the D.1768 var above) and we'd need to prove that the
var is used only in those two places and nowhere else.