[Bug target/68908] inefficient code for _Atomic operations

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc64                   |
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-15
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |jsm28 at gcc dot gnu.org,
                   |                            |mpolacek at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org
            Summary|inefficient code for an     |inefficient code for
                   |atomic preincrement on      |_Atomic operations
                   |powerpc64le                 |
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek ---
This doesn't seem to be ppc64le-specific in any way, and it doesn't affect just
preincrement.  Try:
typedef _Atomic int AI;
AI i;

void
fn1 (AI * ai)
{
  ++*ai;
}

void
fn2 (AI * ai)
{
  (*ai)++;
}

void
fn3 (AI * ai)
{
  *ai += 6;
}

void
fn4 (void)
{
  ++i;
}

void
fn5 (void)
{
  i++;
}

void
fn6 (void)
{
  i += 2;
}
and you'll see that even on x86_64-linux all of the sequences use a generic
compare-and-swap (CAS) loop instead of __atomic_fetch_add etc.

The comment above build_atomic_assign even says this:
"Also note that the compiler is simply issuing the generic form of the atomic
operations."

So the question is: should we add smarts to the FE to optimize these cases
already when emitting them (this would be similar to what omp-low.c does when
expanding #pragma omp atomic, see:
  /* When possible, use specialized atomic update functions.  */
  if ((INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
      && store_bb == single_succ (load_bb)
      && expand_omp_atomic_fetch_op (load_bb, addr,
                                     loaded_val, stored_val, index))
    return;
), or should we add some pattern matching in some pass that would try to
detect these rather complicated patterns like:
  <bb 2>:
  _5 = __atomic_load_4 (ai_3(D), 5);
  _6 = (int) _5;
  D.1768 = _6;

  <bb 3>:
  # prephitmp_17 = PHI <_6(2), pretmp_16(4)>
  _9 = prephitmp_17 + 1;
  _10 = (unsigned int) _9;
  _12 = __atomic_compare_exchange_4 (ai_3(D), &D.1768, _10, 0, 5, 5);
  if (_12 != 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  pretmp_16 = D.1768;
  goto <bb 3>;

(with the casts in there optional) and convert those to the more efficient
__atomic_* calls where possible?  Note that one issue is that the pattern
involves non-SSA loads/stores (the D.1768 variable above), so we'd need to
prove that the variable is used only in those two places and nowhere else.
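
For reference, the whole retry loop above is equivalent to a single fetch-op
builtin call; a sketch of the desired replacement (not real dump output) would
be just:

  __atomic_fetch_add_4 (ai_3(D), 1, 5);

(the result of ++*ai is unused here, so __atomic_fetch_add_4 is enough; if the
updated value were needed, __atomic_add_fetch_4 would be used instead).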

[Bug target/68908] inefficient code for _Atomic operations

2015-12-15 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

--- Comment #4 from Richard Henderson ---
I think we should handle this in the front end rather than with
quite complex pattern matching.

If we want to do any complex logic with atomics in the middle-end,
we should change their representation so that we don't have to
struggle with a sequence of builtins.  That is clearly a subject
for GCC 7 at minimum.

[Bug target/68908] inefficient code for _Atomic operations

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #3 from Marek Polacek ---
Inviting Richard and Andrew to comment.
I'm afraid the pattern-matching approach would be quite fragile, but as Jakub
pointed out elsewhere, it would have the advantage that even user-written code
with a manual atomic_load and compare_and_swap could be optimized into the
fetch_and_* patterns.
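
As an illustration of that last point, hand-written code along these lines (a
hypothetical example, not taken from the report) produces essentially the same
load/compare-exchange shape as the front end's expansion, so the same matcher
could collapse it into a single __atomic_fetch_add (p, 6, __ATOMIC_SEQ_CST):

/* Hypothetical user-written retry loop equivalent to a fetch-add of 6.  */
int
add_six (int *p)
{
  int old = __atomic_load_n (p, __ATOMIC_SEQ_CST);
  while (!__atomic_compare_exchange_n (p, &old, old + 6, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
    ;  /* On failure, old is refreshed with the current value; retry.  */
  return old;  /* value seen before the add, as fetch_add would return */
}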