https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc64                   |
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-15
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |jsm28 at gcc dot gnu.org,
                   |                            |mpolacek at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org
            Summary|inefficient code for an     |inefficient code for
                   |atomic preincrement on      |_Atomic operations
                   |powerpc64le                 |
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Doesn't seem to be ppc64le-specific in any way, and isn't limited to
preincrement.
Try:
typedef _Atomic int AI;
AI i;

void
fn1 (AI * ai)
{
  ++*ai;
}

void
fn2 (AI * ai)
{
  (*ai)++;
}

void
fn3 (AI * ai)
{
  *ai += 6;
}

void
fn4 (void)
{
  ++i;
}

void
fn5 (void)
{
  i++;
}

void
fn6 (void)
{
  i += 2;
}
and you'll see, even on x86_64-linux, that all of these sequences are expanded
through a generic compare-and-swap loop instead of __atomic_fetch_add etc.

The comment above build_atomic_assign even says this:
"Also note that the compiler is simply issuing the generic form of the atomic
operations."

So the question is: should we add smarts to the FE to optimize these cases
already when emitting them (this would be similar to what omp-low.c does when
expanding #pragma omp atomic, see:
          /* When possible, use specialized atomic update functions.  */
          if ((INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
              && store_bb == single_succ (load_bb)
              && expand_omp_atomic_fetch_op (load_bb, addr,
                                             loaded_val, stored_val, index))
            return;
), or should we add some pattern matching in some pass that would try to detect
these rather complicated patterns like:
  <bb 2>:
  _5 = __atomic_load_4 (ai_3(D), 5);
  _6 = (int) _5;
  D.1768 = _6;

  <bb 3>:
  # prephitmp_17 = PHI <_6(2), pretmp_16(4)>
  _9 = prephitmp_17 + 1;
  _10 = (unsigned int) _9;
  _12 = __atomic_compare_exchange_4 (ai_3(D), &D.1768, _10, 0, 5, 5);
  if (_12 != 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  pretmp_16 = D.1768;
  goto <bb 3>;

(with the casts in there optional) and convert those to the more efficient
__atomic_* calls where possible?  Note that one issue is that the pattern
involves non-SSA loads/stores (the D.1768 variable above), so we'd need to
prove that the variable is used only in those two places and nowhere else.
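
To make that pattern a bit more concrete, the GIMPLE above corresponds roughly
to the following hand-written source (a sketch only; 'old' plays the role of
the D.1768 temporary):

void
fn1_expanded (AI *ai)
{
  int old = __atomic_load_n ((int *) ai, __ATOMIC_SEQ_CST);
  /* Retry until the compare-exchange succeeds; on failure 'old' is
     refreshed with the value currently stored in *ai.  */
  while (!__atomic_compare_exchange_n ((int *) ai, &old, old + 1,
                                       0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
    ;
}

The rewrite target would then be the single __atomic_fetch_add call sketched
above, which is only a valid transformation when 'old' (i.e. D.1768) has no
uses outside the load and the compare-exchange.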
