On Wed, Sep 03, 2025 at 11:45:31AM +0100, Matthew Malcomson wrote:
> Ah -- to be honest we never considered this seriously due to how we came to
> this work.
> 
> The implicit reason in how we made the decision is:
> - Using pattern matching would mean libstdc++ will be using the CAS loop to
> implement these new C++ methods.
> - We've seen other compilers have a lot of trouble recognising CAS loops in
> that context and turning that into a builtin (NVC++ had trouble with
> suboptimal codegen due to finding it hard to pattern-match the libstdc++
> floating point fetch_add and fetch_sub CAS loop).
> - Hence we'd like the builtins so that libstdc++ can use them.  That should
> mean that it's easier for other compilers to generate good code.
> 
> 
> Attempting to distill what the difficulty was w.r.t. recognising CAS loops
> in libstdc++ (the motivating example for us):
> - The load and compare exchange used in CAS loops there are behind a local
> function call to add some assertions and type-safety.  Pattern matching this
> can get quite difficult.
> - These C++ operations will also have floating point variants, and CAS loops
> written to emulate them would require handling the floating point semantics
> (and exception information) "just right" in order to be able to be replaced
> by the relevant operation.
> 
> Does that sound reasonable?

No, I don't really see what is so hard about pattern matching this (obviously
you want to pattern match it after inlining, not before).
I think we have far more complicated pattern matchers in gcc already.

int
foo (int *p, int i)
{
  int o = __atomic_load_n (p, __ATOMIC_RELAXED);
  int q;
  do
    q = o < i ? o : i;
  while (__atomic_compare_exchange_n (p, &o, q, 1, __ATOMIC_RELAXED,
                                      __ATOMIC_RELAXED));
  return o;
}

That is roughly
  _1 = __atomic_load_4 (p_6(D), 0);
  _2 = (int) _1;

  <bb 3> [local count: 1073741824]:
  # o_16 = PHI <_2(2), _15(3)>
  q_9 = MIN_EXPR <i_8(D), o_16>;
  q.1_3 = (unsigned int) q_9;
  _11 = (unsigned int) o_16;
  _12 = .ATOMIC_COMPARE_EXCHANGE (p_6(D), _11, q.1_3, 260, 0, 0);
  _13 = IMAGPART_EXPR <_12>;
  _14 = REALPART_EXPR <_12>;
  _15 = (int) _14;
  if (_13 != 0)
    goto <bb 3>; [99.96%]
  else
    goto <bb 4>; [0.04%]
Obviously for unsigned types the casts won't be there, so the matcher
needs to be a little bit flexible, but not by much (allow casts to the same
precision, and for floating point perhaps VIEW_CONVERT_EXPRs).  And sure, it
should look at the memory model flags and figure out whether the planned
replacement is compatible with them.
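
For reference, a minimal sketch of the two source-level loop shapes such a
matcher would need to cope with: an integer fetch-min and a floating point
fetch-add done on the 32-bit representation.  The function names
fetch_min_cas and fetch_add_float_cas are made up for illustration and are
not taken from libstdc++.  Unlike foo above, both loops retry while the
compare-exchange fails, which is what an actual fetch-and-op needs.  The
memcpy calls stand in for the bit reinterpretation; that they end up as
something like a VIEW_CONVERT_EXPR in GIMPLE is an assumption here, not
something checked against a dump.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomic fetch-min as a CAS loop.  On failure the
   builtin writes the current value of *p into o, so the minimum is
   recomputed from fresh data on every retry.  Returns the old value.  */
static int
fetch_min_cas (int *p, int i)
{
  int o = __atomic_load_n (p, __ATOMIC_RELAXED);
  int q;
  do
    q = o < i ? o : i;
  while (!__atomic_compare_exchange_n (p, &o, q, 1, __ATOMIC_RELAXED,
                                       __ATOMIC_RELAXED));
  return o;
}

/* Hypothetical helper: float fetch-add as a CAS loop over the 32-bit
   object representation (assumes float is 32 bits wide).  Returns the
   old value as a float.  */
static float
fetch_add_float_cas (uint32_t *p, float x)
{
  uint32_t obits = __atomic_load_n (p, __ATOMIC_RELAXED);
  uint32_t nbits;
  float o, n;
  do
    {
      memcpy (&o, &obits, sizeof o);   /* bits -> float */
      n = o + x;
      memcpy (&nbits, &n, sizeof n);   /* float -> bits */
    }
  while (!__atomic_compare_exchange_n (p, &obits, nbits, 1,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED));
  /* On success obits is left untouched, so o is still the old value.  */
  return o;
}
```

Both use the weak compare-exchange with relaxed ordering on success and
failure, so any replacement would have to be valid for exactly that
combination of memory model flags.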

        Jakub
