On 9/3/25 15:23, Jakub Jelinek wrote:

Does that sound reasonable?

No, I don't really get what is so hard about pattern matching this (obviously you
want to pattern match it after inlining, not before).
I think we have far more complicated pattern matchers in gcc already.

int
foo (int *p, int i)
{
   int o = __atomic_load_n (p, __ATOMIC_RELAXED);
   int q;
   do
     q = o < i ? o : i;
   while (__atomic_compare_exchange_n (p, &o, q, 1, __ATOMIC_RELAXED,
                                       __ATOMIC_RELAXED));
   return o;
}

That is roughly
   _1 = __atomic_load_4 (p_6(D), 0);
   _2 = (int) _1;

   <bb 3> [local count: 1073741824]:
   # o_16 = PHI <_2(2), _15(3)>
   q_9 = MIN_EXPR <i_8(D), o_16>;
   q.1_3 = (unsigned int) q_9;
   _11 = (unsigned int) o_16;
   _12 = .ATOMIC_COMPARE_EXCHANGE (p_6(D), _11, q.1_3, 260, 0, 0);
   _13 = IMAGPART_EXPR <_12>;
   _14 = REALPART_EXPR <_12>;
   _15 = (int) _14;
   if (_13 != 0)
     goto <bb 3>; [99.96%]
   else
     goto <bb 4>; [0.04%]
Obviously, for unsigned types it will be without the casts in there, so it
needs to be a little bit flexible, but not so much (allow casts to the same
precision, for floating point perhaps VCEs).  And sure, it should look at
the memory model flags and figure out if the planned replacement is
compatible with those.

         Jakub


Ok -- TBH I don't have any extra details on this argument right now, and your point on its feasibility seems quite convincing. (Timezones are slowing communication with the other compiler team about why it's hard in their case -- I may get more information later.)

The other arguments towards a builtin I know of are less of a requirement and more about helping keep code standardised throughout the ecosystem:

- The atomic builtins were designed to match the requirements of C++11. It would seem natural to me to keep matching the C++ standard as it evolves. (In this case that would also give users writing C a standard interface to this functionality.)

- Specifically for fetch_min/fetch_max, the paper that proposed standardising them discusses the forms of CAS loop that might be written and how the semantics of two of them are subtly different (section 5 of https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p0493r5.pdf). If we don't provide a user interface, then each user writing C would have to deal with the subtleties of atomic synchronisation vs aggressive optimisations themselves, increasing the chance of mistakes.
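
To illustrate the kind of subtlety meant here, below is a minimal sketch of two CAS-loop forms a C user might plausibly write by hand today. The names fetch_min_rmw and fetch_min_cond are hypothetical, the relaxed memory order is just an assumption for the example, and the code is not taken from the paper. The first form always ends with a successful compare-exchange, so it always behaves as a read-modify-write. The second skips the store when the value in memory is already small enough, so in that case it is only a load.

```c
/* Hypothetical helper: always completes a compare-exchange, even when
   the stored value is already the minimum.  The loop retries while the
   (weak) compare-exchange fails; on failure O is refreshed with the
   current contents of *P.  */
static int
fetch_min_rmw (int *p, int i)
{
  int o = __atomic_load_n (p, __ATOMIC_RELAXED);
  int q;
  do
    q = o < i ? o : i;
  while (!__atomic_compare_exchange_n (p, &o, q, 1, __ATOMIC_RELAXED,
                                       __ATOMIC_RELAXED));
  return o;
}

/* Hypothetical helper: only attempts the compare-exchange while I is
   smaller than the value last observed in *P.  If *P is already <= I,
   no store is performed at all.  */
static int
fetch_min_cond (int *p, int i)
{
  int o = __atomic_load_n (p, __ATOMIC_RELAXED);
  while (i < o
         && !__atomic_compare_exchange_n (p, &o, i, 1, __ATOMIC_RELAXED,
                                          __ATOMIC_RELAXED))
    ;
  return o;
}
```

Both return the previously observed value and leave the same final value in memory in single-threaded use; where they differ is in whether a write happens in the "no update needed" case, which matters once stronger memory orders than relaxed are involved.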


MM
