https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93177
--- Comment #9 from Matt Emmerton <memmerto at ca dot ibm.com> ---
(In reply to Segher Boessenkool from comment #6)
> (In reply to Matt Emmerton from comment #4)
> > The intrinsics that we would find useful, having used them as provided by
> > the IBM XL C/C++ compiler, are the following:
> >
> > __sync()
> > __isync()
> > __lwsync()
>
> The sync intrinsics need to be tied to some other code.  A volatile asm with
> a "memory" clobber is not good enough, in many cases.

We use these in our internal mutex and atomic implementations, and the
resulting sequences are carefully scrutinized.

> > __lwarx()
> > __ldarx()
> > __stwcx()
> > __stdcx()
>
> The compiler can always insert memory accesses in between those two, if you
> have them as separate intrinsics (and it will, simply stack accesses for
> temporaries will do, already).  If those accesses hit the same reservation
> granule as the larx/stcx. uses, you lose.
>
> You need to write the whole sequence in one piece of assembler code.

I would argue that the compiler should be smart enough to realize that these
are part of a decomposed atomic operation, and avoid arbitrary instruction
injection.

As per my previous update, we use these primitives to implement things that the
builtin __atomic_* functions do not implement.

> > __protected_stream_set()
> > __protected_stream_count()
> > __protected_stream_count_depth() // currently not implemented in gcc
> > __protected_stream_go()
>
> Those are pretty specific to CBE I think?

No.  They are implemented on POWER5 and above (ISA 2.02), and are useful in
managing cache prefetch behaviour.

> > The implementation of stwcx() and stdcx() need revision on PPC.
> > As I understand it, there is no need for the mfocrf instruction nor the
> > mask-and-shift on result.
>
> How else would you output the CR0.EQ bit?

There is no need to copy CR0 to a GPR - branch instructions such as BNE can
operate on CR0 directly.
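
To make the two points above concrete (keeping the larx/stcx. pair inside a
single asm statement, and branching on CR0 directly instead of copying it to a
GPR), here is a minimal sketch.  The function name atomic_fetch_max_u32 and the
choice of a "fetch max" operation are hypothetical and not taken from this
thread; they only stand in for an operation without a dedicated builtin.  No
sync/lwsync/isync barriers are included, so the ordering is relaxed.  On
non-PowerPC targets the sketch falls back to a compare-exchange loop on the
__atomic builtins so that it still compiles.

```c
#include <stdint.h>

/* Hypothetical helper: atomically store max(*p, v), return the previous value.
   Relaxed ordering only; callers would add their own barriers. */
static inline uint32_t atomic_fetch_max_u32(uint32_t *p, uint32_t v)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    uint32_t old;
    /* The whole reservation sequence lives in one asm statement, so the
       compiler cannot place stack accesses between lwarx and stwcx. */
    __asm__ __volatile__(
        "1: lwarx   %0,0,%2\n"   /* load word and take the reservation   */
        "   cmplw   %0,%3\n"     /* unsigned compare: old >= v ?         */
        "   bge-    2f\n"        /* already large enough, skip the store */
        "   stwcx.  %3,0,%2\n"   /* conditional store, result in CR0.EQ  */
        "   bne-    1b\n"        /* branch on CR0 directly, no mfocrf    */
        "2:\n"
        : "=&r"(old), "+m"(*p)
        : "r"(p), "r"(v)
        : "cr0", "memory");
    return old;
#else
    /* Portable fallback using the GCC __atomic builtins. */
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
    while (old < v &&
           !__atomic_compare_exchange_n(p, &old, v, 1,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED))
        ; /* a failed compare-exchange refreshes old; re-check and retry */
    return old;
#endif
}
```

Because the retry branch tests CR0 inside the asm, nothing needs to be
materialised in a GPR; a separate stwcx intrinsic returning a C value would
have to produce that value somehow, which is where the mfocrf and
mask-and-shift come from.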