https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93177
--- Comment #10 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Matt Emmerton from comment #9)
> > > __sync()
> > > __isync()
> > > __lwsync()
> >
> > The sync intrinsics need to be tied to some other code. A volatile asm with
> > a "memory" clobber is not good enough, in many cases.
>
> We use these in our internal mutex and atomic implementations, and the
> resulting sequences are carefully scrutinized.

You have to check it after *every build* then, in general :-/

> > > __lwarx()
> > > __ldarx()
> > > __stwcx()
> > > __stdcx()
> >
> > The compiler can always insert memory accesses in between those two, if you
> > have them as separate intrinsics (and it will, simply stack accesses for
> > temporaries will do, already). If those accesses hit the same reservation
> > granule as the larx/stcx. uses, you lose.
> >
> > You need to write the whole sequence in one piece of assembler code.
>
> I would argue that the compiler should be smart enough to realize that these
> are part of a decomposed atomic operation, and avoid arbitrary instruction
> injection.

But this is impossible, it is contrary to all optimisation goals we have.
Yes, it could perhaps work with -O0.

> > > __protected_stream_set()
> > > __protected_stream_count()
> > > __protected_stream_count_depth() // currently not implemented in gcc
> > > __protected_stream_go()
> >
> > Those are pretty specific to CBE I think?
>
> No. They are implemented on POWER5 and above (ISA 2.02), and are useful in
> managing cache prefetch behaviour.

Open a separate feature request for these then, please.

> > > The implementation of stwcx() and stdcx() need revision on PPC.
> > > As I understand it, there is no need the mfocrf instruction nor the
> > > mask-and-shift on result.
> >
> > How else would you output the CR0.EQ bit?
>
> There is no need to copy CR0 to a GPR - branch instructions such as BNE can
> operate on CR0 directly.

You cannot write anything that maps to a CR field directly.
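
For illustration only, here is a minimal sketch of what "the whole sequence in one piece of assembler code" can look like. The helper name atomic_add_return is hypothetical and not part of the bug report or of GCC. On PowerPC the lwarx/stwcx. pair sits inside a single asm statement, so the compiler cannot place stack accesses between the two instructions, and the retry branch tests CR0 inside the asm, so no mfocrf is needed. On other targets the sketch falls back to GCC's __atomic_add_fetch builtin. No memory barriers are included, so this is a relaxed operation.

```c
#include <assert.h>

/* Hypothetical helper: atomically add v to *p and return the new value.
   Relaxed ordering only; add sync/lwsync/isync as your algorithm requires. */
static inline int atomic_add_return(int *p, int v)
{
    assert(p);
#if defined(__powerpc__) || defined(__powerpc64__)
    int t;
    __asm__ __volatile__(
        "1: lwarx   %0,0,%2\n"  /* load word and take a reservation */
        "   add     %0,%0,%3\n" /* compute the new value */
        "   stwcx.  %0,0,%2\n"  /* store if the reservation still holds */
        "   bne-    1b\n"       /* CR0.EQ clear: reservation lost, retry */
        : "=&r"(t), "+m"(*p)
        : "r"(p), "r"(v)
        : "cc");                /* stwcx. writes CR0 */
    return t;
#else
    /* Fallback for non-PowerPC targets: GCC builtin, same semantics. */
    return __atomic_add_fetch(p, v, __ATOMIC_RELAXED);
#endif
}
```

Because the load, the update, the conditional store and the branch form one asm statement, the only memory access between taking and using the reservation is the one written here.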