https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80640
--- Comment #7 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
I've submitted a patch [1] for the missing compiler barrier. However, please note that the original ompi code and the example in comment #3 are wrong. In a pattern like

  while (*foo)
    __atomic_thread_fence(__ATOMIC_ACQUIRE);

I think there are two issues. First, if *foo is a non-atomic, non-volatile object, a concurrent modification from another thread would invoke undefined behavior due to a data race. Second, if the loop is not entered (i.e. *foo is false initially), execution does not encounter the acquire fence at all. More generally, execution never encounters the acquire fence after observing *foo == 0, so subsequent loads are not properly ordered against the load that observed it.

[1]: https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00782.html
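A minimal sketch of one way to address both issues, assuming the same wait-until-zero semantics as the loop above: read the flag with an atomic load (so there is no data race) and move the acquire fence after the loop (so it executes on every path, including when the flag is already 0). The names `busy`, `payload`, `producer`, `wait_for_payload` and `run_demo` are hypothetical and only for illustration; this is not the ompi code or the content of the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: `busy` plays the role of *foo,
   `payload` is ordinary data published by another thread. */
static int busy = 1;     /* only ever accessed through __atomic builtins */
static int payload = 0;  /* plain int, ordered by the release/acquire pair */

/* Producer: write the payload, then clear `busy` with a release store. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    __atomic_store_n(&busy, 0, __ATOMIC_RELEASE);
    return NULL;
}

/* Consumer: spin on an atomic (relaxed) load, which avoids the data race,
   then issue the acquire fence AFTER the loop so it runs on every path,
   including the one where `busy` is already 0 on the first check. */
static int wait_for_payload(void)
{
    while (__atomic_load_n(&busy, __ATOMIC_RELAXED))
        ; /* spin */
    /* The relaxed load that saw 0 is sequenced before this fence, so the
       fence synchronizes with the producer's release store. */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    return payload; /* ordered after the observation of busy == 0 */
}

/* Hypothetical driver: reset the state, start the producer, wait, join. */
static int run_demo(void)
{
    pthread_t t;
    __atomic_store_n(&busy, 1, __ATOMIC_RELAXED);
    payload = 0; /* pthread_create below orders these writes for the new thread */
    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    int value = wait_for_payload();
    pthread_join(t, NULL);
    return value;
}
```

Calling `run_demo()` should return 42 (build with `-pthread`). A common alternative is to make the load inside the loop itself `__ATOMIC_ACQUIRE` and drop the separate fence; the fence-after-loop form is shown here because it stays closest to the original pattern.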