On Fri, 19 Jun 2020 09:33:44 +0200 David Hildenbrand <da...@redhat.com> wrote:
> On 18.06.20 01:56, Halil Pasic wrote: > > On Tue, 16 Jun 2020 08:33:33 +0200 > > Cornelia Huck <coh...@redhat.com> wrote: > > > >>> #define atomic_cmpxchg__nocheck(ptr, old, new) ({ > >>> \ > >>> > >>> typeof_strip_qual(*ptr) _old = (old); > >>> \ > >>> > >>> (void)__atomic_compare_exchange_n(ptr, &_old, new, false, > >>> \ > >>> > >>> __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); > >>> \ > >>> > >>> _old; > >>> \ > >>> > >>> }) > >>> > >>> ind_old is copied into _old in the macro. Instead of doing the copy from > >>> the > >>> register the compiler reloads the value from memory. The result is that > >>> _old > >>> and ind_old end up having different values. _old in r1 with the bits set > >>> already and ind_old in r10 with the bits cleared. _old gets updated by CS > >>> and matches ind_old afterwards - both with the bits being 0. So the != > >>> compare is false and the loop is left without having set any bits. > >>> > >>> > >>> Paolo (to), > >>> I am asking myself if it would be safer to add a barrier or something like > >>> this in the macros in include/qemu/atomic.h. > >> > >> I'm also wondering whether this has been seen on other architectures as > >> well? There are also some callers in non-s390x code, and dealing with > >> this in common code would catch them as well. > > > > Quite a bunch of users use something like old = atomic_read(..), where > > atomic_read is documented as in docs/devel/atomics.rst: > > - ``atomic_read()`` and ``atomic_set()``; these prevent the compiler from > > optimizing accesses out of existence and creating unsolicited > > accesses, but do not otherwise impose any ordering on loads and > > stores: both the compiler and the processor are free to reorder > > them. > > > > Maybe I should have used that instead of volatile, but my problem was > > that I didn't fully understand what atomic_read() does, and if it does > > more than we need. I found the documentation just now. > > IIRC, atomic_read() is the right way of doing it, at least in the > kernel. In kernel I would use READ_ONCE. https://elixir.bootlin.com/linux/v5.8-rc1/source/include/asm-generic/atomic.h#L171 In this case we are not manipulating an atomic variable. For uint8_t that boils down to an access through a volatile pointer. And that is what I did :). > I use such a loop in QEMU in > > https://lkml.kernel.org/r/20200610115419.51688-2-da...@redhat.com > > But reading docs/devel/atomics.rst:"Comparison with Linux kernel > primitives" I do wonder if that is sufficient. > > Any experts around? > IMHO what we want here is READ_ONCE, i.e. volatile access, and not necessarily atomic access. But I suppose atomic access implies volatile access (the C11 standard refers to atomic_load as C atomic_load(volatile A *object)). Because QEMU seems to use atomic_read() in such situations, and does not have READ_ONCE, for me atomic_read would also do. Regards, Halil