On Tue, 16 Jun 2020 07:58:53 +0200 Christian Borntraeger <borntrae...@de.ibm.com> wrote:
> On 16.06.20 06:50, Halil Pasic wrote:
> > The atomic_cmpxchg() loop is broken because we occasionally end up with
> > old and _old having different values (a legit compiler can generate code
> > that accesses *ind_addr again to pick up a value for _old instead of
> > using the value of old that was already fetched according to the
> > rules of the abstract machine). This means the underlying CS instruction
> > may use a different old (_old) than the one we intended to use if
> > atomic_cmpxchg() performed the xchg part.
> >
> > Let us use volatile to force the rules of the abstract machine for
> > accesses to *ind_addr. Let us also rewrite the loop so that the
> > new old is used to compute the new desired value if the xchg part
> > is not performed.
> >
> > Signed-off-by: Halil Pasic <pa...@linux.ibm.com>
> > Reported-by: Andre Wild <andre.wi...@ibm.com>
> > Fixes: 7e7494627f ("s390x/virtio-ccw: Adapter interrupt support.")
> > ---
> >  hw/s390x/virtio-ccw.c | 18 ++++++++++--------
> >  1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> > index c1f4bb1d33..3c988a000b 100644
> > --- a/hw/s390x/virtio-ccw.c
> > +++ b/hw/s390x/virtio-ccw.c
> > @@ -786,9 +786,10 @@ static inline VirtioCcwDevice *to_virtio_ccw_dev_fast(DeviceState *d)
> >  static uint8_t virtio_set_ind_atomic(SubchDev *sch, uint64_t ind_loc,
> >                                       uint8_t to_be_set)
> >  {
> > -    uint8_t ind_old, ind_new;
> > +    uint8_t expected, actual;
> >      hwaddr len = 1;
> > -    uint8_t *ind_addr;
> > +    /* avoid multiple fetches */
> > +    uint8_t volatile *ind_addr;
> >
> >      ind_addr = cpu_physical_memory_map(ind_loc, &len, true);
> >      if (!ind_addr) {
> > @@ -796,14 +797,15 @@ static uint8_t virtio_set_ind_atomic(SubchDev *sch, uint64_t ind_loc,
> >                       __func__, sch->cssid, sch->ssid, sch->schid);
> >          return -1;
> >      }
> > +    actual = *ind_addr;
> >      do {
> > -        ind_old = *ind_addr;
>
> to make things easier to understand.
> Adding a barrier in here also fixes the issue.
> Reasoning follows below:
>
> > -        ind_new = ind_old | to_be_set;
>
> with an analysis from Andreas (cc)
>
> > #define atomic_cmpxchg__nocheck(ptr, old, new)    ({                    \
> >     typeof_strip_qual(*ptr) _old = (old);                               \
> >     (void)__atomic_compare_exchange_n(ptr, &_old, new, false,           \
> >                               __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
> >     _old;                                                               \
> > })
>
> ind_old is copied into _old in the macro. Instead of doing the copy from the
> register, the compiler reloads the value from memory. The result is that _old
> and ind_old end up having different values: _old is in r1 with the bits
> already set, and ind_old is in r10 with the bits cleared. _old gets updated
> by CS and matches ind_old afterwards, both with the bits being 0. So the !=
> compare is false and the loop is left without having set any bits.
>
> Paolo (to),
> I am asking myself if it would be safer to add a barrier or something like
> this in the macros in include/qemu/atomic.h.

I'm also wondering whether this has been seen on other architectures as
well. There are also some callers in non-s390x code, and dealing with this
in common code would catch them as well.

>
> > -    } while (atomic_cmpxchg(ind_addr, ind_old, ind_new) != ind_old);
> > -    trace_virtio_ccw_set_ind(ind_loc, ind_old, ind_new);
> > -    cpu_physical_memory_unmap(ind_addr, len, 1, len);
> > +        expected = actual;
> > +        actual = atomic_cmpxchg(ind_addr, expected, expected | to_be_set);
> > +    } while (actual != expected);
> > +    trace_virtio_ccw_set_ind(ind_loc, actual, actual | to_be_set);
> > +    cpu_physical_memory_unmap((void *)ind_addr, len, 1, len);
> >
> > -    return ind_old;
> > +    return actual;
> >  }
> >
> >  static void virtio_ccw_notify(DeviceState *d, uint16_t vector)
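For illustration only, here is a minimal standalone sketch of the loop shape the patch moves to, assuming GCC or Clang. It is not QEMU code: cmpxchg_u8() and set_ind_atomic() are hypothetical names, and cmpxchg_u8() calls the __atomic_compare_exchange_n() builtin directly with the same return convention as the atomic_cmpxchg__nocheck() macro quoted above (it returns the value found in memory). The byte is read once through a volatile pointer before the loop, and after a failed exchange the next desired value is computed from the value the compare-and-swap returned, not from a fresh read of *ind_addr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for atomic_cmpxchg() on a single byte.
 * Returns the value found at *ptr; it equals `old` iff the exchange
 * was performed.
 */
static uint8_t cmpxchg_u8(volatile uint8_t *ptr, uint8_t old, uint8_t new_val)
{
    uint8_t found = old;

    /* On failure the builtin writes the current contents of *ptr into found. */
    (void)__atomic_compare_exchange_n(ptr, &found, new_val, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return found;
}

/*
 * Hypothetical standalone version of the patched loop: atomically OR
 * to_be_set into *ind_addr and return the value the byte held just
 * before our update.
 */
static uint8_t set_ind_atomic(volatile uint8_t *ind_addr, uint8_t to_be_set)
{
    uint8_t expected, actual;

    assert(ind_addr != NULL);

    /* volatile: the compiler must perform exactly this one load. */
    actual = *ind_addr;
    do {
        expected = actual;
        /*
         * If the byte changed under us, cmpxchg_u8() returns the new
         * contents and the next iteration ORs into that value.
         */
        actual = cmpxchg_u8(ind_addr, expected, expected | to_be_set);
    } while (actual != expected);

    return actual;
}
```

With ind = 0x01, set_ind_atomic(&ind, 0x04) returns 0x01 and leaves ind at 0x05; a second call with the same bit returns 0x05 and leaves ind unchanged.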