On Tuesday, May 17, 2011 9:46:34 am Andriy Gapon wrote:
> on 17/05/2011 14:56 John Baldwin said the following:
> > On 5/17/11 4:03 AM, Andriy Gapon wrote:
> >> Couldn't [Shouldn't] the whole:
> >>
> >>>>> 	/* Ensure we have up-to-date values. */
> >>>>> 	atomic_add_acq_int(&smp_rv_waiters[0], 1);
> >>>>> 	while (smp_rv_waiters[0] < smp_rv_ncpus)
> >>>>> 		cpu_spinwait();
> >>
> >> be just replaced with:
> >>
> >> 	rmb();
> >>
> >> Or a proper MI function that does just a read memory barrier, if rmb() is
> >> not that.
> >
> > No, you could replace it with:
> >
> > 	atomic_add_acq_int(&smp_rv_waiters[0], 1);
>
> What about
> 	(void)atomic_load_acq(&smp_rv_waiters[0]);
>
> In my opinion that should ensure that the hardware must post the latest
> value from a master CPU to the memory of smp_rv_waiters[0], and that a
> slave CPU gets it from there.  And also, because of the memory barriers
> inserted by store_rel on the master CPU and load_acq on the slave CPU, the
> latest values of all other smp_rv_* fields should become visible to the
> slave CPU.
No, it doesn't quite work that way.  It wouldn't work on Alpha, for example.
load_acq is just a load with a memory barrier to order other loads after it.
It is still free to load stale data.  Only a read-modify-write operation
would actually block until it could access an up-to-date value.

> > The key being that atomic_add_acq_int() will block (either in hardware
> > or software) until it can safely perform the atomic operation.  That
> > means waiting until the write setting smp_rv_waiters[0] to 0 by the
> > rendezvous initiator is visible to the current CPU.
> >
> > On some platforms a write by one CPU may not post instantly to other
> > CPUs (e.g. it may sit in a store buffer).  That is fine so long as an
> > attempt to update that value atomically (using cas or a
> > conditional-store, etc.) fails.  For those platforms, the atomic(9) API
> > is required to spin until it succeeds.
> >
> > This is why the mtx code spins if it can't set MTX_CONTESTED, for
> > example.
>
> Thank you for the great explanation!
> Taking sparc64 as an example, I think that atomic_load_acq uses a
> degenerate cas call, which should take care of hardware synchronization.

sparc64's load_acq() is stronger than the MI effect of load_acq().  On ia64,
which uses ld.acq, or on Alpha (originally), which used a membar and a
simple load, the guarantees are only what I stated above (and would not be
sufficient).  Note that Alpha borrowed heavily from MIPS, and the MIPS
atomic implementation is mostly identical to the old Alpha one (using
conditional stores, etc.).
The MIPS atomic_load_acq():

#define	ATOMIC_STORE_LOAD(WIDTH)			\
static __inline  uint##WIDTH##_t			\
atomic_load_acq_##WIDTH(__volatile uint##WIDTH##_t *p)	\
{							\
	uint##WIDTH##_t v;				\
							\
	v = *p;						\
	mips_sync();					\
	return (v);					\
}							\

-- 
John Baldwin
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"