On Mon, Sep 16, 2019 at 01:17:53PM +0800, Boqun Feng wrote:
> Hi Alan,
>
> I spent some time reading this, really helpful! Thanks.
>
> Please see comments below:
>
> On Fri, Sep 06, 2019 at 02:11:29PM -0400, Alan Stern wrote:
> [...]
> > If two memory accesses aren't concurrent then one must execute before
> > the other. Therefore the LKMM decides two accesses aren't concurrent
> > if they can be connected by a sequence of hb, pb, and rb links
> > (together referred to as xb, for "executes before"). However, there
> > are two complicating factors.
> >
> > If X is a load and X executes before a store Y, then indeed there is
> > no danger of X and Y being concurrent. After all, Y can't have any
> > effect on the value obtained by X until the memory subsystem has
> > propagated Y from its own CPU to X's CPU, which won't happen until
> > some time after Y executes and thus after X executes. But if X is a
> > store, then even if X executes before Y it is still possible that X
> > will propagate to Y's CPU just as Y is executing. In such a case X
> > could very well interfere somehow with Y, and we would have to
> > consider X and Y to be concurrent.
> >
> > Therefore when X is a store, for X and Y to be non-concurrent the LKMM
> > requires not only that X must execute before Y but also that X must
> > propagate to Y's CPU before Y executes. (Or vice versa, of course, if
> > Y executes before X -- then Y must propagate to X's CPU before X
> > executes if Y is a store.) This is expressed by the visibility
> > relation (vis), where X ->vis Y is defined to hold if there is an
> > intermediate event Z such that:
> >
> >         X is connected to Z by a possibly empty sequence of
> >         cumul-fence links followed by an optional rfe link (if none of
> >         these links are present, X and Z are the same event),
> >
>
> I wonder whether we could add an optional ->coe or ->fre between X and
> the possibly empty sequence of cumul-fence, similar to the definition
> of ->prop. This makes sense because if we have
>
>         X ->coe X' (and X' is on CPU C)
>
> then X must already have been propagated to C before X' executed,
> according to our operational model:
>
>         "... In particular, it must arrange for the store to be co-later than
>         (i.e., to overwrite) any other store to the same location which has
>         already propagated to CPU C."
>
> In other words, we can define ->vis as:
>
>         let vis = prop ; ((strong-fence ; [Marked] ; xbstar) | (xbstar & int))
>
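
To make the ->coe observation above concrete, here is a minimal
litmus-style sketch; the test name, the event labels X, X' and Y, and
the exists clause are illustrative additions, not something taken from
explanation.txt:

        C coe-vis-sketch

        (*
         * In executions where x ends up 2, P0's store X is coherence-before
         * P1's store X' (X ->coe X').  Per the operational-model text quoted
         * above, X must then have been propagated to P1 before X' executed,
         * and the load Y is po-after X' on the same CPU.
         *)

        {}

        P0(int *x)
        {
                WRITE_ONCE(*x, 1);      /* X  */
        }

        P1(int *x, int *y)
        {
                int r1;

                WRITE_ONCE(*x, 2);      /* X' */
                r1 = READ_ONCE(*y);     /* Y  */
        }

        exists (x=2)
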
Hmm.. so the problem with this approach is that the (xbstar & int) part
doesn't satisfy the requirement of visibility... i.e. X ->prop Z
->(xbstar & int) Y may not guarantee that, when Y executes, X has
already been propagated to Y's CPU.

Lemme think more on this...

Regards,
Boqun

> and, for this document, reference the "prop" section in
> explanation.txt. This could make the model simpler (both the
> description and the explanation), and a further benefit is that we
> won't need this commit in Paul's dev branch:
>
>         c683f2c807d2 "tools/memory-model: Fix data race detection for unordered
>         store and load"
>
> and the data race rules will look more symmetrical ;-)
>
> Thoughts? Or am I missing something subtle here?
>
> Regards,
> Boqun
>
> > and either:
> >
> >         Z is connected to Y by a strong-fence link followed by a
> >         possibly empty sequence of xb links,
> >
> > or:
> >
> >         Z is on the same CPU as Y and is connected to Y by a possibly
> >         empty sequence of xb links (again, if the sequence is empty it
> >         means Z and Y are the same event).
> >
> > The motivations behind this definition are straightforward:
> >
> >         cumul-fence memory barriers force stores that are po-before
> >         the barrier to propagate to other CPUs before stores that are
> >         po-after the barrier.
> >
> >         An rfe link from an event W to an event R says that R reads
> >         from W, which certainly means that W must have propagated to
> >         R's CPU before R executed.
> >
> >         strong-fence memory barriers force stores that are po-before
> >         the barrier, or that propagate to the barrier's CPU before the
> >         barrier executes, to propagate to all CPUs before any events
> >         po-after the barrier can execute.
> >
> > To see how this works out in practice, consider our old friend, the MP
> > pattern (with fences and statement labels, but without the conditional
> > test):
> >
> >         int buf = 0, flag = 0;
> >
> >         P0()
> >         {
> >                 X: WRITE_ONCE(buf, 1);
> >                 smp_wmb();
> >                 W: WRITE_ONCE(flag, 1);
> >         }
> >
> >         P1()
> >         {
> >                 int r1;
> >                 int r2 = 0;
> >
> >                 Z: r1 = READ_ONCE(flag);
> >                 smp_rmb();
> >                 Y: r2 = READ_ONCE(buf);
> >         }
> >
> > The smp_wmb() memory barrier gives a cumul-fence link from X to W, and
> > assuming r1 = 1 at the end, there is an rfe link from W to Z. This
> > means that the store to buf must propagate from P0 to P1 before Z
> > executes. Next, Z and Y are on the same CPU and the smp_rmb() fence
> > provides an xb link from Z to Y (i.e., it forces Z to execute before
> > Y). Therefore we have X ->vis Y: X must propagate to Y's CPU before Y
> > executes.
> >
> [...]
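
As a reference, the MP pattern quoted above can be written out as a
self-contained litmus test in the style of the ones under
tools/memory-model/litmus-tests/; the test name and the exists clause
below are illustrative additions:

        C MP+wmb+rmb-sketch

        (*
         * The r1=1 /\ r2=0 outcome should be forbidden: as argued in the
         * quoted text, X ->vis Y, so by the time Y executes the store to buf
         * has already reached P1.
         *)

        {}

        P0(int *buf, int *flag)
        {
                WRITE_ONCE(*buf, 1);    /* X */
                smp_wmb();
                WRITE_ONCE(*flag, 1);   /* W */
        }

        P1(int *buf, int *flag)
        {
                int r1;
                int r2;

                r1 = READ_ONCE(*flag);  /* Z */
                smp_rmb();
                r2 = READ_ONCE(*buf);   /* Y */
        }

        exists (1:r1=1 /\ 1:r2=0)

If I'm not mistaken, herd7 with linux-kernel.cat reports this outcome
as never reachable, which matches the X ->vis Y argument above.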