Re: [lock-free] One reason why I like atomic_thread_fence...

Chris M. Thomasson Sun, 15 Apr 2018 15:32:39 -0700


On Friday, April 13, 2018 at 11:45:51 PM UTC-7, Dmitry Vyukov wrote:
>
> On Mon, Apr 9, 2018 at 3:38 AM, Chris M. Thomasson <cri...@charter.net> wrote:
> > On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote: 
> >> 
> >> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson <cri...@charter.net> wrote:
> >> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote: 
> >> >> 
> >> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson 
> >> >> <cri...@charter.net> wrote: 
> >> >> > Notice how there is an acquire barrier inside of the CAS loop within
> >> >> > the enqueue and dequeue functions of:
> >> >> >
> >> >> > http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
> >> [...] 
> >> > Executing an acquire barrier on every iteration of the CAS loop is not
> >> > necessary. The actual version count keeps everything in order.
> >> >
> >> > However, you do need a single acquire fence _after_ the CAS loop
> >> > succeeds in order to get a clear view of the element.
> >> 
> >> This is true. 
> > 
> > 
> > Agreed. I personally like the ability to see the membars separated out
> > and standing alone. It is a habit of mine from SPARC. Now, as an aside
> > for a moment: tagged standalone membars, perhaps ones that can include
> > the memory locations they are interested in... ;^)
> > 
> > 
> >> 
> >> 
> >> I don't like standalone fences because they are a plague for
> >> verification. Consider: a single release fence turns _all_ subsequent
> >> relaxed atomic stores ever executed by the thread into release
> >> operations (they release memory state up to the fence point), and
> >> handling of acquire/release operations is an O(N) operation (and
> >> generally done under a mutex).
> > 
> > 
> > A release operation should make sure all _prior_ operations are visible
> > _before_ the release itself is visible to another thread. It has no
> > effect on subsequent relaxed operations. For instance:
> > 
> > 
> > // producer
> > A = 1
> > B = 2
> > RELEASE
> > C = 3
> > D = 4
> >
> > // consumer
> > while (D != 4) backoff;
> > ACQUIRE
> > assert(A == 1 && B == 2);
> > 
> > 
> > Well, A and B are going to be in sync with an acquire such that the
> > assert will never fail; however, C can be hoisted up and not be in sync
> > at all! C is incoherent wrt the consumer because it was not covered by
> > the standalone release barrier.
>
>
> In this case the RELEASE turned the store to D into a release-store (a
> subsequent store).
> And the ACQUIRE turned the load of D into an acquire-load (a preceding load).
>


D should be a pure relaxed store, and C should not be covered by the
RELEASE. IIRC, it works this way in SPARC RMO mode. However, on x86, C will
be covered because each store has implied release characteristics, wb
memory aside for a moment.
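A minimal C++11 sketch of the A/B/C/D example with standard fences, assuming all four variables are std::atomic<int> accessed with memory_order_relaxed (so reading C is not a data race); the helper names producer, consumer and run_once are hypothetical. Under the standard's fence-to-fence rule the asserts on A and B cannot fire, while the consumer may see C as either 0 or 3, because the store to C is sequenced after the release fence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Relaxed atomics stand in for A, B, C, D; the only ordering comes from
// the two standalone fences.
std::atomic<int> A{0}, B{0}, C{0}, D{0};

void producer() {
    A.store(1, std::memory_order_relaxed);
    B.store(2, std::memory_order_relaxed);
    // Release fence: orders A and B before any later atomic store (D)
    // that another thread observes.
    std::atomic_thread_fence(std::memory_order_release);
    // C is sequenced after the fence, so the fence gives it no ordering.
    C.store(3, std::memory_order_relaxed);
    D.store(4, std::memory_order_relaxed);
}

// Returns whatever value of C the consumer happened to see.
int consumer() {
    while (D.load(std::memory_order_relaxed) != 4) {
        std::this_thread::yield();  // backoff
    }
    // Acquire fence: synchronizes with the release fence because the
    // relaxed load above read a value stored after that fence.
    std::atomic_thread_fence(std::memory_order_acquire);
    assert(A.load(std::memory_order_relaxed) == 1);
    assert(B.load(std::memory_order_relaxed) == 2);
    // No guarantee for C: the memory model allows 0 here.
    return C.load(std::memory_order_relaxed);
}

bool run_once() {
    A = 0; B = 0; C = 0; D = 0;
    int seen_c = -1;
    std::thread t1(producer);
    std::thread t2([&] { seen_c = consumer(); });
    t1.join();
    t2.join();
    return seen_c == 0 || seen_c == 3;
}
```

On x86 this will practically always report C == 3, matching the remark above; on weakly ordered hardware the stores to C and D may become visible in either order.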

 

>
> At least this is how it is defined in the C/C++ standards.
> ACQUIRE/RELEASE fences do not establish any happens-before relations
> themselves. You still need a load in one thread to observe a value
> stored in another thread. And only that "materializes" standalone
> fence synchronization. So a store that materializes a RELEASE fence will
> always be a subsequent store.
>

Humm... That is too strict, and has to be there whether we use standalone 
fences or not. The store to D = 4 makes A and B wrt the RELEASE visible to 
the consumer threads that look for D = 4 and execute the ACQUIRE barrier 
after that fact has been observed. Afaict, C should NOT be covered.

 

>
>
> >> The same goes for acquire fences: a single
> >> acquire fence turns _all_ loads ever executed by the thread into
> >> acquire operations on the corresponding memory locations, which means
> >> that you need to handle all relaxed loads as "shadow" acquire loads
> >> for the case they will be materialized by a subsequent acquire fence.
>

That sounds too coarse.
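To make the quoted claim concrete, a minimal C++11 sketch in which one acquire fence placed after relaxed loads of two unrelated flags synchronizes with the release stores of two different producers. All names here (data_x, flag_x, publish_x, consume_both, ...) are hypothetical. Because each relaxed load is sequenced before the fence and reads a released value, both plain payloads can be read without a data race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two independent producers, each publishing a plain int through its own flag.
int data_x = 0, data_y = 0;
std::atomic<bool> flag_x{false}, flag_y{false};

void publish_x() {
    data_x = 10;
    flag_x.store(true, std::memory_order_release);
}

void publish_y() {
    data_y = 20;
    flag_y.store(true, std::memory_order_release);
}

int consume_both() {
    // Two relaxed loads on two different locations...
    while (!flag_x.load(std::memory_order_relaxed)) std::this_thread::yield();
    while (!flag_y.load(std::memory_order_relaxed)) std::this_thread::yield();
    // ...and a single acquire fence that pairs with both release stores.
    std::atomic_thread_fence(std::memory_order_acquire);
    return data_x + data_y;  // both payload writes happen before this read
}

bool run_two_producers() {
    data_x = 0;
    data_y = 0;
    flag_x = false;
    flag_y = false;
    int sum = 0;
    std::thread c([&] { sum = consume_both(); });
    std::thread px(publish_x), py(publish_y);
    px.join();
    py.join();
    c.join();
    return sum == 30;
}
```

This is the situation a checker has to anticipate: at the time of each relaxed load it cannot know whether a later fence will turn that load into an acquire.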

 

> > 
> > 
> > An acquire operation should make sure all operations wrt the release are
> > visible _before_ any subsequent operations can be performed _after_ that
> > fact is accomplished.
> > 
> > 
> > Well, fwiw, the membars that can be embedded into the CAS wrt acquire and
> > release do affect prior and subsequent activity anyway, standalone or
> > not. A release will dump prior stores such that an acquire barrier will
> > see them all. Now, when we are dealing with a consume barrier, well, that
> > is targeting the release dependency chain wrt the pointer. A consume
> > barrier is more precisely targeted when compared to the wider spectrum of
> > an acquire. Btw, iirc consume is emulated in Relacy as acquire, right?
> > 
> > 
> > Also, think of popping all nodes at once from an atomic LIFO: 
> > 
> > 
> > https://groups.google.com/d/topic/comp.lang.c++/V0s__czQwa0/discussion 
> > 
> > 
> > Well, how can we accomplish the following without using standalone fences?
> > 
> > 
> >       // try to flush all of our nodes 
> >       node* flush() 
> >       { 
> >           node* n = m_head.exchange(nullptr, mb_relaxed); 
> > 
> >           if (n) 
> >           { 
> >               mb_fence(mb_acquire); 
> >           } 
> > 
> >           return n; 
> >       } 
>
> I can't disagree. They are definitely more flexible. 
>

Agreed.
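A self-contained C++11 version of the flush() above, assuming mb_relaxed and mb_fence(mb_acquire) map onto std::memory_order_relaxed and std::atomic_thread_fence(std::memory_order_acquire); the wrappers themselves are not shown in the thread. The node type, the atomic_lifo class and push() are hypothetical additions so the sketch compiles on its own.

```cpp
#include <atomic>
#include <cassert>

struct node {
    node* next;
    int value;
};

class atomic_lifo {
    std::atomic<node*> m_head{nullptr};

public:
    // Publish a node: its fields are written before a release CAS.
    void push(node* n) {
        node* head = m_head.load(std::memory_order_relaxed);
        do {
            n->next = head;
        } while (!m_head.compare_exchange_weak(head, n,
                                               std::memory_order_release,
                                               std::memory_order_relaxed));
    }

    // Try to flush all of our nodes: a relaxed exchange, plus an acquire
    // fence only when there is actually a list to look at.
    node* flush() {
        node* n = m_head.exchange(nullptr, std::memory_order_relaxed);
        if (n) {
            std::atomic_thread_fence(std::memory_order_acquire);
        }
        return n;
    }
};
```

Each push() is a release read-modify-write on m_head, and later read-modify-writes extend its release sequence, so the single fence after a non-null exchange synchronizes with every push whose node was received, not only the most recent one. An empty flush pays for no fence at all.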
 

>
>
> >> The same is actually true for human reasoning. Say, I am reading your 
> >> code. We have 3 load operations in the loop and an acquire fence
> >> after the loop. Now the question is: which of the loads did we want to
> >> turn into acquire by adding the fence? Or is it maybe 2 of them?
> >> Which? Or maybe 1 in the loop and 1 somewhere before the loop, in a 
> >> different function? 
> >> One can, of course, comment that, but Relacy won't check comments, so 
> >> I won't trust them ;) 
> > 
> > 
> > 
> > Interesting. Still makes me think of tagged membars. I will get back to
> > you with a more detailed response.
>
>
> You mean something like: 
>
>        // try to flush all of our nodes 
>        node* flush() 
>        { 
>            node* n = m_head.exchange(nullptr, mb_relaxed); 
>
>            if (n) 
>            { 
>                mb_fence(mb_acquire, m_head);  // <---- HERE 
>            } 
>
>            return n; 
>        } 
>
> ? Interesting. 
>

Yes! The standalone fence can say: we want to perform an acquire barrier
wrt m_head. Something like that should be able to create more fine-grained
setups. Perhaps even something like the following pseudo-code:
______________________
// setup
int a = 0;
int b = 0;
int c = 0;
bool signal = false;

// producer
a = 1;
b = 2;
RELEASE(&signal, &a, &b);
c = 3;
STORE_RELAXED(&signal, true);

// consumers
while (LOAD_RELAXED(&signal) != true) backoff;
ACQUIRE(&signal, &a, &b);
assert(a == 1 && b == 2);
______________________

The consumers would always see a and b as 1 and 2; however, c was not
covered, so it is in an incoherent state wrt said consumers.

The acquire would only target a and b, as would the release.

Hummm... Just thinking out loud here. :^)
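Going back to the CAS loop from the start of the thread, here is a rough C++11 sketch of "no acquire inside the loop, one acquire fence after the CAS succeeds" for a bounded MPMC queue modeled on the 1024cores.net design linked above. It is not a copy of that code: the class name bounded_mpmc_queue, the omission of any cache-line padding, and the exact fence placement are assumptions for illustration. The per-cell sequence number is loaded relaxed while spinning; the fence after the winning CAS pairs with the release store that wrote the sequence value last read for the claimed cell.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

template <typename T>
class bounded_mpmc_queue {
    struct cell {
        std::atomic<std::size_t> sequence;
        T data;
    };

    std::unique_ptr<cell[]> buffer_;
    const std::size_t mask_;
    std::atomic<std::size_t> enqueue_pos_{0};
    std::atomic<std::size_t> dequeue_pos_{0};

public:
    explicit bounded_mpmc_queue(std::size_t size)
        : buffer_(new cell[size]), mask_(size - 1) {
        assert(size >= 2 && (size & (size - 1)) == 0);  // power of two
        for (std::size_t i = 0; i != size; ++i)
            buffer_[i].sequence.store(i, std::memory_order_relaxed);
    }

    bool enqueue(const T& value) {
        cell* c;
        std::size_t pos = enqueue_pos_.load(std::memory_order_relaxed);
        for (;;) {
            c = &buffer_[pos & mask_];
            // Relaxed: the version count alone decides whether to retry.
            std::size_t seq = c->sequence.load(std::memory_order_relaxed);
            std::intptr_t dif = static_cast<std::intptr_t>(seq) -
                                static_cast<std::intptr_t>(pos);
            if (dif == 0) {
                if (enqueue_pos_.compare_exchange_weak(
                        pos, pos + 1, std::memory_order_relaxed))
                    break;
            } else if (dif < 0) {
                return false;  // full
            } else {
                pos = enqueue_pos_.load(std::memory_order_relaxed);
            }
        }
        // Single acquire after success: the previous consumer's read of
        // c->data happens before our write below.
        std::atomic_thread_fence(std::memory_order_acquire);
        c->data = value;
        c->sequence.store(pos + 1, std::memory_order_release);
        return true;
    }

    bool dequeue(T& out) {
        cell* c;
        std::size_t pos = dequeue_pos_.load(std::memory_order_relaxed);
        for (;;) {
            c = &buffer_[pos & mask_];
            std::size_t seq = c->sequence.load(std::memory_order_relaxed);
            std::intptr_t dif = static_cast<std::intptr_t>(seq) -
                                static_cast<std::intptr_t>(pos + 1);
            if (dif == 0) {
                if (dequeue_pos_.compare_exchange_weak(
                        pos, pos + 1, std::memory_order_relaxed))
                    break;
            } else if (dif < 0) {
                return false;  // empty
            } else {
                pos = dequeue_pos_.load(std::memory_order_relaxed);
            }
        }
        // Single acquire after success: the producer's write of c->data
        // is now visible.
        std::atomic_thread_fence(std::memory_order_acquire);
        out = c->data;
        c->sequence.store(pos + mask_ + 1, std::memory_order_release);
        return true;
    }
};
```

On x86 an acquire load and a relaxed load followed by an acquire fence compile to the same plain loads, so the difference only shows up on weakly ordered targets.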

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to lock-free+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/1c4afa47-0a1a-4cce-85e5-f8d9605c8c77%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
