Re: [lng-odp] [API-NEXT PATCH] api: sync: update spec and add odp_sync_loads

Savolainen, Petri (Nokia - FI/Espoo) Thu, 12 Nov 2015 02:04:16 -0800

> > if (msg == UPDATE_FOO) {
> >     // Make sure that load of “foo” is not moved before schedule()
> >     odp_sync_loads();
> 
> odp_sync_acquire() would be a better name.
> 
> >
> >     foo++;
> >     // Make sure that store of “foo” is not moved after queue_enq()
> >    odp_sync_stores();
> 
> odp_sync_release() would be a better name.
> 
> I am somewhat OK with odp_sync_acquire/odp_sync_release (should also
> take vector of memory segments per suggestion below) as long as it is
> required that producer and consumer use the same ODP calls.


Naming could be acq/rel instead of loads/stores, if those are considered as 
well established terms. The problem with acq/rel names are that this call does 
not do the acq or rel operation. Also the synchronization functionality is a 
bit different:

"While an atomic store-release operation prevents all preceding writes from 
moving past the *store-release*, an atomic_thread_fence with 
memory_order_release ordering prevents *all preceding writes from moving past 
all subsequent stores*".

Thread fence synchronizes between *all stores* before and *all stores* after 
the fence - thus the name odp_sync_stores() fits better.

Store-release synchronizes stores before the operation with the operation, but 
not with *all other* stores following the operation.



> 
> >     msg = FOO_UPDATED;
> >     queue_enq(msg);
> > }
> >
> >                                                                  ev =
> schedule();
> >                                                                  msg
> = event_to_message(ev);
> >
> >                                                                  if
> (msg == FOO_UPDATED) {
> >
> // Make sure that load of “foo” is not moved before schedule()
> >
> odp_sync_loads();
> >
> printf(“foo is now %\n”, foo);
> >                                                                  }
> >
> >
> > As you can see it would a lot of waste if application must always
> sync for stores and loads explicitly.
> > I see your point.
> > I re-read carefully the documentation and maybe we can work around
> it.
> > The sync_stores explicitly states that all store operations are
> globally visible. So cache needs to be flushed.
> 
> If the Kalray HW isn't cache coherent between CPU's, neither old-style
> (volatile + barriers) or C11-style shared memory programming will
> work. Probably some platform specific code is necessary in order to
> share data between CPU's and to properly synchronise the access.
> Perhaps the acquire and release operations should take a vector of
> addresses and sizes, specifying which data is being shared so to
> minimise cache flushing (in the producer) and invalidation (in the
> consumer).

Vector / non-coherent versions would another set of functions. These two should 
be just regular memory fences for coherent memory systems.


> 
> >
> > The load part only states that previous memory operations are
> complete, and cannot be reordered around it.
> > It does not explicitly states that the cache needs to be up-to-date.
> We internally use "read memory barrier" internally as DINVAL + FENCE
> but it is not necessary the case.
> >
> > If my understanding of the function fits with yours, I'm good with
> this patches. If not, we need to discuss things some more.
> >
> > Nicolas
> >
> >
> >
> > The above highlights how it should *not* be used (== not needed with
> ready-made ODP sync mechanisms). It’s intended to be used with non-ODP
> sync mechanisms like the one under. Store and load syncs are usually a
> pair (sync_stores -> sync_loads).
> 
> If either or both of the producer and consumer cannot use some common
> ODP mechanism, the mechanism they actually use must be well defined
> (e.g. must use DMB or DSB on ARM) so that both sides pair properly. So
> for one side to use e.g. odp_sync_stores() (without knowing the actual
> implementation) and the other side use whatever mechanism (not
> odp_sync_loads) is a potential cause of problems. It is difficult for
> me to see a generic implementation (in ODP) that can always pair with
> whatever undefined mechanism is used by the peer. I think the code in
> an ODP application that synchronises with a non-ODP producer or
> consumer must use the specific mechanism agreed on and ODP cannot
> provide any generic function that will work regardless.

It should be enough that we specify this to function the same way as C11 
atomic_thread_fence. There's the spec the other side needs to understand what 
to do. This is what my patch defines in practice.

Did you see actual text in the patch? It's clipped out it this thread.


> 
> >
> >
> > -Petri
> >
> >
> > // data and flag in shared memory
> > volatile int flag = 0;
> 
> int flag = 0; is enough
> 
> >
> > int foo = 0;
> >
> > Thread A                         Thread B
> >
> > foo = 0xbeef;
> >
> > // Make sure that store
> > // of "foo" is visible before
> > // store of "flag"
> > odp_sync_stores();
> > flag = 1;
> 
> Must be implemented as:
> __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
> All loads and stores before writing to the flag that releases the
> shared data must be observable before the store to flag is observable.
> 

This example illustrates any non-ODP synchronization method (which may be more 
complex than single  store). Sure, when under ODP the simple release store 
should use atomic_store_rel(), but this use case is for non-ODP provided 
synchronization (non-ODP algorithm, 3rd partly lib, etc). This sync may be 
heavier than store_release, but that's what you need to do if cannot use 
store_release ...


-Petri


> 
> >
> >                                  while (flag == 0)
> 
> while (__atomic_load_n(&flag, __ATOMIC_ACQUIRE) == 0)
> No load or store after the load of the flag must be observable by
> other threads before the flag has been "acquired". Now speculative
> stores are not so common but loads could be reordered so reading the
> (stale) shared data before the flag has been set.
> 
> >                                      spin_and_wait();
> >
> >                                  // Make sure that load of
> >                                  // "foo" does not happen before
> >                                  // we see the flag changing
> >                                  odp_sync_loads();
> >
> >                                  printf(“foo is now %\n”, foo);
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > lng-odp mailing list
> > lng-odp@lists.linaro.org
> > https://lists.linaro.org/mailman/listinfo/lng-odp
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp

Re: [lng-odp] [API-NEXT PATCH] api: sync: update spec and add odp_sync_loads

Reply via email to