On Tue, Feb 18, 2014 at 10:49:27AM -0800, Linus Torvalds wrote:
> On Tue, Feb 18, 2014 at 10:21 AM, Peter Sewell
> <peter.sew...@cl.cam.ac.uk> wrote:
> >
> > This is a bit more subtle, because (on ARM and POWER) removing the
> > dependency and conditional branch is actually in general *not* equivalent
> > in the hardware, in a concurrent context.
> 
> So I agree, but I think that's a generic issue with non-local memory
> ordering, and is not at all specific to the optimization wrt that
> "x?42:42" expression.
> 
> If you have a value that you loaded with a non-relaxed load, and you
> pass that value off to a non-local function that you don't know what
> it does, in my opinion that implies that the compiler had better add
> the necessary serialization to say "whatever that other function does,
> we guarantee the semantics of the load".
> 
> So on ppc, if you do a load with "consume" or "acquire" and then call
> another function without having had something in the caller that
> serializes the load, you'd better add the lwsync or whatever before
> the call. Exactly because the function call itself otherwise basically
> breaks the visibility into ordering. You've basically turned a
> load-with-ordering-guarantees into just an integer that you passed off
> to something that doesn't know about the ordering guarantees - and you
> need that "lwsync" in order to still guarantee the ordering.
> 
> Tough titties. That's what a CPU with weak memory ordering semantics
> gets in order to have sufficient memory ordering.

And that is in fact what C11 compilers are supposed to do if the function
doesn't have the [[carries_dependency]] attribute on the corresponding
argument or return of the non-local function.  If the function is marked
with [[carries_dependency]], then the compiler has the information needed
in both compilations to make things work correctly.

                                                        Thanx, Paul

> And I don't think it's actually a problem in practice. If you are
> doing loads with ordered semantics, you're not going to pass the
> result off willy-nilly to random functions (or you really *do* require
> the ordering, because the load that did the "acquire" was actually for
> a lock!
> 
> So I really think that the "local optimization" is correct regardless.
> 
>                    Linus
> 

Reply via email to