On Tue, Oct 17, 2017 at 05:03:01PM -0400, Alan Stern wrote:
> On Tue, 17 Oct 2017, Paul E. McKenney wrote:
> 
> > On Tue, Oct 17, 2017 at 03:38:23PM -0400, Alan Stern wrote:
> > > On Tue, 17 Oct 2017, Paul E. McKenney wrote:
> > > 
> > > > How about this?
> > > > 
> > > > 0.      Simple special cases
> > > > 
> > > >         If there is only one CPU on the one hand or only one variable
> > > >         on the other, the code will execute in order.  There are (as
> > > >         usual) some things to be careful of:
> > > > 
> > > >         a.      There are some aspects of the C language that are
> > > >                 unordered.  For example, the compiler can output code
> > > >                 computing arguments of a multi-parameter function in
> > > >                 any order it likes, or even interleaved if it so
> > > >                 chooses.
> > > 
> > > That parses a little oddly.  I wouldn't agree that the compiler outputs
> > > the code in any order it likes!
> > 
> > When was the last time you talked to a compiler writer?  ;-)
> > 
> > > In fact, I wouldn't even mention the compiler at all.  Just say that
> > > (with a few exceptions) the language doesn't specify the order in which
> > > the arguments of a function or operation should be evaluated.  For
> > > example, in the expression "f(x) + g(y)", the order in which f and g
> > > are called is not defined; the object code is allowed to use either
> > > order or even to interleave the computations.
> > 
> > Nevertheless, I took your suggestion:
> > 
> >     a.      There are some aspects of the C language that are
> >             unordered.  For example, in the expression "f(x) + g(y)",
> >             the order in which f and g are called is not defined;
> >             the object code is allowed to use either order or even
> >             to interleave the computations.
> 
> Good.
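
For anyone who wants to see this on their own compiler, here is a
minimal sketch.  The helpers f(), g() and sum_and_first() are
hypothetical names made up for illustration; they simply record which
call ran first.  The sum is always the same, but which function shows
up first in the log is up to the implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Log of which function ran, in call order (illustration only). */
static char call_log[2];
static int ncalls;

static int f(int x) { call_log[ncalls++] = 'f'; return x + 1; }
static int g(int y) { call_log[ncalls++] = 'g'; return y * 2; }

/* Returns f(1) + g(2) and reports which of f or g was called first. */
static int sum_and_first(char *first)
{
	int sum;

	ncalls = 0;
	sum = f(1) + g(2);	/* the language permits either call order */
	assert(ncalls == 2);	/* both functions ran exactly once */
	*first = call_log[0];
	return sum;
}
```

Calling sum_and_first(&first) from main() and printing "first" shows
the order your particular compiler happened to pick, which is not
something the code is entitled to rely on.
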
> 
> > > >         b.      Compilers are permitted to use the "as-if" rule.
> > > >                 That is, a compiler can emit whatever code it likes,
> > > >                 as long as the results appear just as if the compiler
> > > >                 had followed all the relevant rules.  To see this,
> > > >                 compile with a high level of optimization and run
> > > >                 the debugger on the resulting binary.
> > > 
> > > You might omit the last sentence.  Furthermore, if the accesses don't
> > > use READ_ONCE/WRITE_ONCE then the code might not get the same result as
> > > if it had executed in order (even for a single variable!), and if you
> > > do use READ_ONCE/WRITE_ONCE then the compiler can't emit whatever code
> > > it likes.
> > 
> > Ah, I omitted an important qualifier:
> > 
> >     b.      Compilers are permitted to use the "as-if" rule.  That is,
> >             a compiler can emit whatever code it likes, as long as
> >             the results of a single-threaded execution appear just
> >             as if the compiler had followed all the relevant rules.
> >             To see this, compile with a high level of optimization
> >             and run the debugger on the resulting binary.
> 
> That's okay for the single-CPU case.  I don't think it covers the
> multiple-CPU single-variable case correctly, though.  If you don't use
> READ_ONCE or WRITE_ONCE, isn't the compiler allowed to tear the loads
> and stores?  And won't that potentially cause the end result to be
> different from what you would get if the code had appeared to execute
> in order?

Ah, good point, I need yet another qualifier.  How about the following?

        b.      Compilers are permitted to use the "as-if" rule.  That is,
                a compiler can emit whatever code it likes for normal
                accesses, as long as the results of a single-threaded
                execution appear just as if the compiler had followed
                all the relevant rules.  To see this, compile with a
                high level of optimization and run the debugger on the
                resulting binary.

I added "for normal accesses", which excludes READ_ONCE(), WRITE_ONCE(),
and atomics.  This, in conjunction with the previously added
"single-threaded execution" means that yes, the compiler is permitted
to tear normal loads and stores.  The reason is that a single-threaded
run could not tell the difference.  Interrupt handlers or multiple
threads are required to detect load/store tearing.
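
To make the distinction concrete, here is a minimal userspace sketch.
The READ_ONCE()/WRITE_ONCE() macros below are simplified stand-ins
that I am assuming for illustration (volatile casts using the
GCC/Clang __typeof__ extension); the kernel's real definitions are
more elaborate.  The variable "shared" and the four helpers are
likewise hypothetical.

```c
#include <assert.h>

/* Simplified stand-ins, NOT the kernel's actual definitions. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Aligned, machine-word-sized variable (hypothetical example). */
static unsigned long shared;

/* Normal accesses: under the as-if rule the compiler may split,
 * fuse, repeat or omit these, as long as a single-threaded run
 * cannot tell the difference. */
static void store_plain(unsigned long v) { shared = v; }
static unsigned long load_plain(void) { return shared; }

/* Marked accesses: the volatile cast asks the compiler to emit
 * one access per call rather than whatever code it likes. */
static void store_once(unsigned long v) { WRITE_ONCE(shared, v); }
static unsigned long load_once(void) { return READ_ONCE(shared); }
```

A single-threaded caller gets the same values back from both flavors,
which is exactly why such a run cannot detect tearing of the plain
accesses.
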

So, what am I still missing?  ;-)

> > I have seen people (including kernel hackers) surprised by what optimizers
> > do, so I would prefer that the last sentence remain.
> > 
> > > >         c.      If there is only one variable but multiple CPUs, all
> > > >                 accesses to that variable must be aligned and full
> > > >                 sized.
> > > 
> > > I would say that the variable is what needs to be aligned, not the
> > > accesses.  (Although, if the variable is aligned and all the accesses
> > > are full sized, then they must necessarily be aligned as well.)
> > 
> > I was thinking in terms of an unaligned 16-bit access to a 32-bit
> > variable.
> 
> That wouldn't be full sized.
> 
> >  But how about this?
> > 
> >     c.      If there is only one variable but multiple CPUs, all
> 
> Extra "all".  Otherwise okay.

Good catch, I removed the extra "all".

> >             that variable must be properly aligned and all accesses
> >             to that variable must be full sized.
> > 
> > > >                 Variables that straddle cachelines or pages void your
> > > >                 full-ordering warranty, as do undersized accesses that
> > > >                 load from or store to only part of the variable.
> > > 
> > > How can a variable straddle pages without also straddling cache lines?
> > 
> > Well, a variable -can- straddle cachelines without straddling pages,
> > which justifies the "or".  Furthermore, given that cacheline sizes have
> > been growing, but pages are still 4KB, it is probably only a matter
> > of time.  ;-)
> 
> By that time, we'll probably be using 64-KB pages.  Or even bigger!

PowerPC's server builds have had a minimum page size of 64KB for quite
a few years.  This helps in many cases, but of course hurts for those
occasional applications that insist on doing a pile of independent 4K
mappings.  ;-)

So I would guess that the move from 4K pages to 64K (or whatever)
pages could be quite painful for some CPU families.

                                                        Thanx, Paul
