On Tue, Oct 17, 2017 at 05:03:01PM -0400, Alan Stern wrote:
> On Tue, 17 Oct 2017, Paul E. McKenney wrote:
> 
> > On Tue, Oct 17, 2017 at 03:38:23PM -0400, Alan Stern wrote:
> > > On Tue, 17 Oct 2017, Paul E. McKenney wrote:
> > > 
> > > > How about this?
> > > > 
> > > > 0.	Simple special cases
> > > > 
> > > > 	If there is only one CPU on the one hand or only one variable
> > > > 	on the other, the code will execute in order.  There are (as
> > > > 	usual) some things to be careful of:
> > > > 
> > > > a.	There are some aspects of the C language that are
> > > > 	unordered.  For example, the compiler can output code
> > > > 	computing arguments of a multi-parameter function in
> > > > 	any order it likes, or even interleaved if it so
> > > > 	chooses.
> > > 
> > > That parses a little oddly.  I wouldn't agree that the compiler outputs
> > > the code in any order it likes!
> > 
> > When was the last time you talked to a compiler writer?  ;-)
> > 
> > > In fact, I wouldn't even mention the compiler at all.  Just say that
> > > (with a few exceptions) the language doesn't specify the order in which
> > > the arguments of a function or operation should be evaluated.  For
> > > example, in the expression "f(x) + g(y)", the order in which f and g
> > > are called is not defined; the object code is allowed to use either
> > > order or even to interleave the computations.
> > 
> > Nevertheless, I took your suggestion:
> > 
> > a.	There are some aspects of the C language that are
> > 	unordered.  For example, in the expression "f(x) + g(y)",
> > 	the order in which f and g are called is not defined;
> > 	the object code is allowed to use either order or even
> > 	to interleave the computations.
> 
> Good.

> > > > b.	Compilers are permitted to use the "as-if" rule.
> > > > 	That is, a compiler can emit whatever code it likes,
> > > > 	as long as the results appear just as if the compiler
> > > > 	had followed all the relevant rules.  To see this,
> > > > 	compile with a high level of optimization and run
> > > > 	the debugger on the resulting binary.
> > > 
> > > You might omit the last sentence.  Furthermore, if the accesses don't
> > > use READ_ONCE/WRITE_ONCE then the code might not get the same result as
> > > if it had executed in order (even for a single variable!), and if you
> > > do use READ_ONCE/WRITE_ONCE then the compiler can't emit whatever code
> > > it likes.
> > 
> > Ah, I omitted an important qualifier:
> > 
> > b.	Compilers are permitted to use the "as-if" rule.  That is,
> > 	a compiler can emit whatever code it likes, as long as
> > 	the results of a single-threaded execution appear just
> > 	as if the compiler had followed all the relevant rules.
> > 	To see this, compile with a high level of optimization
> > 	and run the debugger on the resulting binary.
> 
> That's okay for the single-CPU case.  I don't think it covers the
> multiple-CPU single-variable case correctly, though.  If you don't use
> READ_ONCE or WRITE_ONCE, isn't the compiler allowed to tear the loads
> and stores?  And won't that potentially cause the end result to be
> different from what you would get if the code had appeared to execute
> in order?
Ah, good point, I need yet another qualifier.  How about the following?

b.	Compilers are permitted to use the "as-if" rule.  That is,
	a compiler can emit whatever code it likes for normal accesses,
	as long as the results of a single-threaded execution appear
	just as if the compiler had followed all the relevant rules.
	To see this, compile with a high level of optimization and run
	the debugger on the resulting binary.

I added "for normal accesses", which excludes READ_ONCE(), WRITE_ONCE(),
and atomics.  This, in conjunction with the previously added
"single-threaded execution", means that yes, the compiler is permitted
to tear normal loads and stores.  The reason is that a single-threaded
run could not tell the difference.  Interrupt handlers or multiple
threads are required to detect load/store tearing.  (A rough sketch of
this sort of tearing appears in the P.S. below.)

So, what am I still missing?  ;-)

> > I have seen people (including kernel hackers) surprised by what optimizers
> > do, so I would prefer that the last sentence remain.
> > 
> > > > c.	If there is only one variable but multiple CPUs, all
> > > > 	accesses to that variable must be aligned and full
> > > > 	sized.
> > > 
> > > I would say that the variable is what needs to be aligned, not the
> > > accesses.  (Although, if the variable is aligned and all the accesses
> > > are full sized, then they must necessarily be aligned as well.)
> > 
> > I was thinking in terms of an unaligned 16-bit access to a 32-bit
> > variable.
> 
> That wouldn't be full sized.
> 
> > But how about this?
> > 
> > c.	If there is only one variable but multiple CPUs, all
> 
> Extra "all".  Otherwise okay.

Good catch, I removed the extra "all".

> > 	that variable must be properly aligned and all accesses
> > 	to that variable must be full sized.
> > 
> > > > 	Variables that straddle cachelines or pages void your
> > > > 	full-ordering warranty, as do undersized accesses that
> > > > 	load from or store to only part of the variable.
> > > 
> > > How can a variable straddle pages without also straddling cache lines?
> > 
> > Well, a variable -can- straddle cachelines without straddling pages,
> > which justifies the "or".  Furthermore, given that cacheline sizes have
> > been growing, but pages are still 4KB, it is probably only a matter
> > of time.  ;-)
> 
> By that time, we'll probably be using 64-KB pages.  Or even bigger!

PowerPC's server builds have had a minimum page size of 64KB for quite
a few years.  This helps in many cases, but of course hurts for those
occasional applications that insist on doing a pile of independent 4K
mappings.  ;-)  So I would guess that the move from 4K pages to 64K
(or whatever) pages could be quite painful for some CPU families.

							Thanx, Paul
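
P.S.  For concreteness, here is a rough sketch of the sort of tearing
discussed above, in kernel-style C.  The variable, values, and function
names are made up purely for illustration; the point is only the contrast
between plain accesses, which the compiler is permitted to tear, and
READ_ONCE()/WRITE_ONCE(), which forbid tearing of properly aligned,
full-sized accesses:

	#include <linux/compiler.h>	/* READ_ONCE() and WRITE_ONCE() */

	/* Hypothetical shared variable: properly aligned and full sized. */
	static unsigned long shared_var;

	/*
	 * Plain accesses: the compiler is permitted to split these into
	 * smaller pieces, so a reader on another CPU (or in an interrupt
	 * handler) might observe a mix of old and new bytes.
	 */
	static void plain_writer(void)
	{
		shared_var = 0x12345678UL;
	}

	static unsigned long plain_reader(void)
	{
		return shared_var;
	}

	/*
	 * Marked accesses: READ_ONCE() and WRITE_ONCE() require a single
	 * full-sized access to this aligned machine-word-sized variable,
	 * so the reader sees either the old value or the new value, never
	 * a mixture of the two.
	 */
	static void marked_writer(void)
	{
		WRITE_ONCE(shared_var, 0x12345678UL);
	}

	static unsigned long marked_reader(void)
	{
		return READ_ONCE(shared_var);
	}

Whether a given compiler actually tears a given plain access is of course
another question; the point is only that nothing forbids it from doing so.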