On Tue, 07 May 2013 11:30:12 -0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

> On 5/7/13 10:31 AM, Steven Schveighoffer wrote:
>> On Tue, 07 May 2013 09:25:36 -0400, Andrei Alexandrescu
>> <seewebsiteforem...@erdani.org> wrote:
>>
>>> No. A tutorial on memory consistency models would be too long to
>>> insert here. I don't know of a good online resource, does anyone?
>>
>> In essence, a read requires an acquire memory barrier, a write requires
>> a release memory barrier, but in this case, we only need to be concerned
>> if the value we get back is not valid (i.e. NullValue).
>>
>> Once in steady state, there is no need to acquire (as long as the write
>> is atomic, the read value will either be NullValue or ActualValue, not
>> something else).
>
> There's always a need to acquire so as to figure whether the steady state has been entered.

Not really. Whether it is entered or not is dictated by the vtable. Even classic double-checked locking doesn't need an acquire outside the lock. Even if your CPU's view of the variable is outdated, the check after the memory barrier inside the lock only occurs once. After that, steady state is achieved. All subsequent reads need no memory barriers, because the singleton object will never change after that.

The only thing we need to guard against is non-atomic writes, and out of order writes of the static variable (fixed with a memory barrier). Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers.
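
For reference, here is a minimal sketch of classic double-checked locking, written in C++ only because its atomics are widely known. The names (Config, g_instance, g_initLock, getConfigDCL) are hypothetical. Note that the C++ memory model is stricter than the hardware-level argument above: it formally wants an acquire load on the fast path, so the sketch uses one. Treat it as an illustration under those assumptions, not as the code under discussion.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical singleton payload, for illustration only.
struct Config {
    int value = 42;
};

std::atomic<Config*> g_instance{nullptr};  // null until initialized, then never changes
std::mutex g_initLock;                     // guards the one-time initialization

Config* getConfigDCL() {
    // Fast path: in steady state this load sees the non-null pointer.
    // The C++ model asks for acquire so the object's fields are visible too.
    Config* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_initLock);
        // Re-check under the lock: another thread may have initialized first.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config();
            // Release: construction is ordered before publishing the pointer.
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The atomic pointer store covers the "non-atomic write" concern, and the release ordering covers the "out of order write" concern.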

I think aligned native word writes are atomic, so we don't have to worry about that.

But I think we've spent enough time on this solution. Yes, double-checked locking can be done, but David's pattern is far easier to implement, understand, and explain. It comes at a small cost of checking a boolean before each access of the initialized data. His benchmarks show a very small performance penalty. And another LARGE benefit is you don't have to pull out your obscure (possibly challenged) memory model book/blog post or the CPU spec to prove it :)
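
As I understand the pattern, it looks roughly like the sketch below. This is a C++ analog with hypothetical names (Logger, LoggerSingleton), not David's actual code. The flag is thread-local, so checking it needs no synchronization at all, and each thread takes the lock at most once.

```cpp
#include <mutex>

// Hypothetical singleton payload, for illustration only.
struct Logger {
    int id = 7;
};

class LoggerSingleton {
public:
    static Logger* get() {
        // Thread-local flag: only this thread reads or writes it.
        if (!instantiated_) {
            std::lock_guard<std::mutex> guard(lock_);
            if (instance_ == nullptr) {
                instance_ = new Logger();
            }
            // This thread has now synchronized with whoever created the object.
            instantiated_ = true;
        }
        // Safe without the lock: instance_ is never written after creation,
        // and this thread has been through the lock at least once.
        return instance_;
    }

private:
    static thread_local bool instantiated_;
    static Logger* instance_;
    static std::mutex lock_;
};

thread_local bool LoggerSingleton::instantiated_ = false;
Logger* LoggerSingleton::instance_ = nullptr;
std::mutex LoggerSingleton::lock_;
```

All cross-thread ordering comes from the mutex, which is why no memory model argument is needed.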

Hmm... you might be able to mitigate the penalty by storing the actual object reference instead of a bool in the _instantiated variable. Then a separate load is not required. David?
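
Something like this sketch, again as a C++ analog with hypothetical names (Cache, g_sharedCache, t_cache, getCache). I have not benchmarked it, so whether it actually wins anything is an open question.

```cpp
#include <mutex>

// Hypothetical singleton payload, for illustration only.
struct Cache {
    int size = 16;
};

Cache* g_sharedCache = nullptr;        // the one shared instance, written under the lock
std::mutex g_cacheLock;                // guards creation of g_sharedCache
thread_local Cache* t_cache = nullptr; // per-thread copy of the pointer, replaces the bool

Cache* getCache() {
    // One thread-local load serves as both the "initialized?" check
    // and the value to return.
    Cache* p = t_cache;
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_cacheLock);
        if (g_sharedCache == nullptr) {
            g_sharedCache = new Cache();
        }
        // Remember the shared pointer for this thread's later calls.
        p = t_cache = g_sharedCache;
    }
    return p;
}
```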

-Steve
