On Tue, 07 May 2013 11:30:12 -0400, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
> On 5/7/13 10:31 AM, Steven Schveighoffer wrote:
>> On Tue, 07 May 2013 09:25:36 -0400, Andrei Alexandrescu
>> <seewebsiteforem...@erdani.org> wrote:
>>> No. A tutorial on memory consistency models would be too long to
>>> insert here. I don't know of a good online resource, does anyone?
>>
>> In essence, a read requires an acquire memory barrier, a write requires
>> a release memory barrier, but in this case, we only need to be concerned
>> if the value we get back is not valid (i.e. NullValue).
>> Once in steady state, there is no need to acquire (as long as the write
>> is atomic, the read value will either be NullValue or ActualValue, not
>> something else).
>
> There's always a need to acquire so as to figure whether the steady
> state has been entered.

Not really. Whether it is entered or not is dictated by the vtable. Even
classic double-checked locking doesn't need an acquire outside the lock.
Even if your CPU's view of the variable is outdated, the check after the
memory barrier inside the lock only occurs once. After that, steady state
is achieved. All subsequent reads need no memory barriers, because the
singleton object will never change after that.

The only things we need to guard against are non-atomic writes and
out-of-order writes of the static variable (the latter fixed with a
memory barrier).

Instruction ordering OUTSIDE the lock is irrelevant, because if we don't
get the "steady state" value (not null), then we go into the lock to
perform the careful initialization with barriers.

I think aligned native-word writes are atomic, so we don't have to worry
about that.
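
For reference, here is a minimal C++11 sketch of double-checked locking as it is usually written. The names `Config` and `LazyConfig` are made up for illustration. Note that this version conservatively keeps an acquire load on the fast path, paired with the release store inside the lock; whether that load can be weakened is exactly the point being argued in this thread, and the sketch does not attempt it.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Config {
    int value = 42;
};

class LazyConfig {
public:
    // Returns the shared instance, creating it on first use.
    static Config* get() {
        // Fast path: this acquire pairs with the release store below, so a
        // non-null pointer implies the Config it points to is fully built.
        Config* p = instance_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(mutex_);
            // Second check under the lock: another thread may have won.
            // The mutex already orders this load, so relaxed is enough.
            p = instance_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Config();  // never freed: lives for the whole process
                instance_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
    static std::atomic<Config*> instance_;
    static std::mutex mutex_;
};

std::atomic<Config*> LazyConfig::instance_{nullptr};
std::mutex LazyConfig::mutex_;
```

Every caller gets the same pointer back, and only the first caller to take the lock constructs the object.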
But I think we've spent enough time on this solution. Yes, double-checked
locking can be done, but David's pattern is far easier to implement,
understand, and explain. It comes at the small cost of checking a
thread-local boolean before each access of the initialized data. His
benchmarks show a very small performance penalty. And another LARGE
benefit is you don't have to
pull out your obscure (possibly challenged) memory model book/blog post or
the CPU spec to prove it :)
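
A rough C++11 rendering of the thread-local-flag idea, assuming I have the pattern right: each thread takes the lock once, then trusts its own private flag from then on. `Service` and `ServiceSingleton` are hypothetical names, and `thread_local` stands in for D's default thread-local statics.

```cpp
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Service {
    int id = 7;
};

class ServiceSingleton {
public:
    static Service* get() {
        // Per-thread flag: no other thread ever reads or writes it, so a
        // plain bool is enough and no atomics or barriers are needed.
        thread_local bool instantiated = false;
        if (!instantiated) {
            // Slow path, taken at most once per thread.
            std::lock_guard<std::mutex> guard(mutex_);
            if (instance_ == nullptr) {
                instance_ = new Service();  // never freed: process lifetime
            }
            // The lock has ordered us after the one and only write to
            // instance_, so later unlocked reads on this thread are safe.
            instantiated = true;
        }
        return instance_;
    }

private:
    static Service* instance_;  // written once, under mutex_
    static std::mutex mutex_;
};

Service* ServiceSingleton::instance_ = nullptr;
std::mutex ServiceSingleton::mutex_;
```

The correctness argument only needs the mutex: a thread sets its flag after it has held the lock, and the shared pointer never changes after the first write.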
Hmm... you might be able to mitigate the penalty by storing the actual
object reference instead of a bool in the _instantiated variable. Then a
separate load is not required. David?
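
Something like the following is what I have in mind, sketched in C++11 with hypothetical names (`Logger`, `CachedLogger`): the thread-local slot holds the pointer itself, so the fast path is one thread-local load and a null check, with no second load of the shared variable. I haven't measured whether this actually helps.

```cpp
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Logger {
    int level = 3;
};

class CachedLogger {
public:
    static Logger* get() {
        // Per-thread copy of the shared pointer; null means this thread
        // has not yet gone through the lock.
        thread_local Logger* cached = nullptr;
        if (cached == nullptr) {
            // Slow path, taken at most once per thread.
            std::lock_guard<std::mutex> guard(mutex_);
            if (shared_ == nullptr) {
                shared_ = new Logger();  // never freed: process lifetime
            }
            cached = shared_;
        }
        // Fast path returns the thread-local copy directly.
        return cached;
    }

private:
    static Logger* shared_;  // only touched while holding mutex_
    static std::mutex mutex_;
};

Logger* CachedLogger::shared_ = nullptr;
std::mutex CachedLogger::mutex_;
```

In this variant the shared pointer is only ever read or written under the lock, which makes the reasoning even simpler than with the bool.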
-Steve