On Friday, January 16, 2004, at 02:58 , Jeff Clites wrote:

On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote:
Damien Neil <[EMAIL PROTECTED]> wrote:
On Thu, Jan 15, 2004 at 09:31:39AM +0100, Leopold Toetsch wrote:
I don't see any advantage of such a model. All the more since it doesn't
guarantee atomic access to e.g. longs or doubles. Atomic access to
ints and pointers seems to rely on the architecture, but is of course
reasonable.

You *can't* guarantee atomic access to longs and doubles on some
architectures, unless you wrap every read or write to one with a
lock.  The CPU support isn't there.

Yes, that's what I'm saying. I don't see an advantage of the JVM's multi-step
variable access, because it doesn't even provide such atomic access.

What I was expecting that the Java model was trying to do (though I didn't find this) was something along these lines: "Accessing the main store involves locking, so by copying things to a thread-local store we can perform several operations on an item before we have to move it back to the main store (again, with locking). If we worked directly from the main store, we'd have to lock for each and every use of the variable."

I think the real purpose of the model was to say "thread-local values may be committed to main memory (perhaps significantly) after the local copy is logically assigned." Thus: In the absence of explicit synchronization, threads may manipulate potentially inconsistent local copies of variables. This model addresses: Copies of variables in registers, copies on the JVM stack, copies in the stack frame, thread preemption prior to store (which occurs on uniprocessors and multiprocessors alike), and delayed write-back caches in SMP systems.[*]


In short, this portion of the spec provides bounds for the undefinedness of the behavior that occurs when programs do not use Java's synchronization primitives. It does so realistically, in a manner that contemporary computer systems can implement efficiently. (In fact, the spec is far more descriptive than it is prescriptive.)

Or, as an example, it allows the natural thing to happen here:

; PPC[**] implementation of:
;     var = var + var;
lwz r30, 0(r29)      ; load var      (1 JVM "load")
add r30, r30, r30    ; double var    (2 JVM "uses", 1 JVM "assign")
; if your thread is preempted here
stw r30, 0(r29)      ; store var     (1 JVM "store")

And allows obvious optimizations like this:

; PPC implementation of:
;     var = var + var;
;     var = var + var;
lwz r30, 0(r29)
add r30, r30, r30
; imagine your thread is preempted here
add r30, r30, r30
stw r30, 0(r29)

But it explicitly disallows that same optimization for a case like this:

    var = var + var;
    synchronized (other) { other++; }
    var = var + var;
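A C analogue of the same constraint may make this concrete (the names `var`, `other`, and `other_lock` are mine, not from any real codebase): the lock and unlock calls are acquire/release points, so the compiler must commit `var` to memory before the critical section and reload it afterward, rather than carrying it in a register across the `synchronized` region.

```c
#include <pthread.h>

static pthread_mutex_t other_lock = PTHREAD_MUTEX_INITIALIZER;
static int var = 1;
static int other = 0;

void doubled_around_sync(void) {
    var = var + var;                  /* may live in a register...       */
    pthread_mutex_lock(&other_lock);  /* ...but must hit memory by here  */
    other++;
    pthread_mutex_unlock(&other_lock);
    var = var + var;                  /* must reload var: another thread
                                         may have changed it under the
                                         same discipline                 */
}
```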

That--that, and the whole cache coherency/delayed write thing:

; CPU 1                     ; CPU 2
li r29, 0xFFF8              li r29, 0xFFF8
li r30, 0xDEAD              li r30, 0xBEEF
stw r30, 0(r29)             stw r30, 0(r29)
lwz r28, 0(r29)             lwz r28, 0(r29)
; r28 is probably 0xDEAD    ; r28 is probably 0xBEEF
; (but could be 0xBEEF)     ; (but could be 0xDEAD)
sync                        nop
lwz r28, 0(r29)             lwz r28, 0(r29)
; r28 matches on both CPUs now, either 0xDEAD or
; 0xBEEF (but not 0xDEEF or 0xBEAD or 0x9999).
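For what it's worth, a loose modern-C sketch of the same idea (all names mine; C11 atomics rather than anything Parrot-specific): the `atomic_thread_fence` calls play the role of PPC `sync`, forcing the store to become visible and the load to see a globally agreed value.

```c
#include <stdatomic.h>

static atomic_int shared;

void writer(int value) {
    atomic_store_explicit(&shared, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* like PPC sync */
}

int reader(void) {
    atomic_thread_fence(memory_order_seq_cst);   /* like PPC sync */
    return atomic_load_explicit(&shared, memory_order_relaxed);
}
```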


[* - On many SMP systems, processors do not have coherent views of main memory (due to their private data caches) unless the program executes explicit memory synchronization operations, which are at least expensive enough that you don't want to execute them on every opcode.]


[** - Forgive my rusty assembler.]

The reason I'm not finding it is that the semantic rules spelled out in the spec _seem_ to imply that every local access implies a corresponding access to the main store, one-to-one. On the other hand, maybe the point is that it can "save up" these accesses--that is, lock the main store once, and push back several values from the thread-local store. If it can do this, then it is saving some locking.

The spec doesn't require locking except when the program uses synchronized, or if the architecture were unable to provide atomic loads and stores for 32-bit quantities.


Parrot deals with PMCs, which can contain (let's consider scalars only)
e.g. a PerlInt or a PerlNum. Now we would have atomic access
(normally) to the former and very likely non-atomic access to the latter,
depending on which value happens to be stored in the PMC.


This implies that we have to wrap almost[1] all shared write *and* read
PMC access with LOCK/UNLOCK.


[1] except plain ints and pointers on current platforms
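A sketch of what that wrapping would look like (`PMC`, `LOCK`, and `UNLOCK` here are illustrative stand-ins, not Parrot's real API): every shared access, including plain reads, has to go through the PMC's mutex, because a 64-bit double can tear mid-write on 32-bit hardware.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    double num;               /* non-atomic on typical 32-bit hardware */
} PMC;

#define LOCK(p)   pthread_mutex_lock(&(p)->lock)
#define UNLOCK(p) pthread_mutex_unlock(&(p)->lock)

/* even a plain read needs the lock: the double can tear mid-write */
double pmc_get_num(PMC *p) {
    LOCK(p);
    double v = p->num;
    UNLOCK(p);
    return v;
}

void pmc_set_num(PMC *p, double v) {
    LOCK(p);
    p->num = v;
    UNLOCK(p);
}
```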

Ah, but this misses a key point: we know that user data is allowed to become corrupted if the user isn't locking properly--we only have to protect VM-internal state. And it's very unlikely that any floats will be involved in VM-internal state--it's going to be all pointers and ints (for offsets and lengths). That is, a corrupted float won't crash the VM.

On one point with respect to "VM-internal state," morph scares me quite a bit. Type-stable memory is vital to lock-free (efficient) data access. Consider:


a_pmc is of type A
--- thread 1 is executing ---
Thread 1: load a_pmc->vtable->whatever
Thread 1: call to a_vtable_whatever begins...
--- thread 2 preempts thread 1 ---
Thread 2: morph PMC to type B
Thread 2: Allocate new B-type pmc_ext structure
Thread 2: update a_pmc->vtable
Thread 2: adjust a_pmc->pmc_ext
--- thread 1 resumes ---
Thread 1: ... finish setting up the call
Thread 1: a_vtable_whatever reads ((A_ext *) a_pmc->pmc_ext)->foo; B_ext happens to keep some float at the same offset in the struct
Thread 1: chaos


Keeping in mind the reality of delayed writes on SMP systems, there's just no way to code around this except to acquire a lock on the entire PMC for every read or write. Bye-bye performance: Acquire 2-3 locks to add 2 PerlInts together? ... Or bad performance and poor concurrency: Acquire 1 global lock whenever performing any PMC operation.

I'm tired....



Gordon Henriksen
[EMAIL PROTECTED]
