On Friday, January 16, 2004, at 08:38, Jeff Clites wrote:

On Jan 16, 2004, at 1:01 PM, Damien Neil wrote:

On Thu, Jan 15, 2004 at 11:58:22PM -0800, Jeff Clites wrote:

On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote:

Yes, that's what I'm saying. I don't see an advantage in the JVM's multi-step variable access, because it doesn't even provide atomic access.

You're missing the point of the multi-step access. It has nothing to do with threading or atomic access to variables.

... it has everything to do with allowing multiprocessors to operate without extraneous synchronization.


The JVM is a stack machine. JVM opcodes operate on the stack, not on main memory. The stack is thread-local. In order for a thread to operate on a variable, therefore, it must first copy it from main store to thread-local store (the stack).

Parrot, so far as I know, operates in exactly the same way, except that the thread-local store is a set of registers rather than a stack.

Both VMs separate working-set data (the stack and/or registers) from main store to reduce symbol table lookups.
...
This will all make a lot more sense if you keep in mind that Parrot--unthreaded as it is right now--*also* copies variables to working store before operating on them. This isn't some odd JVM strangeness. The JVM threading document is simply describing how the stack interacts with main memory.

I think the JVM spec is actually implying something beyond this. For instance, section 8.3 states, "A store operation by T on V must intervene between an assign by T of V and a subsequent load by T of V." Translating this to Parrot terms, it would mean that the following is illegal, which it clearly isn't:


find_global P0, "V"
set P0, P1 # "assign by T of V"
find_global P0, "V" # "a subsequent load by T of V" w/o an intervening "store operation by T on V"

This rule addresses aliasing. It says that this (in PPC assembly):


        ; presume &(obj->i) == &obj+12
        lwz r29, 12(r30) ; read, load
        addi r29, r29, 1 ; use, assign
        lwz r28, 12(r30) ; read, load
        addi r28, r28, 1 ; use, assign
        stw r29, 12(r30) ; store, eventual write
        stw r28, 12(r30) ; store, eventual write

... is an invalid implementation of this:

        j.i = j.i + 1;
        k.i = k.i + 1;

... where the JVM cannot prove j == k to be false. The rule states that the stw of r29 must precede the stw of r28. Why this is filed under threading is beyond me.
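To make the aliasing hazard concrete in Java terms (a sketch; the Box class and field names are mine, not from the spec or the thread):

```java
// Sketch: when j and k refer to the same object, the two increments must
// both take effect, in order. The invalid reordering above (committing
// the first increment's store after the second's) would lose an update.
class Box { int i; }

class AliasDemo {
    static int run() {
        Box j = new Box();
        Box k = j;          // the compiler cannot prove j != k
        j.i = j.i + 1;      // read, use, assign, store of j.i
        k.i = k.i + 1;      // read, use, assign, store of the SAME slot
        return j.i;         // must be 2; the broken store order yields 1
    }

    public static void main(String[] args) {
        System.out.println(AliasDemo.run());  // prints 2
    }
}
```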


Let me briefly highlight the operations as discussed, and digress a little bit as to why all the layers exist:


    main memory
        --read+load-->
            working copy (register file, stack frame, etc.)
                --use-->
                    execution engine (CPU core)
                <--assign--
            working copy (register file, stack frame, etc.)
        <--write+store--
    main memory

(I paired the read+load and write+store due to the second set of rules in 8.2.)

The spec never says where a read puts something so that a load can use it, or where a store puts something so that a write can use it. A store with its paired write pending is simply an in-flight memory transaction (and the same for a read+load pair). Possible places the value could be: in flight on the system bus; queued by the memory controller; on a dirty line in a write-back cache; somewhere in transit on a NUMA architecture. Store+write and read+load are just the two ends of the underlying ISA's store and load memory transactions. The read and write operations specify the transaction from the memory controller's perspective; load and store specify it from the program's perspective.

Note that reads and writes are performed "by main memory," not "by a thread." That distinction is crucial to reading the following section from the spec:

"8.2 EXECUTION ORDER AND CONSISTENCY
"The rules of execution order constrain the order in which certain events may occur. There are four general constraints on the relationships among actions:


* "The actions performed by any one thread are totally ordered; that is, for any two actions performed by a thread, one action precedes the other.
* "The actions performed by the main memory for any one variable are totally ordered; that is, for any two actions performed by the main memory on the same variable, one action precedes the other.
* "..."


The extra read/write step essentially allows main memory (the memory controller) to order its operations with bounded independence of any particular thread. Careful reading of the other rules will show that this is only a useful abstraction in the case of true concurrency (e.g., SMP), as the other rules ensure that a single processor will always load variables in a state consistent with what it last stored.
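That single-processor guarantee can be sketched in Java (class and variable names here are illustrative, not from the spec):

```java
// Minimal sketch: within one thread, the ordering rules guarantee a load
// always observes that thread's own most recent store to the variable,
// so single-threaded code can never read back a stale value it replaced.
class SelfConsistency {
    static int v;  // a shared variable ("V" in the spec's terms)

    public static void main(String[] args) {
        v = 1;          // assign; the store+write eventually commits
        int seen = v;   // load; must reflect the store above, never a stale 0
        System.out.println(seen);  // prints 1
    }
}
```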

I think it is talking about something below the Java-bytecode level--remember, this is the JVM spec, and constrains how an implementation of the JVM must behave when executing a sequence of opcodes, not the rules a Java compiler must follow when generating a sequence of opcodes from Java source code.

What I think it's really saying, again translated into Parrot terms, is this:

What it's really saying is that the JVM must be wary of aliasing, the bane of optimizing compilers.


store_global "foo", P0 # internally, may cache value and not push to main memory

Well, it might not immediately push to main memory. But yes, a dirty write cache would qualify as an in-flight memory transaction (an uncommitted store operation).


find_global P0, "foo" # internally, can't pull value from main memory if above value was not yet pushed there

True, but it's enforced by a different rule:


"Let action A be a load or store by thread T on variable V, and let action P be the corresponding read or write by the main memory on variable V. Similarly, let action B be some other load or store by thread T on that same variable V, and let action Q be the corresponding read or write by the main memory on variable V. If A precedes B, then P must precede Q. (Less formally: operations on the master copy of any given variable on behalf of a thread are performed by the main memory in exactly the order that the thread requested.)"
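As a sketch of that per-variable ordering in Java (names are mine; volatile stands in for "every store goes straight to main memory"):

```java
// Sketch: a thread's two stores to the same variable reach main memory
// in program order, so after both stores the master copy holds the
// second value, never the first.
class OrderedStores {
    static volatile int v;  // volatile: each store is committed to main memory

    public static void main(String[] args) {
        v = 1;  // first store requested by the thread
        v = 2;  // second store; main memory must perform it after the first
        System.out.println(v);  // prints 2
    }
}
```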

and I think the point is this:

find_global P0, "foo" # internally, also caches value in thread-local storage
find_global P0, "foo" # internally, can use cached thread-local value

Don't think of working storage as a cache. Think of it as a necessity: the processor can't operate on data unless it has been brought into working storage. The true complexities of caching are encompassed by the read and write operations and the orderings imposed on them, particularly the fact that only explicit locking imposes an ordering on reads and writes between threads. Caches, the stack, and registers all fall under the rubric of working storage in this spec.
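A Java sketch of why "working copy, not cache" matters (illustrative names; volatile is what forces every use back through main memory):

```java
// Sketch: without synchronization, a reader thread may legally keep
// using its working copy of a flag forever. Declaring the flag volatile
// forces a fresh read+load from main memory on every use, so the
// reader is guaranteed to eventually observe the writer's store.
class WorkingCopy {
    static volatile boolean done;  // without volatile, the loop may spin forever

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!done) { }  // each iteration reloads 'done' from main memory
        });
        reader.start();
        done = true;           // write+store is made visible to the reader
        reader.join();         // terminates once the reader's load sees true
    }
}
```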


And, as mentioned in section 8.6, any time a lock is taken, cached values need to be pushed back into main memory, and the local cache emptied. This doesn't make any sense if the "thread's working memory" is interpreted as the stack.

It does make sense. The rule says that all thread-local changes must be flushed completely to main memory before the thread blocks. It ensures that other threads will see changes made by the thread, and vice versa, so that data protected by the lock is always viewed in a consistent state. On a PowerPC, the JVM just needs to store all assigned-but-unstored variables and then execute the PPC sync instruction.
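In Java source terms, that flush-on-lock behavior is what makes a synchronized guard work; a sketch (class and method names are mine):

```java
// Sketch: acquiring a lock forces the thread's working copies to be
// reconciled with main memory (spec section 8.6), so data guarded by
// the same lock is always seen in a consistent state across threads.
class Guarded {
    private int value;
    synchronized void set(int v) { value = v; }  // store flushed when the lock is released
    synchronized int get() { return value; }     // fresh load forced when the lock is taken
}

class LockDemo {
    public static void main(String[] args) throws InterruptedException {
        Guarded g = new Guarded();
        Thread writer = new Thread(() -> g.set(42));
        writer.start();
        writer.join();                // writer's flush is complete before this returns
        System.out.println(g.get());  // prints 42
    }
}
```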




Gordon Henriksen
[EMAIL PROTECTED]


[Vocab note: A write-back cache only stores changes to main memory when the variable is evicted from the cache (it optimizes both loads and stores). Its counterpart is a write-through cache, which stores changes to main memory on every update (it only optimizes loads). A write-back cache defeats implicit SMP cache coherency, because it gives other processors' caches no sign that the memory word has changed until the word is evicted naturally or a cache-coherency instruction (like sync) forces the issue.]
