On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to "surprising results" rather than "VM crash".)

But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption "resistant", regardless of how we judge it.

It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.***



Never? Yes. From the user's perspective, "synchronized objects" more often than not provide no value! For instance, this Java code, run from competing threads, does not perform its intended purpose:


        //      Vector was Java 1's dynamically sizable array class.
        if (!vector.Contains(obj))
                vector.Add(obj);

I does not prevent obj from appearing more than once in the collection. It's equivalent to this:

        bool temp;
        synchronized (collection) {
                temp = vector.Contains(obj);
        }
        //      Preemption is possible between here...
        if (!temp) {
                sychronized (collection) {
                        //      ... and here.
                        vector.Add(obj);
                }
        }

The correct code is this:

        synchronized (vector) {
                if (!vector.Contains(obj))
                        vector.Add(obj);
        }

If we again make explicit Vector's synchronized methods, a startling redundancy will become apparent:

        synchronized (vector) {
                bool temp;
                synchronized (collection) {
                        temp = vector.Contains(obj);
                }
                if (!temp) {
                        sychronized (collection) {
                                vector.Add(obj);
                        }
                }
        }

This code is performing 3 times as many locks as necessary! More realistic code will perform many times more than 3 times the necessary locking. Imagine the waste in sorting a 10,000 element array.

Beyond that stunning example of wasted effort, most structures aren't shared in the first place, and so the overhead is *completely* unnecessary for them.

Still further, many "shared" objects are unmodified once they become shared, and so again require no locking.

The only time automatically synchronized objects are useful is when the user's semantics exactly match the object's methods. (e.g., an automatically synchronized queue might be quite useful.) In that case, it's a trivial matter for the user to wrap the operation in a synchronized block—or subclass the object with a method that does so. But it is quite impossible to remove the overhead from the other 99% of uses, since the VM cannot discern the user's locking strategy.


If the user is writing a threaded program, then data integrity is necessarily his problem—a virtual machine cannot solve it for him, and automatic synchronization provides little help. "Protecting" a policy-neutral object (such as an array) from its user is pointless; all we could do is to ensure that the object is in a state consistent with an incorrect sequence of operations. That's useless to the user, because his program still behaves incorrectly; the object is, to his eyes, corrupted since it no longer maintains his invariant conditions.


Thus, since the VM cannot guarantee that the the user's program behaves as intended (only as written), and all that locking is wasted 4 times over—the virtual machine would be wise to limit its role to the prevention of crashes and to limiting the corruption that can result from incorrect (unsynchronized) access to objects. If automatic locking is the only way to do that for a particular data structure, then so be it. Oftentimes, though, thoughtful design can ensure that even unsynchronized accesses cannot crash or corrupt the VM as a whole--even if those operations might corrupt one object's state, or provide less than useful results.


As an example of how this applies to parrot, all of the FixedMumbleArray classes[*] recently discussed could clearly be implemented completely and safely without automatic locks on all modern platforms—if only parrot could allow lock-free access to at least some PMCs. ([*] Except FixedMixedArray. That stores UVals [plus more for a discriminator field], and couldn't be quite safe, as a UVal isn't an atomic write on many platforms. Then again, nor is a double.)


Remember, Java already made the mistake of automatic synchronization with their original set of collections (incl. Vector). It was a major design flaw, crippling the performance of early Java programs. Fixing it for Java 2's class library was quite non-trivial, but because the VM was powerful enough to allow it, more efficient unsynchronized collections could be made available as a class library even for the earlier Java VMs.



Gordon Henriksen
[EMAIL PROTECTED]

Reply via email to