On May 7, 2014, at 12:27 PM, Robbert van Dalen wrote: > looking at the java source code, i see references to various object > properties such as (im)mutable, shared, etc. > it appears there is a strong notion of (runtime) immutable, shared objects.
Avail programmers never need to see the constructs, as they stay *strictly*
below the Avail/VM boundary. In other words, they're not essential to make
Avail work.
However, we implement two one-bit sticky reference counts via
mutable/immutable/shared. When an AvailObject is first created, it (generally)
uses a descriptor that indicates the object is mutable. That's because it will
have one reference, the caller of the creation method. If this new mutable
object gets returned from a primitive, the reference is considered to "move"
from the Java code to the continuation that will eventually hold the value, so
it can remain mutable. Similarly, if a mutable object on the operand stack of
a fiber's current continuation is popped and written into a variable, it
remains mutable. To try to extend the mutability of objects as much as
possible, we have distinct Level One nybblecode instructions for (1) getting
the current value of a variable, and (2) doing the same but clearing the
variable (so that it has no value afterward). The former has to mark the value
immutable (recursively), since the value will then be reachable from both the
variable and the operand stack. Note that immutability is a permanent state of
the object; there is no transition back to mutable.
The shared state has to do with multithreading. As long as the objects being
manipulated by a fiber are only reachable via that fiber, there's no reason to
mark the objects as shared. The vast majority of objects never reach the
shared state, even in a program that uses many fibers. Pushing something onto
the fiber's current continuation's stack doesn't cause the object to become
shared, nor does writing an object into a non-shared variable. However, if a
variable is already marked as shared, any object written into it will first be
recursively marked as shared. It's perfectly fine for non-shared objects (even
mutable ones) to refer to shared objects, but a shared object should only ever
refer to other shared objects.
Variables that are marked as shared have a distinct descriptor class
(VariableSharedDescriptor) which allows us to lock the Java monitor associated
with that AvailObject for the duration of operations that are supposed to be
atomic. Not only does this cause mutual exclusion of access to the variable,
but it also acts as a memory barrier. Since a variable's value slot is the
only slot that can change in Avail (basically), this ensures that fibers
running simultaneously on distinct Threads see a perfectly consistent view of
memory. If they're accessing shared objects, those objects aren't changing.
If they're accessing shared variables, they pass through a memory barrier that
ensures all reads and writes appear to be perfectly serialized.
This is an *exceptionally* clean memory model. It hides all the messiness of
Java's model while avoiding locks on everything except shared variable
accesses. Atomic operations on non-shared variables don't even need to ensure
memory coherence to be correct.
Here's the basic transition diagram:
Mutable
|
| (second reference is created)
|
v
Immutable
|
| (object is written into a shared variable)
|
v
Shared
It's also legal to transition directly from Mutable to Shared, which has the
same effect as doing both transitions.
So non-shared objects never need locks of any kind. What interesting property
do mutable objects have?
Heh, that's a cornerstone of the way the Avail VM works. Mutable objects are
allowed to be destroyed or recycled by certain operations. Say we have an
AvailObject that represents the number 10^100. Now say we attempt to add one
to it (say by invoking primitive#1 – see P_001_Addition.java). If the object
is mutable, we know that the primitive will consume both the 10^100 object and
the one object off the operand stack, then push their sum. However, if the
10^100 object is mutable, we are assured that there are no other references to
it. So why not recycle that space? So we do. We actually clobber that
integer object (an AvailObject with IntegerDescriptor.mutable() as its
descriptor), at least if the result occupies the same amount of space as the
original. So we recover some of the cost of uniformly boxing objects. And the
AvailObject representing the new value (which previously represented the old
value) has no other incoming references, so it remains mutable. We do the same
for tuples, sets, maps, objects, floats, doubles, closures, continuations, and
probably a lot more. And the best part is that the Avail programmer can remain
completely oblivious to this fact, since it can't have any visible effect
except improved performance. The Avail programmer doesn't have to care whether
"foo" == "foo" like in Java, nor does he care if "foo"[1]→¢g clobbers the
region of memory where "foo" occurs, or allocates a new region of memory to
store "goo". It has the same visible effect.
So now you see why we want to keep things mutable as long as we can.
Eventually the Level Two optimizer will do significantly more powerful flow
analysis to determine whether a value is being used monadically (i.e., never
has to become immutable), but the Level One mechanisms are already pretty good.
And because all of our underlying objects are represented as trees when they
get sufficiently big, even immutable and shared objects have reasonably
efficient operations. E.g., adding an element to a huge set has to clone *at
most* a handful of small Bagwell hash tree bins, but ideally even those can
just be clobbered directly if they're still mutable.
> is it true that an immutable object that is marked as ‘shared’ has more than
> 1 reference to it? (i.e. a reference count of +1)?
No, but it almost certainly had more than one reference at some point in the
past. It's far too expensive to maintain an actual reference count, even with
the Deutsch-Bobrow deferred counting mechanism and its successors. Besides,
the next time an operation happens on the immutable object, it simply produces
a mutable copy with that one edit. But it's mutable, so the next edit is cheap
again.
> and that an object that is not shared (has 1 reference) can be ‘destroyed’
> while manipulated?
Yes, as above, but "is mutable" is the right distinction.
> what about shared mutable objects?
No such state. If there are multiple fibers that can reach an object (shared)
then we automatically treat it as having multiple references (immutable).
> lastly, are weak references first-class in avail?
Ah. Not yet. We use some of Java's weak structures in various places. For
example, atoms have a property map. The keys of the property map are also
atoms. It's forbidden to enumerate all of an atom's properties (that would
seriously violate modularity), so one may only ask if an atom has a
*particular* property key, and to extract the corresponding value. Because of
that, we use a WeakHashMap<A_Atom, A_BasicObject> wrapped in a POJO (Plain Old
Java Object). If an atom being used as a property key is no longer reachable,
we let the weak map drop that property entirely.
Our best idea for weak references so far is a new kind of variable that holds a
set of values -- weakly. However, since the vast majority of objects in Avail
don't have identity, we wouldn't be able to use most Avail values as elements.
When could the weak-set-variable holding {5, "foo"} drop the value 5 or the
string "foo"? When these values are no longer present in another section of
memory? If it could only hold atoms or variables it would make much more
sense. And in that case we could also introduce a weak-map-variable whose keys
are atoms.
> i need this to be able to hash-cons objects (via weak maps)
I suggest wrapping a WeakHashMap in a POJO for now. You should be able to
instantiate and manipulate almost everything in the Java library (or anything
else that's on the class path) using the POJO interface. Wow, I haven't seen
someone do hash-consing since I read Henry Baker's paper on the subject
mumble-mumble years ago.
You might want to use our LRUCache.java. It uses soft references, which are
more appropriate than weak references for a computational cache.
Unfortunately, to maintain integrity of the Avail runtime we explicitly forbid
com.avail.* classes (or something like that) from being accessed as POJOs, so
you may have to copy the code or wrap it with another class.
> it would be nice to have some documentation that discusses the various
> properties and trade-offs .
True, but it must be considered developer information. We've been *very*
careful to keep these sharp edges away from Avail programmers – that's why
Avail is actually nice to use, rather than, say, C++. And exposing some of
these ideas at the wrong time in someone's study of the language could lead to
them picking up some superstitious behavior. A friend of mine actually had a
professor try to tell her to use new String("foo") everywhere in her code.
Scary. Then again, if you're running Java without a SecurityManager, it lets
you use reflection to *change the characters of a string*. It's that kind of
leaky abstraction that we prevent at all costs in Avail. And the biggest issue
with exposing things like mutable/immutable/shared is that it's purely an
*incidental* implementation technique. It would be perfectly acceptable to
build an Avail VM that didn't implement anything like mutable/immutable/shared.
Even Level Two is purely optional, which (1) allows us to build tools
(browser, debugger, documentation generator) that only have to look as low as
Level One, and (2) allows us to build a simple, alternative Avail VM that
simply interprets Level One nybblecodes. Or has an entirely different kind of
Level Two.
signature.asc
Description: Message signed with OpenPGP using GPGMail
