On May 7, 2014, at 12:27 PM, Robbert van Dalen wrote:
> looking at the java source code, i see references to various object 
> properties such as (im)mutable, shared, etc.
> it appears there is a strong notion of (runtime) immutable, shared objects.

Avail programmers never need to see the constructs, as they stay *strictly* 
below the Avail/VM boundary.  In other words, they're not essential to make 
Avail work.

However, we implement two one-bit sticky reference counts via 
mutable/immutable/shared.  When an AvailObject is first created, it (generally) 
uses a descriptor that indicates the object is mutable.  That's because it will 
have one reference, the caller of the creation method.  If this new mutable 
object gets returned from a primitive, the reference is considered to "move" 
from the Java code to the continuation that will eventually hold the value, so 
it can remain mutable.  Similarly, if a mutable object on the operand stack of 
a fiber's current continuation is popped and written into a variable, it 
remains mutable.  To try to extend the mutability of objects as much as 
possible, we have distinct Level One nybblecode instructions for (1) getting 
the current value of a variable, and (2) doing the same but clearing the 
variable (so that it has no value afterward).  The former has to mark the value 
immutable (recursively), since the value will then be reachable from both the 
variable and the operand stack.  Note that immutability is a permanent state of 
the object; there is no transition back to mutable.

The shared state has to do with multithreading.  As long as the objects being 
manipulated by a fiber are only reachable via that fiber, there's no reason to 
mark the objects as shared.  The vast majority of objects never reach the 
shared state, even in a program that uses many fibers.  Pushing something onto 
the fiber's current continuation's stack doesn't cause the object to become 
shared, nor does writing an object into a non-shared variable.  However, if a 
variable is already marked as shared, any object written into it will first be 
recursively marked as shared.  It's perfectly fine for non-shared objects (even 
mutable ones) to refer to shared objects, but a shared object should only ever 
refer to other shared objects.

Variables that are marked as shared have a distinct descriptor class 
(VariableSharedDescriptor) which allows us to lock the Java monitor associated 
with that AvailObject for the duration of operations that are supposed to be 
atomic.  Not only does this cause mutual exclusion of access to the variable, 
but it also acts as a memory barrier.  Since a variable's value slot is the 
only slot that can change in Avail (basically), this ensures that fibers 
running simultaneously on distinct Threads see a perfectly consistent view of 
memory.  If they're accessing shared objects, those objects aren't changing.  
If they're accessing shared variables, they pass through a memory barrier that 
ensures all reads and writes appear to be perfectly serialized.

This is an *exceptionally* clean memory model.  It hides all the messiness of 
Java's model while avoiding locks on everything except shared variable 
accesses.  Atomic operations on non-shared variables don't even need to ensure 
memory coherence to be correct.

Here's the basic transition diagram:

Mutable
    |
    | (second reference is created)
    |
    v
Immutable
    |
    | (object is written into a shared variable)
    |
    v
Shared

It's also legal to transition directly from Mutable to Shared, which has the 
same effect as doing both transitions.

So non-shared objects never need locks of any kind.  What interesting property 
do mutable objects have?

Heh, that's a cornerstone of the way the Avail VM works.  Mutable objects are 
allowed to be destroyed or recycled by certain operations.  Say we have an 
AvailObject that represents the number 10^100.  Now say we attempt to add one 
to it (say by invoking primitive#1 – see P_001_Addition.java).  If the object 
is mutable, we know that the primitive will consume both the 10^100 object and 
the one object off the operand stack, then push their sum.  However, if the 
10^100 object is mutable, we are assured that there are no other references to 
it.  So why not recycle that space?  So we do.  We actually clobber that 
integer object (an AvailObject with IntegerDescriptor.mutable() as its 
descriptor), at least if the result occupies the same amount of space as the 
original.  So we recover some of the cost of uniformly boxing objects.  And the 
AvailObject representing the new value (which previously represented the old 
value) has no other incoming references, so it remains mutable.  We do the same 
for tuples, sets, maps, objects, floats, doubles, closures, continuations, and 
probably a lot more.  And the best part is that the Avail programmer can remain 
completely oblivious to this fact, since it can't have any visible effect 
except improved performance.  The Avail programmer doesn't have to care whether 
"foo" == "foo" like in Java, nor does he care if "foo"[1]→¢g clobbers the 
region of memory where "foo" occurs, or allocates a new region of memory to 
store "goo".  It has the same visible effect.

So now you see why we want to keep things mutable as long as we can.  
Eventually the Level Two optimizer will do significantly more powerful flow 
analysis to determine whether a value is being used monadically (i.e., never 
has to become immutable), but the Level One mechanisms are already pretty good. 
 And because all of our underlying objects are represented as trees when they 
get sufficiently big, even immutable and shared objects have reasonably 
efficient operations.  E.g., adding an element to a huge set has to clone *at 
most* a handful of small Bagwell hash tree bins, but ideally even those can 
just be clobbered directly if they're still mutable.


> is it true that an immutable object that is marked as ‘shared’ has more than 
> 1 reference to it? (i.e. a reference count of +1)?

No, but it almost certainly had more than one reference at some point in the 
past.  It's far too expensive to maintain an actual reference count, even with 
the Deutsch-Bobrow deferred counting mechanism and its successors.  Besides, 
the next time an operation happens on the immutable object, it simply produces 
a mutable copy with that one edit.  But it's mutable, so the next edit is cheap 
again.


> and that an object that is not shared (has 1 reference) can be ‘destroyed’ 
> while manipulated?

Yes, as above, but "is mutable" is the right distinction.


> what about shared mutable objects?

No such state.  If there are multiple fibers that can reach an object (shared) 
then we automatically treat it as having multiple references (immutable).


> lastly, are weak references first-class in avail?

Ah.  Not yet.  We use some of Java's weak structures in various places.  For 
example, atoms have a property map.  The keys of the property map are also 
atoms.  It's forbidden to enumerate all of an atom's properties (that would 
seriously violate modularity), so one may only ask if an atom has a 
*particular* property key, and to extract the corresponding value.  Because of 
that, we use a WeakHashMap<A_Atom, A_BasicObject> wrapped in a POJO (Plain Old 
Java Object).  If an atom being used as a property key is no longer reachable, 
we let the weak map drop that property entirely.

Our best idea for weak references so far is a new kind of variable that holds a 
set of values -- weakly.  However, since the vast majority of objects in Avail 
don't have identity, we wouldn't be able to use most Avail values as elements.  
When could the weak-set-variable holding {5, "foo"} drop the value 5 or the 
string "foo"?  When these values are no longer present in another section of 
memory?  If it could only hold atoms or variables it would make much more 
sense.  And in that case we could also introduce a weak-map-variable whose keys 
are atoms.


> i need this to be able to hash-cons objects (via weak maps)

I suggest wrapping a WeakHashMap in a POJO for now.  You should be able to 
instantiate and manipulate almost everything in the Java library (or anything 
else that's on the class path) using the POJO interface.  Wow, I haven't seen 
someone do hash-consing since I read Henry Baker's paper on the subject 
mumble-mumble years ago.

You might want to use our LRUCache.java.  It uses soft references, which are 
more appropriate than weak references for a computational cache.  
Unfortunately, to maintain integrity of the Avail runtime we explicitly forbid 
com.avail.* classes (or something like that) from being accessed as POJOs, so 
you may have to copy the code or wrap it with another class.


> it would be nice to have some documentation that discusses the various 
> properties and trade-offs .

True, but it must be considered developer information.  We've been *very* 
careful to keep these sharp edges away from Avail programmers – that's why 
Avail is actually nice to use, rather than, say, C++.  And exposing some of 
these ideas at the wrong time in someone's study of the language could lead to 
them picking up some superstitious behavior.  A friend of mine actually had a 
professor try to tell her to use new String("foo") everywhere in her code.  
Scary.  Then again, if you're running Java without a SecurityManager, it lets 
you use reflection to *change the characters of a string*.  It's that kind of 
leaky abstraction that we prevent at all costs in Avail.  And the biggest issue 
with exposing things like mutable/immutable/shared is that it's purely an 
*incidental* implementation technique.  It would be perfectly acceptable to 
build an Avail VM that didn't implement anything like mutable/immutable/shared. 
 Even Level Two is purely optional, which (1) allows us to build tools 
(browser, debugger, documentation generator) that only have to look as low as 
Level One, and (2) allows us to build a simple, alternative Avail VM that 
simply interprets Level One nybblecodes.  Or has an entirely different kind of 
Level Two.

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail

Reply via email to