> On May 10, 2014, at 11:38 PM, Robbert van Dalen <[email protected]> > wrote: > > >> On 11 May 2014, at 02:00, Mark van Gulik <[email protected]> wrote: >> >> For simplicity, consider tuples and variables. A variable containing a >> tuple is still the same variable if you change what tuple it contains. So >> equality for variables is based on the variables' identity. > > how is the hash of a variable determined? > does it only depend on the variables’ identity (aka, its memory location), or > is it also dependent on its content?
A variable's hash value is permanent, and can be considered to have been generated randomly when the variable was created. However, most variables never get put into a hashed structure, so we actually postpone generating the hash until someone requests it. We also generate (and save into a dedicated slot of the variable) the hash for the relatively infrequent case that a variable becomes shared. If we didn't do this we might have two fibers (bound to two distinct threads) attempting to fetch the hash value simultaneously. Each would see that the hash slot was not yet occupied and would generate, cache in the slot, and return a different hash value. It would require using a lock in the hash operation to avoid this race, so we avoid the cost by simply generating a hash, if one hasn't already been computed, at the last moment before the variable becomes shared. So to directly answer your question, the hash of the variable is random but permanent, and is not affected by the variable's current content. >> So combining these principles, a tuple of variables has equality semantics >> that depend on the identities of the variables. Two tuples of variables are >> the same tuple of variables if they refer to the same (identical) variables >> in the same positions. So in one sense the tuple has "identity", in that >> its equality depends on identity of its components, but in a stronger sense >> a tuple is really only relying on equality of its components (which in this >> case was equivalent to identity). > > what happens with the hash of set - that contains a variable that has been > mutated? Viewing the system as variables and immutable objects, and ignoring the mutable optimization, the hash of a value *never* changes. Similarly, the equality or inequality of two objects never changes. However, if we take into account our mutable object trick, a hash value can change -- precisely when that can't be detected because an object's storage has been recycled because the last reference to it was dropped. You can think of the mutable object mechanism as being a memory management trick. In fact, any garbage collected system already has a level of description at which storage for an object is recycled, such that new objects can occupy the storage that old ones used to occupy, and apparently breaking basic principles of equality, identity, and hashing. That's not the right level of description, just like saying that the electrons flowing through the transistors that make up a flip-flop (or alternatively the electron holes migrating backwards) don't preserve the identity of the ones and zeros. The simplifying assumption of a macro-state (only two of which exist for the flop-flop) is the same kind of abstraction that we use when talking about the semantics actually exposed to the Avail programmer. In essence, the Avail VM *implements* a higher level set of semantics, that in which only variables and completely immutable objects can be discussed meaningfully. So if an immutable set contains some variables, it will always contain those variables. And since the variables are the same variables even after having different values assigned to them, the set remains unchanged. An alternative implementation for variables might clarify things further. Say that instead of a slot within the variable for its current value, we instead used a giant, global, mutable Java Map<> from each variable to its value. Using that mechanism we wouldn't expect that binding a new value to a variable via this Map<> would cause a variable to now be a different variable. And yet this implementation should have the same semantics as our actual implementation in which a slot is reserved within each variable to hold the value that it's bound to. Maybe the appropriate mental model is that the variable's current value slot isn't considered part of the variable, but is just an implementation trick that allows us to look up the value that Avail currently has bound to that variable, more efficiently than with a global Map<>.
