Re: [Avail] hashcode, equals, etc

Mark van Gulik Mon, 19 May 2014 13:58:24 -0700

Avail's different from most languages – as I'm sure you've noticed.  There is 
no fixed list of operations that are accessible everywhere in the code (by 
design, for DSLs), so technically there are no base methods that objects 
(values) respond to.  You're always free to import and define as many or as few 
methods as you want that accept an argument of type "any" (the highest usable 
type).

That said, there is a definition of "_=_" early in the graph of modules which 
provides, via a trivial primitive P_060_Equality, access to the notion of 
equality that Avail's VM already fully defines for all Avail values.  
Similarly, "_’s⁇type" provides access through primitive P_030_Type to Avail's 
type system, which is fully implemented by the VM.

Unlike meta-object protocols and their ilk which exist just to allow a 
programmer to make limited patches to a feeble, unusable type system, Avail 
instead provides a fully usable, complete type system.  This may seem contrary 
to the way object-oriented programming is normally organized... because it is.

Your remaining case, "_’s⁇hash", is slightly different.  While we could provide 
access to it in a matter of minutes through a primitive as trivial as 
P_060_Equality, we are currently choosing not to.  We have quite a bit of deep 
infrastructure in place to support pure mathematical notions of sets and maps, 
and we feel that exposing hash values directly may cause two problems:

(1) People might build their own hashed structures that provide no more actual 
power than sets and maps, and
(2) exposing the hash values may lead to pathologically hashed sets and maps.  
Imagine manually building a bin-like structure in a misguided attempt to 
improve the performance of sets.  Such a structure would segregate values based 
on their hash values, perhaps placing each group into a set that acts as a bin.  
Those bin sets would have highly correlated hash values, which can degrade the 
performance of any set or map that holds them.
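To make point (2) concrete, here is a hypothetical Python sketch. It assumes, purely for illustration, that a set's hash is the XOR of its elements' 32-bit hashes. This is a common order-independent scheme, and Avail's actual set hash function may differ. It also assumes the misguided structure bins values by the low 5 bits of their hash. Under those assumptions, the low 5 bits of each bin's hash are fully determined by the bin index and by whether the bin holds an odd or even number of elements. For example, every bin with an even count ends up with low bits of 0. A hash tree that consumes 5 bits per level would therefore find exactly the bits it examines first to be predictable rather than well distributed.

```python
import random
from functools import reduce

MASK32 = 0xFFFFFFFF

def set_hash(element_hashes):
    # ASSUMPTION (illustrative only): a set's hash is the XOR of its elements'
    # 32-bit hashes. Avail's actual set hash function may differ.
    return reduce(lambda acc, h: acc ^ h, element_hashes, 0) & MASK32

def bin_by_low_bits(element_hashes, bits):
    # The misguided "bin" structure: group hashes by their low `bits` bits.
    mask = (1 << bits) - 1
    bins = {}
    for h in element_hashes:
        bins.setdefault(h & mask, []).append(h)
    return bins

def predicted_low_bits(index, count):
    # Every member shares low bits `index`, so XORing `count` of them leaves
    # `index` when count is odd and cancels to 0 when count is even.
    return index if count % 2 else 0

rng = random.Random(2014)
sample = [rng.getrandbits(32) for _ in range(1000)]
for index, members in bin_by_low_bits(sample, 5).items():
    # The low 5 bits of each bin's hash follow from index and size parity alone.
    assert set_hash(members) & 0b11111 == predicted_low_bits(index, len(members))
```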

Quick story:  We recently uncovered a bug that caused a lot of values to have 
the same hash value.  I don't recall what sort of values had this problem (it's 
described in the commit notes somewhere within the last month or two), but it 
led to a situation in which a Bagwell ideal hash tree was forced to have seven 
hashed levels (5 bits of hash at each level), and failed while attempting to 
use a linear bin at the bottom.  It was a simple off-by-one error in the set 
implementation that caused the failure, but we were lucky that it failed rather 
than simply degrading performance.  We fixed both the poor hash distribution 
problem and the off-by-one bug related to bin depth in the set implementation.  
I believe maps had the same problem, but I don't remember for certain.
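The level arithmetic in that story can be sketched as follows. This sketch assumes a trie that consumes the 32-bit hash 5 bits at a time, starting from the low-order bits; the real Avail layout may differ. The names `hashed_levels`, `level_index` and `trie_path` are hypothetical. With 32 bits, ceil(32 / 5) = 7 levels can be indexed, and the seventh level sees only the 2 leftover bits. Values whose entire hashes are equal follow the same path through all seven levels, so below that they can only be separated by a linear bin that compares them for equality.

```python
HASH_BITS = 32       # assumed hash width, per the discussion above
BITS_PER_LEVEL = 5   # hash bits consumed at each trie level

def hashed_levels(hash_bits=HASH_BITS, bits_per_level=BITS_PER_LEVEL):
    """Number of trie levels that can be indexed by hash bits (ceiling division)."""
    return -(-hash_bits // bits_per_level)

def level_index(hash_value, level, hash_bits=HASH_BITS, bits_per_level=BITS_PER_LEVEL):
    """Child slot for hash_value at the given level, or None once the hash
    bits are exhausted and colliding values must go into a linear bin."""
    shift = level * bits_per_level
    if shift >= hash_bits:
        return None
    width = min(bits_per_level, hash_bits - shift)  # the last level may be narrower
    return (hash_value >> shift) & ((1 << width) - 1)

def trie_path(hash_value):
    """Slots visited by a value with this hash, from the root downward."""
    return [level_index(hash_value, level) for level in range(hashed_levels())]

# Every value sharing this hash follows the same 7-slot path; nothing in the
# hash can separate them further, so only a linear bin can.
print(trie_path(0xDEADBEEF))
```

The same arithmetic shows why the hash width is an implementation detail worth keeping private. Under this sketch's assumptions, a 16-bit hash would give 4 hashed levels and a 64-bit hash would give 13.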

There's a third, minor issue as well: (3) exposing the hash value also exposes 
the fact that it fits in a 32-bit int.  If we expose that, then we'll have to 
break working user-written code if we change the number of bits to 64, or 
provide some horribly kludgey "_’s⁇NEW hash" method (heh, like "The New iPad" 
or "Scott Joplin's New Rag").  Worse, Avail may eventually be picked up for use 
"in the small", and some space-constrained implementations of Avail may 
ultimately choose to compute and store fewer hash bits, like 24 or even 16, or 
eliminate hashing entirely if there aren't going to be any large hashed 
structures.  Hash values are just a VM optimization, after all.

The Avail VM already does a lot with hash values and equality semantics.  It 
coalesces equal objects.  It caches hash values within complex objects for 
performance without sacrificing good hash distribution.  These cached hash 
values are automatically invalidated or recomputed when a mutable 
representation of an object changes (it only has one reference, so it can't be 
the key of a set or map).  It (sometimes) compares cached hash values as an 
early check for equality to speed up unequal comparisons (equal comparisons are 
sped up by coalescence).  The specific hash functions we've implemented allow 
very rapid computation when manipulating tuples (replacing element(s), 
concatenating, extracting subranges), sets (adding/removing elements), etc.  
The cached hash values are sometimes updated via these "mathematical 
identity" shortcuts during quasi-destructive operations on mutable objects.  
But the most beautiful part of this design is that these things are completely 
hidden from the Avail programmer.  An Avail programmer gets zero-effort, 
reliable, efficient sets and maps – assuming we did our job right!
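A few of these techniques can be sketched in Python. This is a hypothetical model, not the Avail VM's code. It assumes a polynomial tuple hash, using the 32-bit FNV prime as an arbitrary multiplier, and an XOR-based set hash. Both were chosen here because they admit cheap incremental updates. The sketch shows a lazily cached hash and an early inequality check on cached hashes. It also shows hash updates for tuple concatenation, element replacement, and set insertion or removal, none of which rescan the elements. Coalescence and the invalidation of cached hashes in mutable representations are not modeled.

```python
MASK32 = 0xFFFFFFFF
MULTIPLIER = 0x01000193  # hypothetical choice (FNV-1 32-bit prime), not Avail's

def element_hash(x):
    # Stand-in 32-bit element hash; Python's hash() is used only for illustration.
    return hash(x) & MASK32

class HashedTuple:
    def __init__(self, elements):
        self.elements = tuple(elements)
        self._hash = None  # computed lazily, then cached

    def hash_value(self):
        if self._hash is None:
            h = 0
            for e in self.elements:  # Horner form of sum(e_i * M^(n-1-i))
                h = (h * MULTIPLIER + element_hash(e)) & MASK32
            self._hash = h
        return self._hash

    def concat(self, other):
        result = HashedTuple(self.elements + other.elements)
        # hash(a ++ b) = hash(a) * M^len(b) + hash(b), all mod 2^32.
        scale = pow(MULTIPLIER, len(other.elements), 1 << 32)
        result._hash = (self.hash_value() * scale + other.hash_value()) & MASK32
        return result

    def replacing(self, index, value):
        n = len(self.elements)
        result = HashedTuple(self.elements[:index] + (value,) + self.elements[index + 1:])
        # Adjust only the replaced element's weighted term.
        weight = pow(MULTIPLIER, n - 1 - index, 1 << 32)
        delta = element_hash(value) - element_hash(self.elements[index])
        result._hash = (self.hash_value() + delta * weight) & MASK32
        return result

    def equals(self, other):
        if self is other:  # after coalescence, equal values are often identical
            return True
        if self.hash_value() != other.hash_value():
            return False  # early exit: different hashes prove inequality
        return self.elements == other.elements

class HashedSet:
    def __init__(self, elements=()):
        self.elements = frozenset(elements)
        self._hash = 0
        for e in self.elements:  # XOR is order-independent, like a set
            self._hash ^= element_hash(e)

    def _derived(self, elements, new_hash):
        result = HashedSet.__new__(HashedSet)
        result.elements, result._hash = elements, new_hash
        return result

    def adding(self, x):
        if x in self.elements:
            return self
        return self._derived(self.elements | {x}, self._hash ^ element_hash(x))

    def removing(self, x):
        if x not in self.elements:
            return self
        return self._derived(self.elements - {x}, self._hash ^ element_hash(x))

    def hash_value(self):
        return self._hash
```

Usage: `HashedTuple([1, 2, 3]).concat(HashedTuple([4, 5]))` produces the same hash as building `HashedTuple([1, 2, 3, 4, 5])` from scratch. It gets there without revisiting any element.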

To answer your second point, however, you might want to include the subtype 
test "_⊆_" in your list.  This, like the others, uses a paper-thin primitive 
(P_033_IsSubtypeOf) to access Avail's built-in, complete rules about 
compatibility of types.  See implementors of o_IsInstanceOf() and its numerous 
helpers for the vast variety of ways in which type comparisons actually happen.

You might want to add some other basic operations to your list as well, like 
"<«_‡,»>" and "{«_‡,»}", since any value can be put in a tuple or set.  Also 
"“_”" (the inner quotes are smart double-quotes), the user-overridable 
stringification operation.  The VM gets told what to name this in the very 
first module, Origin.avail.  The default implementation for [any]→string simply 
asks the value's descriptor to produce something somewhat suitable, but the 
Avail method should be overridden for types of values that can be presented 
more appropriately in some other way.

On May 19, 2014, at 2:10 PM, Robbert van Dalen wrote:

> hi,
> 
> i wonder what’s the base set of methods that (any type of) object responds to:
> these are likely candidates:
> 
> _=_
> _’s⁇hash
> _’s⁇type
> 
> are there any more?
> 
> cheers,
> Robbert.
