On 21 May 2014, at 07:23, Mark van Gulik <[email protected]> wrote:

>> yes, avail is different, but it does have built-in vm primitives that i want 
>> to know about.
>> and no, i don't want to implement a 32bit hashcode for every type of object 
>> if i don’t have to (and where would i efficiently store such a hash?)
> 
> Perhaps you could store the hash code in a map of type {any→int|}.  :-).  I'm 
> not entirely joking, by the way.

but then such map needs to be a weak map, otherwise it would hold on to garbage 
(and their hash codes)

> Treaps aren't generally hashed; there's usually a random priority key that 
> makes it a stochastically balanced tree.  Avail provides a pretty good random 
> generator with period 2^19937-1 – the Mersenne Twister.  It's not terribly 
> efficient at the moment, but that will improve when we start doing 
> code-splitting for integer registers and... loops!  To build a treap you can 
> make each node be a 4-tuple of the form <left child, value, right child, 
> priority>, with the empty treap being represented as the empty tuple.

ah, but you can also draw the random priority from the value’s (reproducible) 
hash!
so a 3-tuple <left-child,value,right-child> would suffice.
using value hashed priorities, a treap collapses into a uniquely 
representation, solely based on the values it holds.

> Alternatively, since beyond some scale Avail's tuples are height-balanced 
> 16-64-Trees, you can just maintain a sorted tuple.  To insert an element into 
> an already sorted tuple, do a binary search in the tuple t to determine the 
> insertion position n, then say "t := t[..n-1] ++ <newElement> ++ t[n..];".  
> To remove the element at position n, say "t := t[..n-1] ++ t[n+1..];".  
> That's it, really.  Since extracting a subrange and concatenating two tuples 
> each takes log(n) time, well, the insert and delete operations also take 
> log(n) time.  Technically the binary search takes log^2(n) since each tuple 
> read has to traverse a 16-64-Tree, but the base of that logarithm is pretty 
> large.  And the constant associated with concatenating 16-64-Trees is almost 
> certainly significantly larger than the constant associated with tuple 
> element reading.

yes, tuples can be split and concatenated, but they lack canonical 
representations.

> We're not all that concerned with malicious attacks, since the worst that can 
> happen by running carefully crafted crappy code is poor performance.  Our 
> concern is eliminating pathologies in the hashing.  There are lots of 
> libraries out there with terrible hash functions.  Poor distributions, 
> inappropriate mutability, inappropriate caching, poor combination functions, 
> etc.  Avail may have a few of those lurking – some of the combination 
> functions are still cheesy hacks from the old days of running on top of 
> Smalltalk.  Those will get replaced by more powerful bit-shuffling operations 
> at some point, I'm sure.

there are some carefully crafted codes in the wild, that launch denial of 
service attacks on major banking sites.
but i understand that dos attacks have low priority right now.

>> why not expose it as a special hash type object, and some methods to combine 
>> hash objects to new hash objects.
> 
> Interesting idea.  While I'm always on the lookout to find ways to make Avail 
> slower (it sometimes feels that way!), I think I might have to pass on 
> exposing hashes for now.  There may come a time when someone wants to build 
> an enormous external database that requires smooth hashing of the objects 
> being stored for various reasons (key-store, coalescing store, etc).  In that 
> circumstance we would probably want to expose either the built-in hashing 
> protocol (o_Hash()) or build a supplementary one for that new purpose 
> ("o_BigSlowHash()" or something).

sure, slow ideas should have the lowest priority :-)

> ...and then Todd chopped all that to bits like the Gordian Knot!  His insight 
> was to generate nearly all the primitive declarations by using the reflective 
> access that all the Primitive subclasses already provided for the 
> L2Translator.  A similar scheme uses the "special object_" primitive (given a 
> name via a pragma in Origin) to produce names for the argument types and 
> return types needed by the primitive declarations.

brilliant.

cheers,
Robbert.

Reply via email to