On 21 May 2014, at 07:23, Mark van Gulik <[email protected]> wrote:
>> yes, avail is different, but it does have built-in vm primitives that i want
>> to know about.
>> and no, i don't want to implement a 32bit hashcode for every type of object
>> if i don’t have to (and where would i efficiently store such a hash?)
>
> Perhaps you could store the hash code in a map of type {any→int|}. :-). I'm
> not entirely joking, by the way.
but then such map needs to be a weak map, otherwise it would hold on to garbage
(and their hash codes)
> Treaps aren't generally hashed; there's usually a random priority key that
> makes it a stochastically balanced tree. Avail provides a pretty good random
> generator with period 2^19937-1 – the Mersenne Twister. It's not terribly
> efficient at the moment, but that will improve when we start doing
> code-splitting for integer registers and... loops! To build a treap you can
> make each node be a 4-tuple of the form <left child, value, right child,
> priority>, with the empty treap being represented as the empty tuple.
ah, but you can also draw the random priority from the value’s (reproducible)
hash!
so a 3-tuple <left-child,value,right-child> would suffice.
using value hashed priorities, a treap collapses into a uniquely
representation, solely based on the values it holds.
> Alternatively, since beyond some scale Avail's tuples are height-balanced
> 16-64-Trees, you can just maintain a sorted tuple. To insert an element into
> an already sorted tuple, do a binary search in the tuple t to determine the
> insertion position n, then say "t := t[..n-1] ++ <newElement> ++ t[n..];".
> To remove the element at position n, say "t := t[..n-1] ++ t[n+1..];".
> That's it, really. Since extracting a subrange and concatenating two tuples
> each takes log(n) time, well, the insert and delete operations also take
> log(n) time. Technically the binary search takes log^2(n) since each tuple
> read has to traverse a 16-64-Tree, but the base of that logarithm is pretty
> large. And the constant associated with concatenating 16-64-Trees is almost
> certainly significantly larger than the constant associated with tuple
> element reading.
yes, tuples can be split and concatenated, but they lack canonical
representations.
> We're not all that concerned with malicious attacks, since the worst that can
> happen by running carefully crafted crappy code is poor performance. Our
> concern is eliminating pathologies in the hashing. There are lots of
> libraries out there with terrible hash functions. Poor distributions,
> inappropriate mutability, inappropriate caching, poor combination functions,
> etc. Avail may have a few of those lurking – some of the combination
> functions are still cheesy hacks from the old days of running on top of
> Smalltalk. Those will get replaced by more powerful bit-shuffling operations
> at some point, I'm sure.
there are some carefully crafted codes in the wild, that launch denial of
service attacks on major banking sites.
but i understand that dos attacks have low priority right now.
>> why not expose it as a special hash type object, and some methods to combine
>> hash objects to new hash objects.
>
> Interesting idea. While I'm always on the lookout to find ways to make Avail
> slower (it sometimes feels that way!), I think I might have to pass on
> exposing hashes for now. There may come a time when someone wants to build
> an enormous external database that requires smooth hashing of the objects
> being stored for various reasons (key-store, coalescing store, etc). In that
> circumstance we would probably want to expose either the built-in hashing
> protocol (o_Hash()) or build a supplementary one for that new purpose
> ("o_BigSlowHash()" or something).
sure, slow ideas should have the lowest priority :-)
> ...and then Todd chopped all that to bits like the Gordian Knot! His insight
> was to generate nearly all the primitive declarations by using the reflective
> access that all the Primitive subclasses already provided for the
> L2Translator. A similar scheme uses the "special object_" primitive (given a
> name via a pragma in Origin) to produce names for the argument types and
> return types needed by the primitive declarations.
brilliant.
cheers,
Robbert.