On May 20, 2014, at 3:07 PM, Robbert van Dalen wrote: > On 19 May 2014, at 22:58, Mark van Gulik <[email protected]> wrote: >> Avail's different from most languages – as I'm sure you've noticed. There >> is no fixed list of operations that are accessible everywhere in the code >> (by design, for DSLs), so technically there are no base methods that objects >> (values) respond to. You're always free to import and define as many or as >> few methods as you want that accept an argument of type "any" (the highest >> usable type). > > yes, avail is different, but it does have built-in vm primitives that i want > to know about. > and no, i don't want to implement a 32bit hashcode for every type of object > if i don’t have to (and where would i efficiently store such a hash?)
Perhaps you could store the hash code in a map of type {any→int|}. :-). I'm
not entirely joking, by the way.
>> (1) People might build their own hashed structures that provide no more
>> actual power than sets and maps, and
>> (2) Exposing the hash values may lead to pathologically hashed sets and
>> maps. Imagine manually building a bin-like structure in a misguided attempt
>> to improve the performance of sets. Such a structure would segregate values
>> based on their hash values, potentially placing them into sets acting as
>> bins of this construction. Those sets would have hash values that were
>> highly correlated, which can degrade performance.
>
> Regarding (1): that’s *exactly* what i want to build - my own hashed
> structures, most notably treaps.
> avail’s sets and maps are powerful and efficient, but lack (imo)
> destructuring operations (such as split) and/or optional ordering.
> these can be added (may be), but i’d like to experiment with data structures
> myself!
Treaps aren't generally hashed; there's usually a random priority key that
makes it a stochastically balanced tree. Avail provides a pretty good random
generator with period 2^19937-1 – the Mersenne Twister. It's not terribly
efficient at the moment, but that will improve when we start doing
code-splitting for integer registers and... loops! To build a treap you can
make each node be a 4-tuple of the form <left child, value, right child,
priority>, with the empty treap being represented as the empty tuple.
Alternatively, since beyond some scale Avail's tuples are height-balanced
16-64-Trees, you can just maintain a sorted tuple. To insert an element into
an already sorted tuple, do a binary search in the tuple t to determine the
insertion position n, then say "t := t[..n-1] ++ <newElement> ++ t[n..];". To
remove the element at position n, say "t := t[..n-1] ++ t[n+1..];". That's it,
really. Since extracting a subrange and concatenating two tuples each takes
log(n) time, well, the insert and delete operations also take log(n) time.
Technically the binary search takes log^2(n) since each tuple read has to
traverse a 16-64-Tree, but the base of that logarithm is pretty large. And the
constant associated with concatenating 16-64-Trees is almost certainly
significantly larger than the constant associated with tuple element reading.
For extra points you can use the shorter "t := t[n..n-1]→<newElement>" to
insert, and "t := t[n..n]→<>" to delete.
> Regarding (2): even without exposing avail’s hashcode, there are always
> malicious attacks possible, even when 32bit hashes are close to perfect.
> it requires a lot of effort to harden hashtables (and avail's sets and maps)
> against such malicious attacks.
> that’s why i want to experiment with - and possible offer alternative - set
> and map implementations (which need not be vm primitives).
> that said, believe me that i do think that providing tuples, sets and maps
> natively is a *good* thing, because they covers 99% of the use-cases.
We're not all that concerned with malicious attacks, since the worst that can
happen by running carefully crafted crappy code is poor performance. Our
concern is eliminating pathologies in the hashing. There are lots of libraries
out there with terrible hash functions. Poor distributions, inappropriate
mutability, inappropriate caching, poor combination functions, etc. Avail may
have a few of those lurking – some of the combination functions are still
cheesy hacks from the old days of running on top of Smalltalk. Those will get
replaced by more powerful bit-shuffling operations at some point, I'm sure.
>> There's a third, minor issue as well: (3) exposing the hash value also
>> exposes the fact that it fits in a 32-bit int. If we expose that, then
>> we'll have to break working user-written code if we change the number of
>> bits to 64, or provide some horribly kludgey "_’s⁇NEW hash" method (heh,
>> like "The New iPad" or "Scott Joplin's New Rag"). Worse, Avail may
>> eventually be picked up for use "in the small", and some space-constrained
>> implementations of Avail may ultimately choose to compute and store fewer
>> hash bits, like 24 or even 16. Or eliminate hashing entirely if there
>> aren't going to be any large hashed structures. They are just a VM
>> optimization, after all.
>
> why not expose it as a special hash type object, and some methods to combine
> hash objects to new hash objects.
Interesting idea. While I'm always on the lookout to find ways to make Avail
slower (it sometimes feels that way!), I think I might have to pass on exposing
hashes for now. There may come a time when someone wants to build an enormous
external database that requires smooth hashing of the objects being stored for
various reasons (key-store, coalescing store, etc). In that circumstance we
would probably want to expose either the built-in hashing protocol (o_Hash())
or build a supplementary one for that new purpose ("o_BigSlowHash()" or
something).
>> To answer your second point, however, you might want to include the subtype
>> test "_⊆_" in your list. This, like the others, uses a paper-thin primitive
>> (P_033_IsSubtypeOf) to access Avail's built-in, complete rules about
>> compatibility of types. See implementors of o_IsInstanceOf() and its
>> numerous helpers for the vast variety of ways in which types comparisons
>> actually happen.
>
> ah, that’s a nice paper-thin primitive to know about.
By the way, we sometimes call the o_* operations the "sub-primitives" to
emphasize that many of them are there just to support the Avail primitives.
>> You might want to add some other basic operations to your list as well, like
>> "<«_‡,»>" and "{«_‡,»}", since any value can be put in a tuple or set. Also
>> "“_”" (the inner quotes are smart double-quotes), the user-overridable
>> stringification operation. The VM gets told what to name this in the very
>> first module, Origin.avail. The default implementation for [any]→string
>> simply asks the value's descriptor to produce something somewhat suitable,
>> but the Avail method should be overridden for types of values that can be
>> presented more appropriately in some other way.
>
> yeah, avail's bootstrapping is quite ingenious.
Thanks. I spent a huge amount of time (maybe years) figuring out how to order
the definitions of the primitives. There were dozens of tortuous modules that
each introduced a fraction of the methods appropriate for the various data
types. Some primitives had to be introduced in a weak form to provide basic
support for other primitive methods (typically to support type expressions used
by the method declarations), then re-introduced at a later time with more
rigor, specificity, and robustness...
...and then Todd chopped all that to bits like the Gordian Knot! His insight
was to generate nearly all the primitive declarations by using the reflective
access that all the Primitive subclasses already provided for the L2Translator.
A similar scheme uses the "special object_" primitive (given a name via a
pragma in Origin) to produce names for the argument types and return types
needed by the primitive declarations.
An intended secondary effect is to ensure we can translate all Avail source
code into a human language other than English. We generate and update
"*.properties" files when primitives are added or altered, and we use those
files to generate the Bootstrap modules of the Avail library. Should someone
wish to rewrite the core Avail in another human language, say Japanese, this
mechanism at least bootstraps hundreds of primitives with Japanese names for
them to use in the rest of their library.
signature.asc
Description: Message signed with OpenPGP using GPGMail
