Mark,

Yeah, but there is no hope of computing a ReverseTuple’s hash in less than linear time.
Oh, I’ve been digging around and found this :)
http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-August/006848.html

Back then, you were using rolling hashes to quickly update the hash of two
concatenated ropes. Cool! So I guess you already knew about Rabin-Karp
rolling hashes way back in 2000? (http://en.wikipedia.org/wiki/Rolling_hash)

Interestingly, the Wikipedia page also talks about the cyclic polynomial
rolling hash algorithm, which has even better randomness properties.

cheers,
Robbert.

> On 11 Apr 2015, at 23:16, Mark van Gulik <[email protected]> wrote:
>
> If we got rid of the pre-toggling of tuple elements' hash values and did an
> add instead, and if integers used a polynomial to hash instead of trying to
> be well shuffled, we would be able to compute interval tuple hashes in *way*
> less than linear time.
>
>
>> On Apr 11, 2015, at 1:40 PM, Robbert van Dalen <[email protected]> wrote:
>>
>> Having experimented with five or so different immutable Tuple
>> implementations, I can say with confidence that your BTree implementation
>> is superior to all of them in terms of memory footprint and L1 caching.
>> That said, the memory footprint can be reduced even further if the
>> cumulative sizes array is dropped. Without it you’ll lose binary search
>> (but only within a node of at most 64 elements), which isn’t too bad
>> considering that the L1 cache doesn’t like binary searches anyway.
>>
>> I’m also still not sure whether your BTree doesn’t degenerate after
>> certain patterns of repeated inserts, appends, slices, etc.,
>> or whether certain operations like append and slice aren’t always
>> O(log(n)) time, but worse in certain obscure corner cases.
>> I guess, theoretically, it should never degenerate *if* its invariant is
>> always kept: i.e. that it must always be a constant-height tree.
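For concreteness, here is a minimal sketch (in Java, with illustrative constants, not Avail's actual hash function) of the polynomial/Rabin-Karp idea discussed above: because the hash is a polynomial in a multiplier P, the hash of a concatenation can be combined from the two subtuple hashes in O(log |b|) time instead of rescanning the elements.

```java
// Sketch: polynomial ("Rabin-Karp"-style) hashing of int sequences.
// hash(t) = t[0]*P^(n-1) + t[1]*P^(n-2) + ... + t[n-1], mod 2^64
// (via long overflow). The multiplier is illustrative, not Avail's.
public class PolyHash {
    static final long P = 1_000_003L; // arbitrary odd multiplier

    static long hash(int[] t) {
        long h = 0;
        for (int x : t) h = h * P + x;   // Horner's rule
        return h;
    }

    // Combine in O(log |b|): hash(a ++ b) = hash(a) * P^|b| + hash(b).
    static long concatHash(long hashA, long hashB, int lenB) {
        return hashA * pow(P, lenB) + hashB;
    }

    static long pow(long base, int e) {  // exponentiation by squaring
        long r = 1;
        for (; e > 0; e >>= 1, base *= base)
            if ((e & 1) != 0) r *= base;
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3}, b = {4, 5};
        int[] ab = {1, 2, 3, 4, 5};
        System.out.println(
            hash(ab) == concatHash(hash(a), hash(b), b.length)); // true
    }
}
```

Since both sides reduce the same polynomial mod 2^64, the identity holds exactly under Java's wrap-around long arithmetic.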
>>
>> But what’s *really* interesting about Avail’s tuples is the choice of
>> using an (incremental) hashing algorithm that is invariant under a
>> tuple’s representation.
>> (Enchilada also has incremental hashes, but they are cryptographic, which
>> means that different Tuple representations of the same data end up with
>> different hashes.)
>>
>> Unfortunately, in Avail, there are currently two tuple implementations -
>> ReverseTuple and SmallIntervalTuple - that don’t have incremental hashes
>> invariant under their representation.
>> After all, how would one quickly calculate the (reversed) hash of a
>> ReverseTuple? The same goes for a big interval of integers.
>>
>> cheers,
>> Robbert.
>>
>>
>>> On 11 Apr 2015, at 18:51, Mark van Gulik <[email protected]> wrote:
>>>
>>> The implementations of tuples are entirely my own (and Leslie's for the
>>> interval and repeated tuples).
>>>
>>> However, the tree implementation is "just" a BTree implementation with a
>>> minimum non-root node size of 16 and a maximum size of 64. Normally a
>>> BTree will have at most a ratio of 2 between min and max, but that's just
>>> to keep disk pages nice and dense. There's no need for that in memory.
>>> But using a mechanism like that rather than some sort of binary tree
>>> saves about half the memory cost (and 90% of the L1 miss cost when it's
>>> cold), because it uses a simple array for its elements and subtrees. A
>>> tree tuple with a million elements will have between 4 and 6 layers (log
>>> to the bases 16 and 64). Each element has its usual reference pointing
>>> to it from its leaf tuple, but if there are 16 elements in each leaf then
>>> the immediate parent tree node is holding an eight-byte reference to the
>>> leaf, for an amortized cost of half a byte per element. Higher nodes
>>> contribute significantly less amortized cost. The leaf tuple's header
>>> and incidental fields (like the cached hash) contribute a couple more
>>> bits per entry.
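A minimal sketch of such a wide-fan-out node (field names and layout are mine, purely illustrative, not Avail's actual classes): each node holds a flat array of children plus the cumulative sizes that make O(log n) indexing by binary search possible, which is also the array whose removal is debated above.

```java
// Sketch of a tree-tuple internal node: a flat array of child subtuples
// plus cumulative sizes for binary-search indexing. Illustrative only.
import java.util.Arrays;

public class TreeNode {
    // Between MIN and MAX children per non-root node, as described above.
    static final int MIN = 16, MAX = 64;

    final int[][] children;      // leaf subtuples, here just int arrays
    final int[] cumulativeSizes; // cumulativeSizes[i] = elements in children[0..i]

    TreeNode(int[][] children) {
        this.children = children;
        this.cumulativeSizes = new int[children.length];
        int total = 0;
        for (int i = 0; i < children.length; i++) {
            total += children[i].length;
            cumulativeSizes[i] = total;
        }
    }

    // Zero-based element lookup: binary-search the cumulative sizes for the
    // first child whose running total exceeds the index, then index the leaf.
    int get(int index) {
        int child = Arrays.binarySearch(cumulativeSizes, index + 1);
        if (child < 0) child = -child - 1;   // insertion point if not found
        int before = child == 0 ? 0 : cumulativeSizes[child - 1];
        return children[child][index - before];
    }
}
```

Dropping `cumulativeSizes` would shrink each node by one array but force a linear scan of up to 64 child sizes per level, which is the trade-off weighed above.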
>>>
>>> On the other hand, even a perfectly balanced binary tree has a leaf node
>>> for every element, a parent node for every two, a grandparent for every
>>> four, etc., so there are about 2n nodes in total for n elements.
>>> Assuming a tiny eight-byte header, that's 2 + 3/2 + 3/4 + ... = 5
>>> eight-byte words for each element. So it adds 400% overhead beyond an
>>> array representation. It also stresses branch prediction to navigate
>>> downward to an element (although the binary searches within an array are
>>> similar). And each node eats a separate entry in the processor's on-chip
>>> L1 associative cache. And the cost for the garbage collector to manage
>>> it goes up considerably. And the algorithms I've seen would involve
>>> around twenty new nodes being constructed for a simple insert or
>>> replacement ("persistently") in a million-element tree. That's probably
>>> a bit less space than an update of the 16-64 BTree; it's hard to say
>>> which would really be more expensive overall.
>>>
>>>
>>>> On Apr 10, 2015, at 3:41 PM, Mark van Gulik <[email protected]> wrote:
>>>>
>>>> Excellent!
>>>>
>>>> As you point out, L2 and L3 are not there yet, but the base algorithms
>>>> and representations are where we’ve focused our efforts so far... but
>>>> there’s nothing stopping us from reaching the same performance some day
>>>> by compiling down to the JVM, and we should be able to exceed it by
>>>> compiling to LLVM. I feel LLVM will be much faster because the JVM has
>>>> to meet specific safety concerns that we can take care of in other
>>>> ways. For example, the bytecode verifier knows nothing about generics,
>>>> and simply relies on the compiler producing a mostly right program,
>>>> leaving dynamic casts all over the place. That’s probably a good thing,
>>>> since Java generics aren’t even close to sound, so even if the linking
>>>> architecture did anything with them it would still require lots of
>>>> dynamic checks.
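As an aside, the per-element cost series in the binary-tree comparison earlier in this message (a leaf costs 2 words per element, and each interior node costs 3 words shared by 2, 4, 8, ... elements, so 2 + 3/2 + 3/4 + ... = 5 words) can be checked with a quick sketch:

```java
// Check of the quoted arithmetic: per-element word cost of a perfectly
// balanced binary tree with eight-byte node headers. The geometric series
// 2 + 3/2 + 3/4 + ... converges to 5 words per element.
public class BinaryTreeCost {
    static double wordsPerElement(int levels) {
        double words = 2.0;                // leaf: header + element pointer
        for (int span = 2, i = 1; i < levels; i++, span *= 2)
            words += 3.0 / span;           // interior node shared by `span` elements
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsPerElement(20)); // close to 5.0
    }
}
```

Five words against the one word an array spends per reference is the 400% overhead figure quoted above.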
>>>> Avail has some of the same issues with genericity, but I’m guessing our
>>>> primitives already provide a much larger collection of useful
>>>> guarantees than the JVM and JRE do. And as soon as we start to inline
>>>> generics we’ll get back not only the cost of those checks, but probably
>>>> also some vastly wider opportunities related to the semantics of the
>>>> built-in types. Java just has integral types, garbage-collected
>>>> type-safe pointers, and structs with volatile and final annotations.
>>>> It’s a great substrate for low-level programs, but it doesn’t really
>>>> have any facilities for scaling up when these low-level parts are
>>>> combined. Inlining and statistics-driven polymorphic dispatching are
>>>> about it, but they mostly just keep programmers from doing stuff like
>>>> inlining on their own (a goal of Avail that predates Java).
>>>>
>>>> Just so you’re aware, we’re expecting to get some of our best
>>>> optimizations out of the looping, folding, and conditional operations,
>>>> or really any operations that do control inversion by passing functions
>>>> and/or continuations. We’ll strongly bias inlining decisions based on
>>>> whether a function or continuation is being passed to external code.
>>>> By preferring to inline those, we can not only fold away their
>>>> construction, but also convert things like “Restart_” into mere jumps,
>>>> opening the door for traditional loop optimizations.
>>>>
>>>>
>>>>
>>>>
>>>>> On Apr 10, 2015, at 3:09 PM, Robbert van Dalen <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Mark,
>>>>>
>>>>> I’ve ‘ported’ the TreeTuple implementation to Scala to learn more
>>>>> about its inner workings.
>>>>> I have yet to find a major performance issue: your TreeTuple code is
>>>>> pretty fit!
>>>>> I wonder what the source of the implementation is: or did you invent
>>>>> it yourself?
>>>>>
>>>>> Anyway, I’m doing some (performance) experiments with map and fold
>>>>> over (Tree)Tuples.
>>>>> The Scala version’s performance is pretty good (using specialisation
>>>>> and by leveraging some JVM magic):
>>>>>
>>>>> Mapping over a 100 million element TreeTuple (adding 1 to all
>>>>> elements) - using 128-wide native array-backed sub-Tuples - takes
>>>>> ~5 seconds.
>>>>> Folding over the same TreeTuple (summing all elements) takes ~0.3
>>>>> seconds.
>>>>>
>>>>> Of course, the Avail version is orders of magnitude slower, as the L2
>>>>> and L3 compilers are not there yet.
>>>>> Consider that a benchmark for us to shoot at!
>>>>>
>>>>> cheers,
>>>>> Robbert.
>>
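For reference, a rough, scaled-down analogue of the fold benchmark quoted above, in plain Java over a two-level structure of 128-wide array-backed leaves (the original was a full Scala tree tuple over 100 million elements; sizes and timings here are illustrative only):

```java
// Scaled-down fold benchmark sketch: sum the elements of a tuple stored
// as 128-wide array-backed leaves. A deeper tree would recurse per level.
public class FoldBench {
    static final int WIDTH = 128;      // leaf width, as in the quoted benchmark

    static long fold(int[][] leaves) {
        long sum = 0;
        for (int[] leaf : leaves)
            for (int x : leaf) sum += x;   // the "fold": sum all elements
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;            // scaled down from 100 million
        int[][] leaves = new int[n / WIDTH][WIDTH];
        for (int i = 0; i < n; i++) leaves[i / WIDTH][i % WIDTH] = 1;

        long start = System.nanoTime();
        long sum = fold(leaves);       // expect n, i.e. 10,000,000
        System.out.println("sum=" + sum + " in "
            + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Even this naive per-leaf loop stays close to flat-array speed, which is the point of the wide, array-backed nodes discussed throughout this thread.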