Excellent! As you point out, L2 and L3 are not there yet; the base algorithms and representations are where we’ve focused our efforts so far. But there’s nothing stopping us from reaching the same performance some day by compiling down to the JVM, and we should be able to exceed it by compiling to LLVM. I feel LLVM will be much faster because the JVM has to meet specific safety requirements that we can satisfy in other ways. For example, the bytecode verifier knows nothing about generics; it simply relies on the compiler producing a mostly-right program, leaving dynamic casts all over the place. That’s probably a good thing, since Java generics aren’t even close to sound, so even if the linking architecture did anything with them it would still require lots of dynamic checks.

Avail has some of the same issues with genericity, but I’m guessing our primitives already provide a much larger collection of useful guarantees than the JVM and JRE do. And as soon as we start to inline generics, we’ll recover not only the cost of those checks, but probably also some vastly wider opportunities related to the semantics of the built-in types. Java just has integral types, garbage-collected type-safe pointers, and structs with volatile and final annotations. It’s a great substrate for low-level programs, but it doesn’t really have any facilities for scaling up when those low-level parts are combined. Inlining and statistics-driven polymorphic dispatch are about it, and they mostly just keep programmers from doing things like inlining by hand (a goal of Avail that predates Java).
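To make the erasure point concrete, here’s a small Java illustration (the class and method names are mine, purely for the example). Because the verifier sees only the raw `List` type, the unchecked cast succeeds at the cast site; the compiler-inserted `checkcast` only fires later, at the element-access site:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    @SuppressWarnings("unchecked")
    static boolean castDeferredToUseSite() {
        List<String> strings = new ArrayList<>();
        strings.add("hello");
        // The bytecode verifier sees only the erased List type, so this
        // unchecked cross-cast succeeds at runtime:
        List<Integer> ints = (List<Integer>) (List<?>) strings;
        try {
            ints.get(0).intValue(); // compiler-inserted checkcast fires here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // prints "cast deferred to use site: true"
        System.out.println("cast deferred to use site: " + castDeferredToUseSite());
    }
}
```

This is exactly the pattern of dynamic checks scattered through erased generic code that a smarter substrate could often prove away.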
Just so you’re aware, we’re expecting to get some of our best optimizations out of the looping, folding, and conditional operations, or really any operations that perform control inversion by passing functions and/or continuations. We’ll strongly bias inlining decisions based on whether a function or continuation is being passed to external code. By preferring to inline those, we can not only fold away their construction, but also convert things like “Restart_” into mere jumps, opening the door for traditional loop optimizations.

> On Apr 10, 2015, at 3:09 PM, Robbert van Dalen <[email protected]> wrote:
>
> Mark,
>
> I’ve ‘ported’ the TreeTuple implementation to Scala to learn more about its inner workings.
> I have yet to find a major performance issue: your TreeTuple code is pretty fit!
> I wonder what the source of the implementation is, or did you invent it yourself?
>
> Anyway, I’m doing some (performance) experiments with map and fold over (Tree)Tuples.
> The Scala version’s performance is pretty good (using specialisation and by leveraging some JVM magic):
>
> Mapping over a 100-million-element TreeTuple (add 1 to all elements), using 128-wide native array-backed sub-Tuples, takes ~5 seconds.
> Folding over the same TreeTuple (sum all elements) takes ~0.3 seconds.
>
> Of course, the Avail version is orders of magnitude slower as the L2 and L3 compilers are not yet there.
> Consider that a benchmark for us to shoot at!
>
> cheers,
> Robbert.
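To make the control-inversion point above concrete, here is a minimal Java sketch (Java rather than Avail, since “Restart_” has no direct Java counterpart; all names here are mine). It contrasts a combinator that receives a function with the plain loop an optimizer can produce once that function is inlined:

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

public class InlineSketch {
    // Control inversion: the combinator calls back into the passed function.
    // Each call goes through the function object's virtual dispatch.
    static long[] mapEach(long[] xs, LongUnaryOperator f) {
        long[] out = new long[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = f.applyAsLong(xs[i]);
        }
        return out;
    }

    // What an optimizer can produce once the passed function is inlined:
    // the function object is never constructed, and the callback collapses
    // into the loop body, exposing it to traditional loop optimizations.
    static long[] mapAddOneInlined(long[] xs) {
        long[] out = new long[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = xs[i] + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] xs = {1, 2, 3};
        System.out.println(Arrays.toString(mapEach(xs, x -> x + 1)));
        System.out.println(Arrays.toString(mapAddOneInlined(xs)));
        // both print [2, 3, 4]
    }
}
```

Biasing inlining toward functions and continuations passed to external code is what turns the first form into the second, at which point a restart of the surrounding loop really is just a jump.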
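For readers who haven’t seen a TreeTuple, here is a rough Java sketch of the shape Robbert describes: flat array-backed leaves (128-wide in his experiment, smaller here) under branching internal nodes, with a fold that walks the tree. This is my own illustrative reconstruction, not the actual Avail or Scala code:

```java
import java.util.function.LongBinaryOperator;

// Hypothetical sketch of a tree-structured tuple; names are mine.
abstract class Node {
    abstract long fold(long acc, LongBinaryOperator op);
}

// A leaf holds its elements in a flat native array (up to some fixed width).
final class Leaf extends Node {
    final long[] elements;
    Leaf(long[] elements) { this.elements = elements; }
    long fold(long acc, LongBinaryOperator op) {
        for (long e : elements) {
            acc = op.applyAsLong(acc, e);
        }
        return acc;
    }
}

// A branch folds its children left to right.
final class Branch extends Node {
    final Node[] children;
    Branch(Node[] children) { this.children = children; }
    long fold(long acc, LongBinaryOperator op) {
        for (Node c : children) {
            acc = c.fold(acc, op);
        }
        return acc;
    }
}

public class TreeTupleFold {
    public static void main(String[] args) {
        // Two leaves holding 1..4 and 5..8 under a single branch.
        Node t = new Branch(new Node[] {
            new Leaf(new long[] {1, 2, 3, 4}),
            new Leaf(new long[] {5, 6, 7, 8})
        });
        System.out.println(t.fold(0L, Long::sum)); // prints 36
    }
}
```

The wide leaves are what make the fold fast: almost all of the work is a tight loop over a native array, with tree traversal amortized across 128 elements per node.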
