Excellent!

As you point out, L2 and L3 are not there yet; the base algorithms and 
representations are where we’ve focused our efforts so far. But there’s 
nothing stopping us from reaching the same performance some day by compiling 
down to the JVM, and we should be able to exceed it by compiling to LLVM.  I 
suspect LLVM will be much faster because the JVM has to meet specific safety 
concerns that we can address in other ways.  For example, the bytecode 
verifier knows nothing about generics; it simply relies on the compiler 
producing a mostly-right program, leaving dynamic casts all over the place.  
That’s probably a good thing, since Java generics aren’t even close to sound, 
so even if the linking architecture did anything with them it would still 
require lots of dynamic checks.

Avail has some of the same issues with genericity, but I’m guessing our 
primitives already provide a much larger collection of useful guarantees than 
the JVM and JRE.  And as soon as we start to inline generics we’ll not only 
recover the cost of those checks, but probably also open up vastly wider 
opportunities related to the semantics of the built-in types.  Java just has 
integral types, garbage-collected type-safe pointers, and structs with 
volatile and final annotations.  It’s a great substrate for low-level 
programs, but it doesn’t really have any facilities for scaling up when these 
low-level parts are combined.  Inlining and statistics-driven polymorphic 
dispatching are about it, and they mostly just keep programmers from doing 
stuff like inlining on their own (a goal of Avail that predates Java).
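To make the verifier point concrete, here is a minimal Java sketch (nothing Avail-specific) of what erasure leaves behind: at runtime the verifier sees only raw List operations, and the checkcast the compiler inserted at the read site is the only thing that catches a heap-polluted element.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        strings.add("ok");

        // After erasure both lists share one runtime class; the verifier
        // sees only List, so it cannot check the element type.
        List<Integer> ints = new ArrayList<>();
        System.out.println(strings.getClass() == ints.getClass());

        // Heap pollution via a raw reference compiles (with a warning).
        // The failure surfaces only at the checkcast the compiler
        // inserted at the read site, not at the write.
        List raw = strings;
        raw.add(42);
        try {
            String s = strings.get(1);  // implicit (String) cast fails here
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at the inserted cast");
        }
    }
}
```

Note that the write of the Integer succeeds silently; the dynamic check fires only when the value crosses back into typed code.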

Just so you’re aware, we’re expecting to get some of our best optimizations out 
of the looping, folding, and conditional operations, or really any operations 
that do control inversion by passing functions and/or continuations.  We’ll 
strongly bias inlining decisions based on whether a function or continuation is 
being passed to external code.  By preferring to inline those, we can not only 
fold away their construction, but also convert things like “Restart_” into mere 
jumps, opening the door for traditional loop optimizations.
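The Restart_-to-jump idea can be sketched with a rough Java analogy (the names fold and foldInlined are hypothetical, and this is JVM-level hand-waving, not Avail code): once the passed function is inlined, the per-iteration indirect call disappears and restarting the iteration is just the loop’s back-edge jump.

```java
import java.util.function.IntBinaryOperator;

public class InlineSketch {
    // Higher-order fold: each step calls through a function object the
    // caller passed in, so an optimizer must either inline it or pay an
    // indirect-call cost per element.
    static int fold(int[] xs, int seed, IntBinaryOperator f) {
        int acc = seed;
        for (int x : xs) acc = f.applyAsInt(acc, x);
        return acc;
    }

    // What inlining the passed function can produce: the function body is
    // folded into the loop, the "restart" is just the back-edge jump, and
    // classic loop optimizations (unrolling, strength reduction) now apply.
    static int foldInlined(int[] xs) {
        int acc = 0;
        for (int x : xs) acc = acc + x;  // f's body, inlined
        return acc;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3, 4, 5};
        System.out.println(fold(xs, 0, (a, b) -> a + b));
        System.out.println(foldInlined(xs));
    }
}
```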

> On Apr 10, 2015, at 3:09 PM, Robbert van Dalen <[email protected]> wrote:
> 
> Mark,
> 
> I’ve ‘ported’ the TreeTuple implementation to Scala to learn more about its 
> inner workings.
> I have yet to find a major performance issue: your TreeTuple code is pretty 
> fit!
> I wonder what is the source of the implementation: or did you invent it 
> yourself?
> 
> Anyway, I’m doing some (performance) experiments with map and fold over 
> (Tree)Tuples.
> The scala version performance is pretty good (using specialisation and by 
> leveraging some JVM magic):
> 
> Mapping over a 100 million element TreeTuple (add 1 to all elements) - using 
> 128-wide native array-backed sub-Tuples - takes ~5 seconds
> Folding over the same TreeTuple (sum all elements) takes ~0.3 seconds.
> 
> Of course, the Avail version is orders-of-magnitude slower as the L2 and L3 
> compilers are not yet there.
> Consider that a benchmark for us to shoot at!
> 
> cheers,
> Robbert.
