Mark,

Many thanks for your elaborate reply. The buffer/finger approach sounds promising for speeding things up. What I’m after is efficient support for map and fold, along the lines of the following (pseudo) Java code:
interface JTuple<T>
{
    public <R> JTuple<R> map (Function<T, R> m);
    public <R> R fold (R seed, BiFunction<R, T, R> f);
}
// int version; IntBiFunction would be a hand-written unboxed
// counterpart, (R, int) -> R, since the JDK doesn't provide one
interface IntJTuple extends JTuple<Integer>
{
    public <R> JTuple<R> map (IntFunction<R> m);
    public <R> R fold (R seed, IntBiFunction<R> f);
}
... the downside (as always) is that we need to implement the combinatorial
explosion of specializations for all of Java’s primitive types.
That said, an efficient implementation of JTuple would be backed by a flat
native Java array.
Another implementation could be backed by a balanced (n-way) tree.
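To make the array-backed case concrete, here is a rough sketch of what I have in mind (all names are mine, and map here returns a plain array rather than a JTuple, just to keep it short):

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntFunction;

// Hypothetical array-backed int tuple; purely illustrative.
final class ArrayIntJTuple {
    private final int[] elements; // native backing array

    ArrayIntJTuple(int[] elements) {
        this.elements = elements.clone(); // keep the tuple immutable
    }

    // map: apply m to each element without boxing the inputs.
    <R> R[] map(IntFunction<R> m, IntFunction<R[]> arrayFactory) {
        R[] out = arrayFactory.apply(elements.length);
        for (int i = 0; i < elements.length; i++) {
            out[i] = m.apply(elements[i]);
        }
        return out;
    }

    // fold: accumulate left-to-right from a seed, with no boxing at all.
    int fold(int seed, IntBinaryOperator f) {
        int acc = seed;
        for (int e : elements) {
            acc = f.applyAsInt(acc, e);
        }
        return acc;
    }
}
```

The tight loops over the backing array are exactly what I hope the VM could emit for the specialized primitives.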
So I guess what I’m asking is whether something similar can be done in Avail?
Can tupleMap and tupleFold be added as primitive methods to A_Tuple?
From what I’ve learned so far, that's probably not an option.
cheers,
Robbert.
> On 06 Mar 2015, at 19:55, Mark van Gulik <[email protected]> wrote:
>
> Hi Robbert,
>
> Some primitives will be (or already are) inlined into somewhat more
> lightweight instructions than a primitive invocation. Those specialized L2
> instruction implementations directly invoke subprimitives (methods on
> AvailObject), rather than use the primitive mechanism at all. L3 should be
> able to get rid of the generic argument passing and simply invoke these
> subprimitives with values from specific local L3 variables (e.g., Java local
> variables), leaving to the substrate the problem of how to lift some of these
> variables into CPU registers.
>
> For infallible primitives (and primitives that are VM-guaranteed not to fail
> for the types of arguments being provided), some of the generic primitive
> invocation machinery is omitted by L2 infallible primitive invokers. Even
> most ordinary fallible primitives can run without reifying the top
> continuation unless it happens to fail that time (branching past the
> reification code on success).
>
> At some point we'll be addressing:
> -L3 primitive calling conventions
> -inlining non-primitives
> -common subexpression elimination
> -using int registers with specialized paths that fail out to the general
> boxed objects when they overflow. That will sometimes require overflow
> checks in the machine code, but the branch predictions should be nearly
> perfect.
> -more dedicated L2 instructions (which produce custom L3 code) for primitives
> that profiling shows us are the worst bottlenecks.
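The int-register point I can picture concretely; I guess the fail-out path would look roughly like this (my sketch, not actual VM code):

```java
import java.math.BigInteger;

// My sketch of a specialized int path with a boxed fallback.
final class IntAdd {
    // Fast path: raw int addition with an explicit overflow check.
    // The fallback branch is almost never taken, so prediction
    // should be near perfect.
    static Object add(int a, int b) {
        try {
            return Math.addExact(a, b); // stays in the int world
        } catch (ArithmeticException overflow) {
            // Slow path: fail out to arbitrary-precision boxed objects.
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
    }
}
```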
>
> We'll also build a mechanism that avoids most construction of L1
> continuations, using substrate (e.g., Java or LLVM) stack frames instead. If
> reification is needed, we'll throw a ReificationException. Each stack frame
> method (generated code) catches the exception, builds a suitable L1
> continuation, then rethrows the exception. Those new (mutable) continuations
> are then assembled (in reverse) to build the real continuation chain. We'll
> throw the exception when we need to do a non-inlined Restart or Exit (e.g.,
> for backtracking), to throw an Avail exception, when the substrate's stack
> just gets too deep (say 30 calls for safety), to allow a fiber context switch
> (explicit yield, timeslice preemption, timer expiration, fiber parking for
> synchronization), or to allow an L1-safe operation to run (e.g., adding
> method definitions, which can invalidate L2 code, which therefore must not be
> running).
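If I follow the reification scheme, each generated frame method would have roughly this catch/build/rethrow shape (my own sketch; all names mine):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the reification pattern: each frame catches the exception,
// records its own continuation, and rethrows to its caller.
class ReificationException extends RuntimeException {
    final Deque<Continuation> frames = new ArrayDeque<>();
}

class Continuation {
    final String functionName; // stand-in for pc, locals, etc.
    Continuation(String functionName) { this.functionName = functionName; }
}

class Frames {
    static void frame(String name, Runnable body) {
        try {
            body.run();
        } catch (ReificationException e) {
            // Frames are reached innermost-first, so prepending here
            // assembles the chain outermost-first.
            e.frames.addFirst(new Continuation(name));
            throw e;
        }
    }
}
```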
>
> We can either say those reified frames must continue running as interpreted
> L1, or provide an entry point to specify the L1 pc where it should continue
> (exploding data from the continuation back into local variables). A switch
> statement on the L1 pc should be useful.
>
> This mechanism allows function invocation to simply *be* the function calls
> (or method invocation) of the substrate... but without the limitations that
> are normally implied (bounded stack, no backtracking, exception processing at
> the whim of the substrate language).
>
> With that in mind, we may rewrite some particularly "hot" tuple manipulation
> code by allocating an array on the stack (C's alloca) to hold the tuple
> elements. That only works if the tuple operations can all be rewritten to
> equivalent array operations. If the conceptual tuple never escapes due to a
> ReificationException, we never have to build a tuple for it — or even garbage
> collect its storage, since it was stack-allocated.
>
> The immutability of tuples will also in theory allow us to collapse together
> passes of (stable, pure) mapping functions and filters. An untimely
> reification can just pretend the operations hadn't started yet, or use tree
> and subrange mechanisms to "fake" an intermediate state when possible.
> Subranges and tree tuples might also be useful for secretly parallelizing
> (stable, pure) operations on very large tuples. Massively multi-core CPUs
> are coming soon, and that would give us a nice boost.
>
> Finally, based on the snippet you included, we could save some of the cost of
> the binary search of the child subtuple index in tree tuples by keeping a
> "finger" slot. We would read the int finger field (without synchronization),
> then check if our requested subscript would fall within the child tuple at
> that finger index. If not, we would do the binary search to find the
> subtuple, then write its subscript back into the finger field (skipping
> synchronization). Looping sequentially over a tree tuple would be a bit
> quicker, but random access would involve only a few extra comparisons. I'm
> sure the current lookup cost is thoroughly dominated by other code, but it
> might be a pathway for you to try tweaking the VM. :-)
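If I read the finger idea right, the lookup would go roughly like this (my sketch, with zero-based subscripts and int leaves for brevity):

```java
// Sketch of the unsynchronized finger optimization for tree-tuple lookup.
final class TreeNode {
    final int[][] children;      // flat child tuples
    final int[] cumulativeSizes; // cumulativeSizes[i] = elements through child i
    private int finger = 0;      // last child used; read/written without sync

    TreeNode(int[][] children) {
        this.children = children;
        this.cumulativeSizes = new int[children.length];
        int total = 0;
        for (int i = 0; i < children.length; i++) {
            total += children[i].length;
            cumulativeSizes[i] = total;
        }
    }

    int get(int subscript) {
        int f = finger; // unsynchronized read; a stale value is harmless
        int start = f == 0 ? 0 : cumulativeSizes[f - 1];
        if (subscript >= start && subscript < cumulativeSizes[f]) {
            return children[f][subscript - start]; // finger hit: no search
        }
        // Finger miss: binary search for the child holding the subscript.
        int lo = 0, hi = children.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (subscript < cumulativeSizes[mid]) hi = mid; else lo = mid + 1;
        }
        finger = lo; // unsynchronized write-back
        int childStart = lo == 0 ? 0 : cumulativeSizes[lo - 1];
        return children[lo][subscript - childStart];
    }
}
```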
>
> Alternatively, you could extract buffers of data from the tree tuple, then
> process those flat, fast buffers. I believe we try to do that when we need a
> sequence of byte buffers in the VM (e.g., for file I/O). The buffer filling
> operations figure out which subtrees need to transfer their bytes into which
> regions of the buffer, then recurse on each child. The leaves (flat tuples)
> have customized tight loops for doing their part of the work. This technique
> avoids most of the tree traversal cost.
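The buffer-filling recursion I picture roughly like this (again my own sketch, using a two-way tree instead of the real n-way one):

```java
// Sketch of extracting a flat buffer from a tree of byte leaves; names mine.
abstract class Node {
    abstract int size();
    // Copy this node's elements [offset, offset + length) into buf at bufStart.
    abstract void fillBuffer(byte[] buf, int bufStart, int offset, int length);
}

final class Leaf extends Node {
    final byte[] bytes;
    Leaf(byte[] bytes) { this.bytes = bytes; }
    @Override int size() { return bytes.length; }
    @Override void fillBuffer(byte[] buf, int bufStart, int offset, int length) {
        // Leaves do their part of the work with a tight (intrinsic) copy.
        System.arraycopy(bytes, offset, buf, bufStart, length);
    }
}

final class Branch extends Node {
    final Node left, right;
    Branch(Node left, Node right) { this.left = left; this.right = right; }
    @Override int size() { return left.size() + right.size(); }
    @Override void fillBuffer(byte[] buf, int bufStart, int offset, int length) {
        int leftSize = left.size();
        // Work out which children overlap the requested region, then recurse.
        if (offset < leftSize) {
            int n = Math.min(length, leftSize - offset);
            left.fillBuffer(buf, bufStart, offset, n);
            bufStart += n; length -= n; offset = leftSize;
        }
        if (length > 0) {
            right.fillBuffer(buf, bufStart, offset - leftSize, length);
        }
    }
}
```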
>
> You could also look at the way our tuple iterators work (in Java), which is
> similar. Make sure to make the tuple immutable if you're thinking of
> stashing an opportunistic <iterator, subscript> pair in the tree tuple
> instead of a finger. Also be aware that the tuple iterators are not
> thread-safe.
>
> Enjoy!
> -Mark
