On Feb 25, 2015, at 6:29 PM, John Rose <john.r.r...@oracle.com> wrote:
>
> Maybe this is general enough:
>
> MHs.loop(init, predicate, body)(*a)
> => { let i = init(*a); while (predicate(i, *a)) { i = body(i, *a); }
> return i; }
>
> ...where the type of i depends on init, and if init returns void then you
> have a classic side-effect-only loop.
FTR, I've been thinking more about this one, with help from Michael Haupt.
The above loop is pretty good, except for a couple of things: 'i' ought to be
tuple of loop-varying values, and the exit path needs a more general
post-processing step, a "finalizer" that corresponds to the initializers that
set up the loop variables.
A generalized (linear) loop has two kinds of values flowing within it:
loop-invariant and loop-varying. (The control predicate has a special role, but
may be classed as loop-varying. The finalizer is also special, and
loop-invariant.) Loop-varying values can be inputs, control values (array
indices, etc.), and outputs, but can all, I think, be modeled as "step"
functions using a common signature.
This takes us to something like the C for-loop, but with a sprinkle of tuples,
with a finishing bow around it to tie it up:
for ((v1,v2,v3,…) = inits; predicate(v1…); (v1…) = step(v1…)) { … }; return
finish(v1…);
Loop-local but invariant values can be modeled as non-updated loop variables,
as in this C idiom:
for (int i, limit = computeLimit(); i < limit; i++) …
Pure side effects can be modeled as void-valued step functions, so we want to
allow void as a corner case for loop variable types. (I.e., where there is a
step but not a value.) I don't want to make any more special provision for
pure side effects (such as a unique Runnable-like "body" for a loop), both
because it isn't necessary, and because it encourages sloppy side-effect-driven
loops.
If we allow each loop variable to have its own predicate and finalizer, we can
properly model the effects of ordered guards and special early-exit paths.
loop({init*}, {pred*}, {step*}, {fini*}) := lambda (arg*) {
let var* = init*(arg*)
for (;;) {
for ((v,p,s,f) in (var*, pred*, step*, fini*))
if (!p(var*, arg*))
return f(var*, arg*)
v = s(var*, arg*) // all vars help update each var
}
}
}
All invocations are exact.
As a bonus we can easily derive post-checked loops (do-while), and time-shifted
index variables (p++) simply by adding more variables and/or steps.
Constraints on parameter and return types:
- There must be the same number of inits, preds, steps, and finis.
- There must be at least one input function (e.g., a pred, so that the loop has
a chance to exit).
- Parameter lists of the init functions must all be the same.
- Parameter lists of the pred, step, and fini functions must all be the same.
- Parameter lists of the step functions must be a suffix of the parameter lists
of the init functions.
- Parameters of the step functions must be the same as the return types of the
init functions (with voids removed), followed by the common suffix.
- Return types of the predicates must all be boolean.
- Return types of the finalizers must all be the same (and can be void).
- The return type of each init function must match the return type of the
corresponding step function (a "var" type or void).
The result, a looping method handle, has a return type taken from the finalizer
function(s), and argument types taken from the init functions. The loop has
one iteration variable for each non-void init/step function pair. Each
step/pred/fini call takes all of the iteration variable values, followed by all
the external (loop-invariant) arguments. No throws are caught by the loop;
throws cause the loop to terminate abnormally.
In some loops, such as "find first match", the predicate naturally follows the
step function, because it needs to evaluate the result of the step. To model
post-checks (and the classic do/while shape), a second "void-valued" loop
variable is required:
loop(...) := lambda (arg*) {
let (var, _) = (null, void)
for (;;) {
if (_) return _ // trivial predicate: exit branch never taken
var = s(_, arg*) // pick up next search key
if (!p(var, arg*)) // did we find it?
return f(var, arg*) // extract value from search key as needed
_ = _(var, arg*) // void value, no side effect
}
}
This variable-doubling could be compressed out if there were a further boolean
(per pred/step pair) which says which order to run them in. To me this seems
like too much ad hoc structure; such structure tends to creep over time as we
discover other important patterns. I think it would be better to support
optionality for common "no op" loop components, as follows:
Optionalities:
- If an init is null, a constant zero (or null/false/void) function is supplied
by inspecting the return type of the corresponding step function.
- If a pred is null, a constant "true" function is supplied.
- If a step is null, a suitable identity function is supplied by inspecting the
return type of the corresponding init function.
- If a fini is null, a constant zero function is supplied based on a non-null
fini; if all are null then void is assumed.
- Missing parameters (of any function) corresponding to loop-invariant arg*
values are silently filled in (as if by dropArguments).
- Other missing parameters (of loop variables passed to non-init functions) are
also silently filled in (as if by dropArguments).
- However, all missing parameters must be supplied after the present
parameters; they may not be "mixed into" present parameters.
The missing-parameter rules are as with GWT and CE; they are easy for the
runtime to implement and grossly inconvenient for users if absent. Perhaps
they can be made a little more one-sided, by forcing the step functions (if
present) to carry all arguments.
Concrete type:
MethodHandle loop(MethodHandle[] inits, MethodHandle[] preds, MethodHandle[]
steps, MethodHandle[] finis);
Another possibility would be to allow a free intermixing of any number of steps
(var updates) with any other number of pred/fini pairs:
MethodHandle loop({MethodHandle init, MethodHandle step || MethodHandle pred,
MethodHandle fini}…);
But, it is trivial to convert such a structure to the above structure by
injecting nulls for the missing pairs, to make up a full set of quadruples. So
I think the fixed-quadruple format is general enough without sacrificing
expressiveness.
Various convenience specializations are possible, but can all be code-generated
via the general template. The bytecodes to be generated from the general
template are obvious from the formula above, and will JIT well.
Here are some examples:
MethodHandle finderLoop(MethodHandle body, MethodHandle finder, MethodHandle
fini=null);
// do () { v = body(a*); } while (!finder(v,a*)); return fini(v,a*);
MethodHandle cursorLoop(MethodHandle init, MethodHandle pred, MethodHandle
body, MethodHandle step, MethodHandle fini);
// for (v=init(a*); pred(v,a*); v=step(v,a*)) { body(v,a*); }; return
fini(v,a*);
MethodHandle countedLoop(int start=0, MethodHandle limit, int incr=1,
MethodHandle init2, MethodHandle pred2, MethodHandle body2, MethodHandle
fini=null);
// for (v'=start, vlim=limit(a*), v2=init2(a*), v=v'; (v'++)<vlim &&
pred2(v,vlim,v2,a*); v=v') { v2=body2(v,vlim,v2,a*); }; return
fini(v,vlim,v2,a*);
MethodHandle loop(MethodHandle[/*4*/]... quadOfInitPredStepFini);
(What's your favorite loop? How well does it fit into this scheme?)
Library functions for loop clauses (encoded as quads) might also be
interesting, but this starts to take us into territory where streams can do a
better job:
MethodHandle[][] countedLoopClause(int start, MethodHandle limit, int incr,
MethodHandle[] moreClauses…) {
MethodHandle[] setLimitQuad = { limit, /*no pred/step/exit*/ null, null, null
};
MethodHandle[] checkLimitQuad = { /*no init/pred*/ null, null,
Integer::compareLT, null };
MethodHandle[][] res = new MethodHandle[2+moreClauses.length][];
res[0] = setLimitQuad; res[1] = checkLimitQuad;
System.arraycopy(moreClauses, 0, res, 2, moreClauses.length);
return res;
}
Not so good, that one.
Anyway, my basic position here is that there ought to be a loop shape which is
(a) general enough to encode the normal special cases, and (b) easy to generate
into bytecodes. If we can build everything from that one loop shape, our code
generation and optimization will be made easier to implement, hence more
effective.
And, I claim this is a worthwhile puzzle to puzzle on.
— John
_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev