Based partly on our discussions at the Summit about "live constants", and also
based on the likely requirements of Project Lambda, the JSR 292 EG is likely to
allow any single invokedynamic instruction to pass one or more extra constant
values into the invocation of its bootstrap method (BSM).
Here is the current thinking. Language implementors, please tell us if we are
missing anything.
We call these "static arguments", in contrast to the normal "dynamic arguments"
that are received on every method call. For invokedynamic, the dynamic
arguments are passed, as if by 'invokeExact', to the method handle that the
BSM has bound to that particular invokedynamic instruction. The BSM decides,
once at link time, which method handle to choose, based on the static
arguments.
There are three standard static arguments always passed to the BSM:
1. an indication of the caller class (note: this is likely to change to a
MethodHandles.Lookup capability)
2. a String naming the method apparently being called
3. a MethodType indicating the dynamic arguments and return value types
The String and MethodType are extracted from the NameAndType constant at the
invokedynamic site.
The invokedynamic instruction points to a constant pool entry that looks like
this:
struct InvokeDynamic_info {
    u1 tag;          // always CONSTANT_InvokeDynamic = 18
    u2 bsm_index;    // ref to CONSTANT_MethodHandle
    u2 descr_index;  // ref to CONSTANT_NameAndType
    u2 argc;         // count of optional static arguments
    u2 argv[argc];   // refs to anything 'ldc' can refer to (string, int, long,
                     // float, double, class, method handle, method type)
}
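As a rough illustration of the layout above, here is a minimal Java sketch
that reads one such entry from a byte buffer. It follows the proposed struct
exactly: tag 18, three u2 fields, then argc u2 indexes. The class and method
names are hypothetical, and this is not a parser for any shipped class-file
version.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of a reader for the proposed InvokeDynamic_info layout (hypothetical names).
public class InvokeDynamicInfoSketch {
    static final int CONSTANT_InvokeDynamic = 18;

    final int bsmIndex;   // pool index of a CONSTANT_MethodHandle
    final int descrIndex; // pool index of a CONSTANT_NameAndType
    final int[] argv;     // pool indexes of the static-argument constants

    InvokeDynamicInfoSketch(int bsmIndex, int descrIndex, int[] argv) {
        this.bsmIndex = bsmIndex;
        this.descrIndex = descrIndex;
        this.argv = argv;
    }

    // Reads one entry starting at its tag byte. Class files are big-endian,
    // which is ByteBuffer's default byte order.
    static InvokeDynamicInfoSketch read(ByteBuffer buf) {
        int tag = buf.get() & 0xFF;
        if (tag != CONSTANT_InvokeDynamic) {
            throw new IllegalArgumentException("unexpected tag " + tag);
        }
        int bsmIndex = buf.getShort() & 0xFFFF;   // u2
        int descrIndex = buf.getShort() & 0xFFFF; // u2
        int argc = buf.getShort() & 0xFFFF;       // u2, so at most 65535
        int[] argv = new int[argc];
        for (int i = 0; i < argc; i++) {
            argv[i] = buf.getShort() & 0xFFFF;
        }
        return new InvokeDynamicInfoSketch(bsmIndex, descrIndex, argv);
    }

    public static void main(String[] args) {
        // tag=18, bsm_index=3, descr_index=7, argc=2, argv={9, 11}
        byte[] bytes = {18, 0, 3, 0, 7, 0, 2, 0, 9, 0, 11};
        InvokeDynamicInfoSketch e = read(ByteBuffer.wrap(bytes));
        System.out.println(e.bsmIndex + " " + e.descrIndex + " " + Arrays.toString(e.argv));
    }
}
```

An entry carrying the old tag 17 is rejected outright rather than guessed at.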
If we take this path, we will switch to a new tag, 18, to reduce confusion when
old and new class files are mixed.
The existing tag 17, used by the no-extra-args format, will drop out of use and
be illegal in JDK7 FCS.
Depending on the value of argc, the BSM will be invoked in one of three ways
(here argv stands for the resolved constant values, not their pool indexes):
if (argc == 0)      binding = bsm.invokeGeneric(lookup, name, type);
else if (argc == 1) binding = bsm.invokeGeneric(lookup, name, type, (Object) argv[0]);
else                binding = bsm.invokeGeneric(lookup, name, type, (Object[]) argv);
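Here is a minimal sketch of the three linkage cases. It is written against
java.lang.invoke as that API eventually shipped, where invokeGeneric is spelled
MethodHandle.invoke. The BSM myBSM is hypothetical and, following this draft,
returns the binding method handle directly. Lookup.findStatic gives a varargs
method a variable-arity handle, so this single varargs BSM happens to accept
all three call shapes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class BsmDispatchSketch {
    // Hypothetical varargs BSM: returns a handle that yields a description of
    // how it was linked.
    static MethodHandle myBSM(MethodHandles.Lookup look, String name,
                              MethodType type, Object... args) {
        String desc = name + type + " static args=" + Arrays.toString(args);
        return MethodHandles.constant(String.class, desc);
    }

    // The three linkage cases, with invoke in place of invokeGeneric.
    static Object link(MethodHandle bsm, MethodHandles.Lookup lookup, String name,
                       MethodType type, Object[] argv) throws Throwable {
        if (argv.length == 0) {
            return bsm.invoke(lookup, name, type);
        } else if (argv.length == 1) {
            return bsm.invoke(lookup, name, type, (Object) argv[0]);
        } else {
            return bsm.invoke(lookup, name, type, (Object[]) argv);
        }
    }

    // Links a ()String call site named "demo" with the given static arguments.
    public static String runSite(Object[] argv) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle bsm = lookup.findStatic(BsmDispatchSketch.class, "myBSM",
            MethodType.methodType(MethodHandle.class, MethodHandles.Lookup.class,
                                  String.class, MethodType.class, Object[].class));
        MethodHandle target = (MethodHandle) link(bsm, lookup, "demo",
                                                  MethodType.methodType(String.class), argv);
        return (String) target.invokeExact();
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(runSite(new Object[0]));               // demo()String static args=[]
        System.out.println(runSite(new Object[] {42}));           // demo()String static args=[42]
        System.out.println(runSite(new Object[] {"a", 1L, 2.0})); // demo()String static args=[a, 1, 2.0]
    }
}
```

With argc == 1, the single Object argument is collected into a one-element
array. With argc > 1, the static type Object[] lets the array pass through
unchanged.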
Note that the BSM, since it is derived from a CONSTANT_MethodHandle, can only
be a "direct method handle", a pointer to a Java method. It cannot be adapted
(e.g., as a spreader or collector). But in user-visible code, it would be
reasonable to express a typical BSM as an overloaded method, whose third
overload takes a varargs array:
MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type);
MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object arg);
MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object... args);
It is natural to ask why we are using varargs, when we could simply specify
that the extra static arguments be passed positionally. The simple answer is
that positional arguments are of limited use, while a varargs array can encode
very rich and useful BSM arguments.
Positional arguments would be capped by the JVM's limit on method arity, and
since very few Java methods take more than 10 parameters, allowing up to 255
positional arguments is not very interesting. (Actually the limit would be 251
non-long, non-double arguments, since there are three standard arguments to
start with, plus the BSM itself.) Writing a BSM that takes (say) 100 arguments
would be silly, and a related one that takes 99 arguments would have to be a
completely distinct method. (Note that BSMs cannot be collectArguments
adapters; they have to be simple JVM methods or constructors.) Clearly, any
large number of arguments has to be passed in an array. So let's pass them all
in a trailing varargs parameter.
Will users want more than a couple of extra static arguments? I think so.
Longer argument lists will provide a way to bind interesting specifications
directly into the class file, without cumbersome bytecode-based construction.
Examples:
- a serialized AST structure, built from a mix of strings and method handles,
to be interpreted
- complex application-defined constants, such as lists or sets
- similarly, templates for partly-constant data structures (the invokedynamic
builds a factory method for the template)
- vtables (i.e., maps of names to method handles)
All of these things can be created by executable bytecodes in <clinit>, but
implementors will (in many cases) be able to create them more compactly from
a series of constants. For example, a list of integer values will occupy 2+1+4
bytes per element if encoded as a sequence of static arguments. (The '2' is
the argv element; the 1+4 is the CONSTANT_Integer entry, a tag byte plus a
4-byte value.) Using <clinit> style bytecodes, the same element will require
(1+2+2+1)+1+4 bytes, where the parenthesized numbers stand for a sequence of
"aload buf; bipush J; ldc N; aastore". (This sequence stores the element into
an object array, which is going to be passed to something like Arrays.asList.)
The ratio is 7 to 11. For integer values that repeat, the CONSTANT_Integer
entry is shared, so the ratio is closer to 2 to 6.
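The byte counts above can be checked with a small calculation. This sketch
uses hypothetical names and makes the same assumptions as the count: the
aload_<n> form, an array index J below 128 (so bipush fits), and a pool index
N below 256 (so the two-byte ldc fits). A CONSTANT_Integer entry is paid once
per distinct value.

```java
public class EncodingCostSketch {
    // One u2 slot in argv[] per element.
    static final int ARGV_ENTRY = 2;
    // CONSTANT_Integer pool entry: u1 tag + 4-byte value, once per distinct value.
    static final int CONSTANT_INTEGER = 1 + 4;
    // "aload buf; bipush J; ldc N; aastore" per element.
    static final int CLINIT_STORE = 1 + 2 + 2 + 1;

    static int staticArgBytes(int elements, int distinctValues) {
        return elements * ARGV_ENTRY + distinctValues * CONSTANT_INTEGER;
    }

    static int clinitBytes(int elements, int distinctValues) {
        return elements * CLINIT_STORE + distinctValues * CONSTANT_INTEGER;
    }

    public static void main(String[] args) {
        // One element with a new value: 7 vs 11 bytes.
        System.out.println(staticArgBytes(1, 1) + " vs " + clinitBytes(1, 1));
        // One more copy of an already-pooled value: 2 vs 6 bytes.
        System.out.println(staticArgBytes(1, 0) + " vs " + clinitBytes(1, 0));
    }
}
```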
There is a limit to this technique, of course, since the constant pool has only
65535 constants. But this limit is shared with the <clinit> style technique.
A key use case for one or two BSM arguments is closure construction for Project
Lambda. Here, an extra static argument can specify a private synthetic method
which gives the body (code, not data) of the closure. The data parts are
normal dynamic arguments. The BSM produces a factory function which
(efficiently) binds the data values to the statically specified closure body.
A second BSM argument might be the SAM type intended for the closure. (That
could also be inferred from the MethodType.)
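Here is a minimal sketch of such a closure-constructing BSM, assuming Java 8
or later. All names are hypothetical. The single static argument is the body
method handle, and the SAM type is taken from the return type of the call
site's MethodType. Following this draft, the BSM returns the factory method
handle directly. The sketch uses MethodHandleProxies.asInterfaceInstance only
for brevity; that is not the efficient binding a real implementation would
want.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleProxies;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

public class ClosureBsmSketch {
    // Hypothetical private synthetic method holding the closure body (code only).
    private static int lambdaBody(int captured, int x) {
        return captured + x;
    }

    // Binds the captured data values to the body and wraps it as the SAM type.
    private static Object makeClosure(Class<?> sam, MethodHandle body, Object[] captured) {
        MethodHandle bound = MethodHandles.insertArguments(body, 0, captured);
        return MethodHandleProxies.asInterfaceInstance(sam, bound);
    }

    // Hypothetical BSM: the static argument is the body; the call-site type
    // (captured data...) -> SAM supplies the data types and the SAM type.
    public static MethodHandle closureBSM(MethodHandles.Lookup look, String name,
                                          MethodType type, Object body)
            throws ReflectiveOperationException {
        MethodHandle make = MethodHandles.lookup().findStatic(ClosureBsmSketch.class,
            "makeClosure",
            MethodType.methodType(Object.class, Class.class, MethodHandle.class, Object[].class));
        return MethodHandles.insertArguments(make, 0, type.returnType(), body)
            .asCollector(Object[].class, type.parameterCount())
            .asType(type);
    }

    // Links a (int)IntUnaryOperator site, then runs the factory on one captured value.
    public static IntUnaryOperator demo(int captured) throws Throwable {
        MethodHandle body = MethodHandles.lookup().findStatic(ClosureBsmSketch.class,
            "lambdaBody", MethodType.methodType(int.class, int.class, int.class));
        MethodType siteType = MethodType.methodType(IntUnaryOperator.class, int.class);
        MethodHandle factory = closureBSM(MethodHandles.lookup(), "lambda", siteType, body);
        return (IntUnaryOperator) factory.invokeExact(captured);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(demo(10).applyAsInt(5)); // 15
    }
}
```

The body (code) is fixed once at link time. Each call through the factory
supplies only the data, as in the division of labor described above.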
Another key use case is an invokedynamic instruction that implements an
arbitrary live constant, by linking the call site (of zero arguments) to a
method handle which always returns the desired constant.
(MethodHandles.constant will do this.) The only missing bit is the serialized
data behind the live constant. Again, allowing an essentially unbounded array
gives implementors the right degree (I think) of flexibility.
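Here is a minimal sketch of a live-constant BSM, with hypothetical names. The
"serialized" data here is simply the static arguments themselves. The BSM turns
them into an unmodifiable list once, at link time, and hands the list back via
MethodHandles.constant. As elsewhere in this draft, the BSM returns the method
handle directly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LiveConstantSketch {
    // Hypothetical BSM: rebuilds the constant from its static arguments once,
    // at link time, and returns a zero-argument handle that always yields it.
    public static MethodHandle listConstantBSM(MethodHandles.Lookup look, String name,
                                               MethodType type, Object... args) {
        List<Object> value = Collections.unmodifiableList(Arrays.asList(args.clone()));
        return MethodHandles.constant(type.returnType(), value);
    }

    // Links a ()List call site whose static arguments are 1, 2, 3.
    public static MethodHandle linkSite() {
        MethodType siteType = MethodType.methodType(List.class);
        return listConstantBSM(MethodHandles.lookup(), "constant", siteType, 1, 2, 3);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle site = linkSite();
        List<?> first = (List<?>) site.invokeExact();
        List<?> second = (List<?>) site.invokeExact();
        System.out.println(first + " " + (first == second)); // [1, 2, 3] true
    }
}
```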
If, instead of constants, we want templated values (think Groovy strings like
"hello, $name"), the statically determined structure of the value can be
expressed in static arguments to an invokedynamic, with the inserted values
("$name") passed on the stack. The BSM produces a factory function which
builds the desired result. The BSM might use a templating engine to partially
evaluate the static structure, so that the dynamically changing parts can be
combined in at full speed.
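Here is a minimal sketch of a template BSM for something like
"hello, $name", with hypothetical names. It assumes the static arguments are
the literal fragments surrounding each hole, so there is one more fragment
than there are dynamic arguments. Its only "partial evaluation" is splitting
the template ahead of time; a real templating engine would go well beyond
that.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class TemplateBsmSketch {
    // Interleaves literal fragments with the values that fill the holes.
    private static String fill(String[] fragments, Object[] values) {
        StringBuilder sb = new StringBuilder(fragments[0]);
        for (int i = 0; i < values.length; i++) {
            sb.append(values[i]).append(fragments[i + 1]);
        }
        return sb.toString();
    }

    // Hypothetical BSM: static args are the literal fragments around each hole;
    // the call-site type has one dynamic argument per hole.
    public static MethodHandle templateBSM(MethodHandles.Lookup look, String name,
                                           MethodType type, Object... staticArgs)
            throws ReflectiveOperationException {
        int holes = type.parameterCount();
        if (staticArgs.length != holes + 1) {
            throw new IllegalArgumentException("expected " + (holes + 1) + " fragments");
        }
        String[] fragments = Arrays.copyOf(staticArgs, staticArgs.length, String[].class);
        MethodHandle fill = MethodHandles.lookup().findStatic(TemplateBsmSketch.class, "fill",
            MethodType.methodType(String.class, String[].class, Object[].class));
        // Cast to Object so the array is bound as one argument, not spread as varargs.
        return MethodHandles.insertArguments(fill, 0, (Object) fragments)
            .asCollector(Object[].class, holes)
            .asType(type);
    }

    // Links a (String)String site for the template "hello, $name".
    public static String demo(String name) throws Throwable {
        MethodType siteType = MethodType.methodType(String.class, String.class);
        MethodHandle target = templateBSM(MethodHandles.lookup(), "template", siteType,
                                          "hello, ", "");
        return (String) target.invokeExact(name);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(demo("world")); // hello, world
    }
}
```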
(A useful thing missing here is substructure sharing: What if two
invokedynamic instructions need almost the same static arguments? This can be
dealt with in user code, via a static table created in <clinit> or a similar
method. Shared values can be referred to by small integers assigned by the
language backend. In essence, the components we are proposing help language
implementors to build better versions of constant pools and vtables, with
compactness and efficiency similar to the corresponding native structures.)
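Here is a minimal sketch of the table-based sharing described above. It uses a
hypothetical convention: an Integer static argument is an index into a table
built in <clinit>, and any other static argument is used literally. Two call
sites that name the same index end up sharing one object.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.List;

public class SharedTableSketch {
    // Hypothetical table of shared values, built once in <clinit>; the backend
    // decides which small integer refers to which entry.
    static final Object[] SHARED = {
        "common header",
        Arrays.asList("x", "y", "z"),
    };

    // Hypothetical BSM convention: an Integer static argument is an index into
    // SHARED; any other static argument is used as-is.
    public static MethodHandle sharedBSM(MethodHandles.Lookup look, String name,
                                         MethodType type, Object... args) {
        Object[] resolved = new Object[args.length];
        for (int i = 0; i < args.length; i++) {
            resolved[i] = (args[i] instanceof Integer) ? SHARED[(Integer) args[i]] : args[i];
        }
        return MethodHandles.constant(type.returnType(), Arrays.asList(resolved));
    }

    // Links a ()List site with the given static arguments and returns its value.
    public static List<?> link(Object... staticArgs) throws Throwable {
        MethodHandle target = sharedBSM(MethodHandles.lookup(), "shared",
                                        MethodType.methodType(List.class), staticArgs);
        return (List<?>) target.invokeExact();
    }

    public static void main(String[] args) throws Throwable {
        List<?> a = link(1, "site A");
        List<?> b = link(1, "site B");
        System.out.println(a.get(0) == b.get(0)); // true: both sites share SHARED[1]
    }
}
```

Under this particular convention, plain Integer data would need some other
encoding; a real backend would choose its own scheme.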
In conclusion: It is true that most use cases for BSM arguments will only need
one or two extra arguments. But if we allow an array of strings, integers,
method handles, etc., of reasonable length, suddenly our language
implementor friends have a flexible and natural way to encode the
"serialized" version of their live constants.
So let's allow not just one or two static arguments, and not a useless 251
either, but rather a useful 65535.
(I'd go for a larger number, 2**31-1, but it would not mesh with the other
16-bit numbers in the class file format. That's got to be fixed in a big
format revision, another day.)
-- John
_______________________________________________
mlvm-dev mailing list
[email protected]
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev