optional arguments for bootstrap methods

John Rose Fri, 22 Oct 2010 01:44:19 -0700

Based partly on our discussions at the Summit about "live constants", and also 
based on the likely requirements of Project Lambda, the JSR 292 EG is likely to 
allow any single invokedynamic instruction to pass one or more extra constant 
values into the bootstrap method invocation.


Here is the current thinking.  Language implementors, please tell us if we are 
missing anything.

We call these "static arguments", in contrast to the normal "dynamic arguments" 
that are received on every method call.  For invokedynamic, the dynamic 
arguments are passed, as if by 'invokeExact', to the method handle that the BSM 
has bound to that particular invokedynamic instruction.  The BSM decides, once 
at link time, which method handle to choose based on the static arguments.

There are three standard static arguments always passed to the BSM:
 1. an indication of the caller class (note: this is likely to change to a 
MethodHandles.Lookup capability)
 2. a String naming the method apparently being called
 3. a MethodType indicating the dynamic arguments and return value types

The String and MethodType are extracted from the NameAndType constant at the 
invokedynamic site.

The invokedynamic instruction points to a constant pool entry that looks like 
this:

struct InvokeDynamic_info {
  u1 tag; // always CONSTANT_InvokeDynamic = 18
  u2 bsm_index;   // ref to CONSTANT_MethodHandle
  u2 descr_index; // ref to CONSTANT_NameAndType
  u2 argc;  // count of optional static arguments
  u2 argv[argc];  // refs to anything 'ldc' can refer to (int, long, float, double, string, class, method handle, method type)
}
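As a sketch of how a class file reader might decode this proposed entry, the Java below parses the layout exactly as drafted above: one tag byte followed by big-endian u2 fields. The class and method names are hypothetical, and the layout is the one proposed in this message, not necessarily what any shipping JVM reads.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class InvokeDynamicInfoSketch {
    static final int CONSTANT_InvokeDynamic = 18; // tag proposed in this draft

    final int bsmIndex;    // ref to CONSTANT_MethodHandle
    final int descrIndex;  // ref to CONSTANT_NameAndType
    final int[] argv;      // refs to the optional static arguments

    InvokeDynamicInfoSketch(int bsmIndex, int descrIndex, int[] argv) {
        this.bsmIndex = bsmIndex;
        this.descrIndex = descrIndex;
        this.argv = argv;
    }

    // Reads one entry starting at its tag byte. Class file u2 fields are
    // big-endian, which is what DataInputStream.readUnsignedShort expects.
    static InvokeDynamicInfoSketch parse(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_InvokeDynamic) {
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
            int bsm = in.readUnsignedShort();
            int descr = in.readUnsignedShort();
            int argc = in.readUnsignedShort();   // 0..65535 static arguments
            int[] argv = new int[argc];
            for (int i = 0; i < argc; i++) {
                argv[i] = in.readUnsignedShort();
            }
            return new InvokeDynamicInfoSketch(bsm, descr, argv);
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // e.g. a truncated entry
        }
    }

    // Entry size in bytes: tag + three u2 fields + argc u2 references.
    int byteLength() {
        return 1 + 2 + 2 + 2 + 2 * argv.length;
    }
}
```

For example, the 11 bytes `18, 0,5, 0,7, 0,2, 0,9, 0,10` decode to a BSM at index 5, a NameAndType at index 7, and two static arguments at indexes 9 and 10.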

If we take this path, we will switch to the tag '18', to reduce confusion when 
old and new class files are mixed.

The existing tag '17' for the no-extra-args format will drop out of use and be 
illegal in JDK7 FCS.

Depending on the value of argc, the BSM will be invoked in one of three ways:
  if (argc == 0)  binding = bsm.invokeGeneric(lookup, name, type);
  if (argc == 1)  binding = bsm.invokeGeneric(lookup, name, type, (Object) argv[0]);
  if (argc > 1)   binding = bsm.invokeGeneric(lookup, name, type, (Object[]) argv);

Note that the BSM, since it is derived from a CONSTANT_MethodHandle, can only 
be a "direct method handle", a pointer to a Java method.  It cannot be adapted 
(e.g., as a spreader or collector).  But in user-visible code, it would be 
reasonable to express a typical BSM as an overloaded method, whose third 
overloading takes a varargs array:
  MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type);
  MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object arg);
  MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object... args);
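To make the three invocation forms concrete, here is a sketch that links a call site against the hypothetical overloaded myBSM above. It uses MethodHandle.invoke, the name under which invokeGeneric shipped in JDK 7, and a hand-written link helper standing in for the JVM. Choosing the overload from argc is an assumption about how a language backend would pick which CONSTANT_MethodHandle to emit; the class name and the strings returned are illustrative only.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class BootstrapArgcSketch {
    // Three hypothetical overloads, one per invocation shape. Each returns a
    // constant handle describing which overload was chosen.
    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type) {
        return MethodHandles.constant(String.class, name + ": no static args");
    }

    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type,
                              Object arg) {
        return MethodHandles.constant(String.class, name + ": one static arg " + arg);
    }

    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type,
                              Object... args) {
        return MethodHandles.constant(String.class, name + ": static args " + Arrays.toString(args));
    }

    // Stand-in for the JVM's linkage step, mirroring the three forms in the draft.
    static MethodHandle link(MethodHandle bsm, MethodHandles.Lookup lookup,
                             String name, MethodType type, Object[] argv) throws Throwable {
        if (argv.length == 0) return (MethodHandle) bsm.invoke(lookup, name, type);
        if (argv.length == 1) return (MethodHandle) bsm.invoke(lookup, name, type, argv[0]);
        return (MethodHandle) bsm.invoke(lookup, name, type, argv);
    }

    // Picks the overload matching argc, links a ()String site, and calls it once.
    static String demo(Object... argv) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType base = MethodType.methodType(MethodHandle.class,
                MethodHandles.Lookup.class, String.class, MethodType.class);
            MethodType bsmType = argv.length == 0 ? base
                : argv.length == 1 ? base.appendParameterTypes(Object.class)
                : base.appendParameterTypes(Object[].class);
            MethodHandle bsm = lookup.findStatic(BootstrapArgcSketch.class, "myBSM", bsmType);
            MethodHandle target = link(bsm, lookup, "site", MethodType.methodType(String.class), argv);
            return (String) target.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Here demo() selects the three-argument overload, demo("x") the single-Object one, and demo(1, 2) the varargs one, which receives the whole array as-is.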

It is natural to ask why we are using varargs, when we could just specify that 
the extra static arguments are passed positionally.  The simple answer is that 
positional arguments are of limited use, but a varargs array can be used to 
encode very rich and useful BSM arguments.

Since very few Java methods take more than 10 parameters, allowing up to 255 
extra arguments is not very interesting.  (Actually the limit would be 251 
non-long non-double arguments, since there are three to start with, plus the 
BSM itself.)  Writing a BSM which takes (say) 100 arguments would be silly.  
(Note that BSMs cannot be collectArguments adapters; they have to be simple JVM 
methods or constructors.)  And a related one that takes 99 arguments would have 
to be a completely distinct method.  It is clear that any large number of 
arguments has to be passed in an array.  So let's pass them all in a trailing 
varargs parameter.

Will users want more than a couple of extra static arguments?  I think so.  It 
will provide a way to bind interesting specifications directly into the 
classfile, without cumbersome bytecode-based construction.  Examples:
 - a serialized AST structure, built from a mix of strings and method handles, 
to be interpreted
 - complex application-defined constants, such as lists or sets
 - similarly, templates for partly-constant data structures (the invokedynamic 
builds a factory method for the template)
 - vtables (i.e., maps of names to method handles)
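The last item can be sketched with the JDK 7 java.lang.invoke API: a hypothetical BSM reads its static arguments as alternating name/method-handle pairs and links a zero-argument call site to the finished table. The pairing convention and all names are assumptions, and the BSM returns a ConstantCallSite (the JDK 7 convention) rather than the bare method handle described in this draft. It is called directly here instead of from a real invokedynamic instruction.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class VtableBootstrapSketch {
    static String hello(String who) { return "hello, " + who; }
    static String bye(String who) { return "bye, " + who; }

    // Hypothetical BSM: static args alternate name, handle, name, handle, ...
    static CallSite vtableBSM(MethodHandles.Lookup lookup, String name,
                              MethodType type, Object... args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/handle pairs");
        }
        Map<String, MethodHandle> vtable = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            vtable.put((String) args[i], (MethodHandle) args[i + 1]);
        }
        Map<String, MethodHandle> frozen = Collections.unmodifiableMap(vtable);
        // The site takes no dynamic arguments and always yields the same table.
        return new ConstantCallSite(MethodHandles.constant(type.returnType(), frozen));
    }

    // Links a ()Map site, then dispatches one call through the table.
    static String demo(String method, String who) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType sig = MethodType.methodType(String.class, String.class);
            CallSite site = vtableBSM(lookup, "vtable", MethodType.methodType(Map.class),
                "hello", lookup.findStatic(VtableBootstrapSketch.class, "hello", sig),
                "bye", lookup.findStatic(VtableBootstrapSketch.class, "bye", sig));
            Map<?, ?> vtable = (Map<?, ?>) site.dynamicInvoker().invokeExact();
            MethodHandle m = (MethodHandle) vtable.get(method);
            return (String) m.invokeExact(who);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```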

All of these things can be created by executable bytecodes in <clinit>, but 
implementors will (in many cases) be able to create them more compactly from a 
series of constants.  For example, a list of integer values will occupy 2+1+4 
bytes per element if encoded as a sequence of static arguments.  (The '2' is 
the argv element; the 1+4 is the CONSTANT_Integer.)  Using <clinit> style 
bytecodes, the same element will require (1+2+2+1)+1+4 bytes, where the 
parenthesized numbers stand for a sequence of "aload buf; bipush J; ldc N; 
aastore".  (This sequence stores the element into an object array, which is 
going to be passed to something like Arrays.asList.)  The ratio is 7 to 11.  
For integer values which repeat (so that one CONSTANT_Integer is shared), the 
ratio is closer to 2 to 6.
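The per-element arithmetic can be checked with a few lines of Java. The sketch assumes aload_<n> (1 byte) for the buffer, bipush for the index, and the 2-byte ldc form, which reproduces the 7-to-11 and 2-to-6 ratios. The countBoxing flag is an addition not in the text: aastore needs a reference, so an int pushed by ldc would first have to be boxed, for example with a 3-byte invokestatic of Integer.valueOf.

```java
public class ConstantEncodingCost {
    // Class file item sizes in bytes.
    static final int ARGV_ENTRY = 2;            // one u2 slot in argv[]
    static final int CONSTANT_INTEGER = 1 + 4;  // u1 tag + u4 value
    static final int ALOAD_N = 1;               // aload_<n> buf
    static final int BIPUSH = 2;                // bipush J (array index)
    static final int LDC = 2;                   // ldc N (1-byte pool index)
    static final int AASTORE = 1;
    static final int INVOKESTATIC = 3;          // e.g. Integer.valueOf, if boxing is counted

    // Bytes per element as a static argument; a repeated value reuses its constant.
    static int staticArgCost(boolean repeated) {
        return ARGV_ENTRY + (repeated ? 0 : CONSTANT_INTEGER);
    }

    // Bytes per element as <clinit> code storing into an Object[].
    static int clinitCost(boolean repeated, boolean countBoxing) {
        int code = ALOAD_N + BIPUSH + LDC + (countBoxing ? INVOKESTATIC : 0) + AASTORE;
        return code + (repeated ? 0 : CONSTANT_INTEGER);
    }

    public static void main(String[] args) {
        System.out.println(staticArgCost(false) + " to " + clinitCost(false, false)); // 7 to 11
        System.out.println(staticArgCost(true) + " to " + clinitCost(true, false));   // 2 to 6
        System.out.println(staticArgCost(false) + " to " + clinitCost(false, true));  // 7 to 14
    }
}
```

Counting the boxing call only widens the gap in favor of static arguments.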

There is a limit to this technique, of course, since constant pool indexes are 
16 bits wide, so a class can have at most 65534 constant pool entries.  But this 
limit is shared with the <clinit> style technique.

A key use case for one or two BSM arguments is closure construction for Project 
Lambda.  Here, an extra static argument can specify a private synthetic method 
which gives the body (code, not data) of the closure.  The data parts are 
normal dynamic arguments.  The BSM produces a factory function which 
(efficiently) binds the data values to the statically specified closure body.  
A second BSM argument might be the SAM type intended for the closure.  (That 
could also be inferred from the MethodType.)
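Here is a sketch of such a closure bootstrap, written against the java.lang.invoke API as it shipped in JDK 7 (where a BSM returns a CallSite). The class, method and site names are hypothetical; the closure is represented as a plain bound MethodHandle rather than an instance of a SAM type, and the BSM is called directly instead of from a real invokedynamic instruction.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ClosureBootstrapSketch {
    // Hypothetical closure body. In compiled code this would be a private
    // synthetic method; its leading parameter receives the captured data.
    static String greetBody(String greeting, String name) {
        return greeting + ", " + name;
    }

    // Hypothetical BSM: the static argument 'body' is the closure's code.
    // The site's target is a factory taking the captured data (the dynamic
    // arguments) and returning a closure with that data bound in.
    static CallSite closureBSM(MethodHandles.Lookup lookup, String name,
                               MethodType type, MethodHandle body)
            throws ReflectiveOperationException {
        MethodHandle bindTo = MethodHandles.publicLookup().findVirtual(
            MethodHandle.class, "bindTo",
            MethodType.methodType(MethodHandle.class, Object.class));
        // Fix the receiver to 'body': the result has type (Object)MethodHandle.
        MethodHandle factory = MethodHandles.insertArguments(bindTo, 0, body);
        return new ConstantCallSite(factory.asType(type));
    }

    // Simulates linking one site of type (String)MethodHandle, then calling it.
    static String demo(String captured, String arg) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle body = lookup.findStatic(ClosureBootstrapSketch.class, "greetBody",
                MethodType.methodType(String.class, String.class, String.class));
            CallSite site = closureBSM(lookup, "makeClosure",
                MethodType.methodType(MethodHandle.class, String.class), body);
            MethodHandle closure = (MethodHandle) site.dynamicInvoker().invokeExact(captured);
            return (String) closure.invokeExact(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Using bindTo limits the captured value to a reference type; MethodHandles.insertArguments would be the more general choice when the data includes primitives.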

Another key use case is an invokedynamic instruction that implements an 
arbitrary live constant, by linking the call site (of zero arguments) to a 
method handle which always returns the desired constant.  
(MethodHandles.constant will do this.)  The only missing bit is the serialized 
data behind the live constant.  Again, allowing an essentially unbounded array 
gives implementors the right degree (I think) of flexibility.
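A minimal sketch of such a live constant, using MethodHandles.constant as suggested: a hypothetical BSM treats its trailing static arguments as the serialized data, builds an unmodifiable list once at link time, and links the zero-argument site to a handle that always returns that same list. The names are illustrative only, and the BSM returns a ConstantCallSite as in JDK 7.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LiveConstantSketch {
    // Hypothetical BSM: the trailing static arguments are the "serialized"
    // data; the value is built once, at link time.
    static CallSite listConstantBSM(MethodHandles.Lookup lookup, String name,
                                    MethodType type, Object... staticArgs) {
        List<Object> value = Collections.unmodifiableList(Arrays.asList(staticArgs.clone()));
        // Zero-argument target that always returns the same object.
        MethodHandle target = MethodHandles.constant(type.returnType(), value);
        return new ConstantCallSite(target);
    }

    // Links a ()List site whose static arguments are three strings.
    static MethodHandle linkDemoSite() {
        CallSite site = listConstantBSM(MethodHandles.lookup(), "colors",
            MethodType.methodType(List.class), "red", "green", "blue");
        return site.dynamicInvoker();
    }

    // Executes the linked site once.
    static List<?> call(MethodHandle invoker) {
        try {
            return (List<?>) invoker.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Every call through the linked site returns the identical list object, which is what makes it a constant rather than a factory.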

If, instead of constants, we want templated values (think Groovy strings like 
"hello, $name"), the statically determined structure of the value can be 
expressed in static arguments to an invokedynamic, with the inserted values 
("$name") passed on the stack.  The BSM produces a factory function which 
builds the desired result.  The BSM might use a templating engine to partially 
evaluate the static structure, so that the dynamically changing parts can be 
combined in at full speed.
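A minimal sketch of such a template BSM follows, again using the JDK 7 java.lang.invoke API and calling the BSM directly. The encoding is a hypothetical choice: the static arguments are the literal segments around each hole, so a site with n dynamic arguments takes n+1 segments. The sketch simply concatenates at call time instead of partially evaluating the template.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class TemplateBootstrapSketch {
    // Interleaves constant segments with inserted values:
    // seg[0] + v[0] + seg[1] + ... + v[n-1] + seg[n].
    static String fill(String[] segments, Object[] values) {
        StringBuilder sb = new StringBuilder(segments[0]);
        for (int i = 0; i < values.length; i++) {
            sb.append(values[i]).append(segments[i + 1]);
        }
        return sb.toString();
    }

    // Hypothetical BSM: static args are the template's literal segments;
    // the site's dynamic arguments are the inserted values.
    static CallSite templateBSM(MethodHandles.Lookup lookup, String name,
                                MethodType type, Object... segments)
            throws ReflectiveOperationException {
        int holes = type.parameterCount();
        if (segments.length != holes + 1) {
            throw new IllegalArgumentException("expected " + (holes + 1) + " segments");
        }
        String[] segs = new String[segments.length];
        for (int i = 0; i < segs.length; i++) {
            segs[i] = (String) segments[i];
        }
        // The BSM looks up its own helper, not one in the caller's class.
        MethodHandle fill = MethodHandles.lookup().findStatic(TemplateBootstrapSketch.class,
            "fill", MethodType.methodType(String.class, String[].class, Object[].class));
        // Cast to Object so the array is inserted as one argument, not spread.
        MethodHandle bound = MethodHandles.insertArguments(fill, 0, (Object) segs);
        // Gather the dynamic arguments into the Object[] and adapt to the site type.
        MethodHandle target = bound.asCollector(Object[].class, holes).asType(type);
        return new ConstantCallSite(target);
    }

    // Simulates the site for "hello, $name": segments "hello, " and "".
    static String demo(String nameValue) {
        try {
            CallSite site = templateBSM(MethodHandles.lookup(), "template",
                MethodType.methodType(String.class, String.class), "hello, ", "");
            return (String) site.dynamicInvoker().invokeExact(nameValue);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```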

(A useful thing missing here is substructure sharing:  What if two 
invokedynamic instructions need almost the same static arguments?  This can be 
dealt with in user code, via a static table created in <clinit> or a similar 
method.  Shared values can be referred to by small integers assigned by the 
language backend.  In essence, the components we are proposing help language 
implementors to build better versions of constant pools and vtables, with 
compactness and efficiency similar to the corresponding native structures.)
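The table-plus-index workaround might look like the following sketch: a hypothetical class whose static initializer (its <clinit>) builds the shared values, and a hypothetical BSM whose single static argument is the small integer the backend assigned. The table contents and names are made up for illustration.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SharedTableSketch {
    // Hypothetical shared table, built once in <clinit>; the language backend
    // assigns each shared value a small integer index.
    static final List<Object> SHARED;
    static {
        List<Object> t = new ArrayList<>();
        t.add(Collections.unmodifiableList(Arrays.asList("red", "green", "blue"))); // index 0
        t.add("common prefix: ");                                                  // index 1
        SHARED = Collections.unmodifiableList(t);
    }

    // Hypothetical BSM: the static argument is an index into SHARED.
    static CallSite sharedBSM(MethodHandles.Lookup lookup, String name,
                              MethodType type, int index) {
        Object value = SHARED.get(index);
        return new ConstantCallSite(MethodHandles.constant(type.returnType(), value));
    }

    // Links a fresh ()Object site for the given index and executes it once.
    static Object demo(int index) {
        try {
            CallSite site = sharedBSM(MethodHandles.lookup(), "shared",
                MethodType.methodType(Object.class), index);
            return (Object) site.dynamicInvoker().invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Two separately linked sites that pass the same index end up returning the same shared object, so the structure is stored only once.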

In conclusion:  It is true that most use cases for BSM arguments will only need 
one or two extra arguments.  But if we allow an array of strings, integers, 
method handles, etc., with a reasonable length, suddenly our language 
implementor friends have a flexible and natural way to encode the 
"serialized" version of their live constants.

So let's allow not just one or two static arguments, and not a useless 251 
either, but rather a useful 65535.

(I'd go for a larger number, 2**31-1, but it would not mesh with the other 
16-bit numbers in the class file format.  That's got to be fixed in a big 
format revision, another day.)

-- John

_______________________________________________
mlvm-dev mailing list
[email protected]
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
