Hi everybody, i've spend some time to think how the String interpolation + 
Policy should be specified and implemented.


The goal is to add a syntax specifying a user defined method to "interpolate" 
(for a lack of better word) a string with arguments.

Given that it's a method, the exact semantics of the interpolation, things like 
how the arguments are escaped, how the formatted string is parsed, is written 
is Java, this will allow to support a wide range of use cases.

This proposal does not differ from the original proposal of Brian and Jim in 
its goal but in the way a user declare the interpolation method(s).

TLDR; you can declare an interpolation method and optionally an interpolation 
bootstrap method if you want a more efficient code at the price of having to 
play with the method handle API.

---

The proposal of Brian and Jim uses an interface to define the policy but in 
this case, using an interface is not what we want.
I think there are two main reasons,
- the interpolation method can be an instance method but can also be a factory 
method, a static method, and an interface can not constraint a static method.
- we want the signature of the interpolation method to be free to use any 
number of parameters of any types, something that can not be specified with 
type parameters in Java.

So let's take a step back and write some examples, as a user of the 
interpolation method, we want to
- be able to specify string interpolation,
  you can notice that this is a static method.

  String name = ...
  int value = ...
  String s = String."name: \(name) age: \(age)";


- we also want to be able to instantiate regex Pattern,
  and have a magic optimisation that creates the Pattern instance only one

  Pattern pattern = Pattern."foo|bar";

- we also want to support instance method, so the interpolation can escape the 
arguments differently depending on the context,
  here by example, escaping differently depending on the database driver.

  String username = ...
  Connection connection = ...
  connection."""
    SELECT * FROM users where user == "\(username)"
    """;

I think the simplest way to specify an interpolation method is to have a method 
with a special name,
i will use __interpolate__ because i don't want to discuss the exact syntax 
here.

This method can be a static method or an instance method and has a restriction, 
the first parameter has to be a String because the first argument is the 
formatted string.

Here is an example of how the method __interpolate__ inside java.lang.String 
can be written.
To avoid everybody to re-implement the parsing of the formatted string, the 
class java.lang.runtime.InterpolateMetafactory provides a helper method 
"formatIterator" that returns an iterator splitting the formatted string into 
text and binding.


  package java.lang;

  public class String {
    ...
    public static String __interpolate__(String format, Object... args) {
      var i = 0;
      var builder = new StringBuilder();
      var iterator = InterpolateMetafactory.formatIterator(format);
      while(iterator.hasNext()) {
        switch(iterator.next()) {
          case Text(var text) -> builder.append(text);
          case Binding binding -> args[i++];
        }
      }
      return builder.toString();
    }
    ...
  }

While this is nice, you may think that it's just syntactic sugar and it will 
not be more performant that String.valueOf(), i.e. it will be slow.

That's why the specification allow you to provide a second more optimised 
version of the interpolation method using a method __interpolate__bootstrap__.
This method __interpolate__bootstrap__ is not required, can not replace the 
method __interpolate__, both __interpolate__ and __interpolate__bootstrap__
has to be present and it's a backward compatible change to add a method 
__interpolate__bootstrap__ after the fact, there is no need to recompile
all the client code.

For that the compiler translation rely on invokedynamic to call the method 
bootstrap of the class InterpolateMetafactor that at runtime decide
to trampoline either to the method __interpolate__bootstrap__ or to the method 
__interpolate__ if no __interpolate__bootstrap__ exists.

Here is an example of how a call to the interpolation method of String is 
generated by javac
For the Java code

  String name = ...
  int value = ...
  String s = String."name: \(name) age: \(age)";

the equivalent bytecode is  

  aload_1.  // load name
  iload_2.  // load age
  invokedynamic __interpolate__ (Ljava/lang/StringI)Ljava/lang/String;
    java.lang.runtime.InterpolateMetafactory.bootstrap(Lookup, String, 
MethodType, String, MethodHandle):CallSite
    [ "name: \(name) age: \(age)", String::__interpolate__(String, 
Object[]):String ]

>From the perspective of the compiler the method __interpolate__ works exactly 
>like a method with a polymorphic method signature (the method annotated with 
>@PolymorphicSignature),
so the descriptor of invokedynamic is created by collecting the type of the 
argument, here the interpolation method is called with a String and an int, so 
the descriptor
and the return type is String so the descriptor is 
(Ljava/lang/StringI)Ljava/lang/String;

Considering the interpolation method as a polymorphic method is important in 
term of performance because it means that not boxing will be done by the 
compiler, if there are some boxing, they will be done by the runtime, so are 
optional if the __interpolate__bootstrap__ does not need to box arguments.

You can also notice that the formatted string is passed as a bootstrap constant 
so all the parsing of the format can be done once outside of the hot path.
A call to invokedynamic also pass as a second bootstrap argument the method 
handle to the method __interpolate__, so the implementation inside 
InterpolateMetafactory.bootstrap can called this method if no method 
__interpolate__bootstrap__ exists.


Here is a raw implementation of the class InterpolateMetafactory.
The method formatIterator() return an Iterator of Token which is a sealed class.
The method bootstrap() first lookup to a method "__interpolate__bootstrap__" in 
the lookup class that takes a Lookup, a String, a MethodType, the format and 
the default implementation and call it if it exists or takes the default 
implementation, bind the formatted String and adapt the arguments using asType 
(ask for boxing, etc).

   package java.lang.runtime;

   public class InterpolateMetafactory {
     public sealed interface Token {
       public record Text(String text) implements Token {}
       public record Binding(String name) implements Token {} 
     }


     public static Iterator<Token> formatIterator(String format) {
       ...
     }

     public static CallSite bootstrap(Lookup lookup, String name, MethodType 
methodType, String format, MethodHandle impl) throws Throwable {
       // check if there is a bootstrap method
       MethodHandle bootstrap;
       try {
          bootstrap = lookup.findStatic(lookup.lookupClass(), 
"__interpolate__bootstrap__", MethodType.methodType(CallSite.class, 
Lookup.class, String.class, MethodType.class, String.class, 
MethodHandle.class));
       } catch(NoSuchMethodException e) {
         // bind the default implementation
         return new ConstantCallSite(impl.bindTo(format).asType(methodType));
       }
       return boostrap.invoke(lookup, name, methodType, format, impl);
     }
   }


Here is another example, showing how to declare the methods __interpolate__ and 
__interpolate__bootstrap__ inside java.util.regex.Pattern.
The "default" implementation calls Pattern.compile() and the optimized one 
always returns the result of Pattern.compile() as a constant.

  package java.util.regex;

  public class Pattern {
    public static String __interpolate__(String format) {. // the formatted 
string can not have arguments
      return Pattern.compile(format);
    }
    
    private static CallSite __interpolate__bootstrap__(Lookup lookup, String 
name, MethodType methodType, String format, MethodHandle impl) {
      return new ConstantCallSite(MethodHandles.constant(Pattern.class, 
Pattern.compile(format)));
    }
  }


The method __interpolate__ provides via its signature, the parameter types that 
are verified by the compiler.
It also provides a code that can be used by the tools that does static analysis 
on the bytecode because those tools can not see through the method handle 
returned by a bootstrap method given that it's a runtime construct, it's 
usually not available at the time the static analysis is done. This should be 
enough to have tools like Graal VM native image to see through the 
invokedynamic in a similar way it sees through the invokedynamic used when 
creating a lambda.

The fact that all invokedynamic goes through the method 
InterpolateMetafactory.bootstrap and trampoline from it means that adding or 
removing the method __interpolate__bootstrap__ is a binary compatible change, 
if __interpolate__bootstrap__ is declared private. So implementing 
__interpolate__bootstrap__ can be an afterthought.

regards,
Rémi

Reply via email to