> From: "Brian Goetz" <brian.go...@oracle.com>
> To: "Remi Forax" <fo...@univ-mlv.fr>, "amber-spec-experts" <amber-spec-experts@openjdk.java.net>
> Sent: Wednesday, 13 October 2021 21:32:19
> Subject: Re: String Interpolation
> The ability to capture per-call-site computation so it could be done exactly once (including generating an MH to describe it) has been part of the goal all along. The JEP is deliberately cagey about this because we didn't want to descend down the translation rabbit hole before we'd achieved consensus on the broad strokes, any more than we wanted to descend down the syntax rabbit hole. (FWIW, all of these side-paths were ones we already traveled and rejected for various reasons :)
>
> As you correctly point out, without something like type classes, associating a static method like a bootstrap with a class requires committing some sort of sin, such as the "magic names" sins committed by serialization. We surely didn't want to do that either.

What we want here is a "protocol": something that is really like a method call, but with extra syntax and extra constraints. Java already uses protocols. We are so used to seeing a method with the class name and no return type as a constructor that we may not even realize that Java uses special names to indicate a protocol.

Unlike Scala, the current syntax does not specify a method name: it's String."a text" and not String.method"a text". That's why I've proposed to use a special name.

Soon we will want to introduce user-defined patterns. This is also a protocol; it's a kind of method call too, with extra syntax and extra constraints. Instead of using

I agree that using a magic name is less than ideal to define a protocol, but

>> - we also want to be able to instantiate regex Pattern,
>>   and have a magic optimisation that creates the Pattern instance only once
>>     Pattern pattern = Pattern."foo|bar";

> You said the magic anti-word, which is "magic". We don't want this to be magic.
>
> (Examples like this are better treated as a form of optimistic constant folding, along the lines explored at my JVMLS talk a few years ago.)
>
> Summary: wait for constant folding.

>> I think the simplest way to specify an interpolation method is to have a method with a special name,
>> i will use __interpolate__ because i don't want to discuss the exact syntax here.

> This is committing the same "magic name" sin as serialization. We deliberately avoided this in the design. When we have type classes, we'll be able to use that as a way to bridge from a type name to a witness to a particular class. Our design was crafted so that it could be gracefully extended to such a mechanism, when it is available (using a type name instead of an instance reference at the use site.)
>
> Summary: wait for type classes.

>> That's why the specification allow you to provide a second more optimised version of the interpolation method using a method __interpolate__bootstrap__.

> This is an obviously attractive goal, but the mechanism is way too ad-hoc -- and also too limited -- and also too advanced to be a language feature. Bootstraps are way too complicated to expose in the source language in this way, especially not this magically. And it's too ad-hoc, since it's specific to the interpolation feature, whereas one could imagine a number of other contexts where it is useful too. So this is a bad tradeoff in many ways. Jim's implementation very cleverly gets the equivalent of this using pure library implementation (which leans on MutableCallSite.)
>
> While it is surely a desirable goal to be able to optimize formatter implementation, it is also super-easy to become obsessed with this, and give it a bigger place in the feature than it deserves. For some cases -- notably String::format -- there are huge savings to be had (from a number of sources, not least of which is that scanning the string at every invocation and choosing a strategy based on that is expensive.) But in other cases, it is almost irrelevant.
> For pure concatenation, it is already pretty fast; for SQL, the cost of constructing the query is a tiny part of the execution time, so it's not even worth optimizing. So this is a "nice to have" rather than the centerpiece of the feature.
>
> To be clear, the centerpiece is the gathering up of a template + parameters so that their combination can be handled by another entity, whether right now, later, or never. Optimizing the case where it is done right now, using a predictable choice of entity, is an optimization, but not the centerpiece.
>
> Let me sketch out how we're envisioning this. The API is something like:
>
>   interface TemplatePolicy<T, E extends Exception> {
>     T apply(TemplatedString ts);
>
>     // returns MethodHandle (TemplatePolicy, TemplatedString) -> Object
>     default MethodHandle asMethodHandle(TemplatedString ts) {
>       return MH[TemplatePolicy::apply]
>     }
>   }
>
> The API specification has a number of constraints on the implementation of asMethodHandle, which I'll get to in a second. When the compiler encounters an immediate application P."...", it generates an indy, which uses a special bootstrap that returns a MutableCallSite. The MutableCallSite initially has as its target a special secondary bootstrap MH, which represents an interpolation site that has not yet seen an actual invocation. The secondary bootstrap MH has the shape of TemplatePolicy::apply (e.g., (TemplatePolicy, TemplatedString) -> Object), so on first invocation it receives the TP object and the TS. It then calls TP::asMethodHandle, and wraps this MH with a GWT which validates the invariants and proceeds to that MH if they hold -- which they will 99.x% of the time.
>
> The invariant is that the dynamic type of the per-instantiation TP be == to the dynamic type of the TP that was present at secondary linkage. That is, it be an instance of the same class, but not the same instance.
> By definition, the string will always be the same as will the types of the parameters, since this is specific to concrete P."..." sites. So the MH can take advantage of that.
>
> The constraint on TP::asMethodHandle is that it not undermine this invariant; that if it generates a MH that is dependent on TP state, it not bake that state into the resulting MH, but instead, treat the TP state as a parameter. Further, the MH must be behaviorally equivalent to calling apply.
>
> If the GWT fails, it means the user is doing something like:
>
>   for (TP p : listOfProcessors) {
>     blah blah p."foo \{a}"
>   }
>
> in which case the GWT falls back to the "just do an invokevirtual of TP::apply" strategy. (It could get fancier but I don't see any point.)
>
> This lets us rescue indy-based translation without exposing a magic indy-hook in the JLS. (Sorry, I know you wanted the magic indy hook.)
>
> On 10/13/2021 1:09 PM, Remi Forax wrote:
>> Hi everybody, I've spent some time thinking about how the String interpolation + Policy should be specified and implemented.
>>
>> The goal is to add a syntax specifying a user defined method to "interpolate" (for lack of a better word) a string with arguments.
>> Given that it's a method, the exact semantics of the interpolation, things like how the arguments are escaped and how the formatted string is parsed, is written in Java; this will allow to support a wide range of use cases.
>>
>> This proposal does not differ from the original proposal of Brian and Jim in its goal but in the way a user declares the interpolation method(s).
>>
>> TLDR; you can declare an interpolation method and optionally an interpolation bootstrap method if you want more efficient code at the price of having to play with the method handle API.
>>
>> ---
>>
>> The proposal of Brian and Jim uses an interface to define the policy but in this case, using an interface is not what we want.
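
The MutableCallSite linkage described in Brian's reply above can be tried out with plain java.lang.invoke calls. What follows is only a sketch under assumptions, not the actual implementation: `Policy` and `Template` are hypothetical stand-ins for TemplatePolicy and TemplatedString, and `newSite`, `link`, `hasClass` and `upperFast` are made-up helper names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class GuardedSiteSketch {
    // Hypothetical stand-ins for TemplatedString and TemplatePolicy.
    record Template(String text, Object arg) {}

    interface Policy {
        Object apply(Template t);

        // Default: no specialization, the handle simply calls apply.
        default MethodHandle asMethodHandle(Template t) throws ReflectiveOperationException {
            return MethodHandles.lookup().findVirtual(Policy.class, "apply",
                    MethodType.methodType(Object.class, Template.class));
        }
    }

    static final class Plain implements Policy {
        public Object apply(Template t) { return t.text() + t.arg(); }
    }

    static final class Upper implements Policy {
        public Object apply(Template t) { return (t.text() + t.arg()).toUpperCase(); }

        // Specialized path: the policy stays a parameter, no instance state is baked in.
        @Override
        public MethodHandle asMethodHandle(Template t) throws ReflectiveOperationException {
            return MethodHandles.lookup().findStatic(GuardedSiteSketch.class, "upperFast",
                    MethodType.methodType(Object.class, Policy.class, Template.class));
        }
    }

    static int links = 0;      // how often the site was linked
    static int fastCalls = 0;  // how often the specialized handle ran

    static Object upperFast(Policy p, Template t) {
        fastCalls++;
        return (t.text() + t.arg()).toUpperCase();
    }

    static boolean hasClass(Class<?> expected, Policy p) {
        return p.getClass() == expected;
    }

    // The site starts out pointing at link(), i.e. "not yet invoked".
    static MutableCallSite newSite() throws ReflectiveOperationException {
        MethodType type = MethodType.methodType(Object.class, Policy.class, Template.class);
        MutableCallSite site = new MutableCallSite(type);
        MethodHandle first = MethodHandles.lookup().findStatic(GuardedSiteSketch.class, "link",
                MethodType.methodType(Object.class, MutableCallSite.class, Policy.class, Template.class));
        site.setTarget(first.bindTo(site));
        return site;
    }

    // First invocation: ask the policy for its handle, guard it on the policy's class,
    // install the guarded handle, then answer the current call through it.
    static Object link(MutableCallSite site, Policy policy, Template t) throws Throwable {
        links++;
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle fast = policy.asMethodHandle(t).asType(site.type());
        MethodHandle slow = lookup.findVirtual(Policy.class, "apply",
                MethodType.methodType(Object.class, Template.class));
        MethodHandle sameClass = lookup.findStatic(GuardedSiteSketch.class, "hasClass",
                MethodType.methodType(boolean.class, Class.class, Policy.class))
                .bindTo(policy.getClass());
        site.setTarget(MethodHandles.guardWithTest(sameClass, fast, slow));
        return site.getTarget().invoke(policy, t);
    }

    public static String demo() {
        try {
            links = 0;
            fastCalls = 0;
            MethodHandle invoker = newSite().dynamicInvoker();
            Template t = new Template("x=", 1);
            Object a = invoker.invoke(new Upper(), t); // links, then takes the guarded path
            Object b = invoker.invoke(new Upper(), t); // same class, new instance: guard holds
            Object c = invoker.invoke(new Plain(), t); // other class: falls back to apply
            return a + "," + b + "," + c + " links=" + links + " fast=" + fastCalls;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The guard compares classes rather than instances, so a fresh `Upper` per call still hits the specialized handle, while a `Plain` at the same site is routed to the ordinary `apply` call.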
>> I think there are two main reasons,
>> - the interpolation method can be an instance method but can also be a factory method, a static method, and an interface cannot constrain a static method.
>> - we want the signature of the interpolation method to be free to use any number of parameters of any types, something that cannot be specified with type parameters in Java.
>>
>> So let's take a step back and write some examples. As a user of the interpolation method, we want to
>> - be able to specify string interpolation,
>>   you can notice that this is a static method.
>>     String name = ...
>>     int age = ...
>>     String s = String."name: \(name) age: \(age)";
>> - we also want to be able to instantiate regex Pattern,
>>   and have a magic optimisation that creates the Pattern instance only once
>>     Pattern pattern = Pattern."foo|bar";
>> - we also want to support instance methods, so the interpolation can escape the arguments differently depending on the context,
>>   here for example, escaping differently depending on the database driver.
>>     String username = ...
>>     Connection connection = ...
>>     connection."""
>>       SELECT * FROM users where user == "\(username)"
>>       """;
>>
>> I think the simplest way to specify an interpolation method is to have a method with a special name,
>> i will use __interpolate__ because i don't want to discuss the exact syntax here.
>>
>> This method can be a static method or an instance method and has a restriction: the first parameter has to be a String because the first argument is the formatted string.
>>
>> Here is an example of how the method __interpolate__ inside java.lang.String can be written.
>> To avoid everybody re-implementing the parsing of the formatted string, the class java.lang.runtime.InterpolateMetafactory provides a helper method "formatIterator" that returns an iterator splitting the formatted string into text and binding.
>>
>>   package java.lang;
>>
>>   public class String {
>>     ...
>>     public static String __interpolate__(String format, Object... args) {
>>       var i = 0;
>>       var builder = new StringBuilder();
>>       var iterator = InterpolateMetafactory.formatIterator(format);
>>       while(iterator.hasNext()) {
>>         switch(iterator.next()) {
>>           case Text(var text) -> builder.append(text);
>>           // each binding consumes the next argument, in order
>>           case Binding binding -> builder.append(args[i++]);
>>         }
>>       }
>>       return builder.toString();
>>     }
>>     ...
>>   }
>>
>> While this is nice, you may think that it's just syntactic sugar and it will not be more performant than String.valueOf(), i.e. it will be slow.
>> That's why the specification allows you to provide a second, more optimised version of the interpolation method using a method __interpolate__bootstrap__.
>>
>> The method __interpolate__bootstrap__ is optional and cannot replace the method __interpolate__: __interpolate__ has to be present whether or not __interpolate__bootstrap__ is. Adding a method __interpolate__bootstrap__ after the fact is a backward compatible change, there is no need to recompile all the client code.
>>
>> For that, the compiler translation relies on invokedynamic to call the method bootstrap of the class InterpolateMetafactory, which at runtime decides to trampoline either to the method __interpolate__bootstrap__ or to the method __interpolate__ if no __interpolate__bootstrap__ exists.
>>
>> Here is an example of how a call to the interpolation method of String is generated by javac.
>> For the Java code
>>   String name = ...
>>   int age = ...
>>   String s = String."name: \(name) age: \(age)";
>>
>> the equivalent bytecode is
>>   aload_1 // load name
>>   iload_2
>>   // load age
>>   invokedynamic __interpolate__ (Ljava/lang/String;I)Ljava/lang/String;
>>     java.lang.runtime.InterpolateMetafactory.bootstrap(Lookup, String, MethodType, String, MethodHandle):CallSite
>>     [ "name: \(name) age: \(age)", String::__interpolate__(String, Object[]):String ]
>>
>> From the perspective of the compiler the method __interpolate__ works exactly like a method with a polymorphic method signature (the method annotated with @PolymorphicSignature),
>> so the descriptor of invokedynamic is created by collecting the types of the arguments. Here the interpolation method is called with a String and an int, and the return type is String, so the descriptor is
>>   (Ljava/lang/String;I)Ljava/lang/String;
>>
>> Considering the interpolation method as a polymorphic method is important in terms of performance because it means that no boxing will be done by the compiler. If there is some boxing, it will be done by the runtime, so it is optional if the __interpolate__bootstrap__ does not need to box arguments.
>>
>> You can also notice that the formatted string is passed as a bootstrap constant, so all the parsing of the format can be done once, outside of the hot path.
>>
>> A call to invokedynamic also passes as a second bootstrap argument the method handle to the method __interpolate__, so the implementation inside InterpolateMetafactory.bootstrap can call this method if no method __interpolate__bootstrap__ exists.
>>
>> Here is a raw implementation of the class InterpolateMetafactory.
>> The method formatIterator() returns an Iterator of Token, which is a sealed interface.
>> The method bootstrap() first looks up a method "__interpolate__bootstrap__" in the lookup class that takes a Lookup, a String, a MethodType, the format and the default implementation, and calls it if it exists. Otherwise it takes the default implementation, binds the formatted String and adapts the arguments using asType (ask for boxing, etc).
>>
>>   package java.lang.runtime;
>>
>>   import java.lang.invoke.CallSite;
>>   import java.lang.invoke.ConstantCallSite;
>>   import java.lang.invoke.MethodHandle;
>>   import java.lang.invoke.MethodHandles.Lookup;
>>   import java.lang.invoke.MethodType;
>>   import java.util.Iterator;
>>
>>   public class InterpolateMetafactory {
>>     public sealed interface Token {
>>       public record Text(String text) implements Token {}
>>       public record Binding(String name) implements Token {}
>>     }
>>
>>     public static Iterator<Token> formatIterator(String format) {
>>       ...
>>     }
>>
>>     public static CallSite bootstrap(Lookup lookup, String name, MethodType methodType, String format, MethodHandle impl) throws Throwable {
>>       // check if there is a bootstrap method
>>       MethodHandle bootstrap;
>>       try {
>>         bootstrap = lookup.findStatic(lookup.lookupClass(), "__interpolate__bootstrap__",
>>             MethodType.methodType(CallSite.class, Lookup.class, String.class, MethodType.class, String.class, MethodHandle.class));
>>       } catch(NoSuchMethodException e) {
>>         // bind the default implementation
>>         return new ConstantCallSite(impl.bindTo(format).asType(methodType));
>>       }
>>       // invoke() is signature polymorphic, so the result must be cast to CallSite
>>       return (CallSite) bootstrap.invoke(lookup, name, methodType, format, impl);
>>     }
>>   }
>>
>> Here is another example, showing how to declare the methods __interpolate__ and __interpolate__bootstrap__ inside java.util.regex.Pattern.
>> The "default" implementation calls Pattern.compile() and the optimized one always returns the result of Pattern.compile() as a constant.
>>
>>   package java.util.regex;
>>
>>   public class Pattern {
>>     public static Pattern __interpolate__(String format) {
>>       // the formatted string can not have arguments
>>       return Pattern.compile(format);
>>     }
>>
>>     private static CallSite __interpolate__bootstrap__(Lookup lookup, String name, MethodType methodType, String format, MethodHandle impl) {
>>       return new ConstantCallSite(MethodHandles.constant(Pattern.class, Pattern.compile(format)));
>>     }
>>   }
>>
>> The method __interpolate__ provides, via its signature, the parameter types that are verified by the compiler.
>> It also provides code that can be used by the tools that do static analysis on the bytecode, because those tools can not see through the method handle returned by a bootstrap method: given that it's a runtime construct, it's usually not available at the time the static analysis is done. This should be enough for tools like Graal VM native image to see through the invokedynamic in a similar way it sees through the invokedynamic used when creating a lambda.
>>
>> The fact that all invokedynamic calls go through the method InterpolateMetafactory.bootstrap and trampoline from it means that adding or removing the method __interpolate__bootstrap__ is a binary compatible change, if __interpolate__bootstrap__ is declared private. So implementing __interpolate__bootstrap__ can be an afterthought.
>>
>> regards,
>> Rémi
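
The trampolining described in the quoted proposal can be tried out in isolation. This is a minimal sketch under assumptions, not the proposed API: it uses ordinary method names (`interpolate`, `interpolateBootstrap`) in place of the special names, passes the declaring class explicitly through a hypothetical `owner` parameter, leaves the optional bootstrap non-private to keep the lookup simple, and counts calls to Pattern.compile() through a made-up `compiles` counter.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;
import java.util.regex.Pattern;

public class TrampolineSketch {
    static int compiles = 0; // number of Pattern.compile() calls

    // Hypothetical owner declaring only the plain interpolation method.
    static class PlainOwner {
        static Pattern interpolate(String format) {
            compiles++;
            return Pattern.compile(format);
        }
    }

    // Hypothetical owner that also declares the optional bootstrap.
    static class FoldingOwner {
        static Pattern interpolate(String format) {
            compiles++;
            return Pattern.compile(format);
        }

        static CallSite interpolateBootstrap(Lookup lookup, String name, MethodType type,
                                             String format, MethodHandle impl) {
            compiles++;
            // Compile once at link time and hand back the result as a constant.
            return new ConstantCallSite(
                    MethodHandles.constant(Pattern.class, Pattern.compile(format)));
        }
    }

    // Use the owner's bootstrap if it declares one, else bind the default method.
    static CallSite bootstrap(Lookup lookup, Class<?> owner, String name, MethodType type,
                              String format, MethodHandle impl) throws Throwable {
        MethodHandle custom;
        try {
            custom = lookup.findStatic(owner, "interpolateBootstrap",
                    MethodType.methodType(CallSite.class, Lookup.class, String.class,
                            MethodType.class, String.class, MethodHandle.class));
        } catch (NoSuchMethodException e) {
            return new ConstantCallSite(impl.bindTo(format).asType(type));
        }
        return (CallSite) custom.invoke(lookup, name, type, format, impl);
    }

    // Links one site for the given owner, invokes it twice, reports the compile count.
    static int compilesAfterTwoCalls(Class<?> owner) throws Throwable {
        compiles = 0;
        Lookup lookup = MethodHandles.lookup();
        MethodHandle impl = lookup.findStatic(owner, "interpolate",
                MethodType.methodType(Pattern.class, String.class));
        MethodHandle site = bootstrap(lookup, owner, "interpolate",
                MethodType.methodType(Pattern.class), "foo|bar", impl).dynamicInvoker();
        Pattern first = (Pattern) site.invoke();
        Pattern second = (Pattern) site.invoke();
        if (!first.pattern().equals(second.pattern())) {
            throw new IllegalStateException("call site returned different patterns");
        }
        return compiles;
    }

    // Returns { compiles without a bootstrap, compiles with a bootstrap }.
    public static int[] demo() {
        try {
            return new int[] {
                compilesAfterTwoCalls(PlainOwner.class),
                compilesAfterTwoCalls(FoldingOwner.class)
            };
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("plain=" + counts[0] + " folding=" + counts[1]);
    }
}
```

Both owners are called through the same `bootstrap` entry point, which is what makes adding the optional method later invisible to already compiled call sites.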