On Thu, 8 Apr 2021 18:51:21 GMT, Jorn Vernee <jver...@openjdk.org> wrote:
> This patch adds a `tableSwitch` combinator that can be used to switch over a > set of method handles given an index, with a fallback in case the index is > out of bounds, much like the `tableswitch` bytecode. Here is a description of > how it works (copied from the javadoc): > > Creates a table switch method handle, which can be used to switch over a > set of target > method handles, based on a given target index, called selector. > > For a selector value of {@code n}, where {@code n} falls in the range > {@code [0, N)}, > and where {@code N} is the number of target method handles, the table > switch method > handle will invoke the n-th target method handle from the list of target > method handles. > > For a selector value that does not fall in the range {@code [0, N)}, the > table switch > method handle will invoke the given fallback method handle. > > All method handles passed to this method must have the same type, with > the additional > requirement that the leading parameter be of type {@code int}. The > leading parameter > represents the selector. > > Any trailing parameters present in the type will appear on the returned > table switch > method handle as well. Any arguments assigned to these parameters will > be forwarded, > together with the selector value, to the selected method handle when > invoking it. > > The combinator does not support specifying the starting index, so the switch > cases always run from 0 to however many target handles are specified. A > starting index can be added manually with another combination step that > filters the input index by adding or subtracting a constant from it, which > does not affect performance. One of the reasons for not supporting a starting > index is that it allows for more lambda form sharing, but also simplifies the > implementation somewhat. I guess an open question is if a convenience > overload should be added for that case? > > Lookup switch can also be simulated by filtering the input through an > injection function that translates it into a case index, which has also > proven to have the ability to have comparable performance to, or even better > performance than, a bytecode-native `lookupswitch` instruction. I plan to add > such an injection function to the runtime libraries in the future as well. > Maybe at that point it could be evaluated if it's worth it to add a lookup > switch combinator as well, but I don't see an immediate need to include it in > this patch. > > The current bytecode intrinsification generates a call for each switch case, > which guarantees full inlining of the target method handles. Alternatively we > could only have 1 callsite at the end of the switch, where each case just > loads the target method handle, but currently this does not allow for > inlining of the handles, since they are not constant. > > Maybe a future C2 optimization could look at the receiver input for > invokeBasic call sites, and if the input is a phi node, clone the call for > each constant input of the phi. I believe that would allow simplifying the > bytecode without giving up on inlining. > > Some numbers from the added benchmarks: > > Benchmark (numCases) (offset) > (sorted) Mode Cnt Score Error Units > MethodHandlesTableSwitchConstant.testSwitch 5 0 > N/A avgt 30 4.186 � 0.054 ms/op > MethodHandlesTableSwitchConstant.testSwitch 5 150 > N/A avgt 30 4.164 � 0.057 ms/op > MethodHandlesTableSwitchConstant.testSwitch 10 0 > N/A avgt 30 4.124 � 0.023 ms/op > MethodHandlesTableSwitchConstant.testSwitch 10 150 > N/A avgt 30 4.126 � 0.025 ms/op > MethodHandlesTableSwitchConstant.testSwitch 25 0 > N/A avgt 30 4.137 � 0.042 ms/op > MethodHandlesTableSwitchConstant.testSwitch 25 150 > N/A avgt 30 4.113 � 0.016 ms/op > MethodHandlesTableSwitchConstant.testSwitch 50 0 > N/A avgt 30 4.118 � 0.028 ms/op > MethodHandlesTableSwitchConstant.testSwitch 50 150 > N/A avgt 30 4.127 � 0.019 ms/op > MethodHandlesTableSwitchConstant.testSwitch 100 0 > N/A avgt 30 4.116 � 0.013 ms/op > MethodHandlesTableSwitchConstant.testSwitch 100 150 > N/A avgt 30 4.121 � 0.020 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 0 > N/A avgt 30 4.113 � 0.009 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 150 > N/A avgt 30 4.149 � 0.041 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 0 > N/A avgt 30 4.121 � 0.026 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 150 > N/A avgt 30 4.113 � 0.021 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 0 > N/A avgt 30 4.129 � 0.028 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 150 > N/A avgt 30 4.105 � 0.019 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 0 > N/A avgt 30 4.097 � 0.021 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 150 > N/A avgt 30 4.131 � 0.037 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 0 > N/A avgt 30 4.135 � 0.025 ms/op > MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 150 > N/A avgt 30 4.139 � 0.145 ms/op > MethodHandlesTableSwitchRandom.testSwitch 5 0 > true avgt 30 4.894 � 0.028 ms/op > MethodHandlesTableSwitchRandom.testSwitch 5 0 > false avgt 30 11.526 � 0.194 ms/op > MethodHandlesTableSwitchRandom.testSwitch 5 150 > true avgt 30 4.882 � 0.025 ms/op > MethodHandlesTableSwitchRandom.testSwitch 5 150 > false avgt 30 11.532 � 0.034 ms/op > MethodHandlesTableSwitchRandom.testSwitch 10 0 > true avgt 30 5.065 � 0.076 ms/op > MethodHandlesTableSwitchRandom.testSwitch 10 0 > false avgt 30 13.016 � 0.020 ms/op > MethodHandlesTableSwitchRandom.testSwitch 10 150 > true avgt 30 5.103 � 0.051 ms/op > MethodHandlesTableSwitchRandom.testSwitch 10 150 > false avgt 30 12.984 � 0.102 ms/op > MethodHandlesTableSwitchRandom.testSwitch 25 0 > true avgt 30 8.441 � 0.165 ms/op > MethodHandlesTableSwitchRandom.testSwitch 25 0 > false avgt 30 13.371 � 0.060 ms/op > MethodHandlesTableSwitchRandom.testSwitch 25 150 > true avgt 30 8.628 � 0.032 ms/op > MethodHandlesTableSwitchRandom.testSwitch 25 150 > false avgt 30 13.542 � 0.020 ms/op > MethodHandlesTableSwitchRandom.testSwitch 50 0 > true avgt 30 4.701 � 0.015 ms/op > MethodHandlesTableSwitchRandom.testSwitch 50 0 > false avgt 30 13.562 � 0.063 ms/op > MethodHandlesTableSwitchRandom.testSwitch 50 150 > true avgt 30 7.991 � 3.111 ms/op > MethodHandlesTableSwitchRandom.testSwitch 50 150 > false avgt 30 13.543 � 0.088 ms/op > MethodHandlesTableSwitchRandom.testSwitch 100 0 > true avgt 30 4.712 � 0.020 ms/op > MethodHandlesTableSwitchRandom.testSwitch 100 0 > false avgt 30 13.600 � 0.085 ms/op > MethodHandlesTableSwitchRandom.testSwitch 100 150 > true avgt 30 4.676 � 0.011 ms/op > MethodHandlesTableSwitchRandom.testSwitch 100 150 > false avgt 30 13.476 � 0.043 ms/op > > > Testing: > - [x] Running of included benchmarks > - [x] Inspecting inlining trace and verifying method handle targets are > inlined > - [x] Running TestTableSwitch test (currently the only user of the new code) > - [x] Running java/lang/invoke tests (just in case) > - [x] Some manual testing > > Thanks, > Jorn > you have two strategy to de-sugar a switch, > > * if what you do after the case do not mutate any variables, you can desugar > each case to a method more or less like a lambda (it's not exactly like a > lambda because there is no capture) and you have an indy in front that will > call the right method handles > > * you have a front end, with an indy that associate the object to an int and > a backend which is tableswitch in the bytecode > > ... > > The tests above are using the first strategy No, they are using the second strategy. The SwitchBootstraps patch I linked to replaces the front end `lookupswitch` of a String switch with an `invokedynamic` that computes an index for the back end jump table, which is still a `tableswitch` in the bytecode. As John also described, a hypothetical lookupSwitch combinator can be emulated by using a `k -> [0, N)` projection that feeds into the tableSwitch combinator that is proposed by this PR. The point of the examples I linked to was to show several flavors of projection functions as an example of how this could be implemented, and to show that they have competitive performance with a native `lookupswitch` instruction (the 'legacy' case). i.e. the benchmarks show the difference between `lookupswitch` implemented in bytecode, and a `k -> [0, N)` projection function built by an `invokedynamic`. (sorry, I should have offered more explanation in the first place) The combinator added by _this_ PR is not meant to replace any part of the String switch translation. For pattern switch the `tableSwitch` combinator _could_ be used to implement the front end `k -> [0, N)` projection, but it is not strictly required. Either way, that seems orthogonal to this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/3401