On Fri, 9 Apr 2021 18:27:07 GMT, Jorn Vernee <jver...@openjdk.org> wrote:
>> This patch adds a `tableSwitch` combinator that can be used to switch over a >> set of method handles given an index, with a fallback in case the index is >> out of bounds, much like the `tableswitch` bytecode. Here is a description >> of how it works (copied from the javadoc): >> >> Creates a table switch method handle, which can be used to switch over >> a set of target >> method handles, based on a given target index, called selector. >> >> For a selector value of {@code n}, where {@code n} falls in the range >> {@code [0, N)}, >> and where {@code N} is the number of target method handles, the table >> switch method >> handle will invoke the n-th target method handle from the list of >> target method handles. >> >> For a selector value that does not fall in the range {@code [0, N)}, >> the table switch >> method handle will invoke the given fallback method handle. >> >> All method handles passed to this method must have the same type, with >> the additional >> requirement that the leading parameter be of type {@code int}. The >> leading parameter >> represents the selector. >> >> Any trailing parameters present in the type will appear on the returned >> table switch >> method handle as well. Any arguments assigned to these parameters will >> be forwarded, >> together with the selector value, to the selected method handle when >> invoking it. >> >> The combinator does not support specifying the starting index, so the switch >> cases always run from 0 to however many target handles are specified. A >> starting index can be added manually with another combination step that >> filters the input index by adding or subtracting a constant from it, which >> does not affect performance. One of the reasons for not supporting a >> starting index is that it allows for more lambda form sharing, but also >> simplifies the implementation somewhat. I guess an open question is if a >> convenience overload should be added for that case? >> >> Lookup switch can also be simulated by filtering the input through an >> injection function that translates it into a case index, which has also >> proven to have the ability to have comparable performance to, or even better >> performance than, a bytecode-native `lookupswitch` instruction. I plan to >> add such an injection function to the runtime libraries in the future as >> well. Maybe at that point it could be evaluated if it's worth it to add a >> lookup switch combinator as well, but I don't see an immediate need to >> include it in this patch. >> >> The current bytecode intrinsification generates a call for each switch case, >> which guarantees full inlining of the target method handles. Alternatively >> we could only have 1 callsite at the end of the switch, where each case just >> loads the target method handle, but currently this does not allow for >> inlining of the handles, since they are not constant. >> >> Maybe a future C2 optimization could look at the receiver input for >> invokeBasic call sites, and if the input is a phi node, clone the call for >> each constant input of the phi. I believe that would allow simplifying the >> bytecode without giving up on inlining. >> >> Some numbers from the added benchmarks: >> Benchmark (numCases) (offset) >> (sorted) Mode Cnt Score Error Units >> MethodHandlesTableSwitchConstant.testSwitch 5 0 >> N/A avgt 30 4.186 � 0.054 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 5 150 >> N/A avgt 30 4.164 � 0.057 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 10 0 >> N/A avgt 30 4.124 � 0.023 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 10 150 >> N/A avgt 30 4.126 � 0.025 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 25 0 >> N/A avgt 30 4.137 � 0.042 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 25 150 >> N/A avgt 30 4.113 � 0.016 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 50 0 >> N/A avgt 30 4.118 � 0.028 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 50 150 >> N/A avgt 30 4.127 � 0.019 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 100 0 >> N/A avgt 30 4.116 � 0.013 ms/op >> MethodHandlesTableSwitchConstant.testSwitch 100 150 >> N/A avgt 30 4.121 � 0.020 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 0 >> N/A avgt 30 4.113 � 0.009 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 150 >> N/A avgt 30 4.149 � 0.041 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 0 >> N/A avgt 30 4.121 � 0.026 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 150 >> N/A avgt 30 4.113 � 0.021 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 0 >> N/A avgt 30 4.129 � 0.028 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 150 >> N/A avgt 30 4.105 � 0.019 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 0 >> N/A avgt 30 4.097 � 0.021 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 150 >> N/A avgt 30 4.131 � 0.037 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 0 >> N/A avgt 30 4.135 � 0.025 ms/op >> MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 150 >> N/A avgt 30 4.139 � 0.145 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 5 0 >> true avgt 30 4.894 � 0.028 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 5 0 >> false avgt 30 11.526 � 0.194 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 5 150 >> true avgt 30 4.882 � 0.025 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 5 150 >> false avgt 30 11.532 � 0.034 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 10 0 >> true avgt 30 5.065 � 0.076 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 10 0 >> false avgt 30 13.016 � 0.020 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 10 150 >> true avgt 30 5.103 � 0.051 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 10 150 >> false avgt 30 12.984 � 0.102 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 25 0 >> true avgt 30 8.441 � 0.165 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 25 0 >> false avgt 30 13.371 � 0.060 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 25 150 >> true avgt 30 8.628 � 0.032 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 25 150 >> false avgt 30 13.542 � 0.020 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 50 0 >> true avgt 30 4.701 � 0.015 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 50 0 >> false avgt 30 13.562 � 0.063 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 50 150 >> true avgt 30 7.991 � 3.111 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 50 150 >> false avgt 30 13.543 � 0.088 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 100 0 >> true avgt 30 4.712 � 0.020 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 100 0 >> false avgt 30 13.600 � 0.085 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 100 150 >> true avgt 30 4.676 � 0.011 ms/op >> MethodHandlesTableSwitchRandom.testSwitch 100 150 >> false avgt 30 13.476 � 0.043 ms/op >> >> Testing: >> - [x] Running of included benchmarks >> - [x] Inspecting inlining trace and verifying method handle targets are >> inlined >> - [x] Running TestTableSwitch test (currently the only user of the new code) >> - [x] Running java/lang/invoke tests (just in case) >> - [x] Some manual testing >> >> Thanks, >> Jorn > >> yes, for all the switches, pattern-switch, enum-switch but not for the >> string switch which requires a lookup switch. > Can you outline how to use the tableswitch combinator in the case of a switch > on strings ? > > Jan Lahoda has made several good examples of that: > https://github.com/lahodaj/jdk/blob/switch-bootstrap/src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java > i.e. several filtering strategies for translating a String into a table > index (which can then be fed to `tableswitch`) > > I ran some benchmarks: > > ![results](http://cr.openjdk.java.net/~jvernee/StringSwitch_10.svg) > > Here, 'legacy' is what C2 does with `lookupswitch`. > > Maybe it's worth it to include such an example & benchmark in this patch as > well (show off how to emulate `lookupswitch`) I've uploaded a benchmark that simulates a lookup switch using the tableSwitch combinator as well, using a HashMap lookup as a filter: https://github.com/openjdk/jdk/commit/a7157eb0ef1b7190aabfb2689539801c6692bbae For that particular key set (the same as from the graph above), HashMap is faster: Benchmark Mode Cnt Score Error Units MethodHandlesEmulateLookupSwitch.emulatedLookupSwitch avgt 30 19.450 � 0.079 ms/op MethodHandlesEmulateLookupSwitch.nativeLookupSwitch avgt 30 25.370 � 0.159 ms/op But, I've found it really depends on the key set. However, this is mostly to demonstrate that emulation can have competitive performance with native `lookupswitch`. e.g. to get constant folding for constant inputs another filter has to be used, since C2 can not see through the HashMap lookups. ------------- PR: https://git.openjdk.java.net/jdk/pull/3401