On Fri, 9 Apr 2021 18:27:07 GMT, Jorn Vernee <jver...@openjdk.org> wrote:

>> This patch adds a `tableSwitch` combinator that can be used to switch over a 
>> set of method handles given an index, with a fallback in case the index is 
>> out of bounds, much like the `tableswitch` bytecode. Here is a description 
>> of how it works (copied from the javadoc):
>> 
>>      Creates a table switch method handle, which can be used to switch over 
>> a set of target
>>      method handles, based on a given target index, called selector.
>> 
>>      For a selector value of {@code n}, where {@code n} falls in the range 
>> {@code [0, N)},
>>      and where {@code N} is the number of target method handles, the table 
>> switch method
>>      handle will invoke the n-th target method handle from the list of 
>> target method handles.
>> 
>>      For a selector value that does not fall in the range {@code [0, N)}, 
>> the table switch
>>      method handle will invoke the given fallback method handle.
>> 
>>      All method handles passed to this method must have the same type, with 
>> the additional
>>      requirement that the leading parameter be of type {@code int}. The 
>> leading parameter
>>      represents the selector.
>> 
>>      Any trailing parameters present in the type will appear on the returned 
>> table switch
>>      method handle as well. Any arguments assigned to these parameters will 
>> be forwarded,
>>      together with the selector value, to the selected method handle when 
>> invoking it.
>> 
>> The combinator does not support specifying the starting index, so the switch 
>> cases always run from 0 to however many target handles are specified. A 
>> starting index can be added manually with another combination step that 
>> filters the input index by adding or subtracting a constant from it, which 
>> does not affect performance. One of the reasons for not supporting a 
>> starting index is that it allows for more lambda form sharing, but also 
>> simplifies the implementation somewhat. I guess an open question is if a 
>> convenience overload should be added for that case?
>> 
>> Lookup switch can also be simulated by filtering the input through an 
>> injection function that translates it into a case index, which has also 
>> proven to have the ability to have comparable performance to, or even better 
>> performance than, a bytecode-native `lookupswitch` instruction. I plan to 
>> add such an injection function to the runtime libraries in the future as 
>> well. Maybe at that point it could be evaluated if it's worth it to add a 
>> lookup switch combinator as well, but I don't see an immediate need to 
>> include it in this patch.
>> 
>> The current bytecode intrinsification generates a call for each switch case, 
>> which guarantees full inlining of the target method handles. Alternatively 
>> we could only have 1 callsite at the end of the switch, where each case just 
>> loads the target method handle, but currently this does not allow for 
>> inlining of the handles, since they are not constant.
>> 
>> Maybe a future C2 optimization could look at the receiver input for 
>> invokeBasic call sites, and if the input is a phi node, clone the call for 
>> each constant input of the phi. I believe that would allow simplifying the 
>> bytecode without giving up on inlining.
>> 
>> Some numbers from the added benchmarks:
>> Benchmark                                        (numCases)  (offset)  
>> (sorted)  Mode  Cnt   Score   Error  Units
>> MethodHandlesTableSwitchConstant.testSwitch               5         0       
>> N/A  avgt   30   4.186 � 0.054  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch               5       150       
>> N/A  avgt   30   4.164 � 0.057  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              10         0       
>> N/A  avgt   30   4.124 � 0.023  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              10       150       
>> N/A  avgt   30   4.126 � 0.025  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              25         0       
>> N/A  avgt   30   4.137 � 0.042  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              25       150       
>> N/A  avgt   30   4.113 � 0.016  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              50         0       
>> N/A  avgt   30   4.118 � 0.028  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              50       150       
>> N/A  avgt   30   4.127 � 0.019  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch             100         0       
>> N/A  avgt   30   4.116 � 0.013  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch             100       150       
>> N/A  avgt   30   4.121 � 0.020  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch           5         0       
>> N/A  avgt   30   4.113 � 0.009  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch           5       150       
>> N/A  avgt   30   4.149 � 0.041  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          10         0       
>> N/A  avgt   30   4.121 � 0.026  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          10       150       
>> N/A  avgt   30   4.113 � 0.021  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          25         0       
>> N/A  avgt   30   4.129 � 0.028  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          25       150       
>> N/A  avgt   30   4.105 � 0.019  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          50         0       
>> N/A  avgt   30   4.097 � 0.021  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          50       150       
>> N/A  avgt   30   4.131 � 0.037  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch         100         0       
>> N/A  avgt   30   4.135 � 0.025  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch         100       150       
>> N/A  avgt   30   4.139 � 0.145  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5         0      
>> true  avgt   30   4.894 � 0.028  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5         0     
>> false  avgt   30  11.526 � 0.194  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5       150      
>> true  avgt   30   4.882 � 0.025  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5       150     
>> false  avgt   30  11.532 � 0.034  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10         0      
>> true  avgt   30   5.065 � 0.076  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10         0     
>> false  avgt   30  13.016 � 0.020  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10       150      
>> true  avgt   30   5.103 � 0.051  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10       150     
>> false  avgt   30  12.984 � 0.102  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25         0      
>> true  avgt   30   8.441 � 0.165  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25         0     
>> false  avgt   30  13.371 � 0.060  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25       150      
>> true  avgt   30   8.628 � 0.032  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25       150     
>> false  avgt   30  13.542 � 0.020  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50         0      
>> true  avgt   30   4.701 � 0.015  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50         0     
>> false  avgt   30  13.562 � 0.063  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50       150      
>> true  avgt   30   7.991 � 3.111  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50       150     
>> false  avgt   30  13.543 � 0.088  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100         0      
>> true  avgt   30   4.712 � 0.020  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100         0     
>> false  avgt   30  13.600 � 0.085  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100       150      
>> true  avgt   30   4.676 � 0.011  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100       150     
>> false  avgt   30  13.476 � 0.043  ms/op
>> 
>> Testing:
>> - [x] Running of included benchmarks
>> - [x] Inspecting inlining trace and verifying method handle targets are 
>> inlined
>> - [x] Running TestTableSwitch test (currently the only user of the new code)
>> - [x] Running java/lang/invoke tests (just in case)
>> - [x] Some manual testing
>> 
>> Thanks,
>> Jorn
>
>> yes, for all the switches, pattern-switch, enum-switch but not for the 
>> string switch which requires a lookup switch.
> Can you outline how to use the tableswitch combinator in the case of a switch 
> on strings ?
> 
> Jan Lahoda has made several good examples of that: 
> https://github.com/lahodaj/jdk/blob/switch-bootstrap/src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java
>  i.e. several filtering strategies for translating a String into a table 
> index (which can then be fed to `tableswitch`)
> 
> I ran some benchmarks:
> 
> ![results](http://cr.openjdk.java.net/~jvernee/StringSwitch_10.svg)
> 
> Here, 'legacy' is what C2 does with `lookupswitch`.
> 
> Maybe it's worth it to include such an example & benchmark in this patch as 
> well (show off how to emulate `lookupswitch`)

I've uploaded a benchmark that simulates a lookup switch using the tableSwitch 
combinator as well, using a HashMap lookup as a filter: 
https://github.com/openjdk/jdk/commit/a7157eb0ef1b7190aabfb2689539801c6692bbae

For that particular key set (the same as from the graph above), HashMap is 
faster:

Benchmark                                              Mode  Cnt   Score   
Error  Units
MethodHandlesEmulateLookupSwitch.emulatedLookupSwitch  avgt   30  19.450 � 
0.079  ms/op
MethodHandlesEmulateLookupSwitch.nativeLookupSwitch    avgt   30  25.370 � 
0.159  ms/op

But, I've found it really depends on the key set. However, this is mostly to 
demonstrate that emulation can have competitive performance with native 
`lookupswitch`. e.g. to get constant folding for constant inputs another filter 
has to be used, since C2 can not see through the HashMap lookups.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3401

Reply via email to