On 09/04/2021 18:54, Remi Forax wrote:
----- Mail original -----
De: "Jorn Vernee" <jver...@openjdk.java.net>
À: "core-libs-dev" <core-libs-dev@openjdk.java.net>
Envoyé: Vendredi 9 Avril 2021 12:51:53
Objet: RFR: 8263087: Add a MethodHandle combinator that switches over a set of
MethodHandles
This patch adds a `tableSwitch` combinator that can be used to switch over a set
of method handles given an index, with a fallback in case the index is out of
bounds, much like the `tableswitch` bytecode.
The combinator does not support specifying the starting index, so the switch
cases always run from 0 to however many target handles are specified. A
starting index can be added manually with another combination step that filters
the input index by adding or subtracting a constant from it, which does not
affect performance. One of the reasons for not supporting a starting index is
that it allows for more lambda form sharing, but also simplifies the
implementation somewhat. I guess an open question is if a convenience overload
should be added for that case?
I think the combinator should be lookupswitch which is more general than
tableswitch with a special case when generating the bytecode to generate a
tableswitch instead of a lookupswitch if the indexes are subsequent.
One of the bigger downsides I see in supporting lookupswitch directly is
that the lambda form and intrinsified bytecode become dependent on the
key set, which allows for less sharing. Something that is not/less of a
problem with tableswitch + filter function, because the filter function
could potentially be the same for any key set (where the key set is
bound to the filter function instead).
Lookup switch can also be simulated by filtering the input through an injection
function that translates it into a case index, which has also proven to have
the ability to have comparable performance to, or even better performance than,
a bytecode-native `lookupswitch` instruction. I plan to add such an injection
function to the runtime libraries in the future as well. Maybe at that point it
could be evaluated if it's worth it to add a lookup switch combinator as well,
but I don't see an immediate need to include it in this patch.
As i said in the bug when we discuss about that the filtering function,
i believe that the filtering function for emulating lookupswitch is
lookupswitch itself.
Right, but lookupswitch also ties us into C2's optimization strategy for
lookupswitch. Having the ability to specify the filter function allows
picking a better one for the particular use-case. For instance for
switches with a large-ish number of cases (15+) it's faster to use a
HashMap lookup as a filtering function (according to my benchmarking),
with comparinble results to native lookupswitch if the filter function
uses a tree of if/else.
Though, I'm not saying that it's not worth it to add a lookupswitch
combinator as well, to me it seems like tableswitch is the more
flexible/minimal primitive, because it doesn't force the use of a
particular lookup strategy.
WRT picking the translation strategy based on the set of keys; I'm note
super keen on that. Since the MethodHandle combinators are a low-level
API, I ended up adopting a simple 'what you see is what you get'
philosophy as much as possible, with the possibility of building other
use-cases on top. i.e. a tableSwitch combinator that reliably translates
into the tableswitch bytecode, a lookupSwitch combinator that reliably
translates into the lookupswitch bytecode, and an exception if I get the
key set wrong, rather than silently switching strategies to one or the
other.
The current bytecode intrinsification generates a call for each switch case,
which guarantees full inlining of the target method handles. Alternatively we
could only have 1 callsite at the end of the switch, where each case just loads
the target method handle, but currently this does not allow for inlining of the
handles, since they are not constant.
This scheme also allows to never JIT compile a branch which is never used.
Yes, that's a good point, thanks.
Thanks for the input,
Jorn
Maybe a future C2 optimization could look at the receiver input for invokeBasic
call sites, and if the input is a phi node, clone the call for each constant
input of the phi. I believe that would allow simplifying the bytecode without
giving up on inlining.
Some numbers from the added benchmarks:
Benchmark (numCases) (offset) (sorted)
Mode Cnt Score Error Units
MethodHandlesTableSwitchConstant.testSwitch 5 0 N/A
avgt 30 4.186 � 0.054 ms/op
MethodHandlesTableSwitchConstant.testSwitch 5 150 N/A
avgt 30 4.164 � 0.057 ms/op
MethodHandlesTableSwitchConstant.testSwitch 10 0 N/A
avgt 30 4.124 � 0.023 ms/op
MethodHandlesTableSwitchConstant.testSwitch 10 150 N/A
avgt 30 4.126 � 0.025 ms/op
MethodHandlesTableSwitchConstant.testSwitch 25 0 N/A
avgt 30 4.137 � 0.042 ms/op
MethodHandlesTableSwitchConstant.testSwitch 25 150 N/A
avgt 30 4.113 � 0.016 ms/op
MethodHandlesTableSwitchConstant.testSwitch 50 0 N/A
avgt 30 4.118 � 0.028 ms/op
MethodHandlesTableSwitchConstant.testSwitch 50 150 N/A
avgt 30 4.127 � 0.019 ms/op
MethodHandlesTableSwitchConstant.testSwitch 100 0 N/A
avgt 30 4.116 � 0.013 ms/op
MethodHandlesTableSwitchConstant.testSwitch 100 150 N/A
avgt 30 4.121 � 0.020 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 0 N/A
avgt 30 4.113 � 0.009 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 150 N/A
avgt 30 4.149 � 0.041 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 0 N/A
avgt 30 4.121 � 0.026 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 150 N/A
avgt 30 4.113 � 0.021 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 0 N/A
avgt 30 4.129 � 0.028 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 150 N/A
avgt 30 4.105 � 0.019 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 0 N/A
avgt 30 4.097 � 0.021 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 150 N/A
avgt 30 4.131 � 0.037 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 0 N/A
avgt 30 4.135 � 0.025 ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 150 N/A
avgt 30 4.139 � 0.145 ms/op
MethodHandlesTableSwitchRandom.testSwitch 5 0 true
avgt 30 4.894 � 0.028 ms/op
MethodHandlesTableSwitchRandom.testSwitch 5 0 false
avgt 30 11.526 � 0.194 ms/op
MethodHandlesTableSwitchRandom.testSwitch 5 150 true
avgt 30 4.882 � 0.025 ms/op
MethodHandlesTableSwitchRandom.testSwitch 5 150 false
avgt 30 11.532 � 0.034 ms/op
MethodHandlesTableSwitchRandom.testSwitch 10 0 true
avgt 30 5.065 � 0.076 ms/op
MethodHandlesTableSwitchRandom.testSwitch 10 0 false
avgt 30 13.016 � 0.020 ms/op
MethodHandlesTableSwitchRandom.testSwitch 10 150 true
avgt 30 5.103 � 0.051 ms/op
MethodHandlesTableSwitchRandom.testSwitch 10 150 false
avgt 30 12.984 � 0.102 ms/op
MethodHandlesTableSwitchRandom.testSwitch 25 0 true
avgt 30 8.441 � 0.165 ms/op
MethodHandlesTableSwitchRandom.testSwitch 25 0 false
avgt 30 13.371 � 0.060 ms/op
MethodHandlesTableSwitchRandom.testSwitch 25 150 true
avgt 30 8.628 � 0.032 ms/op
MethodHandlesTableSwitchRandom.testSwitch 25 150 false
avgt 30 13.542 � 0.020 ms/op
MethodHandlesTableSwitchRandom.testSwitch 50 0 true
avgt 30 4.701 � 0.015 ms/op
MethodHandlesTableSwitchRandom.testSwitch 50 0 false
avgt 30 13.562 � 0.063 ms/op
MethodHandlesTableSwitchRandom.testSwitch 50 150 true
avgt 30 7.991 � 3.111 ms/op
MethodHandlesTableSwitchRandom.testSwitch 50 150 false
avgt 30 13.543 � 0.088 ms/op
MethodHandlesTableSwitchRandom.testSwitch 100 0 true
avgt 30 4.712 � 0.020 ms/op
MethodHandlesTableSwitchRandom.testSwitch 100 0 false
avgt 30 13.600 � 0.085 ms/op
MethodHandlesTableSwitchRandom.testSwitch 100 150 true
avgt 30 4.676 � 0.011 ms/op
MethodHandlesTableSwitchRandom.testSwitch 100 150 false
avgt 30 13.476 � 0.043 ms/op
Testing:
- [x] Running of included benchmarks
- [x] Inspecting inlining trace and verifying method handle targets are inlined
- [x] Running TestTableSwitch test (currently the only user of the new code)
- [x] Running java/lang/invoke tests (just in case)
- [x] Some manual testing
Thanks,
Jorn
-------------
Commit messages:
- Improve test
- Touchup
- Use cases array + holder
- WIP - implement tableSwitch combinator in lambda form interpreter
Changes: https://git.openjdk.java.net/jdk/pull/3401/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3401&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8263087
Stats: 959 lines in 8 files changed: 955 ins; 0 del; 4 mod
Patch: https://git.openjdk.java.net/jdk/pull/3401.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3401/head:pull/3401
PR: https://git.openjdk.java.net/jdk/pull/3401