On Fri, 27 Aug 2021 13:12:33 GMT, Maurizio Cimadamore <[email protected]>
wrote:
>> Mandy Chung has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Remove separate accessor for static vs instance method
>>
>> There is no effective difference when using MethodHandle::dropArgument for
>> static method. Removing Static*Accessor and Instance*Accessor simplifies
>> the implementation.
>
> src/java.base/share/classes/jdk/internal/reflect/DirectConstructorAccessorImpl.java
> line 106:
>
>> 104: Object invokeImpl(Object[] args) throws Throwable {
>> 105: var mhInvoker = mhInvoker();
>> 106: return switch (paramCount) {
>
> As @plevart observed, I'm also a bit surprised to see this, but I note your
> comments regarding performance - especially in cold case. Every adaptation
> counts, I guess, if you're not in the hot path. But let's make sure that the
> pay off to add the extra hand-specialized cases is worth it - I'd be
> surprised if spreading arguments is that expensive.
While I recall it was motivated primarily by startup (note the startup numbers for
`-Djdk.reflect.useSpreader=true` in @mlchung's reply earlier in this PR), the
specialization to avoid the spreader for low-arity arguments also improves
performance on throughput microbenchmarks. Surprisingly, it also reduces the
per-invocation allocation rate.
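To illustrate the idea (a minimal sketch only, not the PR's actual code; the class and
field names are hypothetical and it assumes a static target method for brevity): a
switch over the parameter count routes low-arity calls through fixed-arity
`invokeExact` calls, while higher arities fall back to an `asSpreader`-adapted handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

// Hypothetical sketch, not the PR's code. The accessor keeps the target handle
// with all types erased to Object, plus a spreader-adapted variant, and selects
// a fixed-arity invokeExact for low arities so no Object[] has to be spread.
final class SpecializedAccessorSketch {
    private final int paramCount;
    private final MethodHandle target;    // (Object, ..., Object) -> Object
    private final MethodHandle spreader;  // (Object[]) -> Object

    SpecializedAccessorSketch(Method m) throws IllegalAccessException {
        MethodHandle mh = MethodHandles.lookup().unreflect(m);
        this.paramCount = m.getParameterCount();
        this.target = mh.asType(mh.type().generic());            // erase to Object
        this.spreader = target.asSpreader(Object[].class, paramCount);
    }

    Object invoke(Object[] args) throws Throwable {
        return switch (paramCount) {
            case 0  -> target.invokeExact();
            case 1  -> target.invokeExact(args[0]);
            case 2  -> target.invokeExact(args[0], args[1]);
            case 3  -> target.invokeExact(args[0], args[1], args[2]);
            default -> spreader.invokeExact(args);  // spread for higher arities
        };
    }
}
```

Going through `invokeExact` on a handle with an exact erased type keeps the specialized
arms free of both argument spreading and `asType` adaptation on the invocation path.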
Allocation profiling suggests that with the specialization the varargs array is
entirely eliminated. This is of course somewhat fragile and dependent on a number of
things, and might ultimately be an artifact of a rather synthetic microbenchmark that
will have little benefit in practice, but it is a rather consistent gain for various
numbers of arguments, even when actually going through a spreader (I have some
hypotheses as to why this might happen; most likely the selector helps the JIT narrow
things down for each kind of invocation scheme):
18 baseline
Benchmark                                                  Mode  Cnt    Score   Error  Units
ReflectionMethods.static_method                            avgt   10   57.329 ± 4.217  ns/op
ReflectionMethods.static_method:·gc.alloc.rate.norm        avgt   10   72.006 ± 0.001   B/op
ReflectionMethods.static_method_3arg                       avgt   10   70.940 ± 7.457  ns/op
ReflectionMethods.static_method_3arg:·gc.alloc.rate.norm   avgt   10   96.008 ± 0.002   B/op
ReflectionMethods.static_method_4arg                       avgt   10   90.453 ± 4.252  ns/op
ReflectionMethods.static_method_4arg:·gc.alloc.rate.norm   avgt   10  112.010 ± 0.001   B/op
pull/5027
Benchmark                                                  Mode  Cnt    Score   Error  Units
ReflectionMethods.static_method                            avgt   10   41.518 ± 2.444  ns/op
ReflectionMethods.static_method:·gc.alloc.rate.norm        avgt   10   48.004 ± 0.001   B/op
ReflectionMethods.static_method_3arg                       avgt   10   57.603 ± 3.299  ns/op
ReflectionMethods.static_method_3arg:·gc.alloc.rate.norm   avgt   10   64.006 ± 0.001   B/op
ReflectionMethods.static_method_4arg                       avgt   10   56.772 ± 3.971  ns/op
ReflectionMethods.static_method_4arg:·gc.alloc.rate.norm   avgt   10   80.007 ± 0.001   B/op
With a patch that removes the specialization on top of pull/5027, performance reverts
to numbers very similar to the baseline:
remove-spreader-patch:
Benchmark                                                  Mode  Cnt    Score   Error  Units
ReflectionMethods.static_method                            avgt   10   56.644 ± 4.322  ns/op
ReflectionMethods.static_method:·gc.alloc.rate.norm        avgt   10   72.006 ± 0.001   B/op
ReflectionMethods.static_method_3arg                       avgt   10   72.353 ± 6.815  ns/op
ReflectionMethods.static_method_3arg:·gc.alloc.rate.norm   avgt   10   96.008 ± 0.001   B/op
ReflectionMethods.static_method_4arg                       avgt   10   82.820 ± 8.303  ns/op
ReflectionMethods.static_method_4arg:·gc.alloc.rate.norm   avgt   10  112.009 ± 0.002   B/op
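For context on how numbers of this shape are produced (a hypothetical sketch; the
actual ReflectionMethods benchmark source is not shown here): a JMH benchmark in
average-time mode, run with the gc profiler (`-prof gc`), yields the ns/op rows plus
the `·gc.alloc.rate.norm` rows, the latter being bytes allocated per invocation.

```java
import java.lang.reflect.Method;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Hypothetical benchmark shape, not the actual ReflectionMethods source.
// Run with "-prof gc" to also get the gc.alloc.rate.norm (B/op) rows.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class ReflectionMethodsSketch {

    public static int threeArgs(int a, int b, int c) { return a + b + c; }

    Method method3;
    Object[] args3;

    @Setup
    public void setup() throws Exception {
        method3 = ReflectionMethodsSketch.class
                .getDeclaredMethod("threeArgs", int.class, int.class, int.class);
        args3 = new Object[] { 1, 2, 3 };
    }

    @Benchmark
    public Object static_method_3arg() throws Exception {
        // Reflective call; allocations on the reflective path (argument copying,
        // return-value boxing) show up in gc.alloc.rate.norm.
        return method3.invoke(null, args3);
    }
}
```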
I think the cold-start reduction alone is worth the extra hundred lines or so of code,
and if the gain on these microbenchmarks translates to real throughput gains then the
complexity is more than paid for. Simplifying it while retaining similar
characteristics would of course be great, but I think such an exploration should be
done as a follow-up.
-------------
PR: https://git.openjdk.java.net/jdk/pull/5027