Duncan, thanks a lot for giving it a try!
If you plan to spend more time on it, please apply 8068915 [1] as well. I
saw huge intermittent performance regressions due to a continuous
deoptimization storm. You can look into the -XX:+LogCompilation output and
look for repeated deoptimization events in steady state w/ Action_none.
Also, there are deoptimization statistics in the log (at least in jdk9).
They are located right before the compilation_log tag.
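A rough sketch of how one might tally such events from the log, in case it
helps. It assumes (not verified against every JDK build) that runtime
deoptimization events appear as <uncommon_trap .../> elements carrying
thread='...', reason='...' and action='none' attributes; the class name
DeoptScan is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeoptScan {
    // Assumed attribute syntax: reason='unstable_if' (single-quoted values).
    private static final Pattern REASON = Pattern.compile("reason='([^']*)'");

    // Counts runtime uncommon-trap events with action='none', grouped by reason.
    // Lines without a thread='...' attribute are skipped, on the assumption that
    // they describe traps emitted at compile time rather than events at run time.
    static Map<String, Integer> countActionNone(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (!line.contains("<uncommon_trap")
                    || !line.contains("thread='")
                    || !line.contains("action='none'")) {
                continue;
            }
            Matcher m = REASON.matcher(line);
            String reason = m.find() ? m.group(1) : "unknown";
            counts.merge(reason, 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding it the lines of the log file (for example via Files.readAllLines)
and watching whether the same reason keeps growing in steady state should
be enough to spot a storm.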
Thanks again for the valuable feedback!
Best regards,
Vladimir Ivanov
[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00
On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote:
Okay, I've done some tests of this with the micro benchmarks for our
language & runtime which show pretty much no change except for one test
which is now almost 3x slower. It uses nested loops to iterate over an
array and concatenate the string-like objects it contains, and replaces
elements with these new longer string-like objects. It's a bit of a
pathological case, and I haven't seen the same sort of degradation in the
other benchmarks or in real applications, but I haven't done serious
benchmarking of them with this change.
I shall see if the test case can be reduced to anything simpler while
still showing the same performance behaviour, and try adding some compilation
logging options to narrow down what's going on.
Duncan.
On 16/01/2015 17:16, "Vladimir Ivanov" <vladimir.x.iva...@oracle.com>
wrote:
http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8063137
After GuardWithTest (GWT) LambdaForms became shared, profile pollution
significantly distorted compilation decisions. It affected inlining and
hindered some optimizations. It causes significant performance
regressions for Nashorn (on Octane benchmarks).
Inlining was fixed by 8059877 [1], but it didn't cover the case when a
branch is never taken. That case can cause missed optimization
opportunities, and not just an increase in code size. For example, a
non-pruned branch can break escape analysis.
Currently, there are 2 problems:
- branch frequencies profile pollution
- deoptimization counts pollution
Branch frequency pollution hides from JIT the fact that a branch is
never taken. Since GWT LambdaForms (and hence their bytecode) are
heavily shared, but the behavior is specific to each MethodHandle, there's
no way for JIT to understand how a particular GWT instance behaves.
The solution I propose is to do profiling in Java code and feed it to
JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where
profiling info is stored. Once JIT kicks in, it can retrieve these
counts if the corresponding MethodHandle is a compile-time constant (and
that is usually the case). To communicate the profile data from Java code
to JIT, MethodHandleImpl::profileBranch() is used.
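To illustrate the idea of per-instance counts (this is NOT the code from the
webrev, just an application-level sketch using public java.lang.invoke API),
one can wrap the test of a guardWithTest handle so that every outcome bumps
a private int[2]. The class name ProfiledGuard, its methods, and the
counts[0]/counts[1] layout are assumptions made for the example; the real
patch hands the counts to JIT through MethodHandleImpl::profileBranch(),
which has no public equivalent.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ProfiledGuard {
    // Assumed layout: counts[0] = test was false, counts[1] = test was true.
    final int[] counts = new int[2];

    // Records the outcome of the test and passes it through unchanged.
    boolean profile(boolean result) {
        counts[result ? 1 : 0]++;
        return result;
    }

    // Builds a GWT handle whose test outcome is counted per instance.
    MethodHandle guard(MethodHandle test, MethodHandle target, MethodHandle fallback)
            throws ReflectiveOperationException {
        MethodHandle profiler = MethodHandles.lookup()
                .findVirtual(ProfiledGuard.class, "profile",
                        MethodType.methodType(boolean.class, boolean.class))
                .bindTo(this);
        MethodHandle profiledTest = MethodHandles.filterReturnValue(test, profiler);
        return MethodHandles.guardWithTest(profiledTest, target, fallback);
    }

    // Demo: the test (String.isEmpty) is never true, so counts[1] stays 0.
    static int[] demo() {
        try {
            MethodHandle test = MethodHandles.lookup().findVirtual(
                    String.class, "isEmpty", MethodType.methodType(boolean.class));
            MethodHandle target = MethodHandles.dropArguments(
                    MethodHandles.constant(String.class, "<empty>"), 0, String.class);
            MethodHandle fallback = MethodHandles.identity(String.class);

            ProfiledGuard p = new ProfiledGuard();
            MethodHandle gwt = p.guard(test, target, fallback);
            for (String s : new String[] {"a", "b", "c"}) {
                String r = (String) gwt.invokeExact(s); // always takes the fallback
            }
            return p.counts;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The point is that the counts belong to the handle instance rather than to
the shared bytecode, so a never-taken branch stays visible even when many
GWT handles run through the same LambdaForm.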
If a GWT MethodHandle isn't a compile-time constant, profiling should
proceed. This happens when the corresponding LambdaForm is already shared:
for newly created GWT MethodHandles, profiling can occur only in native code
(a dedicated nmethod for a single LambdaForm). So, when compilation of the
whole MethodHandle chain is triggered, the profile should already be
gathered.
Overriding branch frequencies is not enough. Statistics on
deoptimization events are also polluted. Even if a branch is never taken,
JIT issues an uncommon trap there only if the corresponding bytecode
hasn't trapped too much and hasn't caused too many recompiles.
I added @IgnoreProfile and placed it only on GWT LambdaForms. When JIT
sees it on some method, Compile::too_many_traps &
Compile::too_many_recompiles for that method always return false. It
allows JIT to prune the branch based on the custom profile and recompile
the method if the branch is visited.
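As a toy model of that decision (written in Java only for illustration; the
real checks live in HotSpot's C++ code, and the thresholds, names, and
signatures below are hypothetical), the effect of @IgnoreProfile can be
pictured like this:

```java
final class TrapPolicy {
    // Hypothetical thresholds, chosen only for the example.
    static final int TRAP_LIMIT = 4;
    static final int RECOMPILE_LIMIT = 10;

    // Models too_many_traps: always false for methods marked @IgnoreProfile.
    static boolean tooManyTraps(boolean ignoreProfile, int trapCount) {
        return !ignoreProfile && trapCount >= TRAP_LIMIT;
    }

    // Models too_many_recompiles in the same way.
    static boolean tooManyRecompiles(boolean ignoreProfile, int recompileCount) {
        return !ignoreProfile && recompileCount >= RECOMPILE_LIMIT;
    }

    // A never-taken branch is replaced by an uncommon trap only when
    // neither (possibly polluted) counter vetoes it.
    static boolean canPruneBranch(boolean ignoreProfile, int takenCount,
                                  int trapCount, int recompileCount) {
        return takenCount == 0
                && !tooManyTraps(ignoreProfile, trapCount)
                && !tooManyRecompiles(ignoreProfile, recompileCount);
    }
}
```

With polluted counters (say 100 traps recorded against the shared bytecode),
canPruneBranch(false, 0, 100, 0) is false while canPruneBranch(true, 0, 100, 0)
is true, which is the behavior change the annotation is meant to produce.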
For now, I wanted to keep the fix very focused. The next thing I plan to
do is to experiment with ignoring deoptimization counts for other
LambdaForms which are heavily shared. I already saw problems caused by
deoptimization counts pollution (see JDK-8068915 [2]).
I plan to backport the fix into 8u40, once I finish extensive
performance testing.
Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite,
Octane).
Thanks!
PS: as a summary, my experiments show that fixes for 8063137 & 8068915
[2] almost completely recover peak performance after LambdaForm sharing
[3]. There's one more problem left (non-inlined MethodHandle invocations
are more expensive when LFs are shared), but it's a story for another day.
Best regards,
Vladimir Ivanov
[1] https://bugs.openjdk.java.net/browse/JDK-8059877
8059877: GWT branch frequencies pollution due to LF sharing
[2] https://bugs.openjdk.java.net/browse/JDK-8068915
[3] https://bugs.openjdk.java.net/browse/JDK-8046703
JEP 210: LambdaForm Reduction and Caching
_______________________________________________
mlvm-dev mailing list
mlvm-...@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev