Re: [9] RFR (S) 8062280: C2: inlining failure due to access checks being too strict
Looks fine to me. Vladimir K On 3/23/15 5:27 PM, Vladimir Ivanov wrote: http://cr.openjdk.java.net/~vlivanov/8062280/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8062280 C2 inlining policy is too strict when it comes to inlining DMH linkers. The compiler performs access checks on target method and sometimes it breaks inlining. Such checks can be skipped since MemberNames are checked during construction. The fix is to disable access checks when inlining DMH linkers. Testing: regression test, java/lang/invoke tests, nashorn, octane. Best regards, Vladimir Ivanov ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (S): 8075263: MHI::checkCustomized isn't eliminated for inlined MethodHandles
Good. thanks, Vladimir K On 3/16/15 12:05 PM, John Rose wrote: Reviewed. — John On Mar 16, 2015, at 11:47 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8075263/webrev.00/hotspot http://cr.openjdk.java.net/~vlivanov/8075263/webrev.00/jdk https://bugs.openjdk.java.net/browse/JDK-8075263 ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (S): 8074548: Never-taken branches cause repeated deopts in MHs.GWT case
You already have 'result' variable for argument(0), why reread?: CmpINode(argument(0) Usually IfTrue is the fast path. IGVN will revert that check to have that. But it is better to have it in code. Current code looks like double negation. There is no explanation in changes why false_count never seen (== 0) is more important than true_count == 0. The comment says one of the values. Thanks, Vladimir On 3/16/15 11:26 AM, Vladimir Ivanov wrote: http://cr.openjdk.java.net/~vlivanov/8074548/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8074548 MethodHandleImpl::profileBoolean doesn't update never-taken branch count when hitting a deopt on it. As a result, for rarely taken branches consequent compilations consider them as never-taken and prune them again causing repeated deopts. It severely affects peak performance. The fix is to update MHI::profileBoolean intrinsic to insert a guard and uncommon trap w/ reexecute bit set for never-seen value. Once previously never seen value is encountered, the execution resumes after deopt in MHI::profileBoolean and corresponding count becomes non-zero. The guard doesn't add any additional overhead, since it dominates all value usages and all branches on the same value are eliminated. Testing: java/lang/invoke, nashorn, octane Best regards, Vladimir Ivanov ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Nice! At least Hotspot part since I don't understand jdk part :) I would suggest to add more detailed comment (instead of simple Stop profiling) to inline_profileBranch() intrinsic explaining what it is doing because it is not strictly intrinsic - it does not implement profileBranch() java code when counts is constant. You forgot to mark Opaque4Node as macro node. I would suggest to base it on Opaque2Node then you will get some methods from it. Thanks, Vladimir On 1/16/15 9:16 AM, Vladimir Ivanov wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9, 8u40] RFR (XXS): 8059880: Get rid of LambdaForm interpretation
As far as I can guess :) this change looks good. Reviewed. Thanks, Vladimir On 11/19/14 2:24 AM, Vladimir Ivanov wrote: Aleksey, Duncan, thanks for the review and the confirmation that it doesn't break stuff for you. Any Reviews, please? :-) Best regards, Vladimir Ivanov On 11/19/14, 2:23 PM, MacGregor, Duncan (GE Energy Management) wrote: On 18/11/2014 23:33, Aleksey Shipilev aleksey.shipi...@oracle.com wrote: On 11/19/2014 12:01 AM, Vladimir Ivanov wrote: http://cr.openjdk.java.net/~vlivanov/8059880/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8059880 Yes, for the love of God, GO FOR IT. Seconded. Startup of our stuff seems fine now with a compile threshold of zero, and it will make stacks so much easier to read in the debugger. :-) ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (S): 8058892: FILL_ARRAYS and ARRAYS are eagely initialized in MethodHandleImpl
Looks fine to me. Can you push it into our jdk9/hs-comp/jdk repo so we can see immediate results from it in our testing? Thanks, Vladimir On 10/3/14 7:12 AM, Vladimir Ivanov wrote: Updated version: http://cr.openjdk.java.net/~vlivanov/8058892/webrev.02/ Looks good. Thanks, Aleksey. Any capital-R volunteers to review this change? :-) Best regards, Vladimir Ivanov ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: The Great Startup Problem
JRuby loads about 4000 own classes (above 1000 of system classes) during execution of just '-e 1'. It is a lot of data to load, parse, verify. I played with CDS (Class Data Sharing) which includes jruby classes. We can do that since jruby.jar is on boot class path but it requires some manual steps. I got next data: jruby-1.7.13$ time bin/jruby -J-Xshare:off -e 1 real0m1.344s user0m3.688s sys 0m0.182s jruby-1.7.13$ time bin/jruby -J-Xshare:on -e 1 real0m1.010s user0m2.918s sys 0m0.153s With C1 it is even smaller: jruby-1.7.13$ time bin/jruby -J-Xshare:on -J-XX:TieredStopAtLevel=1 -e 1 real0m0.835s user0m1.164s sys 0m0.112s Regards, Vladimir K On 9/2/14 10:54 AM, Vladimir Ivanov wrote: Charlie, Is it acceptable and solves the problem for you? This is acceptable for JRuby. Our worst-case Ruby method handle chain will include at most: * Two CatchExceptions for pre/post logic (heap frames, etc). Perf of CatchException compared to literal Java try/catch is important here. * Up to two permute arguments for differing call site/target argument ordering. * Varargs negotiation (may be a couple handles) * GWT * SwitchPoint * For Ruby to Java calls, each argument plus the return value must be filtered to convert to/from Ruby types or apply an IRubyObject wrapper This is worst case, mind you. Most calls in the system will be arity-matched, eliminating the permutes. Most calls will be three or fewer arguments, eliminating varargs. Many calls will be optimized to no longer need a heap frame, eliminating the try/finally. The absolute minimum for any call would be SwitchPoint plus GWT. Of course I'm not counting DMHs here, since they're either the call we want to make or they're leaf logic. Thanks for the data! That's good! We discussed an idea to generate custom bytecodes (single method) for the whole method handle chain (and have only 1 extra stack frame per MH invocation), but it defeats memory footprint reduction we are trying to archieve with LambdaForm sharing. Funny thing...because indy slows our startup and increases our warmup time, we're using our old binding logic by default. And surprise surprise, our old binding logic does exactly this...one small generated invoker class per method. I'm sure you're right that this approach defeats the sharing and memory reduction we'd like to see from LFs, but it works *really* well if you're ok with the extra class and metaspace data in memory. I see one problem with pre-compiling method handle trees. Every tree should be compiled as a whole, so fast path and slow path are always compiled. Without explicit hints or profiling and recompilation it's impossible to distinguish them. Comparing with MethodHandle/LambdaForm compilation unit, where slow path usually stays interpreted on LF level (due to invocation threshold), for considerably large method handle trees memory overhead can be larger. But I'm just guessing here - I don't have any statistics yet neither on average size of method handle trees nor numbers on memory overhead induced by individual classes. So there's one question: is the cost of a bytecoded adapter shim for each method object really that high? Yes, if you're spinning new MHs constantly or doing a million different adaptations of a given method. But if you're just lazily creating an invoker shim once per method, that really doesn't seem like a big deal. Good question. I have a prototype of LF inlining during bytecode translation. I'll conduct some experiments to gather some data. My indy binding logic also has a dozen different flags for tweaking. I can easily modify it to avoid doing all that pre/post logic and argument permutation in the MH chain and just bind directly to the generated invoker. Best (or worst) of both worlds? I just really don't want to have to do that...I want everything from call site to target method body to be in the MH chain. For JRuby 9000, all try/finally logic will be within the target method, so at least that part of the MH chain goes away. Here's another idea... We've been using my InvokeBinder library heavily in JRuby. It provides a Java API/DSL for creating MH chains lazily from the top down: MethodHandle mh = Binder.from(String.class, Object.class, Float.class) .tryFinally(finallyLogic) .permute(1, 0) .append(Hello) .drop(1) .invokeStatic(MyClass.class, someMethod); The adaptations are gathered within the Binder instance, playing forward as you add adaptations and played backward at binding time to make the appropriate MethodHandles and MethodHandle calls. Duncan talked about how he was able to improve MH chain size and performance by applying certain transformations in a different order, among other things. InvokeBinder *could* be doing a lot more to optimize the MH chain. For example, the above case never uses the Object value passed in (it is permuted to position 1 and later dropped), but that fact is
Re: Studying LF performance
Hi Charlie, If you want to experiment :) you can try the code Roland and Christian pushed. Roland just pushed Incremental inlining changes for C2 which should help LF inlining: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d092d1b31229 You also need Christian's inlining related changes in JDK which : http://hg.openjdk.java.net/hsx/hotspot-main/jdk/rev/12fa4d7ecaf5 Regards, Vladimir On 12/23/12 11:21 AM, Charles Oliver Nutter wrote: A thread emerges! I'm going to be taking some time this holiday to explore the performance of the new LF indy impl in various situations. This will be the thread where I gather observations. A couple preliminaries... My perf exploration so far seems to show LF performing nearly equivalent to the old impl for the smallest benchmarks, with performance rapidly degrading as the size of the code involved grows. Recursive fib and tak have nearly identical perf on LF and the old impl. Red/black performs about the same on LF as with indy disabled, well behind the old indy performance. At some point, LF falls completely off the cliff and can't even compete with non-indy logic, as in a benchmark I ran today of Ruby constant access (heavily SwitchPoint-dependent). Discussions with Christian seem to indicate that the fall-off is because non-inlined LF indy call sites perform very poorly compared to the old impl. I'll be trying to explore this and correlate the perf cliff with failure to inline. Christian has told me that (upcoming?) work on incremental inlining will help reduce the performance impact of the fall-off, but I'm not sure of the status of this work. Some early ASM output from a trivial benchmark: loop 500M times calling #foo, which immediately calls #bar, which just returns the self object (ALOAD 2; ARETURN in essence). I've been comparing the new ASM to the old, both presented in a gist here: https://gist.github.com/4365103 As you can see, the code resulting from both impls boils down to almost nothing, but there's one difference... New code not present in old: 0x000111ab27ef: je 0x000111ab2835 ;*ifnull ; - java.lang.Class::cast@1 (line 3007) ; - java.lang.invoke.LambdaForm$MH/763053631::guard@12 ; - java.lang.invoke.LambdaForm$MH/518216626::linkToCallSite@14 ; - ruby.__dash_e__::method__0$RUBY$foo@3 (line 1) A side effect of inlining through LFs, I presume? Checking to ensure non-null call site? If so, shouldn't this have folded away, since the call site is constant? In any case, it's hardly damning to have an extra branch. This output is, at least, proof that LF *can* inline and optimize as well as the old impl...so we can put that aside for now. The questions to explore then are: * Do cases expected to inline actually do so under LF impl? * When inlining, does code optimize as it should (across the various shapes of call sites in JRuby, at least)? * When code does not inline, how does it impact performance? My expectation is that cases which should inline do so under LF, but that the non-inlined performance is significantly worse than under the old impl. The critical bit will be ensuring that even when LF call sites do not inline, they at least still compile to avoid interpretation and LF-to-LF overhead. At a minimum, it seems like we should be able to expect all LF between a call site and its DMH target will get compiled into a single unit, if not inlined into the caller. I still contend that call site + LFs should be heavily prioritized for inlining either into the caller or along with the called method, since they really *are* the shape of the call site. If there has to be a callq somewhere in that chain, there should ideally be only one. So...here we go. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Latest experiments...happiness and sadness
I am working on it. Vladimir On Oct 20, 2012, at 9:39 AM, Florian Weimer wrote: * Remi Forax: Even a simple code like the one below, there is no scalar replacement (OSR or not), Float f = new Float(0); for(int i=0; ilength; i++) { f = new Float(f + 1.0f); } float result = f; System.out.println(result); Last time I looked at this, scalar replacement and loop optimizations did not play along at all, and here you need both. Has this changed? ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (L): JDK changes for 7023639: JSR 292 method handle invocation needs a fast path for compiled code
John, Both webrevs point to jdk changes. Where are hotspot changes? Vladimir John Rose wrote: On Jul 13, 2012, at 2:41 AM, John Rose wrote: Here is that webrev: http://cr.openjdk.java.net/~jrose/7023639/webrev.jdk.00/ These are the changes to JDK code that accompany the JVM changes already under review. I have updated both webrevs to their final contents, as follows: http://cr.openjdk.java.net/~jrose/7023639/webrev.01/ http://cr.openjdk.java.net/~jrose/7023639/webrev.jdk.01/ In the same folder are incremental webrevs, for the record. Thanks to all reviewers for their help. — John P.S. If there's something you don't like in one of the files, let us know. As I noted before, we can (and will) roll more adjustments into the next push. ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (XXXL): 7023639: JSR 292 method handle invocation needs a fast path for compiled code
Nice. Vladimir Christian Thalinger wrote: We forgot to remove the Ricochet Frame code from the sA: http://cr.openjdk.java.net/~twisti/7023639/ -- Chris On Jul 19, 2012, at 11:28 AM, Christian Thalinger wrote: JDK testing found a small bug: diff --git a/src/share/vm/oops/cpCacheOop.cpp b/src/share/vm/oops/cpCacheOop.cpp --- a/src/share/vm/oops/cpCacheOop.cpp +++ b/src/share/vm/oops/cpCacheOop.cpp @@ -486,7 +486,8 @@ // virtual and final so _f2 contains method ptr instead of vtable index if (f2_as_vfinal_method() == old_method) { // match old_method so need an update - set_f2_as_vfinal_method(new_method); + // NOTE: can't use set_f2_as_vfinal_method as it asserts on different values + _f2 = (intptr_t)new_method; if (RC_TRACE_IN_RANGE(0x0010, 0x0040)) { if (!(*trace_name_printed)) { // RC_TRACE_MESG macro has an embedded ResourceMark I will integrate this one. -- Chris On Jul 18, 2012, at 7:10 PM, John Rose wrote: On Jul 17, 2012, at 4:04 PM, Christian Thalinger wrote: I see in several files next code pattern. Should we call throw_IncompatibleClassChangeError() as we do in other places?: + if (!EnableInvokeDynamic) { +// rewriter does not generate this bytecode +__ should_not_reach_here(); +return; + } Hmm. This really should not happen and EnableInvokeDynamic is on by default anyway. I doubt someone turns it off. It's now a diagnostic switch. It can be used to verify that an app. is not using 292, which is occasionally helpful. constantPoolOop.cpp: Why not use guarantee() for bad operants? Not sure. John? The code in that file uses assert; I think I'm following the existing practice there. The CP code operates after initial verification passes, so assert is suitable, I think. Why you need separate scopes in resolve_bootstrap_specifier_at_impl()? To keep oops from being visible. Might result in bad crashes. Yes. An alternative is to have the x_oop variables visible at top level, but to put in DEBUG_ONLY(x_oop = NULL). Having the temporary scoped seems cleaner to me, but maybe my sense of style is off here. I will refresh the review patch in mlvm. Thank you! Yes, thanks! — John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (XXXL): 7023639: JSR 292 method handle invocation needs a fast path for compiled code
linkResolver.cpp: Can you replace infinite for(;;) in resolve_invokedynamic() with finite loop since the body is executed once or twice. Hmm. What limit do you have in mind? It is not loop by logic. The code is trying to avoid double the check for CallSite has been bound already. I think it could be re-arranged as next: + Handle bootstrap_specifier; + // Check if CallSite has been bound already: + ConstantPoolCacheEntry* cpce = pool-cache()-secondary_entry_at(index); + if (cpce-is_f1_null()) { +int pool_index = pool-cache()-main_entry_at(index)-constant_pool_index(); +oop bsm_info = pool-resolve_bootstrap_specifier_at(pool_index, CHECK); +assert(bsm_info != NULL, ); +// FIXME: Cache this once per BootstrapMethods entry, not once per CONSTANT_InvokeDynamic. +bootstrap_specifier = Handle(THREAD, bsm_info); + } + if (!cpce-is_f1_null()) { +methodHandle method(THREAD, cpce-f2_as_vfinal_method()); +Handle appendix(THREAD, cpce-has_appendix() ? cpce-f1_appendix() : (oop)NULL); +result.set_handle(method, appendix, CHECK); +return; + } bytecodeInfo.cpp: Don't add spaces into conditions, looks strange. You REALLY want me to remove it? ;-) No, you can leave it as it is. Vladimir Christian Thalinger wrote: On Jul 12, 2012, at 6:27 PM, Vladimir Kozlov wrote: John, sharedRuntime_sparc.cpp: Why casting to (int)? Also use pointer_delta(code_end, code_start,1): + __ set((int)(intptr_t)(code_end - code_start), temp2_reg); Done. You bound L_fail label twice, it should be local in range_check(). Use brx() instead of br() since you compare pointers. And use cmp_and_brx_short() if delayed instruction is nop(). Done. Use fatal() instead of guarantee: guarantee(false, err_msg(special_dispatch=%d, special_dispatch)); Done. interpreter_sparc.cpp: In generate_method_entry() use fatal() instead of ShouldNotReachHere(): fatal(err_msg(unexpected method kind: %d, kind)); Good idea. Done. methodHandles_sparc.cpp: In MethodHandles::verify_klass() calls restore() should be after BINDs. Argh. I haven't seen you've found this bug. It took me a while to debug this :-) In MethodHandles::jump_from_method_handle() use cmp_and_br_short(temp, 0, ) Done. Instead of 100 use strlen(name)+50: +char* qname = NEW_C_HEAP_ARRAY(char, 100); +jio_snprintf(qname, 100, Done. sharedRuntime_x86_32.cpp: sharedRuntime_x86_64.cpp: The same problem with L_fail label as in sharedRuntime_sparc.cpp. Done. templateInterpreter_x86_32.cpp: templateInterpreter_x86_64.cpp: Again use use fatal() instead of ShouldNotReachHere() in generate_method_entry() Done. I see in several files next code pattern. Should we call throw_IncompatibleClassChangeError() as we do in other places?: + if (!EnableInvokeDynamic) { +// rewriter does not generate this bytecode +__ should_not_reach_here(); +return; + } Hmm. This really should not happen and EnableInvokeDynamic is on by default anyway. I doubt someone turns it off. c1_FrameMap.cpp: Why is ShouldNotReachHere() for mh_invoke in FrameMap::java_calling_convention()? That was old, unused code. Removed. c1_GraphBuilder.cpp: add parenthesis: const bool is_invokedynamic = code == Bytecodes::_invokedynamic; Done. nmethod.cpp: Don't put printing nmethod's addresses under Verbose flag. Leftover. Removed. linkResolver.cpp: Can you replace infinite for(;;) in resolve_invokedynamic() with finite loop since the body is executed once or twice. Hmm. What limit do you have in mind? templateInterpreter.cpp: why you need additional {} around the loop? We don't. I think it was used for better visibility of the MH code. Removed. constantPoolOop.cpp: Why not use guarantee() for bad operants? Not sure. John? Why you need separate scopes in resolve_bootstrap_specifier_at_impl()? To keep oops from being visible. Might result in bad crashes. symbol.cpp: The loop in index_of_at() should be for(; scan = limit; scan++) and after loop return -1. Done. bytecodeInfo.cpp: Don't add spaces into conditions, looks strange. It's more readable: if ( callee-is_native()) return native method; if ( callee-is_abstract()) return abstract method; if (!callee-can_be_compiled()) return not compilable (disabled); if (!callee-has_balanced_monitors()) return not compilable (unbalanced monitors); if ( callee-get_flow_analysis()-failing()) return not compilable (flow analysis failed); vs. if (callee-is_native()) return native method; if (callee-is_abstract())return abstract method; if (!callee-can_be_compiled()) return not compilable (disabled); if (!callee-has_balanced_monitors()) return not compilable (unbalanced monitors); if (callee
Re: review request (L): JDK changes for 7023639: JSR 292 method handle invocation needs a fast path for compiled code
BoundMethodHandle.java: I am concern that BMH subclass names are looks like constants names: BMH_LLI. In several places you have BMH constants and variables: + final ClassBoundMethodHandle BMH = BoundMethodHandle.class; + static final String BMH = java/lang/invoke/BoundMethodHandle; ThrowExceptionsTest.java: empty diffs ValueConversionsTest.java: remove commented print statement if it is not needed: -System.out.println(arrayType.getSimpleName()); +//System.out.println(arrayType.getSimpleName()); Vladimir John Rose wrote: On Jul 11, 2012, at 5:53 PM, John Rose wrote: As some of you have noticed, Chris Thalinger, Michael Haupt, and I have been working on the mlvm patches [1] for JEP-160 [2] for several months. These changes make method handles more optimizable. They refactor lots of magic out of the JVM and into more manageable Java code. … An associated webrev for hotspot-comp/jdk/ will be posted soon; it is already present on mlvm-dev for the curious to examine. (This change set also deletes a lot of old code.) Here is that webrev: http://cr.openjdk.java.net/~jrose/7023639/webrev.jdk.00/ These are the changes to JDK code that accompany the JVM changes already under review. There are 2900 LOC deleted, and 7000 LOC added. Key changes: - method handle behavior is fully represented by LambdaForm objects - chained method handles (including adapter method handles) are gone - an ASM-based bytecode spinner compiles LambdaForms when they warm up - bound method handles are compactly represented without boxing - the private symbol-resolution interface to the JVM (MemberName) is improved - unit tests have more systematic coverage - a number of minor bugs are fixed This is implementation work. No public Java APIs are changed, although the javadoc is slightly edited for clarity. Please have a look. — John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (M): 6711908: JVM needs direct access to some annotations
This looks good. Thanks, Vladimir On 7/11/12 11:54 PM, John Rose wrote: On Jul 11, 2012, at 5:25 PM, Vladimir Kozlov wrote: But I'll change it if you insist. Please, change. Also put final klass settings Fill in field values from parse_classfile_attributes in a separate ClassFileParser method. Done. Here is the updated webrev: http://cr.openjdk.java.net/~jrose/6711908/webrev.01 (See synopsis below.) — John @@ -62,6 +64,22 @@ typeArrayHandle _inner_classes; typeArrayHandle _annotations; + void set_class_synthetic_flag(bool x) { _synthetic_flag = x; } + void set_class_sourcefile(Symbol* x) { _sourcefile = x; } + void set_class_generic_signature(Symbol* x) { _generic_signature = x; } + void set_class_sde_symbol(Symbol* x) { _sde_symbol = x; } + void set_class_inner_classes(typeArrayHandle x) { _inner_classes = x; } + void set_class_annotations(typeArrayHandle x) { _annotations = x; } + void init_parsed_class_attributes() { + _synthetic_flag = false; + _sourcefile = NULL; + _generic_signature = NULL; + // initialize the other flags too: + _has_finalizer = _has_empty_finalizer = _has_vanilla_constructor = false; + _max_bootstrap_specifier_index = -1; + } + void apply_parsed_class_attributes(instanceKlassHandle k); // update k + class AnnotationCollector { public: enum Location { _in_field, _in_method, _in_class }; @@ -2866,6 +2863,21 @@ } } +void ClassFileParser::apply_parsed_class_attributes(instanceKlassHandle k) { + if (_synthetic_flag) + k-set_is_synthetic(); + if (_sourcefile != NULL) { + _sourcefile-increment_refcount(); + k-set_source_file_name(_sourcefile); + } + if (_generic_signature != NULL) { + _generic_signature-increment_refcount(); + k-set_generic_signature(_generic_signature); + } + k-set_source_debug_extension(_sde_symbol); // increment_refcount inside + k-set_inner_classes(_inner_classes()); + k-set_class_annotations(_annotations()); +} typeArrayHandle ClassFileParser::assemble_annotations(u1* runtime_visible_annotations, int runtime_visible_annotations_length, ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (XXXL): 7023639: JSR 292 method handle invocation needs a fast path for compiled code
John, sharedRuntime_sparc.cpp: Why casting to (int)? Also use pointer_delta(code_end, code_start,1): + __ set((int)(intptr_t)(code_end - code_start), temp2_reg); You bound L_fail label twice, it should be local in range_check(). Use brx() instead of br() since you compare pointers. And use cmp_and_brx_short() if delayed instruction is nop(). Use fatal() instead of guarantee: guarantee(false, err_msg(special_dispatch=%d, special_dispatch)); interpreter_sparc.cpp: In generate_method_entry() use fatal() instead of ShouldNotReachHere(): fatal(err_msg(unexpected method kind: %d, kind)); methodHandles_sparc.cpp: In MethodHandles::verify_klass() calls restore() should be after BINDs. In MethodHandles::jump_from_method_handle() use cmp_and_br_short(temp, 0, ) Instead of 100 use strlen(name)+50: +char* qname = NEW_C_HEAP_ARRAY(char, 100); +jio_snprintf(qname, 100, sharedRuntime_x86_32.cpp: sharedRuntime_x86_64.cpp: The same problem with L_fail label as in sharedRuntime_sparc.cpp. templateInterpreter_x86_32.cpp: templateInterpreter_x86_64.cpp: Again use use fatal() instead of ShouldNotReachHere() in generate_method_entry() I see in several files next code pattern. Should we call throw_IncompatibleClassChangeError() as we do in other places?: + if (!EnableInvokeDynamic) { +// rewriter does not generate this bytecode +__ should_not_reach_here(); +return; + } c1_FrameMap.cpp: Why is ShouldNotReachHere() for mh_invoke in FrameMap::java_calling_convention()? c1_GraphBuilder.cpp: add parenthesis: const bool is_invokedynamic = code == Bytecodes::_invokedynamic; nmethod.cpp: Don't put printing nmethod's addresses under Verbose flag. linkResolver.cpp: Can you replace infinite for(;;) in resolve_invokedynamic() with finite loop since the body is executed once or twice. templateInterpreter.cpp: why you need additional {} around the loop? constantPoolOop.cpp: Why not use guarantee() for bad operants? Why you need separate scopes in resolve_bootstrap_specifier_at_impl()? symbol.cpp: The loop in index_of_at() should be for(; scan = limit; scan++) and after loop return -1. bytecodeInfo.cpp: Don't add spaces into conditions, looks strange. Remove commented code for inline ForceInline methods. callGenerator.cpp: Please, decide which code to use: +#if 1. And I don't think new code is correct. graphKit.cpp: Remove commented debug print. insert_argument() and remove_argument() are not used. Vladimir John Rose wrote: As some of you have noticed, Chris Thalinger, Michael Haupt, and I have been working on the mlvm patches [1] for JEP-160 [2] for several months. These changes make method handles more optimizable. They refactor lots of magic out of the JVM and into more manageable Java code. To get an idea of how much magic is being removed, consider that the change removes 12,000 lines of non-comment code from the JVM, including much assembly code. It inserts 4900 lines of non-comment code. These changes are now stable enough to integrate. They pass jtreg tests in a number of execution modes and platforms. They also correctly run various JRuby and Nashorn test programs. Although there are no performance gains to boast about at present, these changes clear the ground for long-term optimization work. Here is the webrev [3], for review and integration into JDK 8 via hotspot-comp/hotspot/. Because of the large size of this change set, we request that reviewers would clearly designate which files they are reviewing. That way we may be able to divide up the work a little more effectively. Also, because of the scope of the change, we may respond to some comments by promising to address an issue in a future change set. If necessary, we will file tracking bugs to make sure nothing gets dropped. We have been working on this for months, and expect to make many further changes. The immediate need to get the changes in is twofold: First, some bugs (involving symbolic references off the boot class path) require the new Lambda Form intermediate representation, which is off-BCP-clean. Second, we need to commit our pervasive changes to the JVM sooner rather than later, so they can be properly integrated with other pervasive changes, such as metadata changes. An associated webrev for hotspot-comp/jdk/ will be posted soon; it is already present on mlvm-dev for the curious to examine. (This change set also deletes a lot of old code.) Thanks in advance, — John [1] http://hg.openjdk.java.net/mlvm/mlvm/hotspot/file/tip/meth-lazy-7023639.patch [2] http://openjdk.java.net/jeps/160 [3] http://cr.openjdk.java.net/~jrose/7023639/webrev.00/ ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (M): 6711908: JVM needs direct access to some annotations
c1_GraphBuilder.cpp, can you remove setting bailout message for forced inlining? It should be done proper uniform way for C1 inlining later. I think, next assert should check (id = _unknown id _annotation_LIMIT) instead: +void set_annotation(ID id) { + assert((int)id 0 (int)id BitsPerInt, oob); In header file classFileParser.hpp you should not specify ClassFileParser:: in method parse_classfile_attributes(() declaration: ClassFileParser::ClassAnnotationCollector* parsed_annotations Instead of asserts in apply_to() methods we should use guarantee(not implemented) or something. I don't think next should be part of these changes: +#if 0 +// The parsing of @Retention is for example only. Add parenthesis around expression in next condition: + while (--nann = 0 index-2 + min_size = limit) { Instead of passing pointers to classFileParser's new attributes fields (_synthetic_flag, _sourcefile, ...) as arguments add accessors functions which set these fields. I think next is typo, should be _in_method check: + case vmSymbols::VM_SYMBOL_ENUM_NAME(java_lang_invoke_ForceInline_signature): +if (_location != _in_class) break; +return _method_ForceInline; + default: break; + } Thanks, Vladimir John Rose wrote: This is a building block for an upcoming large enhancement for method handles. http://cr.openjdk.java.net/~jrose/6711908/webrev.00 The implemented interface is completely private to the java.lang.invoke package. — John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (M): 6711908: JVM needs direct access to some annotations
Yes, Michael asked about that too. The point is to show how annotation payloads can be processed. Somebody will need it at some point. I'll make it into a comment; is that OK? OK. Instead of passing pointers to classFileParser's new attributes fields (_synthetic_flag, _sourcefile, ...) as arguments add accessors functions which set these fields. The pointer passing is following the pattern already present for parsing other constructs. parse_fields is the clearest example. The parsed information is returned by reference. I could do as you suggest, but it For parse_fields() references for local variables are passed. It is different from passing references to classFileParser's fields which are accessible inside methods. would work only for top-level class attributes, so there would still be a mix of styles. My thought is to make the style more uniform by relying on the return-by-reference pattern. But I'll change it if you insist. Please, change. Also put final klass settings Fill in field values from parse_classfile_attributes in a separate ClassFileParser method. Thanks, Vladimir John Rose wrote: On Jul 11, 2012, at 12:01 AM, Michael Haupt wrote: @@ -1636,16 +1648,163 @@ The code for parsing @Retention deserves a comment highlighting that it is about parsing an annotation with payload (none of the annotations introduced by our work do this). @@ -2560,10 +2727,11 @@ TempNewSymbol sde_symbol is never used. Fixed; thanks. On Jul 11, 2012, at 2:35 PM, Vladimir Kozlov wrote: c1_GraphBuilder.cpp, can you remove setting bailout message for forced inlining? It should be done proper uniform way for C1 inlining later. Done. I think, next assert should check (id = _unknown id _annotation_LIMIT) instead: +void set_annotation(ID id) { + assert((int)id 0 (int)id BitsPerInt, oob); Good. I added this to the constructor assert((int)_annotation_LIMIT = (int)sizeof(_annotations_present) * BitsPerByte, ); In header file classFileParser.hpp you should not specify ClassFileParser:: in method parse_classfile_attributes(() declaration: ClassFileParser::ClassAnnotationCollector* parsed_annotations Good catch. Instead of asserts in apply_to() methods we should use guarantee(not implemented) or something. Done: + guarantee(false, no field annotations yet); I don't think next should be part of these changes: +#if 0 +// The parsing of @Retention is for example only. Yes, Michael asked about that too. The point is to show how annotation payloads can be processed. Somebody will need it at some point. I'll make it into a comment; is that OK? +// For the record, here is how annotation payloads can be collected. +// Suppose we want to capture @Retention.value. Here is how: +//if (id == AnnotationCollector::_class_Retention) { Add parenthesis around expression in next condition: + while (--nann = 0 index-2 + min_size = limit) { Done: + while ((--nann) = 0 (index-2 + min_size = limit)) { Instead of passing pointers to classFileParser's new attributes fields (_synthetic_flag, _sourcefile, ...) as arguments add accessors functions which set these fields. The pointer passing is following the pattern already present for parsing other constructs. parse_fields is the clearest example. The parsed information is returned by reference. I could do as you suggest, but it would work only for top-level class attributes, so there would still be a mix of styles. My thought is to make the style more uniform by relying on the return-by-reference pattern. But I'll change it if you insist. I think next is typo, should be _in_method check: + case vmSymbols::VM_SYMBOL_ENUM_NAME(java_lang_invoke_ForceInline_signature): +if (_location != _in_class) break; +return _method_ForceInline; + default: break; + } Yes; thanks. — John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
Christian, Should we put skip bytecode quickening code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? On 8/8/11 6:56 AM, Christian Thalinger wrote: Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. _indices in CosntantPoolCacheEntry is defined as intx: volatile intx _indices; // constant pool index rewrite bytecodes and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: // bit number |310| // bit length |-8--|-8--|---16| // // _indices [ b2 | b1 | index ] Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the optimization on x64. I don't like this optimization but I understand why we using it. Add a comment (especially in x64 file). I am concern about using next short branch in new code in templateTable_sparc.cpp: cmp_and_br_short(..., L_patch_done); // don't patch There is __ stop() call which generates a lot of code so that label L_patch_done could be far. Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. OK. Why you did not leave volatile call site inlining with guard? You did not explain why virtual call is fine for it. The spec of MutableCallSite says: For target values which will be frequently updated, consider using a volatile call site instead. And VolatileCallSite says: A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. In other respects, a VolatileCallSite is interchangeable with MutableCallSite. Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. Thank you for explaining it. Additionally I had to do two small changes because the build was broken on some configurations: - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()-new_type() : NULL; + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()-new_type() : (klassOop) NULL; and - MutexLockerEx ccl(CodeCache_lock, thread); + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); I updated the webrev. Good. Vladimir -- Christian Vladimir On 8/5/11 6:32 AM, Christian Thalinger wrote: http://cr.openjdk.java.net/~twisti/7071653 7071653: JSR 292: call site change notification should be pushed not pulled Reviewed-by: Currently every speculatively inlined method handle call site has a guard that compares the current target of the CallSite object to the inlined one. This per-invocation overhead can be removed if the notification is changed from pulled to pushed (i.e. deoptimization). I had to change the logic in TemplateTable::patch_bytecode to skip bytecode quickening for putfield instructions when the put_code written to the constant pool cache is zero. This is required so that every execution of a putfield to CallSite.target calls out to InterpreterRuntime::resolve_get_put to do the deoptimization of depending compiled methods. I also had to change the dependency machinery to understand other dependencies than class hierarchy ones. DepChange got the super-type of two new dependencies, KlassDepChange and CallSiteDepChange. Tested with JRuby tests and benchmarks, hand-written testcases, JDK tests and vm.mlvm tests. Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, second with 7071653). Since the CallSite targets don't change during the runtime of this benchmark we can see the performance
Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
Christian Thalinger wrote: On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: Christian, Should we put skip bytecode quickening code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. You lost me here. New code in resolve_get_put() is executed only for putfield to CallSite.target. But new code in patch_bytecode() skips quickening for all putfield bytecodes. My question is: can you narrow skipping quickening only for putfield to CallSite.target? Or you are saying that there is no performance difference between executing _aputfield vs _fast_aputfield? Vladimir On 8/8/11 6:56 AM, Christian Thalinger wrote: Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. _indices in CosntantPoolCacheEntry is defined as intx: volatile intx _indices; // constant pool index rewrite bytecodes and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: // bit number |310| // bit length |-8--|-8--|---16| // // _indices [ b2 | b1 | index ] Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the optimization on x64. I don't like this optimization but I understand why we using it. Add a comment (especially in x64 file). I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. I am concern about using next short branch in new code in templateTable_sparc.cpp: cmp_and_br_short(..., L_patch_done); // don't patch There is __ stop() call which generates a lot of code so that label L_patch_done could be far. Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. That works. Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. OK. Why you did not leave volatile call site inlining with guard? You did not explain why virtual call is fine for it. The spec of MutableCallSite says: For target values which will be frequently updated, consider using a volatile call site instead. And VolatileCallSite says: A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. In other respects, a VolatileCallSite is interchangeable with MutableCallSite. Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. Thank you for explaining it. Additionally I had to do two small changes because the build was broken on some configurations: - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()-new_type() : NULL; + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()-new_type() : (klassOop) NULL; and - MutexLockerEx ccl(CodeCache_lock, thread); + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); I updated the webrev. Good. Thanks. -- Christian Vladimir -- Christian Vladimir On 8/5/11 6:32 AM, Christian Thalinger wrote: http://cr.openjdk.java.net/~twisti/7071653 7071653: JSR 292: call site change notification should be pushed not pulled Reviewed-by: Currently every speculatively inlined method handle call site has a guard that compares the current target of the CallSite object to the inlined one. This per
Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
Tom Rodriguez wrote: On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote: Christian Thalinger wrote: On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: Christian, Should we put skip bytecode quickening code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. You lost me here. New code in resolve_get_put() is executed only for putfield to CallSite.target. But new code in patch_bytecode() skips quickening for all putfield bytecodes. My question is: can you narrow skipping quickening only for putfield to CallSite.target? Or you are saying that there is no performance difference between executing _aputfield vs _fast_aputfield? It only skips quickening if put_code is zero, which is only done for CallSite.target. All the others proceed as they used to. Good. Thank you, Tom Vladimir tom Vladimir On 8/8/11 6:56 AM, Christian Thalinger wrote: Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. _indices in CosntantPoolCacheEntry is defined as intx: volatile intx _indices; // constant pool index rewrite bytecodes and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: // bit number |310| // bit length |-8--|-8--|---16| // // _indices [ b2 | b1 | index ] Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the optimization on x64. I don't like this optimization but I understand why we using it. Add a comment (especially in x64 file). I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. I am concern about using next short branch in new code in templateTable_sparc.cpp: cmp_and_br_short(..., L_patch_done); // don't patch There is __ stop() call which generates a lot of code so that label L_patch_done could be far. Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. That works. Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. OK. Why you did not leave volatile call site inlining with guard? You did not explain why virtual call is fine for it. The spec of MutableCallSite says: For target values which will be frequently updated, consider using a volatile call site instead. And VolatileCallSite says: A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. In other respects, a VolatileCallSite is interchangeable with MutableCallSite. Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. Thank you for explaining it. Additionally I had to do two small changes because the build was broken on some configurations: - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()-new_type() : NULL; + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()-new_type() : (klassOop) NULL; and - MutexLockerEx ccl(CodeCache_lock, thread); + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); I updated the webrev. Good. Thanks. -- Christian Vladimir -- Christian Vladimir On 8/5/11 6:32 AM, Christian Thalinger wrote: http://cr.openjdk.java.net/~twisti/7071653 7071653: JSR 292: call
Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
Christian, You need to add big comment to the new code in templateTable_arch.cpp explaining what it does and why. Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? Add assert(byte_no == -1, ) to default: case to make sure you got all cases above it. I am concern about using next short branch in new code in templateTable_sparc.cpp: cmp_and_br_short(..., L_patch_done); // don't patch There is __ stop() call which generates a lot of code so that label L_patch_done could be far. Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. I don't like assignments in condition and implicit NULL checks. Can you change check_dependency() to next?: klassOop check_dependency() { klassOop result = check_klass_dependency(NULL); if (result != NULL) return result; return check_call_site_dependency(NULL); } In interpreterRuntime.cpp initialize marked: int marked = 0; Why you did not leave volatile call site inlining with guard? You did not explain why virtual call is fine for it. Vladimir On 8/5/11 6:32 AM, Christian Thalinger wrote: http://cr.openjdk.java.net/~twisti/7071653 7071653: JSR 292: call site change notification should be pushed not pulled Reviewed-by: Currently every speculatively inlined method handle call site has a guard that compares the current target of the CallSite object to the inlined one. This per-invocation overhead can be removed if the notification is changed from pulled to pushed (i.e. deoptimization). I had to change the logic in TemplateTable::patch_bytecode to skip bytecode quickening for putfield instructions when the put_code written to the constant pool cache is zero. This is required so that every execution of a putfield to CallSite.target calls out to InterpreterRuntime::resolve_get_put to do the deoptimization of depending compiled methods. I also had to change the dependency machinery to understand other dependencies than class hierarchy ones. DepChange got the super-type of two new dependencies, KlassDepChange and CallSiteDepChange. Tested with JRuby tests and benchmarks, hand-written testcases, JDK tests and vm.mlvm tests. Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, second with 7071653). Since the CallSite targets don't change during the runtime of this benchmark we can see the performance benefit of eliminating the guard: $ jruby --server bench/bench_fib_recursive.rb 5 35 0.883000 0.00 0.883000 ( 0.854000) 0.715000 0.00 0.715000 ( 0.715000) 0.712000 0.00 0.712000 ( 0.712000) 0.713000 0.00 0.713000 ( 0.713000) 0.713000 0.00 0.713000 ( 0.712000) $ jruby --server bench/bench_fib_recursive.rb 5 35 0.772000 0.00 0.772000 ( 0.742000) 0.624000 0.00 0.624000 ( 0.624000) 0.621000 0.00 0.621000 ( 0.621000) 0.622000 0.00 0.622000 ( 0.622000) 0.622000 0.00 0.622000 ( 0.621000) ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: review request (M): 7044892: JSR 292: API entry points sometimes throw the wrong exceptions or doesn't throw the expected one
java/lang/invoke/MethodTypeForm.java double c != void.cl check: +if (c == void.class) +c = null; // a Void parameter was unwrapped to void; ignore +if (c != null c != void.class) { Otherwise looks good as far as I understand. Vladimir John Rose wrote: http://cr.openjdk.java.net/~jrose/7044892/webrev.00/ 7044892: JSR 292: API entry points sometimes throw the wrong exceptions or doesn't throw the expected one This is basically a bundle of point fixes having to do with corner cases. Grouped under this bug: 7038847: MethodType.fromMethodDescriptorString accepts both binary names/internal form of binary names 7038860: MethodType.methodType(Class rtype, Class[] ptypes) doesn't throw NPE if ptypes is null 7042656: JSR292: invokeExact/Generic doesn't throw UnsupportedOperationException if invoked via Method.invoke 7042829: JSR292: MethodHandles$Lookup.findStatic[S|G]etter throws InternalError if SecurityManager is set 7041853: findGetter throws unexpected IllegalAccessException -NOBUG-: asCollector throws ArrayIndexOutOfBoundsException instead of IllegalArgumentException for values outside [0..255] -NOBUG-: asVarargsCollector gets wrong trailing parameter type -NOBUG-: MethodType.unwrap on chokes on void if class Void occurs in the parameter list -NOBUG-: MethodHandle.toString need to produce compliant output ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Request for reviews (S): 7001363: java/dyn/InvokeDynamic should not be a well-known class in the JVM
Looks good. Vladimir Christian Thalinger wrote: http://cr.openjdk.java.net/~twisti/7001363/webrev.01/ 7001363: java/dyn/InvokeDynamic should not be a well-known class in the JVM Summary: Because of the removal of language support, the JDK 7 API for JSR 292 no longer includes a public class named java/dyn/InvokeDynamic. Reviewed-by: Because of the removal of language support, the JDK 7 API for JSR 292 no longer includes a public class named java/dyn/InvokeDynamic. However, the JVM uses java/dyn/InvokeDynamic (arbitrarily) as a marker class for invokedynamic instructions. This practice needs to change, and the class may then be removed from the JDK. John already reviewed the changes, I'm just sending an official webrev for completeness. Please feel free to review it too! Begin forwarded message: From: John Rose john.r.r...@oracle.com Date: November 30, 2010 12:52:41 AM GMT+01:00 To: Christian Thalinger christian.thalin...@oracle.com Subject: Re: 7001363: java/dyn/InvokeDynamic should not be a well-known class in the JVM On Nov 29, 2010, at 7:08 AM, Christian Thalinger wrote: On Nov 19, 2010, at 8:44 AM, John Rose wrote: Can I hand you this one? It's not completely clear to me what we have to remove. As far as I understand it invokedynamic instructions should still work but all references to java/dyn/InvokeDynamic should disappear. This is preliminary webrev: http://cr.openjdk.java.net/~twisti/7001363/webrev.00/ These changes do exactly that. I can still run invokedynamic programs but the InvokeDynamic class is not referenced and loaded anymore. Reviewed. Looks good. (You can CC this to hotspot-compiler-dev if you want.) -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev