Re: [9] RFR (S) 8062280: C2: inlining failure due to access checks being too strict

2015-03-23 Thread Vladimir Kozlov

Looks fine to me.

Vladimir K

On 3/23/15 5:27 PM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8062280/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8062280

C2 inlining policy is too strict when it comes to inlining DMH linkers.
The compiler performs access checks on target method and sometimes it
breaks inlining. Such checks can be skipped since MemberNames are
checked during construction.

The fix is to disable access checks when inlining DMH linkers.

Testing: regression test, java/lang/invoke tests, nashorn, octane.

Best regards,
Vladimir Ivanov

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (S): 8075263: MHI::checkCustomized isn't eliminated for inlined MethodHandles

2015-03-16 Thread Vladimir Kozlov

Good.

thanks,
Vladimir K

On 3/16/15 12:05 PM, John Rose wrote:

Reviewed.  — John

On Mar 16, 2015, at 11:47 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com 
wrote:


http://cr.openjdk.java.net/~vlivanov/8075263/webrev.00/hotspot
http://cr.openjdk.java.net/~vlivanov/8075263/webrev.00/jdk
https://bugs.openjdk.java.net/browse/JDK-8075263



___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (S): 8074548: Never-taken branches cause repeated deopts in MHs.GWT case

2015-03-16 Thread Vladimir Kozlov

You already have 'result' variable for argument(0), why reread?:

CmpINode(argument(0)

Usually IfTrue is the fast path. IGVN will revert that check to have 
that. But it is better to have it in code. Current code looks like 
double negation.


There is no explanation in changes why false_count never seen (== 0) is 
more important than true_count == 0. The comment says one of the values.


Thanks,
Vladimir

On 3/16/15 11:26 AM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8074548/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8074548

MethodHandleImpl::profileBoolean doesn't update never-taken branch count
when hitting a deopt on it. As a result, for rarely taken branches
consequent compilations consider them as never-taken and prune them
again causing repeated deopts. It severely affects peak performance.

The fix is to update MHI::profileBoolean intrinsic to insert a guard and
uncommon trap w/ reexecute bit set for never-seen value. Once previously
never seen value is encountered, the execution resumes after deopt in
MHI::profileBoolean and corresponding count becomes non-zero.

The guard doesn't add any additional overhead, since it dominates all
value usages and all branches on the same value are eliminated.

Testing: java/lang/invoke, nashorn, octane

Best regards,
Vladimir Ivanov
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared

2015-01-16 Thread Vladimir Kozlov

Nice! At least Hotspot part since I don't understand jdk part :)

I would suggest to add more detailed comment (instead of simple Stop 
profiling) to inline_profileBranch() intrinsic explaining what it is 
doing because it is not strictly intrinsic - it does not implement 
profileBranch() java code when counts is constant.


You forgot to mark Opaque4Node as macro node. I would suggest to base it 
on Opaque2Node then you will get some methods from it.


Thanks,
Vladimir

On 1/16/15 9:16 AM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8063137

After GuardWithTest (GWT) LambdaForms became shared, profile pollution
significantly distorted compilation decisions. It affected inlining and
hindered some optimizations. It causes significant performance
regressions for Nashorn (on Octane benchmarks).

Inlining was fixed by 8059877 [1], but it didn't cover the case when a
branch is never taken. It can cause missed optimization opportunity, and
not just increase in code size. For example, non-pruned branch can break
escape analysis.

Currently, there are 2 problems:
   - branch frequencies profile pollution
   - deoptimization counts pollution

Branch frequency pollution hides from JIT the fact that a branch is
never taken. Since GWT LambdaForms (and hence their bytecode) are
heavily shared, but the behavior is specific to MethodHandle, there's no
way for JIT to understand how particular GWT instance behaves.

The solution I propose is to do profiling in Java code and feed it to
JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where
profiling info is stored. Once JIT kicks in, it can retrieve these
counts, if corresponding MethodHandle is a compile-time constant (and it
is usually the case). To communicate the profile data from Java code to
JIT, MethodHandleImpl::profileBranch() is used.

If GWT MethodHandle isn't a compile-time constant, profiling should
proceed. It happens when corresponding LambdaForm is already shared, for
newly created GWT MethodHandles profiling can occur only in native code
(dedicated nmethod for a single LambdaForm). So, when compilation of the
whole MethodHandle chain is triggered, the profile should be already
gathered.

Overriding branch frequencies is not enough. Statistics on
deoptimization events is also polluted. Even if a branch is never taken,
JIT doesn't issue an uncommon trap there unless corresponding bytecode
doesn't trap too much and doesn't cause too many recompiles.

I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT
sees it on some method, Compile::too_many_traps 
Compile::too_many_recompiles for that method always return false. It
allows JIT to prune the branch based on custom profile and recompile the
method, if the branch is visited.

For now, I wanted to keep the fix very focused. The next thing I plan to
do is to experiment with ignoring deoptimization counts for other
LambdaForms which are heavily shared. I already saw problems caused by
deoptimization counts pollution (see JDK-8068915 [2]).

I plan to backport the fix into 8u40, once I finish extensive
performance testing.

Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane).

Thanks!

PS: as a summary, my experiments show that fixes for 8063137  8068915
[2] almost completely recovers peak performance after LambdaForm sharing
[3]. There's one more problem left (non-inlined MethodHandle invocations
are more expensive when LFs are shared), but it's a story for another day.

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8059877
 8059877: GWT branch frequencies pollution due to LF sharing
[2] https://bugs.openjdk.java.net/browse/JDK-8068915
[3] https://bugs.openjdk.java.net/browse/JDK-8046703
 JEP 210: LambdaForm Reduction and Caching
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9, 8u40] RFR (XXS): 8059880: Get rid of LambdaForm interpretation

2014-11-19 Thread Vladimir Kozlov

As far as I can guess :) this change looks good. Reviewed.

Thanks,
Vladimir

On 11/19/14 2:24 AM, Vladimir Ivanov wrote:

Aleksey, Duncan, thanks for the review and the confirmation that it doesn't 
break stuff for you.

Any Reviews, please? :-)

Best regards,
Vladimir Ivanov

On 11/19/14, 2:23 PM, MacGregor, Duncan (GE Energy Management) wrote:

On 18/11/2014 23:33, Aleksey Shipilev aleksey.shipi...@oracle.com
wrote:

On 11/19/2014 12:01 AM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8059880/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8059880


Yes, for the love of God, GO FOR IT.


Seconded. Startup of our stuff seems fine now with a compile threshold of
zero, and it will make stacks so much easier to read in the debugger. :-)

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (S): 8058892: FILL_ARRAYS and ARRAYS are eagely initialized in MethodHandleImpl

2014-10-03 Thread Vladimir Kozlov

Looks fine to me.
Can you push it into our jdk9/hs-comp/jdk repo so we can see immediate results 
from it in our testing?

Thanks,
Vladimir

On 10/3/14 7:12 AM, Vladimir Ivanov wrote:

Updated version:
http://cr.openjdk.java.net/~vlivanov/8058892/webrev.02/


Looks good.

Thanks, Aleksey.

Any capital-R volunteers to review this change? :-)

Best regards,
Vladimir Ivanov
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: The Great Startup Problem

2014-09-08 Thread Vladimir Kozlov
JRuby loads about 4000 own classes (above 1000 of system classes) during 
execution of just '-e 1'. It is a lot of data to load, parse, verify.
I played with CDS (Class Data Sharing) which includes jruby classes. We 
can do that since jruby.jar is on boot class path but it requires some 
manual steps.


I got next data:

jruby-1.7.13$ time bin/jruby -J-Xshare:off -e 1
real0m1.344s
user0m3.688s
sys 0m0.182s

jruby-1.7.13$ time bin/jruby -J-Xshare:on -e 1
real0m1.010s
user0m2.918s
sys 0m0.153s

With C1 it is even smaller:

jruby-1.7.13$ time bin/jruby -J-Xshare:on -J-XX:TieredStopAtLevel=1 -e 1
real0m0.835s
user0m1.164s
sys 0m0.112s

Regards,
Vladimir K

On 9/2/14 10:54 AM, Vladimir Ivanov wrote:

Charlie,


Is it acceptable and solves the problem for you?


This is acceptable for JRuby. Our worst-case Ruby method handle chain
will include at most:

* Two CatchExceptions for pre/post logic (heap frames, etc). Perf of
CatchException compared to literal Java try/catch is important here.
* Up to two permute arguments for differing call site/target argument
ordering.
* Varargs negotiation (may be a couple handles)
* GWT
* SwitchPoint
* For Ruby to Java calls, each argument plus the return value must be
filtered to convert to/from Ruby types or apply an IRubyObject wrapper

This is worst case, mind you. Most calls in the system will be
arity-matched, eliminating the permutes. Most calls will be three or
fewer arguments, eliminating varargs. Many calls will be optimized to
no longer need a heap frame, eliminating the try/finally. The absolute
minimum for any call would be SwitchPoint plus GWT.

Of course I'm not counting DMHs here, since they're either the call we
want to make or they're leaf logic.

Thanks for the data! That's good!


We discussed an idea to generate custom bytecodes (single method) for
the
whole method handle chain (and have only 1 extra stack frame per MH
invocation), but it defeats memory footprint reduction we are trying to
archieve with LambdaForm sharing.


Funny thing...because indy slows our startup and increases our warmup
time, we're using our old binding logic by default. And surprise
surprise, our old binding logic does exactly this...one small
generated invoker class per method. I'm sure you're right that this
approach defeats the sharing and memory reduction we'd like to see
from LFs, but it works *really* well if you're ok with the extra class
and metaspace data in memory.

I see one problem with pre-compiling method handle trees.
Every tree should be compiled as a whole, so fast path and slow path are
always compiled. Without explicit hints or profiling and recompilation
it's impossible to distinguish them.

Comparing with MethodHandle/LambdaForm compilation unit, where slow path
usually stays interpreted on LF level (due to invocation threshold), for
considerably large method handle trees memory overhead can be larger.

But I'm just guessing here - I don't have any statistics yet neither on
average size of method handle trees nor numbers on memory overhead
induced by individual classes.


So there's one question: is the cost of a bytecoded adapter shim for
each method object really that high? Yes, if you're spinning new MHs
constantly or doing a million different adaptations of a given method.
But if you're just lazily creating an invoker shim once per method,
that really doesn't seem like a big deal.

Good question. I have a prototype of LF inlining during bytecode
translation. I'll conduct some experiments to gather some data.


My indy binding logic also has a dozen different flags for tweaking. I
can easily modify it to avoid doing all that pre/post logic and
argument permutation in the MH chain and just bind directly to the
generated invoker. Best (or worst) of both worlds? I just really don't
want to have to do that...I want everything from call site to target
method body to be in the MH chain.

For JRuby 9000, all try/finally logic will be within the target
method, so at least that part of the MH chain goes away.

Here's another idea...

We've been using my InvokeBinder library heavily in JRuby. It provides
a Java API/DSL for creating MH chains lazily from the top down:

MethodHandle mh = Binder.from(String.class, Object.class, Float.class)
 .tryFinally(finallyLogic)
 .permute(1, 0)
 .append(Hello)
 .drop(1)
 .invokeStatic(MyClass.class, someMethod);

The adaptations are gathered within the Binder instance, playing
forward as you add adaptations and played backward at binding time to
make the appropriate MethodHandles and MethodHandle calls.

Duncan talked about how he was able to improve MH chain size and
performance by applying certain transformations in a different order,
among other things. InvokeBinder *could* be doing a lot more to
optimize the MH chain. For example, the above case never uses the
Object value passed in (it is permuted to position 1 and later
dropped), but that fact is 

Re: Studying LF performance

2012-12-23 Thread Vladimir Kozlov
Hi Charlie,

If you want to experiment :) you can try the code Roland and Christian 
pushed.

Roland just pushed Incremental inlining changes for C2 which should help 
LF inlining:

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d092d1b31229

You also need Christian's inlining related changes in JDK which :

http://hg.openjdk.java.net/hsx/hotspot-main/jdk/rev/12fa4d7ecaf5

Regards,
Vladimir

On 12/23/12 11:21 AM, Charles Oliver Nutter wrote:
 A thread emerges!

 I'm going to be taking some time this holiday to explore the
 performance of the new LF indy impl in various situations. This will
 be the thread where I gather observations.

 A couple preliminaries...

 My perf exploration so far seems to show LF performing nearly
 equivalent to the old impl for the smallest benchmarks, with
 performance rapidly degrading as the size of the code involved grows.
 Recursive fib and tak have nearly identical perf on LF and the old
 impl. Red/black performs about the same on LF as with indy disabled,
 well behind the old indy performance. At some point, LF falls
 completely off the cliff and can't even compete with non-indy logic,
 as in a benchmark I ran today of Ruby constant access (heavily
 SwitchPoint-dependent).

 Discussions with Christian seem to indicate that the fall-off is
 because non-inlined LF indy call sites perform very poorly compared to
 the old impl. I'll be trying to explore this and correlate the perf
 cliff with failure to inline. Christian has told me that (upcoming?)
 work on incremental inlining will help reduce the performance impact
 of the fall-off, but I'm not sure of the status of this work.

 Some early ASM output from a trivial benchmark: loop 500M times
 calling #foo, which immediately calls #bar, which just returns the
 self object (ALOAD 2; ARETURN in essence). I've been comparing the new
 ASM to the old, both presented in a gist here:
 https://gist.github.com/4365103

 As you can see, the code resulting from both impls boils down to
 almost nothing, but there's one difference...

 New code not present in old:

 0x000111ab27ef: je 0x000111ab2835  ;*ifnull
  ; -
 java.lang.Class::cast@1 (line 3007)
  ; -
 java.lang.invoke.LambdaForm$MH/763053631::guard@12
  ; -
 java.lang.invoke.LambdaForm$MH/518216626::linkToCallSite@14
  ; -
 ruby.__dash_e__::method__0$RUBY$foo@3 (line 1)

 A side effect of inlining through LFs, I presume? Checking to ensure
 non-null call site? If so, shouldn't this have folded away, since the
 call site is constant?

 In any case, it's hardly damning to have an extra branch. This output
 is, at least, proof that LF *can* inline and optimize as well as the
 old impl...so we can put that aside for now. The questions to explore
 then are:

 * Do cases expected to inline actually do so under LF impl?
 * When inlining, does code optimize as it should (across the various
 shapes of call sites in JRuby, at least)?
 * When code does not inline, how does it impact performance?

 My expectation is that cases which should inline do so under LF, but
 that the non-inlined performance is significantly worse than under the
 old impl. The critical bit will be ensuring that even when LF call
 sites do not inline, they at least still compile to avoid
 interpretation and LF-to-LF overhead. At a minimum, it seems like we
 should be able to expect all LF between a call site and its DMH target
 will get compiled into a single unit, if not inlined into the caller.
 I still contend that call site + LFs should be heavily prioritized for
 inlining either into the caller or along with the called method, since
 they really *are* the shape of the call site. If there has to be a
 callq somewhere in that chain, there should ideally be only one.

 So...here we go.

 - Charlie
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Latest experiments...happiness and sadness

2012-10-20 Thread Vladimir Kozlov
I am working on it.

Vladimir

On Oct 20, 2012, at 9:39 AM, Florian Weimer wrote:

 * Remi Forax:
 
 Even a simple code like the one below, there is no scalar replacement  
 (OSR or not),
 Float f = new Float(0);
 for(int i=0; ilength; i++) {
   f = new Float(f + 1.0f);
 }
 float result = f;
 System.out.println(result);
 
 Last time I looked at this, scalar replacement and loop optimizations
 did not play along at all, and here you need both.  Has this changed?
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (L): JDK changes for 7023639: JSR 292 method handle invocation needs a fast path for compiled code

2012-07-23 Thread Vladimir Kozlov
John,

Both webrevs point to jdk changes. Where are hotspot changes?

Vladimir

John Rose wrote:
 On Jul 13, 2012, at 2:41 AM, John Rose wrote:
 
 Here is that webrev:
  http://cr.openjdk.java.net/~jrose/7023639/webrev.jdk.00/

 These are the changes to JDK code that accompany the JVM changes 
 already under review.
 
 I have updated both webrevs to their final contents, as follows:
 
 http://cr.openjdk.java.net/~jrose/7023639/webrev.01/
 http://cr.openjdk.java.net/~jrose/7023639/webrev.jdk.01/
 
 In the same folder are incremental webrevs, for the record.
 
 Thanks to all reviewers for their help.
 
 — John
 
 P.S.  If there's something you don't like in one of the files, let us know.
 As I noted before, we can (and will) roll more adjustments into the next 
 push.
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (XXXL): 7023639: JSR 292 method handle invocation needs a fast path for compiled code

2012-07-19 Thread Vladimir Kozlov
Nice.

Vladimir

Christian Thalinger wrote:
 We forgot to remove the Ricochet Frame code from the sA:
 
 http://cr.openjdk.java.net/~twisti/7023639/
 
 -- Chris
 
 On Jul 19, 2012, at 11:28 AM, Christian Thalinger wrote:
 
 JDK testing found a small bug:

 diff --git a/src/share/vm/oops/cpCacheOop.cpp 
 b/src/share/vm/oops/cpCacheOop.cpp
 --- a/src/share/vm/oops/cpCacheOop.cpp
 +++ b/src/share/vm/oops/cpCacheOop.cpp
 @@ -486,7 +486,8 @@
 // virtual and final so _f2 contains method ptr instead of vtable index
 if (f2_as_vfinal_method() == old_method) {
   // match old_method so need an update
 -  set_f2_as_vfinal_method(new_method);
 +  // NOTE: can't use set_f2_as_vfinal_method as it asserts on different 
 values
 +  _f2 = (intptr_t)new_method;
   if (RC_TRACE_IN_RANGE(0x0010, 0x0040)) {
 if (!(*trace_name_printed)) {
   // RC_TRACE_MESG macro has an embedded ResourceMark

 I will integrate this one.

 -- Chris

 On Jul 18, 2012, at 7:10 PM, John Rose wrote:

 On Jul 17, 2012, at 4:04 PM, Christian Thalinger wrote:

 I see in several files next code pattern. Should we call 
 throw_IncompatibleClassChangeError() as we do in other places?:

 +  if (!EnableInvokeDynamic) {
 +// rewriter does not generate this bytecode
 +__ should_not_reach_here();
 +return;
 +  }
 Hmm.  This really should not happen and EnableInvokeDynamic is on by 
 default anyway.  I doubt someone turns it off.
 It's now a diagnostic switch.  It can be used to verify that an app. is not 
 using 292, which is occasionally helpful.

 constantPoolOop.cpp:
 Why not use guarantee() for bad operants?
 Not sure.  John?
 The code in that file uses assert; I think I'm following the existing 
 practice there.  The CP code operates after initial verification passes, so 
 assert is suitable, I think.

 Why you need separate scopes in resolve_bootstrap_specifier_at_impl()?
 To keep oops from being visible.  Might result in bad crashes.
 Yes.  An alternative is to have the x_oop variables visible at top level, 
 but to put in DEBUG_ONLY(x_oop = NULL).  Having the temporary scoped seems 
 cleaner to me, but maybe my sense of style is off here.

 I will refresh the review patch in mlvm.  Thank you!
 Yes, thanks!

 — John
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
 
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (XXXL): 7023639: JSR 292 method handle invocation needs a fast path for compiled code

2012-07-17 Thread Vladimir Kozlov
  linkResolver.cpp:
  Can you replace infinite for(;;) in resolve_invokedynamic() with finite 
  loop 
since the body is executed once or twice.
 
  Hmm.  What limit do you have in mind?

It is not loop by logic. The code is trying to avoid double the check for 
CallSite has been bound already. I think it could be re-arranged as next:

+  Handle bootstrap_specifier;
+  // Check if CallSite has been bound already:
+  ConstantPoolCacheEntry* cpce = pool-cache()-secondary_entry_at(index);
+  if (cpce-is_f1_null()) {
+int pool_index = 
pool-cache()-main_entry_at(index)-constant_pool_index();
+oop bsm_info = pool-resolve_bootstrap_specifier_at(pool_index, CHECK);
+assert(bsm_info != NULL, );
+// FIXME: Cache this once per BootstrapMethods entry, not once per 
CONSTANT_InvokeDynamic.
+bootstrap_specifier = Handle(THREAD, bsm_info);
+  }
+  if (!cpce-is_f1_null()) {
+methodHandle method(THREAD, cpce-f2_as_vfinal_method());
+Handle appendix(THREAD, cpce-has_appendix() ? cpce-f1_appendix() : 
(oop)NULL);
+result.set_handle(method, appendix, CHECK);
+return;
+  }

  bytecodeInfo.cpp:
  Don't add spaces into conditions, looks strange.
  You REALLY want me to remove it? ;-)

No, you can leave it as it is.

Vladimir

Christian Thalinger wrote:
 On Jul 12, 2012, at 6:27 PM, Vladimir Kozlov wrote:
 
 John,

 sharedRuntime_sparc.cpp:
 Why casting to (int)? Also use pointer_delta(code_end, code_start,1):
 +  __ set((int)(intptr_t)(code_end - code_start), temp2_reg);
 
 Done.
 
 You bound L_fail label twice, it should be local in range_check(). Use brx() 
 instead of br() since you compare pointers. And use cmp_and_brx_short() if 
 delayed instruction is nop().
 
 Done.
 
 Use fatal() instead of guarantee:
 guarantee(false, err_msg(special_dispatch=%d, special_dispatch));
 
 Done.
 
 interpreter_sparc.cpp:
 In generate_method_entry() use fatal() instead of ShouldNotReachHere():
 fatal(err_msg(unexpected method kind: %d, kind));
 
 Good idea.  Done.
 

 methodHandles_sparc.cpp:
 In MethodHandles::verify_klass() calls restore() should be after BINDs.
 
 Argh.  I haven't seen you've found this bug.  It took me a while to debug 
 this :-)
 
 In MethodHandles::jump_from_method_handle() use cmp_and_br_short(temp, 0, )
 
 Done.
 
 Instead of 100 use strlen(name)+50:
 +char* qname = NEW_C_HEAP_ARRAY(char, 100);
 +jio_snprintf(qname, 100,
 
 Done.
 
 sharedRuntime_x86_32.cpp:
 sharedRuntime_x86_64.cpp:
 The same problem with L_fail label as in sharedRuntime_sparc.cpp.
 
 Done.
 
 templateInterpreter_x86_32.cpp:
 templateInterpreter_x86_64.cpp:
 Again use use fatal() instead of ShouldNotReachHere() in 
 generate_method_entry()
 
 Done.
 
 I see in several files next code pattern. Should we call 
 throw_IncompatibleClassChangeError() as we do in other places?:

 +  if (!EnableInvokeDynamic) {
 +// rewriter does not generate this bytecode
 +__ should_not_reach_here();
 +return;
 +  }
 
 Hmm.  This really should not happen and EnableInvokeDynamic is on by default 
 anyway.  I doubt someone turns it off.
 
 c1_FrameMap.cpp:
 Why is ShouldNotReachHere() for mh_invoke in  
 FrameMap::java_calling_convention()?
 
 That was old, unused code.  Removed.
 
 c1_GraphBuilder.cpp:
 add parenthesis:
 const bool is_invokedynamic = code == Bytecodes::_invokedynamic;
 
 Done.
 
 nmethod.cpp:
 Don't put printing nmethod's addresses under Verbose flag.
 
 Leftover.  Removed.
 
 linkResolver.cpp:
 Can you replace infinite for(;;) in resolve_invokedynamic() with finite loop 
 since the body is executed once or twice.
 
 Hmm.  What limit do you have in mind?
 
 templateInterpreter.cpp:
 why you need additional {} around the loop?
 
 We don't.  I think it was used for better visibility of the MH code.  Removed.
 
 constantPoolOop.cpp:
 Why not use guarantee() for bad operants?
 
 Not sure.  John?
 
 Why you need separate scopes in resolve_bootstrap_specifier_at_impl()?
 
 To keep oops from being visible.  Might result in bad crashes.
 
 symbol.cpp:
 The loop in index_of_at() should be for(; scan = limit; scan++) and after 
 loop return -1.
 
 Done.
 
 bytecodeInfo.cpp:
 Don't add spaces into conditions, looks strange.
 
 It's more readable:
 
   if ( callee-is_native()) return native method;
   if ( callee-is_abstract())   return abstract method;
   if (!callee-can_be_compiled())   return not compilable 
 (disabled);
   if (!callee-has_balanced_monitors()) return not compilable 
 (unbalanced monitors);
   if ( callee-get_flow_analysis()-failing())  return not compilable (flow 
 analysis failed);
 
 vs.
 
   if (callee-is_native())  return native method;
   if (callee-is_abstract())return abstract method;
   if (!callee-can_be_compiled())   return not compilable 
 (disabled);
   if (!callee-has_balanced_monitors()) return not compilable 
 (unbalanced monitors);
   if (callee

Re: review request (L): JDK changes for 7023639: JSR 292 method handle invocation needs a fast path for compiled code

2012-07-13 Thread Vladimir Kozlov
BoundMethodHandle.java:
   I am concern that BMH subclass names are looks like constants names: 
BMH_LLI. 
In several places you have BMH constants and variables:

+ final ClassBoundMethodHandle BMH = BoundMethodHandle.class;

+ static final String BMH = java/lang/invoke/BoundMethodHandle;


ThrowExceptionsTest.java: empty diffs

ValueConversionsTest.java:
  remove commented print statement if it is not needed:

-System.out.println(arrayType.getSimpleName());
+//System.out.println(arrayType.getSimpleName());

Vladimir

John Rose wrote:
 On Jul 11, 2012, at 5:53 PM, John Rose wrote:
 
 As some of you have noticed, Chris Thalinger, Michael Haupt, and I have been 
 working on the mlvm patches [1] for JEP-160 [2] for several months.  These 
 changes make method handles more optimizable.  They refactor lots of magic 
 out of the JVM and into more manageable Java code.
 …
 An associated webrev for hotspot-comp/jdk/ will be posted soon; it is 
 already present on mlvm-dev for the curious to examine.  (This change set 
 also deletes a lot of old code.)
 
 Here is that webrev:
   http://cr.openjdk.java.net/~jrose/7023639/webrev.jdk.00/
 
 These are the changes to JDK code that accompany the JVM changes already 
 under review.
 
 There are 2900 LOC deleted, and 7000 LOC added.  Key changes:
  - method handle behavior is fully represented by LambdaForm objects
  - chained method handles (including adapter method handles) are gone
  - an ASM-based bytecode spinner compiles LambdaForms when they warm up
  - bound method handles are compactly represented without boxing
  - the private symbol-resolution interface to the JVM (MemberName) is improved
  - unit tests have more systematic coverage
  - a number of minor bugs are fixed
 
 This is implementation work.  No public Java APIs are changed, although the 
 javadoc is slightly edited for clarity.
 
 Please have a look.
 
 — John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (M): 6711908: JVM needs direct access to some annotations

2012-07-12 Thread Vladimir Kozlov
This looks good.

Thanks,
Vladimir

On 7/11/12 11:54 PM, John Rose wrote:
 On Jul 11, 2012, at 5:25 PM, Vladimir Kozlov wrote:

  But I'll change it if you insist.

 Please, change. Also put final klass settings Fill in field values from 
 parse_classfile_attributes in a separate
 ClassFileParser method.

 Done. Here is the updated webrev:
 http://cr.openjdk.java.net/~jrose/6711908/webrev.01

 (See synopsis below.)

 — John


 @@ -62,6 +64,22 @@
 typeArrayHandle _inner_classes;
 typeArrayHandle _annotations;
 + void set_class_synthetic_flag(bool x) { _synthetic_flag = x; }
 + void set_class_sourcefile(Symbol* x) { _sourcefile = x; }
 + void set_class_generic_signature(Symbol* x) { _generic_signature = x; }
 + void set_class_sde_symbol(Symbol* x) { _sde_symbol = x; }
 + void set_class_inner_classes(typeArrayHandle x) { _inner_classes = x; }
 + void set_class_annotations(typeArrayHandle x) { _annotations = x; }
 + void init_parsed_class_attributes() {
 + _synthetic_flag = false;
 + _sourcefile = NULL;
 + _generic_signature = NULL;
 + // initialize the other flags too:
 + _has_finalizer = _has_empty_finalizer = _has_vanilla_constructor = false;
 + _max_bootstrap_specifier_index = -1;
 + }
 + void apply_parsed_class_attributes(instanceKlassHandle k); // update k
 +
 class AnnotationCollector {
 public:
 enum Location { _in_field, _in_method, _in_class };


 @@ -2866,6 +2863,21 @@
 }
 }
 +void ClassFileParser::apply_parsed_class_attributes(instanceKlassHandle k) {
 + if (_synthetic_flag)
 + k-set_is_synthetic();
 + if (_sourcefile != NULL) {
 + _sourcefile-increment_refcount();
 + k-set_source_file_name(_sourcefile);
 + }
 + if (_generic_signature != NULL) {
 + _generic_signature-increment_refcount();
 + k-set_generic_signature(_generic_signature);
 + }
 + k-set_source_debug_extension(_sde_symbol); // increment_refcount inside
 + k-set_inner_classes(_inner_classes());
 + k-set_class_annotations(_annotations());
 +}
 typeArrayHandle ClassFileParser::assemble_annotations(u1* 
 runtime_visible_annotations,
 int runtime_visible_annotations_length,

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (XXXL): 7023639: JSR 292 method handle invocation needs a fast path for compiled code

2012-07-12 Thread Vladimir Kozlov
John,

sharedRuntime_sparc.cpp:
  Why casting to (int)? Also use pointer_delta(code_end, code_start,1):
+  __ set((int)(intptr_t)(code_end - code_start), temp2_reg);

  You bound L_fail label twice, it should be local in range_check(). Use brx() 
instead of br() since you compare pointers. And use cmp_and_brx_short() if 
delayed instruction is nop().

  Use fatal() instead of guarantee:
guarantee(false, err_msg(special_dispatch=%d, special_dispatch));

interpreter_sparc.cpp:
  In generate_method_entry() use fatal() instead of ShouldNotReachHere():
fatal(err_msg(unexpected method kind: %d, kind));


methodHandles_sparc.cpp:
  In MethodHandles::verify_klass() calls restore() should be after BINDs.

  In MethodHandles::jump_from_method_handle() use cmp_and_br_short(temp, 0, )

  Instead of 100 use strlen(name)+50:
+char* qname = NEW_C_HEAP_ARRAY(char, 100);
+jio_snprintf(qname, 100,

sharedRuntime_x86_32.cpp:
sharedRuntime_x86_64.cpp:
  The same problem with L_fail label as in sharedRuntime_sparc.cpp.

templateInterpreter_x86_32.cpp:
templateInterpreter_x86_64.cpp:
  Again use use fatal() instead of ShouldNotReachHere() in 
generate_method_entry()

I see in several files next code pattern. Should we call 
throw_IncompatibleClassChangeError() as we do in other places?:

+  if (!EnableInvokeDynamic) {
+// rewriter does not generate this bytecode
+__ should_not_reach_here();
+return;
+  }

c1_FrameMap.cpp:
  Why is ShouldNotReachHere() for mh_invoke in  
FrameMap::java_calling_convention()?

c1_GraphBuilder.cpp:
  add parenthesis:
const bool is_invokedynamic = code == Bytecodes::_invokedynamic;

nmethod.cpp:
  Don't put printing nmethod's addresses under Verbose flag.

linkResolver.cpp:
  Can you replace infinite for(;;) in resolve_invokedynamic() with finite loop 
since the body is executed once or twice.

templateInterpreter.cpp:
  why you need additional {} around the loop?

constantPoolOop.cpp:
  Why not use guarantee() for bad operants?
  Why you need separate scopes in resolve_bootstrap_specifier_at_impl()?

symbol.cpp:
  The loop in index_of_at() should be for(; scan = limit; scan++) and after 
loop return -1.

bytecodeInfo.cpp:
Don't add spaces into conditions, looks strange.
Remove commented code for inline ForceInline methods.

callGenerator.cpp:
Please, decide which code to use: +#if 1. And I don't think new code is correct.

graphKit.cpp:
Remove commented debug print.
insert_argument() and remove_argument() are not used.


Vladimir

John Rose wrote:
 As some of you have noticed, Chris Thalinger, Michael Haupt, and I have been 
 working on the mlvm patches [1] for JEP-160 [2] for several months.  These 
 changes make method handles more optimizable.  They refactor lots of magic 
 out of the JVM and into more manageable Java code.
 
 To get an idea of how much magic is being removed, consider that the change 
 removes 12,000 lines of non-comment code from the JVM, including much 
 assembly code.  It inserts 4900 lines of non-comment code.
 
 These changes are now stable enough to integrate.  They pass jtreg tests in a 
 number of execution modes and platforms.  They also correctly run various 
 JRuby and Nashorn test programs.  Although there are no performance gains to 
 boast about at present, these changes clear the ground for long-term 
 optimization work.
 
 Here is the webrev [3], for review and integration into JDK 8 via 
 hotspot-comp/hotspot/.
 
 Because of the large size of this change set, we request that reviewers would 
 clearly designate which files they are reviewing.  That way we may be able to 
 divide up the work a little more effectively.
 
 Also, because of the scope of the change, we may respond to some comments by 
 promising to address an issue in a future change set.  If necessary, we will 
 file tracking bugs to make sure nothing gets dropped.  We have been working 
 on this for months, and expect to make many further changes.
 
 The immediate need to get the changes in is twofold:  First, some bugs 
 (involving symbolic references off the boot class path) require the new 
 Lambda Form intermediate representation, which is off-BCP-clean.  Second, 
 we need to commit our pervasive changes to the JVM sooner rather than later, 
 so they can be properly integrated with other pervasive changes, such as 
 metadata changes.
 
 An associated webrev for hotspot-comp/jdk/ will be posted soon; it is already 
 present on mlvm-dev for the curious to examine.  (This change set also 
 deletes a lot of old code.)
 
 Thanks in advance,
 — John
 
 [1] 
 http://hg.openjdk.java.net/mlvm/mlvm/hotspot/file/tip/meth-lazy-7023639.patch
 [2] http://openjdk.java.net/jeps/160
 [3] http://cr.openjdk.java.net/~jrose/7023639/webrev.00/
 
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (M): 6711908: JVM needs direct access to some annotations

2012-07-11 Thread Vladimir Kozlov
c1_GraphBuilder.cpp, can you remove setting bailout message for forced 
inlining? 
It should be done proper uniform way for C1 inlining later.

I think, next assert should check (id = _unknown  id  _annotation_LIMIT) 
instead:

+void set_annotation(ID id) {
+  assert((int)id  0  (int)id  BitsPerInt, oob);


In header file classFileParser.hpp you should not specify ClassFileParser:: in 
method parse_classfile_attributes(() declaration:

ClassFileParser::ClassAnnotationCollector* parsed_annotations


Instead of asserts in apply_to() methods we should use guarantee(not 
implemented) or something.


I don't think next should be part of these changes:

+#if 0
+// The parsing of @Retention is for example only.


Add parenthesis around expression in next condition:

+  while (--nann = 0  index-2 + min_size = limit) {


Instead of passing pointers to classFileParser's new attributes fields 
(_synthetic_flag, _sourcefile, ...) as arguments add accessors functions which 
set these fields.


I think next is typo, should be _in_method check:

+  case vmSymbols::VM_SYMBOL_ENUM_NAME(java_lang_invoke_ForceInline_signature):
+if (_location != _in_class)  break;
+return _method_ForceInline;
+  default: break;
+  }

Thanks,
Vladimir

John Rose wrote:
 This is a building block for an upcoming large enhancement for method handles.
 
 http://cr.openjdk.java.net/~jrose/6711908/webrev.00
 
 The implemented interface is completely private to the java.lang.invoke 
 package.
 
 — John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (M): 6711908: JVM needs direct access to some annotations

2012-07-11 Thread Vladimir Kozlov
  Yes, Michael asked about that too.  The point is to show how annotation
  payloads can be processed.  Somebody will need it at some point.  I'll
  make it into a comment; is that OK?

OK.

  Instead of passing pointers to classFileParser's new attributes fields
  (_synthetic_flag, _sourcefile, ...) as arguments add accessors
  functions which set these fields.
 
  The pointer passing is following the pattern already present for parsing
  other constructs.  parse_fields is the clearest example.  The parsed
  information is returned by reference.  I could do as you suggest, but it

For parse_fields() references for local variables are passed. It is different 
from passing references to classFileParser's fields which are accessible inside 
methods.

  would work only for top-level class attributes, so there would still be
  a mix of styles.  My thought is to make the style more uniform by
  relying on the return-by-reference pattern.
 
  But I'll change it if you insist.

Please, change. Also put final klass settings Fill in field values from 
parse_classfile_attributes in a separate ClassFileParser method.

Thanks,
Vladimir

John Rose wrote:
 On Jul 11, 2012, at 12:01 AM, Michael Haupt wrote:
 
 @@ -1636,16 +1648,163 @@
 The code for parsing @Retention deserves a comment highlighting that 
 it is about parsing an annotation with payload (none of the 
 annotations introduced by our work do this).

 @@ -2560,10 +2727,11 @@
 TempNewSymbol sde_symbol is never used.
 
 Fixed; thanks.
 
 
 On Jul 11, 2012, at 2:35 PM, Vladimir Kozlov wrote:
 
 c1_GraphBuilder.cpp, can you remove setting bailout message for forced 
 inlining? It should be done proper uniform way for C1 inlining later.
 
 Done.
 
 I think, next assert should check (id = _unknown  id  
 _annotation_LIMIT) instead:

 +void set_annotation(ID id) {
 +  assert((int)id  0  (int)id  BitsPerInt, oob);
 
 Good.  I added this to the constructor
   assert((int)_annotation_LIMIT = (int)sizeof(_annotations_present) * 
 BitsPerByte, );
 
 In header file classFileParser.hpp you should not specify 
 ClassFileParser:: in method parse_classfile_attributes(() declaration:

 ClassFileParser::ClassAnnotationCollector* parsed_annotations
 
 Good catch.
 
 Instead of asserts in apply_to() methods we should use guarantee(not 
 implemented) or something.
 
 Done:
   +  guarantee(false, no field annotations yet);
 

 I don't think next should be part of these changes:

 +#if 0
 +// The parsing of @Retention is for example only.
 
 Yes, Michael asked about that too.  The point is to show how annotation 
 payloads can be processed.  Somebody will need it at some point.  I'll 
 make it into a comment; is that OK?
 +// For the record, here is how annotation payloads can be collected.
 +// Suppose we want to capture @Retention.value.  Here is how:
 +//if (id == AnnotationCollector::_class_Retention) {
 
 Add parenthesis around expression in next condition:

 +  while (--nann = 0  index-2 + min_size = limit) {
 
 Done:
   +  while ((--nann) = 0  (index-2 + min_size = limit)) {
 
 Instead of passing pointers to classFileParser's new attributes fields 
 (_synthetic_flag, _sourcefile, ...) as arguments add accessors 
 functions which set these fields.
 
 The pointer passing is following the pattern already present for parsing 
 other constructs.  parse_fields is the clearest example.  The parsed 
 information is returned by reference.  I could do as you suggest, but it 
 would work only for top-level class attributes, so there would still be 
 a mix of styles.  My thought is to make the style more uniform by 
 relying on the return-by-reference pattern.
 
 But I'll change it if you insist.
 
 I think next is typo, should be _in_method check:

 +  case 
 vmSymbols::VM_SYMBOL_ENUM_NAME(java_lang_invoke_ForceInline_signature):
 +if (_location != _in_class)  break;
 +return _method_ForceInline;
 +  default: break;
 +  }
 
 Yes; thanks.
 
 — John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled

2011-08-08 Thread Vladimir Kozlov
Christian,

Should we put skip bytecode quickening code under flag to do this only when 
invoke dynamic is enabled? Or put_code is 
zero only in invoke dynamic case?

On 8/8/11 6:56 AM, Christian Thalinger wrote:
 Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use 
 movl() (only 32  bit)?

 Good question.  I took the code from TemplateTable::resolve_cache_and_index 
 without thinking about it and that one uses ld_ptr.

 _indices in CosntantPoolCacheEntry is defined as intx:

volatile intx _indices;  // constant pool index  rewrite bytecodes

 and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word:

 // bit number |310|
 // bit length |-8--|-8--|---16|
 // 
 // _indices   [ b2 | b1 |  index  ]

 Loading 32-bit on LE gives you the right bits but on BE it does not.  I think 
 that's the reason for the optimization on x64.

I don't like this optimization but I understand why we using it. Add a 
comment (especially in x64 file).


 I am concern about using next short branch in new code in 
 templateTable_sparc.cpp:

 cmp_and_br_short(..., L_patch_done);  // don't patch

 There is __ stop() call which generates a lot of code so that label 
 L_patch_done could be far.

 Yeah, I thought I give it a try if it works.  cmp_and_br_short should assert 
 if the branch displacement is too far, right?


Yes, it will assert but may be only in some worst case which we do not test. 
For example, try to run 64 bit fastdebug VM 
on Sparc + compressed oops + VerifyOops.



 Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs 
 it? Nothing else is changed in these files.

 Both files use dependencies and I got linkage errors on Linux while working 
 on the fix (because of inline methods).  It seems that the include is not 
 required in ciEnv.cpp because ciEnv.hpp already includes it.  I missed that.  
 But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies.


OK.



 Why you did not leave volatile call site inlining with guard? You did not 
 explain why virtual call is fine for it.

 The spec of MutableCallSite says:

 For target values which will be frequently updated, consider using a 
 volatile call site instead.

 And VolatileCallSite says:

 A VolatileCallSite is a CallSite whose target acts like a volatile variable. 
 An invokedynamic instruction linked to a VolatileCallSite sees updates to its 
 call site target immediately, even if the update occurs in another thread. 
 There may be a performance penalty for such tight coupling between threads.

 Unlike MutableCallSite, there is no syncAll operation on volatile call sites, 
 since every write to a volatile variable is implicitly synchronized with 
 reader threads.

 In other respects, a VolatileCallSite is interchangeable with 
 MutableCallSite.

 Since VolatileCallSite really should only be used when you know the target 
 changes very often we don't do optimizations for this case.  Obviously this 
 is just a guess how people will use VolatileCallSite but I think for now this 
 is a safe bet.


Thank you for explaining it.

 Additionally I had to do two small changes because the build was broken on 
 some configurations:

 -  klassOop new_type = _changes.is_klass_change() ? 
 _changes.as_klass_change()-new_type() : NULL;
 +  klassOop new_type = _changes.is_klass_change() ? 
 _changes.as_klass_change()-new_type() : (klassOop) NULL;

 and

 -  MutexLockerEx ccl(CodeCache_lock, thread);
 +  MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);

 I updated the webrev.

Good.

Vladimir


 -- Christian



 Vladimir

 On 8/5/11 6:32 AM, Christian Thalinger wrote:
 http://cr.openjdk.java.net/~twisti/7071653

 7071653: JSR 292: call site change notification should be pushed not pulled
 Reviewed-by:

 Currently every speculatively inlined method handle call site has a
 guard that compares the current target of the CallSite object to the
 inlined one.  This per-invocation overhead can be removed if the
 notification is changed from pulled to pushed (i.e. deoptimization).

 I had to change the logic in TemplateTable::patch_bytecode to skip
 bytecode quickening for putfield instructions when the put_code
 written to the constant pool cache is zero.  This is required so that
 every execution of a putfield to CallSite.target calls out to
 InterpreterRuntime::resolve_get_put to do the deoptimization of
 depending compiled methods.

 I also had to change the dependency machinery to understand other
 dependencies than class hierarchy ones.  DepChange got the super-type
 of two new dependencies, KlassDepChange and CallSiteDepChange.

 Tested with JRuby tests and benchmarks, hand-written testcases, JDK
 tests and vm.mlvm tests.

 Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
 second with 7071653).  Since the CallSite targets don't change during
 the runtime of this benchmark we can see the performance 

Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled

2011-08-08 Thread Vladimir Kozlov
Christian Thalinger wrote:
 On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote:
 
 Christian,

 Should we put skip bytecode quickening code under flag to do this only 
 when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic 
 case?
 
 No, it doesn't buy us anything.  The new checking code is only executed the 
 first time as the bytecodes are quickened right after that.  And in the case 
 where a putfield isn't quickened and we call resolve_get_put it gets very 
 expensive anyway.

You lost me here. New code in resolve_get_put() is executed only for putfield 
to 
CallSite.target. But new code in patch_bytecode() skips quickening for all 
putfield bytecodes. My question is: can you narrow skipping quickening only for 
putfield to CallSite.target? Or you are saying that there is no performance 
difference between executing _aputfield vs _fast_aputfield?

Vladimir

 
 On 8/8/11 6:56 AM, Christian Thalinger wrote:
 Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you 
 use movl() (only 32  bit)?
 Good question.  I took the code from TemplateTable::resolve_cache_and_index 
 without thinking about it and that one uses ld_ptr.

 _indices in CosntantPoolCacheEntry is defined as intx:

   volatile intx _indices;  // constant pool index  rewrite bytecodes

 and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word:

 // bit number |310|
 // bit length |-8--|-8--|---16|
 // 
 // _indices   [ b2 | b1 |  index  ]

 Loading 32-bit on LE gives you the right bits but on BE it does not.  I 
 think that's the reason for the optimization on x64.
 I don't like this optimization but I understand why we using it. Add a 
 comment (especially in x64 file).
 
 I factored reading the bytecode into 
 InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the 
 same code is used twice in TemplateTable and added the comment there.
 
 I am concern about using next short branch in new code in 
 templateTable_sparc.cpp:

 cmp_and_br_short(..., L_patch_done);  // don't patch

 There is __ stop() call which generates a lot of code so that label 
 L_patch_done could be far.
 Yeah, I thought I give it a try if it works.  cmp_and_br_short should 
 assert if the branch displacement is too far, right?

 Yes, it will assert but may be only in some worst case which we do not test. 
 For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + 
 VerifyOops.
 
 That works.
 

 Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs 
 it? Nothing else is changed in these files.
 Both files use dependencies and I got linkage errors on Linux while working 
 on the fix (because of inline methods).  It seems that the include is not 
 required in ciEnv.cpp because ciEnv.hpp already includes it.  I missed 
 that.  But nmethod.cpp needs it because nmethod.hpp only declares class 
 Dependencies.

 OK.

 Why you did not leave volatile call site inlining with guard? You did 
 not explain why virtual call is fine for it.
 The spec of MutableCallSite says:

 For target values which will be frequently updated, consider using a 
 volatile call site instead.

 And VolatileCallSite says:

 A VolatileCallSite is a CallSite whose target acts like a volatile 
 variable. An invokedynamic instruction linked to a VolatileCallSite sees 
 updates to its call site target immediately, even if the update occurs in 
 another thread. There may be a performance penalty for such tight coupling 
 between threads.

 Unlike MutableCallSite, there is no syncAll operation on volatile call 
 sites, since every write to a volatile variable is implicitly synchronized 
 with reader threads.

 In other respects, a VolatileCallSite is interchangeable with 
 MutableCallSite.

 Since VolatileCallSite really should only be used when you know the target 
 changes very often we don't do optimizations for this case.  Obviously this 
 is just a guess how people will use VolatileCallSite but I think for now 
 this is a safe bet.

 Thank you for explaining it.

 Additionally I had to do two small changes because the build was broken on 
 some configurations:

 -  klassOop new_type = _changes.is_klass_change() ? 
 _changes.as_klass_change()-new_type() : NULL;
 +  klassOop new_type = _changes.is_klass_change() ? 
 _changes.as_klass_change()-new_type() : (klassOop) NULL;

 and

 -  MutexLockerEx ccl(CodeCache_lock, thread);
 +  MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);

 I updated the webrev.
 Good.
 
 Thanks.
 
 -- Christian
 
 Vladimir

 -- Christian


 Vladimir

 On 8/5/11 6:32 AM, Christian Thalinger wrote:
 http://cr.openjdk.java.net/~twisti/7071653

 7071653: JSR 292: call site change notification should be pushed not 
 pulled
 Reviewed-by:

 Currently every speculatively inlined method handle call site has a
 guard that compares the current target of the CallSite object to the
 inlined one.  This per

Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled

2011-08-08 Thread Vladimir Kozlov
Tom Rodriguez wrote:
 On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote:
 
 Christian Thalinger wrote:
 On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote:

 Christian,

 Should we put skip bytecode quickening code under flag to do this only 
 when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic 
 case?
 No, it doesn't buy us anything.  The new checking code is only executed the 
 first time as the bytecodes are quickened right after that.  And in the 
 case where a putfield isn't quickened and we call resolve_get_put it gets 
 very expensive anyway.
 You lost me here. New code in resolve_get_put() is executed only for 
 putfield to 
 CallSite.target. But new code in patch_bytecode() skips quickening for all 
 putfield bytecodes. My question is: can you narrow skipping quickening only 
 for 
 putfield to CallSite.target? Or you are saying that there is no performance 
 difference between executing _aputfield vs _fast_aputfield?
 
 It only skips quickening if put_code is zero, which is only done for 
 CallSite.target.  All the others proceed as they used to.

Good. Thank you, Tom

Vladimir

 
 tom
 
 Vladimir

 On 8/8/11 6:56 AM, Christian Thalinger wrote:
 Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you 
 use movl() (only 32  bit)?
 Good question.  I took the code from 
 TemplateTable::resolve_cache_and_index without thinking about it and that 
 one uses ld_ptr.

 _indices in CosntantPoolCacheEntry is defined as intx:

  volatile intx _indices;  // constant pool index  rewrite bytecodes

 and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word:

 // bit number |310|
 // bit length |-8--|-8--|---16|
 // 
 // _indices   [ b2 | b1 |  index  ]

 Loading 32-bit on LE gives you the right bits but on BE it does not.  I 
 think that's the reason for the optimization on x64.
 I don't like this optimization but I understand why we using it. Add a 
 comment (especially in x64 file).
 I factored reading the bytecode into 
 InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since 
 the same code is used twice in TemplateTable and added the comment there.

 I am concern about using next short branch in new code in 
 templateTable_sparc.cpp:

 cmp_and_br_short(..., L_patch_done);  // don't patch

 There is __ stop() call which generates a lot of code so that label 
 L_patch_done could be far.
 Yeah, I thought I give it a try if it works.  cmp_and_br_short should 
 assert if the branch displacement is too far, right?

 Yes, it will assert but may be only in some worst case which we do not 
 test. For example, try to run 64 bit fastdebug VM on Sparc + compressed 
 oops + VerifyOops.
 That works.

 Why you added new #include into ciEnv.cpp and nmethod.cpp, what code 
 needs it? Nothing else is changed in these files.
 Both files use dependencies and I got linkage errors on Linux while 
 working on the fix (because of inline methods).  It seems that the 
 include is not required in ciEnv.cpp because ciEnv.hpp already includes 
 it.  I missed that.  But nmethod.cpp needs it because nmethod.hpp only 
 declares class Dependencies.

 OK.

 Why you did not leave volatile call site inlining with guard? You did 
 not explain why virtual call is fine for it.
 The spec of MutableCallSite says:

 For target values which will be frequently updated, consider using a 
 volatile call site instead.

 And VolatileCallSite says:

 A VolatileCallSite is a CallSite whose target acts like a volatile 
 variable. An invokedynamic instruction linked to a VolatileCallSite sees 
 updates to its call site target immediately, even if the update occurs in 
 another thread. There may be a performance penalty for such tight 
 coupling between threads.

 Unlike MutableCallSite, there is no syncAll operation on volatile call 
 sites, since every write to a volatile variable is implicitly 
 synchronized with reader threads.

 In other respects, a VolatileCallSite is interchangeable with 
 MutableCallSite.

 Since VolatileCallSite really should only be used when you know the 
 target changes very often we don't do optimizations for this case.  
 Obviously this is just a guess how people will use VolatileCallSite but I 
 think for now this is a safe bet.

 Thank you for explaining it.

 Additionally I had to do two small changes because the build was broken 
 on some configurations:

 -  klassOop new_type = _changes.is_klass_change() ? 
 _changes.as_klass_change()-new_type() : NULL;
 +  klassOop new_type = _changes.is_klass_change() ? 
 _changes.as_klass_change()-new_type() : (klassOop) NULL;

 and

 -  MutexLockerEx ccl(CodeCache_lock, thread);
 +  MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);

 I updated the webrev.
 Good.
 Thanks.

 -- Christian

 Vladimir

 -- Christian

 Vladimir

 On 8/5/11 6:32 AM, Christian Thalinger wrote:
 http://cr.openjdk.java.net/~twisti/7071653

 7071653: JSR 292: call

Re: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled

2011-08-07 Thread Vladimir Kozlov
Christian,

You need to add big comment to the new code in templateTable_arch.cpp 
explaining what it does and why.

Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use 
movl() (only 32  bit)?

Add assert(byte_no == -1, ) to default: case to make sure you got all cases 
above it.

I am concern about using next short branch in new code in 
templateTable_sparc.cpp:

cmp_and_br_short(..., L_patch_done);  // don't patch

There is __ stop() call which generates a lot of code so that label 
L_patch_done could be far.


Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? 
Nothing else is changed in these files.

I don't like assignments in condition and implicit NULL checks. Can you change 
check_dependency() to next?:

   klassOop check_dependency() {
 klassOop result = check_klass_dependency(NULL);
 if (result != NULL) return result;
 return check_call_site_dependency(NULL);
   }

In interpreterRuntime.cpp initialize marked:  int marked = 0;

Why you did not leave volatile call site inlining with guard? You did not 
explain why virtual call is fine for it.


Vladimir

On 8/5/11 6:32 AM, Christian Thalinger wrote:
 http://cr.openjdk.java.net/~twisti/7071653

 7071653: JSR 292: call site change notification should be pushed not pulled
 Reviewed-by:

 Currently every speculatively inlined method handle call site has a
 guard that compares the current target of the CallSite object to the
 inlined one.  This per-invocation overhead can be removed if the
 notification is changed from pulled to pushed (i.e. deoptimization).

 I had to change the logic in TemplateTable::patch_bytecode to skip
 bytecode quickening for putfield instructions when the put_code
 written to the constant pool cache is zero.  This is required so that
 every execution of a putfield to CallSite.target calls out to
 InterpreterRuntime::resolve_get_put to do the deoptimization of
 depending compiled methods.

 I also had to change the dependency machinery to understand other
 dependencies than class hierarchy ones.  DepChange got the super-type
 of two new dependencies, KlassDepChange and CallSiteDepChange.

 Tested with JRuby tests and benchmarks, hand-written testcases, JDK
 tests and vm.mlvm tests.

 Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
 second with 7071653).  Since the CallSite targets don't change during
 the runtime of this benchmark we can see the performance benefit of
 eliminating the guard:

 $ jruby --server bench/bench_fib_recursive.rb 5 35
0.883000   0.00   0.883000 (  0.854000)
0.715000   0.00   0.715000 (  0.715000)
0.712000   0.00   0.712000 (  0.712000)
0.713000   0.00   0.713000 (  0.713000)
0.713000   0.00   0.713000 (  0.712000)

 $ jruby --server bench/bench_fib_recursive.rb 5 35
0.772000   0.00   0.772000 (  0.742000)
0.624000   0.00   0.624000 (  0.624000)
0.621000   0.00   0.621000 (  0.621000)
0.622000   0.00   0.622000 (  0.622000)
0.622000   0.00   0.622000 (  0.621000)

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: review request (M): 7044892: JSR 292: API entry points sometimes throw the wrong exceptions or doesn't throw the expected one

2011-05-17 Thread Vladimir Kozlov
java/lang/invoke/MethodTypeForm.java

double c != void.cl check:

+if (c == void.class)
+c = null;  // a Void parameter was unwrapped to void; ignore
+if (c != null  c != void.class) {

Otherwise looks good as far as I understand.

Vladimir

John Rose wrote:
 http://cr.openjdk.java.net/~jrose/7044892/webrev.00/
 
 7044892: JSR 292: API entry points sometimes throw the wrong exceptions or 
 doesn't throw the expected one
 
 This is basically a bundle of point fixes having to do with corner cases.
 
 Grouped under this bug:
  
 7038847: MethodType.fromMethodDescriptorString accepts both binary 
 names/internal form of binary names
 7038860: MethodType.methodType(Class rtype, Class[] ptypes) doesn't throw NPE 
 if ptypes is null
 7042656: JSR292: invokeExact/Generic doesn't throw 
 UnsupportedOperationException if invoked via Method.invoke
 7042829: JSR292: MethodHandles$Lookup.findStatic[S|G]etter throws 
 InternalError if SecurityManager is set
 7041853: findGetter throws unexpected IllegalAccessException
 -NOBUG-: asCollector throws ArrayIndexOutOfBoundsException instead of 
 IllegalArgumentException for values outside [0..255]
 -NOBUG-: asVarargsCollector gets wrong trailing parameter type
 -NOBUG-: MethodType.unwrap on chokes on void if class Void occurs in the 
 parameter list
 -NOBUG-: MethodHandle.toString need to produce compliant output
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Request for reviews (S): 7001363: java/dyn/InvokeDynamic should not be a well-known class in the JVM

2010-11-30 Thread Vladimir Kozlov
Looks good.

Vladimir

Christian Thalinger wrote:
 http://cr.openjdk.java.net/~twisti/7001363/webrev.01/
 
 7001363: java/dyn/InvokeDynamic should not be a well-known class in the JVM
 Summary: Because of the removal of language support, the JDK 7 API for JSR 
 292 no longer includes a public class named java/dyn/InvokeDynamic.
 Reviewed-by:
 
 Because of the removal of language support, the JDK 7 API for JSR 292
 no longer includes a public class named java/dyn/InvokeDynamic.
 
 However, the JVM uses java/dyn/InvokeDynamic (arbitrarily) as a marker
 class for invokedynamic instructions. This practice needs to change,
 and the class may then be removed from the JDK.
 
 John already reviewed the changes, I'm just sending an official webrev for 
 completeness.  Please feel free to review it too!
 
 
 Begin forwarded message:
 From: John Rose john.r.r...@oracle.com
 Date: November 30, 2010 12:52:41 AM GMT+01:00
 To: Christian Thalinger christian.thalin...@oracle.com
 Subject: Re: 7001363: java/dyn/InvokeDynamic should not be a well-known 
 class in the JVM

 On Nov 29, 2010, at 7:08 AM, Christian Thalinger wrote:

 On Nov 19, 2010, at 8:44 AM, John Rose wrote:
 Can I hand you this one?

 It's not completely clear to me what we have to remove.  As far as I 
 understand it invokedynamic instructions should still work but all 
 references to java/dyn/InvokeDynamic should disappear.  This is preliminary 
 webrev:

 http://cr.openjdk.java.net/~twisti/7001363/webrev.00/

 These changes do exactly that.  I can still run invokedynamic programs but 
 the InvokeDynamic class is not referenced and loaded anymore.
 Reviewed.  Looks good.  (You can CC this to hotspot-compiler-dev if you 
 want.)  -- John
 
 
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev