Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-16 Thread Vladimir Ivanov

Roland, thanks a lot for the review!

Best regards,
Vladimir Ivanov

On 4/15/15 7:43 PM, Roland Westrelin wrote:

  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/


That looks good to me.

Roland.


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-15 Thread Roland Westrelin
>  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/

That looks good to me.

Roland.
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-15 Thread Roland Westrelin
Hi Vladimir,

>  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/

In ciCallSite::get_context(), is it safe to manipulate a raw oop the way you do 
it (with 2 different oops). Can’t it be moved concurrently by the GC?

Roland.

>  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/
> 
> Best regards,
> Vladimir Ivanov
> 
> On 4/1/15 11:56 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>> https://bugs.openjdk.java.net/browse/JDK-8057967
>> 
>> HotSpot JITs inline very aggressively through CallSites. The
>> optimistically treat CallSite target as constant, but record a nmethod
>> dependency to invalidate the compiled code once CallSite target changes.
>> 
>> Right now, such dependencies have call site class as a context. This
>> context is too coarse and it leads to context pollution: if some
>> CallSite target changes, VM needs to enumerate all nmethods which
>> depends on call sites of such type.
>> 
>> As performance analysis in the bug report shows, it can sum to
>> significant amount of work.
>> 
>> While working on the fix, I investigated 3 approaches:
>>   (1) unique context per call site
>>   (2) use CallSite target class
>>   (3) use a class the CallSite instance is linked to
>> 
>> Considering call sites are ubiquitous (e.g. 10,000s on some octane
>> benchmarks), loading a dedicated class for every call site is an
>> overkill (even VM anonymous).
>> 
>> CallSite target class
>> (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is
>> also not satisfactory, since it is a compiled LambdaForm VM anonymous
>> class, which is heavily shared. It gets context pollution down, but
>> still the overhead is quite high.
>> 
>> So, I decided to focus on (3) and ended up with a mixture of (2) & (3).
>> 
>> Comparing to other options, the complications of (3) are:
>>   - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
>> there should be some default context VM can use
>> 
>>   - CallSite instances can be shared and it shouldn't keep the context
>> class from unloading;
>> 
>> It motivated a scheme where CallSite context is initialized lazily and
>> can change during lifetime. When CallSite is linked with an indy
>> instruction, it's context is initialized. Usually, JIT sees CallSite
>> instances with initialized context (since it reaches them through indy),
>> but if it's not the case and there's no context yet, JIT sets it to
>> "default context", which means "use target call site".
>> 
>> I introduced CallSite$DependencyContext, which represents a nmethod
>> dependency context and points (indirectly) to a Class used as a context.
>> 
>> Context class is referenced through a phantom reference
>> (sun.misc.Cleaner to simplify cleanup). Though it's impossible to
>> extract referent using Reference.get(), VM can access it directly by
>> reading corresponding field. Unlike other types of references, phantom
>> references aren't cleared automatically. It allows VM to access context
>> class until cleanup is performed. And cleanup resets the context to
>> NULL, in addition to invalidating all relevant dependencies.
>> 
>> There are 3 context states a CallSite instance can be in:
>>   (1) NULL: no depedencies
>>   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
>> call site target class
>>   (3) DependencyContext for some class: dependencies are stored on the
>> class DependencyContext instance points to
>> 
>> Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
>> (3) depending on the situation).
>> 
>> State transitions:
>>   (1->3): When a CallSite w/o a context (1) is linked with some indy
>> call site, it's owner is recorded as a context (3).
>> 
>>   (1->2): When JIT needs to record a dependency on a target of a
>> CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and
>> uses target class to store the dependency.
>> 
>>   (3->1): When context class becomes unreachable, a cleanup hook
>> invalidates all dependencies on that CallSite and resets the context to
>> NULL (1).
>> 
>> Only (3->1) requires dependency invalidation, because there are no
>> depedencies in (1) and (2->1) isn't performed.
>> 
>> (1->3) is done in Java code (CallSite.initContext) and (1->2) is
>> performed in VM (ciCallSite::get_context()). The updates are performed
>> by CAS, so there's no need in additional synchronization. Other
>> operations on VM side are volatile (to play well with Java code) and
>> performed with Compile_lock held (to avoid races between VM operations).
>> 
>> Some statistics:
>>   Box2D, latest jdk9-dev
>> - CallSite instances: ~22000
>> 
>> - invalidated nmethods due to CallSite target changes: ~60
>> 
>> - checked call_site_target_value dependencies:
>>   - before the fix: ~1,600,000
>>   - after the fix:~600
>> 
>> Testing:
>>   - dedicated test which excercises different sta

Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-15 Thread Vladimir Ivanov

Roland, thanks for looking into the fix!

You are right.
I moved VM_ENTRY_MARK to the beginning of the method [1].

Updated webrev in place.
  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/

Best regards,
Vladimir Ivanov

[1]
diff --git a/src/share/vm/ci/ciCallSite.cpp b/src/share/vm/ci/ciCallSite.cpp
--- a/src/share/vm/ci/ciCallSite.cpp
+++ b/src/share/vm/ci/ciCallSite.cpp
@@ -55,6 +55,8 @@
 // Return the target MethodHandle of this CallSite.
 ciKlass* ciCallSite::get_context() {
   assert(!is_constant_call_site(), "");
+
+  VM_ENTRY_MARK;
   oop call_site_oop = get_oop();
   InstanceKlass* ctxk = 
MethodHandles::get_call_site_context(call_site_oop);

   if (ctxk == NULL) {
@@ -63,7 +65,6 @@
 java_lang_invoke_CallSite::set_context_cas(call_site_oop, 
def_context_oop, /*expected=*/NULL);

 ctxk = MethodHandles::get_call_site_context(call_site_oop);
   }
-  VM_ENTRY_MARK;
   return (CURRENT_ENV->get_metadata(ctxk))->as_klass();
 }


On 4/15/15 1:16 PM, Roland Westrelin wrote:

Hi Vladimir,


  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/


In ciCallSite::get_context(), is it safe to manipulate a raw oop the way you do 
it (with 2 different oops). Can’t it be moved concurrently by the GC?

Roland.


  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/

Best regards,
Vladimir Ivanov

On 4/1/15 11:56 PM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8057967

HotSpot JITs inline very aggressively through CallSites. The
optimistically treat CallSite target as constant, but record a nmethod
dependency to invalidate the compiled code once CallSite target changes.

Right now, such dependencies have call site class as a context. This
context is too coarse and it leads to context pollution: if some
CallSite target changes, VM needs to enumerate all nmethods which
depends on call sites of such type.

As performance analysis in the bug report shows, it can sum to
significant amount of work.

While working on the fix, I investigated 3 approaches:
   (1) unique context per call site
   (2) use CallSite target class
   (3) use a class the CallSite instance is linked to

Considering call sites are ubiquitous (e.g. 10,000s on some octane
benchmarks), loading a dedicated class for every call site is an
overkill (even VM anonymous).

CallSite target class
(MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is
also not satisfactory, since it is a compiled LambdaForm VM anonymous
class, which is heavily shared. It gets context pollution down, but
still the overhead is quite high.

So, I decided to focus on (3) and ended up with a mixture of (2) & (3).

Comparing to other options, the complications of (3) are:
   - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
there should be some default context VM can use

   - CallSite instances can be shared and it shouldn't keep the context
class from unloading;

It motivated a scheme where CallSite context is initialized lazily and
can change during lifetime. When CallSite is linked with an indy
instruction, it's context is initialized. Usually, JIT sees CallSite
instances with initialized context (since it reaches them through indy),
but if it's not the case and there's no context yet, JIT sets it to
"default context", which means "use target call site".

I introduced CallSite$DependencyContext, which represents a nmethod
dependency context and points (indirectly) to a Class used as a context.

Context class is referenced through a phantom reference
(sun.misc.Cleaner to simplify cleanup). Though it's impossible to
extract referent using Reference.get(), VM can access it directly by
reading corresponding field. Unlike other types of references, phantom
references aren't cleared automatically. It allows VM to access context
class until cleanup is performed. And cleanup resets the context to
NULL, in addition to invalidating all relevant dependencies.

There are 3 context states a CallSite instance can be in:
   (1) NULL: no depedencies
   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
call site target class
   (3) DependencyContext for some class: dependencies are stored on the
class DependencyContext instance points to

Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
(3) depending on the situation).

State transitions:
   (1->3): When a CallSite w/o a context (1) is linked with some indy
call site, it's owner is recorded as a context (3).

   (1->2): When JIT needs to record a dependency on a target of a
CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and
uses target class to store the dependency.

   (3->1): When context class becomes unreachable, a cleanup hook
invalidates all dependencies on that CallSite and resets the context to
NULL (1).

Only (3->1) requires dependency invalidation, because there are no
de

Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-09 Thread MacGregor, Duncan (GE Energy Management)
Now I¹m back from my Easter break I¹ve run done some testing with our
code. Hs-comp is looking good in general, and this code does appear to
give a nice little extra boost. My results are showing a difference at
peak performance, which I found slightly surprising so I¹ll need to take a
look at just how often targets are being reset and for what reasons.

Anyway, in general I¹m getting about 10% better performance with hs-comp
than 8u40, and that¹s in code which spends a substantial amount of its
time down in some C libraries.

Keep up the good work Vladimir!

Duncan.

On 02/04/2015 17:26, "Vladimir Ivanov" 
wrote:

>Aleksey, thanks a lot for the performance evaluation of the fix!
>
>Best regards,
>Vladimir Ivanov
>
>On 4/2/15 7:10 PM, Aleksey Shipilev wrote:
>> On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>>> https://bugs.openjdk.java.net/browse/JDK-8057967
>>
>> Glad to see this finally addressed, thanks!
>>
>> I did not look through the code changes, but ran Octane on my
>> configuration. As expected, Typescript had improved substantially. Other
>> benchmarks are not affected much. This in line with the performance
>> analysis done for the original bug report.
>>
>> Baseline:
>>
>> Benchmark  Mode  CntScoreError  Units
>> Box2D.test   ss   20 4454.677 ±345.807  ms/op
>> CodeLoad.testss   20 4784.299 ±370.658  ms/op
>> Crypto.test  ss   20  878.395 ± 87.918  ms/op
>> DeltaBlue.test   ss   20  502.182 ± 52.362  ms/op
>> EarleyBoyer.test ss   20 2250.508 ±273.924  ms/op
>> Gbemu.test   ss   20 5893.102 ±656.036  ms/op
>> Mandreel.testss   20 9323.484 ±825.801  ms/op
>> NavierStokes.testss   20  657.608 ± 41.212  ms/op
>> PdfJS.test   ss   20 3829.534 ±353.702  ms/op
>> Raytrace.testss   20 1202.826 ±166.795  ms/op
>> Regexp.test  ss   20  156.782 ± 20.992  ms/op
>> Richards.testss   20  324.256 ± 35.874  ms/op
>> Splay.test   ss   20  179.660 ± 34.120  ms/op
>> Typescript.test  ss   20   40.537 ±  2.457   s/op
>>
>> Patched:
>>
>> Benchmark  Mode  CntScoreError  Units
>> Box2D.test   ss   20 4306.198 ±376.030  ms/op
>> CodeLoad.testss   20 4881.635 ±395.585  ms/op
>> Crypto.test  ss   20  823.551 ±106.679  ms/op
>> DeltaBlue.test   ss   20  490.557 ± 41.705  ms/op
>> EarleyBoyer.test ss   20 2299.763 ±270.961  ms/op
>> Gbemu.test   ss   20 5612.868 ±414.052  ms/op
>> Mandreel.testss   20 8616.735 ±825.813  ms/op
>> NavierStokes.testss   20  640.722 ± 28.035  ms/op
>> PdfJS.test   ss   20 4139.396 ±373.580  ms/op
>> Raytrace.testss   20 1227.632 ±151.088  ms/op
>> Regexp.test  ss   20  169.246 ± 34.055  ms/op
>> Richards.testss   20  331.824 ± 32.706  ms/op
>> Splay.test   ss   20  168.479 ± 23.512  ms/op
>> Typescript.test  ss   20   31.181 ±  1.790   s/op
>>
>> The offending profile branch (Universe::flush_dependents_on) is also
>> gone, which explains the performance improvement.
>>
>> Thanks,
>> -Aleksey.
>>
>___
>mlvm-dev mailing list
>mlvm-dev@openjdk.java.net
>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-08 Thread Vladimir Ivanov

Any volunteers to review VM part?

Latest webrev:
  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/
  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/

Best regards,
Vladimir Ivanov

On 4/1/15 11:56 PM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8057967

HotSpot JITs inline very aggressively through CallSites. The
optimistically treat CallSite target as constant, but record a nmethod
dependency to invalidate the compiled code once CallSite target changes.

Right now, such dependencies have call site class as a context. This
context is too coarse and it leads to context pollution: if some
CallSite target changes, VM needs to enumerate all nmethods which
depends on call sites of such type.

As performance analysis in the bug report shows, it can sum to
significant amount of work.

While working on the fix, I investigated 3 approaches:
   (1) unique context per call site
   (2) use CallSite target class
   (3) use a class the CallSite instance is linked to

Considering call sites are ubiquitous (e.g. 10,000s on some octane
benchmarks), loading a dedicated class for every call site is an
overkill (even VM anonymous).

CallSite target class
(MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is
also not satisfactory, since it is a compiled LambdaForm VM anonymous
class, which is heavily shared. It gets context pollution down, but
still the overhead is quite high.

So, I decided to focus on (3) and ended up with a mixture of (2) & (3).

Comparing to other options, the complications of (3) are:
   - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
there should be some default context VM can use

   - CallSite instances can be shared and it shouldn't keep the context
class from unloading;

It motivated a scheme where CallSite context is initialized lazily and
can change during lifetime. When CallSite is linked with an indy
instruction, it's context is initialized. Usually, JIT sees CallSite
instances with initialized context (since it reaches them through indy),
but if it's not the case and there's no context yet, JIT sets it to
"default context", which means "use target call site".

I introduced CallSite$DependencyContext, which represents a nmethod
dependency context and points (indirectly) to a Class used as a context.

Context class is referenced through a phantom reference
(sun.misc.Cleaner to simplify cleanup). Though it's impossible to
extract referent using Reference.get(), VM can access it directly by
reading corresponding field. Unlike other types of references, phantom
references aren't cleared automatically. It allows VM to access context
class until cleanup is performed. And cleanup resets the context to
NULL, in addition to invalidating all relevant dependencies.

There are 3 context states a CallSite instance can be in:
   (1) NULL: no depedencies
   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
call site target class
   (3) DependencyContext for some class: dependencies are stored on the
class DependencyContext instance points to

Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
(3) depending on the situation).

State transitions:
   (1->3): When a CallSite w/o a context (1) is linked with some indy
call site, it's owner is recorded as a context (3).

   (1->2): When JIT needs to record a dependency on a target of a
CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and
uses target class to store the dependency.

   (3->1): When context class becomes unreachable, a cleanup hook
invalidates all dependencies on that CallSite and resets the context to
NULL (1).

Only (3->1) requires dependency invalidation, because there are no
depedencies in (1) and (2->1) isn't performed.

(1->3) is done in Java code (CallSite.initContext) and (1->2) is
performed in VM (ciCallSite::get_context()). The updates are performed
by CAS, so there's no need in additional synchronization. Other
operations on VM side are volatile (to play well with Java code) and
performed with Compile_lock held (to avoid races between VM operations).

Some statistics:
   Box2D, latest jdk9-dev
 - CallSite instances: ~22000

 - invalidated nmethods due to CallSite target changes: ~60

 - checked call_site_target_value dependencies:
   - before the fix: ~1,600,000
   - after the fix:~600

Testing:
   - dedicated test which excercises different state transitions
   - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn

Thanks!

Best regards,
Vladimir Ivanov

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-02 Thread John Rose
On Apr 2, 2015, at 3:08 PM, Vladimir Ivanov  
wrote:
> 
> Member access permission check isn't performed if caller and member owner 
> class are loaded by the same class loader (which is the case with 
> CallSite$DependencyContext and CallSite classes).

Heh!  And I thought I had compiled the reflection logic to gray matter.  — John___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-02 Thread Vladimir Ivanov

John,

Thanks for the clarification!


BTW why do you think security manager was the problem? (1)
Class.getDeclaredField() is caller-sensitive; and (2)
DependencyContext was eagerly initialized with CallSite (see
UNSAFE.ensureClassInitialized() in original version).


CallSite$DependencyContext and CallSite are distinct classes.
At the JVM level they cannot access each others' private members.
So if DependencyContext wants to reflect a private field from CallSite,
there will be extra security checks.  These sometimes fail, as in:
Member access permission check isn't performed if caller and member 
owner class are loaded by the same class loader (which is the case with 
CallSite$DependencyContext and CallSite classes).


jdk/src/java.base/share/classes/java/lang/Class.java:
@CallerSensitive
public Field getDeclaredField(String name)
throws NoSuchFieldException, SecurityException {
checkMemberAccess(Member.DECLARED, Reflection.getCallerClass(), 
true);

...
private void checkMemberAccess(int which, Class caller, boolean 
checkProxyInterfaces) {

final SecurityManager s = System.getSecurityManager();
if (s != null) {
final ClassLoader ccl = ClassLoader.getClassLoader(caller);
final ClassLoader cl = getClassLoader0();
if (which != Member.PUBLIC) {
if (ccl != cl) {

s.checkPermission(SecurityConstants.CHECK_MEMBER_ACCESS_PERMISSION);
}

Best regards,
Vladimir Ivanov
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-02 Thread John Rose
On Apr 2, 2015, at 9:17 AM, Vladimir Ivanov  
wrote:
> 
>> 
>> I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
>> For one thing, your getDeclaredField call will fail (I think) with a 
>> security manager installed.
>> You can load it up where TARGET_OFFSET is initialized.
> Since I removed DependencyContext, I moved CONTEXT_OFFSET to CallSite.
> 
> BTW why do you think security manager was the problem? (1) 
> Class.getDeclaredField() is caller-sensitive; and (2) DependencyContext was 
> eagerly initialized with CallSite (see UNSAFE.ensureClassInitialized() in 
> original version).

CallSite$DependencyContext and CallSite are distinct classes.
At the JVM level they cannot access each others' private members.
So if DependencyContext wants to reflect a private field from CallSite,
there will be extra security checks.  These sometimes fail, as in:

https://bugs.openjdk.java.net/browse/JDK-7050328

— John___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-02 Thread Vladimir Ivanov

Aleksey, thanks a lot for the performance evaluation of the fix!

Best regards,
Vladimir Ivanov

On 4/2/15 7:10 PM, Aleksey Shipilev wrote:

On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8057967


Glad to see this finally addressed, thanks!

I did not look through the code changes, but ran Octane on my
configuration. As expected, Typescript had improved substantially. Other
benchmarks are not affected much. This in line with the performance
analysis done for the original bug report.

Baseline:

Benchmark  Mode  CntScoreError  Units
Box2D.test   ss   20 4454.677 ±345.807  ms/op
CodeLoad.testss   20 4784.299 ±370.658  ms/op
Crypto.test  ss   20  878.395 ± 87.918  ms/op
DeltaBlue.test   ss   20  502.182 ± 52.362  ms/op
EarleyBoyer.test ss   20 2250.508 ±273.924  ms/op
Gbemu.test   ss   20 5893.102 ±656.036  ms/op
Mandreel.testss   20 9323.484 ±825.801  ms/op
NavierStokes.testss   20  657.608 ± 41.212  ms/op
PdfJS.test   ss   20 3829.534 ±353.702  ms/op
Raytrace.testss   20 1202.826 ±166.795  ms/op
Regexp.test  ss   20  156.782 ± 20.992  ms/op
Richards.testss   20  324.256 ± 35.874  ms/op
Splay.test   ss   20  179.660 ± 34.120  ms/op
Typescript.test  ss   20   40.537 ±  2.457   s/op

Patched:

Benchmark  Mode  CntScoreError  Units
Box2D.test   ss   20 4306.198 ±376.030  ms/op
CodeLoad.testss   20 4881.635 ±395.585  ms/op
Crypto.test  ss   20  823.551 ±106.679  ms/op
DeltaBlue.test   ss   20  490.557 ± 41.705  ms/op
EarleyBoyer.test ss   20 2299.763 ±270.961  ms/op
Gbemu.test   ss   20 5612.868 ±414.052  ms/op
Mandreel.testss   20 8616.735 ±825.813  ms/op
NavierStokes.testss   20  640.722 ± 28.035  ms/op
PdfJS.test   ss   20 4139.396 ±373.580  ms/op
Raytrace.testss   20 1227.632 ±151.088  ms/op
Regexp.test  ss   20  169.246 ± 34.055  ms/op
Richards.testss   20  331.824 ± 32.706  ms/op
Splay.test   ss   20  168.479 ± 23.512  ms/op
Typescript.test  ss   20   31.181 ±  1.790   s/op

The offending profile branch (Universe::flush_dependents_on) is also
gone, which explains the performance improvement.

Thanks,
-Aleksey.


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-02 Thread Vladimir Ivanov

John, Peter,

Thanks a lot for the feedback!

Updated webrev:
  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/
  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/


Question:  How common is state 2 (context-free CS) compared to state 3 
(indy-bound CS)?

It's quite rare (<2%). For Box2D the stats are:
   total # of call sites instantiated: 22000
   (1): ~1800 (stay uninitialized)
   (2): ~19900
   (3): ~300


And is state 2 well tested by Box2D?
No, it's not. But: (1) I wrote a focused test on different context state 
transitions (see test/compiler/jsr292/CallSiteDepContextTest.java); and 
(2) artificially stressed the logic by eagerly initializing the context 
to DEFAULT_CONTEXT.


I had (2)->(3) transition (DEF_CTX => bound Class context) at some 
point, but decided to get rid of it. IMO the price of recompilation 
(recorded dependencies should be invalidated during context migration) 
is too much for reduced number of dependencies enumerated.



I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
For one thing, your getDeclaredField call will fail (I think) with a security 
manager installed.
You can load it up where TARGET_OFFSET is initialized.

Since I removed DependencyContext, I moved CONTEXT_OFFSET to CallSite.

BTW why do you think security manager was the problem? (1) 
Class.getDeclaredField() is caller-sensitive; and (2) DependencyContext 
was eagerly initialized with CallSite (see 
UNSAFE.ensureClassInitialized() in original version).




I haven't looked at the JVM changes yet, and I don't understand the cleaner, 
yet.



Can a call site target class change as a result of LF recompiling or 
customization?
If so, won't that cause a risk of dropped dependencies?
Good point! It's definitely a problem I haven't envisioned. Ok, I 
completely removed call site target class logic and use DefaultContext 
class instead.


On 4/2/15 11:02 AM, Peter Levart wrote:> Hi Vladimir,
>
> Would it be possible for CallSite.context to hold the Cleaner instance
> itself (without indirection through DependencyContext)?
>
> DEFAULT_CONTEXT would then be a Cleaner instance that references some
> "default" Class object (for example DefaultContext.class that serves no
> other purpose).
Good idea! I eliminated the indirection as you suggest.

Best regards,
Vladimir Ivanov
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-02 Thread Aleksey Shipilev
On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967

Glad to see this finally addressed, thanks!

I did not look through the code changes, but ran Octane on my
configuration. As expected, Typescript had improved substantially. Other
benchmarks are not affected much. This in line with the performance
analysis done for the original bug report.

Baseline:

Benchmark  Mode  CntScoreError  Units
Box2D.test   ss   20 4454.677 ±345.807  ms/op
CodeLoad.testss   20 4784.299 ±370.658  ms/op
Crypto.test  ss   20  878.395 ± 87.918  ms/op
DeltaBlue.test   ss   20  502.182 ± 52.362  ms/op
EarleyBoyer.test ss   20 2250.508 ±273.924  ms/op
Gbemu.test   ss   20 5893.102 ±656.036  ms/op
Mandreel.testss   20 9323.484 ±825.801  ms/op
NavierStokes.testss   20  657.608 ± 41.212  ms/op
PdfJS.test   ss   20 3829.534 ±353.702  ms/op
Raytrace.testss   20 1202.826 ±166.795  ms/op
Regexp.test  ss   20  156.782 ± 20.992  ms/op
Richards.testss   20  324.256 ± 35.874  ms/op
Splay.test   ss   20  179.660 ± 34.120  ms/op
Typescript.test  ss   20   40.537 ±  2.457   s/op

Patched:

Benchmark  Mode  CntScoreError  Units
Box2D.test   ss   20 4306.198 ±376.030  ms/op
CodeLoad.testss   20 4881.635 ±395.585  ms/op
Crypto.test  ss   20  823.551 ±106.679  ms/op
DeltaBlue.test   ss   20  490.557 ± 41.705  ms/op
EarleyBoyer.test ss   20 2299.763 ±270.961  ms/op
Gbemu.test   ss   20 5612.868 ±414.052  ms/op
Mandreel.testss   20 8616.735 ±825.813  ms/op
NavierStokes.testss   20  640.722 ± 28.035  ms/op
PdfJS.test   ss   20 4139.396 ±373.580  ms/op
Raytrace.testss   20 1227.632 ±151.088  ms/op
Regexp.test  ss   20  169.246 ± 34.055  ms/op
Richards.testss   20  331.824 ± 32.706  ms/op
Splay.test   ss   20  168.479 ± 23.512  ms/op
Typescript.test  ss   20   31.181 ±  1.790   s/op

The offending profile branch (Universe::flush_dependents_on) is also
gone, which explains the performance improvement.

Thanks,
-Aleksey.



signature.asc
Description: OpenPGP digital signature
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-02 Thread Peter Levart

Hi Vladimir,

Would it be possible for CallSite.context to hold the Cleaner instance 
itself (without indirection through DependencyContext)?


DEFAULT_CONTEXT would then be a Cleaner instance that references some 
"default" Class object (for example DefaultContext.class that serves no 
other purpose).


Regards, Peter

On 04/01/2015 10:56 PM, Vladimir Ivanov wrote:

http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8057967

HotSpot JITs inline very aggressively through CallSites. The 
optimistically treat CallSite target as constant, but record a nmethod 
dependency to invalidate the compiled code once CallSite target changes.


Right now, such dependencies have call site class as a context. This 
context is too coarse and it leads to context pollution: if some 
CallSite target changes, VM needs to enumerate all nmethods which 
depends on call sites of such type.


As performance analysis in the bug report shows, it can sum to 
significant amount of work.


While working on the fix, I investigated 3 approaches:
  (1) unique context per call site
  (2) use CallSite target class
  (3) use a class the CallSite instance is linked to

Considering call sites are ubiquitous (e.g. 10,000s on some octane 
benchmarks), loading a dedicated class for every call site is an 
overkill (even VM anonymous).


CallSite target class 
(MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is 
also not satisfactory, since it is a compiled LambdaForm VM anonymous 
class, which is heavily shared. It gets context pollution down, but 
still the overhead is quite high.


So, I decided to focus on (3) and ended up with a mixture of (2) & (3).

Comparing to other options, the complications of (3) are:
  - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so 
there should be some default context VM can use


  - CallSite instances can be shared and it shouldn't keep the context 
class from unloading;


It motivated a scheme where CallSite context is initialized lazily and 
can change during lifetime. When CallSite is linked with an indy 
instruction, it's context is initialized. Usually, JIT sees CallSite 
instances with initialized context (since it reaches them through 
indy), but if it's not the case and there's no context yet, JIT sets 
it to "default context", which means "use target call site".


I introduced CallSite$DependencyContext, which represents a nmethod 
dependency context and points (indirectly) to a Class used as a 
context.


Context class is referenced through a phantom reference 
(sun.misc.Cleaner to simplify cleanup). Though it's impossible to 
extract referent using Reference.get(), VM can access it directly by 
reading corresponding field. Unlike other types of references, phantom 
references aren't cleared automatically. It allows VM to access 
context class until cleanup is performed. And cleanup resets the 
context to NULL, in addition to invalidating all relevant dependencies.


There are 3 context states a CallSite instance can be in:
  (1) NULL: no depedencies
  (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in 
call site target class
  (3) DependencyContext for some class: dependencies are stored on the 
class DependencyContext instance points to


Every CallSite starts w/o a context (1) and then lazily gets one ((2) 
or (3) depending on the situation).


State transitions:
  (1->3): When a CallSite w/o a context (1) is linked with some indy 
call site, it's owner is recorded as a context (3).


  (1->2): When JIT needs to record a dependency on a target of a 
CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and 
uses target class to store the dependency.


  (3->1): When context class becomes unreachable, a cleanup hook 
invalidates all dependencies on that CallSite and resets the context 
to NULL (1).


Only (3->1) requires dependency invalidation, because there are no 
depedencies in (1) and (2->1) isn't performed.


(1->3) is done in Java code (CallSite.initContext) and (1->2) is 
performed in VM (ciCallSite::get_context()). The updates are performed 
by CAS, so there's no need in additional synchronization. Other 
operations on VM side are volatile (to play well with Java code) and 
performed with Compile_lock held (to avoid races between VM operations).


Some statistics:
  Box2D, latest jdk9-dev
- CallSite instances: ~22000

- invalidated nmethods due to CallSite target changes: ~60

- checked call_site_target_value dependencies:
  - before the fix: ~1,600,000
  - after the fix:~600

Testing:
  - dedicated test which excercises different state transitions
  - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn

Thanks!

Best regards,
Vladimir Ivanov


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlv

Re: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-01 Thread John Rose
On Apr 1, 2015, at 1:56 PM, Vladimir Ivanov  
wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967

Impressive work.

Question:  How common is state 2 (context-free CS) compared to state 3 
(indy-bound CS)?

And is state 2 well tested by Box2D?

I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
For one thing, your getDeclaredField call will fail (I think) with a security 
manager installed.
You can load it up where TARGET_OFFSET is initialized.

I haven't looked at the JVM changes yet, and I don't understand the cleaner, 
yet.

Can a call site target class change as a result of LF recompiling or 
customization?
If so, won't that cause a risk of dropped dependencies?

— John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


[9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly

2015-04-01 Thread Vladimir Ivanov

http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8057967

HotSpot JITs inline very aggressively through CallSites. The 
optimistically treat CallSite target as constant, but record a nmethod 
dependency to invalidate the compiled code once CallSite target changes.


Right now, such dependencies have call site class as a context. This 
context is too coarse and it leads to context pollution: if some 
CallSite target changes, VM needs to enumerate all nmethods which 
depends on call sites of such type.


As performance analysis in the bug report shows, it can sum to 
significant amount of work.


While working on the fix, I investigated 3 approaches:
  (1) unique context per call site
  (2) use CallSite target class
  (3) use a class the CallSite instance is linked to

Considering call sites are ubiquitous (e.g. 10,000s on some octane 
benchmarks), loading a dedicated class for every call site is an 
overkill (even VM anonymous).


CallSite target class 
(MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is 
also not satisfactory, since it is a compiled LambdaForm VM anonymous 
class, which is heavily shared. It gets context pollution down, but 
still the overhead is quite high.


So, I decided to focus on (3) and ended up with a mixture of (2) & (3).

Comparing to other options, the complications of (3) are:
  - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so 
there should be some default context VM can use


  - CallSite instances can be shared and it shouldn't keep the context 
class from unloading;


It motivated a scheme where CallSite context is initialized lazily and 
can change during lifetime. When CallSite is linked with an indy 
instruction, it's context is initialized. Usually, JIT sees CallSite 
instances with initialized context (since it reaches them through indy), 
but if it's not the case and there's no context yet, JIT sets it to 
"default context", which means "use target call site".


I introduced CallSite$DependencyContext, which represents a nmethod 
dependency context and points (indirectly) to a Class used as a context.


Context class is referenced through a phantom reference 
(sun.misc.Cleaner to simplify cleanup). Though it's impossible to 
extract referent using Reference.get(), VM can access it directly by 
reading corresponding field. Unlike other types of references, phantom 
references aren't cleared automatically. It allows VM to access context 
class until cleanup is performed. And cleanup resets the context to 
NULL, in addition to invalidating all relevant dependencies.


There are 3 context states a CallSite instance can be in:
  (1) NULL: no depedencies
  (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in 
call site target class
  (3) DependencyContext for some class: dependencies are stored on the 
class DependencyContext instance points to


Every CallSite starts w/o a context (1) and then lazily gets one ((2) or 
(3) depending on the situation).


State transitions:
  (1->3): When a CallSite w/o a context (1) is linked with some indy 
call site, it's owner is recorded as a context (3).


  (1->2): When JIT needs to record a dependency on a target of a 
CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and 
uses target class to store the dependency.


  (3->1): When context class becomes unreachable, a cleanup hook 
invalidates all dependencies on that CallSite and resets the context to 
NULL (1).


Only (3->1) requires dependency invalidation, because there are no 
depedencies in (1) and (2->1) isn't performed.


(1->3) is done in Java code (CallSite.initContext) and (1->2) is 
performed in VM (ciCallSite::get_context()). The updates are performed 
by CAS, so there's no need in additional synchronization. Other 
operations on VM side are volatile (to play well with Java code) and 
performed with Compile_lock held (to avoid races between VM operations).


Some statistics:
  Box2D, latest jdk9-dev
- CallSite instances: ~22000

- invalidated nmethods due to CallSite target changes: ~60

- checked call_site_target_value dependencies:
  - before the fix: ~1,600,000
  - after the fix:~600

Testing:
  - dedicated test which excercises different state transitions
  - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn

Thanks!

Best regards,
Vladimir Ivanov
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev