Coordinated Restore at Checkpoint: A new project for start-up optimization?

2020-09-10 Thread Gil Tene
Hello,

We would like to open a discussion about a new project focused on
"Coordinated Restore at Checkpoint".

A possibly relevant project name might be Tubthumping [9].

Over the years, we [at Azul] have tinkered with various ways to improve Java
start-up time and warmup behavior for the different use cases that call for
such improvements. One of the interesting focus areas has been the "starting
of a new instance" of an application that has already run instances using
identical code, a similar expected profile, and potentially a similar
initialization sequence in the past. This is a common scenario in modern
application deployments, e.g. when rolling out new code in continuous
deployment environments, or when elastically changing instance counts in
auto-scaling situations.

Checkpoint/Restore technologies have evolved over the past few years and are
available in multiple forms, including e.g. CRIU [1] and Docker Checkpoint &
Restore [2]. While Checkpoint/Restore capabilities have been shown to work
across a wide range of applications, e.g. for live process or application
migration, there are various challenges to applying them generically to new
instance deployment. Many of these challenges stem from the need to deal with
checkpointed state that may not be validly reproducible when restoring
multiple instances from the same checkpoint image.

This is where Coordinated Restore at Checkpoint (CRaC) comes in. At a high
level, CRaC aims to systematically address these challenges by facilitating
explicit and intentional coordination between checkpointed applications and
a checkpointing mechanism. Such coordination will allow applications to
proactively discard problematic state ahead of checkpointing and to
reestablish needed state upon restoration [e.g. closing open file
descriptors ahead of a checkpoint, and recreating and binding them after a
restore].
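To make the coordination idea concrete, here is a minimal Java sketch of the pattern described above: components register before-checkpoint and after-restore callbacks with a coordinator, release state before the checkpoint, and re-create it after restore. Every name here (`CheckpointAware`, `Coordinator`, `ListenerComponent`) is hypothetical and invented for illustration; this is not the CRaC API being proposed, and a boolean flag stands in for a real file descriptor so the sketch runs without opening sockets.

```java
import java.util.ArrayList;
import java.util.List;

public class CracCoordinationSketch {

    /** HYPOTHETICAL callback interface for illustration only; not the actual CRaC API. */
    interface CheckpointAware {
        void beforeCheckpoint(); // discard state that must not be captured in the image
        void afterRestore();     // re-establish that state in the restored instance
    }

    /** HYPOTHETICAL registry that some checkpoint/restore mechanism would drive. */
    static final class Coordinator {
        private final List<CheckpointAware> participants = new ArrayList<>();

        void register(CheckpointAware participant) {
            participants.add(participant);
        }

        void notifyBeforeCheckpoint() {
            // Notify in reverse registration order, so later components release first.
            for (int i = participants.size() - 1; i >= 0; i--) {
                participants.get(i).beforeCheckpoint();
            }
        }

        void notifyAfterRestore() {
            // Notify in registration order, so earlier components are restored first.
            for (CheckpointAware participant : participants) {
                participant.afterRestore();
            }
        }
    }

    /** Models a component owning a listening socket; a boolean stands in for the descriptor. */
    static final class ListenerComponent implements CheckpointAware {
        private final int port;
        private boolean open = true;

        ListenerComponent(int port) {
            this.port = port;
        }

        boolean isOpen() {
            return open;
        }

        @Override
        public void beforeCheckpoint() {
            open = false; // real code would close the socket's file descriptor here
        }

        @Override
        public void afterRestore() {
            open = true; // real code would re-create the socket and re-bind it to 'port' here
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        ListenerComponent listener = new ListenerComponent(8080);
        coordinator.register(listener);

        coordinator.notifyBeforeCheckpoint();
        System.out.println("open during checkpoint: " + listener.isOpen()); // false
        coordinator.notifyAfterRestore();
        System.out.println("open after restore: " + listener.isOpen());     // true
    }
}
```

The framework code only talks to the coordinator, which is what would let the same component code work with whatever mechanism actually takes and restores the checkpoint.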

Coordination is a powerful enabler in this space. In contrast to approaches
that attempt transparent, uncoordinated checkpoint/restore, CRaC's approach to
date has focused on helping detect situations that would prevent a successful
checkpoint, and simply refusing to checkpoint if such conditions are
identified. This approach leaves it up to application frameworks and the
applications themselves to remedy the situation during development, before
attempting actual deployment (or to simply accept non-CRaC startup times, since
a restorable checkpoint state will not be available).
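A similarly hedged sketch of the "detect and refuse" approach: a hypothetical tracker records resources that would make a checkpoint image non-reproducible and refuses to checkpoint while any remain open. The class names and the string-based resource identifiers are assumptions made for the example, not part of any proposed API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class CheckpointRefusalSketch {

    /** HYPOTHETICAL exception signalling that a checkpoint was refused. */
    static final class CheckpointRefusedException extends RuntimeException {
        CheckpointRefusedException(String message) {
            super(message);
        }
    }

    /** HYPOTHETICAL tracker of resources that would make a checkpoint image unsafe to restore. */
    static final class OpenResourceTracker {
        private final Set<String> open = new LinkedHashSet<>();

        void opened(String resource) {
            open.add(resource);
        }

        void closed(String resource) {
            open.remove(resource);
        }

        Set<String> stillOpen() {
            return Collections.unmodifiableSet(open);
        }

        /** Runs the checkpoint only if nothing problematic is open; otherwise refuses. */
        void checkpointOrRefuse(Runnable takeCheckpoint) {
            if (!open.isEmpty()) {
                throw new CheckpointRefusedException("refusing to checkpoint, still open: " + open);
            }
            takeCheckpoint.run();
        }
    }

    public static void main(String[] args) {
        OpenResourceTracker tracker = new OpenResourceTracker();
        tracker.opened("socket:8080");
        try {
            tracker.checkpointOrRefuse(() -> System.out.println("checkpoint taken"));
        } catch (CheckpointRefusedException e) {
            System.out.println(e.getMessage()); // the developer must fix this before deployment
        }
        tracker.closed("socket:8080");
        tracker.checkpointOrRefuse(() -> System.out.println("checkpoint taken"));
    }
}
```

Failing loudly here, rather than checkpointing anyway, is what pushes the remedy into development time.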

In the Java arena, we aim to create a generic CRaC API that would allow
applications and/or application frameworks to coordinate with an arbitrary
checkpoint/restore mechanism, without being tied to a specific
implementation or to the operational means by which checkpointing and
restoration are achieved. Such an API would allow application frameworks
(e.g. Tomcat, Quarkus, Micronaut, etc.) to perform the needed coordination
in a portable way, without requiring code that is specific to a particular
checkpoint/restore mechanism. E.g. the same Tomcat CRaC coordination code
would be able to properly coordinate with a generic Linux CRIU utility, with
Docker Checkpoint & Restore, or with future OpenJDK implementations that may
support checkpoint/restore functionality directly or via the use of
libraries or system services.

Our hope is to start a project that will focus on specifying a CRaC API and
will provide at least one CRaC-supporting checkpoint/restore OpenJDK
implementation, with an eye toward eventual upstream inclusion in a future
OpenJDK version via associated JEPs. We would potentially want to include
the API in a future Java SE specification as well.

In reality, we expect that more than one checkpoint/restore mechanism may be
supported, as we have already identified at least two probable modes of
operation that would be useful for OpenJDK:

- We have prototyped [3] a JDK-driven checkpoint/restore implementation,
  based on a modified CRIU [4], that leverages on-demand paging during
  startup to deliver very promising start times for e.g. microservices
  running on Quarkus, Micronaut, and Tomcat, reaching a "full speed"
  condition in sub-50-msec times [5].

- We anticipate that external-to-the-JDK checkpoint/restore implementations,
  such as Docker Checkpoint & Restore [2] and possible support within
  orchestration frameworks (such as future Kubernetes versions), will drive
  a need for non-Java-specific means of coordinating restoration from
  checkpointed conditions, and that in such environments JDKs will likely
  wish to provide external controls (such as jcmd or other APIs) that would
  handle coordination but leave the actual checkpointing and restore work to
  external entities.


Below are short summaries of:
- CRaC API concepts
- What a prototype OpenJDK implementation looks like
- Preliminary uses of the CRaC API in some application frameworks
- Some promising preliminary results


What do you think? Please chime in.

Re: RFR: 8188055: (ref) Add Reference.refersTo predicate

2020-04-08 Thread Gil Tene
Erik,

The fact that you have access to the objects involved (and to their contents)
does not mean you already have access to the new information revealed by being
able to check whether a phantom reference refers to some specific object.
Knowing "who uses what thing" is a lot more than just knowing "who exists" and
"what are the things that exist"… Many security mechanisms leverage the
fundamental difference between those sets of knowledge, and that's why we e.g.
fear the effect that the emergence of quantum computing will likely have on
existing TLS ciphers...

I could probably come up with a "reasonable code" example that would make
security or correctness assumptions based on the currently specified opaqueness
of phantom references, and against which one could then write an exploit if we
change the specified behavior.

E.g. it would currently be reasonable to use phantom references across APIs as
weak and opaque identity handles (as opposed to WeakReference, which would be a
weak but non-opaque handle), and to build security or correctness assumptions
on that presumed opaqueness. We could go through an exercise of actually
building one of these to prove a point.

But we don't need an actual example exploit or a proof that the change can lead
to security or correctness issues. We just need enough of a worry that such
issues can arise due to the change. What I'm pointing to is the worry, and I'm
suggesting that the change in semantics is not necessary.

I could phrase the issue in reverse: "What are examples where being able to
determine if a phantom reference refers to a specific object is useful?" I have
a feeling that at least some of those examples would also provide us with the
exploit examples you ask for ;-)

— Gil.

> On Apr 8, 2020, at 9:13 AM, Erik Österlund  wrote:
> 
> Hi Gil,
> 
> Do you have an example exploit, or at least the gist of it? As I already 
> said, any information exposed could have been just guessed (replace refersTo 
> with random() and brute force). So if you can create an exploit based on the 
> answer of refersTo, then your system is secure by chance. In other words, it 
> is already compromised. Or have I missed something?
> 
> Thanks,
> /Erik
> 

Re: RFR: 8188055: (ref) Add Reference.refersTo predicate

2020-04-08 Thread Gil Tene
Lifting out my response from the JIRA issue:

I always worry when proposing a change to an existing invariant, and
PhantomReference currently carries the stated and specified behavior
of "the referent of a phantom reference is always inaccessible".

I can imagine quite a few forms of gaining new information I do not otherwise
have access to by using PhantomReference::refersTo if it allowed me to examine
the current referent of a phantom reference and test whether it is (a) null or
(b) a specific object I have a reference to. Both of those would provide me
with information that is impossible for me to get according to current
specifications. With that newly available information one can come up with all
sorts of nice things to do... Think in terms of "side-channel" as an example of
the sort of thinking black hats can apply to this new knowledge, but the
potential attacks are not limited to side-channels.

While it will be "obviously safe" to have Reference::refersTo(obj) provide the
same information that (Reference.get() == obj) would, providing more information
than that would be a change to the specified behavior of Reference types, which
we should be extra paranoid about. Since PhantomReference::get returns null by
definition, we should remain consistent with that in PhantomReference::refersTo.
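The information gap being argued about can be shown in a few lines. The sketch below assumes JDK 16 or later, where `Reference.refersTo` is available and, for a `PhantomReference`, compares against the actual referent even though `get()` always returns `null` (this is the behavior that eventually shipped, not the restricted semantics proposed in this message). The `probe` helper is hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomRefersToDemo {

    /**
     * Returns {get() == referent, refersTo(referent), refersTo(other), refersTo(null)}
     * for a fresh phantom reference to 'referent'.
     */
    static boolean[] probe(Object referent, Object other) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        boolean[] results = {
            phantom.get() == referent,   // get() on a phantom reference always returns null
            phantom.refersTo(referent),  // compares against the actual referent (JDK 16+)
            phantom.refersTo(other),
            phantom.refersTo(null)       // true only once the reference has been cleared
        };
        // Keep 'referent' strongly reachable until all probes are done.
        Reference.reachabilityFence(referent);
        return results;
    }

    public static void main(String[] args) {
        Object referent = new Object();
        boolean[] r = probe(referent, new Object());
        System.out.println("get() == referent:  " + r[0]); // false
        System.out.println("refersTo(referent): " + r[1]); // true
        System.out.println("refersTo(other):    " + r[2]); // false
        System.out.println("refersTo(null):     " + r[3]); // false
    }
}
```

The first two results differ, which is precisely the extra information that comparing via get() can never reveal for a phantom reference.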

> On Apr 8, 2020, at 7:56 AM, Erik Österlund  wrote:
> 
> Hi Gil,
> 
> Lifting out my reply to you from the JIRA issue:
> 
> In terms of breaking existing logic, I am not worried. This is a new API
> that nobody is using yet. People who write new code that uses it will have
> to pay attention that they are doing the right thing. We are still not 
> exposing the phantom referent with this change. In terms of security, you can 
> only use this API to figure out what the referent is, if you already have 
> access to it. So that isn't really helpful for building exploits. What it 
> could do is allow you to check which one of N objects that you already have 
> access to is the one referred to from the PhantomReference. But in terms of 
> security, you could also have just guessed that without this API, as you 
> already have full access to the objects. Sounds like a classic case of "I 
> have an exploit. Given a compromised system... X". Or have I missed something?
> 
> Thanks,
> /Erik
> 



Re: RFR: 8188055: (ref) Add Reference.refersTo predicate

2020-04-08 Thread Gil Tene
A very welcome change overall. However, I have concerns about
the semantic change to the PhantomReference specification. I propose
that PhantomReference semantics remain unchanged, and that
PhantomReference::refersTo should return true only for null.

See more in comment at 
https://bugs.openjdk.java.net/browse/JDK-8188055?focusedCommentId=14329319=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14329319

> On Apr 7, 2020, at 5:25 PM, Kim Barrett  wrote:
> 
> [Note review on both core-libs and hotspot-gc-dev lists; try not to lose
> either when replying.]
> 
> Please review a new function: java.lang.ref.Reference.refersTo.
> 
> This function is needed to test the referent of a Reference object
> without artificially extending the lifetime of the referent object, as
> may happen when calling Reference.get.  Some garbage collectors
> require extending the lifetime of a weak referent when accessed, in
> order to maintain collector invariants.  Lifetime extension may occur
> with any collector when the Reference is a SoftReference, as calling
> get indicates recent access.  This new function also allows testing
> the referent of a PhantomReference, which can't be accessed by calling
> get.
> 
> The new function uses a native method whose implementation is in the
> VM so it can use the Access API.  It is the intent that this function
> will be intrinsified by optimizing compilers like C2 or graal, but
> that hasn't been implemented yet.  Bear that in mind before rushing
> off to change existing uses of Reference.get.
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8188055
> https://bugs.openjdk.java.net/browse/JDK-8241029 (CSR)
> 
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8188055/open.04/
> 
> Testing:
> mach5 tier1
> 
> Locally (linux-x64) verified the new test passes with various garbage
> collectors.
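As a sketch of the use case described in the review request, the hypothetical helpers below use `refersTo` (JDK 16 or later assumed) to purge cleared entries and to locate a given referent in a list of weak references without ever calling `get()`, so no referent's lifetime is extended. `clear()` stands in for the collector clearing a referent so that the example behaves deterministically.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class RefersToPurgeSketch {

    /** Drops cleared references without calling get(), so no referent's lifetime is extended. */
    static <T> int purgeCleared(List<WeakReference<T>> refs) {
        int before = refs.size();
        refs.removeIf(ref -> ref.refersTo(null)); // true once the referent has been cleared
        return before - refs.size();
    }

    /** Index of the reference whose referent is 'target', or -1; again without get(). */
    static <T> int indexOf(List<WeakReference<T>> refs, T target) {
        for (int i = 0; i < refs.size(); i++) {
            if (refs.get(i).refersTo(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Object first = new Object();
        Object second = new Object();
        List<WeakReference<Object>> refs = new ArrayList<>();
        refs.add(new WeakReference<>(first));
        refs.add(new WeakReference<>(second));

        refs.get(0).clear(); // stand-in for the collector clearing the first referent
        System.out.println("purged: " + purgeCleared(refs));   // purged: 1
        System.out.println("index: " + indexOf(refs, second)); // index: 0

        // Keep both referents strongly reachable until after the checks above.
        Reference.reachabilityFence(first);
        Reference.reachabilityFence(second);
    }
}
```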



Re: RFR 9: 8165641 : Deprecate Object.finalize

2017-04-15 Thread Gil Tene
IMO, so long as the JDK's own performance-sensitive methods do not follow this 
advice, it will be hard to "educate" people to use reachabilityFence 
responsibly.

To be specific:

As long as java.nio.DirectByteBuffer.get() is effectively coded like this (after
the templates are applied):

public byte get() {
    return unsafe.getByte(ix(nextGetIndex()));
}

Instead of like this:

public byte get() {
    try {
        return unsafe.getByte(ix(nextGetIndex()));
    } finally {
        Reference.reachabilityFence(this);
    }
}

People will copy it and extend it, replicating the reachability bug and the
use-after-free problem in the current code.
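For readers who want to try the idiom outside the JDK, here is a minimal self-contained sketch (JDK 9 or later assumed, for `Cleaner` and `Reference.reachabilityFence`). The class `FencedBuffer` is hypothetical, and a heap array that the cleanup action drops stands in for the off-heap memory a real `DirectByteBuffer` frees; each accessor wraps its body in the try/finally `reachabilityFence(this)` form shown above.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

public class FencedBuffer {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Stand-in for off-heap memory; the cleanup action "frees" it by dropping the array. */
    private static final class Memory implements Runnable {
        volatile byte[] bytes;

        Memory(int size) {
            bytes = new byte[size];
        }

        @Override
        public void run() {
            bytes = null; // must not refer back to the FencedBuffer, or it would never be cleaned
        }
    }

    private final Memory memory;

    public FencedBuffer(int size) {
        memory = new Memory(size);
        CLEANER.register(this, memory);
    }

    public byte get(int index) {
        try {
            // Without the fence, 'this' may become unreachable once 'memory' has been loaded,
            // so the cleanup could run before 'bytes' is read (the analogue of use-after-free).
            return memory.bytes[index];
        } finally {
            Reference.reachabilityFence(this);
        }
    }

    public void put(int index, byte value) {
        try {
            memory.bytes[index] = value;
        } finally {
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        FencedBuffer buffer = new FencedBuffer(16);
        buffer.put(3, (byte) 42);
        System.out.println(buffer.get(3)); // 42
    }
}
```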

It is hard to ask people to code "responsibly" when the JDK itself doesn't.
This is especially true when the bug in the above code is not obvious, and (as
Hans points out earlier) it takes a long time to convince people it actually
exists. It is made harder still by the fact that the "well, when was the last
time someone complained about DirectByteBuffer.get() getting a SEGV on access
to a freed buffer?" argument holds some water. And that's usually where you
actually end up when discussing the above missing reachabilityFence…

A few things are keeping the JDK code from evolving to properly use 
Reference.reachabilityFence(this):

a. It didn't exist until recently. [That's a very good excuse]

b. While Reference.reachabilityFence exists now, its implementation is
sub-optimal. IIUC it is [currently] modeled as an opaque method call, which
prohibits enough useful optimizations to significantly hurt
DirectByteBuffer.get() if it were actually included in the code right now.

c. As Hans notes below, the compilers may temporarily dodge this issue by 
inhibiting dead variable elimination for references. They do so to allow the 
actual (semantically buggy) current JDK code to execute safely and performantly.

As long as the compilers paper over this by temporarily dodging the issue and 
inhibiting dead variable elimination for references, we are giving people a 
continued excuse to ignore the need for adding reachabilityFences to their 
code. And this situation will probably continue (for practical purposes) at 
least as long as using reachabilityFence in the above common form is not 
optimized in practice to be equivalent to just inhibiting dead variable 
elimination of (this) across the same scope [and not just implemented as a much 
more expensive opaque method effect].

This makes the subject somewhat premature for wide public education IMO. A 
practical set of steps might look like this:

0. Add Reference.reachabilityFence() to the platform [this is basically the 
existing "collect underpants" step]

1. Make try { } finally { Reference.reachabilityFence(this); } optimize well
(to be no more expensive than inhibiting dead variable elimination for 'this'
across the try block).

2. Change JDK code to actually make use of this code idiom in the various 
places where it is actually needed for correctness.

3. Relax the compiler limitation crutches that currently keep the correctness
bugs from triggering often. But do so first with a non-default -XX flag, and
later transition to the default with a backout flag (since a lot of user code
will start breaking when the crutch is taken away).

4. Educate and evangelize for proper use of Reference.reachabilityFence(this); 
(maybe even add a method annotation for people who like that sort of thing? 
E.g. @AlwaysReachableThis)

5. Profit


> On Apr 1, 2017, at 10:19 AM, Hans Boehm  wrote:
> 
> That also sounds fine to me.
> 
> The difficulty with applying this, especially in the JNI case, is that you
> may need several reachability fences at the end of each method, guarding
> parameters and temporaries, not all of which may be naturally in scope at
> the end. But we're not going to be able to include a full discussion here.
> 
> In the interest of full disclosure, we currently, at least temporarily,
> dodge this issue by inhibiting dead variable elimination for references.
> 
> 
> On Apr 1, 2017 01:55, "Andrew Haley"  wrote:
> 
> On 31/03/17 19:56, Hans Boehm wrote:
>> This method should be used when cleanup actions (finalize() calls, or
>> Reference enqueuing, Cleaner invocation) could otherwise be triggered while
>> a resource under control of the object is still in use. This method should
>> normally be called when these cleanup facilities are used to perform
>> actions other than simply issuing a warning.
> 
> It's still pretty confusing.  It would be simpler to say that
> reachabilityFence should be used at the end of every method of a class
> which has cleanup actions if you don't want those cleanup actions to
> be run before the end of that method.  In that case, reachabilityFence
> only makes any difference in cases where otherwise you'd have a bug.
> 
> Andrew.
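Hans's point about needing several fences, for parameters and temporaries and not just `this`, can be sketched as follows. This is a hypothetical example (JDK 9 or later assumed): `Block` objects are only touched through a numeric handle, the way JNI code often holds native pointers, and a map from handles to arrays stands in for native memory that the `Cleaner` frees. Once `copy` has read both handles, neither `Block` is otherwise used, so both parameters need a fence.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class HandleFenceSketch {
    private static final Cleaner CLEANER = Cleaner.create();
    // Stand-in for native memory: handles map to storage, and cleanup removes the mapping.
    private static final Map<Long, byte[]> NATIVE = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_HANDLE = new AtomicLong(1);

    static final class Block {
        final long handle;

        Block(int size) {
            long h = NEXT_HANDLE.getAndIncrement();
            NATIVE.put(h, new byte[size]);
            handle = h;
            // The cleanup action captures only the handle, never 'this'.
            CLEANER.register(this, () -> NATIVE.remove(h));
        }
    }

    /** Copies n bytes between blocks that are used only via their handles. */
    static void copy(Block src, Block dst, int n) {
        long s = src.handle;
        long d = dst.handle;
        try {
            System.arraycopy(NATIVE.get(s), 0, NATIVE.get(d), 0, n);
        } finally {
            // After the handles are read, neither parameter is used again, so both need a fence.
            Reference.reachabilityFence(src);
            Reference.reachabilityFence(dst);
        }
    }

    static byte read(Block block, int index) {
        try {
            return NATIVE.get(block.handle)[index];
        } finally {
            Reference.reachabilityFence(block);
        }
    }

    static void write(Block block, int index, byte value) {
        try {
            NATIVE.get(block.handle)[index] = value;
        } finally {
            Reference.reachabilityFence(block);
        }
    }

    public static void main(String[] args) {
        Block a = new Block(8);
        Block b = new Block(8);
        write(a, 2, (byte) 9);
        copy(a, b, 8);
        System.out.println(read(b, 2)); // 9
    }
}
```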



Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

2016-02-23 Thread Gil Tene
onXXX is VERY relevant to this situation. It's exactly that precedent and
common convention for event notification calls that drove the choice of
onSpinWait as a method name. onSpinWait() is delivering a notification [from
the calling code to the Runtime, or the Thread] about what is going on: a spin
wait is in progress.

The fundamental difference between onSpinWait() and Thread.yield() is that 
yield() means "please yield here", while onSpinWait means "I am in the middle 
of busily executing a spin wait that I need no help with, and certainly don't 
want to be blocking in. But you might want to know about that and do something 
with that knowledge. Preferably something that improves my performance." 
Basically, onSpinWait() is delivering a notification of a situation, and the 
runtime has a chance to react to that.

Dropping the "on" would completely change the meaning. Something called 
"spinWait" *is* like yield(). It would/should mean "please spin wait here". But 
that's not what spin wait constructs need or want in a call. Unlike yield(), 
spin wait constructs use their own logic for spinning, and usually need to 
retain control over the logic. They just need to let someone know that spinning 
is going on, in the hope that the performance of the spin-wait construct (with
whatever logic it uses) might be improved.
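A minimal sketch of the kind of spin-wait construct described above, assuming JDK 9 or later where `Thread.onSpinWait()` exists: the loop keeps full control of its own spinning logic and merely notifies the runtime on each iteration. `spinUntilSet` is a hypothetical helper, and the iteration count it returns is only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitSketch {

    /**
     * Busy-waits until 'flag' becomes true and returns how many iterations were spun.
     * The loop owns the spinning logic; onSpinWait() only tells the runtime a spin wait
     * is in progress.
     */
    static long spinUntilSet(AtomicBoolean flag) {
        long spins = 0;
        while (!flag.get()) {
            Thread.onSpinWait(); // the runtime may react (e.g. an x86 PAUSE) or do nothing
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> ready.set(true));
        setter.start();
        long spins = spinUntilSet(ready);
        setter.join();
        System.out.println("observed ready after " + spins + " spins");
    }
}
```

Note that nothing in the loop asks the runtime to wait or yield on its behalf; removing the call leaves the logic unchanged, which is the sense in which it is a notification rather than an action.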

Note that the runtime reactions to being notified the calling code is in the 
middle of a spin wait could range from doing nothing, to doing something with 
instruction execution (e.g. execute an x86 pause instruction), to doing 
something more sophisticated (e.g. trying to dedicate a core to the spinning
thread and steer all other threads and interrupts away from it to improve its
reaction time stats). What will actually happen is up to the runtime, and
intentionally not specified. Per the JavaDoc, if the runtime does take action 
when it receives the onSpinWait() event, the action is intended "to improve the 
performance of invoking spin-wait loop constructions".

You can hopefully/probably see why Runtime was a logical choice as the receiver 
of the event. Since Thread is also acceptable and defensible as the receiver of 
the event, I'm happy to go with it if it gets us over this name-can-of-worms 
thing. But please please please don't let us go into a whole method name redo 
again. We've done that on and off for 3+ months, and what we have is both 
logical and "not unacceptable": onSpinWait().

Let's just go with it.

Thread.onSpinWait() FTW!

> On Feb 23, 2016, at 8:03 PM, Vitaly Davidovich <vita...@gmail.com> wrote:
> 
> Yes, I was going to mention onXXX being common for event handler names (e.g.
> onMouseClick), but didn't bother for the same reason you mentioned - it's
> irrelevant to this situation.
> 
> On Tuesday, February 23, 2016, Paul Benedict <pbened...@apache.org> wrote:
> 
>> The onXXX prefix does have precedent as event handler callbacks, but this
>> method does not fit that purpose. Thus, I agree dropping "on" is sensible.
>> On Feb 23, 2016 8:48 PM, "Vitaly Davidovich" <vita...@gmail.com> wrote:
>> 
>>> On Tuesday, February 23, 2016, Doug Lea <d...@cs.oswego.edu> wrote:
>>> 
>>>> On 02/23/2016 04:30 PM, Vitaly Davidovich wrote:
>>>> 
>>>>> Why not drop the (superfluous, IMO) "on" prefix while you're changing the
>>>>> receiver?
>>>>> 
>>>> 
>>>> Because then it reads as if the method itself is doing a spinWait.
>>> 
>>> vs who else logically speaking? We all know there's a runtime underneath
>>> Java, there's no point in explicitly calling that out here.  Again, how is
>>> this different from Thread::yield or any of the other mentioned examples?
>>> This is splitting hairs perhaps but there's no onXXX precedent to follow
>>> and this just throws an oddly looking method name into the mix.
>>> 
>>>> "onSpinWait" is the only proposed name that no one has said they cannot
>>>> live with. So, live with it :-)
>>> 
>>> Perhaps that's because the Runtime placement was a more glaring issue? :)
>>> It's livable but I just don't see the point of the prefix (and yes, I read
>>> the description of the intent in the original mail).
>>> 
>>>> 
>>>> -Doug
>>>> 
>>>> 
>>>> 
>>>>> On Tue, Feb 23, 2016 at 4:20 PM, Gil Tene <g...@azul.com> wrote:
>>>>>

Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

2016-02-23 Thread Gil Tene

> On Feb 22, 2016, at 10:11 AM, mark.reinh...@oracle.com wrote:
> 
> 2016/1/28 9:25 -0800, g...@azul.com:
>> This thread seems to have "hopped away" to the concurrency-interest
>> list in mid-Dec-2015. This posting is intended to capture a summary of
>> reasoning and some of the discussion there so that we have it in the
>> record in core-libs-dev. Mostly by including the contents of several
>> posts in the continuations of the original thread.
>> 
>> See thread continuations here:
>> http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14576
>> and here:
>> http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14580
>> 
>> Summary:
>> 
>> ...
> 
> Thanks for the summary.
> 
> I still don't buy the argument that this method belongs in j.l.Runtime.
> 
> To say that this method should go there because it's an instruction to
> the run-time system is pretty weak.  I agree with Vitaly [1] that if
> that's the threshold for adding methods to the Runtime class then lots
> of other stuff belongs there as well, including much of what's now in
> java.lang.Thread and java.util.concurrent and, arguably, anything else
> related to interacting with the environment in which the application
> runs (file and network I/O, process manipulation, etc.).
> 
> This thread-related method really belongs in either java.lang.Thread or
> java.util.concurrent.LockSupport.  j.l.Thread already has plenty of
> expert-level static methods related to the current thread, one of which
> (Thread::yield) is even a hint, just like this one.  j.u.c.LockSupport
> is even more obviously intended for expert users and hence may be the
> best choice, but I could live with either one.

Ok. In the interest of moving forward, let's settle on:

Thread.onSpinWait()

Same logic for the name, different receiver for the event. I can certainly live 
with it, and Doug seems ok with it as well.

— Gil.


Re: RFR(XS): 8147844: new method j.l.Runtime.onSpinWait()

2016-01-28 Thread Gil Tene

> On Jan 28, 2016, at 8:51 AM, mark.reinh...@oracle.com wrote:
> 
> 2016/1/28 8:12 -0800, Gil Tene <g...@azul.com>:
>> On Jan 27, 2016, at 9:41 PM, David Holmes <david.hol...@oracle.com> wrote:
>>> On 27/01/2016 11:31 PM, Ivan Krylov wrote:
>>>> Earlier there was a discussion on this mail alias about the spin loop
>>>> hint proposal [1]. Based on the feedback from that discussion some
>>>> changes were incorporated and the JEP has been filed [2]. There seems to
>>>> be a consensus on the API side. The JEP is now in a draft state and I
>>>> hope this JEP will get targeted for java 9 shortly.
>>> 
>>> The discussion in [1] continued in:
>>> 
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-December/037063.html
>>> 
>>> but ended abruptly. In particular Mark's query as to why this moved
>>> from Thread to Runtime was seemingly left unanswered.
>> 
>> The thread continued, but it looks like due to cross-posting with
>> concurrency-interest and people replying on the thread dropping the
>> core-libs-dev recipient somehow.
> 
> I was wondering what happened to that thread ...
> 
>>  See continuations of the thread
>> here:
>> http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14576
>> and here:
>> http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14580
>> 
>> Mark's question on why this was moved from Thread to Runtime is
>> discussed in detail there. An easy summary in a single message body
>> can be found here:
>> http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/014587.html
>> .
> 
> So that we have a self-contained record for posterity in the OpenJDK
> mail archive, can someone please summarize the reasoning to this list,
> core-libs-dev?

Good point. I will follow up in the original thread on core-libs-dev with a
summary and pointers to the other discussion.

> 
> I also suggested that this single method doesn't really need a JEP.
> You can do it that way if you really want to, but it will take a bit
> more time.

One reason we wanted to go through the JEP process is to add to the
community-originated and community-led stats in JEPs. I think it's worth the
effort, partly to encourage others to go through the same process with
larger-scope things.

> 
> - Mark



Re: RFR(XS): 8147844: new method j.l.Runtime.onSpinWait()

2016-01-28 Thread Gil Tene

> On Jan 27, 2016, at 9:41 PM, David Holmes  wrote:
> 
> HI Ivan,
> 
> On 27/01/2016 11:31 PM, Ivan Krylov wrote:
>> Hello,
>> 
>> Earlier there was a discussion on this mail alias about the spin loop
>> hint proposal [1]. Based on the feedback from that discussion some
>> changes were incorporated and the JEP has been filed [2]. There seems to
>> be a consensus on the API side. The JEP is now in a draft state and I
>> hope this JEP will get targeted for java 9 shortly.
> 
> The discussion in [1] continued in:
> 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-December/037063.html
> 
> but ended abruptly. In particular Mark's query as to why this moved from 
> Thread to Runtime was seemingly left unanswered.

The thread continued, but it looks like, due to cross-posting with
concurrency-interest, people replying on the thread somehow dropped the
core-libs-dev recipient. See continuations of the thread here:
http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14576
 and here: 
http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14580

Mark's question on why this was moved from Thread to Runtime is discussed in 
detail there. An easy summary in a single message body can be found here: 
http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/014587.html . 
Also, Doug addressed questions about the brief/bland wording choice ("typically
improve performance", which we then switched to "The runtime may take
action to improve the performance" at Brian's suggestion) here:
http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/014578.html

HTH.

> 
> Thanks,
> David
> 
>> Please review the upcoming API changes:
>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/
>> 
>> For the reference, the new generated JavaDoc for j.l.Runtime class and
>> the new method
>> http://ivankrylov.github.io/onspinwait/api/java/lang/Runtime.html#onSpinWait--
>> 
>> 
>> Thanks,
>> 
>> Ivan
>> 
>> 1 -
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-October/thread.html#35613
>> 
>> 2 - https://bugs.openjdk.java.net/browse/JDK-8147832.



Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

2016-01-28 Thread Gil Tene
This thread seems to have "hopped away" to the concurrency-interest list in 
mid-Dec-2015. This posting is intended to capture a summary of reasoning and 
some of the discussion there so that we have it in the record in core-libs-dev. 
Mostly by including the contents of several posts in the continuations of the 
original thread.

See thread continuations here: 
http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14576
 and here: 
http://cs.oswego.edu/pipermail/concurrency-interest/2015-December/thread.html#14580

Summary:

On the reasoning for moving from Thread.spinLoopHint() to Runtime.onSpinWait():

From Gil Tene, on Tue, Dec 1, 2015 at 3:45 PM
> …
> Thread.spinLoopHint() was my first choice as well. But I was swayed by strong 
> arguments against "hint" in the method name. Including the lack of "action" 
> and ambiguity about roles. We looked at various names that were either clear 
> and way too long or way too short or implementation specific (or narrowing), 
> like skip, pause, relax, etc.
> 
> Given the spec we agreed on, the name we were looking for was something that 
> would be equivalent to the obvious expectations from something named as 
> elaborately as:
> 
> maybeYouShouldTryToImproveThePerformanceOfTheSpinWaitLoopMakingThisCall(), 
> with the receiver being either the thread or the runtime.
> 
> The "maybe you should try" part is important because doing nothing is a valid 
> option, and accidentally failing to achieve the goal is probably ok, but 
> consistently working in the opposite direction of the goal would be 
> "surprising behavior". The "...making this call" part is important because of 
> ambiguities around roles and actions (the call is not expected to spin, or 
> wait, it's the caller that is doing those things).
> 
> Given the natural way we'd describe what the options are for the receiver in 
> plain language, it became clear that Runtime fit better: we naturally say 
> "the runtime may..." and "indicate to the runtime...", not "the thread may" 
> or "indicate to the thread...". In addition, some of the implementation 
> possibilities (e.g. switch this thread to spin on a dedicated core) may 
> involve actions that are natural runtime actions but far outside of the scope 
> of what Thread might do.
> 
> With an event delivery paradigm ("I'm in a spin wait loop, you may want to do 
> something about that") Runtime.onSpinWait() fits common naming conventions 
> and roles. It's also readable enough to understand that the Runtime is being 
> told that a spin wait is going on. And in that sense, it is just as 
> expressive as spinLoopHint(), while actually following a naming convention. 
> We left the "try to improve the performance" to the spec/JavaDoc because it 
> was very hard to fit in the name.
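
For illustration, here is the kind of caller the spec wording has in mind. Note that the method was eventually released in JDK 9 as Thread.onSpinWait() rather than Runtime.onSpinWait(), so this sketch uses that form; the class and field names are hypothetical, and whether the runtime does anything at all with the call is, as discussed above, up to the runtime.

```java
// Sketch only: a consumer busy-waits on a volatile flag set by another thread.
// Thread.onSpinWait() requires JDK 9 or later.
final class SpinWaitSketch {
    private static volatile boolean ready; // hypothetical flag
    private static int payload;            // published by the volatile write below

    static int awaitPayload() {
        ready = false;
        Thread producer = new Thread(() -> {
            payload = 42;
            ready = true; // volatile write makes 'payload' visible to the spinner
        });
        producer.start();
        while (!ready) {
            // Declarative, not imperative: "I am in a spin-wait loop". The runtime
            // may act on it (e.g. a PAUSE instruction on x86) or do nothing at all.
            Thread.onSpinWait();
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return payload;
    }
}
```

The loop is still the caller's: the call neither spins nor waits on its own, which is exactly the role distinction the "on" prefix is meant to convey.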

From Martin Thompson, on Tue, Dec 1, 2015 at 9:13 AM:
> ...
> The "on" prefix was suggested as the caller is notifying the runtime that it 
> is in a spin-wait loop. This allows the runtime the option of reacting to the 
> event, or not, and allows flexibility in how it chooses to react to this 
> information. The method is not waiting; the method is notifying that the 
> caller is waiting.
> 
> Yes, but we don't have Runtime.onGC() or Runtime.onRunFinalization(), and 
> both of those are documented as "suggesting" the VM perform those actions.  
> spinLoopHint() sounded much better than what's proposed here, and carries the 
> suggestion/hint/optionality that's desired. IMHO, onSpinWait() is the least 
> appealing option proposed thus far.
> 
> System.gc() and Runtime.runFinalizersOnExit(boolean) are clear instructions 
> to the system/runtime to do something in an imperative fashion. The 
> onSpinWait() is a declarative statement about the state of the current 
> thread. I can understand a dislike to a name. We all have personal taste on 
> this but I don't think you are comparing similar semantics.
> 
> Consider the runtime as an object. You send it a message via a method. What 
> exactly is spinLoopHint() telling the Runtime to do? It is applying an 
> event but not expressing it via any convention. "spinLoopHint()" works for me 
> on a Thread to an extent. We should express intent or declare status for 
> this. Hints don't fit comfortably in a programming model.
> 
> The actual naming does not matter so much as this will only be used by 
> a minority of programmers. I'll prepare to be flamed on that :-) However it is 
> desperately needed and anything that makes it slip the date to make Java 9 
> would be such a let down.



Some discussion about naming choices, aiming the API at the target user, and 
what we

Re: RFR [9] 8148117: Move sun.misc.Cleaner to jdk.internal.ref

2016-01-25 Thread Gil Tene
I assume your goal here is to get the resources released with the next newgen 
collections (following a close()), rather than wait for an oldgen (if the 
resource was held by an old object). That's a cool thing.

With that in mind, you can replace the repeated periodic 
polling/flipping/allocation and the external calls to changeGuard() with a simple 
internal GC detector that would call changeGuard() (allocating a new guard) only 
once per newgen GC cycle.

This can take the form of adding a simple GCDetector inside your implementation:

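// Inner class of CloseableMemory (needs java.lang.ref.Reference; JDK 9+).
private class GCDetector {
    @Override
    protected void finalize() throws Throwable {
        // Runs after a GC cycle has found this detector unreachable.
        // Allocate an unreferenced successor so the chain "ticks" again
        // on the next cycle, then rotate the guard.
        GCDetector detector = new GCDetector();
        changeGuard();
        Reference.reachabilityFence(detector);
    }
}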

// The reason to use finalize here instead of a phantom ref
// based cleaner is that it would trigger immediately after the cycle,
// rather than potentially take an extra cycle to trigger.
// This can be done with a weakRef based cleaner instead
// (but probably only once one is added to the JDK; otherwise you'd
// need your own polling thread and logic here...).

You'll need to allocate a single instance of GCDetector during construction of 
a CloseableMemory, without retaining a reference to it after construction. This 
will start a finalize-triggering chain per instance, with the chain "ticking" 
once per newgen cycle.

If you want to avoid having one of these (coming and going on each GC cycle) 
per CloseableMemory instance, you can use a common static detector and a 
registration mechanism (where each registered instance would have its 
changeGuard() method call…

— Gil.


> On Jan 24, 2016, at 9:10 AM, Peter Levart  wrote:
> 
> Hi,
> 
> I had an idea recently on how to expedite the collection of an object. It is 
> simple - just don't let it live long.
> 
> Here's a concept prototype:
> 
> http://cr.openjdk.java.net/~plevart/misc/CloseableMemory/CloseableMemory.java
> 
> The overhead of the check in access methods (getByte()/setByte()) amounts to 
> one volatile read of an oop variable that changes once every 5 to 10 
> seconds or so. That's the period a special guard object is alive. Its reachability 
> is tracked by the GC and extends to the end of each access method (using 
> Reference.reachabilityFence). Every few seconds, the guard object is replaced 
> with a fresh one so that the chance of the guard and its tracking Cleaner 
> being promoted to old generation is very low.
> 
> Could something like that enable a low-overhead CloseableMappedByteBuffer?
> 
> Regards, Peter
> 
> On 01/23/2016 09:31 PM, Andrew Haley wrote:
>> On 23/01/16 20:01, Uwe Schindler wrote:
>> 
>>> It depends how small! If the speed is still somewhere between Java 8
>>> ByteBuffer performance and the recent Hotspot improvements in Java
>>> 9, I agree with trying it out. But some volatile memory access on
>>> every access is a no-go. The code around ByteBufferIndexInput in
>>> Lucene is the most performance-critical code, because on
>>> every search query or sorting there is all the work happening in
>>> there (millions of iterations with positional ByteBuffer.get*
>>> calls). As ByteBuffers are limited to 2 GiB, we also need lots of
>>> hairy code to work around that limitation!
>> Yes, I see that code.  It would be helpful if there were a
>> self-contained but realistic benchmark using that code.  That way,
>> some simple experiments would allow changes to be measured.
>> 
>>> If you look at ByteBufferIndexInput's code you will see that we
>>> simply do stuff like trying to read from one bytebuffer and only if
>>> we catch a BufferUnderflowException we fall back to handling buffer
>>> switches: Instead of checking bounds on every access, we have
>>> fallback code only happening on exceptions. E.g. if you are 3 bytes
>>> before end of one buffer slice and read a long, it will throw
>>> BufferUnderflow. When this happens the code will fall back to read
>>> byte by byte from 2 different buffers and reassemble the long):
>> I'm surprised you don't see painful deoptimization traps when that
>> happens.  I suppose it's rare enough that you don't care.  There's a
>> new group of methods in JDK9 called Objects.checkIndex() which are
>> intended to provide a very efficient way to do bounds checks. It might
>> be interesting to see if they work well with ByteBufferIndexInput:
>> that's an important use case.
>> 
>> BTW, does anyone here know why we don't have humongous ByteBuffers
>> with a long index?
>> 
>> Andrew.
> 
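
The exception-driven pattern Uwe describes (try the fast relative read, and only on BufferUnderflowException fall back to assembling the value byte by byte across two slices) can be sketched as follows. This is not Lucene's ByteBufferIndexInput; the class name, the slice layout and the big-endian reassembly (ByteBuffer's default byte order) are assumptions for illustration.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical reader over data split into consecutive ByteBuffer slices,
// each starting at position 0 and using the default big-endian byte order.
final class SlicedReader {
    private final ByteBuffer[] slices;
    private int current = 0;

    SlicedReader(ByteBuffer[] slices) {
        this.slices = slices;
    }

    byte readByte() {
        if (!slices[current].hasRemaining()) {
            current++; // this slice is exhausted: switch to the next one
        }
        return slices[current].get();
    }

    long readLong() {
        try {
            // Fast path: no explicit bounds check here; getLong() itself throws
            // BufferUnderflowException if fewer than 8 bytes remain (and, in
            // practice, leaves the position unchanged when it does).
            return slices[current].getLong();
        } catch (BufferUnderflowException e) {
            // Rare slow path: the long straddles a slice boundary, so
            // reassemble it byte by byte, most significant byte first.
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (readByte() & 0xFFL);
            }
            return v;
        }
    }
}
```

Whether the exception path triggers costly deoptimization, as Andrew asks, would have to be measured; Objects.checkIndex() would be the natural alternative for an explicit check.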



Re: Ephemerons

2016-01-24 Thread Gil Tene
> (include == true), do {
> scan the ephemerons and for those with live keys, mark value and 
> transitive closure as live.
> while marking the value and transitive closure as live, for each 
> object that was newly marked alive,
> compute the index into the ephemeron buckets as though such object 
> was ephemeron's key
> and set the pending flag of that bucket.
> }
> set the include flags of all buckets from the pending flags of the 
> buckets and count # of pending buckets
> } while (# of pending buckets > 0);

Interesting. I think that this can be described as a "pessimistically estimated 
reverse reference scheme", where you don't actually track reverse references, 
but pessimistically (with some hash) act as if you hit them when you newly mark 
an object live, forcing re-evaluation of ephemerons that have been "hit" by 
the hash.

But since I think that an actual reverse reference mapping scheme is fairly 
simple to implement, and is both "simpler" in many ways and would result in 
less GC work (no false positive checks forced on ephemerons when marking 
happens to hit them with a hash), I'd stick with that...
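
Peter's bucketed scheme can be sketched in a toy heap model, as far as the quoted fragment shows it. The beginning of his pseudocode is cut off in this archive, so scanning every bucket in the first round, the bucket hash, the int-based heap representation and all class and method names are assumptions; only the include/pending flag logic comes from the fragment. The false positives mentioned above show up as buckets flagged by newly live objects that are not actually any ephemeron's key.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy heap model (hypothetical): objects are ints, edges.get(o) lists o's strong
// references, and each ephemeron is an int[] {key, value}.
final class BucketedEphemeronMarking {

    static boolean[] mark(List<List<Integer>> edges, int[] roots,
                          int[][] ephemerons, int nBuckets) {
        boolean[] live = new boolean[edges.size()];
        // Ephemerons are grouped into buckets by a hash of their key.
        List<List<int[]>> buckets = new ArrayList<>();
        for (int b = 0; b < nBuckets; b++) buckets.add(new ArrayList<>());
        for (int[] e : ephemerons) buckets.get(bucketOf(e[0], nBuckets)).add(e);

        for (int r : roots) markFrom(r, edges, live, null, nBuckets); // strong marking
        boolean[] include = new boolean[nBuckets];
        Arrays.fill(include, true); // assumption: the first round scans every bucket
        int pendingCount;
        do {
            boolean[] pending = new boolean[nBuckets];
            for (int b = 0; b < nBuckets; b++) {
                if (!include[b]) continue;
                for (int[] e : buckets.get(b)) {
                    if (live[e[0]]) markFrom(e[1], edges, live, pending, nBuckets);
                }
            }
            // The next round scans only buckets "hit" by newly marked objects.
            pendingCount = 0;
            for (int b = 0; b < nBuckets; b++) {
                include[b] = pending[b];
                if (pending[b]) pendingCount++;
            }
        } while (pendingCount > 0);
        return live;
    }

    static int bucketOf(int obj, int nBuckets) {
        return Math.floorMod(obj * 0x9E3779B9, nBuckets); // arbitrary mixing hash
    }

    static void markFrom(int start, List<List<Integer>> edges, boolean[] live,
                         boolean[] pending, int nBuckets) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int o = stack.pop();
            if (live[o]) continue;
            live[o] = true;
            // Pessimistic: treat every newly live object as a possible key and
            // flag its bucket, whether or not any ephemeron is keyed on it.
            if (pending != null) pending[bucketOf(o, nBuckets)] = true;
            for (int next : edges.get(o)) stack.push(next);
        }
    }
}
```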

> 
> Hm, 
> 
> 
> Regards, Peter
> 
> On 01/24/2016 06:43 PM, Gil Tene wrote:

Re: Ephemerons

2016-01-24 Thread Gil Tene
A note on implementation possibilities:

If I read the implementation correctly, a "weakness" of the current 
implementation approach for making sure that value referents (and the objects 
transitively reachable from them) are kept alive if key referents are alive is that it 
requires multiple passes through the discovered Ephemeron list, with the passes 
terminating when the list stabilizes. While I think that this is sound (in the 
sense that it will work), it carries a potentially high cost when large sets of 
Ephemerons exist in the heap. E.g. if the Ephemerons are linked in a k-v list 
(where the value referent of one ephemeron is the key of another, in a chain), 
as in your code example, there is O(N^2) scanning going on. And e.g. if a 
large set of ephemeron keys become weakly reachable in a single cycle (e.g. a 
large cache was discarded) while other ephemerons participate in some linked 
list relationship, the entire list of [stably] weak-keyed ephemerons has to be 
traversed in each pass (in case one of them has become live in a previous 
pass). I'd worry that these computational complexity issues could become 
prohibitive enough in GC cost that there would be significant resistance to 
their adoption.

Note that in comparison (to my understanding), current ref processing work 
involved in GC handling soft/weak/final/phantom refs remains linear in the 
number of refs, and does not have an O(N^2) component.
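
To make the multi-pass cost concrete, here is a toy model of the iterate-until-stable approach. It is not the prototype's HotSpot code; the int-based heap representation and all names are hypothetical. With N ephemerons chained key-to-value and discovered in the worst order, each pass finds exactly one newly live key, so there are N+1 passes, each over all N ephemerons.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy heap model (hypothetical, STW): objects are ints, edges.get(o) lists o's
// strong references, and each ephemeron is an int[] {key, value}.
final class NaiveEphemeronMarking {
    int passes = 0; // full scans over the discovered ephemeron list

    boolean[] mark(List<List<Integer>> edges, int[] roots, int[][] ephemerons) {
        boolean[] live = new boolean[edges.size()];
        for (int r : roots) markFrom(r, edges, live);
        boolean changed = true;
        while (changed) { // repeat until the list stabilizes
            changed = false;
            passes++;
            for (int[] e : ephemerons) {
                // A live key keeps the value (and its transitive closure) alive.
                if (live[e[0]] && !live[e[1]]) {
                    markFrom(e[1], edges, live);
                    changed = true;
                }
            }
        }
        return live;
    }

    static void markFrom(int start, List<List<Integer>> edges, boolean[] live) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int o = stack.pop();
            if (live[o]) continue;
            live[o] = true;
            for (int next : edges.get(o)) stack.push(next);
        }
    }
}
```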

I believe that there is a relatively simple way to bring Ephemeron processing 
to O(N) by establishing reverse mapping during the scan (the below description 
assumes STW during the scan):

1. Start ref processing with no reverse mapping table established.

2. During ref processing, establish an EphemeronKeyReverseMapping table 
(logically a Map<Address, List<Ephemeron>>) which would map individual heap 
addresses to ephemerons whose key referent points to those addresses.

3. Note that since each heap address can show up in multiple key referents, the 
map needs to return a (potentially empty) list of Ephemerons whose keys refer 
to the address.

4. Specifically, starting with an empty list, and for each discovered 
Ephemeron, add a reverse-mapping entry to the EphemeronKeyReverseMapping, 
mapping from the key referent address to the Ephemeron.

5. During Ephemeron processing (under the case where the referent is found to 
be alive and the ephemeron then needs to keep the value referent alive) mark 
down the value referent path using a special ephemeron_keep_alive OopClosure 
(or a mode flag that affects the normal keep_alive behavior) which, when 
reaching a not-yet-marked-live object [in addition to marking it live and 
traversing it as keep_alive would normally do] would look up the object's 
address in the EphemeronKeyReverseMapping to get a list of Ephemerons to 
traverse, and traverse each of the mapped Ephemerons' value referent with the 
same ephemeron_keep_alive closure.
Note: doing reverse-mapping lookups on each not-yet-marked object in a 
keep_alive closure will add cost, which is why this ephemeron_keep_alive pass 
should probably be done after regular keep_alive passes have had their chance 
to mark objects live. This way only paths that become newly-live via ephemeron 
processing are subject to the extra reverse-mapping-lookup cost.

While I haven't been poking at this too long to see if it has holes, I think it 
can produce a reliable result, and is O(N) on the count of Ephemerons.
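
Steps 1 to 5 can be sketched in a toy heap model. This is a hypothetical illustration, not HotSpot code: a HashMap<Integer, List<int[]>> stands in for the EphemeronKeyReverseMapping table, and a second "drain" mode stands in for the ephemeron_keep_alive closure. On a chain where each ephemeron's value is the next one's key, each ephemeron's value is pushed exactly once, instead of one full pass over the list per link.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy heap model (hypothetical, STW): objects are ints, edges.get(o) lists o's
// strong references, and each ephemeron is an int[] {key, value}.
final class ReverseMappedEphemeronMarking {
    int valueTraversals = 0; // times an ephemeron's value was pushed for marking

    boolean[] mark(List<List<Integer>> edges, int[] roots, int[][] ephemerons) {
        boolean[] live = new boolean[edges.size()];
        // Steps 1-4: key address -> ephemerons whose key referent points there.
        Map<Integer, List<int[]>> byKey = new HashMap<>();
        for (int[] e : ephemerons) {
            byKey.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e);
        }
        // Regular keep_alive marking first: no reverse-map lookups on this path.
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        drain(stack, edges, live, null);
        // Step 5: ephemerons whose key is already live seed the special pass.
        for (int[] e : ephemerons) {
            if (live[e[0]]) {
                valueTraversals++;
                stack.push(e[1]);
            }
        }
        drain(stack, edges, live, byKey); // "ephemeron_keep_alive" mode
        return live;
    }

    private void drain(Deque<Integer> stack, List<List<Integer>> edges,
                       boolean[] live, Map<Integer, List<int[]>> byKey) {
        while (!stack.isEmpty()) {
            int o = stack.pop();
            if (live[o]) continue;
            live[o] = true;
            for (int next : edges.get(o)) stack.push(next);
            if (byKey == null) continue;
            // Newly live object: every ephemeron keyed on it now keeps its value alive.
            for (int[] e : byKey.getOrDefault(o, List.of())) {
                valueTraversals++;
                stack.push(e[1]);
            }
        }
    }
}
```

Since a key can become newly live only once, each ephemeron's value is pushed at most once, which is where the O(N) bound on ephemeron work comes from in this model.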

— Gil.


> On Jan 24, 2016, at 2:52 AM, Peter Levart <peter.lev...@gmail.com> wrote:
> 
> Hi Gil,
> 
> I totally agree with your assessment. We should not introduce another way of 
> reviving the almost collectable objects and I fully support tightening the 
> specification so that soft and weak references to the same referent and to 
> other referents from which this referent is reachable are required to be 
> cleared together atomically.
> 
> I modified the prototype to (hopefully) adhere to this new Ephemeron 
> specification that Gil and I agreed upon. Anyone interested in experimenting 
> can find it here:
> 
> http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.jdk.02/
> http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.hotspot.02/
> 
> It is rebased to current tip of jdk9-dev repositories (after the bulk of 
> merges for jdk-9+102), but still contains the change to remove the Cleaner 
> reference type as it has not yet managed to get in...
> 
> I have also added a test that is a start for verifying the functionality.
> 
> Regards, Peter
> 
> On 01/23/2016 07:25 PM, Gil Tene wrote:
>> 
>>> On Jan 23, 2016, at 5:14 AM, Peter Levart <peter.lev...@gmail.com>

Re: Ephemerons

2016-01-23 Thread Gil Tene

> On Jan 22, 2016, at 2:49 PM, Peter Levart <peter.lev...@gmail.com> wrote:
> 
> Hi Gil,
> 
> Thanks for taking a look at the Ephemerons for Java. It's great to have a big 
> mind joining the discussion.
> 
> On 01/22/2016 07:12 PM, Gil Tene wrote:
>> Peter,
>> 
>> I've been following Ephemerons in other GC'ed environments, and wondering 
>> when someone will bring it up for Java. Happy to see attention given to it. 
>> The "conditional weak reference hashmap/table/dictionary" use case seems to 
>> be a common primary motivator, and it's a very valid one. Given that we 
>> currently do completely concurrent ref processing (in Zing at least) for all 
>> reference types, adding Ephemerons to the spec'ed behavior will add some 
>> interesting challenges for keeping things concurrent, but I don't think that 
>> it is fundamentally undoable (just "hard" and interesting to fully work out).
> 
> The algorithm for processing ephemerons is not much different from that for 
> processing normal references. It's just that it is iterative until it 
> converges to a stable state. If you allow mutator threads to at least in some 
> phases execute concurrently with normal reference processing then I suspect 
> it should be possible to do that for ephemerons too, but I don't have a clue 
> what tricks you perform to be able to allow mutator threads to be concurrent 
> with reference processing.

The subject of Concurrent Reference Processing has not been widely written 
about. While we've been doing it in our production collectors (PGC, C4 in our 
Vega and Zing products) since roughly 2007, we have not published papers about 
the mechanisms involved. Partially because it seemed to be an "obscure" area of 
GC. I believe there are versions of Metronome that also dealt with non-STW ref 
processing, but without publishing the mechanisms either. In fact, the first 
publication I am aware of that deals with the subject of Concurrent (aka "on 
the fly") collection and reference types is "Reference Object Processing in 
On-The-Fly Garbage Collection" by Ugawa, Jones, and Ritson (published in 2014 
at ISMM'14, and can be found here <https://kar.kent.ac.uk/40820/>, with a 
presentation here 
<https://www.cs.kent.ac.uk/projects/gc/mirrorgc/ismm-2014-reftypes.pdf>).

I believe that I can say (with some authority) that there are sound and 
reliable ways to achieve fully concurrent (as in Zero STW) reference processing 
for the non-strong reference types in Java today (soft, weak, phantom, final). 
But that doing so requires some interesting state machine modeling and careful 
"throwing of gates" to achieve atomic transitions in behavior without requiring 
STW processing of references (which is probably a must), and without even 
requiring an STW transition point (which is a nice bonus for the perfectionists 
among us, but probably acceptable to do without if the transition included no 
processing of references). The mechanisms we use rely on an ability to 
carefully "throw a gate" at a point in the tracing where strong ref processing 
(including all soft refs that have been policy-chosen to be strengthened in 
this cycle) is known to be complete, and after which references to 
non-strongly-marked objects can safely return null (and before which they 
return a strong reference). Ephemerons will almost certainly mess with how that 
gate currently works...
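
The gate semantics just described can be illustrated with a deliberately simplified toy model of one marking cycle. Everything here (the class, the identity set standing in for mark bits, the volatile flag) is hypothetical and says nothing about how PGC or C4 actually do it; in particular it ignores the hard part, making the gate check and the marking atomic.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy model (hypothetical): what a weak reference's get() may report before and
// after the collector "throws the gate" in one concurrent marking cycle.
final class GateSketch {
    private volatile boolean gateThrown = false;
    // Stand-in for the collector's strong mark bits (identity-based, thread-safe).
    private final Set<Object> strongMarked =
            Collections.synchronizedSet(Collections.newSetFromMap(new IdentityHashMap<>()));

    // Collector side: object found strongly reachable during tracing.
    void markStrong(Object o) {
        strongMarked.add(o);
    }

    // Collector side: strong marking (including soft refs chosen to be
    // strengthened this cycle) is complete.
    void throwGate() {
        gateThrown = true;
    }

    static final class ToyWeakRef<T> {
        private final GateSketch gc;
        private final T referent; // toy only: a real weak ref does not hold it strongly

        ToyWeakRef(GateSketch gc, T referent) {
            this.gc = gc;
            this.referent = referent;
        }

        T get() {
            if (gc.gateThrown) {
                // After the gate: an unmarked referent is known not to be strongly
                // reachable, and returning null keeps it that way.
                return gc.strongMarked.contains(referent) ? referent : null;
            }
            // Before the gate: hand out a strong reference, which must keep the
            // referent alive this cycle. A real collector must make this check and
            // the marking atomic with respect to the gate; this toy does not.
            gc.markStrong(referent);
            return referent;
        }
    }
}
```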

So while I believe that dealing with Ephemerons in a STW ref processing world 
is fairly straightforward, adding the Ephemeron behavior into a concurrent ref 
processing world will "mess" with the delicate transitions and invariants that 
make the current mechanisms work. Still, I'm fairly confident that we could 
find a way to deal with that. But I don't have one formulated fully at 
the moment, and don't know of anyone else who has...

> 
>> Scanning through the proposal and Java class (mostly JavaDoc spec), I have 
>> the following question:
>> 
>> Do we really need a separate "ephemerally reachable" strength below weak? 
>> The way you extended the weak definition to say "An object 
>> is weakly reachable if it is neither strongly nor softly reachable but can 
>> be reached by traversing a weak reference or by traversing an ephemeron 
>> through its value while the ephemeron's key is at least weakly reachable." 
>> would [naively] seem to be sufficient to add and fully describe the desired 
>> Ephemeron behavior from a reference strength perspective.
> 
> Ephemeron always touches definitions of at least two consecutive strengths of 
> reachabilities. The prototype says:
> 
>  *  An object is weakly reachable if it is neither
>  * strongly n

Re: Ephemerons

2016-01-23 Thread Gil Tene

> On Jan 23, 2016, at 5:14 AM, Peter Levart <peter.lev...@gmail.com> wrote:
> 
> Hi Gil, it's good to have this discussion. See comments inline...
> 
> On 01/23/2016 05:13 AM, Gil Tene wrote:
> 
>>> On Jan 22, 2016, at 2:49 PM, Peter Levart <peter.lev...@gmail.com> wrote:
>>> 
>>> Ephemeron always touches definitions of at least two consecutive strengths 
>>> of reachabilities. The prototype says:
>>> 
>>>  *  An object is weakly reachable if it is neither
>>>  * strongly nor softly reachable but can be reached by traversing a
>>>  * weak reference or by traversing an ephemeron through its value while
>>>  * the ephemeron's key is at least weakly reachable.
>>> 
>>>  *  An object is ephemerally reachable if it is neither
>>>  * strongly, softly nor weakly reachable but can be reached by traversing an
>>>  * ephemeron through its key or by traversing an ephemeron through its value
>>>  * while its key is at most ephemerally reachable. When the ephemerons that
>>>  * refer to an ephemerally reachable key object are cleared, the key object becomes
>>>  * eligible for finalization.
>> 
>> Looking into this a bit more, I don't think the above is quite right. 
>> Specifically, if an ephemeron's key is either strongly or softly reachable, 
>> you want the value to remain appropriately strongly/softly reachable. 
>> Without this quality, Ephemeron value referents can (and will) be 
>> prematurely collected and finalized while the keys are not. This (IMO) 
>> needed quality is not provided by the behavior you specify…
> 
> This is not quite true. While ephemeron's value is weakly or even 
> ephemerally-reachable, it is not finalizable, because ephemerally-reachable is 
> stronger than finaly-reachable. After ephemeron's key becomes 
> ephemerally-reachable, the ephemeron is cleared by GC which sets its key 
> *and* value to null atomically. The life of key and value at that moment 
> becomes untangled. Either of them can have a finalizer or not and both of 
> them will eventually be collected if not revived by their finalize() methods. 
> But it can never happen that ephemeron's value is finalized or collected 
> while its key is still reachable through the ephemeron (while the ephemeron 
> is not cleared yet).
> 
> But I agree that it would be desirable for ephemeron's value to follow the 
> reachability of its key. In the above specification, if the key is strongly 
> reachable, the value is weakly reachable, so any WeakReferences or 
> SoftReferences pointing at the Ephemeron's value can already be cleared while 
> the key is still strongly reachable. This is arguably no different than 
> current specification of Soft vs. Weak references. A SoftReference can 
> already be cleared while its referent is still reachable through a 
> WeakReference,

We seem to agree about the cleaner behavior specification (in both of our texts 
below), so these next paragraphs are really about arguing for why this is 
an important design choice if/when adding Ephemerons to Java:

It is true the [current] spec allows for soft references to an object to be 
cleared while weak references to the same object are not: the "determines" in 
"Suppose that the garbage collector determines at a certain point in time that 
an object is <X> reachable..." part [for <X> = {soft, weak}] does not have to 
happen at the same "certain point in time".

However, to my knowledge all current implementations present as if this 
determination is happening at the same "point in time" for all weakly and 
softly reachable objects combined. Specifically [in implementations]: if soft 
reachability is determined for an object at some point in time, then weak 
reachability for that object is determined at the same point in time. And the 
weak reachability determination for an object depends on whether the collector 
chose to clear existing soft references to that object at that same point in 
time, with the appearance of the choice to clear (or not to clear) soft 
references to a given object atomically affecting the determination of its 
weak reachability. Since the collector is *required* to act on a weak 
determination when it is made, while it *may* act on a soft determination when 
it is made, making the combined determination at the same "point in time" 
eliminates an obviously confusing situation that is not prohibited by the spec: 
if the determination for weak and soft reachability was not done at the same 
point in time, then an object that was softly reachable and had its soft 
refer

Re: Ephemerons

2016-01-22 Thread Gil Tene
Peter,

I've been following Ephemerons in other GC'ed environments, and wondering when 
someone will bring it up for Java. Happy to see attention given to it. The 
"conditional weak reference hashmap/table/dictionary" use case seems to be a 
common primary motivator, and it's a very valid one. Given that we currently do 
completely concurrent ref processing (in Zing at least) for all reference 
types, adding Ephemerons to the spec'ed behavior will add some interesting 
challenges for keeping things concurrent, but I don't think that it is 
fundamentally undoable (just "hard" and interesting to fully work out).
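
The "conditional weak reference" table case comes down to entries whose value refers back to their own key, as Peter's original post describes for java.util.WeakHashMap. A toy reachability model (hypothetical names; ints stand for heap objects) shows the difference: when the table holds values strongly, the key stays reachable through its own value; under ephemeron semantics the value is traced only if the key is live by some other path.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy heap (hypothetical): 0 = the map (a root), 1 = an entry's key, 2 = its value,
// and the value holds a strong reference back to its key (edge 2 -> 1).
final class WeakHashMapVsEphemeron {

    // WeakHashMap style: the entry holds its value strongly (edge 0 -> 2).
    static boolean keyRetainedWeakHashMapStyle() {
        List<List<Integer>> edges = List.of(List.of(2), List.<Integer>of(), List.of(1));
        return mark(edges, 0, new int[0][])[1];
    }

    // Ephemeron style: the map-to-value link is an ephemeron {key = 1, value = 2}.
    static boolean keyRetainedEphemeronStyle() {
        List<List<Integer>> edges = List.of(List.<Integer>of(), List.<Integer>of(), List.of(1));
        return mark(edges, 0, new int[][] {{1, 2}})[1];
    }

    static boolean[] mark(List<List<Integer>> edges, int root, int[][] ephemerons) {
        boolean[] live = new boolean[edges.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        boolean changed = true;
        while (changed) {
            while (!stack.isEmpty()) {
                int o = stack.pop();
                if (live[o]) continue;
                live[o] = true;
                for (int next : edges.get(o)) stack.push(next);
            }
            changed = false;
            for (int[] e : ephemerons) {
                // A value is traced only once its key is live by some other path.
                if (live[e[0]] && !live[e[1]]) {
                    stack.push(e[1]);
                    changed = true;
                }
            }
        }
        return live;
    }
}
```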

Scanning through the proposal and Java class (mostly JavaDoc spec), I have the 
following question:

Do we really need a separate "ephemerally reachable" strength below weak? The 
way you extended the weak definition to say "An object is 
weakly reachable if it is neither strongly nor softly reachable but can be 
reached by traversing a weak reference or by traversing an ephemeron through 
its value while the ephemeron's key is at least weakly reachable." would 
[naively] seem to be sufficient to add and fully describe the desired Ephemeron 
behavior from a reference strength perspective.

Specifically, would modifying your implementation such that Ephemeron 
extended WeakReference (instead of Reference), and its V value was 
tracked with "private WeakReference<V> valueRef" (instead of "private V value") 
[along with the semi-obvious internal changes that would result] provide what 
is needed to support proper Ephemeron behavior if we just added the "…or by 
traversing an ephemeron through its value while the ephemeron's key is at 
least weakly reachable." to the spec'ed definition of "weakly reachable" (and 
modified the associated JVM ref processing accordingly, which you already have 
to do to support your wider definitions anyway)?
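
The Java-level shape of that suggestion might look like the following sketch (the class and accessor names are hypothetical). It only shows the structure: with plain WeakReference fields nothing keeps the value alive while the key is reachable, so the "…at least weakly reachable" rule would still have to come from the JVM's reference processing.

```java
import java.lang.ref.WeakReference;

// Structural sketch only (hypothetical class). The key is this reference's own
// referent; the value is tracked through a separate WeakReference.
class EphemeronShape<K, V> extends WeakReference<K> {
    private final WeakReference<V> valueRef;

    EphemeronShape(K key, V value) {
        super(key);
        this.valueRef = new WeakReference<>(value);
    }

    K getKey() {
        return get();
    }

    V getValue() {
        return valueRef.get();
    }

    // Explicit clear() drops both referents together. Note that clearing done
    // by the GC does not call this method.
    @Override
    public void clear() {
        super.clear();
        valueRef.clear();
    }
}
```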

— Gil.

> On Jan 17, 2016, at 10:18 AM, Peter Levart  wrote:
> 
> Hi,
> 
> Ephemerons are a special kind of reference described by Barry Hayes [1]. In 
> the simple variant, they contain two "referents". One is called the key, 
> which can be viewed as a "weak" referent in the style of Java reference types 
> (SoftReference, WeakReference, PhantomReference), the other is called the 
> value, which is also treated specially by GC. Its reachability is dependent 
> on the reachability of the key.
> 
> Ephemerons solve the problem seen for example in the java.util.WeakHashMap 
> when a value in this map directly or indirectly refers to its key. Such an 
> entry will never be purged as the value is always reachable from the 
> WeakHashMap instance and through the value, its key is reachable too. There 
> are other places where Ephemerons could be used - for example in ClassValue 
> API. Try googling for "java ephemeron" (without quotes) to find out other 
> situations where Ephemerons would be beneficial.
> 
> If one was to implement Ephemerons in Java, the main question he would be 
> asking is how Ephemeron(s) were supposed to interact with existing Java 
> reference types. Hayes only defines their behavior in isolation, but Java 
> already has 4 "weak" reference types with different "strengths": 
> SoftReference, WeakReference, FinalReference (for implementing finalization) 
> and PhantomReference. In the prototype I chose to define Ephemerons as a new 
> reference type with its own "strength" for the key referent positioned 
> between WeakReference and FinalReference. It would be possible to merge it 
> with an existing reference type like WeakReference or position it between 
> SoftReference and WeakReference or even entirely "on the top" of the strength 
> list, but I think it would not be wise to position it below FinalReference or 
> even PhantomReference. What's best is an open question. What is also not so 
> obvious is how to define the reachability of the Ephemeron's value and its 
> interaction with existing reference types. I think I defined it soundly (see 
> the reachability types in the javadoc of [4]). The reason for defining the 
> reachability of the value to alternate between ephemerally-reachable and 
> weakly-reachable and not between ephemerally-reachable and strongly-reachable, 
> while theoretically possible, is the desire to have a separate phase of 
> processing for each reachability strength, like it is done currently, and 
> which doesn't affect the performance of processing of existing reference 
> types.
> 
> Having skimmed through the reference processing code in the VM a couple 
> of times, I thought it should not be too complicated to implement another 
> type of "weak" reference. Encouraged by how little changes were needed to 
> remove the sun.misc.Cleaner special type of reference [2], I began 
> experimenting and came up with a prototype that seems to work [3]. Luckily 
> most of the logic to process Reference objects in VM is encapsulated in the 
> ReferenceProcessor class and this 

Re: Ephemerons

2016-01-22 Thread Gil Tene

> On Jan 22, 2016, at 2:49 PM, Peter Levart <peter.lev...@gmail.com> wrote:
> 
> Hi Gil,
> 
> Thanks for taking a look at the Ephemerons for Java. It's great to have a big 
> mind joining the discussion.
> 
> On 01/22/2016 07:12 PM, Gil Tene wrote:
>> Peter,
>> 
>> I've been following Ephemerons in other GC'ed environments, and wondering 
>> when someone will bring it up for Java. Happy to see attention given to it. 
>> The "conditional weak reference hashmap/table/dictionary" use case seems to 
>> be a common primary motivator, and it's a very valid one. Given that we 
>> currently do completely concurrent ref processing (in Zing at least) for all 
>> reference types, adding Ephemerons to the spec'ed behavior will add some 
>> interesting challenges for keeping things concurrent, but I don't think that 
>> it is fundamentally undoable (just "hard" and interesting to fully work out).
> 
> The algorithm for processing ephemerons is not much different from that for 
> processing normal references. It's just that it is iterative until it 
> converges to a stable state. If you allow mutator threads to at least in some 
> phases execute concurrently with normal reference processing then I suspect 
> it should be possible to do that for ephemerons too, but I don't have a clue 
> what tricks you perform to be able to allow mutator threads to be concurrent 
> with reference processing.
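To make the "iterative until it converges" step concrete, here is a minimal sketch over a toy heap model (the `EphemeronMarking` class, integer object ids, and the edge map are all hypothetical, not VM code). It assumes the ephemerons themselves are reachable (e.g. held by a table) and computes only strong reachability: ordinary marking runs to completion, then every ephemeron whose key is marked gets its value traced, and the two phases repeat until nothing new is marked. The first scenario is the WeakHashMap case, where the value refers back to its own key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy heap model (hypothetical): objects are ints, references are explicit edges.
class EphemeronMarking {

    // An ephemeron cell: its value is traced only once its key has been marked.
    static final class Ephemeron {
        final int key;
        final int value;

        Ephemeron(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Returns the set of strongly reachable objects.
    static Set<Integer> mark(Set<Integer> roots,
                             Map<Integer, List<Integer>> strongEdges,
                             List<Ephemeron> ephemerons) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>(roots);
        boolean progress = true;
        while (progress) {
            // Ordinary transitive closure over strong edges.
            while (!pending.isEmpty()) {
                int obj = pending.pop();
                if (marked.add(obj)) {
                    pending.addAll(strongEdges.getOrDefault(obj, List.of()));
                }
            }
            // Trace values whose keys are now marked; repeat until nothing new appears.
            progress = false;
            for (Ephemeron e : ephemerons) {
                if (marked.contains(e.key) && !marked.contains(e.value)) {
                    pending.push(e.value);
                    progress = true;
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        // WeakHashMap-style cycle: value 2 refers back to key 1, and nothing else does.
        Map<Integer, List<Integer>> edges = new HashMap<>();
        edges.put(2, List.of(1));
        List<Ephemeron> table = List.of(new Ephemeron(1, 2));
        System.out.println(mark(Set.of(0), edges, table)); // [0]: key and value are both collectable

        // Once the root also refers to the key, the value is traced as well.
        edges.put(0, List.of(1));
        System.out.println(mark(Set.of(0), edges, table));
    }
}
```

Unlike a WeakHashMap entry, the value-to-key edge does not keep the pair alive, because the value is only traced after the key has been reached some other way.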
> 
>> Scanning through the proposal and Java class (mostly JavaDoc spec), I have 
>> the following question:
>> 
>> Do we really need a separate "ephemerally reachable" strength below weak? 
>> The way you extended the weak definition to say "An object 
>> is weakly reachable if it is neither strongly nor softly reachable but can 
>> be reached by traversing a weak reference or by traversing an ephemeron 
>> through its value while the ephemeron's key is at least weakly reachable." 
>> would [naively] seem to be sufficient to add and fully describe the desired 
>> Ephemeron behavior from a reference strength perspective.
> 
> Ephemeron always touches definitions of at least two consecutive strengths of 
> reachabilities. The prototype says:
> 
>  *  An object is weakly reachable if it is neither
>  * strongly nor softly reachable but can be reached by traversing a
>  * weak reference or by traversing an ephemeron through its value while
>  * the ephemeron's key is at least weakly reachable.
> 
>  *  An object is ephemerally reachable if it is neither
>  * strongly, softly nor weakly reachable but can be reached by traversing an
>  * ephemeron through its key or by traversing an ephemeron through its value
>  * while its key is at most ephemerally reachable. When the ephemerons that
>  * refer to an ephemerally reachable key object are cleared, the key object
>  * becomes eligible for finalization.

Looking into this a bit more, I don't think the above is quite right. 
Specifically, if an ephemeron's key is either strongly or softly reachable, you 
want the value to remain appropriately strongly/softly reachable. Without this 
quality, Ephemeron value referents can (and will) be prematurely collected and 
finalized while the keys are not. This (IMO) needed quality is not provided by the 
behavior you specify…

For a correctly specified behavior, I think all strengths (from strong down) 
need to be affected by key/value Ephemeron relationships, but without adding an 
"ephemerally reachable" strength. E.g. I think you fundamentally need something 
like this:

- "An object is strongly reachable if it can be reached by (a) some 
thread without traversing any reference objects, or by (b) traversing the value 
of an Ephemeron whose key is strongly reachable. A newly-created object is 
strongly reachable by the thread that created it."

- "An object is softly reachable if it is not strongly reachable but 
can be reached by (a) traversing a soft reference or by (b) traversing the 
value of an Ephemeron whose key is softly reachable."

- "An object is weakly reachable if it is neither strongly nor softly 
reachable but can be reached by (a) traversing a weak reference or by (b) 
traversing the value of an ephemeron whose key is weakly reachable."
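These definitions can be read as a fixpoint computation: each object takes the strongest strength of any path to it, a path through a soft or weak reference is capped at that strength, and an ephemeron's value gets the weaker of the ephemeron's own strength and its key's strength. The sketch below is a toy model under those assumptions (the `EphemeronStrengths` class, integer ids, and treating the key like a WeakReference referent are illustrative choices, not the JVM's reference processing).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical) of the proposed rule: an ephemeron's value is as
// reachable as the weaker of (the ephemeron itself, its key).
class EphemeronStrengths {
    // Declared weakest to strongest, so compareTo() orders strengths.
    enum Strength { UNREACHABLE, WEAK, SOFT, STRONG }

    // A reference from one object to another: kind is STRONG for an ordinary
    // field, SOFT or WEAK when "from" is a SoftReference or WeakReference.
    static final class Edge {
        final int from, to;
        final Strength kind;
        Edge(int from, int to, Strength kind) { this.from = from; this.to = to; this.kind = kind; }
    }

    static final class Ephemeron {
        final int self, key, value;
        Ephemeron(int self, int key, int value) { this.self = self; this.key = key; this.value = value; }
    }

    static Strength min(Strength a, Strength b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    // Raises obj's strength to candidate if that is stronger; reports whether anything changed.
    static boolean raise(Map<Integer, Strength> s, int obj, Strength candidate) {
        if (candidate.compareTo(s.getOrDefault(obj, Strength.UNREACHABLE)) > 0) {
            s.put(obj, candidate);
            return true;
        }
        return false;
    }

    static Map<Integer, Strength> classify(Set<Integer> roots, List<Edge> edges, List<Ephemeron> ephemerons) {
        Map<Integer, Strength> s = new HashMap<>();
        for (int r : roots) s.put(r, Strength.STRONG);
        boolean changed = true;
        while (changed) { // strengths only ever increase, so this reaches a fixpoint
            changed = false;
            for (Edge e : edges) {
                Strength from = s.getOrDefault(e.from, Strength.UNREACHABLE);
                changed |= raise(s, e.to, min(from, e.kind));
            }
            for (Ephemeron ep : ephemerons) {
                Strength self = s.getOrDefault(ep.self, Strength.UNREACHABLE);
                // The key is held like a WeakReference referent.
                changed |= raise(s, ep.key, min(self, Strength.WEAK));
                Strength key = s.getOrDefault(ep.key, Strength.UNREACHABLE);
                // The value inherits the key's strength, capped by the ephemeron's own.
                changed |= raise(s, ep.value, min(self, key));
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // Root 0 holds ephemeron 10 (key 1, value 2) and also holds key 1 strongly.
        List<Ephemeron> eph = List.of(new Ephemeron(10, 1, 2));
        List<Edge> edges = List.of(new Edge(0, 10, Strength.STRONG), new Edge(0, 1, Strength.STRONG));
        System.out.println(classify(Set.of(0), edges, eph).get(2)); // STRONG
    }
}
```

With the key held strongly, the value stays strongly reachable; with only a soft path to the key, the value is softly reachable; and a value that merely refers back to its own key ends up no more than weakly reachable.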

> 
> But Ephemeron does not need a special reachability strength. It could be 
> merged with WeakReference as far as the reachability of its key is 
> concerned. In that case it would touch at least the definitions of 
> softly-reachable and weakly-reachable:
> 
>  *  An object is softly reachable if it is not strongly
>  * reachable but can be reached by traversing a soft reference or
>  * by traversing an ephemeron through its value while
>  * the ephemero

Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

2015-11-30 Thread Gil Tene
Update: After some significant back-and-forth between Doug and me on naming and 
JavaDoc'ing, and with Martin (Thompson) stepping in to help, we have what we 
think is a good spec and name selection for this thing. We're proposing to add 
a new static method to the Runtime class:

class Runtime { // ...
    /**
     * Method signifying that the caller is momentarily unable to
     * progress until the occurrence of one or more actions of one or
     * more other activities.  When invoked within each iteration, this
     * method typically improves performance of spin wait loop
     * constructions.
     */
    public static void onSpinWait() {}
}

See updated details, including a link to the updated JEP draft, as well as 
links to working prototype implementations, webrevs against OpenJDK9b94, and 
example code, here: https://github.com/giltene/GilExamples/tree/master/SpinWaitTest
All names have changed to reflect the new naming (onSpinWait, 
-XX:+UseOnSpinWaitIntrinsic, SpinWaitTest, etc.).
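For reference, the hint as eventually shipped in Java 9 (JEP 285) lives on `Thread` as `Thread.onSpinWait()` rather than on `Runtime`. A minimal sketch of the intended usage, assuming a single writer that flips a volatile flag (the `SpinWaitExample` class and `ready` field are illustrative):

```java
class SpinWaitExample {
    // Must be volatile so the spinning thread observes the writer's update.
    static volatile boolean ready = false;

    // Spins until ready is set, hinting on every iteration; returns the spin count.
    static long awaitReady() {
        long spins = 0;
        while (!ready) {
            Thread.onSpinWait(); // may be intrinsified (e.g. to PAUSE on x86); a nop is also valid
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready = true;
        });
        writer.start();
        long spins = awaitReady();
        writer.join();
        System.out.println("observed ready after " + spins + " spins");
    }
}
```

The loop keeps spinning hard on the flag; the hint only tells the platform that reaction time matters more than iteration rate.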

As an interesting stat, the total changes in the webrevs amount to 78 added 
lines (across 14 files), and 0 lines removed or changed. Hopefully that is a good 
indication of relatively low footprint and risk.

— Gil.







Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

2015-10-29 Thread Gil Tene
[Sorry for the 4 day delay in response. JavaOne sort of got in the way]

I think we are looking at two separate and almost opposite motivations, each of 
which is potentially independently valid. Each can be characterized by 
answering the question: "How does adding this to an empty while(!ready) {} spin 
loop change things?".

Putting name selection aside, one motivation can be characterized with "if I 
add this to a spinning loop, keep spinning hard and don't relinquish resources 
any more than the empty loop would, but try to leave the spin as fast as 
possible. And it would be nice if power was conserved as a side effect.". The 
other motivation can be characterized with "If I add this to a spin loop, I am 
indicating that I can't make useful progress unless stuff happens or some 
internal time limit is reached, and that it is ok to try and make better use of 
resources (including my CPU), relinquishing them more aggressively than the 
empty loop would. And it would be nice if reaction time was faster most of the 
time too". 

The two motivations are diametrically opposed in their expected effect when 
compared to the behavior of an empty spin loop that does not contain them. Both 
can be validly implemented as a nop, but they "hint" in opposite directions. 
The former is what I have been calling a spin loop hint (in the "keep spinning 
and don't let go" sense), and the latter is a "spin/yield" (in the "ok to let 
go" sense). They have different uses.

> On Oct 24, 2015, at 11:09 AM, Doug Lea <d...@cs.oswego.edu> wrote:
> 
> 
> Here's one more attempt to explain why it would be a good idea
> to place, name, and specify this method in a way that is more
> general than "call this method only if you want a PAUSE instruction
> on a dedicated multicore x86":

I agree with the goal of not aiming at a processor specific behavior, and 
focusing on documenting intent and expectation. But I think that the intent 
suggested in the spinLoopHint() JavaDoc does that. As noted later in this 
e-mail, there are other things that the JVM can choose to do to work in the 
hint's intended direction.

> 
> On 10/15/2015 01:23 PM, Gil Tene wrote:
> ...
>> 
>> As noted in my proposed JavaDoc, I see the primary indication of the hint to
>> be that the reaction time to events that would cause the loop to exit (e.g.
>> in nanosecond units) is more important to the caller than the speed at which
>> the loop is executing (e.g. in "number of loop iterations per second" units).
> 
> Sure. This can also be stated:
> 
> class Thread { ...
> /**
>  * A hint to the platform that the current thread is momentarily
>  * unable to progress until the occurrence of one or more actions of
>  * one or more other threads (or that its containing loop is
>  * otherwise terminated).  The method is mainly applicable in
>  * spin-then-block constructions entailing a bounded number of
>  * re-checks of a condition, separated by spinYield(), followed if
>  * necessary with use of a blocking synchronization mechanism.  A
>  * spin-loop that invokes this method on each iteration is likely to
>  * be more responsive than it would otherwise be.
>  */
>  public static void spinYield();
> }

I like the "more responsive than it would otherwise be" part. That certainly 
describes how this is different than an empty loop. But the choice of "mainly 
applicable" in spinYield() is exactly opposite to the main use case 
spinLoopHint() is intended for (which is somewhere between "indefinite 
spinning" and "I don't care what kind of spinning"). This JavaDoc looks like a 
good description of spinYield() and its intended main use cases, but the 
stated intent and expectations (when compared to just doing an empty spin loop) 
work in the opposite direction from what spinLoopHint's intent and expectations 
need to be for its common use cases.

> 
>> Anyone running indefinite spin loops on a uniprocessor deserves whatever they
>> get. Yielding in order to help them out is not mercy. Let Darwin take care of
>> them instead.
>> 
>> But indefinite user-mode spinning on many-core systems is a valid and common
>> use case (see the disruptor link in my previous e-mail).
> 
>> In such situations the spinning loop should just be calling yield(), or
>> looping for a very short count (like your magic 64) and then yielding. A
>> "magically choose for me whether reaction time or throughput or being nice to
>> others is more important" call is not a useful hint IMO.
>> 
>> Like in my uniprocessor comment above, any program spinning indefinitely (or
>> for a non-trivial amount of time) with load > # cpus deserves what it gets.
> 
> The m

Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

2015-10-15 Thread Gil Tene

> On Oct 15, 2015, at 11:32 PM, Doug Lea <d...@cs.oswego.edu> wrote:
> 
> On 10/14/2015 11:53 PM, Gil Tene wrote:
>> I agree on the separation between spin-hinting and monitor-like constructs.
>> But not so much on the analogy to or use of the term "yield" to describe what
>> is intended by spin hints.
>> 
> 
> I've been focussing on the spec, which still seems to best support
> this naming. Let's try fleshing out some more (for no-arg version).
> 
>  /**
>   * A hint to the platform that the current thread is momentarily
>   * unable to progress until the occurrence of one or more actions
>   * of one or more other threads. The method is mainly applicable
>   * in spin-then-block constructions entailing a bounded number
>   * of re-checks of a condition, separated by spinYield(), followed
>   * if necessary with use of a blocking synchronization mechanism.
>   */
>  public static void spinYield();

I don't think that this is a good description of the use cases. Yes, the hint 
is helpful for spin-then-block constructions, but that's just a part of where 
it can help. In fact, I expect the hint to be very applicable for 
indefinitely-spinning loops, and I expect the measurable impact there to be 
much more reliably noticed because such loops are invariably concerned with 
fast reaction times above all else.

I also don't think that the "…momentarily unable to progress until the 
occurrence of one or more actions of one or more other threads." is true: 
while (!(done || (count++ > threshold))) { spinLoopHint(); } can progress 
without any action by any other thread.

As noted in my proposed JavaDoc, I see the primary indication of the hint to be 
that the reaction time to events that would cause the loop to exit (e.g. in 
nanosecond units) is more important to the caller than the speed at which the 
loop is executing (e.g. in "number of loop iterations per second" units). So if 
we are focusing on the spec, here is my suggested spec (edited to be more 
specific):

/**
* Provide the JVM with a hint that this call is made from within a spinning
* loop. The JVM may assume that the reaction time to events that would
* cause the loop to terminate is more important than the speed of executing
* the loop (e.g. in terms of number of loop iterations per second).
* Power savings may also occur, but those are considered incidental to the 
* primary purpose of improving reaction time. The JVM will not slow down
* the loop execution to a point where execution will be delayed indefinitely,
* but other choices of loop execution speed are system-specific. Note that a
* nop is a valid implementation of this hint.
*/

> What should be the response to this hint? When applicable
> and available, the JVM should just issue PAUSE. But on a uniprocessor,
> or when load average is easily detected to be high, or
> on a tightly packed cloud node, a plain yield or something
> along those lines might be a better use of this hint, that
> the spec should not rule out.

Anyone running indefinite spin loops on a uniprocessor deserves whatever they 
get. Yielding in order to help them out is not mercy. Let Darwin take care of 
them instead.

But indefinite user-mode spinning on many-core systems is a valid and common 
use case (see the disruptor link in my previous e-mail). And because a spin 
loop hint is extremely useful for indefinitely spinning loop situations, and a 
spin hint is primarily intended to improve the reaction time of spin loops, I 
would describe any explicit yielding by the JVM at the hint point as 
mis-behavior. [Not quite an invalid behavior, because we don't want to specify 
allowed behavior too strongly, but certainly surprising, unexpected, and highly 
disappointing given the intent expressed by the hint]. Yes, the OS or 
hypervisor may choose to preempt a thread at any random point in code, 
including at these hint points, but that's their job and their problem, and not 
the JVM's. The JVM should not be in the business of voluntarily and implicitly 
yielding at specific points in code, and especially not at points in code that 
spins and hints that it wants to improve the performance of that spin.

If what you want is a spin loop that yields, write one: while (!done) { yield(); }.
I don't see how while (!done) { spinYield(); } has any different meaning to the
reader. It just reads as something like "yield faster, knowing that you are in
a spin".

> Also, I believe that some x86
> hypervisors intercept PAUSE and do something roughly similar
> after repeated invocations.

As to hypervisor choices: preempting a guest OS at a PAUSE instruction is 
actually higher risk, since the PAUSE instruction could be taken while holding 
a critical kernel resource (e.g. mremap always grabs one spinlock while 
holding another spinlock). The trick most hypervisors seem to use is 

Re: [concurrency-interest] Spin Loop Hint support: Draft JEP proposal

2015-10-14 Thread Gil Tene
I agree on the separation between spin-hinting and monitor-like constructs. But 
not so much on the analogy to or use of the term "yield" to describe what is 
intended by spin hints.

On the name choice: things that include the word "yield" vs. spinLoopHint():

While the spinYield() example in your e-mail below can work from a semantic 
point of view in the same code, IMO the word "yield" suggests the exact 
opposite of what spinLoopHint() is intending to do or hint at: spinLoopHint is 
motivated by wanting to spin while holding onto the CPU (intentionally not 
yielding to other processes), and by the wish to improve performance while 
doing so, primarily by reducing the reaction time to a loop-terminating event. 
So spinLoopHint() is something that very selfish spinning code does without 
reducing its selfishness. In contrast, yield() is virtually always done as an 
unselfish act (the very word suggests it). The general expectation with yield() 
calls is that the OS scheduler can make use of the core for other uses: it 
is OK to switch away from the current task since the thread may not be making 
progress, or may be OK with relinquishing resources to be "nice" to others. As 
such, a yield() generally suggests a system call, a (relatively) large 
overhead, and a willingness to sacrifice reaction time in order to allow others 
to make use of the CPU.

My preferred choice of the spinLoopHint() name comes directly from how the 
behavior expectations of a PAUSE-like instruction are already expressed in 
relevant documents. E.g. Intel's documentation describes PAUSE instructions as 
a "Spin Loop Hint".

On the no-args vs. "with some arg" variants:

With regards to passing a time value (e.g. the yield(long nanoseconds) example 
in your e-mail below): A spinLoopHint() is a natural fit for loops that already 
do their own spinning, and as such needs to allow those loops to deal with their 
own composition, choices around termination conditions, state updates, and 
choices about backing off or employing time or count based termination 
behaviors. Choosing and dictating a specific mechanism (nanoseconds) or 
terminating condition checks will interfere with the composability of 
spinLoopHint() into such code. To use a specific example: the yield(long 
nanoSeconds) form (even if it's called spinLoopHint(long nanoSeconds)) would 
not be directly usable in the following code:

while (!(doneCond1 || doneCond2 || count++ > backOffThreshold)) {
    spinLoopHint();
}

Similarly, a no-args spinLoopHint() will cleanly drop into things like the 
disruptor's busy-spin waitFor() variant 
(https://github.com/LMAX-Exchange/disruptor/blob/f29b3148c2eef3aa2dc5d5f570d7dde92b2f98ba/src/main/java/com/lmax/disruptor/BusySpinWaitStrategy.java#L28),
but something that takes a nanoseconds argument would not.
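A sketch of the composition argument, using the shipped Java 9 `Thread.onSpinWait()` in place of the proposed `spinLoopHint()`: the loop keeps its own two exit conditions and its count-based back-off, and the hint is just a no-arg call in the body. The `BackoffSpin` class, the `AtomicBoolean` conditions, the threshold value, and the choice to fall back to `Thread.yield()` are all assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class BackoffSpin {
    // Hypothetical tuning constant; real code would pick this empirically.
    static final int BACK_OFF_THRESHOLD = 1_000;

    // Returns true if a done condition already held when the spin phase ended,
    // false if the threshold was exceeded and the wait fell back to yielding.
    static boolean await(AtomicBoolean doneCond1, AtomicBoolean doneCond2) {
        int count = 0;
        // Spin phase: the loop owns its termination logic; the hint takes no arguments.
        while (!(doneCond1.get() || doneCond2.get() || count++ > BACK_OFF_THRESHOLD)) {
            Thread.onSpinWait();
        }
        if (doneCond1.get() || doneCond2.get()) {
            return true;
        }
        // Back-off phase: give up the CPU between checks.
        while (!(doneCond1.get() || doneCond2.get())) {
            Thread.yield();
        }
        return false;
    }
}
```

A hint that took a nanosecond timeout would have to replace the loop's own counter and condition checks rather than slot into the body like this.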

I think that composability should be our main driver here. Java code already 
knows how to spin in many interesting ways. It just needs to have a way to hint 
that reaction time is more important than speed in the spin, and that's what 
I'm suggesting as the main purpose of a spinLoopHint(). See proposed JavaDoc 
below.

— Gil.

/**
 * Provide the JVM with a hint that this call is made from within a spinning
 * loop. The JVM may assume that the speed of executing the loop (e.g. in
 * terms of number of loop executions per second) is less important than the
 * reaction time to events that would cause the loop to terminate, or than
 * potential power savings that may be derived from possible execution
 * choices. The JVM will not slow down the loop execution to a point where
 * execution will be delayed indefinitely, but other choices of loop execution
 * speed are system-specific. Note that a nop is a valid implementation of
 * this hint.
 */
public static void spinLoopHint() {
}


> On Oct 14, 2015, at 11:04 PM, Doug Lea  wrote:
> 
> Some notes after reading follow-ups.
> 
> One question is whether there should be a method that clues in
> the JVM about what change is being waited for. This is the territory of
> monitor-like constructions (see below), as opposed to the
> yield/sleep-like constructions that Gil was initially proposing.
> 
> For these, the next question is whether this should be more
> like Thread.yield() vs Thread.sleep(). If it could be like
> sleep, then a new API might not be needed: JVMs could
> implement sleep(0, 1) (or any small value of nanosec arg)
> using a PAUSE instruction on platforms supporting them.
> But sleep is also required to check interrupt status,
> which means that at least one extra load would be needed
> in addition to PAUSE. So it seems that something yield-like
> (with no obligation to check interrupt) is still desirable,
> leading either to my original suggestion:
> 
>  /**
>   * A hint to the platform that the current thread is momentarily
>   * unable to progress...
>   */
>  public static void spinYield();
> 
> OR something more analogous to sleep, but 

Re: Spin Loop Hint support: Draft JEP proposal

2015-10-09 Thread Gil Tene

> On Oct 8, 2015, at 6:18 PM, John Rose <john.r.r...@oracle.com> wrote:
> 
> On Oct 8, 2015, at 12:39 AM, Gil Tene <g...@azul.com> wrote:
>> 
>> On the one hand:
>> 
>> I like the idea of (an optional?) boolean parameter as a means of hinting at 
>> the thing that may terminate the spin. It's probably much more general than 
>> identifying a specific field or address. And it can be used to cover cases 
>> that poll multiple addresses (an OR in the boolean) or look at a termination 
>> time. If the JVM can track down the boolean's evaluation to dependencies on 
>> specific memory state changes, it could pass it on to hardware, if such 
>> hardware exists.
> 
> Yep.  And there is a user-mode MWAIT in SPARC M7, today.

Cool. Didn't know that. So now it's SPARC M7 and ARM v8. Both fairly new, but 
the pattern of monitoring a single address (or range) and waiting on a 
potential change to it seems common (and similar to the kernel mode 
MONITOR/MWAIT in x86). Anything similar coming (or already here) in Power or 
MIPS?

> For Intel, Dave Dice wrote this up:
>  https://blogs.oracle.com/dave/entry/monitor_mwait_for_spin_loops

Cool writeup. But with the current need to transition to kernel mode, this may 
work for loops that want to idle and save power and are willing to sacrifice 
reaction time to do so. But it is the opposite of what a spinLoopHint() would 
typically be looking to do. On modern x86, for example, adding a PAUSE 
instruction improves the reaction speed of the spin loop (see charts attached 
to JEP), but adding the trapping cost and protection mode transition of a 
system call to do an MWAIT will almost certainly do the opposite.

If/when MONITOR/MWAIT becomes available in user mode, it will join ARM v8 and 
SPARC M7 in a common useful paradigm.

> Also, from a cross-platform POV, a boolean would provide an easy to use 
> "hook" for profiling how often the polling is failing.  Failure frequency is 
> an important input to the tuning of spin loops, isn't it?  Why not feed that 
> info through to the JVM?

I don't follow. Perhaps I'm missing something. Spin loops are "strange" in that 
they tend to not care about how "fast" they spin, but do care about their 
reaction time to a change in the thing(s) they are spinning on. I don't think 
profiling will help here…

E.g. in the example tests for this JEP on Ivy Bridge Xeons, adding an 
intrinsified spinLoopHint() to a simple loop spinning on a volatile value appears to 
reduce the "spin throughput" by a significant ratio (3x-5x for L1-sharing 
threads), but also reduces the reaction time by 35-50%.

> ...
>> and if/when it does, I'm not sure the semantics of passing the boolean 
>> through are enough to cover the actual way to use such hardware when it 
>> becomes available.
> 
> The alternative is to have the JIT pattern-match for loop control around the 
> call to Thread.yield. That is obviously less robust than having the user 
> thread the poll condition bit through the poll primitive.

I don't think that's the alternative. The alternative(s) I suggest require no 
analysis by the JIT:

The main means of spin loop hinting I am suggesting is a simple no args hint. 
[Folks seem to be converging on using Thread as the home for this stuff, so 
I'll use that]:

E.g.:
while (!done) {
Thread.spinLoopHint();
}

The second form I'm suggesting (mostly in reaction to discussion on this 
thread) directly captures the notion that a single address is being monitored:

E.g. 

volatile boolean done;
static final Field doneField = …;
...
Thread.spinExecuteWhileTrue( () -> !done, doneField, this ); // ugly method name I'm not married to...

or a slightly more complicated: 

Thread.spinExecuteWhileTrue( () -> { count++; return !done; }, doneField, this );

[These Thread.spinExecuteWhileTrue() examples will execute the BooleanSupplier 
each time, but will only watch the specified field for changes in the spin 
loop, potentially pausing the loop until a change in the field is detected, but 
will not pause indefinitely. This can be implemented with a MONITOR/MWAIT, 
WFE/SEVL, or by just using a PAUSE instruction and not watching the field at 
all.]

(for Java 9, a varhandle variant of the above reflection based model is 
probably more appropriate. I spelled this with the reflection form for 
readability by pre-varhandles-speakers).

Neither of these forms require any specific JIT matching or exploration. We 
know the first form is fairly robust on architectures that support stuff like 
PAUSE. The second form will probably be robust on both architectures that support 
MWAIT or WFE, and on those that support PAUSE (those just won't watch anything).
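`spinExecuteWhileTrue` is a hypothetical API floated in this thread, not a JDK method. The sketch below assumes the simplest implementation mentioned above: it evaluates the `BooleanSupplier` on every iteration, accepts the watched field but does not watch it, and issues only the spin hint (here the Java 9 `Thread.onSpinWait()`). The `SpinExecute` and `Waiter` classes are illustrative.

```java
import java.lang.reflect.Field;
import java.util.function.BooleanSupplier;

class SpinExecute {
    // Hypothetical API, sketched with the "PAUSE only" fallback: the watched
    // field and its owner are accepted but not monitored.
    static void spinExecuteWhileTrue(BooleanSupplier condition, Field watchedField, Object owner) {
        // A MONITOR/MWAIT or WFE-based version could wait here for a write to
        // watchedField on owner; this sketch ignores both parameters.
        while (condition.getAsBoolean()) {
            Thread.onSpinWait();
        }
    }
}

class Waiter {
    volatile boolean done;
    int count;

    // Mirrors the second example above: the supplier itself mutates state (count++).
    int awaitDone() throws NoSuchFieldException {
        Field doneField = Waiter.class.getDeclaredField("done");
        SpinExecute.spinExecuteWhileTrue(() -> { count++; return !done; }, doneField, this);
        return count;
    }
}
```

Because the supplier runs once per iteration, the `count++` side effect stays visible and predictable to the caller, which is part of why the field to watch is passed separately rather than inferred from the boolean.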

On how this differs from a single boolean parameter: My notion (in the example 
above) of a single poll variable would be one that specifically

Re: Spin Loop Hint support: Draft JEP proposal

2015-10-08 Thread Gil Tene

On Oct 7, 2015, at 3:01 PM, John Rose 
<john.r.r...@oracle.com> wrote:

On Oct 5, 2015, at 2:41 AM, Andrew Haley 
<a...@redhat.com> wrote:

Hi Gil,

On 04/10/15 17:22, Gil Tene wrote:

Summary

Add an API that would allow Java code to hint that a spin loop is
being executed.


I don't think this will work for ARM, which has a rather different
spinlock mechanism.

Instead of PAUSE, we wait on a lock word with WFE.  WFE puts a core
into a lightweight sleep state waiting on a particular address (the
lock word) and a write to the lock word wakes it up.  This is very
useful and somewhat analogous to x86's MONITOR/MWAIT.

I can't immediately see how to generalize your proposal to ARM, which
is a shame.

Suggestion:  Allow the hint intrinsic to take an argument, from which
a JIT can infer a memory dependency (if one is in fact present).

Even if we are just targeting a PAUSE instruction, I think it is helpful
to the JIT to add more connection points (beyond control flow) between
the intrinsic and the surrounding loop.

class jdk.internal.vm.SpinLoop {
/** Provides a hint to the processor that a spin loop is in progress.
 *  The boolean is returned unchanged.  The processor may assume
 *  that the loop is likely to continue as long as the boolean is false.
 *  The processor may pause or wait after a false result, if there is
 *  some reason to believe that the boolean argument, if re-evaluated,
 *  will be false again.  Any pausing behavior is system-specific.
 *  The processor may not pause indefinitely.
 *  Example:
 * {@code
MyMailbox mb = …;
while (true) {
  if (!pollSpinExit(mb.hasMail())) continue;
  Object m = mb.getMail();
  if (m != null)  return m;
}
 * }
 */
   @jdk.internal.HotSpotIntrinsicCandidate
public static boolean pollSpinExit(boolean spinExit) { return spinExit; }
}

I'm going to guess that the extra hinting provided by the parameter would
make it easier for a JIT to generate MWAIT and WFEs.
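A minimal sketch of the proposed `pollSpinExit` contract, assuming `Thread.onSpinWait()` as a stand-in for whatever the intrinsic would emit: the argument is returned unchanged and the hint is issued only after a false result. `MyMailbox` is elided in the proposal, so the queue-backed version here is hypothetical, and nothing in this sketch lets a JIT infer the watched address.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

class PollSpinExitSketch {
    // Returns its argument unchanged; hints only when the loop is about to spin again.
    static boolean pollSpinExit(boolean spinExit) {
        if (!spinExit) {
            Thread.onSpinWait(); // stand-in for PAUSE/WFE; no address is watched here
        }
        return spinExit;
    }

    // Hypothetical stand-in for the elided MyMailbox type.
    static final class MyMailbox {
        private final ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();
        boolean hasMail() { return !queue.isEmpty(); }
        Object getMail() { return queue.poll(); }
        void post(Object m) { queue.add(m); }
    }

    // The example loop from the proposed Javadoc, made compilable.
    static Object receive(MyMailbox mb) {
        while (true) {
            if (!pollSpinExit(mb.hasMail())) continue;
            Object m = mb.getMail();
            if (m != null) return m;
        }
    }
}
```

The `m != null` re-check matters because another consumer may take the mail between `hasMail()` and `getMail()`.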

On the one hand:

I like the idea of (an optional?) boolean parameter as a means of hinting at 
the thing that may terminate the spin. It's probably much more general than 
identifying a specific field or address. And it can be used to cover cases that 
poll multiple addresses (an OR in the boolean) or look at a termination time. If 
the JVM can track down the boolean's evaluation to dependencies on specific 
memory state changes, it could pass it on to hardware, if such hardware exists.

On the other hand:

Unfortunately, I don't think that hardware support that can receive the address 
information exists right now, and if/when it does, I'm not sure the semantics 
of passing the boolean through are enough to cover the actual way to use such 
hardware when it becomes available. It is probably premature to design a 
generic way to provide addresses and/or state to this "spin until something 
interesting changes" stuff without looking at working examples. A single 
watched address API is much more likely to fit current implementations without 
being fragile.

ARM v8's WFE is probably the most real user-mode-accessible thing for this right 
now (MWAIT isn't real yet, as it's not accessible from user mode). We can look 
at an example of how a spinloop needs to coordinate the use of WFE, SEVL, and 
the evaluation of memory location with load exclusive operations here: 
http://lxr.free-electrons.com/source/arch/arm64/include/asm/spinlock.h . The 
tricky part is that the SEVL needs to immediately precede the loop (and all 
accesses that need to be watched by the WFE), but can't be part of the loop (if it 
were in the loop the WFE would always trigger immediately). But the code in the 
spinning loop can only track a single address (the exclusive tag in load 
exclusive applies only to the most recent address used), so it would be wrong 
to allow generic code in the spin (it would have to be code that watches 
exactly one address).

My suspicion is that the "right" way to capture the various ways a spin loop 
would need to interact with WFE logic will be different than tracking things 
that can generically affect the value of a boolean. E.g. the evaluation of the 
boolean could be based on multiple addresses, and since it's not clear (in the 
API) that this is a problem, the benefits derived would be fragile. In 
addition, there can validly be state-mutating logic in the loop (e.g. 
counting), and implicitly re-executing that logic repeatedly inside a 
pollSpinExit(booleanThatOnlyWatchesOneAddress) call would seem "wrong" (the 
logic would presumably precede the call, and it would be surprising to see it 
execute more than once within the call).

I suspect that the right way to deal with WFE would be to provide an API that 
is closer to what it needs (and which is different from spin-hinting in the 
loop). E.g. some way to designate the beginning of the loop 

Re: Spin Loop Hint support: Draft JEP proposal

2015-10-07 Thread Gil Tene

> On Oct 6, 2015, at 6:50 PM, Joseph D. Darcy <joe.da...@oracle.com> wrote:
> 
> 
> On 10/6/2015 6:28 PM, Gil Tene wrote:
>>> On Oct 6, 2015, at 1:02 PM, Doug Lea <d...@cs.oswego.edu> wrote:
>>> 
>>> On 10/04/2015 12:22 PM, Gil Tene wrote:
>>>> I would like to circulate this draft JEP proposal for initial review and 
>>>> consensus building purposes.
>>>> 
>>> Some background: about two years ago, Dave Dice and I put together
>>> a pre-jep proposal for JVM support for recent CPU features covering:
>>> 
>>> (1) MWAIT/PAUSE/etc (for spins as well as other uses people have noted);
>>> (2) Core topology/neighborhood information and;
>>> (3) 2CAS, as a foothold on HTM that could still be reasonably efficient
>>> on non-transactional processors.
>>> 
>>> My understanding of the result of this effort was that Oracle JVM engineers
>>> didn't think they had resources to do this for jdk9. It didn't occur to
>>> me that non-Oracle contributors might want to cherry-pick one (some
>>> of (1) above). It seems plausible to do this, but only if designed
>>> as the first of some possible enhanced support along these lines.
>> Good point. But that's what an actual community is about, isn't it?
>> 
>> We (Azul) are volunteering the resources for spinLoopHint(), including
>> experimentation, implementation, testing, and even a TCK (which in this case
>> will be trivial). So the vast majority of resources needed will not be coming
>> from other budgeted jdk9 resources.
>> 
>> I certainly recognize that there will still be work involved that others will
>> end up having to do: reviewing, arguing, contributing opinions, etc., as well
>> as integrating the work into the whole. But this specific proposed JEP is about
>> as narrow and low risk as you can get, especially from a specification point of
>> view (e.g. intrinsic implementation can be left under a flag if deemed risky to
>> stability or schedule).
>> 
>> As for fitting in with larger-picture or theme things (listed above), I think
>> that agonizing over the choice of where to put this is important (e.g. the
>> Thread.spinLoopHint() idea, or creating a new class that other hints will go
>> into in the future, and where?). But I don't think that there is good reason
>> to bundle this work with e.g. 2CAS, HTM, and affinity work. While we can think
>> of them all as "support for recent CPU features", they are very different (and
>> probably have multiple other unrelated groupings).
>> 
>> MWAIT (and the like) and PAUSE do deserve some co-thinking (see earlier
>> discussion on the thread). So does a discussion about a capturing abstraction
>> like synchronic (raised in concurrency-interest). But given the actual common
>> uses already waiting for a spinLoopHint(), the very tangible and immediate
>> benefit it shows, and the fact that most of the use cases wouldn't be able to
>> make use of MWAIT (or the like), and some won't be able to use a
>> synchronic-like thing, I think that a spin-hint-only JEP is not just a good
>> "shortcut", but also an actual stand-alone feature need.
>> 
> 
> Taking a long-term view, it seems to me premature to burn this kind of hint 
> into the Java SE API (effectively) forever in the absence of experience that 
> the hint in this form is useful and will continue to be useful in 5 years, 10 
> years, etc.
> 
> If the hint started out as a JDK-specific API, there would be (some) more 
> room to modify or drop the API in the future, leaving open the possibility of 
> migrating the functionality to the Java SE API too.
> 
> -Joe

While I don't disagree with the need for long term thinking, I think this JEP 
represents exactly that. It is hardly "premature". "Very late" is probably a 
much better description.

There is overwhelming evidence and experience showing that this exact form of 
hint is useful, and will likely continue to be useful for a decade or more. 
Spin hinting isn't something new. This hint technique (including an explicit 
instruction or function call hinting that you are in a spin loop) has been 
dominantly and beneficially used for over a decade in virtually all spinning 
code *other* than Java. E.g. Linux (and probably all other OSs) uses this for 
virtually all kernel spinning situations on x86 and PowerPC. POSIX libraries do 
so too in mutexes and semaphores in user mode. Even the JVM's own custom C++ 
spinning code has been doing it for many y

Re: Spin Loop Hint support: Draft JEP proposal

2015-10-07 Thread Gil Tene


Sent from Gil's iPhone

> On Oct 7, 2015, at 1:14 AM, Andrew Haley <a...@redhat.com> wrote:
> 
>> On 05/10/15 21:43, Gil Tene wrote:
>> 
>> I see SpinLoopHint as very separate from things like MONITOR/MWAIT
>> (on x86) and WFE/SEV (on ARM), as well as any other "wait in a nice
>> way until this state changes" instructions that other architectures
>> may have or add.
>> 
>> Mechanisms like MONITOR/MWAIT and WFE/SEV provide a way to
>> potentially wait for specific state changes to occur. As such, they
>> can be used to implement a specific form of a spin loop (the most
>> common one, probably). But they do not provide for generic spinning
>> forms. E.g. loops that have multiple exit conditions in different
>> memory locations, loops that wait on internal state changes that are
>> not affected by other CPUs (like "spin only this many times before
>> giving up" or "spin for this much time"), and loops that may use
>> transactional state changes (e.g. LOCK XADD, or wider things with
>> TSX) are probably "hard" to model with these instructions.
> 
> Yes, you're right: there's no real way to combine these things, and
> support for WFE requires some other kind of interface -- if I ever
> manage to think of a nice way to express it in Java.  So, my
> apologies for hijacking this thread, but now you've got me thinking.
> 
> In an ideal world there would be a timer associated with WFE which
> would trigger after a short while and allow a thread to be
> descheduled.  However, it is possible to set a periodic timer which
> regularly signals each worker thread, giving it the opportunity to
> block if unused for a long time.  This should make a much more
> responsive thread pool, so that when worker threads are active they
> respond in nanoseconds rather than microseconds.

The problem with using timer based interrupts to kick out of WFE or MWAIT 
situations is that the granularity is often too fine for timers (and 
interrupts). E.g. j.u.c often uses a "magic number of spins" of 64 or so before 
backing out of the spin. That's just too short to get a timer going (and 
canceled) in, and the overhead of interrupt handling will overwhelm the actual 
action being attempted.

"What we really need" for WFE/MWAIT hardware instructions to be useful in this 
space (of spin-for-a-bit-before-giving-up) is for the instructions to take a 
timeout argument (e.g. a number of clock cycles; a power of two would probably 
suffice, so not a lot of bits are needed). But that's just not how they work on 
current HW...
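To make the granularity point concrete, here is a minimal Java sketch of the spin-for-a-bit-before-giving-up pattern. It assumes the hint is spelled `Thread.onSpinWait()`, which is how this proposal eventually shipped in Java 9 (JEP 285); `SpinThenPark`, `awaitTrue`, the 64-spin budget and the 50 microsecond park slice are illustrative choices, not actual j.u.c code. The whole hinted phase is only a few dozen loop iterations, far too short to arm and cancel a timer around a WFE or MWAIT.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public final class SpinThenPark {
    // Illustrative spin budget mirroring the "64 or so" spins mentioned above.
    static final int MAX_SPINS = 64;

    /**
     * Waits until flag becomes true. Spins (with a hint) for at most MAX_SPINS
     * iterations, then falls back to parking in short slices.
     * Returns how many hinted spins were performed before exiting or parking.
     */
    static int awaitTrue(AtomicBoolean flag) {
        int spins = 0;
        while (!flag.get()) {
            if (spins < MAX_SPINS) {
                spins++;
                Thread.onSpinWait();            // pure hint; a no-op would also be correct
            } else {
                LockSupport.parkNanos(50_000L); // back off: yield the CPU for ~50us, then re-check
            }
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> {
            LockSupport.parkNanos(1_000_000L);  // up to ~1 ms, much longer than 64 spins
            ready.set(true);
        });
        setter.start();
        int spins = awaitTrue(ready);
        setter.join();
        System.out.println("hinted spins before exit or park: " + spins);
    }
}
```

Spinning first keeps the common fast hand-off cheap, while the park fallback bounds CPU burn when the wait turns out to be long.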

> 
> [ An aside: WFE is available in user mode, and according to Intel's
> documentation it should be possible to configure an OS to use
> MONITOR/MWAIT in user mode too.  I don't know why it doesn't work. ]

While there are some ambiguous suggestions that MONITOR/MWAIT may be available 
in CPLs above 0 in some documentation, the current documentation for the 
actual MWAIT instruction is pretty clear about it only working in privilege 
level 0. So maybe this will be relaxed in the future?

In any case, even if it were user-mode-accessible, MWAIT may not be appropriate 
for latency-sensitive spinning because it can apparently take 1000s of cycles 
to come out of the C-state modes it goes into. At those levels, you may be 
better off blocking or yielding, and no one would make use of it if they care 
about quick reaction time. It may be that it's not that bad, depending on the 
C-state requested, but the fact that Linux kernels don't currently use MWAIT for 
spin loops suggests that it is not good for that use case yet.

For ARM, I expect WFE/SEV to need to evolve as well, and for other reasons, 
even for use within OSs. The current WFE/SEV scheme is not scalable. While it 
probably works ok for spinning at the kernel level on hardware that only has a 
handful of cores, the fact that the event WFE waits for (and SEV sends) is 
global to the system will break things as core counts grow (it is the hardware 
equivalent of wait/notifyAll() with a single global monitor). I expect that for 
OSs to use it for spinning on many-core systems, there would need to be some 
de-muxing capability added (e.g. by address or by some id). In its current 
form, it is probably not ready for exposure to user mode code. (Imagine what 
would happen if user code started doing system-wide SEVs on every unlock.)
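The wait/notifyAll() analogy can be made concrete in Java. In this sketch (all names hypothetical), n threads each wait for their own condition on one shared monitor, standing in for the single system-wide WFE event; "unlocking" just one of them and signalling with notifyAll(), standing in for SEV, still wakes all n, and every extra wakeup is wasted work that grows with the number of waiters. It models only the scaling argument, not ARM timing or WFE semantics.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public final class GlobalEventAnalogy {
    // One monitor shared by all waiters: the analogue of the system-wide WFE/SEV event.
    private static final Object GLOBAL = new Object();

    /**
     * Starts n waiters, each waiting for its own "lock" to be released, then releases
     * only waiter 0 and signals once with notifyAll(). Returns how many wakeups that
     * single signal caused (at least n, because every waiter wakes and re-checks).
     */
    static int wakeupsFromOneSignal(int n) {
        boolean[] released = new boolean[n];   // guarded by GLOBAL
        int[] waiting = {0};                   // guarded by GLOBAL
        AtomicInteger wakeups = new AtomicInteger();
        Thread[] waiters = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            waiters[i] = new Thread(() -> {
                synchronized (GLOBAL) {
                    waiting[0]++;
                    while (!released[id]) {
                        try {
                            GLOBAL.wait();
                        } catch (InterruptedException e) {
                            return;
                        }
                        wakeups.incrementAndGet(); // counted whether or not this waiter was released
                    }
                }
            });
            waiters[i].start();
        }
        try {
            int before;
            synchronized (GLOBAL) {
                while (waiting[0] < n) {
                    GLOBAL.wait(1);             // releases the monitor until all n are inside wait()
                }
                before = wakeups.get();
                released[0] = true;             // "unlock" only what waiter 0 is waiting for...
                GLOBAL.notifyAll();             // ...but the global signal wakes every waiter
            }
            while (wakeups.get() - before < n) {
                Thread.sleep(1);                // let every woken waiter re-check its condition
            }
            int caused = wakeups.get() - before;
            synchronized (GLOBAL) {             // release everyone so the threads can exit
                Arrays.fill(released, true);
                GLOBAL.notifyAll();
            }
            for (Thread t : waiters) {
                t.join();
            }
            return caused;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        int n = 8;
        System.out.println("wakeups caused by releasing one waiter: " + wakeupsFromOneSignal(n) + " (n = " + n + ")");
    }
}
```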

Re: Spin Loop Hint support: Draft JEP proposal

2015-10-06 Thread Gil Tene

> On Oct 6, 2015, at 7:54 AM, Rezaei, Mohammad A. <mohammad.rez...@gs.com> 
> wrote:
> 
> Thread.yield(), Thread.sleep(), Thread.spinLoopHint()?

That's probably a good place, especially since it doesn't add a new class. Will 
need feedback about the concerns of adding methods to existing commonly used 
classes though. E.g. with non-static methods (which this isn't) there is often 
the "what if someone already has a xyz() method in a subclass" concern. Less of 
an issue with static methods.
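The static-versus-instance concern can be illustrated with a small Java sketch; `LibraryClass`, `UserSubclass`, `hint()` and `describe()` are hypothetical names standing in for `Thread` and a pre-existing user subclass. A new static method that collides with an existing subclass method is merely hidden, and calls qualified with the library class still reach the library's version. A new instance method with the same signature is instead silently overridden, so library code ends up dispatching into user code (and an incompatible return type would stop the subclass from compiling at all).

```java
public final class MethodClashDemo {
    // Hypothetical stand-in for a widely used library class such as java.lang.Thread.
    static class LibraryClass {
        // Imagine both methods below are added in a new library release.
        static String hint() { return "library hint"; }
        String describe() { return "library describe"; }
        // Library code that calls the new instance method on whatever object it is given.
        String callDescribe() { return describe(); }
    }

    // Pre-existing user subclass, written before the library methods existed.
    static class UserSubclass extends LibraryClass {
        // Same signature as the new static method: it only hides it,
        // so LibraryClass.hint() still reaches the library version.
        static String hint() { return "user hint"; }
        // Same signature as the new instance method: it now silently overrides it,
        // so library code calling describe() on this object runs user code.
        String describe() { return "user describe"; }
    }

    public static void main(String[] args) {
        LibraryClass obj = new UserSubclass();
        System.out.println(LibraryClass.hint());  // library hint
        System.out.println(UserSubclass.hint());  // user hint
        System.out.println(obj.callDescribe());   // user describe
    }
}
```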

> Somehow, it feels right (as in consistent), but also off at the same time (as 
> in, why are these on Thread).
> 
> I'd take consistent over bespoke-one-static-method-class, unless there was a 
> concerted effort to consolidate other related api/concerns.
> 
> Thanks
> Moh
> 
>> -Original Message-
>> From: core-libs-dev [mailto:core-libs-dev-boun...@openjdk.java.net] On Behalf
>> Of Gil Tene
>> Sent: Tuesday, October 06, 2015 10:05 AM
>> To: Andrew Haley
>> Cc: hotspot-...@openjdk.java.net; core-libs-dev@openjdk.java.net
>> Subject: Re: Spin Loop Hint support: Draft JEP proposal
>> 
>> 
>> 
>> Sent from Gil's iPhone
>> 
>>> On Oct 6, 2015, at 1:16 AM, Andrew Haley <a...@redhat.com> wrote:
>>> 
>>>> On 06/10/15 05:32, Gil Tene wrote:
>>>> 
>>>> I don't think of this as platform specific. And it's not much lower
>>>> level than e.g. some java.util.concurrent stuff (but probably
>>>> doesn't belong in that package because it's not really about
>>>> concurrency). I'm looking for a proper Java SE spec'ed way to do
>>>> this. So sun.misc.* won't work. I'm sure we don't want another
>>>> Unsafe for people to abuse...
>>>> 
>>>> But naming the class and method is the smaller, easier detail. Right? ;-)
>>> 
>>> Sure.  I would have thought, though, that java.util.concurrent was a
>>> natural home for this.  Is there any kind of userland spinlock which
>>> is not about concurrency?
>> 
>> The same can be asked about Thread.notify().
>> 
>> To me, spinLoopHint() fits in (as in "probably a method in the same class")
>> with other performance-oriented hints. Like prefetch variants (which we don't
>> have but also probably need. E.g. prefetchWithIntentToWrite()). And placing
>> prefetch hints in j.u.c would not make much sense.



Re: Spin Loop Hint support: Draft JEP proposal

2015-10-06 Thread Gil Tene

> On Oct 6, 2015, at 1:02 PM, Doug Lea <d...@cs.oswego.edu> wrote:
> 
> On 10/04/2015 12:22 PM, Gil Tene wrote:
>> I would like to circulate this draft JEP proposal for initial review and 
>> consensus building purposes.
>> 
> 
> Some background: about two years ago, Dave Dice and I put together
> a pre-jep proposal for JVM support for recent CPU features covering:
> 
> (1) MWAIT/PAUSE/etc (for spins as well as other uses people have noted);
> (2) Core topology/neighborhood information and;
> (3) 2CAS, as a foothold on HTM that could still be reasonably efficient
> on non-transactional processors.
> 
> My understanding of the result of this effort was that Oracle JVM engineers
> didn't think they had resources to do this for jdk9. It didn't occur to
> me that non-Oracle contributors might want to cherry-pick one (some
> of (1) above). It seems plausible to do this, but only if designed
> as the first of some possible enhanced support along these lines.

Good point. But that's what an actual community is about, isn't it?

We (Azul) are volunteering the resources for spinLoopHint(), including
experimentation, implementation, testing, and even a TCK (which in this case
will be trivial). So the vast majority of resources needed will not be coming
from other budgeted jdk9 resources.

I certainly recognize that there will still be work involved that others will
end up having to do: reviewing, arguing, contributing opinions, etc., as well
as integrating the work into the whole. But this specific proposed JEP is about
as narrow and low risk as you can get, especially from a specification point of
view (e.g. intrinsic implementation can be left under a flag if deemed risky to
stability or schedule).

As for fitting in with larger-picture or theme things (listed above), I think
that agonizing over the choice of where to put this is important (e.g. the
Thread.spinLoopHint() idea, or creating a new class that other hints will go
into in the future, and where?). But I don't think that there is good reason
to bundle this work with e.g. 2CAS, HTM, and affinity work. While we can think
of them all as "support for recent CPU features", they are very different (and
probably have multiple other unrelated groupings).

MWAIT (and the like) and PAUSE do deserve some co-thinking (see earlier
discussion on the thread). So does a discussion about a capturing abstraction
like synchronic (raised in concurrency-interest). But given the actual common
uses already waiting for a spinLoopHint(), the very tangible and immediate
benefit it shows, and the fact that most of the use cases wouldn't be able to
make use of MWAIT (or the like), and some won't be able to use a
synchronic-like thing, I think that a spin-hint-only JEP is not just a good
"shortcut", but also an actual stand-alone feature need.

> -Doug
> 
> 
> 



Re: Spin Loop Hint support: Draft JEP proposal

2015-10-06 Thread Gil Tene



> On Oct 6, 2015, at 1:16 AM, Andrew Haley <a...@redhat.com> wrote:
> 
>> On 06/10/15 05:32, Gil Tene wrote:
>> 
>> I don't think of this as platform specific. And it's not much lower
>> level than e.g. some java.util.concurrent stuff (but probably
>> doesn't belong in that package because it's not really about
>> concurrency). I'm looking for a proper Java SE spec'ed way to do
>> this. So sun.misc.* won't work. I'm sure we don't want another
>> Unsafe for people to abuse...
>> 
>> But naming the class and method is the smaller, easier detail. Right? ;-)
> 
> Sure.  I would have thought, though, that java.util.concurrent was a
> natural home for this.  Is there any kind of userland spinlock which
> is not about concurrency?

The same can be asked about Thread.notify().

To me, spinLoopHint() fits in (as in "probably a method in the same class") 
with other performance-oriented hints. Like prefetch variants (which we don't 
have but also probably need. E.g. prefetchWithIntentToWrite()). And placing 
prefetch hints in j.u.c would not make much sense.

Re: Spin Loop Hint support: Draft JEP proposal

2015-10-05 Thread Gil Tene

> On Oct 5, 2015, at 1:56 AM, Aleksey Shipilev <aleksey.shipi...@oracle.com> 
> wrote:
> 
> Hi Gil,
> 
> Glad to see this being addressed!
> 
> On 10/04/2015 07:22 PM, Gil Tene wrote:
>> We propose to add a method to the JDK which would be a hint that a spin
>> loop is being performed. E.g.
>> jdk.util.PerformanceHints.spinLoopHint(), which will hopefully evolve
>> to a Java SE API, e.g. java.util.PerformanceHints.spinLoopHint(). The
>> specific name space location, class name, and method name will be
>> determined as part of development of this JEP.
> 
> Yes, that would be a tough part. JDK is usually oblivious of these
> low-level platform-specific hints, they go to sun.misc.* (e.g. Unsafe,
> @Contended and others...). I'll let Project Leads make that call :)

I don't think of this as platform specific. And it's not much lower level than 
e.g. some java.util.concurrent stuff (but probably doesn't belong in that 
package because it's not really about concurrency). I'm looking for a proper 
Java SE spec'ed way to do this. So sun.misc.* won't work. I'm sure we don't 
want another Unsafe for people to abuse...

But naming the class and method is the smaller, easier detail. Right? ;-)

> 
>> [4] HotSpot WebRevs for prototype implementation which intrinsifies
>> org.performancehintsSpinHint.spinLoopHint()
>> http://ivankrylov.github.io/spinloophint/webrev/
> 
> * product_pd (platform-dependent) flags can be used to conditionalize
> the support on a platform. This helps to avoid vm_version_* tricks.

Thx.

> * Does spinloophint imply membar as well? x86_64.ad suggests so.
> library_call.cpp suggests so. It seems weird to conflate the two, even
> though it's understandable you want to piggyback on existing machinery.

Semantically, spinLoopHint() has zero semantic requirements, and therefore no
implied behavior of any kind (no membar).

We implemented the pause intrinsic as a variant representation of a membar
because that was one relatively simple way of having it stay within the loop (or
with whatever control flow it is under). The "membar" variant implementation
doesn't prohibit reordering of loads or stores.

> 
> * I think spinLoopHint() misses a @HotSpotIntrinsicCandidate annotation.

Will add that in future prototypes.

> Thanks,
> -Aleksey
> 



Re: Spin Loop Hint support: Draft JEP proposal

2015-10-05 Thread Gil Tene

> On Oct 5, 2015, at 2:04 AM, Alan Bateman <alan.bate...@oracle.com> wrote:
> 
> On 04/10/2015 17:22, Gil Tene wrote:
>> I would like to circulate this draft JEP proposal for initial review and 
>> consensus building purposes.
>> 
>> I'm cross-posting to both core-libs-dev and hotspot-dev. From a spec 
>> perspective, the main change it suggests is the addition of a method (and 
>> probably a class to hold it) to the core libraries. And intrinsifying 
>> implementations would involve changes in HotSpot (see prototype WebRev links 
>> included below).
>> 
>> Draft JEP follows inline...
>> 
>> — Gil.
>> 
> I don't see any mention in the JEP about existing mechanisms to give hints to 
> the runtime, as in @Contended (JEP 142) and @HotSpotIntrinsicCandidate 
> (JDK-8076112). Just wondering if there is an alternative to explore where 
> something like @Spinloop int ignore; would emit the PAUSE. This would at 
> least give a starting point for usages in the JDK. The bigger question is 
> whether to expose such runtime hints in an API. It's very much for the 
> advanced developer and if an API is really needed then it might be something 
> for a JDK-specific API.

Alan,

The JEP is not about performance hints in general, it's about (and only about) 
spin loop hints. While there are many "spelling" options possible, I don't 
think annotations will fit well. The needs of a spinLoopHint seem to be very 
different from those of the @Contended and @HotSpotIntrinsicCandidate in 
several ways. Annotations have limited places in code where they can be applied. 
Specifically, they apply to declarations but not to execution. E.g. "@SpinLoop 
int ignore;" is a declaration: it does not emit any bytecodes. Without actually 
using the variable "ignore" later in the code, there would be no way to link it 
to code. The place where actual byte-codes are emitted would need to tie in to 
the declared variable, which would probably make the "spelling" more cumbersome 
than a trivial method call (with a hopefully self-explanatory name).

— Gil.



Re: Spin Loop Hint support: Draft JEP proposal

2015-10-05 Thread Gil Tene
I see SpinLoopHint as very separate from things like MONITOR/MWAIT (on x86) and 
WFE/SEV (on ARM), as well as any other "wait in a nice way until this state 
changes" instructions that other architectures may have or add.

Mechanisms like MONITOR/MWAIT and WFE/SEV provide a way to potentially wait for 
specific state changes to occur. As such, they can be used to implement a 
specific form of a spin loop (the most common one, probably). But they do not 
provide for generic spinning forms. E.g. loops that have multiple exit 
conditions in different memory locations, loops that wait on internal state 
changes that are not affected by other CPUs (like "spin only this many times 
before giving up" or "spin for this much time"), and loops that may use 
transactional state changes (e.g. LOCK XADD, or wider things with TSX) are 
probably "hard" to model with these instructions.

In contrast, spinLoopHint() is intended to hint that the loop it is in is 
spinning, regardless of the logic used for the spinning or the spin 
termination. It is useful, for example, in heuristic spinning-before-blocking 
situations, where WFE/SEV or MONITOR/MWAIT would not be appropriate.
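A minimal Java sketch of the kind of generic loop described above, assuming `Thread.onSpinWait()` (the form this hint eventually took in Java 9) as the spelling; `GenericSpin`, `spinForWork` and `Outcome` are hypothetical. The loop exits on either of two flags at different memory locations, on a purely local spin count, or on a deadline, so no single watched address could stand in for its exit logic, yet a hint placed in the loop body fits it unchanged.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public final class GenericSpin {
    enum Outcome { DATA_READY, SHUTDOWN, GAVE_UP }

    /**
     * Spins until one of two flags is set, or until either the spin count or the
     * time budget runs out. No single watched address (WFE) or monitored line
     * (MWAIT) expresses all of these exits.
     */
    static Outcome spinForWork(AtomicBoolean dataReady, AtomicBoolean shutdown,
                               long maxSpins, long timeoutNanos) {
        final long deadline = System.nanoTime() + timeoutNanos;
        for (long spins = 0; spins < maxSpins; spins++) {   // local state: no other CPU writes it
            if (dataReady.get()) return Outcome.DATA_READY; // exit condition at one memory location
            if (shutdown.get()) return Outcome.SHUTDOWN;    // exit condition at another location
            if (System.nanoTime() - deadline >= 0) break;   // time-based exit (overflow-safe compare)
            Thread.onSpinWait(); // "I am spinning", whatever the loop's exit logic is
        }
        return Outcome.GAVE_UP;
    }

    public static void main(String[] args) {
        AtomicBoolean data = new AtomicBoolean(false);
        AtomicBoolean stop = new AtomicBoolean(false);
        System.out.println(spinForWork(data, stop, 1_000, 1_000_000L)); // GAVE_UP
        stop.set(true);
        System.out.println(spinForWork(data, stop, 1_000, 1_000_000L)); // SHUTDOWN
    }
}
```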

MONITOR/MWAIT and WFE/SEV would be a good way to implement an actual spinning 
test or atomic operation (if it were available in user mode, but alas it 
isn't). And I could see some variant of AtomicX.compareAndSet being proposed to 
use them, but the semantics and context are different.

There are at least two architectures for which spinLoopHint() is both a natural 
fit as well as the way everything else (kernels, libraries) seem to be 
spinning: In x86, the PAUSE instruction is a classic example of the 
spinLoopHint() use case. On some PowerPC implementations with multiple hardware 
threads (HMT), a lowering of the hardware thread priority is probably another 
example of a good use for spinLoopHint() [I haven't tried or tested this for 
spinLoopHint(), but that's what the linux kernel spinloops do for example: 
http://lxr.free-electrons.com/source/arch/powerpc/include/asm/spinlock.h?v=2.6.35#L116].

On some CPUs there might not (or not yet) be an equivalent operation. A nop would 
be a valid way to implement it on current ARM.

— Gil.

> On Oct 5, 2015, at 2:41 AM, Andrew Haley <a...@redhat.com> wrote:
> 
> Hi Gil,
> 
> On 04/10/15 17:22, Gil Tene wrote:
> 
>> Summary
>> 
>> Add an API that would allow Java code to hint that a spin loop is
>> being executed.
> 
> 
> I don't think this will work for ARM, which has a rather different
> spinlock mechanism.
> 
> Instead of PAUSE, we wait on a lock word with WFE.  WFE puts a core
> into a lightweight sleep state waiting on a particular address (the
> lock word) and a write to the lock word wakes it up.  This is very
> useful and somewhat analogous to x86's MONITOR/MWAIT.
> 
> I can't immediately see how to generalize your proposal to ARM, which
> is a shame.
> 
> Andrew.
> 



Spin Loop Hint support: Draft JEP proposal

2015-10-04 Thread Gil Tene
I would like to circulate this draft JEP proposal for initial review and 
consensus building purposes.

I'm cross-posting to both core-libs-dev and hotspot-dev. From a spec 
perspective, the main change it suggests is the addition of a method (and 
probably a class to hold it) to the core libraries. And intrinsifying 
implementations would involve changes in HotSpot (see prototype WebRev links 
included below).

Draft JEP follows inline...

— Gil.

JEP XYZ: Spin Loop Hint

(suggested content for some JEP fields):
Authors     Gil Tene
Owner       Gil Tene
Type        Feature
Status      Draft
Component   core-libs
Scope       JDK
Discussion  core dash libs dash dev at openjdk dot java dot net
Effort      S
Duration    S

Summary

Add an API that would allow Java code to hint that a spin loop is being 
executed.

Goals

Provide an API that would allow Java code to hint to the runtime that it is in 
a spin loop. The API would be a pure hint, and will carry no semantic behavior 
requirements (i.e. a no-op is a valid implementation). Allow the JVM to benefit 
from spin loop specific behaviors that may be useful on certain hardware 
platforms. Provide both a no-op implementation and an intrinsic implementation 
in the JDK, and demonstrate an execution benefit on at least one major hardware 
platform.

Non-Goals

It is NOT a goal to look at performance hints beyond spin loops. Other 
performance hints, such as prefetch hints, are outside the scope of this JEP.

Motivation

Some hardware platforms benefit from software indication that a spin loop is in 
progress. Some common execution benefits may be observed:

A) The reaction time of a spin loop may be improved when a spin hint is used 
due to various factors, reducing thread-to-thread latencies in spinning wait 
situations.

and

B) The power consumed by the core or hardware thread involved in the spin loop 
may be reduced, benefitting overall power consumption of a program, and 
possibly allowing other cores or hardware threads to execute at faster speeds 
within the same power consumption envelope.

While long term spinning is often discouraged as a general user-mode 
programming practice, short term spinning prior to blocking is a common 
practice (both inside and outside of the JDK). Furthermore, as core-rich 
computing platforms are commonly available, many performance and/or latency 
sensitive applications use a pattern that dedicates a spinning thread to a 
latency critical function [1], and may involve long term spinning as well.

As a practical example and use case, current x86 processors support a PAUSE 
instruction that can be used to indicate spinning behavior. Using a PAUSE 
instruction demonstrably reduces thread-to-thread round trips. Due to its 
benefits and commonly recommended use, the x86 PAUSE instruction is commonly 
used in kernel spinlocks, in POSIX libraries that perform heuristic spins prior 
to blocking, and even by the JVM itself. However, due to the inability to hint 
that a Java loop is spinning, its benefits are not available to regular Java 
code.

We include specific supporting evidence: In simple tests [2] performed on an 
E5-2697 v2, measuring the round trip latency behavior between two threads that 
communicate by spinning on a volatile field, round-trip latencies were 
demonstrably reduced by 18-20nsec across a wide percentile spectrum (from the 
10%'ile to the 99.9%'ile). This reduction can represent an improvement as high 
as 35%-50% in best-case thread-to-thread communication latency, e.g. when two 
spinning threads execute on two hardware threads that share a physical CPU core 
and an L1 data cache. See example latency measurement results comparing the 
reaction latency of a spin loop that includes an intrinsified spinLoopHint() 
call [intrinsified as a PAUSE instruction] to the same loop executed without 
using a PAUSE instruction [3], along with measurements of the time it takes to 
perform an actual System.nanoTime() call to measure time.
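The shape of the test in [2] (two threads bouncing a value through volatile fields, each spinning on the other's write) can be sketched in Java roughly as follows. This is not the benchmark from [2]: `PingPong`, `measure` and `percentile` are hypothetical, `Thread.onSpinWait()` (the Java 9 spelling of this hint) stands in for the proposed intrinsified `spinLoopHint()`, and any numbers it prints depend entirely on the machine, thread placement and JIT. Each sample also includes the cost of two `System.nanoTime()` calls, which is one reason to report that cost alongside the results.

```java
import java.util.Arrays;

public final class PingPong {
    // Each side writes one field and spins reading the other's.
    private volatile long ping;
    private volatile long pong;

    /** Runs `rounds` round trips between two spinning threads; returns per-trip latencies in ns. */
    static long[] measure(int rounds, boolean useHint) {
        PingPong p = new PingPong();
        Thread echo = new Thread(() -> {
            for (long seq = 1; seq <= rounds; seq++) {
                while (p.ping != seq) {                // spin until the next ping arrives
                    if (useHint) { Thread.onSpinWait(); }
                }
                p.pong = seq;                          // reply
            }
        });
        echo.setDaemon(true);
        echo.start();
        long[] latencies = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long seq = i + 1;
            long start = System.nanoTime();
            p.ping = seq;
            while (p.pong != seq) {                    // spin until the reply arrives
                if (useHint) { Thread.onSpinWait(); }
            }
            latencies[i] = System.nanoTime() - start;  // includes the nanoTime() call overhead
        }
        return latencies;
    }

    /** Nearest-rank percentile of an already sorted array. */
    static long percentile(long[] sorted, double pct) {
        int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }

    public static void main(String[] args) {
        int rounds = 100_000;
        for (boolean hint : new boolean[] {false, true}) {
            measure(rounds, hint);                     // crude warm-up pass for the JIT
            long[] l = measure(rounds, hint);
            Arrays.sort(l);
            System.out.printf("hint=%-5b p50=%dns p99=%dns p99.9=%dns%n", hint,
                    percentile(l, 50), percentile(l, 99), percentile(l, 99.9));
        }
    }
}
```

For a meaningful comparison the two threads would need to be pinned (e.g. to two hardware threads of one core, as in the text), which plain Java cannot express.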



Description

We propose to add a method to the JDK which would be a hint that a spin loop is 
being performed. E.g. jdk.util.PerformanceHints.spinLoopHint(), which will 
hopefully evolve to a Java SE API, e.g. 
java.util.PerformanceHints.spinLoopHint(). The specific name space location, 
class name, and method name will be determined as part of development of this 
JEP.

An empty method would be a valid implementation of the spinLoopHint() method, 
but intrinsic implementation is the obvious goal for hardware platforms that can 
benefit from it. We intend to produce an intrinsic x86 implementation for 
OpenJDK as part of developing this JEP. A prototype implementation already 
exists [4] [5] [6] [7] and results from initial testing show promise.
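To illustrate why an empty method is a valid implementation, here is a minimal sketch of what such a class could look like. The class name follows the draft's own example (`jdk.util.PerformanceHints`), but the package is dropped so the sketch compiles standalone, and the Javadoc wording and demo `main` are illustrative rather than proposed spec text. A real JDK version would presumably also carry the `@HotSpotIntrinsicCandidate` marker discussed in the thread, so the JIT could substitute e.g. PAUSE on x86.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical stand-in for the proposed jdk.util.PerformanceHints class
 * (package omitted so this sketch compiles on its own).
 */
public final class PerformanceHints {
    private PerformanceHints() {} // static holder; never instantiated

    /**
     * Hints that the caller is executing a spin loop. Carries no semantics:
     * no ordering, no memory barrier, no guaranteed delay. A JIT may
     * intrinsify calls; otherwise this empty body is a complete implementation.
     */
    public static void spinLoopHint() {
        // Intentionally empty.
    }

    // Demo only: the caller's correctness rests on the atomic read, not on the hint.
    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread worker = new Thread(() -> done.set(true));
        worker.start();
        while (!done.get()) {
            spinLoopHint(); // removing this call would not change the loop's correctness
        }
        worker.join();
        System.out.println("observed done = " + done.get()); // observed done = true
    }
}
```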

Alternatives

JNI can be used to spin loop with a spin-loop-hinting CPU instruction, but the 
JNI-boundary crossing overhead tends to be larger than the benefit provided by 
the instruction, at least where latency is concerned.

We could attempt to have