Hi Kim,

Let me chime-in although it's a bit late...

I think this is a good change to finally get rid of OOME problems in this area.

On 06/28/2016 07:45 PM, Kim Barrett wrote:
On Jun 28, 2016, at 5:33 AM, Per Liden <per.li...@oracle.com> wrote:
Patch looks good. The only thing I don't feel qualified to review is the 
initialization order change in thread.cpp, so I'll let others comments on that.
Thanks.  I’ll be following up on that area.

I like the pop-one-reference-at-a-time semantics, which simplifies things a lot 
and keeps the interface nice and clean. I was previously afraid that it might 
cause a noticeable performance degradation compared to lifting the whole list 
into Java in one go, but your testing seem to prove that's not the case.
I was concerned about that too, and had tried a different approach that also still 
supported the existing "some callers wait and others don’t" API, but it was a 
bit messy.  Coleen convinced me to try this (since it was easy) and do the measurement, 
and it worked out well.


The repeated JNI invocations do not present a big overhead, but it is measurable. For example, the following benchmark measures the time it takes after a GC cycle for a bunch of references to be transfered to Java side, enqueued into ReferenceQueue and dequeued from it:

http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/ReferenceEnqueueBench.java

The results on my i7-4771 PC:

Original JDK 9:

Benchmark (refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 100000 ss 100 38410.515 ± 1011.769 us/op

Patched (by Kim):

Benchmark (refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 100000 ss 100 42197.522 ± 1161.451 us/op


So, about 10% worse for this benchmark.

Transfering the whole list in one JNI invocation has the potential for further optimizations on the Java side (like handling the whole popped list privately without additional synchronization - if we ever find a way for java.nio.Bits to wait for it reliably - or even enqueue-ing a chunk of consecutive references destined for the same queue using a single synchronized action on the queue, etc...)

If the JNI API was something like the following:

    /* Atomically pop the pending reference list if wholeList is true,
     * or just next pending reference if wholeList is false.
* If wait is true and the pending reference list is empty, blocks until
     * it becomes non-empty, or returns null if wait is false.
     */
private static native Reference<?> popReferencePendingList(boolean wholeList, boolean wait);


Then not too much complication is needed in Reference.java (diff to current JDK 9 sources) to consume this API and already have some benefit from it:

http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/webrev/

There is a possible race here between ReferenceHandler calling tryHandlePending(true) and java.nio.Bits calling tryHandlePending(false) that can make the method return false to java.nio.Bits when ReferenceHandler has just popped the whole list and not yet installed it in the private pendingList field, but java.nio.Bits makes several retries in this case with exponentially increasing delays, so that does not currently present a problem. java/nio/Buffer/DirectBufferAllocTest.java test passes with this change.

And the above benchmark shows improvement instead of regression:

Proposed (my Peter):

Benchmark (refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 100000 ss 100 34134.977 ± 1274.753 us/op


So what do you think? Is it worth the little additional logic on the Java side?


Regards, Peter

Reply via email to