Hi Roger,

On 03/31/2016 09:12 PM, Roger Riggs wrote:
Hi Peter,

It would be simpler to understand the changes if we solve the problems one at a time,
at least for review purposes.

Right. We can focus on one aspect at a time, but I'm still trying to keep the whole thing in a working condition at all times...


To your question in the 2nd part about the Cleaner. (webrev.11.part2)

I don't think the communication between the memory reserving thread and the unreserving thread should be mixed into the Cleaner design or implementation. The logic for the communication between reserveMemory and unreserveMemory methods should be in those two methods and isolated to Bits.java. I understand the intent for the reserving thread to poll for available memory and it might as well do something useful while it is waiting and get a hint about unreserved memory.
But it mixes together the implementations. (too much)

The problem with reserving thread getting information from unreserving thread solely by communication implemented in methods reserveMemory and unreserveMemory is that this information is not enough. Reserving thread must also get information about when all the pending unreservations have been performed so that it can either:
- trigger System.gc() to discover fresh pending unreservations, or
- finally give up with OOME

In the absence of this information, all reserving thread can do is speculate about this information by observing the timings of unreservations happening or not happening - to time out on waiting for unreservations to happen. This works (as shown in webrev.10.part2), but is not very robust or agile - it introduces unnecessary delays. This information is hidden in ReferenceHandler thread (have all pending References been enqueued?) and in the Cleaner (has the queue drained out?).

I moved the retrial and helping logic to ExtendedCleaner because I think it is reusable for other situations. But if you think it doesn't belong to ExtendedCleaner, I can move it back to Bits. We don't strictly need to help the Cleaner thread with cleanups. I did it that way because this seemed an easy way to communicate the information about the drained queue back to allocator thread and to retry reservations at appropriate intervals. But let me think about a way to just get this information without helping - similarly to what I've done it in ReferenceHandler...


Having an arbitrary thread (the one trying to allocate a DirectBuffer) help with the Cleaning puts an unknown thread perhaps with limited stack or AccessControlContext in place to call the cleaning functions is unappealing at best. The cleaning functions are less predictable than the Reference enqueuing functions already discussed but are not much more complex. In most cases they are about the complexity of the Deallocator in Direct-X-Buffer, etc.

Allocating thread could be "conditioned" before calling the cleanup action by:
- saving and clearing thread-locals
- saving and setting AccessControlContext to unprivileged.
...and restoring these back after the action

The problem with unsufficient stack is more difficult solve though. Isn't there a new annotation designed to help with that (mainly intended for critical sections of java.util.concurrent classes). The problem with using it here would be in that we don't know what cleanup action(s) might be executed since Cleaner is a general-purpose API and this annotation is only designed for parts of code that are known in advance...


Can the pieces be disentangled and still pass the DirectBufferAllocTest?

If we want to get that additional piece of information from ReferenceHandler and Cleaner, they must be entangled with Bits, but I might be able to loosen this entanglement a bit. Will try these ideas tomorrow. Stay tuned.

Regards, Peter


Roger




On 3/28/2016 1:18 PM, Peter Levart wrote:
Hi Mandy, Kim, Per and Roger

I'd like to continue the discussion about the 2nd part of removing jdk.internal.ref.Cleaner in this discussion thread.

There was some discussion about whether to synchronize with ReferenceHandler thread and wait for it to enqueue the Reference(s) or simply detect that there are no more pending Reference(s) by timing out on waiting for cleanup actions in discussion thread: "Re: Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently". Based on that discussion, I have prepared a webrev that uses an approach where the detection is performed using timeout:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.10.part2/

While this webrev passes the DirectBufferAllocTest, I don't have a good feeling about this approach since it is not very robust. I can imagine situations where it would not behave optimally - it would either trigger reference discovery (System.gc()) more frequently that necessary or it would cause delays in execution. So I still prefer the approach where allocating thread(s) explicitly synchronize with ReferenceHandler thread and wait for it to enqueue pending Reference(s). Luckily this can be performed in an easy way (as I will show you shortly). Waiting on discovery of pending references by ReferenceHandler thread and handing them to it could be moved to native code so that no notification would have to be performed in native code from the ReferenceHandler thread to the allocating thread(s).

But first, let me reply to Mandy's comments...


On 03/25/2016 11:20 PM, Mandy Chung wrote:
On Mar 19, 2016, at 7:00 AM, Peter Levart<peter.lev...@gmail.com>  wrote:

Here's the webrev:

     
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.08.part2/

On 03/07/2016 07:35 PM, Mandy Chung wrote:
I studied webrev.06priv and the history of JDK-6857566.

I’m not comfortable for any arbitrary thread to handle the enqueuing of the 
pending references (this change is more about the fix for JDK-6857566).

Why? A Thread is a Thread is a Thread... When legacy Cleaner is removed, 
ReferenceHandler thread will be left with swapping pointers only - no custom 
code will be involved. The only things I can think of against using arbitrary 
thread are:
:
My uncomfort was the fix for JDK-6857566 - both enqueuing pending ref and 
invoking the cleaning code in an arbitrary thread.

Looking at it again - enqueuing the pending reference is not so much of a 
concern (simply updating the link) but the common cleaner could be used by 
other code that may only expect to be invoked in system thread that’s still my 
concern (thinking of thread locals).

As you'll see in the webrev below, enqueueing is performed solely be ReferenceHandler thread. Allocating thread(s) just wait for it to do its job. There's a little synchronization action performed at the end of enqueueing a chunk of pending references that notifies waiters (allocating threads) so that they can continue. This actually improves throughput (compared to helping enqueue Reference(s) one by one) because there's not much actual work to be done (just swapping pointers) so synchronization dominates. The goal here is to minimize synchronization among threads and by executing enqueuing of the whole bunch of pending references in private by a single thread achieves a reduction in synchronization when lots of Reference(s) are discovered at once - precisely the situation when it matters.

OTOH helping the Cleaner thread is beneficial as cleanup actions take time to execute and this is the easiest way to retry allocation while there's still chance it will succeed. As the common Cleaner is using InnocuousThread, cleanup actions can't rely on any thread locals to be preserved from invocation to invocation anyway - they are cleared after each cleanup action so each action gets empty thread locals. We could simulate this in threads that help execute cleanup actions by saving thread-locals to local variables, clearing thread-locals, executing cleanup action and then restoring thread-locals from local variables. Mandy, if you think this is important I'll add such save/clear/restore code to appropriate place.

   On the other hand, invoking Deallocator::run (deallocating the native 
memory) in arbitrary threads has no problem.  Consider me being paranoid of the 
fix for JDK-6857566.  The current list of clients using CleanerFactory::cleaner 
may be safe being called from arbitrary threads but I can’t say what will be 
added in the future.

Right, save/clear/restore thread locals then (left for next webrev)...

The allocating thread may do a System.gc() that may discover phantom reachable 
references.  All it’s interested is only the direct byte buffer ones so that it 
can deallocate the native memory.  What is the downside of having a dedicated 
Cleaner for direct byte buffer that could special case for it?
A dedicated Cleaner for direct buffers might be a good idea if other uses of 
shared Cleaner in JDK become heavy. So that helping process Cleanable(s) does 
not involve other unrelated Cleanable(s). But it comes with a price of another 
dedicated background thread.

Perhaps provide one Cleaner specific for native memory deallocation or anything 
safe to be called in arbitrary thread.  It could provide the entry point for 
the allocating thread to assist the cleaning (i.e. Bits::reserveMemory could 
call it).  That will make it explicit that this cleaner provides explicit 
control for other threads to assist the cleaning action (and JavaLangRefAccess 
would only be used by this special cleaner and not in NIO).

All clients of Unsafe.freeMemory could use that special cleaner for native 
memory deallocation use such as IOVecWrapper, DirectByteBuffer, Marlin’s 
OffHeapArray.

The common cleaner would be kept for other things to use and it should be 
lazily created to avoid another thread.

Does this sound reasonable?

Mandy


Of course. Having specialized Cleaner(s) with additional capability requires extension to the Cleaner API for some cleaners. Unfortunately java.lang.ref.Cleaner is a final class.

Here's what I propose: by transforming java.lang.ref.Cleaner into an interface implemented by a class in a concealed package (jdk.internal.ref.CleanerImpl) the public API can be left unchanged while the implementation is actually simplified (there's no injection of Cleaner.impl access function into CleanerImpl class needed any more). The result of that transformation is also the ability to specify an extension interface (ExtendedCleaner) located in a concealed package so it can only be used by system code (java.base and modules to which jdk.internal.ref is explicitly exported) and the ability to extend the functionality of implementation by subclassing it (CleanerImpl.ExtendedImpl). The guts of previous CleanerImpl are simply moved into a private nested class CleanerImpl.Task:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.11.part2/

I'm interested in what Roger has to say about this transformation. It is source compatible, but not binary compatible (invokevirtual vs. invokeinterface). So it can be safely performed only before JDK 9 ships.

I packed the entire retry-while-helping mechanics into the implementation of this ExtendedCleaner interface. java.nio.Bits is consequently much simplified. The common cleaner is now ExtendedCleaner as other usages besides handling deallocation of native memory are minor and are not problematic from the standpoint of arbitrary threads helping with cleanup, especially when saving/clearing/restoring of thread-locals is implemented. It would not be a problem to provide another instance, simple java.lang.ref.Cleaner this time, for other usages if needed.

And now a few words about ReferenceHandler thread and synchronization with it (for Kim and Per mostly). I think it should not be a problem to move the following two java.lang.ref.Reference methods to native code if desired:

    static Reference<?> getPendingReferences(int[] discoveryPhaseHolder)
    static int getDiscoveryPhase()

The 1st one is only invoked by a ReferenceHandler thread while the 2nd is invoked by arbitrary thread. The difference between this and webrev.09.part2 is that there's no need any more for ReferenceHandler thread to notify the thread executing the 2nd method and that there's no need for the 2nd method to perform any waiting. It just needs to obtain the lock briefly so that it can read the consistent state of two fields. Those two fields are Java static fields currently: Reference.pending & Reference.discoveryPhase and those two methods are Java methods, but they could be moved to native code if desired to make the protocol between VM and Java code more robust.

So Kim, Per, what do you think of supporting those 2 methods in native code? Would that present any problem?

With webrev.11.part2 I get a 40% improvement in throughput vs. webrev.10.part2 executing DirectBufferAllocTest in 16 allocating threads on a 4-core i7 CPU.

Regards, Peter



Reply via email to