Hi Kim,

I like all the removal of the pending-list-locker related code, but can't really comment on the details of how it was replaced and interacts with all the GC code. The interface to the Reference class is fine, it is the GC code that pushes into the reference queue that I can't really comment on.

I have a couple of queries following up from Coleen's and Dan's reviews.

First, the synchronization of the pending-reference-list leaves me somewhat perplexed. As Coleen noted you require the heap_lock to be held but also use an Atomic::xchg. You have a comment:

+   // owns the lock.  Swap is used by parallel non-concurrent reference
+   // processing threads, where some higher level controller owns
+   // Heap_lock, so requires the lock is locked, but not necessarily by
+   // the current thread.

Can you clarify the higher-level protocol that is at play there. Asserting that "someone" owns a lock seems only marginally useful if you can't be sure who that owner is, or when they might release it. Some more direct detecting of being within this higher-level protocol would be better, if it is possible to detect.

---

As previously mentioned the change to the initialization order is a concern to me - there are always unexpected consequences of making such changes, no matter how straight-forward they seem at the time. Pre-initialization of exception classes is not really essential as the attempt to throw any of them from Java code will force initialization anyway - it's only for exceptions created and thrown from the VM code that direct initialization is needed. The problematic case is OOME because throwing the OOME may trigger a secondary OOME during that class initialization etc. Hence we try to deal with that by pre-allocating instances that do not have stacktraces etc, so they can be thrown safely. Hence my earlier comment that I think the real bug in your situation is that gen_out_of_memory_error() is not considering how far into the VM init sequence we are, and that it should be returning the pre-allocated instance with no backtrace.

---

A query on the JDK side:

src/java.base/share/classes/java/lang/ref/Reference.java

private static native Reference<? super Object> popReferencePendingList(boolean wait);

I'm intrigued by the use of the lower-bounded wildcard using Object. As Object has no super-types this looks very strange and should be equivalent to the more common upper-bounded wildcard using Object ie Reference<? extends Object> or more concisely Reference<?>. I see this construct exists in the current code for the ReferenceQueue, but I am quite intrigued as to why? (Someone getting ready for Any-types? :) )

Thanks,
David
-----

On 25/06/2016 6:09 AM, Kim Barrett wrote:
Please review this change which moves the Reference pending list and
locking from the java.lang.ref.Reference class into the VM.  This
includes elimination of the ReferencePendingListLocker mechanism in
the VM. This fixes various fragility issues arising from having the
management of this list split between Java and the VM (GC).

This change addresses JDK-8156500 by eliminating the possibility of
suspension while holding the lock. Because the locking is now done in
the VM, we have full control over where suspension may occur.

This change retains the existing interface between the reference
processor and the nio.Bits package, e.g. Reference.tryHandlePending
has the same signature and behavior; this change just pushes the
pending list manipulation down into the VM.  There are open bugs
reports, enhancement requests, and discussions related to that
interface; see below. This change does not attempt to address them.

This change additionally addresses or renders moot
https://bugs.openjdk.java.net/browse/JDK-8055232
(ref) Exceptions while processing Reference pending list

and
https://bugs.openjdk.java.net/browse/JDK-7103238
Ensure pending list lock is held on behalf of GC before enqueuing references on 
to the pending list

It is also relevant for followup discussion of
https://bugs.openjdk.java.net/browse/JDK-8022321
java/lang/ref/OOMEInReferenceHandler.java fails immediately

and
https://bugs.openjdk.java.net/browse/JDK-8149925
We don't need jdk.internal.ref.Cleaner any more - part 1

and has implications for:
https://bugs.openjdk.java.net/browse/JDK-8066859
java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: 
Reference Handler thread died

In addition to the obviously relevant changes, there are a couple of
changes whose presence here might seem surprising and require further
explanation.

- Removal of a stale comment in create_vm, noticed because it was near
some code being removed as part of this change.  The comment was
rendered obsolete by JDK-8028275.

- Moved initialization of exception classes earlier in
initialize_java_lang_classes. The previous order only worked by
accident, at least for OOME.  During the bulk of the library
initialization, OOME may be thrown, perhaps due to poorly chosen
command line options.  That attempts to use one of the pre-allocated
OOME objects, and tries to fill in its stack trace.  If the Throwable
class is not yet initialized, this leads to a failed assert in the VM
because Throwable.UNASSIGNED_STACK still has a null value.  This
initialization order issue was being masked by the old pending list
implementation; the initialization of Reference ensured
InterruptedException was initialized (thereby initializing its
Throwable base class), and the initialization of Reference just
happened to occur early enough that Throwable was initialized by the
time it was needed when running certain tests.  The forced
initialization of InterruptedException is no longer needed by
Reference, but removal of it triggered test failures (and could
trigger corresponding product failures) because of this ordering
issue.

CR:
https://bugs.openjdk.java.net/browse/JDK-8156500

Webrev:
http://cr.openjdk.java.net/~kbarrett/8156500/jdk.00/
http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.00/

Testing:
RBT ad hoc nightly runtime & gc tests.

With the proposed patch for JDK-8153711 applied, locally ran nearly
35K iterations of the OomDebugTest that is part of that patch, with no
failures / deadlocks.  Without this change it would typically only take
on the order of 100 iterations to hit a deadlock failure.

Locally ran direct byte buffer allocation test:
jdk/test/java/nio/Buffer/DirectBufferAllocTest.java
This change reported 3% faster allocation times, and 1/2 the standard
deviation for allocation times.

Locally ran failing test from JDK-8022321 and JDK-8066859 a few
thousand times.
jdk/test/java/lang/ref/OOMEInReferenceHandler.java
No errors, but failures were noted as hard to reproduce.

Reply via email to