Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 24, 2016, at 9:26 PM, David Holmes wrote: > > Hi Kim, > > Sorry this fell off the radar … That’s okay. There was a long gap while I was working on other things. > On 24/08/2016 7:09 AM, Kim Barrett wrote: >> Mandy asked for some additional comments to make life easier for >> future readers. I've also stopped pretending the description for >> waitForReferenceProcessing is javadoc, since this function is private >> and only accessible via SharedSecrets or reflection that avoids access >> control. >> >> Changes only in the jdk tree: >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05.inc/ > > Just to be clear, in waitForReferenceProcessing() you are okay with returning > true on spurious wakeup? Yes. >> Unchanged hotspot webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/ > > No further comments from me. > > Thanks, > David
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, Sorry this fell off the radar ... On 24/08/2016 7:09 AM, Kim Barrett wrote: Mandy asked for some additional comments to make life easier for future readers. I've also stopped pretending the description for waitForReferenceProcessing is javadoc, since this function is private and only accessible via SharedSecrets or reflection that avoids access control. Changes only in the jdk tree: New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05.inc/ Just to be clear, in waitForReferenceProcessing() you are okay with returning true on spurious wakeup? Unchanged hotspot webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/ No further comments from me. Thanks, David
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 24, 2016, at 3:50 AM, Per Liden wrote: > > Still looks good. > > cheers, > Per Thanks. > > On 2016-08-23 23:09, Kim Barrett wrote: >> Mandy asked for some additional comments to make life easier for >> future readers. I've also stopped pretending the description for >> waitForReferenceProcessing is javadoc, since this function is private >> and only accessible via SharedSecrets or reflection that avoids access >> control. >> >> Changes only in the jdk tree: >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05.inc/ >> >> Unchanged hotspot webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Still looks good. cheers, Per On 2016-08-23 23:09, Kim Barrett wrote: Mandy asked for some additional comments to make life easier for future readers. I've also stopped pretending the description for waitForReferenceProcessing is javadoc, since this function is private and only accessible via SharedSecrets or reflection that avoids access control. Changes only in the jdk tree: New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05.inc/ Unchanged hotspot webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 23, 2016, at 5:31 PM, Mandy Chung wrote: > > >> On Aug 23, 2016, at 2:09 PM, Kim Barrett wrote: >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05.inc/ >> >> Unchanged hotspot webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/ > > +1 > > I agree with Kim offline to keep the current proposed methods as the > interface can be revised in the future when needed. > > FYI. I updated the description of JDK-8156500 to: > Move Reference pending list into VM to prevent deadlocks > > This will make it clear what this changeset is about. > > Mandy Thanks.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 23, 2016, at 2:09 PM, Kim Barrett wrote: > > New webrevs: > full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05/ > incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05.inc/ > > Unchanged hotspot webrevs: > full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ > incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/ +1 I agree with Kim offline to keep the current proposed methods as the interface can be revised in the future when needed. FYI. I updated the description of JDK-8156500 to: Move Reference pending list into VM to prevent deadlocks This will make it clear what this changeset is about. Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Mandy asked for some additional comments to make life easier for future readers. I've also stopped pretending the description for waitForReferenceProcessing is javadoc, since this function is private and only accessible via SharedSecrets or reflection that avoids access control. Changes only in the jdk tree: New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.05.inc/ Unchanged hotspot webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 19, 2016, at 7:54 PM, Mandy Chung wrote: > > >> On Aug 19, 2016, at 4:09 PM, Kim Barrett wrote: >> >>> On Aug 13, 2016, at 4:07 PM, Mandy Chung wrote: >>> >>> On Aug 13, 2016, at 1:20 AM, Peter Levart wrote: Hi Mandy, On 08/13/2016 01:55 AM, Mandy Chung wrote: >> On Aug 8, 2016, at 6:25 PM, Kim Barrett >> wrote: >> >> full: >> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ >> >> >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ > This looks very good. > > Have you considered having JVM_WaitForReferencePendingList method to > return the pending list? i.e. make it a blocking version of > getAndClearReferencePendingList rather than two operations. > > waitForReferenceProcessing is really wait for pending reference(s) > enqueued (additionally cleaner gets invoked). What do you think to > rename this method to “waitForPendingReferencesEnqueued”? > I think the split is intentional. It's a clever way to avoid an otherwise inevitable race that could otherwise cause DBB allocating thread to fail with OOME prematurely. >>> >>> Ah I see. I initially read that pendingList doesn’t need to be set within >>> the synchronized block with processPendingActive flag but such race could >>> cause DBB to fail with OOME, as you clarified. >>> >>> Then I suggest waitForReferencePendingList (and its JVM function) be >>> renamed to: >>> boolean awaitReferencePendingList(); >>> >>> a blocking version of "boolean hasReferencePendingList()”. Minor >>> difference but I think it’s little clearer. >> >> Regarding the name, e.g. "waitFor" vs "await", the meaning is the >> same. If there were a documented or defacto preference for "await" I >> would feel obliged to rename, but the JDK seems to use both forms in >> roughly comparable numbers. >> >> Regarding the signature and description, recall that >> waitForReferencePendingList() does not return a value, nor does it >> need to. So documenting it as a blocking version of >> hasReferencePendingList() isn't correct. > > It’s not a suggestion to write “a blocking version of …” in the javadoc (just > my brief way to make my comment that clears my reading of the interfaces) > > If there were multiple reference handling threads, each would want to wait > until there are pending references but only one thread could grab the pending > list. Having it to return a boolean would be useful to me and up to the > callers to decide whether it wants to check the return value (it doesn’t need > to get the pending list if it returns false) But we don't have multiple reference processing [handling] threads, nor does the proposed design want such. waitForReferencePendingList() is an implementation detail that exists for the exclusive use of the single reference processing thread, to allow it to wait outside the synchronization block where it gets the pending list. The notion of a plurality of callers makes no sense for this function. Maybe I'm not understanding what you are suggesting? It seems like you are talking about some unknown to me design that is different from what I've proposed, and suggesting names and signatures based on that alternative design. > If waitForReferencePendingList returns a boolean, awaitReferencePendingList > reads consistent with “hasReferencePendingList” (one word verb). > >> >>> waitForReferenceProcessing - I’d like the javadoc to include more details >>> or @implNote for example: >>> This method returns true to indicate that some pending references have been >>> enqueued. If there are cleaners, cleaning actions are queued to be >>> invoked e.g. DirectByteBuffer may be deallocated. >>> >>> This would help the readers to understand what this method is intended to >>> do. >> >> I was intentionally a little vague in the description; I don't want to >> be promising too much, and having callers depend on behavior that we >> might want to change. In particular, it was intentional that there's >> no mention of Cleaner or any special behavior for them. >> >> For example, we might later decide we don't want the reference >> processing thread to incur the cost of notifying possible waiters and >> the associated synchronization, and instead let "waiters" return >> immediately when there is reference processing in progress, on the >> assumption that the reference processing thread makes rapid progress. >> In some respects that's probably more similar in effect to what the >> old code did. >> >> Also recall that this function is presently private, accessed by >> nio.Bits code via SharedSecrets (and via reflection by one test). It >> might be appropriate to change the access to this function or provide >> some similar non-private function in the future, at which point the >> javadoc and possibly the semantics and implementation will need to be >> carefully examined. > > This
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 19, 2016, at 4:09 PM, Kim Barrett wrote: > >> On Aug 13, 2016, at 4:07 PM, Mandy Chung wrote: >> >> >>> On Aug 13, 2016, at 1:20 AM, Peter Levart wrote: >>> >>> Hi Mandy, >>> >>> On 08/13/2016 01:55 AM, Mandy Chung wrote: > On Aug 8, 2016, at 6:25 PM, Kim Barrett > wrote: > > full: > http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ > > > http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ This looks very good. Have you considered having JVM_WaitForReferencePendingList method to return the pending list? i.e. make it a blocking version of getAndClearReferencePendingList rather than two operations. waitForReferenceProcessing is really wait for pending reference(s) enqueued (additionally cleaner gets invoked). What do you think to rename this method to “waitForPendingReferencesEnqueued”? >>> >>> I think the split is intentional. It's a clever way to avoid an otherwise >>> inevitable race that could otherwise cause DBB allocating thread to fail >>> with OOME prematurely. >>> >> >> Ah I see. I initially read that pendingList doesn’t need to be set within >> the synchronized block with processPendingActive flag but such race could >> cause DBB to fail with OOME, as you clarified. >> >> Then I suggest waitForReferencePendingList (and its JVM function) be renamed >> to: >> boolean awaitReferencePendingList(); >> >> a blocking version of "boolean hasReferencePendingList()”. Minor difference >> but I think it’s little clearer. > > Regarding the name, e.g. "waitFor" vs "await", the meaning is the > same. If there were a documented or defacto preference for "await" I > would feel obliged to rename, but the JDK seems to use both forms in > roughly comparable numbers. > > Regarding the signature and description, recall that > waitForReferencePendingList() does not return a value, nor does it > need to. So documenting it as a blocking version of > hasReferencePendingList() isn't correct. It’s not a suggestion to write “a blocking version of …” in the javadoc (just my brief way to make my comment that clears my reading of the interfaces) If there were multiple reference handling threads, each would want to wait until there are pending references but only one thread could grab the pending list. Having it to return a boolean would be useful to me and up to the callers to decide whether it wants to check the return value (it doesn’t need to get the pending list if it returns false) If waitForReferencePendingList returns a boolean, awaitReferencePendingList reads consistent with “hasReferencePendingList” (one word verb). > >> waitForReferenceProcessing - I’d like the javadoc to include more details or >> @implNote for example: >> This method returns true to indicate that some pending references have been >> enqueued. If there are cleaners, cleaning actions are queued to be invoked >> e.g. DirectByteBuffer may be deallocated. >> >> This would help the readers to understand what this method is intended to do. > > I was intentionally a little vague in the description; I don't want to > be promising too much, and having callers depend on behavior that we > might want to change. In particular, it was intentional that there's > no mention of Cleaner or any special behavior for them. > > For example, we might later decide we don't want the reference > processing thread to incur the cost of notifying possible waiters and > the associated synchronization, and instead let "waiters" return > immediately when there is reference processing in progress, on the > assumption that the reference processing thread makes rapid progress. > In some respects that's probably more similar in effect to what the > old code did. > > Also recall that this function is presently private, accessed by > nio.Bits code via SharedSecrets (and via reflection by one test). It > might be appropriate to change the access to this function or provide > some similar non-private function in the future, at which point the > javadoc and possibly the semantics and implementation will need to be > carefully examined. This is private API that is subject to change or remove any time. I won’t worry the javadoc here will give a false impression of its supportedness. I think this deserves reasonable level of detailed implementation notes. It will help the readers to understand what this does rather than reading the code (both this class and the callsites) carefully with multiple mutators in mind. Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 13, 2016, at 4:07 PM, Mandy Chung wrote: > > >> On Aug 13, 2016, at 1:20 AM, Peter Levart wrote: >> >> Hi Mandy, >> >> On 08/13/2016 01:55 AM, Mandy Chung wrote: On Aug 8, 2016, at 6:25 PM, Kim Barrett wrote: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ >>> This looks very good. >>> >>> Have you considered having JVM_WaitForReferencePendingList method to return >>> the pending list? i.e. make it a blocking version of >>> getAndClearReferencePendingList rather than two operations. >>> >>> waitForReferenceProcessing is really wait for pending reference(s) enqueued >>> (additionally cleaner gets invoked). What do you think to rename this >>> method to “waitForPendingReferencesEnqueued”? >>> >> >> I think the split is intentional. It's a clever way to avoid an otherwise >> inevitable race that could otherwise cause DBB allocating thread to fail >> with OOME prematurely. >> > > Ah I see. I initially read that pendingList doesn’t need to be set within > the synchronized block with processPendingActive flag but such race could > cause DBB to fail with OOME, as you clarified. > > Then I suggest waitForReferencePendingList (and its JVM function) be renamed > to: > boolean awaitReferencePendingList(); > > a blocking version of "boolean hasReferencePendingList()”. Minor difference > but I think it’s little clearer. Regarding the name, e.g. "waitFor" vs "await", the meaning is the same. If there were a documented or defacto preference for "await" I would feel obliged to rename, but the JDK seems to use both forms in roughly comparable numbers. Regarding the signature and description, recall that waitForReferencePendingList() does not return a value, nor does it need to. So documenting it as a blocking version of hasReferencePendingList() isn't correct. > waitForReferenceProcessing - I’d like the javadoc to include more details or > @implNote for example: > This method returns true to indicate that some pending references have been > enqueued. If there are cleaners, cleaning actions are queued to be invoked > e.g. DirectByteBuffer may be deallocated. > > This would help the readers to understand what this method is intended to do. I was intentionally a little vague in the description; I don't want to be promising too much, and having callers depend on behavior that we might want to change. In particular, it was intentional that there's no mention of Cleaner or any special behavior for them. For example, we might later decide we don't want the reference processing thread to incur the cost of notifying possible waiters and the associated synchronization, and instead let "waiters" return immediately when there is reference processing in progress, on the assumption that the reference processing thread makes rapid progress. In some respects that's probably more similar in effect to what the old code did. Also recall that this function is presently private, accessed by nio.Bits code via SharedSecrets (and via reflection by one test). It might be appropriate to change the access to this function or provide some similar non-private function in the future, at which point the javadoc and possibly the semantics and implementation will need to be carefully examined.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, On 08/15/2016 05:15 AM, Kim Barrett wrote: >I have a feeling that these pauses are now unnecessary. Will try to check with some experiments… I found that the DirectBufferAllocTest will sometimes fail if the pauses are taken out. I think what’s going on is that the multiple threads are competing for resources, and some threads in that test lose out if all of them are waiting and wake up at the same time. The exponentially increasing back-off scatters the threads enough for that to become very unlikely, though with sufficiently bad luck… But I think the current implementation could also fail that test with similarly bad luck. It just requires*very* bad luck, so we’re not seeing it as a problem. And that test is a pretty extreme stress test. I get the same results when experimenting with this. Another option would be to enclose the whole logic of (retrying reservation while waiting for Reference processing progress followed by System.gc() and another round of reservation retries while waiting for Reference processing) into a synchronized block so that multiple reservation threads could not barge for reservations. This approach works and might even be triggering less System.gc() rounds, but it is not much better than current approach. Regards, Peter
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 13, 2016, at 4:20 AM, Peter Levart wrote: > > Hi Mandy, > > On 08/13/2016 01:55 AM, Mandy Chung wrote: >>> On Aug 8, 2016, at 6:25 PM, Kim Barrett >>> wrote: >>> >>> full: >>> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ >>> >>> >>> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ >> This looks very good. >> >> Have you considered having JVM_WaitForReferencePendingList method to return >> the pending list? i.e. make it a blocking version of >> getAndClearReferencePendingList rather than two operations. >> >> waitForReferenceProcessing is really wait for pending reference(s) enqueued >> (additionally cleaner gets invoked). What do you think to rename this >> method to “waitForPendingReferencesEnqueued”? >> > > I think the split is intentional. It's a clever way to avoid an otherwise > inevitable race that could otherwise cause DBB allocating thread to fail with > OOME prematurely. > > waitForReferencePendingList() is invoked out of synchronized block so that it > does not prevent the progress of thread(s) invoking > waitForReferenceProcessing(), while getAndClearReferencePendingList is > non-blocking and is invoked inside the synchronized block that also sets a > boolean flag. Code in waitForReferenceProcessing() checks both the > non-emptiness of reference pending list and the boolean flag while holding > the lock. > > The following sequence of actions (as used in NIO Bits): > > System.gc(); > Reference.waitForReferenceProcessing(); > > ...therefore either: > > - completes immediately after System.gc() returns without discovering any > pending Reference, returning false; or > - completes after System.gc() discovers at least one pending Reference and > ReferenceHandler thread processes at least one Cleaner, returning true; or > enqueues all pending Reference(s), returning false. > > When waitForReferenceProcessing() returns false consecutively even after a > series of exponentially increasing pauses, the DBB allocating thread(s) can > be sure they have exhausted all options to allocate the direct buffer and > must fail with OOME. Thanks for answering Mandy’s questions for me. That all looks right. > I have a feeling that these pauses are now unnecessary. Will try to check > with some experiments… I found that the DirectBufferAllocTest will sometimes fail if the pauses are taken out. I think what’s going on is that the multiple threads are competing for resources, and some threads in that test lose out if all of them are waiting and wake up at the same time. The exponentially increasing back-off scatters the threads enough for that to become very unlikely, though with sufficiently bad luck… But I think the current implementation could also fail that test with similarly bad luck. It just requires *very* bad luck, so we’re not seeing it as a problem. And that test is a pretty extreme stress test. > > Regards, Peter > >> Grammar question: >> >> "there are no more references” >> "If there aren't any pending {@link Reference}s" >> >> - This is the case with zero pending reference, shouldn’t it use “is” such >> that: >> >> "there us no more reference” >> "If there isn't any pending {@link Reference}” >> >> Please update the synposis of JDK-8156500 to make it clear what this fix is >> (it’s nothing related to JDI). Maybe something like “Move the pending >> reference list to VM” >> >> Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 13, 2016, at 1:20 AM, Peter Levart wrote: > > Hi Mandy, > > On 08/13/2016 01:55 AM, Mandy Chung wrote: >>> On Aug 8, 2016, at 6:25 PM, Kim Barrett >>> wrote: >>> >>> full: >>> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ >>> >>> >>> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ >> This looks very good. >> >> Have you considered having JVM_WaitForReferencePendingList method to return >> the pending list? i.e. make it a blocking version of >> getAndClearReferencePendingList rather than two operations. >> >> waitForReferenceProcessing is really wait for pending reference(s) enqueued >> (additionally cleaner gets invoked). What do you think to rename this >> method to “waitForPendingReferencesEnqueued”? >> > > I think the split is intentional. It's a clever way to avoid an otherwise > inevitable race that could otherwise cause DBB allocating thread to fail with > OOME prematurely. > Ah I see. I initially read that pendingList doesn’t need to be set within the synchronized block with processPendingActive flag but such race could cause DBB to fail with OOME, as you clarified. Then I suggest waitForReferencePendingList (and its JVM function) be renamed to: boolean awaitReferencePendingList(); a blocking version of "boolean hasReferencePendingList()”. Minor difference but I think it’s little clearer. waitForReferenceProcessing - I’d like the javadoc to include more details or @implNote for example: This method returns true to indicate that some pending references have been enqueued. If there are cleaners, cleaning actions are queued to be invoked e.g. DirectByteBuffer may be deallocated. This would help the readers to understand what this method is intended to do. Mandy > waitForReferencePendingList() is invoked out of synchronized block so that it > does not prevent the progress of thread(s) invoking > waitForReferenceProcessing(), while getAndClearReferencePendingList is > non-blocking and is invoked inside the synchronized block that also sets a > boolean flag. Code in waitForReferenceProcessing() checks both the > non-emptiness of reference pending list and the boolean flag while holding > the lock. > > The following sequence of actions (as used in NIO Bits): > > System.gc(); > Reference.waitForReferenceProcessing(); > > ...therefore either: > > - completes immediately after System.gc() returns without discovering any > pending Reference, returning false; or > - completes after System.gc() discovers at least one pending Reference and > ReferenceHandler thread processes at least one Cleaner, returning true; or > enqueues all pending Reference(s), returning false. > > When waitForReferenceProcessing() returns false consecutively even after a > series of exponentially increasing pauses, the DBB allocating thread(s) can > be sure they have exhausted all options to allocate the direct buffer and > must fail with OOME. > > I have a feeling that these pauses are now unnecessary. Will try to check > with some experiments... > > Regards, Peter > >> Grammar question: >> >> "there are no more references” >> "If there aren't any pending {@link Reference}s" >> >> - This is the case with zero pending reference, shouldn’t it use “is” such >> that: >> >> "there us no more reference” >> "If there isn't any pending {@link Reference}” >> >> Please update the synposis of JDK-8156500 to make it clear what this fix is >> (it’s nothing related to JDI). Maybe something like “Move the pending >> reference list to VM” >> >> Mandy >> >
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Mandy, On 08/13/2016 01:55 AM, Mandy Chung wrote: On Aug 8, 2016, at 6:25 PM, Kim Barrett wrote: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ This looks very good. Have you considered having JVM_WaitForReferencePendingList method to return the pending list? i.e. make it a blocking version of getAndClearReferencePendingList rather than two operations. waitForReferenceProcessing is really wait for pending reference(s) enqueued (additionally cleaner gets invoked). What do you think to rename this method to “waitForPendingReferencesEnqueued”? I think the split is intentional. It's a clever way to avoid an otherwise inevitable race that could otherwise cause DBB allocating thread to fail with OOME prematurely. waitForReferencePendingList() is invoked out of synchronized block so that it does not prevent the progress of thread(s) invoking waitForReferenceProcessing(), while getAndClearReferencePendingList is non-blocking and is invoked inside the synchronized block that also sets a boolean flag. Code in waitForReferenceProcessing() checks both the non-emptiness of reference pending list and the boolean flag while holding the lock. The following sequence of actions (as used in NIO Bits): System.gc(); Reference.waitForReferenceProcessing(); ...therefore either: - completes immediately after System.gc() returns without discovering any pending Reference, returning false; or - completes after System.gc() discovers at least one pending Reference and ReferenceHandler thread processes at least one Cleaner, returning true; or enqueues all pending Reference(s), returning false. When waitForReferenceProcessing() returns false consecutively even after a series of exponentially increasing pauses, the DBB allocating thread(s) can be sure they have exhausted all options to allocate the direct buffer and must fail with OOME. I have a feeling that these pauses are now unnecessary. Will try to check with some experiments... Regards, Peter Grammar question: "there are no more references” "If there aren't any pending {@link Reference}s" - This is the case with zero pending reference, shouldn’t it use “is” such that: "there us no more reference” "If there isn't any pending {@link Reference}” Please update the synposis of JDK-8156500 to make it clear what this fix is (it’s nothing related to JDI). Maybe something like “Move the pending reference list to VM” Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 8, 2016, at 6:25 PM, Kim Barrett wrote: > > full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ > http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ This looks very good. Have you considered having JVM_WaitForReferencePendingList method to return the pending list? i.e. make it a blocking version of getAndClearReferencePendingList rather than two operations. waitForReferenceProcessing is really wait for pending reference(s) enqueued (additionally cleaner gets invoked). What do you think to rename this method to “waitForPendingReferencesEnqueued”? Grammar question: "there are no more references” "If there aren't any pending {@link Reference}s" - This is the case with zero pending reference, shouldn’t it use “is” such that: "there us no more reference” "If there isn't any pending {@link Reference}” Please update the synposis of JDK-8156500 to make it clear what this fix is (it’s nothing related to JDI). Maybe something like “Move the pending reference list to VM” Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 9, 2016, at 3:52 AM, Per Liden wrote: > > Hi Kim, > > On 2016-08-09 03:25, Kim Barrett wrote: >>> On Aug 8, 2016, at 8:16 AM, Per Liden wrote: >>> I have one suggestion though, regarding CheckReferencePendingList(). While >>> reviewing I found that I had to check several times what its return value >>> actually meant, the "check" part of the name doesn't quite reveal that. >>> Further, it seems to me that the waiting path of this function has fairly >>> little in common with the non-waiting path, e.g. it always returns true. >>> So, to make both the naming and implementation more clear I'd like to >>> suggest that we split this into two separate functions, >>> HasReferencePendingList() and WaitForReferencePendingList(), like this: >> >> I was thinking about splitting things way, and ended up not doing so >> for no good reason I can think of. And it does seem clearer that way, >> so... >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04.inc/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/ > > Thanks for fixing, looks good. Thanks! > > cheers, > Per > >> >>> Other than this I think the patch looks good.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, On 2016-08-09 03:25, Kim Barrett wrote: On Aug 8, 2016, at 8:16 AM, Per Liden wrote: Hi Kim, On 2016-08-01 20:47, Kim Barrett wrote: This updated webrev addresses concerns about the performance of the VM API used by the reference processor thread in the original webrev. New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/ Overall I think you managed to hit a good balance here, keeping the VM API straight forward without loosing flexibility. Having the ReferenceHandler doing the actual work seems sound and avoids some complexity, which I like. Thanks for looking at this. I have one suggestion though, regarding CheckReferencePendingList(). While reviewing I found that I had to check several times what its return value actually meant, the "check" part of the name doesn't quite reveal that. Further, it seems to me that the waiting path of this function has fairly little in common with the non-waiting path, e.g. it always returns true. So, to make both the naming and implementation more clear I'd like to suggest that we split this into two separate functions, HasReferencePendingList() and WaitForReferencePendingList(), like this: I was thinking about splitting things way, and ended up not doing so for no good reason I can think of. And it does seem clearer that way, so... New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04.inc/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/ Thanks for fixing, looks good. cheers, Per Other than this I think the patch looks good. The originally offered approach was an attempt to minimize changes. I was trying to be as similar to the existing code as possible, while moving part of it into the VM, where we have the necessary control over suspension and the like. We already know we want to make changes in this area, in order to eliminate the use of jdk.internal.ref.Cleaner (JDK-8149925). But we've also agreed that JDK-8149925 wasn't urgent and to defer it to JDK 10. I don't think we should revisit that. Is JDK-8149925 the correct bug id? Looks like that bug has already been fixed in 9. That CR was originally for removing all the uses. During review the nio.Bits / direct buffer use was separated, and is the last one. There was a bunch of discussion about removing that final one in that review (especially later parts) http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-February/038663.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039315.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-April/039862.html and also here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039561.html. However, I don't see a followup CR for removing the nio.Bits use. Maybe that didn't get created when we split the issue, or maybe I'm just failing to find it. Peter? Note that the consensus back in March/April was to defer it, along with the dependent actual removal of the internal Cleaner, to JDK 10.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 8, 2016, at 8:16 AM, Per Liden wrote: > > Hi Kim, > > On 2016-08-01 20:47, Kim Barrett wrote: >> This updated webrev addresses concerns about the performance of the VM >> API used by the reference processor thread in the original webrev. >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/ > > Overall I think you managed to hit a good balance here, keeping the VM API > straight forward without loosing flexibility. Having the ReferenceHandler > doing the actual work seems sound and avoids some complexity, which I like. Thanks for looking at this. > I have one suggestion though, regarding CheckReferencePendingList(). While > reviewing I found that I had to check several times what its return value > actually meant, the "check" part of the name doesn't quite reveal that. > Further, it seems to me that the waiting path of this function has fairly > little in common with the non-waiting path, e.g. it always returns true. So, > to make both the naming and implementation more clear I'd like to suggest > that we split this into two separate functions, HasReferencePendingList() and > WaitForReferencePendingList(), like this: I was thinking about splitting things way, and ended up not doing so for no good reason I can think of. And it does seem clearer that way, so... New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04.inc/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04.inc/ > Other than this I think the patch looks good. > >> >> The originally offered approach was an attempt to minimize changes. I >> was trying to be as similar to the existing code as possible, while >> moving part of it into the VM, where we have the necessary control >> over suspension and the like. We already know we want to make changes >> in this area, in order to eliminate the use of >> jdk.internal.ref.Cleaner (JDK-8149925). But we've also agreed that >> JDK-8149925 wasn't urgent and to defer it to JDK 10. I don't think we >> should revisit that. > > Is JDK-8149925 the correct bug id? Looks like that bug has already been fixed > in 9. That CR was originally for removing all the uses. During review the nio.Bits / direct buffer use was separated, and is the last one. There was a bunch of discussion about removing that final one in that review (especially later parts) http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-February/038663.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039315.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-April/039862.html and also here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039561.html. However, I don't see a followup CR for removing the nio.Bits use. Maybe that didn't get created when we split the issue, or maybe I'm just failing to find it. Peter? Note that the consensus back in March/April was to defer it, along with the dependent actual removal of the internal Cleaner, to JDK 10.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, On 2016-08-01 20:47, Kim Barrett wrote: This updated webrev addresses concerns about the performance of the VM API used by the reference processor thread in the original webrev. New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/ Overall I think you managed to hit a good balance here, keeping the VM API straight forward without loosing flexibility. Having the ReferenceHandler doing the actual work seems sound and avoids some complexity, which I like. I have one suggestion though, regarding CheckReferencePendingList(). While reviewing I found that I had to check several times what its return value actually meant, the "check" part of the name doesn't quite reveal that. Further, it seems to me that the waiting path of this function has fairly little in common with the non-waiting path, e.g. it always returns true. So, to make both the naming and implementation more clear I'd like to suggest that we split this into two separate functions, HasReferencePendingList() and WaitForReferencePendingList(), like this: JVM_ENTRY(jboolean, JVM_HasReferencePendingList(JNIEnv* env)) JVMWrapper("JVM_HasReferencePendingList"); MonitorLockerEx ml(Heap_lock); return Universe::has_reference_pending_list(); JVM_END JVM_ENTRY(void, JVM_WaitForReferencePendingList(JNIEnv* env)) JVMWrapper("JVM_WaitForReferencePendingList"); MonitorLockerEx ml(Heap_lock); while (!Universe::has_reference_pending_list()) { ml.wait(); } JVM_END And of course, this would then also be reflected in j.l.r.Reference, which would have: private static native boolean hasReferencePendingList(); private static native void waitForReferencePendingList(); Other than this I think the patch looks good. The originally offered approach was an attempt to minimize changes. I was trying to be as similar to the existing code as possible, while moving part of it into the VM, where we have the necessary control over suspension and the like. We already know we want to make changes in this area, in order to eliminate the use of jdk.internal.ref.Cleaner (JDK-8149925). But we've also agreed that JDK-8149925 wasn't urgent and to defer it to JDK 10. I don't think we should revisit that. Is JDK-8149925 the correct bug id? Looks like that bug has already been fixed in 9. cheers, Per As Peter pointed out, my original change introduces a small performance regression on a microbenchmark for reference processing throughput. It also showed a small performance benefit on a different microbenchmark (allocation rate for a stress test of direct byte buffers), as noted in the original RFR. I can reproduce both of those. I think reference processing thread throughput is the right measure to use, so long as others don't become horrible. Focusing on that, it's better to just let the reference processing thread do the processing, rather than slowing it down to allow for the rare case when there's another thread that could possibly help. This is especially true now that Cleaner has such limited usage. This leads to a different API for other threads; rather than tryHandlePending that other threads can call to help and to examine progress, we now have waitForReferenceProcessing. The VM API is also different; rather than popReferencePendingList to get or wait, we now have getAndClearReferencePendingList and checkReferencePendingList, the latter with an optional wait. The splitting of the VM API allows us to avoid a narrow race condition discussed by Peter in his prototypes. Peter suggested this race was ok because java.nio.Bits makes several retries. However, those retries are only done before throwing OOME. There are no retries before invoking GC, so this race could lead to unnecessary successive GCs. Doing all the processing on the reference processing thread eliminates execution of Cleaners on application threads, though that's not nearly as much an issue now that the use of Cleaner is so restricted. We've also eliminated a pre-existing issue where a helper could report no progress even though the reference processing thread (and other helpers) had removed a pending reference, but not yet processed it. This could result in the non-progressing helper invoking GC or reporting failure, even though it might have succeeded if it had waited for the other thread(s) to complete processing the reference(s) being worked on. I think the new waitForReferenceProcessing could be used to fix JDK-6306116, though that isn't part of this change, and was not really a motivating factor. I don't know if the new waitForReferenceProcessing will be used by whatever we eventually do for JDK-8149925, but I think the new VM API will suffice for that. That is, I think JDK-8149925 might require changes to the core-libs API used
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 3, 2016, at 3:41 AM, David Holmes wrote: > > On 2/08/2016 4:46 AM, Kim Barrett wrote: >> This updated webrev addresses the concerns around initialization order >> for Throwable and OutOfMemoryError. It doesn't address the VM API >> used by the reference processor thread; that will be in a later >> webrev. >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.02/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.02.incr/ >> >> The jdk tree is unchanged from webrev.01. >> >> I've backed out the change to initialize_java_lang_classes. (To >> verify this, look at the runtime/thread.cpp in the full webrev.) >> >> Instead, Universe::gen_out_of_memory_error now also requires Throwable >> to be initialized before it will attempt to fill in the stack trace. >> This means that VM generated OOMEs occurring early in initialization >> won't have a stack trace. > > This change looks good to me. Thanks!
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
On 2/08/2016 4:46 AM, Kim Barrett wrote: This updated webrev addresses the concerns around initialization order for Throwable and OutOfMemoryError. It doesn't address the VM API used by the reference processor thread; that will be in a later webrev. New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.02/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.02.incr/ The jdk tree is unchanged from webrev.01. I've backed out the change to initialize_java_lang_classes. (To verify this, look at the runtime/thread.cpp in the full webrev.) Instead, Universe::gen_out_of_memory_error now also requires Throwable to be initialized before it will attempt to fill in the stack trace. This means that VM generated OOMEs occurring early in initialization won't have a stack trace. This change looks good to me. Thanks, David An alternative that I considered was to remove the assert at the end of fill_in_stack_trace_of_preallocated_backtrace that requires java_lang_Throwable;:unassigned_stacktrace() to return non-NULL. I did some testing that seemed to work and provide stacktraces, relying on code in the Throwable class that special cases this "out of protocol" state. However, having read some of the history around that (JDK-6998871, JDK-7045138, JDK-7046490), that special casing seems to have been intended to be temporary, and expected to be removed. (Though I didn't find an RFE for doing so, and I'm not doing so here.) Even if we decided it wasn't going to be removed, this approach seems somewhat fragile. [Note to Coleen: This is the reverse of what you and I talked about a few days ago; I hadn't reviewed the history carefully then.]
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 1, 2016, at 2:47 PM, Kim Barrett wrote: > > This updated webrev addresses concerns about the performance of the VM > API used by the reference processor thread in the original webrev. > > New webrevs: > full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/ > http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/ > incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/ > http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/ Sorry, typo in the incremental webrev URLs — “incr” => “inc”.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Aug 2, 2016, at 8:55 AM, Peter Levart wrote: > > Hi Kim, > > This looks very good. I like the way you dealt with race between the > ReferenceHandler thread and threads waiting for it to do some cleanup > progress. I think the VM API is suitable for possible further development on > JDK-8149925. It's also nice that the whole pending list is obtained in one > native call, so further optimizations on Java side are possible. > > Regards, Peter Thanks! > > On 08/01/2016 08:47 PM, Kim Barrett wrote: >> This updated webrev addresses concerns about the performance of the VM >> API used by the reference processor thread in the original webrev. >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/ >> incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/ [snip]
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, This looks very good. I like the way you dealt with race between the ReferenceHandler thread and threads waiting for it to do some cleanup progress. I think the VM API is suitable for possible further development on JDK-8149925. It's also nice that the whole pending list is obtained in one native call, so further optimizations on Java side are possible. Regards, Peter On 08/01/2016 08:47 PM, Kim Barrett wrote: This updated webrev addresses concerns about the performance of the VM API used by the reference processor thread in the original webrev. New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/ The originally offered approach was an attempt to minimize changes. I was trying to be as similar to the existing code as possible, while moving part of it into the VM, where we have the necessary control over suspension and the like. We already know we want to make changes in this area, in order to eliminate the use of jdk.internal.ref.Cleaner (JDK-8149925). But we've also agreed that JDK-8149925 wasn't urgent and to defer it to JDK 10. I don't think we should revisit that. As Peter pointed out, my original change introduces a small performance regression on a microbenchmark for reference processing throughput. It also showed a small performance benefit on a different microbenchmark (allocation rate for a stress test of direct byte buffers), as noted in the original RFR. I can reproduce both of those. I think reference processing thread throughput is the right measure to use, so long as others don't become horrible. Focusing on that, it's better to just let the reference processing thread do the processing, rather than slowing it down to allow for the rare case when there's another thread that could possibly help. This is especially true now that Cleaner has such limited usage. This leads to a different API for other threads; rather than tryHandlePending that other threads can call to help and to examine progress, we now have waitForReferenceProcessing. The VM API is also different; rather than popReferencePendingList to get or wait, we now have getAndClearReferencePendingList and checkReferencePendingList, the latter with an optional wait. The splitting of the VM API allows us to avoid a narrow race condition discussed by Peter in his prototypes. Peter suggested this race was ok because java.nio.Bits makes several retries. However, those retries are only done before throwing OOME. There are no retries before invoking GC, so this race could lead to unnecessary successive GCs. Doing all the processing on the reference processing thread eliminates execution of Cleaners on application threads, though that's not nearly as much an issue now that the use of Cleaner is so restricted. We've also eliminated a pre-existing issue where a helper could report no progress even though the reference processing thread (and other helpers) had removed a pending reference, but not yet processed it. This could result in the non-progressing helper invoking GC or reporting failure, even though it might have succeeded if it had waited for the other thread(s) to complete processing the reference(s) being worked on. I think the new waitForReferenceProcessing could be used to fix JDK-6306116, though that isn't part of this change, and was not really a motivating factor. I don't know if the new waitForReferenceProcessing will be used by whatever we eventually do for JDK-8149925, but I think the new VM API will suffice for that. That is, I think JDK-8149925 might require changes to the core-libs API used by nio.Bits, and won't require further VM API changes.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
This updated webrev addresses concerns about the performance of the VM API used by the reference processor thread in the original webrev. New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/ The originally offered approach was an attempt to minimize changes. I was trying to be as similar to the existing code as possible, while moving part of it into the VM, where we have the necessary control over suspension and the like. We already know we want to make changes in this area, in order to eliminate the use of jdk.internal.ref.Cleaner (JDK-8149925). But we've also agreed that JDK-8149925 wasn't urgent and to defer it to JDK 10. I don't think we should revisit that. As Peter pointed out, my original change introduces a small performance regression on a microbenchmark for reference processing throughput. It also showed a small performance benefit on a different microbenchmark (allocation rate for a stress test of direct byte buffers), as noted in the original RFR. I can reproduce both of those. I think reference processing thread throughput is the right measure to use, so long as others don't become horrible. Focusing on that, it's better to just let the reference processing thread do the processing, rather than slowing it down to allow for the rare case when there's another thread that could possibly help. This is especially true now that Cleaner has such limited usage. This leads to a different API for other threads; rather than tryHandlePending that other threads can call to help and to examine progress, we now have waitForReferenceProcessing. The VM API is also different; rather than popReferencePendingList to get or wait, we now have getAndClearReferencePendingList and checkReferencePendingList, the latter with an optional wait. The splitting of the VM API allows us to avoid a narrow race condition discussed by Peter in his prototypes. Peter suggested this race was ok because java.nio.Bits makes several retries. However, those retries are only done before throwing OOME. There are no retries before invoking GC, so this race could lead to unnecessary successive GCs. Doing all the processing on the reference processing thread eliminates execution of Cleaners on application threads, though that's not nearly as much an issue now that the use of Cleaner is so restricted. We've also eliminated a pre-existing issue where a helper could report no progress even though the reference processing thread (and other helpers) had removed a pending reference, but not yet processed it. This could result in the non-progressing helper invoking GC or reporting failure, even though it might have succeeded if it had waited for the other thread(s) to complete processing the reference(s) being worked on. I think the new waitForReferenceProcessing could be used to fix JDK-6306116, though that isn't part of this change, and was not really a motivating factor. I don't know if the new waitForReferenceProcessing will be used by whatever we eventually do for JDK-8149925, but I think the new VM API will suffice for that. That is, I think JDK-8149925 might require changes to the core-libs API used by nio.Bits, and won't require further VM API changes.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
This updated webrev addresses the concerns around initialization order for Throwable and OutOfMemoryError. It doesn't address the VM API used by the reference processor thread; that will be in a later webrev. New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.02/ incr: http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.02.incr/ The jdk tree is unchanged from webrev.01. I've backed out the change to initialize_java_lang_classes. (To verify this, look at the runtime/thread.cpp in the full webrev.) Instead, Universe::gen_out_of_memory_error now also requires Throwable to be initialized before it will attempt to fill in the stack trace. This means that VM generated OOMEs occurring early in initialization won't have a stack trace. An alternative that I considered was to remove the assert at the end of fill_in_stack_trace_of_preallocated_backtrace that requires java_lang_Throwable;:unassigned_stacktrace() to return non-NULL. I did some testing that seemed to work and provide stacktraces, relying on code in the Throwable class that special cases this "out of protocol" state. However, having read some of the history around that (JDK-6998871, JDK-7045138, JDK-7046490), that special casing seems to have been intended to be temporary, and expected to be removed. (Though I didn't find an RFE for doing so, and I'm not doing so here.) Even if we decided it wasn't going to be removed, this approach seems somewhat fragile. [Note to Coleen: This is the reverse of what you and I talked about a few days ago; I hadn't reviewed the history carefully then.]
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 29, 2016, at 6:42 PM, David Holmes wrote: > > The earlier you initialize Throwable the earlier you can try to create an > exception that has a backtrace. But there will always be a region of code > where we can't throw an OOME with a backtrace because of the missing > initialization of Throwable. So we can narrow that window by moving the > initialization of Throwable (which in turn requires a whole bunch of > collection classes - so the window has a fixed minimum size [ unless we do > some creative restructuring of Throwable's static initialization]). My > argument, which I think I've now convinced Kim of, is that we shouldn't be > trying to throw an OOME with a stacktrace if Throwable has not been > initialized - and that is where the change to gen_out_of_memory_error comes > in. It is actually passed the pre-allocated OOME that has no backtrace and > will never have a backtrace, and that is what should be thrown when Throwable > has not been initialized. Thanks for the clarification. It makes sense and the pre-allocated OOME with no backtrace should be used for this early OOM scenario. Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
On 30/06/2016 10:32 AM, Mandy Chung wrote: On Jun 28, 2016, at 10:19 PM, David Holmes wrote: So here is what I see has happened. Looking back at 9-b01, before we forced the initialization of InterruptedException and thus Throwable we find: 58 Initializing 'java/lang/Throwable' (0x00082990) So Kim is right this was working by accident. It just seems that back then there was less memory required by the initialization of all the collection and other classes and so we didn't run into this problem. Post the InterruptedException change the initialization order made it unlikely an OOME could be encountered before Throwable was initialized (and after we have reached a point where we can throw without the VM crashing or instead doing a vm_exit_during_initialization). Post modules those collection classes, and others, are now done earlier again and before Throwable. And presumably their memory demands have increased. This is the new primordial class added by modules that causes additional classes to be loaded early before initPhase1. // The VM creates objects of this class. initialize_class(vmSymbols::java_lang_reflect_Module(), CHECK); Although we preallocate the OutOfMemoryError instances, and avoid executing any java code to do so, we can't necessarily** "throw" them until after Throwable is initialized. We now have a lot more initialization occurring before we init Throwable and so OOME is more likely and so it will fail as Kim observed. Would initializing java.lang.Throwable after java.lang.reflect.Module address this issue? I don’t think I fully follow the problem Kim observed and below. The earlier you initialize Throwable the earlier you can try to create an exception that has a backtrace. But there will always be a region of code where we can't throw an OOME with a backtrace because of the missing initialization of Throwable. So we can narrow that window by moving the initialization of Throwable (which in turn requires a whole bunch of collection classes - so the window has a fixed minimum size [ unless we do some creative restructuring of Throwable's static initialization]). My argument, which I think I've now convinced Kim of, is that we shouldn't be trying to throw an OOME with a stacktrace if Throwable has not been initialized - and that is where the change to gen_out_of_memory_error comes in. It is actually passed the pre-allocated OOME that has no backtrace and will never have a backtrace, and that is what should be thrown when Throwable has not been initialized. Cheers, David ** I say necessarily because I still believe it is the fact we attempt to fill in the stacktrace that leads to the dependency on Throwable being initialized, and we should be able to avoid that if we check the VM initialization state in gen_out_of_memory_error(). Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 28, 2016, at 10:19 PM, David Holmes wrote: > > So here is what I see has happened. > > Looking back at 9-b01, before we forced the initialization of > InterruptedException and thus Throwable we find: > > 58 Initializing 'java/lang/Throwable' (0x00082990) > > So Kim is right this was working by accident. It just seems that back then > there was less memory required by the initialization of all the collection > and other classes and so we didn't run into this problem. > > Post the InterruptedException change the initialization order made it > unlikely an OOME could be encountered before Throwable was initialized (and > after we have reached a point where we can throw without the VM crashing or > instead doing a vm_exit_during_initialization). > > Post modules those collection classes, and others, are now done earlier again > and before Throwable. And presumably their memory demands have increased. > This is the new primordial class added by modules that causes additional classes to be loaded early before initPhase1. // The VM creates objects of this class. initialize_class(vmSymbols::java_lang_reflect_Module(), CHECK); > Although we preallocate the OutOfMemoryError instances, and avoid executing > any java code to do so, we can't necessarily** "throw" them until after > Throwable is initialized. We now have a lot more initialization occurring > before we init Throwable and so OOME is more likely and so it will fail as > Kim observed. Would initializing java.lang.Throwable after java.lang.reflect.Module address this issue? I don’t think I fully follow the problem Kim observed and below. > > ** I say necessarily because I still believe it is the fact we attempt to > fill in the stacktrace that leads to the dependency on Throwable being > initialized, and we should be able to avoid that if we check the VM > initialization state in gen_out_of_memory_error(). Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
On 30/06/2016 5:12 AM, Kim Barrett wrote: On Jun 29, 2016, at 2:54 PM, Kim Barrett wrote: I think the problem may have existed long before modules, but simply wasn't noticed. That is, I suspect trying the test case mentioned below with a pre-June-2014 fastdebug build (when Reference started initializing InterruptedException) might trip the same assert. I'm presently trying to obtain a sufficiently old fastdebug build to run the experiment. Nope, works fine with JDK 9 b01. See my response to Mandy above. David - ./jdk1.9.0/fastdebug/bin/java -XX:AutoBoxCacheMax=2147483647 -version Error occurred during initialization of VM java.lang.OutOfMemoryError: Requested array size exceeds VM limit at java.lang.Integer$IntegerCache.(Integer.java:799) at java.lang.Integer.valueOf(Integer.java:827) at sun.misc.Signal.handle(Signal.java:169) at java.lang.Terminator.setup(Terminator.java:60) at java.lang.System.initializeSystemClass(System.java:1197)
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 29, 2016, at 2:54 PM, Kim Barrett wrote: > > I think the problem may have existed long before modules, but simply > wasn't noticed. That is, I suspect trying the test case mentioned > below with a pre-June-2014 fastdebug build (when Reference started > initializing InterruptedException) might trip the same assert. I'm > presently trying to obtain a sufficiently old fastdebug build to run > the experiment. Nope, works fine with JDK 9 b01. ./jdk1.9.0/fastdebug/bin/java -XX:AutoBoxCacheMax=2147483647 -version Error occurred during initialization of VM java.lang.OutOfMemoryError: Requested array size exceeds VM limit at java.lang.Integer$IntegerCache.(Integer.java:799) at java.lang.Integer.valueOf(Integer.java:827) at sun.misc.Signal.handle(Signal.java:169) at java.lang.Terminator.setup(Terminator.java:60) at java.lang.System.initializeSystemClass(System.java:1197)
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 28, 2016, at 7:23 PM, David Holmes wrote: > > On 29/06/2016 8:43 AM, Kim Barrett wrote: >> Updated webrevs: >> >> Full: >> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01/ >> >> Incremental: >> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01.inc/ > > Did Reference not work? Just curious. I used to be a qualified Java > programmer back in the Java 5 era, but wildcards were always a bit iffy :) It did not, or at least not without more contortions than I wanted to make. The old code used , so I decided to fall back to that. Specifically, the type parameter in the type of "ref" needs to be compatible with the type parameter in the type of "q", and that seems to be messy to make happen when wildcards are involved. Having popXXX return Reference and splitting most of the body of tryHandlePending into a new parameterized helper boolean tryHandlePendingAux(Reference ref, boolean should_wait) did seem to do the trick, but also seemed like it was trying too hard... >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01.inc/ >> >> Still investigating the initialization order for core exceptions. > > I suspect it is as Coleen indicated that the module changes introduced the > new failure path that you hit. I did a quick check of the initialization > order in an old b50: > > 33 Initializing 'java/lang/Throwable' (0x001780002990) > ... > 46 Initializing 'java/lang/Exception'(no method) (0x001780003158) > 47 Initializing 'java/lang/InterruptedException'(no method) > (0x0017800178a8) > > Compare that with a current build: > > 60 Initializing 'java/lang/ref/Reference$ReferenceHandler' > (0x00080001e3f0) > 61 Initializing 'java/lang/Throwable' (0x000829f8) > 62 Initializing 'java/lang/Exception'(no method) (0x000831a8) > 63 Initializing 'java/lang/InterruptedException'(no method) > (0x00080001e6e0) > 64 Initializing 'java/lang/ref/PhantomReference'(no method) > (0x00086440) > > Initialization of Throwable is much, much later (large parts of java.lang and > java.util are now initialized first!) and is obviously a direct consequence > of preinitializing InterruptedException. > > So I would say that the module change did break this and that initialization > of Throwable (only) should be restored to a much higher place in the > initialization sequence. I think the problem may have existed long before modules, but simply wasn't noticed. That is, I suspect trying the test case mentioned below with a pre-June-2014 fastdebug build (when Reference started initializing InterruptedException) might trip the same assert. I'm presently trying to obtain a sufficiently old fastdebug build to run the experiment. The thing that has changed is that we now have a test (ol' reliable tickler of bugs TestOptionsWithRanges) that tries all kinds of wacky configurations that had never been seen before. It tripped the init order problem by trying -XX:AutoBoxCacheMax with a value of jint_max, which led to an attempt to allocate an array whose size exceeds the maximum allowed, which is reported as OOME with a specialized message, which led to the assertion failure in java_lang_Throwable::fill_in_stack_trace_of_preallocated_backtrace because java_lang_Throwable::unassigned_stacktrace() == NULL. Note that with a smaller -XX:AutoBoxCacheMax value of 1G we instead get a VM shutdown during initialization, because we are trying to GC before that's allowed, rather than trying to make an array with an impossible size. Aside: Reporting an attempt to allocate a mere 2G element objarray using an OOME seems a bit quaint these days. :-)
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 28, 2016, at 3:43 PM, Kim Barrett wrote: > > Updated webrevs: > > Full: > http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01/ > http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01/ > The VM change does simplify the implementation a lot. Does/should the reference processing and enqueue them in the pending list at a safepoint? Or does the enqueuing happen while other Java threads are running except that it holds Heap_lock? My main concern is that the ReferenceHandler thread now has to make a VM upcall to dequeue a reference object for every iteration. I believe this is what the original implementation tried to avoid. Would it be feasible for popReferencePendingList to be intrinsified? If it is infeasible, we should think through other alternatives for example like Peter’s suggestion to get the entire pending list at a time. Comments on Reference.java 143 /* Atomically pop the next pending reference. If should_wait is 144 * true and there are no pending references, block until the 145 * pending reference list becomes non-empty. The GC adds 146 * references to the list, and notifies waiting threads when 147 * references are added. 148 */ 149 private static native Reference popReferencePendingList(boolean should_wait); The parameter name “should_wait” - I would avoid using the word “should” and `_`. What about “await”? The spec needs to specify: What happens if the thread calling this method is interrupted? What does it return? Does it throw InterruptedException? What happens to the thread’s interrupted status? This method is significant and can you add @param, @return, @throws to make it a proper javadoc. This change is small but the implication is significant. JDK-7103238 would be more appropriate for this change than JDK-8156400 and its synopsis that does not highlight what the changeset is about. > Still investigating the initialization order for core exceptions. > This would be good to understand what’s going on there. Mandy
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi again, On 06/29/2016 03:30 PM, Peter Levart wrote: Just to show what I mean, here's a simple optimization that doesn't use a private pendingList of references shared among callers of tryHandlePanding(true/false), but instead uses course-grained synchronization and waiting for tryHandlePanding(false) callers while ReferenceHandler is privately processing the whole list of pending references: http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/webrev.02/ ...and now, on top of above, for some more involved optimization that is most effective when a chunk of pending references is discovered where majority are destined for the same ReferenceQueue (as in our benchmark): http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/webrev.03/ The results of benchmark: Original JDK 9: Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 38410.515 ± 1011.769 us/op Patched (by Kim): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 42197.522 ± 1161.451 us/op Proposed (by Peter, webrev): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 34134.977 ± 1274.753 us/op Proposed (by Peter, webrev.02): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 27935.929 ± 1128.678 us/op Proposed (by Peter, webrev.03): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 6603.009 ± 59.051 us/op That's already 6x faster than baseline! I'm not suggesting that any of these optimizations is applied. I just wanted to show that a JNI call that allows retrieving the whole pending list has potential to allow such optimizations. Regards, Peter
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, On 06/29/2016 01:22 PM, Peter Levart wrote: Transfering the whole list in one JNI invocation has the potential for further optimizations on the Java side (like handling the whole popped list privately without additional synchronization - if we ever find a way for java.nio.Bits to wait for it reliably - or even enqueue-ing a chunk of consecutive references destined for the same queue using a single synchronized action on the queue, etc...) Just to show what I mean, here's a simple optimization that doesn't use a private pendingList of references shared among callers of tryHandlePanding(true/false), but instead uses course-grained synchronization and waiting for tryHandlePanding(false) callers while ReferenceHandler is privately processing the whole list of pending references: http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/webrev.02/ This further improves benchmark results and it still passes the DirectBufferAllocTest: Original JDK 9: Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 38410.515 ± 1011.769 us/op Patched (by Kim): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 42197.522 ± 1161.451 us/op Proposed (by Peter, webrev): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 34134.977 ± 1274.753 us/op Proposed (by Peter, webrev.02): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 27935.929 ± 1128.678 us/op Regards, Peter
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, Let me chime-in although it's a bit late... I think this is a good change to finally get rid of OOME problems in this area. On 06/28/2016 07:45 PM, Kim Barrett wrote: On Jun 28, 2016, at 5:33 AM, Per Liden wrote: Patch looks good. The only thing I don't feel qualified to review is the initialization order change in thread.cpp, so I'll let others comments on that. Thanks. I’ll be following up on that area. I like the pop-one-reference-at-a-time semantics, which simplifies things a lot and keeps the interface nice and clean. I was previously afraid that it might cause a noticeable performance degradation compared to lifting the whole list into Java in one go, but your testing seem to prove that's not the case. I was concerned about that too, and had tried a different approach that also still supported the existing "some callers wait and others don’t" API, but it was a bit messy. Coleen convinced me to try this (since it was easy) and do the measurement, and it worked out well. The repeated JNI invocations do not present a big overhead, but it is measurable. For example, the following benchmark measures the time it takes after a GC cycle for a bunch of references to be transfered to Java side, enqueued into ReferenceQueue and dequeued from it: http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/ReferenceEnqueueBench.java The results on my i7-4771 PC: Original JDK 9: Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 38410.515 ± 1011.769 us/op Patched (by Kim): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 42197.522 ± 1161.451 us/op So, about 10% worse for this benchmark. Transfering the whole list in one JNI invocation has the potential for further optimizations on the Java side (like handling the whole popped list privately without additional synchronization - if we ever find a way for java.nio.Bits to wait for it reliably - or even enqueue-ing a chunk of consecutive references destined for the same queue using a single synchronized action on the queue, etc...) If the JNI API was something like the following: /* Atomically pop the pending reference list if wholeList is true, * or just next pending reference if wholeList is false. * If wait is true and the pending reference list is empty, blocks until * it becomes non-empty, or returns null if wait is false. */ private static native Reference popReferencePendingList(boolean wholeList, boolean wait); Then not too much complication is needed in Reference.java (diff to current JDK 9 sources) to consume this API and already have some benefit from it: http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/webrev/ There is a possible race here between ReferenceHandler calling tryHandlePending(true) and java.nio.Bits calling tryHandlePending(false) that can make the method return false to java.nio.Bits when ReferenceHandler has just popped the whole list and not yet installed it in the private pendingList field, but java.nio.Bits makes several retries in this case with exponentially increasing delays, so that does not currently present a problem. java/nio/Buffer/DirectBufferAllocTest.java test passes with this change. And the above benchmark shows improvement instead of regression: Proposed (my Peter): Benchmark(refCount) Mode Cnt Score Error Units ReferenceEnqueueBench.dequeueReferences 10ss 100 34134.977 ± 1274.753 us/op So what do you think? Is it worth the little additional logic on the Java side? Regards, Peter
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Mandy, On 29/06/2016 12:28 PM, Mandy Chung wrote: On Jun 28, 2016, at 4:23 PM, David Holmes wrote: On 29/06/2016 8:43 AM, Kim Barrett wrote: Updated webrevs: Full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01/ Incremental: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01.inc/ Did Reference not work? Just curious. I used to be a qualified Java programmer back in the Java 5 era, but wildcards were always a bit iffy :) A related post in compiler-dev w.r.t. Reference: http://mail.openjdk.java.net/pipermail/compiler-dev/2014-January/008457.html Looks like no reply on that thread though. http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01.inc/ Still investigating the initialization order for core exceptions. I suspect it is as Coleen indicated that the module changes introduced the new failure path that you hit. I did a quick check of the initialization order in an old b50: 33 Initializing 'java/lang/Throwable' (0x001780002990) ... 46 Initializing 'java/lang/Exception'(no method) (0x001780003158) 47 Initializing 'java/lang/InterruptedException'(no method) (0x0017800178a8) Compare that with a current build: 60 Initializing 'java/lang/ref/Reference$ReferenceHandler' (0x00080001e3f0) 61 Initializing 'java/lang/Throwable' (0x000829f8) 62 Initializing 'java/lang/Exception'(no method) (0x000831a8) 63 Initializing 'java/lang/InterruptedException'(no method) (0x00080001e6e0) 64 Initializing 'java/lang/ref/PhantomReference'(no method) (0x00086440) How do you get this list? For current builds: java -Xlog:class+init -version (and I trimmed the output) For slightly older builds: java -Xlog:classinit -version For pre UL you need a fastdebug build and: java -XX:+TraceClassInitialization -version Initialization of Throwable is much, much later (large parts of java.lang and java.util are now initialized first!) and is obviously a direct consequence of preinitializing InterruptedException. Do you mind trying b110 before JEP 261 was integrated in jdk-9+111 and see any difference? Here's fragment from b109: 31 Initializing 'java/lang/ref/Reference' (0x00085a08) 32 Initializing 'java/lang/ref/Reference$Lock'(no method) (0x0008000181c0) 33 Initializing 'java/lang/ref/Reference$ReferenceHandler' (0x0008000183b8) 34 Initializing 'java/lang/Throwable' (0x000828f0) initPhase1 is expected to be called very early in the VM initialization after the primordial classes are loaded. IIRC, the exception classes are preallocated to improve diagnosability in the case when we run out of memory VM can still throw a preallocated instance with an preallocated stack trace buffer. I don’t think these exception classes are expected to be loaded initPhase1. So here is what I see has happened. Looking back at 9-b01, before we forced the initialization of InterruptedException and thus Throwable we find: 58 Initializing 'java/lang/Throwable' (0x00082990) So Kim is right this was working by accident. It just seems that back then there was less memory required by the initialization of all the collection and other classes and so we didn't run into this problem. Post the InterruptedException change the initialization order made it unlikely an OOME could be encountered before Throwable was initialized (and after we have reached a point where we can throw without the VM crashing or instead doing a vm_exit_during_initialization). Post modules those collection classes, and others, are now done earlier again and before Throwable. And presumably their memory demands have increased. Although we preallocate the OutOfMemoryError instances, and avoid executing any java code to do so, we can't necessarily** "throw" them until after Throwable is initialized. We now have a lot more initialization occurring before we init Throwable and so OOME is more likely and so it will fail as Kim observed. ** I say necessarily because I still believe it is the fact we attempt to fill in the stacktrace that leads to the dependency on Throwable being initialized, and we should be able to avoid that if we check the VM initialization state in gen_out_of_memory_error(). Thanks, David More to dig here. Mandy So I would say that the module change did break this and that initialization of Throwable (only) should be restored to a much higher place in the initialization sequence. Thanks, David
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 28, 2016, at 4:23 PM, David Holmes wrote: > > On 29/06/2016 8:43 AM, Kim Barrett wrote: >> Updated webrevs: >> >> Full: >> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01/ >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01/ >> >> Incremental: >> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01.inc/ > > Did Reference not work? Just curious. I used to be a qualified Java > programmer back in the Java 5 era, but wildcards were always a bit iffy :) A related post in compiler-dev w.r.t. Reference: http://mail.openjdk.java.net/pipermail/compiler-dev/2014-January/008457.html Looks like no reply on that thread though. > >> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01.inc/ >> >> Still investigating the initialization order for core exceptions. > > I suspect it is as Coleen indicated that the module changes introduced the > new failure path that you hit. I did a quick check of the initialization > order in an old b50: > > 33 Initializing 'java/lang/Throwable' (0x001780002990) > ... > 46 Initializing 'java/lang/Exception'(no method) (0x001780003158) > 47 Initializing 'java/lang/InterruptedException'(no method) > (0x0017800178a8) > > Compare that with a current build: > > 60 Initializing 'java/lang/ref/Reference$ReferenceHandler' > (0x00080001e3f0) > 61 Initializing 'java/lang/Throwable' (0x000829f8) > 62 Initializing 'java/lang/Exception'(no method) (0x000831a8) > 63 Initializing 'java/lang/InterruptedException'(no method) > (0x00080001e6e0) > 64 Initializing 'java/lang/ref/PhantomReference'(no method) > (0x00086440) > How do you get this list? > Initialization of Throwable is much, much later (large parts of java.lang and > java.util are now initialized first!) and is obviously a direct consequence > of preinitializing InterruptedException. > Do you mind trying b110 before JEP 261 was integrated in jdk-9+111 and see any difference? initPhase1 is expected to be called very early in the VM initialization after the primordial classes are loaded. IIRC, the exception classes are preallocated to improve diagnosability in the case when we run out of memory VM can still throw a preallocated instance with an preallocated stack trace buffer. I don’t think these exception classes are expected to be loaded initPhase1. More to dig here. Mandy > So I would say that the module change did break this and that initialization > of Throwable (only) should be restored to a much higher place in the > initialization sequence. > > Thanks, > David >
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
On 29/06/2016 8:43 AM, Kim Barrett wrote: Updated webrevs: Full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01/ Incremental: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01.inc/ Did Reference not work? Just curious. I used to be a qualified Java programmer back in the Java 5 era, but wildcards were always a bit iffy :) http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01.inc/ Still investigating the initialization order for core exceptions. I suspect it is as Coleen indicated that the module changes introduced the new failure path that you hit. I did a quick check of the initialization order in an old b50: 33 Initializing 'java/lang/Throwable' (0x001780002990) ... 46 Initializing 'java/lang/Exception'(no method) (0x001780003158) 47 Initializing 'java/lang/InterruptedException'(no method) (0x0017800178a8) Compare that with a current build: 60 Initializing 'java/lang/ref/Reference$ReferenceHandler' (0x00080001e3f0) 61 Initializing 'java/lang/Throwable' (0x000829f8) 62 Initializing 'java/lang/Exception'(no method) (0x000831a8) 63 Initializing 'java/lang/InterruptedException'(no method) (0x00080001e6e0) 64 Initializing 'java/lang/ref/PhantomReference'(no method) (0x00086440) Initialization of Throwable is much, much later (large parts of java.lang and java.util are now initialized first!) and is obviously a direct consequence of preinitializing InterruptedException. So I would say that the module change did break this and that initialization of Throwable (only) should be restored to a much higher place in the initialization sequence. Thanks, David
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 27, 2016, at 9:34 PM, Daniel D. Daugherty > wrote: > Other than the possible missing notify_all() in vmGCOperations.cpp, > this all looks great. Thanks Dan.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Updated webrevs: Full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01/ Incremental: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.01.inc/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.01.inc/ Still investigating the initialization order for core exceptions.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 28, 2016, at 12:04 AM, David Holmes wrote: > > Hi Kim, > > I like all the removal of the pending-list-locker related code, but can't > really comment on the details of how it was replaced and interacts with all > the GC code. The interface to the Reference class is fine, it is the GC code > that pushes into the reference queue that I can't really comment on. Thanks David. > I have a couple of queries following up from Coleen's and Dan's reviews. > > First, the synchronization of the pending-reference-list leaves me somewhat > perplexed. As Coleen noted you require the heap_lock to be held but also use > an Atomic::xchg. You have a comment: > > + // owns the lock. Swap is used by parallel non-concurrent reference > + // processing threads, where some higher level controller owns > + // Heap_lock, so requires the lock is locked, but not necessarily by > + // the current thread. > > Can you clarify the higher-level protocol that is at play there. Asserting > that "someone" owns a lock seems only marginally useful if you can't be sure > who that owner is, or when they might release it. Some more direct detecting > of being within this higher-level protocol would be better, if it is possible > to detect. Was Per’s explanation sufficient? > As previously mentioned the change to the initialization order is a concern > to me - there are always unexpected consequences of making such changes, no > matter how straight-forward they seem at the time. Pre-initialization of > exception classes is not really essential as the attempt to throw any of them > from Java code will force initialization anyway - it's only for exceptions > created and thrown from the VM code that direct initialization is needed. The > problematic case is OOME because throwing the OOME may trigger a secondary > OOME during that class initialization etc. Hence we try to deal with that by > pre-allocating instances that do not have stacktraces etc, so they can be > thrown safely. Hence my earlier comment that I think the real bug in your > situation is that gen_out_of_memory_error() is not considering how far into > the VM init sequence we are, and that it should be returning the > pre-allocated instance with no backtrace. I'm also wondering why the existing order must have worked before the pre-initialization of InterruptedException by Reference, but now doesn't when we take that out. I'll see if I can figure out what's going on there. It might be some bug was introduced but was being masked by the Reference initialization. > --- > > A query on the JDK side: > > src/java.base/share/classes/java/lang/ref/Reference.java > > private static native Reference > popReferencePendingList(boolean wait); > > I'm intrigued by the use of the lower-bounded wildcard using Object. As > Object has no super-types this looks very strange and should be equivalent to > the more common upper-bounded wildcard using Object ie Reference Object> or more concisely Reference. I see this construct exists in the > current code for the ReferenceQueue, but I am quite intrigued as to why? > (Someone getting ready for Any-types? :) ) I botched that. I meant to have a real Java programmer (which I don't qualify as) look at that part of the change before sending it out for review, but seem to have forgotten to do so. And lulled into a false sense of security, since neither the compiler nor the runtime complained, unlike the reams of template instantiation error output or segfaults at runtime that I'd expect with C++...
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> On Jun 28, 2016, at 5:33 AM, Per Liden wrote: > Patch looks good. The only thing I don't feel qualified to review is the > initialization order change in thread.cpp, so I'll let others comments on > that. Thanks. I’ll be following up on that area. > I like the pop-one-reference-at-a-time semantics, which simplifies things a > lot and keeps the interface nice and clean. I was previously afraid that it > might cause a noticeable performance degradation compared to lifting the > whole list into Java in one go, but your testing seem to prove that's not the > case. I was concerned about that too, and had tried a different approach that also still supported the existing "some callers wait and others don’t" API, but it was a bit messy. Coleen convinced me to try this (since it was easy) and do the measurement, and it worked out well.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, On 2016-06-24 22:09, Kim Barrett wrote: Please review this change which moves the Reference pending list and locking from the java.lang.ref.Reference class into the VM. This includes elimination of the ReferencePendingListLocker mechanism in the VM. This fixes various fragility issues arising from having the management of this list split between Java and the VM (GC). This change addresses JDK-8156500 by eliminating the possibility of suspension while holding the lock. Because the locking is now done in the VM, we have full control over where suspension may occur. This change retains the existing interface between the reference processor and the nio.Bits package, e.g. Reference.tryHandlePending has the same signature and behavior; this change just pushes the pending list manipulation down into the VM. There are open bugs reports, enhancement requests, and discussions related to that interface; see below. This change does not attempt to address them. This change additionally addresses or renders moot https://bugs.openjdk.java.net/browse/JDK-8055232 (ref) Exceptions while processing Reference pending list and https://bugs.openjdk.java.net/browse/JDK-7103238 Ensure pending list lock is held on behalf of GC before enqueuing references on to the pending list It is also relevant for followup discussion of https://bugs.openjdk.java.net/browse/JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails immediately and https://bugs.openjdk.java.net/browse/JDK-8149925 We don't need jdk.internal.ref.Cleaner any more - part 1 and has implications for: https://bugs.openjdk.java.net/browse/JDK-8066859 java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: Reference Handler thread died In addition to the obviously relevant changes, there are a couple of changes whose presence here might seem surprising and require further explanation. - Removal of a stale comment in create_vm, noticed because it was near some code being removed as part of this change. The comment was rendered obsolete by JDK-8028275. - Moved initialization of exception classes earlier in initialize_java_lang_classes. The previous order only worked by accident, at least for OOME. During the bulk of the library initialization, OOME may be thrown, perhaps due to poorly chosen command line options. That attempts to use one of the pre-allocated OOME objects, and tries to fill in its stack trace. If the Throwable class is not yet initialized, this leads to a failed assert in the VM because Throwable.UNASSIGNED_STACK still has a null value. This initialization order issue was being masked by the old pending list implementation; the initialization of Reference ensured InterruptedException was initialized (thereby initializing its Throwable base class), and the initialization of Reference just happened to occur early enough that Throwable was initialized by the time it was needed when running certain tests. The forced initialization of InterruptedException is no longer needed by Reference, but removal of it triggered test failures (and could trigger corresponding product failures) because of this ordering issue. CR: https://bugs.openjdk.java.net/browse/JDK-8156500 Patch looks good. The only thing I don't feel qualified to review is the initialization order change in thread.cpp, so I'll let others comments on that. I like the pop-one-reference-at-a-time semantics, which simplifies things a lot and keeps the interface nice and clean. I was previously afraid that it might cause a noticeable performance degradation compared to lifting the whole list into Java in one go, but your testing seem to prove that's not the case. cheers, Per Webrev: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.00/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.00/ Testing: RBT ad hoc nightly runtime & gc tests. With the proposed patch for JDK-8153711 applied, locally ran nearly 35K iterations of the OomDebugTest that is part of that patch, with no failures / deadlocks. Without this change it would typically only take on the order of 100 iterations to hit a deadlock failure. Locally ran direct byte buffer allocation test: jdk/test/java/nio/Buffer/DirectBufferAllocTest.java This change reported 3% faster allocation times, and 1/2 the standard deviation for allocation times. Locally ran failing test from JDK-8022321 and JDK-8066859 a few thousand times. jdk/test/java/lang/ref/OOMEInReferenceHandler.java No errors, but failures were noted as hard to reproduce.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Dan, On 2016-06-28 03:34, Daniel D. Daugherty wrote: [...] src/share/vm/gc/shared/vmGCOperations.cpp L103: Heap_lock->unlock(); You did not add a conditional "Heap_lock->notify_all()" before the Heap_lock->unlock() like you've done in other places. Since you deleted a release_and_notify_pending_list_lock() call, I would think you need the: if (Universe::has_reference_pending_list()) { Heap_lock->notify_all(); } This path is only taken if the GC was skipped and never ran. In this case, we know we didn't discover and enqueued any new references on the pending list, hence no need to notify any waiting thread (the Reference Handler thread in this case). cheers, Per
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi David, On 2016-06-28 06:04, David Holmes wrote: Hi Kim, I like all the removal of the pending-list-locker related code, but can't really comment on the details of how it was replaced and interacts with all the GC code. The interface to the Reference class is fine, it is the GC code that pushes into the reference queue that I can't really comment on. I have a couple of queries following up from Coleen's and Dan's reviews. First, the synchronization of the pending-reference-list leaves me somewhat perplexed. As Coleen noted you require the heap_lock to be held but also use an Atomic::xchg. You have a comment: + // owns the lock. Swap is used by parallel non-concurrent reference + // processing threads, where some higher level controller owns + // Heap_lock, so requires the lock is locked, but not necessarily by + // the current thread. Can you clarify the higher-level protocol that is at play there. Asserting that "someone" owns a lock seems only marginally useful if you can't be sure who that owner is, or when they might release it. Some more direct detecting of being within this higher-level protocol would be better, if it is possible to detect. The Heap_lock protects the pending list from access from the Reference Handler thread (or any other non-GC thread). During a GC, we grab the Heap_lock so that the GC then "owns" the list and is free to modify it. However, the GC itself potentially has many GC worker threads doing reference enqueuing. The atomic swap is there to allow concurrent insertion (prepending) into the list by these GC worker threads. So, each GC worker concurrently calls into ReferenceProcessor::enqueue_discovered_reflist(), with it's own/local list of references to enqueue onto the global pending list. Hope that clarifies things. cheers, Per --- As previously mentioned the change to the initialization order is a concern to me - there are always unexpected consequences of making such changes, no matter how straight-forward they seem at the time. Pre-initialization of exception classes is not really essential as the attempt to throw any of them from Java code will force initialization anyway - it's only for exceptions created and thrown from the VM code that direct initialization is needed. The problematic case is OOME because throwing the OOME may trigger a secondary OOME during that class initialization etc. Hence we try to deal with that by pre-allocating instances that do not have stacktraces etc, so they can be thrown safely. Hence my earlier comment that I think the real bug in your situation is that gen_out_of_memory_error() is not considering how far into the VM init sequence we are, and that it should be returning the pre-allocated instance with no backtrace. --- A query on the JDK side: src/java.base/share/classes/java/lang/ref/Reference.java private static native Reference popReferencePendingList(boolean wait); I'm intrigued by the use of the lower-bounded wildcard using Object. As Object has no super-types this looks very strange and should be equivalent to the more common upper-bounded wildcard using Object ie Reference or more concisely Reference. I see this construct exists in the current code for the ReferenceQueue, but I am quite intrigued as to why? (Someone getting ready for Any-types? :) ) Thanks, David - On 25/06/2016 6:09 AM, Kim Barrett wrote: Please review this change which moves the Reference pending list and locking from the java.lang.ref.Reference class into the VM. This includes elimination of the ReferencePendingListLocker mechanism in the VM. This fixes various fragility issues arising from having the management of this list split between Java and the VM (GC). This change addresses JDK-8156500 by eliminating the possibility of suspension while holding the lock. Because the locking is now done in the VM, we have full control over where suspension may occur. This change retains the existing interface between the reference processor and the nio.Bits package, e.g. Reference.tryHandlePending has the same signature and behavior; this change just pushes the pending list manipulation down into the VM. There are open bugs reports, enhancement requests, and discussions related to that interface; see below. This change does not attempt to address them. This change additionally addresses or renders moot https://bugs.openjdk.java.net/browse/JDK-8055232 (ref) Exceptions while processing Reference pending list and https://bugs.openjdk.java.net/browse/JDK-7103238 Ensure pending list lock is held on behalf of GC before enqueuing references on to the pending list It is also relevant for followup discussion of https://bugs.openjdk.java.net/browse/JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails immediately and https://bugs.openjdk.java.net/browse/JDK-8149925 We don't need jdk.internal.ref.Cleaner any more - part 1 and has implications for: https://bugs.openjdk.java.net/browse/JDK-8066859 java/lang/ref/OOMEInRef
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, I like all the removal of the pending-list-locker related code, but can't really comment on the details of how it was replaced and interacts with all the GC code. The interface to the Reference class is fine, it is the GC code that pushes into the reference queue that I can't really comment on. I have a couple of queries following up from Coleen's and Dan's reviews. First, the synchronization of the pending-reference-list leaves me somewhat perplexed. As Coleen noted you require the heap_lock to be held but also use an Atomic::xchg. You have a comment: + // owns the lock. Swap is used by parallel non-concurrent reference + // processing threads, where some higher level controller owns + // Heap_lock, so requires the lock is locked, but not necessarily by + // the current thread. Can you clarify the higher-level protocol that is at play there. Asserting that "someone" owns a lock seems only marginally useful if you can't be sure who that owner is, or when they might release it. Some more direct detecting of being within this higher-level protocol would be better, if it is possible to detect. --- As previously mentioned the change to the initialization order is a concern to me - there are always unexpected consequences of making such changes, no matter how straight-forward they seem at the time. Pre-initialization of exception classes is not really essential as the attempt to throw any of them from Java code will force initialization anyway - it's only for exceptions created and thrown from the VM code that direct initialization is needed. The problematic case is OOME because throwing the OOME may trigger a secondary OOME during that class initialization etc. Hence we try to deal with that by pre-allocating instances that do not have stacktraces etc, so they can be thrown safely. Hence my earlier comment that I think the real bug in your situation is that gen_out_of_memory_error() is not considering how far into the VM init sequence we are, and that it should be returning the pre-allocated instance with no backtrace. --- A query on the JDK side: src/java.base/share/classes/java/lang/ref/Reference.java private static native Reference popReferencePendingList(boolean wait); I'm intrigued by the use of the lower-bounded wildcard using Object. As Object has no super-types this looks very strange and should be equivalent to the more common upper-bounded wildcard using Object ie Reference or more concisely Reference. I see this construct exists in the current code for the ReferenceQueue, but I am quite intrigued as to why? (Someone getting ready for Any-types? :) ) Thanks, David - On 25/06/2016 6:09 AM, Kim Barrett wrote: Please review this change which moves the Reference pending list and locking from the java.lang.ref.Reference class into the VM. This includes elimination of the ReferencePendingListLocker mechanism in the VM. This fixes various fragility issues arising from having the management of this list split between Java and the VM (GC). This change addresses JDK-8156500 by eliminating the possibility of suspension while holding the lock. Because the locking is now done in the VM, we have full control over where suspension may occur. This change retains the existing interface between the reference processor and the nio.Bits package, e.g. Reference.tryHandlePending has the same signature and behavior; this change just pushes the pending list manipulation down into the VM. There are open bugs reports, enhancement requests, and discussions related to that interface; see below. This change does not attempt to address them. This change additionally addresses or renders moot https://bugs.openjdk.java.net/browse/JDK-8055232 (ref) Exceptions while processing Reference pending list and https://bugs.openjdk.java.net/browse/JDK-7103238 Ensure pending list lock is held on behalf of GC before enqueuing references on to the pending list It is also relevant for followup discussion of https://bugs.openjdk.java.net/browse/JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails immediately and https://bugs.openjdk.java.net/browse/JDK-8149925 We don't need jdk.internal.ref.Cleaner any more - part 1 and has implications for: https://bugs.openjdk.java.net/browse/JDK-8066859 java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: Reference Handler thread died In addition to the obviously relevant changes, there are a couple of changes whose presence here might seem surprising and require further explanation. - Removal of a stale comment in create_vm, noticed because it was near some code being removed as part of this change. The comment was rendered obsolete by JDK-8028275. - Moved initialization of exception classes earlier in initialize_java_lang_classes. The previous order only worked by accident, at least for OOME. During the bulk of the library initialization, OOME may be thrown, perhaps due to poorly chosen comma
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> Webrev: > http://cr.openjdk.java.net/~kbarrett/8156500/jdk.00/ make/mapfiles/libjava/mapfile-vers No comments. src/java.base/share/classes/java/lang/ref/Reference.java Much cleaner (pun might have been intended :-)). src/java.base/share/native/include/jvm.h No comments. src/java.base/share/native/libjava/Reference.c No comments. > http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.00/ make/symbols/symbols-unix No comments. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java No comments. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/soql/sa.js No comments. src/share/vm/ci/ciReplay.cpp No comments. src/share/vm/classfile/javaClasses.cpp No comments. src/share/vm/classfile/javaClasses.hpp No comments. src/share/vm/compiler/compileBroker.cpp No comments. src/share/vm/gc/cms/concurrentMarkSweepThread.cpp So no more Surrogate Locker Thread (SLT)? Sound good to me. src/share/vm/gc/cms/vmCMSOperations.cpp No comments. src/share/vm/gc/cms/vmCMSOperations.hpp No comments. src/share/vm/gc/g1/concurrentMarkThread.cpp No comments. src/share/vm/gc/g1/g1CollectedHeap.hpp No comments. src/share/vm/gc/g1/vm_operations_g1.cpp No comments. src/share/vm/gc/g1/vm_operations_g1.hpp No comments. src/share/vm/gc/shared/collectedHeap.hpp No comments. src/share/vm/gc/shared/genCollectedHeap.hpp No comments. src/share/vm/gc/shared/referenceProcessor.cpp No comments. src/share/vm/gc/shared/referenceProcessor.hpp No comments. src/share/vm/gc/shared/vmGCOperations.cpp L103: Heap_lock->unlock(); You did not add a conditional "Heap_lock->notify_all()" before the Heap_lock->unlock() like you've done in other places. Since you deleted a release_and_notify_pending_list_lock() call, I would think you need the: if (Universe::has_reference_pending_list()) { Heap_lock->notify_all(); } src/share/vm/gc/shared/vmGCOperations.hpp No comments. src/share/vm/memory/universe.cpp No comments. src/share/vm/memory/universe.hpp No comments. src/share/vm/oops/method.cpp No comments. src/share/vm/prims/jvm.cpp No comments. src/share/vm/prims/jvm.h No comments. src/share/vm/runtime/thread.cpp So this change moves call_initPhase1() after the initialize_class() calls and that just sets off alarm bells in my head. Yes, you have a really big comment below explaining this. Still worries me! src/share/vm/runtime/vmStructs.cpp No comments. src/share/vm/gc/shared/referencePendingListLocker.cpp src/share/vm/gc/shared/referencePendingListLocker.hpp Both files are deleted and don't need review. Other than the possible missing notify_all() in vmGCOperations.cpp, this all looks great. Dan On 6/24/16 2:09 PM, Kim Barrett wrote: Please review this change which moves the Reference pending list and locking from the java.lang.ref.Reference class into the VM. This includes elimination of the ReferencePendingListLocker mechanism in the VM. This fixes various fragility issues arising from having the management of this list split between Java and the VM (GC). This change addresses JDK-8156500 by eliminating the possibility of suspension while holding the lock. Because the locking is now done in the VM, we have full control over where suspension may occur. This change retains the existing interface between the reference processor and the nio.Bits package, e.g. Reference.tryHandlePending has the same signature and behavior; this change just pushes the pending list manipulation down into the VM. There are open bugs reports, enhancement requests, and discussions related to that interface; see below. This change does not attempt to address them. This change additionally addresses or renders moot https://bugs.openjdk.java.net/browse/JDK-8055232 (ref) Exceptions while processing Reference pending list and https://bugs.openjdk.java.net/browse/JDK-7103238 Ensure pending list lock is held on behalf of GC before enqueuing references on to the pending list It is also relevant for followup discussion of https://bugs.openjdk.java.net/browse/JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails immediately and https://bugs.openjdk.java.net/browse/JDK-8149925 We don't need jdk.internal.ref.Cleaner any more - part 1 and has implications for: https://bugs.openjdk.java.net/browse/JDK-8066859 java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: Reference Handler thread died In addition to the obviously relevant changes, there are a couple of changes whose presence here might seem surprising and require further explanation. - Removal of a stale comment in create_vm, noticed because it was near some code being removed as part of this change. The comment was rendered obsolete by JDK-8028275. - Moved initialization of exception classes earlier in initialize_java_lang_classes. The p
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
One more follow up before I actually review the code ... On 25/06/2016 6:09 AM, Kim Barrett wrote: Please review this change which moves the Reference pending list and locking from the java.lang.ref.Reference class into the VM. This includes elimination of the ReferencePendingListLocker mechanism in the VM. This fixes various fragility issues arising from having the management of this list split between Java and the VM (GC). This change addresses JDK-8156500 by eliminating the possibility of suspension while holding the lock. Because the locking is now done in the VM, we have full control over where suspension may occur. This change retains the existing interface between the reference processor and the nio.Bits package, e.g. Reference.tryHandlePending has the same signature and behavior; this change just pushes the pending list manipulation down into the VM. There are open bugs reports, enhancement requests, and discussions related to that interface; see below. This change does not attempt to address them. This change additionally addresses or renders moot https://bugs.openjdk.java.net/browse/JDK-8055232 (ref) Exceptions while processing Reference pending list and https://bugs.openjdk.java.net/browse/JDK-7103238 Ensure pending list lock is held on behalf of GC before enqueuing references on to the pending list It is also relevant for followup discussion of https://bugs.openjdk.java.net/browse/JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails immediately and https://bugs.openjdk.java.net/browse/JDK-8149925 We don't need jdk.internal.ref.Cleaner any more - part 1 and has implications for: https://bugs.openjdk.java.net/browse/JDK-8066859 java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: Reference Handler thread died In addition to the obviously relevant changes, there are a couple of changes whose presence here might seem surprising and require further explanation. - Removal of a stale comment in create_vm, noticed because it was near some code being removed as part of this change. The comment was rendered obsolete by JDK-8028275. - Moved initialization of exception classes earlier in initialize_java_lang_classes. The previous order only worked by accident, at least for OOME. During the bulk of the library initialization, OOME may be thrown, perhaps due to poorly chosen command line options. That attempts to use one of the pre-allocated OOME objects, and tries to fill in its stack trace. If the Throwable class is not yet initialized, this leads to a failed assert in the VM because Throwable.UNASSIGNED_STACK still has a null value. This initialization order issue was being masked by the old pending list implementation; the initialization of Reference ensured InterruptedException was initialized (thereby initializing its Throwable base class), and the initialization of Reference just happened to occur early enough that Throwable was initialized by the time it was needed when running certain tests. The forced initialization of InterruptedException is no longer needed by Reference, but removal of it triggered test failures (and could trigger corresponding product failures) because of this ordering issue. This is surprising because prior to forcing the load of InterruptedException we did not see any initialization issues with OOME throwing that I am aware of. To me this seems to be a bug in Universe::gen_out_of_memory_error() as it is not checking to see if the necessary classes have been initialized - else it should return a pre-generated OOME with no backtrace. Any change to the VM initialization sequence has to be examined very carefully because there are always non obvious consequences to making changes. David - CR: https://bugs.openjdk.java.net/browse/JDK-8156500 Webrev: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.00/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.00/ Testing: RBT ad hoc nightly runtime & gc tests. With the proposed patch for JDK-8153711 applied, locally ran nearly 35K iterations of the OomDebugTest that is part of that patch, with no failures / deadlocks. Without this change it would typically only take on the order of 100 iterations to hit a deadlock failure. Locally ran direct byte buffer allocation test: jdk/test/java/nio/Buffer/DirectBufferAllocTest.java This change reported 3% faster allocation times, and 1/2 the standard deviation for allocation times. Locally ran failing test from JDK-8022321 and JDK-8066859 a few thousand times. jdk/test/java/lang/ref/OOMEInReferenceHandler.java No errors, but failures were noted as hard to reproduce.
Re: RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
Hi Kim, On 25/06/2016 6:09 AM, Kim Barrett wrote: Please review this change which moves the Reference pending list and locking from the java.lang.ref.Reference class into the VM. This includes elimination of the ReferencePendingListLocker mechanism in the VM. This fixes various fragility issues arising from having the management of this list split between Java and the VM (GC). This seems a particular example of a more general problem. If the target VM can be "suspended" (does that mean it is taken to a safepoint or ???) and an execution request is then sent, there must be numerous potential things that can fail because a "suspended" thread holds a resource needed by the execution request? David - This change addresses JDK-8156500 by eliminating the possibility of suspension while holding the lock. Because the locking is now done in the VM, we have full control over where suspension may occur. This change retains the existing interface between the reference processor and the nio.Bits package, e.g. Reference.tryHandlePending has the same signature and behavior; this change just pushes the pending list manipulation down into the VM. There are open bugs reports, enhancement requests, and discussions related to that interface; see below. This change does not attempt to address them. This change additionally addresses or renders moot https://bugs.openjdk.java.net/browse/JDK-8055232 (ref) Exceptions while processing Reference pending list and https://bugs.openjdk.java.net/browse/JDK-7103238 Ensure pending list lock is held on behalf of GC before enqueuing references on to the pending list It is also relevant for followup discussion of https://bugs.openjdk.java.net/browse/JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails immediately and https://bugs.openjdk.java.net/browse/JDK-8149925 We don't need jdk.internal.ref.Cleaner any more - part 1 and has implications for: https://bugs.openjdk.java.net/browse/JDK-8066859 java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: Reference Handler thread died In addition to the obviously relevant changes, there are a couple of changes whose presence here might seem surprising and require further explanation. - Removal of a stale comment in create_vm, noticed because it was near some code being removed as part of this change. The comment was rendered obsolete by JDK-8028275. - Moved initialization of exception classes earlier in initialize_java_lang_classes. The previous order only worked by accident, at least for OOME. During the bulk of the library initialization, OOME may be thrown, perhaps due to poorly chosen command line options. That attempts to use one of the pre-allocated OOME objects, and tries to fill in its stack trace. If the Throwable class is not yet initialized, this leads to a failed assert in the VM because Throwable.UNASSIGNED_STACK still has a null value. This initialization order issue was being masked by the old pending list implementation; the initialization of Reference ensured InterruptedException was initialized (thereby initializing its Throwable base class), and the initialization of Reference just happened to occur early enough that Throwable was initialized by the time it was needed when running certain tests. The forced initialization of InterruptedException is no longer needed by Reference, but removal of it triggered test failures (and could trigger corresponding product failures) because of this ordering issue. CR: https://bugs.openjdk.java.net/browse/JDK-8156500 Webrev: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.00/ http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.00/ Testing: RBT ad hoc nightly runtime & gc tests. With the proposed patch for JDK-8153711 applied, locally ran nearly 35K iterations of the OomDebugTest that is part of that patch, with no failures / deadlocks. Without this change it would typically only take on the order of 100 iterations to hit a deadlock failure. Locally ran direct byte buffer allocation test: jdk/test/java/nio/Buffer/DirectBufferAllocTest.java This change reported 3% faster allocation times, and 1/2 the standard deviation for allocation times. Locally ran failing test from JDK-8022321 and JDK-8066859 a few thousand times. jdk/test/java/lang/ref/OOMEInReferenceHandler.java No errors, but failures were noted as hard to reproduce.