Hi,

Let me chime in and add some comments...

On 07/30/2015 01:46 AM, Kim Barrett wrote:
On Jul 29, 2015, at 4:32 AM, David Holmes <david.hol...@oracle.com> wrote:
On 29/07/2015 5:57 PM, Kim Barrett wrote:
...  The race is being fixed by reordering a pair
of volatile assignments.
While this seems logical for the failure at hand it isn't immediately obvious 
to me that setting next before setting the ENQUEUED state won't cause a problem 
for some other piece of code. Really these things need to be tightly 
synchronized - I think the real bug is the unsynchronized fast-path for the 
empty-queue case in poll(). While that change was deliberate in 6666739 this 
side-effect was not realized and would have affected the change. I hope Martin 
and/or Tom see this and chime in.
I thought the poll() fast-path from 6666739 was ok at the time it was
made, and that it was the later removal of synchronization on the
reference by 8014890 that lead to the race under discussion, and
reinstating that synchronization on the reference was another way to
fix this race.  Turns out I was wrong about that.

The poll() fast-path introduced the possibility of the following
unexpected behavior when another thread is in r.enqueue() and is in
the problematic window:

     !r.enqueue() && q.poll() == null => true

This can happen because the synchronization on the references is only
in ReferenceQueue.enqueue().  If it was instead in
Reference.enqueue(), or if ReferenceQueue.Null.enqueue() also
synchronized on the reference, this race would be prevented.

Isn't above condition perfectly legal for references that have already been de-queued (and when the 'q' is otherwise empty)? But I see what you mean. If r.enqueue() grabs the ENQUEUED queue, the method returns false, but that does not mean that the reference has already been hooked on the 'head' chain ('head' can still be null and q.poll() would return null because of fast-path).

reversing assignments of 'head' and 'queue' fix this situation too.


The removal of reference synchronization opened the door wider, also
allowing this unexpected behavior:

     r.isEnqueued() && q.poll() == null => true

This condition is more representative, yes...


The combination of those changes is needed for the ReferenceEnqueue
regression test to fail, since it requires

     r.isEnqueued() && !r.enqueue() && q.poll() == null => true

I wouldn't want to revert the poll() fast-path, since that was added
for specific performance reasons.

I don't think I'd want to add back synchronization on the reference,
but might be persuaded otherwise.  8029205 should be looked at in that
case.

I've looked carefully at uses of r.next and I don't think there is a
problem with reordering the r.next and r.queue assignments.  Of
course, the existing code was looked at by a number of people without
spotting the race under discussion.

A way to think about this that helped me (I think) understand the
locking structure used here is that q.head and r.next are part of the
queue associated with the reference.  We can manipulate those using
the queue's lock as the basis for protection, so long as the the
reference's queue isn't changed.  But when we change the reference's
queue, the queue (including r.next) must be in a consistent state.
[This suggests the r.queue = NULL assignment in reallyPoll() should be
moved later, though I think the assignment order in reallyPoll()
doesn't matter.]

I think the assignment to r.queue = NULL in realyPoll() should be moved *before* the assignment to 'head' (which might assign null if 'r' was the last element). Here's why:

Suppose we have a Reference 'r' and it's associated ReferenceQueue 'q'. Currently it can happen that the following evaluates to true, which is surprising:

q.poll() == null && r.isEnqueued()


Regards, Peter



That aside the commentary is rather verbose, a simple:

// Only set ENQUEUED state after the reference is enqueued

would suffice (and add "volatiles ensure ordering" if you want that to be 
clearer).
I do get a bit wordy sometimes.  How about this:

             // Update r.queue *after* adding to queue's list, to avoid
             // race between enqueued checks and poll().  Volatiles
             // ensure ordering.


CR:
https://bugs.openjdk.java.net/browse/JDK-8132306

Webrev:
http://cr.openjdk.java.net/~kbarrett/8132306/webrev.00/

Testing:
jprt with default and hotspot testsets

Locally tested race in original code by insertion of sleep in enqueue and
verifying ReferenceEnqueue regression test fails consistently as
described in the bug report.

Locally tested fixed code by insertion of sleeps at various points and
verifying ReferenceEnqueue regression test still passes.


Reply via email to