Well you can ignore what I wrote below - sorry. Somehow I got it in my head that multiple enqueue's were intended/supported when of course they are not. :(

So the proposed fix is okay - though I'd simplify the comment to just:

// Check that since getting the lock this reference hasn't already been
// enqueued (and even then removed)

The synchronization is problematic as I mention below but there is no easy fix due to the lock-ordering problem, and any attempt at such a fix would be much riskier. So this fix is fine - thanks.

David
-----

On 1/07/2013 10:43 AM, David Holmes wrote:
Hi Thomas,

Sorry for the delay in looking into this deeper but I've been OOTO a bit
this past week.

I'm backing up to the start to explore the apparent problem ...

On 19/06/2013 7:08 AM, Thomas Schatzl wrote:
Hi all,

   can I have reviews for the following change?

It happens if multiple threads are enqueuing and dequeuing reference
objects into a reference queue, that Reference objects may be enqueued
at multiple times.

This is because when java.lang.ref.ReferenceQueue.poll() returns and
inactivates a Reference object, another thread may just be during
enqueuing it again.

In the test case provided, the two threads that conflict are the
reference handler thread and the program (main) thread.

Relevant code:

ReferenceHandlerThread.run():

0: [...]
1: ReferenceQueue q = r.queue; // r is the reference
2: if (r != ReferenceQueue.NULL)
3:   q.enqueue().

ReferenceQueue::poll()(reallyPoll()) code (I removed lots of code here)

1: synchronized(lock) {
2:   [...]
3:   r.queue = ReferenceQueue.NULL;
3:}

I.e. while ReferenceQueue.poll() sets the Reference's queue to the NULL
queue so that that reference will not be enqueued again (or at most into
the NULL queue which is a nop), it happens that if the task switch
occurs between lines 2 and 3 of the reference handler thread, q still
contains the old queue reference, and the reference handler thread will
enqueue the Reference into the original queue again.

Let's set some initial conditions here. For poll() to remove 'r' it must
already have been enqueued on this queue. That means that r.queue ==
ENQUEUED. That means the ReferenceHandler thread would actually enqueue
it to the ENQUEUED instance - which is harmless.

So it seems to me that we must have a race between two enqueue attempts
for a problem to arise here. So lets assume that there is a concurrent
r.enqueue() with the reference handlers attempt to enqueue, and a poll()
is mixed in. Consider the following:

refThread reads target q:

   ReferenceQueue q = r.queue; // r is the reference
   if (r != ReferenceQueue.NULL)

(Note: the check for NULL is at best an optimization; it isn't needed)

Thread_1 does r.enqueue()

   So r.queue == ENQUEUED

Thread_2 does q.poll() and removes r.

   r.queue == NULL

refThread continues and executes:

   q.enqueue(r)

and so we have enqueued 'r' twice. Which seems to be the problem
scenario observed by the test, as we can then poll() and get 'r' a
second time.

But is it actually a problem if this happens? If I have two concurrent
attempts to enqueue a reference and concurrent attempt to dequeue it
(via poll) it seems quite plausible to see: enqueue, poll, enqueue - and
so the reference ends up in the queue. The test program is of the form:

           for (int i = 0; i < iterations; i++) {
                 queue = new ReferenceQueue<Object>();

                 for (int j = 0; j < refs.length; j++) {
                     refs[j] = new WeakReference(new Object(), queue);
                 }

                 System.gc(); // enqueues references into the list of
discovered references

                 // now manually enqueue all of them
                 for (int j = 0; j < refs.length; j++) {
                     refs[j].enqueue();
                 }
                 // and get them back. There should be exactly
numReferences
                 // entries in the queue now.
                 int foundReferences = 0;
                 while (queue.poll() != null) {
                     foundReferences++;
                 }

and naively we would assume that until the enqueue loop is complete (at
which point all refs are enqueued) then the ReferenceHandler thread will
not process any of those references as they are still strongly
reachable. If it processes them after then it is a no-op and either way
the poll() will find the exact number of references enqueued.

But that is not guaranteed to happen. As JLS 12.6.1 states:

"Optimizing transformations of a program can be designed that reduce the
number of objects that are reachable to be less than those which would
naively be considered reachable."

So in fact we might find that one or more references are no longer
reachable before their enqueue() operation is invoked and so
consequently we can indeed get this race between the reference handler's
attempt to enqueue and the test programs attempt to enqueue. I would say
this is a bug in the test program as it needs to ensure that the
references are guaranteed to be strongly reachable until after they have
had enqueue() invoked.

So your proposed fix really just masks the invalid assumption that a
ReferenceHandler based enqueue and a direct r.enqueue can't possibly be
concurrent.

Let's consider the case of two concurrent r.enqueue() calls, with an
interleaving poll(). Because Reference.enqueue lacks synchronization on
the reference while reading the queue you can get an interleaving where
the second enqueue() might see NULL, ENQUEUED or the actual queue
depending on the timing. I don't see any issue that requires a code
change. Only an omniscient observer can determine the exact order in
which the two enqueue()'s and the poll() occur, so pretty much any
outcome is valid.

So I don't think there is actually a bug in the reference code per-se,
at least not in relation to this test program.

Now the synchronization is still "all over the place". There are races
due to lack of synchronized (the ref should be locked whenever any of
its fields, ie queue & next, are modified), and lack of volatile on
fields accessed without synchronization. But whether any of those races
are actually a bug is a separate matter and very difficult to determine.
Bugs would arise where multi-field invariants are violated due to lack
of sync, but AFAICS that does not occur. Even JMM issues, reading stale
fields, don't seem to cause any problem here.

Cheers,
David
-----



You can achieve the same effect by simply calling
ReferenceQueue.enqueue() (i.e. without the reference handler thread, or
within the reference handler thread doing the != NULL check), it's just
that in such a case the "old" ReferenceQueue is stored in some register.

The guard for ReferenceQueue.NULL does not have any effect except for
possibly saving the virtual call. Simply calling r.enqueue() exhibits
the same problem.

The proposed solution is to filter out References within
ReferenceQueue.enqueue() again. At that point we can check whether the
given Reference is actually meant for this queue or not. Already removed
References are known to be "inactive" (as enqueue and poll are mutually
exclusive using a lock), in particular the References' queue is
different (actually the NULL queue) to the queue it is being enqueued.

This change should pose no change in semantics, as the ReferenceQueue of
the Reference can only be set in its constructor, and as soon as the
Reference is removed from a ReferenceQueue, its ReferenceQueue will be
the NULL queue. (I.e. before this change you could not enqueue such an
"inactive" Reference multiple times anyway)

(too many References and queues here :)

Webrev with test case
http://cr.openjdk.java.net/~tschatzl/8014890/webrev/

JIRA:
https://jbs.oracle.com/bugs/browse/JDK-8014890

bugs.sun.com
http://bugs.sun.com/view_bug.do?bug_id=8014890

Testing:
jck, jprt, manual testing

Note that I also need a sponsor to push in case this change is approved.

Thanks,
   Thomas

Reply via email to