Hi Stuart,
I can't answer your specific questions as I'm not familiar with the RMI
infrastructure either. Taking the lock in writeObject and moving
group.unregister out of the sync block to avoid deadlock seems
reasonable. The main question to ask with such a move is whether the
temporary inconsistency in state is something that can be observed and
cause a problem.
The lock ordering analysis seems reasonable.
I do note that the startupLock protocol seems suspicious as there is a
race with detection of the lock value. I assume init itself can never be
invoked by more than one thread ;-) It may be that the object can only
become accessible concurrently at some time during init (ie it gets
published) in which case the protocol will work fine - but you really
need a full understanding of the lifecycle of the objects to determine that.
That aside, using ConcurrentHashMap would simplify the solution to the
current problem. If you are concerned about the serialized form then you
could define writeObject and readObject such that they convert between
CHM and plain HM using the serialized-fields API.
Cheers,
David
Stuart Marks said the following on 04/08/11 08:19:
Hi Core-Libs developers,
I'd like to solicit some advice and discussion about this bug and a
potential fix I'm cooking for it. Here is the bug report; it contains
details about the problem and my analysis of it:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6896297
and here's a webrev of the fix I'm working on:
http://cr.openjdk.java.net/~smarks/reviews/6896297/webrev.0/
Briefly, the problem is incorrect synchronization of groupTable, a
HashMap field of an Activation object. The code mostly locks groupTable
around any access to it. However, no such locking is done during
serialization. If the groupTable is modified while it's being
serialized, ConcurrentModificationException occurs.
The obvious fix would be to use ConcurrentHashMap instead of Hashmap and
to remove the external locking entirely. Unfortunately this will change
the serialized representation of the Activation object, which I'm not
sure is acceptable.
Assuming that we can't change the serialized represenation, the
alternative approach would be to make sure that locking is done properly
during serialization. This is fairly easy to do by locking groupTable in
a writeObject() method. Unfortunately, this introduces a deadlock.
This deadlock occurs because, with this modification, there are now
paths through the code that take locks in the opposite order.
Specifically, the addLogRecord() method locks the log object and then
(via serialization and the newly added writeObject() method) locks
groupTable. However, the unregisterGroup() method locks groupTable and
calls GroupEntry.unregisterGroup() which logs the event, which takes a
lock on the log.
After some analysis, I've determined that the call to
GroupEntry.unregisterGroup() can be moved outside the synchronization of
groupTable. This removes the ordering problem.
With these fixes in place (the state of the webrev above) I can get
several hundred successful test runs with neither
ConcurrentModificationException nor any deadlocks.
Of course, that doesn't mean the change is correct. :-)
Questions:
1. Is there a requirement that the serialized form of Activation remain
unchanged? If we can change it, we might as well just use
ConcurrentHashMap instead of HashMap.
2. Is my lock ordering analysis correct? I've pored over the code, but
not really being familiar with any RMI concepts, I'm not sure I have it
right. I've written this analysis into a big comment I've added to the
code.
3. There is also a potential concurrency issue with idTable, which is
used similarly to groupTable. I haven't seen a test failure from it
though. It seems sensible to add a lock for it in
Activation.writeObject() though. I don't particularly like nesting the
locks of idTable and groupTable, but locking them separately would
require serializing the Activation object field by field instead of
simply using defaultWriteObject(). Is this a reasonable approach?
Thanks for any advice or comments.
s'marks