Stuart,

A couple of comments so far:

I don't think that there would be a serialization compatibility problem with 
changing Activation.groupTable to have a ConcurrentHashMap instead of a 
HashMap.  The field's declared type is Map, so the deserialization process 
would be able to assign either class to the field, and the code doesn't 
otherwise care about the Map implementation class.  An rmid's state created 
with CHM could not be recoverable with JDK 1.4, of course, but I don't think 
that that's a concern.  The fix would not help an rmid whose state gets 
recovered from an earlier version created with HM, unless a replacement 
(HM->CHM) was also done upon deserialization.

More troubling I think is the higher-level issue suggested by this 
ConcurrentModificationException.  Thread B (in the 6896297 Evaluation) is 
attempting to write a "snapshot" of the entire persistent state of the 
Activation instance, which presumably should be consistent with future 
"updates" written to the log, while Thread A is concurrently attempting to 
mutate that state (before writing an update for its mutation).  It seems highly 
doubtful that potentially including an uncertain amount of this concurrent 
mutation in the written snapshot is safe.  I might expect a coarser lock to be 
used across all such mutations, together with their associated log writes (at 
which point groupTable wouldn't need to be concurrent).

The approach at your webrev below seems OK to address the immediate 
ConcurrentModificationException, but (unless I'm missing something) the 
higher-level issue with snapshot correctness described above would remain.

Cheers,

-- Peter

P.S.  This seems somewhat related to 
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4945726


On Apr 7, 2011, at 6:19 PM, Stuart Marks wrote:

> Hi Core-Libs developers,
> 
> I'd like to solicit some advice and discussion about this bug and a potential 
> fix I'm cooking for it. Here is the bug report; it contains details about the 
> problem and my analysis of it:
> 
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6896297
> 
> and here's a webrev of the fix I'm working on:
> 
> http://cr.openjdk.java.net/~smarks/reviews/6896297/webrev.0/
> 
> Briefly, the problem is incorrect synchronization of groupTable, a HashMap 
> field of an Activation object. The code mostly locks groupTable around any 
> access to it. However, no such locking is done during serialization. If the 
> groupTable is modified while it's being serialized, 
> ConcurrentModificationException occurs.
> 
> The obvious fix would be to use ConcurrentHashMap instead of Hashmap and to 
> remove the external locking entirely. Unfortunately this will change the 
> serialized representation of the Activation object, which I'm not sure is 
> acceptable.
> 
> Assuming that we can't change the serialized represenation, the alternative 
> approach would be to make sure that locking is done properly during 
> serialization. This is fairly easy to do by locking groupTable in a 
> writeObject() method. Unfortunately, this introduces a deadlock.
> 
> This deadlock occurs because, with this modification, there are now paths 
> through the code that take locks in the opposite order. Specifically, the 
> addLogRecord() method locks the log object and then (via serialization and 
> the newly added writeObject() method) locks groupTable. However, the 
> unregisterGroup() method locks groupTable and calls 
> GroupEntry.unregisterGroup() which logs the event, which takes a lock on the 
> log.
> 
> After some analysis, I've determined that the call to 
> GroupEntry.unregisterGroup() can be moved outside the synchronization of 
> groupTable. This removes the ordering problem.
> 
> With these fixes in place (the state of the webrev above) I can get several 
> hundred successful test runs with neither ConcurrentModificationException nor 
> any deadlocks.
> 
> Of course, that doesn't mean the change is correct. :-)
> 
> Questions:
> 
> 1. Is there a requirement that the serialized form of Activation remain 
> unchanged? If we can change it, we might as well just use ConcurrentHashMap 
> instead of HashMap.
> 
> 2. Is my lock ordering analysis correct? I've pored over the code, but not 
> really being familiar with any RMI concepts, I'm not sure I have it right. 
> I've written this analysis into a big comment I've added to the code.
> 
> 3. There is also a potential concurrency issue with idTable, which is used 
> similarly to groupTable. I haven't seen a test failure from it though. It 
> seems sensible to add a lock for it in Activation.writeObject() though. I 
> don't particularly like nesting the locks of idTable and groupTable, but 
> locking them separately would require serializing the Activation object field 
> by field instead of simply using defaultWriteObject(). Is this a reasonable 
> approach?
> 
> Thanks for any advice or comments.
> 
> s'marks
> 

Reply via email to