On Nov 14, 2007, at 11:34, Chris Bowditch wrote:

Andreas L Delmelle wrote:

Hi Andreas,

A splendid idea!
I already thought my approach to be lacking somewhere, although it was too late last night to see where exactly. First off, the flag should, of course be declared 'volatile'. If not, then there is no guarantee a thread will see the most recent state. More importantly, there is the matter of cleanups that were already running at the time rehash() started acquiring its locks, as you point out. Introducing another dummy Object to synchronize on, seems like a very clean solution. The flag could still be used to avoid unnecessary cleanups or rehashes from being started. Synchronizing on a mutual object would still launch them, but they would merely have to wait.

I'm not sure another object to synchronize on is required as the 2 threads both already synchronize on the segments. I'm not familar with the PropertyCache and I was expecting you to say that the rehash method couldn't start whilst existing cleanups were running because of the sync on segments. But I guess if that was working then we wouldn't need the boolean flag either :-/

The picture is roughly:
If the size of one segment grows too large, a cleanup thread is launched (actually suboptimal -> see Jeremias' response: much better to have only one single thread; I'll look into this asap). That cleanup thread only locks the segment of the cache it is dealing with (just as put() does).

A rehash() OTOH needs a lock on the whole cache, not just one segment, so what it does is acquire these locks one by one. If there would be 10 segments, it first locks segment 9, then 8, then 7 and so on. Between the point where the lock on segment 9 is obtained, and the point where the whole cache is locked (segment 0), a cleanup thread could still beat rehash() to acquire the lock on segment 1 for example.

What you were expecting is /almost/ what happens: once the rehash method has acquired all the locks, any concurrent thread that needs synchronized access must wait until the rehashing is complete.


It's not clear to me from a quick glance at the code exactly what segments within the Property Cache are,

They are merely objects to distribute the synchronization, such that two concurrent threads do not have to wait for each other if they need access to different hash-buckets.

but gut feel is that synchronization between the 2 threads needs more work.

I share this gut feel. I decided to commit anyway a few months ago, since I gathered that to be /the/ way to find out what its effects are in real-time scenarios.

For now I will just leave the null check in place since I don't have time to persue this issue.

Regarding your observation of the OutOfMemoryError: this might need investigating. Do you have any indication as to which types of objects are leaking?

Yes I have been running a profiling tool on FOP this morning and discovered that the PropertyCache$CacheEntry inner class is the one that seems to grow and grow (its +89,000 instances after 1100 documents), but you knew that already ;)

Chris




Reply via email to