On Nov 14, 2007, at 11:34, Chris Bowditch wrote:
Andreas L Delmelle wrote:
Hi Andreas,
A splendid idea!
I already thought my approach to be lacking somewhere, although
it was too late last night to see where exactly. First off, the
flag should, of course be declared 'volatile'. If not, then there
is no guarantee a thread will see the most recent state. More
importantly, there is the matter of cleanups that were already
running at the time rehash() started acquiring its locks, as you
point out.
Introducing another dummy Object to synchronize on, seems like a
very clean solution. The flag could still be used to avoid
unnecessary cleanups or rehashes from being started.
Synchronizing on a mutual object would still launch them, but
they would merely have to wait.
I'm not sure another object to synchronize on is required as the 2
threads both already synchronize on the segments. I'm not familar
with the PropertyCache and I was expecting you to say that the
rehash method couldn't start whilst existing cleanups were running
because of the sync on segments. But I guess if that was working
then we wouldn't need the boolean flag either :-/
The picture is roughly:
If the size of one segment grows too large, a cleanup thread is
launched (actually suboptimal -> see Jeremias' response: much better
to have only one single thread; I'll look into this asap).
That cleanup thread only locks the segment of the cache it is dealing
with (just as put() does).
A rehash() OTOH needs a lock on the whole cache, not just one
segment, so what it does is acquire these locks one by one. If there
would be 10 segments, it first locks segment 9, then 8, then 7 and so
on. Between the point where the lock on segment 9 is obtained, and
the point where the whole cache is locked (segment 0), a cleanup
thread could still beat rehash() to acquire the lock on segment 1 for
example.
What you were expecting is /almost/ what happens: once the rehash
method has acquired all the locks, any concurrent thread that needs
synchronized access must wait until the rehashing is complete.
It's not clear to me from a quick glance at the code exactly what
segments within the Property Cache are,
They are merely objects to distribute the synchronization, such that
two concurrent threads do not have to wait for each other if they
need access to different hash-buckets.
but gut feel is that synchronization between the 2 threads needs
more work.
I share this gut feel. I decided to commit anyway a few months ago,
since I gathered that to be /the/ way to find out what its effects
are in real-time scenarios.
For now I will just leave the null check in place since I don't
have time to persue this issue.
Regarding your observation of the OutOfMemoryError: this might
need investigating. Do you have any indication as to which types
of objects are leaking?
Yes I have been running a profiling tool on FOP this morning and
discovered that the PropertyCache$CacheEntry inner class is the one
that seems to grow and grow (its +89,000 instances after 1100
documents), but you knew that already ;)
Chris