On 4/26/12 2:08 AM, Selcuk AYA wrote:
On Wed, Apr 25, 2012 at 4:45 PM, Emmanuel Lécharny <elecha...@gmail.com> wrote:
On 4/5/12 1:09 AM, Emmanuel Lécharny wrote:
On 4/5/12 12:43 AM, Selcuk AYA wrote:
On Wed, Apr 4, 2012 at 3:22 PM, Emmanuel Lécharny <elecha...@gmail.com> wrote:
It's systematic, and I guess that the fact that we now pound the RdnIndex table way more often than before (just because we no longer call the OneLevelIndex) causes the cache to get filled and not released fast enough.
Do we hold a cursor open while this code gets stuck? I would think we would have to hold a cursor open and modify quite a few JDBM BTree pages for this kind of behavior to happen.

I'll check that.
As we don't set any size for the cache, its default size is 1024. For some of the tests, this might not be enough, as we load a lot of entries (typically the schema elements) plus many others that get added and removed while running tests in revert mode.

If I increase the default size to 65536, the tests are passing.

Ok, now, I have to admit I haven't looked at the LRUCache code yet, and my analysis is just based on what I saw by quickly looking at the code, the stack traces I have added and a few blind guesses. However, I think we have a serious issue here. As far as I can tell, the code itself is probably not responsible for this behaviour, but the way we use it is.

Did I miss something? Is there anything we can do, other than increasing the cache size, to get the tests passing?

I'm more concerned about what could occur in real life, when some users load the server up to the point where it just stops responding...
To avoid this issue, we can let the writers allocate more cache pages (rather than keeping the cache size fixed) so that they do not loop waiting for a replaceable cache page. However, I would again suggest making sure we do not forget to close a cursor. If we leave a cursor open and keep allocating new cache pages for writes, we will have other problems.
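Something like this rough sketch of the idea, with made-up names (GrowableWriteCache, pin/unpin) rather than the actual LRUCache code: when every cached page is pinned by a reader, the writer grows past the soft limit instead of looping.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the real JDBM LRUCache.
class GrowableWriteCache<K, V> {
    private final int softLimit;                         // nominal size, e.g. 1024
    private final Map<K, V> pages = new HashMap<K, V>();
    private final Deque<K> lru = new ArrayDeque<K>();    // oldest entries first
    private final Set<K> pinned = new HashSet<K>();

    GrowableWriteCache(int softLimit) {
        this.softLimit = softLimit;
    }

    synchronized void putForWrite(K key, V page) {
        if (pages.size() >= softLimit) {
            // Evict the oldest unpinned page if there is one...
            for (K candidate : lru) {
                if (!pinned.contains(candidate)) {
                    lru.remove(candidate);
                    pages.remove(candidate);
                    break;
                }
            }
            // ...otherwise fall through and grow beyond the soft limit,
            // so the writer never blocks.
        }
        pages.put(key, page);
        lru.addLast(key);
    }

    synchronized void pin(K key) {
        pinned.add(key);
    }

    synchronized void unpin(K key) {
        pinned.remove(key);
    }
}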
Yeah, I can see how it may affect the tests. I'll definitely investigate this first, before going any further in another direction.

ATM, I'm using an uncommitted version of JDBM where the default cache size has been changed.

Thanks a lot Selcuk!

So I still have the LRUCache size issue, after having removed the SubLevel index. Once I increased the size to 1 << 16, the tests pass.

The failing tests are the SearchAuthorizationIT class' tests.

What happens is that when I add an entry, I update many elements in the RdnIndex, as I have to modify the nbDescendant in all its parents. As those tests inject a lot of entries, they do a lot of modifications in the RdnIndex.

I checked that all the cursors are correctly closed.

Any clue?
Can you provide the following details:
- on which JDBM table are you having the problem (rdn index, main table)?
- approximately how many modifications are you doing on this table while you are holding a cursor open (even if the cursor is held open legally)? Knowing this number would help a lot.
I'll add some logs to get those numbers.
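To be concrete about what I plan to count, something along these lines would do (hypothetical names, nothing that exists in the code base yet): a small counter attached to the suspected table that only records updates while at least one cursor is open on it.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical instrumentation, not existing ApacheDS/JDBM code.
class TableStats {
    private final AtomicInteger openCursors = new AtomicInteger();
    private final AtomicLong updatesWhileCursorOpen = new AtomicLong();

    void cursorOpened() {
        openCursors.incrementAndGet();
    }

    void cursorClosed() {
        openCursors.decrementAndGet();
    }

    // To be called from the table's add/drop/update paths.
    void recordUpdate() {
        if (openCursors.get() > 0) {
            updatesWhileCursorOpen.incrementAndGet();
        }
    }

    long getUpdatesWhileCursorOpen() {
        return updatesWhileCursorOpen.get();
    }
}

I would call recordUpdate() from the rdn index add/drop paths and dump the counter at the end of each test, so we get the number you are asking for.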

- is this the same problem you had before, or did closing the cursors in the previous case solve your problem?
Yes, absolutely. What I know for sure is that increasing the cache size solved the issue, and that closing the cursors also solved it (the two are not correlated; each fixed the issue on its own). However, those fixes might very well hide some other issue.

I have also run some tests doing concurrent searches and modifications, without any problem.

Keep in mind that the tests are quite specific, as we run them on a direct connection to the server (which is way more stressful for the server, as we can do 25K searches/s), and we do a lot of concurrent operations (as the tests are run in parallel).

This kind of problem can currently occur if a thread is holding a cursor on one table (not necessarily illegally) and the "same" thread is modifying the "same" table with many add/delete/update operations. I am wondering whether we have a use case like this now. If we do, then I can change the code to account for it.
I wonder if this is not what happens. The RdnIndex is used to store the following relationships:
forward index: ParentIdAndRdn -> entryId
reverse index: entryId -> ParentIdAndRdn

When we add or delete an entry, in order to use the rdnIndex to replace the subLevel index, we update all the ParentIdAndRdn elements up to the partition root, to update the nbChildren/nbDescendant fields in each of them. That means we may have many entries modified in both tables for each addition.

In the tests I'm running, we typically add entries like:
ou=0, ou=0, ou=0, ou=tests, ou=system.

For every entry like this one, we will update 5 ParentIdAndRdn elements, doing a drop and an add (sadly, we can't simply replace the element, but that's another story).
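Roughly, the per-add work looks like this sketch (the types and methods are simplified stand-ins, not the real ParentIdAndRdn and index API):

// Simplified stand-ins for the real ParentIdAndRdn and rdn index classes.
class Key {
    String parentId;       // id of the parent entry
    int nbDescendants;     // number of descendants below this entry
}

interface RdnIdx {
    Key reverseLookup(String entryId) throws Exception;   // entryId -> ParentIdAndRdn
    void drop(Key key, String entryId) throws Exception;
    void add(Key key, String entryId) throws Exception;
}

class RdnIndexUpdater {
    // Walk up from the new entry's parent to the partition root, bumping
    // the descendant counter of every ancestor. There is no in-place
    // replace, so each update is a drop followed by an add.
    void incrementAncestorCounts(RdnIdx rdnIndex, String parentId, String rootId) throws Exception {
        String currentId = parentId;

        while (currentId != null) {
            Key key = rdnIndex.reverseLookup(currentId);

            rdnIndex.drop(key, currentId);
            key.nbDescendants++;
            rdnIndex.add(key, currentId);

            if (currentId.equals(rootId)) {
                break;   // stop at the partition root
            }

            currentId = key.parentId;
        }
    }
}

For the ou=0, ou=0, ou=0, ou=tests, ou=system example above, that loop touches 5 ancestors, so 5 drops plus 5 adds for a single added entry.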

As we revert the modifications at the end, we may end up with hundreds of modifications. What I don't get, though, is that the cursors are supposed to be carefully closed. I'll recheck that today.

I will also implement your suggestion (counting the cursor openings and closings, and comparing the counts at the end).
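Something like this is what I have in mind for the counting (hypothetical helper, not existing test code): wrap cursor creation and closing so that the tests can check at tear-down that the two counters match, and dump the allocation sites of anything left open.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: track cursor open/close pairs and report the
// allocation site of anything that leaked.
final class CursorAudit {
    private static final AtomicInteger OPENED = new AtomicInteger();
    private static final AtomicInteger CLOSED = new AtomicInteger();
    private static final Map<Integer, String> OPEN_SITES = new ConcurrentHashMap<Integer, String>();

    static int opened() {
        int id = OPENED.incrementAndGet();
        // Remember who opened this cursor, so a leak can be traced back.
        OPEN_SITES.put(id, Thread.currentThread().getStackTrace()[2].toString());
        return id;
    }

    static void closed(int id) {
        CLOSED.incrementAndGet();
        OPEN_SITES.remove(id);
    }

    // To be called once the tests are done: anything left is a leak.
    static void assertBalanced() {
        if (OPENED.get() != CLOSED.get()) {
            throw new AssertionError("Leaked cursors: " + OPEN_SITES.values());
        }
    }
}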

I'll investigate more today.

I still have a question about this cache: if it's a cache, why do we block waiting for a slot to become available, instead of discarding the oldest entry in the cache? I mean, a cache is never supposed to block; either it provides the required element, or it fetches it from the slow storage, no? Otherwise, if this structure is used to hold temporary elements until they are flushed to disk, and if we can't discard those elements without losing some critical information, then it's not really an LRUCache, and we should find a better name... (just thinking out loud here, I'm not very familiar with all the concepts implemented in JDBM...)
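To illustrate what I would naively expect from something called an LRU cache, a plain Java sketch (obviously not the JDBM implementation, which presumably cannot discard dirty pages before they are flushed to disk):

import java.util.LinkedHashMap;
import java.util.Map;

// A cache that never blocks: when it is full, the least recently used
// entry is silently discarded and will be re-read from disk if needed.
class EvictingLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    EvictingLruCache(int maxEntries) {
        super(16, 0.75f, true);   // access-order, i.e. real LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Drop the oldest entry instead of making the caller wait.
        return size() > maxEntries;
    }
}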

Thanks Selcuk!

--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
