Hi, Today we had some discussion with Alex, Emmanuel and others on how we can improve jdbm consistency semantics. I had spent sometime looking into this issue and thought it could be useful to put a summary of my findings here.
Currently, jdbm has issues with both concurrency and consistency: 1) jdbm table lookups, insert and remove interfaces are synchronized methods. So even if all the directory server does is to lookups on tables, all lookups will be serialized. Moreover, the record manager operations are all synchronized methods too. This means, for example, while sync of dirty pages to disk goes on, no lookup operation can go ahead. 2) jdbm browser interface does not provide any consistency guarantees. If there are underlying changes to the store while the browser is open, then it might return inconsistent results. I think the situation is even worse if the underlying record manager is CacheRecordManager as the same page could be modified and read by a browser concurrently. I have been working on a scheme which introduces what can be defined as action consistency into the jdbm store. 1) Actions are lookup, insert, remove and browse. Each action is assigned a unique version. Actions are ReadWrite or ReadOnly. 2) We allow one ReadWrite action and multiple ReadOnly actions to run concurrently.So synchronized methods will be removed. 3)We introduce a new record manager which caches jdbm B+ pages. Each page in the cache has a [startVersion, endVersion). When an action with version V1 wants to read a page, its read can be satisfied satisfied from that page's version with V1 >= startVersion && V1 < endVersion. 4) Pages' previous versions are kept in memory. A page can be purged when the minimum version among all active actions is >= endVersion. So say we have three pages in a chain (A0->B0->C0) and each of them has version range [0, infinity). An write action starts and gets the version number 1. It updates B0 and C0 to B1 and C1 in any order. After these two updates, B0 and C0 will have version range [0,1) and and B1 and C1 will have version range [1,infinity). Before the write action completes, a read action comes, gets the current read version which is 0 and reads the chain. Since B0 and C0 will be the versions that can satisfy this read, the read only action will read the chain A0->B0->C0. When write action completes, it posts version 1 as the new read version. First read action completes, a second one starts with version 1 and that one will read A0->B1->C1. Since the minimum read version is now 1, B0 and C0 can be zapped. Concerns: 1)Previous versions of B+ tree pages could consume too much memory. As long as actions are kept small, this is not a problem. Only the browsing action does not follow this rule. There are a couple of options to deal with it. We can maybe spill previous versions to disk after some memory limit. Or we can think about chopping down browsing action into smaller read actions. Another way to deal with this problem would be to keep the previous versions of pages on disk rather than in memory. On disk, versions for a B+ page would form a chain.This is a radically different way of introducing action consistency but I thought this unneccesarily complicates free space management while all we need to do with old versions of pages after a restart is to toss them away. thanks, Selcuk On Thu, Aug 18, 2011 at 11:48 AM, Emmanuel Lecharny (JIRA) <[email protected]> wrote: > > [ > https://issues.apache.org/jira/browse/DIRSERVER-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086892#comment-13086892 > ] > > Emmanuel Lecharny commented on DIRSERVER-1642: > ---------------------------------------------- > > Damn... The more I think about the issue, the more I find it critical. > > If we remove an entry while someone is doing a search, the search will fail. > Also we have some problem during replication, just when we try to replicate > some deletion, and it might prefectly explain why we get those issues. > >> Unexpected behaviour in JdbmIndex >> --------------------------------- >> >> Key: DIRSERVER-1642 >> URL: https://issues.apache.org/jira/browse/DIRSERVER-1642 >> Project: Directory ApacheDS >> Issue Type: Bug >> Reporter: Stefan Seelmann >> Attachments: IndexTest.java >> >> >> During my experiments and tests of removing one-level and sub-level indices >> at least one integration test "SearchAuthorizationIT" failed (the test fails >> recursivelyDelete()). A debugging session showed that the follwing: >> - in recursivelyDelete() multiple search requests are done which leads to >> multiple open cursors in the XDBM search engine >> - an entry is deleted >> - when the open cursors are advanced wrong/unexpected entries are returned >> I was able to create a small test that shows the problem: >> - the index contains six tuples: >> (a,1) >> (b,2) >> (c,3) >> (d,4) >> (e,5) >> (f,6) >> - a cursor over the index is created and advanced two times, the expected >> tuples (a,1) and (b,2) were returned >> - now tuple (c,3) is deleted >> - when the cursor is advanced again the tuple (b,2) is returned again! I had >> expected (d,4). >> Note that this doesn't happen with AvlIndex. > > -- > This message is automatically generated by JIRA. > For more information on JIRA, see: http://www.atlassian.com/software/jira > > >
