On Fri, Aug 19, 2011 at 12:36 PM, Emmanuel Lecharny <[email protected]> wrote:
> On 8/19/11 10:30 AM, Selcuk AYA wrote:
>>
>> On Fri, Aug 19, 2011 at 10:14 AM, Emmanuel Lecharny <[email protected]>
>> wrote:
>>>
>>> On 8/19/11 8:51 AM, Stefan Seelmann wrote:
>>>>
>>>> On Thu, Aug 18, 2011 at 10:41 PM, Selcuk AYA <[email protected]>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>> Today we had some discussion with Alex, Emmanuel and others on how we
>>>>> can improve jdbm consistency semantics. I had spent some time looking
>>>>> into this issue and thought it could be useful to put a summary of my
>>>>> findings here.
>>>>>
>>>>> Currently, jdbm has issues with both concurrency and consistency:
>>>>> 1) jdbm table lookup, insert and remove interfaces are synchronized
>>>>> methods. So even if all the directory server does is lookups on
>>>>> tables, all lookups will be serialized. Moreover, the record manager
>>>>> operations are all synchronized methods too. This means, for example,
>>>>> while sync of dirty pages to disk goes on, no lookup operation can go
>>>>> ahead.
>>>>>
>>>>> 2) jdbm browser interface does not provide any consistency guarantees.
>>>>> If there are underlying changes to the store while the browser is
>>>>> open, then it might return inconsistent results. I think the situation
>>>>> is even worse if the underlying record manager is CacheRecordManager
>>>>> as the same page could be modified and read by a browser concurrently.
>>>>>
>>>>> I have been working on a scheme which introduces what can be defined
>>>>> as action consistency into the jdbm store.
>>>>> 1) Actions are lookup, insert, remove and browse. Each action is
>>>>> assigned a unique version. Actions are ReadWrite or ReadOnly.
>>>>> 2) We allow one ReadWrite action and multiple ReadOnly actions to run
>>>>> concurrently. So synchronized methods will be removed.
>>>>> 3) We introduce a new record manager which caches jdbm B+ pages. Each
>>>>> page in the cache has a version range [startVersion, endVersion).
>>>>> When an action with version V1 wants to read a page, its read can be
>>>>> satisfied from that page's version with V1 >= startVersion && V1 <
>>>>> endVersion.
>>>>> 4) Pages' previous versions are kept in memory. A page can be purged
>>>>> when the minimum version among all active actions is >= endVersion.
>>>>>
>>>>> So say we have three pages in a chain (A0->B0->C0) and each of them
>>>>> has version range [0, infinity). A write action starts and gets the
>>>>> version number 1. It updates B0 and C0 to B1 and C1 in any order.
>>>>> After these two updates, B0 and C0 will have version range [0,1) and
>>>>> B1 and C1 will have version range [1,infinity). Before the write
>>>>> action completes, a read action comes, gets the current read version
>>>>> which is 0 and reads the chain. Since B0 and C0 will be the versions
>>>>> that can satisfy this read, the read only action will read the chain
>>>>> A0->B0->C0. When write action completes, it posts version 1 as the new
>>>>> read version. First read action completes, a second one starts with
>>>>> version 1 and that one will read A0->B1->C1. Since the minimum read
>>>>> version is now 1, B0 and C0 can be zapped.
>>>>
>>>> Here I have a question: How can we detect that the read is finished?
>>>> In the current JDBM implementation the "browse" action can take
>>>> forever, there is no way to tell JDBM that browse is finished (i.e. a
>>>> close() method).
>>
>> that is true. We will need to add a close() to the browse interface
>> and that should tell jdbm that the read finished. Since browse is
>> embedded under cursor and cursor is supposed to be closed at some
>> point ( ? ), this is reasonable I think.
>
> The cursor will be closed when we have read all the entries.
>>>
>>> First, browse will end at some point. The most we can do is to read *all*
>>> the entries from the master table using an index, but once it's done, the
>>> browse will stop.
>>> I wondered yesterday if a persistent search could change
>>> anything but no: the way it's handled is very different, we just register
>>> some listeners in the EventInterceptor, and every modification will trigger
>>> one listener. This is not a browse by any means.
>>>
>>> Now, I guess we will have to store the used revision somewhere (like in the
>>> searchOperationContext), and when we don't have any more elements to send
>>> back to the user, then we can 'close' the browse, releasing the revision.
>>>
>> I assumed that each action (find, insert, remove, browse) is executed by
>> one thread so I thought we can store an actionContext in a thread
>> local variable as a thread enters an action. Version number can be
>> stored in this context. This way, except the close() call we add to
>> the Browse interface, we can keep most of the changes local to jdbm.
>
> Hmmm. The way it works, we execute a search on a single thread, but we also
> associate an operationContext instance which is carried all along the
> filters. Except that this instance is not passed to the JDBM layer. So, yes,
> it's probably a better idea to use a ThreadLocal variable here. Although we
> have to be sure that we don't reuse what we store in this variable.
>
>> With this, an insert implementation within B+tree looks like this for
>> example:
>> beginAction() // initialize action context, get a version number
>> do the insert
>> endAction()
>>
>> for Browse, we might have:
>> Browse()
>> {
>>     beginAction()
>> }
>>
>> close()
>> {
>>     endAction()
>> }
>
> Sounds good.
>
> Do you want us to create a branch to experiment around these ideas?
>
I already cloned the code and am experimenting on it. Will keep you
posted on how it goes.
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>
regards,
Selcuk AYA
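
For anyone following the thread, the three-page example (A0->B0->C0) can be traced with a minimal, single-threaded Java sketch of the version-range idea. Everything named here (VersionedPageCache, load, beginRead, endRead, beginWrite, update, endWrite, read, cachedCopies) is hypothetical and not part of jdbm. Locking and disk I/O are left out, and the action's version is passed explicitly rather than kept in a thread-local action context, so this is only an illustration of the bookkeeping, not the proposed implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative, not jdbm API.
// Single-threaded for clarity; real code needs locking around the bookkeeping.
class VersionedPageCache {
    // One cached copy of a page, visible to actions with start <= V < end.
    static final class PageVersion {
        final String content;
        final long start;
        long end = Long.MAX_VALUE; // stands in for "infinity"

        PageVersion(String content, long start) {
            this.content = content;
            this.start = start;
        }
    }

    private final Map<String, List<PageVersion>> pages = new HashMap<>();
    // read version -> number of open ReadOnly actions pinned to it
    private final TreeMap<Long, Integer> activeReads = new TreeMap<>();
    private long readVersion = 0;   // last version published by a completed write
    private long writeVersion = -1; // version of the in-flight write, or -1

    // Loads the initial copy of a page with range [0, infinity).
    void load(String id, String content) {
        pages.computeIfAbsent(id, k -> new ArrayList<>()).add(new PageVersion(content, 0));
    }

    // Starts a ReadOnly action (lookup or browse) at the current read version.
    long beginRead() {
        activeReads.merge(readVersion, 1, Integer::sum);
        return readVersion;
    }

    // Ends a ReadOnly action, e.g. when a browser's close() is called.
    void endRead(long version) {
        activeReads.computeIfPresent(version, (k, n) -> n == 1 ? null : n - 1);
        purge();
    }

    // Starts the ReadWrite action; only one may be in flight at a time.
    long beginWrite() {
        if (writeVersion != -1) throw new IllegalStateException("a write is already active");
        writeVersion = readVersion + 1;
        return writeVersion;
    }

    // Closes the newest copy's range at the write version and adds the new copy.
    void update(String id, String content) {
        if (writeVersion == -1) throw new IllegalStateException("no active write");
        List<PageVersion> versions = pages.computeIfAbsent(id, k -> new ArrayList<>());
        if (!versions.isEmpty()) versions.get(versions.size() - 1).end = writeVersion;
        versions.add(new PageVersion(content, writeVersion));
    }

    // Publishes the write version as the new read version.
    void endWrite() {
        if (writeVersion == -1) throw new IllegalStateException("no active write");
        readVersion = writeVersion;
        writeVersion = -1;
        purge();
    }

    // Returns the copy whose range contains the given version, or null.
    String read(String id, long version) {
        for (PageVersion p : pages.getOrDefault(id, Collections.emptyList())) {
            if (version >= p.start && version < p.end) return p.content;
        }
        return null;
    }

    // Drops copies whose endVersion is <= the minimum version still in use.
    private void purge() {
        long min = activeReads.isEmpty() ? readVersion : activeReads.firstKey();
        for (List<PageVersion> versions : pages.values()) {
            versions.removeIf(p -> min >= p.end);
        }
    }

    // Number of copies currently held for a page (for inspection only).
    int cachedCopies(String id) {
        return pages.getOrDefault(id, Collections.emptyList()).size();
    }
}
```

Following the scenario in the thread: after load() of A, B and C, a write that begins with version 1 and updates B and C leaves a reader started at version 0 still seeing B0 and C0. Once endWrite() publishes version 1 and the version-0 reader calls endRead(), the old copies are purged and a new reader sees A0->B1->C1. In this sketch the minimum version falls back to the current read version when no reader is open, which is an assumption on my side about how "minimum version among all active actions" should behave when nothing is active.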
