Re: Derby architecture/design documents

Dibyendu Majumdar 20 Jan 2005 22:08:01 -0000

"Mike Matrigali" wrote:

> I am happy to answer any specific questions about the store module you
> have, or can find someone to answer them if I don't know.  I
> think questions should be directed to the mail list so that everyone
> can share in the discussion.  If I miss something feel free to ping
> me to get an answer.  I am an IBM employee still working on the derby
> codeline and was one of the original developers at Cloudscape when it
> first began as a startup.


Thank you very much, this will really help me to get started.

> Is there any model/standard in apache or opensource that we could follow
> or at least look at for a format for this kind of information.

I have some thoughts on how I want to do this - these are given below.

> I personally really like design information embedded in the actual code
> it relates to, rather than having separate documents.

I think that well documented code is always a good idea. And there is an
argument that documentation closer to the code is more likely to be kept
up-to-date. It is also a great help to anyone fixing the code.

However, separate documents are also very useful. Specially because you can
illustrate concepts and design with nice diagrams and flow charts. It also
enables people to grasp the design more easily, as things can be explained
top-down rather than jumping straight to implementation details.

> Is there any more specific area you are looking at, rather than just
> store.

> I usually divide the store into the following, some overlap:
> o store module interface (java protocol used by other modules to
> interact with store, this is actually fairly well documented in the
> javadoc already available)
> o access methods (btree/heap existing, future might be gist/rtree/...)
> o recovery (both restart and crash recovery, logging, xa recovery,  ...)
> o transactions (locking, isolation levels, xa transactions)
> o data (actual physical storage of rows/columns on pages, page
> management, cache  - the format of rows/columns/pages is reasonably
> documented in the code - I think in the javadoc, but might just be in
> comments)
> o misc (sorting, encryption, ...)

Well, I shall be documenting the whole of it., but will start with a
particular item, and then move on to other items.

I think that documentation should be at two levels.

The more useful level is where ideas and concepts are explained. I am sure
that Cloudscape was built upon ideas and concepts that had been invented
before by others, and perhaps built its own innovations upon those
foundations.
So I want to explore the origins of those ideas, where did the ideas come
from? How did Cloudscape implementation use those ideas? I'd like to give
some examples to illustrate what I mean.

If we take the transaction manager for a start. As far as I understand,
Derby uses the ARIES algorithm for write-ahead logging. What I'd like to
understand and document is how the Derby implementation of ARIES is
different from the published algorithms. Does Derby use nested top-level
actions or sub transactions to manage internal atomic operations? I think
I saw a comment in the code somehere that Derby does not yet
implement the transaction list in checkpoint records. I assume that Derby
does page-oriented redo during recovery redo pass, and then
page-oriented or logical undo during the undo pass.

For the BTREE implementation, I'd like to understand whether Derby
implements ARIES/IM or some other algorithm. How does it handle
structural consistency of the tree during concurrent updates? What are
the locking/latching implications of the algorithm used? Are accesses to
the entire tree or subtree serialised during structural modifications?
How is repeatable read ensured? Does Derby use data locking row or
index row locking? Does Derby delete rows logically first, and
physically afterwards? How does it determine that a deleted row can
be physically removed, ie, the transaction that deleted the row has
committed?

I understand that Derby uses Log Actions to perform both normal
operations and recovery operations. Thus a key insert is first encoded as
a log operation, logged, and then executed. This is very neat.

I also get the impression by reading the code that a BTREE is
implemented on top of a regular table like structure. This is also
very neat as it resuses all the code for handling rows, locking, etc.

These are a few examples, and I could list several more. What would be
very useful is if you and other IBM developers could identify the algorithms
that Derby uses. Perhaps you could provide references to published
research papers, books, and patents. I could start by writing a high level
overview of the concepts used by Derby, and gradually build up the
detail by reading  the code, and by asking questions in this forum.

The second level of documentation can be built from the comments
in the code. This is the implementation level where I can describe how
Derby actually implements something, what the file formats are, how the
write ahead log is implementated, what a conglomerate is, what a
record id and how it is different from slot id, etc.

In terms of starting this level of documentation, I would first like to
concentrate on the "raw" module, which is I suspect is at the heart
of all other modules.

Thanks and Regards

Dibyendu

Re: Derby architecture/design documents

Reply via email to