Re: Derby architecture/design documents

Mike Matrigali 1 Feb 2005 21:44:55 -0000

I am going to defer on page version vs. LSN question, but at least
mention some guesses, not researched info.  You are right
about what exists.  I spoke with some colleagues and the best we can
come up with is that the implementor wanted to separate the page and
the log, in case we ever did a different log format.  I will try to
do some more research here.  I also vaguely remember the implementor
mentioning if we ever wanted to implement the LSN on the page, we
had space to do so.  It may simply have been easier to code the page
versions, since in the current system the LSN is the address in the log
file (which and it may not be available when the code wants to write it
on the page).

As you say in derby all the log records are associated with a page, and
thus have a page version number.  That page version number in the log
is compared with the page version number of the page during redo to
determine if the log record needs to be applied.  This also has helped
us in the past to catch some bugs as we can sanity check during redo
that the page is progressing along the expected path, ie. it is a bug
during redo to be applying a page version 10 log record to page that is
at page version 8.  I haven't seen this sanity check in many years, but
was useful when the product was first coded.

The original requirement for the identifier of a record in cloudscape
was that it must be a unique id, associated with the row, for the
lifetime of the row.  The identifier was meant to survive table
reorganization also.  Originally, cloudscape was being designed as
a object relational system, and the plan was that these record id's
might be used as object id's (or some part of an object id), that might
be exported outside the db.  This use really never came to pass.  For
row level locking, the record identifier is used as the lock, so it is
important that the id never get reused.

In derby the extra level of indirection was added to meet the original
design goals.   The slot table entry could be used if we never reclaimed
a slot table entry, currently in derby the slot table entries are
automatically reclaimed as part of the space management algorithms.
During runtime the system maintains runtime information mapping id's to
slot table entries - but as you say it would be more efficient if the
id pointed directly at the offset table.  It is a space vs. runtime
tradeoff.

Dibyendu Majumdar wrote:

> Mike,
> 
> A couple of questions for you.
> 
> While documenting the page formats, I noticed that Derby does not persist
> the LogInstant (LSN) associated with a page. In a classic ARIES
> implementation, each page is supposed to be tagged with the LSN of the log
> record that last updated it, so that during recovery redo, the LSN can be
> used to determine if a particular log record has already been applied to the
> page.
> 
> Derby appears to use PageVersion for this purpose, which is persisted and is
> also present in log records. Any reason why PageVersion is used and not
> LogInstant?
> 
> Another thing that has puzzled me is the use of RecordIds. I would have
> assumed that the purpose of the slot table in a page is provide a stable
> TupleId (as described in the classic TPCT book by Jim Gray and Andreas
> Reuter), so that there is freedom to move records around in the page, and
> still have a stable TupleId that does not change over time. Also, the
> slot-no allows a particular record to be located efficiently. Given that a
> page in Derby has a slot table, why is the slot-no not used as part of the
> RecordId? Using a separate id is obviously inefficient, as it must be
> searched for.
> 
> Thanks and Regards
> 
> Dibyendu
> 
> 
> 
> 
>

Re: Derby architecture/design documents

Reply via email to