On 22/03/11 23:03, Stephen Allen wrote:
Hi Andy,
This is a rather long email I'm afraid, but I've split it into two
parts: Parliament and TxTDB comments:
Parliament
----------
The unit of MVCC I'm looking at is per quad. This is similar to
PostgreSQL's row-level MVCC. It also means that the WAL is used for
durability guarantees and transaction rollback rather than
concurrency control. The WAL will track statement
insertions/deletions and disk page states on the first modification
after a checkpoint.
Stephen,
On the Parliament design to check I understand ...
When a new quad is added to the store during a transaction, the quad is
written to the WAL. Is anything else written?
The quad needs to be added to the statement list and any new nodes to
the resource table. Some data structure manipulations need to be done
atomically with respect to failure. The data structures are in
memory-mapped files - there is a (small) chance that part of a mapped
file is written back to disk at any time, if the OS decides it needs to
swap out that page. Having just been updated, and so recently used, the
page is quite unlikely to be a candidate, but it's possible.
Does Parliament record the lower-level statement list and resource table
changes? Or is the adding of a quad and the adding of a resource
idempotent, or at least repeatable without reading from the data
structure in some way?
These actions that are repeatable without looking at the original state
are quite nice from a redo-log perspective.
I can see how it might be possible to have datastructures that are
insensitive to corruption under append because the repeat is simply to
put in the right answers without reading the potentially corrupt data.
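
As a rough sketch of that kind of "blind" redo (Python; this is not
Parliament's actual layout - the fixed slot numbering, `RedoRecord` and
`replay` are made up for illustration): each log record names the slot
it writes, so replay overwrites whatever is there without inspecting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    slot: int    # position in the statement list (hypothetical layout)
    quad: tuple  # (graph, subject, predicate, object)


def replay(statements: list, log: list) -> None:
    """Blind redo: store each logged quad in its slot, never reading old contents."""
    for rec in log:
        # Extend the list if the crash lost the tail; only the length is consulted.
        while len(statements) <= rec.slot:
            statements.append(None)
        statements[rec.slot] = rec.quad


log = [
    RedoRecord(0, ("g", "s1", "p", "o1")),
    RedoRecord(1, ("g", "s2", "p", "o2")),
]

# Replay onto an empty store.
clean = []
replay(clean, log)

# Simulated torn write: slot 0 holds garbage and slot 1 never reached disk.
corrupt = [("??", "??", "??", "??")]
replay(corrupt, log)
replay(corrupt, log)  # replaying a second time changes nothing
```

Both stores end up identical, because the replay never depends on what
was in the slots beforehand.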
B+Trees don't have this property - when a tree node splits, three nodes
need writing (left of split, right of split, parent). A change to a
single node is OK because the trees don't support duplicate keys. But
if the tree is only partially written back during a split then the tree
on disk is broken and replay does not help - the replay traverses the
tree assuming it's valid.
The chances of this are small - splits aren't the common change
operation, and updates close in time are likely to be all written or not
written, but it's possible to interrupt the writing in a small window.
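
A toy illustration of the split problem (Python; the page dictionary, the
page ids and the `logical_insert` helper are all assumptions for the
sketch, not TDB or Parliament code): a split changes three pages, only
the left one reaches disk before the crash, and replaying the logged
insert by walking the tree does not bring the moved keys back.

```python
import bisect
import copy


def find_leaf(pages, key):
    """Walk from the root (page 0) to the leaf that should hold key."""
    node = pages[0]
    while not node["leaf"]:
        idx = bisect.bisect_right(node["keys"], key)
        node = pages[node["children"][idx]]
    return node


def lookup(pages, key):
    return key in find_leaf(pages, key)["keys"]


def logical_insert(pages, key):
    """Replay a logged 'insert key' by traversal, trusting the pages as found."""
    leaf = find_leaf(pages, key)
    if key not in leaf["keys"]:
        bisect.insort(leaf["keys"], key)


# On-disk state before the update: a root pointing at one full leaf.
disk = {
    0: {"leaf": False, "keys": [], "children": [1]},
    1: {"leaf": True, "keys": [10, 20, 30, 40]},
}

# Inserting 25 splits leaf 1, so three pages change in memory.
split_pages = {
    1: {"leaf": True, "keys": [10, 20]},                   # left of split
    2: {"leaf": True, "keys": [25, 30, 40]},               # right of split
    0: {"leaf": False, "keys": [25], "children": [1, 2]},  # parent
}

# Case 1: all three pages are written back.
full = copy.deepcopy(disk)
full.update(copy.deepcopy(split_pages))

# Case 2: crash after only the left page was written back.
torn = copy.deepcopy(disk)
torn[1] = copy.deepcopy(split_pages[1])

logical_insert(torn, 25)  # replay the logged insert against the torn tree
```

After the replay the torn tree does contain 25, but 30 and 40 are no
longer reachable: they lived only in the right-hand page, which never
made it to disk, and the replay had no reason to put them back.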
(Side question: the ISWC2009 paper only describes triples - is that a
quads table or a triples-table-per-graph?)
Andy