Re: Different policy for concurrency access in TDB supporting a single writer and multiple readers

Andy Seaborne Wed, 23 Mar 2011 04:28:50 -0700


On 22/03/11 23:03, Stephen Allen wrote:

Hi Andy,

This is rather long email I'm afraid, but I've split it into two parts: 
Parliament and TxTDB comments:

Very helpful indeed. I'll respond to the TxTDB part here -- thepArliament description is interesting and helpful as to what you plan todo and how you came to your design decisions.

TxTDB -----

It sounds like you may be describing shadow paging [1].  In shadow
paging, the basic idea is that any reader or writer gets a view onto
the data based on reachability from pointers in a particular root
block.  Pages that are reachable from any live root block are never
modified.  A vacuum process is required to collect the space from
blocks that are no longer reachable (the last reader out could do it
too).  Updates to indexes must be treated in roughly the same way as
data pages, because they contain pointers to different data.  If you
wrote each block to disk at commit time the system would belong to
the class of logless transaction systems (although a small log might
be needed to prevent partial updates to the root page pointers).
With a shadow paging system you also would not have to actually copy
the new pages back over top of the old ones.


-Stephen

[1] http://en.wikipedia.org/wiki/Shadow_paging

Sort of shadow paging - the Wikipedia page is describing shadow pagingwhere a new page is allocated with new page id, with the need to chaseback up any datastructre that refers to the page and change that (thelatter could be in-place or shadow'ed recursed). It requires datastructure involvement to chase the references.

I'm thinking of keeping the same page id, and the transactions have acontext that enables them to go to the right place place to find a givenpage. The data structures referring to the block do not change; the"get block 123" step carries out the looking for the right version.

The B+Tree code is already structured around "get block", "put block"operations - one reason this new implementation was done, rather thanpick up an existing one.


http://www.sqlite.org/wal.html
Section "How WAL Works"

which isn't quite the same WAL as Wikipedia either. This WAL is a redolog without change in-place as the transaction progresses.

I hope that one form of compression for new blocks is to use ablock-type compressor so adding a triple to a index is not a copy of thewhole block, but just "shift-insert bytes ABC at offset 123, moving upold bytes". Most changes in a B+Tree are on one block, not causingblock split or rebalancing. That level of compress takes the blockchange down to about "insert triple" size, except there are multipleindexes. TDB does not store a distinguished triple as such although oneindex is considered "primary" but they are all cluster indexes at themoment. Primary is just the one the code assumes exists and falls backon although currently only total indexing is done so that never happens.

It would also be possible to reverse the whole thing - copy old data outof the way and change in-place. That puts the I/O to the database onthe writer critical path, which is random access, whereas redo putsappend+sync on the log as the I/O on the writer critical path.


        Andy

Re: Different policy for concurrency access in TDB supporting a single writer and multiple readers

Reply via email to