Understanding the CouchDB file format

Riyad Kalla Tue, 20 Dec 2011 10:25:12 -0800

I've been reading everything I can find on the CouchDB file format[1] and
am getting bits and pieces here and there, but not a great, concrete,
step-by-step explanation of the process.


I'm clear on the use of B+ trees and after reading a few papers on the
benefits of log-structured file formats, I understand the benefits of
inlining the B+ tree indices directly into the data file as well (locality
+ sequential I/O)... what I'm flummoxed about is how much of the B+ tree's
index is rewritten after every modified document.

Consider a CouchDB file that looks more or less like this:

[idx/header][doc1, rev1][idx/header][doc1, rev2]....

After each revised doc is written and the "b-tree root" is rewritten after
that, is that just a modified root node of the B+ tree or the entire B+
tree?

The reason I ask is because regardless of the answer to my previous
question, for a *huge* database will millions of records, that seems like
an enormous amount of data to rewrite after every modification. Say the
root node had a fanning factor of 133; that would still be alot of data to
rewrite.

I am certain I am missing the boat on this; if anyone can pull me out of
the water and point me to dry land I'd appreciate it.

Best,
R



[1]
-- 
http://jchrisa.net/drl/_design/sofa/_list/post/post-page?startkey=%5B%22CouchDB-Implements-a-Fundamental-Algorithm%22%5D
-- http://horicky.blogspot.com/2008/10/couchdb-implementation.html
-- http://blog.kodekabuki.com/post/132952897/couchdb-naked
-- http://guide.couchdb.org/editions/1/en/btree.html
-- http://ayende.com/blog/* (Over my head)

Understanding the CouchDB file format

Reply via email to