On Tue, Dec 20, 2011 at 6:24 PM, Riyad Kalla <[email protected]> wrote: > I've been reading everything I can find on the CouchDB file format[1] and > am getting bits and pieces here and there, but not a great, concrete, > step-by-step explanation of the process. > > I'm clear on the use of B+ trees and after reading a few papers on the > benefits of log-structured file formats, I understand the benefits of > inlining the B+ tree indices directly into the data file as well (locality > + sequential I/O)... what I'm flummoxed about is how much of the B+ tree's > index is rewritten after every modified document. > > Consider a CouchDB file that looks more or less like this: > > [idx/header][doc1, rev1][idx/header][doc1, rev2].... > > After each revised doc is written and the "b-tree root" is rewritten after > that, is that just a modified root node of the B+ tree or the entire B+ > tree? > > The reason I ask is because regardless of the answer to my previous > question, for a *huge* database will millions of records, that seems like > an enormous amount of data to rewrite after every modification. Say the > root node had a fanning factor of 133; that would still be alot of data to > rewrite.
Hi Riyad, Have you observed that in practice? Typically the depth of database btrees is not that high even for millions of documents. For example I have one around with about 10 million documents which doesn't have more than 5 or 6 levels if I recall correctly. So updating a doc, for that particular case, means rewriting 5 or 6 new nodes plus the document itself. Each node is normally not much bigger than 1.2Kb. I've written once a tool to analyze database files which reports btree depths, however it's not updated to work with recent changes on master/1.2.x such as snappy compression and btree sizes: https://github.com/fdmanana/couchfoo It should work with CouchDB 1.1 (and older) database files. > > I am certain I am missing the boat on this; if anyone can pull me out of > the water and point me to dry land I'd appreciate it. > > Best, > R > > > > [1] > -- > http://jchrisa.net/drl/_design/sofa/_list/post/post-page?startkey=%5B%22CouchDB-Implements-a-Fundamental-Algorithm%22%5D > -- http://horicky.blogspot.com/2008/10/couchdb-implementation.html > -- http://blog.kodekabuki.com/post/132952897/couchdb-naked > -- http://guide.couchdb.org/editions/1/en/btree.html > -- http://ayende.com/blog/* (Over my head) -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men."
