On Thu, Dec 22, 2011 at 8:53 PM, Riyad Kalla <[email protected]> wrote: > Randall, > > Spot on; we are on the same page now. I'll go through the post again > tomorrow morning and reword it a bit so the thoughts aren't so fragmented. > > Best, > Riyad > > P.S.> Are there discussions anywhere on alternative file formats for the > indices that the Couch community has considered in the past? (old JIRA or > mailing list discussions) or has this file format been the only approach > from the initial work started in 05/06? > > The reason I ask is because of the index fragmentation you mentioned that > can occur over time... I'd be curious if an in-place-edited format for the > indices as separate file(s) from the append-only document revisions would > yield better performance over time. The splits would still be expensive, > but you'd be able to pre-allocate your blocks for each node on each split > to help a bit. Even leveraging an append-only, pre-allocated approach to > the index might work where split nodes are appended to the end of the index > file with the full node size (e.g. 133 elements) pre-allocated and the > index marked ready for compaction or something. > > So it would be a hybrid approach... when splits aren't needed, you do an > in-place edit to the index on-disk. If a split is needed, you use a > technique similar to the one now where the effected node hierarchy is > appended to the end of the file. > > You still run the risk of costly index rebuilds in the cast of a failed > in-place edit, but you wouldn't run the risk of any data loss... I am just > wondering for large data sets if this would yield some significant benefits > putting Couch somewhere between a Mongo and Couch (performance wise). > > Just pie-in-the-sky thinking... I am sure the senior team here has talked > this stuff to death years ago. My apologies if this is re-treading road > covered in beaten horse bodies ;) > > I just find this category of problems interesting. >
The goal of the view engine refactoring was to encourage people to start asking and answering exactly these types of questions. I got a bit distracted by work stuff since but I've still got a plan and some notes on a blog post about how to write new indexers. Hopefully in time we'll have a wider array of secondary index types than just the m/r style we have currently. > > On Thu, Dec 22, 2011 at 6:10 PM, Randall Leeds <[email protected]>wrote: > >> On Thu, Dec 22, 2011 at 12:11, Riyad Kalla <[email protected]> wrote: >> > On Thu, Dec 22, 2011 at 12:38 PM, Robert Newson <[email protected]> >> wrote: >> > >> >> There are a >> >> few parts of the article that are inaccurate (like the assertion we >> >> have good locality for the id and seq trees. If this were true we >> >> wouldn't have seen such a huge improvement in compaction by >> >> temporarily separating them). >> > >> > >> > I'd look forward to more detail on this... it was my understanding the >> > updates were appended out in the [doc rev][_id idx diff][_seq idx diff] >> > format at the end of the data file. Sounds like I may have misunderstood >> > that? >> > >> >> Riyad, as you pointed out earlier, all the inner nodes are rewritten >> up to the root. The two btrees are not written in parallel, though, >> which means that for deep trees all the updated nodes are written >> before the other tree's nodes are written. Also remember that the >> trees themselves end up pretty fragmented since older nodes that >> haven't changed are back toward the beginning of the file. In general, >> I'm not sure there's much that's useful to mention about locality in >> the trees. Also, updating these trees requires reading the old values, >> so there is still seeking that occurs (if the pages aren't cached by >> the OS). >> >> > >> >> The 'COMPETE recreation' paragraph also >> >> strikes me as factually awry. >> >> >> > >> > I'd appreciate a detailed correction on this if it is wrong; all the >> > digging I've done (in this thread and other partial resources) suggests >> > that the path from the changed doc ref back to the root (including a copy >> > of all internal nodes and all of their child references) is written so as >> > being able to read-back into the index from the tail of the data file >> > quickly... specifically slides 17, 18 and 19 from this slidedeck ( >> > http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed) -- note >> that >> > the interim nodes [A..M] and [A..Z] are rewritten (including any and all >> > child pointers they contained). >> > >> > This is what I was referring to; I should either clean up my wording >> > (confusing) or I got it wrong in which case I'd appreciate any and all >> > corrections. >> >> Right. It mostly seems a bit confusing to me. >> "it DOES NOT just rewrite the nodes pathing from the leaf to the node >> and ONLY the connections for that single document" >> That doesn't sound quite right, but I can tell what you're trying to >> say is accurate. If I'm right, you mean that every changed inner node >> is rewritten in its entirety rather than having a single pointer to >> the new child updated in place. >> >> Cheers. Thanks for taking the time to write this up. >> >> -Randall >>
