Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for 
change notification.

The "ReleaseNotice1.0.0RepairTool" page has been changed by JanLehnardt:
http://wiki.apache.org/couchdb/ReleaseNotice1.0.0RepairTool?action=diff&rev1=1&rev2=2

Comment:
add repair tool page

   5. Start CouchDB 1.0.1.
   6. Wait until CouchDB says "Time to Relax.".
    6.1. CouchDB will have switched the config to `recovery = false`.
-   6.2. CouchDB will 
+   6.2. CouchDB will have created `/lost\_and\_found\_$dbname\_$timestamp` for every db that has orphaned data.
+  7. Go to `/lost\_and\_found\_$dbname\_$timestamp` and see if any lost data is found.
+  8. Replicate `/lost\_and\_found\_$dbname\_$timestamp` to `/$dbname` to restore all data.
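+ The replication in step 8 can be driven with CouchDB's standard `_replicate` endpoint. The database names and timestamp below are placeholders for illustration only; use the actual `lost\_and\_found\_$dbname\_$timestamp` name CouchDB created:

```shell
# Merge a recovered lost-and-found db back into the original database.
# "lost_and_found_mydb_1234567890" and "mydb" are placeholder names.
curl -X POST http://127.0.0.1:5984/_replicate \
     -H "Content-Type: application/json" \
     -d '{"source": "lost_and_found_mydb_1234567890", "target": "mydb"}'
```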
  
+ 
+ 
+ = A generalized repair algorithm =
+ 
+ 
+ == The basic repair algorithm: ==
+ 
+     for Node in BTreeNodes
+       run the compactor to create a lost and found database file that treats Node as a root
+     use the replicator to merge all the lost and found database files into one database
+     Done!
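+ As a rough illustration only (not CouchDB code), the loop above can be modelled in Python: a node table stands in for the database file, a reachability walk stands in for the compactor, and a set union stands in for the replicator.

```python
# Toy model of the repair loop. `nodes` maps a node's file offset to
# (docs stored at that node, offsets of its child nodes).

def compact_from_root(nodes, root):
    """'Compact' from one root: collect every doc reachable from it
    into a lost-and-found database, modelled here as a set."""
    found, stack = set(), [root]
    while stack:
        offset = stack.pop()
        docs, children = nodes[offset]
        found.update(docs)
        stack.extend(children)
    return found

def repair(nodes, roots):
    """Run one compaction per root, then 'replicate' (merge) all the
    lost-and-found databases into a single database."""
    merged = set()
    for root in roots:
        merged |= compact_from_root(nodes, root)
    return merged
```

+ Under this model, orphaned subtrees simply show up as extra roots, and the merge step deduplicates docs reachable from more than one root.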
+ 
+ It is an optimization to prune the Nodes in BTreeNodes so that there is the minimum number of them without losing any data. If we did this for every single node in the original file, we'd end up with the same final result as if we did it optimally; the difference is that we'd use exponentially more disk space in the process.
+ 
+ [Another space-saving optimization would be to keep only one resident lost+found db around, have one active l+f db per compaction, and replicate the active one into the resident one after each compaction. I don't know how hard it'd be to find the minimal set of nodes to apply the other optimization, so this might be a reasonable alternative to keep disk usage in bounds, although not as optimal -- Jan]
+ 
+ == Finding only the btree nodes we need: ==
+ 
+ It is hard to tell roots apart from other nodes. The way to do that is to work backwards through the file, keeping track of the pointers in nodes. If a node matches a pointer you've already seen, you don't need to capture it. If a node has not been pointed to, capture it. This gives you all the roots.
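+ A toy sketch of that backward scan (illustrative Python; the flat list below is a stand-in, not CouchDB's actual on-disk format). Because CouchDB files are append-only, a node's children always precede it in the file, so any pointer we have seen while walking backwards came from a later node:

```python
def find_roots(file_nodes):
    """file_nodes: (offset, child_offsets) pairs in file order.
    Walking backwards, a node that no later node points to is a root."""
    pointed_to = set()
    roots = []
    for offset, children in reversed(file_nodes):
        if offset not in pointed_to:
            roots.append(offset)   # nothing later points here: a root
        pointed_to.update(children)
    return roots
```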
+ 
+ 
+ This means we can also keep track of the pointers that headers point to. If a root is pointed to by any header, we can assume that any roots later in the file descending from that header take its state into account. Therefore any root pointed to by a header can be disregarded.
+ 
+ When we are done, we will have a list of roots that are not reflected by the last valid header in the file (some of these may be after the last valid header), **nor are they reflected by later roots in the same "lineage"**. This is the set of nodes that we need to use as the starting point for compactions.
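+ Combining the two rules, the scan might look like the following toy Python (entry kinds, names, and layout are illustrative, not CouchDB's on-disk format; a header is written after the nodes it points to, so its pointers are already known by the time the backward walk reaches those nodes). As noted below, this can still collect roots that later roots supersede:

```python
def candidate_roots(entries):
    """entries: (kind, offset, pointers) tuples in file order,
    where kind is 'node' or 'header'. A node is a candidate root only
    if no later node and no header points to it."""
    pointed_by_node = set()
    pointed_by_header = set()
    roots = []
    for kind, offset, pointers in reversed(entries):
        if kind == "header":
            pointed_by_header.update(pointers)
            continue
        if offset not in pointed_by_node and offset not in pointed_by_header:
            roots.append(offset)
        pointed_by_node.update(pointers)
    return roots
```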
+ 
+ **The part in bold is false and means we will collect unneeded roots.** I'm currently looking into a way to distinguish superseded roots from final roots.
+ 
+ We will end up with as many "lost and found" database files as we have roots here. Once we have them, we can begin the compact-and-replicate routine.
+ 
