Re: data recovery tool progress

Adam Kocoloski Tue, 10 Aug 2010 12:06:55 -0700

Good idea.  Now we've got

> [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576 bytes at 1380102
> [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576 bytes at 331526
> [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 331526 bytes at 0
> [info] [<0.33.0>] couch_db_repair writing 12 updates to lost+found/testwritesdb
> [info] [<0.33.0>] couch_db_repair writing 9 updates to lost+found/testwritesdb
> [info] [<0.33.0>] couch_db_repair writing 8 updates to lost+found/testwritesdb
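
For anyone reading the offsets in that output: they are consistent with a scan that walks the file backward from the end in 1 MiB chunks, with a final short chunk at offset 0 (1380102 - 1048576 = 331526). The following is only a minimal Python sketch of that chunking arithmetic, not the couch_db_repair code itself. The function name backward_chunks and the file size 2428678 are assumptions, the latter chosen so that the printed lines match the three scan lines above. The real file may be larger if earlier log lines were trimmed.

```python
CHUNK_SIZE = 1024 * 1024  # 1048576 bytes, the chunk length seen in the log


def backward_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Yield (offset, length) pairs covering [0, file_size), last chunk first.

    Hypothetical helper for illustration; not part of couch_db_repair.
    """
    end = file_size
    while end > 0:
        start = max(0, end - chunk_size)
        yield start, end - start
        end = start


if __name__ == "__main__":
    # 2428678 is an assumed file size, picked to reproduce the logged offsets.
    for offset, length in backward_chunks(2428678):
        print(f"scanning {length} bytes at {offset}")
```

Emitting one line per chunk like this is also the sort of progress feedback requested for the hunting phase.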


Adam

On Aug 10, 2010, at 2:29 PM, Robert Newson wrote:

> It took 20 minutes before the first 'update' line came out, but now it
> seems to be recovering smoothly. Machine load is back down to sane
> levels.
> 
> Suggest feedback during the hunting phase.
> 
> B.
> 
> On Tue, Aug 10, 2010 at 7:11 PM, Adam Kocoloski <kocol...@apache.org> wrote:
>> Thanks for the crosscheck.  I'm not aware of anything in the node finder 
>> that would cause it to struggle mightily with healthy DBs.  It pretty much 
>> ignores the health of the DB, in fact.  Would be interested to hear more.
>> 
>> On Aug 10, 2010, at 1:59 PM, Robert Newson wrote:
>> 
>>> I verified the new code's ability to repair the testwritesdb. System
>>> load was smooth from start to finish.
>>> 
>>> I started a further test on a different (healthy) database and system
>>> load was severe again, just collecting the roots (the lost+found db
>>> was not yet created when I aborted the attempt). I suspect the fact
>>> that it's healthy is the issue, so if I'm right, perhaps a warning is
>>> useful.
>>> 
>>> B.
>>> 
>>> 
>>> 
>>> On Tue, Aug 10, 2010 at 6:53 PM, Adam Kocoloski <kocol...@apache.org> wrote:
>>>> Another update.  This morning I took a different tack and, rather than try 
>>>> to find root nodes, I just looked for all kv_nodes in the file and treated 
>>>> each of those as a separate virtual DB to be replicated.  This reduces the 
>>>> algorithmic complexity of the repair, and it looks like testwritesdb 
>>>> repairs in ~30 minutes or so.  Also, this method results in the lost+found 
>>>> DB containing every document, not just the missing ones.
>>>> 
>>>> My branch does not currently include Randall's parallelization of the 
>>>> replications.  It's still CPU-limited, so that may be a worthwhile 
>>>> optimization.  On the other hand, I think we may be reaching a stage at 
>>>> which performance for this repair tool is 'good enough', and pmaps can 
>>>> make error handling a bit dicey.
>>>> 
>>>> In short, I think this tool is now in good shape.
>>>> 
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>> 
>> 
>>
