On Aug 12, 2010, at 11:38 PM, Mikeal Rogers wrote:

> I tested the latest code in recover-couchdb and it looks great.

We need to package this so that it is useable by end-users, and put a link to it on http://couchdb.apache.org/notice/1.0.1.html

I'm the last guy who knows what that would mean... anyone?

I think we should do this today. Do we need to do anything formal and time consuming before linking to the recovery tool / process from that page?

Also, someone needs to write up the how-to instructions, along with a description of what to expect.

Chris

>
> -Mikeal
>
> On Thu, Aug 12, 2010 at 2:33 PM, J Chris Anderson <jch...@apache.org> wrote:
>
>>
>> On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:
>>
>>>
>>> On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
>>>
>>>> Right, and jchris' db_repair branch includes my patches for DB reader _admin access and a more useful progress report in the replication phase of the repair.
>>>>
>>>
>>> I've updated the repair branch with everyone's code. I think it is faster, due to Adam's idea that if we run the merges in reverse order, those near the front of the file are more likely to be no-ops, so less work is done overall.
>>>
>>> Mikeal will be testing for correctness. Could others please use it and test for usability as well. Latest code (with instructions) is here:
>>>
>>> http://github.com/jhs/recover-couchdb/
>>>
>>> Which points at http://github.com/jchris/couchdb/tree/db_repair for the repair code.
>>>
>>> One thing I am not clear about (need better docs) is, do we need to replicate the original db to the lost+found db (or vice-versa), after recovery is complete?
>>>
>>
>> Also, we should be clear about what the semantics for this are. It can potentially introduce conflicts if some writes were repeated after restarts. Should it always be a noop on dbs that are clean w/r/t the bug?
>>
>> Chris
>>
>>> Chris
>>>
>>>> Adam
>>>>
>>>> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
>>>>
>>>>> The code is updated with the following changes:
>>>>> 1. Adhere to the lost+found/databasename custom...
>>>>> 2. ...except databases starting with _, which go into _system/databasename
>>>>> 3. Sync up with jchris's db_repair branch
>>>>>
>>>>> (About #2, I started with _/database but I think it's too easy to miss at the command line.)
>>>>>
>>>>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <jch...@gmail.com> wrote:
>>>>>
>>>>>> A few bug reports from my testing:
>>>>>>
>>>>>> I launched with this command, as specified in the README:
>>>>>>
>>>>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
>>>>>>
>>>>>> First of all, it chokes on my _users and _replicator db:
>>>>>>
>>>>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>>>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause, {error,illegal_database_name}}
>>>>>>
>>>>>> That second [error] line is repeated many many times (once per merge I think). I think the issue is that _users is hard-coded to be OK, but _users_lost+found is not. So we should do something about that, maybe if a db-name starts with _ we should call the lost and found a_users_lost+found (_ sorts at the top, so "a" will be near it and legal).
>>>>>>
>>>>>> When a database has readers defined in the security object, the tool is unable to open it (the reading part of the repair tool needs to have the _admin userCtx, not just the writer).
>>>>>>
>>>>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs Names [<<"joe">>] Roles [<<"_admin">>]
>>>>>> escript: exception throw: {unauthorized,<<"You are not authorized to access this db.">>}
>>>>>> in function couch_db:open/2
>>>>>> in call from couch_db_repair:make_lost_and_found/3
>>>>>> in call from recover_couchdb:main/1
>>>>>> in call from escript:run/2
>>>>>> in call from escript:start/1
>>>>>> in call from init:start_it/1
>>>>>> in call from init:start_em/1
>>>>>>
>>>>>> It would also be helpful if the status lines could say something more than
>>>>>>
>>>>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>>>>>
>>>>>> Like maybe add a note like "about 23% complete" if at all possible.
>>>>>>
>>>>>> I will patch the first few, I'd love help from someone on the last one. I'll be on IRC.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>>>>>
>>>>>>>
>>>>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>>>>>
>>>>>>>> Hi, Jason.
>>>>>>>>
>>>>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <j...@couch.io> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <kocol...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Excellent, thanks for testing. I caught Jason Smith saying on IRC that he had packaged the whole thing up as an escript + some .beams. If we can get it down to a single file a la rebar that would be a pretty sweet way to deliver the repair tool in my opinion.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>>>>>
>>>>>>>
>>>>>>> I think it is important that we package and release this, if it is ready.
>>>>>>> We should link to it from the bug description page, the project home page, as well as blog about it, etc. What is the point of working feverishly on a recovery tool if we don't go the last mile?
>>>>>>>
>>>>>>> I am testing it now on my database directory to make sure it doesn't harm anything (I was never subject to the bug, which is probably where most people are, but they might run it anyway.)
>>>>>>>
>>>>>>> As it stands the submodules thing can't be part of the release, we need to package it up as a single zip file or something.
>>>>>>>
>>>>>>> Is there anything else that needs to be done before we can release this?
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>> --
>>>>>>>> Jason Smith
>>>>>>>> Couchio Hosting
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Jason Smith
>>>>> Couchio Hosting
>>>>
>>>
>>
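
The naming convention from Jason's 3:14 PM message (items 1 and 2 of his list) might be sketched roughly as below. This is only an illustration in Python, not the Erlang code in recover-couchdb or the db_repair branch. The function name `lost_and_found_name` is hypothetical, and keeping the leading underscore after the `_system/` prefix is an assumption, since the thread only says "_system/databasename".

```python
def lost_and_found_name(db_name: str) -> str:
    """Return the recovery-target name for db_name (hypothetical helper).

    Assumption taken from the thread's wording: ordinary databases are
    recovered into lost+found/<name>, while databases whose names start
    with "_" (for example _users) go into _system/<name>. Whether the
    real tool keeps the leading underscore is not stated in the thread.
    """
    if not db_name:
        raise ValueError("database name must not be empty")
    if db_name.startswith("_"):
        return "_system/" + db_name
    return "lost+found/" + db_name


if __name__ == "__main__":
    # Print the target name for a few example database names.
    for name in ("bench", "_users", "_replicator"):
        print(name, "->", lost_and_found_name(name))
```

Routing underscore-prefixed names to a separate prefix sidesteps the illegal_database_name error that Chris reported for _users_lost+found, at least under the assumption that the `_system/` names are accepted by the server.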