This sounds like a good approach. If I get the gist of it, it makes the replication state persistent. We also have a _users db now; is this a good time to think about consolidating and having one _system database?
Good stuff,
Bob

On May 19, 2010, at 5:31 AM, Filipe David Manana wrote:

> Dear all,
>
> I've been working on the _replicator DB along with Chris. Some of you have
> already heard about this DB on the mailing list, IRC, or elsewhere. Its
> purpose:
>
> - replications can be started by adding a replication document to the
> replicator DB _replicator (its name can be configured in the .ini files)
>
> - replication documents are basically the same JSON structures that we
> currently use when POSTing to _replicate/ (and we can give them an
> arbitrary id)
>
> - to cancel a replication, we simply delete the replication document
>
> - after the replication is started, the replicator adds the field "state" to
> the replication document with value "triggered"
>
> - when the replication finishes (for non-continuous replications), the
> replicator sets the doc's "state" field to "completed"
>
> - if an error occurs during a replication, the corresponding replication
> document will have the "state" field set to "error"
>
> - after an error is detected, the replication is restarted
> after some time (10s for now, but maybe it should be configurable)
>
> - after a server restart/crash, CouchDB will remember replications and will
> restart them (this is especially useful for continuous replications)
>
> - in the replication document we can define a "user_ctx" property, which
> defines the user name and/or role(s) under which the replication will
> execute
>
> Some restrictions regarding the _replicator DB:
>
> - only server admins can add and delete replication documents
>
> - only the replicator itself can update replication documents - this is to
> avoid race conditions between the replicator and server admins trying
> to update replication documents
>
> - the above point implies that to change a replication you have to add a new
> replication document
>
> All these restrictions are in the replicator DB design doc -
> http://github.com/fdmanana/couchdb/blob/replicator_db/src/couchdb/couch_def_js_funs.hrl<http://github.com/fdmanana/couchdb/blob/_replicator_db/src/couchdb/couch_def_js_funs.hrl>
>
> The code is fully working and is located at:
> http://github.com/fdmanana/couchdb/tree/replicator_db
>
> It includes a comprehensive JavaScript test case.
>
> Feel free to try it and give your feedback. There are still some TODOs as
> comments in the code, so it's still subject to change.
>
> For people more involved with CouchDB internals and development:
>
> That branch breaks the stats.js test and, occasionally, the
> delayed_commits.js tests.
>
> It breaks stats.js because:
>
> - internally CouchDB uses the _changes API to be aware of the
> addition/update/deletion of replication documents to/from the _replicator
> DB. The _changes implementation constantly opens and closes the DB (opens
> are triggered by a gen_event). This affects the stats open_databases and
> open_os_files.
>
> It breaks delayed_commits.js occasionally because:
>
> - listening to _replicator DB changes uses an extra file descriptor,
> which counts against the max_open_dbs config parameter. This parameter is
> related to the max number of user-opened DBs. This causes the error
> {error, all_dbs_active} (from couch_server.erl) during the execution of
> delayed_commits.js (as well as stats.js).
>
> I also have another branch that fixes these issues in a "dirty" way:
> http://github.com/fdmanana/couchdb/tree/_replicator_db (has a big comment
> in couch_server.erl explaining the hack)
>
> Basically it doesn't increment stats for the _replicator DB, bypasses
> max_open_dbs when opening the _replicator DB, and doesn't allow it to be
> closed in favour of a user-requested DB (as if it had assigned a +infinite
> LRU time to this DB).
>
> Sometimes (although very rarely) I also get the all_dbs_active error when
> the authentication handlers are executing (because they open the _users DB).
> This is not caused by my _replicator DB code at all, since I get it with
> trunk as well.
>
> I would also like to collect feedback about what to do regarding these two
> issues, especially max_open_dbs. Somehow I feel that no matter how many user
> DBs are open, it should always be possible to open the _replicator DB
> internally (and the _users DB).
>
> cheers
>
> --
> Filipe David Manana,
> fdman...@gmail.com
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
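
For anyone skimming the thread, here is a rough Python sketch of the replication document lifecycle described in the quoted proposal. It is only an illustration and not code from the branch: the helper names (make_replication_doc, next_state), the event names, and the example URLs are made up, the "source", "target" and "continuous" fields are assumed to carry over from the existing _replicate request body, and the "name"/"roles" shape of "user_ctx" is likewise an assumption. Only the "state" values and the 10s retry delay come from the proposal itself.

```python
import json

# Retry delay after an error; the proposal says "10s for now".
RETRY_DELAY_SECONDS = 10

# Hypothetical event names mapped to the "state" values named in the proposal.
STATE_FOR_EVENT = {
    "started": "triggered",
    "finished": "completed",
    "failed": "error",
}


def make_replication_doc(source, target, continuous=False, user_ctx=None):
    """Build a replication document shaped like a _replicate request body."""
    doc = {"source": source, "target": target}
    if continuous:
        doc["continuous"] = True
    if user_ctx is not None:
        # Assumed shape: {"name": ..., "roles": [...]}
        doc["user_ctx"] = user_ctx
    return doc


def next_state(doc, event):
    """Return a copy of doc with "state" set as the replicator would set it."""
    if event not in STATE_FOR_EVENT:
        raise ValueError("unknown event: %r" % (event,))
    updated = dict(doc)
    # Only non-continuous replications finish, so a continuous replication
    # document is left unchanged by a "finished" event in this sketch.
    if event == "finished" and doc.get("continuous"):
        return updated
    updated["state"] = STATE_FOR_EVENT[event]
    return updated


# Example: a one-off replication that starts, fails, and would be retried.
doc = make_replication_doc(
    "http://localhost:5984/foo",
    "http://localhost:5984/bar",
    user_ctx={"name": "joe", "roles": ["writer"]},
)
doc = next_state(doc, "started")
doc = next_state(doc, "failed")
print(json.dumps(doc, indent=2))
print("would restart after %d seconds" % RETRY_DELAY_SECONDS)
```

In the proposal only the replicator itself performs these updates; admins add or delete documents, which is why next_state returns a copy rather than editing the caller's document in place.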