On May 25, 2010, at 2:50 AM, Filipe David Manana wrote:

> Hi all,
> 
> I've reworked some implementation details. Namely, the replication
> gen_servers now have an ID that is no longer based on the replication
> document ID but instead on the md5 of the replication properties (source,
> target, etc., as is done currently when we POST to _replicate). This
> avoids having identical replications running, at the expense of slightly
> more complex code.
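
A minimal sketch of deriving an ID from the replication properties instead
of the document ID. It is Python, not the Erlang of the branch, and the
field list (ID_FIELDS) and the JSON serialization are assumptions made for
illustration; the message only says "source, target, etc".

```python
import hashlib
import json

# Hypothetical field list: the message only names "source, target, etc".
ID_FIELDS = ("source", "target", "continuous", "create_target", "filter")


def replication_id(doc):
    """Return an md5 hex digest of the replication properties of doc.

    Fields such as "_id", "_rev" and "state" are ignored, so two
    documents that describe the same replication get the same ID.
    """
    props = {key: doc[key] for key in ID_FIELDS if key in doc}
    # sort_keys gives a stable serialization regardless of key order.
    canonical = json.dumps(props, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


first = {"_id": "rep_1", "source": "foo", "target": "bar"}
second = {"_id": "rep_2", "target": "bar", "source": "foo", "state": "triggered"}
same = replication_id(first) == replication_id(second)  # same replication
```

Because the document ID is not part of the hash, adding the same
replication under a second document ID maps to the same ID.
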
> 
> From the user's point of view, everything is pretty much the same as I
> announced before. The only differences are:
> 
> - when a replication is started by adding a replication document to the
> _replicator DB, the replicator, besides adding the field "state" with value
> "triggered" to the replication document, also adds the field
> "replication_id" (with this field's value, we can access the replication
> log/checkpoint documents, as Adam suggested before).
> 
> - if the user adds a second document that in fact describes a replication
> already triggered by a previous document (same source, target, etc.), this
> second document will not get a "state" field added to it. However, the
> replicator still adds the "replication_id" field to it. This is nice IMO,
> since we can add a view whose keys are the "replication_id" values
> and see which replication documents are duplicates.
> 
> - deleting a duplicated replication document (a document that didn't
> trigger a replication, since a former one already triggered that
> replication) doesn't stop the replication. To stop it, we have to delete the
> document that triggered the replication - we can find it by searching for a
> document with the same "replication_id" and "state" set to "triggered".
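
The duplicate lookup described in these points can be sketched over plain
dicts. This is only an illustration in Python: in CouchDB the grouping would
be a view keyed on "replication_id", and the function names here are
hypothetical.

```python
from collections import defaultdict


def group_by_replication_id(docs):
    """Map each "replication_id" to the list of docs that carry it."""
    groups = defaultdict(list)
    for doc in docs:
        rep_id = doc.get("replication_id")
        if rep_id is not None:
            groups[rep_id].append(doc)
    return dict(groups)


def find_triggering_doc(docs, rep_id):
    """Return the doc that started the replication, or None if not found."""
    for doc in docs:
        if doc.get("replication_id") == rep_id and doc.get("state") == "triggered":
            return doc
    return None


docs = [
    {"_id": "rep_1", "replication_id": "abc", "state": "triggered"},
    {"_id": "rep_2", "replication_id": "abc"},  # duplicate: no "state" field
]
to_delete = find_triggering_doc(docs, "abc")  # the doc whose deletion stops it
```
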
> 
> For more details, check the JavaScript test suite:
> http://github.com/fdmanana/couchdb/blob/new_replicator_db/share/www/script/test/replicator_db.js
> It may be easier to understand the _replicator DB by looking at the tests.
> It's very simple from a user's point of view.
> 
> The whole patch can be found in a new branch at:
> http://github.com/fdmanana/couchdb/compare/new_replicator_db
> 

hmm, 3rd time I've tried to send this...

I've been working on the test cases for the replicator db, to remove wait()
from the tests. I think this will make them more robust as well.

Instead of waiting, I wrote functions that poll: one checks a replication doc
for state == "complete" (or another state), and another waits for the
update_seq of two databases to match.
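
Roughly, the polling idea looks like the sketch below. It is Python rather
than the JavaScript of the test suite, and wait_for, doc_has_state,
seqs_match, fetch_doc and fetch_info are hypothetical names used for
illustration, not functions from the commit.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it is truthy; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def doc_has_state(fetch_doc, doc_id, state):
    """Build a predicate: true once the doc's "state" field equals state."""
    def check():
        doc = fetch_doc(doc_id)
        return doc is not None and doc.get("state") == state
    return check


def seqs_match(fetch_info, db_a, db_b):
    """Build a predicate: true once both databases report the same update_seq."""
    def check():
        return fetch_info(db_a)["update_seq"] == fetch_info(db_b)["update_seq"]
    return check
```

A test would then call something like
wait_for(doc_has_state(fetch_doc, "rep_1", "completed")) and assert on the
result, instead of sleeping for a fixed time.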

There are a couple of places where I had to leave wait() in. These are in spots
with assertions that a particular replication *did stop* when a document is
deleted. So you have to wait and then see if the docs are there or not. I can't
think of a way to test for this otherwise. (Unless maybe active_tasks is
accurate enough to use in these assertions.)

I plan to dig into the meat of the patch soon but wanted to start with the 
tests.

The commit is here: 

http://github.com/jchris/couchdb/tree/fdm/nrd

Thanks for all the hard work, Filipe, and everyone who's giving feedback.

Chris

> Later on I'll add a patch to a Jira ticket.
> 
> cheers
> 
> 
> 
> On Wed, May 19, 2010 at 10:31 AM, Filipe David Manana
> <fdman...@gmail.com> wrote:
> 
>> Dear all,
>> 
>> I've been working on the _replicator DB along with Chris. Some of you have
>> already heard about this DB on the mailing list, IRC, or elsewhere. Its
>> purpose:
>> 
>> - replications can be started by adding a replication document to the
>> replicator DB _replicator (its name can be configured in the .ini files)
>> 
>> - replication documents are basically the same JSON structures that we
>> currently use when POSTing to _replicate/  (and we can give them an
>> arbitrary id)
>> 
>> - to cancel a replication, we simply delete the replication document
>> 
>> - after the replication is started, the replicator adds the field "state"
>> to the replication document with value "triggered"
>> 
>> - when the replication finishes (for non-continuous replications), the
>> replicator sets the doc's "state" field to "completed"
>> 
>> - if an error occurs during a replication, the corresponding replication
>> document will have the "state" field set to "error"
>> 
>> - after an error is detected, the replication is restarted
>> after some time (10s for now, but maybe it should be configurable)
>> 
>> - after a server restart/crash, CouchDB will remember replications and will
>> restart them (this is especially useful for continuous replications)
>> 
>> - in the replication document we can define a "user_ctx" property, which
>> defines the user name and/or role(s) under which the replication will
>> execute
>> 
>> 
>> 
>> Some restrictions regarding the _replicator DB:
>> 
>> - only server admins can add and delete replication documents
>> 
>> - only the replicator itself can update replication documents - this is to
>> avoid having race conditions between the replicator and server admins trying
>> to update replication documents
>> 
>> - the above point implies that to change a replication you have to add a
>> new replication document
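
The three restrictions can be written down as one validation rule. The
sketch below is Python for illustration only (the real rules live in the
design doc's JavaScript), and it assumes that server admins carry the
"_admin" role and that the replicator can be recognised by a "_replicator"
role; the latter is an assumption, not something stated in this thread.

```python
class Forbidden(Exception):
    """Raised when a change to a replication document is not allowed."""


def validate_replication_doc_change(new_doc, old_doc, user_ctx):
    """Raise Forbidden unless user_ctx may make this change.

    old_doc is None when the document is being created; a deletion is
    marked by new_doc["_deleted"] being true.
    """
    roles = user_ctx.get("roles", [])
    is_admin = "_admin" in roles
    is_replicator = "_replicator" in roles  # assumed marker for the replicator
    is_update = old_doc is not None and not new_doc.get("_deleted", False)

    if is_update:
        # Only the replicator itself may update an existing document.
        if not is_replicator:
            raise Forbidden("only the replicator can update replication documents")
        return
    # Remaining cases are additions and deletions.
    if not is_admin:
        raise Forbidden("only server admins can add or delete replication documents")
```

Under this rule an admin who wants different replication properties has to
add a new document, since an in-place update is rejected.
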
>> 
>> All these restrictions are enforced in the replicator DB design doc -
>> http://github.com/fdmanana/couchdb/blob/replicator_db/src/couchdb/couch_def_js_funs.hrl<http://github.com/fdmanana/couchdb/blob/_replicator_db/src/couchdb/couch_def_js_funs.hrl>
>> 
>> 
>> The code is fully working and is located at:
>> http://github.com/fdmanana/couchdb/tree/replicator_db
>> 
>> It includes a comprehensive JavaScript test case.
>> 
>> Feel free to try it and give your feedback. There are still some TODOs as
>> comments in the code, so it's still subject to changes.
>> 
>> 
>> For people more involved with CouchDB internals and development:
>> 
>> That branch breaks the stats.js test and, occasionally, the
>> delayed_commits.js tests.
>> 
>> It breaks stats.js because:
>> 
>> - internally CouchDB uses the _changes API to be aware of the
>> addition/update/deletion of replication documents to/from the _replicator
>> DB. The _changes implementation constantly opens and closes the DB (opens
>> are triggered by a gen_event). This affects the stats open_databases and
>> open_os_files.
>> 
>> It breaks delayed_commits.js occasionally because:
>> 
>> - listening to _replicator DB changes uses an extra file descriptor,
>> which counts against the max_open_dbs config parameter. This parameter
>> limits the max number of user-opened DBs. This causes the error
>> {error, all_dbs_active} (from couch_server.erl) during the execution of
>> delayed_commits.js (as well as stats.js).
>> 
>> I also have another branch that fixes these issues in a "dirty" way:
>> http://github.com/fdmanana/couchdb/tree/_replicator_db  (has a big comment
>> in couch_server.erl explaining the hack)
>> 
>> Basically it doesn't increment stats for the _replicator DB, bypasses
>> max_open_dbs when opening the _replicator DB, and doesn't allow it to
>> be closed in favour of a user-requested DB (as if it had assigned a
>> +infinite LRU time to this DB).
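
The "+infinite LRU time" idea can be illustrated with a small LRU cache that
keeps pinned databases outside the limit. This is a Python sketch under
simplifying assumptions, not the couch_server.erl hack: OpenDbCache is a
hypothetical name, and it always evicts the least recently used user DB
rather than returning all_dbs_active.

```python
from collections import OrderedDict


class OpenDbCache:
    """Track open DBs; pinned DBs never count against max_open."""

    def __init__(self, max_open, pinned=("_replicator",)):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self.pinned = set(pinned)
        self.open_pinned = set()
        self.lru = OrderedDict()  # user DBs, least recently used first

    def open(self, name):
        if name in self.pinned:
            # Bypass the limit: pinned DBs are never evicted.
            self.open_pinned.add(name)
            return
        if name in self.lru:
            self.lru.move_to_end(name)  # mark as most recently used
            return
        if len(self.lru) >= self.max_open:
            self.lru.popitem(last=False)  # close the least recently used DB
        self.lru[name] = True

    def is_open(self, name):
        return name in self.lru or name in self.open_pinned
```

With max_open=2, opening "_replicator", "a", "b" and "c" in that order
closes "a" but leaves "_replicator" open.
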
>> 
>> Sometimes (although very rarely) I also get the all_dbs_active error when
>> the authentication handlers are executing (because they open the _users DB).
>> This is not caused by my _replicator DB code at all, since I get it with
>> trunk as well.
>> 
>> I would also like to collect feedback about what to do regarding these 2
>> issues, especially max_open_dbs. Somehow I feel that no matter how many user
>> DBs are open, it should always be possible to open the _replicator DB
>> internally (and the _users DB).
>> 
>> 
>> cheers
>> 
>> 
>> --
>> Filipe David Manana,
>> fdman...@gmail.com
>> 
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
>> 
>> 
> 
> 
> -- 
> Filipe David Manana,
> fdman...@gmail.com
> 
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
