Re: Multiple database backup strategy

Adam Kocoloski Sat, 19 Mar 2016 13:42:49 -0700

Hi Bob, comments inline:

> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <rnew...@apache.org> wrote:
> 
> Hi,
> 
> The problem is that _db_updates is not guaranteed to see every update, so I 
> think it falls at the first hurdle.


Do you mean to say that a listener of _db_updates is not guaranteed to see 
every updated *database*? I think it would be helpful for the discussion to 
describe the scenario in which an updated database permanently fails to show up 
in the feed. My recollection is that it’s quite byzantine.

> What couch_replicator_manager does in couchdb 2.0 (though not in the version 
> that Cloudant originally contributed) is to us ecouch_event, notice which are 
> to _replicator shards, and trigger management work from that.

Did you mean to say “couch_event”? I assume so. You’re describing how the 
replicator manager discovers new replication jobs, not how the jobs discover 
new updates to source databases specified by replication jobs. Seems orthogonal 
to me unless I missed something.

> Some work I'm embarking on, with a few other devs here at Cloudant, is to 
> enhance the replicator manager to not run all jobs at once and it is indeed 
> the plan to have each of those jobs run for a while, kill them (they 
> checkpoint then close all resources) and reschedule them later. It's TBD 
> whether we'd always strip feed=continuous from those. We _could_ let each job 
> run to completion (i.e, caught up to the source db as of the start of the 
> replication job) but I think we have to be a bit smarter and allow 
> replication jobs that constantly have work to do (i.e, the source db is 
> always busy), to run as they run today, with feed=continuous, unless forcibly 
> ousted by a scheduler due to some configuration concurrency setting.

So I think this is really the crux of the issue. My contention is that 
permanently occupying a socket for each continuous replication with the same 
source and mediator is needlessly expensive, and that _db_updates could be an 
elegant replacement.
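
To make that concrete, here is a rough sketch of the idea, not a description of how the replicator actually works. A mediator holds a single connection to the source's continuous _db_updates feed (one JSON object per line, blank lines as heartbeats) and turns each event for a database it cares about into a one-shot replication request. The helper names databases_to_sync and replication_request, and the example URLs, are hypothetical. Only the event fields "db_name" and "type" and the "source"/"target" keys of a POST /_replicate body are assumed from CouchDB. If the feed can miss updates, as Bob suggests, something like this would still need a periodic sweep over all watched databases.

```python
import json


def databases_to_sync(feed_lines, watched):
    """Return watched database names that have a 'created' or 'updated'
    event in feed_lines, each listed once, in first-seen order.

    feed_lines: iterable of raw lines from a continuous _db_updates feed.
    watched:    set of database names this mediator replicates.
    """
    pending = []
    seen = set()
    for line in feed_lines:
        line = line.strip()
        if not line:
            continue  # heartbeat newline, nothing to parse
        event = json.loads(line)
        name = event.get("db_name")
        if event.get("type") not in ("created", "updated"):
            continue  # e.g. "deleted": nothing to replicate
        if name in watched and name not in seen:
            seen.add(name)
            pending.append(name)
    return pending


def replication_request(source_url, target_url, db_name):
    """Build the JSON body for a one-shot (non-continuous) POST /_replicate.

    source_url and target_url are hypothetical server base URLs.
    """
    return {
        "source": f"{source_url}/{db_name}",
        "target": f"{target_url}/{db_name}",
    }


if __name__ == "__main__":
    # Sample lines shaped like a continuous _db_updates feed.
    sample = [
        '{"db_name": "orders", "type": "updated", "seq": "1"}',
        "",
        '{"db_name": "scratch", "type": "updated", "seq": "2"}',
        '{"db_name": "orders", "type": "updated", "seq": "3"}',
    ]
    for db in databases_to_sync(sample, {"orders", "users"}):
        print(replication_request("http://server1:5984", "http://server2:5984", db))
```

The point of the sketch is that the cost is one feed connection per source server rather than one open socket per replicated database. Repeated events for the same database collapse into a single one-shot replication.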

> I note for completeness that the work we're planning explicitly includes 
> "multi database" strategies, you'll hopefully be able to make a single 
> _replicator doc that represents your entire intention (e.g, "replicate _all_ 
> dbs from server1 to server2”).

Nice! It’ll be good to hear more about that design as it evolves, particularly 
in aspects like discovery of newly created source databases and reporting of 
403s and other fatal errors.

Adam
