nickva opened a new pull request #470: 63012 scheduler
URL: https://github.com/apache/couchdb/pull/470
 
 
   Introduce Scheduling CouchDB Replicator
   
   Jira: https://issues.apache.org/jira/browse/COUCHDB-3324
   
   The core of the new replicator is a scheduler. It allows running a
   large number of replication jobs by periodically switching between them,
   stopping some and starting others. Jobs which fail are backed off
   exponentially. There is also an improved inspection and querying API:
   `_scheduler/jobs` and `_scheduler/docs`.
   
   The replication protocol hasn't changed, so it is still possible to replicate
   between CouchDB 1.x, 2.x, PouchDB, and other implementations of the CouchDB
   replication protocol.
   
   ## Scheduler
   
   The scheduler allows running a large number of replication jobs; it has been
   tested with up to 100k replication jobs in a 3-node cluster. Replication jobs
   are run in a fair, round-robin fashion. Scheduler behavior can be configured
   with these options in the `[replicator]` section (an example configuration
   follows the list):
   
      * `max_jobs` : Number of actively running replications. Making this too
        high could cause performance issues. Making it too low could mean
        replication jobs might not have enough time to make progress before
        getting unscheduled again. This parameter can be adjusted at runtime
        and will take effect during the next rescheduling cycle.
   
      * `interval` : Scheduling interval in milliseconds. During each
        rescheduling cycle the scheduler might start or stop up to `max_churn`
        jobs.
   
      * `max_churn` : Maximum number of replications to start and stop during
        rescheduling. This parameter, along with `interval`, defines the rate
        of job replacement. During startup, however, a much larger number of
        jobs (up to `max_jobs`) could be started in a short period of time.
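   For example, a `[replicator]` section tuning the scheduler might look like
   this (the values below are illustrative, not recommendations):

   ```
   [replicator]
   ; number of replication jobs allowed to run at any one time
   max_jobs = 500
   ; length of a rescheduling cycle, in milliseconds
   interval = 60000
   ; at most this many jobs are started and stopped per cycle
   max_churn = 20
   ```

   With these settings the scheduler would keep up to 500 jobs running at once
   and replace at most 20 of them every 60 seconds.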
   
   ## _scheduler/{jobs,docs} API
   
   There is an improved replication state querying API, with a focus on ease of
   use and performance. The new API avoids having to update the replication
   document with transient state updates; in production those updates can lead
   to conflicts and performance issues. The two new endpoints are (example
   requests follow the list):
   
      * `_scheduler/jobs` : This endpoint shows active replication jobs. These
        are jobs managed by the scheduler. Some of them might be running, some
        might be waiting to run, or backed off (penalized) because they crashed
        too many times. Semantically this is somewhat equivalent to
        `_active_tasks` but focuses only on replications. Jobs which have
        completed, or which were never created because of a malformed
        replication document, will not be shown here as they are not managed by
        the scheduler. Replications started from the `_replicate` endpoint
        (rather than from a document in a `_replicator` db) will also show up
        here.
   
      * `_scheduler/docs` : This endpoint is an improvement over having to go
        back and re-read replication documents to query their state. It
        represents the state of all the replications started from documents in
        `_replicator` dbs. Unlike `_scheduler/jobs`, it will also show jobs
        which have failed or completed (that is, which are no longer managed by
        the scheduler).
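   As a quick illustration, both endpoints can be queried with a plain GET
   request (the admin credentials and host below are placeholders):

   ```
   # List replication jobs currently managed by the scheduler
   curl -s http://adm:pass@localhost:5984/_scheduler/jobs

   # Query the state of replications defined by documents in _replicator dbs
   curl -s http://adm:pass@localhost:5984/_scheduler/docs
   ```

   Both return JSON: `_scheduler/jobs` lists the jobs known to the scheduler,
   while `_scheduler/docs` lists one entry per replication document, including
   failed and completed ones.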
   
   ## Compatibility Mode
   
   Understandably, some customers are using the document-based API to query
   replication states (`triggered`, `error`, `completed`, etc.). To ease the
   upgrade path, there is a compatibility configuration setting:
   
   ```
   [replicator]
   update_docs = false | true
   ```
   It defaults to `false`, but when set to `true` the replicator will keep
   updating replication documents with the state of their replication jobs.
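   When enabled, replication documents get the familiar state fields written
   back into them; for example (a trimmed, hypothetical document, with field
   values made up for illustration):

   ```
   {
     "_id": "my-rep",
     "source": "http://localhost:5984/source_db",
     "target": "http://localhost:5984/target_db",
     "continuous": true,
     "_replication_state": "triggered",
     "_replication_state_time": "2017-04-05T20:05:51+00:00",
     "_replication_id": "aa1bb2cc3..."
   }
   ```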
   
   
   ## Other Improvements
   
    * Network resource usage and performance were improved by implementing a
      common connection pool. This should help in cases with a large number of
      connections to the same sources or targets. Previously, connection pools
      were shared only within a single replication job.
   
    * Improved rate limiting handling. Replicator requests will auto-discover
      rate limit capacity on targets and sources using an Additive Increase /
      Multiplicative Decrease (AIMD) feedback control algorithm (a sketch of
      the general idea follows this list).
   
    * Improved performance by avoiding repeatedly retrying failing replication
      jobs; exponential backoff is used instead. In a large multi-user cluster,
      quite a few replication jobs are invalid, crashing, or failing for
      various reasons (such as an inability to checkpoint to the source,
      mismatched credentials, or missing databases). Penalizing failing
      replications frees up system resources for more useful work.
   
    * Improved recovery from long but temporary network failures. Currently, if
      replication jobs fail to start 10 times in a row, they are not retried
      anymore. This is sometimes desirable, but in some scenarios (for example,
      after a sustained DNS failure which eventually recovers), replications
      reach their retry limit and cease to work, and previously it required
      operator intervention to continue. The scheduling replicator never gives
      up retrying a valid scheduled replication job, so it should recover
      automatically.
   
    * Better handling of filtered replications: failing to fetch filters could
      block the couch replicator manager and lead to message queue backups and
      memory exhaustion. Also, when replication filter code changes, the
      replication should be updated accordingly (the replication job ID should
      change in that case). This PR fixes both of those issues: filters and
      replication ID calculation are handled in separate workers (in the couch
      replicator doc processor), and when filter code changes on the source,
      replication jobs will restart and re-calculate their new ID
      automatically.
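   The AIMD mechanism mentioned above is a general feedback-control pattern
   rather than anything CouchDB-specific. A minimal Python sketch of the idea
   (not the PR's actual Erlang implementation; the class name and constants are
   made up for illustration):

```
class AIMDLimiter:
    """Additive Increase / Multiplicative Decrease rate control.

    Tracks a request budget for a single source or target endpoint.
    Successful responses grow the budget slowly (additive increase);
    a rate-limited response (e.g. HTTP 429) cuts it sharply
    (multiplicative decrease).
    """

    def __init__(self, step=1.0, factor=0.5, floor=1.0, ceiling=100.0):
        self.budget = floor      # current allowed request rate/concurrency
        self.step = step         # additive increment on success
        self.factor = factor     # multiplicative cut on rate limiting
        self.floor = floor
        self.ceiling = ceiling

    def on_success(self):
        # Probe for more capacity, one small step at a time.
        self.budget = min(self.ceiling, self.budget + self.step)

    def on_rate_limited(self):
        # Back off quickly when the remote server pushes back.
        self.budget = max(self.floor, self.budget * self.factor)
```

   This is the same pattern TCP congestion control uses to converge on a
   sustainable rate; applied per endpoint, it lets replication traffic settle
   at whatever rate the remote server can handle.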
   
   ## Related PR:
   
   Documentation PR:  WIP
   
   ## Tests
   
    * EUnit test coverage. Some modules, such as multidb_changes, have close to
      100% coverage; some have less. All previous replication tests have been
      updated to work with the new scheduling replicator.
   
    * Except for one modification, existing JavaScript tests pass.
   
    * Additional integration tests. To validate, test, and benchmark some edge
      cases, an additional toolkit was created
      (https://github.com/cloudant-labs/couchdyno/blob/master/README_tests.md)
      which allows testing scenarios where nodes fail, creating large numbers
      of replication jobs, manipulating cluster configurations, and setting up
      long-running (soak) tests.
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
