GitHub user nickva opened a pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/49

    Fix replicator handling of max_document_size when posting to _bulk_docs

Currently the `max_document_size` setting is a misnomer; it actually configures
the maximum request body size. For single-document requests this is a good enough
approximation. However, `_bulk_docs` updates could fail the total request-size
check even if every individual document stays below the maximum limit.
    
Before this fix, a `_bulk_docs` request during replication would crash, which
eventually leads to an infinite cycle of crashes and restarts (with a
potentially large state being dumped to the logs), without the replication job
making progress.
    
The fix is to do a binary split on the batch size until either all documents
fit under the `max_document_size` limit, or some documents fail to replicate.
    
If documents fail to replicate, they bump the `doc_write_failures` count.
Effectively, `max_document_size` acts as an implicit replication filter in this
case.
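
A minimal sketch of the bisection idea in Erlang, assuming a hypothetical
`post_bulk_docs/2` helper that stands in for the actual POST to `_bulk_docs`
and returns `{error, request_body_too_large}` when the request body is
rejected (this is not the actual couch_replicator code):

    %% Post a batch of docs; on a "request too large" rejection, split the
    %% batch in half and retry each half. A single document that is itself
    %% over the limit is skipped and counted as a write failure.
    post_batch(_Db, []) ->
        {ok, 0};
    post_batch(Db, [_] = Batch) ->
        case post_bulk_docs(Db, Batch) of
            ok ->
                {ok, 0};
            {error, request_body_too_large} ->
                %% One doc alone exceeds max_document_size: skip it and
                %% report one write failure.
                {ok, 1}
        end;
    post_batch(Db, Batch) ->
        case post_bulk_docs(Db, Batch) of
            ok ->
                {ok, 0};
            {error, request_body_too_large} ->
                {Left, Right} = lists:split(length(Batch) div 2, Batch),
                {ok, FailedL} = post_batch(Db, Left),
                {ok, FailedR} = post_batch(Db, Right),
                {ok, FailedL + FailedR}
        end.

The returned count of skipped documents is what would feed the
`doc_write_failures` statistic mentioned above.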
    
    Jira: COUCHDB-3168

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3168

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch-replicator/pull/49.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #49
    
----
commit a9cd0b191524428ece0ebd0a1e18c88bb2afcbaa
Author: Nick Vatamaniuc <vatam...@apache.org>
Date:   2016-10-03T19:30:23Z

    Fix replicator handling of max_document_size when posting to _bulk_docs
    
    Currently the `max_document_size` setting is a misnomer; it actually configures
    the maximum request body size. For single-document requests this is a good enough
    approximation. However, `_bulk_docs` updates could fail the total request-size
    check even if every individual document stays below the maximum limit.
    
    Before this fix, a `_bulk_docs` request during replication would crash, which
    eventually leads to an infinite cycle of crashes and restarts (with a
    potentially large state being dumped to the logs), without the replication job
    making progress.
    
    The fix is to do a binary split on the batch size until either all documents
    fit under the `max_document_size` limit, or some documents fail to replicate.
    
    If documents fail to replicate, they bump the `doc_write_failures` count.
    Effectively, `max_document_size` acts as an implicit replication filter in this
    case.
    
    Jira: COUCHDB-3168

----

