[ https://issues.apache.org/jira/browse/COUCHDB-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543211#comment-15543211 ]
ASF GitHub Bot commented on COUCHDB-3168:
-----------------------------------------

GitHub user nickva opened a pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/49

Fix replicator handling of max_document_size when posting to _bulk_docs

Currently the `max_document_size` setting is a misnomer: it actually configures the maximum request body size. For single-document requests that is a good enough approximation. However, `_bulk_docs` updates can fail the total request size check even when every individual document stays below the limit.

Before this fix, the `_bulk_docs` request would crash during replication, which eventually leads to an infinite cycle of crashes and restarts (with a potentially large state dumped to the logs), without the replication job making progress.

The fix is to do a binary split on the batch until either all documents fit under the `max_document_size` limit or individual documents fail to replicate. Documents that fail to replicate bump the `doc_write_failures` count, so `max_document_size` effectively acts as an implicit replication filter in this case.

Jira: COUCHDB-3168

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3168

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch-replicator/pull/49.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #49

----

commit a9cd0b191524428ece0ebd0a1e18c88bb2afcbaa
Author: Nick Vatamaniuc <vatam...@apache.org>
Date:   2016-10-03T19:30:23Z

    Fix replicator handling of max_document_size when posting to _bulk_docs

----


> Replicator doesn't handle well writing documents to a target db which has a
> small max_document_size
> ---------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-3168
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3168
>             Project: CouchDB
>          Issue Type: Bug
>            Reporter: Nick Vatamaniuc
>
> If a target db has a smaller max document size set, replication crashes.
> It might make sense for the replication not to crash and instead treat the
> document size limit as an implicit replication filter, then report doc write
> failures in the stats / task info / completion record of normal replications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
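For readers following along, the binary-split strategy described in the pull request above can be sketched as below. This is a minimal illustration under simplified assumptions, not the actual couch_replicator code: the module name, `post_docs/2`, `batch_size/1`, and `send_bulk_docs/1` are hypothetical, and `term_to_binary/1` only stands in for the size of the JSON-encoded `_bulk_docs` request body.

```erlang
%% Sketch only: split a batch of docs in half until each sub-batch fits
%% under MaxSize, or a single doc alone exceeds the limit, in which case
%% it is counted as a write failure instead of crashing the request.
-module(bulk_split_sketch).
-export([post_docs/2]).

%% Returns {ok, FailureCount}.
post_docs(Docs, MaxSize) ->
    case batch_size(Docs) =< MaxSize of
        true ->
            %% whole batch fits: send a single _bulk_docs request
            send_bulk_docs(Docs),
            {ok, 0};
        false when length(Docs) =:= 1 ->
            %% a lone oversized doc: bump doc_write_failures, do not crash
            {ok, 1};
        false ->
            %% binary split: post each half independently
            {Left, Right} = lists:split(length(Docs) div 2, Docs),
            {ok, F1} = post_docs(Left, MaxSize),
            {ok, F2} = post_docs(Right, MaxSize),
            {ok, F1 + F2}
    end.

%% Stand-in for the size of the encoded request body.
batch_size(Docs) ->
    lists:sum([byte_size(term_to_binary(Doc)) || Doc <- Docs]).

%% Stand-in for the actual _bulk_docs HTTP POST.
send_bulk_docs(Docs) ->
    io:format("posting batch of ~b docs~n", [length(Docs)]).
```

Calling `bulk_split_sketch:post_docs(Docs, MaxSize)` would post sub-batches whose estimated size stays under MaxSize and return the number of skipped documents, corresponding to the `doc_write_failures` bump described in the pull request.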