[GitHub] couchdb-couch-replicator pull request #54: Allow configuring maximum documen...

nickva Fri, 03 Feb 2017 15:24:34 -0800

GitHub user nickva opened a pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/54


    Allow configuring maximum document ID length during replication

    Currently, due to a bug in the HTTP parser and the lack of document ID
    length enforcement, large document IDs break replication jobs. Large IDs
    pass through the _changes feed and revs diffs, but then fail during the
    open_revs GET request. The open_revs request keeps retrying until it
    gives up after a long enough time, then the replication task crashes and
    restarts with the same pattern. The current effective limit is around
    8k. (The default buffer size is 8192, and if the first line of the
    request is larger than that, the request fails.)
    
    (See http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
    for more information about the possible failure mechanism).
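
To make the 8k figure concrete, here is a rough Python sketch of the arithmetic. The request-line shape (`revs=true&open_revs=[...]&latest=true`) and the helper names are assumptions for illustration, not taken from the replicator source; only the 8192 default comes from the description above.

```python
import json
from urllib.parse import quote

# Default receive buffer size quoted in the description above.
RECBUF_DEFAULT = 8192


def open_revs_request_line(db, doc_id, revs):
    """Build a hypothetical first line of an open_revs GET request."""
    path = "/" + quote(db, safe="") + "/" + quote(doc_id, safe="")
    query = (
        "revs=true&open_revs="
        + quote(json.dumps(revs), safe="")
        + "&latest=true"
    )
    return f"GET {path}?{query} HTTP/1.1\r\n"


def fits_first_line(line, bufsize=RECBUF_DEFAULT):
    """Return True if the request line fits in a buffer of bufsize bytes."""
    return len(line.encode("ascii")) <= bufsize


if __name__ == "__main__":
    for id_len in (100, 9000):
        line = open_revs_request_line("db", "a" * id_len, ["1-abc"])
        print(id_len, len(line), fits_first_line(line))
```

Because the database name, query string and protocol suffix share the same line, an ID somewhat shorter than 8192 bytes can already push the line over the buffer size, which matches the "around 8k" wording.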
    
    Bypassing the parser bug by increasing the recbuf size will allow
    replication to finish. However, that simply spreads the abnormal document
    through the rest of the system, which might not always be desirable.
    
    Also, once long document IDs have been inserted in the source DB, simply
    deleting them doesn't work, as they'd still appear in the changes feed.
    They'd have to be purged or somehow skipped during the replication step.
    This commit helps do the latter.
    
    Operators can configure the maximum length via this setting:
    ```
      replicator.max_document_id_length=0
    ```
    
    The default value is 0, which means no maximum is enforced. This is the
    backwards-compatible behavior.
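
As a sketch of how the setting and its `0` default could be interpreted, the Python below reads the value from an ini-style snippet. It assumes `replicator.max_document_id_length` maps to a `max_document_id_length` key in a `[replicator]` section, that length is measured in UTF-8 bytes, and that an ID is rejected only when strictly longer than the limit; the value 512 and the helper names are illustrative only.

```python
import configparser

# Hypothetical ini snippet; 512 is an arbitrary example value.
SAMPLE_INI = """
[replicator]
max_document_id_length = 512
"""


def load_max_id_length(ini_text):
    """Read the limit, falling back to 0 (no limit) when it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("replicator", "max_document_id_length", fallback=0)


def is_id_too_long(doc_id, max_len):
    """Return True if doc_id is over the limit; 0 disables the check."""
    if max_len == 0:
        return False
    # Assumption: length is counted in UTF-8 bytes, and only IDs strictly
    # longer than the limit are rejected.
    return len(doc_id.encode("utf-8")) > max_len


if __name__ == "__main__":
    limit = load_max_id_length(SAMPLE_INI)
    print(limit, is_id_too_long("a" * 600, limit))
```

With the setting absent, `load_max_id_length` returns 0 and every ID is accepted, which mirrors the backwards-compatible default described above.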
    
    During replication, if a document's ID exceeds the maximum, that document
    is skipped and an error is written to the log:
    
    ```
    Replicator: document id `aaaaaaaaaaaaaaaaaaaaa...` from source db `http://.../cdyno-0000001/` is too long, ignoring.
    ```
    
    and the `"doc_write_failures"` statistic is incremented.
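
The skip, log and count behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the replicator's actual code: the `Stats` class, the `filter_changes` function, the 21-character preview and the byte-length comparison are all hypothetical.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("replicator_sketch")


@dataclass
class Stats:
    """Hypothetical stand-in for the replication job statistics."""
    doc_write_failures: int = 0


def filter_changes(doc_ids, source_url, max_len, stats):
    """Return the IDs to replicate, skipping and counting over-long ones."""
    kept = []
    for doc_id in doc_ids:
        # max_len == 0 means no limit is enforced.
        if max_len > 0 and len(doc_id.encode("utf-8")) > max_len:
            log.error(
                "Replicator: document id `%s...` from source db `%s` "
                "is too long, ignoring.",
                doc_id[:21],
                source_url,
            )
            stats.doc_write_failures += 1
            continue
        kept.append(doc_id)
    return kept


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    stats = Stats()
    ids = ["a", "b" * 600, "c"]
    print(filter_changes(ids, "http://localhost:5984/db/", 512, stats))
    print(stats.doc_write_failures)
```

Skipped documents never reach the open_revs step, so the job can move past them instead of retrying and crashing.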
    
    COUCHDB-3291

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3291-limit-doc-id-size-in-replicator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch-replicator/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #54
    
----
commit 3ff2d83893481afd68025a52a6d859a2efaf0bcf
Author: Nick Vatamaniuc <[email protected]>
Date:   2017-02-03T23:00:37Z

    Allow configuring maximum document ID length during replication
    
    COUCHDB-3291

----


