Thanks to Filipe's patch, the replicator now has configurable
concurrency and pipeline length.
ibrowse enforces both limits, returning {error, retry_later} when no
connection has room left in its pipeline.
These limits are set per host/port combination. As such, they are
shared across all concurrent replications that talk to the same host
and port.
In preparation for lowering the concurrency in our production setup, I
was reasoning through potential problems and came upon the following
scenario:
Given
1) low pipeline/concurrency settings, and
2) many concurrent replications to/from the same host,
ibrowse could return a lot of {error, retry_later} responses. In a
particularly nasty/busy scenario this could cause replications to fail,
since couch_rep_reader will only *attempt* to issue a fixed number of
requests (100).
If
(100 requests/replication) * (n concurrent replications) >>>
max_http_sessions * max_http_pipeline_size
then
replications may fail when in fact there are no network errors.
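
To make the inequality concrete, here is a rough Python sketch of the
arithmetic. The function names and the example numbers (2 sessions, a
pipeline of 5, 4 replications) are made up for illustration; only the
100-request figure and the two setting names come from the description
above.

```python
def inflight_capacity(max_http_sessions, max_http_pipeline_size):
    """Requests ibrowse can have in flight to one host/port at once."""
    return max_http_sessions * max_http_pipeline_size


def peak_demand(requests_per_replication, concurrent_replications):
    """Requests the replications may try to issue against that host/port."""
    return requests_per_replication * concurrent_replications


def is_oversubscribed(max_http_sessions, max_http_pipeline_size,
                      concurrent_replications, requests_per_replication=100):
    """True when demand exceeds what the shared connection pool can hold."""
    demand = peak_demand(requests_per_replication, concurrent_replications)
    capacity = inflight_capacity(max_http_sessions, max_http_pipeline_size)
    return demand > capacity


# Hypothetical settings: 2 sessions * pipeline of 5 = 10 slots,
# against 4 replications * 100 requests = 400 attempted requests.
print(is_oversubscribed(2, 5, concurrent_replications=4))
```

With numbers like these, most attempts would get {error, retry_later}
back even though nothing is wrong with the network.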
I propose to make couch_rep_httpc catch {error, retry_later} and treat
it specially. Specifically, it should not decrement the retry count.
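
As a rough model of the proposed behaviour (in Python rather than
Erlang, and not the actual couch_rep_httpc code), the retry loop could
look something like this. The `send` callable, the `pause` argument and
the default retry count are assumptions made for the sketch; whether
the pause should grow exponentially is exactly the open question below.

```python
import time

# Python stand-in for the Erlang tuple {error, retry_later}.
RETRY_LATER = ("error", "retry_later")


def request_with_retries(send, retries=10, pause=0.0, sleep=time.sleep):
    """Call send() until it succeeds or the retry budget is spent.

    send is a hypothetical callable returning ("ok", body) or
    ("error", reason). A retry_later result does not consume a retry.
    """
    while True:
        result = send()
        if result == RETRY_LATER:
            # The pipeline is full, not broken: wait and try again
            # without decrementing the retry count.
            sleep(pause)
            continue
        if result[0] == "error":
            if retries == 0:
                return result  # genuine failure, budget exhausted
            retries -= 1
            continue
        return result
```

Note that in this form a host that answers retry_later forever would
keep the caller looping forever, so some upper bound or backoff is
probably still wanted.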
My questions are:
1) Should there still be an exponential backoff for this retry?
2) Would you be in favor of committing this patch?