[
https://issues.apache.org/jira/browse/COUCHDB-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299284#comment-14299284
]
Nathan Vander Wilt commented on COUCHDB-1682:
---------------------------------------------
Since I'm logged in and sullying the bug tracker anyway, I will note that this
is a final nail in the coffin for scaling filtered changes/replication:
1. A handful of users/topics/channels/whatever all trying to filter a database
full of changes
2. The filter workers are all re-processing the same documents in the context
of a different request, slowing each other down
3. Load grows a bit more, and these queries start timing out.
4. Guess what, the client still wants the data, so it retries…
So now we have a situation! Instead of making forward progress, the system
pretty much gets into an escalating loop of "everyone's job times out 75%
through, so they all start back at the beginning" until the clients' backoff
interval [if there is any] is long enough to reduce the load to a point where a
few can get to a checkpoint.
Workaround: implement your own changes feed logic in yet more middleware, atop a
`local_seq:true` view that pre-sorts the documents into suitable channels.
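A rough sketch of what that workaround might look like, not a tested recipe. It assumes each document carries a `channels` array (a hypothetical field name), that sequence numbers are plain integers as in CouchDB 1.x, and that `local_seq` is set in the design document's top-level `options` object. The names `DESIGN_DOC`, `by_channel_seq` and `channel_query` are made up for illustration.

```python
import json
from urllib.parse import urlencode

# Hypothetical design document. With the local_seq option enabled, the map
# function can read doc._local_seq and emit it as part of the key, so rows
# come out pre-sorted by channel and then by sequence.
DESIGN_DOC = {
    "_id": "_design/channels",
    "options": {"local_seq": True},  # assumed placement: ddoc-level options
    "views": {
        "by_channel_seq": {
            "map": (
                "function (doc) {"
                "  (doc.channels || []).forEach(function (c) {"
                "    emit([c, doc._local_seq], null);"
                "  });"
                "}"
            )
        }
    },
}


def channel_query(channel, since, limit=100):
    """Build view query parameters for rows in `channel` with seq > `since`."""
    return urlencode({
        # Integer seqs assumed, so since + 1 is the first unseen sequence.
        "startkey": json.dumps([channel, since + 1]),
        # In view collation an object sorts after any number, so {} acts
        # as an upper bound for this channel.
        "endkey": json.dumps([channel, {}]),
        "limit": limit,
    })
```

The middleware would append this query string to the view URL, hand the rows to the client, and remember the highest sequence seen as the next `since`. Map functions are not run for deleted documents, so deletions would need separate handling.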
> Allow filtered _changes to time out, returning last_seq processed
> -----------------------------------------------------------------
>
> Key: COUCHDB-1682
> URL: https://issues.apache.org/jira/browse/COUCHDB-1682
> Project: CouchDB
> Issue Type: Improvement
> Reporter: Nathan Vander Wilt
>
> Right now a filtered _changes query ?since=0 on a database with a high
> update_seq can take a very long time to return. If this request is performed
> through a proxy or through a browser with a timeout, it may never complete as
> far as the client is concerned.
> Right now CouchDB itself ignores any polling timeout for such a request —
> i.e. it does not time out while the _changes results are still processing.
> This is okay, as it at least lets patient clients get a result.
> I propose, though, that the timeout value be respected during the "initial"
> (e.g. in the context of a fresh replication) request. When the timeout is
> reached, the client should get back a valid response, with incomplete (even
> empty!) results and a last_seq corresponding to how far it had processed
> changes in the background. Then the client/replicator could record a
> checkpoint and request processing of the next batch.
> The net result would be that the initial replication request would not be
> unbounded in time. Even if a response is "timed out" by a proxy/browser
> within 30 seconds or 5 minutes, a client that is aware of this limit could
> set a slightly lower timeout and get back a last_seq that keeps it from
> having to (futilely) try again from since=0.
> Unfortunately, this does slightly change the semantics of the query: it is as
> if limit=0 when the client provided no (or a different) limit and may be
> expecting last_seq to ± match current_seq for such a request. So perhaps this
> behaviour would need to be enabled by its own query parameter, ?batch=please
> or something.
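The batching behaviour proposed in the description can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not CouchDB code: `filtered_changes`, `replicate` and the `budget` parameter are hypothetical, and `budget` (a cap on rows examined) stands in for the timeout.

```python
def filtered_changes(changes, keep, since=0, budget=1000):
    """Simulate one bounded filtered _changes request.

    `changes` is a list of (seq, doc) pairs in ascending seq order.
    At most `budget` rows are examined before the response is returned.
    """
    if budget < 1:
        raise ValueError("budget must be positive")
    results = []
    last_seq = since
    examined = 0
    for seq, doc in changes:
        if seq <= since:
            continue  # already covered by an earlier checkpoint
        if examined >= budget:
            break  # "timeout" reached: return what we have so far
        examined += 1
        if keep(doc):
            results.append({"seq": seq, "id": doc["_id"]})
        last_seq = seq  # progress is recorded even when nothing matched
    return {"results": results, "last_seq": last_seq}


def replicate(changes, keep, budget):
    """Client loop: record last_seq as a checkpoint after every batch."""
    checkpoint, matched = 0, []
    update_seq = changes[-1][0] if changes else 0
    while checkpoint < update_seq:
        batch = filtered_changes(changes, keep, since=checkpoint, budget=budget)
        matched.extend(row["id"] for row in batch["results"])
        checkpoint = batch["last_seq"]
    return checkpoint, matched
```

The point of the sketch is that a batch with empty `results` still advances `last_seq`, so a retry resumes from the checkpoint rather than from since=0.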