Hi,

Agreed, there are limitations to the various workarounds in my previous response; 
the only one I'm confident in is disabling compression for these responses 
(for our particular setup only).

So, the behaviour of the _changes endpoint when used with feed=continuous 
and heartbeat=X (where X is a number of milliseconds) is as follows:

1) When _changes is invoked, CouchDB opens its internal "docs in update order" 
btree at the position indicated by the 'since' parameter, or at the start of 
the btree if 'since' is not supplied.
2) For every key/value in the index from that point to the end of the btree, 
a JSON object is written to the HTTP response as a chunk (we might pack 
multiple objects into one chunk, but an object never spans multiple chunks).
3) Once all of those are written to the response, CouchDB waits for 
notifications that new updates have been made (a document create/update/delete 
operation in a separate request), and then writes those updates to the HTTP 
response as chunks too.
4) In the absence of updates, a timeout occurs, which causes us to write the 
heartbeat chunk (which consists of a single newline character); the timer is 
then reset. So, for a database that is receiving no writes, the response 
consists of periodic newlines and nothing else.
5) Steps 3 and 4 could conceivably continue (in any order) indefinitely, until 
a server crash, the client disconnecting, etc.
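To make the wire format above concrete, here is a minimal sketch of how a 
client might parse such a stream: each non-empty line is one JSON change 
object, and a bare newline is a heartbeat. This is my own illustration, not 
code from CouchDB or any client library; the function name and signature are 
invented for the example.

```python
import json

def parse_continuous_feed(chunks, on_heartbeat=None):
    """Parse a feed=continuous _changes stream.

    `chunks` is an iterable of byte strings as received from the wire.
    Each newline-terminated non-empty line is one JSON change object;
    a line consisting of only a newline is a heartbeat.
    Yields decoded change dicts in arrival order.
    """
    buf = b""
    for chunk in chunks:
        # An object never spans chunks on the server side, but an
        # intermediary may still re-split the stream, so buffer anyway.
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if not line.strip():
                # Heartbeat: the single-newline chunk from step 4.
                if on_heartbeat:
                    on_heartbeat()
                continue
            yield json.loads(line)
```

For example, a stream of two changes separated by two idle heartbeats would 
yield two dicts and fire the heartbeat callback twice.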

So, yes, clearly there is no HTTP-compliant solution.

The motivating case for me is that we encountered a user who was particularly 
sensitive to the delayed heartbeats: the delays triggered poor 
connection-handling logic on their side with much greater probability than 
when the 'keep alive' was working. It is ironic, to say the least, that the 
fundamental issue is that CouchDB is, and always has been, dependent on 
something (timely delivery of the partial HTTP chunks in the response) that 
the HTTP spec does not require any client or proxy to honour.


B.

> On 24 Jun 2023, at 06:06, Willy Tarreau <w...@1wt.eu> wrote:
> 
> Hi Robert,
> 
> On Fri, Jun 23, 2023 at 11:33:37PM +0100, Robert Newson wrote:
>> Hi,
>> 
>> I underestimated. the heartbeat option was added back in 2009, 14 years ago,
>> but I don't want to fixate on whether we made this mistake long enough ago to
>> justify distorting HAProxy.
> 
> OK!
> 
>> The CouchDB dev team are discussing this internally at the moment and I'll
>> update this thread if/when any conclusion comes out of that. It was noted in
>> that discussion that PouchDB (https://pouchdb.com/) does the same thing btw.
> 
> Thanks for the link. It's still not very clear to me what the *exact*
> communication sequence is. PouchDB mentions REST so my feeling is that
> the client sends requests that way and the server responds to each
> request, but then that means that responses are using full messages and
> there shouldn't be any delay. Thus it's not very clear to me when the
> heartbeats are sent. Maybe while processing a request ? If that's the
> case, using HTTP/1xx interim responses might work much better. For
> example there used to be the 102 code used to indicate browsers that
> some processing was in progress, though it's not recommended anymore
> to send it to browsers which will simply ignore it. But I'm not sure
> what motivates this "not recommended" given that any 1xx except 101
> would fit since you can send several of them before a final response.
> 
>> Given the nature of the endpoint (it could well return large amounts of
>> highly compressible data) we're not keen to disable compression on these
>> responses as the ultimate fix, though that certainly _works_.
> 
> OK but just keep in mind that even without compression, *some* analysis
> might require buffering a response. For example, some users might very
> well have:
> 
>    http-response wait-for-body
>    http-response deny if { res.body,lua.check_body }
> 
> where "lua.check_body" would be a Lua-based converter meant to verify
> there is no information leak (credit card numbers, e-mail addresses
> etc). And it could even be done using a regex.
> 
> With such a setup, the first 16kB (or more depending on the config) of
> response body will be buffered without being forwarded until the buffer
> is full, the response is complete or a rule matches. Again, with interim
> responses this wouldn't be a problem because these would be forwarded
> instantly.
> 
>> We could do as little as warn in our documentation that timely delivery of
>> heartbeats is not guaranteed and as much as simply ignore the heartbeat
>> request parameter and proceeding as if it were not set (thus the response is
>> a stream of lots of actual data and then, after a short idle timer expires,
>> it terminates cleanly with an empty chunk).
> 
> Maybe but that sounds a bit like giving up on an existing feature.
> 
>> Another thought is we could cause a configurable number of heartbeat chunks
>> to be emitted instead of a single one to overcome any buffering by an
>> intermediary, whether HAProxy or something else.
> 
> I don't think this would be effective. See the example above of buffering
> a response, it could require 16k heartbeats to overcome a default buffer
> analysis. And with compression it could be quite difficult as well, for
> example a zlib-based compressor can work on up to 32kB input to produce
> just a few bytes indicating the repetition length.
> 
>> In brief, we have options to ponder besides altering HAProxy in ways that
>> violate both the letter and spirit of HTTP law.
>> 
>> On reflection, I don't think HAProxy should attempt to fix this problem, but
>> I thank you for holding that out as an option.
> 
> OK. Without knowing more about the protocol and sequence itself, what I
> could suggest is:
>  - if you're only sending heartbeats before delivering a response,
>    HTTP/1xx should probably be the cleanest way to proceed. If
>    requests are sent as POST we could even imagine that emitting them
>    with "expect: 100-continue" allows the server to regularly send 100
>    for example. We could also bring the discussion to the IETF HTTP WG
>    to figure why 102 is being deprecated, and whether some proxies
>    merge multiple 100-continue to a single one.
> 
>  - if multiple responses are being sent as part of a single HTTP
>    response and the heartbeats are placed between them, then there is
>    no direct HTTP-compliant solution to this and you can only rely on
>    network and intermediaries' absence of buffering. In this case we
>    can try to find together how we can improve the situation for you,
>    possibly even by detecting some communication patterns, by using
>    some headers, or by using some explicit configuration. For example
>    we already have "option http-no-delay" that was added a long time
>    ago to avoid the 200ms TCP delay for interactive HTTP messages even
>    though these do not comply with HTTP.
> 
> Thus do not hesitate to let us know.
> 
> Cheers,
> Willy

