Re: haproxy indefinitely delays the delivery of small http chunks with slz

2023-06-28 Thread Robert Newson
Hi,

Great, thanks for this. Will "option http-no-delay" be required to activate 
this particular tweak
or is that general advice? We'll certainly mention it in our reverse proxy 
documentation either way.
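
For that documentation, a minimal sketch of the sort of example we might show
(names, addresses and timeouts are illustrative only, not taken from this thread):

defaults
    mode http
    timeout client 1m
    timeout connect 5s
    timeout server 1m

frontend fe_couchdb
    bind :8000
    default_backend be_couchdb

backend be_couchdb
    # keep small chunks (e.g. _changes heartbeats) moving promptly
    option http-no-delay
    # compression can stay enabled for the bulk of the response
    compression algo gzip
    compression type application/json text/plain
    server couch1 127.0.0.1:5984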

B.

> On 28 Jun 2023, at 04:44, Willy Tarreau  wrote:
> 
> Hi Robert,
> 
> On Tue, Jun 27, 2023 at 01:19:20PM +0100, Robert Newson wrote:
>> Hi,
>> 
>> I'm happy to confirm the two patches combined address the symptom I reported
>> at the start of the thread. I applied them to haproxy.git master after
>> confirming that the problem occurred there for a realistic setup (couchdb
>> with HAProxy in front configured to do compression).
> 
> Excellent, thanks! I'll merge them both to the libslz project and to
> haproxy.
> 
>> The CouchDB project are considering adding a WebSocket option for this
>> endpoint in light of the re-realisation that we've been living in HTTP sin
>> this whole time.
> 
> Yes that would be nice indeed! I don't know if you could benefit from
> this, but in addition with websocket you'd get fully interactive and
> bidirectional communication, which may allow the client to send extra
> requests or interrupt the processing etc. Also WS will work both over
> HTTP/1 and HTTP/2 and might permit coalescing multiple connections
> into a single one with multiple streams if there's any benefit in doing
> this.
> 
>> Your patches are most welcome as they mean users can keep doing what they've
>> always been doing and can upgrade HAProxy without having to make any change.
> 
> Yeah, I'll ensure we can backport them so that it continues to work
> transparently. Please just remind your users that they should be using
> "option http-no-delay" for what they're doing. It's very possible it
> will improve latency for them even on older versions, and may avoid
> similar issues in the future.
> 
>> In the long term CouchDB will work towards providing an alternative method
>> that doesn't depend on the timely delivery of partial messages.
>> 
>> Thank you again for your efforts, it is very much appreciated.
> 
> You're welcome. It's always a pleasure to be able to improve the code
> base to cover some real-world limitations, especially when this grants
> more time to address these limitations cleanly for the long term. It's
> always better than stacking ugly emergency workarounds :-)
> 
> Cheers,
> Willy




Re: haproxy indefinitely delays the delivery of small http chunks with slz

2023-06-27 Thread Robert Newson
Hi,

I'm happy to confirm the two patches combined address the symptom I reported at 
the start of the thread. I applied them to haproxy.git master after confirming 
that the problem occurred there for a realistic setup (couchdb with HAProxy in 
front configured to do compression).

The CouchDB project are considering adding a WebSocket option for this endpoint 
in light of the re-realisation that we've been living in HTTP sin this whole 
time.

Your patches are most welcome as they mean users can keep doing what they've 
always been doing and can upgrade HAProxy without having to make any change. In 
the long term CouchDB will work towards providing an alternative method that 
doesn't depend on the timely delivery of partial messages.

Thank you again for your efforts, it is very much appreciated.

B.

> On 26 Jun 2023, at 18:50, Willy Tarreau  wrote:
> 
> Hi Robert,
> 
> On Sat, Jun 24, 2023 at 09:48:31PM +0100, Robert Newson wrote:
>> Hi,
>> 
>> That sounds great, much appreciated. I'll be available all week to test any
>> patches you might propose.
> 
> I gave it a try. There was already a flush call in the data block
> processing (I don't know why, to be honest, I'd rather condition it to
> the low latency, but let's not mix things for now). So I implemented a
> flush() operation for slz and added a call to it in haproxy.
> 
> I'd be interested if you could test the two attached patches. They're
> for 2.9-dev but will likely apply to 2.8 and probably even a number of
> older releases since this part doesn't change often. My tests were
> fairly limited, I just verified that compressing a large file (200 MB)
> directly with slz while injecting flushes after each read() continues
> to produce valid data, and that the regtests are still OK (a few
> of them do use the compression).
> 
> Thanks,
> Willy
> <0001-IMPORT-slz-implement-a-synchronous-flush-operation.patch><0002-WIP-compression-slz-support-just-a-pure-flush.patch>




Re: haproxy indefinitely delays the delivery of small http chunks with slz

2023-06-24 Thread Robert Newson
Hi,

That sounds great, much appreciated. I'll be available all week to test any 
patches you might propose.

B.

> On 24 Jun 2023, at 21:35, Willy Tarreau  wrote:
> 
> Hi Robert,
> 
> On Sat, Jun 24, 2023 at 08:39:22PM +0100, Robert Newson wrote:
>> So, the behaviour of the _changes endpoint when used with the feed=continuous
>> and heartbeat=X (where X is a number of milliseconds) is as follows:
>> 
>> 1) When _changes is invoked, CouchDB opens its internal "docs in update
>> order" btree at the position indicated by the 'since' parameter, or the start
>> of the btree if 'since' is not supplied.
>> 2) For every key/value in the index from that point until the end of the
>> btree, a JSON object is written to the HTTP response as a chunk (we might
>> pack multiple objects in one chunk but an object never spans multiple
>> chunks).
>> 3) Once all of those are written to the response, CouchDB then waits for
>> notifications that new updates have been made (a document
>> create/update/delete operation in a separate request), and then writes those
>> updates to the HTTP response as chunks too.
>> 4) In the absence of updates, a timeout occurs, which causes us to write the
>> heartbeat chunk (which consists of a single newline character). The timer is
>> then reset. So, for a database that is receiving no writes, the response
>> would consist of periodic newlines and nothing else.
>> 5) Steps 3 and 4 could conceivably continue (in any order) indefinitely,
>> until a server crash or the client disconnects, etc.
> 
> OK so that's clearly a long-polling type of processing.
> 
>> So, yes, clearly there is no HTTP-compliant solution.
> 
> Indeed.
> 
>> The motivating case for me is we encountered a user that was particularly
>> sensitive to the delayed heartbeats, in that it triggered poor connection
>> handling logic on their side with much greater probability than if the 'keep
>> alive' was working. It is ironic, to say the least, that the fundamental
>> issue is that CouchDB is, and always has been, dependent on something (timely
>> delivery of the partial http chunks in the response) that the HTTP spec does
>> not require any client or proxy to honour.
> 
> Indeed, but in parallel that matches stuff we've already met in the
> past, and while we're not particularly trying to optimize for such
> use cases, we try to remain transparent enough so that these continue
> to work well enough.
> 
> I think we should study an option. Since we already have "option
> http-no-delay" for such use cases, we should probably make the
> compression trigger a systematic flush when the option is set. This
> would follow the spirit of this option and extend it to compression.
> 
> I have not looked at the compression code for a while so I'll first
> need to check if we can act on the queue to forcefully flush it, but
> with that granted, the rest could possibly be simple enough and easy
> to setup. I'll try to check that next week.
> 
> Willy




Re: haproxy indefinitely delays the delivery of small http chunks with slz

2023-06-24 Thread Robert Newson
Hi,

Agreed, there are limitations to the various workarounds in my previous response; 
the only one that I'm confident in is disabling compression for these responses 
(for our particular setup only).

So, the behaviour of the _changes endpoint when used with the feed=continuous 
and heartbeat=X (where X is a number of milliseconds) is as follows:

1) When _changes is invoked, CouchDB opens its internal "docs in update order" 
btree at the position indicated by the 'since' parameter, or the start of the 
btree if 'since' is not supplied.
2) For every key/value in the index from that point until the end of the btree, 
a JSON object is written to the HTTP response as a chunk (we might pack 
multiple objects in one chunk but an object never spans multiple chunks).
3) Once all of those are written to the response, CouchDB then waits for 
notifications that new updates have been made (a document create/update/delete 
operation in a separate request), and then writes those updates to the HTTP 
response as chunks too.
4) In the absence of updates, a timeout occurs, which causes us to write the 
heartbeat chunk (which consists of a single newline character). The timer is 
then reset. So, for a database that is receiving no writes, the response would 
consist of periodic newlines and nothing else.
5) Steps 3 and 4 could conceivably continue (in any order) indefinitely, until 
a server crash or the client disconnects, etc.
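
To make steps 2-4 concrete, a rough sketch of a request and the resulting
stream (host, database name and sequence values are placeholders):

curl -N 'http://127.0.0.1:5984/db/_changes?feed=continuous&heartbeat=10000'

{"seq":"1-...","id":"doc1","changes":[{"rev":"1-..."}]}
{"seq":"2-...","id":"doc2","changes":[{"rev":"1-..."}]}

followed by a bare newline roughly every 10 seconds while the database is idle
(curl's -N just disables its own output buffering).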

So, yes, clearly there is no HTTP-compliant solution.

The motivating case for me is we encountered a user that was particularly 
sensitive to the delayed heartbeats, in that it triggered poor connection 
handling logic on their side with much greater probability than if the 'keep 
alive' was working. It is ironic, to say the least, that the fundamental issue 
is that CouchDB is, and always has been, dependent on something (timely 
delivery of the partial http chunks in the response) that the HTTP spec does 
not require any client or proxy to honour.


B.

> On 24 Jun 2023, at 06:06, Willy Tarreau  wrote:
> 
> Hi Robert,
> 
> On Fri, Jun 23, 2023 at 11:33:37PM +0100, Robert Newson wrote:
>> Hi,
>> 
>> I underestimated. The heartbeat option was added back in 2009, 14 years ago,
>> but I don't want to fixate on whether we made this mistake long enough ago to
>> justify distorting HAProxy.
> 
> OK!
> 
>> The CouchDB dev team are discussing this internally at the moment and I'll
>> update this thread if/when any conclusion comes out of that. It was noted in
>> that discussion that PouchDB (https://pouchdb.com/) does the same thing btw.
> 
> Thanks for the link. It's still not very clear to me what the *exact*
> communication sequence is. PouchDB mentions REST so my feeling is that
> the client sends requests and the server responds to each
> request, but then that means that responses are using full messages and
> there shouldn't be any delay. Thus it's not very clear to me when the
> heartbeats are sent. Maybe while processing a request ? If that's the
> case, using HTTP/1xx interim responses might work much better. For
> example there used to be the 102 code used to indicate to browsers that
> some processing was in progress, though it's not recommended anymore
> to send it to browsers which will simply ignore it. But I'm not sure
> what motivates this "not recommended" given that any 1xx except 101
> would fit since you can send several of them before a final response.
> 
>> Given the nature of the endpoint (it could well return large amounts of
>> highly compressible data) we're not keen to disable compression on these
>> responses as the ultimate fix, though that certainly _works_.
> 
> OK but just keep in mind that even without compression, *some* analysis
> might require buffering a response. For example, some users might very
> well have:
> 
>http-response wait-for-body
>http-response deny if { res.body,lua.check_body }
> 
> where "lua.check_body" would be a Lua-based converter meant to verify
> there is no information leak (credit card numbers, e-mail addresses
> etc). And it could even be done using a regex.
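
For illustration only (not from this thread), such a "lua.check_body" converter
might be registered from Lua roughly like so:

core.register_converters("check_body", function(body)
    -- hypothetical check: flag anything that looks like an e-mail address
    return body:match("[%w%.%-_]+@[%w%.%-]+%.%a%a+") ~= nil
end)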
> 
> With such a setup, the first 16kB (or more depending on the config) of
> response body will be buffered without being forwarded until the buffer
> is full, the response is complete or a rule matches. Again, with interim
> responses this wouldn't be a problem because these would be forwarded
> instantly.
> 
>> We could do as little as warn in our documentation that timely delivery of
>> heartbeats is not guaranteed and as much as simply ignore the heartbeat
>> request parameter and proceed as if it were not set (thus the response is
>> a stream

Re: haproxy indefinitely delays the delivery of small http chunks with slz

2023-06-23 Thread Robert Newson
Hi,

I underestimated. The heartbeat option was added back in 2009, 14 years ago, 
but I don't want to fixate on whether we made this mistake long enough ago to 
justify distorting HAProxy.

The CouchDB dev team are discussing this internally at the moment and I'll 
update this thread if/when any conclusion comes out of that. It was noted in 
that discussion that PouchDB (https://pouchdb.com/) does the same thing btw.

Given the nature of the endpoint (it could well return large amounts of highly 
compressible data) we're not keen to disable compression on these responses as 
the ultimate fix, though that certainly _works_.

We could do as little as warn in our documentation that timely delivery of 
heartbeats is not guaranteed and as much as simply ignore the heartbeat request 
parameter and proceed as if it were not set (thus the response is a stream 
of lots of actual data and then, after a short idle timer expires, it 
terminates cleanly with an empty chunk).

Another thought is we could cause a configurable number of heartbeat chunks to 
be emitted instead of a single one to overcome any buffering by an 
intermediary, whether HAProxy or something else.

In brief, we have options to ponder besides altering HAProxy in ways that 
violate both the letter and spirit of HTTP law.

On reflection, I don't think HAProxy should attempt to fix this problem, but I 
thank you for holding that out as an option.


B.

> On 23 Jun 2023, at 16:08, Willy Tarreau  wrote:
> 
> Hi Robert,
> 
> On Fri, Jun 23, 2023 at 12:04:14PM +, Robert Newson wrote:
>> Hi Willy,
>> 
>> Thank you for this response. The behaviour in CouchDB is ancient (12 years
>> plus, essentially since before the 1.0 release), and yes it is clearly a bit
>> naughty, though it has also worked up to this point for us.
> 
> That's exactly the problem with extensions that only work due to the
> tolerance of some components: one day they suddenly break on tiny
> changes. It's sad to learn that it was created 12 years ago because
> back then the problems caused by such practices started to be well
> known among HTTP implementors. But I know pretty well that it's always
> difficult to find information about potential future problems during
> the design phase :-/
> 
>> The reason I raised this here is because it seemed inadvertent given the
>> bisect, but I completely accept that this isn't something you should "fix" in
>> HAProxy as it is not broken behaviour. You are also of course right that many
>> other things could cause the same issue; they just don't happen to in our
>> particular setup.
> 
> I perfectly understand. That's also why I said that maybe we should try
> to think about an option to systematically flush compressed data, or
> preferably to figure if anything in your server's responses could allow
> to automatically disable compression.
> 
>> We'll take this onboard and think of next steps. A simple option is to
>> disable compression on the responses to these requests, since we can easily
>> identify which would be affected. All other endpoints, many of which send
>> chunked responses, make no assumptions about the timing of the individual
>> chunks (just as long as you get them all in the right order).
> 
> OK, that's good news already. As a side note, keep in mind that chunks
> are just an on-wire delimitation at the connection level and have no
> meaning for the protocol nor the transported data, and as such they're
> not supposed to be forwarded 1-to-1. In practice most intermediaries
> will proceed as recommended, which is to advertise the known length of
> pending data, so the output data will appear re-chunked to buffer-size
> chunks. But of course no reordering is permitted. The real goal of
> chunks is only to indicate the end of a body whose initial size was not
> known at header time, without closing the connection. That's why chunks
> are usually a whole buffer in size.
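
For illustration, the same bytes can legally leave an intermediary re-chunked,
e.g. three small chunks coalesced into one (sizes in hex, CRLF shown literally):

as sent by the origin:  5\r\nline1\r\n 5\r\nline2\r\n 5\r\nline3\r\n 0\r\n\r\n
after coalescing:       f\r\nline1line2line3\r\n 0\r\n\r\n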
> 
> Do not hesitate to let us know if the issue prevents users from upgrading
> so that we can try together to figure a backportable solution to offer
> them a smoother transition.
> 
> Best regards,
> Willy




Re: haproxy indefinitely delays the delivery of small http chunks with slz

2023-06-23 Thread Robert Newson
Hi Willy,

Thank you for this response. The behaviour in CouchDB is ancient (12 years 
plus, essentially since before the 1.0 release), and yes it is clearly a bit 
naughty, though it has also worked up to this point for us.

The reason I raised this here is because it seemed inadvertent given the 
bisect, but I completely accept that this isn't something you should "fix" in 
HAProxy as it is not broken behaviour. You are also of course right that many 
other things could cause the same issue; they just don't happen to in our 
particular setup.

We'll take this onboard and think of next steps. A simple option is to 
disable compression on the responses to these requests, since we can easily 
identify which would be affected. All other endpoints, many of which send 
chunked responses, make no assumptions about the timing of the individual 
chunks (just as long as you get them all in the right order).

Regards,
Robert Newson

On Fri, 23 Jun 2023, at 11:14, Willy Tarreau wrote:
> Hi Robert,
>
> On Fri, Jun 23, 2023 at 11:01:30AM +0100, Robert Newson wrote:
>> Hi,
>> 
>> We use HAProxy in front of Apache CouchDB. CouchDB has an endpoint with some
>> interesting characteristics called _changes. With certain parameters, that
>> are commonly used, the response is effectively endless, streaming the updates
>> to a database as they happen using chunked transfer mode. In periods where no
>> changes occur a 'heartbeat' is periodically sent to keep the connection
>> alive. This heartbeat is small, consisting of a single newline character.
>> Each heartbeat is sent as a complete chunk.
>> 
>> We found that HAProxy 2.6, if slz is compressing the response, will only pass
>> on these chunks to the client when three have arrived from the backend. The
>> delay in receiving these chunks at the client causes them to time out. If we
>> build with USE_ZLIB=1 instead, the problem does not occur and the heartbeat chunks are
>> sent in a timely manner.
>
> This can depend on a lot of factors. In the past we used to see ZLIB
> buffer up to 32 or 256 kB (I don't remember which one to be honest)
> without delivering anything if the content was repetitive, for example;
> this was optimal, but slz is not capable of doing this.
>
>> We recently upgraded from 2.0.31 to 2.6.13 which introduced this unexpected
>> change. We eventually tracked this down to a change in libslz. With 2.0.31 we
>> were building libslz 1.0.0. With 2.6.13 we used the in-tree version. We
>> bisected haproxy from 2.0 to 2.6 and found that the switch to in-tree slz was
>> the trigger point between good and bad behaviour. We then further bisected
>> libslz between v1.0.0 and b06c172 (the point at which it was merged in-tree).
>> This pointed at
>> http://git.1wt.eu/web?p=libslz.git;a=commit;h=b334e7ad57727cdb0738b135bb98b65763d758b5
>> as the exact moment the delay was introduced.
>
> Interesting, previously it would buffer at most 3 bytes and now at most
> 7. SLZ doesn't keep input context so you have guarantees that pushing
> more input will eventually produce some output. However it will of
> course be slower or faster depending on how large and how compressible
> the input contents are.
>
>> In summary, we think HAProxy should not indefinitely buffer a complete chunk
>> of a chunked transfer encoding just because it is below the level deemed
>> worthy of compression and should instead send it to the client in a timely
>> manner.
>
> But you're aware that what you're asking for is a direct violation of
> basic HTTP messaging rules stating that no agent may depend on chunk
> delivery due to anything along the chain possibly having to buffer some
> of the data for analysis or transformation. I just found it now, it's
> RFC9110#7.6:
>
>   https://www.rfc-editor.org/rfc/rfc9110
>
>   An HTTP message can be parsed as a stream for incremental processing or
>   forwarding downstream. However, senders and recipients cannot rely on
>   incremental delivery of partial messages, since some implementations will
>   buffer or delay message forwarding for the sake of network efficiency,
>   security checks, or content transformations.
>
> Note: this text was already in RFC7230 9 years ago so it's not something
> new.
>
>> This affects clients using HTTP/1.1 or HTTP/2. Note that Apache CouchDB
>> only supports HTTP/1.1.
>> 
>> We can work around this by disabling compression for this endpoint, though
>> this is not our preference.
>
> In fact, with many components, even without compression you can face issues.
> For example the TCP stack by default will refrain from sending incomplete
> frames for up to 200ms, me

haproxy indefinitely delays the delivery of small http chunks with slz

2023-06-23 Thread Robert Newson
Hi,

We use HAProxy in front of Apache CouchDB. CouchDB has an endpoint with some 
interesting characteristics called _changes. With certain parameters, that are 
commonly used, the response is effectively endless, streaming the updates to a 
database as they happen using chunked transfer mode. In periods where no 
changes occur a 'heartbeat' is periodically sent to keep the connection alive. 
This heartbeat is small, consisting of a single newline character. Each 
heartbeat is sent as a complete chunk.
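
For reference, each heartbeat on the wire is just about the smallest possible
chunk, something like this (CRLF shown literally):

1\r\n      chunk size: one byte of payload
\n\r\n     the payload (a single newline) followed by the chunk's trailing CRLF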

We found that HAProxy 2.6, if slz is compressing the response, will only pass 
on these chunks to the client when three have arrived from the backend. The 
delay in receiving these chunks at the client causes them to time out. If we 
build with USE_ZLIB=1 instead, the problem does not occur and the heartbeat chunks are 
sent in a timely manner.

We recently upgraded from 2.0.31 to 2.6.13 which introduced this unexpected 
change. We eventually tracked this down to a change in libslz. With 2.0.31 we 
were building libslz 1.0.0. With 2.6.13 we used the in-tree version. We 
bisected haproxy from 2.0 to 2.6 and found that the switch to in-tree slz was 
the trigger point between good and bad behaviour. We then further bisected 
libslz between v1.0.0 and b06c172 (the point at which it was merged in-tree). 
This pointed at 
http://git.1wt.eu/web?p=libslz.git;a=commit;h=b334e7ad57727cdb0738b135bb98b65763d758b5
 as the exact moment the delay was introduced.
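
For the record, the libslz half of that bisect was just the usual procedure,
roughly as follows (the test script is ours and is shown only as a hypothetical
name):

# from a libslz checkout
git bisect start
git bisect bad  b06c172                      # delayed-heartbeat behaviour observed
git bisect good v1.0.0                       # heartbeats delivered promptly
git bisect run ./check-heartbeat-delay.sh    # exits 0 when heartbeats arrive on time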

In summary, we think HAProxy should not indefinitely buffer a complete chunk of 
a chunked transfer encoding just because it is below the level deemed worthy of 
compression and should instead send it to the client in a timely manner.

This affects clients using HTTP/1.1 or HTTP/2. Note that Apache CouchDB only 
supports HTTP/1.1.

We can work around this by disabling compression for this endpoint, though this 
is not our preference.

Regards,
Robert Newson
Apache CouchDB PMC




Re: regression? scheme and hostname logged with %r with 2.6.13

2023-06-07 Thread Robert Newson
Hi,

Yeah, I addressed this with "%HM %HPO%HQ %HV", which looks right in my logs under 
some light testing, but I will check the pathq option also.
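
In case it helps anyone else, a trimmed sketch of the idea (not our full format;
surrounding fields reduced to a minimum):

# rebuild a relative request line from its parts instead of logging %r,
# so HTTP/2 requests no longer show the absolute URI
log-format "%ci:%cp [%tr] %ft %b/%s %ST %B \"%HM %HPO%HQ %HV\""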

B.

> On 7 Jun 2023, at 22:39, Lukas Tribus  wrote:
> 
> Hello,
> 
> 
> Yes, H2 behaves very differently, due to protocol differences but also
> due to other changes. In the beginning H2 was only implemented in the
> frontend and every transaction was downgraded to HTTP/1.1 internally.
> This was later changed to an internal generic "HTX" representation
> that allowed the protocol stack to be unified.
> 
> 
> To return URIs in relative form in the logs, I guess you could
> reconstruct the string manually with pathq (untested):
> \"%HM%[pathq]%HV\" as opposed to %r
> 
> 
> pathq is an HTTP sample fetch designed to do exactly this, always returning
> the URI in relative format:
> 
> http://docs.haproxy.org/2.6/configuration.html#7.3.6-pathq
> 
> 
> Not sure what %HU does; I assume it refers to the URL, not pathq.
> 
> 
> 
> I agree that doc updates are needed at least in section "8.2.3. HTTP
> log format" and "8.2.6. Custom log format".
> 
> 
> 
> Lukas




re: regression? scheme and hostname logged with %r with 2.6.13

2023-06-07 Thread Robert Newson
Hi,

Figured this out (my reply might not be threaded, the mailing list daemon 
doesn't add me after I confirm my subscription)

It was 
https://github.com/haproxy/haproxy/commit/30ee1efe676e8264af16bab833c621d60a72a4d7
 in haproxy 2.1 that caused this change. It's deliberate but the documentation 
wasn't updated to match.

I found this by bisecting between 2.0 and 2.1 after noticing that only HTTP/2 
requests were being logged this way.

B.


regression? scheme and hostname logged with %r with 2.6.13

2023-06-05 Thread Robert Newson
Hi,

I've recently upgraded some hosts from 2.0.31 to 2.6.13. I have a custom 
log-format that matches the standard httplog but with some extra items on the 
end.

Notably I log %{+Q}r 

In 2.0.31 this is as the documentation describes: it's the request line "GET 
/foo HTTP/1.1". The docs for 2.6.13 (and everything in between) say the same, 
and all the examples show this too.
and all the examples show this too.

However, I've noticed that what's logged now includes the full URI (which is 
obviously not present in the request line transmitted by the client), e.g. "GET 
https://somewhere.com/foo HTTP/1.1"

The code for LOG_FMT_REQ in 2.0.31 and 2.6.13 looks the same to me, so I am 
inferring that txn->uri used to get some processing prior to logging that no 
longer happens.

(I've also tried %{+Q}HU and get the scheme and hostname with that too.)

I have a very complicated config so I'm not sure what parts are relevant to 
share.

Has anyone else seen this behaviour?

Regards,
Robert Newson




Re: Lua hangs with get line at end of request body

2022-08-29 Thread Robert Newson
Hi,

Very nice, thanks! Confirmed this fixes the issue locally.

B.

> On 29 Aug 2022, at 14:59, Christopher Faulet  wrote:
> 
> On 8/29/22 at 12:30, Christopher Faulet wrote:
>> There is a bug. I'm able to reproduce it with your lua script. I will fix it 
>> soon.
> 
> FYI, I pushed a fix[1] to 2.7-DEV. It will be backported soon.
> 
> 
> [1] https://github.com/haproxy/haproxy/commit/4a20972a
> 
> -- 
> Christopher Faulet




Lua hangs with get line at end of request body

2022-08-26 Thread Robert Newson
Hi All,

I'm upgrading some HAProxy nodes here (from 2.0.29 to 2.6.4) and everything is 
going well except some Lua code I have. I've distilled the code and config to 
this:

haproxy.cfg:

global
    lua-load httpbug.lua

defaults
    mode http
    timeout client 15
    timeout connect 5000
    timeout queue 5000
    timeout server 36

listen proxy
    bind 127.0.0.1:10001
    http-request use-service lua.httpbug

httpbug.lua:

core.register_service("httpbug", "http", function(applet)
    repeat
        -- read the next line of the request body; documented to return ""
        -- once the body has been fully consumed
        local line = applet:getline()
        core.log(core.crit, line)
    until line == ''
    applet:set_status(200)
    applet:start_response()
    applet:send("success")
end)

--

With a request like 'curl localhost:10001 -XPOST -T body' and a file called 
'body' as follows:

line1
line2
line3

(where each line is newline-terminated)

With 2.0.29 I get a response (of 200 with body "success"), and line1, line2, 
line3 are logged to screen (haproxy -d -f haproxy.cfg)

With 2.6.4 I see the logs of line1, line2, line3, but no response.

It appears applet:getline() blocks forever at the end of the request body, 
rather than returning an empty string to indicate this like it used to (and is 
documented to do).
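
In the meantime, a possible workaround sketch is to avoid getline() and read the
whole body in one go, assuming applet:receive() with no size argument reads to
the end of the body and that bodies are small enough to hold in memory:

core.register_service("httpbug2", "http", function(applet)
    local body = applet:receive()              -- read the entire request body
    for line in body:gmatch("[^\n]+") do       -- split it into lines ourselves
        core.log(core.crit, line)
    end
    applet:set_status(200)
    applet:start_response()
    applet:send("success")
end)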

I searched the mailing list (and SO, etc) for other examples of reading the 
http request body line by line or for a report of this bug from someone else 
but found nothing.

What did I do wrong?

Regards,
Robert Newson




Re: leak of handle to /dev/urandom since 1.8?

2019-04-25 Thread Robert Newson
My bad (was a looong day).

I've tried again with the change here: 
https://github.com/cloudant/haproxy-1.9/commit/abc3427130671e49defcd0ad6316ee16509ef9a1

Same leak. A new open fd each time I reload.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 25 Apr 2019, at 08:05, Willy Tarreau wrote:
> Hi Robert,
> 
> On Wed, Apr 24, 2019 at 03:13:00PM -0400, Robert Newson wrote:
> > Hi,
> > 
> > Thanks for the suggestion and, yes, we're using the master-worker mode (-Ws
> > specifically). I made a custom build as directed
> > (https://github.com/cloudant/haproxy-1.9/tree/urandom-leak) and tried it 
> > out.
> > Same leak, unfortunately. An extra /dev/urandom fd each reload.
> 
> Ah, that's not what I was suggesting :-) I was suggesting to put the
> return statement at the *beginning* of the function so that it does
> not initialize the random number generator. Or if you prefer, just
> pretend that it's already initialized (which will achieve the same) :
> 
> diff --git a/src/ssl_sock.c b/src/ssl_sock.c
> index 015943ee6..bd12d876d 100644
> --- a/src/ssl_sock.c
> +++ b/src/ssl_sock.c
> @@ -3736,7 +3736,7 @@ ignore_entry:
>  static int ssl_initialize_random()
>  {
> unsigned char random;
> -   static int random_initialized = 0;
> +   static int random_initialized = 1;
>  
> if (!random_initialized && RAND_bytes(&random, 1) != 0)
> random_initialized = 1;
> 
> Willy
> 
>



Re: leak of handle to /dev/urandom since 1.8?

2019-04-24 Thread Robert Newson
Hi,

Thanks for the suggestion and, yes, we're using the master-worker mode (-Ws 
specifically). I made a custom build as directed 
(https://github.com/cloudant/haproxy-1.9/tree/urandom-leak) and tried it out. 
Same leak, unfortunately. An extra /dev/urandom fd each reload.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Wed, 24 Apr 2019, at 14:02, Willy Tarreau wrote:
> Hi Robert,
> 
> On Tue, Apr 23, 2019 at 12:31:29PM -0400, Robert Newson wrote:
> > We've noticed that our haproxy processes accumulate many file descriptors to
> > /dev/urandom over time. This coincides with reloads:
> (...)
> > We've seen a node with over 20,000 open file descriptors to this same file.
> 
> I think I know what's happening. You're probably using the master-worker
> mode since you're under systemd. We have to make openssl initialize its
> RNG very early in the boot (before the chroot at least) and I suspect
> that it doesn't mark the FD with CLOEXEC, so when the master re-execs
> itself, the FD is not closed and a new openssl instance opens another FD
> for /dev/urandom.
> 
> If you want to run a quick test, please add "return 1;" at the beginning
> of ssl_sock.c:ssl_initialize_random() and run your config without the
> "chroot" directive so that /dev/urandom remains accessible during all
> the process' life. If the issue disappears it comes from here. Otherwise
> it may still come from here or from other places.
> 
> We try to close all relevant FDs before execve() but it's possible that
> a few slipped through the cracks :-/
> 
> Willy
> 
>



leak of handle to /dev/urandom since 1.8?

2019-04-23 Thread Robert Newson
Hi,

We've noticed that our haproxy processes accumulate many file descriptors to 
/dev/urandom over time. This coincides with reloads:

rnewson@lb2:~$ sudo service haproxy restart
rnewson@lb2:~$ sudo ls -l /proc/$(pgrep -o haproxy)/fd | grep -c '/dev/urandom'
1
rnewson@lb2:~$ sudo ls -l /proc/$(pgrep -o haproxy)/fd | grep -c '/dev/urandom'
1
rnewson@lb2:~$ sudo service haproxy reload
rnewson@lb2:~$ sudo ls -l /proc/$(pgrep -o haproxy)/fd | grep -c '/dev/urandom'
2
rnewson@lb2:~$ sudo ls -l /proc/$(pgrep -o haproxy)/fd | grep -c '/dev/urandom'
2
rnewson@lb2:~$ sudo service haproxy reload
rnewson@lb2:~$ sudo ls -l /proc/$(pgrep -o haproxy)/fd | grep -c '/dev/urandom'
3
rnewson@lb2:~$ sudo service haproxy reload
rnewson@lb2:~$ sudo ls -l /proc/$(pgrep -o haproxy)/fd | grep -c '/dev/urandom'
4

We've seen a node with over 20,000 open file descriptors to this same file.

We are running under systemd. Our config is quite complicated so I'm not 
posting it here just yet; I'd appreciate guidance on which parts might be 
relevant. We do use "expose-fd listeners" to hand over open sockets cleanly 
during reload but nothing else seems obviously related.
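
For what it's worth, the relevant bit is just the usual socket declaration,
along these lines (path and level illustrative):

global
    stats socket /run/haproxy/admin.sock mode 600 level admin expose-fd listeners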

We see this with haproxy 1.8.16 and 1.9.6. We have a few deployments of 1.7.9 
where the count remains at 0 (which is also odd given we use TLS exclusively).

Any suggestions?
B.

-- 
  Robert Samuel Newson
  rnew...@apache.org



Re: [ANNOUNCE] haproxy-1.8-rc1 : the last mile

2017-11-04 Thread Robert Newson
It’s only OpenSSL 1.0.1 that’s affected, so I’m inferring that it predates support 
for serving multiple certificate types; it’s not an HAProxy regression. 

I’ve failed in all attempts to try TLS 1.3 though. HAProxy segfaults very soon 
after startup. I tried several OpenSSL versions.

Sent from my iPhone

> On 4 Nov 2017, at 09:17, Robert Newson  wrote:
> 
> It seems to affect only some versions of OpenSSL. I’ll go over this again more 
> carefully and keep notes. I was trying out TLS 1.3 support as well, so it 
> might be specific to OpenSSL 1.1.1-dev. 
> 
> Sent from my iPhone
> 
>> On 3 Nov 2017, at 22:33, Willy Tarreau  wrote:
>> 
>> Hi Robert,
>> 
>>> On Thu, Nov 02, 2017 at 03:58:47PM +, Robert Samuel Newson wrote:
>>> Hi,
>>> 
>>> I think the "cert bundle" feature from 1.7 is broken in 1.8-rc1. My exact 
>>> config works with 1.7 but says this for 1.8-rc1;
>>> 
>>> unable to stat SSL certificate from file '/path/to/foo.pem': No such file 
>>> or directory.
>>> 
>>> That is, it's attempting to load foo.pem, not foo.pem.rsa or foo.pem.ecdsa 
>>> like 1.7 does.
>> 
>> Oh bad, that's totally unexpected. I'm CCing Emeric and Manu, the former
>> being the SSL maintainer (in case he has a quick idea about it) and the
>> latter having upgraded a large part of the cert management code, possibly
>> having an idea about anything that could have gone wrong.
>> 
>>> I also tried asking the mailing list daemon for help by emailing
>>> haproxy+h...@formilux.org as the signup confirmation specifies; the full body
>>> of that help is just "Hello,". I was hoping to ask the daemon to send me the
>>> initial message in this thread so I could reply into the thread properly.
>>> Sadly the mailing list archive doesn't show any of the headers I might have
>>> injected to get threading working that way, so sorry for breaking the thread
>>> but I really tried not to.
>> 
>> I was not even aware of the feature :-)
>> 
>>> I am very excited about many of the new features in 1.8 and am itching to 
>>> try
>>> them.
>> 
>> As long as you're very careful that's useful. I'm going to issue an rc2 with
>> the most painful bugs fixed.
>> 
>> Thanks for the report,
>> Willy




Re: [ANNOUNCE] haproxy-1.8-rc1 : the last mile

2017-11-04 Thread Robert Newson
It seems to affect only some versions of OpenSSL. I’ll go over this again more 
carefully and keep notes. I was trying out TLS 1.3 support as well, so it might 
be specific to OpenSSL 1.1.1-dev. 

Sent from my iPhone

> On 3 Nov 2017, at 22:33, Willy Tarreau  wrote:
> 
> Hi Robert,
> 
>> On Thu, Nov 02, 2017 at 03:58:47PM +, Robert Samuel Newson wrote:
>> Hi,
>> 
>> I think the "cert bundle" feature from 1.7 is broken in 1.8-rc1. My exact 
>> config works with 1.7 but says this for 1.8-rc1;
>> 
>> unable to stat SSL certificate from file '/path/to/foo.pem': No such file or 
>> directory.
>> 
>> That is, it's attempting to load foo.pem, not foo.pem.rsa or foo.pem.ecdsa 
>> like 1.7 does.
> 
> Oh bad, that's totally unexpected. I'm CCing Emeric and Manu, the former
> being the SSL maintainer (in case he has a quick idea about it) and the
> latter having upgraded a large part of the cert management code, possibly
> having an idea about anything that could have gone wrong.
> 
>> I also tried asking the mailing list daemon for help by emailing
>> haproxy+h...@formilux.org as the signup confirmation specifies; the full body
>> of that help is just "Hello,". I was hoping to ask the daemon to send me the
>> initial message in this thread so I could reply into the thread properly.
>> Sadly the mailing list archive doesn't show any of the headers I might have
>> injected to get threading working that way, so sorry for breaking the thread
>> but I really tried not to.
> 
> I was not even aware of the feature :-)
> 
>> I am very excited about many of the new features in 1.8 and am itching to try
>> them.
> 
> As long as you're very careful that's useful. I'm going to issue an rc2 with
> the most painful bugs fixed.
> 
> Thanks for the report,
> Willy