Hi guys,

On Thu, Aug 04, 2016 at 07:02:50PM +0200, Lukas Tribus wrote:
> > It may be very useful to build with libslz instead of building without
> > zlib. It would stress the exact same code paths in haproxy, you would
> > still get compression and we'd see if the issue can be reproduced.
> 
> While googling around I found another report [3], a similar/same crash in
> memcpy() while using zlib. Apparently switching to libslz fixed the issue
> for them.

Now I have found :-)

Thanks to the 3 reports we have accumulated, I noticed that we were each
time flushing the pending data. And the flush function doesn't set next_in
nor avail_in before calling deflate(), probably because it doesn't seem
necessary (given the low crash rate I'd argue it rarely is). But when
reading the zlib code, it was obvious to me that these ones could be used,
to the point that I could force the crash by setting them to some junk.

The reason why we didn't get this in 1.5 is that during 1.5 a buffer was
assigned to the same session during all its life, so by not setting the
values there, we in fact used to leave the earlier ones in place pointing
to a valid location. Since we introduced dynamic buffers in 1.6, this is
no longer true and buffers can vanish at any moment.

Thus I'd encourage those still facing the bug to apply the attached patch
and to report their experience. It works on both 1.7 and 1.6.

Regards,
Willy

>From d8b8b5329e0892b9f0c832f0e377dbe8e9b32aba Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w...@1wt.eu>
Date: Mon, 8 Aug 2016 16:41:01 +0200
Subject: BUG/MAJOR: compression: initialize avail_in/next_in even during
 flush

For quite some time, a few users have been experiencing random crashes
when compressing with zlib, from versions 1.2.3 to 1.2.8 included.

Upon thourough investigation in zlib's deflate.c, it appeared obvious
that avail_in and next_in are used during the flush operation and need
to be initialized, while admittedly it's not obvious in the documentation.

By simply forcing both values to -1 it's possible to immediately reproduce
the exact crash that these users have been experiencing :

  (gdb) bt
  #0  0x00007fa73ce10c00 in __memcpy_sse2 () from /lib64/libc.so.6
  #1  0x00007fa73e0c5d49 in ?? () from /lib64/libz.so.1
  #2  0x00007fa73e0c68e0 in ?? () from /lib64/libz.so.1
  #3  0x00007fa73e0c73c7 in deflate () from /lib64/libz.so.1
  #4  0x00000000004dca1c in deflate_flush_or_finish (comp_ctx=0x7b6580, 
out=0x7fa73e5bd010, flag=2) at src/compression.c:808
  #5  0x00000000004dcb60 in deflate_flush (comp_ctx=0x7b6580, 
out=0x7fa73e5bd010) at src/compression.c:835
  #6  0x00000000004dbc50 in http_compression_buffer_end (s=0x7c0050, 
in=0x7c00a8, out=0x78adf0 <tmpbuf.24662>, end=0) at src/compression.c:249
  #7  0x000000000048bb5f in http_response_forward_body (s=0x7c0050, 
res=0x7c00a0, an_bit=1048576) at src/proto_http.c:7173
  #8  0x00000000004cce54 in process_stream (t=0x7bffd8) at src/stream.c:1939
  #9  0x0000000000427ddf in process_runnable_tasks () at src/task.c:238
  #10 0x0000000000419892 in run_poll_loop () at src/haproxy.c:1573
  #11 0x000000000041a4a5 in main (argc=4, argv=0x7fffcda38348) at 
src/haproxy.c:1933

Note that for all reports deflate_flush_or_finish() was always involved.

The crash is very hard to reproduce when using regular traffic because it
requires that the combination of avail_in and next_in are inadequate so
that the memcpy() call reads out of bounds. But this can very likely
happen when the input buffer points to an area reused by another stream
when the flush has been interrupted due to a full output buffer. This
also explains why this report is recent, as dynamic buffer allocation
was introduced in 1.6.

Anyway it's not acceptable to call a function with a randomly set input
buffer. The deflate() function explicitly checks for the case where both
avail_in and next_in are null and doesn't use it in this case during a
flush, so this is the best solution.

Special thanks to Sasha Litvak, James Hartshorn and Paul Bauer for
reporting very useful stack traces which were critical to finding the
root cause of this bug.

This fix must be backported into 1.6 and 1.5, though 1.5 is less likely to
trigger this case given that it keeps its own buffers allocated all along
the session's life.
---
 src/compression.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/compression.c b/src/compression.c
index 8d09585..4e13bae 100644
--- a/src/compression.c
+++ b/src/compression.c
@@ -565,6 +565,8 @@ static int deflate_flush_or_finish(struct comp_ctx 
*comp_ctx, struct buffer *out
        int out_len = 0;
        z_stream *strm = &comp_ctx->strm;
 
+       strm->next_in = NULL;
+       strm->avail_in = 0;
        strm->next_out = (unsigned char *)bi_end(out);
        strm->avail_out = out->size - buffer_len(out);
 
-- 
1.7.12.1

Reply via email to