Re: 1.6-dev2 crashes with certain server hostname

2015-07-22 Thread Willy Tarreau
On Wed, Jul 22, 2015 at 04:23:00PM +0200, Willy Tarreau wrote:
> The patch is not that small but is still readable so now I'm debugging.

And here comes the fix, it was indeed a one byte overflow in the DNS
code.

Thanks for your report!
Willy

>From d69d6f367879c52013946026239cb7d56c9f6f2b Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Wed, 22 Jul 2015 16:45:36 +0200
Subject: BUG/MAJOR: dns: fix the length of the string to be copied

Jan A. Bruder reported that some very specific hostnames on server
lines were causing haproxy to crash on startup. Given that hist
backtrace showed some heap corruption, it was obvious there was an
overflow somewhere. The bug in fact is a typo in dns_str_to_dn_label()
which mistakenly copies one extra byte from the host name into the
output value, thus effectively corrupting the structure.

The bug triggers while parsing the next server of similar length
after the corruption, which generally triggers at config time but
could theorically crash at any moment during runtime depending on
what malloc sizes are needed next. This is why it's tagged major.

No backport is needed, this bug was introduced in 1.6-dev2.
---
 src/dns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/dns.c b/src/dns.c
index 37e041c..5bc57e5 100644
--- a/src/dns.c
+++ b/src/dns.c
@@ -947,7 +947,7 @@ char *dns_str_to_dn_label(const char *string, char *dn, int 
dn_len)
if (dn_len < i + offset)
return NULL;
 
-   i = strlen(string) + offset;
+   i = strlen(string);
memcpy(dn + offset, string, i);
dn[i + offset] = '\0';
/* avoid a '\0' at the beginning of dn string which may prevent the for 
loop
-- 
1.7.12.2.21.g234cd45.dirty



Re: 1.6-dev2 crashes with certain server hostname

2015-07-22 Thread Willy Tarreau
Jan,

On Wed, Jul 22, 2015 at 02:50:12PM +0200, Willy Tarreau wrote:
> It would be nice also if you were able to reproduce the issues with the
> smallest possible config (eg: just a backend and a server line or something
> like this). But I suspect that at some point we'll have to try to reproduce
> with your exact hostname at least to see if it does something abnormal in
> the code :-/

Thanks to the config you sent me offline, I could reproduce it and narrow
it down a such a simple config :

  backend b_media_http
server r  www.master.haproxy.org:80 check
server w  www.master.haproxy.org:80 check

The name's length is critical, the change of port or check changes the error,
but in all cases we are facing a memory corruption. I'm guessing it's related
to a +/-1 in a name length (typically a trailing zero overwriting the malloc
structure). As supposed it came with the DNS changes (the only ones affecting
that area). I could bisect it to this commit :

  commit a68ca96375b76389322af877c32a9c47b5b6a3e0
  Author: Baptiste Assmann 
  Date:   Tue Apr 14 01:15:08 2015 +0200

MAJOR: server: add DNS-based server name resolution

Relies on the DNS protocol freshly implemented in HAProxy.
It performs a server IP addr resolution based on a server hostname.

The patch is not that small but is still readable so now I'm debugging.

Thanks for your detailed report!

Willy




Re: 1.6-dev2 crashes with certain server hostname

2015-07-22 Thread Willy Tarreau
Hi Jan,

On Thu, Jul 16, 2015 at 02:28:54AM +0200, Jan A. Bruder wrote:
> Hi all,
> this malloc crash occurs with and only with a certain hostname of one of my
> backends being added to the config. See "redirector.domain.tld" in the
> config below. Since this is a production server i had to mask the hostname.
> As a hint: The hostname does not contain any special characters, just
> alphabetic a-z characters.
> Interestingly if i change only a single letter anywhere in the hostname it
> doesn't crash anymore. Neither does it crash if i use it's IP instead of
> the hostname. How strange is that!?
> Also, i am using the same config with 1.5 stable without any problems.

I have memories of an old resolver bug on some RHEL 5 or 6 that
affected the libc, but you seem to be running debian so it shouldn't
be the same bug.

Are you sure your hostname properly resolves ? Your backtrace makes me
think that something has corrupted memory, very likely overflown an area
allocated using malloc/calloc. I don't know what could cause this and
this sounds so new to me that if it's a bug it must be a very recent
one. And since you're saying 1.5 is affected as well, I'm seriously
wondering whether haproxy is the only culprit there or of the libc is
complice as well.

I don't know how to proceed from now on, the trace cannot be exploited
since it basically shows that something went wrong before going down
that route. Maybe running haproxy via valgrind could help figure what
is happening. You're saying that this specific hostname only causes the
problem. Out of curiosity, have you seen if it resolves in a specific
way, maybe via a CNAME, in IPv6, or would have some extra fields
associated with the record, etc ?

I tried your configuration here and as you probably expect, I failed to
reproduce the same problem.

It would be nice also if you were able to reproduce the issues with the
smallest possible config (eg: just a backend and a server line or something
like this). But I suspect that at some point we'll have to try to reproduce
with your exact hostname at least to see if it does something abnormal in
the code :-/

Cheers,
Willy