Re: 1.6-dev2 crashes with certain server hostname
On Wed, Jul 22, 2015 at 04:23:00PM +0200, Willy Tarreau wrote: > The patch is not that small but is still readable so now I'm debugging. And here comes the fix, it was indeed a one byte overflow in the DNS code. Thanks for your report! Willy >From d69d6f367879c52013946026239cb7d56c9f6f2b Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Wed, 22 Jul 2015 16:45:36 +0200 Subject: BUG/MAJOR: dns: fix the length of the string to be copied Jan A. Bruder reported that some very specific hostnames on server lines were causing haproxy to crash on startup. Given that hist backtrace showed some heap corruption, it was obvious there was an overflow somewhere. The bug in fact is a typo in dns_str_to_dn_label() which mistakenly copies one extra byte from the host name into the output value, thus effectively corrupting the structure. The bug triggers while parsing the next server of similar length after the corruption, which generally triggers at config time but could theorically crash at any moment during runtime depending on what malloc sizes are needed next. This is why it's tagged major. No backport is needed, this bug was introduced in 1.6-dev2. --- src/dns.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/dns.c b/src/dns.c index 37e041c..5bc57e5 100644 --- a/src/dns.c +++ b/src/dns.c @@ -947,7 +947,7 @@ char *dns_str_to_dn_label(const char *string, char *dn, int dn_len) if (dn_len < i + offset) return NULL; - i = strlen(string) + offset; + i = strlen(string); memcpy(dn + offset, string, i); dn[i + offset] = '\0'; /* avoid a '\0' at the beginning of dn string which may prevent the for loop -- 1.7.12.2.21.g234cd45.dirty
Re: 1.6-dev2 crashes with certain server hostname
Jan, On Wed, Jul 22, 2015 at 02:50:12PM +0200, Willy Tarreau wrote: > It would be nice also if you were able to reproduce the issues with the > smallest possible config (eg: just a backend and a server line or something > like this). But I suspect that at some point we'll have to try to reproduce > with your exact hostname at least to see if it does something abnormal in > the code :-/ Thanks to the config you sent me offline, I could reproduce it and narrow it down a such a simple config : backend b_media_http server r www.master.haproxy.org:80 check server w www.master.haproxy.org:80 check The name's length is critical, the change of port or check changes the error, but in all cases we are facing a memory corruption. I'm guessing it's related to a +/-1 in a name length (typically a trailing zero overwriting the malloc structure). As supposed it came with the DNS changes (the only ones affecting that area). I could bisect it to this commit : commit a68ca96375b76389322af877c32a9c47b5b6a3e0 Author: Baptiste Assmann Date: Tue Apr 14 01:15:08 2015 +0200 MAJOR: server: add DNS-based server name resolution Relies on the DNS protocol freshly implemented in HAProxy. It performs a server IP addr resolution based on a server hostname. The patch is not that small but is still readable so now I'm debugging. Thanks for your detailed report! Willy
Re: 1.6-dev2 crashes with certain server hostname
Hi Jan, On Thu, Jul 16, 2015 at 02:28:54AM +0200, Jan A. Bruder wrote: > Hi all, > this malloc crash occurs with and only with a certain hostname of one of my > backends being added to the config. See "redirector.domain.tld" in the > config below. Since this is a production server i had to mask the hostname. > As a hint: The hostname does not contain any special characters, just > alphabetic a-z characters. > Interestingly if i change only a single letter anywhere in the hostname it > doesn't crash anymore. Neither does it crash if i use it's IP instead of > the hostname. How strange is that!? > Also, i am using the same config with 1.5 stable without any problems. I have memories of an old resolver bug on some RHEL 5 or 6 that affected the libc, but you seem to be running debian so it shouldn't be the same bug. Are you sure your hostname properly resolves ? Your backtrace makes me think that something has corrupted memory, very likely overflown an area allocated using malloc/calloc. I don't know what could cause this and this sounds so new to me that if it's a bug it must be a very recent one. And since you're saying 1.5 is affected as well, I'm seriously wondering whether haproxy is the only culprit there or of the libc is complice as well. I don't know how to proceed from now on, the trace cannot be exploited since it basically shows that something went wrong before going down that route. Maybe running haproxy via valgrind could help figure what is happening. You're saying that this specific hostname only causes the problem. Out of curiosity, have you seen if it resolves in a specific way, maybe via a CNAME, in IPv6, or would have some extra fields associated with the record, etc ? I tried your configuration here and as you probably expect, I failed to reproduce the same problem. It would be nice also if you were able to reproduce the issues with the smallest possible config (eg: just a backend and a server line or something like this). But I suspect that at some point we'll have to try to reproduce with your exact hostname at least to see if it does something abnormal in the code :-/ Cheers, Willy