Re: Haproxy 1.8.4 crashing workers and increased memory usage
Hey,

Won't that be a bit pointless since we don't use threads?

Regards,
Robin Geuze

On 4/9/2018 10:31, Илья Шипицин wrote:

Can you try thread sanitizer (in real time)? https://github.com/google/sanitizers/wiki#threadsanitizer

I'd like to try it myself; however, we do not observe bad things in our environment.

2018-04-09 13:24 GMT+05:00 Robin Geuze <rob...@transip.nl>:

Hey Willy,

So I made a build this morning with libslz and re-enabled compression, and within an hour we had the exit code 134 errors, so zlib does not seem to be the problem here.

Regards,
Robin Geuze

On 4/7/2018 00:30, Willy Tarreau wrote:

Hi Robin,

On Fri, Apr 06, 2018 at 03:52:33PM +0200, Robin Geuze wrote:

Hey Willy,

I was actually the one that had the hunch to disable compression. I suspected that this was the issue because there were a bunch of "abort" calls in include/common/hathreads.h, which is used by the compression stuff. However, I just noticed those aborts are actually only there if DEBUG_THREAD is defined, which it doesn't seem to be for our build. So basically, I have no clue whatsoever why disabling compression fixes the bug.

At least I don't feel alone :-)

I can see next week if we can make a build with slz instead of zlib (we seem to be linked against zlib/libz atm).

Thank you, I appreciate it!

Cheers,
Willy
Re: Haproxy 1.8.4 crashing workers and increased memory usage
Hey Willy,

So I made a build this morning with libslz and re-enabled compression, and within an hour we had the exit code 134 errors, so zlib does not seem to be the problem here.

Regards,
Robin Geuze

On 4/7/2018 00:30, Willy Tarreau wrote:

Hi Robin,

On Fri, Apr 06, 2018 at 03:52:33PM +0200, Robin Geuze wrote:

Hey Willy,

I was actually the one that had the hunch to disable compression. I suspected that this was the issue because there were a bunch of "abort" calls in include/common/hathreads.h, which is used by the compression stuff. However, I just noticed those aborts are actually only there if DEBUG_THREAD is defined, which it doesn't seem to be for our build. So basically, I have no clue whatsoever why disabling compression fixes the bug.

At least I don't feel alone :-)

I can see next week if we can make a build with slz instead of zlib (we seem to be linked against zlib/libz atm).

Thank you, I appreciate it!

Cheers,
Willy
Re: Haproxy 1.8.4 crashing workers and increased memory usage
Hey Willy,

I was actually the one that had the hunch to disable compression. I suspected that this was the issue because there were a bunch of "abort" calls in include/common/hathreads.h, which is used by the compression stuff. However, I just noticed those aborts are actually only there if DEBUG_THREAD is defined, which it doesn't seem to be for our build. So basically, I have no clue whatsoever why disabling compression fixes the bug. I can see next week if we can make a build with slz instead of zlib (we seem to be linked against zlib/libz atm).

Regards,
Robin Geuze

On 4/6/2018 14:18, Willy Tarreau wrote:

Hi Frank,

On Fri, Apr 06, 2018 at 10:53:36AM +, Frank Schreuder wrote:

We tested haproxy 1.8.6 with compression enabled today, within the first few hours it already went wrong:

[ALERT] 095/120526 (12989) : Current worker 5241 exited with code 134

OK thanks, and sorry for that.

Our other balancer running haproxy 1.8.5 with compression disabled is still running fine after 2 days with the same workload. So there seems to be a locking issue when compression is enabled.

Well, an issue with compression, but I'm really not seeing what makes you speak about locking since:
- you don't seem to have threads enabled
- locking issues generally cause deadlocks, not aborts

The other problem is that we noticed already that there are very few abort() calls in haproxy and none of them in this area. So it's very possible that it comes from another layer detecting an issue provoked by compression. Typically the libc's malloc/free can stop the program using abort() if they detect a corruption. It would really help to know where this abort() happens, at least to get a backtrace.

By the way, are you using zlib or slz? zlib uses a tricky allocator. I checked it again yesterday and it was made thread safe. But we couldn't rule out an issue there. slz doesn't need memory however. If you're on zlib, switching to slz could also indicate if the problem is related to these memory allocations or not.

Thanks,
Willy
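
A side note on the "exited with code 134" alerts in this thread: assuming the master process reports a signal death the way shells do (128 plus the signal number), 134 - 128 = 6, which is SIGABRT on Linux. That is consistent with Willy's suggestion that something called abort(). A minimal sketch of the decoding follows; the helper name is hypothetical and not part of haproxy.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the shell convention 128 + signal number."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited normally with status {code}"


# The code reported for the crashing workers in this thread.
print(describe_exit_code(134))
```

This only narrows things down to "something aborted"; a core dump or backtrace is still needed to see which layer called abort().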
Re: "Odd" behaviour on resolvers
During configcheck the resolver portion of haproxy is not yet active. Thus during the check it uses the system resolver. Once actually running it will use the resolver from haproxy.

On Mar 4, 2016 16:26, "Arnaud B." wrote:

Hi there,

First of all: I am very fond of HAProxy :-)

I was trying to do some service discovery with bind9 and HAProxy when I found an odd behaviour on the resolvers part. Here are some config samples.

My frontend, backend and resolvers config:

    resolvers discovery
        nameserver jabba 172.16.0.2:53
        resolve_retries 5
        timeout retry 1s
        hold valid 3s

    frontend staging_frontend
        mode http
        bind 127.0.0.1:8013
        default_backend staging_backend

    backend staging_backend
        server jabba php-staging.vra:8013 check resolvers discovery

A sample dig:

    $ dig @172.16.0.2 php-staging.vra +short
    172.16.0.2

So far, everything seems fine to me. But when I reload haproxy's config:

    mars 04 15:55:13 jabba haproxy[25262]: [ALERT] 063/155513 (25262) : parsing [/etc/haproxy/haproxy.cfg:98] : 'server jabba' : invalid address: 'php-staging.vra' in ...
    mars 04 15:55:13 jabba haproxy[25262]: [ALERT] 063/155513 (25262) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
    mars 04 15:55:13 jabba haproxy[25262]: [ALERT] 063/155513 (25262) : Fatal errors found in configuration.

I first questioned my configuration and checked everything out ... There was no obvious issue. I checked my /etc/resolv.conf file and saw that my server wasn't using the same resolver. Its resolver did not have any knowledge whatsoever of php-staging.vra, and that caused my configuration to fail on reload.

My /etc/resolv.conf was:

    nameserver 8.8.8.8
    nameserver 213.186.33.99

and became:

    nameserver 172.16.0.2
    search vra
    nameserver 8.8.8.8
    nameserver 213.186.33.99

permitting haproxy's reload.

My question is pretty simple: since there is a resolvers section in haproxy's configuration, is it possible to use it despite the server's resolv.conf?

Let me know if something's unclear in my explanation :)
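
Since the config check relies on the system resolver, one way to catch this before a reload is to list the server addresses that are names rather than IP literals and try them against the OS resolver first. The following is only a sketch under simple assumptions: both function names are hypothetical, the parser only understands plain `server <name> <address>[:port]` lines, and IPv6 literals are not handled.

```python
import ipaddress
import socket


def hostnames_needing_system_dns(config_text: str) -> list[str]:
    """Return server addresses in an HAProxy config that are names, not IP literals."""
    names = []
    for line in config_text.splitlines():
        words = line.split()
        # A server line looks like: server <name> <address>[:port] [options...]
        if len(words) >= 3 and words[0] == "server":
            host = words[2].rsplit(":", 1)[0]
            try:
                ipaddress.ip_address(host)
            except ValueError:
                # Not an IP literal, so it must be resolved at parse time.
                names.append(host)
    return names


def unresolvable(names, resolve=socket.getaddrinfo):
    """Return the names the given resolver (by default the OS one) cannot resolve."""
    failed = []
    for name in names:
        try:
            resolve(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

Running `unresolvable(hostnames_needing_system_dns(open(path).read()))` before a reload would, under these assumptions, have flagged php-staging.vra while /etc/resolv.conf still pointed only at the public resolvers.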
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hey guys,

Actually when you get an NXDOMAIN reply you can just stop resolving that domain. Basically there are 2 types of "negative" replies in DNS:

NODATA: basically this is when you don't get an error (NOERROR in dig), but not the actual data you are looking for. You might have gotten some CNAME data but no A or AAAA record (depending on what you wanted obviously). This means that the actual domain name does exist, but doesn't have data of the type you requested. The term NODATA is used in DNS RFCs but it doesn't actually have its own error code.

NXDOMAIN: This is denoted by the NXDOMAIN error code. It means that either the domain you requested itself or the last target domain from a CNAME does not exist at all (i.e. no data whatsoever) and there also isn't a wildcard available that matches it. So if you asked for an A record, getting an NXDOMAIN means there also won't be an AAAA record.

The above explanation is a bit of an oversimplification because there are also things like empty non-terminals, which also don't have any data, but instead of an NXDOMAIN actually return a NODATA (in most cases; there are some authoritative servers that don't do it properly). But the end result is that you can pretty much say that when you get NXDOMAIN, there really is nothing there for you, so you can just stop looking (at least at the current server).

-Robin-

On 10/20/2015 10:25 PM, Willy Tarreau wrote:

On Tue, Oct 20, 2015 at 10:20:50PM +0200, Baptiste wrote:

On Tue, Oct 20, 2015 at 9:09 PM, Lukas Tribus wrote:

I don't know. I'm always only focused on the combination of user-visible changes and risks of bugs (which are user-visible changes btw). So if we can do it without breaking too much code, then it can be backported. What we have now is something which is apparently insufficient to some users so we can improve the situation. I wouldn't want to remove prefer-* or change the options behavior or whatever for example.

Ok, if we don't remove existing prefer-* keywords a 1.6 backport sounds possible without user visible breakage, great.

lukas

Ok, just to make it clear, let me write a few conf examples:
- server home-v4 home-v4.mydomain check resolve-prefer ipv4 => A then AAAA (failover on NX)
- server home-v4 home-v4.mydomain check v4only => A only (stop on NX)

If both 'resolve-prefer ipv[46]' and 'v[46]only' are set, whatever combination, then v[46]only applies, but configuration parsing may return a warning.

Yes, but please avoid the warning, it makes it inconvenient to edit configs. You may for example have "resolve-prefer ipv4" in the default-server directive, and having it warn because one of your servers has v4only is annoying.

BTW, the v4only and resolve-prefer should also be used during the initial resolving phase performed by getaddrinfo() but that's for a future patch :-)

Willy
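
To make the NODATA/NXDOMAIN distinction from Robin's explanation concrete, here is a small sketch of a resolution loop that fails over to the other address family on NODATA but stops on NXDOMAIN. It is not HAProxy's code: `lookup` is a hypothetical stand-in for one typed DNS query, and the `only` parameter merely mirrors the v4only/v6only keywords proposed in this thread.

```python
NOERROR, NXDOMAIN = "NOERROR", "NXDOMAIN"


def resolve_server(name, lookup, prefer="ipv4", only=None):
    """Return the first address found, or None.

    lookup(name, qtype) -> (rcode, [addresses]) is a hypothetical callable
    that performs a single typed query (A or AAAA).
    """
    order = ["A", "AAAA"] if prefer == "ipv4" else ["AAAA", "A"]
    if only == "v4":
        order = ["A"]
    elif only == "v6":
        order = ["AAAA"]
    for qtype in order:
        rcode, addresses = lookup(name, qtype)
        if rcode == NXDOMAIN:
            # The name does not exist for any type, so stop looking.
            return None
        if addresses:
            return addresses[0]
        # NOERROR without addresses is NODATA: try the other family.
    return None
```

With this ordering, a name that only has an AAAA record costs two queries under `prefer="ipv4"`, while a name that does not exist costs exactly one.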
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hey Willy,

Recursors are not required to recurse when serving an ANY query. An ANY query means that you ask a server (either recursor or auth) for everything it has on label x. If it has a CNAME on that label, just returning that is a valid response (just like would happen if you queried for the CNAME type at label x). However, when you ask for an A or AAAA record, a recursor is required to follow the CNAME. Welcome to the wonderful world of DNS which doesn't really make sense anymore to anyone ;).

As said in the other mail thread, ANY queries are just a very unreliable way to get the records/types you want. Just asking for the actual types, if necessary in multiple queries, is the way to go. DNS is (usually) fast enough that the one extra query really shouldn't matter that much.

-Robin-

On 10/20/2015 8:49 AM, Willy Tarreau wrote:

Hi Andrew,

On Mon, Oct 19, 2015 at 05:39:58PM -0500, Andrew Hayworth wrote:

The ANY query type is weird, and some resolvers don't 'do the legwork' of resolving useful things like CNAMEs. Given that upstream resolver behavior is not always under the control of the HAProxy administrator, we should not use the ANY query type. Rather, we should use A or AAAA according to either the explicit preferences of the operator, or the implicit default (AAAA/IPv6).

But how does that fix the problem for you? In your example below, the server clearly doesn't provide any A nor AAAA in the response, so asking it for A or AAAA should not work either if it doesn't recurse, am I wrong?

    PRODUCTION! ahaywo...@secret-hostname.com:~$ dig @10.11.12.53 ANY api.somestartup.io

    ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;api.somestartup.io. IN ANY

    ;; ANSWER SECTION:
    api.somestartup.io. 20 IN CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com.
    (...)

I fear that such a change will prevent CNAMEs from working for many users where the DNS servers work fine, and will not necessarily fix the problems for other people. Regards, willy
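
For readers unfamiliar with what "asking for the actual types" means on the wire: the only difference between an ANY query and a typed one is the 16-bit QTYPE field of the question (A = 1, CNAME = 5, AAAA = 28, ANY = 255). The sketch below builds such a question with the standard library only; the function name and the fixed query id are illustrative assumptions, and this is not how HAProxy builds its packets.

```python
import struct

# QTYPE values from the DNS specifications.
QTYPE = {"A": 1, "CNAME": 5, "AAAA": 28, "ANY": 255}


def build_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for one question."""
    # Header: id, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question trailer: QTYPE followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", QTYPE[qtype], 1)


# Two explicit queries instead of a single ANY query.
queries = [build_query("api.somestartup.io", t) for t in ("A", "AAAA")]
```

Sending the A and AAAA questions separately is the "one extra query" Robin refers to; a recursor answering either of them is expected to follow the CNAME.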
Re: [call to comment] HAProxy's DNS resolution default query type
Hey Baptiste,

Using ANY queries for this kind of stuff is considered by most people to be a bad practice since, besides all the things you named, it can lead to incomplete responses. Basically a resolver is allowed to just return whatever it has in cache when it receives an ANY query instead of actually doing an ANY query at the authoritative nameserver. Thus if it only received queries for an A record before you do an ANY query, you will not get an AAAA record even if it is actually available, since the resolver doesn't have it in its cache. Even worse, if before it only got MX queries, you won't get either A or AAAA. Currently I don't know of any resolver that actually behaves this way, but it is allowed as per the DNS related RFCs, so using ANY queries might at some point lead to really weird results.

-Robin-

On 10/15/2015 4:35 PM, Baptiste wrote:

Hey guys,

by default, HAProxy tries to resolve server IPs using an ANY query type, then fails over to the resolve-prefer type, then to the "remaining" type. So ANY -> A -> AAAA or ANY -> AAAA -> A.

In some cases, the ANY query type is ignored or the response contains no records, which leads HAProxy to try the next query type. Today, 0yvind reported that the weave DNS server actually answers with an NX response, preventing HAProxy from failing over to the next query type (this is by design). Jan, a fellow HAProxy user, already reported to me that ANY query types are less and less fashionable (for many reasons I'm not going to develop here).

Among the many ways to fix this issue, the one below has my preference: a new resolvers section directive (a flag in that case) which prevents HAProxy from sending an ANY query type to the nameservers in this section, i.e. "option dont-send-any-qtype".

Another option would be to make HAProxy fail over to the next query type in case of an NX response. This would also cover the case where a server returns an NX because no records exist.

Any comments are welcome.

Baptiste
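
The cache-only behaviour described above is easy to see in a toy model. The class below is purely illustrative (it is not any real resolver's implementation, and the names and addresses are made-up documentation values): typed queries populate the cache from the authoritative data, while ANY is answered from the cache alone.

```python
class CachingResolver:
    """Toy resolver that answers ANY from its cache only."""

    def __init__(self, authoritative):
        # authoritative maps (name, rtype) -> list of record values.
        self.authoritative = authoritative
        self.cache = {}

    def query(self, name, rtype):
        if rtype == "ANY":
            # Permitted shortcut: return whatever happens to be cached.
            return {t: v for (n, t), v in self.cache.items() if n == name}
        key = (name, rtype)
        if key not in self.cache and key in self.authoritative:
            # Typed query: fetch from the authoritative data and cache it.
            self.cache[key] = self.authoritative[key]
        return {rtype: self.cache[key]} if key in self.cache else {}


zone = {
    ("app.example", "A"): ["192.0.2.10"],
    ("app.example", "AAAA"): ["2001:db8::10"],
}
resolver = CachingResolver(zone)
resolver.query("app.example", "A")               # some client asked for A earlier
any_answer = resolver.query("app.example", "ANY")  # AAAA exists but is not cached
```

In this model `any_answer` contains only the A record, even though the zone also has an AAAA record, which is exactly the incomplete response the message warns about.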
Re: [call to comment] HAProxy's DNS resolution default query type
Actually, I just asked one of the powerdns devs, and their recursor/resolver implementation does actually only return what is in its cache when answering an ANY query.

On 10/15/2015 4:46 PM, Robin Geuze wrote:

Hey Baptiste,

Using ANY queries for this kind of stuff is considered by most people to be a bad practice since, besides all the things you named, it can lead to incomplete responses. Basically a resolver is allowed to just return whatever it has in cache when it receives an ANY query instead of actually doing an ANY query at the authoritative nameserver. Thus if it only received queries for an A record before you do an ANY query, you will not get an AAAA record even if it is actually available, since the resolver doesn't have it in its cache. Even worse, if before it only got MX queries, you won't get either A or AAAA. Currently I don't know of any resolver that actually behaves this way, but it is allowed as per the DNS related RFCs, so using ANY queries might at some point lead to really weird results.

-Robin-

On 10/15/2015 4:35 PM, Baptiste wrote:

Hey guys,

by default, HAProxy tries to resolve server IPs using an ANY query type, then fails over to the resolve-prefer type, then to the "remaining" type. So ANY -> A -> AAAA or ANY -> AAAA -> A.

In some cases, the ANY query type is ignored or the response contains no records, which leads HAProxy to try the next query type. Today, 0yvind reported that the weave DNS server actually answers with an NX response, preventing HAProxy from failing over to the next query type (this is by design). Jan, a fellow HAProxy user, already reported to me that ANY query types are less and less fashionable (for many reasons I'm not going to develop here).

Among the many ways to fix this issue, the one below has my preference: a new resolvers section directive (a flag in that case) which prevents HAProxy from sending an ANY query type to the nameservers in this section, i.e. "option dont-send-any-qtype".

Another option would be to make HAProxy fail over to the next query type in case of an NX response. This would also cover the case where a server returns an NX because no records exist.

Any comments are welcome.

Baptiste
Re: Contribution for HAProxy: Peer Cipher based SSL CTX switching
Hey Willy,

One small comment. As of openssl v1.0.2 it actually supports loading multiple certificates with different chains. It requires calling SSL_CTX_add0_chain_cert (or SSL_CTX_add1_chain_cert, the exact difference can be found in the man page) instead of SSL_CTX_add_extra_chain_cert. I've actually hacked this in using a quick and dirty method, but haven't gotten around to fixing it properly: https://github.com/haproxy/haproxy-1.5/compare/master...RobinGeuze:master

Besides that I agree with the idea of letting openssl do the actual cipher selection and such, since it keeps ssl specific logic out of the haproxy code base, and openssl also does some specific checks that would need to be copied to haproxy (for example it checks whether the cipher fingerprint matches a bunch of safari versions and then disables ECDSA as an option since it was broken in those safari versions).

-Robin

On 8/25/2015 16:36, Willy Tarreau wrote:

Hi guys,

Yesterday Emeric and I brainstormed on this subject in the office. Emeric brought on the table some cases which couldn't be reliably covered anymore, and proposed a slightly different approach which finally convinced me. I'll try to summarize here our long conversation and we'd like to get some feedback on the various proposals.

A bit of background first.

1) it is currently possible to load multiple certs for the same domain in the config, and in this case, the first one loaded will be considered, regardless of its key type (RSA, DSA, ECDSA).

2) in crt-lists, it is possible to write filters which override the cert's CN and alt names. Similarly, these ones are considered in their declaration order so that the first matching one is used.

3) crt-lists support wildcards with exclusion. The cert registration works like this:
   a) the cert is declared: foo.pem *.mydomain.com !mail.mydomain.com
   b) the cert is loaded and an SSL_CTX is created.
   c) an entry is added into the wildcard tree at .mydomain.com with a reference to the SSL_CTX.
   d) another entry is added into the FQDN tree at mail.mydomain.com with a negation flag set (to reflect the ! in the filter).

   Lookup method works like this:
   1) lookup for SNI name in FQDN tree, and skip entries with the negation flag. If found, return this SSL_CTX.
   2) lookup for the SNI's domain in the wildcard tree. If found without a negation on the FQDN, then return it. (note there's currently a small bug there but that's a different story and out of scope for this brainstorming).
   3) otherwise return default cert if not strict-sni.

4) currently, we have only one cert chain per SSL_CTX. OCSP Stapling applies to an SSL_CTX as well since it relates to a single cert.

5) a number of cert-specific configuration items can be found in auxiliary files named based on the cert file's name, such as .issuer, .ocsp, .sctl, maybe .pwd later, and all are loaded relative to a single SSL_CTX, whether it makes sense or not for the future. For example, OCSP entries are per-SSL_CTX while they should become per-certificate.

The latest proposal introduces some problems above when there's some overlapping between certs, because each time a certificate is loaded, a lookup in the tree will be performed to try to locate other instances of the same name(s) and update the corresponding SSL_CTX with the new cert. But if there are multiple instances, it doesn't work anymore.

Also even without this, another issue is that names may not match exactly. For example, a hosting provider could be adding a new ECDSA cert for one customer only while the original RSA cert covers multiple customers. This already introduces a problem since the ECDSA cert would have to be added to the same SSL_CTX as the first one, thus would be presented for all other domains. It's possible of course to try to detect such configurations, but if we're realistic, they're going to be the most common ones, because new certs with multiple names will probably not reflect obsolete domains just like they may introduce new names. So it doesn't seem a good approach longterm-wise.

Emeric's proposal consists in adopting a slightly different approach. Since the first cause of the trouble relates to matching the correct SSL_CTX when loading the second cert, and the second problem relates to name matching differences between certs in the same SSL_CTX, there are two important things to keep in mind:
- certificates to be presented together must be loaded together
- the SSL_CTX only matches a combination of names

Thus the idea would be that when a certificate is loaded (either on the crt line or from the crt-list file), instead of loading only one cert and trying to match it against another one, better load all possible certs at once in the same SSL_CTX. This means that the RSA/DSA/ECDSA cert names must
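
The three-step lookup method quoted in Willy's background notes can be sketched with plain dictionaries standing in for the FQDN and wildcard trees. This is an illustration only, not HAProxy's data structures: the function name is hypothetical, contexts are represented by strings, and it assumes that a negated FQDN entry records which context it excludes.

```python
def pick_context(sni, fqdn_tree, wildcard_tree, default_ctx, strict_sni=False):
    """Select a certificate context for an SNI name.

    fqdn_tree:     {fqdn: [(ctx, negated), ...]}
    wildcard_tree: {".domain": [ctx, ...]}
    """
    entries = fqdn_tree.get(sni, [])
    # 1) exact FQDN match, skipping entries carrying the negation flag
    for ctx, negated in entries:
        if not negated:
            return ctx
    # 2) wildcard match on the SNI's domain, unless negated for this FQDN
    _, dot, rest = sni.partition(".")
    for ctx in wildcard_tree.get(dot + rest, []):
        if not any(neg and c == ctx for c, neg in entries):
            return ctx
    # 3) fall back to the default cert unless strict-sni is set
    return None if strict_sni else default_ctx


# Registration of: foo.pem *.mydomain.com !mail.mydomain.com
wildcards = {".mydomain.com": ["foo.pem"]}
fqdns = {"mail.mydomain.com": [("foo.pem", True)]}
```

With these trees, www.mydomain.com resolves to the foo.pem context through the wildcard, while mail.mydomain.com is excluded by the negation and falls through to the default (or to nothing under strict-sni).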
Re: Server IP resolution using DNS in HAProxy
Hey,

I don't understand the necessity of the hold valid config option. DNS has something that takes care of this for you, called the TTL. Besides, if hold valid is shorter than the TTL it would be kind of pointless, since the resolvers you are querying won't re-resolve until the TTL expires.

Tbh I don't really see the point of configuring the resolvers in haproxy when the OS has perfectly fine working facilities for this? What is the benefit besides possibly causing lookups to happen twice, once from the OS resolving stack and once from haproxy's? If you really want exactly the same behavior as described you could always configure a local resolver that queries multiple other resolvers instead of recursing itself.

-Robin-

Marco Corte wrote on 7/15/2015 08:28:

On 14/07/2015 22:11, Baptiste wrote:

- when parsing the configuration, HAProxy uses libc functions and resolvers provided by the operating system = if the server can't be resolved at this step, then HAProxy can't start

[...]

First, we want to fix the error when HAProxy fails starting up because the resolvers pointed to by the system can't resolve a server's IP address (but HAProxy resolvers could). The idea here would be to create a new flag on the server to tell HAProxy which IP to use. The server would be enabled when the IP has been provided by the expected tool.

Hi, Baptiste.

Since I am used to IP addresses, I cannot figure out all possible implications of the server name DNS resolution :-)

IMHO HAProxy should start in any case if the configuration is valid; only the unresolvable items should be marked as disabled or failing or down or whatever. A wrong DNS entry could stop an otherwise perfectly working configuration. Why not provide an option to start haproxy even if not all servers can be resolved?

Your proposal of the init-addr could be useful for a trick: I can set a surely unreachable address to let haproxy start and then force/wait for the name resolution to have a working server.

An NX server state would be very nice.

.marcoc
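
Robin's point about hold valid versus the TTL can be checked with a tiny simulation. It assumes a simplified model that is not taken from HAProxy: the client polls every `hold_valid` seconds through a caching resolver that only refetches a record once its TTL has expired, and the record changes upstream at `change_at` seconds. The function name is hypothetical.

```python
def seconds_until_change_seen(ttl, hold_valid, change_at):
    """Time at which a polling client first sees the updated record."""
    cached_at = None
    cached_value = None
    t = 0
    while True:
        # The caching resolver refetches only when its entry has expired.
        if cached_at is None or t - cached_at >= ttl:
            cached_value = "new" if t >= change_at else "old"
            cached_at = t
        if cached_value == "new":
            return t
        t += hold_valid  # next poll by the client


fast_polling = seconds_until_change_seen(ttl=300, hold_valid=3, change_at=10)
slow_polling = seconds_until_change_seen(ttl=300, hold_valid=300, change_at=10)
short_ttl = seconds_until_change_seen(ttl=3, hold_valid=3, change_at=10)
```

Under these assumptions, polling every 3 seconds against a 300 second TTL notices the change no sooner than polling every 300 seconds; only lowering the TTL itself helps. The picture differs if the configured nameserver is authoritative or ignores TTLs, which this model does not cover.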
Re: Server IP resolution using DNS in HAProxy
Hey Nenad,

Actually a local resolver can take care of that for you as well, since every resolver I know allows configuring a different destination on a per-domain basis. Also, as described in the first email, the server has to be resolvable via the OS resolving stack as well, otherwise haproxy won't start. This means you cannot use custom domains without configuring some sort of custom resolver anyway.

-Robin-

Nenad Merdanovic wrote on 7/15/2015 08:56:

Hello Robin,

On 07/15/2015 08:49 AM, Robin Geuze wrote:

Tbh I don't really see the point of configuring the resolvers in haproxy when the OS has perfectly fine working facilities for this? What is the benefit besides possibly causing lookups to happen twice, once from the OS resolving stack and once from haproxy's? If you really want exactly the same behavior as described you could always configure a local resolver that queries multiple other resolvers instead of recursing itself.

Because this would perfectly integrate with things like Consul (https://www.consul.io/docs/agent/dns.html), which are currently very widely used to provide service discovery.

-Robin-

Regards,
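
The per-domain forwarding Robin mentions boils down to a longest-suffix match on the queried name. The sketch below shows only that selection step; the function name, the zone table and the upstream addresses are illustrative assumptions (8600 is Consul's usual DNS port, and the vra entry reuses the nameserver from the earlier resolvers thread).

```python
def pick_upstream(name, forward_zones, default):
    """Choose the upstream resolver for a name by longest matching zone suffix."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then progressively shorter suffixes.
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in forward_zones:
            return forward_zones[zone]
    return default


forward_zones = {
    "consul": "127.0.0.1:8600",  # assumed local Consul agent
    "vra": "172.16.0.2:53",      # internal zone from the resolvers thread
}
upstream = pick_upstream("web.service.consul", forward_zones, "8.8.8.8:53")
```

A local resolver configured this way would serve both the OS resolving stack at startup and any later lookups, which is the setup Robin argues makes a separate resolvers section redundant.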