Re: Haproxy 1.8.4 crashing workers and increased memory usage

2018-04-09 Thread Robin Geuze

Hey,

Won't that be a bit pointless since we don't use threads?

Regards,

Robin Geuze


On 4/9/2018 10:31, Илья Шипицин wrote:

Can you try ThreadSanitizer (in real time)?

https://github.com/google/sanitizers/wiki#threadsanitizer


I'd like to try it myself; however, we do not observe any problems in our
environment.










Re: Haproxy 1.8.4 crashing workers and increased memory usage

2018-04-09 Thread Robin Geuze

Hey Willy,

So I made a build this morning with libslz and re-enabled compression 
and within an hour we had the exit code 134 errors, so zlib does not 
seem to be the problem here.


Regards,

Robin Geuze


On 4/7/2018 00:30, Willy Tarreau wrote:

Hi Robin,

On Fri, Apr 06, 2018 at 03:52:33PM +0200, Robin Geuze wrote:

Hey Willy,

I was actually the one that had the hunch to disable compression. I
suspected that this was the issue because there was a bunch of "abort" calls
in "include/common/hathreads.h", which is used by the compression stuff.
However I just noticed those aborts are actually only there if DEBUG_THREAD
is defined which it doesn't seem to be for our build. So basically, I have
no clue whatsoever why disabling compression fixes the bug.

At least I don't feel alone :-)


I can see next week if we can make a build with slz instead of zlib (we seem
to be linked against zlib/libz atm).

Thank you, I appreciate it!

Cheers,
Willy





Re: Haproxy 1.8.4 crashing workers and increased memory usage

2018-04-06 Thread Robin Geuze

Hey Willy,

I was actually the one that had the hunch to disable compression. I 
suspected that this was the issue because there was a bunch of "abort" 
calls in "include/common/hathreads.h", which is used by the compression 
stuff. However I just noticed those aborts are actually only there if 
DEBUG_THREAD is defined which it doesn't seem to be for our build. So 
basically, I have no clue whatsoever why disabling compression fixes the 
bug.


I can see next week if we can make a build with slz instead of zlib (we 
seem to be linked against zlib/libz atm).


Regards,

Robin Geuze


On 4/6/2018 14:18, Willy Tarreau wrote:

Hi Frank,

On Fri, Apr 06, 2018 at 10:53:36AM +, Frank Schreuder wrote:

We tested haproxy 1.8.6 with compression enabled today, within the first few 
hours it already went wrong:
[ALERT] 095/120526 (12989) : Current worker 5241 exited with code 134

OK thanks, and sorry for that.


Our other balancer running haproxy 1.8.5 with compression disabled is still
running fine after 2 days with the same workload.
So there seems to be a locking issue when compression is enabled.

Well, an issue with compression, but I'm really not seeing what makes
you speak about locking since :
   - you don't seem to have threads enabled
   - locking issues generally cause deadlocks, not aborts

The other problem is that we noticed already that there are very few
abort() calls in haproxy and none of them in this area. So it's very
possible that it comes from another layer detecting an issue provoked
by compression. Typically the libc's malloc/free can stop the program
using abort() if they detect a corruption.
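The worker in Frank's log exited with code 134. Assuming the master reports a signal-killed worker with the common shell convention of 128 + signal number (an assumption, not something stated in this thread), 134 decodes to signal 6, SIGABRT on Linux, which fits the abort() theory above. A minimal Python sketch; `decode_exit_code` is a hypothetical helper:

```python
import signal

def decode_exit_code(code):
    """Map a shell-style exit code to the signal that killed the process.

    Assumes the 128 + N convention; returns None for ordinary exit codes.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128)
    except ValueError:
        # 128 + N where N is not a known signal number on this platform
        return None

print(decode_exit_code(134) == signal.SIGABRT)  # True on Linux, where SIGABRT is 6
```

A backtrace is still needed to know *who* called abort(); the exit code only confirms that something did.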

It would really help to know where this abort() happens, at least to
get a backtrace.

By the way, are you using zlib or slz? zlib uses a tricky allocator.
I checked it again yesterday and it was made thread safe, but we couldn't
rule out an issue there. slz, however, doesn't need to allocate memory. If you're on
zlib, switching to slz could also indicate if the problem is related to
these memory allocations or not.

Thanks,
Willy






Re: "Odd" behaviour on resolvers

2016-03-04 Thread Robin Geuze
During the config check, the resolver portion of haproxy is not yet active, so the check uses the system resolver. Once haproxy is actually running, it uses its own resolvers.
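To see what the config check will see, one can query the system resolver directly. A minimal Python sketch, assuming `socket.getaddrinfo` goes through the same libc path (and therefore /etc/resolv.conf) that haproxy uses while parsing; `resolvable_at_startup` is a hypothetical helper, not part of haproxy:

```python
import socket

def resolvable_at_startup(host, port):
    """Return True if the system (libc) resolver can resolve host.

    Any haproxy `resolvers` section is ignored here: only the OS
    resolver configuration is consulted, as during the config check.
    """
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False

print(resolvable_at_startup("127.0.0.1", 8013))  # True
```

With the original /etc/resolv.conf below, a name such as php-staging.vra would presumably return False here even though the `discovery` nameserver knows it.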
On Mar 4, 2016 16:26, "Arnaud B."  wrote:
  
Hi there,

First of all : I am very fond of HAProxy :-)

I was trying to do some service discovery with bind9 and HAProxy
when I found an odd behaviour on the resolvers part.

Here are some config samples :
My frontend and backend and resolvers config:
resolvers discovery
   nameserver jabba 172.16.0.2:53
   resolve_retries 5
   timeout retry 1s
   hold valid 3s

frontend staging_frontend
   mode http
   bind 127.0.0.1:8013
   default_backend staging_backend

backend staging_backend
   server jabba php-staging.vra:8013 check resolvers discovery


A sample dig:
$ dig @172.16.0.2 php-staging.vra +short
172.16.0.2


So far, everything seems fine to me. But when I reload haproxy's
config:
mars 04 15:55:13 jabba haproxy[25262]: [ALERT] 063/155513 (25262) : parsing [/etc/haproxy/haproxy.cfg:98] : 'server jabba' : invalid address: 'php-staging.vra' in ...
mars 04 15:55:13 jabba haproxy[25262]: [ALERT] 063/155513 (25262) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
mars 04 15:55:13 jabba haproxy[25262]: [ALERT] 063/155513 (25262) : Fatal errors found in configuration.

I first questioned my configuration and checked everything out ...
There was no obvious issue.

I checked my /etc/resolv.conf file and saw that my server
wasn't using the same resolver. Its resolver did not have any
knowledge whatsoever of php-staging.vra, which caused my
configuration to fail on reload.
My /etc/resolv.conf was:
nameserver 8.8.8.8
nameserver 213.186.33.99

and became:
nameserver 172.16.0.2
search vra
nameserver 8.8.8.8
nameserver 213.186.33.99

permitting haproxy's reload.

My question is pretty simple : since there is a resolvers section in
haproxy's configuration, is it possible to use it despite the
server's resolv.conf?

Let me know if something's unclear in my explanation :)
  



Re: [PATCH] MEDIUM: dns: Don't use the ANY query type

2015-10-21 Thread Robin Geuze

Hey guys,

Actually when you get an NXDOMAIN reply you can just stop resolving that 
domain. Basically there are 2 types of "negative" replies in DNS:


NODATA: basically this is when you don't get an error (NOERROR in dig), 
but not the actual data you are looking for. You might have gotten some 
CNAME data but no A or AAAA record (depending on what you wanted 
obviously). This means that the actual domain name does exist, but 
doesn't have data of the type you requested. The term NODATA is used in 
DNS RFCs but it doesn't actually have its own error code.


NXDOMAIN: This is denoted by the NXDOMAIN error code. It means that 
either the domain you requested itself or the last target domain from a 
CNAME does not exist at all (i.e. no data whatsoever) and there also isn't 
a wildcard available that matches it. So if you asked for an A record, 
getting an NXDOMAIN means there also won't be an AAAA record.


The above explanation is a bit of an oversimplification because there are 
also things like empty non-terminals which also don't have any data, but 
instead of an NXDOMAIN actually return a NODATA (in most cases; there 
are some authoritative servers that don't do it properly). But the end 
result is that you can pretty much say that when you get NXDOMAIN, there 
really is nothing there for you, so you can just stop looking (at least 
at the current server).
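The rule of thumb above can be written as a small decision function. A minimal Python sketch: only the RCODE values (0 NOERROR, 3 NXDOMAIN) come from the DNS standards; the `Reply` type is a hypothetical stand-in for a parsed response, and the empty non-terminal and broken-authoritative cases mentioned above are not modelled:

```python
from dataclasses import dataclass, field

NOERROR = 0
NXDOMAIN = 3  # RCODE 3, "Name Error" in RFC 1035

@dataclass
class Reply:
    """Hypothetical stand-in for a parsed DNS response."""
    rcode: int
    answers: list = field(default_factory=list)  # (rrtype, data) tuples

def classify(reply, qtype):
    """Return 'answer', 'nodata', 'nxdomain' or 'error' for a reply to qtype."""
    if reply.rcode == NXDOMAIN:
        # The name (or the last CNAME target) does not exist at all.
        return "nxdomain"
    if reply.rcode != NOERROR:
        return "error"  # e.g. SERVFAIL: says nothing about the name itself
    if any(rrtype == qtype for rrtype, _ in reply.answers):
        return "answer"
    # NOERROR but nothing of the requested type (maybe just a CNAME).
    return "nodata"

def try_other_types(reply, qtype):
    """NODATA: the name exists, so another type may. NXDOMAIN: stop."""
    return classify(reply, qtype) == "nodata"

cname_only = Reply(NOERROR, [("CNAME", "target.example.")])
print(classify(cname_only, "AAAA"), try_other_types(cname_only, "AAAA"))  # nodata True
print(classify(Reply(NXDOMAIN), "A"), try_other_types(Reply(NXDOMAIN), "A"))  # nxdomain False
```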


-Robin-

On 10/20/2015 10:25 PM, Willy Tarreau wrote:

On Tue, Oct 20, 2015 at 10:20:50PM +0200, Baptiste wrote:

On Tue, Oct 20, 2015 at 9:09 PM, Lukas Tribus  wrote:

I don't know. I'm always only focused on the combination of user-visible
changes and risks of bugs (which are user-visible changes btw). So if we
can do it without breaking too much code, then it can be backported. What
we have now is something which is apparently insufficient to some users
so we can improve the situation. I wouldn't want to remove prefer-* or
change the options behavior or whatever for example.

Ok, if we don't remove existing prefer-* keywords a 1.6 backport sounds
possible without user visible breakage, great.

lukas

Ok, just to make it clear, let me write a few conf examples:
- server home-v4 home-v4.mydomain check resolve-prefer ipv4
  => A then AAAA (failover on NX)
- server home-v4 home-v4.mydomain check v4only
  => A only (stop on NX)

If both 'resolve-prefer ipv[46]' and 'v[46]only' are set, whatever
combination, then, v[46]only applies, but configuration parsing may
return a warning.

Yes, but please avoid the warning, it makes it inconvenient to edit
configs. You may for example have "resolve-prefer ipv4" in the
default-server directive, and having it warn because one of your
servers has v4only is annoying. BTW, the v4only and resolve-prefer
should also be used during the initial resolving phase performed
by getaddrinfo() but that's for a future patch :-)

Willy
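Baptiste's two examples, plus his rule that a `v[46]only` setting wins over `resolve-prefer`, can be sketched as a query plan. This Python illustration follows the proposal as written in this thread; `v4only`/`v6only` are proposed keywords rather than existing options, `query_plan` is hypothetical, and the ipv6 default mirrors the "implicit default (AAAA/IPv6)" mentioned in Andrew's patch:

```python
def query_plan(prefer="ipv6", only=None):
    """Return the DNS query types to send, in order.

    prefer: resolve-prefer value, "ipv4" or "ipv6".
    only:   proposed v4only/v6only setting ("ipv4", "ipv6" or None);
            when set it overrides prefer, as proposed above.
    """
    order = {"ipv4": ["A", "AAAA"], "ipv6": ["AAAA", "A"]}
    if only is not None:
        # "A only (stop on NX)": a single query type, no failover.
        return [order[only][0]]
    # "A then AAAA (failover on NX)": try the preferred family first.
    return order[prefer]

print(query_plan(prefer="ipv4"))               # ['A', 'AAAA']
print(query_plan(prefer="ipv4", only="ipv4"))  # ['A']
print(query_plan(prefer="ipv6", only="ipv4"))  # ['A']
```

Letting `only` silently win, instead of warning, matches Willy's point about `default-server` lines carrying `resolve-prefer`.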







Re: [PATCH] MEDIUM: dns: Don't use the ANY query type

2015-10-20 Thread Robin Geuze

Hey Willy,

Recursors are not required to recurse when serving an ANY query. ANY 
query means that you ask a server (either recursor or auth) for 
everything it has on label x. If it has a CNAME on that label just 
returning that is a valid response (just like would happen if you 
queried for the CNAME type at label x). However when you ask for an A or 
AAAA record a recursor is required to follow the CNAME. Welcome to the 
wonderful world of DNS which doesn't really make sense anymore to anyone ;).


As said in the other mail thread, ANY queries are just a very 
unreliable way to get the records/types you want. Just asking for the 
actual types, if necessary in multiple queries, is the way to go. DNS is 
(usually) fast enough that the one extra query really shouldn't matter 
that much.
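The difference between an ANY query and an A query against a CNAME can be simulated with a toy zone. A minimal Python sketch, assuming a recursor that answers ANY with only the records at the queried label (allowed, as described above) but follows CNAMEs for specific types; the zone data, `lookup` and `addresses` are hypothetical:

```python
ZONE = {
    "api.example.": [("CNAME", "lb.example.")],
    "lb.example.": [("A", "192.0.2.10"), ("AAAA", "2001:db8::10")],
}

def lookup(name, qtype, max_chain=8):
    """Toy recursor: return the answer section for (name, qtype)."""
    answers = []
    for _ in range(max_chain):
        records = ZONE.get(name, [])
        if qtype == "ANY":
            # Everything at this label; following the CNAME is not required.
            return answers + records
        matches = [r for r in records if r[0] == qtype]
        if matches:
            return answers + matches
        cnames = [r for r in records if r[0] == "CNAME"]
        if not cnames:
            return answers  # nothing of this type at the end of the chain
        answers.append(cnames[0])  # follow the CNAME, keep it in the answer
        name = cnames[0][1]
    return answers

def addresses(name, qtypes=("A", "AAAA")):
    """Ask for each address type explicitly, one query per type."""
    result = []
    for qtype in qtypes:
        result += [data for rrtype, data in lookup(name, qtype) if rrtype == qtype]
    return result

print(lookup("api.example.", "ANY"))  # [('CNAME', 'lb.example.')]
print(addresses("api.example."))      # ['192.0.2.10', '2001:db8::10']
```

The second query costs one extra round trip but does not depend on how the recursor chooses to handle ANY.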


-Robin-

On 10/20/2015 8:49 AM, Willy Tarreau wrote:

Hi Andrew,

On Mon, Oct 19, 2015 at 05:39:58PM -0500, Andrew Hayworth wrote:

The ANY query type is weird, and some resolvers don't 'do the legwork'
of resolving useful things like CNAMEs. Given that upstream resolver
behavior is not always under the control of the HAProxy administrator,
we should not use the ANY query type. Rather, we should use A or AAAA
according to either the explicit preferences of the operator, or the
implicit default (AAAA/IPv6).

But how does that fix the problem for you ? In your example below,
the server clearly doesn't provide any A nor AAAA in the response
so asking it for A or AAAA should not work either if it doesn't
recurse, am I wrong ?


   PRODUCTION! ahaywo...@secret-hostname.com:~$ dig @10.11.12.53 ANY api.somestartup.io

   ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

   ;; QUESTION SECTION:
   ;api.somestartup.io.            IN      ANY

   ;; ANSWER SECTION:
   api.somestartup.io.     20      IN      CNAME   api-somestartup-production.ap-southeast-2.elb.amazonaws.com.

(...)

I fear that such a change will prevent CNAMEs from working for many
users where the DNS servers work fine, and will not necessarily fix
the problems for other people.

Regards,
willy







Re: [call to comment] HAProxy's DNS resolution default query type

2015-10-15 Thread Robin Geuze

Hey Baptiste,

Using ANY queries for this kind of stuff is considered by most people to 
be a bad practice since besides all the things you named it can lead to 
incomplete responses. Basically a resolver is allowed to just return 
whatever it has in cache when it receives an ANY query instead of 
actually doing an ANY query at the authoritative nameserver. Thus if it 
only received queries for an A record before you do an ANY query you 
will not get an AAAA record even if it is actually available since the 
resolver doesn't have it in its cache. Even worse, if before it only got 
MX queries, you won't get either A or AAAA.


Currently I don't know of any resolver that actually behaves this way, 
but it is allowed as per the DNS-related RFCs, so using ANY queries 
might at some point lead to really weird results.
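The cache effect described above can be illustrated with a toy caching resolver. A minimal Python sketch, assuming a resolver that answers ANY purely from its cache (the behaviour described as permitted) and ignoring TTLs; `CachingResolver` and the records are hypothetical:

```python
class CachingResolver:
    """Toy recursor that answers ANY purely from its cache (hypothetical)."""

    def __init__(self, authoritative):
        self.authoritative = authoritative  # {(name, rrtype): [data, ...]}
        self.cache = {}

    def query(self, name, qtype):
        if qtype == "ANY":
            # Permitted shortcut: return whatever is cached for this name
            # instead of forwarding the ANY query to the authoritative server.
            return sorted((rrtype, data)
                          for (cached_name, rrtype), values in self.cache.items()
                          if cached_name == name
                          for data in values)
        key = (name, qtype)
        if key not in self.cache:
            # Cache miss for a specific type: ask the authoritative server.
            self.cache[key] = self.authoritative.get(key, [])
        return [(qtype, data) for data in self.cache[key]]

auth = {("www.example.", "A"): ["192.0.2.1"],
        ("www.example.", "AAAA"): ["2001:db8::1"]}
resolver = CachingResolver(auth)
resolver.query("www.example.", "A")           # only A has been asked so far
print(resolver.query("www.example.", "ANY"))  # [('A', '192.0.2.1')]: no AAAA
```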


-Robin-

On 10/15/2015 4:35 PM, Baptiste wrote:

Hey guys,

By default, HAProxy tries to resolve server IPs using an ANY query
type, then fails over to the resolve-prefer type, then to the "remaining"
type.
So ANY -> A -> AAAA or ANY -> AAAA -> A.

In some cases, the ANY query type is ignored or the response contains no
records, which leads HAProxy to try the next query type.
Today, 0yvind reported that the weave DNS server actually answers with an
NX response, preventing HAProxy from failing over to the next query type (this
is by design).

Jan, a fellow HAProxy user, already reported to me that ANY query types
are falling out of fashion (for many reasons I'm not going to develop
here).

Among the many ways to fix this issue, the one below has my preference:
  A new resolvers section directive (a flag in this case) which prevents
HAProxy from sending an ANY query type to the nameservers in this
section, i.e. "option dont-send-any-qtype".

Another option would be to make HAProxy fail over to the next query type
in case of an NX response.
This would also cover the case where a server returns an NX because no
AAAA records exist.

Any comments are welcome.

Baptiste






Re: [call to comment] HAProxy's DNS resolution default query type

2015-10-15 Thread Robin Geuze
Actually, I just asked one of the PowerDNS devs, and their 
recursor/resolver implementation does actually only return what is in 
its cache when answering an ANY query.











Re: Contribution for HAProxy: Peer Cipher based SSL CTX switching

2015-08-25 Thread Robin Geuze

Hey Willy,

One small comment. As of OpenSSL v1.0.2, it actually supports loading 
multiple certificates with different chains. It requires calling 
SSL_CTX_add0_chain_cert (or SSL_CTX_add1_chain_cert; the exact 
difference can be found in the man page) instead of 
SSL_CTX_add_extra_chain_cert. I've actually hacked this in using a quick 
and dirty method, but haven't gotten around to fixing it properly: 
https://github.com/haproxy/haproxy-1.5/compare/master...RobinGeuze:master


Besides that, I agree with the idea of letting OpenSSL do the actual 
cipher selection and such, since it keeps SSL-specific logic out of the 
haproxy code base, and OpenSSL also does some specific checks that would 
otherwise need to be copied to haproxy (for example it checks whether the 
cipher fingerprint matches a bunch of Safari versions and then disables 
ECDSA as an option since it was broken in those Safari versions).


-Robin

On 8/25/2015 16:36, Willy Tarreau wrote:

Hi guys,

Yesterday Emeric and I brainstormed on this subject in the office. Emeric
brought to the table some cases which couldn't be reliably covered anymore,
and proposed a slightly different approach which finally convinced me.

I'll try to summarize here our long conversation and we'd like to get some
feedback on the various proposals.

A bit of background first.

1) it is currently possible to load multiple certs for the same domain in
the config, and in this case, the first one loaded will be considered,
regardless of its key type (RSA, DSA, ECDSA).

2) in crt-lists, it is possible to write filters which override the cert's CN
and alt names. Similarly, these ones are considered in their declaration
order so that the first matching one is used.

3) crt-lists support wildcards with exclusion. The cert registration works
like this :

   a) the cert is declared :

        foo.pem *.mydomain.com !mail.mydomain.com

   b) the cert is loaded and an SSL_CTX is created.

   c) an entry is added into the wildcard tree at .mydomain.com with
      a reference to the SSL_CTX.

   d) another entry is added into the FQDN tree at mail.mydomain.com
      with a negation flag set (to reflect the ! in the filter).

   Lookup method works like this :

   1) lookup for SNI name in FQDN tree, and skip entries with the
      negation flag. If found, return this SSL_CTX.

   2) lookup for the SNI's domain in the wildcard tree. If found without
      a negation on the FQDN, then return it. (note there's currently a
      small bug there but that's a different story and out of scope for
      this brainstorming).

   3) otherwise return default cert if not strict-sni.

4) currently, we have only one cert chain per SSL_CTX. OCSP Stapling
applies to an SSL_CTX as well since it relates to a single cert.

5) a number of cert-specific configuration items can be found in auxiliary
files named based on the cert file's name, such as .issuer, .ocsp,
.sctl, maybe .pwd later, and all are loaded relative to a single
SSL_CTX, whether it makes sense or not for the future. For example,
OCSP entries are per-SSL_CTX while they should become per-certificate.
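The registration and lookup order described in item 3 can be modelled with two dictionaries standing in for the FQDN and wildcard trees. A minimal Python sketch, assuming plain dicts instead of haproxy's trees, string names instead of real SSL_CTX objects, and ignoring the small bug mentioned there; `register` and `sni_lookup` are hypothetical:

```python
def register(fqdn_tree, wild_tree, ctx, filters):
    """Register one crt-list line, e.g. ["*.mydomain.com", "!mail.mydomain.com"]."""
    for flt in filters:
        negated = flt.startswith("!")
        name = flt[1:] if negated else flt
        if name.startswith("*."):
            # Wildcards are keyed on the domain part, e.g. ".mydomain.com".
            wild_tree.setdefault(name[1:], []).append((ctx, negated))
        else:
            fqdn_tree.setdefault(name, []).append((ctx, negated))

def sni_lookup(fqdn_tree, wild_tree, sni, default_ctx, strict_sni=False):
    entries = fqdn_tree.get(sni, [])
    # 1) exact FQDN match, skipping entries that carry the negation flag
    for ctx, negated in entries:
        if not negated:
            return ctx
    # 2) wildcard match on the SNI's domain, unless negated for this FQDN
    excluded = {ctx for ctx, negated in entries if negated}
    dot = sni.find(".")
    if dot != -1:
        for ctx, negated in wild_tree.get(sni[dot:], []):
            if not negated and ctx not in excluded:
                return ctx
    # 3) default cert, unless strict-sni is set
    return None if strict_sni else default_ctx

fqdn, wild = {}, {}
register(fqdn, wild, "foo.pem", ["*.mydomain.com", "!mail.mydomain.com"])
print(sni_lookup(fqdn, wild, "www.mydomain.com", "default.pem"))   # foo.pem
print(sni_lookup(fqdn, wild, "mail.mydomain.com", "default.pem"))  # default.pem
```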


The latest proposal introduces some problems with the above when certs
overlap, because each time a certificate is loaded, a
lookup in the tree will be performed to try to locate other instances of
the same name(s) and update the corresponding SSL_CTX with the new cert.
But if there are multiple instances, it doesn't work anymore.

Also even without this, another issue is that names may not match exactly.
For example, a hosting provider could be adding a new ECDSA cert for one
customer only while the original RSA cert covers multiple customers. This
already introduces a problem since the ECDSA cert would have to be added
to the same SSL_CTX as the first one, thus would be presented for all
other domains.

It's possible of course to try to detect such configurations, but if we're
realistic, they're going to be the most common ones, because new certs
with multiple names will probably not reflect obsolete domains, just like
they may introduce new names. So it doesn't seem a good approach long-term.

Emeric's proposal consists in adopting a slightly different approach.

Since the first cause of the trouble relates to matching the correct
SSL_CTX when loading the second cert, and the second problem relates
to name matching differences between certs in the same SSL_CTX, there
are two important things to keep in mind :

   - certificates to be presented together must be loaded together
   - the SSL_CTX only matches a combination of names

Thus the idea would be that when a certificate is loaded (either on the
crt line or from the crt-list file), instead of loading only one cert
and trying to match it against another one, better load all possible
certs at once in the same SSL_CTX.
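One possible reading of the two constraints above, sketched in Python: certificates whose name sets are identical (for instance an RSA and an ECDSA cert issued for the same names) are loaded together into one context, and that context answers only for exactly that combination of names. Keying on identical name sets is an assumption made here for illustration, since the message is cut off before the exact rule; `group_certs` and the file names are hypothetical:

```python
from collections import defaultdict

def group_certs(certs):
    """Group certificate files that cover exactly the same names.

    certs: {filename: (key_type, [names])}
    Returns {frozenset(names): {key_type: filename}}; each group would be
    loaded at once into a single SSL_CTX.
    """
    groups = defaultdict(dict)
    for filename, (key_type, names) in sorted(certs.items()):
        bucket = groups[frozenset(names)]
        if key_type in bucket:
            raise ValueError(f"two {key_type} certs for the same names: "
                             f"{bucket[key_type]} and {filename}")
        bucket[key_type] = filename
    return dict(groups)

certs = {
    "shared.rsa.pem": ("RSA", ["a.example", "b.example"]),
    "shared.ecdsa.pem": ("ECDSA", ["a.example", "b.example"]),
    "a.ecdsa.pem": ("ECDSA", ["a.example"]),  # one customer only
}
for names, members in group_certs(certs).items():
    print(sorted(names), members)
```

In the hosting-provider example above, the single-customer ECDSA cert ends up in its own context and is therefore never presented for b.example.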

This means that the RSA/DSA/ECDSA cert names must 

Re: Server IP resolution using DNS in HAProxy

2015-07-15 Thread Robin Geuze

Hey,

I don't understand the necessity of the hold valid config option. DNS 
has something that takes care of this for you called the TTL. Besides if 
hold valid is shorter than the TTL it would be kind of pointless since 
the resolvers you are querying won't re-resolve until the TTL expires.
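The TTL argument can be checked with a small simulation: a caching recursor only refetches a record once its TTL has expired, so polling it more often than the TTL hardly surfaces a change any sooner. A minimal Python sketch, assuming a single recursor that first caches the record at t=0 and an haproxy that queries it every `hold` seconds; `first_seen` is hypothetical:

```python
def first_seen(ttl, hold, change_at, horizon=10_000):
    """Return the first poll time at which haproxy sees the new record.

    ttl:       TTL of the record as cached by the recursor, in seconds
    hold:      interval between haproxy's queries (the "hold valid" idea)
    change_at: time at which the authoritative record changes
    """
    if hold <= 0:
        raise ValueError("hold must be positive")
    cached = None
    expires = 0
    t = 0
    while t <= horizon:
        if t >= expires:
            # Cache empty or expired: the recursor refetches upstream.
            cached = "new" if t >= change_at else "old"
            expires = t + ttl
        if cached == "new":
            return t
        t += hold
    return None

for hold in (1, 3, 30):
    print(hold, first_seen(ttl=20, hold=hold, change_at=1))
# 1 20
# 3 21
# 30 30
```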


Tbh I don't really see the point of configuring the resolvers in haproxy 
when the OS has perfectly fine working facilities for this. What is the 
benefit, besides possibly causing lookups to happen twice, once from the 
OS resolving stack and once from haproxy's? If you really want exactly 
resolver that queries multiple other resolvers instead of recursing itself.


-Robin-

Marco Corte wrote on 7/15/2015 08:28:

On 14/07/2015 22:11, Baptiste wrote:

- when parsing the configuration, HAProxy uses libc functions and
resolvers provided by the operating system = if the server can't be
resolved at this step, then HAProxy can't start

[...]
 First, we want to fix the error when HAProxy fails to start because
 the resolvers pointed to by the system can't resolve a server's IP
 address (but HAProxy resolvers could).
 The idea here would be to create a new flag on the server to tell HAProxy
 which IP to use. The server would be enabled when the IP has been
 provided by the expected tool.


Hi, Baptiste.

Since I am used to IP addresses, I cannot figure out all the possible 
implications of the server name DNS resolution :-)


IMHO HAProxy should start in any case if the configuration is valid; 
only the unresolvable items should be marked as disabled or failing or 
down or whatever.

A wrong DNS entry could stop an otherwise perfectly working configuration.

Why not provide an option to start haproxy even if not all servers 
can be resolved?


Your proposal of the init-addr could be useful for a trick: I can 
set a surely unreachable address to let haproxy start and then 
force/wait for the name resolution to have a working server.


An NX server state would be very nice.

.marcoc






Re: Server IP resolution using DNS in HAProxy

2015-07-15 Thread Robin Geuze

Hey Nenad,

Actually a local resolver can take care of that for you as well since 
every resolver I know allows configuring a different destination on a 
per-domain basis. Also, as described in the first email, the server has to be 
resolvable via the OS resolving stack as well otherwise haproxy won't 
start. This means you cannot use custom domains without configuring some 
sort of custom resolver anyway.
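The per-domain forwarding described above amounts to a longest-suffix match on the query name. A minimal Python sketch of that selection; the table is an example assumption (a Consul agent on 127.0.0.1:8600, Consul's default DNS port, plus the 172.16.0.2 nameserver from earlier in this digest), and `pick_upstream` is hypothetical rather than any particular resolver's configuration syntax:

```python
FORWARDS = {
    "service.consul.": "127.0.0.1:8600",
    "vra.": "172.16.0.2:53",
    ".": "8.8.8.8:53",  # everything else
}

def pick_upstream(qname, forwards=FORWARDS):
    """Return the upstream for the longest matching domain suffix."""
    name = qname.lower().rstrip(".") + "."
    labels = name.split(".")  # "web.service.consul." -> [..., "consul", ""]
    for i in range(len(labels)):
        # Try "web.service.consul.", then "service.consul.", ..., then "."
        suffix = ".".join(labels[i:]) or "."
        if suffix in forwards:
            return forwards[suffix]
    return None

print(pick_upstream("web.service.consul"))  # 127.0.0.1:8600
print(pick_upstream("php-staging.vra"))     # 172.16.0.2:53
print(pick_upstream("www.example.com."))    # 8.8.8.8:53
```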


-Robin-

Nenad Merdanovic wrote on 7/15/2015 08:56:

Hello Robin,

On 07/15/2015 08:49 AM, Robin Geuze wrote:

Tbh I don't really see the point of configuring the resolvers in haproxy
when the OS has perfectly fine working facilities for this? What is the
benefit besides possibly causing lookups to happen twice, once from the
OS resolving stack and once from haproxy's? If you really want exactly
the same behavior as described you could always configure a local
resolver that queries multiple other resolvers instead of recursing itself.

Because this would perfectly integrate with things like Consul
(https://www.consul.io/docs/agent/dns.html), which are currently very
widely used to provide service discovery.


-Robin-


Regards,