Re: Possible bug with haproxy 1.6.9/1.7.0: multiproc + resolvers cause DNS timeouts

2016-11-28 Thread Joshua M. Boniface
Sorry, here is my haproxy -vv output as well:

| u...@elb2.domain.net ~ $ sudo haproxy -vv 
| HA-Proxy version 1.7.0-1 2016/11/27
| Copyright 2000-2016 Willy Tarreau <wi...@haproxy.org>
| 
| Build options :
|   TARGET  = linux2628
|   CPU     = generic
|   CC      = gcc
|   CFLAGS  = -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
|   OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_NS=1
| 
| Default settings :
|   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200 
| 
| Encrypted password support via crypt(3): yes 
| Built with zlib version : 1.2.8
| Running on zlib version : 1.2.8
| Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
| Built with OpenSSL version : OpenSSL 1.0.2j  26 Sep 2016
| Running on OpenSSL version : OpenSSL 1.0.2j  26 Sep 2016
| OpenSSL library supports TLS extensions : yes 
| OpenSSL library supports SNI : yes 
| OpenSSL library supports prefer-server-ciphers : yes 
| Built with PCRE version : 8.35 2014-04-04
| Running on PCRE version : 8.35 2014-04-04
| PCRE library supports JIT : no (USE_PCRE_JIT not set)
| Built with Lua version : Lua 5.3.1
| Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
| Built with network namespace support
| 
| Available polling systems :
|       epoll : pref=300,  test result OK
|        poll : pref=200,  test result OK
|      select : pref=150,  test result OK
| Total: 3 (3 usable), will use epoll.
| 
| Available filters :
|     [COMP] compression
|     [TRACE] trace
|     [SPOE] spoe

Thanks,
Joshua M. Boniface
Linux System Ærchitect - Boniface Labs
Sigmentation fault: core dumped

On 29/11/16 02:17 AM, Joshua M. Boniface wrote:
> Hello list!
> 
> I believe I've found a bug in haproxy related to multiproc and a set of DNS
> resolvers. When I combine these two features (multiproc and dynamic
> resolvers), the DNS resolvers (one per process, it seems) fail
> intermittently and independently for no obvious reason, and each failure
> triggers a DOWN event in the backend; a short time later, resolution
> succeeds and the backend goes back UP for a short time, and the cycle
> repeats indefinitely. In a dual-stack setup the bug also has the curious
> effect of switching the active record type from A to AAAA and back to A
> repeatedly. The test below shows that the bug occurs in an IPv4-only
> environment as well, so this A/AAAA switching does not show up in my tests.
> 
> First, some background. I'm attempting to set up an haproxy instance with
> multiple processes for SSL termination. At the same time, I'm also trying to
> use an IPv6 backend managed by DNS, so I set up a "resolvers" section so I
> could use resolved IPv6 addresses from AAAA records. As an aside, I've
> noticed that haproxy will not start up if the only record for a host is an
> AAAA record, reporting that the address can't be resolved. However, since I
> normally run a dual-stack [A + AAAA] record setup, this is not a huge deal to
> me, though I think supporting IPv6/AAAA-only backends should definitely be
> a future goal!
> 
> First, my config (figure 1). This is the most basic config I can construct
> that triggers the bug while keeping most of my important settings; note that
> the host resolves to an A record only (figure 2). This record is provided by
> a dnsmasq process which has read it out of /etc/hosts, so the resolution here
> should be 100% stable.
> 
> (figure 1)
> | global
> |     log ::1:514 daemon debug
> |     log-send-hostname
> |     chroot /var/lib/haproxy
> |     pidfile /run/haproxy/haproxy.pid
> |     nbproc 2
> |     cpu-map 1 0
> |     cpu-map 2 1
> |     stats socket /var/lib/haproxy/admin-1.sock mode 660 level admin process 1
> |     stats socket /var/lib/haproxy/admin-2.sock mode 660 level admin process 2
> |     stats timeout 30s
> |     user haproxy
> |     group haproxy
> |     daemon
> |     maxconn 1
> | resolvers dns
> |     nameserver dnsmasq 127.0.0.1:53
> |     resolve_retries 1
> |     hold valid 1s
> |     hold timeout 1s
> |     timeout retry 1s
> | defaults
> |     log global
> |     option  http-keep-alive
> |     option  forwardfor except 127.0.0.0/8
> |     option  redispatch
> |     option  dontlognull
> |     option  forwardfor
> |     timeout connect 5s
> |     timeout client  24h
> |     timeout server  60m
> | listen back_deb-http
> |
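Since figure 1 gives each process its own stats socket, the per-process flapping described above can be watched from the outside. The following Python 3 sketch is only an illustration (it is not part of the original report or of haproxy): it assumes read/write access to both sockets (mode 660, so root or a member of the haproxy group) and that haproxy answers one command on a non-interactive stats connection and then closes it. It sends "show stat" to each socket and prints the status column of every server row; the helper names, the 60-poll default and the 1-second interval are arbitrary choices, and no server names are assumed since the listen section above is cut off.

```python
import csv
import io
import socket
import time

# Per-process stats sockets, as declared in figure 1 ("process 1" / "process 2").
SOCKETS = {
    1: "/var/lib/haproxy/admin-1.sock",
    2: "/var/lib/haproxy/admin-2.sock",
}


def parse_show_stat(text):
    """Map (proxy, server) -> status from the CSV returned by "show stat"."""
    lines = text.splitlines()
    # The CSV header is prefixed with "# "; drop it so DictReader gets clean names.
    if lines and lines[0].startswith("# "):
        lines[0] = lines[0][2:]
    statuses = {}
    for row in csv.DictReader(io.StringIO("\n".join(lines))):
        # FRONTEND/BACKEND rows are aggregates; keep only individual servers.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        statuses[(row["pxname"], row["svname"])] = row["status"]
    return statuses


def show_stat(path, timeout=2.0):
    """Send one "show stat" command and read until haproxy closes the socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(b"show stat\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def watch(count=60, interval=1.0):
    """Print every server's status as seen by each process, count times."""
    for i in range(count):
        if i:
            time.sleep(interval)
        stamp = time.strftime("%H:%M:%S")
        for proc, path in SOCKETS.items():
            try:
                statuses = parse_show_stat(show_stat(path))
            except OSError as exc:
                # A process may be briefly unreachable; report it and keep polling.
                print(f"{stamp} proc={proc} unreachable: {exc}")
                continue
            for (proxy, server), status in sorted(statuses.items()):
                print(f"{stamp} proc={proc} {proxy}/{server} {status}")
```

For example, calling watch(count=120) from a python3 session running as root would print about two minutes of per-process status lines, which should make it easy to see whether process 1 and process 2 mark the server DOWN at different moments.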

Possible bug with haproxy 1.6.9/1.7.0: multiproc + resolvers cause DNS timeouts

2016-11-28 Thread Joshua M. Boniface
ious thing I see in the DNS pcap is requests for AAAA records that don't
exist. I don't know whether that's the cause, though it would explain the
A/AAAA record swapping I mentioned earlier.
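The DNS payloads in the strace output further down can be decoded to check what is being asked and answered. The sketch below is a hypothetical helper written for illustration, using only the Python standard library: it parses the 12-byte header and the first question of an RFC 1035 message, without support for name compression (these payloads do not use it). The byte strings are pasted from the strace lines; strace and Python share the same octal escape syntax here, so they work unchanged as bytes literals.

```python
import struct

# Payloads copied verbatim from the strace output below.
REPLY_0155 = b"\320)\201\200\0\1\0\0\0\0\0\0\3deb\6domain\3net\0\0\34\0\1"
QUERY_0157 = b"\223\311\1\0\0\1\0\0\0\0\0\0\3deb\6domain\3net\0\0\34\0\1"
REPLY_0157 = b"n\256\201\200\0\1\0\0\0\0\0\0\3deb\6domain\3net\0\0\34\0\1"

QTYPES = {1: "A", 28: "AAAA"}
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def decode(msg):
    """Decode the RFC 1035 header and first question of a DNS message."""
    qid, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])
    # QNAME is a sequence of length-prefixed labels ending with a zero byte.
    labels, pos = [], 12
    while msg[pos] != 0:
        length = msg[pos]
        labels.append(msg[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    # QTYPE and QCLASS follow the terminating zero byte.
    qtype, qclass = struct.unpack("!HH", msg[pos + 1:pos + 5])
    return {
        "id": f"0x{qid:04x}",
        "response": bool(flags & 0x8000),  # QR bit: set on replies
        "rcode": RCODES.get(flags & 0x000F, flags & 0x000F),
        "answers": ancount,
        "qname": ".".join(labels),
        "qtype": QTYPES.get(qtype, qtype),
    }


for name, payload in [("reply 01:46:55", REPLY_0155),
                      ("query 01:46:57", QUERY_0157),
                      ("reply 01:46:57", REPLY_0157)]:
    print(name, decode(payload))
```

Decoded this way, the reply read at 01:46:55 is a NOERROR response with zero answers to a type-28 (AAAA) query for deb.domain.net, i.e. the resolver reporting that no AAAA record exists. The same helper also exposes the query IDs: the query sent at 01:46:57 has ID 0x93c9, while the reply read from the same socket right after it has ID 0x6eae.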

I've seen this bug both on haproxy 1.6.9 (from the Debian jessie-backports
repo) and on my own self-built 1.7.0 package; all the testing above was on
1.7.0. Please let me know if I can provide any further information!

Thanks,
Joshua M. Boniface


haproxy.pcap
Description: application/vnd.tcpdump.pcap
Process 29888 attached
01:46:54 epoll_wait(0, {}, 200, 983)= 0
01:46:55 epoll_wait(0, {{EPOLLIN, {u32=4, u64=4}}}, 200, 0) = 1
01:46:55 recvfrom(4, "\320)\201\200\0\1\0\0\0\0\0\0\3deb\6domain\3net\0\0\34\0\1", 512, 0, NULL, NULL) = 32
01:46:55 recvfrom(4, 0x7ffe07e72510, 512, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:55 sendto(4, "\320)\1\0\0\1\0\0\0\0\0\0\3deb\6domain\3net\0\0\34\0\1", 32, 0, NULL, 0) = 32
01:46:55 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 1
01:46:55 fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
01:46:55 setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
01:46:55 connect(1, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.9.0.13")}, 16) = -1 EINPROGRESS (Operation now in progress)
01:46:55 epoll_wait(0, {{EPOLLIN, {u32=4, u64=4}}}, 200, 0) = 1
01:46:55 recvfrom(1, 0x563209ab8204, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:55 getsockopt(1, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
01:46:55 sendto(1, "GET /debian/haproxy HTTP/1.0\r\n\r\n", 32, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:55 recvfrom(4, 0x7ffe07e72510, 512, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:55 epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLOUT, {u32=1, u64=1}}) = 0
01:46:55 epoll_wait(0, {{EPOLLOUT, {u32=1, u64=1}}}, 200, 1000) = 1
01:46:55 getsockopt(1, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
01:46:55 sendto(1, "GET /debian/haproxy HTTP/1.0\r\n\r\n", 32, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 32
01:46:55 epoll_ctl(0, EPOLL_CTL_MOD, 1, {EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}) = 0
01:46:55 epoll_wait(0, {{EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}}, 200, 999) = 1
01:46:55 recvfrom(1, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 16384, 0, NULL, NULL) = 243
01:46:55 close(1)   = 0
01:46:55 epoll_wait(0, {}, 200, 998)= 0
01:46:56 epoll_wait(0, {}, 200, 0)  = 0
01:46:56 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 1
01:46:56 fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
01:46:56 setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
01:46:56 connect(1, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.9.0.13")}, 16) = -1 EINPROGRESS (Operation now in progress)
01:46:56 epoll_wait(0, {{EPOLLIN, {u32=4, u64=4}}}, 200, 0) = 1
01:46:56 recvfrom(1, 0x563209ab8204, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:56 getsockopt(1, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
01:46:56 sendto(1, "GET /debian/haproxy HTTP/1.0\r\n\r\n", 32, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 32
01:46:56 recvfrom(4, 0x7ffe07e72510, 512, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:56 epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}) = 0
01:46:56 epoll_wait(0, {{EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}}, 200, 999) = 1
01:46:56 recvfrom(1, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 16384, 0, NULL, NULL) = 243
01:46:56 close(1)   = 0
01:46:56 epoll_wait(0, {}, 200, 998)= 0
01:46:57 epoll_wait(0, {}, 200, 0)  = 0
01:46:57 sendto(4, "\223\311\1\0\0\1\0\0\0\0\0\0\3deb\6domain\3net\0\0\34\0\1", 32, 0, NULL, 0) = 32
01:46:57 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 1
01:46:57 fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
01:46:57 setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
01:46:57 connect(1, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.9.0.13")}, 16) = -1 EINPROGRESS (Operation now in progress)
01:46:57 epoll_wait(0, {{EPOLLIN, {u32=4, u64=4}}}, 200, 0) = 1
01:46:57 recvfrom(1, 0x563209ab8204, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:57 getsockopt(1, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
01:46:57 sendto(1, "GET /debian/haproxy HTTP/1.0\r\n\r\n", 32, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 32
01:46:57 recvfrom(4, "n\256\201\200\0\1\0\0\0\0\0\0\3deb\6domain\3net\0\0\34\0\1", 512, 0, NULL, NULL) = 32
01:46:57 recvfrom(4, 0x7ffe07e72510, 512, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
01:46:57 epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}) = 0
01:46:57 epoll_wait(0, {{EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}}, 200, 1000) = 1
01:46:57 recvfrom(1, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 16384, 0, NULL, NULL) = 243
01:46:57 close(1)   = 0
01: