Hi everyone! I assume I am misunderstanding something, but I cannot figure out what it is. We are using haproxy in AWS, in this case as sidecars to applications so they need not know about changing backend addresses at all, but can always talk to localhost.
Haproxy listens on localhost and then forwards traffic to an ELB instance. This works great, but there have now been two occasions where, due to a change in the ELB's IP addresses, our services went down because the backends could no longer be reached. I don't understand why haproxy sticks to the old IP address instead of going to one of the updated ones.

There is a resolvers section which points to the local dnsmasq instance (there to send some requests to consul, but that's not used here). All other traffic is forwarded on to the AWS DNS server set via DHCP.

I managed to get timely updates and updated backend servers when using server-template, but from what I understand this should not really be necessary for this.

This is the trimmed-down sidecar config. I have not made any changes to DNS timeouts etc.

    resolvers default
      # dnsmasq
      nameserver local 127.0.0.1:53

    listen regular
      bind 127.0.0.1:9300
      option dontlog-normal
      server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check addr loadbalancer-internal.xxx.yyy port 9300

    listen templated
      bind 127.0.0.1:9200
      option dontlog-normal
      option httpchk /haproxy-simple-healthcheck
      server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299

To simulate changing ELB addresses, I added entries for loadbalancer-internal.xxx.yyy in /etc/hosts, to be able to control them via dnsmasq.

I tried different scenarios, but could not reliably predict what would happen in all cases.

The address ending in 52 (marked as "valid" below) is a currently (as of the time of testing) valid IP for the ELB. The one ending in 199 (marked "invalid") is an unused private IP address in my VPC.

Starting with /etc/hosts:

    10.205.100.52  loadbalancer-internal.xxx.yyy # valid
    10.205.100.199 loadbalancer-internal.xxx.yyy # invalid

haproxy starts and reports:

    regular:   lb-internal    UP/L7OK
    templated: lb-internal1   DOWN/L4TOUT
               lb-internal2   UP/L7OK

That's expected.
Now when I edit /etc/hosts to _only_ contain the _invalid_ address and restart dnsmasq, I would expect both proxies to go fully down. But only the templated proxy behaves like that:

    regular:   lb-internal    UP/L7OK
    templated: lb-internal1   DOWN/L4TOUT
               lb-internal2   MAINT (resolution)

Reloading haproxy in this state leads to:

    regular:   lb-internal    DOWN/L4TOUT
    templated: lb-internal1   MAINT (resolution)
               lb-internal2   DOWN/L4TOUT

After fixing /etc/hosts to include the valid server again and restarting dnsmasq:

    regular:   lb-internal    DOWN/L4TOUT
    templated: lb-internal1   UP/L7OK
               lb-internal2   DOWN/L4TOUT

Shouldn't the regular proxy also recognize the change and bring the backend up or down depending on the DNS change? I have waited for several health check rounds (seeing "* L4TOUT" and "L4TOUT" toggle), but it still never updates.

I also tried having _only_ the invalid address in /etc/hosts and then restarting haproxy. The regular backend never recognizes it when I add the valid one back in. The templated one does, _unless_ I set it up to have only 1 instead of 2 server slots. In that case it will also only pick up the valid server when reloaded. On the other hand, it _will_ recognize on the next health check, without a reload, when I remove the valid server, but it will _not_ bring it back in and make the proxy UP when the valid server comes back.

I assume my understanding of something here is broken, and I would gladly be told about it :)

Thanks a lot!
Daniel

Version Info:
------------------

    $ haproxy -vv
    HA-Proxy version 1.8.19-1ppa1~trusty 2019/02/12
    Copyright 2000-2019 Willy Tarreau <wi...@haproxy.org>

    Build options :
      TARGET  = linux2628
      CPU     = generic
      CC      = gcc
      CFLAGS  = -O2 -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
      OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1

    Default settings :
      maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

    Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
    Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
    OpenSSL library supports TLS extensions : yes
    OpenSSL library supports SNI : yes
    OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
    Built with Lua version : Lua 5.3.1
    Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
    Encrypted password support via crypt(3): yes
    Built with multi-threading support.
    Built with PCRE version : 8.31 2012-07-06
    Running on PCRE version : 8.31 2012-07-06
    PCRE library supports JIT : no (libpcre build without JIT?)
    Built with zlib version : 1.2.8
    Running on zlib version : 1.2.8
    Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
    Built with network namespace support.

    Available polling systems :
      epoll : pref=300, test result OK
      poll : pref=200, test result OK
      select : pref=150, test result OK
    Total: 3 (3 usable), will use epoll.

    Available filters :
      [SPOE] spoe
      [COMP] compression
      [TRACE] trace

-- 
Daniel Schneller
Principal Cloud Engineer

CenterDevice GmbH
Rheinwerkallee 3
53227 Bonn
www.centerdevice.com
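
For anyone reproducing the test: one way to see which address haproxy currently holds for each server, independent of the health check status, is the `show servers state` command on the stats socket. The following is only a minimal Python sketch and not part of the setup described above. It assumes a stats socket is enabled at the hypothetical path /var/run/haproxy.sock; the helper names and the sample output are made up for illustration. The parser takes its column names from the "#" header line, so it should not depend on the exact set of columns a given haproxy version prints.

```python
import socket

# Hypothetical sample of `show servers state` output, trimmed to a few
# columns for readability; real output carries more columns per line.
SAMPLE = """1
# be_id be_name srv_id srv_name srv_addr srv_op_state
3 regular 1 lb-internal 10.205.100.52 2
4 templated 1 lb-internal1 10.205.100.199 0
4 templated 2 lb-internal2 10.205.100.52 2
"""


def query_haproxy(command, sock_path="/var/run/haproxy.sock"):
    """Send one command to the stats socket and return the reply as text.

    sock_path is an assumption; it must match a `stats socket` line
    in the global section of the running configuration.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # haproxy closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def parse_servers_state(text):
    """Return one dict per server line, keyed by the '#' header columns."""
    columns = None
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue  # format version line that precedes the header
        servers.append(dict(zip(columns, line.split())))
    return servers


def current_addresses(text):
    """Map (proxy name, server name) to the address haproxy reports."""
    return {
        (srv["be_name"], srv["srv_name"]): srv["srv_addr"]
        for srv in parse_servers_state(text)
    }


# Demonstration on the sample text only; no socket is contacted here.
for (proxy, server), addr in sorted(current_addresses(SAMPLE).items()):
    print(proxy, server, addr)
```

Against a live instance, comparing `current_addresses(query_haproxy("show servers state"))` before and after restarting dnsmasq would show whether the address stored for lb-internal in the regular proxy changed at all, or whether only the check result did.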