Hello!

I am currently on vacation for two weeks, but I'll see to it when I get back. There is no particular reason for the specific check address here, as you correctly figured; it is just an artefact of the template used to create the configuration. I can remove that, but there might be cases where it matters (though I don't think we have any ATM, AFAIR).

I would not have guessed there would be different resolution paths; if this is intentional, a note in the documentation would be helpful. I can provide that when I am back and once there is clarity on why it behaves like this.
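For reference, dropping the check address would reduce the "regular" proxy from my example config (quoted further down the thread) to something like this; untested, and the "check port" could probably even be omitted, since it matches the server port anyway:

    listen regular
        bind 127.0.0.1:9300
        option dontlog-normal
        # no separate 'addr' for the check anymore; the health check then
        # simply targets whatever address the resolvers supply for the server
        server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check port 9300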
Thank you very much for your help!

Cheers,
Daniel

> On 23. Mar 2019, at 14:53, PiBa-NL <piba.nl....@gmail.com> wrote:
>
> Hi Daniel, Baptiste,
>
> @Daniel, can you remove the 'addr loadbalancer-internal.xxx.yyy' from the server check? It seems to me that that name is not being resolved by the 'resolvers'. And even if it were, it would be kind of redundant, since in the example it is the same as the server name. Not sure how many of the scenarios below are explained by this, though.
>
> @Baptiste, is it intentional that a wrong 'addr' DNS name makes haproxy fail to start despite the supposedly never-failing 'default-server init-addr last,libc,none'? Would it be a good feature request to support re-resolving a DNS name for the 'addr' setting as well?
>
> Regards,
> PiBa-NL (Pieter)
>
> On 21-3-2019 at 20:37, Daniel Schneller wrote:
>> Hi!
>>
>> Thanks for the response. I had looked at the "hold" directives, but since they all seem to have reasonable defaults, I did not touch them. I specified 10s explicitly (see the resolvers snippet after the tables below), but it did not make a difference.
>>
>> I did some more tests, however, and it seems to have more to do with the number of responses to the initial(?) DNS queries. Hopefully these three tables make sense and don't get mangled in the mail. The "templated" proxy is defined via "server-template" with 3 slots; the "regular" one just as "server".
>>
>> Test 1: Start out with both "valid" and "broken" DNS entries, then comment out/add back one at a time as described in (1)-(5). Each time after changing /etc/hosts, restart dnsmasq and check haproxy via hatop. Haproxy was started fresh once dnsmasq was set up for (1).
>>
>>             | state        state
>>  /etc/hosts | regular      templated
>> ------------+-------------------------------
>> (1) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (2) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (3) #BRK    | UP/L7OK      UP/L7OK
>>     VALID   |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (4) BRK     | UP/L7OK      UP/L7OK
>>     VALID   |              DOWN/L4TOUT
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (5) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>>
>> This all looks normal and as expected. As soon as the "VALID" DNS entry is present, the UP state follows within a few seconds.
>>
>> Test 2: Start out "valid only" (1) and proceed as described in (2)-(5), again restarting dnsmasq each time. Haproxy was reloaded after dnsmasq was set up for (1).
>>
>>             | state        state
>>  /etc/hosts | regular      templated
>> ------------+-------------------------------
>> (1) #BRK    | UP/L7OK      MAINT/resolution
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (2) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (3) #BRK    | UP/L7OK      MAINT/resolution
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (4) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (5) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>>
>> Everything good here, too. Adding the broken DNS entry does not bring the proxies down until only the broken one is left.
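>> (For reference, the resolvers section these tests ran against, with the 10s hold setting mentioned above made explicit; roughly:)
>>
>>     resolvers default
>>         nameserver local 127.0.0.1:53
>>         # keep valid responses for 10s; this is the value I set
>>         # explicitly, and it did not change the results above
>>         hold valid 10s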
>> Test 3: Start out "broken only" (1). Again, same as before; haproxy was restarted once dnsmasq was initialized for (1).
>>
>>             | state        state
>>  /etc/hosts | regular      templated
>> ------------+-------------------------------
>> (1) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (2) BRK     | DOWN/L4TOUT  UP/L7OK
>>     VALID   |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (3) #BRK    | UP/L7OK      MAINT/resolution
>>     VALID   |              UP/L7OK
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (4) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              UP/L7OK
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (5) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>>
>> Here it becomes interesting. In (1) both the regular and the templated proxy are DOWN, of course. However, adding a second DNS response in (2) brings the templated proxy UP, but the regular one stays DOWN. Only when in (3) the valid response is the only one presented does it go UP as well. Adding the broken one back in (4) is of no consequence then. And again, after leaving just the broken response (5), both correctly go DOWN.
>>
>> So it would appear that if haproxy starts with just a single "broken" DNS response, adding a healthy one later on is not recognized; instead, it stays DOWN. "Replacing" the single broken response with a single "valid" response, however, brings it to life, and it won't be discouraged by bringing the broken one back in.
>>
>> Tests 1 and 2 make sense to me, but test 3 I don't understand. For now, I have worked around the issue by defining all my relevant backends with server-template and at least 2 slots (roughly as spelled out right below), but I would still like to understand it. And maybe it is a bug, after all ;)
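>> The workaround, spelled out; essentially the "templated" section from my original mail below, just applied to every relevant backend and always with at least 2 slots:
>>
>>     listen templated
>>         bind 127.0.0.1:9200
>>         option dontlog-normal
>>         option httpchk /haproxy-simple-healthcheck
>>         # at least 2 slots; with a single slot the valid address was
>>         # only picked up again after a reload in my tests
>>         server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299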
>> Kind regards, and thanks for a great piece of software!
>>
>> Daniel
>>
>>> On 21. Mar 2019, at 14:28, Bruno Henc <brh...@nua-avenir.net> wrote:
>>>
>>> Hello Daniel,
>>>
>>> You might be missing the 'hold valid' directive in your resolvers section:
>>> https://www.haproxy.com/documentation/hapee/1-9r1/onepage/#5.3.2-timeout
>>>
>>> This should force HAProxy to fetch the DNS record values from the resolver. A reload of the HAProxy instance also forces the instances to query all records from the resolver.
>>>
>>> Can you please retest with the updated configuration and report back the results?
>>>
>>> Best regards,
>>>
>>> Bruno Henc
>>>
>>> ------- Original Message -------
>>> On Thursday, March 21, 2019 12:09 PM, Daniel Schneller <daniel.schnel...@centerdevice.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> Friendly bump :)
>>>> I'd be willing to amend the documentation once I understand what's going on :D
>>>>
>>>> Cheers,
>>>> Daniel
>>>>
>>>>> On 18. Mar 2019, at 20:28, Daniel Schneller <daniel.schnel...@centerdevice.com> wrote:
>>>>>
>>>>> Hi everyone!
>>>>>
>>>>> I assume I am misunderstanding something, but I cannot figure out what it is. We are using haproxy in AWS, in this case as sidecars to applications, so they need not know about changing backend addresses at all but can always talk to localhost. Haproxy listens on localhost and then forwards traffic to an ELB instance.
>>>>>
>>>>> This works great, but there have been two occasions now where, due to a change in the ELB's IP addresses, our services went down because the backends could not be reached anymore. I don't understand why haproxy sticks to the old IP address instead of moving to one of the updated ones.
>>>>>
>>>>> There is a resolvers section which points to the local dnsmasq instance (it is there to send some requests to consul, but that's not used here). All other traffic is forwarded on to the AWS DNS server set via DHCP.
>>>>>
>>>>> I managed to get timely updates and updated backend servers when using server-template, but from what I understand this should not really be necessary for this.
>>>>>
>>>>> This is the trimmed-down sidecar config. I have not made any changes to DNS timeouts etc.:
>>>>>
>>>>>     resolvers default
>>>>>         # dnsmasq
>>>>>         nameserver local 127.0.0.1:53
>>>>>
>>>>>     listen regular
>>>>>         bind 127.0.0.1:9300
>>>>>         option dontlog-normal
>>>>>         server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check addr loadbalancer-internal.xxx.yyy port 9300
>>>>>
>>>>>     listen templated
>>>>>         bind 127.0.0.1:9200
>>>>>         option dontlog-normal
>>>>>         option httpchk /haproxy-simple-healthcheck
>>>>>         server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299
>>>>>
>>>>> To simulate changing ELB addresses, I added entries for loadbalancer-internal.xxx.yyy to /etc/hosts, to be able to control them via dnsmasq. I tried different scenarios, but could not reliably predict what would happen in all cases. The address ending in 52 (marked "valid" below) is a currently (as of the time of testing) valid IP for the ELB. The one ending in 199 (marked "invalid") is an unused private IP address in my VPC.
>>>>>
>>>>> Starting with /etc/hosts:
>>>>>
>>>>>     10.205.100.52   loadbalancer-internal.xxx.yyy   # valid
>>>>>     10.205.100.199  loadbalancer-internal.xxx.yyy   # invalid
>>>>>
>>>>> haproxy starts and reports:
>>>>>
>>>>>     regular:    lb-internal   UP/L7OK
>>>>>     templated:  lb-internal1  DOWN/L4TOUT
>>>>>                 lb-internal2  UP/L7OK
>>>>>
>>>>> That's expected. Now when I edit /etc/hosts to only contain the invalid address and restart dnsmasq, I would expect both proxies to go fully down. But only the templated proxy behaves like that:
>>>>>
>>>>>     regular:    lb-internal   UP/L7OK
>>>>>     templated:  lb-internal1  DOWN/L4TOUT
>>>>>                 lb-internal2  MAINT (resolution)
>>>>>
>>>>> Reloading haproxy in this state leads to:
>>>>>
>>>>>     regular:    lb-internal   DOWN/L4TOUT
>>>>>     templated:  lb-internal1  MAINT (resolution)
>>>>>                 lb-internal2  DOWN/L4TOUT
>>>>>
>>>>> After fixing /etc/hosts to include the valid server again and restarting dnsmasq:
>>>>>
>>>>>     regular:    lb-internal   DOWN/L4TOUT
>>>>>     templated:  lb-internal1  UP/L7OK
>>>>>                 lb-internal2  DOWN/L4TOUT
>>>>>
>>>>> Shouldn't the regular proxy also recognize the change and bring the backend up or down depending on the DNS change? I have waited for several health check rounds (seeing "* L4TOUT" and "L4TOUT" toggle), but it still never updates.
>>>>>
>>>>> I also tried having only the invalid address in /etc/hosts and then restarting haproxy. The regular backend will never recognize it when I add the valid one back in. The templated one does, unless I set it up to have only 1 instead of 2 server slots; in that case it will also only pick up the valid server when reloaded. On the other hand, it will recognize on the next health check when I remove the valid server without a reload, but it will not bring the server back in and mark the proxy UP when the entry comes back.
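>>>>> (For clarity, "only the invalid address" above means commenting out the valid /etc/hosts entry and restarting dnsmasq, which serves the /etc/hosts entries, e.g.:)
>>>>>
>>>>>     #10.205.100.52  loadbalancer-internal.xxx.yyy   # valid, commented out
>>>>>     10.205.100.199  loadbalancer-internal.xxx.yyy   # invalid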
>>>>> I assume my understanding of something here is broken, and I would gladly be told about it :)
>>>>>
>>>>> Thanks a lot!
>>>>> Daniel
>>>>>
>>>>> Version info:
>>>>> --------------
>>>>>
>>>>> $ haproxy -vv
>>>>> HA-Proxy version 1.8.19-1ppa1~trusty 2019/02/12
>>>>> Copyright 2000-2019 Willy Tarreau <wi...@haproxy.org>
>>>>>
>>>>> Build options :
>>>>>   TARGET  = linux2628
>>>>>   CPU     = generic
>>>>>   CC      = gcc
>>>>>   CFLAGS  = -O2 -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
>>>>>   OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
>>>>>
>>>>> Default settings :
>>>>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>>>>
>>>>> Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
>>>>> Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
>>>>> OpenSSL library supports TLS extensions : yes
>>>>> OpenSSL library supports SNI : yes
>>>>> OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
>>>>> Built with Lua version : Lua 5.3.1
>>>>> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
>>>>> Encrypted password support via crypt(3): yes
>>>>> Built with multi-threading support.
>>>>> Built with PCRE version : 8.31 2012-07-06
>>>>> Running on PCRE version : 8.31 2012-07-06
>>>>> PCRE library supports JIT : no (libpcre build without JIT?)
>>>>> Built with zlib version : 1.2.8
>>>>> Running on zlib version : 1.2.8
>>>>> Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
>>>>> Built with network namespace support.
>>>>>
>>>>> Available polling systems :
>>>>>       epoll : pref=300, test result OK
>>>>>        poll : pref=200, test result OK
>>>>>      select : pref=150, test result OK
>>>>> Total: 3 (3 usable), will use epoll.
>>>>>
>>>>> Available filters :
>>>>>     [SPOE] spoe
>>>>>     [COMP] compression
>>>>>     [TRACE] trace
>>>>>
>>>>> --
>>>>> Daniel Schneller
>>>>> Principal Cloud Engineer
>>>>> CenterDevice GmbH
>>>>> Rheinwerkallee 3
>>>>> 53227 Bonn
>>>>> www.centerdevice.com