Hi!

Thanks for the response. I had looked at the "hold" directives, but since they
all seem to have reasonable defaults, I did not touch them. I specified 10s
explicitly, but it did not make a difference.
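
For reference, the resolvers section used in these tests looks roughly like
this (the explicit 10s mentioned above went on "hold valid", as suggested;
everything else is left at its default):

    resolvers default
        nameserver local 127.0.0.1:53
        # set explicitly; made no observable difference
        hold valid 10s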

I did some more tests, however, and it seems to have more to do with the
number of responses to the initial(?) DNS queries. Hopefully these three
tables make sense and don't get mangled in the mail. The "templated" proxy is
defined via "server-template" with 3 "slots", the "regular" one just as
"server".


Test 1: Start out with both "valid" and "broken" DNS entries. Then comment
out/add back one at a time as described in (1)-(5). Each time after changing
/etc/hosts, I restarted dnsmasq and checked haproxy via hatop. Haproxy was
started fresh once dnsmasq was set up as in (1).
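
The per-step procedure was roughly this (service commands may differ on other
systems, and the hatop socket path is just an example):

    # comment the BRK / VALID lines in or out in /etc/hosts, then:
    sudo service dnsmasq restart
    # watch the server states:
    hatop -s /var/run/haproxy.sock   # socket path is just an example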

                       |  state           state
            /etc/hosts |  regular         templated
           ------------|------------------------------------
(1)         BRK        |  UP/L7OK         DOWN/L4TOUT
            VALID      |                  MAINT/resolution
                       |                  UP/L7OK
           ------------|------------------------------------
(2)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
            #VALID     |                  MAINT/resolution
                       |                  MAINT/resolution
           ------------|------------------------------------
(3)         #BRK       |  UP/L7OK         UP/L7OK
            VALID      |                  MAINT/resolution
                       |                  MAINT/resolution
           ------------|------------------------------------
(4)         BRK        |  UP/L7OK         UP/L7OK
            VALID      |                  DOWN/L4TOUT
                       |                  MAINT/resolution
           ------------|------------------------------------
(5)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
            #VALID     |                  MAINT/resolution
                       |                  MAINT/resolution
                  
                      
This all looks normal and as expected. As soon as the "VALID" DNS entry is
present, the UP state follows within a few seconds.
                              


Test 2: Start out "valid only" (1) and proceed as described in (2)-(5), again
restarting dnsmasq each time; haproxy was reloaded once dnsmasq was set up as
in (1).

                       |  state           state
            /etc/hosts |  regular         templated
           ------------|------------------------------------
(1)         #BRK       |  UP/L7OK         MAINT/resolution
            VALID      |                  MAINT/resolution
                       |                  UP/L7OK
           ------------|------------------------------------
(2)         BRK        |  UP/L7OK         DOWN/L4TOUT
            VALID      |                  MAINT/resolution
                       |                  UP/L7OK
           ------------|------------------------------------
(3)         #BRK       |  UP/L7OK         MAINT/resolution
            VALID      |                  MAINT/resolution
                       |                  UP/L7OK
           ------------|------------------------------------
(4)         BRK        |  UP/L7OK         DOWN/L4TOUT
            VALID      |                  MAINT/resolution
                       |                  UP/L7OK
           ------------|------------------------------------
(5)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
            #VALID     |                  MAINT/resolution
                       |                  MAINT/resolution                      
                  
                              
Everything good here, too. Adding the broken DNS entry does not bring the
proxies down until only the broken one is left.



Test 3: Start out "broken only" (1). Again, same as before; haproxy was
restarted once dnsmasq was initialized as in (1).

                       |  state           state
            /etc/hosts |  regular         templated
           ------------|------------------------------------
(1)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
            #VALID     |                  MAINT/resolution
                       |                  MAINT/resolution
           ------------|------------------------------------      
(2)         BRK        |  DOWN/L4TOUT     UP/L7OK
            VALID      |                  MAINT/resolution
                       |                  MAINT/resolution
           ------------|------------------------------------      
(3)         #BRK       |  UP/L7OK         MAINT/resolution
            VALID      |                  UP/L7OK
                       |                  MAINT/resolution
           ------------|------------------------------------      
(4)         BRK        |  UP/L7OK         DOWN/L4TOUT
            VALID      |                  UP/L7OK
                       |                  MAINT/resolution
           ------------|------------------------------------      
(5)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
            #VALID     |                  MAINT/resolution
                       |                  MAINT/resolution                      
                  


Here it becomes interesting. In (1) both the regular and the templated proxy
are DOWN, of course. However, adding in a second DNS response in (2) brings
the templated proxy UP, but the regular one stays DOWN. Only when in (3) the
valid response is the only one presented does it go UP as well. Adding the
broken one back (4) is of no consequence then. And again, after leaving just
the broken response (5), both correctly go DOWN.

So it would appear that if haproxy starts with just a single "broken" DNS
response, adding a healthy one later on is not recognized. Instead, it stays
DOWN. "Replacing" the single broken response with a single "valid" response,
however, brings it to life, and it won't be discouraged by bringing the broken
one back in.

Tests 1 and 2 make sense to me, but test 3 I don't understand. For now, I have
worked around the issue by defining all my relevant backends with
server-template and at least 2 slots, but I would still like to understand it.
And maybe it is a bug, after all ;)

Kind regards, and thanks for a great piece of software!

Daniel





> On 21. Mar 2019, at 14:28, Bruno Henc <brh...@nua-avenir.net> wrote:
> 
> Hello Daniel,
> 
> 
> You might be missing the hold-valid directive in your resolvers section: 
> https://www.haproxy.com/documentation/hapee/1-9r1/onepage/#5.3.2-timeout
> 
> This should force HAProxy to fetch the DNS record values from the resolver.
> 
> A reload of the HAProxy instance also forces the instances to query all 
> records from the resolver.
> 
> Can you please retest with the updated configuration and report back the 
> results?
> 
> 
> Best regards,
> 
> Bruno Henc
> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, March 21, 2019 12:09 PM, Daniel Schneller 
> <daniel.schnel...@centerdevice.com> wrote:
> 
>> Hello!
>> 
>> Friendly bump :)
>> I'd be willing to amend the documentation once I understand what's going on 
>> :D
>> 
>> Cheers,
>> Daniel
>> 
>>> On 18. Mar 2019, at 20:28, Daniel Schneller 
>>> daniel.schnel...@centerdevice.com wrote:
>>> Hi everyone!
>>> I assume I am misunderstanding something, but I cannot figure out what it 
>>> is.
>>> We are using haproxy in AWS, in this case as sidecars to applications so 
>>> they need not
>>> know about changing backend addresses at all, but can always talk to 
>>> localhost.
>>> Haproxy listens on localhost and then forwards traffic to an ELB instance.
>>> This works great, but there have been two occasions now, where due to a 
>>> change in the
>>> ELB's IP addresses, our services went down, because the backends could not 
>>> be reached
>>> anymore. I don't understand why haproxy sticks to the old IP address 
>>> instead of going
>>> to one of the updated ones.
>>> There is a resolvers section which points to the local dnsmasq instance 
>>> (there to send
>>> some requests to consul, but that's not used here). All other traffic is 
>>> forwarded on
>>> to the AWS DNS server set via DHCP.
>>> I managed to get timely updates and updated backend servers when using
>>> server-template, but from what I understand this should not really be
>>> necessary for this.
>>> This is the trimmed down sidecar config. I have not made any changes to dns 
>>> timeouts etc.
>>> resolvers default
>>>     nameserver local 127.0.0.1:53   # dnsmasq
>>> 
>>> listen regular
>>>     bind 127.0.0.1:9300
>>>     option dontlog-normal
>>>     server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check addr loadbalancer-internal.xxx.yyy port 9300
>>> 
>>> listen templated
>>>     bind 127.0.0.1:9200
>>>     option dontlog-normal
>>>     option httpchk /haproxy-simple-healthcheck
>>>     server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299
>>> To simulate changing ELB addresses, I added entries for
>>> loadbalancer-internal.xxx.yyy to /etc/hosts so that I could control them
>>> via dnsmasq.
>>> I tried different scenarios, but could not reliably predict what would 
>>> happen in all cases.
>>> The address ending in 52 (marked as "valid" below) is a currently (as of 
>>> the time of testing)
>>> valid IP for the ELB. The one ending in 199 (marked "invalid") is an unused 
>>> private IP address
>>> in my VPC.
>>> Starting with /etc/hosts:
>>> 10.205.100.52 loadbalancer-internal.xxx.yyy # valid
>>> 10.205.100.199 loadbalancer-internal.xxx.yyy # invalid
>>> haproxy starts and reports:
>>> regular: lb-internal UP/L7OK
>>> templated: lb-internal1 DOWN/L4TOUT
>>> lb-internal2 UP/L7OK
>>> That's expected. Now when I edit /etc/hosts to only contain the invalid 
>>> address
>>> and restart dnsmasq, I would expect both proxies to go fully down. But only 
>>> the templated
>>> proxy behaves like that:
>>> regular: lb-internal UP/L7OK
>>> templated: lb-internal1 DOWN/L4TOUT
>>> lb-internal2 MAINT (resolution)
>>> Reloading haproxy in this state leads to:
>>> regular: lb-internal DOWN/L4TOUT
>>> templated: lb-internal1 MAINT (resolution)
>>> lb-internal2 DOWN/L4TOUT
>>> After fixing /etc/hosts to include the valid server again and restarting 
>>> dnsmasq:
>>> regular: lb-internal DOWN/L4TOUT
>>> templated: lb-internal1 UP/L7OK
>>> lb-internal2 DOWN/L4TOUT
>>> Shouldn't the regular proxy also recognize the change and bring the backend 
>>> up or down
>>> depending on the DNS change? I have waited for several health check rounds
>>> (seeing "* L4TOUT" and "L4TOUT" toggle), but it still never updates.
>>> I also tried to have only the invalid address in /etc/hosts, then 
>>> restarting haproxy.
>>> The regular backends will never recognize it when I add the valid one back 
>>> in.
>>> The templated one does, unless I set it up to have only 1 instead of 2
>>> server slots. In that case it will also only pick up the valid server when
>>> reloaded.
>>> On the other hand, it will recognize on the next health check when I
>>> remove the valid server without a reload, but it will not bring it back in
>>> and mark the proxy UP when it comes back.
>>> I assume my understanding of something here is broken, and I would gladly 
>>> be told
>>> about it :)
>>> Thanks a lot!
>>> Daniel
>>> 
>>> Version Info:
>>> 
>>> --------------
>>> 
>>> $ haproxy -vv
>>> HA-Proxy version 1.8.19-1ppa1~trusty 2019/02/12
>>> Copyright 2000-2019 Willy Tarreau wi...@haproxy.org
>>> Build options :
>>> TARGET = linux2628
>>> CPU = generic
>>> CC = gcc
>>> CFLAGS = -O2 -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 
>>> -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-strict-aliasing 
>>> -Wdeclaration-after-statement -fwrapv -Wno-unused-label
>>> OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 
>>> USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
>>> Default settings :
>>> maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>> Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
>>> Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
>>> OpenSSL library supports TLS extensions : yes
>>> OpenSSL library supports SNI : yes
>>> OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
>>> Built with Lua version : Lua 5.3.1
>>> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
>>> IP_FREEBIND
>>> Encrypted password support via crypt(3): yes
>>> Built with multi-threading support.
>>> Built with PCRE version : 8.31 2012-07-06
>>> Running on PCRE version : 8.31 2012-07-06
>>> PCRE library supports JIT : no (libpcre build without JIT?)
>>> Built with zlib version : 1.2.8
>>> Running on zlib version : 1.2.8
>>> Compression algorithms supported : identity("identity"), 
>>> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
>>> Built with network namespace support.
>>> Available polling systems :
>>> epoll : pref=300, test result OK
>>> poll : pref=200, test result OK
>>> select : pref=150, test result OK
>>> Total: 3 (3 usable), will use epoll.
>>> Available filters :
>>> [SPOE] spoe
>>> [COMP] compression
>>> [TRACE] trace
>>> --
>>> Daniel Schneller
>>> Principal Cloud Engineer
>>> CenterDevice GmbH
>>> Rheinwerkallee 3
>>> 53227 Bonn
>>> www.centerdevice.com
>>> 
>>> Managing directors: Dr. Patrick Peschlow, Dr. Lukas Pustina, Michael Rosbach;
>>> Commercial register no.: HRB 18655, Register court: Bonn, VAT ID: DE-815299431
>>> This e-mail, including any attached files, contains confidential and/or
>>> legally protected information. If you are not the intended recipient or
>>> have received this e-mail in error, please inform the sender immediately
>>> and delete this e-mail and any attached files at once. The unauthorized
>>> copying, use, or opening of any attached files, as well as the unauthorized
>>> forwarding of this e-mail, is not permitted.
> 
> 

