Hello!

I am currently on vacation for two weeks, but I'll see to it when I get back. There is no particular reason for the specific check address here, as you correctly figured; it is just an artefact of the template used to create the configuration. I can remove that, but there might be cases where it matters (though I don't think we have any ATM, AFAIR).

I would not have guessed there would be different resolution paths; if this is intentional, a note in the documentation would be helpful. I can provide that when I am back and once there is clarity on why it behaves like this.
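For reference, dropping the check address would reduce the "regular" proxy from my example config (quoted further down the thread) to something like this; untested, and the "check port" could probably even be omitted, since it matches the server port anyway:

    listen regular
        bind 127.0.0.1:9300
        option dontlog-normal
        # no separate 'addr' for the check anymore; the health check then
        # simply targets whatever address the resolvers supply for the server
        server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check port 9300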
Thank you very much for your help!

Cheers,
Daniel

> On 23. Mar 2019, at 14:53, PiBa-NL <piba.nl....@gmail.com> wrote:
>
> Hi Daniel, Baptiste,
>
> @Daniel, can you remove the 'addr loadbalancer-internal.xxx.yyy' from the server check? It seems to me that that name is not being resolved by the 'resolvers'. And even if it were, it would be kind of redundant, since in the example it is the same as the server name. Not sure how many of the scenarios below are explained by this, though.
>
> @Baptiste, is it intentional that a wrong 'addr' DNS name makes haproxy fail to start despite the supposedly never-failing 'default-server init-addr last,libc,none'? Would it be a good feature request to support re-resolving a DNS name for the 'addr' setting as well?
>
> Regards,
> PiBa-NL (Pieter)
>
> On 21-3-2019 at 20:37, Daniel Schneller wrote:
>> Hi!
>>
>> Thanks for the response. I had looked at the "hold" directives, but since they all seem to have reasonable defaults, I did not touch them. I specified 10s explicitly (see the resolvers snippet after the tables below), but it did not make a difference.
>>
>> I did some more tests, however, and it seems to have more to do with the number of responses to the initial(?) DNS queries. Hopefully these three tables make sense and don't get mangled in the mail. The "templated" proxy is defined via "server-template" with 3 slots; the "regular" one just as "server".
>>
>> Test 1: Start out with both "valid" and "broken" DNS entries, then comment out/add back one at a time as described in (1)-(5). Each time after changing /etc/hosts, restart dnsmasq and check haproxy via hatop. Haproxy was started fresh once dnsmasq was set up for (1).
>>
>>             | state        state
>>  /etc/hosts | regular      templated
>> ------------+-------------------------------
>> (1) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (2) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (3) #BRK    | UP/L7OK      UP/L7OK
>>     VALID   |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (4) BRK     | UP/L7OK      UP/L7OK
>>     VALID   |              DOWN/L4TOUT
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (5) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>>
>> This all looks normal and as expected. As soon as the "VALID" DNS entry is present, the UP state follows within a few seconds.
>>
>> Test 2: Start out "valid only" (1) and proceed as described in (2)-(5), again restarting dnsmasq each time. Haproxy was reloaded after dnsmasq was set up for (1).
>>
>>             | state        state
>>  /etc/hosts | regular      templated
>> ------------+-------------------------------
>> (1) #BRK    | UP/L7OK      MAINT/resolution
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (2) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (3) #BRK    | UP/L7OK      MAINT/resolution
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (4) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              MAINT/resolution
>>             |              UP/L7OK
>> ------------+-------------------------------
>> (5) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>>
>> Everything good here, too. Adding the broken DNS entry does not bring the proxies down until only the broken one is left.
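>> (For reference, the resolvers section these tests ran against, with the 10s hold setting mentioned above made explicit; roughly:)
>>
>>     resolvers default
>>         nameserver local 127.0.0.1:53
>>         # keep valid responses for 10s; this is the value I set
>>         # explicitly, and it did not change the results above
>>         hold valid 10s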
>> Test 3: Start out "broken only" (1). Again, same as before; haproxy was restarted once dnsmasq was initialized for (1).
>>
>>             | state        state
>>  /etc/hosts | regular      templated
>> ------------+-------------------------------
>> (1) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (2) BRK     | DOWN/L4TOUT  UP/L7OK
>>     VALID   |              MAINT/resolution
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (3) #BRK    | UP/L7OK      MAINT/resolution
>>     VALID   |              UP/L7OK
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (4) BRK     | UP/L7OK      DOWN/L4TOUT
>>     VALID   |              UP/L7OK
>>             |              MAINT/resolution
>> ------------+-------------------------------
>> (5) BRK     | DOWN/L4TOUT  DOWN/L4TOUT
>>     #VALID  |              MAINT/resolution
>>             |              MAINT/resolution
>>
>> Here it becomes interesting. In (1) both the regular and the templated proxy are DOWN, of course. However, adding a second DNS response in (2) brings the templated proxy UP, but the regular one stays DOWN. Only when in (3) the valid response is the only one presented does it go UP as well. Adding the broken one back in (4) is of no consequence then. And again, after leaving just the broken response (5), both correctly go DOWN.
>>
>> So it would appear that if haproxy starts with just a single "broken" DNS response, adding a healthy one later on is not recognized; instead, it stays DOWN. "Replacing" the single broken response with a single "valid" response, however, brings it to life, and it won't be discouraged by bringing the broken one back in.
>>
>> Tests 1 and 2 make sense to me, but test 3 I don't understand. For now, I have worked around the issue by defining all my relevant backends with server-template and at least 2 slots (roughly as spelled out right below), but I would still like to understand it. And maybe it is a bug, after all ;)
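>> The workaround, spelled out; essentially the "templated" section from my original mail below, just applied to every relevant backend and always with at least 2 slots:
>>
>>     listen templated
>>         bind 127.0.0.1:9200
>>         option dontlog-normal
>>         option httpchk /haproxy-simple-healthcheck
>>         # at least 2 slots; with a single slot the valid address was
>>         # only picked up again after a reload in my tests
>>         server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299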
>> Kind regards, and thanks for a great piece of software!
>>
>> Daniel
>>
>>> On 21. Mar 2019, at 14:28, Bruno Henc <brh...@nua-avenir.net> wrote:
>>>
>>> Hello Daniel,
>>>
>>> You might be missing the 'hold valid' directive in your resolvers section:
>>> https://www.haproxy.com/documentation/hapee/1-9r1/onepage/#5.3.2-timeout
>>>
>>> This should force HAProxy to fetch the DNS record values from the resolver. A reload of the HAProxy instance also forces the instances to query all records from the resolver.
>>>
>>> Can you please retest with the updated configuration and report back the results?
>>>
>>> Best regards,
>>>
>>> Bruno Henc
>>>
>>> ------- Original Message -------
>>> On Thursday, March 21, 2019 12:09 PM, Daniel Schneller <daniel.schnel...@centerdevice.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> Friendly bump :)
>>>> I'd be willing to amend the documentation once I understand what's going on :D
>>>>
>>>> Cheers,
>>>> Daniel
>>>>
>>>>> On 18. Mar 2019, at 20:28, Daniel Schneller <daniel.schnel...@centerdevice.com> wrote:
>>>>>
>>>>> Hi everyone!
>>>>>
>>>>> I assume I am misunderstanding something, but I cannot figure out what it is. We are using haproxy in AWS, in this case as sidecars to applications, so they need not know about changing backend addresses at all but can always talk to localhost. Haproxy listens on localhost and then forwards traffic to an ELB instance.
>>>>>
>>>>> This works great, but there have been two occasions now where, due to a change in the ELB's IP addresses, our services went down because the backends could not be reached anymore. I don't understand why haproxy sticks to the old IP address instead of moving to one of the updated ones.
>>>>>
>>>>> There is a resolvers section which points to the local dnsmasq instance (it is there to send some requests to consul, but that's not used here). All other traffic is forwarded on to the AWS DNS server set via DHCP.
>>>>>
>>>>> I managed to get timely updates and updated backend servers when using server-template, but from what I understand this should not really be necessary for this.
>>>>>
>>>>> This is the trimmed-down sidecar config. I have not made any changes to DNS timeouts etc.:
>>>>>
>>>>>     resolvers default
>>>>>         # dnsmasq
>>>>>         nameserver local 127.0.0.1:53
>>>>>
>>>>>     listen regular
>>>>>         bind 127.0.0.1:9300
>>>>>         option dontlog-normal
>>>>>         server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check addr loadbalancer-internal.xxx.yyy port 9300
>>>>>
>>>>>     listen templated
>>>>>         bind 127.0.0.1:9200
>>>>>         option dontlog-normal
>>>>>         option httpchk /haproxy-simple-healthcheck
>>>>>         server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299
>>>>>
>>>>> To simulate changing ELB addresses, I added entries for loadbalancer-internal.xxx.yyy to /etc/hosts, to be able to control them via dnsmasq. I tried different scenarios, but could not reliably predict what would happen in all cases. The address ending in 52 (marked "valid" below) is a currently (as of the time of testing) valid IP for the ELB. The one ending in 199 (marked "invalid") is an unused private IP address in my VPC.
>>>>>
>>>>> Starting with /etc/hosts:
>>>>>
>>>>>     10.205.100.52   loadbalancer-internal.xxx.yyy   # valid
>>>>>     10.205.100.199  loadbalancer-internal.xxx.yyy   # invalid
>>>>>
>>>>> haproxy starts and reports:
>>>>>
>>>>>     regular:    lb-internal   UP/L7OK
>>>>>     templated:  lb-internal1  DOWN/L4TOUT
>>>>>                 lb-internal2  UP/L7OK
>>>>>
>>>>> That's expected. Now when I edit /etc/hosts to only contain the invalid address and restart dnsmasq, I would expect both proxies to go fully down. But only the templated proxy behaves like that:
>>>>>
>>>>>     regular:    lb-internal   UP/L7OK
>>>>>     templated:  lb-internal1  DOWN/L4TOUT
>>>>>                 lb-internal2  MAINT (resolution)
>>>>>
>>>>> Reloading haproxy in this state leads to:
>>>>>
>>>>>     regular:    lb-internal   DOWN/L4TOUT
>>>>>     templated:  lb-internal1  MAINT (resolution)
>>>>>                 lb-internal2  DOWN/L4TOUT
>>>>>
>>>>> After fixing /etc/hosts to include the valid server again and restarting dnsmasq:
>>>>>
>>>>>     regular:    lb-internal   DOWN/L4TOUT
>>>>>     templated:  lb-internal1  UP/L7OK
>>>>>                 lb-internal2  DOWN/L4TOUT
>>>>>
>>>>> Shouldn't the regular proxy also recognize the change and bring the backend up or down depending on the DNS change? I have waited for several health check rounds (seeing "* L4TOUT" and "L4TOUT" toggle), but it still never updates.
>>>>>
>>>>> I also tried having only the invalid address in /etc/hosts and then restarting haproxy. The regular backend will never recognize it when I add the valid one back in. The templated one does, unless I set it up to have only 1 instead of 2 server slots; in that case it will also only pick up the valid server when reloaded. On the other hand, it will recognize on the next health check when I remove the valid server without a reload, but it will not bring the server back in and mark the proxy UP when the entry comes back.
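>>>>> (For clarity, "only the invalid address" above means commenting out the valid /etc/hosts entry and restarting dnsmasq, which serves the /etc/hosts entries, e.g.:)
>>>>>
>>>>>     #10.205.100.52  loadbalancer-internal.xxx.yyy   # valid, commented out
>>>>>     10.205.100.199  loadbalancer-internal.xxx.yyy   # invalid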
>>>>> I assume my understanding of something here is broken, and I would gladly be told about it :)
>>>>>
>>>>> Thanks a lot!
>>>>> Daniel
>>>>>
>>>>> Version info:
>>>>> --------------
>>>>>
>>>>> $ haproxy -vv
>>>>> HA-Proxy version 1.8.19-1ppa1~trusty 2019/02/12
>>>>> Copyright 2000-2019 Willy Tarreau <wi...@haproxy.org>
>>>>>
>>>>> Build options :
>>>>>   TARGET  = linux2628
>>>>>   CPU     = generic
>>>>>   CC      = gcc
>>>>>   CFLAGS  = -O2 -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
>>>>>   OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
>>>>>
>>>>> Default settings :
>>>>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>>>>
>>>>> Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
>>>>> Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
>>>>> OpenSSL library supports TLS extensions : yes
>>>>> OpenSSL library supports SNI : yes
>>>>> OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
>>>>> Built with Lua version : Lua 5.3.1
>>>>> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
>>>>> Encrypted password support via crypt(3): yes
>>>>> Built with multi-threading support.
>>>>> Built with PCRE version : 8.31 2012-07-06
>>>>> Running on PCRE version : 8.31 2012-07-06
>>>>> PCRE library supports JIT : no (libpcre build without JIT?)
>>>>> Built with zlib version : 1.2.8
>>>>> Running on zlib version : 1.2.8
>>>>> Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
>>>>> Built with network namespace support.
>>>>>
>>>>> Available polling systems :
>>>>>       epoll : pref=300, test result OK
>>>>>        poll : pref=200, test result OK
>>>>>      select : pref=150, test result OK
>>>>> Total: 3 (3 usable), will use epoll.
>>>>>
>>>>> Available filters :
>>>>>     [SPOE] spoe
>>>>>     [COMP] compression
>>>>>     [TRACE] trace
>>>>>
>>>>> --
>>>>> Daniel Schneller
>>>>> Principal Cloud Engineer
>>>>> CenterDevice GmbH
>>>>> Rheinwerkallee 3
>>>>> 53227 Bonn
>>>>> www.centerdevice.com