Hello All,

I have noticed a potential issue in HAProxy 1.7 where a backend server has
reported in logs an 'unspecified DNS error' and been marked down for
maintenance.

After some investigation, I have found that our internal DNS server is
responding with non-lowercase domain names in the Answer section. When
HAProxy 'check' performs the lookup in dns.c the validation on the Answer
domain with memcmp fails.

I'm trying to understand what system is at fault here; the DNS server for
not responding with the same case as the query, or HAProxy which should be
performing a case insensitive match.
I have made some test changes to src/dns.c to implement case insensitive
matching in line 260 (in dns_resolve_recv) and unifying case before
returning in dns_read_name. After these changes I can no longer reproduce
the problem. I'm happy to share my patch file, but expect that your own
libs and implementation will be cleaner.
Relevant RFC: https://tools.ietf.org/html/rfc4343#section-3

The current viable workaround appears to be simply matching the case that
the DNS server is configured to reply with. For us this means specifying
something like 'Server.db.example.com' instead of 'server.db.example.com'
in haproxy.cfg.


Reproduction steps:
* Use 'check' and 'resolvers' in server section to enable
post-initialisation dns lookups.
* Have the DNS resolver reply in upper/lower/mixed case that does not match
config file. (I can share a small python script that can be used to test
this)
The initial resolution during config read is fine, as this is done via
libc. The subsequent DNS resolutions during checks fail and eventually set
the server as offline.

---
Sample configuration to reproduce:
defaults
    log                     global
    mode                    tcp
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout check           10s

resolvers dns-resolver
    nameserver ns1 127.0.0.1:1153
    resolve_retries      3
    timeout retry        1s
    hold valid           2s # small, so we can reproduce quickly
    hold other           10s

frontend fe15000
    bind :15000
    default_backend be15000

backend be15000
    server dest15000 example.com:5000 check resolvers dns-resolver
init-addr libc,none
---

Sample DNS response that triggers the issue:
(Note the reply CaSe does not match query. I can also provide a simple
Python server that performs uppercase in its reply, for replication of
this.)
---
$ dig @127.0.0.1 -p 1153 example.com
<snip>
;; QUESTION SECTION:
;EXAMPLE.COM. IN A

;; ANSWER SECTION:
EXAMPLE.COM. 60 IN A 127.0.0.1
<snip>


Thanks for any assistance,
Dale Smith

Reply via email to