On Tue, Apr 3, 2018 at 4:17 PM, Dale Smith <dal...@gmail.com> wrote:
> Hello All,
>
> I have noticed a potential issue in HAProxy 1.7 where a backend server has
> reported in logs an 'unspecified DNS error' and been marked down for
> maintenance.
>
> After some investigation, I have found that our internal DNS server is
> responding with non-lowercase domain names in the Answer section. When
> HAProxy 'check' performs the lookup in dns.c the validation on the Answer
> domain with memcmp fails.
>
> I'm trying to understand what system is at fault here; the DNS server for
> not responding with the same case as the query, or HAProxy which should be
> performing a case insensitive match.
>
> I have made some test changes to src/dns.c to implement case insensitive
> matching in line 260 (in dns_resolve_recv) and unifying case before
> returning in dns_read_name. After these changes I can no longer reproduce
> the problem. I'm happy to share my patch file, but expect that your own
> libs and implementation will be cleaner.
>
> Relevant RFC: https://tools.ietf.org/html/rfc4343#section-3
>
> The current viable workaround appears to be simply matching the case that
> the DNS server is configured to reply with. For us this means specifying
> something like 'Server.db.example.com' instead of 'server.db.example.com'
> in haproxy.cfg.
>
>
> Reproduction steps:
> * Use 'check' and 'resolvers' in server section to enable
>   post-initialisation dns lookups.
> * Have the DNS resolver reply in upper/lower/mixed case that does not
>   match config file. (I can share a small python script that can be used to
>   test this)
>
> The initial resolution during config read is fine, as this is done via
> libc. The subsequent DNS resolutions during checks fail and eventually set
> the server as offline.
>
> ---
> Sample configuration to reproduce:
> defaults
>     log global
>     mode tcp
>     timeout connect 10s
>     timeout client 1m
>     timeout server 1m
>     timeout check 10s
>
> resolvers dns-resolver
>     nameserver ns1 127.0.0.1:1153
>     resolve_retries 3
>     timeout retry 1s
>     hold valid 2s # small, so we can reproduce quickly
>     hold other 10s
>
> frontend fe15000
>     bind :15000
>     default_backend be15000
>
> backend be15000
>     server dest15000 example.com:5000 check resolvers dns-resolver init-addr libc,none
> ---
>
> Sample DNS response that triggers the issue:
> (Note the reply CaSe does not match query. I can also provide a simple
> Python server that performs uppercase in its reply, for replication of
> this.)
> ---
> $ dig @127.0.0.1 -p 1153 example.com
> <snip>
> ;; QUESTION SECTION:
> ;EXAMPLE.COM.           IN      A
>
> ;; ANSWER SECTION:
> EXAMPLE.COM.    60      IN      A       127.0.0.1
> <snip>
>
>
> Thanks for any assistance,
> Dale Smith
>
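
For illustration, the mismatch described in the quoted report can be sketched in a few lines of Python. This is neither Dale's script nor his patch, and it is not HAProxy code: every function name below (encode_name, read_name, build_query, build_uppercase_reply, names_equal_rfc4343) is made up for this sketch. It assumes uncompressed names and a single A question, builds a reply whose names are uppercased like the dig output above, and contrasts a byte-exact comparison (what memcmp does) with the ASCII case-insensitive comparison from RFC 4343 section 3.

```python
import socket
import struct
from typing import Tuple


def encode_name(name: str) -> bytes:
    """Encode a dotted name as DNS wire-format labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def read_name(packet: bytes, offset: int) -> Tuple[bytes, int]:
    """Read an uncompressed name; return (wire bytes, offset just past it)."""
    start = offset
    while packet[offset] != 0:
        offset += 1 + packet[offset]  # skip length byte plus label
    return packet[start:offset + 1], offset + 1


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal A/IN query (RD flag set, one question)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", 1, 1)


def build_uppercase_reply(query: bytes, ip: str = "127.0.0.1", ttl: int = 60) -> bytes:
    """Answer a query with the name uppercased in question and answer."""
    qid = struct.unpack("!H", query[:2])[0]
    qname, end = read_name(query, 12)
    # Label length bytes are at most 63, so upper() leaves them untouched.
    upper = qname.upper()
    header = struct.pack("!HHHHHH", qid, 0x8180, 1, 1, 0, 0)
    question = upper + query[end:end + 4]  # keep the original type and class
    answer = upper + struct.pack("!HHIH", 1, 1, ttl, 4) + socket.inet_aton(ip)
    return header + question + answer


def names_equal_rfc4343(a: bytes, b: bytes) -> bool:
    """Compare uncompressed wire names, folding only ASCII A-Z to a-z."""
    return a.lower() == b.lower()


query = build_query("example.com")
reply = build_uppercase_reply(query)

asked, _ = read_name(query, 12)
_, question_end = read_name(reply, 12)
answered, _ = read_name(reply, question_end + 4)  # skip qtype and qclass

print(asked == answered)                      # byte-exact, memcmp-style: False
print(names_equal_rfc4343(asked, answered))   # case-insensitive: True
```

Wrapping build_uppercase_reply in a UDP socket loop bound to 127.0.0.1:1153 should be enough to serve the resolvers section in the sample configuration, though that part is untested here.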
Hi Dale,

Thanks for the report!
Please share your patch here and I'll have a look, so we could merge it.

Baptiste