Hello list!

We recently had reports from our users about difficulties receiving mail from 
a specific external domain, caused by our systems' inability to resolve the 
sender domain through our pdns recursors.
For now I am refraining from disclosing the domain because I don’t know whether 
this behavior could reveal a software/version/configuration with some kind of 
known vulnerability.
After some more… “literal” digging, I’ve found out what follows (X.X.X.X being 
the domain's authoritative public dns servers, all of them have the same 
behavior):

dig @X.X.X.X -t mx thedomain.tld +tries=1 +time=20 +dnssec +norecurse +noadflag
- does not work, tested from multiple locations around the world, from multiple 
operating systems
- this is exactly the kind of query that our pdns recursors are sending out
- I’ve increased the timeout just to be sure

dig @X.X.X.X -t mx thedomain.tld +tries=1 +time=1 +dnssec +norecurse +adflag
- always works, tested from multiple locations / operating systems
- traffic dump shows consistent 20~30 milliseconds between query packet and 
reply packet

dig @X.X.X.X -t mx thedomain.tld +tries=2 +time=1 +dnssec +norecurse +noadflag
- always works, tested from multiple locations / operating systems
- traffic dump shows that dig gets the answer after the second try
- note that the two queries have the same Transaction ID

dig @X.X.X.X -t mx thedomain.tld +tries=1 +time=1 +dnssec +norecurse +noadflag;
dig @X.X.X.X -t mx thedomain.tld +tries=1 +time=1 +dnssec +norecurse +noadflag
- does not work
- traffic dump shows that both queries do not get any answer
- the two queries obviously have two different Transaction IDs
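
To make the difference between the +adflag and +noadflag variants concrete, 
here is a minimal Python sketch of the kind of packet those dig commands put on 
the wire. It is an illustration under assumptions, not a capture of what dig or 
pdns-recursor really send: the names build_query and encode_name are made up 
for this example, the advertised UDP size of 4096 is assumed, and no other EDNS 
options (newer dig versions may add a cookie) are included. With those caveats, 
the two variants differ in exactly one bit of the header flags.

```python
import struct

FLAG_RD = 0x0100   # recursion desired (+recurse / +norecurse)
FLAG_AD = 0x0020   # authentic data (+adflag / +noadflag)
EDNS_DO = 0x8000   # "DNSSEC OK" bit inside the OPT record (+dnssec)
TYPE_MX = 15
TYPE_OPT = 41
CLASS_IN = 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, txid, ad=False, rd=False, do=True, udp_size=4096):
    """Build one MX query packet (hypothetical helper, assumptions as above)."""
    flags = (FLAG_RD if rd else 0) | (FLAG_AD if ad else 0)
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT RR)
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", TYPE_MX, CLASS_IN)
    # OPT pseudo-record: root name, TYPE=OPT, CLASS=advertised UDP size,
    # TTL = extended rcode | version | flags (DO bit), RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size,
                                EDNS_DO if do else 0, 0)
    return header + question + opt


if __name__ == "__main__":
    no_ad = build_query("thedomain.tld", 0x1234, ad=False)
    with_ad = build_query("thedomain.tld", 0x1234, ad=True)
    # List the byte offsets where the two packets differ.
    print([i for i in range(len(no_ad)) if no_ad[i] != with_ad[i]])
```

Comparing the two packets this way against a traffic dump is a quick check that 
the AD bit really is the only difference between a query that gets answered and 
one that does not.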

Long story short:
- remote auth servers correctly reply to non-DNSSEC queries and to DNSSEC 
queries with AD bit set
- remote auth servers do NOT reply to DNSSEC queries with AD bit off
- …but they do reply if you resend the same query with the same transaction ID!
(this last one sounds super strange to me but trust me, I’ve double-, triple- 
and multiple-checked!)

We are using PowerDNS Recursor 4.1.8 on Linux x64; we can also replicate the 
same behavior on other test setups running pdns-recursor with a completely 
default configuration, and it is also perfectly reproducible simply using dig.
I’m no expert in DNS details, but my guess is that pdns is not doing anything 
wrong and that its queries, reproduced by the above “dig” commands, are 
perfectly ok and valid.
The same domain results in “All Queries to dns1.domain.tld for domain.tld/A 
timed out or failed” when tested with the Verisign Labs DNSSEC Analyzer 
(https://dnssec-analyzer.verisignlabs.com/).
Public dns services (I tried Cloudflare and Google) resolve that domain 
correctly; my guess is that they are sending queries with different flags 
and/or that they have some kind of workaround for this specific defect.

I’d like to ask you guys:
- have any of you observed the same kind of problems out in the wild?
- any idea on how to work around the problem in pdns-recursor (short of 
completely disabling DNSSEC, which of course we are not going to do)? As far as 
I know it is not possible to configure it to retry the same server twice; 
it always goes to the next available one after network-timeout
- any idea on how the big public services are successfully avoiding this 
problem?

thanks,
--
Luca Lesinigo
LM Networks Srl

_______________________________________________
Pdns-users mailing list
[email protected]
https://mailman.powerdns.com/mailman/listinfo/pdns-users
