Ok, thanks - that makes sense in terms of the 'incomplete' entry being cached.
I might set up a couple of DNS servers to simulate this at some point - I'm
going to want a reproducible setup for our own testing as well...
If I can then I'll come back with that log...
Actually, maybe I already have it, let me check...
Nope - I have that up to the dnsmasq restart, not the software restart.

Cheers,

John

On Thu, 4 Apr 2019 at 16:27, Simon Kelley <si...@thekelleys.org.uk> wrote:

> On 30/03/2019 08:41, John Robson wrote:
> > Simon,
> >
> > The upstream server is authoritative for the initial domain (being
> > inside an organisation I don’t think that’s unusual) and the incomplete
> > (but perfectly valid, I agree) response is taken as complete. The
> > upstream server does do recursion as well, but when that failed it just
> > returned what it could (seems reasonable enough).
> >
> > I’d have thought that the lack of an actual resolved A record (which is
> > what was asked for) would mark the cache entry as incomplete at best.
> > This is pure gut, not a technically based statement.
>
> A CNAME reply with no record for the target of the CNAME, from a
> recursive server, establishes that the target doesn't exist. If it were
> otherwise, there would be large numbers of legitimate answers which are
> uncacheable. Consider that there are many record types and the target of
> a CNAME will not exist for most record types.
>
> As a common example, an IPv6 enabled host will query for the AAAA record
> of something it wants to talk to. If the hostname is a CNAME, and the thing
> it wants to talk to doesn't have an AAAA record, then the reply will be
> a CNAME with no target. You really want to be able to cache that.
>
> >
> > And whilst I agree that the record was cached (and that that is probably
> > technically correct) I can’t then explain why dnsmasq stopped using the
> > cache when I restarted my program - with 45+ minutes of cache left,
> > dnsmasq went back to the upstream server and got a complete answer.
> >
> > Restarting dnsmasq obviously reset the cache, and everything recovered
> > when I did that - but restarting other software shouldn’t have magically
> > reset the cache, and yet it did.
> >
> I can't explain that. If it's reproducible, run dnsmasq with
> --log-queries set and see exactly what's going on.
>
> >
> > (Un)Fortunately the second/third nameservers seem to be being better
> > behaved at the moment, so we haven’t seen the incomplete response in
> > several days - kind of makes it harder to test though.
>
> Not reproducible, then. That's a pity.
>
>
> Cheers,
>
> Simon.
>
> >
> > Cheers,
> >
> > John
> >
> >
> >
> > On Fri, 29 Mar 2019 at 22:43, Simon Kelley <si...@thekelleys.org.uk> wrote:
> >
> > On 21/03/2019 11:01, John Robson wrote:
> > > OK,
> > >
> > > Maybe this does reveal something about the caching...
> > > Which might be expected behaviour, but I am not convinced it's useful...
> > >
> > > Overnight monitoring has shown that the upstream server does
> > > occasionally send back an incomplete (but perfectly valid) CNAME only
> > > response. Mostly I can justify the caching behaviour based on the TTLs
> > > of the second CNAME or A record (the server is authoritative for the
> > > first CNAME, so that's always at 3600).
> > >
> > > As a slight aside:
> > > dnsmasq sends a query at 22:57:32.599, then again (new transaction id)
> > > at 22:57:33.601, and at 22:57:36.601.
> > > This last query gets a response in 0.1 seconds, both the others
> > > eventually come in (incomplete) at 22:57:44.073
> > > I am assuming that dnsmasq ignored these late arrivals (either due to a
> > > default timeout, or just because a better answer has been received -
> > > this would be comparable with behaviour when it queries multiple servers
> > > to decide which is 'best').
> > > In this case we are protected by the fact that the incomplete query
> > > takes far longer than the complete one due to timeouts.
> > >
> > > Later though:
> > > At 01:12:47 we are out of TTL, so send a request, and get an incomplete
> > > response... The response only contains the first CNAME, which has a 3600
> > > TTL.
> > >
> > > Then dnsmasq doesn't send another query for an hour - despite the fact
> > > that it doesn't have a "good" answer.
> > > In this case the query it sends after an hour gets an incomplete response
> > > again - not good.
> > > Then I lost track because the container got moved to a different host -
> > > but it looks like it was returning incomplete for several hours...
> > >
> > >
> > > dnsmasq is otherwise well behaved - it is still responding to other
> > > queries just fine, despite being hammered by more than 2k queries/second
> > >
> > > Two questions:
> > > - Is it correct/wanted behaviour to cache an incomplete record like this?
> > > I have no issue caching the CNAME, but should we keep trying to resolve
> > > the CNAME to an A record?
> > >
> > > - Why/How does a restart of the querying program change the caching
> > > behaviour of dnsmasq?
> > > Because even if the program is restarted after just a few minutes it
> > > immediately gets better data - my capture from yesterday shows that
> > > despite the fact that the TTL had 2855 seconds (of the 3600 default)
> > > left just two minutes before the first 'new process' request comes in,
> > > that new request triggers an outbound query.
> > >
> > >
> > > Cheers,
> > >
> > > John
> > >
> >
> > What you're calling an "incomplete" answer is actually a perfectly
> > good answer. Dnsmasq is entitled to infer that the target of the CNAME
> > doesn't exist if it's not included in the answer, and keep that
> > information in the cache for the TTL period.
> >
> > Note that this is _only_ true if the upstream server is a recursive
> > server - as such it's expected to attempt to follow the CNAME and
> > return as much of the chain as exists. If the upstream server is an
> > authoritative server, that's not true - if the CNAME target is outside
> > the domain(s) that the server is authoritative for, then the target will
> > not be included. This is one reason why dnsmasq should only use
> > recursive servers, and it will log an error if an upstream server is not
> > recursive (ra flag not set). It's also the most common reason why people
> > see the dnsmasq behaviour you're describing.
> >
> >
> > Cheers,
> >
> > Simon.
> >
> >
> > _______________________________________________
> > Dnsmasq-discuss mailing list
> > Dnsmasq-discuss@lists.thekelleys.org.uk
> > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> >

--
John Robson
Sr. Customer Support Engineer, Zenoss <https://www.zenoss.com/>
jrob...@zenoss.com
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
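
For anyone who wants the kind of reproducible setup mentioned at the top of this message, here is a minimal sketch in Python of a stand-in upstream server that answers every query with a CNAME-only reply (RA flag set, TTL 3600, no record for the target). The target name, the port and all the function names are made up for illustration; none of this is taken from dnsmasq or from the real upstream server, and it does no validation of incoming packets.

```python
import socket
import struct

CNAME_TTL = 3600                     # TTL of the first CNAME, as quoted in the thread
CNAME_TARGET = "target.example.net"  # hypothetical target name, illustration only


def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed DNS labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def question_end(packet: bytes) -> int:
    """Return the offset just past the first question (QNAME, QTYPE, QCLASS)."""
    pos = 12                         # the DNS header is always 12 bytes
    while packet[pos] != 0:          # walk the labels up to the root label
        pos += packet[pos] + 1
    return pos + 1 + 4               # root label + QTYPE (2) + QCLASS (2)


def cname_only_reply(query: bytes) -> bytes:
    """Build a NOERROR reply holding one CNAME and nothing for its target."""
    txid = query[:2]
    rd = query[2] & 0x01                 # echo the client's RD bit
    flags = 0x8000 | 0x0080 | (rd << 8)  # QR=1, RA=1, RCODE=0
    header = txid + struct.pack("!HHHHH", flags, 1, 1, 0, 0)
    question = query[12:question_end(query)]
    rdata = encode_name(CNAME_TARGET)
    answer = (
        b"\xc0\x0c"                      # pointer back to the QNAME at offset 12
        + struct.pack("!HHIH", 5, 1, CNAME_TTL, len(rdata))  # TYPE=CNAME, CLASS=IN
        + rdata
    )
    return header + question + answer


def serve(host: str = "127.0.0.1", port: int = 5300) -> None:
    """Answer every UDP query on host:port with the CNAME-only reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            query, addr = sock.recvfrom(512)
            if len(query) >= 17:         # header + shortest possible question
                sock.sendto(cname_only_reply(query), addr)
```

Calling serve() starts it. Something like `dnsmasq --no-daemon --no-resolv --port=5301 --server=127.0.0.1#5300 --log-queries` should then point a test instance at it, assuming both ports are free; what dnsmasq does with the reply is for the query log to show.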