On 17/03/2020 01:31, Sasha Litvak wrote:
> I couldn't find a specific answer anywhere so hopefully someone has a
> clue on this list
>
> We are using dnsmasq on our servers as a caching dns solution.
>
> Most of our domains are resolved by a wildcard record like this
>
> $TTL 3600 ; 1 hour
>                 A 10.10.10.23
> $ORIGIN example.net.
> *               CNAME excontainers
> excontainers    CNAME exservice.service.consul
>
> dnsmasq handles resolution of .consul domain directly but the DNS
> server itself also forwards .consul to consul servers.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Can you elaborate? How does dnsmasq handle the resolution of the
.consul domain? If you have something like

10.0.48.13 exservice.service.consul

in /etc/hosts then that defines, effectively, an immortal record for
exservice.service.consul, so a CNAME chain of two records, each with a
TTL of one hour, would result in that answer being returned for an hour.

>
> I added min-ttl 5s to decrease the number of queries to consul
>
> So when I do dig foo.example.net @127.0.0.1 I get
>
> foo.example.net. 3600 IN CNAME excontainers.example.net.
> excontainers.example.net. 3600 IN CNAME exservice.service.consul.
> exservice.service.consul. 5 IN A 10.0.48.13

This might be misleading: if you do that query to dnsmasq with a clean
cache, it will forward the query upstream and return the complete
result it gets, including the A record with a 5s TTL, but further
queries answered from the cache would return a 0 (infinite) TTL for the
A record if it's defined locally. The fix for this is to define the
.consul A record using --host-record, which allows you to specify the
5s TTL.

>
> Now we often need to migrate subdomains by pointing them to a
> different consul cluster. So our script uses nsupdate and creates a
> dynamic DNS record resulting in this reply
>
> foo.example.net. 60 IN CNAME exservice2.service.consul.
> exservice2.service.consul. 5 IN A 10.0.48.35
>
> So we have a record that is more explicit and it takes precedence over
> wild card. On servers with little traffic, domain switch happens
> within a few seconds, but on the main busy server with 100s of queries
> a second, it takes an hour for dnsmasq to change its cache. We see
> dnsmasq sending requests to the DNS server getting correct new records
> but still sending the old cached records to a client.
>
> When we are going back from distinct to default wild card (removing
> distinct record in DNS) cache change happens almost immediately (a
> couple of seconds) regardless of how busy the server is.
>
> Sorry for the long description but I would like to find out a reason
> why during switching from wild card to more explicit record dnsmasq
> cache update takes such a long time.
>

I'm guessing at exactly what's going on here: more details would be
useful, but if I guessed right, that's the solution.

Simon.
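
For reference, a minimal sketch of the --host-record fix suggested above. It only applies if the guess is right and the .consul A record is currently defined locally (for example in /etc/hosts). The name, address and 5s TTL are taken from the dig output quoted earlier; putting the line in dnsmasq.conf and removing the matching /etc/hosts entry are assumptions, not something the original poster confirmed.

```
# dnsmasq.conf (sketch; values copied from the dig output in this thread)
# General form: host-record=<name>[,<name>...],[<IPv4>],[<IPv6>][,<TTL>]
# The trailing 5 is the TTL in seconds returned with this record.
host-record=exservice.service.consul,10.0.48.13,5
```

After restarting dnsmasq, repeating the dig query a few times should show whether the A record keeps coming back with a TTL of 5.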