Hello Alex,

I would try removing the all-servers and clear-on-reload statements. I would use just one server for testing, then retest each of the others for the same behaviour. When you do not know which server is being used, it is hard to debug.
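As a rough sketch of that test setup, the relevant lines could look like the fragment below. It only reuses options and addresses from your own configuration. Which server stays active first (10.0.48.12 and 10.0.73.43 here) is an arbitrary assumption on my side, not something I have verified.

```
# Sketch only: your configuration reduced for testing.
no-resolv
# all-servers and clear-on-reload are left out for the test.

# One general upstream at a time; swap in the next one and retest.
server=10.0.48.12
#server=10.0.48.11
#server=10.0.21.63
#server=10.0.21.61

# One consul upstream at a time (no leading dot, as suggested below).
# Assumed choice; rotate through 10.0.73.40 and 10.0.73.28 afterwards.
server=/consul/10.0.73.43
```

With only one server active at a time, a failing lookup points at one specific upstream instead of an unknown member of the set.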
I think the dots in server=/.X/ are not necessary and may even be misleading. Try it without them, just server=/X/ip.

I think a one-second timeout is too short. Use only localhost in /etc/resolv.conf and debug what happens with dnsmasq. Record which queries are sent to dnsmasq and what dnsmasq forwards to the configured servers.

Note that I have already discovered that requests without the recursion desired bit set are always forwarded; dnsmasq does not serve any local records for them. That should not be the issue here, but try dig +rec and dig +norec to rule it out.

Regards,
Petr

On 7/7/19 10:28 PM, Alex Litvak wrote:
> (lack of sleep, fixing some mistakes in text)
>
> Hello everyone,
>
> I run consul services on my network where services are registered with
> <xyz>.service.consul when they start. All containers and bare metal
> hosts are running dnsmasq 2.80.
> I noticed that if I restart one of the containers, one of the hosts
> continues failing to resolve the service name. I assume that dnsmasq is
> the culprit because:
>
> 1. I can resolve service xyz.service.consul against standard dns servers
> with dig.
> 2. Dnsmasq listening on 127.0.0.1 is the first line in resolv.conf
> and when I run tcpdump against port 53 on interface lo I see it returns
> NXDOMAIN on each A record query for the service in question.
> 3. If I restart dnsmasq everything is back to normal again. Even more
> weird, if I send SIGHUP to dnsmasq, which only causes a reread of the
> /etc/hosts file, everything is back to normal as far as service
> resolution goes.
>
> I have this problem only happening on some hosts, without a pattern I
> can recognize. For example I have two nodes with the same config, os,
> kernel version, dnsmasq version, etc ... and one of them has the problem
> 100% after service xyz.service.consul restart and the other does not.
>
> Where do I start troubleshooting? Any ideas are welcome.
>
> Here is a standard dnsmasq configuration.
>
> port=53
> domain-needed
> bogus-priv
> interface=lo
> listen-address=127.0.0.1
> no-dhcp-interface=127.0.0.1
> #bind-interfaces
> no-resolv
> all-servers
> dns-forward-max=500
>
> # If you don't want dnsmasq to read /etc/hosts, uncomment the
> # following line.
> #no-hosts
> # or if you want it to read another file, as well as /etc/hosts, use
> # this.
> #addn-hosts=/etc/banner_add_hosts
>
> #log-queries=extra
> #log-facility=/var/log/dnsmasq.log
> log-async=25
>
> # Set the cachesize here.
> cache-size=10000
> min-cache-ttl=5
> #neg-ttl=3600
>
> # If you want to disable negative caching, uncomment this.
> #no-negcache
>
> # For debugging purposes, log each DNS query as it passes through
> # dnsmasq.
> #log-queries
> clear-on-reload
>
> server=10.0.48.12
> server=10.0.48.11
> server=10.0.21.63
> server=10.0.21.61
>
> server=/.la.consul/10.0.73.43
> server=/.la.consul/10.0.73.40
> server=/.la.consul/10.0.73.28
> server=/.chi-pbx.consul/10.1.73.1
> server=/.chi-pbx.consul/10.1.73.2
> server=/.chi-pbx.consul/10.1.73.3
> server=/.consul/10.0.73.43
> server=/.consul/10.0.73.40
> server=/.consul/10.0.73.28
>
> Resolver config
>
> search ''
> options timeout:1 attempts:1
> nameserver 127.0.0.1
> nameserver 10.0.48.11
> nameserver 10.0.48.12
> nameserver 10.0.21.63
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss

--
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemen...@redhat.com
PGP: 65C6C973