Brent S. Elmer Ph.D. wrote:
> On Thu, 2010-12-23 at 09:38 +0000, Simon Kelley wrote:
> 
>> It's not as simple as it seems (is it ever?). Dnsmasq _doesn't_ return
>> before the service is ready: by the time the initial process exits, the
>> long-lived server process has opened all the relevant sockets and pretty
>> much the next thing it does is to enter the event loop. As Russ says,
>> achieving this is non-trivial.
>>
>> However, dnsmasq is a query forwarder, so it relies on having another
>> upstream DNS server to send queries to. I can think of two reasons why
>> this may not be the case immediately at startup.
>>
>> 1) /etc/resolv.conf
>>
>> If you have something like resolvconf installed, then the contents of
>> /etc/resolv.conf may not be correct when dnsmasq first reads it. The
>> mtime of this file is polled at minimum 1 second intervals, so the
>> nameservers will eventually be picked up correctly, but it may take a
>> little time. A DNS query which dnsmasq receives when it has no suitable
>> upstream servers will be instantly answered with an error.
>>
>> 2) networking.
>>
>> Even if dnsmasq has one or more upstream nameservers configured, if
>> attempts to send UDP packets to all of them result in "no route to host"
>> errors then the original request will again get an error straight away.
>> Unconfigured network interfaces or routing tables could easily cause this.
>>
>>
>> The next stage is to add
>>
>> log-queries
>>
>> to /etc/dnsmasq.conf and then run a test start-up. Dnsmasq should log
>> enough information to let us deduce exactly what is going on.
>>
>>
>> Cheers,
>>
>> Simon.
>>
> 
> I turned on log-queries in dnsmasq.conf and rebooted.  Yes, I am running
> resolvconf.  Here is the loop I am using to make sure addresses are
> resolved before I start openafs:
> 
> for i in 1 2 3 4 5 6 7 8
> do
>     found=`/usr/bin/dig +short w3.ibm.com`
>     echo -e "<$found>"
>     if [ "$found" != "" ]; then
>       echo -e "\nResolved w3.ibm.com, dnsmasq must be fully up\n"
>       break
>     else
>       sleep 5
>     fi
> done
> 
> w3.ibm.com was resolved on the 4th loop for this boot.  Here is the
> syslog for the boot.  Let me know if you need anything else.
> 
> Thanks,
> Brent
> 
> 
> 

Thanks.  Of my two options above, this is clearly 2), since you're not
using /etc/resolv.conf:

Dec 23 08:51:05 belmer dnsmasq[2516]: warning: ignoring resolv-file flag
because no-resolv is set

There follows lots of logged queries which don't get forwarded anywhere,
as there's no route to the upstream nameservers. These will have
received answers with RCODE set to REFUSED.

Then NetworkManager does stuff, and as soon as it gets to
Dec 23 08:51:36 belmer NetworkManager[2099]: <info> Activation (wlan0)
Stage 2 of 5 (Device Configure) complete.

dnsmasq stops getting errors when it sends UDP packets and starts
forwarding queries. There are no replies from upstream at this point,
which is no great surprise as wlan0 doesn't have an IP address. The
REFUSED answers will have stopped at this point; instead there will be
no answer.


Later:
Dec 23 08:51:43 belmer dnsmasq[2516]: query[A] weather.noaa.gov from
127.0.0.1
Dec 23 08:51:43 belmer dnsmasq[2516]: forwarded weather.noaa.gov to 4.2.2.1
Dec 23 08:51:43 belmer dnsmasq[2516]: reply weather.noaa.gov is <CNAME>

The first answer from upstream! This coincides with
Dec 23 08:51:44 belmer dhclient: bound to 9.61.255.91 -- renewal in
16288 seconds.

and

Dec 23 08:51:45 belmer NetworkManager[2099]: <info> Activation (wlan0)
Stage 5 of 5 (IP Configure Commit) complete.


There is then a whole slew of queries starting at 08:51:45 which don't
even get forwarded - I can't explain that, but maybe it's something to
do with avahi temporarily taking wlan0 down.

Fifteen seconds later, and all is well.

Dec 23 08:52:09 belmer dnsmasq[2516]: query[A] ADSMSRV3.BTV.IBM.COM from
127.0.0.1
Dec 23 08:52:09 belmer dnsmasq[2516]: forwarded ADSMSRV3.BTV.IBM.COM to
9.0.3.1
Dec 23 08:52:09 belmer dnsmasq[2516]: forwarded ADSMSRV3.BTV.IBM.COM to
9.0.2.11
Dec 23 08:52:09 belmer dnsmasq[2516]: reply ADSMSRV3.BTV.IBM.COM is
9.61.33.58



Could AFS be fixed to retry DNS queries after a longish timeout either
when there's no reply or when a REFUSED rcode is received? The behaviour
of ntpd in the log above shows fairly well how to get it right.
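
As a rough illustration of the retry behaviour I mean, here is a sketch
in shell rather than a change to AFS itself. It treats both a REFUSED
answer and no answer at all as "not ready yet, try again later". The
function names rcode_of and wait_for_dns are made up for this example,
and the 2-second dig timeout, 8 attempts and 5-second delay are
assumptions, not anything dnsmasq or AFS requires.

```shell
#!/bin/sh
# Sketch only: rcode_of and wait_for_dns are hypothetical helper names.

# Print the RCODE found in dig's text output, or TIMEOUT if there is none.
# dig prints a header line such as
#   ;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 1234
# and prints no header at all when no reply arrives.
rcode_of() {
    status=$(printf '%s\n' "$1" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
    if [ -n "$status" ]; then
        printf '%s\n' "$status"
    else
        printf 'TIMEOUT\n'
    fi
}

# Retry until the name resolves or the attempts run out.
# Usage: wait_for_dns NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_dns() {
    name=$1
    attempts=${2:-8}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # One try with a short timeout, so a silent upstream does not stall us.
        out=$(dig +time=2 +tries=1 "$name" 2>&1)
        rc=$(rcode_of "$out")
        echo "attempt $i: $rc"
        # Only NOERROR counts as success; REFUSED and TIMEOUT both mean retry.
        [ "$rc" = "NOERROR" ] && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Something like "wait_for_dns w3.ibm.com && start openafs" would then
replace the fixed loop. Note that NOERROR with an empty answer section
would also count as success here, which may or may not be what you want.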

It's not clear to me that there exists any sane change in the behaviour
of dnsmasq that would help things.

Cheers,

Simon.


