Mike Frysinger <vap...@gentoo.org> posted
200901282125.52845.vap...@gentoo.org, excerpted below, on  Wed, 28 Jan
2009 21:25:50 -0500:

>> On the wire between the client and the firewall, this happens:
>>
>> a packet 1 is sent
>> b packet 2 is sent
>> c answer 1 is received
>> d answer 2 is received
>>
>> Sometimes d doesn't happen because b is lost in the firewall along the
>> way (where the race condition happens).
> 
> does this affect actual userspace behavior ?  in other words, does this
> lead to lost lookups and errors from the resolver ?

Some of this is beyond my comprehension level, but I've seen lookup 
behavior that is, at minimum, a rather striking coincidence.

Specifically, from my machine (running a local caching bind, with 
netfilter on both the machine itself and on my OpenWRT-based router), 
host lookups on second-level domains with MX entries (cox.com in my case) 
work fine, while lookups on third-level domains unlikely to have MX 
entries (www.cox.com) return the A record right away, then time out on 
the MX query.  AFAIK this is fairly new behavior, apparently quite 
coincident with my installation of glibc-2.9 (_p20081201_r1, currently); 
IIRC, such lookups formerly returned promptly, without waiting for a 
timeout.

dig -tMX has the same behavior, while a simple dig (A record only) does 
not.
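
For anyone who wants to compare the two cases without host or dig in the 
way, here is a rough sketch (not anything from glibc or bind) that builds 
a bare DNS query by hand and times the reply.  The function names, the 
fixed query ID, the 127.0.0.1 server address and the 5 second timeout are 
all my own assumptions for illustration; adjust them for your setup.

```python
import socket
import struct
import time

QTYPE = {"A": 1, "MX": 15}  # record type codes from RFC 1035


def build_query(name, qtype, query_id=0x1234):
    """Return a minimal DNS query packet (one question, recursion desired)."""
    # Header: ID, flags (only the RD bit set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", QTYPE[qtype], 1)


def timed_lookup(name, qtype, server="127.0.0.1", timeout=5.0):
    """Send one query over UDP; return (seconds elapsed, got_reply).

    The server address and timeout are hypothetical defaults, and the
    reply is not parsed; only its arrival (or absence) is recorded.
    """
    packet = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, 53))
        try:
            sock.recvfrom(512)
            answered = True
        except OSError:
            # Covers the timeout as well as other socket errors.
            answered = False
        return time.monotonic() - start, answered
```

Calling timed_lookup("www.cox.com", "A") and then 
timed_lookup("www.cox.com", "MX") against the same server should show 
whether the MX query is the one that stalls.  It sends only one query per 
socket, so it does not reproduce the two-packets-in-flight pattern from 
the quoted description; it only measures the symptom.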

I stumbled across this while investigating after someone (running another 
distribution, no local DNS server) on the local Cox Unix newsgroup 
complained about the response time to www.cox.com.  We traced it to 
long resolve times, and while checking those I noticed this issue.  I 
initially chalked it up to DNS weirdness on their part, and that may 
indeed be part or all of it, but reading this thread, it sure looks 
similar and the timing seems right, at least here.  (I've no idea what 
his glibc version is or whether he's running netfilter-based firewalls 
on his machine or router; I asked, but don't have a reply yet.)

I have not noted any particular delays other than with host/dig -tMX 
myself, but I suspect that may be because I'm running a local bind and it 
mitigates the issue under normal operating conditions.

As I said, this is far enough above my head that I have no real idea 
whether it is connected or not, but it would be quite a coincidence if 
not.  I'm posting because it seems it might help answer the "Does this 
affect actual userspace behavior?" question.

I can't help feeling a bit uncomfortable with the discussion here as it's 
too much like a normally discouraged bug discussion on the main dev 
list.  So if people want to take the discussion to a bug, post the bug 
link and I'll be happy to CC myself. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

