Chris Buxton wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Sep 4, 2008, at 10:58 AM, caio wrote:
>> Chris Buxton wrote:
>>> I would be more inclined to suspect network connectivity problems with
>>> the lookup you're having problems with. With that many lookups, each one
>>> needs to complete in a reasonable amount of time - 50 ms on average, or
>>> thereabouts, to complete the whole thing in 5 seconds. How is your
>>> connection to the various servers involved?
>>
>> I do not know if it is a connectivity problem, because I have 2 name
>> servers at the same network hierarchy level (but on different subnets),
>> and maybe one is working OK while the other is failing.
>>
>> Here is the case of the secondary ns (at this moment):
>>
>> # dig @dns2.mydomain.com www.yahoo.com.ar +trace
>
> [...]
>
>> And without the "+trace" argument:
>>
>> # dig @dns2.mydomain.com www.yahoo.com.ar
>>
>> ; <<>> DiG 9.4.2 <<>> @dns2.mydomain.com www.yahoo.com.ar
>> ; (1 server found)
>> ;; global options: printcmd
>> ;; connection timed out; no servers could be reached
>>
>> Why does the query seem to finish with 'trace', and fail without it?
>
>
> The "+trace" option causes dig to behave quite differently than without.
> With "+trace", you're not really asking your server anything other than
> for a list of root servers. Then 'dig' does all the work of recursion.
>
> More interesting would be to repeat your previous query with "+norec"
> added, in parallel with the recursive query. Or better yet, configure
> logging so that we can see what's going on - but this can be hard with a
> busy server.
>
> The fact that you previously indicated that retrying the query a few
> seconds later yields an answer tells me that this is some kind of
> performance problem, most likely in network latency (as Kevin Darcy
> originally suggested). Looking at the trace, which doesn't show
> everything (and also terminates at the first CNAME record), I can see
> some pretty slow response times - the response from the root server is
> over 400 ms. Of course, your resolving name server most likely has some
> of this already in cache, including good working RTT values for the root
> and .com servers, among others. Therefore, it's likely that your server
> is completing the recursion process in something like 6 seconds, just a
> bit over dig's 5 second timeout. Try this:
>
> dig @dns2.mydomain.com www.yahoo.com.ar +time=20
>
> What is the result? You might do something like this for a real test:
>
> rndc flush
> # wait 10 seconds
> dig @dns2.mydomain.com www.yahoo.com.ar +time=20
>
> Chris Buxton

OK about your dig +trace explanation, thanks Chris.

And here is the result of:

# rndc flush
# (10 secs)
# dig @dns2.mydomain.com www.yahoo.com.ar +time=20

; <<>> DiG 9.4.2 <<>> @dns2.mydomain.com www.yahoo.com.ar +time=20
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36748
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.yahoo.com.ar. IN A

;; Query time: 19738 msec
;; SERVER: <mydomain_public_ip_addr>#53(<ip_addr>)
;; WHEN: Thu Sep 4 16:06:49 2008
;; MSG SIZE rcvd: 34

After 2 minutes, I ran two digs in parallel (one with +norec, one
recursive). I do not know how to explain it, but both returned
successful results, with different query times:

# dig @dns2.mydomain.com www.yahoo.com.ar +norec

; <<>> DiG 9.4.2 <<>> @dns2.mydomain.com www.yahoo.com.ar +norec
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6147
;; flags: qr ra; QUERY: 1, ANSWER: 3, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;www.yahoo.com.ar. IN A

;; ANSWER SECTION:
www.yahoo.com.ar. 1565 IN CNAME hp2.latam.g1.b.yahoo.com.
hp2.latam.g1.b.yahoo.com. 54 IN CNAME us.hp2.latam.a1.b.yahoo.com.
us.hp2.latam.a1.b.yahoo.com. 299 IN A 68.142.226.230

;; AUTHORITY SECTION:
a1.b.yahoo.com. 172554 IN NS yf1.yahoo.com.
a1.b.yahoo.com. 172554 IN NS yf2.yahoo.com.

;; Query time: 0 msec
;; SERVER: <mydomain_public_ip_addr>#53(<ip_addr>)
;; WHEN: Thu Sep 4 16:10:25 2008
;; MSG SIZE rcvd: 154

And the recursive query returned the same, but with:

;; Query time: 174 msec
;; WHEN: Thu Sep 4 16:10:24 2008
;; MSG SIZE rcvd: 154

This is what I wanted to show you: the randomness of this qname's
resolution on my name servers. If things go as they usually do, the
primary name server will start to fail after a while, and the secondary
name server will keep resolving fine.

--
caio
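
For comparing runs like the ones above, the status and query time can be
pulled out of dig's output with a small shell helper. This is only a
sketch: the function names query_time_ms, query_status and
exceeds_timeout are made up for illustration, and the 5-second default
is dig's timeout as mentioned earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of dig or BIND).

# Print the "Query time" value, in msec, from dig output read on stdin.
query_time_ms() {
    sed -n 's/^;; Query time: \([0-9][0-9]*\) msec.*$/\1/p'
}

# Print the response status (NOERROR, SERVFAIL, ...) from dig output on stdin.
query_status() {
    sed -n 's/^.*status: \([A-Z]*\),.*$/\1/p'
}

# Succeed if a query time in msec ($1) is above a timeout in seconds
# ($2, assumed default 5).
exceeds_timeout() {
    ms=$1
    timeout_s=${2:-5}
    [ "$ms" -gt $((timeout_s * 1000)) ]
}

# Example use against a live server (not run here):
#   out=$(dig @dns2.mydomain.com www.yahoo.com.ar +time=20)
#   echo "$out" | query_status
#   echo "$out" | query_time_ms
```

With the SERVFAIL run above, query_time_ms would give 19738, which
exceeds_timeout reports as over the 5-second limit; the later 174 msec
recursive answer would not be.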
