>----- Original Message ----- From: "Andrzej Bialecki" <[EMAIL PROTECTED]>
>Sent: Thursday, May 31, 2007 11:39 PM
>
>>Caching seems to be the only solution. Even if you were able to fire DNS
>>requests more rapidly, remote servers wouldn't be able (or wouldn't like
>>to) respond that quickly ...
>
>Then why is fetching so fast, despite having to fetch the content of each
>page (with the delays of 3-way TCP handshaking, web server latency,
>potentially long content pages...)?
>
>From what I've seen, I suspect that the root of all evil may be a relatively
>small set of domain names for which the resolver hangs for up to 10 seconds
>(despite being configured with "options timeout:1 attempts:2" in
>/etc/resolv.conf). Even only 500 such domain names (the 0.5% of the total
>20,000) would waste 1h 23'. In that case, having even a small number N of
>threads would reduce the wastage by a factor of N.

We'd run into similar issues with the DNS resolution taking a long time.

Our solution was a combination of firing off lots of threads to
resolve names to addresses and implementing our own cache, to handle
the case of negative responses not being cached.

And yes, from what I remember a big part of the problem was a small 
set of domains which couldn't be resolved properly, where this 
negative response wound up taking a really long time.
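
To make that concrete, here is a rough Python sketch of the general
idea: a pool of threads doing the lookups, plus a cache that remembers
failures as well as successes. This is not our actual code. The class
name, the TTL values and the thread count are all made up for
illustration, and the lookup function is pluggable so the thing can be
tried without touching the network.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


class CachingResolver:
    """Hypothetical resolver: parallel lookups plus a negative cache."""

    def __init__(self, lookup=socket.gethostbyname, threads=50,
                 positive_ttl=3600.0, negative_ttl=600.0):
        # All defaults here are illustrative assumptions, not tuned values.
        self._lookup = lookup
        self._threads = threads
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._cache = {}  # host -> (address or None, expiry time)

    def resolve(self, host):
        """Return the address for host, or None if it cannot be resolved."""
        entry = self._cache.get(host)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # cache hit, possibly a remembered failure
        try:
            address = self._lookup(host)
            ttl = self._positive_ttl
        except OSError:  # socket.gaierror and socket.herror are subclasses
            address = None
            ttl = self._negative_ttl
        self._cache[host] = (address, time.monotonic() + ttl)
        return address

    def resolve_all(self, hosts):
        """Resolve many hosts concurrently; returns {host: address or None}."""
        unique = list(dict.fromkeys(hosts))  # de-duplicate, keep order
        with ThreadPoolExecutor(max_workers=self._threads) as pool:
            return dict(zip(unique, pool.map(self.resolve, unique)))
```

Usage would be something like CachingResolver().resolve_all(hosts)
before a fetch cycle. A slow name still ties up one thread for as long
as the system resolver takes to give up, but the other threads keep
going, and the failure is remembered so the same name doesn't cost
that time again until the negative TTL expires.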

Subsequent to that we added local DNS caching to all of the servers 
we use for crawling, which (I think) is configured to cache negative 
responses. So I don't believe we're using my multi-threaded DNS 
resolver hack any longer.

-- Ken
-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
