----- Original Message ----- 
From: "Andrzej Bialecki" <[EMAIL PROTECTED]>
Sent: Thursday, May 31, 2007 11:39 PM

> Caching seems to be the only solution. Even if you were able to fire DNS
> requests more rapidly, remote servers wouldn't be able (or wouldn't like
> to) respond that quickly ...

Then why is fetching so fast, despite having to retrieve the content of each
page (with the delays of the TCP three-way handshake, web server latency,
potentially large pages...)?

From what I've seen, I suspect that the root of all evil may be a relatively
small set of domain names on which the resolver hangs for up to 10 seconds
(despite being configured with "options timeout:1 attempts:2" in
/etc/resolv.conf). Even just 500 such domain names (2.5% of the 20,000
total) would waste about 1h 23' (500 x 10 s). In that case, having even a
small number N of threads would reduce that wastage by a factor of N.
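For what it's worth, here is the kind of thing I have in mind -- just a
rough sketch, not actual Nutch code, with a made-up host list, pool size
and timeout. The lookups run in parallel, and any single one gets at most
10 seconds before we stop waiting for it:

    import java.net.InetAddress;
    import java.util.*;
    import java.util.concurrent.*;

    public class ParallelResolver {
        public static void main(String[] args) throws Exception {
            // hypothetical host list and pool size, just for illustration
            List<String> hosts = Arrays.asList("example.org", "example.net", "example.com");
            ExecutorService pool = Executors.newFixedThreadPool(10);

            Map<String, Future<InetAddress>> futures = new HashMap<String, Future<InetAddress>>();
            for (final String host : hosts) {
                futures.put(host, pool.submit(new Callable<InetAddress>() {
                    public InetAddress call() throws Exception {
                        return InetAddress.getByName(host);  // blocking lookup
                    }
                }));
            }

            for (Map.Entry<String, Future<InetAddress>> e : futures.entrySet()) {
                try {
                    // give up on any single host after 10 seconds instead of
                    // letting it hold the whole batch hostage
                    InetAddress addr = e.getValue().get(10, TimeUnit.SECONDS);
                    System.out.println(e.getKey() + " -> " + addr.getHostAddress());
                } catch (TimeoutException te) {
                    System.out.println(e.getKey() + " -> timed out");
                } catch (ExecutionException ee) {
                    System.out.println(e.getKey() + " -> failed: " + ee.getCause());
                }
            }
            // note: a timed-out getByName() still blocks its worker thread
            // until it returns, which is exactly why the pool needs to be
            // big enough to absorb the slow domains
            pool.shutdown();
        }
    }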

> Which DNS cache implementation are you using?

A local installation of BIND 9.3.2 (yeah, I know, there are better things
around, see e.g. http://nlnetlabs.nl/downloads/bind9-measure.pdf - but here
we are talking about fewer than 100 queries per second, not tens of
thousands).
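In case anyone wants to reproduce the setup, it looks roughly like this
(from memory, so treat the paths and the cache size as guesses rather than
my exact config): a caching-only named on the crawler box, with
/etc/resolv.conf pointed at it.

    // named.conf (caching-only resolver)
    options {
        directory "/etc/namedb";      // default location on FreeBSD
        recursion yes;                // act as a recursive/caching resolver
        listen-on { 127.0.0.1; };     // only serve the local fetcher
        max-cache-size 64M;           // guess -- tune to the crawl size
    };

    # /etc/resolv.conf
    nameserver 127.0.0.1
    options timeout:1 attempts:2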

> I've had positive experience with djbdns / tinydns package, with some
> modifications to increase the number of concurrent requests and the cache
> size. This was on Linux, though - I have no idea how to do this on
> Windows.

Actually I'm running on FreeBSD 6.1.

Cheers --

Enzo

