----- Original Message -----
From: "Andrzej Bialecki" <[EMAIL PROTECTED]>
Sent: Thursday, May 31, 2007 11:39 PM

> Caching seems to be the only solution. Even if you were able to fire DNS
> requests more rapidly, remote servers wouldn't be able (or wouldn't like
> to) respond that quickly ...

Then why is fetching so fast, despite having to fetch the content of each page (with the delays of the TCP three-way handshake, web server latency, potentially long content pages...)?

From what I've seen, I suspect that the root of all evil may be a relatively small set of domain names for which the resolver hangs for up to 10 seconds (despite being configured with "options timeout:1 attempts:2" in /etc/resolv.conf). Even just 500 such domain names (2.5% of the total 20,000) would waste 1h 23'. In that case, having even a small number N of threads would reduce the wastage by a factor of N.

> Which DNS cache implementation are you using?

A local installation of BIND 9.3.2 (yeah, I know, there are better things around, see e.g. http://nlnetlabs.nl/downloads/bind9-measure.pdf - but here we are talking about fewer than 100 queries per second, not tens of thousands).

> I've had positive experience with djbdns / tinydns package, with some
> modifications to increase the number of concurrent requests and the cache
> size. This was on Linux, though - I have no idea how to do this on
> Windows.

Actually I'm running on FreeBSD 6.1.

Cheers
--
Enzo

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
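A rough way to check the estimate in the message (500 slow names, each hanging for up to 10 seconds) and the claim that N resolver threads cut the wastage by a factor of N. This is a minimal sketch, not Nutch code: the class `DnsWastage` and all its methods are hypothetical names. It assumes every slow lookup hangs for the full 10 s and that lookups on different threads overlap perfectly. The resolver is passed in as a function, so a caching or fake resolver can be substituted for the system one.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

public class DnsWastage {
    // Worst-case time lost if the slow lookups are resolved one at a time.
    static double serialWastedSeconds(int slowNames, double hangSeconds) {
        return slowNames * hangSeconds;
    }

    // Same estimate with N threads, assuming slow lookups are spread evenly
    // and overlap perfectly (an idealization): ceil(slowNames / N) rounds.
    static double parallelWastedSeconds(int slowNames, double hangSeconds, int threads) {
        int rounds = (slowNames + threads - 1) / threads;
        return rounds * hangSeconds;
    }

    // Resolve host names concurrently with a fixed pool of N threads, so one
    // hanging lookup no longer blocks all the others behind it.
    static Map<String, Optional<String>> resolveAll(List<String> hosts, int threads,
            Function<String, Optional<String>> resolver) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Optional<String>>> pending = new LinkedHashMap<>();
            for (String h : hosts) {
                pending.put(h, pool.submit(() -> resolver.apply(h)));
            }
            Map<String, Optional<String>> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Optional<String>>> e : pending.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    // Treat a failed lookup as unresolved.
                    results.put(e.getKey(), Optional.empty());
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    // Default resolver: InetAddress, which goes through the system resolver
    // (and the JVM's own address cache).
    static Optional<String> systemResolve(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        double serial = serialWastedSeconds(500, 10.0);
        System.out.printf("serial: %.0f s (%.1f min)%n", serial, serial / 60);
        for (int n : new int[] {1, 10, 50}) {
            System.out.printf("%d threads: %.0f s%n", n, parallelWastedSeconds(500, 10.0, n));
        }
    }
}
```

With 500 slow names at 10 s each, the serial figure is 5,000 s, about 83 minutes, which matches the 1h 23' above. Ten threads bring the idealized worst case down to 500 s. Note that `InetAddress.getByName` also consults the JVM's own address cache, governed by the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties. Repeated names may therefore never reach the local BIND instance at all.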