Enzo Michelangeli wrote:
> ----- Original Message -----
> From: "Andrzej Bialecki" <[EMAIL PROTECTED]>
> Sent: Thursday, May 31, 2007 11:39 PM
>
>> Caching seems to be the only solution. Even if you were able to fire DNS
>> requests more rapidly, remote servers wouldn't be able (or wouldn't like
>> to) respond that quickly ...
>
> Then why is fetching so fast, despite having to fetch the content of each
> page (with the delays of 3-way TCP handshaking, web server latency,
> potentially long content pages...)?
>
> From what I've seen, I suspect that the root of all evil may be a relatively
> small set of domain names for which the resolver hangs for up to 10 seconds
> (despite being configured with "options timeout:1 attempts:2" in
> /etc/resolv.conf). Even only 500 of such domain names (the 0.5% of the total
> 20,000) would waste 1h 23'. In that case, having even a small number N of
> threads would reduce the wastage by a factor of N.
>
>> Which DNS cache implementation are you using?
>
> A local installation of BIND 9.3.2 (yeah, I know, there are better things
> around, see e.g. http://nlnetlabs.nl/downloads/bind9-measure.pdf - but here
> we are talking about less than 100 queries per second, not tens of
> thousands).

We are also using BIND, and our current index is 52,519,267 pages, so you
should be fine with this. I think djbdns is just easier to use. Are you using
any big DNS caches as backups?

Dennis Kubes

>
>> I've had positive experience with djbdns / tinydns package, with some
>> modifications to increase the number of concurrent requests and the cache
>> size. This was on Linux, though - I have no idea how to do this on
>> Windows.
>
> Actually I'm running on FreeBSD 6.1.
>
> Cheers --
>
> Enzo

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
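
For anyone who wants to check the numbers discussed in this thread, here is a
rough Python sketch of the wastage estimate and of the "N threads" idea. It is
only an illustration under stated assumptions: the function names
(wasted_seconds, resolve_one, resolve_all) are made up for this example and are
not part of Nutch, and the estimate assumes the slow names are spread evenly
across the threads.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def wasted_seconds(slow_names, hang_seconds, threads=1):
    """Idealized time lost to hanging lookups, split evenly over `threads`."""
    return slow_names * hang_seconds / threads


def resolve_one(name, resolver=socket.gethostbyname):
    """Return the address for `name`, or None if the lookup fails."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def resolve_all(names, threads=10, resolver=socket.gethostbyname):
    """Resolve names concurrently, so a hanging lookup blocks only one thread."""
    names = list(names)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with names.
        addresses = pool.map(lambda n: resolve_one(n, resolver), names)
        return dict(zip(names, addresses))
```

With the figures quoted above, wasted_seconds(500, 10) gives 5000 seconds,
which is about 1h 23'; with 10 threads the same estimate drops to 500 seconds.
The resolver argument is there only so that the lookup function can be swapped
out (for example, for a stub when no network is available).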
