To add to the headache of spidering a site too much at once....
Yes, Wikipedia was the site that made us realize a redirect can land
on a different IP address than the original URL. We'd carefully
segmented all URLs by domain IP, and were about to disable the address
blocking check, but then ran into this case.
So we changed the code to put the redirected URL back on the fetch
pile for the next pass.
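
Roughly, the idea looks like the sketch below. This is not the actual
Nutch code, just a minimal illustration in Python; partition_by_ip,
run_pass, resolve and fetch are made-up names, and resolve/fetch are
assumed to be supplied by the caller. A redirect is never followed
inside the current pass; its target is deferred so it can be segmented
by its own IP next time.

```python
from collections import defaultdict
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def partition_by_ip(urls, resolve):
    """Group URLs by the IP address their host resolves to.

    `resolve` is a caller-supplied (hypothetical) function: host -> IP.
    """
    buckets = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname
        buckets[resolve(host)].append(url)
    return dict(buckets)

def run_pass(urls, resolve, fetch):
    """Run one fetch pass; return (fetched pages, URLs for the next pass).

    `fetch` is a caller-supplied (hypothetical) function returning
    (status, payload), where payload is the redirect target for a
    redirect status and the page body otherwise.
    """
    fetched, deferred = {}, []
    for ip, bucket in partition_by_ip(urls, resolve).items():
        for url in bucket:
            status, payload = fetch(url)
            if status in REDIRECT_CODES:
                # Do not follow now: the target may live on another IP.
                deferred.append(payload)
            else:
                fetched[url] = payload
    return fetched, deferred
```

The deferred list is simply fed into the next call to run_pass, where
it gets partitioned by IP like any other URL.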
But as you noted in your previous email about round-robin DNS, we could
still have multiple threads hitting the same IP address.
Luckily the sites that have lots of IP addresses & round robin DNS
seem to be the sites with the most bandwidth/users, so we're hoping
we get lost in the noise :)
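
If it ever stops being noise, one possible approach (we haven't done
this; it's only a sketch, and group_hosts_by_shared_ip is a made-up
name) would be to treat any hosts whose address lists overlap as one
politeness group, so a single thread owns the whole group. The input
is assumed to be a host -> list-of-IPs mapping collected beforehand,
e.g. from dnsip output.

```python
def group_hosts_by_shared_ip(host_ips):
    """Merge hosts whose address sets overlap into one group.

    `host_ips` maps host name -> iterable of IP strings.
    Returns a list of (set of hosts, set of IPs) tuples.
    """
    groups = []
    for host, ips in host_ips.items():
        hosts, addrs = {host}, set(ips)
        remaining = []
        for g_hosts, g_addrs in groups:
            if g_addrs & addrs:
                # Shares at least one address: fold into the new group.
                hosts |= g_hosts
                addrs |= g_addrs
            else:
                remaining.append((g_hosts, g_addrs))
        remaining.append((hosts, addrs))
        groups = remaining
    return groups
```

With the Wikipedia hosts this would collapse en, fr and ar into a
single group, since they all resolve to the same set of addresses.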
-- Ken
dnsip
en.wikipedia.org
207.142.131.246 207.142.131.202 207.142.131.210 207.142.131.235
207.142.131.204 207.142.131.205 207.142.131.213 207.142.131.247
207.142.131.245 207.142.131.214 207.142.131.248 207.142.131.236
207.142.131.206 207.142.131.203
dnsip
fr.wikipedia.org
207.142.131.246 207.142.131.205 207.142.131.248 207.142.131.206
207.142.131.214 207.142.131.202 207.142.131.213 207.142.131.235
207.142.131.236 207.142.131.203 207.142.131.245 207.142.131.247
207.142.131.210 207.142.131.204
dnsip
ar.wikipedia.org
207.142.131.245 207.142.131.236 207.142.131.248 207.142.131.203
207.142.131.213 207.142.131.206 207.142.131.214 207.142.131.210
207.142.131.246 207.142.131.204 207.142.131.235 207.142.131.247
207.142.131.205 207.142.131.202
Jeff
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers