To add to the headache of spidering a site too much at once....

Yes, Wikipedia was the one that made us realize redirections can result in IP addresses changing. We'd carefully segmented all URLs by domain IP, and were about to disable the address blocking check, but then ran into this case.

So we changed the code to put the redirected URL back on the fetch pile for the next pass.
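A minimal sketch of that change, in Python for illustration only (not our actual code). The names `handle_response`, `next_pass`, and `REDIRECT_CODES` are made up for this example. The idea is that a redirect is never followed inline; its target is queued so it gets segmented by IP like any other URL on the next pass.

```python
from collections import deque
from urllib.parse import urljoin

# Hypothetical set of status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}

def handle_response(url, status, headers, next_pass):
    """Return True if the page can be processed now.

    Redirect targets are appended to next_pass instead of being fetched
    immediately, because the target may resolve to a different IP address.
    """
    if status in REDIRECT_CODES:
        location = headers.get("Location")
        if location:
            # Resolve relative redirects against the URL we requested.
            next_pass.append(urljoin(url, location))
        return False
    return status == 200

next_pass = deque()
processed_now = handle_response(
    "http://wikipedia.org/wiki/Foo",
    301,
    {"Location": "http://en.wikipedia.org/wiki/Foo"},
    next_pass,
)
```

The cost is one extra pass of latency for redirected pages, which seemed a fair trade for keeping the per-IP segmentation intact.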

But as you noted in your previous email re round robin DNS, we could still have multiple threads hitting the same IP address.
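One possible way to narrow that gap, sketched in Python under some assumptions (we have not implemented this): resolve every A record for a host and derive a single bucket key from the whole set, so hosts that share a round robin pool land in the same bucket. `politeness_key` is a hypothetical helper, the sketch assumes IPv4 only, and the sample addresses are a few of the ones from the lookups quoted below.

```python
import ipaddress

def politeness_key(addresses):
    """Map a host's full set of A records to one stable bucket key.

    Uses the numerically lowest address, so the order in which round
    robin DNS returns the records does not matter.
    """
    return str(min(ipaddress.ip_address(a) for a in addresses))

# Same pool, different order, as round robin DNS would return it.
en = ["207.142.131.246", "207.142.131.202", "207.142.131.210"]
fr = ["207.142.131.210", "207.142.131.246", "207.142.131.202"]

same_bucket = politeness_key(en) == politeness_key(fr)
```

This only helps when the resolver returns the complete record set each time; hosts whose sets overlap without being identical could still end up in different buckets.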

Luckily the sites that have lots of IP addresses & round robin DNS seem to be the sites with the most bandwidth/users, so we're hoping we get lost in the noise :)

-- Ken


dnsip en.wikipedia.org
207.142.131.246 207.142.131.202 207.142.131.210 207.142.131.235 207.142.131.204 207.142.131.205 207.142.131.213 207.142.131.247 207.142.131.245 207.142.131.214 207.142.131.248 207.142.131.236 207.142.131.206 207.142.131.203

dnsip fr.wikipedia.org
207.142.131.246 207.142.131.205 207.142.131.248 207.142.131.206 207.142.131.214 207.142.131.202 207.142.131.213 207.142.131.235 207.142.131.236 207.142.131.203 207.142.131.245 207.142.131.247 207.142.131.210 207.142.131.204

dnsip ar.wikipedia.org
207.142.131.245 207.142.131.236 207.142.131.248 207.142.131.203 207.142.131.213 207.142.131.206 207.142.131.214 207.142.131.210 207.142.131.246 207.142.131.204 207.142.131.235 207.142.131.247 207.142.131.205 207.142.131.202

Jeff


--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
