On Aug 7, 2007, at 2:06 PM, Martin v. Löwis wrote:
I hope I have now solved the overload problem that massive crawling has caused for the wiki and, in consequence, caused the PyPI outage. Following Laura's advice, I added Crawl-delay to robots.txt. Several robots have picked that up: not just msnbot and slurp, but also e.g. MJ12bot. For the others, I had to fine-tune my throttling code, after observing that the expensive URLs are those with a query string. Those now count as 3 regular queries (I might have to bump this to 5), so you can only issue one of them every 6s.
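For anyone following along, the Crawl-delay directive goes in robots.txt like so (the 10-second value here is only illustrative; I don't know what value was actually deployed):

    User-agent: *
    Crawl-delay: 10

And the throttling Martin describes sounds like a token bucket in which query-string URLs cost 3 units. A rough Python sketch consistent with the numbers he quotes (one expensive hit per 6s implies a refill rate of one regular query per 2s) -- this is not his actual code, just an illustration:

    import time

    # Illustrative values; the real deployment may differ.
    COST_PLAIN = 1            # a regular URL
    COST_QUERY = 3            # URLs with a query string count as 3 regular queries
    RATE = 0.5                # units refilled per second => one expensive hit per 6s
    BUCKET_MAX = COST_QUERY   # headroom for exactly one expensive request

    class Throttle:
        """Per-client token bucket; allow() returns False when a request
        should be rejected (or delayed)."""

        def __init__(self):
            self.level = BUCKET_MAX
            self.last = time.time()

        def allow(self, url):
            now = time.time()
            # Refill the bucket based on elapsed time, capped at BUCKET_MAX.
            self.level = min(BUCKET_MAX, self.level + (now - self.last) * RATE)
            self.last = now
            cost = COST_QUERY if '?' in url else COST_PLAIN
            if self.level >= cost:
                self.level -= cost
                return True
            return False

With these numbers, a fresh client can make one query-string request immediately, then has to wait 6s for the bucket to refill, while plain URLs get through every 2s.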
I don't suppose there are enough resources to put PyPI on a separate box entirely, so that whatever else is running (the wiki, etc.) can't drag down the package repository?
On a side note, has anyone looked into a CDN for packages, to speed up delivery and take more of the traffic load off the PyPI host? That would also lower the bar for other sites that wanted to mirror PyPI, since they wouldn't have to host all the actual eggs as well.
Cheers, Ben
