2008/7/2 Victor Stone <[EMAIL PROTECTED]>:
> On Wed, Jul 2, 2008 at 2:10 PM, Nathan Yergler <[EMAIL PROTECTED]> wrote:
>> We're having a few issues with aggressive crawlers slowing the machine
>> down.
>
> gee, I wonder what that's like ;)
>
> I finally fixed all that with a little snippet of code at the bottom
> of every page on ccM: a hidden link marked as 'nofollow' and
> explicitly listed in robots.txt (I use javascript to replace the href
> with nothing just in case). The link is unique with a rand() parameter
> so the bots don't 'learn' to avoid it.
>
> If you follow the link, it leads to a php script that enters your IP
> into the deny list in .htaccess. This stops crawls right as they
> start but allows well-behaved crawls from google, yahoo, etc. to
> continue. We used to go down once a week; now, pretty much never. We
> collect about 50-100 IPs per week. I clean them out regularly because
> the evil bots burn the IPs anyway.
>
> VS
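A minimal sketch of the trap Victor describes, pieced together from his
description rather than from ccMixter's actual code: a hidden nofollow link
with a rand() parameter in every page footer, a matching Disallow rule in
robots.txt, and a PHP script that appends the visitor's IP to the deny list
in .htaccess. The file name trap.php, the query parameter, and the exact
.htaccess handling are assumptions for illustration.

<?php /* Page footer include. The link is invisible to humans, marked
         nofollow, and disallowed in robots.txt (e.g. "Disallow: /trap.php");
         the rand() query string keeps bots from learning one fixed URL
         to avoid. */ ?>
<a id="bot-trap" rel="nofollow" style="display:none"
   href="/trap.php?r=<?php echo rand(); ?>">&nbsp;</a>
<script type="text/javascript">
  // Belt and braces: any real browser that runs JS blanks the href.
  document.getElementById('bot-trap').href = '';
</script>

<?php
// trap.php (hypothetical name): anything that follows the hidden link
// lands here and gets its IP appended to the .htaccess deny list.
$htaccess = dirname(__FILE__) . '/.htaccess';
$ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';

if ($ip !== '' && filter_var($ip, FILTER_VALIDATE_IP)) {
    $line = "Deny from $ip\n";   // Apache 2.2-style directive
    $current = file_exists($htaccess) ? file_get_contents($htaccess) : '';
    if (strpos($current, $line) === false) {   // skip duplicate entries
        file_put_contents($htaccess, $line, FILE_APPEND | LOCK_EX);
    }
}

header('HTTP/1.0 403 Forbidden');
echo 'Forbidden';

The appended "Deny from" lines only take effect if the directory already has
an "Order Allow,Deny" / "Allow from all" block in .htaccess, and crawlers
that honor robots.txt never reach the script at all, which is what lets
google, yahoo, etc. keep crawling while the misbehaving bots lock themselves
out.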
Very interesting. I was wondering how you had that implemented.

I think the problem we had was related to gitweb, which is sort of like
ViewVC, but for git repositories. Apparently gitweb (or even git alone) is
processor-intensive, so when the crawl started it just ate up all the
processor. Asheesh installed a new version of gitweb that is written in C
instead of some perl CGI script, and which caches, so hopefully that will
help.

Nathan

_______________________________________________
cc-devel mailing list
[email protected]
http://lists.ibiblio.org/mailman/listinfo/cc-devel
