we're down again... VS
On Wed, Jul 2, 2008 at 3:35 PM, Nathan Kinkade <[EMAIL PROTECTED]> wrote:
> 2008/7/2 Victor Stone <[EMAIL PROTECTED]>:
>> On Wed, Jul 2, 2008 at 2:10 PM, Nathan Yergler
>> <[EMAIL PROTECTED]> wrote:
>>> We're having a few issues with aggressive crawlers slowing the
>>> machine down.
>>
>> gee, I wonder what that's like ;)
>>
>> I finally fixed all that with a little snippet of code at the bottom
>> of every page on ccM: a hidden link marked 'nofollow' and explicitly
>> listed in robots.txt (I use JavaScript to replace the href with
>> nothing, just in case). The link is unique, with a rand() parameter,
>> so the bots don't 'learn' to avoid it.
>>
>> If you follow the link, it leads to a PHP script that enters your IP
>> into the deny list in .htaccess. This stops crawls right as they
>> start but allows well-behaved crawls from Google, Yahoo, etc. to
>> continue. We used to go down once a week; now, pretty much never. We
>> collect about 50-100 IPs per week. I clean them out regularly because
>> the evil bots burn the IPs anyway.
>>
>> VS
>
> Very interesting. I was wondering how you had that implemented. I
> think the problem we had had to do with gitweb, which is sort of like
> ViewVC, but for git repositories. Apparently gitweb (or even git
> alone) is processor intensive, so when the crawl started it just ate
> up all the processor. Asheesh installed a new version of gitweb that
> is written in C instead of some Perl CGI script, and which caches, so
> hopefully that will help.
>
> Nathan
> _______________________________________________
> cc-devel mailing list
> [email protected]
> http://lists.ibiblio.org/mailman/listinfo/cc-devel
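
For anyone curious, below is a minimal sketch of the kind of trap Victor
describes above. It is not the actual ccMixter code; the file names, paths,
and markup here are assumptions. The hidden footer link and robots.txt entry
are shown as comments, and the script itself just appends the caller's IP to
the .htaccess deny list, on the assumption that the file already has the
usual "Order allow,deny" / "Allow from all" directives.

    <?php
    // trap.php -- sketch of a crawler trap in the spirit of Victor's
    // description; names and paths are assumptions, not ccMixter's code.
    //
    // Each page footer carries a hidden link with a random parameter,
    // marked nofollow and disallowed in robots.txt, and JavaScript empties
    // the href for real browsers:
    //
    //   <a id="trap" rel="nofollow" style="display:none"
    //      href="/trap.php?r=RANDOM">&nbsp;</a>    (RANDOM comes from rand())
    //   <script>document.getElementById('trap').href = '';</script>
    //
    //   # robots.txt
    //   User-agent: *
    //   Disallow: /trap.php
    //
    // Well-behaved crawlers never request this URL; anything that does is
    // added to the deny list in .htaccess.

    $htaccess = '/var/www/site/.htaccess';   // assumed location
    $ip = $_SERVER['REMOTE_ADDR'];

    if (filter_var($ip, FILTER_VALIDATE_IP)) {
        $rules = file_get_contents($htaccess);
        // Don't append the same address twice.
        if (strpos($rules, "Deny from $ip") === false) {
            file_put_contents($htaccess, "Deny from $ip\n",
                              FILE_APPEND | LOCK_EX);
        }
    }

    header('HTTP/1.0 403 Forbidden');
    echo 'Blocked.';

The operational piece Victor mentions, periodically clearing out the
collected addresses, would just be a cron job that strips the appended
"Deny from" lines back out of the file.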
