2008/7/2 Victor Stone <[EMAIL PROTECTED]>:
> On Wed, Jul 2, 2008 at 2:10 PM, Nathan Yergler
> <[EMAIL PROTECTED]> wrote:
>> We're having a few issues with aggressive crawlers slowing the machine
>> down.
>
> gee, I wonder what that's like ;)
>
> I finally fixed all that with a little snippet of code at the bottom
> of every page on ccM: a hidden link marked 'nofollow' and explicitly
> disallowed in robots.txt (I use javascript to replace the href with
> nothing, just in case). The link is unique, with a rand() parameter,
> so the bots don't 'learn' to avoid it.
>
> If you follow the link, it leads to a PHP script that enters your IP
> into the deny list in .htaccess. This stops rogue crawls right as they
> start but allows well-behaved crawlers from Google, Yahoo, etc. to
> continue. We used to go down once a week; now, pretty much never. We
> collect about 50-100 IPs per week. I clean them out regularly because
> the evil bots burn through the IPs anyway.
>
> VS

Very interesting.  I was wondering how you had that implemented.  I
think our problem had to do with gitweb, which is sort of like ViewVC,
but for git repositories.  Apparently gitweb (or even git alone) is
processor intensive, so when the crawl started it just ate up all the
CPU.  Asheesh installed a new version of gitweb that is written in C
instead of a Perl CGI script, and which caches, so hopefully that
will help.
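
If I understand the trap correctly, a minimal sketch of the idea might
look something like this (the file name trap.php and the markup are
placeholders I'm guessing at, not your actual code):

    # robots.txt: well-behaved bots are told to stay away from the trap
    User-agent: *
    Disallow: /trap.php

    <?php
    // Footer snippet on every page: a hidden, nofollow'd link with a
    // random parameter so bots can't memorize and skip a single URL.
    $r = rand();
    echo '<a id="bot-trap" href="/trap.php?r=' . $r . '"'
       . ' rel="nofollow" style="display:none">&nbsp;</a>';
    // Blank the href via javascript as an extra safety net for real users.
    echo '<script>document.getElementById("bot-trap").href = "";</script>';
    ?>

    <?php
    // trap.php: anything requesting this URL ignored both robots.txt and
    // rel="nofollow", so append its IP to the deny list in .htaccess
    // (assumes .htaccess already has "Order Allow,Deny" / "Allow from all").
    $ip = $_SERVER['REMOTE_ADDR'];
    if (filter_var($ip, FILTER_VALIDATE_IP)) {
        file_put_contents(__DIR__ . '/.htaccess',
                          "Deny from $ip\n",
                          FILE_APPEND | LOCK_EX);
    }
    header('HTTP/1.1 403 Forbidden');
    ?>

Cleaning the list out periodically, like you do, would keep .htaccess
from growing without bound.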

Nathan
_______________________________________________
cc-devel mailing list
[email protected]
http://lists.ibiblio.org/mailman/listinfo/cc-devel
