Stupid question: Have you tried contacting Google to see if there is something they can do? Last I heard it was still a company run by people who actually liked their jobs =)
-----Original Message-----
From: Josh Chamas [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 7:41 PM
To: Shawn
Cc: [EMAIL PROTECTED]
Subject: Re: Search Bot

Shawn wrote:
> Hi, I have been trying to figure out a way to limit the massive amount
> of bandwidth that search bots (Googlebot/2.1) consume daily from my
> website. My problem is that I am running Apache::ASP and about 90% of
> the site is dynamic content, links such as product.htm?id=100. The
> dynamic content gets changed quite a bit so I don't want to use any
> caching for regular users, but it would be fine for the bots to use a
> cached copy for a month or so. The solution I came up with is manually
> modifying the headers to keep sending back 304 HTTP_NOT_MODIFIED to
> search bots only (not to regular web browsers) for a month before
> allowing new content to be served to them. Can anyone tell me if there
> are any problems you foresee with doing something like this? I have only
> tested this on a dev server and was just wondering if anyone else had
> this problem or any suggestions they might have.
>

You could also try compressing your content with the CompressGzip setting.

You can try setting the Expires header to one month in the future.

You could set up a /robots.txt file to disallow Google from crawling a portion of your site that might be excludable & high bandwidth.

You could sleep(N) seconds when Google does a request; I wonder if that will slow their spiders down across their cluster(s).

Just ideas, I have not tried to throttle search bots before.

Oh, you might write your own custom mod_perl module that keeps track of bandwidth for search bots and sends a 503 "server busy" error code if bandwidth is exceeded. This might tell Google to back off for a while (?).

Regards,

Josh

________________________________________________________________
Josh Chamas, Founder                   phone:925-552-0128
Chamas Enterprises Inc.
http://www.chamas.com                  NodeWorks Link Checker
http://www.nodeworks.com

--
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html
List etiquette: http://perl.apache.org/maillist/email-etiquette.html
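Josh's last idea, tracking search-bot bandwidth and answering 503 once a budget is exceeded, could look roughly like the sketch below. It is written in Python only to show the bookkeeping; a real version would be a Perl handler under mod_perl. Everything here is an assumption for illustration: the class name BotBandwidthThrottle, the BOT_SIGNATURES list, the fixed-window byte budget, and the use of a Retry-After header. Whether Googlebot actually backs off on a 503 is left open in the thread, and this sketch does not settle it.

```python
import math
import time

# Hypothetical: substrings used to recognise search bots by User-Agent.
# User-Agent strings can be spoofed, so this is only a heuristic.
BOT_SIGNATURES = ("Googlebot",)


class BotBandwidthThrottle:
    """Fixed-window byte budget per search bot (hypothetical helper).

    Regular browsers are never throttled. Once a bot has been sent
    `budget_bytes` within the current window, check() answers 503 with a
    Retry-After header until that window expires.
    """

    def __init__(self, budget_bytes, window_seconds, clock=time.monotonic):
        self.budget = budget_bytes
        self.window = window_seconds
        self.clock = clock
        # bot name -> (window start time, bytes sent in this window)
        self._usage = {}

    def _bot_name(self, user_agent):
        for signature in BOT_SIGNATURES:
            if signature in user_agent:
                return signature
        return None

    def _current(self, bot, now):
        # Start a fresh window if the bot is new or its window has expired.
        start, used = self._usage.get(bot, (now, 0))
        if now - start >= self.window:
            start, used = now, 0
        self._usage[bot] = (start, used)
        return start, used

    def check(self, user_agent):
        """Call before generating a response; returns (status, extra_headers)."""
        bot = self._bot_name(user_agent)
        if bot is None:
            return 200, {}
        now = self.clock()
        start, used = self._current(bot, now)
        if used >= self.budget:
            # Seconds until this bot's window resets, rounded up.
            retry_after = math.ceil(start + self.window - now)
            return 503, {"Retry-After": str(retry_after)}
        return 200, {}

    def record(self, user_agent, nbytes):
        """Call after a response is sent, with the number of body bytes."""
        bot = self._bot_name(user_agent)
        if bot is None:
            return
        start, used = self._current(bot, self.clock())
        self._usage[bot] = (start, used + nbytes)
```

In Apache the check would run early in the request and the recording after the response is sent. Since Apache child processes don't share memory, the counters would have to live in some shared store (a file, DBM, or similar) for the budget to apply across the whole server rather than per child.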