Stupid question: Have you tried contacting Google to see if there is
something they can do? Last I heard it was still a company run by people who
actually liked their jobs =)

-----Original Message-----
From: Josh Chamas [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 27, 2004 7:41 PM
To: Shawn
Cc: [EMAIL PROTECTED]
Subject: Re: Search Bot


Shawn wrote:
> Hi, I have been trying to figure out a way to limit the massive amount
> of bandwidth that search bots (Googlebot/2.1) consume daily from my 
> website. My problem is that I am running Apache::ASP and about 90% of 
> the site is dynamic content, links such as product.htm?id=100. The 
> dynamic content gets changed quite a bit so I don't want to use any 
> caching for regular users, but it would be fine for the bots to use a 
> cached copy for a month or so. The solution I came up with is manually 
> modifying the headers to keep sending back 304 HTTP_NOT_MODIFIED for 
> a month before allowing new content to be served up to only search bots 
> and not to regular web browsers. Can anyone tell me if there are some 
> problems you foresee with doing something like this? I have only tested 
> this on a dev server and was just wondering if anyone else had this 
> problem or any suggestions they might have.
> 
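
The 304 idea described above can be sketched in a few lines. This is only
an illustration of the decision logic, written in Python rather than as an
Apache::ASP handler; the names BOT_MARKERS, BOT_CACHE_WINDOW, is_search_bot
and status_for_request are hypothetical, and the 30-day window stands in for
"a month or so".

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed values for illustration: which User-Agent substrings count as
# bots, and how long a bot's cached copy is treated as fresh.
BOT_MARKERS = ("googlebot",)
BOT_CACHE_WINDOW = timedelta(days=30)


def is_search_bot(user_agent):
    """True if the User-Agent string contains a known crawler marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def status_for_request(user_agent, if_modified_since, now):
    """Return 304 for a bot whose cached copy is younger than the window.

    Regular browsers, and bots that send no usable If-Modified-Since
    header, always get 200 so they receive fresh content.
    """
    if not is_search_bot(user_agent) or not if_modified_since:
        return 200
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: serve the page normally
    if cached_at.tzinfo is None:
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    age = now - cached_at
    return 304 if timedelta(0) <= age < BOT_CACHE_WINDOW else 200
```

Note that this only helps when the bot actually sends If-Modified-Since,
which in turn assumes the first response carried a Last-Modified header.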

You could also try compressing your content with the CompressGzip setting.
You can try setting the Expires header to one month in the future.
You could set a /robots.txt file to disallow Google from searching
a portion of your site that might be excludable & high bandwidth.
You could sleep(N) seconds when Google does a request, I wonder if
that will slow their spiders down across their cluster(s).
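
As a rough sketch of the Expires idea, limited to search bots so regular
users still get uncached pages: the helper names (is_search_bot,
cache_headers) and the 30-day value are assumptions for illustration, and
the code is plain Python rather than Apache::ASP. Whether a given crawler
honors these headers is untested here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed values for illustration only.
BOT_MARKERS = ("googlebot",)
BOT_TTL = timedelta(days=30)  # "one month in the future"


def is_search_bot(user_agent):
    """True if the User-Agent string contains a known crawler marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def cache_headers(user_agent, now):
    """Return extra response headers: long-lived for bots, none for users.

    `now` must be a timezone-aware UTC datetime so the Expires value can
    be formatted as an HTTP date ending in "GMT".
    """
    if not is_search_bot(user_agent):
        return {}
    return {
        "Expires": format_datetime(now + BOT_TTL, usegmt=True),
        "Cache-Control": "max-age=%d" % int(BOT_TTL.total_seconds()),
    }
```

The returned dictionary would be merged into the response headers before
the page body is sent.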

Just ideas, I have not tried to throttle search bots before.

Oh, you might write your own custom mod_perl module that keeps track
of bandwidth for search bots and sends a 503 "server busy" error code
if bandwidth is exceeded.  This might tell Google to back off for
a while (?).
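
A minimal sketch of that bandwidth-tracking idea, written in Python to show
the bookkeeping rather than as a real mod_perl module. The class name
BotBandwidthLimiter, its methods, and the fixed-window budget are all
hypothetical; the Retry-After header is a standard companion to 503, but
whether a crawler respects it is an open question.

```python
# Assumed list of crawler markers, for illustration only.
BOT_MARKERS = ("googlebot",)


def is_search_bot(user_agent):
    """True if the User-Agent string contains a known crawler marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


class BotBandwidthLimiter:
    """Fixed-window byte budget shared by all search-bot requests."""

    def __init__(self, max_bytes, window_seconds):
        self.max_bytes = max_bytes
        self.window_seconds = window_seconds
        self.window_start = None
        self.bytes_sent = 0

    def _roll(self, now):
        # Start a fresh window on first use or once the old one has expired.
        if (self.window_start is None
                or now - self.window_start >= self.window_seconds):
            self.window_start = now
            self.bytes_sent = 0

    def check(self, user_agent, now):
        """Return (status, headers) to use before generating the page."""
        if not is_search_bot(user_agent):
            return 200, {}
        self._roll(now)
        if self.bytes_sent >= self.max_bytes:
            retry = int(self.window_start + self.window_seconds - now)
            return 503, {"Retry-After": str(max(retry, 1))}
        return 200, {}

    def record(self, user_agent, nbytes, now):
        """Add the size of a response just sent to a bot to the budget."""
        if is_search_bot(user_agent):
            self._roll(now)
            self.bytes_sent += nbytes
```

Here `now` is a timestamp in seconds (for example from time.time()). A real
server with several processes would need shared storage for the counters,
which this sketch leaves out.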

Regards,

Josh
________________________________________________________________
Josh Chamas, Founder                   phone:925-552-0128
Chamas Enterprises Inc.                http://www.chamas.com
NodeWorks Link Checker                 http://www.nodeworks.com


-- 
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html
List etiquette: http://perl.apache.org/maillist/email-etiquette.html


