Google should realize that if a page hasn't changed in a year, there is no
need to crawl and re-index it more than once a month.

>We have tried to put code to slow down how fast we serve pages to google bot, 
>but that is almost as expensive as serving the page...

What happens if you only serve Googlebot one page a second and, the rest of
the time, give it timeouts rather than 500 errors?
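Something in that spirit, as a rough sketch only (GAE Python 2.5 webapp; the
handler and the memcache key name are made up for illustration): let one
Googlebot request per second through, and answer the rest with a cheap 503
plus Retry-After instead of a rendered page or a 500, since 500s got the
pages removed from the index in Brandon's test below.

import time
from google.appengine.api import memcache
from google.appengine.ext import webapp

THROTTLE_KEY = 'googlebot-last-served'   # hypothetical key name
MIN_INTERVAL = 1.0                       # seconds between full responses

class PageHandler(webapp.RequestHandler):
    def get(self):
        user_agent = self.request.headers.get('User-Agent', '')
        if 'Googlebot' in user_agent:
            last = memcache.get(THROTTLE_KEY)
            now = time.time()
            if last is not None and now - last < MIN_INTERVAL:
                # Too soon: tell the bot to back off without doing any real
                # work.  (Not atomic, but close enough for a sketch.)
                self.error(503)
                self.response.headers['Retry-After'] = '120'
                return
            memcache.set(THROTTLE_KEY, now)
        self.render_page()  # whatever normally builds the page

    def render_page(self):
        self.response.out.write('...')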

On Jan 9, 7:24 am, "Brandon Wirtz" <drak...@digerat.com> wrote:
> We set expiration very long in both the headers and in the sitemap.  We
> tried killing the sitemap (since we were getting crawled and could see the
> crawler winding through links). When you are assigned "special crawl rate"
> because you are on Google infrastructure, you don't get any control.  We
> have observed the bot going through every page in the sitemap, fetching
> each page as fast as it could, moving on to the next, and looping back to
> the start when it reached the end of the sitemap.  I built an app as a
> "playground" for the bot, and if I told it the change frequency was hourly
> with 100k pages, it would consume 5 F1 Python 2.5 instances 24 hours a day.
>
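For anyone trying to reproduce the "very long expiration" part, I imagine it
looks roughly like this on a GAE Python handler (the one-year lifetime is
just an example value, not necessarily what Brandon used):

import datetime
from google.appengine.ext import webapp

ONE_YEAR = 365 * 24 * 3600  # seconds

class CachedPageHandler(webapp.RequestHandler):
    def get(self):
        expires = (datetime.datetime.utcnow() +
                   datetime.timedelta(seconds=ONE_YEAR))
        # Far-future caching headers: a well-behaved client or crawler has
        # no reason to re-fetch the page before they expire.
        self.response.headers['Cache-Control'] = 'public, max-age=%d' % ONE_YEAR
        self.response.headers['Expires'] = expires.strftime(
            '%a, %d %b %Y %H:%M:%S GMT')
        self.response.out.write('<html>...</html>')
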
> We have tried putting in code to slow down how fast we serve pages to
> Googlebot, but that is almost as expensive as serving the page, and when
> the bot really beats on us, the wait state backs up legitimate requests.
> We tried making sure we served Googlebot the oldest version of pages we have
> so that it wouldn't see changes.
> In a test environment we served 500 errors, which got the pages removed from
> the index.
> We tried redirecting only Googlebot to the naked domain, which is hosted
> off GAE.  That resulted in us crushing the naked-domain server AND getting
> the pages listed wrong in Google, despite setting a preference in Webmaster
> Tools that we always have www. in results.
> We tried using the MSN robots.txt setting to throttle crawling.
> We tried to come up with a way to give Google alternate DNS so it would let
> us set the crawl rate in Webmaster Tools, but doing so causes Apps for
> Domains to disassociate your domain, because it can't detect that you are
> still hosting on GAE.
> We tried attaching HUGE amounts of CSS/style data to make the pages big, so
> that Googlebot would hit its bit budget for the crawl and throttle back,
> while we could just push the data to the buffer cheaply.  All that did was
> increase our bandwidth usage.
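For reference, the MSN throttle mentioned above is presumably the
Crawl-delay directive in robots.txt, roughly like this.  Googlebot simply
ignores Crawl-delay; Google's crawl rate is normally only adjustable in
Webmaster Tools, which is exactly the knob that isn't available on these
"special crawl rate" sites.

User-agent: msnbot
Crawl-delay: 10

User-agent: bingbot
Crawl-delay: 10

# Googlebot ignores Crawl-delay, so there is no equivalent line for it.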
>
> From: google-appengine@googlegroups.com
> [mailto:google-appengine@googlegroups.com] On Behalf Of Anand Mistry
> Sent: Monday, January 09, 2012 3:58 AM
> To: google-appengine@googlegroups.com
> Subject: [google-appengine] Re: Google Bot Is Still the enemy...
>
> Have you looked into using Sitemaps
> (http://www.sitemaps.org/protocol.html#changefreqdef) to hint at how often
> to crawl your site? Google, Bing, and Yahoo all recognise the sitemaps
> protocol, even though they may act on it differently.
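For completeness, a changefreq hint per that protocol looks like the entry
below (the URL is just a placeholder).  As Brandon describes above, though,
changefreq is only a hint, and in his case Googlebot appears to disregard it
entirely.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/some-old-page</loc>
    <lastmod>2011-01-15</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>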

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
