I agree with Walter.  There are a lot of variables that should have
been considered for this new value.  If nothing else, the specification
should have called for the time in milliseconds, or otherwise allowed
for fractional seconds.  In addition, it seems a bit presumptuous for
Yahoo to think that they can force a de facto standard just by
implementing it first.  With this line of thinking, webmasters would
eventually be required to update their robots.txt file for dozens of
individual bots.  It's hard enough to get them to do it now for the
general case; this additional fragmentation is not going to make
anybody's job easier.  Is Google going to implement its own extensions
next, then MSN, AltaVista, and AllTheWeb?  Finally, if we're going to
start specifying scheduling criteria, let's consider some other
alternatives, like preferred scanning windows.
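
To make the fragmentation concern concrete, here is a hypothetical
robots.txt (not taken from Yahoo's documentation) that a webmaster
would end up maintaining if every crawler wanted its own tuning
section:

    User-agent: Slurp
    Crawl-delay: 5

    # hypothetical second crawler with its own preferred value
    User-agent: ExampleBot
    Crawl-delay: 2

    User-agent: *
    Disallow: /cgi-bin/

Multiply that by dozens of bots and the file becomes a per-crawler
configuration database rather than a simple exclusion list.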

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Walter Underwood
Sent: Friday, March 12, 2004 3:37 PM
To: Internet robots, spiders, web-walkers, etc.
Subject: Re: [Robots] Yahoo evolving robots.txt, finally


--On Friday, March 12, 2004 6:46 AM -0800 [EMAIL PROTECTED] wrote:
>
> I am surprised that after all that talk about adding new semantic
> elements to robots.txt several years ago, nobody commented that the
> new Yahoo crawler (formerly the Inktomi crawler) took a brave step in
> that direction by adding "Crawl-delay:" syntax.
> 
> http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html
> 
> Time to update your robots.txt parsers!

No, time to tell Yahoo to go back and do a better job.

Does crawl-delay allow decimals? Negative numbers? Couldn't this spec
be a bit higher quality? The words "positive integer" would improve
things a lot.

Sigh. It would have been nice if they'd discussed this on the list
first. "crawl-delay" is a pretty dumb idea. Any value over one second
means it takes forever to index a site. Ultraseek 
has had a "spider throttle" option to add this sort of delay, but it is
almost never used, because Ultraseek reads 25 pages from one site, then
moves to another. There are many kinds of rate control.
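
Roughly, the difference looks like this (a sketch of the two styles,
not actual Ultraseek code; fetch() stands in for a real HTTP request):

    import time
    from collections import deque

    def fetch(url):
        # placeholder; a real crawler would do an HTTP GET here
        print("fetching", url)

    def crawl_with_delay(urls, delay_seconds):
        # Crawl-delay style: sleep between every request to one host.
        # At 10 seconds per page, that is at most 8,640 pages per day.
        for url in urls:
            fetch(url)
            time.sleep(delay_seconds)

    def crawl_in_batches(hosts, batch_size=25):
        # Batched style: read a batch from one host, then move on to
        # the next host, so no single server sees a tight request loop.
        queue = deque(hosts)  # entries are (host, iterator of urls)
        while queue:
            host, urls = queue.popleft()
            batch = []
            for _ in range(batch_size):
                url = next(urls, None)
                if url is None:
                    break
                batch.append(url)
            for url in batch:
                fetch(url)
            if batch:
                queue.append((host, urls))

Either way the server gets a breather; the second approach just
doesn't starve the index to get it.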

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots