* Bob Wyman <[EMAIL PROTECTED]> [2005-08-26 01:00]:
> My impression has always been that robots.txt was intended to
> stop robots that crawl a site (i.e. they read one page, extract
> the URLs from it and then read those pages). I don't believe
> robots.txt is intended to stop processes that simply fetch one
> or more specific URLs with known names.
I have to side with Bob here.

    Web Robots (also called “Wanderers” or “Spiders”) are Web client
    programs that automatically traverse the Web’s hypertext structure
    by retrieving a document, and recursively retrieving all documents
    that are referenced. Note that “recursively” here doesn’t limit
    the definition to any specific traversal algorithm; even if a
    robot applies some heuristic to the selection and order of
    documents to visit and spaces out requests over a long space of
    time, it qualifies to be called a robot.
    – <http://www.robotstxt.org/wc/norobots-rfc.html>

PubSub is not a robot by the definition of the `robots.txt` I-D.
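
To put the distinction in code: the following is only a rough sketch of
the two behaviours, not PubSub’s actual implementation; the function
names, the “ExampleBot” user agent and the naive href regex are all
made up for illustration.

    from urllib.parse import urljoin
    from urllib.robotparser import RobotFileParser
    import urllib.request
    import re

    def fetch_known_feeds(feed_urls):
        """Fetch a fixed list of known URLs: no link extraction, no
        traversal. Under the I-D's definition this is an ordinary Web
        client, not a robot."""
        return {url: urllib.request.urlopen(url).read() for url in feed_urls}

    def crawl(start_url, max_pages=50):
        """Recursively follow hyperlinks from a starting page. This *is*
        a robot, so it checks robots.txt before each request."""
        robots = RobotFileParser(urljoin(start_url, "/robots.txt"))
        robots.read()

        seen, queue = set(), [start_url]
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen or not robots.can_fetch("ExampleBot", url):
                continue
            seen.add(url)
            html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
            # Extract href targets and queue them for recursive retrieval.
            for href in re.findall(r'href="([^"]+)"', html):
                queue.append(urljoin(url, href))
        return seen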

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>