* Bob Wyman <[EMAIL PROTECTED]> [2005-08-26 01:00]:
> My impression has always been that robots.txt was intended to
> stop robots that crawl a site (i.e. they read one page, extract
> the URLs from it and then read those pages). I don't believe
> robots.txt is intended to stop processes that simply fetch one
> or more specific URLs with known names.
I have to side with Bob here.

    Web Robots (also called “Wanderers” or “Spiders”) are Web client
    programs that automatically traverse the Web’s hypertext structure
    by retrieving a document, and recursively retrieving all documents
    that are referenced. Note that “recursively” here doesn’t limit
    the definition to any specific traversal algorithm; even if a
    robot applies some heuristic to the selection and order of
    documents to visit and spaces out requests over a long space of
    time, it qualifies to be called a robot.
    – <http://www.robotstxt.org/wc/norobots-rfc.html>

PubSub is not a robot by the definition of the `robots.txt` I-D.
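
To put the distinction in code: the following is only a rough sketch of
the two behaviours, not PubSub’s actual implementation; the function
names, the “ExampleBot” user agent and the naive href regex are all
made up for illustration.

    from urllib.parse import urljoin
    from urllib.robotparser import RobotFileParser
    import urllib.request
    import re

    def fetch_known_feeds(feed_urls):
        """Fetch a fixed list of known URLs: no link extraction, no
        traversal. Under the I-D's definition this is an ordinary Web
        client, not a robot."""
        return {url: urllib.request.urlopen(url).read() for url in feed_urls}

    def crawl(start_url, max_pages=50):
        """Recursively follow hyperlinks from a starting page. This *is*
        a robot, so it checks robots.txt before each request."""
        robots = RobotFileParser(urljoin(start_url, "/robots.txt"))
        robots.read()

        seen, queue = set(), [start_url]
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen or not robots.can_fetch("ExampleBot", url):
                continue
            seen.add(url)
            html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
            # Extract href targets and queue them for recursive retrieval.
            for href in re.findall(r'href="([^"]+)"', html):
                queue.append(urljoin(url, href))
        return seen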

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>