On 8/26/05, Graham <[EMAIL PROTECTED]> wrote:
> > (And before you say "but my aggregator is nothing but a podcast
> > client, and the feeds are nothing but links to enclosures, so it's
> > obvious that the publisher wanted me to download them" -- WRONG!  The
> > publisher might want that, or they might not ...
> 
> So you're saying browsers should check robots.txt before downloading
> images?

It's sad that such an inane dodge would even garner any attention at
all, much less require a response.

http://www.robotstxt.org/wc/faq.html

"""
What is a WWW robot?
A robot is a program that automatically traverses the Web's hypertext
structure by retrieving a document, and recursively retrieving all
documents that are referenced.

Note that "recursive" here doesn't limit the definition to any
specific traversal algorithm; even if a robot applies some heuristic
to the selection and order of documents to visit and spaces out
requests over a long space of time, it is still a robot.

Normal Web browsers are not robots, because they are operated by a
human, and don't automatically retrieve referenced documents (other
than inline images).

Web robots are sometimes referred to as Web Wanderers, Web Crawlers,
or Spiders. These names are a bit misleading as they give the
impression the software itself moves between sites like a virus; this
is not the case; a robot simply visits sites by requesting documents
from them.
"""

On a more personal note, I would like to thank you for reminding me
why there will never be an Atom Implementor's Guide. 
http://diveintomark.org/archives/2004/08/16/specs

-- 
Cheers,
-Mark