Shouldn't RSS feeds declare the correct content-type?
Yes, they should, but in general they don't: a lot of RSS feeds are served with a text/xml content-type. I don't know why. Perhaps it is because application/rss+xml is not registered with IANA (http://www.iana.org/assignments/media-types/application/).

In practice, many webmasters are not aware of this, since the main entry points for their feeds are either HTML pages that reference them (with the correct content-type in the HTML link tag) or feed aggregators that simply try to parse the feed content (without paying any attention to the MIME type sent by the protocol). As a result, their feeds are viewable and usable by end users.

Furthermore, I see this "feature" as an extension of the cache mechanism. The cache provides access to a document that no longer exists or is simply temporarily unavailable. So why not give access, via the cache, to a document that has a wrong protocol content-type but was correctly identified, parsed and indexed by Nutch?

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
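
To illustrate the point about aggregators that parse the feed content without caring about the protocol content-type, here is a minimal Python sketch. It is not how Nutch does it; the function name sniff_feed_type and the two sets of media types are assumptions made up for this example. It trusts the declared type only when it already names a feed, and otherwise looks at the root element of the document.

```python
import xml.etree.ElementTree as ET

# Media types that already identify a feed (assumed set, for illustration only).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
ATOM_ROOT = "{http://www.w3.org/2005/Atom}feed"


def sniff_feed_type(declared_type, body):
    """Guess the feed media type from the content, not only from the header.

    declared_type: value of the Content-Type header (may be None).
    body: raw document bytes.
    Returns a feed media type, or None if the document is not a recognised feed.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    declared = (declared_type or "").split(";")[0].strip().lower()
    if declared in FEED_TYPES:
        return declared

    # The header is generic (e.g. text/xml) or wrong: inspect the root element.
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None

    if root.tag == "rss":
        return "application/rss+xml"
    if root.tag == ATOM_ROOT:
        return "application/atom+xml"
    return None
```

With this kind of check, a feed served as text/xml is still recognised as RSS, which is roughly why such feeds remain usable for end users even though the declared content-type is wrong.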