Bob Wyman wrote:
I've got another example of a selfish feed producing dynamic content that will cause many duplicate entries to float around the blogosphere. The feed in question here is an RSS feed. Nonetheless, I think we must expect people will do the same stupid tricks with Atom feeds. Check out:
http://www.b-eye-network.com/xml/articles.php
What you'll get is a feed with entries that look something like the one at the bottom of this page. The interesting thing to note is that the item has a <link> element with the url:
<link>http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link>
What's happened here is that the site has appended my User Agent to the URL
in the link. I assume that this is to allow some kind of tracking. However,
the impact is that the contents of the feed depend on what tool you use to
read the feed. If you access the feed, you will undoubtedly get different
content than I did... For instance, if PubSub's crawler had read the feed,
the value of the "ua" parameter in the URL would have been different and the
URL would have read:
<link>http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua=PubSub.com RSS reader - http://www.pubsub.com/</link>
If this feed is read by more than one synthetic feed generator, or if its items are copied into another feed, it is inevitable that we'll have multiple copies of the item floating around and we'll have very little means for determining which one is authoritative -- essentially they all are. It would be handy to have a "dynamic content flag" that allows us to ignore this stuff...
It seems to me that instead of adding a dynamic content flag, having a separate id element (or in RSS 2.0's case, utilizing the guid element) would be more to the point.
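A minimal sketch of why a stable identifier helps, in Python. It assumes each item carries a guid that does not vary by reader; the guid value below is hypothetical (I have not checked whether this particular feed emits one), and the dedupe helper is illustrative rather than any aggregator's actual code. The two links are the ones quoted above.

```python
def dedupe(items, key):
    """Keep the first item seen for each distinct value of item[key]."""
    seen = set()
    unique = []
    for item in items:
        value = item[key]
        if value not in seen:
            seen.add(value)
            unique.append(item)
    return unique


BASE = "http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua="

# The same article as fetched by two different clients.
# "hypothetical-guid-836" is an assumed, reader-independent identifier.
items = [
    {
        "guid": "hypothetical-guid-836",
        "link": BASE + "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
                       ".NET CLR 1.1.4322; Alexa Toolbar)",
    },
    {
        "guid": "hypothetical-guid-836",
        "link": BASE + "PubSub.com RSS reader - http://www.pubsub.com/",
    },
]

# Keyed on the link, the two copies look like different entries.
print(len(dedupe(items, "link")))  # prints 2

# Keyed on the guid, they collapse to a single entry.
print(len(dedupe(items, "guid")))  # prints 1
```

With an identifier that stays fixed across readers, the link can vary however the publisher likes without creating apparent duplicates.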
- Sam Ruby