Bob Wyman wrote:
I've got another example of a selfish feed which is producing dynamic
content which will cause many duplicate entries to float around the
blogosphere. The feed in question here is an RSS feed. Nonetheless, I think
we must expect people will do the same stupid tricks with Atom feeds. Check
out:

http://www.b-eye-network.com/xml/articles.php

What you'll get is a feed with entries that look something like the one at
the bottom of this page. The interesting thing to note is that the item has
a <link> element with the url:

<link>http://www.b-eye-network.com/view/index.php?
cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link>


What's happened here is that the site has appended my User Agent to the URL
in the link. I assume that this is to allow some kind of tracking. However,
the impact is that the contents of the feed depend on what tool you use to
read the feed. If you access the feed, you will undoubtedly get different
content than I did... For instance, if PubSub's crawler had read the feed,
the value of the "ua" parameter in the URL would have been different and the
URL would have read:
<link>http://www.b-eye-network.com/view/index.php?
cid=836&amp;fc=0&amp;frss=1&amp;ua=PubSub.com RSS reader - http://www.pubsub.com/</link>


If this feed is read by more than one synthetic feed generator or if items
from the feed are copied from this feed to another, it is inevitable that
we'll have multiple copies of the item floating around and we'll have very
little means for determining which one is authoritative -- essentially they
all are. It would be handy to have a "dynamic content flag" that allows us
to ignore this stuff...

It seems to me that, instead of adding a dynamic content flag, having a separate id element (or, in RSS 2.0's case, using the guid element) would be more to the point.
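
As a rough sketch of that idea, assuming items have already been parsed into plain dictionaries: an aggregator could key each item on its id (Atom) or guid (RSS 2.0) and fall back to the link only when neither is present. The function names and the guid value below are hypothetical, made up for illustration; the feed in question may not supply a guid at all.

```python
def entry_key(item):
    """Prefer a stable identifier (Atom id / RSS 2.0 guid) over the link."""
    return item.get("id") or item.get("guid") or item.get("link")


def dedupe(items):
    """Keep the first copy of each item, as identified by entry_key."""
    seen = set()
    unique = []
    for item in items:
        key = entry_key(item)
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique


# Two copies of the same article, fetched by different user agents.
base = "http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua="
copies = [
    # Hypothetical guid, identical in both copies.
    {"guid": "tag:example.org,2005:836", "link": base + "Mozilla/4.0"},
    {"guid": "tag:example.org,2005:836", "link": base + "PubSub.com RSS reader"},
]

by_identifier = dedupe(copies)
# Without an identifier, only the per-reader link is left to compare.
by_link_only = dedupe([{"link": c["link"]} for c in copies])
```

With a shared identifier the two copies collapse into one item; compared by link alone they remain two distinct items, which is the duplication described above.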


- Sam Ruby


