Re: Selfish Feeds...
Bob Wyman wrote:

I've got another example of a selfish feed which is producing dynamic content which will cause many duplicate entries to float around the blogosphere. The feed in question here is an RSS feed. Nonetheless, I think we must expect people will do the same stupid tricks with Atom feeds. Check out:

http://www.b-eye-network.com/xml/articles.php

What you'll get is a feed with entries that look something like the one at the bottom of this page. The interesting thing to note is that the item has a link element with the url:

<link>http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link>

What's happened here is that the site has appended my User Agent to the URL in the link. I assume that this is to allow some kind of tracking. However, the impact is that the contents of the feed depend on what tool you use to read the feed. If you access the feed, you will undoubtedly get different content than I did... For instance, if PubSub's crawler had read the feed, the value of the ua parameter in the URL would have been different and the URL would have read:

<link>http://www.b-eye-network.com/view/index.php?cid=836&amp;fc=0&amp;frss=1&amp;ua=PubSub.com RSS reader - http://www.pubsub.com/</link>

If this feed is read by more than one synthetic feed generator or if items from the feed are copied from this feed to another, it is inevitable that we'll have multiple copies of the item floating around and we'll have very little means for determining which one is authoritative -- essentially they all are. It would be handy to have a dynamic content flag that allows us to ignore this stuff...

It seems to me that instead of adding a dynamic content flag, having a separate id element (or in RSS 2.0's case, utilizing the guid element) would be more to the point.

- Sam Ruby
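To make the duplicate problem concrete, here is a minimal Python sketch, not anything taken from the feed's actual implementation. `render_item` is a hypothetical stand-in for the publisher's behaviour of embedding the fetcher's User-Agent in the link, and the `guid` value is an assumption made up for illustration (the real feed may not carry one). It shows that two fetches of the same article look like two items when keyed on the link, but collapse to one when keyed on a stable identifier.

```python
from urllib.parse import urlencode


def render_item(user_agent: str) -> dict:
    """Hypothetical publisher: the link embeds whoever fetched the feed."""
    query = urlencode({"cid": 836, "fc": 0, "frss": 1, "ua": user_agent})
    return {
        # Assumed stable identifier, invented for this sketch.
        "guid": "b-eye-network:article:836",
        "link": "http://www.b-eye-network.com/view/index.php?" + query,
    }


def dedupe(items, key):
    """Keep the first item seen for each distinct value of item[key]."""
    seen = {}
    for item in items:
        seen.setdefault(item[key], item)
    return list(seen.values())


# The same article as seen by two different readers of the feed.
copies = [
    render_item("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    render_item("PubSub.com RSS reader - http://www.pubsub.com/"),
]

by_link = dedupe(copies, "link")  # links differ, so both copies survive
by_guid = dedupe(copies, "guid")  # the identifier matches, so one survives

print(len(by_link), len(by_guid))
```

Keying on the identifier only answers "is this the same item?"; it says nothing about which copy of the link to keep, which is the authority question raised above.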
RE: Selfish Feeds...
Sam Ruby wrote:

It seems to me that instead of adding a dynamic content flag, having a separate id element (or in RSS 2.0's case, utilizing the guid element) would be more to the point.

Relying on a GUID alone only works if you implement a policy that says that you are only interested in seeing content with new GUIDs and you are willing to ignore any updates to previously seen entries/items. Similarly, relying on atom:id + atom:updated implies a policy of only being interested in content changes which are explicitly flagged by the publisher as being worthy of notice. These are certainly appropriate policies for *some* aggregators addressing *some* user needs.

However, other aggregators implement different policies which address other user needs. For instance, many aggregators will update their content stores whenever *any* change occurs in an item, whether or not the GUID or atom:id has changed. Some of these aggregators will flag any change as a new or unread entry (which I think is a really stupid policy...). Others will, like Gush, distinguish between new items and updated items. (I think this is much more sensible; others will say it is overly complex and unnecessary.) Conceivably, once Atom is released, some aggregators will wish to record three states for an entry: new, major update and minor update. (I would support anyone doing this; others would not.)

To understand this issue and many other syndication issues, it is vital that you try to consider the full range of policies that are implemented by aggregators and that you try to look beyond your personal preferences. Please try to understand that this isn't a simple issue -- at least not from the point of view of a channel intermediary like PubSub.

As was recently pointed out, a very large percentage of the HTTP specification covers issues related to proxies (which is very much the role that PubSub plays). The same is true of the State Management (Cookie) RFC. I remember that when we were working on that RFC, proxy issues were just about the *only* thing we discussed... Problems which are simple in point-to-point networks become much more complex when you introduce intermediaries.

Frankly, I really wish that we had done the blog architecture work many months ago so that we would all have a shared understanding of the system-wide issues and components rather than the widely divergent personal and partial views that are obvious in many of our conversations today...

bob wyman
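The difference between the aggregator policies described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: the policy names (`id_only`, `id_and_updated`, `any_change`), the `classify` function, and the sample entries with `example.org` values are all hypothetical, and the three-state new / major update / minor update idea is not modelled.

```python
from enum import Enum


class Change(Enum):
    NEW = "new"
    UPDATED = "updated"
    UNCHANGED = "unchanged"


def classify(store: dict, entry: dict, policy: str) -> Change:
    """Decide how an aggregator following `policy` treats an incoming entry."""
    previous = store.get(entry["id"])
    if previous is None:
        result = Change.NEW
    elif policy == "id_only":
        # Only unseen ids matter; later changes are ignored.
        result = Change.UNCHANGED
    elif policy == "id_and_updated":
        # Only changes the publisher flagged via the updated stamp count.
        changed = entry["updated"] != previous["updated"]
        result = Change.UPDATED if changed else Change.UNCHANGED
    elif policy == "any_change":
        # Any difference at all in the entry counts as an update.
        result = Change.UPDATED if entry != previous else Change.UNCHANGED
    else:
        raise ValueError("unknown policy: " + policy)
    if result is not Change.UNCHANGED:
        store[entry["id"]] = entry
    return result


# Same entry fetched twice; only the User-Agent baked into the link differs.
first = {
    "id": "tag:example.org,2005:836",
    "updated": "2005-05-06T12:00:00Z",
    "link": "http://example.org/view?cid=836&ua=ReaderA",
}
second = dict(first, link="http://example.org/view?cid=836&ua=ReaderB")


def run(policy: str) -> list:
    store = {}
    return [classify(store, e, policy).value for e in (first, second)]


for name in ("id_only", "id_and_updated", "any_change"):
    print(name, run(name))
```

Under these assumptions, only the `any_change` policy reports the second copy as an update, which is where a link that varies per reader becomes visible as a spurious change.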
RE: Selfish Feeds...
--On May 6, 2005 4:37:23 PM -0400 Bob Wyman [EMAIL PROTECTED] wrote:

Frankly, I really wish that we had done the blog architecture work many months ago so that we would all have a shared understanding of the system-wide issues and components rather than the widely divergent personal and partial views that are obvious in many of our conversations today...

Agreed. A conceptual model of a resource is up there at the front of our charter, and if we don't have that, it doesn't seem like the WG is done.

wunder
--
Walter Underwood
Principal Architect, Verity