Re: Selfish Feeds...

2005-05-06 Thread Sam Ruby
Bob Wyman wrote:
I've got another example of a selfish feed which is producing dynamic
content which will cause many duplicate entries to float around the
blogosphere. The feed in question here is an RSS feed. Nonetheless, I think
we must expect people will do the same stupid tricks with Atom feeds. Check
out:
http://www.b-eye-network.com/xml/articles.php
What you'll get is a feed with entries that look something like the one at
the bottom of this page. The interesting thing to note is that the item has
a link element with the url:
  <link>http://www.b-eye-network.com/view/index.php?
   cid=836&amp;fc=0&amp;frss=1&amp;ua=Mozilla/4.0 (compatible; 
MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link>

What's happened here is that the site has appended my User Agent to the URL
in the link. I assume that this is to allow some kind of tracking. However,
the impact is that the contents of the feed depend on what tool you use to
read the feed. If you access the feed, you will undoubtedly get different
content than I did... For instance, if PubSub's crawler had read the feed,
the value of the ua attribute in the URL would have been different and the
URL would have read:
   <link>http://www.b-eye-network.com/view/index.php?
cid=836&amp;fc=0&amp;frss=1&amp;ua=PubSub.com RSS reader - 
http://www.pubsub.com/</link>

If this feed is read by more than one synthetic feed generator or if items
from the feed are copied from this feed to another, it is inevitable that
we'll have multiple copies of the item floating around and we'll have very
little means for determining which one is authoritative -- essentially they
all are. It would be handy to have a dynamic content flag that allows us
to ignore this stuff...

It seems to me that instead of adding a dynamic content flag, having a
separate id element (or in RSS 2.0's case, utilizing the guid element) 
would be more to the point.

- Sam Ruby
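
A minimal sketch of the point above, assuming the publisher supplies a stable guid alongside the per-client link. The link values follow the two examples quoted in this thread; the guid value and the dedupe helper are hypothetical and only illustrate the idea.

```python
# Two copies of the same item, as fetched by two different clients.
# The guid value is hypothetical; the feed in question may not supply one.
BASE = "http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua="
items = [
    {
        "guid": "urn:example:article-836",
        "link": BASE + "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
                       ".NET CLR 1.1.4322; Alexa Toolbar)",
    },
    {
        "guid": "urn:example:article-836",
        "link": BASE + "PubSub.com RSS reader - http://www.pubsub.com/",
    },
]


def dedupe(entries, key):
    """Keep the first entry seen for each distinct value of entry[key]."""
    seen = {}
    for entry in entries:
        seen.setdefault(entry[key], entry)
    return list(seen.values())


# Keyed on link, the copies look like different items and both survive.
by_link = dedupe(items, "link")
# Keyed on guid, the copies collapse into a single item.
by_guid = dedupe(items, "guid")
```

Keying on the identifier rather than the link only helps if the identifier itself does not vary per client, which is an assumption here rather than something the feed guarantees.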


RE: Selfish Feeds...

2005-05-06 Thread Bob Wyman

Sam Ruby wrote:
 It seems to me that instead of adding a dynamic content flag, having
 a separate id element (or in RSS 2.0's case, utilizing the guid
 element) would be more to the point.
Relying on a GUID alone only works if you implement a policy that
says that you are only interested in seeing content with new GUIDs and you
are willing to ignore any updates to previously seen entries/items.
Similarly, relying on atom:id + atom:updated implies a policy of only being
interested in content changes which are explicitly flagged by the publisher
as being worthy of notice. These are certainly appropriate policies for
*some* aggregators addressing *some* user needs. However, other aggregators
implement different policies which address other user needs. For instance,
many aggregators will update their content stores whenever *any* change
occurs in an item, whether or not the GUID or atom:id has changed. Some of
these aggregators will flag any change as a new or unread entry (which I
think is a really stupid policy...). Others will, like Gush, distinguish
between new items and updated items. (I think this is much more
sensible; others will say it is overly complex and unnecessary.)
Conceivably, once Atom is released, some aggregators will wish to record
three states for an entry: new, major update, and minor update. (I
would support anyone doing this; others would not.)
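
The policies described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how Gush, PubSub, or any other aggregator actually works: the policy names, the classify and fingerprint helpers, and the entry fields are all hypothetical.

```python
import hashlib


def fingerprint(entry):
    """Hash the fields whose change counts as a content change (assumed set)."""
    fields = [entry.get("title", ""), entry.get("link", ""), entry.get("content", "")]
    return hashlib.sha256("\x00".join(fields).encode("utf-8")).hexdigest()


def classify(store, entry, policy):
    """Return 'new', 'updated' or 'ignored', recording the entry in store.

    store maps an entry id to (updated timestamp, content fingerprint).
    policy is one of:
      'id-only'    - only previously unseen ids matter
      'id+updated' - a change counts only if the publisher changed 'updated'
      'any-change' - any difference in the fingerprinted fields counts
    """
    key = entry["id"]
    current = (entry.get("updated", ""), fingerprint(entry))
    previous = store.get(key)
    if previous is None:
        store[key] = current
        return "new"
    if policy == "id-only":
        changed = False
    elif policy == "id+updated":
        changed = current[0] != previous[0]
    elif policy == "any-change":
        changed = current[1] != previous[1]
    else:
        raise ValueError("unknown policy: " + policy)
    if changed:
        store[key] = current
        return "updated"
    return "ignored"
```

With a feed that rewrites the link for every client, an intermediary applying 'any-change' would report an update each time a copy arrives by a different route, while 'id+updated' would ignore it. A three-state variant (new, major update, minor update) could be layered on the same structure.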
To understand this issue and many other syndication issues, it is
vital that you try to consider the full range of policies that are
implemented by aggregators and that you try to look beyond your personal
preferences. Please try to understand that this isn't a simple issue -- at
least not from the point of view of a channel intermediary like PubSub. As
was recently pointed out, a very large percentage of the HTTP specification
covers issues related to proxies (which is very much the role that PubSub
plays.) The same is true of the State Management (Cookie) RFC. I remember
that when we were working on that RFC, proxy issues were just about the
*only* thing we discussed... Problems which are simple in point to point
networks become much more complex when you introduce intermediaries.
Frankly, I really wish that we had done the blog architecture work
many months ago so that we would all have a shared understanding of the
system-wide issues and components rather than the widely divergent personal
and partial views that are obvious in many of our conversations today...

bob wyman




RE: Selfish Feeds...

2005-05-06 Thread Walter Underwood

--On May 6, 2005 4:37:23 PM -0400 Bob Wyman wrote:

   Frankly, I really wish that we had done the blog architecture work
 many months ago so that we would all have a shared understanding of the
 system-wide issues and components rather than the widely divergent personal
 and partial views that are obvious in many of our conversations today...

Agreed. A conceptual model of a resource is up there at the front of
our charter, and if we don't have that, it doesn't seem like the WG is done.

wunder
--
Walter Underwood
Principal Architect, Verity