BAG Question: What is a feed? Sliding-Window or Current-State?

Bob Wyman Sun, 06 Feb 2005 11:43:23 -0800

Sam Ruby wrote:
> If you produce feeds that contain multiple entries with the same
> id, there will be people who misunderstand such documents.
        So what? If they initially misunderstand, they will eventually learn
how to do it properly.
        In any case, I think you miswrote the quoted sentence. I believe you
are talking about "feed documents" not "feeds". The simple fact is that
according to the Atom specification as it stands and according to the common
usage in the syndication world, feeds frequently have multiple entries with
the same atom:id. By "feed" I mean, of course, the abstract thing of which a
feed document presents a subset. The issue here is not whether *feeds* can
contain multiple same-id entries, but rather whether feed documents can do
so.
        The *real* issue here is that we never had the "BAG" discussions
that I was pleading for back in the summer... A "Blog Architecture" would
have answered questions like "What is a Feed?", "What is an entry?", "How does
a Feed differ from a Feed Document?", etc... If nothing else, we might have
come to use a common vocabulary.
        In this particular debate, the core issue is "What is a Feed
Document?" I have long contended that a Feed Document is a "sliding window"
on a feed. Sayre and others have said that a Feed Document is "a
representation of the current state of a set of entries." The difference is
significant. For instance, if you accept the "sliding window" view, then
since a feed (the abstract thing) can have multiple instances of entries
with the same atom:id, it is obvious that a Feed Document should have
the same property. On the other hand, if you accept the "current state"
definition of a Feed Document, you end up saying that while a Feed can have
multiple same-id instances, it is clear that a Feed Document cannot since
there is only one "current state" of a resource at any one time.
        In either case, the definition of "Feed Document" implies the
answer, without debate, to the question: "Can a Feed Document contain
multiple entries with the same atom:id?" Had we dealt with architectural
issues, this debate would not and could not be happening. The same can be
said about a number of other issues before us now. Sayre and others have,
for instance, complained regularly about the lack of agreed definitions
concerning "What is an Entry...", etc. Our terminology is sloppy. We haven't
done our jobs well.
        I contend that we should adopt the sliding window definition of a
feed simply because it is the most useful and easiest to implement. It makes
"archiving" simple, it allows those who are anal about getting "every
change" to get them, and it makes it possible for us to implement bandwidth
saving measures such as RFC3229+feed. None of these things are possible,
without the creation of a special "archive" type, if one adopts the "current
state" view. Given that the "sliding window" view addresses a wider variety
of requirements, I think we should adopt it as the primary view.
        With sliding windows, the only thing you need to do to update a feed
document is insert at its head the new or changed entry. (You could also
just append to the tail of the file. Order should not be significant in an
Atom file...) On the other hand, the "current state" view requires that you
scan the current Feed Document and remove any pre-existing entry that has
the same atom:id as the one you are inserting. 
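
        The two update rules can be sketched as follows. This is only an
illustration, assuming a Feed Document is modelled as an in-memory list with
the newest entry first; the names Entry, update_sliding_window and
update_current_state are hypothetical and come from no Atom draft or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for an atom:entry (only the id and content)."""
    atom_id: str
    content: str


def update_sliding_window(doc, entry):
    """Sliding window: put the new or changed entry at the head.

    Existing entries are not inspected, so the same atom:id may appear
    more than once in the resulting document.
    """
    return [entry] + doc


def update_current_state(doc, entry):
    """Current state: scan the document, drop any entry with the same
    atom:id, then put the new entry at the head."""
    return [entry] + [e for e in doc if e.atom_id != entry.atom_id]


# An entry is published and later revised under the same atom:id.
doc = [Entry("tag:example.org,2005:1", "first draft")]
revised = Entry("tag:example.org,2005:1", "revised")

window = update_sliding_window(doc, revised)  # both instances are kept
state = update_current_state(doc, revised)    # only the latest is kept
```

The sliding-window update never reads the existing document; the
current-state update has to visit every entry already in it.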
        The difference in Feed Document updating may not seem to be big;
however, it becomes significant when we consider things like the
RFC3229+feed that is currently responsible for *massive* bandwidth savings
for those that use it. RFC3229+feed, as written, assumes a "sliding window"
view of Feed Documents and can be implemented simply by associating
ETags with the various points in a document at which entries are inserted.
Constructing the "diff" of a feed document is then simply a question of
indexing back into the Feed Document to the ETag in question and copying all
the data inserted after the ETag. However, if the "current state" view of a
feed is required, then it becomes necessary on entry insertion or on
retrieval to process each entry in the sliding window to filter out
duplicate atom:ids. The result is more processing for the server and a great
deal more complexity in the implementation. My fear is that the increased
complexity (which probably requires a database approach -- not just a
sequential file with "ETag cursors" pointing in to it) is sufficient that
people won't implement RFC3229+feed. The result would be wasted bandwidth,
etc...
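
        The "ETag cursor" idea can be sketched like this. It is a minimal
illustration, assuming an append-only list in memory rather than a real
file, and an ETag that is just the quoted entry count; the class
SlidingWindowFeed and its methods are hypothetical, and all HTTP handling
is left out.

```python
class SlidingWindowFeed:
    """Append-only entry log with ETag cursors (illustrative only)."""

    def __init__(self):
        self._entries = []  # oldest first; entries are never removed
        self._cursor = {}   # etag -> number of entries when it was issued

    def insert(self, atom_id, content):
        """Append an entry and return the ETag naming the new end point."""
        self._entries.append((atom_id, content))
        etag = f'"{len(self._entries)}"'  # assumed ETag scheme
        self._cursor[etag] = len(self._entries)
        return etag

    def diff_since(self, etag):
        """Return everything inserted after the point the ETag marks.

        An unknown ETag falls back to the full document.
        """
        start = self._cursor.get(etag)
        if start is None:
            return list(self._entries)
        return self._entries[start:]


feed = SlidingWindowFeed()
e1 = feed.insert("tag:example.org,2005:a", "v1")
e2 = feed.insert("tag:example.org,2005:b", "v1")
e3 = feed.insert("tag:example.org,2005:a", "v2")  # same atom:id, new instance

delta = feed.diff_since(e1)  # the two entries inserted after e1
```

Building the delta is a dictionary lookup and a slice. Under the
"current state" view, the same request would also need a pass over the
entries to filter out superseded atom:ids.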
        Of course, adopting "current state" instead of "sliding window" also
means that we have to invent the atom:archive type so that we can produce
the sliding window view which is implied by the archive requirement. Thus,
we're making things more complex by offering both "current state" and
"sliding window" views.
        It has been argued by a number of people that we can't adopt
"sliding window" since the ability to have multiple instances of a single
entry in a Feed document is "unprecedented" in the history of blogging and
syndication!!! This is, of course, a completely false statement. The reality
is that in RSS and Atom feed documents in use today, one often sees the same
id or guid being used for multiple entries. In fact, the most popular id or
guid is "null" or nothing... The result has, of course, been that virtually
no existing aggregators actually do anything useful with ids or guids
(unless they are in RSS V2.0 and flagged as permalinks.) This lack of use
for non-permalink guids resulted from the under-specification in previous
versions of syndication formats as well as the fact that GUIDs were
generally optional. But, an essential element of the Atom format is that
atom:id is *required* and thus can be used for useful purposes. It is that
expectation that an atom:id or guid is *useful* that is unprecedented in
Atom... In any case, why in the world would the fact that something is
"unprecedented" be a bar to our adopting it? Are we forbidden to innovate?
Must we propagate all the errors of the past?
        Anyway, I am convinced that the "current state" view is less useful
than the "sliding window" view. The current proposals to define an "archive"
type to patch "sliding window" into Atom are excellent indications that I'm
right... I know others disagree... I also realize that not many people will
actually read this whole message. Ah well...


                bob wyman
