Re: IBM IPR Disclosure
On 2/15/07, James M Snell [EMAIL PROTECTED] wrote: IBM has agreed to a blanket commitment to Royalty Free terms for any IPR that reads on the Standards Track specifications produced by the atompub Working Group. ... We think the Atom Syndication Format and the Atom Publishing Protocol are really important. Thank you. bob wyman
Re: Quoting type parameter value allowed? - Was: I-D ACTION:draft-ietf-atompub-typeparam-00.txt
On 1/19/07, Andreas Sewe [EMAIL PROTECTED] wrote: So, it looks like quoting the type parameter's values is no longer allowed. Are the quotes part of the parameter value? Or, are quotes merely delimiters of the value? If RFC2045 is read to indicate that the quotes are delimiters, then it would not be in conflict with RFC4288 since in both cases feed would be interpreted as being the value 'feed'... bob wyman
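The delimiters-not-content reading can be checked with Python's stdlib header parser, which applies RFC 2045 quoting rules. A minimal sketch (the helper name `type_param` is illustrative, not from any spec):

```python
from email.message import EmailMessage

def type_param(content_type):
    """Return the 'type' parameter of a Content-Type value.

    Python's header parser follows RFC 2045: quotes around a parameter
    value are delimiters, not part of the value, so type="feed" and
    type=feed yield the same parameter value.
    """
    msg = EmailMessage()
    msg["Content-Type"] = content_type
    return msg.get_param("type")

print(type_param('application/atom+xml; type="feed"'))  # feed
print(type_param('application/atom+xml; type=feed'))    # feed
```

Under this reading, the quoted and unquoted forms are interchangeable on the wire.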
Re: Inheritance of license grants by entries in a feed
On 1/14/07, David Powell [EMAIL PROTECTED] wrote: You can't just say that the license extension inherits and expect every implementation out there to implement that. You'd need an Atom 2.0 to do that: either support for must-understand (which was rejected from Atom 1.0), or a special feed document extension container. An implementation should only do things based on the license extension if it understands what the license extension means. Since the draft now has carefully written words to ensure that license extensions only grant additional rights and do not restrict default rights, the worst case situation is that an implementation that doesn't understand license extension inheritance will simply treat entries as though they only had normal, copyright-defined rights associated with them. i.e. You would get fair use, implied right to syndicate, right to read, right to make facilitative copies, etc. but you wouldn't realize that you also get whatever extra rights were granted by the license. This is, I think, a reasonable fall-back. Of course, implementations that do understand that feed-level licenses are inherited will be able to manage rights just a bit better. This is a good thing. A failure to properly implement license inheritance tends to limit what the *reader* believes they can do with entries, but it doesn't do any harm to the owner of the intellectual property in the entries since no one can believe that they have rights not granted. The worst that can happen is that readers don't know all the rights they have. This is acceptable, in my opinion. bob wyman
Re: Inheritance of license grants by entries in a feed
On 1/14/07, David Powell [EMAIL PROTECTED] wrote: Atom doesn't describe the processing model of Atom documents explicitly enough for me to infer much about the semantics of atom:source. ... Needing to [use atom:source] is a good sign that you are abusing feed elements to carry entry metadata though. There are quite a few very common, non-abusive reasons for using atom:source. For instance, the RFC clearly discusses the case where an entry is copied from one feed document into another and needs to maintain its association with the feed metadata of the source feed. There is also the question of signatures. In any case, I read the Atom spec as clearly intending that an entry with an atom:source element can be semantically equivalent to a single entry feed document whose feed meta-data is equivalent to that contained in the entry's atom:source. If this isn't what appears to be written, then I suggest that it is a case of non-optimal drafting and the history of this group should be consulted to clarify the intent. I explained why entries with source needed to be equivalent to single entry feeds when I made the original proposal for atom:source at the first Atom community meeting at Sun in June of 2004 and I made that argument continuously throughout the process of drafting the RFC. This is also one of the many reasons why Atom assigns no significance to the order of atom:entry elements within the feed. The meaning of an entry derives only from data which is either encoded within it or which is recorded as part of the feed metadata associated with the entry. That association is either by containment within a feed document or, more strongly, by encapsulating the feed metadata within the entry. This equivalence property is essential in order to make aggregated/synthetic feeds work and it is necessary to make licensing work properly. (Yes, there were some of us thinking about licensing long before James made his proposal...) 
Thus, the processing model for an entry with an atom:source is just as precisely described as the processing model for a single entry feed document... bob wyman
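The equivalence argued above can be sketched mechanically; this is an illustrative Python/ElementTree transform (the helper name `entry_to_feed` is made up for this example), which promotes atom:source children to feed metadata:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2005/Atom}"

def entry_to_feed(entry):
    """Wrap a standalone atom:entry in a single-entry atom:feed,
    promoting the children of atom:source (if present) to feed metadata."""
    feed = ET.Element(f"{NS}feed")
    source = entry.find(f"{NS}source")
    if source is not None:
        for child in list(source):
            feed.append(child)   # atom:source children become feed metadata
        entry.remove(source)     # the metadata now lives on the feed itself
    feed.append(entry)
    return feed

doc = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:e1</id><title>An entry</title>
  <source><id>urn:example:f1</id><title>Origin feed</title></source>
</entry>"""
feed = entry_to_feed(ET.fromstring(doc))
print(feed.findtext(f"{NS}title"))  # Origin feed
```

The round trip in the other direction (copying feed metadata into atom:source before extracting an entry) is the symmetric operation.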
Re: Inheritance of license grants by entries in a feed
On 1/14/07, David Powell [EMAIL PROTECTED] wrote: I agree that it is important to distinguish between feeds and feed documents, and this is why I think that feed level inheritance of licenses should be dropped as it is incompatible with Atom. Inheritance can't be incompatible with Atom since Atom defines it. I do agree with you, however, if you argue that Atom would have been cleaner without inheritance. Without inheritance, feed level meta-data would only apply to the collection which contains entries and not to the entries themselves. Without inheritance, we wouldn't need atom:source -- we would have only needed atom:provenance (a simple link to an entry's origin feed similar to the source element in RSS. Note: Synthetic feed producers still would have wanted atom:source as a convenient way to reduce the need to repeatedly fetch feed documents to get atom:title values.) However, folk really wanted to keep inheritance of the feed metadata and so we ended up having to define something more complex. bob wyman
Re: Fwd: Atom format interpretation question
On 1/4/07, James M Snell [EMAIL PROTECTED] wrote: If the NewsML folks want to be able to use a proper mediatype to identify their stuff AND treat it as XML, they should come up with an appropriate media type registration (e.g. application/newsml+xml, etc.). Did the +xml convention ever get formalized in some RFC? I know we all *think* that tacking +xml onto the end of something means that it is some use of XML, however, if I remember correctly, this little bit of syntax has never actually been formalized... Or have I missed something? Is there an RFC that defines what +xml means? bob wyman
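Whatever the formal status of the convention, a consumer that treats the +xml suffix as advisory might classify media types like this (a heuristic sketch; the helper name is made up):

```python
def is_xml_media_type(media_type):
    """Heuristic: treat application/xml, text/xml, and any subtype
    carrying the '+xml' suffix convention as an XML format."""
    # Drop any parameters, then look only at the subtype.
    base = media_type.split(";", 1)[0].strip().lower()
    subtype = base.rpartition("/")[2]
    return subtype == "xml" or subtype.endswith("+xml")

print(is_xml_media_type("application/newsml+xml"))  # True
print(is_xml_media_type("application/newsml"))      # False
```

The heuristic deliberately ignores parameters, since they never change whether the underlying format is XML.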
Re: I-D ACTION:draft-ietf-atompub-typeparam-00.txt
This document looks good on an initial quick read -- with one possible exception. It says: Atom processors that do recognize the parameter SHOULD detect and report inconsistencies between the parameter's value and the actual type of the document's root element. This would seem to be creating a directive concerning behavior which is not directly related to interoperation between systems. (I'm assuming that the destination of the reports is the user of the application, a log file, or something like that.) Thus, it seems to me that it might be inappropriate to use the SHOULD word since IETF apps are supposed to be focused on interoperation and are supposed to avoid constraining application behavior unnecessarily. May I suggest that you rewrite this sentence in a manner similar to that below: It is strongly recommended that Atom processors that do recognize the parameter detect and report... bob wyman
Re: base within HTML content
On 1/1/07, Geoffrey Sneddon [EMAIL PROTECTED] wrote: Why, may I ask, MUST (under the RFC 2119 definition) HTML content be a fragment (HTML markup within SHOULD be such that it could validly appear directly within an HTML DIV element, after unescaping. - note the word SHOULD, not MUST, implying that you can have a full HTML document within)? What would you do if you wanted to display a feed of 10 entries in newspaper style (i.e. all entries in a single HTML page) yet each of the entries had a different BASE defined? It wouldn't do you much good to move all the base elements to the HEAD of the DOM tree -- you'd just end up with a mess. If you want a local base, then use xml:base. That's what it is for. The same problem exists for other page-global stuff. For instance, XHTML modularization is useless if you're creating Atom entries since that stuff relies on elements in HEAD but an Atom entry ain't got no head. Remember as well that not all of the entries in a feed document need be created by the same person. For instance, with aggregated or synthetic feeds, you end up with entries written by many different authors who have no chance of negotiating how they will divide the global resources that might be used to display their entries. Because some entries may be signed, you can't simply say something like "just rewrite the entries" -- that would break the signatures. It is good that Atom entries should be fragments. That increases to a great degree the variety of environments in which Atom entries are useful. If you feel constrained by this, I would suggest that you push on those who define HTML and get them to provide mechanisms for allowing fragment-local expression of things that at this time can only be expressed as page-global. (Yes, I realize this will take some time.) bob wyman
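The xml:base mechanism recommended above is easy to apply mechanically: each element's xml:base resolves against the one it inherits, and hrefs resolve against the result. A minimal sketch, assuming Python/ElementTree (the helper name `resolve_hrefs` is illustrative):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def resolve_hrefs(root, doc_base=""):
    """Map each href in the tree to its absolute form, accumulating
    xml:base values from the root down via urljoin."""
    resolved = {}
    def walk(elem, base):
        # An element's xml:base resolves against the inherited base.
        base = urljoin(base, elem.get(XML_BASE, ""))
        href = elem.get("href")
        if href is not None:
            resolved[href] = urljoin(base, href)
        for child in elem:
            walk(child, base)
    walk(root, doc_base)
    return resolved

doc = """<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://example.org/">
  <entry xml:base="/img/"><link href="pic.png"/></entry>
</feed>"""
print(resolve_hrefs(ET.fromstring(doc)))
# {'pic.png': 'http://example.org/img/pic.png'}
```

Because the base accumulates per element, ten entries with ten different bases coexist in one document with no page-global state.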
Re: Inheritance of license grants by entries in a feed
On 12/17/06, David Powell [EMAIL PROTECTED] wrote: What you can do however, is to specify that feed licenses apply to the feed, and inherit to the entries in the feed. ... It means that the license applies to all entries in that feed, not just ones in that specific feed document. This is probably reasonable behaviour for licenses anyway. Particularly in the case of licenses, it is very important to distinguish between the feed or stream of all entries (past, present and future) associated with a feed id and the actual feed documents that encapsulate subsets of that stream. Atom provides no mechanism for associating meta-data with feeds. Atom only supports associating meta-data with Feed Documents. Data in one feed document does not apply to entries found in another feed document -- or to entries that stand alone. Feed meta-data found in one feed document does not override, complement or invalidate feed meta-data found in other feed documents. This is one of the many reasons we have atom:source -- so that we can bind specific feed meta-data to an entry no matter what context in which that entry might appear or when it might be read. If we had a case where data in one feed document overrides data in other feed documents, we'd have a mess. Some of the questions that we'd have to answer are: - Elements like atom:author, atom:contributor and atom:rights can and do change over time -- sometimes frequently. If such a change occurs, does it mean that we've implied a change to all previous entries published in all previously published feed documents? This rule would tend to force us to create new feeds (i.e. new feed ids) whenever authors, contributors, or rights change. This would make a mess for aggregators and feed readers who have enough trouble keeping up with changes in the syndisphere... 
- If data is present in one feed doc but not another later document, does the absence of the data in the later document override the previous document or do we combine what we know from both documents? (i.e. if the earlier Feed document had an atom:contributor field but a later Feed Document does not, does this mean that we wipe out knowledge of the contributor who might have been essential to creating some of the earlier entries? (That's kind of heartless -- it would be a high tech version of What have you done for me lately?...) Or, do we improperly maintain the old contributor as a contributor of new entries --potentially long after the contributor has died?) - How do we handle mistakes? For instance, if after publishing several thousand feed documents in sequence, I might publish one that accidentally grants all rights to everyone when I really meant to grant only non-commercial rights. Does the new badly coded feed document force all of the thousands of entries I've been working on over time into the public domain even if that wasn't what I intended? - How do I repair the mistake discussed immediately above? If I publish a new feed document with a license grant for non-commercial use, does that then apply to all previously published entries -- including those that were accidentally published with over-generous rights? Does this mean that I can use a license in one feed document to *restrict* or rescind rights granted by another feed document? (This would be a very bad thing...) - If I have an aggregator that picks up some content which is licensed for general use today, how can I be sure that I can still use the content tomorrow? If the content of a Feed Document applies retroactively, it would seem that I have to re-fetch the feed every time I use content from the feed so that I can check the metadata. This doesn't seem to make sense. If I were sued by someone, could I use the argument: But, I didn't read the new, more restrictive Feed Documents! 
Is ignorance an excuse? I could go on... But, I hope the case is made. Feed Documents only describe themselves and the entries they contain. They do not describe the feed. If you store a feed in an implementation such as Microsoft's Feed Engine, only a single set of feed extensions will be associated with the feed. While it is important to be aware of the inadequacies (as well as the strengths) of implementations by companies with significant market power, I don't think that we can simply delegate the standards writing process to such companies or modify standards to cover up their bugs. The fact that Microsoft or any other company has done the wrong thing should not, in itself, be sufficient to dictate the development of standards. Hopefully, they will eventually see the error in their ways and correct them. bob wyman
Re: AD Evaluation of draft-ietf-atompub-protocol-11
On 12/16/06, A. Pagaltzis [EMAIL PROTECTED] wrote: Extending the Atom envelope is a strategy of last resort. +1 It is important to remember that not all processors of Atom data will know what to do with unexpected metadata in the envelope. Thus, unexpected envelope fields will often simply be stripped off and thrown to the bit bucket. If you want data to stay with your content, it is best to put it in the content/... Sometimes, it may be appropriate to extend the envelope; however, one should not do so without a really compelling case. Envelope extensions typically require fetch-time or database structure modifications in consuming applications if those extensions are to be supported. This is because many feed consumers have distinct fields in their databases or internal structures for each of the envelope elements and then just have a single field for content. Also, the code for manipulating envelope fields is usually distinctly different from the code used to manipulate and process content/. So, if you create a new envelope field, you require a great deal of code to be modified for that field to be supported. On the other hand, if something can be slipped into content/ you'll see it being stored immediately and have the opportunity for downstream consumers (display routines, etc.) to provide support for the additional data. (For instance, you might write a GreaseMonkey script to do interesting things with stuff encoded in content/ even though the backend of the application knows nothing about it.) My personal feeling is that many of the proposals (but not all) for envelope extensions are derived from what I consider to be unfortunate precedent set in the RSS world where all sorts of random stuff has been pushed into the envelope since in RSS the description/ field is so under-specified that it isn't really possible to think of it as something which can be structured. 
Fortunately, the field has moved forward since legacy RSS was defined and we've got better methods that can be used with Atom. There are undoubtedly still things that might go in the envelope, but not as many as some folk might think. bob wyman
Inheritance of license grants by entries in a feed
In general, I think the latest version of James Snell's license I-D [1] is much better than earlier versions. I am particularly pleased that this draft only speaks of license grants. I remain, as always, opposed to anything that would encourage people to attempt to restrict the implied license to syndicate. I do, however, have a few small issues. The text on inheritance is, I think, almost correct in this draft; however, as written, it seems to create a risk of the incorrect granting of rights as well as unfortunate loss or decay of grants when entries are copied between feeds. The current draft states: (focus on the underlined bits. The first underlined sentence is too restrictive, the second too inclusive.) 2.3. Inherited Licenses The license on a feed MAY be inherited by entries. Generally, a more specific license overrides the less specific license. More specifically, if an entry has any license link relations at all, including the undefined license, it does not inherit the license of the feed. If an entry has no license link relations, it does inherit the license of its parent feed(s). Since an entry may appear in multiple feeds, it may inherit multiple licenses. This is equivalent to stating multiple licenses on the entry itself. I am concerned that some readers who are not intimately familiar with RFC4287 may not understand that entries which contain atom:source elements do NOT inherit feed metadata from the feeds in which they are found. The text of the current draft seems to override this constraint on inheritance. Thus, I propose the following new wording for the third and fourth sentences in the first paragraph of section 2.3 (the ones quoted and underlined above): More specifically, if an entry has any license link relations at all, including the undefined license, [or, if the entry contains an atom:source element,] it does not inherit the license of the feed. 
If an entry has no license link relations[, and contains no atom:source element,] it does inherit the license of its parent feed(s). Additionally, I believe that this draft should align with the handling of atom:rights defined in section 4.2.11 of RFC4287 by adding the following text at some appropriate location: If an atom:entry which does not contain an atom:source is copied from one feed into another feed then if the feed into which it is copied contains a license, an atom:source element SHOULD be added to the copied entry. If a source feed contains a license, that license SHOULD be preserved in an atom:source element added to any entries copied from the source feed which do not already contain atom:source elements. The first constraint is necessary to ensure that the act of copying entries does not result in rights being granted by the copyist even though those rights were not granted by the entry's author. The second constraint helps to prevent the loss or decay of rights as things are copied from feeds with licenses that grant rights into feeds that contain no or lesser grants. I realize that clarifying these constraints on inheritance allows for at least one odd result. That is, I might have a feed which contains entries whose atom:source elements declare license grants that differ greatly from what is seen in the feed's metadata even though all those entries claim the enclosing feed as their source. This actually makes a good bit more sense than it might seem to at first glance. The reason for this is that the rights granted for entries added to a feed can change over time even though changes to the feed's default rights may not impact previously created entries. Thus, a feed might have granted liberal rights when an entry was first created but might not offer the same grants when the entry was updated. 
The author should be able to maintain with the entry the rights that were originally granted (or not granted) rather than being forced to update the rights in order to do something as simple as a spelling correction. (Yes, I realize that the author could, in some cases, simply attach the old rights to the updated entry rather than using an atom:source which contains the same information. However, this can get messy in some situations and causes us to lose some information about the source of the license grants -- it may be useful in some cases to distinguish between licenses granted in feed metadata and those granted in entry metadata. Forcing attachment of licenses to entries would also require using the undefined license in more cases than is desirable.) I've got a few other comments -- destined for other messages. Nonetheless, this draft is looking much better than earlier drafts. bob wyman [1] http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-license-10.txt
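The inheritance rule proposed above (an entry's own license links win; an atom:source element blocks inheritance; otherwise the feed's licenses apply) can be sketched in a few lines. This is an illustrative Python/ElementTree implementation; the helper name `effective_licenses` and the example URLs are made up:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2005/Atom}"

def effective_licenses(entry, feed):
    """Proposed rule: an entry inherits the feed's license links only if
    it has no license link relations of its own AND no atom:source."""
    def licenses(elem):
        return [link.get("href") for link in elem.findall(f"{NS}link")
                if link.get("rel") == "license"]
    own = licenses(entry)
    if own or entry.find(f"{NS}source") is not None:
        return own   # atom:source blocks inheritance, even with no licenses
    return licenses(feed)

doc = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="license" href="http://example.org/cc-by"/>
  <entry><id>urn:e1</id></entry>
  <entry><id>urn:e2</id>
    <source><link rel="license" href="http://example.org/old"/></source>
  </entry>
</feed>"""
feed = ET.fromstring(doc)
e1, e2 = feed.findall(f"{NS}entry")
print(effective_licenses(e1, feed))  # ['http://example.org/cc-by']
print(effective_licenses(e2, feed))  # []
```

In the second case the entry's atom:source carries its own license, which would govern instead; the point is only that the enclosing feed's grant does not leak in.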
License Draft: Tortured text re obligations...
There is, I think, a bit of tortured text in James Snell's otherwise useful License I-D [1]. 1.3. Terminology ... The term license refers to a potentially machine-readable description of explicit rights, and associated obligations, that have been granted to consumers of an Atom feed or entry. The problem is the underlined clause... One can't grant an obligation. (When you have a conjunction, you should be able to scan the sentence with only one element of the conjunction without losing meaning...) As written, the sentence can be read by nitpicking lawyers as: The term 'license' refers to obligations that have been granted... Clearly, this isn't the intent. Thus, I propose the following rewording: The term license refers to a potentially machine-readable description of explicit rights that have been granted to consumers of an Atom feed or entry. Rights granted by a license may be associated with obligations which must be assumed by those exercising those rights. I realize that this is a bit more wordy than the existing text, however, I think it better preserves the author's intent. Also, it has the nice attribute of limiting the discussion of obligations to the scope of rights granted by the licenses -- not rights that might exist in the absence of the license. Nothing we do should encourage people to use in-feed or in-entry data to restrict rights which exist independent of an explicit license grant. Such rights may include fair-use rights, the right to create backups, the implied right to syndicate, etc. As with Creative Commons licenses, I believe our goal here should be to provide mechanisms to expand the rights granted -- not to restrict them. bob wyman [1] http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-license-10.txt
Re: Atom Entry docs
On 12/15/06, Hugh Winkler [EMAIL PROTECTED] wrote: It's telling that James felt it natural to choose the name type for the parameter. Because it really is naming a new type of document. What would be better than type? Might root work better? It seems to me that application/atom+xml;type=entry describes an Atom document whose root element is entry/. The type of the document is atom but it is a kind or type of atom document that has an entry/ element as its root. Unfortunately, type is being used to mean two completely different things in this context. Would you be happier if the proposal was for the following? application/atom+xml;root=entry application/atom+xml;root=feed One argument for using root is that it might be a usage that would be useful with other mediatypes which have more than one possible root element. Also, using root as the parameter name would ensure that folk don't get confused into thinking that there is any kind of subtyping going on here -- specifying ;root=entry is simply providing meta-data which describes a constrained use of the general atom type -- it is no different from doing something like saying: I won't accept any feeds that don't have icon/ elements. or, This feed contains no more than 256 entry elements. If one is being exceptionally formal or overly pedantic, I can see how you might argue that a feed constrained to fewer than 257 entries is somehow a sub-type or sub-class of the more general atom type. But, since every distinct instance of the atom type can be described in similar manners, it would mean that every atom instance is a subtype. In some contexts, this observation might be useful. I don't think, however, that such precision is useful in the realm for which we normally are designing Atom... bob wyman
Re: Atom Entry docs
There is, I think, a compromise position here which will avoid breaking those existing implementations which follow the existing RFC's. 1) Define ;type=feed and ;type=entry as optional parameters. (i.e. get them defined, registered, and ready to use.) 2) Leave RFC4287 unchanged. i.e. do NOT re-define application/atom+xml 3) New specifications MAY require that ;type=entry be used. (Note: Just because ;type=entry is used DOES NOT imply that ;type=feed must also be used) Thus, APP would accept application/atom+xml when looking for a feed but might insist that entries be explicitly identified with a disambiguating type parameter. Thus, no code which currently uses application/atom+xml to designate a feed would be broken. Additionally, any code which is properly built and thus ignores unknown parameters will not be hurt when it sees application/atom+xml;type=entry since it will ignore the type parameter and dig around inside the data to figure out if it is feed or entry. The only code which will be hurt is some potential code that does not follow the existing RFCs for Atom or mime types. It is, I think, OK to occasionally break code that doesn't follow the specs. Whatever the technical arguments may be, I believe it is important from a political point of view that we do not change the definition of things defined in Atom. I am all for extending Atom, but not for changing Atom. We must not change the existing specification unless there is some really serious harm being done. If we do, we risk losing the trust of at least some members of the community that we've built these last few years... Folk will remember that one of the advantages that is claimed for RSS is that it has been declared to be eternally free from modification. While I personally believe that that is silly, the proponents of RSS do have a point when they speak of the value of stable specs. 
If we allow the Atom spec to be *changed* so soon after it was accepted and we don't have a really, really good reason for doing it, we will simply have proven the often made claim that standards groups simply can't be trusted with important specifications. We will be encouraging more of the kind of standards making that resulted in the mess that is RSS... bob wyman PS: Since Kyle points out that GData, a Google product, is potentially impacted by the results of this discussion, I should state that I currently work for Google -- although I am not currently assigned to any product or project that has a direct interest in the definition of Atom, APP, etc... My participation in this discussion, at this time, is driven purely by personal interest.
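The "dig around inside the data" fallback in point 3 of the compromise above amounts to root-element sniffing. A minimal sketch, assuming Python's ElementTree (the helper name `atom_document_kind` is made up):

```python
import io
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def atom_document_kind(data):
    """Return 'feed', 'entry', or 'unknown' by peeking at the root
    element of an application/atom+xml body; any ;type= parameter on
    the media type is deliberately ignored."""
    # iterparse yields the root's start event first, so one iteration
    # is enough to classify the document.
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
        if elem.tag in (f"{{{ATOM}}}feed", f"{{{ATOM}}}entry"):
            return elem.tag.rsplit("}", 1)[-1]
        return "unknown"   # some other root element entirely
    return "unknown"

print(atom_document_kind(b'<feed xmlns="http://www.w3.org/2005/Atom"/>'))
# feed
```

Because only the first start event is needed, a streaming consumer can classify a document after reading just its opening tag.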
Re: PaceEntryMediatype
On 12/10/06, Eric Scheid [EMAIL PROTECTED] wrote: The only danger [of defining a new media type] is if someone has implemented APP per the moving target which is Draft-[n++] ... they should revise their test implementations as the draft updates, and certainly update once it reaches RFC status, so no sympathies there. The impact here is not just limited to APP implementations. If a new media type is defined, it will undoubtedly appear in other contexts as well. Given the current definition of the atom syntax, it is perfectly reasonable for an aggregator to treat a single entry as the semantic equivalent of a single-entry feed. If a new media type is defined, such an application would end up having to be modified. That's not right... APP is not the only context within which Atom is used. bob wyman
Re: Fwd: PaceEntryMediatype
On 12/8/06, James M Snell [EMAIL PROTECTED] wrote: I'm fine with the type parameter approach so long as it is effective. By effective I mean: Will existing implementations actually take the time to update their behavior to properly handle the optional type parameter. It would be useful to define better what is meant by properly handle the optional type parameter. Those that don't understand the parameter should simply continue to operate on the current assumption that they can't really be sure if they are reading a feed or an entry until they read the first few bytes. Those that do understand the meaning of the optional parameter will be writing code in the future and we can hope that if they become aware of the type parameter and decide to care about it, they will have sufficient awareness to do whatever they do in a proper manner. The only case where I can see a problem would be those folk who match against the existing media type as an opaque string and don't have any code to handle optional type parameters. Such sloppy code would be broken by the use of the optional type parameters since the presence of the parameter would break the simple string matches used by these coders. However, I must admit that I don't have much sympathy for such folk. Making basic design decisions to address the concerns of these sloppy folk is something like the old prejudice against using XML attributes since it tended to make it harder to create sloppy, regex based parsers... In any case, the alternative proposal, create a new media type for entries, would tend to confuse people who have their code written properly today --- those whose code understands that the existing atom mediatype can be used for both a feed and an entry. 
What we would be doing by creating a new media type is break the code of the folk who paid attention to the spec in order to preserve the code of those who didn't read the spec (or those who refused to see Atom as anything other than some twisted form of RSS...) This doesn't make sense to me. We should use the type parameter if anything is changed here. bob wyman
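The difference between the opaque-string matching criticized above and parameter-aware matching is easy to show. A sketch using Python's stdlib header parser (the helper name `accepts_atom` and its `want` argument are illustrative):

```python
from email.message import EmailMessage

def accepts_atom(content_type, want=None):
    """Parameter-aware match for application/atom+xml.

    Parses the media type instead of comparing opaque strings, so an
    optional ;type= parameter never breaks the match. want=None accepts
    any Atom document; want='feed' or 'entry' additionally checks the
    type parameter when one is present.
    """
    msg = EmailMessage()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "application/atom+xml":
        return False
    param = msg.get_param("type")
    return want is None or param is None or param == want

# A naive check like `content_type == "application/atom+xml"` fails as
# soon as any parameter appears; the parsed check does not.
print(accepts_atom("application/atom+xml; type=entry"))          # True
print(accepts_atom("application/atom+xml; type=entry", "feed"))  # False
```

Code built this way keeps working whether or not publishers adopt the optional parameter.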
Re: rss reader
On 12/9/06, Greger [EMAIL PROTECTED] wrote: hi I have made a prototype rss reader. All is good. just wondering if anyone would be interested in getting down to create a C++ library for in particular atom, but also the other used rss feed types. anyone working on this kinds of things? If you're going to be building a C++ library for syndication, you might want to look at the Microsoft RSS Platform for some inspiration. See: http://msdn.microsoft.com/XML/rss/default.aspx I believe that Microsoft's is currently the most comprehensive platform for handling syndication feeds. But, there is much that can be done to improve on what they've done. bob wyman
Fwd: PaceEntryMediatype
On 12/5/06, James M Snell [EMAIL PROTECTED] wrote: Mark Baker wrote: It's just an entry without a feed. You'd use the same code path to process that entry whether it were found in an entry or feed document, right? Not necessarily... The majority of applications that most frequently handle Atom Feed Documents have no idea how to deal with Atom Entry Documents and I would wager that most applications that understand how to process Atom Entry Documents and Atom Feed Documents typically don't fall into the same category as most feed readers. What you seem to be implying is that the majority of applications that process Atom Feed documents are not, in fact, supporting extremely important parts of the atom specification. I believe that any properly constructed Atom Feed parser will contain all the code needed to parse the most complex Atom Entry document. And, an entry document with an atom:source is semantically equivalent to an atom:feed with a single entry... The problem here is that people insist on building Atom parsers that aren't capable of handling more semantics than legacy RSS. What we should be doing is encouraging people to exploit Atom and use its features -- atom:source among others -- that aren't supported by RSS. For a parser that properly handles the case of an atom:entry appearing within atom:feed, it should be trivially simple code to recognize and handle an entry without a feed wrapper. I think there are even cases where this makes sense -- and you would even want to subscribe to such a thing: Consider a feed that communicates current weather or current stock price, etc. We wouldn't be surprised if such a feed never contained more than a single entry. We also wouldn't be surprised if the publisher of this single entry feed decided that he wanted to sign the entry in this single-entry-feed and was thus forced to insert all of the feed data into the entry's atom:source. 
Of course, once you've got a single-entry Atom feed which contains a signed entry, you have all the feed data duplicated -- so, it wouldn't be surprising to see authors of such feeds argue that they shouldn't be forced to waste bits on duplicated feed data when an atom entry document provides exactly what they need. In any case, while it appears reasonable (and sometimes efficient) for people to subscribe to Entry documents, I don't think we should do anything disruptive unless someone can establish actual harm being caused by the current state of affairs. bob wyman
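The equivalence Bob describes -- an Atom Entry Document whose atom:source carries the same feed metadata a single-entry feed would -- can be sketched as follows. This is an illustration only; the ids and values are invented, not taken from any real feed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A standalone Atom Entry Document (hypothetical ids and values).
ENTRY_DOC = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:hypothetical-entry-id</id>
  <title>Current weather</title>
  <updated>2005-09-15T12:00:00Z</updated>
  <source>
    <id>urn:uuid:hypothetical-feed-id</id>
    <title>Weather feed</title>
    <updated>2005-09-15T12:00:00Z</updated>
  </source>
</entry>"""

def feed_metadata(entry):
    """Return the feed-level metadata for a standalone entry.

    For an Atom Entry Document, the preserved feed metadata lives in
    atom:source -- the same data a single-entry feed would carry in
    its atom:feed element.
    """
    source = entry.find(ATOM + "source")
    if source is None:
        return {}
    return {child.tag[len(ATOM):]: (child.text or "") for child in source}

entry = ET.fromstring(ENTRY_DOC)
meta = feed_metadata(entry)
```

A parser that already handles atom:source inside a feed needs essentially nothing new to process this document.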
RE: atom license extension (Re: [cc-tab] *important* heads up)
John Panzer asks of Karl Dubost: (Let's say that Doc Searls somehow discovers a license that would deny sploggers more than implied rights to his content while allowing liberal use for others[1], and deploys it. Are you saying that all of his readers' feed software would have to drop his feed content until they're upgraded to understand the license?) [1] http://doc.weblogs.com/2006/08/28 I think John's question can be (aggressively) rephrased as: Can Doc Searls, by inserting a license in his feed, 'poison' the entire syndication system that we've built over the last few years? (i.e. Can he do things that make it unsafe or illegal for people to do things which the syndication system was intentionally built to permit and which he knew were being done before he willingly inserted his content into the syndication network?) I don't think so. As argued in other messages, I strongly believe that we should not do anything that hinders or conflicts with the establishment or recognition of a limited implied license to syndicate content which is formatted using RSS/Atom and is made openly available on the network. (An interesting question, of course, would be: What does it mean to 'syndicate'?) In any case, there is a general problem of proper notice here. As mentioned before, there is nothing special about an optional IETF protocol extension. This subject of inserting licenses in content should be discussed in a general sense -- not limited to this specific protocol extension. A vital question to ask is: What is proper notice of the presence of a license? No IETF standard has the force of law. Readers are not obligated to understand or even take note of the license links. Thus, no one using it should be able to have any expectation that readers will take note of it any more than they would of many other possible means of inserting licenses or references to them in content. 
Publishers and consumers should both be working on the assumption that normal copyright exists (i.e. *all rights reserved*) except where there are fair use privileges or implied licenses that weaken the *all rights* default. If we were to allow or encourage any one mechanism to associate restrictive licenses with content, we establish a precedent that would allow or encourage others as well. Any other standards group or informal collection of one or more persons could decide to define a new mechanism -- just like the IETF did. At that point, no reader could safely consume content since no matter how many mechanisms they supported there might be some others that they didn't know about. The issue here is about proper notice... How can we obligate folk to respect licenses that they have no means of discovering? We should also ask: At what point does a restrictive license become operative? Imagine that I decided that reading (copying) of my feeds by commercial organizations was to be prohibited. Could I bar such copying by putting a license in the content itself? Of course, if I did, that means that in order to discover that copying was not permitted the reader would have to actually do the thing which is prohibited. Clearly, even if there was some way to put effective restrictive licenses in content, there would have to remain some implied license exceptions to the *all rights* provision of copyright. We are all best served by an assumption that copyright leaves all rights reserved to the publisher and that only fair use, limited implied license to syndicate, and explicit license grants (like CC) limit the totality of those rights. With this in mind it might be best to change from a license link to a rights-grant link... In other words, frame this link type as something which can *only* be used to broaden rights, not restrict them. bob wyman
RE: atom license extension (Re: [cc-tab] *important* heads up)
My suggestion for this sentence is that it might be less strongly worded. Given that the law in this area is not settled, it might make sense not to say "Nor can a license... restrict..." Rather, it might be more accurate to say something like: "It is believed that a license ... cannot restrict" My apologies for such a long message... bob wyman
RE: atom license extension (Re: [cc-tab] *important* heads up)
Thomas Roessler wrote: It's fine to point out the lack of an enforceable binding on a technical level, but I don't think this spec is the place to discuss the legal implications that this might have. If the spec does not make statements concerning the intended legal implications of a feature which clearly addresses legal issues, the result will almost inevitably be wide-spread misunderstanding of the implications of using the feature. The mere act of going to the trouble of specifying the license link indicates that the authors expect that there will be some implication of having used the feature. The question that many readers will have is: What are the intended implications? Leaving the answer to guess work is not useful, I think. Given the unsettled and potentially dynamic state of the law in this area, I certainly agree that the spec should not make pronouncements concerning what the law is in this case. But I don't see any valid argument against making statements of intent that may, or may not, be in conflict with the law as it is or may one day be. The authors of the specification have, I think, not only good reason to state their intention but an obligation to do so. Warning implementers that the use of the license link may not, in at least some situations and in some legal systems, create a legally enforceable binding is the right thing to do. bob wyman
RE: atom license extension (Re: [cc-tab] *important* heads up)
Wendy Seltzer wrote: The concern about limiting implied licenses is important... If the rfc encourages people to add licenses, it opens up the possibility that their explicit terms will contradict and override what has previously been implied. This is precisely why I have normally argued against adding rights and licenses mechanisms to Atom and other formats. Unfortunately, it has been a losing battle (Atom has <rights/>) so, I'm now trying the tack of attempting to get explanatory text and weakened language in order to mitigate some of the damage that might be caused. Oddly, I think part of the push for these dangerous licensing mechanisms is the result of the success of Creative Commons. We may be seeing that a movement intended to expand rights will indirectly create a situation where rights are more easily restricted. People really like the CC mechanism for granting rights and as a result want cleaner and better understood means for associating Creative Commons licenses with their content. Unfortunately, an unintended consequence of satisfying this desire to publish CC licenses might be that it becomes easier and more common for folk to publish restrictive licenses. Readers of this thread might be interested to see that Denise Howell has been discussing very similar issues on her new Logarithms blog.[1][2] I've put some comments in there and have also responded at length concerning what I, as a non-lawyer, consider some of the implied licenses that attach to RSS/Atom syndicated content.[3] bob wyman [1] http://blogs.zdnet.com/Howell/?p=17 [2] http://blogs.zdnet.com/Howell/?p=18 [3] http://www.wyman.us/main/2006/09/magazine_or_mus.html
RE: atom license extension (Re: [cc-tab] *important* heads up)
Antone Roundy wrote: With respect to the issue of aggregate feeds, I had thought that the existence of an atom:source element at the entry level blocked any inheritance of the feed metadata, but looking at RFC 4287, I don't see that explicitly stated. It's not explicit, but it is implicit. The <source/> element preserves the entry's feed metadata. Thus, to find the feed metadata associated with an entry which has an atom:source, you should look to the preserved data in the atom:source element (or the source feed itself...) -- you should NOT look to the metadata of the feed within which you found the entry. Atom:source says, essentially: This entry is not of this feed. It is foreign and should be interpreted as such. Thus, the feed metadata of the containing feed should never be allowed to leak into the interpretation of an entry which contains an atom:source. To do so would make syndication, aggregation, etc. a complete mess. bob wyman
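The lookup rule Bob describes can be sketched in a few lines. The feed document and titles below are invented for illustration; the point is only the precedence: atom:source first, containing feed only as a fallback.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A hypothetical aggregate feed: one foreign entry (with atom:source)
# and one native entry (without).
FEED_DOC = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Aggregate feed</title>
  <entry>
    <title>Foreign entry</title>
    <source><title>Original feed</title></source>
  </entry>
  <entry>
    <title>Native entry</title>
  </entry>
</feed>"""

def feed_title_for(feed, entry):
    """Find the feed title that applies to an entry.

    An entry carrying atom:source is 'not of this feed': its feed
    metadata comes from the preserved source data, never from the
    containing (aggregate) feed.
    """
    source = entry.find(ATOM + "source")
    scope = source if source is not None else feed
    return scope.findtext(ATOM + "title")

feed = ET.fromstring(FEED_DOC)
foreign, native = feed.findall(ATOM + "entry")
```

An aggregator that let the containing feed's metadata "leak" into the foreign entry would mislabel its origin.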
RE: Finally Atom: Blogger is here
Aristotle Pagaltzis wrote: [Now that Blogger supports both RSS 2.0 and Atom 1.0] That makes what, another few dozen million Atom 1.0 feeds? Yes, many, many more than before. But also many more legacy RSS 2.0 feeds. Which leads to the inevitable rhetorical question: Why the heck do people keep insisting that the industry continue to support new deployments of RSS 2.0? This is just silliness. bob wyman
RE: Atom license link last call
James Snell wrote: [1] The relationship [between license and atom:right] is subtle, but important ... [2] I specifically wanted to differentiate the two. ... [3] The two serve different, but related, purposes. The two should not contradict each other. If they do, consumers must go back to the content publisher to resolve the problem. Given the subtle differences, the claimed importance of the differences, and their supposed utility, I would strongly suggest that these points should be clearly stated in the ID itself. It is highly unlikely that readers of an eventual RFC are going to universally come here and read the illuminating messages in the mailing list archive. Thus, the subtle distinctions that you see are highly likely to be lost once the RFC is published -- unless you document them. Also, it is more likely that reviewers will be able to make more informed judgments if these distinctions are clearly documented in the ID text. bob wyman
RE: Atom license link last call
James, My apologies if these questions and comments have been dealt with before: * What is the expected or intended relationship between data carried in the atom:rights element and data pointed to by the license relationship? * Why did you choose the word license when Atom itself uses the word rights for a very similar (if not identical) concept? * If the intent of the license link is to provide a mechanism to support out-of-line rights elements, then did you consider doing something similar to the handling of out-of-line atom:content via a src attribute? For example: Does the license link do anything that would not be accomplished by adding support for rights elements in the following form: <rights src="http://.../" /> * If a feed reader discovers both atom:rights and a license link in a single entry or feed, is there any concept of precedence between the two? For instance, if the text of the license is more or less restrictive than what is in the atom:rights element, what should the reader assume about the rights that are granted? bob wyman
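The out-of-line alternative Bob floats -- a src attribute on rights, by analogy with out-of-line atom:content -- might be handled like this. To be clear, neither RFC 4287 nor the license draft defines rights@src; the element, URL, and helper below are purely hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# HYPOTHETICAL: atom:rights with a src attribute, modeled on
# out-of-line atom:content. Not defined by any actual spec.
ENTRY_DOC = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <rights src="http://example.org/license"/>
</entry>"""

def resolve_rights(entry):
    """Return ('inline', text) or ('remote', url) for the rights data."""
    rights = entry.find(ATOM + "rights")
    if rights is None:
        return None
    src = rights.get("src")
    if src is not None:
        return ("remote", src)  # caller would dereference the URL
    return ("inline", rights.text or "")

entry = ET.fromstring(ENTRY_DOC)
```

Under this sketch, a reader processes inline and out-of-line rights through one code path, which is part of what the question is probing.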
RE: Fyi, Apache project proposal
James M Snell mentioned his Apache Project... It would be *very* nice if you could see your way to implementing RFC3229+feed[1] support in your implementation. As I think you know, the use of this mechanism results in massive reductions in the bandwidth and client-side processing required in fetching updates to Atom feeds. Also, Microsoft will be supporting RFC3229+feed in their browsers[2], thus, we can anticipate that support for fetching delta-feeds will soon be considered expected. The only issue with Apache is that Apache *still* does not support the 226 response code... bob wyman [1] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html http://bobwyman.pubsub.com/main/2004/09/implementations.html http://www.intertwingly.net/blog/2004/09/15/Syndication-with-RFC3229 [2] http://bobwyman.pubsub.com/main/2006/04/microsoft_to_su.html
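The client side of RFC3229+feed fetching, as described in the linked posts, amounts to advertising the "feed" instance-manipulation and watching for a 226 response. This is a sketch under those assumptions; the ETag value is made up and no network I/O is performed.

```python
# Client-side sketch of RFC3229+feed delta fetching.
def delta_request_headers(last_etag):
    """Headers asking the server for only the entries added since the
    instance identified by last_etag, using the "feed"
    instance-manipulation of RFC 3229."""
    return {
        "A-IM": "feed",              # accept the feed instance-manipulation
        "If-None-Match": last_etag,  # the instance we already hold
    }

def is_delta_response(status, headers):
    """A 226 (IM Used) response carries only the new entries; a plain
    200 means the server fell back to sending the full feed."""
    return status == 226 and "feed" in headers.get("IM", "")

headers = delta_request_headers('"abc123"')
```

The 226 status code is exactly the piece Bob notes Apache still lacks: without it, a server cannot signal that a delta (rather than a full instance) was returned.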
RE: atom:updated handling
Phil Ringnalda wrote: Patches that will make that more clear are welcome. The warning message that Phil points to says in part: (at: http://feedvalidator.org/docs/warning/DuplicateUpdated.html) For example, it would be generally inappropriate for a publishing system to apply the same timestamp to several entries which were published during the course of a single day. Of course, this leads one to wonder if it might be appropriate to apply the same timestamp to several entries if they were published during the course of multiple days... It would make a great deal more sense to say something like: It would not be appropriate to apply the same timestamp to several entries unless they were published simultaneously. bob wyman
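The reworded check -- flag entries sharing an atom:updated value unless they truly were published simultaneously -- is easy to express. The sample entries below are invented for illustration.

```python
from collections import Counter

def duplicate_updated(entries):
    """Return the atom:updated values that appear on more than one entry.

    entries: iterable of (entry_id, updated_timestamp) pairs.
    A non-empty result is a warning sign unless the entries really
    were published at the same instant.
    """
    counts = Counter(updated for _, updated in entries)
    return {ts for ts, n in counts.items() if n > 1}

# Hypothetical entries: two share a timestamp, one does not.
sample = [
    ("urn:a", "2005-09-01T10:00:00Z"),
    ("urn:b", "2005-09-01T10:00:00Z"),  # same instant: suspicious unless simultaneous
    ("urn:c", "2005-09-01T11:30:00Z"),
]
```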
Structured Publishing -- Joe Reger shows the way...
Ive written a blog post pointing to a wonderful demo of tools for doing structured publishing in blogs that Joe Reger has put together. Given that Atom has built-in support for handling much more than just the text/HTML that RSS is limited to, I think this should be interesting to the Atom community. http://bobwyman.pubsub.com/main/2005/09/joe_reger_shows.html What can we do with Atom to make the vision of Structured/Semantic publishing more real? bob wyman
The benefits of Lists are Entries rather than Lists are Feeds
Folks, I hate to be insistent, however, I think that in the mail below I offered some pretty compelling reasons why lists should be entries rather than turning feeds into lists. Could someone please comment on this? Is there some point that I'm completely missing? What is wrong with my suggestion that lists-are-entries is much more useful than the alternative? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bob Wyman Sent: Tuesday, August 30, 2005 5:10 PM To: 'Mark Nottingham' Cc: atom-syntax@imc.org Subject: RE: Top 10 and other lists should be entries, not feeds. Mark Nottingham wrote: Are you saying that when/if Netflix switches over to Atom, they shouldn't use it for the Queue? No. I'm saying that if Netflix switches over to Atom, what they should do is insert the Queue information, as a list, into a single entry within the feed. This will not only preserve the nature of Atom feeds as feeds but also allow NetFlix a number of new and potentially interesting opportunities for providing data to customers. Most important among these will be the ability to include multiple lists in the feed (i.e. in addition to the Queue, they could also include their Top 10 list as well as a set of recommendations based on user experience. They might even include a list of 10 most recent transactions on your account) Each list would be a distinct entry. To make life easier on aggregators, each entry type should probably use the same atom:id across versions. This allows the aggregators to discard earlier, now out of date entries. NetFlix would also be able to intermix information such as the Queue List with non-list entries. For instance, they might have a Message from NetFlix that they want to include in the feed or, they might include a series of movie reviews that were carefully selected for the specific user. 
Basically, by using entries for lists instead of converting the entire feed into a list, NetFlix is able to offer a much richer and much more satisfying experience to their users. The ability of Atom to carry both lists and non-lists as entries means that Atom is able to offer a much more flexible and powerful mechanism to NetFlix than can be had from the less-capable RSS V2.0 solution. I think that if I were NetFlix, I would want to have the opportunity to experiment with and find ways to exploit this powerful capability. The richer the opportunity for communications between NetFlix and their customers, the greater the opportunity they have to generate revenues. The alternative to using entries rather than feeds would be creating multiple feeds per user. That strikes me as a solution which is ugly on its face and unquestionably increases the complexity of the system for both NetFlix and its customers. The list-in-entry solution is much more elegant and much more powerful. bob wyman
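The lists-are-entries shape described above can be sketched as a single feed carrying several lists, each a distinct entry with a stable atom:id so aggregators can replace superseded versions. The ids and entry set are invented for illustration; no list payload schema is defined here.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# One feed, several lists plus a non-list entry (hypothetical ids).
FEED_DOC = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Your NetFlix feed</title>
  <entry><id>urn:netflix:queue</id><title>Your Queue</title></entry>
  <entry><id>urn:netflix:top10</id><title>Top 10</title></entry>
  <entry><id>urn:netflix:message</id><title>A message from NetFlix</title></entry>
</feed>"""

feed = ET.fromstring(FEED_DOC)
# An aggregator keyed on atom:id keeps only the latest version of
# each list while the feed itself stays an ordinary Atom feed.
ids = [e.findtext(ATOM + "id") for e in feed.findall(ATOM + "entry")]
```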
RE: Top 10 and other lists should be entries, not feeds.
Mark Nottingham wrote: Are you saying that when/if Netflix switches over to Atom, they shouldn't use it for the Queue? No. I'm saying that if Netflix switches over to Atom, what they should do is insert the Queue information, as a list, into a single entry within the feed. This will not only preserve the nature of Atom feeds as feeds but also allow NetFlix a number of new and potentially interesting opportunities for providing data to customers. Most important among these will be the ability to include multiple lists in the feed (i.e. in addition to the Queue, they could also include their Top 10 list as well as a set of recommendations based on user experience. They might even include a list of 10 most recent transactions on your account) Each list would be a distinct entry. To make life easier on aggregators, each entry type should probably use the same atom:id across versions. This allows the aggregators to discard earlier, now out of date entries. NetFlix would also be able to intermix information such as the Queue List with non-list entries. For instance, they might have a Message from NetFlix that they want to include in the feed or, they might include a series of movie reviews that were carefully selected for the specific user. Basically, by using entries for lists instead of converting the entire feed into a list, NetFlix is able to offer a much richer and much more satisfying experience to their users. The ability of Atom to carry both lists and non-lists as entries means that Atom is able to offer a much more flexible and powerful mechanism to NetFlix than can be had from the less-capable RSS V2.0 solution. I think that if I were NetFlix, I would want to have the opportunity to experiment with and find ways to exploit this powerful capability. The richer the opportunity for communications between NetFlix and their customers, the greater the opportunity they have to generate revenues. 
The alternative to using entries rather than feeds would be creating multiple feeds per user. That strikes me as a solution which is ugly on its face and unquestionably increases the complexity of the system for both NetFlix and its customers. The list-in-entry solution is much more elegant and much more powerful. bob wyman
Top 10 and other lists should be entries, not feeds.
I'm sorry, but I can't go on without complaining. Microsoft has proposed extensions which turn RSS V2.0 feeds into lists and we've got folk who are proposing much the same for Atom (i.e. stateful, incremental or partitioned feeds). I think they are wrong. Feeds aren't lists and lists aren't feeds. It seems to me that if you want a Top 10 list, then you should simply create an entry that provides your Top 10. Then, insert that entry in your feed so that the rest of us can read it. If you update the list, then just replace the entry in your feed. If you create a new list (Top 34?) then insert that in the feed along with the Top 10 list. What is the problem? Why don't folk see that lists are the stuff of entries, not feeds? Remember, "It's about the entries, Stupid." I think the reason we've got this pull to turn feeds into lists is simply because we don't have a commonly accepted list schema. So, the idea is to repurpose what we've got. Folk are too scared or tired to try to get a new thing defined and through the process, so they figure that they will just overload the definition of something that already exists. I think that's wrong. If we want lists then we should define lists and not muck about with Atom. If everyone is too tired to do the job properly and define a real list as a well defined schema for something that can be the payload of a content element, then why not just use OPML as the list format? What is a search engine or a matching engine supposed to return as a result if it finds a match for a user query in an entry that comes from a list-feed? Should it return the entire feed or should it return just the entry/item that contained the stuff in the user's query? What should an aggregating intermediary like PubSub do when it finds a match in an element of a list-feed? Is there some way to return an entire feed without building a feed of feeds?
Given that no existing aggregator supports feeds as entries, how can an intermediary aggregator/filter return something the client will understand? You might say that the search/matching engine should only present the matching entry in its results. But, if you do that, what happens is that you lose the important semantic data that comes from knowing the position the matched entry had in the original list-feed. There is no way to preserve that order-dependence information without private extensions at present. I'm sorry, but I simply can't see that it makes sense to encourage folk to break important rules of Atom by redefining feeds to be lists. If we want lists we should define what they look like and put them in entries. Keep your hands off the feeds. Feeds aren't lists; they are feeds. bob wyman
RE: Don't Aggregrate Me
Roger Benningfield wrote: However, if I put something like: User-agent: PubSub Disallow: / ...in my robots.txt and you ignore it, then you very much belong on the Bad List. I don't think so. The reason is that I believe that robots.txt has nothing to do with any service I provide or process that we run. Thus, I can't imagine why I would even look in the file. Remember, PubSub never does anything that a desktop client doesn't do. We only look at feeds that have pinged us or that someone has explicitly loaded into our system using add-feed. We NEVER crawl. We're not a robot and thus I can't see why we would even look at robots.txt. Does your browser look at robots.txt before fetching a page? Does your desktop aggregator look at it before fetching a feed? I don't think so! But, should a crawler like Google, Yahoo! or Technorati respect robots.txt? YES! bob wyman
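For a service that does consider itself a robot, Roger's example rules are mechanically checkable; Python's standard library even ships a parser for them. The robots.txt content below mirrors his example, and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Roger's example robots.txt: forbid the PubSub user-agent everywhere.
ROBOTS_TXT = """\
User-agent: PubSub
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler would call can_fetch() before each request; Bob's point is
# that a ping-driven, non-crawling service never consults this at all.
```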
RE: Don't Aggregrate Me
Antone Roundy wrote: I'm with Bob on this. If a person publishes a feed without limiting access to it, they either don't know what they're doing, or they're EXPECTING it to be polled on a regular basis. As long as PubSub doesn't poll too fast, the publisher is getting exactly what they should be expecting. Because PubSub aggregates content for thousands of others, it removes significant bandwidth load from publishers' sites. We only read a feed from a site in response to an explicit ping from that site or, for those sites that don't ping, we poll them on a scheduled basis. In fact, we read scheduled, non-pinging feeds less frequently than most desktop systems would. No one can claim that we do anything but reduce the load on publishers' systems. It should also be noted that we support gzip compression, RFC3229+Feed, conditional-gets, etc. and thus do all the things necessary to reduce our load on publishers' sites in the event that we actually do fetch data from them. This is a good thing and not something that robots.txt was intended to prevent. bob wyman
RE: Don't Aggregrate Me
Mark Pilgrim wrote (among other things): (And before you say but my aggregator is nothing but a podcast client, and the feeds are nothing but links to enclosures, so it's obvious that the publisher wanted me to download them -- WRONG! I agree with just about everything that Mark wrote in his post. However, I'm finding it very difficult to accept this bit about enclosures (podcasts.) It seems to me that the very name enclosure implies that the resources pointed to are to be considered part and parcel of the original entry. In fact, I think one might even argue that if you *didn't* download the enclosed items that you had created a derivative work that didn't represent the item that was intended to be syndicated... Others have pointed out the problem with links to images, stylesheets, CSS files, etc. And, what about the numerous proposals for linking one feed to another? What about the remote content pointed to by a src attribute in an atom:content element? Should PubSub be able to read that remote content when indexing and/or matching the entry? It strikes me that not all URIs are created equally and not everything that looks like crawling is really crawling. I am firm in believing that URIs in <a> tags are the stuff of crawlers but the URIs in <link> tags, enclosures, media-rss objects, <img> tags, etc. seem to be qualitatively different. I think crawling URIs found in <link> tags, <img> tags and enclosures isn't crawling... Or... Is there something I'm missing here? bob wyman
RE: Don't Aggregrate Me
Roger Benningfield wrote: We've got a mechanism that allows any user with his own domain and a text editor to tell us whether or not he wants us messing with his stuff. I think it's foolish to ignore that. The problem is that we have *many* such mechanisms. Robots.txt is only one. Others have been mentioned on this list in the past. Others are buried in obscure posts that you really have to dig to find. How do we decide which mechanisms to use? Also, since I don't think robots.txt was intended to be used for services like the aggregators we're discussing, I believe that for us to encourage people to use it in the way you suggest would be an abuse of the robots.txt system. Bob: What about FeedMesh? If I ping blo.gs, they pass that ping along to you, and PubSub fetches my feed, then PubSub is doing something a desktop client doesn't do. Wrong. Some desktop clients *do* work like FeedMesh. Consider the Shrook distributed checking system[1]. FeedMesh and PubSub work very much like Shrook's desktop clients do. In the Shrook system, all the desktop clients report back updates that they have found to a central service that then distributes the update info to other clients. The result is that the amount of polling that goes on is drastically reduced and the freshness of data is increased since every client benefits from the polling of all other clients. Although no single client might poll a site more frequently than once an hour, if you have 60 Shrook clients each polling once an hour, each client is getting the effect of polling every minute... The Shrook model is basically the same as the FeedMesh model except that in FeedMesh you typically ask for info on ALL sites whereas in Shrook, you typically only get updates for a smaller, enumerated set of feeds. However, the number of feeds you monitor does not change the basic nature of the distributed checking system. Shrook and FeedMesh are, as far as I'm concerned, largely indistinguishable in this area. 
(There are some detail differences of course. For instance, Shrook worries about client privacy issues that aren't relevant in the FeedMesh case.) Remember, PubSub only deals with data from Pings and from sites that have been manually added to our system. We don't do any web scraping and we don't follow links to find other blogs. Also, we filter out of our system feeds that originate with services that are known to scrape web pages and inject data that was not intended by the original publisher to appear in feeds. (Often, people try to get around partial feeds by filling in the missing bits by scraping from blog's websites.) Thus, we filter out any feed that comes from a service like Technorati since they scrape blogs and inject scraped content into feeds without the explicit approval or consent of the publishers of the sites they scraped. bob wyman [1] http://www.fondantfancies.com/apps/shrook/distfaq.php
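The Shrook-style arithmetic in the post above is simple enough to state exactly: N clients each polling once per interval, pooling their results, collectively see updates at interval/N.

```python
def effective_poll_seconds(per_client_interval_s, num_clients):
    """Effective update latency when clients pool their polling results,
    as in Shrook's distributed checking (idealized: evenly spread polls,
    instant sharing)."""
    return per_client_interval_s / num_clients

# Bob's example: 60 clients, each polling hourly, collectively
# check roughly every minute.
```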
RE: Don't Aggregrate Me
Karl Dubost wrote: - How one who has previously submitted a feed URL remove it from the index? (Change of opinions) If you are the publisher of a feed and you don't want us to monitor your content, complain to us and we'll filter you out. Folk do this every once in a while. Send us an email using the contact information on our site. (Sorry I don't want to put an email address in a mailing list post... We get enough spam already.) - How someone who's not mastering the ping (built-in in the service, the software) but doesn't want his/her feed being indexed by the service. Providers of hosted blogging solutions or of stand-alone system should feel a responsibility to do a better job of educating their users as to the impact of configuration options (or the lack of options.) There are many blogging systems that don't support pings and others which normally provide pings but allow users to turn them off. Some systems, like LiveJournal even allow you to have a blog but mark it private so that only your friends can read it and pings aren't generated. What might not be happening as well as it could is the process by which service or software providers are educating their users. Services should work harder to educate their users. bob wyman
RE: Don't Aggregrate Me
Karl Dubost points out that it is hard to figure out what email address to send messages to if you want to de-list from PubSub...: Karl, Please, accept my apologies for this. I could have sworn we had the policy prominently displayed on the site. I know we used to have it there. This must have been lost when we did a site redesign last November! I'm really surprised that it has taken this long to notice that it is gone. I'll see that we get it back up. You see educating users is not obvious it seems ;) No offense, it just shows that it is not an easy accessible information. And there's a need to educate Services too. Point taken. I'll get it fixed. It's a weekend now. Give me a few days... I'm not sure, but I think it makes sense to put this on the add-feed page at: http://www.pubsub.com/add_feed.php . Do you agree? Scenario: I take the freedom to add his feed URL to the service and/or to ping the service because I want to know when this guy talk about me the next time. Well the problem is that this guy doesn't want to be indexed by these services. How does he block the service? Yes, forged pings or unauthorized third-party pings are a real issue. Unfortunately, the current design of the pinging system gives us absolutely no means to determine if a ping is authorized by the publisher. This is one of many, many issues that I hope that this Working Group will be willing to take up once it gets the protocol worked out and has time to think about these issues. I argued last year that we should develop a blogging or syndication architecture document in much the same way that the TAG documented the web architecture and in the way that most decent standards groups usually produce some sort of reference architecture document. There are many pieces of the syndication infrastructure that are being ignored or otherwise not being given enough attention. Pinging is one of them. 
Some solutions, like requiring that pings be signed would work from a technical point of view, but are probably not practical except in some limited cases. (e.g. Signatures may make sense as a way to enable Fat Pings from small or personal blog sites. In that case, the benefit of the Fat Ping might override the cost and complexity of generating the signature.) Some have also proposed the equivalent of a do-not-call list that folk could register with. We might also set up something like FeedMesh where service providers shared updates concerning which bloggers had asked to be filtered out. (That means you would only have to notify one service to get pulled from them all -- a real benefit to users.) Or, we could define extensions to Atom to express these things... There are many options. Today, we do the best we can with what we have. Hopefully, we'll all maintain enough interest in these issues to continue the process of working them out. bob wyman
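One way the signed-ping idea above could work -- and this is purely a sketch, not drawn from any actual pinging spec -- is an HMAC over the ping body using a secret shared between publisher and service, so the service can reject forged or unauthorized third-party pings.

```python
import hashlib
import hmac

def sign_ping(secret: bytes, ping_body: bytes) -> str:
    """Publisher side: sign the ping body with a shared secret."""
    return hmac.new(secret, ping_body, hashlib.sha256).hexdigest()

def verify_ping(secret: bytes, ping_body: bytes, signature: str) -> bool:
    """Service side: accept the ping only if the signature checks out."""
    return hmac.compare_digest(sign_ping(secret, ping_body), signature)

# Hypothetical ping for a feed URL; the secret and URL are invented.
sig = sign_ping(b"shared-secret", b"http://example.org/feed.xml")
```

Key distribution is, of course, exactly the cost Bob says makes this impractical outside limited cases like Fat Pings.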
RE: Don't Aggregrate Me
James M Snell wrote: Does the following work? <feed> ... <x:aggregate>no</x:aggregate> </feed> I think it is important to recognize that there are at least two kinds of aggregator. The most common is the desktop end-point aggregator that consumes feeds from various sources and then presents or processes them locally. The second kind of aggregator would be something like PubSub -- a channel intermediary that serves as an aggregating (and potentially caching) router that forwards messages on toward end-point aggregators. Your syntax seems only focused on the end-point aggregators. Without clarifying the expected behavior of intermediary aggregators, your proposal would tend to cause some significant confusion in the system. Should PubSub aggregate and/or route entries that come from feeds marked no-aggregate? If not, why not? From the publisher's point of view, an intermediary aggregator like PubSub should be indistinguishable from the channel itself. bob wyman
RE: Don't Aggregrate Me
Karl Dubost wrote: One of my reasons which worries me more and more, is that some aggregators, bots do not respect the Creative Common license (or at least the way I understand it). Your understanding of Creative Commons is apparently a bit non-optimal -- even though many people seem to believe as you do. The reality is that a Creative Commons license cannot be used to restrict access to data. It can only be used to relax constraints that might otherwise exist. A Creative Commons license that says no commercial use is not prohibiting commercial use, rather, it is saying that the license does not grant commercial use. (The distinction between prohibiting use and not granting a right to use is very important.) A no commercial use CC license merely says that other constraints i.e. copyright, etc. continue to have force. Thus, if copyright applies to the content, and one has a non-commercial use CC license on that content, one would assume that the copyright restrictions which would tend to limit commercial use would still apply. It is important to re-iterate that a CC License only *grants* rights, it does not restrict, deny, or constrain them in any way. Thus, you can't say: The aggregator failed to respect the CC non-commercial use attribute. You must say: The aggregator failed to respect the copyright. bob wyman
RE: Don't Aggregrate Me
Antone Roundy wrote: How could this all be related to aggregators that accept feed URL submissions? My impression has always been that robots.txt was intended to stop robots that crawl a site (i.e. they read one page, extract the URLs from it, and then read those pages). I don't believe robots.txt is intended to stop processes that simply fetch one or more specific URLs with known names. At PubSub we *never* crawl to discover feed URLs. The only feeds we know about are:
1. Feeds that have announced their presence with a ping.
2. Feeds that have been announced to us via a FeedMesh message.
3. Feeds that have been manually submitted to us via our add-feed page.
We don't crawl. I do not think we qualify as a robot in the sense that is relevant to robots.txt. It would appear that Walter Underwood of Verity would agree with me, since he says in his recent post: I would call desktop clients clients, not robots. The distinction is how they add feeds to the polling list. Clients add them because of human decisions. Robots discover them mechanically and add them. If Walter is correct, then he must agree with me that robots.txt does not apply to PubSub! (and we should not be on his bad list. Walter? Please take us off the list...) bob wyman
RE: If you want Fat Pings just use Atom!
Bill de hÓra wrote: the problem is managing the stream buffer off the wire for a protocol model that has no underlying concept of an octet frame. I've written enough XMPP code to understand why the BEEP/MIME crowd might frown at it Framing is, in fact, an exceptionally important issue. Fortunately, HTTP offers us some framing capability in the form of chunked delivery. This is much more lightweight than what BEEP provides, since HTTP assumes TCP/IP as a transport layer while BEEP did not. The HTTP chunked delivery method would be vastly superior to the suggestions for doing things like including form-feeds or sequences of nulls as entry boundary markers. If you accept a simple rule that says that you will insert HTTP chunk length markers between each entry sent in a never-ending Atom file, you get something like the feed I show below. Simply strip out the chunk length data prior to stuffing data into your XML parser. If an entry appears to continue beyond a chunk boundary, discard that entry and continue by reading the next chunk. See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1 for more information on this method. Note that RFC2616 says: All HTTP/1.1 applications MUST be able to receive and decode the chunked transfer-coding,... Note: the chunk lengths are not correct in the following example.

GET /never-ending-feed.xml HTTP/1.1

HTTP/1.1 200 OK
Date: Fri Apr 8 17:41:11 2005
Server: FeedMesh/0.1
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml; charset=utf-8

ab
<?xml version="1.0" encoding="utf-8"?>
<feed ...>
...
a8
<entry>
...
</entry>
93
<entry>
...
</entry>

And so forth until finally you get a </feed>, the connection closes, or you close the connection. This is simple, requires no new specifications, and provides for robust error recovery in that broken entries can be easily detected and discarded. bob wyman
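The recovery rule described above (strip the chunk-length markers, parse each chunk on its own, and discard any entry that runs past a chunk boundary) can be sketched on the receiving side. This is a hypothetical illustration, not PubSub or FeedMesh code; it assumes the server emits exactly one entry per chunk, as the post proposes:

```python
import io
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def iter_chunks(stream):
    """Yield the body of each HTTP/1.1 chunk (RFC 2616, section 3.6.1)."""
    while True:
        size_line = stream.readline().strip()
        if not size_line:
            return
        size = int(size_line.split(b";")[0], 16)  # hex length; ignore chunk extensions
        if size == 0:
            return  # last-chunk marker
        body = stream.read(size)
        stream.readline()  # consume the CRLF that trails each chunk body
        yield body

def iter_entries(stream):
    """Parse each chunk as a standalone entry; discard anything that is not
    a complete, well-formed atom:entry (the feed prologue, or an entry that
    was split across a chunk boundary)."""
    for chunk in iter_chunks(stream):
        try:
            elem = ET.fromstring(chunk)
        except ET.ParseError:
            continue  # broken or partial chunk: skip and resynchronize
        if elem.tag == ATOM + "entry":
            yield elem

# demo: one complete entry chunk followed by a truncated one
good = b'<entry xmlns="http://www.w3.org/2005/Atom"><title>one</title></entry>'
bad = b'<entry xmlns="http://www.w3.org/2005/Atom"><title>two</tit'
raw = b"%x\r\n%s\r\n%x\r\n%s\r\n0\r\n\r\n" % (len(good), good, len(bad), bad)
titles = [e.find(ATOM + "title").text for e in iter_entries(io.BytesIO(raw))]
```

The truncated second entry is silently dropped, which is exactly the "detect and discard broken entries" behavior the post argues chunked framing makes possible.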
RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)
James M Snell wrote: Second note to self: After thinking about this a bit more, I would also need a way of specifying a null license (e.g. the lack of a license). For instance, what if an entry that does not contain a license is aggregated into a feed that has a license. The original lack-of-license still applies to that entry regardless of what is specified on the feed level. Golly Bob, you're right, this is rather messy ain't it. Hmm... My apologies for not having more clearly pointed this out in my original message. The problem is exacerbated for folk like us at PubSub since we would feel completely comfortable in claiming copyright over the collection of entries that we pass along to our subscribers, however, there is *no way* that we could even hint at claiming copyright over the individual entries themselves. If statements made at the feed level are inherited by or in the scope of the entries, then we would not be able to assert a copyright claim at the feed level since it would leak down to the entries. Of course, one might argue that since we at PubSub will virtually always ensure that any entry we publish has an atom:source element, one could argue that we don't have to worry about this scope leakage. But, we're a special case in this regard. The general issue of scope exists in cases where the atom:source element is not present. bob wyman
RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)
Paul Hoffman asked: Does an informative extension that appears at the feed level (as compared to in entries) indicate:
a) this information pertains to each entry
b) this information pertains to the feed itself
c) this information pertains to each entry and to the feed itself
d) completely unknown unless specified in the extension definition
I believe the correct answer is e:
e) Unless otherwise specified, this information pertains to the feed only.
bob wyman
If you want Fat Pings just use Atom!
as claimed by a ping, we would have to be able to trust the pinger. Normally, creating such trust relationships is very expensive. However, given that the vast majority of posts are made on the large services, we can drastically increase the efficiency of the overall system by having just a few of these hosters/publishers who are permitted the privilege of publishing Fat Pings. It is my hope that in the future we'll be able to rely on Atom's support for Digital Signatures to expand drastically the number of publishers who could be trusted to publish Fat Pings. Brad proposes:

<?xml version='1.0' encoding='utf-8'?>
<atomStream>
<time>1124247941</time>
<feed xmlns='http://www.w3.org/2005/Atom'>
<title type='text'>some journal title</title>
<link href='http://www.livejournal.com/users/username/' />
<author><name>some name</name></author>
<entry>
<title>some entry title</title>
<link href='http://www.livejournal.com/users/username/12345.html' />
<content type='html'>content</content>
</entry>
</feed>

I believe that the sample feed above would be better represented as a simple Atom feed which contains entries having source elements. Note: My sample is a bit bigger than Brad's since I've included various bits that are required in Atom but that Brad's proposal omits. He readily admits in his postings that he has not yet gone to the effort of ensuring that he is issuing compliant data. I propose the following as an equivalent to Brad's sample:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<title>LiveJournal Aggregate Feed</title>
<link href='' />
<updated>2005-08-21T16:30:02Z</updated>
<author><name>Brad</name></author>
<id>tag:livejournal.org,2005:aggregatefeed-1</id>
<entry xmlns='http://www.w3.org/2005/Atom'>
<source>
<title type='text'>Example Feed</title>
<link href='' />
<link rel='self' type='application/atom+xml' href='' />
<id>tag:livejournal.org,2005:feed-username</id>
<updated>2005-08-21T16:30:02Z</updated>
<author><name>John Doe</name></author>
</source>
<title>some entry title</title>
<link rel='alternate' type='text/html' href='' />
<id>tag:livejournal.org,2003:entry-username-32397</id>
<published>2005-08-21T16:30:02Z</published>
<updated>2005-08-21T16:30:02Z</updated>
<content type='html'>This is some &lt;b&gt;content&lt;/b&gt;.</content>
</entry>
. . .
</feed>

What do you think? Is there any conceptual problem with streaming basic Atom over TCP/IP, HTTP continuous sessions (probably using chunked content), etc.? Is there any really good reason not just to use Atom as defined? bob wyman
RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)
Paul Hoffman wrote: The crux of the question is: what happens when an extension that does not specify the scope appears at the feed level? Robert Sayre asked: I'm not sure why this question is interesting. What sort of application would need to know? I ask: What should an aggregate feed generator like PubSub do when it finds an entry in a feed that contains unscoped extensions as children of the feed?
* Would you expect us to include these extension elements in an atom:source element if we use the entry in one of our feeds?
* Should we include in the source elements we generate even things that we don't understand?
* What should we do if the entry already has a source element but that source element doesn't include the extension elements? Should we publish the source element as we find it? Or, should we modify the source element to include the extensions? (assuming there are no signatures...)
bob wyman
RE: If you want Fat Pings just use Atom!
Joe Gregorio wrote: Why not POST the Atom Entry, ala the Atom Publishing Protocol? This would be an excellent idea if what we were talking about was a low volume site. However, a site like LiveJournal generates hundreds of updates per minute. Right now, on a Sunday evening, they are updating at the rate of 349 entries per minute. During peak periods, they generate much more traffic. Generating 349 POST messages per minute to perhaps 10 or 15 different services means that they would be pumping out thousands of these things per minute. It just isn't reasonable. Using an open TCP/IP socket to carry a stream of Atom Entries results in much greater efficiencies with much reduced bandwidth and processing requirements. At PubSub, we've been experimentally providing Fat Ping versions of our FeedMesh feeds to a small group of testers. We publish messages at a rate much higher than LiveJournal does -- since we publish all of LiveJournal's content plus everyone else's. We couldn't even consider Fat Pings if we had to create and tear down a TCP/IP-HTTP session to post each individual entry. There are many situations in which HTTP would work fine for Fat Pings. However, for high-volume sites, it just isn't reasonable. The key, to me, is that we establish the expectation that the Atom format is adequate to the task (whatever the transport) and leave the transport selection as a context dependent decision. Thus, some server/client pairs would exchange streams of Atom entries using the POST based Atom Publishing Protocol while others would exchange essentially the same streams using a more efficient transport mechanism such as streaming raw sockets or even Atom over XMPP. bob wyman
RE: If you want Fat Pings just use Atom!
Aristotle Pagaltzis wrote: I wonder how you would make sure that the document is well-formed. Since the stream never actually ends and there is no way for a client to signal an intent to close the connection, the <feed> at the top would never actually be accompanied by a </feed> at the bottom. This is a problem which has become well understood in the use and implementation of the XMPP/Jabber protocols, which are based on streaming XML. Basically, what you do is consider the open tag to have a virtual closure and use it primarily as a carrier of stream metadata. In XMPP terminology, your code works at picking stanzas out of the stream that can be parsed successfully or unsuccessfully on their own. In an Atom stream, the processor would consider each atom:entry to be a parseable atomic unit. If you accept that the stream can never be a complete well-formed document, is there any reason not to simply send a stream of concatenated Atom Entry Documents? That would seem like the absolute simplest solution. You could certainly do that; however, you will inevitably want to pass across some stream-oriented metadata and you'll eventually realize that much of it is stuff that you can map into an Atom Feed (i.e. created date, unique stream id, stream title, etc.). Since we're all in the process of learning how to deal with atom:feed elements anyway, why not just reuse what we've got instead of inventing something new? A rather nice side effect of forming the stream as an atom feed is the simple fact that a log of the stream can be written to disk as a well-formed Atom file. Thus, the same tools that you usually use to parse Atom files can be used to parse the log of the stream. It is nice to be able to reuse tools in this way... (Note: At PubSub, the atom files that we serve to people are, in essence, just slightly stripped logs of the proto-Atom over XMPP streams that they would have received if they had been listening with that protocol.
In our clients we can use the same parser for the stream as we do for atom files. It works out nicely and elegantly.) bob wyman
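The "virtual closure" technique described above can be sketched with an incremental XML parser: treat the opening <feed> tag purely as stream metadata and emit each atom:entry stanza as soon as its end tag arrives, never waiting for </feed>. This is a hypothetical sketch (class and variable names are illustrative, not from PubSub's clients):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

class AtomStreamReader:
    """Pull atom:entry 'stanzas' out of a never-ending feed, XMPP-style:
    the opening <feed> tag is given a virtual closure and is used only
    as a carrier of stream metadata."""

    def __init__(self):
        self.parser = ET.XMLPullParser(events=("start", "end"))
        self.root = None  # the never-closed <feed> element

    def feed(self, data):
        """Feed raw stream data into the parser, yielding each completed entry."""
        self.parser.feed(data)
        for event, elem in self.parser.read_events():
            if event == "start" and self.root is None:
                self.root = elem
            elif event == "end" and elem.tag == ATOM + "entry":
                yield elem
                self.root.remove(elem)  # drop handled stanzas to bound memory

# demo: the stream arrives in arbitrary fragments and never sends </feed>
reader = AtomStreamReader()
part1 = '<feed xmlns="http://www.w3.org/2005/Atom"><title>stream</title><entry><title>a</ti'
part2 = 'tle></entry><entry><title>b</title></entry>'
got = [e.find(ATOM + "title").text for e in reader.feed(part1)]
got += [e.find(ATOM + "title").text for e in reader.feed(part2)]
```

Note that the first fragment ends mid-tag; the pull parser simply buffers until the entry's end tag arrives, which is why no application-level framing is strictly required.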
RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)
Aristotle Pagaltzis wrote: That issue is inheritance. Let me give an example of problematic inheritance... Some have suggested that there be a License that you can associate with Atom feeds and entries. However, scoping becomes very important in this case because of some peculiarities of the legal system. One can copyright an individual thing and one can copyright a collection of things. A claim of copyright in a collection is not, however, necessarily a claim of copyright over the elements of the collection. Similarly, a claim of copyright over an element of the collection doesn't reduce any claim of copyright in the collection itself. If we assume inheritance from feed elements, then without further specification, it isn't possible to claim copyright in the collection that is the feed without claiming copyright in its individual parts. What you'd have to do is create two distinct types of claim (one for collection and one for item. That's messy.) I'm sure that copyright and licenses aren't the only problematic contexts here. bob wyman
RE: If you want Fat Pings just use Atom!
Joe Gregorio wrote: Why can't you keep that socket open, that is the default behavior for HTTP 1.1. In some applications, HTTP 1.1 will work just fine. However, HTTP doesn't add much to the high volume case. It also costs a great deal. For instance, every POST requires a response. This means that you're moving from a pure streaming case to an endless sequence of application level ACK/NAKs that are simply replicating what TCP/IP already does for you. Also, the HTTP headers that would be required simply don't contribute anything useful. The bandwidth overhead of the additional headers as well as the bandwidth, processing and timing problems related to generating responses begins to look pretty nasty when you're moving at hundreds of items per minute or second... One really good reason for using HTTP would be to exploit the existing HTTP infrastructure including proxies, caches, application-level firewalls, etc. However, I'm aware of no such infrastructure components that are designed to handle well permanently open high-bandwidth connections. The HTTP infrastructure is optimized around the normal uses of HTTP. This isn't normal. One of the really irritating things about the current HTTP infrastructure is that it is very fragile. This is a problem that has caused unlimited headaches for the folk trying to do notification over HTTP (mod-pubsub, KnowNow, various HTTP-based IM/chat systems, etc.). The problem is that HTTP connections, given the current infrastructure and standard components, are very hard to keep open permanently or for a very long period of time. One is often considered lucky if you can keep an HTTP connection open for 5 minutes without having to re-initialize... Of course, during the period between when your connection breaks and when you get it re-established, you're losing packets. That means that you have to have a much more robust mechanism for recovering lost messages and that means increased complexity, network traffic, etc. 
The added complexity and trouble can be justified in some cases; however, not in all cases. HTTP is great in some cases but not all. That's why the IETF has defined BEEP, XMPP, SIP, SIMPLE, etc. in addition to HTTP. One protocol model simply can't suit all needs at all times and in all contexts. Whatever... The point here is that Atom already has defined all that appears to be needed in order to address the Fat Ping requirement whether you prefer individual HTTP POSTs, POSTs over HTTP 1.1 connections, XMPP, or raw open TCP/IP sockets. That is a good thing. bob wyman
RE: If you want Fat Pings just use Atom!
Aristotle Pagaltzis wrote: Shades of SGML. No! No! Not that! :-) He continues with: ... many good points Basically, there are many really easy ways that one can handle streams of Atom entries. You could prepend an empty feed to the head of the stream, you could use virtual end-tags, you could just send entries and rely on the receiver to wrap them up as required, etc... But, since all of these are really easy and none of them really gets in the way of anything rational that I can imagine someone wanting to do, why not just default to doing it the way it is defined in the Atom spec? In that way, we don't have to create one more context-dependent distinction between formats. Complexity is reduced and we can avoid having to read yet-another-specification that looks very, very much like hundreds we've read before. If Atom provides all we need, let's not do something else unless there is a *very* good argument to do so. bob wyman
RE: Protocol Action: 'The Atom Syndication Format' to Proposed Standard
This is excellent news! Finally, we have an openly and formally defined standard for syndication. Wonderful! bob wyman
HTTP Accept Headers for Atom V1.0?
What would the HTTP Accept headers for Atom V1.0 look like? i.e., how would I tell the server that I want Atom V1.0 but do not want Atom 0.3? bob wyman
Re: Major backtracking on canonicalization
Paul Hoffman wrote: Now that I understand this better, I believe that our text should read: Thank you for catching this. You've saved us major pain! bob wyman
RE: Roll-up of proposed changes to atompub-format section 5
Paul Hoffman wrote: I'm with Tim on the -1. Bob's suggestion and explanation make good sense for the implementer's guide, but not for the base spec. There is not an interoperability issue that I can see for entries without sources being signed. Could we at least put in a sentence that states that including a source element in signed entries is recommended? The implementer's guide would then expand on that with more detail, discussion, etc. Note: I am not suggesting use of the should word, although I would like it. We can debate what it means to have an interoperability issue; however, my personal feeling is that if systems are forced to break and discard signatures in order to perform usual and customary processing on entries, that falls very close to the realm of interoperability, if not within it. Deferring this issue until the implementer's guide is written is likely to defer it beyond the point at which common practice is established. The result is likely to be that intermediaries and aggregators end up discarding most signatures that appear in source feeds. bob wyman
RE: Roll-up of proposed changes to atompub-format section 5
Tim Bray wrote: If I want to sign an entry and also want to make it available for aggregation then yes, I'd better put in an atom:source. But this is inherent in the basic definition of digsig; not something we need to call out. -Tim Certainly, the chain of reasoning is as clear and logical as you describe. However, it is also very clear that this is precisely the sort of multi-step chain of reasoning that is often overlooked by even the most earnest of implementers. We have many, many indications that significant numbers of RSS/Atom implementers do not, in fact, think much beyond what it takes to get their content into a file. Even the best implementers, and valued participants in this working group, have regularly proved that they don't remember to think out all the systemic issues of syndication. Perhaps it is because there are so few of us that act as intermediaries... The issues are not well understood by those who don't serve this function. Forgive me for suggesting that we call out the obvious. However, this particular bit of obviousness is not very obvious. In fact, it is probably not obvious to most folk until *after* it has been called out. We will help matters greatly by at least providing a recommendation that source elements be inserted in signed entries... bob wyman
RE: Roll-up of proposed changes to atompub-format section 5
Antone Roundy wrote: When signing individual entries that do not contain an atom:source element, be aware that aggregators inserting an atom:source element will be unable to retain the signature. For this reason, publishers might consider including an atom:source element in all individually signed entries. +1 bob wyman
RE: Roll-up of proposed changes to atompub-format section 5
Tim Bray wrote: Still -1, despite Bob's arguments, at least in part because we have no idea what kind of applications are going to be using signed entries and we shouldn't try to micromanage a future we don't understand. -Tim We *DO* know that PubSub will support signed entries in the future... And, we know that PubSub and any service like it will be forced to discard signatures on any signed entries that do not have source elements in them. Given that at least some would consider it likely that such services will not only remain popular but grow in popularity in the future, why is it such a terrible thing to provide an optional recommendation that people address the needs of these services? I find it hard to imagine what harm could be done by providing this recommendation. Any application written in the future is already forced to handle entries with source elements since these elements are permitted by the Atom specification as it stands now. Thus, simply recommending that people do what they are already permitted to do just doesn't seem to threaten harm to unspecified future applications -- yet, it would clearly accomplish some good in the case of the known applications. What is the utility of signed entries if not to facilitate the copying of entries between feeds? Why sign individual entries unless they are likely to be removed from their original context? If entries are not to be copied, then feed signatures are all that is necessary and would result in smaller, more bandwidth-efficient feeds. bob wyman
RE: Roll-up of proposed changes to atompub-format section 5
Paul Hoffman wrote: Timing. If we change text other than because of an IESG note, there is a strong chance we will have to delay being finalized by two weeks, possibly more. I am aware of the issues with timing and I believe I am just as concerned as you are with these issues. I was rather stunned to be at Gnomedex recently and hear it said that after all the effort we've put into Atom we still have nothing to show for it. An approved RFC would make such statements much less acceptable... However, I think that this can be positioned as part of the response to the IESG comments concerning canonicalization since including source elements in signed entries will tend to cause those entries to be more canonical or consistent in form. Also, given that the addition is merely a recommendation and is thus non-normative, it shouldn't raise any review issues. Please remember that this isn't an issue that I just pulled out of the hat at the last moment. I first brought this up long ago -- long before last call... The problem, as has often been the case with the issues I raise, is that there aren't many people who seem to be terribly aware of or concerned with the aggregation issues even though we've got reasonable representation from those who build feed generators and clients. I'm trying hard to do the right thing for Atom and really wish that other intermediaries, search engines, etc. would participate more but for whatever reason, most have chosen to remain silent on these issues... bob wyman
RE: Roll-up of proposed changes to atompub-format section 5
How about a compromise on the source insertion thing... Paul Hoffman's proposed text for the first paragraph in Section 5 starts off with a set of examples of why one would want to sign or encrypt atom entries or feeds. (Discount coupons, bank statements, etc.) These examples were requested by the IESG. In my opinion, none of the examples really speaks to current uses of Atom. Thus, I would suggest that we either replace one of the existing examples or add a new one with wording something like: A publisher might digitally sign an entry, which included an atom:source element, in order to ensure that verifiable attribution for the entry was available if that entry was copied into another feed or distributed via some other means. I believe this improves the existing proposed text by providing a much more immediately probable example than those currently listed. Additionally, by alluding to the issue of including the source element it may at least tend to cause implementers to consider the wisdom of including source elements in signed entries. Finally, since the provision of examples is something that was explicitly requested as part of the IESG review, this should not cause any delay beyond those that are already inevitable. bob wyman
RE: Roll-up of proposed changes to atompub-format section 5
Paul Hoffman wrote: Intermediaries such as aggregators may need to add an atom:source element to an entry that does not contain its own atom:source element. If such an entry was signed, the addition will break the signature. Thus, a publisher of individually-signed entries should strongly consider adding an atom:source element to those entries before signing them. It looks good to me. Thanks! bob wyman
Re: Clearing a discuss vote on the Atom format
James M Snell wrote: b. recommended inclusion of a source element in signed entries. +1 bob wyman
Re: Roll-up of proposed changes to atompub-format section 5
I believe it would be very useful to specify that signed entries should include a source element. This can/should be considered part of entry canonicalization. The reason I suggest this is that signed entries are only really useful when extracted from their original source feeds. If entries are only read from their source feeds, then it is probably best for publishers to sign the feed, not the individual entries. (Note: It is my hope that feed publishers will anticipate that their entries will be extracted from the source feeds and will thus sign the individual entries rather than the feeds... i.e. Publishers should anticipate that intermediaries like PubSub and various other search/discovery services will aggregate their entries and republish them in non-source feeds.) When an entry is removed from its source, it SHOULD have a source element inserted if one is not already present. However, if a republisher inserts a source element into a signed entry, that would break the signature. Thus, it seems reasonable that we should strongly encourage those who sign entries to anticipate the needs of subsequent processors by inserting the source elements in the original signed entries. By inserting the source elements, the requirement for others to break the signature will be drastically reduced. If an entry is signed, yet contains no source element, much of the utility of the signature (allowing verification of the original publisher) is eliminated. bob wyman
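The source-insertion step discussed above can be sketched as follows. This is a hypothetical illustration (the helper name and sample ids are invented) of why an aggregator inserting atom:source breaks a signature: the entry's octets change, so any digest computed over the original entry no longer matches:

```python
import copy
import hashlib
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def add_source(entry, feed):
    """Copy the feed-level metadata an entry depends on into a new
    atom:source child, as a republisher must do before carrying the
    entry in a non-source feed."""
    if entry.find(ATOM + "source") is not None:
        return  # the publisher already provided one: nothing to change
    source = ET.Element(ATOM + "source")
    for tag in ("id", "title", "updated", "author"):
        elem = feed.find(ATOM + tag)
        if elem is not None:
            source.append(copy.deepcopy(elem))
    entry.insert(0, source)

feed = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<id>tag:example.org,2005:f</id><title>F</title>'
    '<updated>2005-08-21T16:30:02Z</updated>'
    '<entry><id>tag:example.org,2005:e1</id><title>E</title></entry>'
    '</feed>')
entry = feed.find(ATOM + "entry")

# a signature is a function of the entry's octets, so the digest changes
digest_before = hashlib.sha1(ET.tostring(entry)).hexdigest()
add_source(entry, feed)
digest_after = hashlib.sha1(ET.tostring(entry)).hexdigest()
```

If the publisher had included the atom:source element before signing, the `add_source` step would be a no-op and the original digest (and signature) would survive republication.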
Re: More on Atom XML signatures and encryption
Paul Hoffman wrote: Same as above. Even though it is included-by-reference, the referenced content is still a part of the message. No, it isn't. The reference is part of the message. +1 The signature should only cover the bits that are actually in the element (feed or entry) that is signed. Referenced data may be under different administrative control, may change independently of the signed element, etc. bob wyman
RE: More on Atom XML signatures and encryption
James M Snell wrote: I am becoming increasingly convinced that a c14n algorithm is the *only* way to accomplish the goal here. The need for C14N should never have been questioned. Where there are signatures, there *must* be C14N (Canonicalization). In the absence of explicitly defined C14N rules, the C14N algorithm is simply: Leave it as it is! -- but that is rarely useful and is certainly not useful in the case of Atom. The only interesting question is What is the C14N process for Atom? The question: Is C14N required? is rhetorical at best. The answer is Yes. The algorithm would recast the entry being signed as a standalone entity with all appropriate namespace declarations, etc. Precisely. It is also exceptionally important to ensure that a source element be included in any signed entry in order to ensure that the signed entry can be copied to other feeds without breaking the signature or changing the semantics of the entry by allowing feed metadata from the non-source feed to bleed into the entry. bob wyman
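As a toy illustration of one canonicalization step named above (recasting the entry being signed as a standalone entity with all appropriate namespace declarations), namespace-aware parsing and re-serialization handles the prefix bookkeeping. This is not the XML-DSig Canonical XML algorithm; real signing would apply a proper C14N transform:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ET.register_namespace("", "http://www.w3.org/2005/Atom")  # serialize Atom as the default namespace

def standalone_entry(feed_doc):
    """Recast the first entry of a feed document as a standalone Atom Entry
    Document: namespace-aware parsing expands every prefix, so re-serializing
    the subtree re-emits the declarations on the entry element itself."""
    feed = ET.fromstring(feed_doc)
    entry = feed.find(ATOM + "entry")
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)

doc = ('<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title>'
       '<entry><title>standalone</title></entry></feed>')
out = standalone_entry(doc)
```

The extracted entry carries its own xmlns declaration, so it parses on its own outside the feed that originally declared the namespace.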
Re: More on Atom XML signatures and encryption
James M Snell wrote: the ability to omit the author element from a contained entry if the containing feed has an author... Signed entries should include a source element, and that source element should contain any of the feed-level elements that the entry depends on. This is one of the reasons that source elements exist. The use of source elements drastically simplifies this part of the canonicalization process. bob wyman
Re: More on Atom XML signatures and encryption
James M Snell wrote: Question: should we only allow signing of the entire document or are there valid use cases for allowing each individual entry in the feed to be individually signed? We definitely need to be able to sign each entry. This is necessary so that we can pass signed content in aggregated feeds. The mere act of aggregation should not force a signature to be removed from an item. (Note: Signed entries really *must* include source elements. Otherwise, aggregators will be forced to strip off the signatures in order to insert the source elements.) bob wyman
Re: Polling Sucks! (was RE: Atom feed synchronization)
James M Snell wrote: If I understand Bob's solution correctly, it goes something like: 1) wake up 2) scratch whatever you need to scratch 3) turn on computer, launch feed reader 4) feed reader does some RFC3229+feed magic to catch up on what happened during the night 5) feed reader opens a XMPP connection to receive the active stream of new entries This is precisely what I was describing and it is what we implement in the PubSub Sidebar clients. This hybrid combination gives you the best of both worlds. The result is the lowest possible bandwidth consumption as well as the lowest latency in delivering content to clients. The Push+Pull approach is particularly well suited to the kind of high volume application that James Snell describes -- particularly if the server has a large number of readers. While I've previously pointed out the benefit to the network (efficient utilization of bandwidth) and to clients (low latency), it is important to point out that the Push model offers real benefits for the server as well. In extremely high volume applications, it is important that the server be able to control and smooth load. Server based load control is most easily accomplished with a Push system. In a Pull based system, load is almost totally dependent on client-driven scheduling and thus load tends to be very bursty. Bursty load is the worst possible thing to have in a network-based system. In Push based systems, the server is able to eliminate load bursts by spreading delivery of entries over time -- without worrying about the need to service bursty client requests within the window of their request time-out limits. Even though there are all sorts of advantages to using Push-based and hybrid Push+Pull systems, the reality is that only a tiny percentage of all the millions of servers that support Atom feeds will have sufficient traffic or readership to benefit from these methods. 
As Joe Gregorio suggests in his recent note: 99.99% of all syndication is done via HTTP and this will probably remain the case in terms of a raw census of servers. However, it is also clear we are seeing significant growth in the use of feed aggregators like PubSub, FeedBurner and the other blog search and monitoring services. Also, we are seeing an increase in the use of feed-readers on mobile devices which require that feeds be consolidated and fed through proxies in order to reduce the amount of polling and other processing done by those mobile devices. As the use of these services increases, it will make sense for client developers to implement client-based support for Push+Pull and thus provide to their users the benefits of reduced bandwidth, reduced session management, and reduced latency. Broad client-based support makes sense even if similarly broad server-based support does not. bob wyman
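The hybrid Push+Pull pattern described in the post above can be sketched as a toy simulation. The two transports are stubbed out with in-memory data; in a real client, catch_up() would fetch the Atom file over HTTP (possibly with RFC3229+feed deltas) and stream() would yield entries pushed over an XMPP connection. All entry ids and timestamps here are invented.

```python
# Toy sketch of Push+Pull: pull once at startup to establish state,
# then listen only to the push stream, deduplicating by
# (atom:id, atom:updated) so overlapping entries are delivered once.
def catch_up():
    # entries missed while offline
    return [("tag:ex,1", "2005-06-01T10:00:00Z"),
            ("tag:ex,2", "2005-06-01T10:05:00Z")]

def stream():
    # live pushed entries; the first overlaps with the catch-up set
    yield ("tag:ex,2", "2005-06-01T10:05:00Z")
    yield ("tag:ex,3", "2005-06-01T10:09:00Z")

seen = set()
delivered = []
for e in catch_up():          # 1. one pull at connect/reconnect time
    if e not in seen:
        seen.add(e)
        delivered.append(e)
for e in stream():            # 2. thereafter, push only
    if e not in seen:
        seen.add(e)
        delivered.append(e)
```

The dedup step is what lets the client switch transports without delivering the overlap twice.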
Re: Polling Sucks! (was RE: Atom feed synchronization)
Sam Ruby wrote: P.S. Why is this on atom-syntax? Is there a concrete proposal we are talking about here? Is there likely to be? Because James Snell asked a question?.. But, more seriously: I intend to write an Internet draft for RFC3229+feed and hope that I'll be able to get the working group to consider it. Given the implementation history, we certainly meet the IETF tradition of having more than three independent implementations as well as considerable experience in field use. Also, the Atom over XMPP Internet Draft is something that I think the Working Group should consider once the issues related to the syntax and protocol specs are dealt with. In any case, I think it is traditional for IETF mailing lists to provide a forum for discussion of potential use of the protocols that they define in addition to providing a forum for the work of defining the language of the specifications themselves. It is only by developing a common understanding of the various use cases that we can understand how the future work, if any, of the working group should be defined. bob wyman
Polling Sucks! (was RE: Atom feed synchronization)
Henry Story wrote: The best solution is just to add a link type to the atom syntax: a link to the previous feed document that points to the next bunch of entries. I.e., do what web sites do. If you can't find your answer on the first page, go look at the next page. How do you know when to stop? If the pages are ordered chronologically, the client will know to stop when he has come to a page with entries with update times before the date he last looked. This is *not* simpler than taking a push feed using Atom over XMPP. For a push feed, all you do is: 1. Open a socket 2. Send a login XML Stanza 3. Process the stanzas as they arrive. For your solution, you need to: 1. Poll the feed to get a pointer to the first link. (each poll will cost you a TCP/IP connection). 2. If you got a new first link then go to step 5 3. Wait some period of time (the polling interval) 4. Go to step 1 5. Open a new TCP/IP socket to get the next link 6. Form and send an HTTP request for the next entry 7. Catch the response from the server 8. Parse the response to determine if its time stamp is something you've already seen. 9. If you haven't seen the current entry before, then go to step 5 10. Go to step 1 to start over. (Note: I've eliminated and compressed a few steps to avoid more typing... An actual implementation would be more complex than I describe above.) Your solution is more complex and generates much more network traffic (i.e. because of polling the feed, repeatedly opening new TCP/IP connections with all the traditional slow start overhead, and requesting each next link). Additionally, you end up with increased latency since the age of any entry you discover will be, on average, half that of your polling frequency plus some latency introduced by link following. (Yes, you could rely on continuous connections and thus remove the overhead of creating so many TCP/IP connections, however, at that point, you might as well have a continuous push socket open...) 
The push solution conserves network bandwidth, delivers data with much less latency and is simpler to implement. Polling sucks! (that was a pun...) bob wyman
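The "on average, half your polling frequency" latency figure from the post above is easy to check with a quick simulation. The assumption is that entries arrive uniformly at random within a polling interval and are discovered at the next poll.

```python
# Quick check: with polling, the average age of a newly discovered
# entry is about half the polling interval (plus any fetch latency,
# which this sketch ignores).
import random

random.seed(42)
interval = 60.0            # poll every 60 seconds
ages = []
for _ in range(100_000):
    arrival = random.uniform(0, interval)   # time since the last poll
    ages.append(interval - arrival)         # age when the next poll fires
avg_age = sum(ages) / len(ages)
print(round(avg_age, 1))   # close to interval / 2
```

A push transport delivers each entry as it is produced, so its latency is bounded by network and server delays rather than by a polling schedule.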
RE: Polling Sucks! (was RE: Atom feed synchronization)
Antone Roundy wrote: XMPP: 5. If the feed had entries that were old and not updated, go to step 7 6. If the feed has a first or next or whatever link, go to step 1 using that link 7. Open a socket 8. Send login XML stanza I am assuming that if you are pushing entries via Atom over XMPP, you would only push new and updated entries. Thus, a client shouldn't need to check for old and not updated entries. Also, I'm assuming that since you are pushing entries, you wouldn't be inserting first or next links that needed to be followed. The client would get all of its entries from the XMPP stream. XMPP could achieve parity in getting feed changes that occurred while offline, at the expense of implementation complexity parity, by polling the feed once upon startup. My assumption is that any well-built XMPP feed reader will, in fact, also be able to read Atom files via HTTP. This is what we do at PubSub and Gush does the same. I think Bill's app also does this. The original question can, I think, be summarized as: How does one best keep up with a high-volume Atom publisher? My point was that the first and next links don't make things any easier. They just force the client to do a great deal of work to discover what the server already knows -- which entries have been updated. The first and next links approach just makes the process of working with feed files more complex as well as more bandwidth intensive. XMPP support is a much better solution for keeping up with changes while connected. Let's keep Atom as it is now -- without the first and next tags and encourage folk who need to keep up with high volume streams to use Atom over XMPP. Lowered bandwidth utilization, reduced latency and simplicity are good things. bob wyman
RE: Polling Sucks! (was RE: Atom feed synchronization)
Joe Gregorio wrote: The one thing missing from the analysis is the overhead, and practicality, of switching protocols (HTTP to XMPP). I'm not aware of anything that might be called overhead. What our clients do is, upon startup, connect to XMPP and request the list of Atom files that they are monitoring. They then immediately fetch those files to establish their start-of-session state. From that point on, they only listen to XMPP since anything that would be written to the Atom files is also written to XMPP. HTTP is only used on start-up. It's a pretty clean process. Let's keep Atom as it is now and explain to folks who need to keep up with high volume streams the two options they have, either streaming over XMPP or next links. Where are these next links defined? I don't see them in the Atom Internet Draft. The word next doesn't even appear in the ID... If they aren't there, how can you call them Atom as it is now? I thought Henry Story was proposing these as extensions. bob wyman
Re: Atom feed synchronization
James M Snell wrote: Nice. I had pulled out of the Atom discussions to work on another project back when this was being discussed and missed it. Quick question tho.. in your initial post on the concept you state It is my intention to create an Internet Draft describing the ideas here. I do intend to write an Internet Draft, but I had been waiting to get some field experience before doing so. At this point, I guess I'm pretty sure it works as described. Many people have implemented RFC3229+feed and so far I've heard of no issues with it other than some folk who object on principle to RFC3229 itself. Other than waiting for field experience, the thing that has held me up is waiting for the working group to get far enough along with Atom so that I could propose that the Internet Draft be taken up here. Now that Atom V1.0 is almost in the can, I guess it is time to get the thing written... BTW: I think the best way to implement the application you describe is probably via a combination of Push and Pull. If you're updating as rapidly as you say you are, then it would make sense to push the updates to the client using something like Atom over XMPP[1]. You would, however, still generate Atom files and serve them using RFC3229+feed. The Atom files would be used by clients to catch up on missed messages when they initially connect or reconnect to the push stream after having been off-line for some time. The hybrid Push+Pull process described above is what we implement at PubSub for every subscription. We currently have this implemented in our PubSub Sidebars for IE and Firefox[2]. Also, the Gush reader from 2entwine implements this hybrid Push+Pull approach when reading our feeds. Push+Pull with Atom and Atom over XMPP gives you the best of both worlds. You get very efficient and low latency publishing of new entries to clients as well as efficient downloading of catchup files. What more could you want? 
:-) bob wyman [1] http://www.xmpp.org/drafts/draft-saintandre-atompub-notify-02.html [2] http://www.pubsub.com/downloads.php [3] http://www.2entwine.com/features/pubsub.html
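For reference, the client side of the RFC3229+feed catch-up fetch mentioned above amounts to two extra request headers. A sketch follows; the header names come from RFC 3229, the "feed" instance-manipulation token is the extension under discussion, and the ETag value is invented.

```python
# Sketch: ask the server for only the entries added or changed since
# the feed instance identified by the ETag saved from the last fetch.
saved_etag = '"abc123"'   # ETag from the previous response (invented)

headers = {
    "A-IM": "feed",               # we accept the "feed" delta encoding
    "If-None-Match": saved_etag,  # the instance we already hold
}
```

A server that supports RFC3229+feed can reply 226 IM Used with an Atom document containing only the new entries; one that doesn't simply replies 200 with the full feed, so the client degrades gracefully.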
RE: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor
Greg Stein wrote: It was not published to muddy the waters. That implies a specific intent which was *definitely* not present. Please accept my apologies for what was poor writing. I can see how you read my sentence as implying intent to muddy. It wasn't my intent, however, to imply that. I should have written: publishing this new format *and* muddying the waters. My intent was to say that publishing the format has the effect of muddying the waters. I wasn't trying to say that Google was intentionally doing this. proprietary connotes closed. I'm using the older definition of proprietary which means simply not standard. I see nothing wrong with saying open and proprietary format... I don't think one implies the exclusion of the other. How about this: you have a web site with 10 *million* URLs on it. What format are you going to use? Is Atom appropriate at that scale? No. I don't think Atom would work well with 10 million URLs. At least not as currently defined. I do think, however, that it would have been useful to try to at least have a conversation about defining some subset of Atom that would address the need. I think a result could have ended up looking much like the Sitemap format but offered a smoother migration path from Atom as we know it to the more terse format and the reverse. Please understand that I think that on-the-whole, the efforts by Google to popularize the Sitemap process and syndication by non-blogs is absolutely wonderful! I'm only grumbling about the formats... bob wyman
RE: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor
Arve Bersvendsen wrote: Actually, Google Sitemaps is already compatible with [Atom]. Yes, I've had a number of folk send me mail pointing to the FAQ that I did not read as closely as I should. In fact, it is great that Google is willing to accept Atom 0.3 files instead of just their Sitemap format. Hopefully, this move will encourage many traditional web sites to start producing Atom files to serve the role of Rich Site Summaries and allow us to expand our feed services to cover non-blogs as well. I do still think it unfortunate that Google felt compelled to invent yet-another-format for Sitemaps. Because of Google's support for the Sitemap format, it is inevitable that aggregation services are going to have to start reading Sitemaps in addition to the billions of flavors of legacy RSS and Atom files. Nonetheless, given the importance of any format that Google supports, should we be considering providing support in Atom for the priority element or some easily mapped equivalent? Also, I wonder what it will take to get Google to say that they'll support Atom 1.0? It seems obvious that they would, but it would be nice to know when they expect to do it... The Sitemap Index files are proper and useful additions to what we have now in the blogosphere and we should look carefully at trying to figure out how to incorporate this idea into what we do for Atom. Perhaps something along the lines of Sitemap indexes could be incorporated into the autodiscovery specification? bob wyman
RE: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor
Graham wrote: I don't see how a highly specialized format for a particular task is a competitor to or even compatible with what Atom does. The highly specialized task which is performed using the Sitemap format is providing lists of changed web pages on sites. This is precisely the function that is performed in many applications of Atom. The only difference between the target of Sitemap and Atom is that Sitemap works with web sites that are not blogs and Atom is usually used with web sites that are blogs. However, the differences between these two kinds of site are virtually non-existent. Atom doesn't need to be a jack of all trades to handle the job that Sitemaps handle. It is already quite capable of doing the job. And, as James Snell points out in an earlier message, collection documents would handle well the job of providing Sitemap indexes. It seems quite clear that the Sitemap and Sitemap index formats have little to offer that isn't already provided by Atom. This obviously leads to the question of why Google went to the trouble of defining these formats. It would be real nice if someone from Google could provide a touch of explanation... bob wyman
ByLines, NewsML and interop with other syndication formats
I've spent an interesting day in Amsterdam at the IPTC News Summit and had a chance to talk about standards convergence issues with various folk in the IPTC (owners of NewsML, NITF, EventsML, SportsML, ANPA, etc.). These folk seem sincerely interested in getting some better worked out compatibility between things like NewsML and Atom. I'd like to suggest that we explicitly invite the IPTC folk to propose a set of Atom extensions (that would include ByLine) with the intention that these extensions would incorporate their detailed knowledge of the publishing world and facilitate the interchange or translation of documents between NewsML, NITF, etc. formats and Atom. bob wyman
RE: Compulsory feed ID?
Antone Roundy wrote re the issue of DOS attacks: I've been a bit surprised that you [Bob Wyman] haven't been more active in taking the lead on pushing the conversation forward and ensuring that threads addressing the issue don't die out, given the strength of your comments on the issue in the past and the obvious significance to your business. ... Perhaps you, who are probably in a better position than any of us to speak from experience on how to deal with this, could refresh our memories of specifically what you think the best solution is. Yes, this issue is very important to us at PubSub and should be very important to others as well. However, as I've learned from other recent discussions, my viewpoint is not commonly held in this Working Group. Thus, what I've been trying to do is pick carefully the issues that I work on. For instance, I've put a great deal of effort into multiple ids since that allows us the freedom to either work out proprietary solutions to the DOS problem on our own or allows us to punt the problem forward to the end-users' aggregators if we can't come up with a decent solution. Clearly, the best solution here would be for folk to use signatures. But, that is going to take either a great deal of work to get adopted or something really creative (and simple)... The history of attempts to get signatures used does not make pleasant reading... We are putting effort into working out methods to make signatures more acceptable to the community and I hope to have some proposals soon... If we are successful (wish us luck!), that will at least provide a solution for some people... Basically, it doesn't make sense for me to keep demanding that people deal with issues that they clearly don't want to address. I've been mentioning the DOS problem for months now and getting nowhere. So, the reason I'm not pushing harder is that it is clear that implementable work-arounds will be more useful than never agreed-to solutions... bob wyman
How is Atom superior to RSS?
I'll be making a presentation on Tuesday which will include a slide on how Atom improves on RSS. If you have any thoughts on this subject, I would appreciate hearing them. bob wyman
RE: How is Atom superior to RSS?
This has been an experiment... I've got lots of thoughts on why Atom is an improvement over RSS but I am constantly amazed that people are able to continue making the claim that Atom offers little that RSS doesn't already support. Certainly, Winer and the Microsoft crowd make that claim regularly. I've often wondered why people don't see the really important differences between these two. To a certain extent, the answer comes in the replies I've received to my posting. i.e. Not even those most familiar with Atom can present a decent list of clear advantages -- even though they undoubtedly know them. Yes, we all know the advantages of requiring unique atom:id values, writing less ambiguous documentation, etc. However, I wonder why advances like the following don't get more recognition (note: this is not a complete list.) 1. Explicit support for xml:lang rather than the silly <language/> tag of RSS V2.0. 2. Explicit support, in the core, for digital signatures and encryption. 3. Atom Entry documents. Thus, support for the protocol as well as for push delivery of Atom feeds via Atom over XMPP and other such protocols. (i.e. Atom is designed to enable a push future rather than only working in the legacy pull-only world of RSS) 4. Atom:source elements which provide robust support, in the core, for attribution on entries that have been copied from one feed to another and for preservation of important feed metadata in copied entries. Atom's source element makes it a superior format for delivering search results, for constructing feeds which aggregate entries from multiple sources, and for push applications. 5. Support for XML content types rather than being limited to RSS's HTML content type. 6. Explicit support for remote content. We all worked hard in getting these new capabilities and others like them into Atom and properly defined. Why aren't these things given more press and attention? 
They are significant improvements over RSS that will have profound impact on our ability to build better applications for our users. bob wyman
RE: A different question about atom:author and inheritance
Tim Bray wrote: The intent seems pretty clear; entry-level overrides source-level overrides feed-level, but it seems like we should say that. Anybody think this is anything more than an editorial change? -Tim I believe that this three-level chain of inheritance has always been what we've intended. There was, however, a great deal of discussion at one point about how to actually write the words. Thus, I agree that it is largely an editorial change; however, you might expect some controversy over particular word choices. Give it a shot and let's see how folk respond. Note: There is more to authorship than just the inheritance issue. I think it also makes sense that a feed-level author should be considered to be the author of the collection of items which is the feed. This authorship is independent of authorship over any particular entry within the feed. Even if the feed contains no items authored by the feed-level author, the feed-level author is still author of the collection. This distinction would be useful in describing linkblogs, and a variety of other feed types that are composed of entries collected from other feeds or multiple authors. bob wyman
RE: Refresher on Updated/Modified
Graham wrote: What if someone (either the publisher or someone downstream) wants to store a history of every revision in an archive feed? To this, Tim Bray answered: I don't see why, if you wanted that kind of archive, you couldn't use atom:updated for every little change in the archived version but atom:updated only for the ones you cared about in the published version. In which case the archived version would be a superset of the published version. I see nothing wrong with that. -Tim Of course, the objections to Tim's position are obvious: 1. The case of someone downstream was ignored in the answer. Tim only addresses the issue of what the publisher might do. 2. Given Tim's solution to the problem, downstream readers would be incapable of maintaining an accurate archive since only the publisher's unpublished archive would have atom:updated values that change on each modification. 3. The archive that Tim describes would not actually be a useful archive for many purposes since it would not be an accurate description of the sequence of entries written to the feed. For instance, such an archive would not satisfy legal rules for logging data in financial applications since such an archive could not be used to determine the value of the atom:updated value in entries that had actually been published. This whole argument is silly. Atom:modified is needed. It should be provided. Nobody has given a decent argument against it. If you insist on objecting to it then let the darn thing be optional -- but instead of trying to impose your personal vision on the process, just let the rest of us get on with doing the work we need to do in the way we know we have to do it. bob wyman
RE: multiple ids
Tim Bray directs the editors to insert the following words: If multiple atom:entry elements with the same atom:id value appear in an Atom Feed document, they describe the same entry and Atom Processors MUST treat them as such. It is a long standing and valued tradition in the IETF that Standards Track RFCs MUST NOT impose constraints on applications unless such constraints relate to issues of interoperability. Thus, while it is entirely appropriate for the specification to state that multiple atom:entry elements with the same atom:id ... describe the same entry it is NOT appropriate to state how Atom Processors must treat such elements. The text should read simply: If multiple atom:entry elements with the same atom:id value appear in an Atom Feed document, they describe the same entry. The appropriate handling of multiple instances of the same entry is a matter which is solely up to the discretion of Atom Processors since variances in such handling do not impact interoperability. One can imagine that various developers will make different decisions in duplicate handling policies. Some processors might even allow their end-users to decide the handling policies. By making such decisions, developers will either enhance or detract from the utility of the overall solutions they develop -- but, it is not up to the IETF to direct what decisions should be made in this case. Restating this in a manner perhaps more friendly to those who declare themselves as bits on the wire people: The specification should specify the meaning of the bits on the wire -- not what one does with the bits after receiving them. bob wyman
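As one illustration of the point that duplicate handling is a processor-level policy choice, here is one possible policy among many: keep the instance with the latest atom:updated value. Entries are modeled as bare (id, updated) tuples for the sketch, with invented values.

```python
# One possible Atom Processor policy for duplicate atom:id values:
# retain, per id, the instance with the latest atom:updated.
entries = [
    ("tag:ex,1", "2005-06-01T10:00:00Z"),
    ("tag:ex,2", "2005-06-01T10:02:00Z"),
    ("tag:ex,1", "2005-06-01T10:07:00Z"),   # same entry, newer instance
]

latest = {}
for entry_id, updated in entries:
    # RFC 3339 timestamps in the same UTC form compare correctly as strings
    if entry_id not in latest or updated > latest[entry_id]:
        latest[entry_id] = updated
```

Another processor might instead keep every instance, or surface the history to the user; nothing in the "describe the same entry" language forces one choice.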
RE: Compulsory feed ID?
Tim Bray wrote: I think the WG basically decided to punt on the DOS scenario. -Tim I believe you are correct in describing the WG's unfortunate disposition towards this issue. (Naturally, I object...) In any case, given that a significant DOS attack has been identified -- yet not addressed -- I think it would be both wise and appropriate to provide text in a Security Concerns section that describes the vulnerability of systems that rely on Atom documents to this particular attack. bob wyman
atom:modified indicates temporal ORDER not version....
Robert Sayre wrote: Versioning problems aren't solved by timestamps. I don't understand why this version issue keeps coming up. It should be apparent to everyone that there is NO relationship between timestamp and version. Timestamps have only two functions: 1. Different timestamps indicate different instances of an entry. 2. Timestamps allow assumptions concerning temporal ORDER or sequence to be made It is totally reasonable for me to develop a V1.0 followed by a V2.0 which is then followed by a V1.1 -- if I have a reasonably rich versioning scheme. If I then order-by-version, I would have the ordered set V1.0, V1.1, V2.0. However, if I order-by-time, I have the ordered set V1.0, V2.0, V1.1. I believe that atom:modified is intended only to permit order-by-time. Certainly, that is all that is needed to address the vast majority of use cases for which atom:modified has been declared useful. Atom:modified is intended to allow processors to compare two non-identical instances of a single entry and determine the order in which they should be considered to have been created. Atom:modified allows us to say This entry is considered to have been created after that entry. Atom:modified does not permit us to make any statements concerning versions or variants. It is possible that some confusion is being introduced here since one may notice that if a very simple, single-line-of-descent versioning scheme is used, the time-ordered sequence of instances will be identical to a version-ordered set of instances. However, this is merely an anecdotal observation that applies to only one of many possible classes of versioning policy. There is no general correlation between temporal order and version order that applies across all versioning policies. Any correlation between temporal-order and version-order should generally be considered coincidental and not interesting. 
Knowledge of temporal order is extremely useful information that can be usefully exploited by Atom Processors to deliver a number of capabilities demanded by a variety of users. However, the current definition of Atom (since it doesn't support atom:modified) does not permit general temporal ordering of entries even when they are all published in a single feed. At best, the current definition only allows us to temporally order sets of entries that have the same atom:updated value. However, we have no means of determining sequence or temporal ordering of the elements of a set whose members share the same atom:updated value. This inability to order elements of such sets is a significant weakness in Atom in that it introduces ambiguity. Atom should support atom:modified to permit the temporal-ordering of members of sets that share the same atom:id and atom:updated values. This has nothing to do with versioning. bob wyman
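A small sketch of what the proposed atom:modified would buy: two instances sharing the same atom:id and atom:updated values become totally ordered once a finer-grained timestamp is carried alongside. Note that atom:modified is the proposal argued for above, not part of the Atom draft, and all values here are invented.

```python
# Two instances of the same entry, indistinguishable by atom:updated
# alone; sorting on (updated, modified) recovers the order in which
# they were produced.
instances = [
    {"id": "tag:ex,1", "updated": "2005-06-01T10:00:00Z",
     "modified": "2005-06-01T10:04:30Z"},
    {"id": "tag:ex,1", "updated": "2005-06-01T10:00:00Z",
     "modified": "2005-06-01T10:01:15Z"},
]

ordered = sorted(instances, key=lambda e: (e["updated"], e["modified"]))
```

Without the modified value, a processor has no basis for choosing which of the two instances came later.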
RE: atom:modified indicates temporal ORDER not version....
Robert Sayre wrote: What does atom:id have to do with temporal ordering? Absolutely nothing. Atom:id is used to identify sets of entry instances which, according to the Atom specification, should be considered the same entry. Sets composed of instances of the same entry can then be divided into subsets that share a common atom:updated value. After such a division into subsets, some of the subsets may contain multiple elements which cannot be temporally ordered given the current Atom spec draft. atom:modified provides a means to temporally order the elements of sets which contain multiple elements that share common atom:id and atom:updated values. I believe this was communicated when I wrote: Atom should support atom:modified to permit the temporal-ordering of members of sets that share the same atom:id and atom:updated values. bob wyman
RE: atom:modified indicates temporal ORDER not version....
I wrote: I believe this was communicated when I wrote: Atom should support atom:modified to permit the temporal-ordering of members of sets that share the same atom:id and atom:updated values. Robert Sayre wrote: No, that's not what you communicated. How can I temporally order atom entries with different IDs but the same atom:updated value? atom:id and atom:modified are completely unrelated. I don't know what the problem is, but the answer is atom:modified! Robert, it is clear that your disdain for the current discussions has driven you to the point where you are no longer even reading the posts to which you respond. This is not productive. I have said *nothing* about the temporal ordering of atom entries with different IDs. I have only written about the problem of providing temporal ordering of atom entries that share the same atom:id and atom:updated values. I repeat (with a few added words to make it even more clear): Atom should support atom:modified to permit the temporal-ordering of members of sets whose members share the same atom:id and atom:updated values. bob wyman
RE: Fetch me an author. Now, fetch me another author.
Robert Sayre wrote: atom:modified cannot be operationally distinguished from atom:updated. Obviously, if people start shipping feeds with the same id and atom:updated figure, it will be needed. There's no reason to standardize it, though. We don't know how that would work. The definition of atom:updated was explicitly and intentionally crafted to permit the creation of multiple non-identical entries that shared common atom:id and atom:updated values. Clearly, it was the intention of the Working Group to permit this, otherwise the definition of atom:updated would not be as it is. Thus, it is ridiculous to try to suggest that feeds with the same id and atom:updated are somehow unanticipated or not-understood. If such feeds are so far outside the ken of what the working group intends, then atom:updated should never have been defined as it is. Additionally, atom:modified is clearly distinguished from atom:updated *by definition!* Atom:modified indicates the last time an entry was modified. Atom:updated indicates the last time it was modified in a way that the publisher considered significant. This is a very clear distinction. bob wyman
RE: Fetch me an author. Now, fetch me another author.
Robert Sayre wrote: Here's the last time this discussion happened: http://www.imc.org/atom-syntax/mail-archive/msg13276.html Tim's point in the referenced mail supported the current definition of atom:updated which provides a means for publishers to express their own subjective opinions of what is a significant change to an entry. However, the solution of one problem does not eliminate the second problem. The second problem is that readers (not publishers) need to be able to distinguish and temporally order entries that have been written by publishers. Because the publishers CANNOT know the detailed needs of all their readers, publishers' subjective input cannot be held to be useful. Objective metrics which can be clearly understood by both publishers and readers must be used. In this case, the best objective measure to use is to say that the change of one or more bits in the encoding or representation of an entry should result in a new atom:modified value. * Atom:updated addresses needs of publishers * Atom:modified addresses needs of readers Both sets of needs, that of publishers as well as readers, must be addressed and dealt with by the Atom format. Atom:updated only addresses the needs of publishers. bob wyman
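The objective metric argued for above, where any change of one or more bits in an entry's representation implies a new atom:modified value, can be tested mechanically with a content digest. A sketch follows; the serializations are invented, and in practice one would hash a canonicalized form so that insignificant whitespace differences don't count as changes.

```python
# Sketch: a digest over the entry's serialized bits gives publishers
# and readers the same objective test for "this entry was modified".
import hashlib

def digest(entry_xml: str) -> str:
    return hashlib.sha1(entry_xml.encode("utf-8")).hexdigest()

before = "<entry><id>tag:ex,1</id><title>Hello</title></entry>"
after  = "<entry><id>tag:ex,1</id><title>Hello!</title></entry>"

# Any bit-level difference means a new atom:modified value is due.
needs_new_modified = digest(before) != digest(after)
```

This is precisely the kind of test that requires no subjective judgment from the publisher about what counts as significant.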
RE: atom:modified indicates temporal ORDER not version....
Robert Sayre wrote: Temporal order of what? They are all the same entry, so what is it you are temporally ordering? We are discussing the temporal ordering of multiple non-identical *instances* of a single Atom entry. It is common in the realm of software engineering to deal with this concept of instances. Things are often considered to be simultaneously different and the same. (I am who I am today -- as I was when I was a child, nonetheless, I am very different today than I was when I was a child. The instance of me today differs from the instance of me that you might have come across many years ago.) But, perhaps this concept is too abstract for some readers... Why is this a new problem that only arises when we allow multiple IDs in the same feed? I have been pointing out these issues since long before the issue of multiple IDs (multiple instances) recently regained attention. The issue exists even without duplicate id support but is particularly critical once we support multiple instances of an entry in a single feed document. In the absence of duplicate id support, a reader can infer the temporal order of entries by simply noticing the order in which the entry instances were read from a feed document. (If duplicate ids are prohibited, then if you have read two entry instances which share a common atom:id, they must have been read from different instances of feeds and at different times. Thus, you can infer in some cases that the temporal ordering of the entry instances approximates the temporal ordering of the read operations which retrieved the entry instances. ) However, if you permit multiple instances of an entry in a single feed document then it is possible that you will read multiple entries whose temporal order cannot be inferred. (Note: Order of appearance in a feed does not imply any inter-entry order and thus cannot be used to infer or discover the temporal ordering of entries.) 
Thus, this issue *is* related to the multiple ID issue in that the problem is exacerbated by permitting multiple instances of a single entry in a single feed document. Whether or not it is relevant in other contexts is largely irrelevant since it appears that addressing the issue in one context will resolve it in other contexts as well. bob wyman
RE: Refresher on Updated/Modified
Tim Bray wrote: for archiving purposes I consider all changes no matter how small significant, and thus preserve them all with different values of atom:updated. For publication to the web, I have a different criterion as to what is significant. I fail to see any problem in the archive being a superset of the feed. The problem is that such an archive would not accurately reflect what you actually published to the web. Thus, for many applications, you would also have to keep a distinct log of what you published. Using your archive you wouldn't be able to meet various legal requirements that apply to a number of businesses which require that you be able to show what you published. The problems get worse if you include signatures in your entries. Using your archiving method, the signatures on your archived entries would be different from the signatures on the entries you published. The archive method you describe does not produce a superset of what you published; it is a different set of data from that which you published. This is not necessary. bob wyman
RE: Refresher on Updated/Modified
Tim Bray wrote: I regularly make minor changes to the trailing part of long entries and decline to refresh the feed or the atom:updated date, specifically because I do not want each of the ten thousand or so newsreaders who fetch my feed to go and re-get the entry because I fixed a typo in paragraph 11. It seems like you are concerned that people who see a change in your feed will re-fetch the HTML? If this is your concern, then do as you do now and don't refresh the feed unless you have a change that warrants an update to atom:updated. This is totally up to you and support for atom:modified wouldn't change that. There is no requirement that your feed change whenever you modify your posts. Thus, there is nothing that stops you from pursuing this policy. You are essentially arguing that the standard should force everyone to have a blog that works in the manner that your blog works. That is not reasonable. To argue that the standard should make it possible for you to do things the way you want is quite reasonable. But, you should give to others the same consideration you apparently demand from them. bob wyman
RE: atom:modified (was Re: Fetch me an author. Now, fetch me another author.)
Antone Roundy wrote: Unless the need for this can be shown, and it can be shown that an extension can't take care of it, I'm -1 on atom:modified. The need is simple and I've stated it dozens of times... Given two non-identical entries that share the same atom:id and the same atom:updated, I need to know which of them is to be presented to the user. The current specification doesn't allow me to do anything other than make a random choice. This is not reasonable. Atom:modified would provide the data needed to determine which was the most recently produced of the two entries. That most recently produced entry is the one that is most often desired by users. On extensions... Virtually anything can be done in extensions. If nothing should be in the core except those things that can be defined by extensions, then nothing would be in the core. It is inevitable that extensions will not be as broadly implemented as elements of the core. The practical implication of forcing something to be an extension is to ensure that it is never broadly implemented. bob wyman
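[Editor's sketch: the stated need -- given two non-identical entries sharing atom:id and atom:updated, decide which to present -- could be met as follows. atom:modified is hypothetical here, and the comparison assumes RFC 3339 timestamps in a uniform UTC form, where lexicographic order matches temporal order.]

```python
def entry_to_present(instances: list[dict]) -> dict:
    """Given multiple instances sharing one atom:id, prefer the
    most recently produced one. The hypothetical atom:modified
    breaks ties that atom:updated cannot; without it, instances
    with equal atom:updated can only be chosen at random."""
    return max(instances, key=lambda e: (e.get("updated", ""),
                                         e.get("modified", "")))
```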
RE: Refresher on Updated/Modified
Tim Bray wrote: As a matter of policy, my feed contains the most recent 20 posts. However, if one of those posts is a long post and only the summary is provided, when I make a change, I make a conscious decision whether it's sufficient that I want newsreaders to re-fetch it, and if so I change the datestamp, otherwise not. Finally, your true motivations appear. It is now clear that you're not talking about Atom itself. Rather, you are trying to regulate the behavior of Atom Processors who will be using Atom feeds somewhat like ping feeds that tell them which entries to fetch. Your concerns are easily addressed by providing text in the specification that makes clear what you want. I therefore propose the following text: To the discussion of both atom:updated and atom:modified add: [Non-Normative: In order to preserve network bandwidth and reduce the load on hosts of resources linked to Atom feeds or entries, Atom Processors which fetch the contents of alternate links are advised that they should not re-fetch such contents unless atom:updated changes.] To the discussion of atom:modified add this normative text: The value of atom:modified MUST only be changed when some other element (including atom:updated) of the same Atom entry has changed. Changes which are limited to resources linked to the Atom Entry MUST NOT trigger changes to atom:modified. Since I'm a good citizen about specs, I would do this wasteful thing. If the spec were written as I have proposed above, then you -- as a good citizen -- would never re-fetch the alternate linked resources unless atom:updated changed. The difference here is that my comments have been solely focused on the contents of the Atom feed -- which is all that PubSub is concerned with. Nonetheless, the proposed texts should resolve your issues while allowing PubSub to do its job. bob wyman
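[Editor's sketch: the non-normative advice proposed above -- don't re-fetch alternate-link content unless atom:updated changes -- amounts to a small per-entry cache on the processor side. The class and method names are hypothetical.]

```python
class AlternateFetchPolicy:
    """Re-fetch the resource behind an entry's alternate link only
    when that entry's atom:updated value has changed, per the
    proposed non-normative guidance."""

    def __init__(self):
        self._last_updated = {}  # atom:id -> last seen atom:updated

    def should_fetch(self, entry_id: str, updated: str) -> bool:
        if self._last_updated.get(entry_id) == updated:
            return False  # publisher signaled no significant change
        self._last_updated[entry_id] = updated
        return True
```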
Microsoft to support Atom in any aggregator they produce
FYI: Robert Scoble, a Microsoft employee/insider very familiar with Microsoft's plans for syndication, declares in comments on his blog that we are supporting Atom in any aggregator we produce. Microsoft's example in supporting Atom should be followed by all other aggregator developers in the future and Microsoft should be commended for supporting the adoption of openly defined standards for syndication. For more info (and some heated comments...) see: http://bobwyman.pubsub.com/main/2005/05/microsoft_to_su.html bob wyman
Which is the preferred feed?
Some sites are beginning to serve their feeds via intermediaries like FeedBurner. They are doing this, in part, to make it easier for them to get better statistics on their use of the feeds, to off-load bandwidth requirements, or to take advantage of the advertising insertion and management programs of the intermediaries. However, many of today's intermediaries require that program participants manage a base feed on their own sites that is later copied to the intermediary. This is the approach taken by FeedBurner among others. Whether or not the intermediaries require that a feed be maintained on the site, this is usually required if only because there will be people who are reading the feed and there is no means to notify them, within the feed, that a new preferred source of the feeds is available. For instance, the Typepad site blog.deeje.tv has two feeds generated by Typepad: http://blog.deeje.tv/musings/atom.xml http://blog.deeje.tv/musings/index.rdf and it has a feed generated by FeedBurner: http://feeds.feedburner.com/deeje/musings Now, my assumption is that the owner of blog.deeje.tv probably would prefer that people read his FeedBurner feed rather than the TypePad feeds. Evidence of this can be seen in that the autodiscovery links on the page point to the FeedBurner feeds. However, while the links currently point to FeedBurner, they have not always pointed there. At some point in the past, the owner of this blog decided to prefer the FeedBurner service over Typepad for feed services. At some point in the future, the same owner might wish to drop the FeedBurner service in favor of some other service or perhaps just go back to Typepad's normal feeds. The problem, of course, is that there is no existing mechanism by which these changes in preferred feeds can be indicated in either an Atom or RSS file. 
The result is that any software system that started reading the Atom or RDF feeds provided by Typepad before this blog started using FeedBurner will continue to read the Typepad feeds in the future. Similarly, any system currently reading the FeedBurner feeds is likely to continue reading those feeds in the future. One could argue that feed reading software should, on some regular schedule, re-scan the alternate site for a feed to see if the autodiscovery links have changed. However, this is a pretty crude solution. It would be much, much better to allow a feed to contain data that explicitly identifies a preferred alternative source. Supporting a means to identify a preferred alternative source would greatly improve the mobility of feeds across the network and would avoid the current problem of potentially pinning someone down to a feed delivery service simply because of historical accident. If I want to move my feeds from Typepad to FeedBurner, I should be able to without having to worry about leaving behind everyone who had ever started reading my Typepad feeds. Similarly, if I later decide that I want to move off FeedBurner, there should be a way to point people to the location of my preferred feeds. bob wyman
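[Editor's sketch: a reader honoring an in-feed preferred-source indicator would follow the pointer chain when resubscribing, with a guard against cycles. No such link relation exists in Atom 1.0; the rel value, function, and data shapes below are all hypothetical illustrations of the proposal.]

```python
def resolve_preferred(feed_links: dict, subscribed_url: str,
                      max_hops: int = 5) -> str:
    """Follow a hypothetical in-feed preferred-source pointer chain.
    feed_links maps a feed URL to its declared preferred URL, or
    None when the feed declares no preference. Cycles and overly
    long chains fall back to the last good URL."""
    seen = {subscribed_url}
    url = subscribed_url
    for _ in range(max_hops):
        nxt = feed_links.get(url)
        if nxt is None or nxt in seen:  # end of chain, or a loop
            return url
        seen.add(nxt)
        url = nxt
    return url
```

This would let a publisher move from Typepad to FeedBurner (and later away again) without stranding subscribers on the old URL.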
RE: Which is the preferred feed?
Anne van Kesteren wrote: Sites could also use a HTTP 302 link on their own site that points to FeedBurner in the end. When FeedBurner dies or when they no longer have desire to use the service, they switch the location of the temporary redirect and all is fine. While 302 is an obvious technical solution, it just doesn't do the job. HTTP's 302 is just a bit too absolute... For instance, if I'm trying to push people from my Atom 0.3 feed to my Atom V1.0 feed, it is likely that there will be many readers who don't know how to process Atom V1.0 correctly -- at least initially. They should be free to fall back to the Atom 0.3 feed until they learn the new format. Similarly, if I have readers who use one of the Mac-based readers that only read RSS, it becomes problematic to force them to read my Atom V1.0 feeds... It should also be noted that the ability to change HTTP response codes is not something which is typically provided to many bloggers today. I'm aware of no blog hosting services that allow for customer-requested 302 status values. Even on the one-user systems, I think it's pretty hard for normal folk to figure out how to make the modifications needed to return a 302 for some files. We should also realize that business issues are likely to make it difficult for people to use 302-based solutions. For instance, a site that provided intermediary feed serving might not wish to make it easy for people to migrate their feeds away from their service. They might *like* the idea that switching costs are very high... Thus, they might simply refuse (on some technical grounds) to allow users who are moving to a new service to get 302-forwarding on their feeds. On the other hand, if the Atom format itself contained a means of redirecting to preferred feeds, and if the spec said that such data MUST NOT be removed when a feed is copied, etc., then one could essentially force vendors to support feed mobility. 
(Yes, there would be loopholes.) Normally, I wouldn't argue for replicating an HTTP feature inside the feeds; however, I think that what I'm talking about here is not really what 302 was intended to provide. In any case, this may be looked at as a layering issue. 302 provides hard redirection at the HTTP level; a preferred feed indicator provides soft redirection at the application level. Implementation of similar services in multiple layers of the stack is a reasonable thing to do as long as the semantics vary at least slightly between the layers and the reasons for the variances are related to the nature of the layers. bob wyman
RE: PaceAllowDuplicateIDs
Graham wrote: Does anyone remember why having the same id in a feed is a bad idea? Because instead of a fixed model where a feed is a stream of entries each with their own id, it is now a stream of entries each of which does not have its own id, but shares it with similar entries. This is bullshit. I completely disagree on this. I think the problem here is people focusing too much on characteristics of the feed when the real issue here is Entries. Like I've said in the past, It's about the Entries, Stupid! (don't take offense...) As long as we allow entries to be updated, it is inevitable that the stream of entries that is created over time will contain instances of entries that share common atom:id values. The only question here is whether or not we're willing to allow a feed document to *accurately* represent the stream of entries -- as they were created -- or whether we insist that the feed document censor the history of the stream by removing old instances of updated entries before allowing updates to be inserted. The reality is that no matter which decision we make in this case, any useful aggregator must have code to deal with multiple instances of entries that share the same atom:id. This is the case since even if we don't permit duplicate ids in a single instance of a feed document, we would still permit duplicate ids *over time*. Because duplicate ids appear, over time, whenever you update an entry, the aggregator has to have all the logic needed to handle them in the *stream* of entries that it reads -- over time. This issue only becomes interesting if we try to provide special rules for the handling of data within a single instance of a feed document. The reality is, however, that any aggregator that actually pays attention to these special case rules is going to either get more complex (since it can't simply treat everything as a stream of entries) or it will get confused (since folk will intentionally or unintentionally create duplicate ids). 
This ban on duplicate ids provides no benefit for aggregators, it makes feed producers more complex, it tempts aggregator or client writers to do dangerous things, it forces deletion of data that is useful to some people for some applications, it puts too much emphasis on feeds when we should be working on entries, etc... It is a really bad thing to do. bob wyman
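[Editor's sketch: the "stream of entries" model argued for above needs no special case for duplicate ids within one feed document -- the same fold handles instances arriving within a document or across documents over time. The function and data shapes are hypothetical, and the timestamp comparison assumes uniform UTC RFC 3339 strings.]

```python
def fold_stream(entries: list[dict]) -> dict:
    """Treat everything -- within or across feed documents -- as a
    single stream of entry instances, keeping the latest instance
    seen for each atom:id. Duplicate ids inside one document are
    handled by exactly the same logic as updates across documents."""
    latest = {}
    for e in entries:
        cur = latest.get(e["id"])
        if cur is None or e.get("updated", "") >= cur.get("updated", ""):
            latest[e["id"]] = e
    return latest
```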
RE: entry definition
Henry Story wrote: An Atom Entry is a resource (identified by atom:id) whose representations (atom:entry) describe the state of a web resource at a time (the link alternate). I think that if this is not 100% correct then it is at least very close to whatever correct actually is. bob wyman
RE: Autodiscovery
Sjoerd Visscher wrote: [HTML 4.01 says:] This attribute describes the relationship from the current document to the anchor specified by the href attribute. The value of this attribute is a space-separated list of link types. But, if you copy HTML from one document to another, or you construct an HTML document from parts, you risk carrying anchor (a) tags with rel attributes from one document to another. If I quote some HTML in a new HTML document and the quoted HTML includes rel=alternate in an a tag, are we really saying that the presence of rel=alternate in the quoted text establishes a relationship for the new HTML document as a whole? Personally, I think there is a serious scoping problem here. We've got attributes of separable components of a page establishing metadata for the page as a whole. Not good. bob wyman