I hope it won't be thought rude for me to answer your questions to Tim.

On 5 May 2005, at 10:52, Graham wrote:
On 5 May 2005, at 5:02 am, Tim Bray wrote:

[I have added the ids Tim mentioned he forgot]

<feed><title>My Portfolio</title>
 ....
 <entry>
     <title>MSFT</title>
     <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2005-05-03T10:00:00-05:00</updated>
     <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
 </entry>
 <entry>
    <title>MSFT</title>
    <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2005-05-03T11:00:00-05:00</updated>
     <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
 </entry>
 <entry>
    <title>MSFT</title>
    <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2005-05-03T12:00:00-05:00</updated>
     <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
 </entry>
</feed>

So with these additions, let's look at your question:

Tim, model this as a blog first. Is it:
a) One entry that's being updated?
b) Hourly new postings with the latest price?

Given that the ids are the same, it is now clear that we have situation a).


See, I think it's b). Which under any sensible circumstance would count as new entries, and therefore get new ids. You're trying to use atom:id as a category system here.

No, this is just the way resources on the internet work. If you go to <http://google.com/>
you will get a different web page every day. A resource can have different representations
at different times. Google could of course architect its web site differently and have
<http://google.com/> be simply a redirect to a dated resource that would be unchanging.
Both work.


Let's say I post a new picture of my cat every day. Should all my blog entries have the same id?

If all your cat blog entries had the same id, and you posted a new picture of your cat every
day, then you should expect that some aggregators or clients might dump all but the latest
version of your entry.


So it depends on what you want.

If you want a feed of how your cat changed, and you want to make sure that people will be
able to see all the changes to your cat, then you would have a different entry for
each of the cat states, and each of these would have its own "alternate" link.
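For example, a sketch of such a feed (the uuids and the example.org links are made up
for illustration):

<feed><title>My Cat</title>
 ....
 <entry>
     <title>My cat on Monday</title>
     <!-- a fresh id for each state of the cat -->
     <id>urn:uuid:aaaaaaaa-0000-4000-8000-000000000001</id>
     <updated>2005-05-02T09:00:00-05:00</updated>
     <link rel="alternate" href="http://example.org/cat/2005-05-02"/>
 </entry>
 <entry>
     <title>My cat on Tuesday</title>
     <id>urn:uuid:aaaaaaaa-0000-4000-8000-000000000002</id>
     <updated>2005-05-03T09:00:00-05:00</updated>
     <link rel="alternate" href="http://example.org/cat/2005-05-03"/>
 </entry>
</feed>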


If you want an entry about what you think of your cat today, or what your cat looks like
today, then you would have one entry with the same id changing on a daily basis. Aggregators
could choose to keep the older versions, or they could choose to discard them.
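A minimal sketch of that second option (the uuid is again made up; only atom:updated and
the content change from day to day):

 <entry>
     <title>My cat today</title>
     <!-- the one id this entry keeps for its whole life -->
     <id>urn:uuid:bbbbbbbb-0000-4000-8000-000000000001</id>
     <updated>2005-05-03T09:00:00-05:00</updated>
     <content>Tuesday's picture of the cat</content>
 </entry>

The next day the feed would carry an entry with the same id, a new <updated> value, and
Wednesday's picture.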


Technical problems:
The problem with multiple ids is that we don't have a date element that provides a definitive answer to the question, "What is the current version?", which 99% of the time is all an aggregator needs.

You can never know, for any resource on the net, what the "current version" is. This is due
to limits such as the speed of light, the speed at which data can travel through the pipes
across the world, etc...


But it does not matter. All you may be interested in at any point in time is what is the
latest version that *you* have available. You can't do better than that. So if you get a
feed with two entries with the same id, you just look at the time stamp and keep the latest.
If you find the same id in another trusted feed, you make the same comparison.
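To make that concrete with the portfolio feed above: if your aggregator holds the 10:00
representation and then receives the 11:00 one (from the same feed or from another trusted
feed), the comparison looks like this:

 <!-- already in the aggregator's memory -->
 <entry>
     <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2005-05-03T10:00:00-05:00</updated>
     <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
 </entry>
 <!-- newly received: same id, later atom:updated, so this is the one to keep -->
 <entry>
     <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2005-05-03T11:00:00-05:00</updated>
     <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
 </entry>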


For example, what happens if I retract an update to an entry, and presumably roll back atom:updated? The new version stays? If so, the spec of atom:updated needs changing.

There are two situations here:
A. You remove the later entry representation that you placed in your feed. You can do this,
but it won't stop people who have already retrieved your feed from having received the old
representation of the feed, with the non-retracted entry. Remember that the internet has a
memory, both long term [1] and short term. For example all the caches in the world that this
representation may have travelled through may also contain that representation. The more
intelligent caches (such as the one in an aggregator) may, on later retrieving your altered
feed and comparing the entry therein with the one they have in memory, decide to keep the
one they have in memory, since that is the later version of your entry.
B. You make clear that your retraction is an event, and so you add a new entry with that id
(you may, but need not, remove the old entry representation (the <entry>....</entry>)) and
with a new time stamp, but with the old content.
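A sketch of B., again using the portfolio feed: suppose the 11:00 quote was mistaken and
you retract it at, say, 13:00 (a made-up time) by publishing the 10:00 content under a
fresh time stamp:

 <entry>
     <title>MSFT</title>
     <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <!-- a new time stamp, so this version supersedes the 11:00 one... -->
     <updated>2005-05-03T13:00:00-05:00</updated>
     <!-- ...but carrying the old, pre-retraction content -->
     <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
 </entry>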


I suggest that B. would be the more honest and less confusing option to the readers of your feed.

I see you have the constraint "Their atom:updated timestamps SHOULD be different, and processing software SHOULD regard entries with duplicate atom:id and atom:updated values as evidence of an error in the feed generation". Does this apply temporally as well as spatially? For example, if the content changes the second time I load something, but the atom:updated doesn't, is that an error?

As I see it, Tim's language is currently restricted to what happens with two entries with the same id
in the same feed document. What happens across feed documents is not defined here.


I myself think that having entries with different contents but the same id across feeds would be
confusing to receiving software, and would be close to being an error. I.e., if you do that then
you should not be surprised if your clients behave erratically: either ignoring the second version,
or flagging an error, or keeping the second entry over the first.
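To illustrate the pathological case: two representations with the same id and the same
atom:updated but different content (the second set of figures is made up), which gives
receiving software nothing to order them by:

 <!-- in one feed document -->
 <entry>
     <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2005-05-03T12:00:00-05:00</updated>
     <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
 </entry>
 <!-- in another feed document: same id, same atom:updated, different content -->
 <entry>
     <id>urn:uuid:1335c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2005-05-03T12:00:00-05:00</updated>
     <content>Bid: 25.05 Ask: 25.10 Last: 25.05</content>
 </entry>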


But then this would be the case for any similar situation. Imagine that your bank publishes xml
of the history of your accounts, and changes the content for a particular month, without notifying
you that anything has changed. I think this would be a cause for concern, and would justify
you looking into it.


In any case this is something we may or may not define. It is not a problem with PaceAllowDuplicateIDs. In fact you have just uncovered something that was problematic before.


Again, atom:updated falls short for this purpose.

No. You have just shown that if you make consequential changes to an entry and don't change
your atom:updated field, you will end up with interoperability issues.


Finally, at pubsub, what happens when they download an entry from one feed, then the user edits it, but doesn't modify atom:updated,

Either the change they made is "significant" or it is not. If it is a significant change then
by not changing the atom:updated field the user will have done something other than what he thought
he was doing. For by not changing the date he is allowing receiving software to decide for itself
whether it wishes to keep or drop the change. If it is not a significant change, then the receiving
software won't be doing anything problematic by either dropping the later version received or keeping it.


then they download the new entry from a second feed associated with the site. Different content, identical atom:ids, identical atom:updated => Invalid feed. They're not in any better position than they were before. This doesn't even solve the problem it's meant to.

"If an Atom Feed Document contains multiple entries with the same atom:id, software MAY choose to display all of them or some subset of them"

What does this even mean, other than "atom:id is meaningless, ignore it"?

If you try out the latest version of BlogEd you can see for yourself. Though BlogEd is a blog
editor, it keeps all the changes you make to your entry in its database, allowing you to visit
previous versions.


It's still in beta and buggy, but that aspect of the functionality works well.
Instructions for installing with jnlp are available here:
http://blogs.sun.com/bblfish


So that would be one way to view the previous version of an entry. There will be many others.
Many readers will be happy to drop the older versions, but many will be happy to keep them.




I looked around and failed to find how we claimed we were going to do that while still forbidding duplicates, but it's possible I missed that.


Duplicate ids is a constraint of the atom:feed element. Use a different top level element, atom:archive, for archives.

Well, that seems like a very complicated way of solving a problem where allowing entries with
duplicate ids in a feed document from the start would be much simpler. If you are going to
allow <archive> documents to keep duplicates, then why not just allow them in <feed>s and be done with it?


Henry Story
http://bblfish.net/blog/

[1] http://bblfish.net/blog/page5.html#44


