Antone Roundy wrote:
> If PubSub is subscribed to the feed pointed to by
> atom:[EMAIL PROTECTED]'self'], then can't you simply drop any entries
> claiming to be from that feed but found at a different URI? You'll be
> getting those entries from the source if they're legitimate, so is
> there any reason to accept them from elsewhere? Or are you keeping
> them in case they contain additional information added by someone else?

Certainly, the content of an entry might be different depending on its source. PubSub, for instance, now adds what we consider to be important information to entries and we'll be doing more of that in the future. But that's only one of the reasons for concern here.

The mere fact that an entry has been published in a secondary feed is, in some cases, important information, and in some cases we should be delivering content which in other situations would be filtered out as duplicative. Consider something like Scoble's well-known "LinkBlog", which contains copies of things that he has found to be interesting when reading through blogs. If we followed the policy of discarding everything that was sourced somewhere outside the LinkBlog itself, we would never deliver any content to anyone who subscribed explicitly to receive data from Scoble's LinkBlog (e.g., a PubSub weblog subscription such as "SOURCE:www.scobleizer.com/linkblog").

What we are considering doing to handle things like link blogs and other cases where copied entries are found is flagging items as "duplicates" when we discover them and then attaching a default "no-duplicates" predicate to all normal subscriptions. However, subscriptions that include "SOURCE:" predicates would have an "allow duplicates" predicate attached to them. (Yes, users could override this.) This means that you wouldn't get duplicates with a general subscription; however, you *would* receive content that we would normally filter out -- if you subscribed to a specific feed. Internally, this means that we would process all new or changed entries that we find in any feed, but whether or not a duplicate is delivered to the user depends on the user's subscription.

Does this sound complicated? Well, it is... But providing an intermediary like we do just isn't quite as simple as building a desktop aggregator. There are a number of other similar situations that are problematic.

bob wyman
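
For readers who want to see the duplicate-flagging idea in concrete form, here is a minimal Python sketch. It is not PubSub's implementation. Everything in it is an assumption made for illustration: the `Entry` and `Subscription` types, the `should_deliver` function, the rule that an entry counts as a duplicate when the feed it claims as its source differs from the feed it was found in, and the simplification that a non-"SOURCE:" subscription matches every entry.

```python
from dataclasses import dataclass
from typing import Optional

SOURCE_PREFIX = "SOURCE:"


@dataclass
class Entry:
    entry_id: str
    source_feed: str  # feed the entry claims to come from (assumed field)
    found_in: str     # feed in which the entry was actually found

    @property
    def is_duplicate(self) -> bool:
        # Assumption: a copy is an entry found outside its claimed source feed.
        return self.source_feed != self.found_in


@dataclass
class Subscription:
    query: str
    # None means "use the default"; True/False is an explicit user override.
    allow_duplicates: Optional[bool] = None

    @property
    def source(self) -> Optional[str]:
        # Feed named by a "SOURCE:" predicate, or None for a general query.
        if self.query.startswith(SOURCE_PREFIX):
            return self.query[len(SOURCE_PREFIX):]
        return None

    def allows_duplicates(self) -> bool:
        if self.allow_duplicates is not None:
            return self.allow_duplicates
        # Default: "allow duplicates" only when a SOURCE: predicate is present.
        return self.source is not None


def should_deliver(entry: Entry, sub: Subscription) -> bool:
    # A SOURCE: subscription only matches entries found in that feed.
    if sub.source is not None and sub.source != entry.found_in:
        return False
    # Every entry is processed; the duplicate flag is only checked here.
    if entry.is_duplicate and not sub.allows_duplicates():
        return False
    return True


original = Entry("e1", "example.org/blog", "example.org/blog")
copy = Entry("e1", "example.org/blog", "www.scobleizer.com/linkblog")
general = Subscription("atom")
linkblog = Subscription("SOURCE:www.scobleizer.com/linkblog")
```

With these definitions, the general subscription receives `original` but not `copy`, while the LinkBlog subscription receives `copy`. Setting `allow_duplicates` explicitly on a subscription stands in for the user override.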