Antone Roundy wrote:
> If PubSub is subscribed to the feed pointed to by 
> atom:link[@rel='self'], then can't you simply drop any entries 
> claiming to be from that feed but found at a different URI?  You'll be 
> getting those entries from the source if they're legitimate, so is 
> there any reason to accept them from elsewhere?  Or are you keeping 
> them in case they contain additional information added by someone else?
        Certainly, the content of an entry might differ depending on its
source. PubSub, for instance, now adds what we consider to be important
information to entries, and we'll be doing more of that in the future. But
that's only one of the reasons for concern here.

        The mere fact that an entry has been published in a secondary feed
is, in some cases, important information in itself, and in those cases we
should deliver content that would otherwise be filtered out as duplicative.
        Consider something like Scoble's well-known "LinkBlog", which
contains copies of entries that he has found interesting while reading
through blogs. If we followed a policy of discarding everything sourced
outside the LinkBlog itself, we would never deliver any content to anyone
who subscribed explicitly to receive data from Scoble's LinkBlog (i.e. a
PubSub weblog subscription such as:
"SOURCE:www.scobleizer.com/linkblog").
        What we are considering doing, to handle link blogs and other cases
where copied entries are found, is flagging items as "duplicates" when we
discover them and then attaching a default "no-duplicates" predicate to all
normal subscriptions. Subscriptions that include "SOURCE:" predicates,
however, would have an "allow duplicates" predicate attached instead.
(Yes, users could override this.) This means that you wouldn't get
duplicates with a general subscription, but you *would* receive content
that we would normally filter out if you subscribed to a specific feed.
Internally, we would process all new or changed entries that we find in any
feed, but whether a duplicate is delivered depends on the user's
subscription.
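        To make the rule concrete, here is a minimal sketch in Python of
the delivery decision described above. The names (Entry, Subscription,
should_deliver, and the predicate fields) are hypothetical illustrations,
not PubSub's actual internals.

```python
# Hypothetical sketch of the duplicate-delivery rule.
# Assumption: the matching engine has already marked copied entries
# as duplicates; delivery then depends only on the subscription.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    entry_id: str
    source_feed: str
    is_duplicate: bool = False  # set when a copy of this entry was seen elsewhere


@dataclass
class Subscription:
    query: str
    has_source_predicate: bool = False   # True for "SOURCE:..." subscriptions
    allow_duplicates: Optional[bool] = None  # None = use the default rule

    def effective_allow_duplicates(self) -> bool:
        # An explicit user override wins; otherwise SOURCE: subscriptions
        # get "allow duplicates" and general subscriptions get the
        # default "no-duplicates" predicate.
        if self.allow_duplicates is not None:
            return self.allow_duplicates
        return self.has_source_predicate


def should_deliver(entry: Entry, sub: Subscription) -> bool:
    # All new or changed entries are processed; duplicates are only
    # suppressed for subscriptions that disallow them.
    if entry.is_duplicate and not sub.effective_allow_duplicates():
        return False
    return True
```

With this sketch, a duplicate entry is suppressed for a general
subscription but delivered to a "SOURCE:" subscription, unless the user
overrides the default either way.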
        Does this sound complicated? Well, it is... But, providing an
intermediary like we do just isn't quite as simple as building a desktop
aggregator.
        There are a number of other similar situations that are problematic.

                bob wyman
