Graham wrote:
> I don't seriously believe any aggregator that uses the content hash
> approach would survive very long in the marketplace without being
> buried under user complaints. Most (the ones I know of) either use
> identifiers or, failing that, some subset of the elements.

The identifier-based approach works wonderfully for a single-user aggregator with a "feed" focus, as most personal aggregators have today. However, as pointed out elsewhere, identifier-based approaches don't work for cross-feed duplicate detection because of the ease with which denial-of-service attacks can be launched: a hostile feed need only republish another entry's identifier to have the legitimate entry suppressed as a duplicate.

If a subset of elements is to be used in duplicate detection, what is that subset? Can or should this subset be commonly known? It seems to me that it is important enough to the Atom ecosphere that it might even make sense to have it in the spec as an interoperability note, i.e. "Entries will be considered duplicates if...."
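For concreteness, here is a minimal sketch of what a subset-based duplicate key might look like. The particular subset (title, link, author) and the normalization rules are my assumptions, not anything in the spec; settling on such a subset is exactly what the question above asks:

    import hashlib
    import unicodedata

    def normalize(text: str) -> str:
        # Case-fold and strip whitespace so trivial variations in a
        # republished entry don't defeat the comparison.
        return unicodedata.normalize("NFC", text or "").strip().casefold()

    def duplicate_key(entry: dict) -> str:
        # Derive a cross-feed duplicate key from a subset of elements
        # rather than from the (spoofable) identifier. The subset
        # chosen here is illustrative only.
        subset = (
            normalize(entry.get("title", "")),
            normalize(entry.get("link", "")),
            normalize(entry.get("author", "")),
        )
        return hashlib.sha256("\x1f".join(subset).encode("utf-8")).hexdigest()

    seen: set[str] = set()

    def is_duplicate(entry: dict) -> bool:
        key = duplicate_key(entry)
        if key in seen:
            return True
        seen.add(key)
        return False

Whatever the subset, the point stands: unless it is commonly known, two aggregators will disagree about which entries are duplicates.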
> Shrook uses an adaptive subset that means that if one element is
> unreliable it uses the others.

Can you describe your algorithm? What do you consider "unreliable" to mean?
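To make the question concrete, here is one possible reading of "adaptive subset". This is emphatically not a description of Shrook's actual algorithm (that is what I am asking for); it guesses that "unreliable" means an id that has been seen attached to materially different content, and it reuses duplicate_key() from the sketch above:

    # Hypothetical: one reading of "adaptive subset", not Shrook's
    # actual algorithm. An id is treated as "unreliable" once it has
    # been seen attached to materially different content.
    id_to_content: dict[str, str] = {}

    def adaptive_key(entry: dict) -> str:
        content_key = duplicate_key(entry)  # subset hash from above
        entry_id = (entry.get("id") or "").strip()
        if entry_id:
            previous = id_to_content.setdefault(entry_id, content_key)
            if previous == content_key:
                # The id has only ever named this content; trust it.
                return "id:" + entry_id
            # Same id, different content: the id is unreliable, so
            # fall back to the element subset.
        return "subset:" + content_key

Is that roughly the idea, or does Shrook judge reliability some other way?

bob wyman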