Roger Benningfield wrote:
> We've got a mechanism that allows any user with his own domain
> and a text editor to tell us whether or not he wants us messing with
> his stuff. I think it's foolish to ignore that.

The problem is that we have *many* such mechanisms. Robots.txt is only one. Others have been mentioned on this list in the past. Others are buried in obscure posts that you really have to dig to find. How do we decide which mechanisms to use?

Also, since I don't think robots.txt was intended to be used for services like the aggregators we're discussing, I believe that for us to encourage people to use it in the way you suggest would be an abuse of the robots.txt system.
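(For what it's worth, honoring robots.txt is mechanically simple; the question is whether it's the right mechanism. A minimal sketch in Python's standard library follows — the "ExampleAggregator" user-agent token and the URLs are made up for illustration, not anything any real service uses.)

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve; the user-agent
# token "ExampleAggregator" is invented for this example.
robots_txt = """\
User-agent: ExampleAggregator
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The named aggregator is told to stay out entirely...
print(parser.can_fetch("ExampleAggregator", "http://example.com/index.xml"))  # False
# ...while any other client remains free to fetch the feed.
print(parser.can_fetch("SomeOtherClient", "http://example.com/index.xml"))  # True
```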
> Bob: What about FeedMesh? If I ping blo.gs, they pass that ping
> along to you, and PubSub fetches my feed, then PubSub is doing
> something a desktop client doesn't do.

Wrong. Some desktop clients *do* work like FeedMesh. Consider the Shrook distributed checking system[1]. FeedMesh and PubSub work very much like Shrook's desktop clients do. In the Shrook system, all the desktop clients report the updates they find back to a central service, which then distributes the update info to the other clients. The result is that the amount of polling is drastically reduced and the freshness of data is increased, since every client benefits from the polling of all the other clients. Although no single client might poll a site more often than once an hour, if you have 60 Shrook clients each polling once an hour, each client gets the effect of polling every minute...

The Shrook model is basically the same as the FeedMesh model, except that in FeedMesh you typically ask for info on ALL sites, whereas in Shrook you typically only get updates for a smaller, enumerated set of feeds. However, the number of feeds you monitor does not change the basic nature of the distributed checking system. Shrook and FeedMesh are, as far as I'm concerned, largely indistinguishable in this area. (There are some detail differences, of course. For instance, Shrook worries about client privacy issues that aren't relevant in the FeedMesh case.)

Remember, PubSub only deals with data from pings and from sites that have been manually added to our system. We don't do any web scraping and we don't follow links to find other blogs. Also, we filter out of our system any feeds that originate with services known to scrape web pages and inject data that the original publisher never intended to appear in feeds. (Often, people try to get around partial feeds by "filling in" the missing bits with content scraped from blogs' websites.)
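(The arithmetic behind that distributed-checking claim is easy to sketch. The function below is my own illustration, not code from Shrook or FeedMesh, and it assumes the clients' polls are spread evenly over the hour.)

```python
def effective_interval_minutes(clients: int, per_client_interval_minutes: float) -> float:
    """If each of `clients` clients polls once per `per_client_interval_minutes`
    minutes and shares what it finds, and the polls are spread evenly in time,
    the pool as a whole checks the feed this often (in minutes)."""
    return per_client_interval_minutes / clients

# 60 clients, each polling once an hour, together check once a minute.
print(effective_interval_minutes(60, 60))  # 1.0
```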
Thus, we filter out any feed that comes from a service like Technorati, since they scrape blogs and inject scraped content into feeds without the explicit approval or consent of the publishers of the sites they scraped.

bob wyman

[1] http://www.fondantfancies.com/apps/shrook/distfaq.php