On Wed, Sep 30, 2009 at 7:56 AM, Tim Cornwell <tc...@cornell.edu> wrote:
> 41,000 sites and 21 million pages (http://www.ablegrape.com/en/about.html)
> is a lot of vetting.
> ...
> Authoritative vetting of a large volume of resources is a hard problem. I
> haven't seen any good solutions, but am leaning toward crowd-sourcing with
> an authoritative crowd. :-)
>
> Do you have any additional information on how AbleGrape vets these?
I can only guess, but I would think it's probably a combination of automatic and manual vetting: crawl the links from known "good" sites; filter out bad sites; filter out off-topic sites; manually add newly discovered sites not already in the index; manually remove inappropriate sites that somehow made it into the index; adjust the algorithms; and try to build a user community and solicit feedback.

(I once reported inappropriate results coming from a wine producer's website that had been taken over by vandals, and AbleGrape removed it from the index almost immediately.)

Keith