I need to auto-categorize a large number of documents, mostly news articles from major news sources (nytimes, npr, abcnews, etc.). Any suggestions?

Lucene in Action suggests building a category vector for each category from a set of sample documents, then comparing each new document against every category vector and assigning it to the closest one. The approach seems pretty simple (judging from other papers I have read on text categorization), but maybe you guys know of something out there that already does this using Lucene/Solr.

Thanks!
Maria