I need to automatically categorize a large number of documents. They are mostly
news articles from major news sources (nytimes, npr, abcnews, etc.).
Any suggestions?
Lucene in Action suggests using a set of documents to build category vectors, 
then comparing each new document to each of those vectors and picking the 
closest one.
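
To make sure I understand the idea, here is a rough sketch of how I picture it.
This is plain Python, not Lucene code and not the book's implementation. The 
function names, the whitespace tokenizer, the raw term counts, the use of 
cosine similarity as the "closeness" measure, and the tiny training set are 
all my own assumptions for illustration.

```python
import math
from collections import Counter

def term_vector(text):
    """Map lowercased whitespace tokens to raw term counts (naive tokenizer)."""
    return Counter(text.lower().split())

def build_category_vectors(training_docs):
    """Sum the term vectors of every training document in each category."""
    vectors = {}
    for category, text in training_docs:
        vectors.setdefault(category, Counter()).update(term_vector(text))
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def categorize(text, category_vectors):
    """Return the category whose vector is closest to the document."""
    doc = term_vector(text)
    return max(category_vectors, key=lambda c: cosine(doc, category_vectors[c]))

# Made-up training data, only to show the shape of the input.
training_docs = [
    ("sports", "the team won the match"),
    ("politics", "the senate passed the bill"),
]
category_vectors = build_category_vectors(training_docs)
label = categorize("senate votes on new bill", category_vectors)
```

A real setup would presumably need stop-word removal and some term weighting 
(e.g. tf-idf) so that common words like "the" don't dominate the vectors.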
The approach seems pretty simple (judging from other papers I've read on text 
categorization), but maybe you guys know of something out there that already 
does this using Lucene/Solr.
Thanks!
Maria

