I would like to be able to analyze my document collection (~1200 documents) and discover good "buckets" of categories for them. I'm pretty sure this is termed Document Clustering .. finding the emergent clumps the documents fall naturally into judging from their term vectors.

Looking at the discussion that flared roughly a year ago (last message 2003-11-12) with the subject Document Clustering, it seems Lucene should be able to help with this. Has anyone had success with this recently?

Last year it was suggested Carrot2 could help, and it would even produce good labels for the clusters. Has this proven to be true? Our goal is to use clustering to build a nifty graphic interface, probably using Flash.

Thanks for any pointers.

Owen


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to