Re: Using mahout to cluster terms in Lucene

Grant Ingersoll Tue, 29 Sep 2009 13:05:52 -0700

The LDA implementation kind of clusters on terms to generate topics. It sounds like you want some co-occurrence analysis. I'm not sure that the clustering algorithms are best for that, but perhaps others have insight. I could imagine doing this with HBase or Pig and just keeping a matrix where each cell keeps track of the number of times both terms appear in a document (or even within some window in a document).
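
To make the matrix idea concrete, here is a minimal single-machine sketch in Python. It is not Mahout, HBase, or Pig code; the function name cooccurrence_counts, the window parameter, and the sample documents are all made up for illustration. It assumes each document has already been tokenized into a list of terms, and it stores only the non-zero cells of the matrix, keyed by an alphabetically ordered term pair.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, window=None):
    """Count, per unordered term pair, the number of documents they share.

    documents: iterable of token lists, one list per document (assumed input).
    window: None counts a pair whenever both terms occur anywhere in the
            same document; an integer counts it only if the two terms occur
            within that many positions of each other.
    Each pair is counted at most once per document.
    """
    counts = Counter()
    for tokens in documents:
        if window is None:
            # Every pair of distinct terms in the document.
            pairs = set(combinations(sorted(set(tokens)), 2))
        else:
            pairs = set()
            for i, term in enumerate(tokens):
                # Look only at the next `window` tokens.
                for other in tokens[i + 1 : i + 1 + window]:
                    if other != term:
                        pairs.add(tuple(sorted((term, other))))
        counts.update(pairs)
    return counts


# Hypothetical sample data, not taken from any real index.
docs = [
    ["lucene", "index", "search"],
    ["mahout", "cluster", "lucene"],
    ["lucene", "search", "solr"],
]
counts = cooccurrence_counts(docs)
print(counts[("lucene", "search")])  # 2
```

Pairs with high counts are candidates for "terms that often appear in the same documents". At index scale the same counting would have to be distributed, which is where something like Pig or HBase would come in.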


On Sep 29, 2009, at 8:57 AM, Ole-Martin Mørk wrote:

Hi.
I have been using org.apache.mahout.utils.vectors.lucene.Driver
and org.apache.mahout.clustering.kmeans.KMeansDriver to cluster documents in
our Lucene index and it works great! I am wondering though, is it possible
to use Mahout to cluster terms?

I want to cluster terms that often appear in the same documents.

Thank you.

--
Ole-Martin Mørk
http://twitter.com/olemartin
http://flickr.com/olemartin


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
