Another, back-door way to do this is to transpose the document set so that each term has a list of the documents it appears in. Index that and cluster it just as if the terms were ordinary documents, and you will have a form of term clustering.
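
A rough sketch of the transposition idea, in plain Python rather than Mahout. The toy k-means here only stands in for a real clusterer such as KMeansDriver, and the names term_document_vectors and kmeans are made up for illustration. In practice you would probably also want to weight or normalize the vectors before clustering.

```python
from collections import defaultdict


def term_document_vectors(docs):
    """Transpose the corpus: map each term to its per-document counts."""
    vectors = defaultdict(lambda: [0.0] * len(docs))
    for doc_id, tokens in enumerate(docs):
        for token in tokens:
            vectors[token][doc_id] += 1.0
    return dict(vectors)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Toy k-means over {name: vector} with deterministic farthest-first seeds."""
    names = sorted(vectors)
    centroids = [vectors[names[0]]]
    while len(centroids) < k:
        # Seed with the point farthest from its nearest existing centroid.
        farthest = max(
            names,
            key=lambda n: min(squared_distance(vectors[n], c) for c in centroids),
        )
        centroids.append(vectors[farthest])

    assignment = {}
    for _ in range(iterations):
        # Assign every term to its nearest centroid.
        assignment = {
            n: min(range(k), key=lambda i: squared_distance(vectors[n], centroids[i]))
            for n in names
        }
        # Move each centroid to the mean of its members.
        for i in range(k):
            members = [vectors[n] for n in names if assignment[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy corpus: each document is a list of already-tokenized terms.
docs = [
    ["lucene", "index", "search"],
    ["lucene", "search", "index"],
    ["kmeans", "cluster", "centroid"],
    ["cluster", "kmeans", "centroid"],
]

# Each term is now a "document" whose features are the original documents.
clusters = kmeans(term_document_vectors(docs), k=2)
for term, cluster_id in sorted(clusters.items()):
    print(cluster_id, term)
```

Terms that show up in the same documents get near-identical vectors after the transpose, so any document clusterer will group them together.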

On Tue, Sep 29, 2009 at 1:05 PM, Grant Ingersoll <[email protected]> wrote:

> The LDA implementation kind of clusters on terms to generate topics. It
> sounds like you want some co-occurrence analysis, I'm not sure that the
> clustering algorithms are best for that, but perhaps others have insight.
> I could imagine doing this with HBase or Pig and just keeping a matrix
> where each cell kept track of the number of times both terms appear in a
> document (or even within some window in a document).
>
> On Sep 29, 2009, at 8:57 AM, Ole-Martin Mørk wrote:
>
>> Hi.
>> I have been using org.apache.mahout.utils.vectors.lucene.Driver
>> and org.apache.mahout.clustering.kmeans.KMeansDriver to cluster documents
>> in our Lucene index and it works great! I am wondering though, is it
>> possible to use Mahout to cluster terms?
>>
>> I want to cluster terms that often appear in the same documents.
>>
>> Thank you.
>>
>> --
>> Ole-Martin Mørk
>> http://twitter.com/olemartin
>> http://flickr.com/olemartin
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search

--
Ted Dunning, CTO
DeepDyve
