Heh. What Ted said, but longer-winded.

On Tue, Sep 29, 2009 at 2:13 PM, Ted Dunning <[email protected]> wrote:

> Another way to do this through the back door is to transpose the document
> set so that you have a list of documents for each term. Index this and
> cluster it just as if it were normal documents and you will have a form of
> term clustering.
>
> On Tue, Sep 29, 2009 at 1:05 PM, Grant Ingersoll <[email protected]> wrote:
>
> > The LDA implementation kind of clusters on terms to generate topics. It
> > sounds like you want some co-occurrence analysis, I'm not sure that the
> > clustering algorithms are best for that, but perhaps others have insight.
> > I could imagine doing this with HBase or Pig and just keeping a matrix
> > where each cell kept track of the number of times both terms appear in a
> > document (or even within some window in a document).
> >
> > On Sep 29, 2009, at 8:57 AM, Ole-Martin Mørk wrote:
> >
> >> Hi.
> >> I have been using org.apache.mahout.utils.vectors.lucene.Driver
> >> and org.apache.mahout.clustering.kmeans.KMeansDriver to cluster
> >> documents in our Lucene index and it works great! I am wondering
> >> though, is it possible to use Mahout to cluster terms?
> >>
> >> I want to cluster terms that often appear in the same documents.
> >>
> >> Thank you.
> >>
> >> --
> >> Ole-Martin Mørk
> >> http://twitter.com/olemartin
> >> http://flickr.com/olemartin
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> > Solr/Lucene:
> > http://www.lucidimagination.com/search
>
> --
> Ted Dunning, CTO
> DeepDyve
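For illustration, here is a minimal in-memory sketch of the two suggestions in this thread. It is plain Python, not Mahout's API, and every function name in it is hypothetical. `transpose` builds the term -> documents view Ted describes; each term's document set is the "document" you would vectorize and cluster. `cooccurrence_counts` fills the pair-count matrix Grant describes: per document, or per sliding window of `window` consecutive tokens. The Jaccard overlap is a swapped-in similarity, chosen only to show why terms that share documents end up close together. At real scale this would run as Hadoop/Pig/HBase jobs rather than in dictionaries.

```python
from collections import defaultdict
from itertools import combinations


def transpose(docs):
    """Turn {doc_id: [terms]} into {term: set(doc_ids)}: the 'back door' view."""
    term_docs = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in set(terms):
            term_docs[term].add(doc_id)
    return dict(term_docs)


def cooccurrence_counts(docs, window=None):
    """Count, per unordered term pair, how many documents contain both terms.

    If window is given, a pair only counts when the two terms occur at
    positions less than `window` apart (window=2 means adjacent tokens).
    Each document contributes at most 1 to any pair.
    """
    counts = defaultdict(int)
    for terms in docs.values():
        pairs = set()
        if window is None:
            pairs.update(combinations(sorted(set(terms)), 2))
        else:
            for i, a in enumerate(terms):
                for b in terms[i + 1 : i + window]:
                    if a != b:
                        pairs.add(tuple(sorted((a, b))))
        for pair in pairs:
            counts[pair] += 1
    return dict(counts)


def jaccard(term_docs, a, b):
    """Overlap of two terms' document sets (an assumed similarity measure)."""
    da, db = term_docs[a], term_docs[b]
    return len(da & db) / len(da | db)


if __name__ == "__main__":
    docs = {
        "d1": ["mahout", "lucene", "kmeans"],
        "d2": ["mahout", "kmeans"],
        "d3": ["lucene", "solr"],
    }
    term_docs = transpose(docs)
    print(term_docs)
    print(cooccurrence_counts(docs))
    print(cooccurrence_counts(docs, window=2))
    print(jaccard(term_docs, "mahout", "kmeans"))
```

The two views carry the same information. A co-occurrence cell is the size of the intersection of two rows of the transposed matrix, so clustering the transposed term vectors and thresholding co-occurrence counts both group terms that appear in the same documents.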
