The LDA implementation more or less clusters on terms to generate
topics. It sounds like you want some co-occurrence analysis. I'm not
sure the clustering algorithms are the best fit for that, but perhaps
others have insight. I could imagine doing this with HBase or Pig by
keeping a matrix in which each cell tracks the number of times two
terms appear in the same document (or even within some window in a
document).
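
To make the matrix idea concrete, here is a minimal in-memory sketch in
plain Python. It is not HBase, Pig, or anything Mahout ships; the
function name cooccurrence_counts and its window parameter are made up
for illustration, and it assumes the documents are already tokenized
and small enough to fit in memory. Each key of the returned counter
plays the role of one matrix cell.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, window=None):
    """Count how many documents contain each unordered pair of terms.

    documents: iterable of token lists, one list per document
               (assumed to be tokenized already).
    window:    None counts a pair when both terms occur anywhere in the
               same document; an integer counts a pair only when the two
               terms occur within that many token positions of each other.
    Hypothetical helper for illustration, not a Mahout API.
    """
    counts = Counter()
    for tokens in documents:
        if window is None:
            # Every distinct pair of terms in the document, in sorted order.
            pairs = set(combinations(sorted(set(tokens)), 2))
        else:
            pairs = set()
            for i, left in enumerate(tokens):
                # Look ahead at most `window` tokens from position i.
                for right in tokens[i + 1 : i + 1 + window]:
                    if left != right:
                        pairs.add(tuple(sorted((left, right))))
        # Using a set means each pair is counted once per document.
        counts.update(pairs)
    return counts
```

Calling counts.most_common(20) on the result lists the pairs that share
the most documents, which may be enough for a first look before reaching
for a distributed version of the same counting.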
On Sep 29, 2009, at 8:57 AM, Ole-Martin Mørk wrote:
Hi.
I have been using org.apache.mahout.utils.vectors.lucene.Driver
and org.apache.mahout.clustering.kmeans.KMeansDriver to cluster
documents in our Lucene index and it works great! I am wondering
though, is it possible to use Mahout to cluster terms?
I want to cluster terms that often appear in the same documents.
Thank you.
--
Ole-Martin Mørk
http://twitter.com/olemartin
http://flickr.com/olemartin
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search