Re: Using mahout to cluster terms in Lucene

Ted Dunning Tue, 29 Sep 2009 14:14:12 -0700

Another way to do this through the back door is to transpose the document
set so that you have a list of documents for each term.  Index this and
cluster it just as if it were normal documents and you will have a form of
term clustering.
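Here is a minimal Python sketch of the transposition trick, under a few assumptions. The documents are taken to be already tokenized into term lists; in practice they would come from the Lucene index, and this sketch uses neither Lucene nor Mahout. Each term becomes a "document" whose features are the ids of the documents it appears in. The terms are then clustered with a small spherical k-means written only for illustration. The function names `transpose`, `to_vectors` and `kmeans` are hypothetical. At scale, the transposed vectors would presumably be fed to the usual vector Driver / KMeansDriver pipeline instead of this toy `kmeans`.

```python
import math
from collections import defaultdict


def transpose(docs):
    """Invert a document -> terms corpus into term -> set of document ids.

    `docs` is a list of token lists (an assumed input format).
    """
    postings = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for term in tokens:
            postings[term].add(doc_id)
    return dict(postings)


def normalize(v):
    # Scale to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def to_vectors(postings, n_docs):
    # Each term becomes a binary "document" whose features are document ids.
    return {
        term: normalize([1.0 if d in ids else 0.0 for d in range(n_docs)])
        for term, ids in postings.items()
    }


def kmeans(vectors, k, iterations=10):
    """Spherical k-means over term vectors; returns term -> cluster index.

    Seeds are the first k terms in sorted order (a simple, deterministic choice).
    """
    keys = sorted(vectors)
    centroids = [vectors[key] for key in keys[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each term joins its most similar centroid.
        for key in keys:
            sims = [sum(a * b for a, b in zip(vectors[key], c)) for c in centroids]
            assignment[key] = max(range(k), key=lambda j: sims[j])
        # Update step: each centroid becomes the normalized mean of its members.
        for j in range(k):
            members = [vectors[key] for key in keys if assignment[key] == j]
            if members:
                centroids[j] = normalize([sum(col) / len(members) for col in zip(*members)])
    return assignment


if __name__ == "__main__":
    docs = [
        ["lucene", "index", "search"],
        ["lucene", "search", "query"],
        ["mahout", "kmeans", "cluster"],
        ["mahout", "cluster", "vector"],
    ]
    vectors = to_vectors(transpose(docs), len(docs))
    print(kmeans(vectors, k=2))
```

On this toy corpus, the Lucene-related terms and the Mahout-related terms land in separate clusters, because terms that share documents end up with similar document vectors.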


On Tue, Sep 29, 2009 at 1:05 PM, Grant Ingersoll <[email protected]> wrote:

> The LDA implementation kind of clusters on terms to generate topics.  It
> sounds like you want some co-occurrence analysis; I'm not sure that the
> clustering algorithms are best for that, but perhaps others have insight.
> I could imagine doing this with HBase or Pig and just keeping a matrix
> where each cell kept track of the number of times both terms appear in a
> document (or even within some window in a document).
>
>
>
> On Sep 29, 2009, at 8:57 AM, Ole-Martin Mørk wrote:
>
>> Hi.
>> I have been using org.apache.mahout.utils.vectors.lucene.Driver
>> and org.apache.mahout.clustering.kmeans.KMeansDriver to cluster documents
>> in our Lucene index and it works great! I am wondering though, is it
>> possible to use Mahout to cluster terms?
>>
>> I want to cluster terms that often appear in the same documents.
>>
>> Thank you.
>>
>> --
>> Ole-Martin Mørk
>> http://twitter.com/olemartin
>> http://flickr.com/olemartin
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


-- 
Ted Dunning, CTO
DeepDyve
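The co-occurrence matrix that Grant describes in the quoted reply can be sketched in memory in Python. This is only a stand-in for the HBase or Pig job he mentions, and neither of those is used here. The sketch assumes documents arrive as token lists, and the function names are hypothetical. The document-level count is the number of documents that contain both terms. The windowed variant instead counts pairs of occurrences within `window` tokens of each other, so a single document can contribute more than once. Pairs are stored with their terms in sorted order, so the dictionary holds one triangle of the symmetric matrix.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_by_document(docs):
    """Map each unordered term pair to the number of documents containing both."""
    counts = Counter()
    for tokens in docs:
        # set() so a term repeated within one document is counted once per document
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def cooccurrence_in_window(docs, window):
    """Count occurrences of two distinct terms within `window` positions of each other."""
    counts = Counter()
    for tokens in docs:
        for i, a in enumerate(tokens):
            for b in tokens[i + 1 : i + 1 + window]:
                if a != b:
                    # Sort the pair so (a, b) and (b, a) share one matrix cell.
                    counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    docs = [
        ["lucene", "search", "index"],
        ["lucene", "search"],
        ["mahout", "cluster"],
    ]
    print(cooccurrence_by_document(docs).most_common(1))
    print(cooccurrence_in_window(docs, window=1))
```

The rows of such a matrix could themselves serve as term vectors. Clustering them is then close to the transposition approach, only with term-term features instead of term-document features.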
