Ted,

Some time back I thought about this idea, but I sensed one potential
problem with this approach: the resulting co-occurrence will be
bi-directional. For documents this property is fine, but for terms
it may not be desirable in some cases.
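
To make the symmetry concrete, here is a minimal sketch in plain Python
(not Mahout code; the toy corpus and the function name are made up for
illustration). It counts how often two terms share a document, which is
the quantity any clustering over the transposed matrix would be built on:

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is reduced to its set of terms.
docs = [
    {"federer", "tennis", "wimbledon"},
    {"tennis", "wimbledon"},
    {"tennis", "nadal"},
]

def cooccurrence_counts(documents):
    """Count, for every ordered pair of terms, the documents containing both."""
    counts = Counter()
    for terms in documents:
        for a, b in combinations(sorted(terms), 2):
            # The same document is credited to both orderings,
            # so the resulting table is symmetric by construction.
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

counts = cooccurrence_counts(docs)
print(counts[("federer", "tennis")], counts[("tennis", "federer")])
```

Whichever term you start from, the pair gets the same count, so the raw
table by itself carries no direction.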

For example, if "Roger Federer" is the keyword, the co-occurring terms
will be "Tennis", "Grand slam", "Wimbledon", etc. But for "Tennis",
the list of top co-occurring terms may not include "Roger Federer."

Is there a way to identify the directional relationship among terms?
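
One candidate, offered only as an untested sketch, is to normalize the
shared-document count by the document frequency of the source term,
i.e. score(a -> b) = df(a, b) / df(a), an estimate of P(b | a). The
corpus, the term names and the function name below are all hypothetical,
and this is plain Python rather than anything in Mahout:

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy corpus: "tennis" appears in every document,
# "federer" in only two of them.
docs = [
    {"federer", "tennis", "wimbledon"},
    {"federer", "tennis"},
    {"tennis", "nadal"},
    {"tennis", "wimbledon"},
    {"tennis", "serve"},
]

def directional_scores(documents):
    """Return {(a, b): df(a, b) / df(a)}, an estimate of P(b | a)."""
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for terms in documents:
        doc_freq.update(terms)
        pair_freq.update(permutations(terms, 2))
    return {(a, b): n / doc_freq[a] for (a, b), n in pair_freq.items()}

scores = directional_scores(docs)
print(scores[("federer", "tennis")])  # 2 / 2 = 1.0
print(scores[("tennis", "federer")])  # 2 / 5 = 0.4
```

Here "federer" points strongly at "tennis" while the reverse score is
much weaker, which is the asymmetry I had in mind. Whether this holds
up on a real index is exactly what I have not verified.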

Of course, this was just a thought and no real code was written to
verify the assertion.

--shashi

On Wed, Sep 30, 2009 at 2:43 AM, Ted Dunning <[email protected]> wrote:
> Another way to do this through the back door is to transpose the document
> set so that you have a list of documents for each term.  Index this and
> cluster it just as if it were normal documents and you will have a form of
> term clustering.
>
> On Tue, Sep 29, 2009 at 1:05 PM, Grant Ingersoll <[email protected]> wrote:
>
>> The LDA implementation kind of clusters on terms to generate topics.  It
>> sounds like you want some co-occurrence analysis, I'm not sure that the
>> clustering algorithms are best for that, but perhaps others have insight.
>>  I could imagine doing this with HBase or Pig and just keeping a matrix
>> where each cell kept track of the number of times both terms appear in a
>> document (or even within some window in a document).
>>
>>
>>
>> On Sep 29, 2009, at 8:57 AM, Ole-Martin Mørk wrote:
>>
>>> Hi.
>>> I have been using org.apache.mahout.utils.vectors.lucene.Driver
>>> and org.apache.mahout.clustering.kmeans.KMeansDriver to cluster documents
>>> in our Lucene index and it works great! I am wondering though, is it
>>> possible to use Mahout to cluster terms?
>>>
>>> I want to cluster terms that often appear in the same documents.
>>>
>>> Thank you.
>>>
>>> --
>>> Ole-Martin Mørk
>>> http://twitter.com/olemartin
>>> http://flickr.com/olemartin
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
