Ahmed,

That's what the KPE (link in my previous email, below) will do for you. It's
not open source at this time, but that is exactly one of the things it does. I
think Mahout's collocations package might work for you, too.
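
For a small corpus, the document-level pair probabilities asked about below
can also be sketched in plain Java. This is not the KPE or Mahout code, and it
does not use Lucene: the class name CooccurrenceSketch and the method
pairProbabilities are hypothetical, and lowercasing plus splitting on
non-letters is only a stand-in for a real analyzer. The probability here is
assumed to mean the fraction of documents that contain both words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene, Mahout, or KPE.
class CooccurrenceSketch {

    // Maps "wordA wordB" (alphabetical order) to the fraction of documents
    // that contain both words at least once.
    static Map<String, Double> pairProbabilities(List<String> docs) {
        Map<String, Integer> pairCounts = new TreeMap<>();
        for (String doc : docs) {
            // Assumption: lowercase and split on non-letters instead of a real analyzer.
            TreeSet<String> terms = new TreeSet<>();
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
            // Count each unordered pair of distinct terms once per document.
            List<String> sorted = new ArrayList<>(terms);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairCounts.merge(sorted.get(i) + " " + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        // Turn document counts into co-occurrence probabilities.
        Map<String, Double> probabilities = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            probabilities.put(e.getKey(), e.getValue() / (double) docs.size());
        }
        return probabilities;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
            "Lucene indexes text",
            "Solr builds on Lucene",
            "Mahout clusters text");
        pairProbabilities(corpus).forEach(
            (pair, p) -> System.out.println(pair + "\t" + p));
    }
}
```

The number of pairs grows quadratically with the number of distinct terms per
document, so this only suits small corpora. For anything larger, the Mahout
collocations package is the more realistic route.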

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: ahmed algohary <algoharya...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Sat, August 21, 2010 7:20:03 AM
> Subject: Re: Calculate Term Co-occurrence Matrix
> 
> Thanks for all your answers!
> 
> It seems I did not make my question clear. I have a text corpus and I
> need to determine the pairs of words that occur together in many documents.
> I need to do that to be able to measure the semantic proximity between
> words. This method is explained in more detail
> here <http://forums.searchenginewatch.com/showthread.php?t=48>.
> I hope to find some code that, given a text corpus, generates all the word
> pairs with their probability of occurring together.
> 
> 
> On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
> 
> > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs, and
> > a few other things:
> > http://sematext.com/products/key-phrase-extractor/index.html
> >
> > One of the demos that uses news data is at
> > http://sematext.com/demo/kpe/index.html
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > ----- Original Message ----
> > > From: Grant Ingersoll <gsing...@apache.org>
> > > To: java-user@lucene.apache.org
> > > Sent: Fri, August 20, 2010 8:52:17 AM
> > > Subject: Re: Calculate Term Co-occurrence Matrix
> > >
> > > You might also be interested in Mahout's collocations package:
> > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > >
> > > -Grant
> > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> > >
> > > > Hi all,
> > > >
> > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
> > > > calculating the term co-occurrence matrix for a given text corpus.
> > > >
> > > > Thanks!
> > > >
> > > > --
> > > > Ahmed
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > http://www.lucidimagination.com/
> > >
> > > Search the Lucene ecosystem using Solr/Lucene:
> > > http://www.lucidimagination.com/search
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >
> > >
> >
> >
> >
> 

