Ahmed, if you want the raw score, you can do it the way you describe below.
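The round trip described below can be sketched in plain Java. This is only an illustration, not Lucene code. A `HashMap` stands in for a Lucene `Document`'s stored fields, and the helper names are hypothetical. The score is assumed to be a `double`, because the thread does not say what `collocation.getScore()` returns. The point is that a stored field holds text, so the value read back with `d.get("score")` is a `String`. It has to be parsed before it can be compared or sorted numerically.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the store/retrieve round trip in the thread.
 * A Map<String, String> plays the role of a Lucene Document's stored fields.
 */
public class ScoreRoundTrip {

    // Mirrors: doc.add(new Field("score", collocation.getScore() + "", ...))
    // For a double, score + "" and String.valueOf(score) give the same text.
    public static void storeScore(Map<String, String> storedFields, double score) {
        storedFields.put("score", String.valueOf(score));
    }

    // Mirrors: d.get("score"), which returns a String (null if the field is absent),
    // so the caller must parse it back into a number.
    public static double readScore(Map<String, String> storedFields) {
        String raw = storedFields.get("score");
        if (raw == null) {
            throw new IllegalStateException("document has no stored score");
        }
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        storeScore(doc, 0.4);
        System.out.println(doc.get("score")); // the stored text form
        System.out.println(readScore(doc));   // parsed back to a double
    }
}
```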
--- On Sun, 8/22/10, ahmed algohary <algoharya...@gmail.com> wrote:

> From: ahmed algohary <algoharya...@gmail.com>
> Subject: Re: Calculate Term Co-occurrence Matrix
> To: java-user@lucene.apache.org
> Date: Sunday, August 22, 2010, 9:27 AM
>
> I think I got it.
>
> In the CollectionIndexer class, I have added the co-occurrence score to the
> index document:
>
> doc.add(new Field("score", collocation.getScore() + "",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>
> Then in the CollectionSearcher, the scores can be retrieved:
>
> d.get("score")
>
> Is that correct?
>
> On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <algoharya...@gmail.com> wrote:
>
> > Thanks! It is exactly what I need. But isn't there a way to get the
> > matching score?
> >
> > For example, "damaged" co-occurs with "shipment" with a probability of 0.4.
> >
> >
> > On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprov...@yahoo.com> wrote:
> >
> >> Ahmed,
> >>
> >> FYI, I updated the term collocations package I mentioned earlier with a
> >> few fixes and changes which will make it work for Lucene 3.0.2. This may
> >> help your task.
> >>
> >> See:
> >> https://issues.apache.org/jira/browse/LUCENE-474
> >>
> >> Thanks,
> >>
> >> Ivan Provalov
> >>
> >>
> >> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> >>
> >> > From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
> >> > Subject: Re: Calculate Term Co-occurrence Matrix
> >> > To: java-user@lucene.apache.org
> >> > Date: Saturday, August 21, 2010, 8:05 AM
> >> >
> >> > Ahmed,
> >> >
> >> > That's what that KPE (link in my previous email, below) will do for you.
> >> > It's not open source at this time, but that is exactly one of the things
> >> > it does. I think Mahout collocations stuff might work for you, too.
> >> >
> >> > Otis
> >> > ----
> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> > Lucene ecosystem search :: http://search-lucene.com/
> >> >
> >> >
> >> > ----- Original Message ----
> >> > > From: ahmed algohary <algoharya...@gmail.com>
> >> > > To: java-user@lucene.apache.org
> >> > > Sent: Sat, August 21, 2010 7:20:03 AM
> >> > > Subject: Re: Calculate Term Co-occurrence Matrix
> >> > >
> >> > > Thanks for all your answers!
> >> > >
> >> > > It seems I did not make my question clear. I have a text corpus and I
> >> > > need to determine the pairs of words that occur together in many documents.
> >> > > I need to do that to be able to measure the semantic proximity between
> >> > > words. This method is described in more detail
> >> > > here <http://forums.searchenginewatch.com/showthread.php?t=48>.
> >> > > I hope to find some code that, given a text corpus, generates all the word
> >> > > pairs with their probability of occurring together.
> >> > >
> >> > >
> >> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <
> >> > > otis_gospodne...@yahoo.com> wrote:
> >> > >
> >> > > > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs,
> >> > > > and a few other things:
> >> > > > http://sematext.com/products/key-phrase-extractor/index.html
> >> > > >
> >> > > > One of the demos that uses news data is at
> >> > > > http://sematext.com/demo/kpe/index.html
> >> > > >
> >> > > > Otis
> >> > > >
> >> > > >
> >> > > > ----- Original Message ----
> >> > > > > From: Grant Ingersoll <gsing...@apache.org>
> >> > > > > To: java-user@lucene.apache.org
> >> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> >> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> >> > > > >
> >> > > > > You might also be interested in Mahout's collocations package:
> >> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> >> > > > >
> >> > > > > -Grant
> >> > > > >
> >> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> >> > > > >
> >> > > > > > Hi all,
> >> > > > > >
> >> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
> >> > > > > > calculating the term co-occurrence matrix for a given text corpus.
> >> > > > > >
> >> > > > > > Thanks!
> >> > > > > >
> >> > > > > > --
> >> > > > > > Ahmed
> >> > > > >
> >> > > > > --------------------------
> >> > > > > Grant Ingersoll
> >> > > > > http://www.lucidimagination.com/
> >> > > > >
> >> > > > > Search the Lucene ecosystem using Solr/Lucene:
> >> > > > > http://www.lucidimagination.com/search
> >> > > > >
> >> > > > > ---------------------------------------------------------------------
> >> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
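The underlying question in this thread is which word pairs occur together in many documents, and how likely that is. It can be sketched in plain Java without Lucene. This is a minimal sketch under stated assumptions. Tokenization is naive: lowercase text split on non-word characters, not a Lucene Analyzer. Counts are per document. The estimate used is P(b | a) = df(a, b) / df(a). This is not the scoring used by the LUCENE-474 collocations package or by Mahout's collocations, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: document-level term co-occurrence counts.
 * Assumes naive tokenization (lowercase, split on non-word characters)
 * and estimates P(b | a) = df(a, b) / df(a).
 */
public class CoOccurrence {
    // df(a): number of documents containing term a
    private final Map<String, Integer> docFreqs = new HashMap<>();
    // df(a, b): number of documents containing both a and b (a sparse matrix)
    private final Map<String, Map<String, Integer>> pairFreqs = new HashMap<>();

    public void addDocument(String text) {
        // Keep each term once per document: we count documents, not occurrences.
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        for (String a : terms) {
            docFreqs.merge(a, 1, Integer::sum);
            Map<String, Integer> row = pairFreqs.computeIfAbsent(a, k -> new HashMap<>());
            for (String b : terms) {
                if (!a.equals(b)) {
                    row.merge(b, 1, Integer::sum);
                }
            }
        }
    }

    /** Number of documents containing term a. */
    public int docFreq(String a) {
        return docFreqs.getOrDefault(a, 0);
    }

    /** Number of documents containing both a and b. */
    public int coCount(String a, String b) {
        return pairFreqs.getOrDefault(a, Collections.emptyMap()).getOrDefault(b, 0);
    }

    /** Estimated P(b | a): share of the documents containing a that also contain b. */
    public double conditional(String a, String b) {
        int dfA = docFreq(a);
        return dfA == 0 ? 0.0 : (double) coCount(a, b) / dfA;
    }

    public static void main(String[] args) {
        CoOccurrence c = new CoOccurrence();
        String[] corpus = {
            "The shipment arrived damaged.",
            "Damaged shipment, please refund.",
            "Shipment delivered on time.",
            "Where is my shipment?",
            "Shipment tracking number attached.",
            "The box was damaged in storage."
        };
        for (String doc : corpus) {
            c.addDocument(doc);
        }
        // 2 of the 5 documents containing "shipment" also contain "damaged"
        System.out.println(c.conditional("shipment", "damaged"));
        System.out.println(c.conditional("damaged", "shipment"));
    }
}
```

The nested map is a sparse co-occurrence matrix. Its size grows roughly with the square of the number of distinct terms per document. For a real corpus, the packages mentioned in the thread are therefore likely a better fit than this toy version.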