Hi Ivan, thanks a lot for this. I only just found time to look at it and reply. Sorry for bothering you again; I already appreciate what you uploaded. I would also like to ask one question, if you don't mind: is it possible to get from this a unified list of frequently occurring unigrams, bigrams, and trigrams, together with their frequencies?
Thank you very much

On Mon, Aug 23, 2010 at 3:22 PM, Ivan Provalov <iprov...@yahoo.com> wrote:

> Ahmed, if you want the raw score, you can do it the way you describe below.
>
> --- On Sun, 8/22/10, ahmed algohary <algoharya...@gmail.com> wrote:
>
> > From: ahmed algohary <algoharya...@gmail.com>
> > Subject: Re: Calculate Term Co-occurrence Matrix
> > To: java-user@lucene.apache.org
> > Date: Sunday, August 22, 2010, 9:27 AM
> >
> > I think I got it.
> >
> > In the CollectionIndexer class, I have added the co-occurrence score to the
> > index document:
> >
> > doc.add(new Field("score", collocation.getScore() + "",
> >         Field.Store.YES, Field.Index.NOT_ANALYZED));
> >
> > then in the CollectionSearcher, the scores can be retrieved:
> >
> > d.get("score")
> >
> > Is that correct ??
> >
> > On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <algoharya...@gmail.com> wrote:
> >
> > > Thanks! It is exactly what I need. But, isn't there a way to get the
> > > matching score ?
> > >
> > > for example, "damaged" co-occurs with "shipment" with a probability = 0.4 ??
> > >
> > > On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprov...@yahoo.com> wrote:
> > >
> > >> Ahmed,
> > >>
> > >> FYI, I updated the term collocations package I mentioned earlier with a
> > >> few fixes and changes which will make it work for Lucene 3.0.2. This may
> > >> help your task.
> > >>
> > >> See:
> > >> https://issues.apache.org/jira/browse/LUCENE-474
> > >>
> > >> Thanks,
> > >>
> > >> Ivan Provalov
> > >>
> > >> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> > >>
> > >> > From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
> > >> > Subject: Re: Calculate Term Co-occurrence Matrix
> > >> > To: java-user@lucene.apache.org
> > >> > Date: Saturday, August 21, 2010, 8:05 AM
> > >> >
> > >> > Ahmed,
> > >> >
> > >> > That's what that KPE (link in my previous email, below) will do for you. It's
> > >> > not open source at this time, but that is exactly one of the things it does. I
> > >> > think Mahout collocations stuff might work for you, too.
> > >> >
> > >> > Otis
> > >> > ----
> > >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > >> > Lucene ecosystem search :: http://search-lucene.com/
> > >> >
> > >> > ----- Original Message ----
> > >> > > From: ahmed algohary <algoharya...@gmail.com>
> > >> > > To: java-user@lucene.apache.org
> > >> > > Sent: Sat, August 21, 2010 7:20:03 AM
> > >> > > Subject: Re: Calculate Term Co-occurrence Matrix
> > >> > >
> > >> > > Thanks for all your answers!
> > >> > >
> > >> > > it seems like I did not make my question clear. I have a text corpus and I
> > >> > > need to determine the pairs of words that occur together in many documents.
> > >> > > I need to do that to be able to measure the semantic proximity between
> > >> > > words. This method is expanded
> > >> > > here<http://forums.searchenginewatch.com/showthread.php?t=48>.
> > >> > > I hope to find some code that given a text corpus, generate all the words
> > >> > > pairs with their probability of occurring together.
> > >> > >
> > >> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <
> > >> > > otis_gospodne...@yahoo.com> wrote:
> > >> > >
> > >> > > > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs, and
> > >> > > > a few other things:
> > >> > > > http://sematext.com/products/key-phrase-extractor/index.html
> > >> > > >
> > >> > > > One of the demos that uses news data is at
> > >> > > > http://sematext.com/demo/kpe/index.html
> > >> > > >
> > >> > > > Otis
> > >> > > > ----
> > >> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > >> > > > Lucene ecosystem search :: http://search-lucene.com/
> > >> > > >
> > >> > > > ----- Original Message ----
> > >> > > > > From: Grant Ingersoll <gsing...@apache.org>
> > >> > > > > To: java-user@lucene.apache.org
> > >> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> > >> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > >> > > > >
> > >> > > > > You might also be interested in Mahout's collocations package:
> > >> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > >> > > > >
> > >> > > > > -Grant
> > >> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> > >> > > > >
> > >> > > > > > Hi all,
> > >> > > > > >
> > >> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
> > >> > > > > > calculating the term co-occurrence matrix for a given text corpus.
> > >> > > > > >
> > >> > > > > > Thanks!
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Ahmed
> > >> > > > >
> > >> > > > > --------------------------
> > >> > > > > Grant Ingersoll
> > >> > > > > http://www.lucidimagination.com/
> > >> > > > >
> > >> > > > > Search the Lucene ecosystem using Solr/Lucene:
> > >> > > > > http://www.lucidimagination.com/search
> > >> > > > >
> > >> > > > > ---------------------------------------------------------------------
> > >> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
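
For the question about a unified list of unigrams, bigrams, and trigrams with their frequencies, one possible starting point is sketched below. It is plain Java and does not use the LUCENE-474 collocations package or any Lucene API. The class name NGramCounter, its method names, the sample sentences, and the simple lowercase-and-split tokenization are all assumptions made for illustration; in a real Lucene setup the tokens would more likely come from an Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene or the LUCENE-474 patch.
public class NGramCounter {

    // Counts every n-gram of length 1..maxN over all texts into a single map.
    public static Map<String, Integer> count(List<String> texts, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            // Assumption: lowercase, then split on anything that is not a letter or digit.
            String[] raw = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            List<String> tokens = new ArrayList<>();
            for (String t : raw) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            for (int n = 1; n <= maxN; n++) {
                for (int i = 0; i + n <= tokens.size(); i++) {
                    String gram = String.join(" ", tokens.subList(i, i + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keeps n-grams seen at least minCount times, most frequent first.
    public static List<Map.Entry<String, Integer>> frequent(
            Map<String, Integer> counts, int minCount) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                out.add(e);
            }
        }
        out.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample corpus, for illustration only.
        List<String> corpus = Arrays.asList(
                "damaged shipment arrived",
                "The damaged shipment was returned");
        Map<String, Integer> counts = count(corpus, 3);
        for (Map.Entry<String, Integer> e : frequent(counts, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because all three n-gram lengths go into the same map, the result is one unified list; the minCount threshold in frequent() is what decides which entries count as "frequently occurring" and would need tuning for a real corpus.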