ok, thank you Ivan!!

On Tue, Aug 24, 2010 at 5:13 PM, Ivan Provalov <iprov...@yahoo.com> wrote:
> Aida,
>
> Right now it will do two-term collocation only.
>
> Ivan
>
> --- On Mon, 8/23/10, Aida Hota <hota.a...@gmail.com> wrote:
>
> > From: Aida Hota <hota.a...@gmail.com>
> > Subject: Re: Calculate Term Co-occurrence Matrix
> > To: java-user@lucene.apache.org
> > Date: Monday, August 23, 2010, 1:36 PM
> >
> > Hi Ivan, thanks a lot for this. I just found time to look at it and
> > reply. Sorry for bugging you again; I already appreciate what you
> > uploaded. I would also like to ask one question, if you don't mind:
> > is it possible somehow to get from this a unified list of frequently
> > occurring unigrams, bigrams and trigrams with their frequencies?
> >
> > Thank you very much
> >
> > On Mon, Aug 23, 2010 at 3:22 PM, Ivan Provalov <iprov...@yahoo.com>
> > wrote:
> >
> > > Ahmed, if you want the raw score, you can do it the way you
> > > describe below.
> > >
> > > --- On Sun, 8/22/10, ahmed algohary <algoharya...@gmail.com> wrote:
> > >
> > > > From: ahmed algohary <algoharya...@gmail.com>
> > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > To: java-user@lucene.apache.org
> > > > Date: Sunday, August 22, 2010, 9:27 AM
> > > >
> > > > I think I got it.
> > > >
> > > > In the CollectionIndexer class, I have added the co-occurrence
> > > > score to the index document:
> > > >
> > > >     doc.add(new Field("score", collocation.getScore() + "",
> > > >             Field.Store.YES, Field.Index.NOT_ANALYZED));
> > > >
> > > > Then in the CollectionSearcher, the scores can be retrieved:
> > > >
> > > >     d.get("score")
> > > >
> > > > Is that correct?
> > > >
> > > > On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary
> > > > <algoharya...@gmail.com> wrote:
> > > >
> > > > > Thanks! It is exactly what I need. But isn't there a way to
> > > > > get the matching score?
> > > > >
> > > > > For example, "damaged" co-occurs with "shipment" with a
> > > > > probability = 0.4?
> > > > >
> > > > > On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov
> > > > > <iprov...@yahoo.com> wrote:
> > > > >
> > > > > > Ahmed,
> > > > > >
> > > > > > FYI, I updated the term collocations package I mentioned
> > > > > > earlier with a few fixes and changes which will make it work
> > > > > > for Lucene 3.0.2. This may help your task.
> > > > > >
> > > > > > See:
> > > > > > https://issues.apache.org/jira/browse/LUCENE-474
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Ivan Provalov
> > > > > >
> > > > > > --- On Sat, 8/21/10, Otis Gospodnetic
> > > > > > <otis_gospodne...@yahoo.com> wrote:
> > > > > >
> > > > > > > From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
> > > > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > > > > To: java-user@lucene.apache.org
> > > > > > > Date: Saturday, August 21, 2010, 8:05 AM
> > > > > > >
> > > > > > > Ahmed,
> > > > > > >
> > > > > > > That's what that KPE (link in my previous email, below)
> > > > > > > will do for you. It's not open source at this time, but
> > > > > > > that is exactly one of the things it does. I think Mahout
> > > > > > > collocations stuff might work for you, too.
> > > > > > >
> > > > > > > Otis
> > > > > > > ----
> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > > > > Lucene ecosystem search :: http://search-lucene.com/
> > > > > > >
> > > > > > > ----- Original Message ----
> > > > > > > > From: ahmed algohary <algoharya...@gmail.com>
> > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > Sent: Sat, August 21, 2010 7:20:03 AM
> > > > > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > > > > >
> > > > > > > > Thanks for all your answers!
> > > > > > > >
> > > > > > > > It seems like I did not make my question clear.
> > > > > > > > I have a text corpus and I need to determine the pairs
> > > > > > > > of words that occur together in many documents. I need
> > > > > > > > to do that to be able to measure the semantic proximity
> > > > > > > > between words. This method is expanded on here:
> > > > > > > > <http://forums.searchenginewatch.com/showthread.php?t=48>.
> > > > > > > > I hope to find some code that, given a text corpus,
> > > > > > > > generates all the word pairs with their probability of
> > > > > > > > occurring together.
> > > > > > > >
> > > > > > > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic
> > > > > > > > <otis_gospodne...@yahoo.com> wrote:
> > > > > > > >
> > > > > > > > > There is also a non-Mahout Key Phrase Extractor for
> > > > > > > > > Collocations, SIPs, and a few other things:
> > > > > > > > > http://sematext.com/products/key-phrase-extractor/index.html
> > > > > > > > >
> > > > > > > > > One of the demos that uses news data is at
> > > > > > > > > http://sematext.com/demo/kpe/index.html
> > > > > > > > >
> > > > > > > > > Otis
> > > > > > > > > ----
> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > > > > > > Lucene ecosystem search :: http://search-lucene.com/
> > > > > > > > >
> > > > > > > > > ----- Original Message ----
> > > > > > > > > > From: Grant Ingersoll <gsing...@apache.org>
> > > > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> > > > > > > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > > > > > > >
> > > > > > > > > > You might also be interested in Mahout's
> > > > > > > > > > collocations package:
> > > > > > > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > > > > > > > > >
> > > > > > > > > > -Grant
> > > > > > > > > >
> > > > > > > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I need to know if there is a Lucene plug-in or a
> > > > > > > > > > > Lucene-based API for calculating the term
> > > > > > > > > > > co-occurrence matrix for a given text corpus.
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Ahmed
> > > > > > > > > >
> > > > > > > > > > --------------------------
> > > > > > > > > > Grant Ingersoll
> > > > > > > > > > http://www.lucidimagination.com/
> > > > > > > > > >
> > > > > > > > > > Search the Lucene ecosystem using Solr/Lucene:
> > > > > > > > > > http://www.lucidimagination.com/search
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
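
For readers following the thread: the request above (all word pairs with their probability of occurring together) can be illustrated without Lucene at all. The following is only a minimal sketch under stated assumptions, not the LUCENE-474 package and not Mahout's collocations code. It assumes that "probability" means the fraction of documents containing both words, that a simple lowercase split on non-word characters is good enough as tokenization, and that the whole corpus fits in memory. The class name CooccurrenceSketch, the method pairProbabilities and the "a|b" key format are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Lucene or LUCENE-474. */
public class CooccurrenceSketch {

    /**
     * Maps each unordered word pair, keyed as "a|b" with a sorted before b,
     * to the fraction of documents that contain both words.
     */
    public static Map<String, Double> pairProbabilities(List<String> docs) {
        Map<String, Integer> pairCounts = new HashMap<>();
        for (String doc : docs) {
            // Distinct, sorted terms, so each pair is counted once per document.
            // Assumption: ASCII-style tokenization by splitting on non-word chars.
            TreeSet<String> terms = new TreeSet<>();
            for (String token : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
            List<String> sorted = new ArrayList<>(terms);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairCounts.merge(sorted.get(i) + "|" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        // Convert document counts into document-level co-occurrence fractions.
        Map<String, Double> probabilities = new HashMap<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            probabilities.put(e.getKey(), e.getValue() / (double) docs.size());
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // Toy corpus: "damaged" and "shipment" share 2 of the 5 documents.
        List<String> docs = Arrays.asList(
                "damaged shipment arrived late",
                "shipment was damaged",
                "shipment arrived on time",
                "invoice sent",
                "new invoice issued");
        System.out.println(pairProbabilities(docs).get("damaged|shipment"));
    }
}
```

On this toy corpus the pair "damaged|shipment" comes out at 2/5 = 0.4. The number of pairs grows quadratically with the distinct terms per document, so for a real corpus one would likely prune stop words and rare terms first, or use the packages mentioned in the thread.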