Hi Ivan!

Sorry for not being clear. I am talking about term n-grams (shingles).
Something like:

poster
online advertising
yellow cab
this is phrase
sunshine
good morning sunshine

with their frequencies. That is, the returned items would be popular
phrases and terms whose frequency exceeds a certain threshold.
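
To make the request concrete, here is a minimal sketch in plain Java of
the kind of output meant above. It is not the LUCENE-474 collocations
package and uses no Lucene API; the names NgramCounter, frequentNgrams,
maxN and minCount are made up for illustration, and splitting on
whitespace stands in for whatever analyzer the index really uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or the LUCENE-474 patch.
class NgramCounter {

    // Counts every term n-gram of length 1..maxN across all documents and
    // returns only those that occur at least minCount times.
    static Map<String, Integer> frequentNgrams(List<String> docs, int maxN, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            String text = doc.toLowerCase().trim();
            if (text.isEmpty()) {
                continue;
            }
            // Assumption: naive whitespace tokenization instead of an Analyzer.
            String[] tokens = text.split("\\s+");
            for (int start = 0; start < tokens.length; start++) {
                StringBuilder gram = new StringBuilder();
                // Grow the n-gram one token at a time: unigram, bigram, trigram...
                for (int n = 1; n <= maxN && start + n <= tokens.length; n++) {
                    if (n > 1) {
                        gram.append(' ');
                    }
                    gram.append(tokens[start + n - 1]);
                    counts.merge(gram.toString(), 1, Integer::sum);
                }
            }
        }
        // Keep only the n-grams that reach the threshold, sorted by text.
        Map<String, Integer> frequent = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                frequent.put(e.getKey(), e.getValue());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "good morning sunshine",
                "good morning sunshine today",
                "sunshine");
        // Print each unigram, bigram and trigram seen at least twice.
        for (Map.Entry<String, Integer> e : frequentNgrams(docs, 3, 2).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The unified list is simply one map holding unigrams, bigrams and
trigrams together, so a single threshold applies to all of them.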

Thanks, Ivan




On Mon, Aug 23, 2010 at 8:41 PM, Ivan Provalov <[email protected]> wrote:

> Aida,
>
> Are you talking about letter n-grams or term n-grams?
>
> Thanks,
>
> Ivan
>
> --- On Mon, 8/23/10, Aida Hota <[email protected]> wrote:
>
> > From: Aida Hota <[email protected]>
> > Subject: Re: Calculate Term Co-occurrence Matrix
> > To: [email protected]
> > Date: Monday, August 23, 2010, 1:36 PM
> > Hi Ivan, thanks a lot for this. I just found time to look at it and
> > reply. Sorry for bugging you again; I already appreciate what you
> > uploaded. I would also like to ask one question, if you don't mind:
> > is it possible to get from this a unified list of frequently occurring
> > unigrams, bigrams and trigrams with their frequencies?
> >
> > Thank you very much
> >
> >
> > On Mon, Aug 23, 2010 at 3:22 PM, Ivan Provalov <[email protected]>
> > wrote:
> >
> > > Ahmed, if you want the raw score, you can do it the way you describe
> > > below.
> > >
> > >
> > >
> > > --- On Sun, 8/22/10, ahmed algohary <[email protected]>
> > wrote:
> > >
> > > > From: ahmed algohary <[email protected]>
> > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > To: [email protected]
> > > > Date: Sunday, August 22, 2010, 9:27 AM
> > > > I think I got it.
> > > >
> > > > In the CollectionIndexer class, I have added the co-occurrence score
> > > > to the index document:
> > > >
> > > >   doc.add(new Field("score", collocation.getScore() + "",
> > > >       Field.Store.YES, Field.Index.NOT_ANALYZED));
> > > >
> > > > then in the CollectionSearcher, the scores can be retrieved:
> > > >
> > > >   d.get("score")
> > > >
> > > > Is that correct?
> > > >
> > > > On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary
> > <[email protected]
> > > >wrote:
> > > >
> > > > > Thanks! It is exactly what I need. But isn't there a way to get the
> > > > > matching score?
> > > > >
> > > > > For example, "damaged" co-occurs with "shipment" with a
> > > > > probability = 0.4?
> > > > >
> > > > >
> > > > > On Sun, Aug 22, 2010 at 5:35 AM, Ivan
> > Provalov <[email protected]>
> > > > wrote:
> > > > >
> > > > >> Ahmed,
> > > > >>
> > > > >> FYI, I updated the term collocations package I mentioned earlier
> > > > >> with a few fixes and changes which will make it work for
> > > > >> Lucene 3.0.2.  This may help your task.
> > > > >>
> > > > >> See:
> > > > >> https://issues.apache.org/jira/browse/LUCENE-474
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Ivan Provalov
> > > > >>
> > > > >>
> > > > >> --- On Sat, 8/21/10, Otis Gospodnetic
> > <[email protected]>
> > > > wrote:
> > > > >>
> > > > >> > From: Otis Gospodnetic <[email protected]>
> > > > >> > Subject: Re: Calculate Term
> > Co-occurrence
> > > > Matrix
> > > > >> > To: [email protected]
> > > > >> > Date: Saturday, August 21, 2010,
> > 8:05 AM
> > > > >> > Ahmed,
> > > > >> >
> > > > >> > That's what that KPE (link in my previous email, below) will do
> > > > >> > for you.  It's not open source at this time, but that is exactly
> > > > >> > one of the things it does.  I think Mahout collocations stuff
> > > > >> > might work for you, too.
> > > > >> >
> > > > >> > Otis
> > > > >> > ----
> > > > >> > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > > > Nutch
> > > > >> > Lucene ecosystem search :: http://search-lucene.com/
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > ----- Original Message ----
> > > > >> > > From: ahmed algohary <[email protected]>
> > > > >> > > To: [email protected]
> > > > >> > > Sent: Sat, August 21, 2010
> > 7:20:03 AM
> > > > >> > > Subject: Re: Calculate Term
> > > > Co-occurrence Matrix
> > > > >> > >
> > > > >> > > Thanks for all your answers!
> > > > >> > >
> > > > >> > > It seems like I did not make my question clear. I have a text
> > > > >> > > corpus and I need to determine the pairs of words that occur
> > > > >> > > together in many documents. I need to do that to be able to
> > > > >> > > measure the semantic proximity between words. This method is
> > > > >> > > expanded
> > > > >> > > here<http://forums.searchenginewatch.com/showthread.php?t=48>.
> > > > >> > > I hope to find some code that, given a text corpus, generates
> > > > >> > > all the word pairs with their probability of occurring together.
> > > > >> > >
> > > > >> > >
> > > > >> > > On Sat, Aug 21, 2010 at
> > 1:46  AM,
> > > > Otis
> > > > >> > Gospodnetic <
> > > > >> > > [email protected]>
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > There is also a non-Mahout Key Phrase Extractor for
> > > > >> > > > Collocations, SIPs, and a few other things:
> > > > >> > > > http://sematext.com/products/key-phrase-extractor/index.html
> > > > >> > > >
> > > > >> > > > One of the demos that uses news data is at
> > > > >> > > > http://sematext.com/demo/kpe/index.html
> > > > >> > > >
> > > > >> > > > Otis
> > > > >> > > >  ----
> > > > >> > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > > > >> > Nutch
> > > > >> > > > Lucene ecosystem
> > search :: http://search-lucene.com/
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > ----- Original
> > Message ----
> > > > >> > > > > From: Grant
> > Ingersoll <[email protected]>
> > > > >> > > > > To: [email protected]
> > > > >> > > >  > Sent: Fri,
> > August 20,
> > > > 2010 8:52:17 AM
> > > > >> > > > > Subject: Re:
> > Calculate
> > > > Term
> > > > >> > Co-occurrence Matrix
> > > > >> > > > >
> > > > >> > > > > You might also be interested in Mahout's collocations package:
> > > > >> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > > > >> > > >  >
> > > > >> > > > > -Grant
> > > > >> > > > > On  Aug 19,
> > 2010, at
> > > > 11:39 AM,
> > > > >> > ahmed  algohary wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi all,
> > > > >> > > > > >
> > > > >> > > > > > I need to know if there is a Lucene plug-in or a
> > > > >> > > > > > Lucene-based API for calculating the term co-occurrence
> > > > >> > > > > > matrix for a given text corpus.
> > > > >> > > > > >
> > > > >> > > > > > Thanks!
> > > > >> > > >  > >
> > > > >> > > > > > --
> > > > >> > > > > >  Ahmed
> > > > >> > > >  >
> > > > >> > > > >
> > --------------------------
> > > > >> > > > > Grant
> > Ingersoll
> > > > >> > > > > http://www.lucidimagination.com/
> > > > >> > > > >
> > > > >> > > > > Search the
> > Lucene
> > > > ecosystem
> > > > >> > using  Solr/Lucene:
> > > > >> > > > >http://www.lucidimagination.com/search
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > >  >
> > > > >> >
> > > >
> > ---------------------------------------------------------------------
> > > > >> > > >  > To
> > unsubscribe,
> > > > e-mail: [email protected]
> > > > >> > > >  > For
> > additional
> > > > commands, e-mail:
> > > > >> > [email protected]
> > > > >> > > >  >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
>
>
>
