Re: [ngram] How to recognize informative n-grams in a corpus?

Ted Pedersen tpede...@d.umn.edu [ngram] Tue, 10 May 2016 08:21:17 -0700

The Ngram Statistics Package is mostly intended to help you find the most
frequent ngrams in a corpus, or the most strongly associated ngrams in a
corpus. It doesn't directly measure informativeness, although you can
certainly come up with ways to use frequency and measures of association
to estimate it. It sounds like you should look at our paper on
NSP to get some ideas about how to use it, and what it offers.
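As a rough illustration of using a measure of association as a proxy for informativeness, here is a minimal pure-Python sketch. It is not NSP's code. It ranks adjacent word pairs by Dunning's log-likelihood ratio (G^2), which is one of the association measures NSP provides. The function name `bigram_log_likelihood`, the `min_count` frequency cutoff, and the naive lowercase whitespace tokenization are all hypothetical choices made for this example. A high association score only means that two words co-occur more often than chance predicts, so treat it as a starting point for "informative," not a definition of it.

```python
import math
from collections import Counter


def bigram_log_likelihood(docs, min_count=2):
    """Rank adjacent word pairs by Dunning's log-likelihood ratio (G^2).

    Hypothetical helper for illustration, not part of NSP.
    docs: list of strings, tokenized by a naive lowercase whitespace split.
    Returns ((w1, w2), score, count) tuples sorted by score, descending.
    """
    pair_counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        # Count bigrams within each document so pairs never span documents.
        pair_counts.update(zip(tokens, tokens[1:]))

    n = sum(pair_counts.values())  # total number of bigrams in the corpus
    left = Counter()   # n1p: times w1 appears as the first word of a bigram
    right = Counter()  # np1: times w2 appears as the second word of a bigram
    for (w1, w2), c in pair_counts.items():
        left[w1] += c
        right[w2] += c

    results = []
    for (w1, w2), n11 in pair_counts.items():
        if n11 < min_count:  # frequency cutoff: skip rare pairs
            continue
        n1p, np1 = left[w1], right[w2]
        # Observed cells of the 2x2 contingency table for (w1, w2).
        observed = [n11, n1p - n11, np1 - n11, n - n1p - np1 + n11]
        # Expected cell counts if w1 and w2 were independent.
        expected = [
            n1p * np1 / n,
            n1p * (n - np1) / n,
            (n - n1p) * np1 / n,
            (n - n1p) * (n - np1) / n,
        ]
        # Cells with an observed count of 0 contribute nothing to G^2.
        g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
        results.append(((w1, w2), g2, n11))

    results.sort(key=lambda r: r[1], reverse=True)
    return results


if __name__ == "__main__":
    docs = [
        "new york is a big city",
        "i moved to new york last year",
        "the city of new york never sleeps",
    ]
    for (w1, w2), score, count in bigram_log_likelihood(docs)[:10]:
        print(f"{w1} {w2}\t{score:.2f}\t{count}")
```

Combining a frequency cutoff with an association score reflects the suggestion above: frequency filters out pairs seen too rarely to judge, and the association score ranks the remaining pairs by how far they depart from chance co-occurrence.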

http://www.d.umn.edu/~tpederse/Pubs/cicling2003-2.pdf

Also, the code itself has some documentation that should be helpful...

http://search.cpan.org/~tpederse/Text-NSP/doc/README.pod

http://search.cpan.org/~tpederse/Text-NSP/doc/USAGE.pod

I hope this helps!
Ted

On Tue, May 10, 2016 at 5:22 AM, 'Amir H. Jadidinejad' amir.jad...@yahoo.com
[ngram] <ngram@yahoogroups.com> wrote:

> Hi,
>
> I have a corpus of 3K short text documents. I want to *identify the
> most informative n-grams* in the corpus.
> Unfortunately, I can't find a straightforward way to do this from the
> documents. Could you please help me?
>
> Kind regards,
> Amir H. Jadidinejad