Robin,

I gather that minSupport, at its default of 2, eliminates ngrams with a
frequency less than 2? Sounds great to me. Thanks for the additions.

Drew

On Tue, Feb 9, 2010 at 1:59 AM, Robin Anil <robin.a...@gmail.com> wrote:
> Hi Drew, the patch works beautifully. I was trying to put in some command
> line arguments to help prune the list and speed up computation of relevant
> ngrams.
>
> -s --minSupport Default value: 2 (for Reuters this reduces the output of
> pass 1 from 247 MB to 37 MB)
> -ml --minLLR Default value: 0 (if I use 1.0 it removes a lot of junk bigrams
> with a score below 1)
>
> Before committing, I just want to check the following changes with you.
> I have added the minSupport check in CollocReducer, just before the output
> is written, and the minLLR check in LLRReducer, after the LLR is calculated.
>
> Robin
>
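For readers following the thread, the two pruning checks discussed above can be sketched roughly as below. This is only an illustration in Python, not the actual CollocReducer or LLRReducer code: the function names and the sample counts and scores are made up, and it assumes an ngram is kept when its value is greater than or equal to the threshold, which matches the reading of minSupport given in the reply.

```python
def prune_by_support(ngram_counts, min_support=2):
    """Keep only ngrams whose frequency is at least min_support.

    Hypothetical stand-in for the minSupport check applied just
    before the first pass writes its output.
    """
    return {ng: n for ng, n in ngram_counts.items() if n >= min_support}


def prune_by_llr(ngram_scores, min_llr=0.0):
    """Keep only ngrams whose LLR score is at least min_llr.

    Hypothetical stand-in for the minLLR check applied right after
    the score has been calculated.
    """
    return {ng: s for ng, s in ngram_scores.items() if s >= min_llr}


# Made-up sample data, for illustration only.
counts = {"new york": 5, "of the": 1, "wall street": 2}
scores = {"new york": 42.5, "of the": 0.3, "wall street": 17.0}

frequent = prune_by_support(counts)          # default threshold of 2
relevant = prune_by_llr(scores, min_llr=1.0)
```

Filtering on support first is what shrinks the intermediate output, since rare ngrams never reach the scoring step. The LLR threshold then trims low-scoring ngrams from the final list.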
