Robin, I gather that minSupport eliminates ngrams with a frequency less than 2? Sounds great to me. Thanks for the additions.
Drew

On Tue, Feb 9, 2010 at 1:59 AM, Robin Anil <robin.a...@gmail.com> wrote:
> Hi Drew, the patch works beautifully. I was trying to put in some command
> line arguments to help prune the list and speed up computation of relevant
> ngrams.
>
> -s --minSupport Default value: 2 (for reuters this reduces output of pass1
> from 247 MB to 37 MB)
> -ml --minLLR Default value: 0 (if I use 1.0 it removes a lot of junk bigrams
> with <1 score)
>
> Before committing, I just want to check with you the following changes i
> have put in
> I have added the minSupport check in CollocReducer just before the output is
> written
> and the minLLR check in LLRReducer after llr is calculated.
>
> Robin
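
For anyone following the thread, here is a rough sketch of the two pruning checks as described above. It is not the code from the patch: the function names, the sample ngrams, and their counts and scores are made up for illustration, and the exact comparisons (keep when count >= minSupport, keep when score >= minLLR) are an assumption based on the wording in this exchange.

```python
# Illustrative sketch only; names and data are hypothetical, not from the patch.

def prune_by_support(ngram_counts, min_support=2):
    """Keep ngrams whose frequency is at least min_support.

    Mirrors the check described for CollocReducer, applied just
    before the output is written.
    """
    return {ng: c for ng, c in ngram_counts.items() if c >= min_support}


def prune_by_llr(ngram_scores, min_llr=0.0):
    """Keep ngrams whose log-likelihood ratio is at least min_llr.

    Mirrors the check described for LLRReducer, applied after the
    LLR has been calculated.
    """
    return {ng: s for ng, s in ngram_scores.items() if s >= min_llr}


if __name__ == "__main__":
    # Made-up counts and scores, purely for demonstration.
    counts = {"new york": 5, "york stock": 1, "stock exchange": 2}
    scores = {"new york": 42.7, "stock exchange": 0.4}

    kept = prune_by_support(counts, min_support=2)
    print(sorted(kept))

    relevant = prune_by_llr({ng: scores[ng] for ng in kept}, min_llr=1.0)
    print(sorted(relevant))
```

With the default minSupport of 2, singletons are dropped before the second pass ever sees them, which would account for the smaller pass1 output mentioned above.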