Hi Drew, the patch works beautifully. I have been adding some command-line
arguments to help prune the list and speed up computation of the relevant
ngrams.

-s --minSupport Default value: 2 (for Reuters, this reduces the output of
pass1 from 247 MB to 37 MB)
-ml --minLLR Default value: 0 (setting it to 1.0 removes a lot of junk
bigrams with a score below 1)

Before committing, I just want to check the following changes with you.
I have added the minSupport check in CollocReducer, just before the output
is written, and the minLLR check in LLRReducer, after the LLR is calculated.
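
To make the two checks concrete, here is a rough, standalone sketch of what I mean. It is not the actual CollocReducer or LLRReducer code: the class and method names (PruningSketch, passesMinSupport, passesMinLLR, pruneBySupport) are made up for illustration, and I am assuming both thresholds are inclusive, i.e. an ngram is kept when its value is greater than or equal to the threshold.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the checks described above; this is NOT the
// real Mahout CollocReducer/LLRReducer code.
class PruningSketch {

    // Assumed semantics: keep an ngram when its count is at least minSupport.
    static boolean passesMinSupport(long frequency, long minSupport) {
        return frequency >= minSupport;
    }

    // Assumed semantics: keep an ngram when its LLR score is at least minLLR.
    static boolean passesMinLLR(double llr, double minLLR) {
        return llr >= minLLR;
    }

    // Applies the support check to a map of ngram -> count, roughly where a
    // reducer would otherwise write every entry to its output.
    static Map<String, Long> pruneBySupport(Map<String, Long> counts, long minSupport) {
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            if (passesMinSupport(entry.getValue(), minSupport)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

With the default minSupport of 2, an ngram seen only once is dropped before it is written. With the default minLLR of 0, nothing with a non-negative score is filtered.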

Robin
