Hi Drew, the patch works beautifully. I was trying to put in some command line arguments to help prune the list and speed up computation of relevant ngrams.

-s --minSupport    Default value: 2 (for Reuters this reduces the output of pass 1 from 247 MB to 37 MB)
-ml --minLLR       Default value: 0 (setting it to 1.0 removes a lot of junk bigrams with a score below 1)

Before committing, I just want to check the following changes with you. I have added the minSupport check in CollocReducer just before the output is written, and the minLLR check in LLRReducer after the LLR is calculated.

Robin
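
P.S. To make the two checks concrete, here is a rough standalone sketch of the filtering logic only. It is not the actual patch: the class NgramPruningSketch and both helper methods are hypothetical names for illustration, plain maps stand in for the reducer input and output, and I am assuming both thresholds are inclusive (an ngram is kept when its value is greater than or equal to the threshold).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout.
public class NgramPruningSketch {

    // Mirrors the idea of the minSupport check: drop ngrams seen fewer
    // than minSupport times before anything is written out.
    // Assumption: the comparison is inclusive (count >= minSupport).
    static Map<String, Integer> pruneBySupport(Map<String, Integer> counts, int minSupport) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minSupport) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Mirrors the idea of the minLLR check: once a score has been computed,
    // keep the ngram only if the score reaches minLLR.
    // Assumption: the comparison is inclusive (llr >= minLLR).
    static Map<String, Double> pruneByLlr(Map<String, Double> scores, double minLLR) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() >= minLLR) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up counts and scores, purely for demonstration.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("new york", 5);
        counts.put("of zzz", 1);
        System.out.println(pruneBySupport(counts, 2));

        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("new york", 42.7);
        scores.put("the the", 0.3);
        System.out.println(pruneByLlr(scores, 1.0));
    }
}
```

With the defaults above (minSupport 2, minLLR 0), the first filter drops singletons and the second keeps everything with a non-negative score.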