I should point out that with only hundreds of terms to review, you can eliminate laugh-inducing phrases by hand.
If you have hundreds of thousands, it is a different problem.

On Fri, Jan 8, 2010 at 4:44 AM, Shashikant Kore <[email protected]> wrote:

> ...
> With corpus of million documents, if I calculate LLR score of terms in
> a set of say 50,000 documents, I get hundreds of terms with score more
> than 50, many of which are not "useful."
>

--
Ted Dunning, CTO
DeepDyve
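
For readers unfamiliar with the LLR score discussed in this thread, here is a minimal sketch of the usual log-likelihood ratio computed from a 2x2 contingency table of document counts. The function names, the table layout, and the example counts are illustrative assumptions, not taken from the messages above.

```python
import math


def x_log_x(x):
    # x * ln(x), with the convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    Assumed layout (hypothetical naming):
      k11: documents in the subset that contain the term
      k12: documents in the subset that do not contain the term
      k21: documents outside the subset that contain the term
      k22: documents outside the subset that do not contain the term
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    # Made-up counts: a 50,000-document subset of a 1,000,000-document
    # corpus, with the term in 100 subset documents and 100 others.
    print(llr(100, 49_900, 100, 949_900))
```

A score threshold such as 50 only says that a term's frequency in the subset differs from the rest of the corpus by more than chance would explain. It does not say the term is "useful," which is why some further review or filtering is still needed.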
