I've been working with an old implementation, Lucene-lm by ilps, which is a good start. However, that implementation doesn't include a smoothing algorithm, and I've found it particularly hard to rewrite the core scoring mechanism to enable smoothing.
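To make concrete what smoothed scoring has to compute, here is a minimal sketch in plain Java, independent of Lucene and of Lucene-lm. It assumes Jelinek-Mercer (linear interpolation) smoothing, which is only one possible choice; the class name, method names, toy documents, and the value of lambda are all hypothetical. The point it illustrates: without smoothing, a document that lacks one query term scores negative infinity, while with smoothing every document gets a finite contribution from every query term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SmoothedQueryLikelihood {

    // Count how often each term occurs in a token list.
    public static Map<String, Integer> termFreqs(List<String> tokens) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : tokens) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    // Query log-likelihood with Jelinek-Mercer smoothing (an assumed choice):
    //   sum over query terms t of log((1 - lambda) * tf(t,d)/|d| + lambda * cf(t)/|C|)
    // lambda = 0 disables smoothing. Assumes doc and collection are non-empty.
    public static double score(List<String> query, List<String> doc,
                               List<String> collection, double lambda) {
        Map<String, Integer> tf = termFreqs(doc);
        Map<String, Integer> cf = termFreqs(collection);
        double logProb = 0.0;
        for (String term : query) {
            double pDoc = tf.getOrDefault(term, 0) / (double) doc.size();
            double pColl = cf.getOrDefault(term, 0) / (double) collection.size();
            // Every query term contributes, even when tf(t,d) is zero.
            logProb += Math.log((1.0 - lambda) * pDoc + lambda * pColl);
        }
        return logProb;
    }

    public static void main(String[] args) {
        // Toy data, purely illustrative.
        List<String> d1 = Arrays.asList("lucene", "scoring", "lucene", "index");
        List<String> d2 = Arrays.asList("smoothing", "language", "model", "lucene");
        List<String> collection = new ArrayList<>(d1);
        collection.addAll(d2);
        List<String> query = Arrays.asList("lucene", "smoothing");
        double lambda = 0.2; // hypothetical value, not a recommended setting

        System.out.println("d1 unsmoothed: " + score(query, d1, collection, 0.0));
        System.out.println("d1 smoothed:   " + score(query, d1, collection, lambda));
        System.out.println("d2 smoothed:   " + score(query, d2, collection, lambda));
    }
}
```

Note that the loop runs over all query terms for each document, which is exactly what a per-clause posting list traversal does not give you for documents that are absent from a term's postings.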
(Background: in a language model, a smoothing strategy gives a small non-zero weight to a query term that does not occur in a document, so documents with zero frequency for that term are still scored for it. Of course this doesn't change anything for a single keyword, but for a multiple-keyword query, where a document may be strongly relevant to a few distinguishing keywords while missing the others, smoothing may be important.)

In the Lucene framework, for a multiple-keyword query (say, the simplest unigram, non-positional query), the following procedure happens, as far as I understand:

1) QueryParser parses the query string into BooleanQuery.clauses (weights).
2) The scorer corresponding to the BooleanQuery merges the document scores from each clause.
3) The problem: each clause's TermDocs only contains the inverted index entries for that clause's term, which makes a smoothing strategy impossible, because a document is not scored against every query term.

What can I do about that? Which classes should I concentrate on?

--
View this message in context: http://lucene.472066.n3.nabble.com/Smoothing-language-model-by-Lucene-tp3709311p3709311.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.