Smoothing language model by Lucene

cheyenne.lin Thu, 02 Feb 2012 09:53:34 -0800

I have an old implementation, Lucene-lm by ilps, which is a good start.
However, that implementation doesn't include a smoothing algorithm, and I
found it particularly hard to rewrite the core scoring mechanism to enable
smoothing.


(Background: in a language model, a smoothing strategy adds a small constant
weight to documents in which a query term has zero frequency. Of course this
changes nothing for a single-keyword query, but consider a multiple-keyword
query: when a document is strongly relevant to a few distinguishing keywords
but lacks the others, smoothing may be important.)
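
To make the background concrete, here is a minimal sketch in plain Java (no Lucene classes). It assumes Jelinek-Mercer smoothing purely as an example, since no particular smoothing method is named here; the class name, the toy term counts, and the collection probabilities are all hypothetical.

```java
import java.util.List;
import java.util.Map;

public class SmoothingDemo {
    // Unsmoothed query likelihood: product of tf(t, d) / |d| over the query terms.
    // A single missing term drives the whole product to zero.
    static double unsmoothed(List<String> query, Map<String, Integer> docTf, int docLen) {
        double p = 1.0;
        for (String t : query) {
            p *= docTf.getOrDefault(t, 0) / (double) docLen;
        }
        return p;
    }

    // Jelinek-Mercer smoothing (an assumed choice for illustration):
    // p(t | d) = (1 - lambda) * tf(t, d) / |d| + lambda * p(t | collection)
    static double jelinekMercer(List<String> query, Map<String, Integer> docTf, int docLen,
                                Map<String, Double> collectionProb, double lambda) {
        double p = 1.0;
        for (String t : query) {
            double pDoc = docTf.getOrDefault(t, 0) / (double) docLen;
            double pColl = collectionProb.getOrDefault(t, 0.0);
            p *= (1 - lambda) * pDoc + lambda * pColl;
        }
        return p;
    }

    public static void main(String[] args) {
        List<String> query = List.of("lucene", "smoothing");
        // Hypothetical document: strongly about "lucene", never mentions "smoothing".
        Map<String, Integer> docTf = Map.of("lucene", 5);
        int docLen = 100;
        Map<String, Double> coll = Map.of("lucene", 0.001, "smoothing", 0.0005);

        System.out.println(unsmoothed(query, docTf, docLen));               // prints 0.0
        System.out.println(jelinekMercer(query, docTf, docLen, coll, 0.1)); // small but positive
    }
}
```

Without smoothing the document scores exactly zero for the two-keyword query even though it matches one keyword strongly; with smoothing it keeps a small positive score and can still be ranked.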

In the Lucene framework, for a multiple-keyword query (say, the simplest
unigram, non-positional query), the following procedure happens, as far as I
understand:

1) QueryParser parses the query string into BooleanQuery.clauses (weights).
2) The scorer corresponding to the BooleanQuery merges the document scores
for each clause.
3) The problem is that each clause's TermDocs only contains the inverted
index entries for that clause's term, which makes a smoothing strategy
impossible, because a document won't be scored by every query term.
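
The constraint in step 3 can be simulated outside Lucene. The sketch below is plain Java with a hypothetical toy index; it does not use or reproduce Lucene's actual scorer classes. It again assumes Jelinek-Mercer smoothing, for which the smoothed log-likelihood can be rearranged into a document-independent constant plus a sum over only the terms a document contains. Under that assumption, a scorer that only walks each term's postings can still give rank-equivalent smoothed scores. Whether this fits into the BooleanQuery scoring path is not something the sketch settles.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PostingsScoringDemo {
    // Hypothetical toy index: term -> (docId -> term frequency).
    static final Map<String, Map<Integer, Integer>> POSTINGS = Map.of(
        "lucene", Map.of(1, 5, 2, 1),
        "smoothing", Map.of(2, 1));
    static final Map<Integer, Integer> DOC_LEN = Map.of(1, 100, 2, 100);
    static final Map<String, Double> COLL_PROB = Map.of("lucene", 0.001, "smoothing", 0.0005);

    // Reference: smoothed log-likelihood that visits every query term for the
    // document, including terms the document does not contain.
    static double fullScore(List<String> query, int doc, double lambda) {
        double s = 0;
        for (String t : query) {
            int tf = POSTINGS.get(t).getOrDefault(doc, 0);
            s += Math.log((1 - lambda) * tf / DOC_LEN.get(doc) + lambda * COLL_PROB.get(t));
        }
        return s;
    }

    // Postings-only scoring: each term touches only the documents in its own
    // postings list, as in step 3. Uses the identity
    //   log(fg + bg) = log(bg) + log(1 + fg / bg)
    // where bg = lambda * p(t | collection) does not depend on the document.
    static Map<Integer, Double> postingsScores(List<String> query, double lambda) {
        double constant = 0;
        Map<Integer, Double> acc = new HashMap<>();
        for (String t : query) {
            double bg = lambda * COLL_PROB.get(t);
            constant += Math.log(bg);
            for (Map.Entry<Integer, Integer> e : POSTINGS.get(t).entrySet()) {
                double fg = (1 - lambda) * e.getValue() / DOC_LEN.get(e.getKey());
                acc.merge(e.getKey(), Math.log(1 + fg / bg), Double::sum);
            }
        }
        // Adding the shared constant recovers the full score; it does not change the ranking.
        final double c = constant;
        acc.replaceAll((doc, s) -> s + c);
        return acc;
    }

    public static void main(String[] args) {
        List<String> query = List.of("lucene", "smoothing");
        Map<Integer, Double> scores = postingsScores(query, 0.1);
        for (int doc : List.of(1, 2)) {
            System.out.println(doc + ": " + scores.get(doc) + " vs " + fullScore(query, doc, 0.1));
        }
    }
}
```

Document 1 never appears in the postings for "smoothing", yet its accumulated score matches the full smoothed score up to floating-point error. Documents that match no query term are never visited and would implicitly receive only the constant. Smoothing methods whose background weight depends on document length would need a per-document correction that this sketch does not cover.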

What can I do about that? What class should I concentrate on?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Smoothing-language-model-by-Lucene-tp3709311p3709311.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
