Hi all,

I've just published a tiny extension to Lucene 4.0, which enables a mixture of language models using standard FunctionQuery and ValueSource classes: https://github.com/nzhiltsov/lucene-mlm
I'd like you to assess the possibility of integrating this code into Lucene. I'd appreciate any comments or fixes.

NB: The implementation avoids using LMSimilarity on a per-field basis, because that would break the computation of correct Dirichlet priors for non-matching terms: the standard LMSimilarity class does not include them when calculating term frequencies and treats them as zero-probability entries.

--
Nikita Zhiltsov
Visiting Graduate Student
Emory University
Intelligent Information Access Lab
E500 Emerson Hall, Atlanta, Georgia, USA
Phone: (404) 834-5364
E-mail: znik...@emory.edu
---------------------------------------------------------------------
Graduate Student, Research Fellow
Kazan Federal University
Computational Linguistics Laboratory
Russia, 420008 Kazan, Prof. Nuzhina Str., 1/37 room 117
Skype: nickita.jhiltsov
Personal page: http://cll.niimm.ksu.ru/~nzhiltsov
E-mail: nikita.zhilt...@gmail.com
---------------------------------------------------------------------
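
To make the NB about Dirichlet priors concrete, here is a rough, self-contained sketch of the scoring idea in plain Java. It is not the code from the repository and does not use any Lucene classes. The class name MlmSketch, the method names, the field weights, and the value of mu are all hypothetical and chosen only for illustration. It assumes the usual Dirichlet-smoothed estimate (tf + mu * P(t|C)) / (|d| + mu) per field and a weighted sum over fields per query term.

```java
// Hypothetical illustration only: not taken from lucene-mlm, no Lucene dependency.
class MlmSketch {

    /** Dirichlet-smoothed P(t | field): (tf + mu * P(t|C)) / (fieldLength + mu). */
    static double dirichlet(long tf, long fieldLength, double collectionProb, double mu) {
        return (tf + mu * collectionProb) / (fieldLength + mu);
    }

    /** Mixture for one term: sum over fields of weight[f] * P(t | field f). */
    static double mixture(long[] tf, long[] fieldLength, double[] collectionProb,
                          double[] weight, double mu) {
        double p = 0.0;
        for (int f = 0; f < weight.length; f++) {
            p += weight[f] * dirichlet(tf[f], fieldLength[f], collectionProb[f], mu);
        }
        return p;
    }

    /** Query log-likelihood: sum of log mixture probabilities over ALL query terms. */
    static double queryLogLikelihood(long[][] tfPerTerm, long[] fieldLength,
                                     double[][] collectionProbPerTerm,
                                     double[] weight, double mu) {
        double score = 0.0;
        for (int t = 0; t < tfPerTerm.length; t++) {
            // Terms with tf == 0 in every field are still scored via the prior.
            score += Math.log(mixture(tfPerTerm[t], fieldLength,
                                      collectionProbPerTerm[t], weight, mu));
        }
        return score;
    }

    public static void main(String[] args) {
        double mu = 1000.0;                  // assumed smoothing parameter
        double[] weight = {0.3, 0.7};        // assumed weights for {title, body}
        long[] fieldLength = {10, 100};      // field lengths of one document

        // Term 0 occurs twice in the body; term 1 does not occur in the document.
        long[][] tf = {{0, 2}, {0, 0}};
        double[][] collectionProb = {{0.001, 0.002}, {0.0005, 0.0008}};

        System.out.println("P(term 1 | doc) = "
                + mixture(tf[1], fieldLength, collectionProb[1], weight, mu));
        System.out.println("log P(query | doc) = "
                + queryLogLikelihood(tf, fieldLength, collectionProb, weight, mu));
    }
}
```

The point of the sketch is that a query term with tf = 0 still contributes mu * P(t|C) / (|d| + mu) for each field, so the document's score depends on its field lengths even for terms it does not contain. Scoring only the matching terms per field would lose that contribution.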