I think this is the problem that you're running into, though maybe a person with more expertise can confirm...
ZP,

If you look at section 5.1 of the Zhai and Lafferty paper (http://www.cs.cmu.edu/~lafferty/pub/smooth-tois.ps), they note that the "term weight is log(1 + ((1-\lambda) p_ml(q_i|d)) / (\lambda p(q_i|C)))". p_ml is freq/docLen, so the code looks right, no?

The formula that you are looking at is the smoothed p(q_i|d) itself, but if you look at equation 6 (disregarding the constant at the end, which we don't need) and read the paragraph below it, you can see that a term weight in the full log p(q|d) calculation is more than just p(q_i|d).

The same goes for Dong's question on Dirichlet smoothing, which additionally uses a non-constant \alpha_d, making the math a bit trickier.

Peter

On Tue, Apr 2, 2013 at 12:46 PM, Zeynep P. <zp...@yahoo.com> wrote:
> Hi,
>
> I have the same question related to LMJelinekMercerSimiliarity class.
>
> protected float score(BasicStats stats, float freq, float docLen) {
>   return stats.getTotalBoost() *
>       (float)Math.log(1 + ((1 - lambda) * freq / docLen) / (lambda *
>       ((LMStats)stats).getCollectionProbability()));
> }
>
> score = Math.log((1 - lambda) * freq / docLen + lambda *
>     ((LMStats)stats).getCollectionProbability())
>
> I am also getting much worse results by updating the code like above.
>
> Why is it calculated this way?
>
> Thanks in advance,
>
> Best regards,
> ZP
>
> P.S: Instead of creating a new question, I used your question because I
> believe that the reason should be the same.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Scoring-function-in-LMDirichletSimilarity-Class-tp4052488p4053267.html
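
To make the point about the term weight concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes; the class name, the method names and all the numbers are made up for illustration. It only checks the identity log((1-\lambda) p_ml + \lambda p(q_i|C)) = log(1 + ((1-\lambda) p_ml) / (\lambda p(q_i|C))) + log(\lambda p(q_i|C)), where the last term does not depend on the document.

```java
// Illustrative only: plain Java, no Lucene classes. Names and numbers are made up.
final class JmSmoothingDemo {

    // Term weight in the form used by the quoted score() method (boost omitted):
    // log(1 + ((1 - lambda) * freq / docLen) / (lambda * p(q_i|C))).
    static double termWeight(double lambda, double freq, double docLen, double collProb) {
        return Math.log(1 + ((1 - lambda) * freq / docLen) / (lambda * collProb));
    }

    // Log of the smoothed probability itself:
    // log((1 - lambda) * freq / docLen + lambda * p(q_i|C)).
    static double logSmoothedProb(double lambda, double freq, double docLen, double collProb) {
        return Math.log((1 - lambda) * freq / docLen + lambda * collProb);
    }

    // Sum of term weights over ALL query terms (terms with freq == 0 contribute 0).
    static double sumTermWeights(double lambda, double[] freqs, double docLen, double[] collProbs) {
        double sum = 0;
        for (int i = 0; i < freqs.length; i++) {
            sum += termWeight(lambda, freqs[i], docLen, collProbs[i]);
        }
        return sum;
    }

    // Sum of log p(q_i|d) over ALL query terms, including those with freq == 0.
    static double sumLogProbs(double lambda, double[] freqs, double docLen, double[] collProbs) {
        double sum = 0;
        for (int i = 0; i < freqs.length; i++) {
            sum += logSmoothedProb(lambda, freqs[i], docLen, collProbs[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double lambda = 0.1;                    // hypothetical smoothing parameter
        double[] collProbs = {0.001, 0.0005};   // hypothetical p(q_i|C) for two query terms
        double[] freqsA = {3, 0};               // document A does not contain the second term
        double[] freqsB = {1, 2};
        double docLenA = 100, docLenB = 150;

        double weightDiff = sumTermWeights(lambda, freqsA, docLenA, collProbs)
                - sumTermWeights(lambda, freqsB, docLenB, collProbs);
        double logProbDiff = sumLogProbs(lambda, freqsA, docLenA, collProbs)
                - sumLogProbs(lambda, freqsB, docLenB, collProbs);

        // The two differences agree up to floating-point error, so both forms
        // order the two documents the same way.
        System.out.println("term weight difference A - B: " + weightDiff);
        System.out.println("log p(q|d) difference A - B:  " + logProbDiff);
    }
}
```

Note that the two forms only agree when log p(q_i|d) is summed over every query term, including the ones with freq = 0. If score() is only called for terms that actually occur in the document (which I believe is how Lucene works, but treat that as an assumption), the modified formula silently drops the log(\lambda p(q_i|C)) contribution of the missing terms, and that could explain the worse results.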