I think this is the problem that you're running into, though maybe a person
with more expertise can confirm...

ZP, if you look at section 5.1 of the Zhai and Lafferty paper (
http://www.cs.cmu.edu/~lafferty/pub/smooth-tois.ps), they note that the
"term weight is log(1 + (1-\lambda) p_ml(q_i|d) / (\lambda p(q_i|C)))". p_ml is
freq/docLen, so the code looks right, no?

The formula that you are looking at is the smoothed p(q_i|d) itself, but if
you look at equation 6 (disregarding the constant at the end, which we don't
need for ranking) and read the paragraph below it, you can see that a term
weight in the full log p(q|d) calculation is more than just log p(q_i|d).
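
To make that concrete, here is a small sketch (my own toy numbers, not Lucene
code) that checks the identity behind the term weight: for each query term,
log p(q_i|d) equals the term weight plus log(\lambda p(q_i|C)), and the latter
does not depend on the document. The class name, LAMBDA = 0.1, and the
frequencies and collection probabilities are all assumptions made up for the
illustration.

```java
public class JmRankEquivalence {
    // Illustrative smoothing weight; not taken from the thread or from Lucene defaults.
    static final double LAMBDA = 0.1;

    /** log p(q_i|d) under Jelinek-Mercer smoothing (the "naive" per-term value). */
    static double smoothedLogProb(double freq, double docLen, double collectionProb) {
        return Math.log((1 - LAMBDA) * freq / docLen + LAMBDA * collectionProb);
    }

    /** Term weight from section 5.1 of the paper; it is exactly 0 when freq == 0. */
    static double termWeight(double freq, double docLen, double collectionProb) {
        return Math.log(1 + ((1 - LAMBDA) * freq / docLen) / (LAMBDA * collectionProb));
    }

    /** Full log p(q|d) minus the sum of term weights, taken over ALL query terms. */
    static double offset(double[] freqs, double docLen, double[] collectionProbs) {
        double full = 0.0;
        double weights = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            full += smoothedLogProb(freqs[i], docLen, collectionProbs[i]);
            weights += termWeight(freqs[i], docLen, collectionProbs[i]);
        }
        return full - weights;
    }

    /** The document-independent part: sum over query terms of log(lambda * p(q_i|C)). */
    static double constantPart(double[] collectionProbs) {
        double sum = 0.0;
        for (double p : collectionProbs) {
            sum += Math.log(LAMBDA * p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two query terms with made-up collection probabilities.
        double[] collectionProbs = {0.001, 0.0005};
        // Doc A (length 100) misses the second term; doc B (length 250) has both.
        double[] docA = {3, 0};
        double[] docB = {1, 2};

        System.out.println("offset for doc A: " + offset(docA, 100, collectionProbs));
        System.out.println("offset for doc B: " + offset(docB, 250, collectionProbs));
        System.out.println("constant part:    " + constantPart(collectionProbs));
    }
}
```

The three printed values should agree up to floating-point rounding, so both
forms rank documents identically as long as the sum runs over every query
term. My guess at why the modified code does worse: the naive log p(q_i|d) is
nonzero (negative) even when the term is absent from the document, so if only
matching terms get scored, documents that miss a term never pay that penalty.
The term-weight form avoids this because an absent term contributes exactly 0.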

The same goes for Dong's question on Dirichlet smoothing, which also uses a
non-constant \alpha_d, making the math a bit trickier.
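
For the Dirichlet case, this is how I read equation 6 of the paper, written
out with the paper's symbols (c(q_i;d) is the term count in d, |d| the
document length, n the query length, \mu the Dirichlet prior). Take it as a
sketch of the derivation, not as a statement about what the Lucene class
computes line by line.

```latex
% General form (equation 6), with the last sum independent of d:
\log p(q \mid d)
  = \sum_{i:\, c(q_i;d) > 0} \log \frac{p_s(q_i \mid d)}{\alpha_d \, p(q_i \mid C)}
  + n \log \alpha_d
  + \sum_{i=1}^{n} \log p(q_i \mid C)

% Dirichlet smoothing:
p_s(q_i \mid d) = \frac{c(q_i;d) + \mu \, p(q_i \mid C)}{|d| + \mu},
\qquad
\alpha_d = \frac{\mu}{|d| + \mu}

% Substituting, the (|d| + \mu) factors cancel inside the ratio:
\frac{p_s(q_i \mid d)}{\alpha_d \, p(q_i \mid C)}
  = 1 + \frac{c(q_i;d)}{\mu \, p(q_i \mid C)}

% Rank-equivalent score:
\log p(q \mid d) \;\stackrel{\text{rank}}{=}\;
  \sum_{i:\, c(q_i;d) > 0} \log\!\left(1 + \frac{c(q_i;d)}{\mu \, p(q_i \mid C)}\right)
  + n \log \frac{\mu}{|d| + \mu}
```

With Jelinek-Mercer, \alpha_d = \lambda is the same for every document, so the
n log \alpha_d term can be dropped. With Dirichlet it depends on |d|, so it
has to stay in the score, which is the trickier part.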

Peter


On Tue, Apr 2, 2013 at 12:46 PM, Zeynep P. <zp...@yahoo.com> wrote:

> Hi,
>
> I have the same question about the LMJelinekMercerSimilarity class.
>
>   protected float score(BasicStats stats, float freq, float docLen) {
>     return stats.getTotalBoost() *
>         (float)Math.log(1 +  ((1 - lambda) * freq / docLen) / (lambda *
> ((LMStats)stats).getCollectionProbability()));
>   }
>
>  score = Math.log((1 - lambda) * freq / docLen + lambda *
> ((LMStats)stats).getCollectionProbability())
>
> I am also getting much worse results by updating the code like above.
>
> Why is it calculated this way?
>
> Thanks in advance,
>
> Best regards,
> ZP
>
> P.S: Instead of creating a new question, I used your question because I
> believe that the reason should be the same.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Scoring-function-in-LMDirichletSimilarity-Class-tp4052488p4053267.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
