[ https://issues.apache.org/jira/browse/LUCENE-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shayan Tabrizi updated LUCENE-7478:
-----------------------------------
    Description: 
It seems that the formula in LMDirichletSimilarity is wrong, or at least is not
the formula in the mentioned C.X. Zhai paper.

The main part of the formula in LMDirichletSimilarity is:
Math.log(1 + freq /
        (mu * ((LMStats)stats).getCollectionProbability())) +
        Math.log(mu / (docLen + mu))

which, after combining the two logarithms, is in fact the log of:
(mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))

while the formula in the paper is:
(mu*p(w|C)+c(w,d))/(|d| + mu)
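Here is the reduction written out step by step. It uses c(w,d) = freq, |d| = docLen and p(w|C) = getCollectionProbability(). The last step splits off the extra term relative to the log of the paper's estimate:

```latex
\log\Bigl(1 + \frac{c(w,d)}{\mu\,p(w \mid C)}\Bigr) + \log\frac{\mu}{|d| + \mu}
  = \log\frac{\mu\,p(w \mid C) + c(w,d)}{\mu\,p(w \mid C)} + \log\frac{\mu}{|d| + \mu}
  = \log\frac{\mu\,p(w \mid C) + c(w,d)}{p(w \mid C)\,(|d| + \mu)}
  = \log\frac{\mu\,p(w \mid C) + c(w,d)}{|d| + \mu} - \log p(w \mid C)
```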

So an extra p(w|C) effectively appears in the denominator.
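This small Java sketch checks the algebra above numerically. It is not Lucene code. The class name DirichletCheck, its method names and the sample values are made up for illustration. The variables mirror freq, docLen, mu and getCollectionProbability() from the quoted snippet. Anything outside that snippet, such as boosts, is ignored.

```java
/**
 * Hypothetical helper (not part of Lucene) that compares the quoted
 * LMDirichletSimilarity expression with the reporter's rewrite and with
 * the log of the paper's smoothed estimate.
 *   freq   = c(w,d), term frequency in the document
 *   docLen = |d|, document length
 *   mu     = Dirichlet prior
 *   pC     = p(w|C), collection probability
 */
final class DirichletCheck {

    /** The expression quoted from LMDirichletSimilarity (only what the snippet shows). */
    static double luceneExpression(double freq, double docLen, double mu, double pC) {
        return Math.log(1 + freq / (mu * pC)) + Math.log(mu / (docLen + mu));
    }

    /** The reporter's rewrite: log of (mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu)). */
    static double reporterRewrite(double freq, double docLen, double mu, double pC) {
        return Math.log((mu * pC + freq) / (pC * (docLen + mu)));
    }

    /** Log of the paper's estimate: (mu*p(w|C)+c(w,d))/(|d| + mu). */
    static double logPaperEstimate(double freq, double docLen, double mu, double pC) {
        return Math.log((mu * pC + freq) / (docLen + mu));
    }

    public static void main(String[] args) {
        // Made-up sample statistics, for illustration only
        double freq = 3, docLen = 120, mu = 2000, pC = 1e-4;

        double lucene = luceneExpression(freq, docLen, mu, pC);
        double rewrite = reporterRewrite(freq, docLen, mu, pC);
        double paper = logPaperEstimate(freq, docLen, mu, pC);

        System.out.println("lucene expression   = " + lucene);
        System.out.println("reporter's rewrite  = " + rewrite);
        System.out.println("log(paper estimate) = " + paper);
        // The gap between the quoted expression and the paper's estimate
        System.out.println("lucene - paper      = " + (lucene - paper));
        System.out.println("-log p(w|C)         = " + (-Math.log(pC)));
    }
}
```

When run, the first two printed values should agree up to rounding. The last two should also agree. That matches the claim that the quoted expression equals the log of the paper's estimate divided by p(w|C).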

  was:
It seems that the formula in LMDirichletSimilarity is wrong, or at least is not
the formula in the mentioned C.X. Zhai paper.

The main part of the formula in LMDirichletSimilarity is:
Math.log(1 + freq /
        (mu * ((LMStats)stats).getCollectionProbability())) +
        Math.log(mu / (docLen + mu))

which, after combining the two logarithms, is in fact the log of:
(mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))

while the formula in the paper is:
(mu*p(w|C)+c(w,d))/(|d| + mu)

So an extra p(w|C) effectively appears in the denominator.


> Wrong Formula in LMDirichletSimilarity
> --------------------------------------
>
>                 Key: LUCENE-7478
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7478
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Shayan Tabrizi
>            Priority: Critical
>
> It seems that the formula in LMDirichletSimilarity is wrong, or at least is
> not the formula in the mentioned C.X. Zhai paper.
> The main part of the formula in LMDirichletSimilarity is:
> Math.log(1 + freq /
>         (mu * ((LMStats)stats).getCollectionProbability())) +
>         Math.log(mu / (docLen + mu))
> which, after combining the two logarithms, is in fact the log of:
> (mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))
> while the formula in the paper is:
> (mu*p(w|C)+c(w,d))/(|d| + mu)
> So an extra p(w|C) effectively appears in the denominator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
