[ https://issues.apache.org/jira/browse/LUCENE-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shayan Tabrizi updated LUCENE-7478:
-----------------------------------
    Description:
It seems that the formula in LMDirichletSimilarity is wrong or at least is not the formula in the mentioned C.X. Zhai paper.
The main part of formula in LMDirichletSimilarity is:
1 + freq / (mu * ((LMStats)stats).getCollectionProbability())) + Math.log(mu / (docLen + mu)
which is in fact:
(mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))
while the main formula is:
(mu*p(w|C)+c(w,d))/(|d| + mu)
So a p(w|C) is practically added to the formula.

  was:
It seems that the formula in LMDirichletSimilarity is wrong or at least is not the formula in the mentioned C.X. Zhai paper.
The main part of formula in LMDirichletSimilarity is:
Math.log(1 + freq / (mu * ((LMStats)stats).getCollectionProbability())) + Math.log(mu / (docLen + mu))
which is in fact:
(mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))
while the main formula is:
(mu*p(w|C)+c(w,d))/(|d| + mu)
So a p(w|C) is practically added to the formula.

> Wrong Formula in LMDirichletSimilarity
> --------------------------------------
>
>                 Key: LUCENE-7478
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7478
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Shayan Tabrizi
>            Priority: Critical
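
For anyone who wants to check the algebra in the description: log(1 + c/(mu*p)) equals log((mu*p + c)/(mu*p)), and adding log(mu/(|d| + mu)) cancels the mu, leaving log((mu*p + c)/(p*(|d| + mu))). The sketch below compares the quoted expression numerically with the two ratios from the description. It is only an illustration under stated assumptions: the class name, the method names and the sample values are made up for this check and are not part of Lucene, and any boost factor is left out.

```java
public class DirichletFormulaCheck {

    // Expression quoted in the issue, in its fully parenthesized form:
    // log(1 + freq / (mu * p(w|C))) + log(mu / (docLen + mu))
    public static double quotedExpression(double freq, double docLen,
                                          double mu, double collectionProb) {
        return Math.log(1 + freq / (mu * collectionProb))
             + Math.log(mu / (docLen + mu));
    }

    // Ratio the issue says the code corresponds to:
    // (mu*p(w|C) + c(w,d)) / (p(w|C) * (|d| + mu))
    public static double ratioWithCollectionProb(double freq, double docLen,
                                                 double mu, double collectionProb) {
        return (mu * collectionProb + freq) / (collectionProb * (docLen + mu));
    }

    // Ratio the issue gives as the main formula:
    // (mu*p(w|C) + c(w,d)) / (|d| + mu)
    public static double ratioMainFormula(double freq, double docLen,
                                          double mu, double collectionProb) {
        return (mu * collectionProb + freq) / (docLen + mu);
    }

    public static void main(String[] args) {
        // Arbitrary illustrative values, not taken from any real index.
        double freq = 3, docLen = 100, mu = 2000, p = 0.001;

        double quoted = quotedExpression(freq, docLen, mu, p);
        double withP  = Math.log(ratioWithCollectionProb(freq, docLen, mu, p));
        double main   = Math.log(ratioMainFormula(freq, docLen, mu, p));

        System.out.println("quoted expression        = " + quoted);
        System.out.println("log(ratio with p(w|C))   = " + withP);
        System.out.println("log(main formula ratio)  = " + main);
        // The first two agree up to rounding; they differ from the third
        // by -log(p(w|C)), i.e. by the extra p(w|C) in the denominator.
        System.out.println("difference               = " + (withP - main));
        System.out.println("-log(p(w|C))             = " + (-Math.log(p)));
    }
}
```

The extra term depends only on the term's collection probability, not on the document, which is the difference the description points at.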