CC Leah, who added the Bernoulli option to MLlib's NaiveBayes. -Xiangrui

On Wed, Apr 15, 2015 at 4:49 AM, 姜林和 <linhe_ji...@163.com> wrote:

>
> Dear Meng,
>     Thanks for the great work on Spark machine learning. I saw the
> changes to the NaiveBayes algorithm that separate it into a multinomial
> model and a Bernoulli model, but something confuses me:
>
> The calculations of
> P(Ci) -- pi(i)
> P(j|Ci) -- theta(i,j)
>
> both differ between the multinomial and Bernoulli models, but I can see
> only theta(i,j) being calculated differently, not pi(i).
>
>
> Bernoulli:
> The values of the original feature vector of document i must be 0 or 1;
> a 1 at position j means that word j exists in document i.
>
> pi(i) = (number of documents of class C(i) + lambda)
>         / (number of documents of all classes + 2 * lambda)
> theta(i)(j) = (number of documents of class C(i) in which j exists + lambda)
>               / (number of documents of class C(i) + 2 * lambda)
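
A minimal sketch of the Bernoulli estimates exactly as defined above, in Python (this is not MLlib's implementation). The function name `bernoulli_params` and its input format, a list of 0/1 feature lists plus a parallel list of labels, are hypothetical choices for illustration.

```python
import math  # not needed for the ratios, kept for parity with log-space variants


def bernoulli_params(docs, labels, lam=1.0):
    """Estimate pi(i) and theta(i)(j) with the Bernoulli definitions above.

    docs   -- list of 0/1 feature vectors, one per document (hypothetical format)
    labels -- class label of each document
    lam    -- smoothing parameter lambda
    """
    classes = sorted(set(labels))
    num_features = len(docs[0])
    n_docs = len(docs)
    pi, theta = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        n_c = len(class_docs)
        # pi(i) = (documents of C(i) + lambda) / (documents of all classes + 2 * lambda)
        pi[c] = (n_c + lam) / (n_docs + 2 * lam)
        theta[c] = []
        for j in range(num_features):
            # The Bernoulli model requires every feature value to be 0 or 1.
            if any(d[j] not in (0, 1) for d in class_docs):
                raise ValueError("Bernoulli features must be 0 or 1")
            docs_with_j = sum(d[j] for d in class_docs)
            # theta(i)(j) = (documents of C(i) containing j + lambda)
            #               / (documents of C(i) + 2 * lambda)
            theta[c].append((docs_with_j + lam) / (n_c + 2 * lam))
    return pi, theta
```

For example, with documents [[1, 0], [1, 1], [0, 1]], labels ["a", "a", "b"] and lambda = 1, this gives pi(a) = 0.6, pi(b) = 0.4 and theta(a) = [0.75, 0.5]. With exactly two classes, as here, the pi(i) values sum to 1.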
>
> Multinomial:
>
> pi(i) = (number of words of class C(i) + lambda)
>         / (number of words of all classes + numFeatures * lambda)
> theta(i)(j) = (number of occurrences of word j in class C(i) + lambda)
>               / (number of words in class C(i) + numFeatures * lambda)
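
A minimal sketch of the multinomial estimates as proposed above, where pi(i) is based on word counts rather than document counts. It is written in Python and is not MLlib's code; the function name `multinomial_params` and the input format (a list of per-document term-count lists plus labels) are hypothetical.

```python
def multinomial_params(docs, labels, lam=1.0):
    """Estimate pi(i) and theta(i)(j) with the multinomial definitions above.

    docs   -- list of term-count vectors, one per document (hypothetical format)
    labels -- class label of each document
    lam    -- smoothing parameter lambda
    """
    classes = sorted(set(labels))
    num_features = len(docs[0])
    # Per-class count of each word j, summed over that class's documents.
    term_freqs = {c: [0.0] * num_features for c in classes}
    for d, y in zip(docs, labels):
        for j, count in enumerate(d):
            term_freqs[y][j] += count
    num_all_words = sum(sum(tf) for tf in term_freqs.values())
    pi, theta = {}, {}
    for c in classes:
        words_of_c = sum(term_freqs[c])
        # pi(i) = (words of C(i) + lambda) / (words of all classes + numFeatures * lambda)
        pi[c] = (words_of_c + lam) / (num_all_words + num_features * lam)
        # theta(i)(j) = (occurrences of j in C(i) + lambda)
        #               / (words of C(i) + numFeatures * lambda)
        theta[c] = [(tf + lam) / (words_of_c + num_features * lam)
                    for tf in term_freqs[c]]
    return pi, theta
```

For example, with documents [[2, 1], [0, 3], [1, 0]], labels ["a", "a", "b"] and lambda = 1, class "a" has 6 words out of 7, so pi(a) = 7/9 and theta(a) = [0.375, 0.625].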
>
> A comparison of the two algorithms (for each quantity, the first line is
> its definition and the second its log-space expression):
>
> pi(i)
>   Multinomial: number of words of class C(i)
>                math.log(numAllWordsOfC + lambda) - piLogDenom
>   Bernoulli:   number of documents of class C(i)
>                math.log(n + lambda) - piLogDenom
>
> piLogDenom
>   Multinomial: number of words of all classes
>                math.log(numAllWords + numFeatures * lambda)
>   Bernoulli:   number of documents of all classes
>                math.log(numDocuments + 2 * lambda)
>
> theta(i)(j)
>   Multinomial: number of occurrences of word j in class C(i)
>                math.log(sumTermFreqs(j) + lambda) - thetaLogDenom
>   Bernoulli:   number of documents of class C(i) in which j exists
>                math.log(sumTermFreqs(j) + lambda) - thetaLogDenom
>
> thetaLogDenom
>   Multinomial: number of words in class C(i)
>                math.log(numAllWordsOfC + numFeatures * lambda)
>   Bernoulli:   number of documents of class C(i)
>                math.log(n + 2 * lambda)
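
The log-space expressions in the comparison can be sketched as one Python function per class C(i), assuming the variable names in the email (numAllWordsOfC, n, numDocuments, numAllWords, sumTermFreqs) mean what their definitions say; they have not been checked against the MLlib source. The function `log_params` and its snake_case argument names are hypothetical. Subtracting a shared log denominator is the same as taking the log of the ratio, so each numerator is computed once per term.

```python
import math


def log_params(model, lam, num_features, sum_term_freqs,
               n=0, num_documents=0, num_all_words_of_c=0, num_all_words=0):
    """Log-space pi(i) and theta(i)(j) for one class C(i).

    sum_term_freqs -- per-feature counts for C(i): word counts (multinomial)
                      or counts of documents containing j (Bernoulli)
    n              -- number of documents of C(i)
    num_documents  -- number of documents of all classes
    num_all_words_of_c, num_all_words -- word totals for C(i) / all classes
    """
    if model == "multinomial":
        pi_log_denom = math.log(num_all_words + num_features * lam)
        pi = math.log(num_all_words_of_c + lam) - pi_log_denom
        theta_log_denom = math.log(num_all_words_of_c + num_features * lam)
    elif model == "bernoulli":
        pi_log_denom = math.log(num_documents + 2 * lam)
        pi = math.log(n + lam) - pi_log_denom
        theta_log_denom = math.log(n + 2 * lam)
    else:
        raise ValueError("unknown model: " + model)
    # Both models share the same theta numerator form; only the denominator differs.
    theta = [math.log(tf + lam) - theta_log_denom for tf in sum_term_freqs]
    return pi, theta
```

For example, `log_params("bernoulli", 1.0, 2, [2, 1], n=2, num_documents=3)` returns pi = log(3) - log(5) = log(0.6), and exponentiating the theta values gives [0.75, 0.5].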
>
> Best regards,
>
>     Linhe Jiang
