[ 
https://issues.apache.org/jira/browse/SPARK-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276380#comment-14276380
 ] 

RJ Nowling commented on SPARK-4894:
-----------------------------------

Hi @lmcguire,

Always happy to have more help! :)

I started looking through the Spark NB functions but haven't started writing 
code yet.  The docs for NB mention that using binary features will cause the 
multinomial NB to act like Bernoulli NB.  I don't believe the documentation is 
correct, at least when smoothing is used, since P(0) != 1 - P(1).  I was 
planning to compare the sklearn implementation with the Spark implementation 
to show that the docs are wrong.  Once that is verified, I think adding a 
Bernoulli mode controlled by a flag in the constructor will need only very 
small changes.
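
To make the concern concrete, here is a rough sketch of the kind of comparison 
I have in mind.  It is plain Python, not Spark or sklearn code: the function 
names, the toy counts, and alpha = 1.0 are all made up for illustration, and 
the smoothed estimates are the usual textbook ones with equal class priors 
assumed.  The multinomial score ignores absent features entirely, while the 
Bernoulli score charges log(1 - p) for each one, so the two can disagree even 
on purely binary input.

```python
import math

ALPHA = 1.0  # Laplace smoothing; assumed value for this illustration


def multinomial_log_lik(x, counts, alpha=ALPHA):
    """Class-conditional log-likelihood under multinomial NB (up to a constant).

    theta_j = (count_j + alpha) / (sum(counts) + alpha * n_features)
    Absent features (x_j == 0) contribute nothing to the score.
    """
    total = sum(counts)
    n_features = len(counts)
    score = 0.0
    for xj, c in zip(x, counts):
        theta = (c + alpha) / (total + alpha * n_features)
        score += xj * math.log(theta)
    return score


def bernoulli_log_lik(x, counts, num_docs, alpha=ALPHA):
    """Class-conditional log-likelihood under Bernoulli NB.

    p_j = (count_j + alpha) / (num_docs + 2 * alpha)
    Absent features contribute log(1 - p_j).
    """
    score = 0.0
    for xj, c in zip(x, counts):
        p = (c + alpha) / (num_docs + 2 * alpha)
        score += math.log(p) if xj else math.log(1.0 - p)
    return score


# Hypothetical toy data: counts[j] = number of documents in the class
# in which binary feature j is present.
classes = {
    "A": {"num_docs": 10, "counts": [10, 8, 8]},
    "B": {"num_docs": 10, "counts": [3, 3, 3]},
}


def predict_multinomial(x):
    # Equal priors assumed, so the prior term is dropped.
    return max(classes, key=lambda k: multinomial_log_lik(x, classes[k]["counts"]))


def predict_bernoulli(x):
    return max(
        classes,
        key=lambda k: bernoulli_log_lik(x, classes[k]["counts"], classes[k]["num_docs"]),
    )


x = [1, 0, 0]  # only feature 0 is present
print(predict_multinomial(x))  # A
print(predict_bernoulli(x))    # B
```

In this toy case the multinomial model prefers A because feature 0 has the 
larger smoothed share there (11/29 vs 4/12), while the Bernoulli model prefers 
B because features 1 and 2 are common in A and their absence is penalized.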

I won't get to this until next week, though.  If you have time now and want to 
tackle this, I'd be happy to hand it over to you and review any patches.  (I'm 
not a committer, though -- [~mengxr] would have to sign off.)  Otherwise, if 
you want to wait until I have a patch and test it, that could work, too.  What 
do you think?

> Add Bernoulli-variant of Naive Bayes
> ------------------------------------
>
>                 Key: SPARK-4894
>                 URL: https://issues.apache.org/jira/browse/SPARK-4894
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.1.1
>            Reporter: RJ Nowling
>
> MLlib only supports the multinomial-variant of Naive Bayes.  The Bernoulli 
> version of Naive Bayes is more useful for situations where the features are 
> binary values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
