[ 
https://issues.apache.org/jira/browse/SPARK-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277702#comment-14277702
 ] 

Joseph K. Bradley commented on SPARK-4894:
------------------------------------------

[~rnowling] Thanks for looking into this issue!  I was thinking about two 
possibilities for generalizing NaiveBayes:
* Specify the model type with simple strings, and keep current API
** This is simpler and maintains API stability.
* Generalize the model to allow other feature and label types
** This is what we really should do long-term since NaiveBayes should not be 
limited to discrete labels.
** This would require more work, including defining a Factor concept (using the 
terminology from Probabilistic Graphical Models) and updating the NaiveBayes 
API to use factors.  Different factors would handle discrete and/or continuous 
variables, and would encode different types of distributions.  This setup is 
common in graphical model libraries like 
[Factorie|http://factorie.cs.umass.edu/index.html].
** Alternative: We could separate NaiveBayes into two classes based on discrete 
and continuous labels, but that might be even more work in the long term (to 
maintain two copies of the API).
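
To make the first option concrete, here is a rough, self-contained sketch of a 
trainer that selects the variant with a plain string. It is written in Python 
only for brevity and is not MLlib code: the names train, predict, model_type 
and smoothing are hypothetical, and it assumes dense, non-negative feature 
vectors and smoothing > 0.

```python
import math

# Hypothetical set of accepted strings; not taken from MLlib.
MODEL_TYPES = ("multinomial", "bernoulli")


def train(points, smoothing=1.0, model_type="multinomial"):
    """Fit a toy Naive Bayes model.

    points: list of (label, feature_vector) pairs.
    smoothing: additive (Laplace) smoothing, assumed > 0.
    model_type: plain string selecting the variant.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of {MODEL_TYPES}"
        )
    num_features = len(points[0][1])
    labels = sorted({label for label, _ in points})
    counts = {c: 0 for c in labels}
    sums = {c: [0.0] * num_features for c in labels}
    for label, features in points:
        if model_type == "bernoulli" and any(x not in (0, 1) for x in features):
            raise ValueError("bernoulli model requires 0/1 features")
        counts[label] += 1
        for j, x in enumerate(features):
            sums[label][j] += x

    n = len(points)
    log_prior = {
        c: math.log((counts[c] + smoothing) / (n + smoothing * len(labels)))
        for c in labels
    }
    log_theta = {}
    for c in labels:
        if model_type == "multinomial":
            # Normalize by the total feature mass observed in the class.
            denom = sum(sums[c]) + smoothing * num_features
        else:
            # Normalize by the number of examples observed in the class.
            denom = counts[c] + 2.0 * smoothing
        log_theta[c] = [math.log((s + smoothing) / denom) for s in sums[c]]
    return {
        "model_type": model_type,
        "labels": labels,
        "log_prior": log_prior,
        "log_theta": log_theta,
    }


def predict(model, features):
    """Return the label with the highest log posterior score."""
    best_label, best_score = None, -math.inf
    for c in model["labels"]:
        score = model["log_prior"][c]
        for x, log_p in zip(features, model["log_theta"][c]):
            if model["model_type"] == "multinomial":
                score += x * log_p
            else:
                # Bernoulli also scores absent features via log(1 - p).
                score += log_p if x == 1 else math.log1p(-math.exp(log_p))
        if score > best_score:
            best_label, best_score = c, score
    return best_label
```

Existing callers would keep calling train(points) and get the multinomial 
model as before; train(points, model_type="bernoulli") opts into the new 
variant without any change to the rest of the signature.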

What do you think?

> Add Bernoulli-variant of Naive Bayes
> ------------------------------------
>
>                 Key: SPARK-4894
>                 URL: https://issues.apache.org/jira/browse/SPARK-4894
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: RJ Nowling
>            Assignee: RJ Nowling
>
> MLlib only supports the multinomial-variant of Naive Bayes.  The Bernoulli 
> version of Naive Bayes is more useful for situations where the features are 
> binary values.



