[ https://issues.apache.org/jira/browse/SPARK-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277702#comment-14277702 ]
Joseph K. Bradley commented on SPARK-4894:
------------------------------------------

[~rnowling] Thanks for looking into this issue! I was thinking about 2 possibilities for generalizing NaiveBayes:
* Specify the model type with simple strings, and keep current API
** This is simpler and maintains API stability.
* Generalize the model to allow other feature and label types
** This is what we really should do long-term since NaiveBayes should not be limited to discrete labels.
** This would require more work, including defining a Factor concept (using the terminology from Probabilistic Graphical Models) and updating the NaiveBayes API to use factors. Different factors would handle discrete and/or continuous variables, and would encode different types of distributions. This setup is common in graphical model libraries like [Factorie|http://factorie.cs.umass.edu/index.html].
** Alternative: We could separate NaiveBayes into 2 classes based on discrete and continuous labels, but that might be even more work in the long term (to maintain 2 copies of the API).

What do you think?

> Add Bernoulli-variant of Naive Bayes
> ------------------------------------
>
>                 Key: SPARK-4894
>                 URL: https://issues.apache.org/jira/browse/SPARK-4894
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: RJ Nowling
>            Assignee: RJ Nowling
>
> MLlib only supports the multinomial-variant of Naive Bayes. The Bernoulli
> version of Naive Bayes is more useful for situations where the features are
> binary values.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
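
To make the two possibilities in the comment concrete, here is a rough, library-independent sketch in plain Python. Nothing in it is MLlib code: `SUPPORTED_MODEL_TYPES`, `check_model_type`, `Factor`, `BernoulliFactor`, `GaussianFactor`, and `predict` are all hypothetical names chosen for illustration. The first part shows what selecting the variant with a simple string might look like; the second part shows one way a Factor abstraction could let discrete and continuous features be mixed in a single Naive Bayes model.

```python
import math
from abc import ABC, abstractmethod

# Option 1 (hypothetical): select the variant with a plain string.
SUPPORTED_MODEL_TYPES = ("multinomial", "bernoulli")

def check_model_type(model_type):
    """Validate a model-type string; raises ValueError if unknown."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError("unknown model type: %r" % model_type)
    return model_type

# Option 2 (hypothetical): one Factor per feature and label, each
# encoding its own distribution over that feature's values.
class Factor(ABC):
    @abstractmethod
    def log_likelihood(self, x):
        """Return log p(x | label) for this feature."""

class BernoulliFactor(Factor):
    """Binary feature: p is the probability that the feature is 1."""
    def __init__(self, p):
        self.p = p

    def log_likelihood(self, x):
        return math.log(self.p) if x != 0 else math.log(1.0 - self.p)

class GaussianFactor(Factor):
    """Continuous feature modeled as a normal distribution."""
    def __init__(self, mean, var):
        self.mean = mean
        self.var = var

    def log_likelihood(self, x):
        return -0.5 * (math.log(2.0 * math.pi * self.var)
                       + (x - self.mean) ** 2 / self.var)

def predict(log_priors, factors, features):
    """Return the label maximizing log prior + sum of factor log-likelihoods.

    log_priors: dict mapping label -> log prior
    factors:    dict mapping label -> list of Factor, one per feature
    """
    scores = {
        label: log_prior + sum(
            f.log_likelihood(x) for f, x in zip(factors[label], features)
        )
        for label, log_prior in log_priors.items()
    }
    return max(scores, key=scores.get)
```

With this shape, the prediction rule never needs to know which distribution a feature uses, which is the appeal of the factor-based option; the string-based option is much smaller but can only choose among variants that are built in.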