spark git commit: [MINOR][ML][DOC] Improved Naive Bayes user guide explanation

jkbradley Wed, 09 May 2018 10:35:33 -0700

Repository: spark
Updated Branches:
  refs/heads/master 6ea582e36 -> 94155d039



[MINOR][ML][DOC] Improved Naive Bayes user guide explanation

## What changes were proposed in this pull request?

This copies the material from the spark.mllib user guide page for Naive Bayes 
to the spark.ml user guide page.  I also improved the wording and organization 
slightly.

## How was this patch tested?

Built docs locally.

Author: Joseph K. Bradley <jos...@databricks.com>

Closes #21272 from jkbradley/nb-doc-update.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/94155d03
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/94155d03
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/94155d03

Branch: refs/heads/master
Commit: 94155d0395324a012db2fc8a57edb3cd90b61e96
Parents: 6ea582e
Author: Joseph K. Bradley <jos...@databricks.com>
Authored: Wed May 9 10:34:57 2018 -0700
Committer: Joseph K. Bradley <jos...@databricks.com>
Committed: Wed May 9 10:34:57 2018 -0700

----------------------------------------------------------------------
 docs/ml-classification-regression.md | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/94155d03/docs/ml-classification-regression.md
----------------------------------------------------------------------
diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md
index d660655..b3d1090 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -455,11 +455,29 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat
 ## Naive Bayes
 
 [Naive Bayes classifiers](http://en.wikipedia.org/wiki/Naive_Bayes_classifier) are a family of simple 
-probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence 
-assumptions between the features. The `spark.ml` implementation currently supports both [multinomial
-naive Bayes](http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html)
+probabilistic, multiclass classifiers based on applying Bayes' theorem with strong (naive) independence 
+assumptions between every pair of features.
+
+Naive Bayes can be trained very efficiently. With a single pass over the training data,
+it computes the conditional probability distribution of each feature given each label.
+For prediction, it applies Bayes' theorem to compute the conditional probability distribution
+of each label given an observation.
+
+MLlib supports both [multinomial naive Bayes](http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Multinomial_naive_Bayes)
 and [Bernoulli naive Bayes](http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html).
-More information can be found in the section on [Naive Bayes in MLlib](mllib-naive-bayes.html#naive-bayes-sparkmllib).
+
+*Input data*:
+These models are typically used for [document classification](http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html).
+Within that context, each observation is a document and each feature represents a term.
+A feature's value is the frequency of the term (in multinomial Naive Bayes) or
+a zero or one indicating whether the term was found in the document (in Bernoulli Naive Bayes).
+Feature values must be *non-negative*. The model type is selected with an optional parameter
+"multinomial" or "bernoulli" with "multinomial" as the default.
+For document classification, the input feature vectors should usually be sparse vectors.
+Since the training data is only used once, it is not necessary to cache it.
+
+[Additive smoothing](http://en.wikipedia.org/wiki/Lidstone_smoothing) can be used by
+setting the parameter $\lambda$ (default to $1.0$). 
 
 **Examples**
 
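
The new guide text describes training and prediction in words only. Written out under the standard textbook formulation (the patch gives no formulas, so this notation is an assumption), let $N_c$ of the $N$ training documents have label $c$, let there be $n$ features, let $T_{cj}$ be the summed count of term $j$ over label-$c$ documents, and let $D_{cj}$ be the number of label-$c$ documents containing term $j$. Additive smoothing with $\lambda$ (exposed in `spark.ml` as the `smoothing` parameter of `NaiveBayes`) then gives:

```latex
\hat{y} = \arg\max_{c} \Big[ \log P(c) + \log P(\mathbf{x} \mid c) \Big], \qquad P(c) = \frac{N_c}{N}

% Multinomial: x_j is the term frequency; the multinomial coefficient does not depend on c
\theta_{cj} = \frac{T_{cj} + \lambda}{\sum_{k=1}^{n} T_{ck} + n\lambda}, \qquad
\log P(\mathbf{x} \mid c) = \text{const} + \sum_{j=1}^{n} x_j \log \theta_{cj}

% Bernoulli: x_j \in \{0, 1\} marks whether term j occurs in the document
\theta_{cj} = \frac{D_{cj} + \lambda}{N_c + 2\lambda}, \qquad
\log P(\mathbf{x} \mid c) = \sum_{j=1}^{n} \Big[ x_j \log \theta_{cj} + (1 - x_j) \log (1 - \theta_{cj}) \Big]
```

With $\lambda = 1$ (the default stated in the text) this is Laplace smoothing; any $\lambda > 0$ keeps every $\theta_{cj}$ strictly positive, and in the Bernoulli case strictly below 1, so no log term is undefined.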

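A minimal sketch of the multinomial training and prediction steps described in the new guide text, written in plain Python rather than Spark. Everything here is an assumption for illustration: the helper names `train_multinomial_nb` and `predict` are hypothetical, features are dense lists of term counts instead of Spark vectors, and the label prior is left unsmoothed. It makes exactly one pass over the training data to accumulate per-label term totals, applies additive smoothing with `lam` (the text's $\lambda$), and predicts by maximizing the log posterior given by Bayes' theorem.

```python
import math
from collections import defaultdict


# Sketch only: hypothetical helpers, not Spark's implementation or API.
def train_multinomial_nb(data, lam=1.0):
    """One pass over (label, features) pairs; features are non-negative term counts."""
    label_counts = defaultdict(int)  # number of documents per label
    term_totals = {}                 # per-label summed count of each term
    num_features = None
    for label, features in data:     # the only pass over the training data
        if any(v < 0 for v in features):
            raise ValueError("feature values must be non-negative")
        if num_features is None:
            num_features = len(features)
        label_counts[label] += 1
        totals = term_totals.setdefault(label, [0.0] * num_features)
        for j, v in enumerate(features):
            totals[j] += v
    n_docs = sum(label_counts.values())
    log_prior = {c: math.log(k / n_docs) for c, k in label_counts.items()}
    log_theta = {}
    for c, totals in term_totals.items():
        # Additive (Lidstone) smoothing: (count + lam) / (total + n * lam).
        denom = sum(totals) + lam * num_features
        log_theta[c] = [math.log((t + lam) / denom) for t in totals]
    return log_prior, log_theta


def predict(model, features):
    """Bayes' theorem in log space: argmax over labels of log P(c) + sum x_j log theta_cj."""
    log_prior, log_theta = model
    scores = {
        c: log_prior[c] + sum(v * lt for v, lt in zip(features, log_theta[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Usage: three-term vocabulary, two labels.
train = [
    ("spam", [3, 0, 1]),
    ("spam", [2, 0, 0]),
    ("ham", [0, 2, 1]),
    ("ham", [0, 3, 0]),
]
model = train_multinomial_nb(train, lam=1.0)
print(predict(model, [2, 0, 1]))  # prints: spam
```

Working in log space avoids underflow when many per-term probabilities are multiplied. With `lam=0`, a term never seen with a label would get probability zero (and `math.log` would raise), which is exactly what the smoothing parameter prevents.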

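Compared with the multinomial case, the Bernoulli model described in the new text takes each feature as 0 or 1, and an absent term also counts as evidence: it contributes $\log(1 - p)$ to a label's score. A minimal plain-Python sketch under stated assumptions: the helpers `train_bernoulli_nb` and `predict_bernoulli` are hypothetical, inputs are dense 0/1 lists, the prior is unsmoothed, and the per-term estimate follows the standard textbook form (label-$c$ documents containing the term, plus $\lambda$) divided by (label-$c$ documents, plus $2\lambda$).

```python
import math
from collections import defaultdict


# Sketch only: hypothetical helpers, not Spark's implementation or API.
def train_bernoulli_nb(data, lam=1.0):
    """One pass over (label, features) pairs; each feature must be 0 or 1."""
    doc_counts = defaultdict(int)  # documents per label
    present = {}                   # per-label count of documents containing each term
    num_features = None
    for label, features in data:
        if any(v not in (0, 1) for v in features):
            raise ValueError("Bernoulli features must be 0 or 1")
        if num_features is None:
            num_features = len(features)
        doc_counts[label] += 1
        counts = present.setdefault(label, [0] * num_features)
        for j, v in enumerate(features):
            counts[j] += v
    n_docs = sum(doc_counts.values())
    log_prior = {c: math.log(k / n_docs) for c, k in doc_counts.items()}
    # Smoothed P(term j present | label c) = (count + lam) / (docs + 2 * lam).
    prob = {
        c: [(n + lam) / (doc_counts[c] + 2 * lam) for n in counts]
        for c, counts in present.items()
    }
    return log_prior, prob


def predict_bernoulli(model, features):
    """Present terms add log(p); absent terms add log(1 - p)."""
    log_prior, prob = model
    scores = {}
    for c, p in prob.items():
        score = log_prior[c]
        for v, pj in zip(features, p):
            score += math.log(pj) if v == 1 else math.log(1.0 - pj)
        scores[c] = score
    return max(scores, key=scores.get)


# Usage: 1 means the term occurs in the document, 0 means it does not.
bern_train = [
    ("spam", [1, 0, 1]),
    ("spam", [1, 0, 0]),
    ("ham", [0, 1, 1]),
    ("ham", [0, 1, 0]),
]
bern_model = train_bernoulli_nb(bern_train, lam=1.0)
print(predict_bernoulli(bern_model, [1, 0, 0]))  # prints: spam
```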