Repository: spark
Updated Branches:
  refs/heads/branch-1.3 b570d98e8 -> 6a2fc85e0
[SPARK-6083] [MLLib] [DOC] Make Python API example consistent in NaiveBayes

Author: MechCoder <manojkumarsivaraj...@gmail.com>

Closes #4834 from MechCoder/spark-6083 and squashes the following commits:

1cdd7b5 [MechCoder] Add parse function
65bbbe9 [MechCoder] [SPARK-6083] Make Python API example consistent in NaiveBayes

(cherry picked from commit 3f00bb3ef1384fabf86a68180d40a1a515f6f5e3)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a2fc85e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a2fc85e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a2fc85e

Branch: refs/heads/branch-1.3
Commit: 6a2fc85e07b8804e1c0ad8de3fef44e21ad0fd7d
Parents: b570d98
Author: MechCoder <manojkumarsivaraj...@gmail.com>
Authored: Sun Mar 1 16:28:15 2015 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Sun Mar 1 16:28:25 2015 -0800

----------------------------------------------------------------------
 docs/mllib-naive-bayes.md | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/6a2fc85e/docs/mllib-naive-bayes.md
----------------------------------------------------------------------
diff --git a/docs/mllib-naive-bayes.md b/docs/mllib-naive-bayes.md
index 5224a0b..55b8f2c 100644
--- a/docs/mllib-naive-bayes.md
+++ b/docs/mllib-naive-bayes.md
@@ -115,22 +115,28 @@ used for evaluation and prediction.
 
 Note that the Python API does not yet support model save/load but will in the future.
 
-<!-- TODO: Make Python's example consistent with Scala's and Java's. -->
 {% highlight python %}
-from pyspark.mllib.regression import LabeledPoint
 from pyspark.mllib.classification import NaiveBayes
+from pyspark.mllib.linalg import Vectors
+from pyspark.mllib.regression import LabeledPoint
+
+def parseLine(line):
+    parts = line.split(',')
+    label = float(parts[0])
+    features = Vectors.dense([float(x) for x in parts[1].split(' ')])
+    return LabeledPoint(label, features)
+
+data = sc.textFile('data/mllib/sample_naive_bayes_data.txt').map(parseLine)
 
-# an RDD of LabeledPoint
-data = sc.parallelize([
-  LabeledPoint(0.0, [0.0, 0.0])
-  ... # more labeled points
-])
+# Split data aproximately into training (60%) and test (40%)
+training, test = data.randomSplit([0.6, 0.4], seed = 0)
 
 # Train a naive Bayes model.
-model = NaiveBayes.train(data, 1.0)
+model = NaiveBayes.train(training, 1.0)
 
-# Make prediction.
-prediction = model.predict([0.0, 0.0])
+# Make prediction and test accuracy.
+predictionAndLabel = test.map(lambda p : (model.predict(p.features), p.label))
+accuracy = 1.0 * predictionAndLabel.filter(lambda (x, v): x == v).count() / test.count()
 {% endhighlight %}
 
 </div>
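
Note on the added example: the parsing and accuracy steps can be checked without a Spark cluster. The sketch below is a plain-Python approximation, not part of the commit. The names `parse_line` and `accuracy`, the tuple return type (used here in place of `LabeledPoint`), the input lines, and the stand-in predictions are all assumptions made for illustration. It also indexes into each pair rather than using `lambda (x, v): ...`, since tuple parameter unpacking only parses on Python 2.

```python
# Plain-Python sketch (no Spark required) of the parse and accuracy steps.
# parse_line and accuracy are hypothetical names, not part of the commit.

def parse_line(line):
    """Parse 'label,f1 f2 ...' into (label, [features]), mirroring parseLine in the diff."""
    parts = line.split(',')
    label = float(parts[0])
    features = [float(x) for x in parts[1].split(' ')]
    return label, features


def accuracy(prediction_and_label):
    """Return the fraction of (prediction, label) pairs that agree."""
    pairs = list(prediction_and_label)
    # Index into each pair instead of `lambda (x, v): x == v`,
    # which is a syntax error on Python 3.
    correct = sum(1 for pl in pairs if pl[0] == pl[1])
    return correct / float(len(pairs))


# Made-up input lines in the 'label,f1 f2 f3' shape the parser expects.
lines = ["0,1 0 0", "1,0 2 0", "2,0 0 3"]
parsed = [parse_line(l) for l in lines]

# Stand-in predictions (hypothetical): the first two match, the third does not.
predictions = [0.0, 1.0, 0.0]
pairs = [(p, label) for p, (label, _) in zip(predictions, parsed)]
print(accuracy(pairs))
```

With an RDD, the same predicate would presumably be written as `predictionAndLabel.filter(lambda pl: pl[0] == pl[1])`, which behaves the same on Python 2 and 3. The helper above raises `ZeroDivisionError` on an empty input, as the expression in the diff would for an empty test split.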