[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

zhengruifeng Tue, 20 Sep 2016 19:59:52 -0700

GitHub user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12819#discussion_r79752969
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala ---
    @@ -150,6 +150,75 @@ class NaiveBayesSuite extends SparkFunSuite with MLlibTestSparkContext with Defa
         validateProbabilities(featureAndProbabilities, model, "multinomial")
       }
     
    +  test("Naive Bayes Multinomial with weighted samples") {
    +    val (dataset, weightedDataset) = {
    +      val nPoints = 1000
    +      val piArray = Array(0.5, 0.1, 0.4).map(math.log)
    +      val thetaArray = Array(
    +        Array(0.70, 0.10, 0.10, 0.10), // label 0
    +        Array(0.10, 0.70, 0.10, 0.10), // label 1
    +        Array(0.10, 0.10, 0.70, 0.10) // label 2
    +      ).map(_.map(math.log))
    +      val pi = Vectors.dense(piArray)
    +      val theta = new DenseMatrix(3, 4, thetaArray.flatten, true)
    +
    +      val testData = generateNaiveBayesInput(piArray, thetaArray, nPoints, 42, "multinomial")
    +
    +      // Let's over-sample the label-1 samples twice, label-2 samples triple.
    +      val data1 = testData.flatMap { case labeledPoint: LabeledPoint =>
    +        labeledPoint.label match {
    +          case 0.0 => Iterator(labeledPoint)
    +          case 1.0 => Iterator(labeledPoint, labeledPoint)
    +          case 2.0 => Iterator(labeledPoint, labeledPoint, labeledPoint)
    +        }
    +      }
    +
    +      val rnd = new Random(8392)
    +      val data2 = testData.flatMap { case LabeledPoint(label: Double, features: Vector) =>
    --- End diff --
    
    Good point. Of course, I will update this test suite to keep it in line with the other algorithms.
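
For readers following the quoted test: it over-samples label-1 rows twice and label-2 rows three times, presumably so the result can be compared against a fit that uses per-row weights instead. The Spark-free sketch below illustrates why the two should agree for a multinomial model. Everything in it (`WeightedNaiveBayesSketch`, `Sample`, `aggregate`, `fit`) is a hypothetical name chosen for illustration, and the Laplace-smoothed estimates are the standard textbook form, not code taken from the PR.

```scala
// Hypothetical, Spark-free sketch. None of these names come from the PR.
object WeightedNaiveBayesSketch {
  // A labeled sample with per-feature counts and a sample weight (default 1.0).
  final case class Sample(label: Int, features: Array[Double], weight: Double = 1.0)

  // Weighted sufficient statistics for multinomial naive Bayes:
  // per label, (sum of weights, weighted sum of feature vectors).
  def aggregate(data: Seq[Sample], numFeatures: Int): Map[Int, (Double, Array[Double])] =
    data.groupBy(_.label).map { case (label, samples) =>
      val featureSum = new Array[Double](numFeatures)
      var weightSum = 0.0
      samples.foreach { s =>
        weightSum += s.weight
        var j = 0
        while (j < numFeatures) {
          featureSum(j) += s.weight * s.features(j)
          j += 1
        }
      }
      label -> (weightSum, featureSum)
    }

  // Laplace-smoothed log-prior and log-likelihoods per label,
  // returned as label -> (logPi, logTheta).
  def fit(
      data: Seq[Sample],
      numFeatures: Int,
      lambda: Double = 1.0): Map[Int, (Double, Array[Double])] = {
    val stats = aggregate(data, numFeatures)
    val totalWeight = stats.values.map(_._1).sum
    val numLabels = stats.size
    stats.map { case (label, (weightSum, featureSum)) =>
      val logPi = math.log(weightSum + lambda) - math.log(totalWeight + numLabels * lambda)
      val logDenom = math.log(featureSum.sum + numFeatures * lambda)
      label -> (logPi, featureSum.map(c => math.log(c + lambda) - logDenom))
    }
  }
}
```

Because a row with integer weight k contributes exactly what k copies of that row would, calling `fit` on a duplicated dataset and on the same dataset with `weight = k` should return the same parameters up to floating-point rounding.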



