[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

MLnick Tue, 10 Oct 2017 06:44:06 -0700

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17819#discussion_r143722947
  
    --- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala ---
    @@ -187,6 +188,196 @@ class BucketizerSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defa
           }
         }
       }
    +
    +  test("multiple columns: Bucket continuous features, without -inf,inf") {
    +    // Check a set of valid feature values.
    +    val splits = Array(Array(-0.5, 0.0, 0.5), Array(-0.1, 0.3, 0.5))
    +    val validData1 = Array(-0.5, -0.3, 0.0, 0.2)
    +    val validData2 = Array(0.5, 0.3, 0.0, -0.1)
    +    val expectedBuckets1 = Array(0.0, 0.0, 1.0, 1.0)
    +    val expectedBuckets2 = Array(1.0, 1.0, 0.0, 0.0)
    +
    +    val data = (0 until validData1.length).map { idx =>
    +      (validData1(idx), validData2(idx), expectedBuckets1(idx), 
expectedBuckets2(idx))
    +    }
    +    val dataFrame: DataFrame = data.toSeq.toDF("feature1", "feature2", 
"expected1", "expected2")
    --- End diff --
    
    `toSeq` not required here?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

Reply via email to