Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16722#discussion_r101531848

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -351,6 +370,36 @@ class DecisionTreeClassifierSuite
         dt.fit(df)
       }

    +  test("training with sample weights") {
    +    val df = linearMulticlassDataset
    +    val numClasses = 3
    +    val predEquals = (x: Double, y: Double) => x == y
    +    // (impurity, maxDepth)
    +    val testParams = Seq(
    +      ("gini", 10),
    +      ("entropy", 10),
    +      ("gini", 5)
    +    )
    +    for ((impurity, maxDepth) <- testParams) {
    +      val estimator = new DecisionTreeClassifier()
    +        .setMaxDepth(maxDepth)
    +        .setSeed(seed)
    +        .setMinWeightFractionPerNode(0.049)
    --- End diff --

    Nope, that's not exactly what I was looking for. I wanted to see tests validating that, when we set the params for this specific estimator, we get the desired errors. From my point of view, even subtle functionality like this should be validated, though I guess Spark usually doesn't have such tests. There are actually a lot of tests that would be desirable in Spark ML estimators/transformers, e.g. fuzzing tests with randomly generated data and parameters, regularly scheduled performance tests, and tests for various metrics (accuracy, precision, etc.) on a variety of datasets with different characteristics (e.g. from the UCI repository). I'm sure there are a lot of bugs that could be found this way which users may be running into but not reporting.
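    To illustrate the semantics the diff above exercises, here is a minimal, hypothetical sketch (not Spark's actual implementation) of how a weighted Gini impurity and a `minWeightFractionPerNode`-style split check could behave. The object name, method names, and the validity check are assumptions made for illustration only; the 0.049 threshold mirrors the value in the test.

    ```scala
    object WeightedGiniSketch {
      // Weighted Gini impurity: 1 - sum_k (w_k / W)^2, where w_k is the
      // total sample weight of class k and W is the total weight at the node.
      def weightedGini(classWeights: Map[Int, Double]): Double = {
        val total = classWeights.values.sum
        if (total == 0.0) 0.0
        else 1.0 - classWeights.values.map(w => (w / total) * (w / total)).sum
      }

      // Hypothetical validity check: a child node must retain at least
      // minWeightFractionPerNode of the node's total sample weight.
      def splitIsValid(childWeight: Double, totalWeight: Double,
                       minWeightFractionPerNode: Double): Boolean =
        childWeight >= minWeightFractionPerNode * totalWeight

      def main(args: Array[String]): Unit = {
        val pure = Map(0 -> 10.0)          // all weight in one class
        val even = Map(0 -> 5.0, 1 -> 5.0) // weight split evenly across two classes
        println(weightedGini(pure))        // 0.0
        println(weightedGini(even))        // 0.5
        println(splitIsValid(0.4, 10.0, 0.049)) // false: 0.4 < 0.49
      }
    }
    ```

    With per-row sample weights, such a check prunes splits whose children carry too little total weight, which is why the test sets `minWeightFractionPerNode` just below a round fraction of the data.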