Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16722#discussion_r101531848

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -351,6 +370,36 @@ class DecisionTreeClassifierSuite
         dt.fit(df)
       }

    +  test("training with sample weights") {
    +    val df = linearMulticlassDataset
    +    val numClasses = 3
    +    val predEquals = (x: Double, y: Double) => x == y
    +    // (impurity, maxDepth)
    +    val testParams = Seq(
    +      ("gini", 10),
    +      ("entropy", 10),
    +      ("gini", 5)
    +    )
    +    for ((impurity, maxDepth) <- testParams) {
    +      val estimator = new DecisionTreeClassifier()
    +        .setMaxDepth(maxDepth)
    +        .setSeed(seed)
    +        .setMinWeightFractionPerNode(0.049)
    --- End diff --

    Nope, that's not exactly what I was looking for. I wanted to see tests validating that, when we set the params for this specific estimator, we get the desired errors. From my point of view, even subtle functionality like this should be validated, though I guess Spark usually doesn't have such tests. There are actually a lot of tests that would be desirable in Spark ML estimators/transformers, e.g. fuzzing tests with randomly generated data and parameters, regularly scheduled performance tests, and tests for various metrics (accuracy, precision, etc.) on a variety of datasets with different characteristics (e.g. from the UCI repository). I'm sure there are a lot of bugs that could be found this way which users may be running into but not reporting.
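    To illustrate the semantics the diff above exercises, here is a minimal, hypothetical sketch (not Spark's actual implementation) of how a weighted Gini impurity and a `minWeightFractionPerNode`-style split check could behave. The object name, method names, and the validity check are assumptions made for illustration only; the 0.049 threshold mirrors the value in the test.

    ```scala
    object WeightedGiniSketch {
      // Weighted Gini impurity: 1 - sum_k (w_k / W)^2, where w_k is the
      // total sample weight of class k and W is the total weight at the node.
      def weightedGini(classWeights: Map[Int, Double]): Double = {
        val total = classWeights.values.sum
        if (total == 0.0) 0.0
        else 1.0 - classWeights.values.map(w => (w / total) * (w / total)).sum
      }

      // Hypothetical validity check: a child node must retain at least
      // minWeightFractionPerNode of the node's total sample weight.
      def splitIsValid(childWeight: Double, totalWeight: Double,
                       minWeightFractionPerNode: Double): Boolean =
        childWeight >= minWeightFractionPerNode * totalWeight

      def main(args: Array[String]): Unit = {
        val pure = Map(0 -> 10.0)          // all weight in one class
        val even = Map(0 -> 5.0, 1 -> 5.0) // weight split evenly across two classes
        println(weightedGini(pure))        // 0.0
        println(weightedGini(even))        // 0.5
        println(splitIsValid(0.4, 10.0, 0.049)) // false: 0.4 < 0.49
      }
    }
    ```

    With per-row sample weights, such a check prunes splits whose children carry too little total weight, which is why the test sets `minWeightFractionPerNode` just below a round fraction of the data.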