GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/20028
[SPARK-19053][ML] Supporting multiple evaluation metrics in DataFrame-based API

## What changes were proposed in this pull request?

As an initial step, this PR creates BinaryClassificationMetrics, MultiClassClassificationMetrics and RegressionMetrics in ML. The long-term goal is to reach functional parity with the MLlib metrics and to be able to provide further enhancements for the DataFrame-based API.

This PR lets the Binary/MulticlassClassification/Regression Evaluators return a corresponding metrics instance, so users can access multiple metrics after a single pass over the data. Previously, an Evaluator only gave access to a single metric.

```
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")
// getMetrics is proposed in this PR: it returns a metrics instance exposing several metrics
val metrics = evaluator.getMetrics(df)
metrics.accuracy
metrics.weightedFMeasure
metrics.weightedPrecision
metrics.weightedRecall
```

To make the review easier, the current PR only includes the metrics already offered by the Evaluators, i.e. functional parity with the ML Evaluators.

Plan for further development:

1. Initial API and functional parity with ML Evaluators. (This PR)
2. Python API.
3. Functional parity with MLlib Metrics.
4. Add requested enhancements, such as weight support, per-row metrics and ranking metrics.
5. Reorganize the classification metrics hierarchy, so that BinaryClassificationMetrics can support the metrics in MultiClassMetrics (accuracy, recall, etc.).
6. Possibly use the metrics in training summaries.

The current implementation is still based on the MLlib metrics, which are kept completely internal and can be replaced by a DataFrame-based calculation when necessary.

## How was this patch tested?

New unit tests were added.
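To make the "multiple metrics after a single pass" idea concrete without a Spark cluster, here is a minimal plain-Scala sketch. It is not the PR's implementation and not a Spark API: `SinglePassMetricsSketch`, `MulticlassSummary` and `summarize` are hypothetical names. It assumes the common convention of weighting each class's precision, recall and F1 by that class's share of the true labels. The input rows are scanned once to build a confusion map; every metric is then derived from that small map.

```scala
// Hypothetical sketch, not the PR's implementation and not a Spark API:
// tally (prediction, label) pairs once, then derive several metrics from the tally.
object SinglePassMetricsSketch {

  // Hypothetical result type; field names mirror the accessors in the PR's example.
  final case class MulticlassSummary(
      accuracy: Double,
      weightedPrecision: Double,
      weightedRecall: Double,
      weightedFMeasure: Double)

  def summarize(predictionAndLabels: Seq[(Double, Double)]): MulticlassSummary = {
    require(predictionAndLabels.nonEmpty, "need at least one (prediction, label) pair")

    // The only pass over the data: count rows per (label, prediction) pair.
    val confusion: Map[(Double, Double), Long] =
      predictionAndLabels.foldLeft(Map.empty[(Double, Double), Long]) {
        case (acc, (prediction, label)) =>
          val key = (label, prediction)
          acc.updated(key, acc.getOrElse(key, 0L) + 1L)
      }

    // Everything below reads only the confusion map, never the input rows.
    val total = confusion.values.sum.toDouble
    val labelCount: Map[Double, Long] =
      confusion.groupBy(_._1._1).map { case (label, group) => (label, group.values.sum) }

    def truePositives(c: Double): Long = confusion.getOrElse((c, c), 0L)
    def predictedAs(c: Double): Long =
      confusion.toSeq.collect { case ((_, p), n) if p == c => n }.sum

    def precision(c: Double): Double = {
      val predicted = predictedAs(c)
      if (predicted == 0L) 0.0 else truePositives(c).toDouble / predicted
    }
    // labelCount(c) > 0 here, because c always comes from labelCount's keys.
    def recall(c: Double): Double = truePositives(c).toDouble / labelCount(c)
    def f1(c: Double): Double = {
      val (p, r) = (precision(c), recall(c))
      if (p + r == 0.0) 0.0 else 2.0 * p * r / (p + r)
    }

    // Weight each per-class metric by that class's share of the true labels.
    def weighted(metric: Double => Double): Double =
      labelCount.toSeq.map { case (c, n) => metric(c) * n / total }.sum

    val correct = confusion.toSeq.collect { case ((l, p), n) if l == p => n }.sum
    MulticlassSummary(
      accuracy = correct / total,
      weightedPrecision = weighted(precision),
      weightedRecall = weighted(recall),
      weightedFMeasure = weighted(f1))
  }

  def main(args: Array[String]): Unit = {
    // (prediction, label) pairs, the same order MLlib's predictionAndLabels uses.
    val data = Seq(
      (0.0, 0.0), (0.0, 0.0), (1.0, 0.0),
      (1.0, 1.0), (1.0, 1.0), (0.0, 1.0),
      (2.0, 2.0), (1.0, 2.0))
    println(summarize(data))
  }
}
```

On the sample data, accuracy and weightedRecall both come out at 0.625 and weightedPrecision at 0.6875. The first equality is not a coincidence: with label-frequency weights, each class contributes (n_c / N) * (TP_c / n_c) = TP_c / N to the weighted recall, so it always reduces to accuracy.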
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark mlMetrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20028.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20028

----
commit 5d5978e5c83bc2037646f40d6db23059af530a15
Author: Yuhao Yang <yuhao.yang@...>
Date:   2017-12-20T02:51:19Z

    add ml metrics

----