GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/20028

    [SPARK-19053][ML] Supporting multiple evaluation metrics in DataFrame-based API

    ## What changes were proposed in this pull request?
    
    As an initial step, this PR adds BinaryClassificationMetrics,
    MultiClassClassificationMetrics, and RegressionMetrics to ML. The long-term
    goal is to reach function parity with the MLlib Metrics classes and to
    enable further enhancements in the DataFrame-based API.
    
    This PR allows BinaryClassificationEvaluator, MulticlassClassificationEvaluator,
    and RegressionEvaluator to return a corresponding metrics instance, so users
    can access multiple metrics after a single pass over the data, whereas
    previously an Evaluator exposed only a single metric.
    ```scala
    val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")
    val metrics = evaluator.getMetrics(df)
    metrics.accuracy
    metrics.weightedFMeasure
    metrics.weightedPrecision
    metrics.weightedRecall
    ```
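
    Under the proposed API, the other evaluators follow the same pattern. A
    hypothetical sketch for RegressionEvaluator (the exact accessor names on the
    new RegressionMetrics are assumptions, mirroring the MLlib class):

    ```scala
    // Sketch only: assumes RegressionEvaluator also gains getMetrics and that
    // the new ML RegressionMetrics exposes the same accessors as the MLlib class.
    val regEvaluator = new RegressionEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
    val regMetrics = regEvaluator.getMetrics(predictions)
    regMetrics.rootMeanSquaredError
    regMetrics.r2
    regMetrics.meanAbsoluteError
    ```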
    
    To make the review easier, the current PR only includes the metrics already
    available in the Evaluators, to reach function parity. Plan for further
    development:
    1. Initial API and function parity with the ML Evaluators. (This PR)
    2. Python API.
    3. Function parity with MLlib Metrics.
    4. Add requested enhancements such as weight column support, per-row metrics,
    and ranking metrics.
    5. Reorganize the classification Metrics hierarchy so that Binary
    Classification Metrics can support the metrics in MultiClassMetrics
    (accuracy, recall, etc.).
    6. Possibly use the metrics classes in training summaries.
    
    The current implementation is still based on the MLlib Metrics classes,
    which are kept completely internal and can be replaced with DataFrame-based
    computation when necessary.
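
    For reference, a minimal sketch of the kind of internal delegation described
    above; the class and column names here are illustrative, not the actual
    implementation in this PR:

    ```scala
    import org.apache.spark.mllib.evaluation.{MulticlassMetrics => MLlibMulticlassMetrics}
    import org.apache.spark.sql.DataFrame

    // Illustrative wrapper: converts the (prediction, label) columns to an RDD
    // and delegates to the internal MLlib implementation; the real class and
    // member names in the PR may differ.
    class MulticlassClassificationMetricsSketch(dataset: DataFrame) {
      private lazy val mllibMetrics = new MLlibMulticlassMetrics(
        dataset.select("prediction", "label").rdd.map { row =>
          (row.getDouble(0), row.getDouble(1))
        })

      def accuracy: Double = mllibMetrics.accuracy
      def weightedPrecision: Double = mllibMetrics.weightedPrecision
      def weightedRecall: Double = mllibMetrics.weightedRecall
      def weightedFMeasure: Double = mllibMetrics.weightedFMeasure
    }
    ```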
    
    
    ## How was this patch tested?
    
    New unit tests were added.
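
    A sketch of the kind of check such tests could perform (test and variable
    names are illustrative): getMetrics should agree with the existing
    single-metric evaluate().

    ```scala
    // Illustrative test: the metrics object returned by getMetrics should
    // report the same value as the evaluator's configured single metric.
    test("getMetrics agrees with evaluate") {
      val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")
      val metrics = evaluator.getMetrics(predictionDF)
      assert(metrics.accuracy === evaluator.evaluate(predictionDF))
    }
    ```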


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark mlMetrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20028.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20028
    
----
commit 5d5978e5c83bc2037646f40d6db23059af530a15
Author: Yuhao Yang <yuhao.yang@...>
Date:   2017-12-20T02:51:19Z

    add ml metrics

----


---
