Joseph K. Bradley created SPARK-19053:
-----------------------------------------

             Summary: Supporting multiple evaluation metrics in DataFrame-based API: discussion
                 Key: SPARK-19053
                 URL: https://issues.apache.org/jira/browse/SPARK-19053
             Project: Spark
          Issue Type: Brainstorming
          Components: ML
            Reporter: Joseph K. Bradley


This JIRA is to discuss how to efficiently compute multiple evaluation metrics 
in the DataFrame-based API for MLlib.

In the RDD-based API, RegressionMetrics and other *Metrics classes support 
efficient computation of multiple metrics.
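For illustration, the "multiple metrics in one pass" idea behind classes like RegressionMetrics can be sketched in plain Python. This is not Spark code; the function name and the particular metric set here are illustrative, but the structure (accumulate a few sufficient statistics in a single scan, derive all metrics at the end) mirrors what the RDD-based Metrics classes do:

```python
import math

def regression_metrics(pairs):
    """Compute several regression metrics in a single pass over
    (prediction, label) pairs by accumulating sufficient statistics,
    then deriving each metric from them at the end."""
    n = 0
    sum_sq_err = 0.0    # sum of squared errors
    sum_abs_err = 0.0   # sum of absolute errors
    sum_label = 0.0     # sum of labels
    sum_sq_label = 0.0  # sum of squared labels
    for pred, label in pairs:
        err = pred - label
        n += 1
        sum_sq_err += err * err
        sum_abs_err += abs(err)
        sum_label += label
        sum_sq_label += label * label
    mse = sum_sq_err / n
    mean_label = sum_label / n
    ss_tot = sum_sq_label - n * mean_label * mean_label  # total sum of squares
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum_abs_err / n,
        "r2": 1.0 - sum_sq_err / ss_tot,
    }
```

The point is that one scan yields every metric; an Evaluator that returns a single double would need one scan per metric.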

In the DataFrame-based API, there are a few options:
* Model/result summaries (e.g., LogisticRegressionSummary): These currently 
provide the desired functionality, but they require a model and do not let 
users compute metrics manually from DataFrames of predictions and true labels.
* Evaluator classes (e.g., RegressionEvaluator): These do not require a model, 
but each computes only a single metric per pass over the data.
* A new class analogous to the spark.mllib *Metrics classes: Model/result 
summaries could use such a class internally as a replacement for the 
spark.mllib Metrics classes, or they could (maybe) inherit from it.
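One possible shape for the third option, sketched hypothetically in plain Python (the class and method names are invented for illustration, not an actual Spark API): an aggregator that accumulates sufficient statistics per row and can merge partial aggregates, which is the contract both a one-pass DataFrame aggregation and a model summary would need:

```python
class RegressionMetricsAgg:
    """Hypothetical sketch of a Metrics-like aggregator.  It holds
    only sufficient statistics, so partial aggregates (e.g., one per
    partition) can be merged and all metrics derived at the end."""

    def __init__(self):
        self.n = 0
        self.sum_sq_err = 0.0
        self.sum_abs_err = 0.0

    def add(self, prediction, label):
        # Fold one (prediction, label) row into the statistics.
        err = prediction - label
        self.n += 1
        self.sum_sq_err += err * err
        self.sum_abs_err += abs(err)
        return self

    def merge(self, other):
        # Combine partial aggregates, e.g., from two partitions.
        self.n += other.n
        self.sum_sq_err += other.sum_sq_err
        self.sum_abs_err += other.sum_abs_err
        return self

    @property
    def mse(self):
        return self.sum_sq_err / self.n

    @property
    def mae(self):
        return self.sum_abs_err / self.n
```

A summary class could hold one of these internally, while users without a model could build one directly from a DataFrame of predictions and labels.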

Thoughts?


