Joseph K. Bradley created SPARK-19053: -----------------------------------------
Summary: Supporting multiple evaluation metrics in DataFrame-based API: discussion
Key: SPARK-19053
URL: https://issues.apache.org/jira/browse/SPARK-19053
Project: Spark
Issue Type: Brainstorming
Components: ML
Reporter: Joseph K. Bradley

This JIRA is to discuss how to support efficient computation of multiple evaluation metrics in the DataFrame-based API for MLlib. In the RDD-based API, RegressionMetrics and the other *Metrics classes can compute several metrics in a single pass over the data. In the DataFrame-based API, there are a few options:

* Model/result summaries (e.g., LogisticRegressionSummary): these currently provide the desired functionality, but they require a model and do not let users compute metrics manually from DataFrames of predictions and true labels.
* Evaluator classes (e.g., RegressionEvaluator): these do not require a model, but they compute only a single metric per pass over the data.
* A new class analogous to the spark.mllib *Metrics classes: model/result summaries could use such a class internally as a replacement for the spark.mllib Metrics classes, or could (maybe) inherit from it.

Thoughts?
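The efficiency argument behind the *Metrics-style classes is that a single pass over the (prediction, label) data can accumulate sufficient statistics from which several metrics are then derived, whereas an Evaluator returns one metric per pass. A minimal sketch of that single-pass idea in plain Python (the function name and metric set are illustrative, not Spark API):

```python
import math

def regression_metrics(pairs):
    """Derive MSE, RMSE, MAE, and R^2 from (prediction, label) pairs,
    accumulating running sums in one pass -- the same idea RegressionMetrics
    uses internally (this sketch is not the Spark implementation)."""
    n = 0
    sum_err2 = 0.0   # sum of squared errors
    sum_abs = 0.0    # sum of absolute errors
    sum_y = 0.0      # sum of labels
    sum_y2 = 0.0     # sum of squared labels
    for pred, label in pairs:
        err = pred - label
        n += 1
        sum_err2 += err * err
        sum_abs += abs(err)
        sum_y += label
        sum_y2 += label * label
    mse = sum_err2 / n
    ss_tot = sum_y2 - (sum_y * sum_y) / n   # total sum of squares
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum_abs / n,
        "r2": 1.0 - sum_err2 / ss_tot,
    }

metrics = regression_metrics([(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)])
```

In Spark terms, the running sums would be a distributed aggregation (e.g. one treeAggregate), so all four metrics cost one job instead of four separate Evaluator passes.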