Hi ML/MLlib developers,

I'm trying to add a weights column to the spark.ml evaluators (RegressionEvaluator, BinaryClassificationEvaluator, MulticlassClassificationEvaluator) that use the mllib metrics classes, and I have a few questions (JIRA: SPARK-18693<https://issues.apache.org/jira/browse/SPARK-18693>). I didn't see any similar question on the forums or StackOverflow.

Moving forward, will we keep the mllib metrics classes (RegressionMetrics, MulticlassMetrics, BinaryClassificationMetrics) as something separate from the evaluators, or will we remove them when mllib is removed in Spark 3.0? The mllib metrics classes seem very useful because they can compute and expose many metrics from a single pass over one dataset, whereas with the evaluators it is not performant to re-evaluate the entire dataset for each additional metric. For example, if I compute the RMSE and then the MSE using the ml RegressionEvaluator, I will be doing most of the work twice, so the ml API doesn't make sense in this scenario. Also, the ml evaluators expose far fewer metrics than the mllib metrics classes, so the ml evaluators are not at parity with them. I can see how the ml evaluators are useful in CrossValidator, but for exploring all metrics from a scored dataset they don't really make sense.

From the viewpoint of exploring all metrics for a scored model, does this mean that the mllib metrics classes should be moved to ml? That would solve my issue if that is what is planned for the future. However, it doesn't quite make sense to me, because it may confuse ml users to see both metrics classes and evaluator classes.
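To make the single-pass point concrete, here is a toy Scala sketch (plain collections rather than Spark, and RegStats with its fields are illustrative names, not an existing API) of how one scan over (prediction, label) pairs yields the sufficient statistics for several regression metrics at once, which is roughly what mllib's RegressionMetrics does internally:

```scala
// One scan over (prediction, label) pairs accumulates the sufficient
// statistics; each metric is then a cheap function of those statistics,
// so asking for RMSE, MSE, and MAE does not rescan the data.
case class RegStats(n: Long, sumSqErr: Double, sumAbsErr: Double) {
  def merge(that: RegStats): RegStats =
    RegStats(n + that.n, sumSqErr + that.sumSqErr, sumAbsErr + that.sumAbsErr)
  def mse: Double  = sumSqErr / n
  def rmse: Double = math.sqrt(mse)
  def mae: Double  = sumAbsErr / n
}

object RegStats {
  def of(predictionsAndLabels: Seq[(Double, Double)]): RegStats =
    predictionsAndLabels.foldLeft(RegStats(0L, 0.0, 0.0)) {
      case (acc, (pred, label)) =>
        val err = pred - label
        acc.merge(RegStats(1L, err * err, math.abs(err)))
    }
}
```

Because RegStats.merge is associative, the same accumulation would also work as a distributed aggregate; with the current evaluator API, by contrast, each metricName setting triggers its own full evaluation.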
Instead, it seems like the ml evaluators need to be changed at the API layer to:

1. Allow the user to retrieve a single value
2. Allow the user to retrieve all metrics, or a set of metrics

One possibility would be to overload evaluate so that we would have something like:

override def evaluate(dataset: Dataset[_]): Double
override def evaluate(dataset: Dataset[_], metrics: Array[String]): Dataset[_]

But for some metrics, like the confusion matrix, you couldn't really fit the data into the result of the second API alongside the single-value metrics. The format of the mllib metrics classes was much more convenient, since you could retrieve each metric directly. Following this line of thought, maybe the APIs could be:

override def evaluate(dataset: Dataset[_]): Double
def evaluateMetrics(dataset: Dataset[_]): RegressionEvaluation (or ClassificationEvaluation / MulticlassEvaluation, etc.)

where the evaluation class returned would have fields very similar to those of the corresponding mllib RegressionMetrics class, which the user could access directly.

Any thoughts or ideas about the spark ml evaluator / mllib metrics APIs, coding suggestions for the proposed API, or a general roadmap would be really appreciated.

Thank you,
Ilya
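P.S. To make the evaluateMetrics idea above a bit more concrete, here is a rough, self-contained Scala sketch. A plain Seq stands in for Dataset[_], and all class, method, and field names are illustrative only, not existing Spark APIs:

```scala
// Illustrative holder for the proposed evaluateMetrics return value.
// Lazy vals mean each metric is computed at most once per evaluation.
class RegressionEvaluation(predictionsAndLabels: Seq[(Double, Double)]) {
  private lazy val errors = predictionsAndLabels.map { case (p, l) => p - l }
  lazy val meanSquaredError: Double =
    errors.map(e => e * e).sum / errors.size
  lazy val rootMeanSquaredError: Double = math.sqrt(meanSquaredError)
  lazy val meanAbsoluteError: Double =
    errors.map(math.abs).sum / errors.size
}

class ToyRegressionEvaluator(metricName: String = "rmse") {
  // Existing-style entry point: one scalar, as CrossValidator expects.
  def evaluate(data: Seq[(Double, Double)]): Double = metricName match {
    case "rmse" => evaluateMetrics(data).rootMeanSquaredError
    case "mse"  => evaluateMetrics(data).meanSquaredError
    case "mae"  => evaluateMetrics(data).meanAbsoluteError
  }
  // Proposed addition: hand back the whole evaluation object so the
  // caller can read several metrics without re-evaluating the dataset.
  def evaluateMetrics(data: Seq[(Double, Double)]): RegressionEvaluation =
    new RegressionEvaluation(data)
}
```

The existing evaluate can then be expressed in terms of evaluateMetrics, so CrossValidator keeps working unchanged while exploratory users get all metrics from a single evaluation.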