Hi ML/MLlib developers,

I'm trying to add a weights column to the spark.ml evaluators (RegressionEvaluator, BinaryClassificationEvaluator, MulticlassClassificationEvaluator) that use the mllib metrics classes, and I have a few questions (JIRA: SPARK-18693<https://issues.apache.org/jira/browse/SPARK-18693>). I didn't see any similar question on the forums or StackOverflow.

Moving forward, will we keep the mllib metrics classes (RegressionMetrics, MulticlassMetrics, BinaryClassificationMetrics) as something separate from the evaluators, or will we remove them when mllib is removed in Spark 3.0? The mllib metrics classes seem very useful because they can compute and expose many metrics from a single pass over one dataset, whereas with the evaluators it is not performant to re-evaluate the entire dataset for each additional metric. For example, if I compute the RMSE and then the MSE using the ml RegressionEvaluator, I will be doing most of the work twice, so the ml API doesn't make sense in this scenario. Also, the ml evaluators expose far fewer metrics than the mllib metrics classes, so the ml evaluators are not at parity with them. I can see how the ml evaluators are useful in CrossValidator, but for exploring all metrics from a scored dataset they don't really make sense.

From the viewpoint of exploring all metrics for a scored model, does this mean that the mllib metrics classes should be moved to ml? That would solve my issue if that is what is planned for the future. However, it doesn't quite make sense to me, because it may confuse ml users to see both metrics classes and evaluator classes.
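To make the single-pass point concrete, here is a toy Scala sketch (plain collections rather than Spark, and RegStats with its fields are illustrative names, not an existing API) of how one scan over (prediction, label) pairs yields the sufficient statistics for several regression metrics at once, which is roughly what mllib's RegressionMetrics does internally:

```scala
// One scan over (prediction, label) pairs accumulates the sufficient
// statistics; each metric is then a cheap function of those statistics,
// so asking for RMSE, MSE, and MAE does not rescan the data.
case class RegStats(n: Long, sumSqErr: Double, sumAbsErr: Double) {
  def merge(that: RegStats): RegStats =
    RegStats(n + that.n, sumSqErr + that.sumSqErr, sumAbsErr + that.sumAbsErr)
  def mse: Double  = sumSqErr / n
  def rmse: Double = math.sqrt(mse)
  def mae: Double  = sumAbsErr / n
}

object RegStats {
  def of(predictionsAndLabels: Seq[(Double, Double)]): RegStats =
    predictionsAndLabels.foldLeft(RegStats(0L, 0.0, 0.0)) {
      case (acc, (pred, label)) =>
        val err = pred - label
        acc.merge(RegStats(1L, err * err, math.abs(err)))
    }
}
```

Because RegStats.merge is associative, the same accumulation would also work as a distributed aggregate; with the current evaluator API, by contrast, each metricName setting triggers its own full evaluation.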
Instead, it seems like the ml evaluators need to be changed at the API layer to:

1. Allow the user to retrieve a single value
2. Allow the user to retrieve all metrics, or a set of metrics

One possibility would be to overload evaluate so that we would have something like:

override def evaluate(dataset: Dataset[_]): Double
override def evaluate(dataset: Dataset[_], metrics: Array[String]): Dataset[_]

But for some metrics, like the confusion matrix, you couldn't really fit the data into the result of the second API alongside the single-value metrics. The format of the mllib metrics classes was much more convenient, since you could retrieve each metric directly. Following this line of thought, maybe the APIs could be:

override def evaluate(dataset: Dataset[_]): Double
def evaluateMetrics(dataset: Dataset[_]): RegressionEvaluation (or ClassificationEvaluation / MulticlassEvaluation, etc.)

where the evaluation class returned would have fields very similar to those of the corresponding mllib RegressionMetrics class, which the user could access directly.

Any thoughts or ideas about the spark ml evaluator / mllib metrics APIs, coding suggestions for the proposed API, or a general roadmap would be really appreciated.

Thank you,
Ilya
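P.S. To make the evaluateMetrics idea above a bit more concrete, here is a rough, self-contained Scala sketch. A plain Seq stands in for Dataset[_], and all class, method, and field names are illustrative only, not existing Spark APIs:

```scala
// Illustrative holder for the proposed evaluateMetrics return value.
// Lazy vals mean each metric is computed at most once per evaluation.
class RegressionEvaluation(predictionsAndLabels: Seq[(Double, Double)]) {
  private lazy val errors = predictionsAndLabels.map { case (p, l) => p - l }
  lazy val meanSquaredError: Double =
    errors.map(e => e * e).sum / errors.size
  lazy val rootMeanSquaredError: Double = math.sqrt(meanSquaredError)
  lazy val meanAbsoluteError: Double =
    errors.map(math.abs).sum / errors.size
}

class ToyRegressionEvaluator(metricName: String = "rmse") {
  // Existing-style entry point: one scalar, as CrossValidator expects.
  def evaluate(data: Seq[(Double, Double)]): Double = metricName match {
    case "rmse" => evaluateMetrics(data).rootMeanSquaredError
    case "mse"  => evaluateMetrics(data).meanSquaredError
    case "mae"  => evaluateMetrics(data).meanAbsoluteError
  }
  // Proposed addition: hand back the whole evaluation object so the
  // caller can read several metrics without re-evaluating the dataset.
  def evaluateMetrics(data: Seq[(Double, Double)]): RegressionEvaluation =
    new RegressionEvaluation(data)
}
```

The existing evaluate can then be expressed in terms of evaluateMetrics, so CrossValidator keeps working unchanged while exploratory users get all metrics from a single evaluation.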