[ https://issues.apache.org/jira/browse/SPARK-26852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-26852.
-------------------------------
    Resolution: Not A Problem

> CrossValidator: support transforming metrics to absolute values prior to
> min/max test
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-26852
>                 URL: https://issues.apache.org/jira/browse/SPARK-26852
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Ben Weber
>            Priority: Minor
>              Labels: starter
>
> When writing a custom Evaluator with PySpark, it's often useful to support
> negative values in the evaluate function -- for example, the relative
> difference between predicted and actual values. In that case, the goal is to
> select the value closest to 0 rather than the smallest or largest value. We
> should add a flag that lets users specify this scenario.
> For example, CrossValidator may be used with a parameter grid that produces
> the following metric values across five folds:
> * [ 0.5, 0.5, 0.5, 0, 0 ]
> * [ 0.5, -0.5, 0.5, 0, 0 ]
> * [ -0.5, -0.5, -0.5, 0, 0 ]
> This results in the following values for avgMetrics: [ 0.3, 0.1, -0.3 ].
> There is currently no way to tell the cross validator to select the second
> model, whose average metric is closest to zero.
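A minimal plain-Python sketch (not from the issue; the per-fold values are taken from the example above) of why the existing min/max test cannot pick the middle candidate, and how transforming to absolute values before the comparison would:

```python
# Per-fold metrics for three parameter-grid candidates (values from the issue).
fold_metrics = [
    [0.5, 0.5, 0.5, 0, 0],
    [0.5, -0.5, 0.5, 0, 0],
    [-0.5, -0.5, -0.5, 0, 0],
]
# avgMetrics is the mean of each candidate's metric across folds.
avg_metrics = [sum(m) / len(m) for m in fold_metrics]  # [0.3, 0.1, -0.3]

# Today CrossValidator only applies max or min to avgMetrics,
# depending on isLargerBetter:
best_if_larger = max(range(3), key=lambda i: avg_metrics[i])   # candidate 0
best_if_smaller = min(range(3), key=lambda i: avg_metrics[i])  # candidate 2

# Proposed behavior: take absolute values first, then min, which
# selects the candidate whose average metric is closest to zero.
best_abs = min(range(3), key=lambda i: abs(avg_metrics[i]))    # candidate 1
```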
> Here's an example Evaluator where this functionality is useful:
> {code:python}
> from pyspark.ml.evaluation import Evaluator
> from pyspark.sql import functions as F
>
> class SumEvaluator(Evaluator):
>
>     def __init__(self, predictionCol="prediction", labelCol="label"):
>         super().__init__()
>         self.predictionCol = predictionCol
>         self.labelCol = labelCol
>
>     def _evaluate(self, dataset):
>         # Relative difference between the summed predictions and the
>         # summed labels; this can legitimately be negative.
>         actual = dataset.select(F.sum(self.labelCol)).collect()[0][0]
>         prediction = dataset.select(F.sum(self.predictionCol)).collect()[0][0]
>         return (prediction - actual) / actual
>
>     def isLargerBetter(self):
>         return False
>
>     def applyAbsoluteTransform(self):
>         # Proposed new hook: tell CrossValidator to compare abs(metric).
>         return True
> {code}
> This is a custom evaluator that compares the difference between the total
> actual and predicted values in a regression problem. I am proposing a new
> method on Evaluator that specifies whether an absolute transformation should
> be applied to the cross-validated metrics.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org