[ 
https://issues.apache.org/jira/browse/SPARK-26852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26852.
-------------------------------
    Resolution: Not A Problem

> CrossValidator: support transforming metrics to absolute values prior to 
> min/max test
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-26852
>                 URL: https://issues.apache.org/jira/browse/SPARK-26852
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Ben Weber
>            Priority: Minor
>              Labels: starter
>
> When writing a custom Evaluator with PySpark, it is often useful to support 
> negative values in the evaluate function, for example the relative difference 
> between predicted and actual values. In that case, the goal is to select the 
> metric closest to 0 rather than the smallest or largest value. We should add 
> a flag that lets users specify this scenario.
> For example, CrossValidator may be used with a parameter grid that produces 
> the following metric values across the five folds for each parameter setting:
>  * [ 0.5, 0.5, 0.5, 0, 0 ]
>  * [ 0.5, -0.5, 0.5, 0, 0 ]
>  * [ -0.5, -0.5, -0.5, 0, 0 ]
> Averaging each list across the folds gives avgMetrics of [ 0.3, 0.1, -0.3 ]. 
> There is currently no way to tell the cross validator to select the second 
> model, whose average metric is closest to zero.
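> To see why, here is a simplified sketch of the selection step CrossValidator 
> performs today, using the avgMetrics values from the example above:
> {code:python}
> import numpy as np
>
> # avgMetrics from the example above: one averaged metric per parameter setting
> avgMetrics = [0.3, 0.1, -0.3]
>
> # CrossValidator picks argmax or argmin depending on isLargerBetter()
> is_larger_better = False  # as returned by the example evaluator below
> if is_larger_better:
>     bestIndex = int(np.argmax(avgMetrics))
> else:
>     bestIndex = int(np.argmin(avgMetrics))
>
> print(bestIndex)  # 2 -- the model at -0.3 is chosen, not the one closest to 0
> {code}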
> Here's an example Evaluator where this functionality is useful:
> {code:python}
> from pyspark.ml.evaluation import Evaluator
> from pyspark.sql import functions as F
>
> class SumEvaluator(Evaluator):
>     """Evaluator that returns the relative error between the summed
>     predictions and the summed labels; the result can be negative."""
>
>     def __init__(self, predictionCol="prediction", labelCol="label"):
>         super().__init__()  # initialize the Params machinery (uid, etc.)
>         self.predictionCol = predictionCol
>         self.labelCol = labelCol
>
>     def _evaluate(self, dataset):
>         # Relative difference between the predicted and actual totals
>         actual = dataset.select(F.sum(self.labelCol)).collect()[0][0]
>         prediction = dataset.select(F.sum(self.predictionCol)).collect()[0][0]
>         return (prediction - actual) / actual
>
>     def isLargerBetter(self):
>         # Smaller is better, but negative values distort the comparison
>         return False
>
>     def applyAbsoluteTransform(self):
>         # Proposed new hook: take abs() of the metrics before selection
>         return True
> {code}
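> For illustration, the evaluator above could be wired into CrossValidator as 
> in the hypothetical sketch below; the DataFrame train and the LinearRegression 
> estimator are assumptions made for the example:
> {code:python}
> from pyspark.ml.regression import LinearRegression
> from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
>
> # train is assumed to be a DataFrame with "features" and "label" columns
> lr = LinearRegression(featuresCol="features", labelCol="label")
> grid = (ParamGridBuilder()
>         .addGrid(lr.regParam, [0.0, 0.1, 1.0])
>         .build())
>
> cv = CrossValidator(estimator=lr,
>                     estimatorParamMaps=grid,
>                     evaluator=SumEvaluator(),
>                     numFolds=5)
>
> # Today this selects the most negative average metric, not the one closest to 0
> cvModel = cv.fit(train)
> print(cvModel.avgMetrics)
> {code}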
> The SumEvaluator above is a custom evaluator that compares the difference 
> between the actual and predicted totals in a regression problem. I am 
> proposing a new method on Evaluator that specifies whether an absolute-value 
> transformation should be applied to the cross-validated metrics before the 
> best model is selected.
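> A sketch of how the selection step could honor the proposed hook 
> (applyAbsoluteTransform is the proposed method, not an existing API):
> {code:python}
> import numpy as np
>
> def select_best_index(avgMetrics, evaluator):
>     # Apply the proposed absolute-value transform before comparing models
>     metrics = avgMetrics
>     if getattr(evaluator, "applyAbsoluteTransform", lambda: False)():
>         metrics = [abs(m) for m in metrics]
>     # Existing behavior: argmax or argmin depending on the metric's direction
>     if evaluator.isLargerBetter():
>         return int(np.argmax(metrics))
>     return int(np.argmin(metrics))
>
> # With the example values, the transform selects index 1 (0.1, closest to
> # zero) instead of index 2 (-0.3).
> {code}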
>  



