[ 
https://issues.apache.org/jira/browse/SPARK-31217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064333#comment-17064333
 ] 

CacheCheck edited comment on SPARK-31217 at 3/22/20, 4:29 PM:
--------------------------------------------------------------

Besides, I think we should also add persist() APIs to the other metrics 
classes, e.g., for _summary_ in RegressionMetrics.
In the other three metrics classes (MulticlassMetrics, MultilabelMetrics, and 
RankingMetrics), _predictionAndLabels_ is used by multiple actions during 
object initialization, so it would be better to check whether it is already 
cached. If it is not, these classes should cache it.
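
The "check whether it is cached, and cache it only if it is not" idea can be sketched with a toy model in plain Python. Everything here is hypothetical: ToyDataset and ToyMetrics are stand-ins invented for illustration and are not Spark classes. In Spark itself the analogous check would presumably go through the RDD's storage level (e.g. getStorageLevel).

```python
class ToyDataset:
    """Hypothetical stand-in for a lazily evaluated dataset (NOT a Spark RDD)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._persist_requested = False
        self._cache = None
        self.computations = 0  # number of times the rows were recomputed

    @property
    def is_cached(self):
        # Analogous to "storage level is not NONE" on a real RDD.
        return self._persist_requested

    def persist(self):
        # Lazy: only marks the dataset; the first action fills the cache.
        self._persist_requested = True
        return self

    def unpersist(self):
        self._persist_requested = False
        self._cache = None
        return self

    def collect(self):
        # An "action": reuse the cache if present, otherwise recompute.
        if self._cache is not None:
            return self._cache
        self.computations += 1
        rows = list(self._rows)
        if self._persist_requested:
            self._cache = rows
        return rows


class ToyMetrics:
    """Hypothetical metrics class that runs two actions on its input."""

    def __init__(self, prediction_and_labels):
        self._data = prediction_and_labels
        # Persist only if the caller has not already done so, and remember
        # that choice so unpersist() never drops a cache the caller owns.
        self._owns_cache = not prediction_and_labels.is_cached
        if self._owns_cache:
            prediction_and_labels.persist()
        self.count = len(self._data.collect())            # action 1
        self.correct = sum(
            1 for pred, label in self._data.collect() if pred == label
        )                                                  # action 2

    @property
    def accuracy(self):
        return self.correct / self.count

    def unpersist(self):
        if self._owns_cache:
            self._data.unpersist()


data = ToyDataset([(1.0, 1.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)])
metrics = ToyMetrics(data)
print(metrics.accuracy, data.computations)
metrics.unpersist()
```

In this toy run the two actions trigger a single computation of the input instead of two, and unpersist() releases the cache only because the metrics object created it. If the caller had already persisted the dataset, the class would leave the caller's cache alone.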


was (Author: spark_cachecheck):
Besides, I think we also should add persist() APIs in other metrics class. 
E.g., _summary_ in RegressionMetrics.
In other three metrics classes, i.e., MulticlassMetics, MultilabelMetrics, 
RankingMetrics, _predictionAndLabels_ is important and is used by multiple 
actions in object initialization, it's better to check if it is cached before. 
If not, we should cache it in these classes.

> Unnecessary persist on cumulativeCounts in BinaryClassificationMetrics
> ----------------------------------------------------------------------
>
>                 Key: SPARK-31217
>                 URL: https://issues.apache.org/jira/browse/SPARK-31217
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.4.4, 2.4.5
>            Reporter: CacheCheck
>            Priority: Major
>
> In mllib.evaluation.BinaryClassificationMetrics, _cumulativeCounts_ is cached 
> during lazy initialization. But when I run LogisticRegressionSummaryExample 
> and ModelSelectionViaCrossValidationExample, I find that the cached 
> _cumulativeCounts_ is used by only one action during execution. 
> So I think it should not be cached during initialization. Instead, we can 
> add an explicit persist() API to this class, mirroring the existing 
> unpersist() API in BinaryClassificationMetrics that releases the cached 
> _cumulativeCounts_. 



