[ https://issues.apache.org/jira/browse/SPARK-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135610#comment-14135610 ]

Apache Spark commented on SPARK-3550:
-------------------------------------

User 'staple' has created a pull request for this issue:
https://github.com/apache/spark/pull/2412

> Disable automatic rdd caching in python api for relevant learners
> -----------------------------------------------------------------
>
>                 Key: SPARK-3550
>                 URL: https://issues.apache.org/jira/browse/SPARK-3550
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>            Reporter: Aaron Staple
>
> The Python MLlib API automatically caches training RDDs. However, the 
> NaiveBayes, ALS, and DecisionTree learners do not require external caching to 
> prevent repeated RDD re-evaluation during learning: NaiveBayes evaluates 
> its input RDD only once, while ALS and DecisionTree internally persist 
> transformations of their input RDDs. For these learners, we should disable 
> the automatic caching in the Python MLlib API.
> See discussion here:
> https://github.com/apache/spark/pull/2362#issuecomment-55637953
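The caching argument in the description can be illustrated with a toy, pure-Python model. This is a sketch under stated assumptions, not PySpark code: `LazyDataset` is a hypothetical stand-in for an RDD whose every pass re-runs its source computation unless `cache()` was called, and the three learner functions are hypothetical too. They mimic only the access patterns the issue describes: a single pass (NaiveBayes), an internally persisted transformation (ALS, DecisionTree), and, for contrast, a generic iterative learner that rereads its raw input on every iteration.

```python
class LazyDataset:
    """Hypothetical RDD stand-in: recomputes its records on every pass
    unless cache() has been called (not the real PySpark API)."""

    def __init__(self, compute):
        self._compute = compute      # function that produces the records
        self._persist = False
        self._materialized = None
        self.evaluations = 0         # number of times the source was computed

    def cache(self):
        # Like RDD.cache(): mark for persistence; materialized on first pass.
        self._persist = True
        return self

    def collect(self):
        if self._materialized is not None:
            return self._materialized
        self.evaluations += 1
        data = list(self._compute())
        if self._persist:
            self._materialized = data
        return data


def single_pass_learner(ds):
    # NaiveBayes-like access pattern: one pass aggregates per-label counts.
    counts = {}
    for label, _ in ds.collect():
        counts[label] = counts.get(label, 0) + 1
    return counts


def internally_persisting_learner(ds, iterations=5):
    # ALS/DecisionTree-like pattern: derive a transformed dataset once and
    # cache it internally, then iterate over the cached copy only.
    derived = LazyDataset(lambda: [(y, 2 * x) for y, x in ds.collect()]).cache()
    return sum(sum(x for _, x in derived.collect()) for _ in range(iterations))


def raw_multi_pass_learner(ds, iterations=5):
    # Contrast case: an iterative learner that rereads its raw input each time.
    return sum(sum(x for _, x in ds.collect()) for _ in range(iterations))


def make_data():
    return [(i % 2, float(i)) for i in range(10)]


a = LazyDataset(make_data)
single_pass_learner(a)
print(a.evaluations)  # 1: no external caching needed

b = LazyDataset(make_data)
internally_persisting_learner(b)
print(b.evaluations)  # 1: the learner's own cache prevents re-evaluation

c = LazyDataset(make_data)
raw_multi_pass_learner(c)
print(c.evaluations)  # 5: only this pattern benefits from caching the input

d = LazyDataset(make_data).cache()
raw_multi_pass_learner(d)
print(d.evaluations)  # 1
```

In this model, an automatic `cache()` on the input only pays off for the raw multi-pass pattern; for the first two patterns it would just hold an extra copy of the data in memory.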



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
