[jira] [Commented] (SPARK-3550) Disable automatic rdd caching in python api for relevant learners

Aaron Staple (JIRA) Thu, 25 Sep 2014 14:37:11 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148347#comment-14148347
 ]


Aaron Staple commented on SPARK-3550:
-------------------------------------

This has been addressed in another commit: 
https://github.com/apache/spark/commit/fce5e251d636c788cda91345867e0294280c074d

See comment here:
https://github.com/apache/spark/pull/2412#issuecomment-56865408

> Disable automatic rdd caching in python api for relevant learners
> -----------------------------------------------------------------
>
>                 Key: SPARK-3550
>                 URL: https://issues.apache.org/jira/browse/SPARK-3550
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>            Reporter: Aaron Staple
>
> The Python MLlib API automatically caches training RDDs. However, the 
> NaiveBayes, ALS, and DecisionTree learners do not require external caching to 
> prevent repeated RDD re-evaluation during learning. NaiveBayes only evaluates 
> its input RDD once, while ALS and DecisionTree internally persist 
> transformations of their input RDDs. For these learners, we should disable 
> the automatic caching in the Python MLlib API.
> See discussion here:
> https://github.com/apache/spark/pull/2362#issuecomment-55637953
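
To illustrate the reasoning in the description, here is a minimal sketch in plain Python. It does not use Spark: LazyDataset is a toy stand-in for an RDD, and every name in it (LazyDataset, prepare_training_data, single_pass_learner, iterative_learner, and the "SomeIterativeLearner" label) is hypothetical and not part of PySpark or MLlib. It only shows why caching helps a learner that re-reads its input and adds nothing for one that reads it once.

```python
# Hypothetical sketch: none of these names are PySpark or MLlib APIs.

class LazyDataset:
    """Toy stand-in for an RDD: re-evaluated on every pass unless cached."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._use_cache = False
        self._cached = None
        self.evaluations = 0  # number of times the source was re-evaluated

    def cache(self):
        # Mark the dataset so the first evaluation is kept for later passes.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.evaluations += 1
        result = list(self._rows)
        if self._use_cache:
            self._cached = result
        return result


# Learners that, per the issue description, need no external caching.
LEARNERS_WITHOUT_EXTERNAL_CACHING = {"NaiveBayes", "ALS", "DecisionTree"}


def prepare_training_data(learner_name, data):
    """Cache the input only for learners that would otherwise re-evaluate it."""
    if learner_name in LEARNERS_WITHOUT_EXTERNAL_CACHING:
        return data
    return data.cache()


def single_pass_learner(data):
    """Reads its input exactly once (the NaiveBayes-like case)."""
    rows = data.collect()
    return sum(rows) / len(rows)


def iterative_learner(data, iterations=5):
    """Reads its input once per iteration, so an uncached input is recomputed."""
    estimate = 0.0
    for _ in range(iterations):
        rows = data.collect()
        estimate += 0.5 * (sum(rows) / len(rows) - estimate)
    return estimate


if __name__ == "__main__":
    once = prepare_training_data("NaiveBayes", LazyDataset([1, 2, 3]))
    single_pass_learner(once)
    print("single pass, no cache:", once.evaluations)

    cached = prepare_training_data("SomeIterativeLearner", LazyDataset([1, 2, 3]))
    iterative_learner(cached)
    print("iterative, cached:", cached.evaluations)

    uncached = LazyDataset([1, 2, 3])
    iterative_learner(uncached)
    print("iterative, uncached:", uncached.evaluations)
```

In this toy model the single-pass learner evaluates its input once whether or not it is cached, so caching only costs memory, while the iterative learner re-evaluates an uncached input on every iteration. The ALS and DecisionTree case (internally persisting transformations of the input) is not modeled here.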



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

