[ https://issues.apache.org/jira/browse/SPARK-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135610#comment-14135610 ]
Apache Spark commented on SPARK-3550:
-------------------------------------

User 'staple' has created a pull request for this issue:
https://github.com/apache/spark/pull/2412

> Disable automatic rdd caching in python api for relevant learners
> -----------------------------------------------------------------
>
>                 Key: SPARK-3550
>                 URL: https://issues.apache.org/jira/browse/SPARK-3550
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>            Reporter: Aaron Staple
>
> The Python MLlib API automatically caches training RDDs. However, the
> NaiveBayes, ALS, and DecisionTree learners do not need external caching to
> prevent repeated RDD re-evaluation during learning. NaiveBayes evaluates its
> input RDD only once, while ALS and DecisionTree internally persist
> transformations of their input RDDs. For these learners, we should disable
> the automatic caching in the Python MLlib API.
>
> See discussion here:
> https://github.com/apache/spark/pull/2362#issuecomment-55637953
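The argument in the issue comes down to how often a learner forces its input to be recomputed. The sketch below is a toy model in plain Python, not PySpark. `LazyDataset` is a hypothetical stand-in for an RDD: it recomputes its data on every `collect()` unless `cache()` was called. The three learner functions are hypothetical as well. They copy the access patterns the issue describes (a single pass like NaiveBayes, an iterative rescan that benefits from caching, and an internally persisted working set like ALS/DecisionTree). They are not the MLlib implementations.

```python
class LazyDataset:
    """Hypothetical RDD stand-in (NOT the PySpark API).

    Every collect() re-runs the producing function, which mimics lazy
    re-evaluation of an uncached lineage, unless cache() was requested.
    """

    def __init__(self, produce):
        self._produce = produce
        self._should_cache = False
        self._cached = None
        self.evaluations = 0  # how many times the lineage was recomputed

    def cache(self):
        self._should_cache = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        data = self._produce()
        if self._should_cache:
            self._cached = data
        return data


def single_pass_learner(dataset):
    """NaiveBayes-like pattern: one aggregation pass over the input."""
    counts = {}
    for label, _features in dataset.collect():
        counts[label] = counts.get(label, 0) + 1
    return counts


def iterative_learner(dataset, iterations=5):
    """Iterative pattern that rescans the input on every iteration."""
    weight = 0.0
    for _ in range(iterations):
        points = dataset.collect()
        grad = sum(label - weight for label, _features in points) / len(points)
        weight += 0.5 * grad
    return weight


def internally_persisting_learner(dataset, iterations=5):
    """ALS/DecisionTree-like pattern: build a working set once, then reuse it."""
    working_set = list(dataset.collect())  # the only evaluation of the input
    weight = 0.0
    for _ in range(iterations):
        grad = sum(label - weight for label, _features in working_set) / len(working_set)
        weight += 0.5 * grad
    return weight


def count_evaluations(iterations=5):
    """Return how many times each learner forced the input to be recomputed."""
    data = [(1.0, [0.0]), (0.0, [1.0]), (1.0, [1.0])]

    def fresh():
        return LazyDataset(lambda: list(data))

    results = {}
    ds = fresh()
    single_pass_learner(ds)
    results["single_pass"] = ds.evaluations

    ds = fresh()
    iterative_learner(ds, iterations)
    results["iterative_uncached"] = ds.evaluations

    ds = fresh().cache()
    iterative_learner(ds, iterations)
    results["iterative_cached"] = ds.evaluations

    ds = fresh()
    internally_persisting_learner(ds, iterations)
    results["internally_persisting"] = ds.evaluations
    return results


if __name__ == "__main__":
    print(count_evaluations())
    # {'single_pass': 1, 'iterative_uncached': 5, 'iterative_cached': 1, 'internally_persisting': 1}
```

In this model, external caching only pays off for the uncached iterative learner. The single-pass and internally persisting learners already evaluate their input once, so an automatic `cache()` in a wrapper would only spend memory. That is the reasoning the issue applies to NaiveBayes, ALS, and DecisionTree.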