[ https://issues.apache.org/jira/browse/SPARK-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-8666.
------------------------------
    Resolution: Duplicate

> checkpointing does not take advantage of persisted/cached RDDs
> --------------------------------------------------------------
>
>                 Key: SPARK-8666
>                 URL: https://issues.apache.org/jira/browse/SPARK-8666
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Glenn Strycker
>
> I have been noticing that when checkpointing RDDs, all operations are
> occurring TWICE.
> For example, when I run the following code and watch the stages...
> {noformat}
> val newRDD = prevRDD.map(a => (a._1, 1L)).distinct.persist(StorageLevel.MEMORY_AND_DISK_SER)
> newRDD.checkpoint
> print(newRDD.count())
> {noformat}
> I see distinct and count operations appearing TWICE, and shuffle disk writes
> and reads (from the distinct) occurring TWICE.
> My newRDD is persisted to memory, why can't the checkpoint simply save those
> partitions to disk when the first operations have completed?
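
One way to check the behaviour described in the report, rather than reading it off the stage list, is to count how often the map function actually runs. The following is a minimal sketch, not part of the original issue: it assumes Spark 2.x or later (for `longAccumulator`), a local master, a hypothetical checkpoint directory `/tmp/spark-checkpoint-demo`, and a made-up `prevRDD` of 1000 pairs. If the accumulator ends up well above the input size, the lineage was evaluated more than once.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CheckpointRecomputeCheck {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for this sketch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("checkpoint-recompute-check")
    val sc = new SparkContext(conf)

    // Hypothetical checkpoint location; any writable directory works.
    sc.setCheckpointDir("/tmp/spark-checkpoint-demo")

    // Counts every invocation of the map function, including re-runs.
    val mapCalls = sc.longAccumulator("mapCalls")

    // Stand-in for the reporter's prevRDD (its real contents are unknown).
    val inputSize = 1000
    val prevRDD = sc.parallelize(1 to inputSize).map(i => (i % 100, i))

    // Same shape as the code in the report, with the counter added.
    val newRDD = prevRDD
      .map { a => mapCalls.add(1); (a._1, 1L) }
      .distinct()
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    newRDD.checkpoint()       // only marks the RDD; the write happens after the first action
    println(newRDD.count())   // first action: triggers the job and then the checkpoint

    // Compare the number of map invocations with the input size.
    println(s"map invocations: ${mapCalls.value}, input size: $inputSize")

    sc.stop()
  }
}
```

The count is taken inside a transformation on purpose: accumulator updates made in transformations are re-applied whenever a task is re-executed, which is exactly the effect being measured here. The Spark UI stage list remains the more direct way to see repeated shuffle writes and reads.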