Hello, I have a quick question about RDD.checkpoint().
Suppose the user calls RDD.checkpoint(). After the job finishes, Spark calls RDD.doCheckpoint() to do the actual physical checkpointing, that is, to write this RDD's partitions to HDFS. Does this mean that all of its parent RDDs, both the Scala objects and the data managed by the BlockManager, will then be garbage collected?

Could you also point me to the relevant region of the source code, if possible?

Thanks,
Dachuan

--
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A. 43210
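
P.S. For concreteness, the scenario I am asking about looks roughly like the sketch below. It is only an illustration: the local master, the application name, the checkpoint directory /tmp/spark-checkpoints, and the names parent and child are all my own assumptions, and on a real cluster the directory would be an HDFS path. Printing the lineage before and after the first action should make it possible to see whether the checkpointed RDD still refers to its parents.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointQuestion {
  def main(args: Array[String]): Unit = {
    // Assumption: a local run, used only to reproduce the scenario.
    val conf = new SparkConf().setAppName("checkpoint-question").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical directory; on a cluster this would point at HDFS.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    val parent = sc.parallelize(1 to 1000).map(_ * 2) // parent RDD in the lineage
    val child = parent.filter(_ % 3 == 0)

    // Only marks the RDD for checkpointing; nothing is written at this point.
    child.checkpoint()

    println(child.toDebugString) // lineage before any job has run

    // First action on the RDD; the checkpoint data is written as part of
    // finishing this job.
    child.count()

    println(child.isCheckpointed)    // whether the checkpoint has been materialized
    println(child.getCheckpointFile) // Option holding the checkpoint location, if any
    println(child.toDebugString)     // lineage after the job, for comparison

    sc.stop()
  }
}
```

Whether the parent objects and their blocks are actually reclaimed after that point is exactly what I would like to confirm.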
