See the doc for checkpoint:

  Mark this RDD for checkpointing. It will be saved to a file inside the
  checkpoint directory set with `SparkContext#setCheckpointDir` and all
  references to its parent RDDs will be removed. *This function must be
  called before any job has been executed on this RDD*. It is strongly
  recommended that this RDD is persisted in memory, otherwise saving it
  on a file will require recomputation.
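To make the ordering in that doc concrete, here is a minimal sketch, assuming a local master and a scratch checkpoint directory. The object name, the helper `checkpointedCount`, the directory path, and the toy transformation chain are all made up for illustration; only `setCheckpointDir`, `persist`, `checkpoint`, `count`, and `isCheckpointed` are Spark's own calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CheckpointSketch {
  // Hypothetical helper: builds a toy chain, marks it for checkpointing
  // before any action, then runs one action.
  def checkpointedCount(sc: SparkContext, dir: String): (Long, Boolean) = {
    // Assumed scratch directory; on a cluster this should be an HDFS path.
    sc.setCheckpointDir(dir)

    // Stand-in for the long chain of transformations.
    val last = sc.parallelize(1 to 1000).map(_ * 2).filter(_ % 3 == 0)

    // Persist first, so writing the checkpoint does not recompute the chain.
    last.persist(StorageLevel.MEMORY_ONLY)

    // Only marks the RDD; no job has run on it yet.
    last.checkpoint()

    // First action: runs the job, after which the checkpoint is written.
    val n = last.count()
    (n, last.isCheckpointed)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (n, done) = checkpointedCount(sc, "/tmp/spark-checkpoints")
      println(s"count = $n, checkpointed = $done")
    } finally {
      sc.stop()
    }
  }
}
```

The point of the sketch is only the order of calls: `checkpoint()` comes after the transformations are defined but before the first action on that RDD.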
From the above description, you should not call it at the end of the transformations.

Cheers

On Wed, Mar 23, 2016 at 7:14 PM, Todd <bit1...@163.com> wrote:

> Hi,
>
> I have a long computing chain, and I get the last RDD after a series of
> transformations. I have two choices for what to do with this last RDD:
>
> 1. Call checkpoint on the RDD to materialize it to disk
> 2. Call RDD.saveXXX to save it to HDFS, and read it back for further
>    processing
>
> Which choice is better? It looks to me like there is not much
> difference between the two.
>
> Thanks!