See the doc for checkpoint:

   * Mark this RDD for checkpointing. It will be saved to a file inside the checkpoint
   * directory set with `SparkContext#setCheckpointDir` and all references to its parent
   * RDDs will be removed. *This function must be called before any job has been*
   * *executed on this RDD*. It is strongly recommended that this RDD is persisted in
   * memory, otherwise saving it on a file will require recomputation.

From the above description, you should not call it at the end of the
transformations.
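
A minimal sketch of the ordering the doc asks for, assuming a local master and
a temporary checkpoint directory. The object name, the toy transformations and
the paths are made up for illustration; on a real cluster the checkpoint
directory would normally be an HDFS path. The RDD is persisted and marked for
checkpointing before the first action, so the checkpoint file can be written
from the cached data instead of recomputing the chain.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CheckpointSketch {

  // Returns (element count, whether the RDD ended up checkpointed).
  def run(): (Long, Boolean) = {
    val conf = new SparkConf()
      .setAppName("checkpoint-sketch") // hypothetical app name
      .setMaster("local[2]")           // assumption: local run for illustration
    val sc = new SparkContext(conf)
    try {
      // Hypothetical location; use an HDFS directory on a cluster.
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)

      // Stand-in for the long chain of transformations.
      val last = sc.parallelize(1 to 1000)
        .map(_ * 2)
        .filter(_ % 3 == 0)

      // Cache first, so writing the checkpoint does not recompute the chain.
      last.persist(StorageLevel.MEMORY_ONLY)

      // Only marks the RDD; nothing is written yet.
      last.checkpoint()

      // First action on this RDD: the job runs, then the checkpoint is written.
      val n = last.count()
      (n, last.isCheckpointed)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (n, checkpointed) = run()
    println(s"count = $n, checkpointed = $checkpointed")
  }
}
```

If `checkpoint()` were moved below `count()`, the doc's requirement that it be
called before any job has run on the RDD would no longer be met.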

Cheers

On Wed, Mar 23, 2016 at 7:14 PM, Todd <bit1...@163.com> wrote:

> Hi,
>
> I have a long computing chain. When I get the last RDD after a series of
> transformations, I have two choices for what to do with it:
>
> 1. Call checkpoint on RDD to materialize it to disk
> 2. Call RDD.saveXXX to save it to HDFS, and read it back for further
> processing
>
> Which choice is better? It looks to me like there is not much
> difference between the two.
> Thanks!
