Hi,

One of the goals of checkpointing is to cut the RDD lineage. Otherwise, in
iterative jobs the lineage keeps growing and you can run into a
StackOverflowError. If you checkpoint eagerly (eager = true), the checkpoint
is written right away, so the lineage is cut at that point and all the
following operations depend on the checkpointed DataFrame. With
eager = false, the checkpoint is only written when the first action is run
on the returned DataFrame. If you don't checkpoint at all, the lineage keeps
growing, and you may hit the StackOverflowError while it is being resolved.
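To make the lineage argument concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not Spark code: `Step`, `checkpoint`, `run` and the step counts are all hypothetical. Each step's value is its parent's value plus one, and computing it recurses once per step back to the source, much like resolving a long lineage. Python's default recursion limit (about 1000 frames) stands in for the JVM stack, and RecursionError stands in for StackOverflowError.

```python
import sys


class Step:
    """Toy lineage node: value = parent's value + 1.

    Hypothetical model for illustration only; this is not how Spark
    implements checkpointing.
    """

    def __init__(self, parent=None):
        self.parent = parent      # previous step (None means the source)
        self.saved = None         # materialized value once checkpointed
        self.pending = False      # lazy checkpoint requested, not yet done

    def compute(self):
        # Resolving the lineage recurses once per step back to the source,
        # or to the nearest step that has already been materialized.
        if self.saved is not None:
            return self.saved
        value = 0 if self.parent is None else self.parent.compute() + 1
        if self.pending:
            self._materialize(value)  # deferred cut happens on first compute
        return value

    def _materialize(self, value):
        self.saved = value
        self.parent = None        # cut the lineage: drop the parent link
        self.pending = False

    def checkpoint(self, eager=True):
        if eager:
            self._materialize(self.compute())  # do the work right now
        else:
            self.pending = True                # only mark it; cut later
        return self

    def depth(self):
        # Number of parent links still reachable (iterative, no recursion).
        d, node = 0, self
        while node.parent is not None:
            d, node = d + 1, node.parent
        return d


def run(total, every=None, eager=True):
    """Build a chain of `total` steps, checkpointing every `every` steps."""
    node = Step()
    for i in range(1, total + 1):
        node = Step(node)
        if every and i % every == 0:
            node.checkpoint(eager=eager)
    return node.compute()


if __name__ == "__main__":
    print("recursion limit:", sys.getrecursionlimit())
    print(run(5000, every=500, eager=True))  # prints 5000; cut every 500 steps
    try:
        run(5000)  # no checkpoint: compute() must recurse 5000 levels deep
    except RecursionError:
        print("no checkpoint: RecursionError")
```

In the toy, `run(5000, every=500, eager=False)` also raises RecursionError: the deferred cuts only happen during the final compute, which has to walk all 5000 steps before any of them is materialized. That is a property of this model, where nothing forces the deferred cuts earlier.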

HTH,
Burak

On Thu, Jan 26, 2017 at 10:36 AM, Jean Georges Perrin <j...@jgp.net> wrote:

> Hey Sparkers,
>
> Trying to understand the Dataframe's checkpoint (*not* in the context of
> streaming) https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Dataset.html#checkpoint(boolean)
>
> What is the goal of the *eager* flag?
>
> Thanks!
>
> jg
>
