It is a bit more than syntactic sugar, but not much more: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L533
BTW, this basically writes all the data out and then creates a new Dataset to load it back in. The checkpointed data backs the Dataset that checkpoint() returns, so the RDD of the Dataset you called it on is left untouched, which is why isCheckpointed reports false there.

On Wed, Oct 25, 2017 at 6:51 AM, Bernard Jesop <bernard.je...@gmail.com> wrote:

> Hello everyone,
>
> I have a question about checkpointing on dataset.
>
> It seems in 2.1.0 that there is a Dataset.checkpoint(), however unlike RDD
> there is no Dataset.isCheckpointed().
>
> I wonder if Dataset.checkpoint is a syntactic sugar for
> Dataset.rdd.checkpoint.
> When I do :
>
> Dataset.checkpoint; Dataset.count
> Dataset.rdd.isCheckpointed // result: false
>
> However, when I explicitly do:
> Dataset.rdd.checkpoint; Dataset.rdd.count
> Dataset.rdd.isCheckpointed // result: true
>
> Could someone explain this behavior to me, or provide some references?
>
> Best regards,
> Bernard
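
For illustration, here is a minimal sketch of the behavior discussed in this thread. It assumes a local SparkSession and a temporary checkpoint directory; the object name CheckpointSketch, the method name run, and the sample data are hypothetical and only serve the example. The point it tries to show is that checkpoint() hands back a new Dataset, so the return value is the one to keep using.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark.
object CheckpointSketch {
  // Returns (isCheckpointed flag of the original Dataset's RDD,
  //          row count of the Dataset returned by checkpoint()).
  def run(): (Boolean, Long) = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local mode for the sketch
      .appName("checkpoint-sketch")
      .getOrCreate()
    import spark.implicits._

    // A checkpoint directory must be set before calling checkpoint().
    // Assumption: a throwaway temp directory is good enough here.
    val dir = Files.createTempDirectory("spark-checkpoint").toString
    spark.sparkContext.setCheckpointDir(dir)

    val ds = Seq(1, 2, 3).toDS()

    // checkpoint() is eager by default: it writes the data out and
    // returns a NEW Dataset that reads from the checkpointed data.
    val checkpointed = ds.checkpoint()

    // The original Dataset's RDD never had checkpoint() called on it.
    val originalFlag = ds.rdd.isCheckpointed

    // Keep working with the returned Dataset from here on.
    val rows = checkpointed.count()

    spark.stop()
    (originalFlag, rows)
  }

  def main(args: Array[String]): Unit = {
    val (originalFlag, rows) = run()
    println(s"original rdd isCheckpointed: $originalFlag, rows: $rows")
  }
}
```

In other words, writing `val ds2 = ds.checkpoint()` and continuing with ds2 is the intended usage; calling ds.checkpoint and discarding the result does the write but leaves ds itself as it was.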