It's because the two APIs are designed differently. *RDD.checkpoint* returns void, which means it mutates the RDD's state, so you need an *RDD.isCheckpointed* method to check whether that RDD has been checkpointed.
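For the RDD side, here is a minimal sketch of that mutate-then-query pattern. The local master, the temporary checkpoint directory, and the names `RddCheckpointSketch` and `run` are assumptions made only for this illustration; they are not from the original thread.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object RddCheckpointSketch {
  // Returns isCheckpointed as seen (before, after) the first action.
  def run(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[1]")               // assumption: local run for the sketch
      .appName("rdd-checkpoint-sketch")
      .getOrCreate()
    val sc = spark.sparkContext
    // Assumption: a throwaway temp directory stands in for a real checkpoint dir.
    sc.setCheckpointDir(Files.createTempDirectory("rdd-ckpt").toString)

    val rdd = sc.parallelize(1 to 10)
    rdd.checkpoint()                    // returns Unit; only marks this RDD
    val before = rdd.isCheckpointed     // nothing has been written yet
    rdd.count()                         // the first action writes the checkpoint
    val after = rdd.isCheckpointed      // state is queried on the same RDD object

    spark.stop()
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(s"before=$before after=$after")
  }
}
```

Because the call mutates the RDD in place, the only way to learn whether it took effect is to ask that same object afterwards, which is what `isCheckpointed` is for.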
*Dataset.checkpoint* returns a new Dataset, which means there is no isCheckpointed state in Dataset, and thus we don't need a *Dataset.isCheckpointed* method.

On Wed, Oct 25, 2017 at 6:39 PM, Bernard Jesop <bernard.je...@gmail.com> wrote:

> Actually, I realized that keeping the info would not be enough, as I need to
> find the checkpoint files again to delete them :/
>
> 2017-10-25 19:07 GMT+02:00 Bernard Jesop <bernard.je...@gmail.com>:
>
>> As far as I understand, Dataset.rdd is not the same as InternalRDD.
>> It is just another RDD representation of the same Dataset and is created
>> on demand (lazy val) when Dataset.rdd is called.
>> This totally explains the observed behavior.
>>
>> But how would it be possible to know that a Dataset has been
>> checkpointed?
>> Should I manually keep track of that info?
>>
>> 2017-10-25 15:51 GMT+02:00 Bernard Jesop <bernard.je...@gmail.com>:
>>
>>> Hello everyone,
>>>
>>> I have a question about checkpointing on Datasets.
>>>
>>> It seems that in 2.1.0 there is a Dataset.checkpoint(); however, unlike
>>> RDD, there is no Dataset.isCheckpointed().
>>>
>>> I wonder if Dataset.checkpoint is syntactic sugar for
>>> Dataset.rdd.checkpoint.
>>> When I do:
>>>
>>> Dataset.checkpoint; Dataset.count
>>> Dataset.rdd.isCheckpointed // result: false
>>>
>>> However, when I explicitly do:
>>> Dataset.rdd.checkpoint; Dataset.rdd.count
>>> Dataset.rdd.isCheckpointed // result: true
>>>
>>> Could someone explain this behavior to me, or provide some references?
>>>
>>> Best regards,
>>> Bernard
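
To make the Dataset side of the reply concrete, here is a rough sketch of the pattern it implies: since *Dataset.checkpoint* hands back a new Dataset, "is it checkpointed?" is answered by which reference you hold, not by a flag. The local master, the temporary checkpoint directory, and the names `DatasetCheckpointSketch` and `run` are assumptions for illustration only.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object DatasetCheckpointSketch {
  // Returns (checkpointed is a different object, original ds.rdd.isCheckpointed).
  def run(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[1]")               // assumption: local run for the sketch
      .appName("dataset-checkpoint-sketch")
      .getOrCreate()
    // Assumption: a throwaway temp directory stands in for a real checkpoint dir.
    spark.sparkContext.setCheckpointDir(
      Files.createTempDirectory("ds-ckpt").toString)

    val ds = spark.range(10)
    // checkpoint() is eager by default and returns a new Dataset;
    // the original `ds` is left untouched.
    val checkpointed = ds.checkpoint()

    val isNewObject = checkpointed ne ds
    // Same check as in the original question, made on the untouched Dataset.
    val originalRddFlag = ds.rdd.isCheckpointed

    spark.stop()
    (isNewObject, originalRddFlag)
  }

  def main(args: Array[String]): Unit = {
    val (isNewObject, originalRddFlag) = run()
    println(s"new object: $isNewObject, ds.rdd.isCheckpointed: $originalRddFlag")
  }
}
```

In practice this means keeping the value returned by `checkpoint()` and using it downstream; continuing to use the original `ds` keeps the full lineage, which matches the `false` reported in the question.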