GitHub user szhem commented on the issue:

https://github.com/apache/spark/pull/19410

@mallman

Just my two cents regarding built-in solutions:

The periodic checkpointer deletes old checkpoint files so that they do not pollute the hard drive. Although disk storage is cheap, it is not free.

For example, in my case (a graph with >1B vertices and about the same number of edges), a checkpoint directory holding a single checkpoint took about 150-200GB. The checkpoint interval was set to 5, and the job completed in about 100 iterations. So without cleaning up unnecessary checkpoints, the checkpoint directory could grow up to 6TB (which is quite a lot) in my case.
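
As a rough back-of-the-envelope sketch of the sizing argument above (not anything taken from Spark itself), the snippet below estimates how much disk the checkpoint directory would hold if no checkpoint were ever deleted. The function name `checkpoint_footprint_gb` is hypothetical, and the model assumes one checkpoint is written every `interval` iterations and that every checkpoint has the same size. Under that literal reading the quoted figures give roughly 3-4 TB, the same order of magnitude as the total mentioned above; the real number depends on how checkpoint size changes over the run.

```python
def checkpoint_footprint_gb(iterations, interval, size_gb):
    """Disk used (in GB) if every checkpoint written during the job is kept.

    Assumption: one checkpoint every `interval` iterations, all of equal size.
    """
    checkpoints_written = iterations // interval
    return checkpoints_written * size_gb


# Figures quoted in the comment: ~100 iterations, checkpoint interval of 5,
# and 150-200 GB per checkpoint.
iterations = 100
interval = 5
low_gb, high_gb = 150, 200

kept_low = checkpoint_footprint_gb(iterations, interval, low_gb)
kept_high = checkpoint_footprint_gb(iterations, interval, high_gb)

print(f"checkpoints written: {iterations // interval}")
print(f"without cleanup: {kept_low / 1000:.1f}-{kept_high / 1000:.1f} TB")
# Assumption: with cleanup, roughly one checkpoint remains on disk at a time.
print(f"with cleanup: about {low_gb}-{high_gb} GB")
```

The point of the comparison is that, with cleanup, the footprint stays near the size of a single checkpoint, while without it the footprint grows linearly with the number of iterations.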