Github user szhem commented on the issue:

    https://github.com/apache/spark/pull/19410
  
    @mallman 
    
    Just my two cents regarding built-in solutions:
    
    The periodic checkpointer deletes old checkpoint files so that they do not 
fill up the disk. Although disk storage is cheap, it is not free. 
    
    For example, in my case (a graph with >1B vertices and about the same number 
of edges), the checkpoint directory with a single checkpoint took about 150-200GB. 
    The checkpoint interval was set to 5, and the job completed in 
about 100 iterations.
    So without cleaning up unnecessary checkpoints, the checkpoint 
directory could grow up to 6TB (which is quite a lot) in my case.
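
    To make the arithmetic concrete, here is a rough back-of-the-envelope sketch 
in Python. It assumes one fixed-size checkpoint is written every `interval` 
iterations and nothing else lands in the directory. The function names and the 
`retained=1` cleanup policy are hypothetical, chosen only for illustration; they 
are not Spark APIs and do not describe the exact retention behaviour of the 
periodic checkpointer.

```python
def checkpoints_written(iterations: int, interval: int) -> int:
    """Number of checkpoints taken when one is written every `interval` iterations."""
    return iterations // interval


def directory_size_gb(iterations, interval, checkpoint_gb, retained=None):
    """Estimated checkpoint directory size in GB.

    retained=None models no cleanup (every checkpoint stays on disk).
    retained=k models a cleaner that keeps only the k most recent checkpoints
    (an assumed policy, for illustration only).
    """
    written = checkpoints_written(iterations, interval)
    on_disk = written if retained is None else min(written, retained)
    return on_disk * checkpoint_gb


# Numbers from the comment: 100 iterations, interval 5, 150-200GB per checkpoint.
for size_gb in (150, 200):
    no_cleanup = directory_size_gb(100, 5, size_gb)
    with_cleanup = directory_size_gb(100, 5, size_gb, retained=1)
    print(f"{size_gb} GB/checkpoint: {no_cleanup / 1000:.1f} TB without cleanup, "
          f"{with_cleanup} GB with cleanup")
```

    Under these simplified assumptions the job writes 20 checkpoints, so the 
directory reaches roughly 3-4TB without cleanup, the same order of magnitude as 
the figure above, versus a few hundred GB when old checkpoints are removed.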


