GitHub user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10934#discussion_r51329576
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1535,6 +1535,10 @@ abstract class RDD[T: ClassTag](
     
       private[spark] var checkpointData: Option[RDDCheckpointData[T]] = None
     
    +  // Whether checkpoint all RDDs that are marked with the checkpoint flag.
    --- End diff --
    
    We need to expand on this comment:
    ```
    // Whether to checkpoint all RDDs that are marked for checkpointing. By default, we stop
    // as soon as we find the first such RDD. This optimization allows us to write less data
    // but is not safe for all workloads. E.g. in streaming we may checkpoint both an RDD
    // and its parent every batch, in which case the parent may never be checkpointed
    // and its lineage never truncated (SPARK-6847).
    ```
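
    To illustrate the behavior that comment describes, here is a toy sketch
    of the traversal. It does not use Spark and is not the code in this PR:
    `Node`, `marked`, `checkpointAllMarked` and `CheckpointSketch` are
    hypothetical names standing in for an RDD, its checkpoint mark, and the
    new flag. It assumes the walk goes from an RDD towards its ancestors and,
    by default, stops at the first marked one.
    ```scala
    // Toy model only: Node stands in for an RDD; none of these names are Spark APIs.
    final class Node(val name: String, val parents: Seq[Node], val marked: Boolean) {
      var checkpointed = false

      // Walk the lineage from this node towards its ancestors.
      def doCheckpoint(checkpointAllMarked: Boolean): Unit = {
        if (marked) {
          // Only continue past a marked node when the flag is set.
          if (checkpointAllMarked) parents.foreach(_.doCheckpoint(checkpointAllMarked))
          checkpointed = true
        } else {
          // Unmarked nodes are skipped; keep looking further up the lineage.
          parents.foreach(_.doCheckpoint(checkpointAllMarked))
        }
      }
    }

    object CheckpointSketch {
      // Build a marked child with a marked parent, then checkpoint from the child.
      def run(checkpointAllMarked: Boolean): Seq[(String, Boolean)] = {
        val parent = new Node("parent", Nil, marked = true)
        val child = new Node("child", Seq(parent), marked = true)
        child.doCheckpoint(checkpointAllMarked)
        Seq(parent, child).map(n => n.name -> n.checkpointed)
      }

      def main(args: Array[String]): Unit = {
        println(run(checkpointAllMarked = false)) // List((parent,false), (child,true))
        println(run(checkpointAllMarked = true))  // List((parent,true), (child,true))
      }
    }
    ```
    With the flag off, the marked parent is never reached, which is the
    streaming case above where the parent's lineage keeps growing.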

