HeartSaVioR edited a comment on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-846329353
I'm not sure about the scenario of using a PVC as the checkpoint location. At least to me, that sounds beyond what checkpointing in Structured Streaming supports. We have clearly described the requirement for the checkpoint location in the Structured Streaming guide, as follows:

> Checkpoint location: For some output sinks where the end-to-end fault-tolerance can be guaranteed, specify the location where the system will write all the checkpoint information. This should be a directory in an HDFS-compatible fault-tolerant file system. The semantics of checkpointing is discussed in more detail in the next section.

I know we allow custom checkpoint manager implementations to deal with non-HDFS-compatible file systems (like object stores, which don't provide "atomic rename"), but they still deal with a "remote", "fault-tolerant" file system, and they don't require the Spark scheduler to schedule a specific task on a specific executor based on the availability of the checkpoint. In other words, only the checkpoint manager handles the complexity of checkpointing on the file system, not anything else. It sounds like that no longer holds true if we want to support PVC-based checkpoints. Please correct me if I'm missing something.

I'm more of a novice on cloud/k8s, but from common sense, I guess the actual storage behind a PVC should still be some sort of network storage in order to be resilient to a "node down". I'm wondering how much benefit the PVC approach gives compared to the existing approach of directly using a remote fault-tolerant file system. The benefits should be clear enough to justify the additional complexity.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.