HeartSaVioR edited a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447
Sorry my comment was edited so you may be missed the content, but it is also a sort of pointing out of "pinpoint" - do you think your approach works with other state store providers as well? The root cause isn't bound to the implementation of state store provider but this patch is only addressing HDFS state store provider. I guess you're trying to find how it can be done less frequently, first time the state is loaded from the file, which is optimal. While I think it can be even done without binding to the state store provider implementation if we really need it, have we measured the actual overhead? If the overhead turns out to be trivial then it won't be matter we run validation check for each batch. It sounds to be sub-optimal, but the overhead would be trivial. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org