HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447


   Sorry, my comment was edited, so you may have missed the content, but it 
also relates to the "pinpointing" concern: do you think your approach works 
with other state store providers as well? The root cause isn't bound to the 
implementation of a state store provider, but this patch only addresses the 
HDFS state store provider.
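To illustrate why the check does not need to live inside one provider, here is a minimal Python sketch, assuming a hypothetical provider interface with a single `get_store(version)` method that returns (key, value) rows. `StateStoreProvider`, `InMemoryProvider`, `ValidatingProvider`, and `validate_row` are illustrative names only; Spark's real `StateStoreProvider` is a Scala trait with a different shape.

```python
from typing import Callable, Dict, List, Protocol, Tuple

# Hypothetical stand-in for a (key, value) state row.
Row = Tuple[bytes, bytes]


class StateStoreProvider(Protocol):
    """Hypothetical provider interface; not Spark's actual trait."""

    def get_store(self, version: int) -> List[Row]: ...


class InMemoryProvider:
    """Toy provider standing in for HDFS-backed or any other implementation."""

    def __init__(self, versions: Dict[int, List[Row]]) -> None:
        self._versions = versions

    def get_store(self, version: int) -> List[Row]:
        return self._versions[version]


class ValidatingProvider:
    """Wraps any provider, so the check is not tied to one implementation."""

    def __init__(self, inner: StateStoreProvider,
                 validate_row: Callable[[Row], bool]) -> None:
        self._inner = inner
        self._validate_row = validate_row

    def get_store(self, version: int) -> List[Row]:
        rows = self._inner.get_store(version)
        for row in rows:
            if not self._validate_row(row):
                raise ValueError(f"incompatible state row in version {version}: {row!r}")
        return rows


def validate_row(row: Row) -> bool:
    # Hypothetical format check; a real check would inspect the row layout.
    key, value = row
    return len(key) > 0 and len(value) > 0
```

Because the wrapper only depends on the interface, swapping `InMemoryProvider` for any other implementation leaves the validation unchanged.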
   
   I guess you're trying to find a way to run the check less frequently, 
ideally only the first time the state is loaded from the file, which would be 
optimal. While I think this could even be done without binding to the state 
store provider implementation if we really need it (check only once when the 
provider instance is created), have we measured the actual overhead? If the 
overhead turns out to be trivial, then it won't matter that we run the 
validation check for each batch. That sounds sub-optimal, but the cost would 
be negligible.
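A minimal sketch of the "check only once when the provider instance is created" alternative next to the per-batch variant, with a rough timing harness. `ValidatingLoader`, `check_rows`, `time_batches`, and the synthetic rows are hypothetical; the printed timings depend on the machine and on how expensive the real check is, so they say nothing about Spark's actual overhead.

```python
import time
from typing import Callable, List, Tuple

Row = Tuple[bytes, bytes]  # hypothetical (key, value) state row


class ValidatingLoader:
    """Runs the check on the first load only, or on every load if every_batch=True."""

    def __init__(self, check: Callable[[List[Row]], None], every_batch: bool = False) -> None:
        self._check = check
        self._every_batch = every_batch
        self._validated = False
        self.checks_run = 0  # how many times the check actually executed

    def load_state(self, rows: List[Row]) -> List[Row]:
        if self._every_batch or not self._validated:
            self._check(rows)
            self._validated = True
            self.checks_run += 1
        return rows


def check_rows(rows: List[Row]) -> None:
    # Hypothetical format check standing in for the real validation.
    for key, value in rows:
        if not key or not value:
            raise ValueError("incompatible state row")


def time_batches(loader: ValidatingLoader, rows: List[Row], batches: int) -> float:
    # Wall-clock time for loading the same state once per simulated batch.
    start = time.perf_counter()
    for _ in range(batches):
        loader.load_state(rows)
    return time.perf_counter() - start


if __name__ == "__main__":
    rows = [(b"key%d" % i, b"value") for i in range(10_000)]
    once = ValidatingLoader(check_rows)
    every = ValidatingLoader(check_rows, every_batch=True)
    print(f"once per instance: {time_batches(once, rows, 100):.4f}s, checks={once.checks_run}")
    print(f"every batch:       {time_batches(every, rows, 100):.4f}s, checks={every.checks_run}")
```

The trade-off is that the once-per-instance variant only inspects the first load; if the per-batch cost measured this way is small, the simpler per-batch check may be acceptable.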


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


