HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639873447


   Sorry, my comment was edited, so you may have missed the content, but it is 
also a kind of "pinpoint" concern: do you think your approach works with other 
state store providers as well? The root cause isn't bound to the implementation 
of the state store provider, but this patch only addresses the HDFS state store 
provider.
   
   I guess you're trying to find a way to do it less frequently, i.e. only the 
first time the state is loaded from the file, which is optimal. While I think 
that can even be done without binding to the state store provider 
implementation if we really need it, have we measured the actual overhead? If 
the overhead turns out to be trivial, then it won't matter if we run the 
validation check for each batch. That sounds sub-optimal, but the cost would be 
negligible.
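
   To make the measurement question concrete, here is a rough, provider-agnostic 
sketch of how the two strategies could be compared. It is not Spark code: 
`validate_row`, `run_batches`, and `timed` are hypothetical names, and the check 
itself (comparing a field count) is only a stand-in for whatever validation the 
patch performs.

```python
import time


def validate_row(row, expected_num_fields):
    """Hypothetical stand-in for the validation check: verify the field count."""
    if len(row) != expected_num_fields:
        raise ValueError(
            f"expected {expected_num_fields} fields, got {len(row)}"
        )


def run_batches(num_batches, rows, expected_num_fields, validate_every_batch):
    """Simulate micro-batches and return how many validations were performed.

    If validate_every_batch is False, the check runs only for the first batch,
    which models "validate once, when the state is first loaded".
    """
    validations = 0
    for batch_id in range(num_batches):
        if validate_every_batch or batch_id == 0:
            validate_row(rows[0], expected_num_fields)
            validations += 1
        # Real batch processing would happen here; it is omitted so that the
        # timing below reflects only the cost of the check and the loop.
    return validations


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

   Calling `timed(run_batches, n, rows, num_fields, True)` and the same with 
`False` gives the two elapsed times to compare. The difference divided by `n` 
approximates the per-batch cost of the check in this toy setup, which would then 
have to be weighed against the real per-batch processing time.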


