HeartSaVioR edited a comment on issue #25577: [WIP][CORE][SPARK-28867] 
InMemoryStore checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-532245797
 
 
   @Ngone51 
   You may want to take another look at my design doc for SPARK-28594. 
Technically, the later part of SPARK-28594 and SPARK-28867 do the same thing. 
   
   On the SHS side, SPARK-28594 will initialize 
AppStatusListener/SQLAppStatusListener with a KVStore restored from a 
snapshot file, and then apply event log files by replaying them. 
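
   To illustrate the restore-then-replay idea, here is a minimal sketch. It is 
not the actual Spark code: the dict-based "store", the `next_offset` field and 
the function name are all hypothetical stand-ins for KVStore, the snapshot 
file and the listeners. 

```python
def restore_and_replay(snapshot, events):
    """Rebuild state from a snapshot, then replay only the events after it.

    snapshot: dict with "state" (dict) and "next_offset" (int), or None.
    events:   list of (key, value) updates in log order.
    Returns the rebuilt state and the number of events that were replayed.
    """
    state = dict(snapshot["state"]) if snapshot else {}
    start = snapshot["next_offset"] if snapshot else 0
    for key, value in events[start:]:
        state[key] = value  # stand-in for a listener handling one event
    return state, len(events) - start


events = [
    ("job1", "RUNNING"),
    ("job2", "RUNNING"),
    ("job1", "SUCCEEDED"),
    ("job2", "FAILED"),
]

# Full replay from the beginning of the log.
full_state, replayed_full = restore_and_replay(None, events)

# Replay starting from a snapshot that already covers the first two events.
snap = {"state": {"job1": "RUNNING", "job2": "RUNNING"}, "next_offset": 2}
fast_state, replayed_fast = restore_and_replay(snap, events)
```

   Both calls end with the same state; the second one just replays fewer 
events, which is the speed-up both issues are after. 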
   
   Where's the difference? The only difference between the two is "when to 
snapshot". In SPARK-28594, I intend to create a snapshot file that combines a 
specific number of event log files so far into one file. This follows how 
HDFSBackedStateStoreProvider is maintained: a snapshot is taken at a specific 
interval of batches, and it effectively replaces the delta files created up to 
that batch. In SPARK-28867, a snapshot (checkpoint) is created at a specific 
line, because there is a single event log file. 
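
   The two "when to snapshot" policies can be sketched side by side. This is 
only an illustration: the function names, the file names and the interval 
values are assumptions, not configuration or code from either patch. 

```python
def compact_by_files(files, files_per_snapshot):
    """File-count policy: a snapshot covers the longest prefix of event log
    files whose length is a multiple of files_per_snapshot. The files after
    that prefix still have to be replayed on top of the snapshot."""
    covered = (len(files) // files_per_snapshot) * files_per_snapshot
    return files[:covered], files[covered:]


def checkpoint_lines(total_lines, lines_per_checkpoint):
    """Line-count policy: with a single event log file, checkpoints are
    taken at every lines_per_checkpoint-th line."""
    return list(range(lines_per_checkpoint, total_lines + 1, lines_per_checkpoint))


files = ["events_1", "events_2", "events_3", "events_4", "events_5"]
covered, remaining = compact_by_files(files, files_per_snapshot=2)
lines = checkpoint_lines(total_lines=10, lines_per_checkpoint=4)
```

   In both cases only the part of the log after the latest snapshot needs to 
be replayed. 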
   
   There's no big difference. Actually, my next plan was to apply this to a 
single log file as well, and to my surprise you raised this patch, so I wanted 
to share the load between us.
   
   I think the common issues between the two are: 1) snapshot/restore of 
KVStore, and 2) making sure AppStatusListener/SQLAppStatusListener sync their 
full state (including live entities and metrics) with the underlying KVStore. 
For 1) I've already submitted a patch (#25811). I had been planning to do 2) 
as my next task, but it looks like you've already investigated the details of 
2) (I'd just be starting from scratch, so it would be really great if you took 
up this work), so I'm happy to wait for your work and participate in reviewing. 
Right now my major work is to finish SPARK-28594, so I might ping you about 
your work on 2) if you haven't had time to finish it in a couple of weeks.
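
   To show why 2) matters, here is a simplified sketch of the flush side only. 
It is not the real listener code: the `Listener` class, its `live` dict and 
`take_snapshot` are hypothetical, and a plain dict stands in for the KVStore. 

```python
class Listener:
    """Toy listener that buffers live entities in memory before writing
    them to the backing store."""

    def __init__(self, store):
        self.store = store  # stand-in for the KVStore
        self.live = {}      # live entities not yet written to the store

    def on_event(self, key, value):
        self.live[key] = value

    def flush(self):
        # Write all live entities to the store so the store is complete.
        self.store.update(self.live)


def take_snapshot(listener, flush_first):
    """Copy the store contents, optionally flushing live entities first."""
    if flush_first:
        listener.flush()
    return dict(listener.store)


store = {}
listener = Listener(store)
listener.on_event("stage0", "ACTIVE")

stale = take_snapshot(listener, flush_first=False)    # misses the live entity
complete = take_snapshot(listener, flush_first=True)  # includes it
```

   A snapshot taken without the flush loses whatever only exists in the 
listener's memory, so a restore from it would start from incomplete state. 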
   
   Does it make sense?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
