HeartSaVioR edited a comment on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-532245797

@Ngone51 You may want to refresh your understanding of my design doc for SPARK-28594.

Technically, the later part of SPARK-28594 and SPARK-28867 do the same thing. On the SHS side, SPARK-28594 will initialize AppStatusListener/SQLAppStatusListener with a KVStore restored from a snapshot file, and then apply the remaining event log files by replaying them. Where's the difference?

The only difference between the two is when to snapshot. In SPARK-28594, I intend to create a snapshot file that combines a specific number of event log files into one file. This follows the maintenance approach of HDFSBackedStateStoreProvider: a snapshot is taken at a specific interval of batches, and it effectively replaces the delta files created up to that batch. In SPARK-28867, a snapshot (checkpoint) is created at a specific line, because there is only a single event log file. There's no big difference. Actually, my next plan was to apply this to the single log file case as well, and to my surprise you raised this patch, so I wanted to share the load between us.

I think the issues common to both are:
1) snapshot/restore of the KVStore
2) making sure AppStatusListener/SQLAppStatusListener sync their full state (including live entities and metrics) with the underlying KVStore.

For 1) I've already submitted a patch (#25811). I had been planning to do 2) as my next task, but it looks like you've already investigated the details of 2) (I'm just starting from scratch, so it would be really great if you took up this work), so I'm happy to wait for your work and participate in reviewing. Right now my major work is to finish SPARK-28594, so I might ping you about your work on 2) if you haven't had time to finish it within a couple of weeks. Does that make sense?
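To make the shared idea concrete, here is a minimal toy sketch in Python of "restore from snapshot, then replay the rest". It is not Spark code: `ToyListener`, `take_snapshot`, `restore`, the dict standing in for the KVStore, and the event format are all hypothetical names invented for illustration. It only shows why the listener has to flush its live state into the store before a snapshot is taken, and that restoring plus replaying the tail should give the same result as a full replay.

```python
import copy
import json


class ToyListener:
    """Hypothetical stand-in for AppStatusListener (not the Spark class)."""

    def __init__(self, store=None):
        # A plain dict stands in for the KVStore (assumption for illustration).
        self.store = store if store is not None else {}
        # "Live" entities that have not been written to the store yet.
        self.live = {}

    def on_event(self, event):
        job = event["job"]
        # Start from the live value, falling back to what the store already holds.
        current = self.live.get(job, self.store.get(job, 0))
        self.live[job] = current + event["tasks"]

    def flush(self):
        # Sync the full live state into the store so a snapshot misses nothing.
        self.store.update(self.live)
        self.live.clear()


def replay(lines, listener, start=0):
    """Apply event log lines from `start` onward, then flush live state."""
    for line in lines[start:]:
        listener.on_event(json.loads(line))
    listener.flush()
    return listener.store


def take_snapshot(lines, upto):
    """Replay the first `upto` lines and record the store with its line offset."""
    store = replay(lines[:upto], ToyListener())
    return {"offset": upto, "store": copy.deepcopy(store)}


def restore(lines, snapshot):
    """Rebuild state from a snapshot, replaying only lines after its offset."""
    listener = ToyListener(copy.deepcopy(snapshot["store"]))
    return replay(lines, listener, start=snapshot["offset"])


events = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
event_log = [json.dumps({"job": j, "tasks": t}) for j, t in events]

full = replay(event_log, ToyListener())   # replay everything from scratch
snap = take_snapshot(event_log, upto=3)   # checkpoint at line 3
restored = restore(event_log, snap)       # snapshot + replay of the tail
```

In this toy model the only thing that changes between the two proposals is how `upto` is chosen: at an event log file boundary in the multi-file case, or at a line offset within a single file.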