HeartSaVioR commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-557393283

> The differences between the examples you mention (streaming query vs. long batch job) can be worked around in the code. e.g., for the long batch case, you can decide not to compact because there are not enough finished jobs after parsing an event log. But let's say you parse 2 or 3 of them and then you start seeing jobs going away, you can do some compaction. You could keep some state to help with figuring that out.

Nice approach. That reminds me to look back at what compaction does: regardless of the characteristics of the query, compaction helps whenever jobs are ending. I may also need to track the "ended" jobs to see the ratio of running vs. ended jobs. (Comparing the number of tasks instead of the number of jobs would make the estimate of space savings more accurate.) I'll apply this change.
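
To make the idea concrete, here is a rough sketch of the kind of state that could be kept while parsing event log files. It is only an illustration, not the code in the PR: the class name `CompactionEstimator`, its callbacks, and the `min_reclaimable_ratio` threshold are all hypothetical. It counts tasks per live job, moves them to a "finished" total when the job ends, and suggests compaction once tasks of ended jobs make up a large enough share of everything seen.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionEstimator:
    """Hypothetical helper for illustration; not part of Spark's API."""

    # Assumed threshold: compact once this share of tasks belongs to ended jobs.
    min_reclaimable_ratio: float = 0.3
    # Job id -> number of task-end events seen so far for a still-running job.
    live_tasks: dict = field(default_factory=dict)
    # Total number of tasks that belong to jobs which have already ended.
    finished_tasks: int = 0

    def on_job_start(self, job_id):
        self.live_tasks.setdefault(job_id, 0)

    def on_task_end(self, job_id):
        self.live_tasks[job_id] = self.live_tasks.get(job_id, 0) + 1

    def on_job_end(self, job_id):
        # Events of an ended job are the ones compaction could drop.
        self.finished_tasks += self.live_tasks.pop(job_id, 0)

    def reclaimable_ratio(self):
        total = self.finished_tasks + sum(self.live_tasks.values())
        return self.finished_tasks / total if total else 0.0

    def should_compact(self):
        return self.reclaimable_ratio() >= self.min_reclaimable_ratio
```

With this shape, a long batch job with no ended jobs keeps the ratio at zero and compaction is skipped, while a streaming query that keeps finishing small jobs crosses the threshold after a few files. Counting tasks rather than jobs weights each ended job by how many events it contributed, which is the point made in the parenthetical above.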