HeartSaVioR commented on issue #26416: [SPARK-29779][CORE] Compact old event 
log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-557393283
 
 
   > The differences between the examples you mention (streaming query vs. long 
batch job) can be worked around in the code. e.g., for the long batch case, you 
can decide not to compact because there are not enough finished jobs after 
parsing an event log. But let's say you parse 2 or 3 of them and then you start 
seeing jobs going away, you can do some compaction. You could keep some state 
to help with figuring that out.
   
   Nice approach. That reminds me to revisit what compaction actually does: 
regardless of the characteristics of the query, compaction helps whenever jobs 
are finishing. I may also need to track the "ended" jobs to see the ratio of 
running to ended jobs. (Comparing the number of tasks instead of the number of 
jobs would make the estimate of the space savings more accurate.) I'll apply 
this change.
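
   A rough sketch of the idea, only to make the discussion concrete. It is 
written in Python for brevity and is not the code in this PR: the class name, 
the method names, and the 0.5 threshold are all hypothetical. It keeps state 
across parsed event log files and weighs ended work against live work by task 
count rather than job count.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CompactionEstimator:
    """Hypothetical state carried across parsed event log files."""

    # Assumed threshold: compact once at least half of the seen tasks
    # belong to jobs that have already ended. Not a value from the PR.
    min_ended_ratio: float = 0.5
    # job id -> number of tasks of jobs that are still running
    live_tasks: Dict[int, int] = field(default_factory=dict)
    # total number of tasks belonging to jobs that have ended
    ended_tasks: int = 0

    def on_job_start(self, job_id: int, num_tasks: int) -> None:
        self.live_tasks[job_id] = num_tasks

    def on_job_end(self, job_id: int) -> None:
        # Events of an ended job are what compaction could drop.
        # An end event for an unknown job is ignored.
        self.ended_tasks += self.live_tasks.pop(job_id, 0)

    def ended_ratio(self) -> float:
        total = self.ended_tasks + sum(self.live_tasks.values())
        return self.ended_tasks / total if total else 0.0

    def should_compact(self) -> bool:
        return self.ended_ratio() >= self.min_ended_ratio
```

   With this shape, a long batch job that has not finished keeps the ratio at 
zero, so no compaction is attempted, while a streaming query that keeps 
finishing jobs pushes the ratio up after a few parsed files.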
