[ https://issues.apache.org/jira/browse/SPARK-37640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
muhong updated SPARK-37640:
---------------------------

Description:

When "spark.eventLog.rolling.enabled=true" is set, the event log is rolled and compacted (and compressed when "spark.eventLog.compression.codec" is also set). The resulting directory tree looks like this:

root dir:
/spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1

files in dir:
/spark2xJobHistory2x/eventlog_v2_application_xxxx_xxx_1/events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd
/spark2xJobHistory2x/eventlog_v2_application_xxxx_xxx_1/events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd
/spark2xJobHistory2x/eventlog_v2_application_xxxx_xxx_1/events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd
......

For a long-running Spark application, the history server never cleans the "events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd" files under /spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1, so the directory keeps growing for the whole lifetime of the application. We should therefore provide a mechanism for users to clean up the "events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd" files in that directory.

Our solution: add a cleanup function to https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#checkForLogs. This function lists the files under "/spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1" and deletes the "events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd" files according to the config "spark.history.fs.cleaner.maxAge". A configuration example and a rough sketch of the idea follow below.
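For reference, a minimal configuration that produces the layout above could look like the following spark-defaults.conf snippet. The concrete values (codec, file size, retention count, 7d max age) and the event log directory are illustrative assumptions, not taken from this report:

# application side: roll and compress the event log
spark.eventLog.enabled                                true
spark.eventLog.dir                                    hdfs:///spark2xJobHistory2x
spark.eventLog.rolling.enabled                        true
spark.eventLog.rolling.maxFileSize                    128m
spark.eventLog.compression.codec                      zstd

# history server side: compaction and cleaner
spark.history.fs.eventLog.rolling.maxFilesToRetain    1
spark.history.fs.cleaner.enabled                      true
spark.history.fs.cleaner.maxAge                       7d

Today "spark.history.fs.cleaner.maxAge" is applied to whole application logs; the proposal here is to apply the same age limit to the individual rolled files inside a still-running application's event log directory as well.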
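A minimal sketch of the proposed cleanup, written as standalone code rather than as a patch to FsHistoryProvider: the idea is that checkForLogs would call something like this for every rolled event log directory. This is not existing Spark code; the object and method names, the "events_" / ".compact" filename checks, and the use of file modification time as the age criterion are assumptions made for illustration.

import java.util.concurrent.TimeUnit

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object RolledEventLogCleaner {

  /**
   * Delete rolled event files in `eventLogDir` that are older than `maxAgeMs`
   * (the value of "spark.history.fs.cleaner.maxAge"). Marker and compacted
   * files are kept so the history server can still replay the application.
   */
  def cleanCompactedEventLogFiles(
      fs: FileSystem,
      eventLogDir: Path,
      maxAgeMs: Long,
      nowMs: Long = System.currentTimeMillis()): Unit = {
    val threshold = nowMs - maxAgeMs

    val candidates = fs.listStatus(eventLogDir).filter { status =>
      val name = status.getPath.getName
      status.isFile &&
        name.startsWith("events_") &&   // rolled event files only; skips the appstatus_ marker
        !name.endsWith(".compact") &&   // keep the compacted file
        status.getModificationTime < threshold
    }

    // Best effort: a file that fails to delete is retried on the next pass.
    candidates.foreach(status => fs.delete(status.getPath, false))
  }

  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val dir = new Path("/spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1")
    cleanCompactedEventLogFiles(fs, dir, TimeUnit.DAYS.toMillis(7))
  }
}

Whether it is always safe to drop a rolled file that has not yet been compacted or replayed is a design question for this ticket; the sketch only illustrates the age-based selection.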
> > a "long run" spark application, the history server will not clean the > 'events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd' file in > /spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1, so the size > of directory will be bigger and bigger during the whole lifetime of app. > so i think we should provide a mechanism for user to clean the > “events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd” file in > /spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1 directory > > our solution:add a clean function in > “https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#checkForLogs”,this > function will list the file in > “/spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1” and clean > the “events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd” file according to the > config "spark.history.fs.cleaner.maxAge" -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org