[ 
https://issues.apache.org/jira/browse/SPARK-37640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

muhong updated SPARK-37640:
---------------------------
    Description: 
When "spark.eventLog.rolling.enabled" is set to true, the event log is rolled 
and compacted (when "spark.eventLog.compression.codec" is also set), and the 
directory tree looks like this:

root dir: /spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1

files in dir:

/spark2xJobHistory2x/eventlog_v2_application_xxxx_xxx_1/events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd
/spark2xJobHistory2x/eventlog_v2_application_xxxx_xxx_1/events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd
/spark2xJobHistory2x/eventlog_v2_application_xxxx_xxx_1/events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd

......

For a long-running Spark application, the history server never cleans the 
'events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd' files in 
/spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1, so the directory 
keeps growing for the whole lifetime of the application.

We should therefore provide a mechanism for users to clean the 
"events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd" files in the 
/spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1 directory.

 

Our solution: add a cleanup step to 
"https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#checkForLogs". 
This step lists the files in 
"/spark2xJobHistory2x/eventlog_v2_application_xxxxxxxxxxx_xxx_1" and removes the 
"events_xxxx_application_xxxxxxxxxxx_xxxx_1.zstd" files according to the config 
"spark.history.fs.cleaner.maxAge".
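
A rough sketch of the idea (the helper name and wiring are illustrative 
assumptions, not existing Spark code), using the Hadoop FileSystem API that 
FsHistoryProvider already works with:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, e.g. called from checkForLogs(). A real
// implementation must additionally ensure it only deletes rolled
// "events_*" parts that have already been folded into a compact file,
// otherwise history would be lost.
private def cleanRolledEventLogs(fs: FileSystem, logDir: Path, maxAgeMs: Long): Unit = {
  val now = System.currentTimeMillis()
  fs.listStatus(logDir)
    .filter(_.isFile)
    // only the rolled event-log parts, e.g. events_xxxx_application_..._1.zstd
    .filter(_.getPath.getName.startsWith("events_"))
    // older than spark.history.fs.cleaner.maxAge
    .filter(s => now - s.getModificationTime > maxAgeMs)
    .foreach(s => fs.delete(s.getPath, false))
}
```

The maxAgeMs value would come from the existing 
"spark.history.fs.cleaner.maxAge" config, so no new configuration is needed.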


> rolled event log still need be clean after compact
> --------------------------------------------------
>
>                 Key: SPARK-37640
>                 URL: https://issues.apache.org/jira/browse/SPARK-37640
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.1
>            Reporter: muhong
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
