[ https://issues.apache.org/jira/browse/KYLIN-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835418#comment-17835418 ]
pengfei.zhan commented on KYLIN-5789:
-------------------------------------

h1. Design

!KYLIN_5789.png!

h2. Root Storage Path: Default Configuration

{code:java}
kylin.engine.spark-conf.spark.history.fs.logDirectory=${kylin.env.hdfs-working-dir}/spark-history
kylin.engine.spark-conf.spark.eventLog.dir=${kylin.env.hdfs-working-dir}/spark-history
kylin.storage.columnar.spark-conf.spark.eventLog.dir=${kylin.env.hdfs-working-dir}/sparder-history
kylin.storage.columnar.spark-conf.spark.eventLog.rolling.enabled=true
kylin.storage.columnar.spark-conf.spark.eventLog.rolling.maxFileSize=100m
{code}

sparder: ${kylin.storage.columnar.spark-conf.spark.eventLog.dir}/hostname_port/
build: \{kylin.engine.spark-conf.spark.eventLog.dir}

The Spark history directory of build jobs supports project-level configuration.

h2. Storage Format

*Sparder:*

Related default parameter: kylin.storage.columnar.spark-conf.spark.eventLog.rolling.enabled=true

Sparder enables rolling event logs by default, so a directory is created for each Spark application to store its event logs. The directory name has the format eventlog_v2_\{appId}, and it holds the event logs of the corresponding application. The event log file name format is events_\{file_index}_\{appid}_\{timestamp}. While Sparder is still running, the directory contains an empty file named appstatus_\{appId}.inprogress. When Sparder finishes normally, the .inprogress suffix is removed.

*Job:*

The Spark event log of each build task is saved in a single file. The .inprogress suffix indicates that the event log is not yet complete.

h2. Cleanup Strategy

* Build task cleanup time threshold: kylin.garbage.storage.executable-survival-time-threshold, default 30d
* Query history cleanup time threshold: kylin.query.queryhistory.survival-time-threshold, default 30d

h2. Scheduler task

For query event logs, each job node and each all node performs the cleanup task regularly. The global node broadcasts the request to clean up the Sparder event logs to all query nodes (http://ip:port/kylin/api/system/clean_sparder_eventslogs). Each KE node cleans up only the Sparder event files under the directory for its own startup port, which is ${kylin.storage.columnar.spark-conf.spark.eventLog.dir}/${hostname_port}. If a folder in this directory starts with eventlog_v2, all files in that folder are deleted when lastmodifytime < min(configured time threshold, the end time of the first queryhistory).

For build event logs, the task iterates through the project-level configuration of all projects to find each \{kylin.engine.spark-conf.spark.eventLog.dir}/spark-history directory. Files in this directory whose names start with application_ are deleted when lastmodifytime < min(configured time threshold, the end time of the earliest build).

h2. FastRoutineTool

For query event logs, since the command-line tool is directly related to the port on which KE starts, it cleans up the ${kylin.storage.columnar.spark-conf.spark.eventLog.dir} directory: all files in folders whose names start with hostname_port are deleted when lastmodifytime < min(configured time threshold, the end time of the first queryhistory).

For build event logs, the tool iterates through the project-level configuration of all projects to find each \{kylin.engine.spark-conf.spark.eventLog.dir}/spark-history directory. Files in this directory whose names start with application_ are deleted when lastmodifytime < min(configured time threshold, the end time of the earliest build).

h2. RoutineTool

For query event logs, since the command-line tool is not directly related to the port on which KE starts, it cleans up the ${kylin.storage.columnar.spark-conf.spark.eventLog.dir} directory: all files in folders whose names start with hostname_port are deleted when lastmodifytime < min(configured time threshold, the end time of the first queryhistory).

For build event logs, the tool iterates through the project-level configuration of all projects to find each \{kylin.engine.spark-conf.spark.eventLog.dir}/spark-history directory. Files in this directory whose names start with application_ are deleted when lastmodifytime < min(configured time threshold, the end time of the earliest build).

> Clean sparder history and spark history automatically
> -----------------------------------------------------
>
> Key: KYLIN-5789
> URL: https://issues.apache.org/jira/browse/KYLIN-5789
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine, Query Engine
> Affects Versions: 5.0-beta
> Reporter: pengfei.zhan
> Assignee: pengfei.zhan
> Priority: Major
> Fix For: 5.0-beta
>
> Attachments: KYLIN_5789.png
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
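
The deletion condition that recurs in the comment above, lastmodifytime < min(configured time threshold, end time of the earliest history record), can be illustrated with a small plain-Java sketch. It is only a rough illustration under stated assumptions: EventLogCleanupSketch, LogEntry, cutoff and selectExpired are hypothetical names that do not come from Kylin, the "configured time threshold" is assumed to mean the current time minus the survival threshold (30d by default), and the entries are held in memory instead of being listed from HDFS.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are part of Kylin or Hadoop.
final class EventLogCleanupSketch {

    /** In-memory stand-in for a file or folder found under the event log directory. */
    static final class LogEntry {
        final String name;
        final Instant lastModified;

        LogEntry(String name, Instant lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    /**
     * Assumed reading of min(configured time threshold, end time of the earliest history):
     * the earlier of (now - survival threshold) and the earliest history end time.
     */
    static Instant cutoff(Instant now, Duration survivalThreshold, Instant earliestHistoryEnd) {
        Instant byThreshold = now.minus(survivalThreshold);
        return byThreshold.isBefore(earliestHistoryEnd) ? byThreshold : earliestHistoryEnd;
    }

    /** Returns the names that match the prefix and were last modified before the cutoff. */
    static List<String> selectExpired(List<LogEntry> entries, String prefix, Instant cutoff) {
        List<String> expired = new ArrayList<>();
        for (LogEntry entry : entries) {
            if (entry.name.startsWith(prefix) && entry.lastModified.isBefore(cutoff)) {
                expired.add(entry.name);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-04-10T00:00:00Z");
        Instant earliestQueryHistoryEnd = Instant.parse("2024-02-01T00:00:00Z");
        Instant cutoff = cutoff(now, Duration.ofDays(30), earliestQueryHistoryEnd);

        List<LogEntry> sparderFolders = Arrays.asList(
                new LogEntry("eventlog_v2_app-1", Instant.parse("2024-01-15T00:00:00Z")),
                new LogEntry("eventlog_v2_app-2", Instant.parse("2024-03-01T00:00:00Z")),
                new LogEntry("unrelated", Instant.parse("2024-01-01T00:00:00Z")));

        // Query event logs use the eventlog_v2 prefix; build event logs would use application_.
        System.out.println(selectExpired(sparderFolders, "eventlog_v2", cutoff));
    }
}
```

Taking the minimum keeps any event log that is newer than the oldest retained history record, even when that record is older than the survival threshold. The same selection applies to build event logs by passing the application_ prefix and the end time of the earliest build.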