[ https://issues.apache.org/jira/browse/KYLIN-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832725#comment-17832725 ]
ASF subversion and git services commented on KYLIN-5745:
--------------------------------------------------------

Commit 180ff07afeaf64da8e565b7aa5882c3a89f1c268 in kylin's branch refs/heads/kylin5 from fengguangyuan
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=180ff07afe ]

KYLIN-5745 Using a global thread pool to clean underlying storages

1. Using a global thread pool to clean underlying storages;
2. Launching cleaning tasks in the local thread and to ignore FileNotFoundException while collecting HDFS files.

Co-authored-by: Guangyuan Feng <guangyuan.f...@kyligence.io>


> The historical garbage cleanup task was not completed, causing the subsequent scheduled garbage cleanup task cannot be executed normally
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5745
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5745
>             Project: Kylin
>          Issue Type: Bug
>   Affects Versions: 5.0-beta
>            Reporter: zhong.zhu
>            Assignee: zhong.zhu
>            Priority: Major
>             Fix For: 5.0.0
>
>
> {*}Problem description{*}:
> The scheduled garbage cleanup cannot complete successfully.
> {*}Background{*}:
> The customer found a large number of small Kylin files occupying HDFS storage that needed to be cleaned up. On checking the customer's environment, we found that the scheduled garbage cleanup had never completed properly and kept timing out.
> *Troubleshooting:*
> The check showed that the customer's garbage cleanup was first triggered on the morning of April 6, after Kylin was restarted on the night of April 5. Once that cleanup was triggered, the query history cleanup thread kept deleting records from then on.
> As a result, subsequent periodic garbage cleanup tasks cannot complete.
> Query histories are deleted 2,000 rows at a time, and one of the customer's projects has 550,000 query histories to delete. According to kylin.log, table locking makes these deletes slow; a single delete operation can take more than 20 minutes.
> The following log shows the main garbage cleanup thread waiting for the query history cleanup to complete. The cleanup does not finish, so the main thread times out and exits.
> {code:shell}
> 2023-04-06T00:00:00,015 INFO [RoutineOpsWorker-287] service.ScheduleService : execute task MetadataBackup with remaining time: 14399995 ms
> 2023-04-06T00:01:52,649 INFO [RoutineOpsWorker-287] service.ScheduleService : execute task QueryHistoriesCleanup with remaining time: 14287361 ms
> ...
> 2023-04-06T04:00:00,012 WARN [DefaultTaskScheduler-3] service.ScheduleService : Routine task execution timeout
> java.util.concurrent.TimeoutException: null
>     at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
>     at org.apache.kylin.rest.service.ScheduleService.executeTask(ScheduleService.java:107) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
>     at org.apache.kylin.rest.service.ScheduleService.routineTask(ScheduleService.java:77) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
>     at org.apache.kylin.rest.service.ScheduleService$$FastClassBySpringCGLIB$$afbfc46c.invoke(<generated>) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
> {code}
> The following log runs up to the latest time available in the provided logs. After 09:00 the query history cleanup was still deleting records; it did not stop when the main thread terminated.
> {code:shell}
> 2023-04-06T00:08:43,015 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.selectByProject : <== Total: 12
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CPIC_FRP> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXAIMA> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCDC> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCRMS> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,017 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Start to delete query histories that are beyond max size for project<CXCZH>, records:1551669
> ...
> 2023-04-06T09:03:54,974 INFO [QueryHistoryCleanWorker-23145] query.JdbcQueryHistoryStore : Delete 2000 row query history for project [CXCZH] takes 938060 ms
> 2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Preparing: delete from ke4_instance_query_history_realization where query_time < ? and project_name = ?
> 2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Parameters: 1678863450091(Long), CXCZH(String)
> {code}
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
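
For readers following the logs above: the stack trace shows the scheduler timing out in FutureTask.get, while the QueryHistoryCleanWorker thread keeps deleting for hours afterwards. This matches standard Java behavior, where a timed Future.get that throws TimeoutException does not stop the task it was waiting on. The sketch below illustrates that general point and one possible way to stop the worker (cancelling the future and letting the batch loop react to the interrupt). It is not Kylin's code and not the committed fix; RoutineTimeoutSketch, deleteInBatches, runWithDeadline and Outcome are hypothetical names, and the sleep merely stands in for one slow delete statement.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class RoutineTimeoutSketch {

    /** Hypothetical result holder for the demo. */
    static final class Outcome {
        final boolean finishedInTime;
        final boolean workerStopped;
        final int batchesDeleted;

        Outcome(boolean finishedInTime, boolean workerStopped, int batchesDeleted) {
            this.finishedInTime = finishedInTime;
            this.workerStopped = workerStopped;
            this.batchesDeleted = batchesDeleted;
        }
    }

    /** Hypothetical stand-in for a batched cleanup; the sleep simulates one slow delete. */
    static void deleteInBatches(int totalBatches, long millisPerBatch, AtomicInteger deleted) {
        for (int i = 0; i < totalBatches; i++) {
            try {
                Thread.sleep(millisPerBatch);
            } catch (InterruptedException e) {
                // The waiting side gave up and cancelled us: restore the flag and stop.
                Thread.currentThread().interrupt();
                return;
            }
            deleted.incrementAndGet();
        }
    }

    /** Runs the cleanup with a deadline and cancels the worker if the deadline passes. */
    static Outcome runWithDeadline(int totalBatches, long millisPerBatch, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger deleted = new AtomicInteger();
        Future<?> future = pool.submit(() -> deleteInBatches(totalBatches, millisPerBatch, deleted));
        boolean finishedInTime = true;
        try {
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Without this call the worker would keep running after the timeout.
                future.cancel(true);
                finishedInTime = false;
            }
            pool.shutdown();
            boolean workerStopped = pool.awaitTermination(2, TimeUnit.SECONDS);
            return new Outcome(finishedInTime, workerStopped, deleted.get());
        } catch (InterruptedException | ExecutionException e) {
            pool.shutdownNow();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 1000 batches at 50 ms each cannot finish within a 200 ms deadline.
        Outcome o = runWithDeadline(1000, 50, 200);
        System.out.println("finishedInTime=" + o.finishedInTime
                + ", workerStopped=" + o.workerStopped
                + ", batchesDeleted=" + o.batchesDeleted);
    }
}
```

Cancelling with an interrupt only helps if the work between batches is interruptible. A blocking JDBC call may not respond to it, so a real cleanup might also need a per-statement timeout or a check between batches.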