zhong.zhu created KYLIN-5745:
--------------------------------
Summary: An unfinished historical garbage cleanup task prevents
subsequent scheduled garbage cleanup tasks from running normally
Key: KYLIN-5745
URL: https://issues.apache.org/jira/browse/KYLIN-5745
Project: Kylin
Issue Type: Bug
Affects Versions: 5.0-beta
Reporter: zhong.zhu
Assignee: zhong.zhu
Fix For: 5.0.0
{*}Problem description{*}:
The scheduled garbage cleanup operation cannot complete successfully.
{*}Background{*}:
The customer found that Kylin had left a large number of small files occupying
HDFS storage that needed to be cleaned up. On checking the customer's
environment, we found that the scheduled garbage cleanup had never completed
properly and kept timing out.
*Troubleshooting:*
Investigation showed that garbage cleanup was first triggered in the early
morning of April 6, after KE had been restarted on the night of April 5. Once
that cleanup was triggered, the query history cleanup thread kept deleting
records and never finished. As a result, subsequent periodic garbage cleanup
tasks could not complete.
Query history is deleted 2,000 rows at a time, and one of the customer's
projects had 550,000 query history records to delete. According to kylin.log,
the deletes were slow because of table locking, and a single delete operation
could take more than 20 minutes.
The following log shows the main garbage cleanup thread waiting for the query
history cleanup to complete. Because the cleanup did not finish, the main
thread timed out and exited.
{code:shell}
2023-04-06T00:00:00,015 INFO [RoutineOpsWorker-287] service.ScheduleService : execute task MetadataBackup with remaining time: 14399995 ms
2023-04-06T00:01:52,649 INFO [RoutineOpsWorker-287] service.ScheduleService : execute task QueryHistoriesCleanup with remaining time: 14287361 ms
...
2023-04-06T04:00:00,012 WARN [DefaultTaskScheduler-3] service.ScheduleService : Routine task execution timeout
java.util.concurrent.TimeoutException: null
    at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
    at org.apache.kylin.rest.service.ScheduleService.executeTask(ScheduleService.java:107) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
    at org.apache.kylin.rest.service.ScheduleService.routineTask(ScheduleService.java:77) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
    at org.apache.kylin.rest.service.ScheduleService$$FastClassBySpringCGLIB$$afbfc46c.invoke(<generated>) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
{code}
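The TimeoutException in the trace above is raised by FutureTask.get, which only stops the caller from waiting; it does not stop the task being waited on. The following is a minimal, self-contained sketch of that behaviour, not Kylin's actual ScheduleService code. The class name RoutineTimeoutSketch, the method workerOutlivesCaller, and the sleep durations are all hypothetical and chosen only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; it only imitates a caller that waits on a slow worker.
final class RoutineTimeoutSketch {

    /**
     * Submits a slow task, waits for it with a short deadline, and reports
     * whether the worker is still running shortly after the caller gave up.
     */
    static boolean workerOutlivesCaller(boolean cancelOnTimeout) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch stopped = new CountDownLatch(1);
        try {
            Future<?> future = pool.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(2_000); // stand-in for a slow cleanup job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // cancelled by the caller
                } finally {
                    stopped.countDown();
                }
            });
            started.await(); // make sure the worker is really running
            try {
                future.get(100, TimeUnit.MILLISECONDS); // caller-side deadline
            } catch (TimeoutException e) {
                // The caller stops waiting here; the worker is unaffected
                // unless it is cancelled explicitly.
                if (cancelOnTimeout) {
                    future.cancel(true); // interrupts the worker thread
                }
            }
            // True if the worker has not stopped within 500 ms of the timeout.
            return !stopped.await(500, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // clean up the demo thread in either case
        }
    }
}
```

Calling workerOutlivesCaller(false) models a caller that simply gives up after the timeout, so the worker keeps running. With workerOutlivesCaller(true) the worker is interrupted and stops promptly. In a real cleanup job, interruption only helps if the worker checks for it between batches; whether a blocked JDBC statement reacts to interruption depends on the driver.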
The following log runs up to the latest time available. After 09:00 (the last
entries are from 2023-04-06T09:03) the query history cleanup was still deleting
records; it did not stop when the main thread terminated.
{code:shell}
2023-04-06T00:08:43,015 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.selectByProject : <== Total: 12
2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CPIC_FRP> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXAIMA> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCDC> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCRMS> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,017 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Start to delete query histories that are beyond max size for project<CXCZH>, records:1551669
...
2023-04-06T09:03:54,974 INFO [QueryHistoryCleanWorker-23145] query.JdbcQueryHistoryStore : Delete 2000 row query history for project [CXCZH] takes 938060 ms
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Preparing: delete from ke4_instance_query_history_realization where query_time < ? and project_name = ?
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Parameters: 1678863450091(Long), CXCZH(String)
{code}
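As a rough cross-check of the figures in this report, the sketch below estimates how long the query history cleanup would take. It uses only numbers quoted above: 550,000 rows to delete, 2,000 rows per batch, and the 938060 ms logged for one batch. Assuming every batch is as slow as that logged one is deliberately pessimistic, and the class and method names are hypothetical.

```java
// Hypothetical helper for a back-of-the-envelope estimate; not part of Kylin.
final class CleanupEstimate {

    /** Number of delete batches needed, rounding up for a partial last batch. */
    static long batches(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    /** Estimated total duration in hours if every batch takes millisPerBatch. */
    static double estimatedHours(long rows, int batchSize, long millisPerBatch) {
        return batches(rows, batchSize) * millisPerBatch / 3_600_000.0;
    }
}
```

With these inputs, batches(550_000, 2_000) gives 275 batches and estimatedHours(550_000, 2_000, 938_060) gives roughly 71.7 hours. The routine task starts with about 4 hours of remaining time (14399995 ms in the first log), so at that rate the cleanup cannot finish inside one window.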
--
This message was sent by Atlassian Jira
(v8.20.10#820010)