[ https://issues.apache.org/jira/browse/CLOUDSTACK-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224522#comment-14224522 ]
Rohit Yadav commented on CLOUDSTACK-7749: ----------------------------------------- [~minchen07] can you help backport this to 4.3 or advise, thanks. > AsyncJob GC thread cannot purge queue items that have been blocking for too > long if exception is thrown in expunging some unfinished or completed old > jobs, this will make some future jobs stuck. > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: CLOUDSTACK-7749 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7749 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Management Server > Affects Versions: 4.3.0 > Reporter: Min Chen > Assignee: Min Chen > Priority: Critical > Fix For: 4.5.0 > > > AsyncJobManager has a GC thread to clean up some unfinished job and complete > jobs that are too old. In this same thread, we are also forcefully cancel > blocking queue items if they've been staying there for too long. Currently if > there is an exception thrown in expunging one job, for example, like the one > below: > 2014-10-14 17:57:26,347 INFO [o.a.c.f.j.i.AsyncJobManagerImpl] > (AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Expunging unfinished job AsyncJobVO > {id:18443, userId: 60, accountId: 69, instanceType: null, instanc eId: null, > cmd: com.cloud.vm.VmWorkStart, cmdInfo: > rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIACkoABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljb > > HVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL > > 01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAABFAAAAAAAAADwAAAAAAAACdXQAGVZpcnR1YWxNYWNoaW5lTWFuY > > WdlckltcGwAAAAAAAAAAHBwcHBwcHBzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAMdwgAAAAQAAAAAXQAClZtUGFzc3dvcmR0ABxyTzBBQlhRQURuTmhkbVZrWDNCaGMzTjNiM0preHA, > cmdVersi on: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, > result: null, initMsid: 244536014864905, completeMsid: null, lastUpdated: > null, lastPolled: null, created: Wed Aug 20 18:14:13 EDT 2014} > 2014-10-14 17:57:26,350 DEBUG [c.c.u.d.T.Transaction] > (AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Rolling back the transaction: Time = 2 > Name = AsyncJobMgr-Heartbeat-1; called by -TransactionLegacy.rollback:8 > 96-TransactionLegacy.removeUpTo:839-TransactionLegacy.close:663-TransactionContextInterceptor.invoke:35-ReflectiveMethodInvocation.proceed:161-ExposeInvocationInterceptor.invoke:91-ReflectiveMethodInvocat > ion.proceed:172-JdkDynamicAopProxy.invoke:204-$Proxy151.expunge:-1-AsyncJobManagerImpl$8.doInTransactionWithoutResult:802-TransactionCallbackNoReturn.doInTransaction:25-Transaction$2.doInTransaction:49 > 2014-10-14 17:57:26,368 ERROR [o.a.c.f.j.i.AsyncJobManagerImpl] > (AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Unexpected exception when trying to > execute queue item, > com.cloud.utils.exception.CloudRuntimeException: DB Exception on: > com.mysql.jdbc.JDBC4PreparedStatement@39fdfb52: DELETE FROM async_job WHERE > async_job.id= 18443 > at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1144) > at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:622) > at > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) > at > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:33) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) > at > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) > at > org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) > at com.sun.proxy.$Proxy151.expunge(Unknown Source) > at > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$8.doInTransactionWithoutResult(AsyncJobManagerImpl.java:802) > at > com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25) > at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:49) > at com.cloud.utils.db.Transaction.execute(Transaction.java:37) > at com.cloud.utils.db.Transaction.execute(Transaction.java:46) > at > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl.expungeAsyncJob(AsyncJobManagerImpl.java:799) > at > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$7.reallyRun(AsyncJobManagerImpl.java:762) > at > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$7.runInContext(AsyncJobManagerImpl.java:738) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:50) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:47) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:701) > Caused by: > com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: > Cannot delete or update a parent row: a foreign key constraint fails > (`cloud`.`async_job_join_map`, CONSTRAINT > `fk_async_job_join_map__join_job_id` FOREIGN KEY (`join_job_id`) REFERENCES > `async_job` (`id`)) > all the following purge queue item action will not get chances to run at all. > This will cause potential job stuck issues since we are serializing VM > operations per VM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)