Min Chen created CLOUDSTACK-7749:
------------------------------------

             Summary: AsyncJob GC thread cannot purge queue items that have 
been blocking for too long if exception is thrown in expunging some unfinished 
or completed old jobs, this will make some future jobs stuck.
                 Key: CLOUDSTACK-7749
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7749
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.3.0
            Reporter: Min Chen
            Assignee: Min Chen
            Priority: Critical
             Fix For: 4.5.0




AsyncJobManager has a GC thread to clean up some unfinished job and complete 
jobs that are too old. In this same thread, we are also forcefully cancel 
blocking queue items if they've been staying there for too long. Currently if 
there is an exception thrown in expunging one job, for example, like the one 
below:

2014-10-14 17:57:26,347 INFO [o.a.c.f.j.i.AsyncJobManagerImpl] 
(AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Expunging unfinished job AsyncJobVO
{id:18443, userId: 60, accountId: 69, instanceType: null, instanc eId: null, 
cmd: com.cloud.vm.VmWorkStart, cmdInfo: 
rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIACkoABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljb
 
HVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL
 
01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAABFAAAAAAAAADwAAAAAAAACdXQAGVZpcnR1YWxNYWNoaW5lTWFuY
 
WdlckltcGwAAAAAAAAAAHBwcHBwcHBzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAMdwgAAAAQAAAAAXQAClZtUGFzc3dvcmR0ABxyTzBBQlhRQURuTmhkbVZrWDNCaGMzTjNiM0preHA,
 cmdVersi on: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: 
null, initMsid: 244536014864905, completeMsid: null, lastUpdated: null, 
lastPolled: null, created: Wed Aug 20 18:14:13 EDT 2014}

2014-10-14 17:57:26,350 DEBUG [c.c.u.d.T.Transaction] 
(AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Rolling back the transaction: Time = 2 
Name = AsyncJobMgr-Heartbeat-1; called by -TransactionLegacy.rollback:8
96-TransactionLegacy.removeUpTo:839-TransactionLegacy.close:663-TransactionContextInterceptor.invoke:35-ReflectiveMethodInvocation.proceed:161-ExposeInvocationInterceptor.invoke:91-ReflectiveMethodInvocat
ion.proceed:172-JdkDynamicAopProxy.invoke:204-$Proxy151.expunge:-1-AsyncJobManagerImpl$8.doInTransactionWithoutResult:802-TransactionCallbackNoReturn.doInTransaction:25-Transaction$2.doInTransaction:49
2014-10-14 17:57:26,368 ERROR [o.a.c.f.j.i.AsyncJobManagerImpl] 
(AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Unexpected exception when trying to 
execute queue item,
com.cloud.utils.exception.CloudRuntimeException: DB Exception on: 
com.mysql.jdbc.JDBC4PreparedStatement@39fdfb52: DELETE FROM async_job WHERE 
async_job.id= 18443
at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1144)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at 
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at 
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at 
com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:33)
at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
at 
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at 
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy151.expunge(Unknown Source)
at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$8.doInTransactionWithoutResult(AsyncJobManagerImpl.java:802)
at 
com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:49)
at com.cloud.utils.db.Transaction.execute(Transaction.java:37)
at com.cloud.utils.db.Transaction.execute(Transaction.java:46)
at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl.expungeAsyncJob(AsyncJobManagerImpl.java:799)
at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$7.reallyRun(AsyncJobManagerImpl.java:762)
at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$7.runInContext(AsyncJobManagerImpl.java:738)
at 
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:50)
at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:47)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: 
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
Cannot delete or update a parent row: a foreign key constraint fails 
(`cloud`.`async_job_join_map`, CONSTRAINT `fk_async_job_join_map__join_job_id` 
FOREIGN KEY (`join_job_id`) REFERENCES `async_job` (`id`))

all the following purge queue item action will not get chances to run at all. 
This will cause potential job stuck issues since we are serializing VM 
operations per VM.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to