[ https://issues.apache.org/jira/browse/CLOUDSTACK-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518717#comment-14518717 ]
Koushik Das commented on CLOUDSTACK-8425: ----------------------------------------- Internal jobs get scheduled from 2 different places: 1. In AsyncJobManagerImpl.java there is a generic scheduler which schedules jobs in batches protected void reallyRun() { try { List<SyncQueueItemVO> l = _queueMgr.dequeueFromAny(getMsid(), MAX_ONETIME_SCHEDULE_SIZE); if (l != null && l.size() > 0) { for (SyncQueueItemVO item : l) { if (s_logger.isDebugEnabled()) { s_logger.debug("Execute sync-queue item: " + item.toString()); } executeQueueItem(item, false); } } 2. There is another place in AsyncJobManagerImpl.java, which schedules the next job in a specific VM queue. Look at the checkQueue() in the snippet below. } finally { // guard final clause as well try { if (job.getSyncSource() != null) { // here check queue item one more time to double make sure that queue item is removed in case of any uncaught exception _queueMgr.purgeItem(job.getSyncSource().getId()); checkQueue(job.getSyncSource().getQueueId()); } try { JmxUtil.unregisterMBean("AsyncJobManager", "Active Job " + job.getId()); } catch (Exception e) { // Due to co-existence of normal-dispatched-job/wakeup-dispatched-job, MBean unregister() call // is expected to fail under situations if (s_logger.isTraceEnabled()) s_logger.trace("Unable to unregister job " + job.getId() + " to JMX monitoring due to exception " + ExceptionUtil.toString(e)); } Since there is no synchronisation between these 2 places, in certain scenarios both may end up in scheduling the same job. > Job framework: Same internal job can execute simultaneously > ----------------------------------------------------------- > > Key: CLOUDSTACK-8425 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8425 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Management Server > Affects Versions: 4.5.0 > Reporter: Koushik Das > Assignee: Koushik Das > Priority: Critical > Fix For: 4.6.0 > > > Same internal work job can be executed simultaneously by 2 worker threads. > The below logs show that the same work job-24152 is executed by threads > Work-Job-Executor-84 and Work-Job-Executor-85. This can lead to undesired > behaviour subsequently. > 2015-03-17 14:43:34,289 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (API-Job-Executor-12:ctx-4fa08ed6 job-24151 ctx-f7582544) (logid:26c78006) > Sync job-24152 execution on object VmWorkJobQueue.1396 > 2015-03-17 14:43:35,420 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (AsyncJobMgr-Heartbeat-1:ctx-4dc28af0) (logid:b0db94c2) Schedule queued > job-24152 > 2015-03-17 14:43:35,422 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-83:ctx-62660b07 job-24149/job-24150) (logid:97d686aa) > Schedule queued job-24152 > 2015-03-17 14:43:35,426 INFO [o.a.c.f.j.i.AsyncJobMonitor] > (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152) (logid:d28187a4) Add > job-24152 into job monitoring > 2015-03-17 14:43:35,428 INFO [o.a.c.f.j.i.AsyncJobMonitor] > (Work-Job-Executor-85:ctx-5a06c91e job-24151/job-24152) (logid:e55e3f25) Add > job-24152 into job monitoring > 2015-03-17 14:43:35,432 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152) (logid:26c78006) > Executing AsyncJobVO > {id:24152, userId: 4, accountId: 4, instanceType: null, instanceId: null, > cmd: com.cloud.vm.VmWorkDetachVolume, cmdInfo: > rO0ABXNyAB9jb20uY2xvdWQudm0uVm1Xb3JrRGV0YWNoVm9sdW1lhvReA-s81AMCAAFMAAh2b2x1bWVJZHQAEExqYXZhL2xhbmcvTG9uZzt4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1ldAASTGphdmEvbGFuZy9TdHJpbmc7eHAAAAAAAAAABAAAAAAAAAAEAAAAAAAABXR0ABRWb2x1bWVBcGlTZXJ2aWNlSW1wbHNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAAAAALWw, > cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: > null, initMsid: 104230524961057, completeMsid: null, lastUpdated: null, > lastPolled: null, created: Tue Mar 17 14:43:34 EDT 2015} > 2015-03-17 14:43:35,432 DEBUG [c.c.v.VmWorkJobDispatcher] > (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152) (logid:26c78006) Run > VM work job: com.cloud.vm.VmWorkDetachVolume for VM 1396, job origin: 24151 > 2015-03-17 14:43:35,433 DEBUG [c.c.v.VmWorkJobHandlerProxy] > (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152 ctx-0f9cd10a) > (logid:26c78006) Execute VM work job: com.cloud.vm.VmWorkDetachVolume > {"volumeId":2907,"userId":4,"accountId":4,"vmId":1396,"handlerName":"VolumeApiServiceImpl"} > 2015-03-17 14:43:35,435 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Work-Job-Executor-85:ctx-5a06c91e job-24151/job-24152) (logid:26c78006) > Executing AsyncJobVO > {id:24152, userId: 4, accountId: 4, instanceType: null, instanceId: null, > cmd: com.cloud.vm.VmWorkDetachVolume, cmdInfo: > rO0ABXNyAB9jb20uY2xvdWQudm0uVm1Xb3JrRGV0YWNoVm9sdW1lhvReA-s81AMCAAFMAAh2b2x1bWVJZHQAEExqYXZhL2xhbmcvTG9uZzt4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1ldAASTGphdmEvbGFuZy9TdHJpbmc7eHAAAAAAAAAABAAAAAAAAAAEAAAAAAAABXR0ABRWb2x1bWVBcGlTZXJ2aWNlSW1wbHNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAAAAALWw, > cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: > null, initMsid: 104230524961057, completeMsid: null, lastUpdated: null, > lastPolled: null, created: Tue Mar 17 14:43:34 EDT 2015} -- This message was sent by Atlassian JIRA (v6.3.4#6332)