[ https://issues.apache.org/jira/browse/CLOUDSTACK-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731022#comment-13731022 ]
Min Chen commented on CLOUDSTACK-3568: -------------------------------------- Vijay, After examining the log more carefully, I found that the issue may reside elsewhere instead of WaitForTask api. If you do the following two grep on the ms: grep "job-112" management-server.log > job-112 grep "job-116" management-server.log > job-116 They represent two failed VM deployment jobs. From the two sub logs, you will notice that both failed with the same error in creating volume from template: job-112:2013-07-16 15:48:45,651 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-19:job-112 = [ e2351737-8cee-4541-bc0f-fb98600c43e6 ]) Unable to create Vol[114|vm=113|ROOT]:java.lang.RuntimeException: File [7caed722555d3f71879476c5e8d8d5ad] ROOT-120-120/ROOT-120-120.vmdk was not found job-116:2013-07-16 15:48:45,633 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-23:job-116 = [ 818da9c2-7b79-462a-8d49-799191dccd47 ]) Unable to create Vol[118|vm=118|ROOT]:java.lang.RuntimeException: File [7caed722555d3f71879476c5e8d8d5ad] ROOT-120-120/ROOT-120-120.vmdk was not found This is also indicating that something is wrong here. Why are two different vm deployment looking for the same root volume vmdk files? Based on volume id and vm id shown in the log, they should search for different vmdk files. We need to investigate why this is happening when agent commands (CopyCommand) are sent in parallel. > [Automation]Parallel deployment - Vmware - When deploying 30 parallel Vms , > we see ~15 Vms fail in org.apache.cloudstack.storage.command.CopyCommand. > ----------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: CLOUDSTACK-3568 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3568 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Management Server, VMware > Affects Versions: 4.2.0 > Environment: Build from 4.2 > Reporter: Sangeetha Hariharan > Assignee: Min Chen > Priority: Blocker > Fix For: 4.2.0 > > Attachments: management-server.log > > > Parallel deployment - Vmware - When deploying 30 parallel Vms , we see ~15 > Vms fail in org.apache.cloudstack.storage.command.CopyCommand. > Steps to reproduce the problem: > Set up - Advanced zone with Vmware ESXI, 5.0.0 host. > Deploy 30 Vms in parallel in a network that is already implemented (router is > already running). > Only 9 vms were successfully deployed. > ~15 Vms failed in org.apache.cloudstack.storage.command.CopyCommand because > of "ROOT-106-104/ROOT-106-104.vmdk was not found" error. > mysql> select count(*),state,type from vm_instance where name like "hello%" > group by state,type ; > +----------+---------+------+ > | count(*) | state | type | > +----------+---------+------+ > | 42 | Error | User | > | 9 | Running | User | > +----------+---------+------+ > 2 rows in set (0.00 sec) > Following exception seen in management server logs: > 2013-07-16 15:47:55,099 ERROR [storage.resource.VmwareStorageProcessor] > (DirectAgent-71:10.223.57.66) CreateCommand failed due to Exception: > java.lang.RuntimeException > Message: File [7caed722555d3f71879476c5e8d8d5ad] > ROOT-106-104/ROOT-106-104.vmdk was not found > java.lang.RuntimeException: File [7caed722555d3f71879476c5e8d8d5ad] > ROOT-106-104/ROOT-106-104.vmdk was not found > at > com.cloud.hypervisor.vmware.util.VmwareClient.waitForTask(VmwareClient.java:290) > at > com.cloud.hypervisor.vmware.mo.HostDatastoreBrowserMO.searchDatastore(HostDatastoreBrowserMO.java:57) > at > com.cloud.hypervisor.vmware.mo.HostDatastoreBrowserMO.searchDatastore(HostDatastoreBrowserMO.java:78) > at > com.cloud.hypervisor.vmware.mo.DatastoreMO.folderExists(DatastoreMO.java:319) > at > com.cloud.storage.resource.VmwareStorageProcessor.createVMLinkedClone(VmwareStorageProcessor.java:279) > at > com.cloud.storage.resource.VmwareStorageProcessor.cloneVolumeFromBaseTemplate(VmwareStorageProcessor.java:393) > at > com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:73) > at > com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:49) > at > com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:565) > at > com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:679) > 2013-07-16 15:47:55,100 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-71:null) Seq 1-227737729: Response Received: > 2013-07-16 15:47:55,100 DEBUG [agent.transport.Request] (DirectAgent-71:null) > Seq 1-227737729: Processing: { Ans: , MgmtId: 7647994577963, via: 1, Ver: > v1, Flags: 10, > [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"result":false,"details":"java.lang.RuntimeException: > File [7caed722555d3f71879476c5e8d8d5ad] ROOT-106-104/ROOT-106-104.vmdk was > not found","wait":0}}] } > 2013-07-16 15:47:55,100 DEBUG [agent.transport.Request] > (Job-Executor-8:job-101 = [ cad3c4ea-3d40-46f4-bce5-26d60d4af7c1 ]) Seq > 1-227737729: Received: { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, > Flags: 10, { CopyCmdAnswer } } > mysql> select count(*),state,type from vm_instance where name like "hello%" > group by state,type ; > +----------+---------+------+ > | count(*) | state | type | > +----------+---------+------+ > | 42 | Error | User | > | 9 | Running | User | > +----------+---------+------+ > 2 rows in set (0.00 sec) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira