[ https://issues.apache.org/jira/browse/CLOUDSTACK-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ove Ewerlid updated CLOUDSTACK-2568: ------------------------------------ Attachment: (was: mysqldump-cloud.sql) > ACS41 regression in storage subsystem (seen with local storage and 2 or more > hosts) > ----------------------------------------------------------------------------------- > > Key: CLOUDSTACK-2568 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2568 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Management Server > Affects Versions: 4.1.0 > Environment: RHES64 as in OEL64. Install from RPM built from latest > GIT on OEL64. > 2 or more KVM hypervisors with local storage in one cluster that has one > primary NFS storage pool. > Reporter: Ove Ewerlid > Assignee: Prachi Damle > Priority: Blocker > Fix For: 4.1.0 > > Attachments: var-log-cloudstack-management.tar.gz > > > ACS402 works with no issues when tested in exactly the setup where ACS41 > fails. > Identical configuration (the same setup program is used for testing both > versions). > In ACS410 startVM fails if and only if the advanceStart: log line picks a > poolID that is not valid. > E.g., the poolID reported in this logline appears random across a large > number of tests. > If a poolID that can not be reached by the host selected for deployment, the > startVM fails. > This is blocking upgrade from 4.0 to 4.1 since there is no reliable way to > start VMs that have been deployed. If a deployed VM fails to start, giving > the startVM command multiple times, will eventually make the VM start. > The more hosts there are, the less likely it is a startVM will succeed. It is > less likely that the poolID is correct. > The below log portion conveys how the VM has a "correct" Deployment > Destination reported and the advanceStart reports a poolID that is different > and since the selected hypervisor can not reach the poolID the startVM fails. > The bug never triggers if there is only one KVM with local storage since the > poolId can not be wrong, there is just one (and the NFS pool is always valid). > ------------------- > 2013-05-20 06:49:29,477 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Found a potential host id: 1 name: > vm3-net0-s0-14.test.devops and associated storage pools for this VM > 2013-05-20 06:49:29,478 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Returning Deployment Destination: > Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] > : Dest[Zone(1)-Pod(1)-Cluster(1)-Host\ > (1)-Storage(Volume(10|ROOT-->Pool(200), Volume(11|DATADISK-->Pool(200))] > 2013-05-20 06:49:29,495 DEBUG [cloud.capacity.CapacityManagerImpl] > (Job-Executor-34:job-34) VM state transitted from :Stopped to Starting with > event: StartRequestedvm's original host id: null new host id: null host id > before state trans\ > ition: null > 2013-05-20 06:49:29,495 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (Job-Executor-34:job-34) Successfully transitioned to start state for > VM[User|testvm-a] reservation id = e644d55e-3627-4395-9f89-639e6fc2f261 > 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (Job-Executor-34:job-34) Trying to deploy VM, vm has dcId: 1 and podId: null > 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (Job-Executor-34:job-34) advanceStart: DeploymentPlan is provided, using > dcId:1, podId: 1, clusterId: 1, hostId: 1, poolId: 201 > 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (Job-Executor-34:job-34) Deploy avoids pods: null, clusters: null, hosts: null > 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) DeploymentPlanner allocation algorithm: random > 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Trying to allocate a host and storage pools from > dc:1, pod:1,cluster:1, requested cpu: 4000, requested ram: 2147483648 > 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Is ROOT volume READY (pool already allocated)?: Yes > 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) DeploymentPlan has host_id specified, making no > checks on this host, looks like admin test: 1 > 2013-05-20 06:49:29,505 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Looking for suitable pools for this host under zone: > 1, pod: 1, cluster: 1 > 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type): > (10,ROOT) > 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if > pool can be reused, poolId: null > 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) finding pool by id '201' > 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Planner need not allocate a pool for this volume > since its READY > 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type): > (11,DATADISK) > 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if > pool can be reused, poolId: null > 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) finding pool by id '201' > 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Planner need not allocate a pool for this volume > since its READY > 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Trying to find a potenial host and associated > storage pools from the suitable host/pool lists for this VM > 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Checking if host: 1 can access any suitable storage > pool for volume: DATADISK > 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Host: 1 cannot access pool: 201 > 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Could not find a potential host that has associated > storage pools from the suitable host/pool lists for this VM > 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] > (Job-Executor-34:job-34) Cannnot deploy to specified host, returning. > 2013-05-20 06:49:29,524 DEBUG [cloud.capacity.CapacityManagerImpl] > (Job-Executor-34:job-34) VM state transitted from :Starting to Stopped with > event: OperationFailedvm's original host id: null new host id: null host id > before state tran\ > sition: null > 2013-05-20 06:49:29,533 ERROR [cloud.async.AsyncJobManagerImpl] > (Job-Executor-34:job-34) Unexpected exception while executing > org.apache.cloudstack.api.command.user.vm.StartVMCmd > com.cloud.exception.InsufficientServerCapacityException: Unable to create a > deployment for VM[User|testvm-a]Scope=interface com.cloud.dc.DataCenter; id=1 > at > com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:728) > at > com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:471) > at > org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:212) > at > org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209) > at > com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3865) > at > com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2573) > at > com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125) > at > org.apache.cloudstack.api.command.user.vm.StartVMCmd.execute(StartVMCmd.java:120) > at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:162) > at > com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:437) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira