[
https://issues.apache.org/jira/browse/CLOUDSTACK-8949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Farrington updated CLOUDSTACK-8949:
----------------------------------------
Description:
when creating a "MigrateVolumeCommand" a timeout should be set, this way the
ACS would not use the default value of parameter "wait". That timeout value
should be the same as the one used on CitrixResourceBase and its children to
control the migration of volumes.
Environment: XenServer 6.2, multiple pre-setup primary storage locations
that are iscsi SRs in XenServer.
Summary of actions: Attempt to move a volume that is attached to a running
instance to another primary storage location.
What *I* assume should happen: Cloudstack communicates with the host of the
VM in question and issues a xe vdi-pool-migrate. The command is synchronous
and doesn't return control back until it has performed the action. Cloudstack
should know this is a synchronous call and wait the configured amount of time
for the migration to be performed.
What *I* think is happening(with some education from Rafael Weingärtner):
There are multiple timers being set in the process (please reference
{noformat}
2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) Seq 38-996939857: Sending { Cmd , MgmtId: 42756806312036, via:
38(xen-nc-bc2b7), Ver: v1, Flags: 100111,
[{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) Seq 38-996939857: Executing: { Cmd , MgmtId: 42756806312036,
via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111,
[{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache]
(DirectAgent-494:ctx-5aac3f29) Seq 38-996939857: Executing request
2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more
time because this is the current command
2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more
time because this is the current command
2015-10-12 18:41:20,457 INFO [c.c.u.e.CSExceptionErrorCode]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for exceptions
2015-10-12 18:41:20,465 WARN [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on Seq
38-996939857: { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1,
Flags: 100111,
[{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more commands
found
2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to
Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to
Host 38 timed out after 7200
2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to
Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to
Host 38 timed out after 7200
at
org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
at
org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
at
org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
at
org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
at
org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
at
com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
at
com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
at
com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
at
org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
at
com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
at
com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
2015-10-12 18:41:20,479 WARN [o.a.c.s.d.ObjectInDataStoreManagerImpl]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME,
org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need
to delete from object in store ref table
2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume
failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
996939857 to Host 38 timed out after 7200
2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED,
resultCode: 530, result:
org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
to migrate volume"}
2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(Job-Executor-63:ctx-f7b6817d) Done executing
org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
2015-10-12 18:41:20,489 INFO [o.a.c.f.j.i.AsyncJobMonitor]
(Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
{noformat}
was:
when creating a "MigrateVolumeCommand" a timeout should be set, this way the
ACS would not use the default value of parameter "wait". That timeout value
should be the same as the one used on CitrixResourceBase and its children to
control the migration of volumes.
> Job timeout caused by improper/inconsistent application of wait parameters
> --------------------------------------------------------------------------
>
> Key: CLOUDSTACK-8949
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8949
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Hypervisor Controller
> Affects Versions: 4.3.0
> Environment: XenServer 6.2
> Reporter: Ryan Farrington
> Priority: Minor
>
> when creating a "MigrateVolumeCommand" a timeout should be set, this way the
> ACS would not use the default value of parameter "wait". That timeout value
> should be the same as the one used on CitrixResourceBase and its children to
> control the migration of volumes.
> Environment: XenServer 6.2, multiple pre-setup primary storage locations
> that are iscsi SRs in XenServer.
> Summary of actions: Attempt to move a volume that is attached to a
> running instance to another primary storage location.
> What *I* assume should happen: Cloudstack communicates with the host of
> the VM in question and issues a xe vdi-pool-migrate. The command is
> synchronous and doesn't return control back until it has performed the
> action. Cloudstack should know this is a synchronous call and wait the
> configured amount of time for the migration to be performed.
> What *I* think is happening(with some education from Rafael Weingärtner):
> There are multiple timers being set in the process (please reference
> {noformat}
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d
> ctx-c6b92515) Seq 38-996939857: Sending { Cmd , MgmtId: 42756806312036, via:
> 38(xen-nc-bc2b7), Ver: v1, Flags: 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d
> ctx-c6b92515) Seq 38-996939857: Executing: { Cmd , MgmtId: 42756806312036,
> via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-494:ctx-5aac3f29) Seq 38-996939857: Executing request
> 2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some
> more time because this is the current command
> 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some
> more time because this is the current command
> 2015-10-12 18:41:20,457 INFO [c.c.u.e.CSExceptionErrorCode]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
> 2015-10-12 18:41:20,465 WARN [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on
> Seq 38-996939857: { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
> Ver: v1, Flags: 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more
> commands found
> 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to
> Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857
> to Host 38 timed out after 7200
> 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due
> to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> 996939857 to Host 38 timed out after 7200
> at
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
> at
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
> at
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
> at
> org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
> at
> org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
> at
> com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
> at
> com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
> at
> com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
> at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:622)
> at
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> at
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> at
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
> at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
> at
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
> at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
> at
> com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
> at
> com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at
> com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
> at
> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:701)
> 2015-10-12 18:41:20,479 WARN [o.a.c.s.d.ObjectInDataStoreManagerImpl]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME,
> org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no
> need to delete from object in store ref table
> 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume
> failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send
> command, due to Agent:38, com.cloud.exception.OperationTimedoutException:
> Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED,
> resultCode: 530, result:
> org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
> to migrate volume"}
> 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Job-Executor-63:ctx-f7b6817d) Done executing
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> 2015-10-12 18:41:20,489 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)