Re: Timeout with live migration
No problem, you are welcome. It is nice to hear that it worked for you. Sometimes it is easier to understand how things work by looking at the source code directly.

On Tue, Oct 13, 2015 at 9:37 PM, Ryan Farrington wrote:
> Confirmed. We migrated a few TB worth of volumes without issue. Thanks
> for helping nail this down!
Re: Timeout with live migration
Confirmed. We migrated a few TB worth of volumes without issue. Thanks for helping nail this down!

From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Tuesday, October 13, 2015 6:12 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

Nice, thanks.
Did that solve your problem? Did you migrate the volume?
Re: Timeout with live migration
Nice, thanks.
Did that solve your problem? Did you migrate the volume?

On Tue, Oct 13, 2015 at 7:00 PM, Ryan Farrington wrote:
> Issue #1) Terrible parameter names
> https://issues.apache.org/jira/browse/CLOUDSTACK-8946
>
> Issue #2) Wait value for MigrateVolume
> https://issues.apache.org/jira/browse/CLOUDSTACK-8949
Re: Timeout with live migration
Issue #1) Terrible parameter names
https://issues.apache.org/jira/browse/CLOUDSTACK-8946

Issue #2) Wait value for MigrateVolume
https://issues.apache.org/jira/browse/CLOUDSTACK-8949
Re: Timeout with live migration
That is good. Now I can report to you what is in the code.

Let’s start:
First: when I first looked at the problem, I went straight to the class that sends commands to Xen. There, ACS uses a parameter called “migratewait” to control the timeout of the command. You tried that, and you were still getting the timeout problem.

That happened because, in addition to that timeout, there is another point in ACS that controls the timeout of commands that are sent to the hypervisor (not just Xen this time), and at that point a parameter called “wait” is used as the default value to control command timeouts.

First conclusion: we have terrible parameter names ;)

Second: when we create a “MigrateVolumeCommand” we should set a timeout value; this way ACS would not use the default value of the parameter “wait”. That timeout value should be the same as the one used in CitrixResourceBase and its children to control the migration of volumes.

Can you report what happened to you in a Jira ticket and add my comments there?
I think next Saturday I can have someone working on that for the next ACS release (4.7?), or even 4.6 if the PR gets accepted.

Please send me the Jira ticket as soon as you open it.

On Tue, Oct 13, 2015 at 5:33 PM, Ryan Farrington wrote:
> Looks like whatever change I made actually resulted in a change in
> behavior. Prior to the change we were seeing a message every hour stating
> that the job agent was waiting; now we see it waited 2 hours and 8 minutes
> without a peep before finishing. So making the change to the "wait"
> parameter is what made the magic happen.
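The two-timeout behaviour described in this message can be sketched as follows. This is only an illustration in Python, not CloudStack's Java code: `Command`, `effective_wait` and `GLOBAL_SETTINGS` are hypothetical names, and treating a `wait` of 0 as "no explicit timeout" is an assumption based on the `"wait":0` field in the logged MigrateVolumeCommand. The setting values are the ones reported in the thread before the workaround.

```python
from dataclasses import dataclass

# Values reported in the thread before the workaround (seconds).
GLOBAL_SETTINGS = {"wait": 3600, "migratewait": 36000}


@dataclass
class Command:
    """Hypothetical stand-in for a command sent to the hypervisor."""
    name: str
    wait: int = 0  # assumption: 0 means "no explicit timeout was set"


def effective_wait(cmd: Command, settings: dict) -> int:
    """Return the wait value (seconds) that would govern this command.

    An explicit per-command value wins; otherwise the generic "wait"
    setting is used as the default, as described in the message above.
    """
    if cmd.wait > 0:
        return cmd.wait
    return settings["wait"]


# Current behaviour: the command carries no timeout, so "wait" applies
# even though "migratewait" was raised to 10 hours.
unfixed = Command("MigrateVolumeCommand")

# Proposed fix: set the command's timeout from "migratewait" on creation.
fixed = Command("MigrateVolumeCommand", wait=GLOBAL_SETTINGS["migratewait"])

print(effective_wait(unfixed, GLOBAL_SETTINGS))
print(effective_wait(fixed, GLOBAL_SETTINGS))
```

Under these assumptions, raising "migratewait" alone changes nothing for the unfixed command, which is consistent with what was observed.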
Re: Timeout with live migration
Looks like whatever change I made actually resulted in a change in behavior. Prior to the change we were seeing a message every hour stating that the job agent was waiting; now we see it waited 2 hours and 8 minutes without a peep before finishing. So making the change to the "wait" parameter is what made the magic happen.

2015-10-13 13:21:48,281 DEBUG [c.c.a.t.Request] (Job-Executor-1:ctx-8e0ebced ctx-f49e7503) Seq 38-1788936343: Sending { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":805,"volumePath":"5f990946-d6b5-451e-8e78-2eefc1462253","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
2015-10-13 13:21:48,281 DEBUG [c.c.a.t.Request] (Job-Executor-1:ctx-8e0ebced ctx-f49e7503) Seq 38-1788936343: Executing: { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":805,"volumePath":"5f990946-d6b5-451e-8e78-2eefc1462253","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
2015-10-13 13:21:48,282 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Executing request
2015-10-13 15:27:13,396 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Response Received:
2015-10-13 15:27:13,397 DEBUG [c.c.a.t.Request] (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Processing: { Ans: , MgmtId: 42756806312036, via: 38, Ver: v1, Flags: 110, [{"com.cloud.agent.api.storage.MigrateVolumeAnswer":{"volumePath":"00db15be-3ccd-4648-8928-35ca90924d7c","result":true,"wait":0}}] }
2015-10-13 15:27:13,397 DEBUG [c.c.a.m.AgentAttache] (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: No more commands found
2015-10-13 15:27:13,397 DEBUG [c.c.a.t.Request] (Job-Executor-1:ctx-8e0ebced ctx-f49e7503) Seq 38-1788936343: Received: { Ans: , MgmtId: 42756806312036, via: 38, Ver: v1, Flags: 110, { MigrateVolumeAnswer } }

From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Tuesday, October 13, 2015 8:09 AM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

Let’s wait to see if there is nothing else messing with that timeout. Then I will send you the details to put into the Jira ticket.
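As a quick check on the log excerpt in this message, the time between "Executing request" and "Response Received" can be computed from the two timestamps. This is a small Python sketch; the helper name `elapsed` is made up for illustration, and the 7200-second figure is the limit quoted in the OperationTimedoutException earlier in the thread.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the management-server log lines quoted above,
# e.g. "2015-10-13 13:21:48,282" (milliseconds after the comma).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two log timestamps."""
    return (datetime.strptime(end, LOG_TIME_FORMAT)
            - datetime.strptime(start, LOG_TIME_FORMAT))


# "Executing request" and "Response Received" for Seq 38-1788936343.
duration = elapsed("2015-10-13 13:21:48,282", "2015-10-13 15:27:13,396")

OLD_LIMIT_SECONDS = 7200  # limit reported by the earlier timeout exception

print(duration)
print(duration.total_seconds() > OLD_LIMIT_SECONDS)
```

The command in the excerpt ran for a little over two hours between dispatch and response, which is past the point where the earlier attempts timed out.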
Re: Timeout with live migration
Let’s wait to see if there is nothing else messing with that timeout. Then I will send you the details to put into the Jira ticket.

On Tue, Oct 13, 2015 at 10:06 AM, Ryan Farrington wrote:
> Rafael,
> I am still a bit confused as to what you would like me to place in
> the JIRA ticket. I have adjusted the "wait" parameter and will be able to
> test it in about an hour. But I would think the JIRA ticket should be as
> detailed as I can make it, or will you be adding details once I have
> created it?
Re: Timeout with live migration
Rafael,
I am still a bit confused as to what you would like me to place in the JIRA ticket. I have adjusted the "wait" parameter and will be able to test it in about an hour. But I would think the JIRA ticket should be as detailed as I can make it, or will you be adding details once I have created it?

From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Tuesday, October 13, 2015 7:52 AM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: [Questionable] Re: Timeout with live migration

I guess so. For some reason that I do not understand, the code is multiplying the value of that parameter by 2; something like 18000 should do the trick.

On Tue, Oct 13, 2015 at 12:15 AM, Ryan Farrington wrote:
> Yes, I can open JIRA tickets. What would you like me to do?
>
> I'll be happy to change the "wait" parameter. Do I assume it should be
> 1/2 of the value I want it to be?
>
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 10:12 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable] Re: Timeout with live migration
>
> There is your problem: there are currently two distinct values controlling
> those async jobs.
> Change that value and everything will work for you.
> Can you open a Jira ticket?
>
> On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:
> > wait is currently configured to be 3600
> >
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 9:46 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable] Re: Timeout with live migration
> >
> > I found something odd.
> > Can you check the parameter called "wait"? What value is it using?
> >
> > On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:
> > > Yes, the parameter was set long ago and the management server has been
> > > restarted numerous times over the past few days as we played with other
> > > parameters to no effect.
> > >
> > > After looking at the log a little more, does the "Failed to send command,
> > > due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > > 996939857 to Host 38 timed out after 7200" mean that the migration start
> > > command is being sent in some kind of synchronous mode and not returning
> > > control back to the job manager?
> > >
> > > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > > Sent: Monday, October 12, 2015 8:46 PM
> > > To: users@cloudstack.apache.org
> > > Subject: [Questionable] Re: Timeout with live migration
> > >
> > > I thought you were using the command “migrateVirtualMachineWithVolume”, but it
> > > seems that you are using the “migrateVolume” command from ACS's API.
> > >
> > > For the code I debugged, “migrateVirtualMachineWithVolume”, the parameter
> > > 3600 means 1 hour of timeout.
> > >
> > > For “migrateVolume” it is the same; they both end up in
> > > “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> > > and in that method the parameter is the same.
> > >
> > > If your parameter is set to 36000 (10 hours) I do not see why you are
> > > getting the exception after 2 hours.
> > >
> > > Did you restart the management servers after you changed the parameter?
> > >
> > > On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:
> > > > Here is the full log, including the stack for the exception, that we get
> > > > at the 2 hour mark. As for the migratewait, it is set to 36000, which should
> > > > be 10 hours.
> > > >
> > > > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> > > > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more time because this is the current command
Re: [Questionable] Re: Timeout with live migration
I guess so. For some reason that I do not understand, the code multiplies the value of that parameter by 2, so something like 18000 should do the trick.

On Tue, Oct 13, 2015 at 12:15 AM, Ryan Farrington wrote:

> Yes, I can open JIRA tickets. What would you like me to do?
>
> I'll be happy to change the "wait" parameter. Do I assume it should be
> 1/2 of the value I want it to be?
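To make the arithmetic behind the 18000 suggestion explicit, here is a minimal Java sketch. It assumes only what was observed in this thread: a "wait" setting of 3600 produced "timed out after 7200", so the effective timeout appears to be twice the configured value. The class and method names are hypothetical and are not part of CloudStack.

```java
// Sketch only: the factor of 2 is what was observed in this thread
// (wait=3600 led to "timed out after 7200"); it is not a documented constant.
final class WaitSetting {
    static final long OBSERVED_MULTIPLIER = 2;

    /** Effective timeout in seconds for a given "wait" setting, under the observed doubling. */
    static long effectiveTimeoutSeconds(long waitSetting) {
        return waitSetting * OBSERVED_MULTIPLIER;
    }

    /** Smallest "wait" value whose effective timeout covers the desired duration (rounds up). */
    static long waitSettingFor(long desiredTimeoutSeconds) {
        return (desiredTimeoutSeconds + OBSERVED_MULTIPLIER - 1) / OBSERVED_MULTIPLIER;
    }

    public static void main(String[] args) {
        // The value seen in the log for wait=3600.
        System.out.println(effectiveTimeoutSeconds(3600));
        // The "wait" value needed for a 10 hour (36000 s) timeout.
        System.out.println(waitSettingFor(36000));
    }
}
```

Under that assumption, a 10 hour limit corresponds to a "wait" value of 18000, which is the number suggested above.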
Re: [Questionable] Re: Timeout with live migration
Hi Ryan,

we hit the same problem a few days ago. After we changed the parameters migratewait, storage.pool.max.waitseconds, vm.op.cancel.interval and vm.op.cleanup.wait to 36000 (10 h), we can migrate large volumes (500 GB and more). We use XenServer 6.5 and ACS 4.5.1.

Regards,
Kuba

On 2015-10-13 at 05:15, Ryan Farrington wrote:

> Yes, I can open JIRA tickets. What would you like me to do?
>
> I'll be happy to change the "wait" parameter. Do I assume it should be
> 1/2 of the value I want it to be?
RE: [Questionable] Re: Timeout with live migration
Yes, I can open JIRA tickets. What would you like me to do?

I'll be happy to change the "wait" parameter. Do I assume it should be 1/2 of the value I want it to be?

From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 10:12 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

There is your problem: there are currently two distinct values controlling those async jobs. Change that value and everything will work for you. Can you open a Jira ticket?
Re: Timeout with live migration
There is your problem: there are currently two distinct values controlling those async jobs. Change that value and everything will work for you. Can you open a Jira ticket?

On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington wrote:

> wait is currently configured to be 3600
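One way to picture the "two distinct values" is the simplified Java model below. It is a sketch under stated assumptions, not the actual CloudStack logic: it assumes the hypervisor-side limit comes from "migratewait", that a command whose own "wait" field is 0 (as in the logged MigrateVolumeCommand) falls back to the global "wait" setting, that the fallback value is doubled as observed in this thread, and that whichever limit is reached first ends the job. All names are hypothetical.

```java
// Simplified model of the two timeouts discussed in this thread.
// Assumptions (not confirmed against the CloudStack source):
//  - commandWait == 0 means "use the global wait setting"
//  - the agent-side value is doubled, as observed (3600 -> 7200)
//  - the job fails as soon as the smaller of the two limits is reached
final class MigrationTimeoutModel {
    static long firstTimeoutSeconds(long migrateWait, long commandWait, long globalWait) {
        long agentWait = (commandWait > 0) ? commandWait : globalWait;
        long agentLimit = agentWait * 2;          // observed doubling
        return Math.min(migrateWait, agentLimit); // whichever limit fires first
    }

    public static void main(String[] args) {
        // Settings reported in the thread: migratewait=36000, command wait=0, wait=3600.
        System.out.println(firstTimeoutSeconds(36000, 0, 3600));
        // After raising "wait" to 18000 as suggested.
        System.out.println(firstTimeoutSeconds(36000, 0, 18000));
    }
}
```

With the reported settings the model gives 7200 seconds, which is consistent with the failure at the 2 hour mark even though migratewait was set to 10 hours.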
Re: Timeout with live migration
wait is currently configured to be 3600

From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 9:46 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

I found something odd. Can you check the parameter called "wait"? What value is it using?
Re: Timeout with live migration
I found something odd. Can you check the parameter called "wait"? What value is it using?

On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington wrote:

> Yes, the parameter was set long ago, and the management server has been
> restarted numerous times over the past few days as we played with other
> parameters to no effect.
>
> After looking at the log a little more: does the "Failed to send command,
> due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> 996939857 to Host 38 timed out after 7200" mean that the migration start
> command is being sent in some kind of synchronous mode and not returning
> control back to the job manager?
Re: Timeout with live migration
Yes, the parameter was set long ago, and the management server has been restarted numerous times over the past few days as we played with other parameters to no effect.

After looking at the log a little more: does the "Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200" mean that the migration start command is being sent in some kind of synchronous mode and not returning control back to the job manager?

From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 8:46 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

I thought you were using the “migrateVirtualMachineWithVolume” command, but it seems that you are using the “migrateVolume” command from ACS's API.

For the code I debugged, “migrateVirtualMachineWithVolume”, the parameter value 3600 means a timeout of 1 hour.

For “migrateVolume” it is the same: they both end up in “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”, and in that method the parameter is the same.

If your parameter is set to 36000 (10 hours), I do not see why you are getting the exception after 2 hours.

Did you restart the management servers after you changed the parameter?

On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington wrote:

> Here is the full log, including the stack for the exception, that we get
> at the 2 hour mark. As for migratewait, it is set to 36000, which should
> be 10 hours.
>
> 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some more time because this is the current command
> 2015-10-12 18:41:20,457 INFO [c.c.u.e.CSExceptionErrorCode] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
> 2015-10-12 18:41:20,465 WARN [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on Seq 38-996939857: { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] }
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more commands found
> 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
>     at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
>     at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
>     at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
>     at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
>     at org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
>     at com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
>     at com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
>     at com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
>     at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:622)
>     a
Re: Timeout with live migration
.proxy.$Proxy196.migrateVolume(Unknown Source)
>     at org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
>     at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
>     at com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
>     at com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
>     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:701)
> 2015-10-12 18:41:20,479 WARN [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need to delete from object in store ref table
> 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to migrate volume"}
> 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> 2015-10-12 18:41:20,489 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
>
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 8:24 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable] Re: [Questionable] Re: Timeout with live migration
>
> Now I understand what you are doing. I am familiar with that concept (live
> migration of a VM within a cluster, with the VHD being moved from one SR to
> another).
>
> I just got confused when I read "live migration of volumes" (a volume does
> not run by itself, so that is why I asked for some more information).
> > Looking at the source code this is the variable used to control the > timeout: > "long timeout = (_migratewait) * 1000L;" > > The value of "_migratewait" is taken from this parameter: > value = (String) params.get("migratewait"); > _migratewait = NumbersUtil.parseInt(value, 3600); > > Therefore, the name of the parameter to be configured is "migratewait", the > default value is 3600. > > > BTW1: I think that is a terrible parameter name. We should refactor that, > could you open a Jira ticket for that? > > BTW2: that error message you posted does not seem to be related to the > migration timeout; hence, in the code if the copy times out the message > would be: > "Async " + timeout/1000
Re: Timeout with live migration
obManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
2015-10-12 18:41:20,479 WARN [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need to delete from object in store ref table
2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to migrate volume"}
2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
2015-10-12 18:41:20,489 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring

________
From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 8:24 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: [Questionable] Re: Timeout with live migration

Now I understand what you are doing. I am familiar with that concept (live migration of a VM within a cluster, with the VHD being moved from one SR to another). I just got confused when I read "live migration of volumes" (a volume does not run by itself, which is why I asked for some more information).

Looking at the source code, this is the variable used to control the timeout:

    long timeout = (_migratewait) * 1000L;

The value of "_migratewait" is taken from this parameter:

    value = (String) params.get("migratewait");
    _migratewait = NumbersUtil.parseInt(value, 3600);

Therefore, the name of the parameter to configure is "migratewait", and the default value is 3600 seconds.

BTW1: I think that is a terrible parameter name. We should refactor that; could you open a Jira ticket for it?

BTW2: the error message you posted does not seem to be related to the migration timeout, since in the code, if the copy times out, the message would be:

    "Async " + timeout/1000 + " seconds timeout for task " + task.toString()

Maybe that is because it throws a "Types.BadAsyncResult(msg)" and that might be translated into the message you saw, or the message might not be related to the problem itself and you just thought that it was.

Does it help you?
Re: [Questionable] Re: Timeout with live migration
Now I understand what you are doing. I am familiar with that concept (live migration of a VM within a cluster, with the VHD being moved from one SR to another). I just got confused when I read "live migration of volumes" (a volume does not run by itself, which is why I asked for some more information).

Looking at the source code, this is the variable used to control the timeout:

    long timeout = (_migratewait) * 1000L;

The value of "_migratewait" is taken from this parameter:

    value = (String) params.get("migratewait");
    _migratewait = NumbersUtil.parseInt(value, 3600);

Therefore, the name of the parameter to configure is "migratewait", and the default value is 3600 seconds.

BTW1: I think that is a terrible parameter name. We should refactor that; could you open a Jira ticket for it?

BTW2: the error message you posted does not seem to be related to the migration timeout, since in the code, if the copy times out, the message would be:

    "Async " + timeout/1000 + " seconds timeout for task " + task.toString()

Maybe that is because it throws a "Types.BadAsyncResult(msg)" and that might be translated into the message you saw, or the message might not be related to the problem itself and you just thought that it was.

Does it help you?

On Mon, Oct 12, 2015 at 10:00 PM, Ryan Farrington wrote:

> Hypervisor: XenServer
>
> We are moving a data volume from one storage to another without shutting down the VM, because shutting it down would just be silly and a triplication of effort, with the whole copying to secondary storage and then back off again. The volume is staying in the same cluster, just moving to a different primary storage (or SR, in the XenServer vernacular).
>
> If you are familiar with ESX, this is a "Storage vMotion", whereas in XenServer it is called "Storage XenMotion".
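The parameter handling quoted from the source in this message can be sketched in isolation. This is a minimal, self-contained illustration, not the CloudStack code: `MigrateWaitSketch`, `parseIntOrDefault` (a stand-in for `NumbersUtil.parseInt`, whose exact behaviour on bad input is assumed here) and `timeoutMessage` are hypothetical names introduced for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: turning the "migratewait" parameter (seconds) into a timeout in milliseconds. */
class MigrateWaitSketch {
    // Default taken from the quoted line: NumbersUtil.parseInt(value, 3600)
    static final int DEFAULT_MIGRATE_WAIT_SECONDS = 3600;

    // Hypothetical stand-in for NumbersUtil.parseInt(value, defaultValue):
    // assumed to fall back to the default when the value is missing or not a number.
    static int parseIntOrDefault(String value, int defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Mirrors: long timeout = (_migratewait) * 1000L;
    static long timeoutMillis(Map<String, Object> params) {
        String value = (String) params.get("migratewait");
        int migrateWaitSeconds = parseIntOrDefault(value, DEFAULT_MIGRATE_WAIT_SECONDS);
        return migrateWaitSeconds * 1000L;
    }

    // Mirrors the quoted message: "Async " + timeout/1000 + " seconds timeout for task " + task
    static String timeoutMessage(long timeoutMillis, String taskDescription) {
        return "Async " + timeoutMillis / 1000 + " seconds timeout for task " + taskDescription;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        System.out.println(timeoutMillis(params));            // 3600000 (default applies)

        params.put("migratewait", "14400");                   // illustrative value: 4 hours
        System.out.println(timeoutMillis(params));            // 14400000
        System.out.println(timeoutMessage(timeoutMillis(params), "example-task"));
    }
}
```

Note that a timeout produced on this path would be reported in the "Async ... seconds timeout for task ..." form, which is why the "timed out after 7200" message in the log points at a different timeout.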
RE: [Questionable] Re: Timeout with live migration
Hypervisor: XenServer

We are moving a data volume from one storage to another without shutting down the VM, because shutting it down would just be silly and a triplication of effort, with the whole copying to secondary storage and then back off again. The volume is staying in the same cluster, just moving to a different primary storage (or SR, in the XenServer vernacular).

If you are familiar with ESX, this is a "Storage vMotion", whereas in XenServer it is called "Storage XenMotion".

________
From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 7:53 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

What do you mean by live migrating a data volume? I understand live migration of a VM, but volumes...

Do you mean live migrating a VM that has a volume attached? Are you migrating that volume to a different cluster, or just to a different storage in the same cluster? What hypervisor are you using?

--
Rafael Weingärtner
Re: Timeout with live migration
What do you mean by live migrating a data volume? I understand live migration of a VM, but volumes...

Do you mean live migrating a VM that has a volume attached? Are you migrating that volume to a different cluster, or just to a different storage in the same cluster? What hypervisor are you using?

On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington wrote:

> Live migrating a data volume. We are purely on shared storage, so no local storage is involved.

--
Rafael Weingärtner
Re: Timeout with live migration
Live migrating a data volume. We are purely on shared storage, so no local storage is involved.

________
From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 7:37 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

Are you live migrating a VM, or migrating a volume of a stopped VM to a different primary storage?

If it is a running VM, is the VM allocated on shared storage or local storage?

--
Rafael Weingärtner
Re: Timeout with live migration
Are you live migrating a VM, or migrating a volume of a stopped VM to a different primary storage?

If it is a running VM, is the VM allocated on shared storage or local storage?

On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington wrote:

> The slow transfer is related to the storage we are trying to migrate off of. We are capable of getting about 350mbps off the disks, but when we are moving volumes larger than about 500 GB we end up racing the clock and hoping that the migration finishes before the job times out. It would be awesome to be able to manage that timeout. I know there are a ton of settings I just don't know about, and I am hoping someone might be able to point me in the right direction.

--
Rafael Weingärtner
Re: Timeout with live migration
The slow transfer is related to the storage we are trying to migrate off of. We are capable of getting about 350mbps off the disks, but when we are moving volumes larger than about 500 GB we end up racing the clock and hoping that the migration finishes before the job times out. It would be awesome to be able to manage that timeout. I know there are a ton of settings I just don't know about, and I am hoping someone might be able to point me in the right direction.

________
From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 6:40 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

I would first check your NICs' speed and load and the amount of RAM allocated to the migrating VM, and then check the hypervisor log files.

--
Rafael Weingärtner
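The "racing the clock" figures above can be checked with a quick back-of-the-envelope calculation. The sketch below assumes that "350mbps" means 350 megabits per second and that 500 GB is a decimal 500 × 1000 megabytes; neither is stated in the thread, and `MigrationTimeEstimate` is a hypothetical helper written only for this illustration.

```java
/** Sketch: rough transfer-time estimate for a volume copy, compared against a timeout. */
class MigrationTimeEstimate {

    // Assumption: decimal units (1 GB = 1000 MB) and a rate given in megabits per second.
    static double transferSeconds(double volumeGigabytes, double megabitsPerSecond) {
        double megabits = volumeGigabytes * 1000.0 * 8.0;
        return megabits / megabitsPerSecond;
    }

    static boolean exceedsTimeout(double transferSeconds, long timeoutSeconds) {
        return transferSeconds > timeoutSeconds;
    }

    public static void main(String[] args) {
        double seconds = transferSeconds(500.0, 350.0);
        System.out.printf("Estimated transfer time: %.0f s (%.2f h)%n", seconds, seconds / 3600.0);
        System.out.println("Exceeds the 7200 s timeout: " + exceedsTimeout(seconds, 7200L));
    }
}
```

Under those assumptions a 500 GB volume needs about 11,429 seconds (roughly 3.2 hours), which is well past the 7200-second limit seen in the log and consistent with the "few hours" mentioned in the original post.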
Re: Timeout with live migration
We are currently on version 4.3.0. The hypervisor is XenServer. None of the settings is set to 7200 seconds (or to any value that would work out to 7200 seconds), but I have provided them below for reference. Is there any other place where 7200 might be hard-coded?

We are planning an upgrade to 4.5.2 next month, but this migration needs to happen. We have become pretty proficient at the post-migration volume cleanup by manually mucking with the database, but it is annoying and I would much rather have CloudStack just wait like I told it to.

copy.volume.wait = 10800 (3 hours)
job.cancel.threshold.minutes = 60 (1 hour)
job.expire.minutes = 1440 (24 hours)

________
From: Jan-Arve Nygård [jan.arve.nyg...@gmail.com]
Sent: Monday, October 12, 2015 6:19 PM
To: users@cloudstack.apache.org
Subject: [Questionable] Re: Timeout with live migration

What version are you running? Check whether the copy.volume.wait setting is set to 7200 and, if so, increase it. If not, you could also check job.cancel.threshold.minutes and job.expire.minutes.

-Jan-Arve
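The claim that none of these settings works out to 7200 seconds is easy to verify by normalising them to seconds. The following sketch hard-codes the three values listed in this message; `TimeoutSettingsCheck` and its methods are hypothetical names, and the units follow the annotations given above (seconds for copy.volume.wait, minutes for the two job settings).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: normalise the reported settings to seconds and look for the observed 7200 s value. */
class TimeoutSettingsCheck {

    // Values as reported in the message; units follow the "(3 hours)", "(1 hour)" and "(24 hours)" notes.
    static Map<String, Long> settingsInSeconds() {
        Map<String, Long> settings = new LinkedHashMap<>();
        settings.put("copy.volume.wait", 10800L);                  // already in seconds
        settings.put("job.cancel.threshold.minutes", 60L * 60L);   // minutes -> seconds
        settings.put("job.expire.minutes", 1440L * 60L);           // minutes -> seconds
        return settings;
    }

    static boolean anyMatches(Map<String, Long> settings, long observedSeconds) {
        return settings.containsValue(observedSeconds);
    }

    public static void main(String[] args) {
        Map<String, Long> settings = settingsInSeconds();
        settings.forEach((name, seconds) -> System.out.println(name + " = " + seconds + " s"));
        System.out.println("Any setting equal to 7200 s: " + anyMatches(settings, 7200L));
    }
}
```

The three values come out as 10800, 3600 and 86400 seconds, so the 7200-second limit has to originate somewhere else.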
Re: Timeout with live migration
I would first check your NICs' speed and load and the amount of RAM allocated to the migrating VM, and then check the hypervisor log files.

On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård wrote:

> What version are you running? Check whether the copy.volume.wait setting is set to 7200 and, if so, increase it. If not, you could also check job.cancel.threshold.minutes and job.expire.minutes.
>
> -Jan-Arve

--
Rafael Weingärtner
Re: Timeout with live migration
What version are you running? Check whether the copy.volume.wait setting is set to 7200 and, if so, increase it. If not, you could also check job.cancel.threshold.minutes and job.expire.minutes.

-Jan-Arve

2015-10-13 0:46 GMT+02:00 Ryan Farrington :

> We are experiencing a failure in CloudStack waiting for an async job performing a live migration of a volume to finish. I've copied the relevant log entries below. We acknowledge that the migration will take a few hours based on the volume of the data, and we are looking for a way to increase the 7200-second timeout to something we know we can work with.
>
> 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due to Agent:27, com.cloud.exception.OperationTimedoutException: Commands 835325398 to Host 27 timed out after 7200
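When hunting for these failures in a management-server log, it can help to pull the command ID, host ID and timeout value out of each matching entry. The sketch below is one possible way to do that with a regular expression fitted to the log line quoted above; `TimeoutLogParser` is a hypothetical helper, and the pattern is only known to match the message format shown in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract command ID, host ID and timeout (seconds) from an OperationTimedoutException log entry. */
class TimeoutLogParser {
    // Pattern derived from the log line quoted in the thread; other versions may word it differently.
    private static final Pattern TIMEOUT = Pattern.compile(
            "OperationTimedoutException: Commands (\\d+) to Host (\\d+) timed out after (\\d+)");

    /** Returns {commandId, hostId, timeoutSeconds}, or null when the line does not match. */
    static long[] parse(String logLine) {
        Matcher m = TIMEOUT.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = "2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint] "
                + "(Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due to "
                + "Agent:27, com.cloud.exception.OperationTimedoutException: Commands "
                + "835325398 to Host 27 timed out after 7200";
        long[] parsed = parse(line);
        System.out.println("command=" + parsed[0] + " host=" + parsed[1] + " timeout=" + parsed[2] + "s");
    }
}
```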