Re: Timeout with live migration

2015-10-13 Thread Rafael Weingärtner
No problem, you are welcome. It is nice to hear that it worked for you.

Sometimes it is easier to learn how things work by looking at the source code
directly.


Re: Timeout with live migration

2015-10-13 Thread Ryan Farrington
Confirmed.  We migrated a few TB worth of volumes without issue.  Thanks for 
helping nail this down! 



Re: Timeout with live migration

2015-10-13 Thread Rafael Weingärtner
Nice, thanks.
Did that solve your problem? Did you migrate the volume?


Re: Timeout with live migration

2015-10-13 Thread Ryan Farrington
Issue #1) Terrible parameter names
https://issues.apache.org/jira/browse/CLOUDSTACK-8946

Issue #2) Wait value for MigrateVolume
https://issues.apache.org/jira/browse/CLOUDSTACK-8949





Re: Timeout with live migration

2015-10-13 Thread Rafael Weingärtner
That is good. Now I can report to you what is in the code.

Let’s start:
First: when I first looked at the problem, I went straight to the
class that sends commands to Xen, and there ACS uses a parameter
called “migratewait” to control the timeout of the command. You tried that,
and you were still getting the timeout problem.

That happened because, besides that timeout, there is another point in ACS
that controls the timeout of commands that are sent to the hypervisor (not just
Xen this time), and at that point a parameter called “wait” is used as
the default value to control command timeouts.

First conclusion, we have terrible parameter names ;)

Second, when we create a “MigrateVolumeCommand” we should set a timeout
value; this way ACS would not use the default value of the “wait” parameter.
That timeout value should be the same as the one used in CitrixResourceBase
and its children to control the migration of volumes.
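
The fallback described above can be illustrated with a small Python sketch. This is not CloudStack code: the names `Command`, `effective_wait`, and `GLOBAL_SETTINGS` are hypothetical, and the rule "a per-command wait of 0 falls back to the global wait" is an assumption based on this thread (the logged MigrateVolumeCommand carries "wait":0). The values 3600 and 36000 are the ones reported in the thread.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the two global settings discussed in this thread.
GLOBAL_SETTINGS = {"wait": 3600, "migratewait": 36000}


@dataclass
class Command:
    """Illustrative command object; not a CloudStack class."""
    name: str
    wait: int = 0  # 0 is taken to mean "no per-command timeout was set"


def effective_wait(cmd, settings):
    """Return the timeout (seconds) assumed to apply when dispatching cmd.

    Assumption: a command without its own timeout falls back to the
    global "wait" setting, regardless of "migratewait".
    """
    if cmd.wait > 0:
        return cmd.wait
    return settings["wait"]


# A command created without a timeout picks up the global default.
unset = Command("MigrateVolumeCommand")
# The proposed fix: set the command timeout from "migratewait".
fixed = Command("MigrateVolumeCommand", wait=GLOBAL_SETTINGS["migratewait"])

print(effective_wait(unset, GLOBAL_SETTINGS))
print(effective_wait(fixed, GLOBAL_SETTINGS))
```

Under these assumptions, raising "migratewait" alone changes nothing for the first command, which would match the behavior reported earlier in the thread.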

Can you report what happened to you in a Jira ticket and add my comments
there?
I think next Saturday I can have someone working on that for the next ACS
release (4.7?), or even 4.6 if the PR gets accepted.

Please send me the jira ticket as soon as you open it.


Re: Timeout with live migration

2015-10-13 Thread Ryan Farrington
Looks like the change I made actually resulted in a change in behavior.
Prior to the change we were seeing a message every hour stating that the job
agent was waiting; now we see it waited 2 hours and 8 minutes without a peep
before finishing. So making the change to the "wait" parameter is what made the
magic happen.



2015-10-13 13:21:48,281 DEBUG [c.c.a.t.Request] (Job-Executor-1:ctx-8e0ebced 
ctx-f49e7503) Seq 38-1788936343: Sending  { Cmd , MgmtId: 42756806312036, via: 
38(xen-nc-bc2b7), Ver: v1, Flags: 100111, 
[{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":805,"volumePath":"5f990946-d6b5-451e-8e78-2eefc1462253","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
 }
2015-10-13 13:21:48,281 DEBUG [c.c.a.t.Request] (Job-Executor-1:ctx-8e0ebced 
ctx-f49e7503) Seq 38-1788936343: Executing:  { Cmd , MgmtId: 42756806312036, 
via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, 
[{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":805,"volumePath":"5f990946-d6b5-451e-8e78-2eefc1462253","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
 }
2015-10-13 13:21:48,282 DEBUG [c.c.a.m.DirectAgentAttache] 
(DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Executing request
2015-10-13 15:27:13,396 DEBUG [c.c.a.m.DirectAgentAttache] 
(DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Response Received:
2015-10-13 15:27:13,397 DEBUG [c.c.a.t.Request] (DirectAgent-430:ctx-ac6d7aeb) 
Seq 38-1788936343: Processing:  { Ans: , MgmtId: 42756806312036, via: 38, Ver: 
v1, Flags: 110, 
[{"com.cloud.agent.api.storage.MigrateVolumeAnswer":{"volumePath":"00db15be-3ccd-4648-8928-35ca90924d7c","result":true,"wait":0}}]
 }
2015-10-13 15:27:13,397 DEBUG [c.c.a.m.AgentAttache] 
(DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: No more commands found
2015-10-13 15:27:13,397 DEBUG [c.c.a.t.Request] (Job-Executor-1:ctx-8e0ebced 
ctx-f49e7503) Seq 38-1788936343: Received:  { Ans: , MgmtId: 42756806312036, 
via: 38, Ver: v1, Flags: 110, { MigrateVolumeAnswer } }
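
For anyone who wants to check how long the command ran from a log like this one, here is a minimal Python sketch. It assumes the timestamp layout seen in the lines above (date, time, then comma-separated milliseconds); the helper name `elapsed_seconds` is made up for illustration. The two timestamps are the "Executing request" and "Response Received" lines.

```python
from datetime import datetime

# Timestamp layout as it appears in the log lines above (milliseconds after the comma).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# "Executing request" and "Response Received" timestamps from the log above.
seconds = elapsed_seconds("2015-10-13 13:21:48,282", "2015-10-13 15:27:13,396")
hours, rest = divmod(int(seconds), 3600)
print(hours, "h", rest // 60, "min", rest % 60, "s")
```

The result is a little over two hours, which is past the point where the earlier attempts timed out.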




Re: Timeout with live migration

2015-10-13 Thread Rafael Weingärtner
Let’s wait and see whether anything else is messing with that timeout. Then
I will send you the details to put into the Jira ticket.


Re: Timeout with live migration

2015-10-13 Thread Ryan Farrington
Rafael,
I am still a bit confused as to what you would like me to place in the
JIRA ticket. I have adjusted the "wait" parameter and will be able to test it
in about an hour. But I would think the JIRA ticket should be as detailed as I
can make it, or will you be adding details once I have it created?





From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Tuesday, October 13, 2015 7:52 AM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: [Questionable] Re: Timeout with live migration

I guess so. For some reason that I do not understand, the code is
multiplying the value of that parameter by 2; something like 18000 should
do the trick.

On Tue, Oct 13, 2015 at 12:15 AM, Ryan Farrington  wrote:

> Yes, I can open JIRA tickets. What would you like me to do?
>
> I'll be happy to change the "wait" parameter.  Do I assume it should be
> 1/2 of the value I want it to be?
>
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 10:12 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> There is your problem: there are currently two distinct values controlling
> those async jobs.
> Change that value and everything will work for you.
> Can you open a Jira ticket?
>
> On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington <
> rfarring...@remitdata.com
> > wrote:
>
> > wait is currently configured to be 3600
> >
> >
> >
> > 
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 9:46 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > I found something odd.
> > Can you check the parameter called "wait"? What value is it using?
> >
> > On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington <
> > rfarring...@remitdata.com
> > > wrote:
> >
> > > Yes, the parameter was set long ago and the management server has been
> > > restarted numerous times over the past few days as we played with other
> > > parameters to no effect.
> > >
> > > After looking at the log a little more, does the "Failed to send command,
> > > due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > > 996939857 to Host 38 timed out after 7200" mean that the migration start
> > > command is being sent in some kind of synchronous mode and not returning
> > > control back to the job manager?
> > >
> > >
> > >
> > >
> > > 
> > > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > > Sent: Monday, October 12, 2015 8:46 PM
> > > To: users@cloudstack.apache.org
> > > Subject: [Questionable]  Re: Timeout with live migration
> > >
> > > > I thought you were using the command “migrateVirtualMachineWithVolume”, but it
> > > > seems that you are using the “migrateVolume” command from ACS's API.
> > > >
> > > >
> > > > In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
> > > > value of 3600 means a timeout of 1 hour.
> > > >
> > > > For “migrateVolume” it is the same; they both end up in
> > > > “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> > > > and in that method the parameter is the same.
> > > >
> > > >
> > > > If your parameter is set to 36000 (10 hours), I do not see why you are
> > > > getting the exception after 2 hours.
> > > >
> > > > Did you restart the management servers after you changed the parameter?
> > >
> > > On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <
> > > rfarring...@remitdata.com
> > > > wrote:
> > >
> > > > Here is the full log, including the stack for the exception, that we
> > get
> > > > at the 2 hour mark. as for the migratewait it is set to 36000 which
> > > should
> > > > be 10 hours.
> > > >
> > > > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> > > > (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> > > > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting
> > > some
> > > > more time because this is the current command
>

Re: [Questionable] Re: Timeout with live migration

2015-10-13 Thread Rafael Weingärtner
I guess so. For some reason that I do not understand, the code is
multiplying the value of that parameter by 2, so something like 18000 should
do the trick.
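As a rough illustration of that arithmetic, here is a minimal sketch. It is based only on the values reported in this thread ("wait" set to 3600, timeout observed after 7200) and not on the CloudStack source; the function names and the constant are made up for the example.

```python
# Assumption taken from this thread only: the effective command timeout is the
# global "wait" setting multiplied by 2 (wait=3600 gave "timed out after 7200").
WAIT_MULTIPLIER = 2


def effective_timeout(wait_seconds: int) -> int:
    """Timeout in seconds that results from a given "wait" value."""
    return wait_seconds * WAIT_MULTIPLIER


def wait_for_target(target_seconds: int) -> int:
    """Smallest "wait" value whose doubled timeout reaches the target."""
    # Ceiling division, so odd targets are rounded up rather than cut short.
    return -(-target_seconds // WAIT_MULTIPLIER)


if __name__ == "__main__":
    print(effective_timeout(3600))   # the setup discussed in this thread
    print(wait_for_target(36000))    # "wait" needed for a 10 hour timeout
```

With the numbers from this thread, a "wait" of 3600 corresponds to the 2 hour timeout that was observed, and a 10 hour target corresponds to a "wait" of 18000.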

On Tue, Oct 13, 2015 at 12:15 AM, Ryan Farrington  wrote:

> Yes, I can open JIRA tickets. What would you like me to do?
>
> I'll be happy to change the "wait" parameter.  Do I assume it should be
> 1/2 of the value I want it to be?
>
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 10:12 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> There is your problem: there are currently two distinct values controlling
> those async jobs.
> Change that value and everything will work for you.
> Can you open a JIRA ticket?
>
> On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington <
> rfarring...@remitdata.com
> > wrote:
>
> > wait is currently configured to be 3600
> >
> >
> >
> > 
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 9:46 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > I found something odd,
> > can you check the parameter called "wait", what value is it using ?
> >
> > On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington <
> > rfarring...@remitdata.com
> > > wrote:
> >
> > > Yes, the parameter was set long ago and the management server has been
> > > restarted numerous times over the past few days as we played with other
> > > parameters to no effect.
> > >
> > > After looking at the log a little more, does the "Failed to send command,
> > > due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > > 996939857 to Host 38 timed out after 7200" mean that the migration start
> > > command is being sent in some kind of synchronous mode and not returning
> > > control back to the job manager?
> > >
> > >
> > >
> > >
> > > 
> > > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > > Sent: Monday, October 12, 2015 8:46 PM
> > > To: users@cloudstack.apache.org
> > > Subject: [Questionable]  Re: Timeout with live migration
> > >
> > > > I thought you were using the command “migrateVirtualMachineWithVolume”, but it
> > > > seems that you are using the “migrateVolume” command from ACS's API.
> > > >
> > > >
> > > > In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
> > > > value of 3600 means a timeout of 1 hour.
> > > >
> > > > For “migrateVolume” it is the same; they both end up in
> > > > “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> > > > and in that method the parameter is the same.
> > > >
> > > >
> > > > If your parameter is set to 36000 (10 hours), I do not see why you are
> > > > getting the exception after 2 hours.
> > > >
> > > > Did you restart the management servers after you changed the parameter?
> > >
> > > On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <
> > > rfarring...@remitdata.com
> > > > wrote:
> > >
> > > > Here is the full log, including the stack for the exception, that we
> > get
> > > > at the 2 hour mark. as for the migratewait it is set to 36000 which
> > > should
> > > > be 10 hours.
> > > >
> > > > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> > > > (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> > > > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting
> > > some
> > > > more time because this is the current command
> > > > 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
> > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> > > > com.cloud.exception.OperationTimedoutException in error code list for
> > > > exceptions
> > > > 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
> > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed
> out
> > > on
> > > > Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, v

Re: [Questionable] Re: Timeout with live migration

2015-10-13 Thread Jakub Kublik

Hi Ryan,
We hit the same problem a few days ago. After we changed the parameters
migratewait, storage.pool.max.waitseconds, vm.op.cancel.interval and
vm.op.cleanup.wait to 36000 (10 h), we could migrate large volumes (500 GB
and more).
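For anyone who wants to script this change, the sketch below builds one `updateConfiguration` API query string per setting. It is only an illustration: the helper name is made up, the endpoint and the usual API request signing are deliberately left out, and nothing is sent anywhere.

```python
from urllib.parse import urlencode

# 10 hours expressed in seconds, the value used in the message above.
TEN_HOURS = 10 * 60 * 60

# The four global settings named in the message above.
SETTINGS = [
    "migratewait",
    "storage.pool.max.waitseconds",
    "vm.op.cancel.interval",
    "vm.op.cleanup.wait",
]


def update_configuration_queries(names, value):
    """Return one unsigned updateConfiguration query string per setting.

    Hypothetical helper for illustration; a real call also needs the
    management server URL, an API key and a request signature.
    """
    return [
        urlencode({"command": "updateConfiguration", "name": name, "value": value})
        for name in names
    ]


if __name__ == "__main__":
    for query in update_configuration_queries(SETTINGS, TEN_HOURS):
        print(query)
```

Keeping the list of setting names in one place makes it easy to apply the same value to all of them, which is what was done here.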

We use XenServer 6.5 and ACS 4.5.1.
Regards,
Kuba

On 2015-10-13 at 05:15, Ryan Farrington wrote:

Yes, I can open JIRA tickets. What would you like me to do?

I'll be happy to change the "wait" parameter.  Do I assume it should be 1/2 of
the value I want it to be?




From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 10:12 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

There is your problem: there are currently two distinct values controlling
those async jobs.
Change that value and everything will work for you.
Can you open a JIRA ticket?

On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington wrote:
wait is currently configured to be 3600




From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 9:46 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

I found something odd.
Can you check the parameter called "wait"? What value is it using?

On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:
Yes, the parameter was set long ago and the management server has been
restarted numerous times over the past few days as we played with other
parameters to no effect.

After looking at the log a little more, does the "Failed to send command,
due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
996939857 to Host 38 timed out after 7200" mean that the migration start
command is being sent in some kind of synchronous mode and not returning
control back to the job manager?





From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 8:46 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

I thought you were using the command “migrateVirtualMachineWithVolume”, but it
seems that you are using the “migrateVolume” command from ACS's API.


In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
value of 3600 means a timeout of 1 hour.

For “migrateVolume” it is the same; they both end up in
“com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
and in that method the parameter is the same.


If your parameter is set to 36000 (10 hours), I do not see why you are
getting the exception after 2 hours.

Did you restart the management servers after you changed the parameter?

On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:
Here is the full log, including the stack for the exception, that we get
at the 2 hour mark. as for the migratewait it is set to 36000 which should
be 10 hours.

2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
(DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some
more time because this is the current command
2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions
2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on
Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
Ver: v1, Flags: 100111,
[{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more
commands found
2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to
Agent:38, com.cloud.exception.OperationTimedoutException: Commands
996939857 to Host 38 timed out after 7200
2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy]
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
due to A

RE: [Questionable] Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
Yes, I can open JIRA tickets. What would you like me to do?

I'll be happy to change the "wait" parameter.  Do I assume it should be 1/2 of
the value I want it to be?




From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 10:12 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

There is your problem: there are currently two distinct values controlling
those async jobs.
Change that value and everything will work for you.
Can you open a JIRA ticket?

On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington  wrote:

> wait is currently configured to be 3600
>
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 9:46 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> I found something odd,
> can you check the parameter called "wait", what value is it using ?
>
> On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington <
> rfarring...@remitdata.com
> > wrote:
>
> > Yes, the parameter was set long ago and the management server has been
> > restarted numerous times over the past few days as we played with other
> > parameters to no effect.
> >
> > After looking at the log a little more, does the "Failed to send command,
> > due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > 996939857 to Host 38 timed out after 7200" mean that the migration start
> > command is being sent in some kind of synchronous mode and not returning
> > control back to the job manager?
> >
> >
> >
> >
> > ____
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 8:46 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > I thought you were using the command “migrateVirtualMachineWithVolume”, but it
> > seems that you are using the “migrateVolume” command from ACS's API.
> >
> >
> > In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
> > value of 3600 means a timeout of 1 hour.
> >
> > For “migrateVolume” it is the same; they both end up in
> > “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> > and in that method the parameter is the same.
> >
> >
> > If your parameter is set to 36000 (10 hours), I do not see why you are
> > getting the exception after 2 hours.
> >
> > Did you restart the management servers after you changed the parameter?
> >
> > On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <
> > rfarring...@remitdata.com
> > > wrote:
> >
> > > Here is the full log, including the stack for the exception, that we
> get
> > > at the 2 hour mark. as for the migratewait it is set to 36000 which
> > should
> > > be 10 hours.
> > >
> > > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> > > (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> > > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting
> > some
> > > more time because this is the current command
> > > 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list for
> > > exceptions
> > > 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out
> > on
> > > Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via:
> 38(xen-nc-bc2b7),
> > > Ver: v1, Flags: 100111,
> > >
> >
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> > > }
> > > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857:
> Cancelling.
> > > 2015-10-12 18:41:2

Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
There is your problem: there are currently two distinct values controlling
those async jobs.
Change that value and everything will work for you.
Can you open a JIRA ticket?

On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington  wrote:

> wait is currently configured to be 3600
>
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 9:46 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> I found something odd,
> can you check the parameter called "wait", what value is it using ?
>
> On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington <
> rfarring...@remitdata.com
> > wrote:
>
> > Yes, the parameter was set long ago and the management server has been
> > restarted numerous times over the past few days as we played with other
> > parameters to no effect.
> >
> > After looking at the log a little more, does the "Failed to send command,
> > due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > 996939857 to Host 38 timed out after 7200" mean that the migration start
> > command is being sent in some kind of synchronous mode and not returning
> > control back to the job manager?
> >
> >
> >
> >
> > 
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 8:46 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > I thought you were using the command “migrateVirtualMachineWithVolume”, but it
> > seems that you are using the “migrateVolume” command from ACS's API.
> >
> >
> > In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
> > value of 3600 means a timeout of 1 hour.
> >
> > For “migrateVolume” it is the same; they both end up in
> > “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> > and in that method the parameter is the same.
> >
> >
> > If your parameter is set to 36000 (10 hours), I do not see why you are
> > getting the exception after 2 hours.
> >
> > Did you restart the management servers after you changed the parameter?
> >
> > On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <
> > rfarring...@remitdata.com
> > > wrote:
> >
> > > Here is the full log, including the stack for the exception, that we
> get
> > > at the 2 hour mark. as for the migratewait it is set to 36000 which
> > should
> > > be 10 hours.
> > >
> > > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> > > (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> > > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting
> > some
> > > more time because this is the current command
> > > 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> > > com.cloud.exception.OperationTimedoutException in error code list for
> > > exceptions
> > > 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out
> > on
> > > Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via:
> 38(xen-nc-bc2b7),
> > > Ver: v1, Flags: 100111,
> > >
> >
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> > > }
> > > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857:
> Cancelling.
> > > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more
> > > commands found
> > > 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due
> > to
> > > Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > >

Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
wait is currently configured to be 3600




From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 9:46 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

I found something odd.
Can you check the parameter called "wait"? What value is it using?

On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington  wrote:

> Yes, the parameter was set long ago and the management server has been
> restarted numerous times over the past few days as we played with other
> parameters to no effect.
>
> After looking at the log a little more, does the "Failed to send command,
> due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> 996939857 to Host 38 timed out after 7200" mean that the migration start
> command is being sent in some kind of synchronous mode and not returning
> control back to the job manager?
>
>
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 8:46 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> I thought you were using the command “migrateVirtualMachineWithVolume”, but it
> seems that you are using the “migrateVolume” command from ACS's API.
>
>
> In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
> value of 3600 means a timeout of 1 hour.
>
> For “migrateVolume” it is the same; they both end up in
> “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> and in that method the parameter is the same.
>
>
> If your parameter is set to 36000 (10 hours), I do not see why you are
> getting the exception after 2 hours.
>
> Did you restart the management servers after you changed the parameter?
>
> On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <
> rfarring...@remitdata.com
> > wrote:
>
> > Here is the full log, including the stack for the exception, that we get
> > at the 2 hour mark. as for the migratewait it is set to 36000 which
> should
> > be 10 hours.
> >
> > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> > (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting
> some
> > more time because this is the current command
> > 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list for
> > exceptions
> > 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out
> on
> > Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
> > Ver: v1, Flags: 100111,
> >
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> > }
> > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more
> > commands found
> > 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due
> to
> > Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > 996939857 to Host 38 timed out after 7200
> > 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> > com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
> > due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > 996939857 to Host 38 timed out after 7200
> > at
> >
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
> > at
> >
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
> > at
> >
> org.apache.cloudstack.storage.motion.AncientDataMotionSt

Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
I found something odd.
Can you check the parameter called "wait"? What value is it using?

On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington  wrote:

> Yes, the parameter was set long ago and the management server has been
> restarted numerous times over the past few days as we played with other
> parameters to no effect.
>
> After looking at the log a little more, does the "Failed to send command,
> due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> 996939857 to Host 38 timed out after 7200" mean that the migration start
> command is being sent in some kind of synchronous mode and not returning
> control back to the job manager?
>
>
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 8:46 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> I thought you were using the command “migrateVirtualMachineWithVolume”, but it
> seems that you are using the “migrateVolume” command from ACS's API.
>
>
> In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
> value of 3600 means a timeout of 1 hour.
>
> For “migrateVolume” it is the same; they both end up in
> “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> and in that method the parameter is the same.
>
>
> If your parameter is set to 36000 (10 hours), I do not see why you are
> getting the exception after 2 hours.
>
> Did you restart the management servers after you changed the parameter?
>
> On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <
> rfarring...@remitdata.com
> > wrote:
>
> > Here is the full log, including the stack for the exception, that we get
> > at the 2 hour mark. as for the migratewait it is set to 36000 which
> should
> > be 10 hours.
> >
> > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> > (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting
> some
> > more time because this is the current command
> > 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> > com.cloud.exception.OperationTimedoutException in error code list for
> > exceptions
> > 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out
> on
> > Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
> > Ver: v1, Flags: 100111,
> >
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> > }
> > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more
> > commands found
> > 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due
> to
> > Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > 996939857 to Host 38 timed out after 7200
> > 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy]
> > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> > com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
> > due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> > 996939857 to Host 38 timed out after 7200
> > at
> >
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
> > at
> >
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
> > at
> >
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
> > at
> >
> org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
> > at
> >
> org.apache.cloudstack.storage.volume.VolumeServiceImpl

Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
Yes, the parameter was set long ago and the management server has been restarted
numerous times over the past few days as we played with other parameters to no
effect.

After looking at the log a little more, does the "Failed to send command, due to
Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to
Host 38 timed out after 7200" mean that the migration start command is being
sent in some kind of synchronous mode and not returning control back to the job
manager?
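For reference, the timeout figure can be pulled out of that log message mechanically, which helps when comparing it against the configured settings. The sketch below is based only on the single message quoted above; the regular expression and the field names are illustrative assumptions, not a documented log format.

```python
import re

# The message quoted above, copied from the management server log.
LOG_MESSAGE = (
    "Failed to send command, due to Agent:38, "
    "com.cloud.exception.OperationTimedoutException: Commands 996939857 "
    "to Host 38 timed out after 7200"
)

# Assumed pattern, derived from this one message only.
TIMEOUT_RE = re.compile(r"Commands (\d+) to Host (\d+) timed out after (\d+)")


def parse_timeout(message):
    """Return command sequence, host id and timeout, or None if no match."""
    match = TIMEOUT_RE.search(message)
    if match is None:
        return None
    command_seq, host_id, seconds = (int(group) for group in match.groups())
    return {
        "command_seq": command_seq,
        "host_id": host_id,
        "timeout_seconds": seconds,
        "timeout_hours": seconds / 3600,
    }


if __name__ == "__main__":
    print(parse_timeout(LOG_MESSAGE))
```

The extracted value of 7200 seconds matches the 2 hour mark at which the exception shows up, and not the 36000 configured for migratewait.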





From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 8:46 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

I thought you were using the command “migrateVirtualMachineWithVolume”, but it
seems that you are using the “migrateVolume” command from ACS's API.


In the code I debugged for “migrateVirtualMachineWithVolume”, a parameter
value of 3600 means a timeout of 1 hour.

For “migrateVolume” it is the same; they both end up in
“com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
and in that method the parameter is the same.


If your parameter is set to 36000 (10 hours), I do not see why you are
getting the exception after 2 hours.

Did you restart the management servers after you changed the parameter?

On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington  wrote:

> Here is the full log, including the stack for the exception, that we get
> at the 2 hour mark. as for the migratewait it is set to 36000 which should
> be 10 hours.
>
> 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing request
> 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some
> more time because this is the current command
> 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
> 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on
> Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
> Ver: v1, Flags: 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more
> commands found
> 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to
> Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
> due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> 996939857 to Host 38 timed out after 7200
> at
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
> at
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
> at
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
> at
> org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
> at
> org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
> at
> com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
> at
> com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
> at
> com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
> at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:622)
> a

Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
.proxy.$Proxy196.migrateVolume(Unknown Source)
> at
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
> at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
> at
> com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
> at
> com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at
> com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
> at
> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:701)
> 2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object
> (VOLUME,
> org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f),
> no need to delete from object in store ref table
> 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume
> failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send
> command, due to Agent:38, com.cloud.exception.OperationTimedoutException:
> Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED,
> resultCode: 530, result:
> org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
> to migrate volume"}
> 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Job-Executor-63:ctx-f7b6817d) Done executing
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> 2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
> (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 8:24 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: [Questionable] Re: Timeout with live migration
>
> Now I understand what you are doing. I am familiar with that concept (live
> migration of a VM within a cluster, with the VHD being moved from one SR to
> another).
>
> I just got confused when I read "live migration of volumes" (a volume does
> not run by itself, which is why I asked for a little more information).
>
> Looking at the source code this is the variable used to control the
> timeout:
> "long timeout = (_migratewait) * 1000L;"
>
> The value of "_migratewait" is taken from this parameter:
> value = (String) params.get("migratewait");
> _migratewait = NumbersUtil.parseInt(value, 3600);
>
> Therefore, the name of the parameter to configure is "migratewait", and its
> default value is 3600 seconds.
>
>
> BTW1: I think that is a terrible parameter name. We should refactor it;
> could you open a Jira ticket for that?
>
> BTW2: the error message you posted does not seem to be related to the
> migration timeout, because in the code, if the copy times out, the message
> would be:
> "Async " + timeout/1000 

Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
obManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
at 
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] 
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, 
org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need 
to delete from object in store ref table
2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] 
(Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume 
failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command, 
due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 
996939857 to Host 38 timed out after 7200
2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] 
(Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, 
resultCode: 530, result: 
org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
 to migrate volume"}
2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] 
(Job-Executor-63:ctx-f7b6817d) Done executing 
org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] 
(Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring











________
From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 8:24 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: [Questionable] Re: Timeout with live migration

Now I understand what you are doing. I am familiar with that concept (live
migration of a VM within a cluster, with the VHD being moved from one SR to
another).

I just got confused when I read "live migration of volumes" (a volume does
not run by itself, which is why I asked for a little more information).

Looking at the source code this is the variable used to control the timeout:
"long timeout = (_migratewait) * 1000L;"

The value of "_migratewait" is taken from this parameter:
value = (String) params.get("migratewait");
_migratewait = NumbersUtil.parseInt(value, 3600);

Therefore, the name of the parameter to configure is "migratewait", and its
default value is 3600 seconds.


BTW1: I think that is a terrible parameter name. We should refactor it;
could you open a Jira ticket for that?

BTW2: the error message you posted does not seem to be related to the
migration timeout, because in the code, if the copy times out, the message
would be:
"Async " + timeout/1000 + " seconds timeout for task " + task.toString()

Maybe it throws a "Types.BadAsyncResult(msg)" that gets translated into the
message you posted, or maybe your message is not related to this timeout at
all and you just thought that it was.


Does that help you?


On Mon, Oct 12, 2015 at 10:00 PM, Ryan Farrington  wrote:

> Hypervisor:  XenServer
>
> We are moving a data volume from one storage to another without shutting
> down the VM, because shutting it down would just be silly and a
> triplication of effort, with the whole copying to secondary storage and
> then back off again. The volume is staying in the same cluster, just moving
> to a different primary storage (or SR in the XenServer vernacular).
>
> If you are familiar with ESX, this is a "Storage vMotion", whereas in
> XenServer it is called "Storage XenMotion".
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 7:53 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> what do you mean by live migrating a data volume?!
> I understand a live migration of a VM, but volumes...
>
> do you mean live migrating a VM that has a volume attached?
> are you migrating that volume to a different cluster? or just a different
> sto

Re: [Questionable] Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
Now I understand what you are doing. I am familiar with that concept (live
migration of a VM within a cluster, with the VHD being moved from one SR to
another).

I just got confused when I read "live migration of volumes" (a volume does
not run by itself, which is why I asked for a little more information).

Looking at the source code this is the variable used to control the timeout:
"long timeout = (_migratewait) * 1000L;"

The value of "_migratewait" is taken from this parameter:
value = (String) params.get("migratewait");
_migratewait = NumbersUtil.parseInt(value, 3600);

Therefore, the name of the parameter to configure is "migratewait", and its
default value is 3600 seconds.
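
A minimal Java sketch of the arithmetic described above, for anyone who wants to see it end to end. It only mirrors the lines quoted from the source: parseIntOrDefault is a hypothetical stand-in for NumbersUtil.parseInt (assumed here to fall back to the default when the value is missing or not a number), and the class name and the params map are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: mirrors the quoted lines, not the actual CloudStack class.
class MigrateWaitSketch {

    // Hypothetical stand-in for NumbersUtil.parseInt(value, defaultValue):
    // returns the default when the value is missing or not a number.
    static int parseIntOrDefault(String value, int defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Same arithmetic as: long timeout = (_migratewait) * 1000L;
    static long timeoutMillis(Map<String, Object> params) {
        String value = (String) params.get("migratewait");
        int migrateWait = parseIntOrDefault(value, 3600);
        return migrateWait * 1000L;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        // Parameter not set: the default of 3600 seconds applies.
        System.out.println(timeoutMillis(params));

        // Example value only (4 hours); pick whatever fits your migration.
        params.put("migratewait", "14400");
        System.out.println(timeoutMillis(params));
    }
}
```

The point of the sketch is that the configured value is in seconds and is converted to milliseconds only at the point of use, so an unset or malformed "migratewait" silently falls back to one hour.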


BTW1: I think that is a terrible parameter name. We should refactor it;
could you open a Jira ticket for that?

BTW2: the error message you posted does not seem to be related to the
migration timeout, because in the code, if the copy times out, the message
would be:
"Async " + timeout/1000 + " seconds timeout for task " + task.toString()

Maybe it throws a "Types.BadAsyncResult(msg)" that gets translated into the
message you posted, or maybe your message is not related to this timeout at
all and you just thought that it was.


Does that help you?


On Mon, Oct 12, 2015 at 10:00 PM, Ryan Farrington  wrote:

> Hypervisor:  XenServer
>
> We are moving a data volume from one storage to another without shutting
> down the VM, because shutting it down would just be silly and a
> triplication of effort, with the whole copying to secondary storage and
> then back off again. The volume is staying in the same cluster, just moving
> to a different primary storage (or SR in the XenServer vernacular).
>
> If you are familiar with ESX, this is a "Storage vMotion", whereas in
> XenServer it is called "Storage XenMotion".
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 7:53 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> What do you mean by live migrating a data volume?
> I understand a live migration of a VM, but volumes...
>
> Do you mean live migrating a VM that has a volume attached?
> Are you migrating that volume to a different cluster, or just to a
> different storage in the same cluster?
> What hypervisor are you using?
>
>
> On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington <
> rfarring...@remitdata.com>
> wrote:
>
> > Live migrating a data volume. We are purely on shared storage so no local
> > storage is involved.
> >
> > ____
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 7:37 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > Are you live migrating a VM, or migrating a volume of a stopped VM to a
> > different primary storage?
> >
> > If it is a running VM, is the VM allocated in a shared storage or local
> > storage?
> >
> > On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington <
> > rfarring...@remitdata.com>
> > wrote:
> >
> > > > The slow transfer is related to the storage we are trying to migrate
> > > > off of.  We are capable of getting about 350mbps off the disks but
> > > > when we are moving volumes that are greater than about 500GB we end up
> > > > racing the clock and hoping that the migration finishes before the job
> > > > times out.   It would be awesome to be able to manage that timeout and
> > > > I know there are a ton of settings I just don't know about and am
> > > > hoping someone might be able to point me in the right direction.
> > >
> > >
> > > 
> > > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > > Sent: Monday, October 12, 2015 6:40 PM
> > > To: users@cloudstack.apache.org
> > > Subject: [Questionable]  Re: Timeout with live migration
> > >
> > > > I would first check your NICs' speed and load, the amount of RAM
> > > > allocated for the migrating VM and then check the hypervisor log files.
> > >
> > > On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <
> > > jan.arve.nyg...@gmail.com>
> > > wrote:
> > >
> > > > > What version are you running? Check if the copy.volume.wait setting
> > > > > is set to 7200 and increase it. If not you could also check
> > > > job.cancel.threshold.minutes and job.

RE: [Questionable] Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
Hypervisor:  XenServer

We are moving a data volume from one storage to another without shutting down
the VM, because shutting it down would just be silly and a triplication of
effort, with the whole copying to secondary storage and then back off again.
The volume is staying in the same cluster, just moving to a different primary
storage (or SR in the XenServer vernacular).

If you are familiar with ESX, this is a "Storage vMotion", whereas in XenServer
it is called "Storage XenMotion".


From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 7:53 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

What do you mean by live migrating a data volume?
I understand a live migration of a VM, but volumes...

Do you mean live migrating a VM that has a volume attached?
Are you migrating that volume to a different cluster, or just to a different
storage in the same cluster?
What hypervisor are you using?


On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington 
wrote:

> Live migrating a data volume. We are purely on shared storage so no local
> storage is involved.
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 7:37 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> Are you live migrating a VM, or migrating a volume of a stopped VM to a
> different primary storage?
>
> If it is a running VM, is the VM allocated in a shared storage or local
> storage?
>
> On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington <
> rfarring...@remitdata.com>
> wrote:
>
> > The slow transfer is related to the storage we are trying to migrate off
> > of.  We are capable of getting about 350mbps off the disks but when we
> > are moving volumes that are greater than about 500GB we end up racing the
> > clock and hoping that the migration finishes before the job times out.
> > It would be awesome to be able to manage that timeout and I know there
> > are a ton of settings I just don't know about and am hoping someone might
> > be able to point me in the right direction.
> >
> >
> > 
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 6:40 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > > I would first check your NICs' speed and load, the amount of RAM
> > > allocated for the migrating VM and then check the hypervisor log files.
> >
> > On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <
> > jan.arve.nyg...@gmail.com>
> > wrote:
> >
> > > > What version are you running? Check if the copy.volume.wait setting is
> > > > set to 7200 and increase it. If not you could also check
> > > > job.cancel.threshold.minutes and job.expire.minutes.
> > >
> > > -Jan-Arve
> > >
> > > 2015-10-13 0:46 GMT+02:00 Ryan Farrington :
> > >
> > > > > We are experiencing a failure in cloudstack waiting for an async job
> > > > > performing a live migration of a volume to finish. I've copied the
> > > > > relevant log entries below. We acknowledge that the migration will
> > > > > take a few hours based on the volume of the data and we are looking
> > > > > for a way to increase the timeout of 7200 seconds into something we
> > > > > know we can work with.
> > > > >
> > > > >
> > > > > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > > > > (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command,
> > > > > due to Agent:27, com.cloud.exception.OperationTimedoutException:
> > > > > Commands 835325398 to Host 27 timed out after 7200
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Rafael Weingärtner
> >
>
>
>
> --
> Rafael Weingärtner
>



--
Rafael Weingärtner


Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
What do you mean by live migrating a data volume?
I understand a live migration of a VM, but volumes...

Do you mean live migrating a VM that has a volume attached?
Are you migrating that volume to a different cluster, or just to a different
storage in the same cluster?
What hypervisor are you using?


On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington 
wrote:

> Live migrating a data volume. We are purely on shared storage so no local
> storage is involved.
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 7:37 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> Are you live migrating a VM, or migrating a volume of a stopped VM to a
> different primary storage?
>
> If it is a running VM, is the VM allocated in a shared storage or local
> storage?
>
> On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington <
> rfarring...@remitdata.com>
> wrote:
>
> > The slow transfer is related to the storage we are trying to migrate off
> > of.  We are capable of getting about 350mbps off the disks but when we
> > are moving volumes that are greater than about 500GB we end up racing the
> > clock and hoping that the migration finishes before the job times out.
> > It would be awesome to be able to manage that timeout and I know there
> > are a ton of settings I just don't know about and am hoping someone might
> > be able to point me in the right direction.
> >
> >
> > 
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 6:40 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > > I would first check your NICs' speed and load, the amount of RAM
> > > allocated for the migrating VM and then check the hypervisor log files.
> >
> > On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <
> > jan.arve.nyg...@gmail.com>
> > wrote:
> >
> > > > What version are you running? Check if the copy.volume.wait setting is
> > > > set to 7200 and increase it. If not you could also check
> > > > job.cancel.threshold.minutes and job.expire.minutes.
> > >
> > > -Jan-Arve
> > >
> > > 2015-10-13 0:46 GMT+02:00 Ryan Farrington :
> > >
> > > > > We are experiencing a failure in cloudstack waiting for an async job
> > > > > performing a live migration of a volume to finish. I've copied the
> > > > > relevant log entries below. We acknowledge that the migration will
> > > > > take a few hours based on the volume of the data and we are looking
> > > > > for a way to increase the timeout of 7200 seconds into something we
> > > > > know we can work with.
> > > > >
> > > > >
> > > > > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > > > > (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command,
> > > > > due to Agent:27, com.cloud.exception.OperationTimedoutException:
> > > > > Commands 835325398 to Host 27 timed out after 7200
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Rafael Weingärtner
> >
>
>
>
> --
> Rafael Weingärtner
>



-- 
Rafael Weingärtner


Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
Live migrating a data volume. We are purely on shared storage so no local 
storage is involved.  


From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 7:37 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

Are you live migrating a VM, or migrating a volume of a stopped VM to a
different primary storage?

If it is a running VM, is the VM allocated on shared storage or on local
storage?

On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington 
wrote:

> The slow transfer is related to the storage we are trying to migrate off
> of.  We are capable of getting about 350mbps off the disks but when we are
> moving volumes that are greater than about 500GB we end up racing the clock
> and hoping that the migration finishes before the job times out.   It would
> be awesome to be able to manage that timeout and I know there are a ton of
> settings I just don't know about and am hoping someone might be able to
> point me in the right direction.
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 6:40 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> I would first check your NICs' speed and load, the amount of RAM allocated
> for the migrating VM and then check the hypervisor log files.
>
> On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <
> jan.arve.nyg...@gmail.com>
> wrote:
>
> > What version are you running? Check if the copy.volume.wait setting is
> > set to 7200 and increase it. If not you could also check
> > job.cancel.threshold.minutes and job.expire.minutes.
> >
> > -Jan-Arve
> >
> > 2015-10-13 0:46 GMT+02:00 Ryan Farrington :
> >
> > > We are experiencing a failure in cloudstack waiting for an async job
> > > performing a live migration of a volume to finish. I've copied the
> > > relevant log entries below. We acknowledge that the migration will take
> > > a few hours based on the volume of the data and we are looking for a way
> > > to increase the timeout of 7200 seconds into something we know we can
> > > work with.
> > >
> > >
> > > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > > (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due
> > > to Agent:27, com.cloud.exception.OperationTimedoutException: Commands
> > > 835325398 to Host 27 timed out after 7200
> > >
> > >
> > >
> >
>
>
>
> --
> Rafael Weingärtner
>



--
Rafael Weingärtner


Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
Are you live migrating a VM, or migrating a volume of a stopped VM to a
different primary storage?

If it is a running VM, is the VM allocated on shared storage or on local
storage?

On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington 
wrote:

> The slow transfer is related to the storage we are trying to migrate off
> of.  We are capable of getting about 350mbps off the disks but when we are
> moving volumes that are greater than about 500GB we end up racing the clock
> and hoping that the migration finishes before the job times out.   It would
> be awesome to be able to manage that timeout and I know there are a ton of
> settings I just don't know about and am hoping someone might be able to
> point me in the right direction.
>
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 6:40 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> I would first check your NICs' speed and load, the amount of RAM allocated
> for the migrating VM and then check the hypervisor log files.
>
> On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <
> jan.arve.nyg...@gmail.com>
> wrote:
>
> > What version are you running? Check if the copy.volume.wait setting is
> > set to 7200 and increase it. If not you could also check
> > job.cancel.threshold.minutes and job.expire.minutes.
> >
> > -Jan-Arve
> >
> > 2015-10-13 0:46 GMT+02:00 Ryan Farrington :
> >
> > > We are experiencing a failure in cloudstack waiting for an async job
> > > performing a live migration of a volume to finish. I've copied the
> > > relevant log entries below. We acknowledge that the migration will take
> > > a few hours based on the volume of the data and we are looking for a way
> > > to increase the timeout of 7200 seconds into something we know we can
> > > work with.
> > >
> > >
> > > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > > (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due
> > > to Agent:27, com.cloud.exception.OperationTimedoutException: Commands
> > > 835325398 to Host 27 timed out after 7200
> > >
> > >
> > >
> >
>
>
>
> --
> Rafael Weingärtner
>



-- 
Rafael Weingärtner


Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
The slow transfer is related to the storage we are trying to migrate off of.
We can get about 350mbps off the disks, but when we move volumes larger than
about 500GB we end up racing the clock and hoping that the migration finishes
before the job times out. It would be awesome to be able to manage that
timeout. I know there are a ton of settings I just don't know about, and I am
hoping someone can point me in the right direction.
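
A rough back-of-the-envelope check of why volumes of this size race the 7200-second timeout. This is only a sketch under assumptions the message does not state: that "350mbps" means 350 megabits per second sustained, and that 1 GB is 10^9 bytes. The class and method names are made up for illustration.

```java
// Illustrative estimate only; the units are assumptions, not measured values.
class TransferEstimate {

    // Seconds needed to move sizeGb gigabytes (10^9 bytes each)
    // at rateMbps megabits per second.
    static double secondsToTransfer(double sizeGb, double rateMbps) {
        double bits = sizeGb * 1e9 * 8;
        return bits / (rateMbps * 1e6);
    }

    // Sustained rate (megabits per second) needed to move sizeGb gigabytes
    // within timeoutSeconds.
    static double requiredMbps(double sizeGb, long timeoutSeconds) {
        double bits = sizeGb * 1e9 * 8;
        return bits / timeoutSeconds / 1e6;
    }

    public static void main(String[] args) {
        System.out.printf("500 GB at 350 Mbps takes about %.0f seconds%n",
                secondsToTransfer(500, 350));
        System.out.printf("500 GB within 7200 seconds needs about %.0f Mbps%n",
                requiredMbps(500, 7200));
    }
}
```

Under these assumptions a 500 GB volume needs well over three hours, so either the timeout has to grow or the transfer rate does. If "mbps" was meant as megabytes per second the numbers change by a factor of eight, so treat this as a template for plugging in your own figures.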



From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 6:40 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

I would first check your NICs' speed and load, the amount of RAM allocated
for the migrating VM and then check the hypervisor log files.

On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård 
wrote:

> What version are you running? Check if the copy.volume.wait setting is set
> to 7200 and increase it. If not you could also check
> job.cancel.threshold.minutes and job.expire.minutes.
>
> -Jan-Arve
>
> 2015-10-13 0:46 GMT+02:00 Ryan Farrington :
>
> > We are experiencing a failure in cloudstack waiting for an async job
> > performing a live migration of a volume to finish. I've copied the
> > relevant log entries below. We acknowledge that the migration will take a
> > few hours based on the volume of the data and we are looking for a way to
> > increase the timeout of 7200 seconds into something we know we can work
> > with.
> >
> >
> > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due
> > to Agent:27, com.cloud.exception.OperationTimedoutException: Commands
> > 835325398 to Host 27 timed out after 7200
> >
> >
> >
>



--
Rafael Weingärtner


Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
We are currently on version 4.3.0.  Hypervisor is XenServer. None of the
settings are set to 7200 seconds (or any variation that would yield 7200
seconds), but I have provided them below as a reference. Is there any other
place where 7200 might be hard coded? We are planning on an upgrade to 4.5.2
next month, but this migration needs to happen. We have become pretty
proficient at the post-migration volume cleanup by manually mucking with the
database, but it is annoying, and I would much rather have CloudStack just wait
like I told it to.

copy.volume.wait = 10800 (3 hours)
job.cancel.threshold.minutes = 60 (1 hour)
job.expire.minutes = 1440 (24 hours)
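
To make the "none of these yields 7200" check explicit, the three values can be put on a common scale. A small sketch, assuming (as the annotations above suggest) that copy.volume.wait is expressed in seconds and the two job.* settings in minutes; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: normalizes the three settings listed above to seconds.
class SettingsInSeconds {

    // The timeout reported in the log line ("timed out after 7200").
    static final long OBSERVED_TIMEOUT_SECONDS = 7200L;

    static Map<String, Long> normalized() {
        Map<String, Long> seconds = new LinkedHashMap<>();
        seconds.put("copy.volume.wait", 10800L);                  // already in seconds
        seconds.put("job.cancel.threshold.minutes", 60L * 60);    // minutes to seconds
        seconds.put("job.expire.minutes", 1440L * 60);            // minutes to seconds
        return seconds;
    }

    // True if any listed setting, once converted, equals the observed timeout.
    static boolean anyMatchesObserved() {
        return normalized().containsValue(OBSERVED_TIMEOUT_SECONDS);
    }

    public static void main(String[] args) {
        normalized().forEach((name, value) ->
                System.out.println(name + " = " + value + " s"));
        System.out.println("any setting equals 7200 s: " + anyMatchesObserved());
    }
}
```

None of the three converts to 7200 seconds, which is consistent with the 7200 coming from somewhere other than these settings.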





From: Jan-Arve Nygård [jan.arve.nyg...@gmail.com]
Sent: Monday, October 12, 2015 6:19 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

What version are you running? Check if the copy.volume.wait setting is set
to 7200 and increase it. If not, you could also check
job.cancel.threshold.minutes and job.expire.minutes.

-Jan-Arve

2015-10-13 0:46 GMT+02:00 Ryan Farrington :

> We are experiencing a failure in cloudstack waiting for an async job
> performing a live migration of a volume to finish. I've copied the relevant
> log entries below. We acknowledge that the migration will take a few hours
> based on the volume of the data and we are looking for a way to increase
> the timeout of 7200 seconds into something we know we can work with.
>
>
> 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due to
> Agent:27, com.cloud.exception.OperationTimedoutException: Commands
> 835325398 to Host 27 timed out after 7200
>
>
>


Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
I would first check your NICs' speed and load, the amount of RAM allocated
for the migrating VM and then check the hypervisor log files.

On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård 
wrote:

> What version are you running? Check if the copy.volume.wait setting is set
> to 7200 and increase it. If not you could also check
> job.cancel.threshold.minutes and job.expire.minutes.
>
> -Jan-Arve
>
> 2015-10-13 0:46 GMT+02:00 Ryan Farrington :
>
> > We are experiencing a failure in cloudstack waiting for an async job
> > performing a live migration of a volume to finish. I've copied the
> > relevant log entries below. We acknowledge that the migration will take a
> > few hours based on the volume of the data and we are looking for a way to
> > increase the timeout of 7200 seconds into something we know we can work
> > with.
> >
> >
> > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due
> > to Agent:27, com.cloud.exception.OperationTimedoutException: Commands
> > 835325398 to Host 27 timed out after 7200
> >
> >
> >
>



-- 
Rafael Weingärtner


Re: Timeout with live migration

2015-10-12 Thread Jan-Arve Nygård
What version are you running? Check if the copy.volume.wait setting is set
to 7200 and increase it. If not, you could also check
job.cancel.threshold.minutes and job.expire.minutes.

-Jan-Arve

2015-10-13 0:46 GMT+02:00 Ryan Farrington :

> We are experiencing a failure in cloudstack waiting for an async job
> performing a live migration of a volume to finish. I've copied the relevant
> log entries below. We acknowledge that the migration will take a few hours
> based on the volume of the data and we are looking for a way to increase
> the timeout of 7200 seconds into something we know we can work with.
>
>
> 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint]
> (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due to
> Agent:27, com.cloud.exception.OperationTimedoutException: Commands
> 835325398 to Host 27 timed out after 7200
>
>
>