Re: [Questionable] Re: Timeout with live migration

2015-10-13 Thread Rafael Weingärtner
> > > > at com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
> > > > at com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
> > > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> > > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> > > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> > > > at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
> > > > at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
> > > > at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> > > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> > > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> > > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> > > > at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > at java.lang.Thread.run(Thread.java:701)
> > > > 2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need to delete from object in store ref table
> > > > 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
> > > > 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to migrate volume"}
> > > > 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> > > > 2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
> > > >
> > > > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > > > Sent: Monday, October 12, 2015 8:24 PM
> > > > To: users@cloudstack.apac

Re: [Questionable] Re: Timeout with live migration

2015-10-13 Thread Jakub Kublik
f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring

From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 8:24 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: [Questionable] Re: Timeout with live migration

Now I understand what you are doing. I am familiar with that concept (live
migration of a VM within a cluster, with the VHD being moved from one SR to
another).

I just got confused when I read "live migration of volumes" (a volume does
not run by itself, so that is why I asked for a little more information).

Looking at the source code, this is the variable used to control the timeout:
"long timeout = (_migratewait) * 1000L;"

The value of "_migratewait" is taken from this parameter:
value = (String) params.get("migratewait");
_migratewait = NumbersUtil.parseInt(value, 3600);

Therefore, the name of the parameter to be configured is "migratewait", and
the default value is 3600.


BTW1: I think that is a terrible parameter name. We should refactor that;
could you open a Jira ticket for that?

BTW2: the error message you posted does not seem to be related to the
migration timeout, because in the code, if the copy times out, the message
would be:
"Async " + timeout/1000 + " seconds timeout for task " + task.toString()

Maybe it throws a "Types.BadAsyncResult(msg)" and that gets translated into
the message you saw, or the message might not be related to the problem
itself, and you just thought that it was.


Does that help?


On Mon, Oct 12, 2015 at 10:00 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:

Hypervisor:  XenServer

We are moving a data volume from one storage onto another without shutting
down the VM, because that would just be silly and a triplication of effort
with the whole copying to secondary storage and then back off again. The
volume is staying in the same cluster, just moving to a different Primary
storage (or SR in the XenServer vernacular).

If you are familiar with ESX, this is a "Storage VMotion", whereas in
XenServer it is called "Storage XenMotion".


From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 7:53 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

What do you mean by live migrating a data volume?!
I understand a live migration of a VM, but volumes...

Do you mean live migrating a VM that has a volume attached?
Are you migrating that volume to a different cluster, or just to a different
storage in the same cluster?
What hypervisor are you using?


On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:

Live migrating a data volume. We are purely on shared storage so no local
storage is involved.


From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 7:37 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

Are you live migrating a VM, or migrating a volume of a stopped VM to a
different primary storage?

If it is a running VM, is the VM allocated in a shared storage or local
storage?

On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington <rfarring...@remitdata.com> wrote:


The slow transfer is related to the storage we are trying to migrate off
of.  We are capable of getting about 350mbps off the disks but when we are
moving volumes that are greater than about 500GB we end up racing the clock
and hoping that the migration finishes before the job times out.  It would
be awesome to be able to manage that timeout and I know there are a ton of
settings I just don't know about and am hoping someone might be able to
point me in the right direction.
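
The race against the clock described here can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope aid: it assumes "350mbps" means 350 megabits per second and that sizes use decimal units (1 GB = 8000 megabits); neither assumption is confirmed in this thread, and the class and method names are made up for the example.

```java
// Back-of-the-envelope estimate; the unit interpretation is an assumption.
class TransferEstimate {

    // Seconds needed to move sizeGb gigabytes at rateMbit megabits per second,
    // using decimal units (1 GB = 8000 megabits).
    static double secondsFor(double sizeGb, double rateMbit) {
        return sizeGb * 8000.0 / rateMbit;
    }

    // Largest volume (in GB) that can be moved within timeoutSeconds at the given rate.
    static double maxGbWithin(long timeoutSeconds, double rateMbit) {
        return timeoutSeconds * rateMbit / 8000.0;
    }

    public static void main(String[] args) {
        System.out.printf("500 GB at 350 Mbit/s: %.0f s%n", secondsFor(500, 350));
        System.out.printf("Fits in 7200 s: %.0f GB%n", maxGbWithin(7200, 350));
    }
}
```

Under those assumptions a 500 GB volume needs roughly 11,400 seconds, and only about 315 GB fits inside a 7200-second window. If the figure was meant as megabytes per second, the transfer would be eight times faster.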



From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 6:40 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

I would first check your NICs' speed and load and the amount of RAM allocated
for the migrating VM, and then check the hypervisor log files.

On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <jan.arve.nyg...@gmail.com> wrote:

What version are you running? Check if the copy.volume.wait setting is set
to 7200 and increase it. If not you could also check
job.cancel.threshold.minutes and job.expire.minutes.
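
The settings named here do not share a unit: the two job.* settings are in minutes, as their names say, while copy.volume.wait is assumed below to be in seconds (matching the 7200 in the log). A small sketch of a sanity check before starting a long migration follows. It assumes all three values must exceed the expected migration time, which this thread does not confirm, and the class, method, and example values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags settings that look too small for a planned migration.
class TimeoutCheck {

    // Returns the names of settings whose value is below the expected duration.
    // copyVolumeWaitSec is assumed to be in seconds; the job.* values are in minutes.
    static List<String> tooSmall(long expectedSec, long copyVolumeWaitSec,
                                 long jobCancelThresholdMin, long jobExpireMin) {
        List<String> result = new ArrayList<>();
        if (copyVolumeWaitSec < expectedSec) {
            result.add("copy.volume.wait");
        }
        if (jobCancelThresholdMin * 60 < expectedSec) {
            result.add("job.cancel.threshold.minutes");
        }
        if (jobExpireMin * 60 < expectedSec) {
            result.add("job.expire.minutes");
        }
        return result;
    }

    public static void main(String[] args) {
        // Example values are made up: a 4-hour migration against a 7200 s wait.
        System.out.println(tooSmall(14400, 7200, 60, 1440));
    }
}
```

Any setting the check reports would be a candidate to raise before retrying.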

-Jan-Arve

2015-10-13 0:46 GMT+02:00 Ryan Farrington <rfarring...@remitdata.com>:

We are experiencing a failure in cloudstack waiting for an async job perf

RE: [Questionable] Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
> > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> > > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> > > at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > at java.lang.Thread.run(Thread.java:701)
> > > 2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no need to delete from object in store ref table
> > > 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after 7200
> > > 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed to migrate volume"}
> > > 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d) Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> > > 2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
> > >
> > > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > > Sent: Monday, October 12, 2015 8:24 PM
> > > To: users@cloudstack.apache.org
> > > Subject: [Questionable]  Re: [Questionable] Re: Timeout with live migration
> > >
> > > Now I understand what you are doing. I am familiar with that concept (live
> > > migration of a VM within a cluster, with the VHD being moved from one SR to
> > > another).
> > >
> > > I just got confused when I read "live migration of volumes" (a volume does
> > > not run by itself, so that is why I asked for a little more information).
> > >
> > > Looking at the source code, this is the variable used to control the
> > > timeout:
> > > "long timeout = (_migratewait) * 1000L;"
> > >
> > > The value of "_migratewait" is taken from this parameter:
> > > value = (String) params.get("migratewait");
> > > _migratewait = NumbersUtil.parseInt(value, 3600);
> > >
> > > Therefore, the name of the parameter to be configured is "migratewait", and
> > > the default value is 3600.
> > >
> > >
> > > BTW1: I think that is a terrible parameter name. We should refactor that;
> > > could you open a Jira ticket for that?
> > >
> > > BTW2: the error message you posted does not seem to be related to the
> > > migration timeout, because in the code, if the copy times out, the message
> > > would be:
> > > "Async " + timeout/1000 + " seconds timeout for task " + task.toString()
> > >
> > > Maybe it throws a "Types.BadAsyncResult(msg)" and that gets translated into
> > > the message you saw, or the message might not be related to the problem
> > > itself, and you just thought that it was.
> > >
> 

Re: [Questionable] Re: Timeout with live migration

2015-10-12 Thread Rafael Weingärtner
Now I understand what you are doing. I am familiar with that concept (live
migration of a VM within a cluster, with the VHD being moved from one SR to
another).

I just got confused when I read "live migration of volumes" (a volume does
not run by itself, so that is why I asked for a little more information).

Looking at the source code, this is the variable used to control the timeout:
"long timeout = (_migratewait) * 1000L;"

The value of "_migratewait" is taken from this parameter:
value = (String) params.get("migratewait");
_migratewait = NumbersUtil.parseInt(value, 3600);

Therefore, the name of the parameter to be configured is "migratewait", and
the default value is 3600.
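
To make the flow concrete, here is a minimal, self-contained sketch of the same parsing. It is not the actual CloudStack class: `parseIntOrDefault` is a hypothetical stand-in for `NumbersUtil.parseInt` so that the example compiles on its own, and the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: mirrors the quoted lines, not the actual CloudStack class.
class MigrateWaitSketch {

    // Hypothetical stand-in for NumbersUtil.parseInt(value, defaultValue):
    // returns the default when the value is missing or not a number.
    static int parseIntOrDefault(String value, int defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Reads "migratewait" and multiplies by 1000L,
    // as in: long timeout = (_migratewait) * 1000L;
    static long timeoutMillis(Map<String, Object> params) {
        String value = (String) params.get("migratewait");
        int migrateWait = parseIntOrDefault(value, 3600);
        return migrateWait * 1000L;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        System.out.println(timeoutMillis(params));   // parameter absent: default applies
        params.put("migratewait", "14400");
        System.out.println(timeoutMillis(params));   // explicit value is used instead
    }
}
```

With no "migratewait" entry the default of 3600 is used; setting the parameter replaces it.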


BTW1: I think that is a terrible parameter name. We should refactor that;
could you open a Jira ticket for that?

BTW2: the error message you posted does not seem to be related to the
migration timeout, because in the code, if the copy times out, the message
would be:
"Async " + timeout/1000 + " seconds timeout for task " + task.toString()

Maybe it throws a "Types.BadAsyncResult(msg)" and that gets translated into
the message you saw, or the message might not be related to the problem
itself, and you just thought that it was.


Does that help?


On Mon, Oct 12, 2015 at 10:00 PM, Ryan Farrington wrote:

> Hypervisor:  XenServer
>
> We are moving a data volume from one storage onto another without shutting
> down the VM, because that would just be silly and a triplication of effort
> with the whole copying to secondary storage and then back off again. The
> volume is staying in the same cluster, just moving to a different Primary
> storage (or SR in the XenServer vernacular).
>
> If you are familiar with ESX, this is a "Storage VMotion", whereas in
> XenServer it is called "Storage XenMotion".
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 7:53 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> What do you mean by live migrating a data volume?!
> I understand a live migration of a VM, but volumes...
>
> Do you mean live migrating a VM that has a volume attached?
> Are you migrating that volume to a different cluster, or just to a different
> storage in the same cluster?
> What hypervisor are you using?
>
>
> On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington <
> rfarring...@remitdata.com>
> wrote:
>
> > Live migrating a data volume. We are purely on shared storage so no local
> > storage is involved.
> >
> > 
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 7:37 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > Are you live migrating a VM, or migrating a volume of a stopped VM to a
> > different primary storage?
> >
> > If it is a running VM, is the VM allocated in a shared storage or local
> > storage?
> >
> > On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington <
> > rfarring...@remitdata.com>
> > wrote:
> >
> > > The slow transfer is related to the storage we are trying to migrate off
> > > of.  We are capable of getting about 350mbps off the disks but when we are
> > > moving volumes that are greater than about 500GB we end up racing the clock
> > > and hoping that the migration finishes before the job times out.  It would
> > > be awesome to be able to manage that timeout and I know there are a ton of
> > > settings I just don't know about and am hoping someone might be able to
> > > point me in the right direction.
> > >
> > >
> > > 
> > > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > > Sent: Monday, October 12, 2015 6:40 PM
> > > To: users@cloudstack.apache.org
> > > Subject: [Questionable]  Re: Timeout with live migration
> > >
> > > I would first check your NICs' speed and load and the amount of RAM
> > > allocated for the migrating VM, and then check the hypervisor log files.
> > >
> > > On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <
> > > jan.arve.nyg...@gmail.com>
> > > wrote:
> > >
> > > > What version are you running? Check if the copy.volume.wait setting is
> > > > set to 7200 and increase it. If not you could also check
> > > > job.cancel.threshold.minutes and job.expire.minutes.
> > > >
> > > > -Jan-Arve
> > > >
> > > > 2015-10-13 0:46 GMT+02:00 Ryan Farrington:
> > > >
> > > > > We are experiencing a failure in CloudStack waiting for an async job
> > > > > performing a live migration of a volume to finish. I've copied the
> > > > > relevant log entries below. We acknowledge that the migration will take
> > > > > a few hours based on the volume of the data, and we are looking for a
> > > > > way to increase the timeout of 7200 seconds to something we know we can
> > > > > work with.
> > > > >
> > > > >
> > > > > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.Remote

RE: [Questionable] Re: Timeout with live migration

2015-10-12 Thread Ryan Farrington
Hypervisor:  XenServer

We are moving a data volume from one storage onto another without shutting down
the VM, because that would just be silly and a triplication of effort with the
whole copying to secondary storage and then back off again. The volume is
staying in the same cluster, just moving to a different Primary storage (or SR
in the XenServer vernacular).

If you are familiar with ESX, this is a "Storage VMotion", whereas in XenServer
it is called "Storage XenMotion".


From: Rafael Weingärtner [rafaelweingart...@gmail.com]
Sent: Monday, October 12, 2015 7:53 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

What do you mean by live migrating a data volume?!
I understand a live migration of a VM, but volumes...

Do you mean live migrating a VM that has a volume attached?
Are you migrating that volume to a different cluster, or just to a different
storage in the same cluster?
What hypervisor are you using?


On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington wrote:

> Live migrating a data volume. We are purely on shared storage so no local
> storage is involved.
>
> 
> From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> Sent: Monday, October 12, 2015 7:37 PM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> Are you live migrating a VM, or migrating a volume of a stopped VM to a
> different primary storage?
>
> If it is a running VM, is the VM allocated in a shared storage or local
> storage?
>
> On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington <
> rfarring...@remitdata.com>
> wrote:
>
> > The slow transfer is related to the storage we are trying to migrate off
> > of.  We are capable of getting about 350mbps off the disks but when we are
> > moving volumes that are greater than about 500GB we end up racing the clock
> > and hoping that the migration finishes before the job times out.  It would
> > be awesome to be able to manage that timeout and I know there are a ton of
> > settings I just don't know about and am hoping someone might be able to
> > point me in the right direction.
> >
> >
> > 
> > From: Rafael Weingärtner [rafaelweingart...@gmail.com]
> > Sent: Monday, October 12, 2015 6:40 PM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: Timeout with live migration
> >
> > I would first check your NICs' speed and load and the amount of RAM
> > allocated for the migrating VM, and then check the hypervisor log files.
> >
> > On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård <
> > jan.arve.nyg...@gmail.com>
> > wrote:
> >
> > > What version are you running? Check if the copy.volume.wait setting is
> > > set to 7200 and increase it. If not you could also check
> > > job.cancel.threshold.minutes and job.expire.minutes.
> > >
> > > -Jan-Arve
> > >
> > > 2015-10-13 0:46 GMT+02:00 Ryan Farrington:
> > >
> > > > We are experiencing a failure in CloudStack waiting for an async job
> > > > performing a live migration of a volume to finish. I've copied the
> > > > relevant log entries below. We acknowledge that the migration will take
> > > > a few hours based on the volume of the data, and we are looking for a
> > > > way to increase the timeout of 7200 seconds to something we know we can
> > > > work with.
> > > >
> > > >
> > > > 2015-10-12 00:19:36,043 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-62:ctx-802065a9 ctx-bb27a168) Failed to send command, due to Agent:27, com.cloud.exception.OperationTimedoutException: Commands 835325398 to Host 27 timed out after 7200
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Rafael Weingärtner
> >
>
>
>
> --
> Rafael Weingärtner
>



--
Rafael Weingärtner