Re: FS on cloudStack createSnapshot synchronization improvement

Alena Prokharchyk Mon, 15 Oct 2012 09:45:27 -0700

On 10/15/12 4:27 AM, "Nitin Mehta" <[email protected]> wrote:


>Hi Alena - Why is it that we prefer hosts on which the vm is/was running
>for creating snapshots - do we see performance benefits in any way, or was
>it a convenient way of choosing the host?


Anthony is the best person to answer this question; this logic was in
place before I put in my changes. As far as I know, in the VMware case
(according to Kelven), the createSnapshot command is less likely to fail
when it's executed on the host where the vm resides.


>For performance benefits, can we not choose the host in the cluster which
>has the least load (in terms of create snapshot jobs or some better
>metric) so as to balance the load across the cluster?

Agree with you on that. But it would be a much bigger change; we can
implement it later.
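
For reference, the least-load idea could be sketched roughly as below. This
is only an illustration, not existing cloudStack code: the class name, the
method, and the per-host job count map are all hypothetical, and "load" is
assumed to mean the number of snapshot jobs currently in flight on a host.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of cloudStack: picks the host in a cluster
// with the fewest snapshot jobs currently in flight.
final class LeastLoadedHostPicker {

    /**
     * @param clusterHostIds     ids of the candidate hosts in the cluster
     * @param activeSnapshotJobs host id -> snapshot jobs in flight;
     *                           hosts missing from the map count as idle
     * @return the least-loaded host id, or empty if there are no candidates
     */
    static Optional<Long> pick(Collection<Long> clusterHostIds,
                               Map<Long, Integer> activeSnapshotJobs) {
        return clusterHostIds.stream()
                .min(Comparator
                        .comparingInt((Long id) -> activeSnapshotJobs.getOrDefault(id, 0))
                        // Break ties by host id so the choice is deterministic.
                        .thenComparing(Comparator.naturalOrder()));
    }
}
```

Note that a picker like this would have to run before the job is queued, so
that the host chosen here is also the object the job is synchronized on.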

>I guess this would also help in first fit vm deployment algorithms to not
>overload the host and starve the create snapshot jobs ?
>
>Also, can we not have per-hypervisor config params for
>concurrent.snapshots.threshold.perhost? The snapshot creation
>implementation is drastically different across hypervisors, so a single
>value might not be the best idea here.

Yes, you can file an enhancement for this.
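
A possible shape for such an enhancement, as a rough sketch only: look up a
hypervisor-specific key first and fall back to the global one. Only
concurrent.snapshots.threshold.perhost comes from this thread; the
per-hypervisor key suffix and the class below are assumptions made for
illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the "<global key>.<hypervisor>" naming is an
// assumption, not an existing cloudStack configuration parameter.
final class SnapshotThresholdConfig {
    static final String GLOBAL_KEY = "concurrent.snapshots.threshold.perhost";

    private final Map<String, String> settings;

    SnapshotThresholdConfig(Map<String, String> settings) {
        this.settings = new HashMap<>(settings);
    }

    /** Returns the threshold for the hypervisor, or null when none is configured. */
    Integer thresholdFor(String hypervisorType) {
        String specificKey = GLOBAL_KEY + "." + hypervisorType.toLowerCase(Locale.ROOT);
        String specific = settings.get(specificKey);
        // Fall back to the global value when no per-hypervisor value is set.
        String value = (specific != null) ? specific : settings.get(GLOBAL_KEY);
        return (value == null) ? null : Integer.valueOf(value.trim());
    }
}
```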

>
>Thanks,
>-Nitin
>
>-----Original Message-----
>From: Alena Prokharchyk [mailto:[email protected]]
>Sent: Monday, October 15, 2012 9:24 AM
>To: [email protected]; Anthony Xu
>Subject: Re: FS on cloudStack createSnapshot synchronization improvement
>
>Anthony, I implemented the threshold logic in the API layer, in the
>SyncQueueJob manager. In other words, before submitting the job for
>execution, we should know the host the job will go to first - that is the
>object we are synchronizing on.
>For createSnapshot it's always the host where the vm 1) is running (for a
>Running vm) or 2) last ran (for a Stopped vm). Only when the command fails
>on the initial host do we retry on other hosts in the cluster. So it would
>work like this:
>
>1) api call is made
>2) Before submitting the async job to the queue, we figure out the host
>id (getHostIdForSnapshotOperation method in SnapshotManagerImpl). Let's
>say the id of the host is 1.
>3) The job is submitted with object to sync on = "host id=1".
>4) Once the job is ready to execute, it goes to the snapshot manager,
>which sends the command to host id=1 first. If it fails for some reason,
>it gets resent to another host in the cluster (if one exists). In this
>failure scenario we don't do any synchronization. We've decided not to
>handle this error case because it won't happen in most cases.
>
>I've checked the code for the other commands you've mentioned; the host is
>always picked randomly from the list of hosts in the cluster. So we can't
>apply the same logic unless we fix the code to pick the same host in
>step 2) and step 4) without making callbacks from SnapshotManager to the
>SyncQueueManager.
>
>I would appreciate any suggestions on how to implement it.
>
>Thank you,
>Alena.
>
>From: Anthony Xu <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>To: "[email protected]" <[email protected]>
>Subject: RE: FS on cloudStack createSnapshot synchronization improvement
>
>Several commands need this kind of threshold, e.g. move volume and
>create template from snapshot, so this is a common requirement, not only
>for createSnapshot.
>Can we add a threshold mechanism to the host command queue to resolve this
>issue?
>
>
>Anthony
>
>-----Original Message-----
>From: Edison Su [mailto:[email protected]]
>Sent: Thursday, October 11, 2012 4:42 PM
>To: [email protected]
>Subject: RE: FS on cloudStack createSnapshot synchronization improvement
>
>I only have one comment:
>  Can we move this snapshot improvement code out of SnapshotManager?
>
>-----Original Message-----
>From: Alena Prokharchyk [mailto:[email protected]]
>Sent: Tuesday, October 09, 2012 11:51 AM
>To: [email protected]
>Subject: FS on cloudStack createSnapshot synchronization improvement
>
>Hi All,
>I'm planning to introduce some changes to the create snapshot behavior
>for the future cloudStack release (the changes will go to the asf/master
>branch). The fix addresses the problem described below:
>"With the current code for snapshots, cloudStack always creates the
>snapshot on the host where the vm is running (for vms in Running state) or
>on the host where the vm last ran (for vms in Stopped state). As the
>createSnapshot commands are not synchronized on the agent side, sending
>multiple commands to the backend at the same time can lead to performance
>issues on the hypervisor side. As a result, there is a high possibility
>that the createSnapshot command will time out on the Xen side.
>The solution is to limit the number of concurrent snapshots on a per-host
>basis. The threshold should be configurable, as the customer usually knows
>how many snapshots at a time the backend can handle.
>While the concurrent snapshots are being processed by the backend, all
>subsequent snapshot commands scheduled for execution on the same host
>should wait in the queue."
>Here is the feature FS available for the review:
>https://cwiki.apache.org/confluence/display/CLOUDSTACK/Snapshot+improvements+FS
>If you have any comments/suggestions/questions on the implementation,
>please let me know.
>-Alena.
>
>
>
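
To illustrate the per-host threshold described in the quoted proposal, here
is a minimal sketch. It is not the SyncQueueManager implementation: it swaps
in a plain java.util.concurrent.Semaphore per host id, so callers block
instead of waiting in a job queue, and every name in it is hypothetical. The
host id is assumed to be resolved before the job is submitted (as with
getHostIdForSnapshotOperation), and retries on other hosts are left
unsynchronized, as in the thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: limits how many snapshot jobs run at once per host.
final class PerHostSnapshotGate {
    private final int threshold;
    private final ConcurrentMap<Long, Semaphore> permitsByHost = new ConcurrentHashMap<>();

    PerHostSnapshotGate(int threshold) {
        this.threshold = threshold;
    }

    /** Runs the job once a slot is free on the given host; waits otherwise. */
    <T> T runOnHost(long hostId, Supplier<T> snapshotJob) {
        // One fair semaphore per host, so waiting jobs proceed in arrival order.
        Semaphore permits = permitsByHost.computeIfAbsent(
                hostId, id -> new Semaphore(threshold, true));
        permits.acquireUninterruptibly();
        try {
            return snapshotJob.get();
        } finally {
            // Always free the slot, even if the job throws.
            permits.release();
        }
    }

    /** Free slots on the host; a host that was never used has all slots free. */
    int availablePermits(long hostId) {
        Semaphore permits = permitsByHost.get(hostId);
        return (permits == null) ? threshold : permits.availablePermits();
    }
}
```

With a threshold of 2, a third job submitted for the same host would wait
until one of the two running jobs finishes, while jobs for other hosts are
unaffected.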
