Hi Alena - Why do we prefer the host on which the VM is (or was last) running 
for creating snapshots? Do we see any performance benefit, or was it just a 
convenient way of choosing the host?
If performance is the goal, could we not choose the host in the cluster with 
the least load (in terms of create snapshot jobs, or some better metric) so as 
to balance the load across the cluster?
I guess this would also keep first-fit VM deployment algorithms from 
overloading a host and starving the create snapshot jobs.
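
A minimal sketch of the least-load selection suggested above, assuming the 
management server can count the in-flight create snapshot jobs per host. The 
class name, method name and map layout are hypothetical and not part of 
cloudStack:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not an existing cloudStack class.
class LeastLoadedHostPicker {
    // Key: host id. Value: number of create snapshot jobs currently
    // running on that host (assumed to be tracked elsewhere).
    static Optional<Long> pick(Map<Long, Integer> activeJobsByHost) {
        Long bestHost = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Map.Entry<Long, Integer> e : activeJobsByHost.entrySet()) {
            long hostId = e.getKey();
            int load = e.getValue();
            // Prefer the lower load; break ties on the lower host id so
            // the choice is deterministic.
            if (bestHost == null || load < bestLoad
                    || (load == bestLoad && hostId < bestHost)) {
                bestHost = hostId;
                bestLoad = load;
            }
        }
        return Optional.ofNullable(bestHost); // empty if the cluster has no hosts
    }
}
```

The job count is only one possible metric; any per-host load figure could be 
put in the map instead.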

Also, could we have separate concurrent.snapshots.threshold.perhost config 
params per hypervisor type? The snapshot creation implementation differs 
drastically between hypervisors, so a single value might not be the best idea 
here.
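
As a rough sketch of what a per-hypervisor setting could look like: the 
suffixed key (for example concurrent.snapshots.threshold.perhost.xenserver) 
and the lookup class below are assumptions for illustration only; the one name 
taken from this thread is concurrent.snapshots.threshold.perhost.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup, not an existing cloudStack class.
class SnapshotThresholdConfig {
    static final String GLOBAL_KEY = "concurrent.snapshots.threshold.perhost";

    // Tries "<GLOBAL_KEY>.<hypervisor>" first (an assumed naming scheme),
    // then the global key, then the supplied default.
    static int thresholdFor(String hypervisor, Map<String, String> config,
                            int defaultValue) {
        String specificKey = GLOBAL_KEY + "." + hypervisor.toLowerCase(Locale.ROOT);
        String value = config.get(specificKey);
        if (value == null) {
            value = config.get(GLOBAL_KEY);
        }
        return value != null ? Integer.parseInt(value.trim()) : defaultValue;
    }
}
```

With this fallback order, existing deployments that only set the global value 
would keep their current behavior.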

Thanks,
-Nitin

-----Original Message-----
From: Alena Prokharchyk [mailto:[email protected]] 
Sent: Monday, October 15, 2012 9:24 AM
To: [email protected]; Anthony Xu
Subject: Re: FS on cloudStack createSnapshot synchronization improvement

Anthony, I implemented the threshold logic in the API layer, in the 
SyncQueueJob manager. In other words, before submitting the job for execution, 
we must know which host the job will go to first; that host is the object we 
synchronize on.
For createSnapshot it is always the host where the VM 1) is running (for a 
Running VM) or 2) last ran (for a Stopped VM). Only when the command fails on 
the initial host do we retry on other hosts in the cluster. So it would work 
like this:

1) The API call is made.
2) Before submitting the async job to the queue, we figure out the host id 
(getHostIdForSnapshotOperation method in SnapshotManagerImpl). Let's say the id 
of the host is 1.
3) The job is submitted with object to sync on = "host id=1".
4) Once the job is ready to execute, it goes to the snapshot manager, which 
sends the command to host id=1 first. If it fails for some reason, it gets 
resent to another host in the cluster (if one exists). In this failure scenario 
we don't do any synchronization. We've decided not to handle this error case 
because it won't happen in most cases.
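
To illustrate steps 2) to 4), here is a small model of a queue that 
synchronizes on the host id and lets at most "threshold" jobs run per host. It 
is only a sketch under stated assumptions: HostSyncQueue and its methods are 
hypothetical and do not reflect the actual SyncQueueManager code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of a per-host sync queue, for illustration only.
class HostSyncQueue {
    private final int threshold; // max concurrent jobs per host
    private final Map<Long, Integer> running = new HashMap<>();
    private final Map<Long, Queue<String>> waiting = new HashMap<>();

    HostSyncQueue(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    // Step 3: submit a job with the host id as the object to sync on.
    // Returns true if the job may start now, false if it has to wait.
    synchronized boolean submit(long hostId, String jobId) {
        int active = running.getOrDefault(hostId, 0);
        if (active < threshold) {
            running.put(hostId, active + 1);
            return true;
        }
        waiting.computeIfAbsent(hostId, k -> new ArrayDeque<>()).add(jobId);
        return false;
    }

    // Called when a job on hostId finishes. Returns the id of the waiting
    // job that takes over the freed slot, or null if nothing is waiting.
    synchronized String complete(long hostId) {
        Queue<String> queue = waiting.get(hostId);
        if (queue != null && !queue.isEmpty()) {
            return queue.poll(); // slot is handed over, running count unchanged
        }
        int active = running.getOrDefault(hostId, 0);
        if (active > 0) {
            running.put(hostId, active - 1);
        }
        return null;
    }
}
```

As in the description above, a retry on another host after a failure is not 
tracked by this model, so it would bypass the per-host limit.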

I've checked the code for the other commands you've mentioned; the host is 
always picked randomly from the list of hosts in the cluster. So we can't apply 
the same logic unless we fix the code to pick the same host in step 2) and step 
4) without making callbacks from SnapshotManager to the SyncQueueManager.

I would appreciate any suggestions on how to implement it.

Thank you,
Alena.

From: Anthony Xu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: FS on cloudStack createSnapshot synchronization improvement

Several commands need this kind of threshold, e.g. move volume and create 
template from snapshot. So this is a common requirement, not only for 
createSnapshot.
Can we add a threshold mechanism to the host command queue to resolve this 
issue?


Anthony

-----Original Message-----
From: Edison Su [mailto:[email protected]]
Sent: Thursday, October 11, 2012 4:42 PM
To: [email protected]
Subject: RE: FS on cloudStack createSnapshot synchronization improvement

I only have one comment:
  Can we move this snapshot improvement code out of SnapshotManager?

-----Original Message-----
From: Alena Prokharchyk [mailto:[email protected]]
Sent: Tuesday, October 09, 2012 11:51 AM
To: [email protected]
Subject: FS on cloudStack createSnapshot synchronization improvement

Hi All,
I'm planning to introduce some changes to the create snapshot behavior for a 
future cloudStack release (the changes will go to the asf/master branch).
The fix addresses the problem described below:
"With the current code for snapshots, cloudStack always creates the snapshot on 
the host where the vm is Running (for vms in Running state) or on the host 
where the vm ran the last time (for vms in Stopped state). As the 
createSnapshot commands are not synchronized on the agent side, multiple 
commands sent to the backend at the same time can lead to performance issues 
on the hypervisor side. As a result, there is a high possibility that the 
createSnapshot command times out on the Xen side.
The solution is to limit the number of concurrent snapshots on a per-host 
basis. The threshold should be configurable, as the customer usually knows how 
many snapshots at a time the backend can handle.
While the concurrent snapshots are being processed by the backend, all 
subsequent snapshot commands scheduled for execution on the same host should 
wait in the queue."
Here is the feature FS, available for review:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Snapshot+improvements+FS
If you have any comments/suggestions/questions on the implementation, please 
let me know.
-Alena.

