Anthony, I implemented the threshold logic at the API layer, in the SyncQueueJob 
manager. In other words, before submitting the job for execution, we need to 
know the host the job will go to first; that host is the object we 
synchronize on.
For createSnapshot it's always the host where the vm 1) is running (for a Running 
vm) or 2) ran the last time (for a Stopped vm). Only when the command fails on 
the initial host do we retry on other hosts in the cluster. So it would work like 
this:

1) The api call is made.
2) Before submitting the async job to the queue, we figure out the host id 
(getHostIdForSnapshotOperation method in SnapshotManagerImpl). Let's say the id 
of the host is 1.
3) The job is submitted with object to sync on = "host id=1".
4) Once the job is ready to execute, it goes to the snapshot manager, which sends 
the command to the host id=1 first. If it fails for some reason, it gets resent 
to another host in the cluster (if one exists). In this failure scenario we don't 
do any synchronization. We've decided not to handle this error case because it 
won't happen in most cases.
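
As a rough illustration of steps 2) to 4), here is a minimal Java sketch of a per-host threshold. It is not the actual SyncQueueManager code: the HostSyncQueue class and its submit, complete and runningOn methods are hypothetical names made up for this example, and the retry on other hosts is left out, as in the flow above.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch only; not the real CloudStack SyncQueueManager API.
class HostSyncQueue {
    private final int threshold; // max concurrent jobs per host (assumed configurable)
    private final Map<Long, Integer> running = new HashMap<>();
    private final Map<Long, Queue<String>> waiting = new HashMap<>();

    HostSyncQueue(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    // Step 3: submit a job with the host id as the object to sync on.
    // Returns true if the job may execute now, false if it was parked in the queue.
    synchronized boolean submit(long hostId, String jobId) {
        int active = running.getOrDefault(hostId, 0);
        if (active < threshold) {
            running.put(hostId, active + 1);
            return true;
        }
        waiting.computeIfAbsent(hostId, id -> new ArrayDeque<>()).add(jobId);
        return false;
    }

    // Called when a job on the host finishes. Returns the id of the next waiting
    // job, which takes over the freed slot, or null if nothing is waiting.
    synchronized String complete(long hostId) {
        Queue<String> queue = waiting.get(hostId);
        if (queue != null && !queue.isEmpty()) {
            return queue.poll();
        }
        int active = running.getOrDefault(hostId, 0);
        if (active > 0) {
            running.put(hostId, active - 1);
        }
        return null;
    }

    // Number of jobs currently counted as executing on the host.
    synchronized int runningOn(long hostId) {
        return running.getOrDefault(hostId, 0);
    }

    public static void main(String[] args) {
        HostSyncQueue syncQueue = new HostSyncQueue(2);
        long hostId = 1L; // stands in for the result of getHostIdForSnapshotOperation
        for (String job : new String[] {"snap-a", "snap-b", "snap-c"}) {
            System.out.println(job + " starts now: " + syncQueue.submit(hostId, job));
        }
        System.out.println("next after a completion: " + syncQueue.complete(hostId));
    }
}
```

With a threshold of 2, the third job for the same host is queued until one of the first two completes, while jobs for other hosts are not affected.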

I've checked the code for the other commands you've mentioned; the host is always 
picked randomly from the list of hosts in the cluster. So we can't apply the 
same logic unless we fix the code to pick the same host in step 2) and step 
4) without making callbacks from SnapshotManager to the SyncQueueManager.

I would appreciate any suggestions on how to implement it.

Thank you,
Alena.

From: Anthony Xu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: FS on cloudStack createSnapshot synchronization improvement

There are several commands that need this kind of threshold, e.g. move volume 
and create template from snapshot.
So this is a common requirement, not only for createSnapshot.
Can we add a threshold mechanism in the host command queue to resolve this issue?


Anthony

-----Original Message-----
From: Edison Su [mailto:[email protected]]
Sent: Thursday, October 11, 2012 4:42 PM
To: [email protected]
Subject: RE: FS on cloudStack createSnapshot synchronization improvement

I only have one comment:
  Can we move this snapshot improvement code out of SnapshotManager?

-----Original Message-----
From: Alena Prokharchyk [mailto:[email protected]]
Sent: Tuesday, October 09, 2012 11:51 AM
To: [email protected]
Subject: FS on cloudStack createSnapshot synchronization improvement
Hi All,
I'm planning to introduce some changes to the createSnapshot behavior for
the future cloudStack release (the changes will go to the asf/master
branch).
The fix addresses the problem described below:
"With the current code for snapshots, cloudStack always creates the
snapshot on the host where the vm is running (for vms in Running state)
or on the host where the vm ran the last time (for vms in Stopped
state). As the createSnapshot commands are not synchronized on the
agent side, sending multiple commands to the backend
at the same time can lead to performance issues on the hypervisor
side. As a result, there is a high possibility that the createSnapshot
command times out on the Xen side.
The solution is to limit the number of concurrent snapshots on a
per-host basis. The threshold should be configurable, as the customer
usually knows how many snapshots at a time the backend can handle.
While the concurrent snapshots are being processed by the backend,
all subsequent snapshot commands scheduled for execution on the same
host should wait in the queue."
Here is the feature FS, available for review:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Snapshot+improvements+FS
If you have any comments/suggestions/questions on the implementation,
please let me know.
-Alena.

