On 10/15/12 4:27 AM, "Nitin Mehta" <[email protected]> wrote:
>Hi Alena - Why is it that we prefer hosts on which vm is/was running for
>creating snapshots - do we see performance benefits in any way or it was
>a convenient way of choosing the host ?

Anthony is the best person to answer this question; this logic was in
place before I put in my changes. As far as I know, in the VmWare case
(according to Kelven), there are fewer chances for the createSnapshot
command to fail when it's executed on the host where the vm resides.

>For performance benefits can we not choose a host in the cluster which
>has the least load (in terms of create snapshot jobs or some better
>metric) so as to balance the load across the cluster.

Agree with you on that. But it would be a much bigger change; we can
implement it later.

>I guess this would also help in first fit vm deployment algorithms to not
>overload the host and starve the create snapshot jobs ?
>
>Also can we not have different config params for
>concurrent.snapshots.threshold.perhost since the snapshot creation
>implementation is drastically different in the hypervisors so one value
>might not be the best idea here ?

Yes, you can file an enhancement for this.

>
>Thanks,
>-Nitin
>
>-----Original Message-----
>From: Alena Prokharchyk [mailto:[email protected]]
>Sent: Monday, October 15, 2012 9:24 AM
>To: [email protected]; Anthony Xu
>Subject: Re: FS on cloudStack createSnapshot synchronization improvement
>
>Anthony, I implemented the threshold logic on Api Layer, in SyncQueueJob
>manager. In other words, before submitting the job for execution, we
>should know the host the job would go first to - that would be the object
>we are synchronizing on.
>For createSnapshot it's always the host where vm is 1) running on (for
>Running vm) 2) ran the last time on (for Stopped vm). Only when the
>command fails on the initial host, we retry on other hosts in cluster.
>So it would work like this:
>
>1) api call is made
>2) Before submitting the async job to the queue, we figure out the host
>id (getHostIdForSnapshotOperation method in SnapshotManagerImpl). Let's
>say, the id of the host is 1.
>3) The job is submitted with object to sync on = "host id=1".
>4) Once the job is ready to execute, it goes to snapshot manager which
>sends the command to the host id=1 first. If it fails for some reason, it
>gets resent to another host in the cluster (if one exists). And in this
>failure scenario we don't do any synchronization. We've decided not to
>handle this error case because it won't happen in most of the cases.
>
>I've checked the code for other commands you've mentioned; the host is
>always picked up randomly from the list of hosts in cluster. So we can't
>apply the same logic unless we fix the code to pick up the same host on
>step 2) and step 4) without making callbacks from SnapshotManager to the
>SyncQueueManager.
>
>I would appreciate any suggestions on how to implement it.
>
>Thank you,
>Alena.
>
>From: Anthony Xu <[email protected]<mailto:[email protected]>>
>Reply-To:
>"[email protected]<mailto:[email protected]
>e.org>"
><[email protected]<mailto:[email protected]
>e.org>>
>To:
>"[email protected]<mailto:[email protected]
>e.org>"
><[email protected]<mailto:[email protected]
>e.org>>
>Subject: RE: FS on cloudStack createSnapshot synchronization improvement
>
>There are several commands that need this kind of threshold, e.g. move
>volume, create template from snapshot. So this is a common requirement,
>not only for createSnapshot.
>Can we add a threshold mechanism in the host command queue to resolve
>this issue?
>
>
>Anthony
>
>-----Original Message-----
>From: Edison Su [mailto:[email protected]]
>Sent: Thursday, October 11, 2012 4:42 PM
>To:
>[email protected]<mailto:[email protected]
>.org>
>Subject: RE: FS on cloudStack createSnapshot synchronization improvement
>
>I only have one comment:
> Can we put this snapshot improvement code out of snapshotmanager?
>
>-----Original Message-----
>From: Alena Prokharchyk [mailto:[email protected]]
>Sent: Tuesday, October 09, 2012 11:51 AM
>To:
>[email protected]<mailto:[email protected]
>.org>
>Subject: FS on cloudStack createSnapshot synchronization improvement
>
>Hi All,
>
>I'm planning to introduce some changes to create snapshot behavior
>for the future cloudStack release (the changes will go to asf/master
>branch).
>The fix addresses the problem described below:
>
>"With the current code for snapshots, cloudStack always creates snapshot
>on the host where vm is Running (for vms in Running state) or on the
>host where vm used to run the last time (for vms in Stopped state). As
>the createSnapshot commands are not synchronized on the agent side, the
>case when multiple commands are sent to the backend at the same time can
>lead to performance issues on the hypervisor side. In the end there
>is a high possibility that the createSnapshot command might time out on
>the Xen side.
>The solution is to synchronize the number of concurrent snapshots on a
>per host basis. The threshold should be configurable as the customer
>usually knows how many snapshots at a time the backend can handle.
>While the concurrent snapshots are being processed by the backend, all
>subsequent snapshot commands scheduled for execution on the same host
>should wait in the queue"
>
>Here is the feature FS available for the review:
>https://cwiki.apache.org/confluence/display/CLOUDSTACK/Snapshot+improvements+FS
>
>If you have any comments/suggestions/questions on the implementation,
>please let me know.
>
>-Alena.
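
For illustration only, the per-host threshold and queueing described in the thread could be sketched roughly as below. This is not the code from SyncQueueManager or SnapshotManagerImpl; the class name HostSnapshotThrottle and its methods submit, complete and runningOn are hypothetical, and the threshold is assumed to come from a setting such as concurrent.snapshots.threshold.perhost. Jobs are keyed by the host id chosen before submission (step 2 above), and the retry-on-another-host path is left unsynchronized, as in the thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical per-host throttle; a sketch, not CloudStack's actual implementation. */
class HostSnapshotThrottle {
    private final int threshold; // max concurrent snapshot jobs per host (assumed config value)
    private final Map<Long, Integer> running = new HashMap<>();
    private final Map<Long, Queue<String>> waiting = new HashMap<>();

    HostSnapshotThrottle(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /** Returns true if the job may start now, false if it was queued for this host. */
    synchronized boolean submit(long hostId, String jobId) {
        int active = running.getOrDefault(hostId, 0);
        if (active < threshold) {
            running.put(hostId, active + 1);
            return true;
        }
        waiting.computeIfAbsent(hostId, k -> new ArrayDeque<>()).add(jobId);
        return false;
    }

    /** Marks one job on the host as finished; returns the queued job that starts next, or null. */
    synchronized String complete(long hostId) {
        int active = running.getOrDefault(hostId, 0);
        if (active == 0) {
            throw new IllegalStateException("no running job on host " + hostId);
        }
        Queue<String> queue = waiting.get(hostId);
        String next = (queue == null) ? null : queue.poll();
        if (next == null) {
            running.put(hostId, active - 1); // nothing waiting, so the slot is freed
        }
        // Otherwise the freed slot passes straight to the next job and the count is unchanged.
        return next;
    }

    /** Number of jobs currently counted as running on the host. */
    synchronized int runningOn(long hostId) {
        return running.getOrDefault(hostId, 0);
    }
}
```

With a threshold of 2, the third job submitted for host 1 waits in that host's queue until one of the first two completes, while jobs for host 2 are unaffected. A separate threshold per hypervisor type, as Nitin suggests, would amount to constructing one such throttle per hypervisor.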
