Awesome! Thanks for your input. I will work on it, and as soon as I have something I will ping you.
On Mon, Jul 16, 2018 at 8:26 PM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:

> Yeah, I just meant that was a workaround. As you pointed out, that
> workaround doesn't make use of the migrateVirtualMachineWithVolume API
> command, though.
>
> On 7/16/18, 5:23 PM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:
>
> Thanks for the answers, Mike. I will not be able to do it today, but I
> will manage to do it this week. There is only one last doubt.
>
> [Mike] At least for KVM, you can shut the VM down and perform an offline
> migration of the volume from managed storage to non-managed storage.
> It's possible we may support similar behavior with other hypervisor
> types in the future.
>
> [Rafael] I guess that we can shut down XenServer VMs and then migrate
> the volumes later, right? However, the method in question here
> (migrateVirtualMachineWithVolume) is not supposed to execute such steps,
> is it?
>
> On Mon, Jul 16, 2018 at 8:17 PM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
>
> > - So, managed storage can be cluster and zone wide. Is that correct?
> >
> > [Mike] Correct
> >
> > - If I want to migrate a VM across clusters, but at least one of its
> > volumes is placed in cluster-wide managed storage, the migration is
> > not allowed. Is that it?
> >
> > [Mike] Correct
> >
> > - A volume placed in managed storage can never (at least not using
> > this migrateWithVolume method) be migrated out of the storage pool
> > where it resides. Is this statement right? Do you have an
> > alternative/other execution flow regarding this scenario?
> >
> > [Mike] At least for KVM, you can shut the VM down and perform an
> > offline migration of the volume from managed storage to non-managed
> > storage. It's possible we may support similar behavior with other
> > hypervisor types in the future.
> >
> > - When migrating a VM that does not have volumes in managed storage,
> > it should be possible to migrate it across clusters. Therefore, we
> > should try to use the volume allocators to find a suitable storage
> > pool for its volumes in the target cluster.
> >
> > [Mike] It's OK here if one or more of the volumes is on managed
> > storage. The "trick" is that it needs to be on zone-wide managed
> > storage that is visible to both the source and destination compute
> > clusters. You cannot specify a new storage pool for any of these
> > volumes (each must remain on its current, zone-wide primary storage).
> >
> > If you can add these new constraints into the code, I can review them
> > later. I'm a bit pressed for time this week, so it might not be
> > possible to do so right away. Thanks!
> >
> > On 7/16/18, 3:52 PM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:
> >
> > Thanks for your feedback, Mike. I actually did not want to change this
> > "migrateVirtualMachineWithVolume" API method. Everything started when
> > we wanted to create a feature to allow volume placement overrides.
> > This means allowing root admins to place/migrate a volume to a storage
> > pool that might not be "allowed" (according to its current disk
> > offering). This feature was later expanded to allow changing the disk
> > offering while executing a storage migration (which means allowing
> > changes to the volume's QoS).
> > Thus, we created a mechanism within ACS to allow disk offering
> > replacement (as opposed to DB intervention, which was the way it was
> > being done so far). The rationale behind these extensions/enhancements
> > is that the root admins are wise/experts (at least we expect them to
> > be). Therefore, they know what they are doing when overriding or
> > replacing a disk offering of a user.
> >
> > So, why am I changing this "migrateVirtualMachineWithVolume" API
> > method? When we allowed that override procedure, it broke the
> > migration of VMs that had volumes initially placed in NFS and then
> > replaced (via override) with local storage. It had something to do
> > with the way ACS was detecting whether the VM has local storage. Then,
> > when I went to the method to fix it, it was very convoluted to read
> > and understand. Therefore, I re-wrote it, and I missed your use case.
> > I am sorry for that. Moreover, I do intend to keep the current code,
> > as we already have other features developed on top of it, and this
> > code is well documented and unit tested. It is only a matter of adding
> > your requirement there.
> >
> > Now, let's fix the problem. I will not point to code here. I only want
> > to understand the idea for now.
> >
> > - So, managed storage can be cluster and zone wide. Is that correct?
> > - If I want to migrate a VM across clusters, but at least one of its
> > volumes is placed in cluster-wide managed storage, the migration is
> > not allowed. Is that it?
> > - A volume placed in managed storage can never (at least not using
> > this migrateWithVolume method) be migrated out of the storage pool
> > where it resides. Is this statement right? Do you have an
> > alternative/other execution flow regarding this scenario?
> > - When migrating a VM that does not have volumes in managed storage,
> > it should be possible to migrate it across clusters. Therefore, we
> > should try to use the volume allocators to find a suitable storage
> > pool for its volumes in the target cluster.
> >
> > Are these all of the use cases that were left behind?
> >
> > On Mon, Jul 16, 2018 at 5:36 PM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
> >
> > > For your feature, Rafael, are you trying to support the migration of
> > > a VM that has local storage from one cluster to another, or is
> > > intra-cluster migration of local storage sufficient?
> > >
> > > There is the migrateVolume API (you can pass in a "live migrate"
> > > parameter):
> > >
> > > http://cloudstack.apache.org/api/apidocs-4.11/apis/migrateVolume.html
> > >
> > > There is also migrateVirtualMachineWithVolume (one or more volumes).
> > > This is especially useful for moving a VM with its storage from one
> > > cluster to another:
> > >
> > > http://cloudstack.apache.org/api/apidocs-4.11/apis/migrateVirtualMachineWithVolume.html
> > >
> > > On 7/16/18, 2:20 PM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
> > >
> > > Actually, I think I answered both of your questions with these two
> > > prior e-mails. Please let me know if you need further clarification.
> > > Thanks!
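The constraints confirmed above boil down to roughly the following sketch. It uses the same classes that appear in the code quoted further down the thread (VolumeVO, StoragePoolVO, ScopeType, CloudRuntimeException); isManaged() and getClusterId() on the pool are assumed from the StoragePool interface, the method name is made up, and imports are omitted as in the quoted snippets. It only illustrates the rules, it is not the actual CloudStack implementation:

    // Sketch of the discussed constraints for volumes on managed storage.
    // Hypothetical helper; names and checks are illustrative only.
    private void validateManagedStorageMigration(VolumeVO volume, StoragePoolVO currentPool,
            StoragePoolVO targetPool, Host targetHost) {
        if (!currentPool.isManaged()) {
            return; // non-managed volumes follow the regular allocator path
        }
        if (ScopeType.CLUSTER.equals(currentPool.getScope())
                && !currentPool.getClusterId().equals(targetHost.getClusterId())) {
            // Cluster-scoped managed storage cannot follow the VM to another cluster.
            throw new CloudRuntimeException(String.format(
                    "Volume [%s] is on cluster-scoped managed storage [%s] and cannot be migrated along with the VM to a host in another cluster.",
                    volume.getUuid(), currentPool.getUuid()));
        }
        if (targetPool != null && targetPool.getId() != currentPool.getId()) {
            // Zone-wide managed storage is fine, but the volume must stay on its
            // current primary storage; no new pool may be specified for it.
            throw new CloudRuntimeException("Currently, a volume on managed storage can only be 'migrated' to itself.");
        }
    }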
> > > On 7/16/18, 2:17 PM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
> > >
> > > Allow me to correct what I said here:
> > >
> > > "If getDefaultMappingOfVolumesAndStoragePoolForMigration is invoked,
> > > we silently ignore the (faulty) input (which is a new storage pool)
> > > from the user and keep the volume in its same managed storage pool
> > > (the user may wonder why it wasn't migrated if they don't get an
> > > error message back telling them this is not allowed)."
> > >
> > > I should have said the following:
> > >
> > > If getDefaultMappingOfVolumesAndStoragePoolForMigration is invoked
> > > on a VM that is using managed storage that is only at the cluster
> > > level (managed storage can be at either the zone or cluster level)
> > > and we are trying to migrate the VM from one cluster to another,
> > > this operation should fail (as the old code detects). The new code
> > > tries to keep the volume in the same storage pool (but that storage
> > > pool will not be visible to the hosts in the destination compute
> > > cluster).
> > >
> > > On 7/16/18, 2:10 PM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
> > >
> > > Let me answer the questions in two separate e-mails.
> > >
> > > This answer deals with what you wrote about this code:
> > >
> > > > if (destPool.getId() == currentPool.getId()) {
> > > >     volumeToPoolObjectMap.put(volume, currentPool);
> > > > } else {
> > > >     throw new CloudRuntimeException("Currently, a volume on managed storage can only be 'migrated' to itself.");
> > > > }
> > >
> > > The code above is invoked if the user tries to migrate a volume
> > > that's on managed storage to another storage pool. At present, such
> > > volumes can be migrated when a VM is migrated from one compute
> > > cluster to another, but those volumes have to remain on the same
> > > managed storage.
> > >
> > > Here's an example:
> > >
> > > Let's say VM_1 is in Cluster_1. VM_1 has a root (or data) disk on
> > > managed storage. We try to migrate the VM from Cluster_1 to
> > > Cluster_2 and specify a new storage pool for the volume. This case
> > > should fail. To make it work, you need to either 1) not specify a
> > > new storage pool or 2) specify the same storage pool the volume is
> > > already in. If the managed storage in question is zone wide, then it
> > > can be used from both Cluster_1 and Cluster_2.
> > >
> > > The new code might call
> > > getDefaultMappingOfVolumesAndStoragePoolForMigration (if no storage
> > > pools at all are passed in to the API) or it might call
> > > createMappingVolumeAndStoragePoolEnteredByUser.
> > >
> > > If getDefaultMappingOfVolumesAndStoragePoolForMigration is invoked,
> > > we silently ignore the (faulty) input (which is a new storage pool)
> > > from the user and keep the volume in its same managed storage pool
> > > (the user may wonder why it wasn't migrated if they don't get an
> > > error message back telling them this is not allowed).
> > >
> > > If createMappingVolumeAndStoragePoolEnteredByUser is invoked, we
> > > seem to have a bigger problem (code is below):
> > >
> > > I do not believe you are required to pass in a new storage pool for
> > > each and every volume of the VM. If the VM has, say, three volumes,
> > > you may only try to migrate two of the volumes to new storage pools.
> > > This logic seems to assume that if you want to migrate one of the
> > > VM's volumes, then you necessarily want to migrate all of the VM's
> > > volumes. I believe it's possible for targetPool to come back null
> > > and later throw a NullPointerException. The old code walks through
> > > each volume of the VM and checks if there is a new storage pool
> > > specified for it. If so, do one thing; else, do something else.
> > >
> > > private Map<Volume, StoragePool> createMappingVolumeAndStoragePoolEnteredByUser(VirtualMachineProfile profile, Host host, Map<Long, Long> volumeToPool) {
> > >     Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<Volume, StoragePool>();
> > >     for (Long volumeId : volumeToPool.keySet()) {
> > >         VolumeVO volume = _volsDao.findById(volumeId);
> > >
> > >         Long poolId = volumeToPool.get(volumeId);
> > >         StoragePoolVO targetPool = _storagePoolDao.findById(poolId);
> > >         StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());
> > >
> > >         if (_poolHostDao.findByPoolHost(targetPool.getId(), host.getId()) == null) {
> > >             throw new CloudRuntimeException(String.format("Cannot migrate the volume [%s] to the storage pool [%s] while migrating VM [%s] to target host [%s]. The host does not have access to the storage pool entered.", volume.getUuid(), targetPool.getUuid(), profile.getUuid(), host.getUuid()));
> > >         }
> > >         if (currentPool.getId() == targetPool.getId()) {
> > >             s_logger.info(String.format("The volume [%s] is already allocated in storage pool [%s].", volume.getUuid(), targetPool.getUuid()));
> > >         }
> > >         volumeToPoolObjectMap.put(volume, targetPool);
> > >     }
> > >     return volumeToPoolObjectMap;
> > > }
> > >
> > > On 7/16/18, 5:13 AM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:
> > >
> > > Ok, I see what happened there with the cross-cluster migration. When
> > > I re-did the code I did not have this case; and therefore, in the
> > > old code, I was not seeing this use case (convoluted code, lack of
> > > documentation, and so on; we all know the story). I will fix it.
> > >
> > > Regarding the managed storage issue, can you describe the "special
> > > handling" you need?
> > >
> > > Are you talking about this:
> > >
> > > > if (destPool.getId() == currentPool.getId()) {
> > > >     volumeToPoolObjectMap.put(volume, currentPool);
> > > > } else {
> > > >     throw new CloudRuntimeException("Currently, a volume on managed storage can only be 'migrated' to itself.");
> > > > }
> > >
> > > That is a simple validation, right? A validation to throw an
> > > exception if the user tries to migrate the volume to some other
> > > storage pool. Is that it? If that is the case, the default method
> > > "getDefaultMappingOfVolumesAndStoragePoolForMigration" already takes
> > > care of this. Meaning, it will not try to move the volume to another
> > > storage pool.
> > >
> > > On the other hand, we then need to add a validation in the
> > > "createMappingVolumeAndStoragePoolEnteredByUser" method.
> > > I will wait for your feedback before starting to code. Thanks for
> > > spotting this issue.
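A minimal sketch of how the user-entered-mapping loop above could close the gaps Mike describes (volumes the user did not remap, a targetPool that comes back null, and the managed-storage rule). It reuses the names from the quoted methods, including createVolumeToStoragePoolMappingIfNeeded from the default path, and assumes isManaged() on StoragePoolVO; it is an illustration of the loop body, not the actual fix:

    // Sketch only: walk every volume of the VM, as the old code did, instead of
    // only the volumes the user chose to remap. Names follow the quoted methods.
    List<VolumeVO> allVolumes = _volsDao.findUsableVolumesForInstance(profile.getId());
    for (VolumeVO volume : allVolumes) {
        Long requestedPoolId = volumeToPool.get(volume.getId()); // may be null
        StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());

        if (requestedPoolId == null) {
            // No target entered for this volume: place it the same way the
            // default (no user input) path would.
            createVolumeToStoragePoolMappingIfNeeded(profile, host, volumeToPoolObjectMap, volume, currentPool);
            continue;
        }
        StoragePoolVO targetPool = _storagePoolDao.findById(requestedPoolId);
        if (targetPool == null) {
            // Fail fast instead of hitting a NullPointerException further down.
            throw new CloudRuntimeException(String.format("Storage pool [%s] entered for volume [%s] could not be found.", requestedPoolId, volume.getUuid()));
        }
        if (currentPool.isManaged() && targetPool.getId() != currentPool.getId()) {
            // Managed-storage rule discussed above: such a volume must remain on
            // its current primary storage while the VM is migrated.
            throw new CloudRuntimeException("Currently, a volume on managed storage can only be 'migrated' to itself.");
        }
        volumeToPoolObjectMap.put(volume, targetPool);
    }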
> > > On Sun, Jul 15, 2018 at 9:11 PM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
> > >
> > > > Hi Rafael,
> > > >
> > > > Thanks for your time on this.
> > > >
> > > > Here is an example where the new code deviates from the old code
> > > > in a critical fashion (the code right below is new):
> > > >
> > > > private Map<Volume, StoragePool> getDefaultMappingOfVolumesAndStoragePoolForMigration(VirtualMachineProfile profile, Host targetHost) {
> > > >     Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<Volume, StoragePool>();
> > > >     List<VolumeVO> allVolumes = _volsDao.findUsableVolumesForInstance(profile.getId());
> > > >     for (VolumeVO volume : allVolumes) {
> > > >         StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());
> > > >         if (ScopeType.HOST.equals(currentPool.getScope())) {
> > > >             createVolumeToStoragePoolMappingIfNeeded(profile, targetHost, volumeToPoolObjectMap, volume, currentPool);
> > > >         } else {
> > > >             volumeToPoolObjectMap.put(volume, currentPool);
> > > >         }
> > > >     }
> > > >     return volumeToPoolObjectMap;
> > > > }
> > > >
> > > > What happens in the new code (above) is that if the user didn't
> > > > pass in a storage pool to migrate the virtual disk to (but the VM
> > > > is being migrated to a new cluster), this code just assigns the
> > > > virtual disk to its current storage pool (which is not going to be
> > > > visible to any of the hosts in the new compute cluster).
> > > >
> > > > In the old code (I'm looking at 4.11.3 here), you could look
> > > > around line 2337 for the following code (in the
> > > > VirtualMachineManagerImpl.getPoolListForVolumesForMigration
> > > > method):
> > > >
> > > > // Find a suitable pool for the volume. Call the storage pool allocator to find the list of pools.
> > > >
> > > > final DiskProfile diskProfile = new DiskProfile(volume, diskOffering, profile.getHypervisorType());
> > > > final DataCenterDeployment plan = new DataCenterDeployment(host.getDataCenterId(), host.getPodId(), host.getClusterId(), host.getId(), null, null);
> > > >
> > > > final List<StoragePool> poolList = new ArrayList<>();
> > > > final ExcludeList avoid = new ExcludeList();
> > > >
> > > > for (final StoragePoolAllocator allocator : _storagePoolAllocators) {
> > > >     final List<StoragePool> poolListFromAllocator = allocator.allocateToPool(diskProfile, profile, plan, avoid, StoragePoolAllocator.RETURN_UPTO_ALL);
> > > >
> > > >     if (poolListFromAllocator != null && !poolListFromAllocator.isEmpty()) {
> > > >         poolList.addAll(poolListFromAllocator);
> > > >     }
> > > > }
> > > >
> > > > This old code would find an applicable storage pool in the
> > > > destination cluster (one that can be seen by the hosts in that
> > > > compute cluster).
> > > >
> > > > I think the main error in the new logic is the assumption that a
> > > > VM can only be migrated to a host in the same compute cluster. For
> > > > XenServer (perhaps for other hypervisor types?), we support
> > > > cross-cluster VM migration.
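Concretely, the else branch of the default mapping above would need to verify that the destination host can reach the volume's current pool before reusing it. A rough sketch of such a check, built on _poolHostDao.findByPoolHost (already used in the mapping code quoted earlier in the thread) and ScopeType; the helper name is made up and this is not the actual fix:

    // Sketch (hypothetical helper): only keep a volume on its current pool if
    // the destination host can actually see that pool.
    private boolean isCurrentPoolUsableFromTargetHost(StoragePoolVO currentPool, Host targetHost) {
        if (ScopeType.ZONE.equals(currentPool.getScope())) {
            // Zone-wide primary storage is visible from every cluster in the zone.
            return true;
        }
        // Cluster- or host-scoped storage is usable only if the target host is
        // actually connected to this pool.
        return _poolHostDao.findByPoolHost(currentPool.getId(), targetHost.getId()) != null;
    }

When the check fails, the caller could either fail the migration (the cluster-scoped managed-storage case Mike describes) or fall back to the storage pool allocators, as the old 4.11 code quoted above did.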
> > > > The other issue I noticed is that there is no logic in the new
> > > > code that checks for managed-storage use cases. If you look at the
> > > > VirtualMachineManagerImpl.getPoolListForVolumesForMigration method
> > > > in the old code, there is special handling for managed storage. I
> > > > don't see this reproduced in the new logic.
> > > >
> > > > I sympathize with your point that all tests passed yet this issue
> > > > was not uncovered. Unfortunately, I suspect we have a fairly low %
> > > > coverage of automated tests on CloudStack. If we ever did get to a
> > > > high % of automated test coverage, we might be able to spin up new
> > > > releases more frequently. As the case stands today, however, there
> > > > are probably many un-tested use cases when it comes to our
> > > > automated suite of tests.
> > > >
> > > > Thanks again!
> > > > Mike
> > > >
> > > > On 7/15/18, 4:19 PM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:
> > > >
> > > > Mike, are you able to pin-point in the old/replaced code the bit
> > > > that was handling your use case? I took the utmost care not to
> > > > break anything. Also, your test case, isn't it in ACS' integration
> > > > test suite? In theory, all tests passed when we merged the PR.
> > > >
> > > > I sure can take a look at it. Can you detail your use case? I
> > > > mean, the high-level execution flow: what API methods you call,
> > > > what you expected to happen, and what is happening today.
> > > >
> > > > On Sun, Jul 15, 2018 at 3:25 AM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
> > > >
> > > > > It looks like this is the problematic PR:
> > > > >
> > > > > https://github.com/apache/cloudstack/pull/2425/
> > > > >
> > > > > On 7/15/18, 12:20 AM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > While running managed-storage regression tests tonight, I
> > > > > noticed a problem that is not related to managed storage.
> > > > >
> > > > > CLOUDSTACK-10240 is a ticket asking that we allow the migration
> > > > > of a virtual disk that's on local storage to shared storage. In
> > > > > the process of enabling this feature, the
> > > > > VirtualMachineManagerImpl.getPoolListForVolumesForMigration
> > > > > method was re-written in a way that completely breaks at least
> > > > > one use case: migrating a VM across compute clusters (at least
> > > > > supported on XenServer). If, say, a virtual disk resides on
> > > > > shared storage in the source compute cluster, we must be able to
> > > > > copy this virtual disk to shared storage in the destination
> > > > > compute cluster.
> > > > >
> > > > > As the code is currently written, this is no longer possible. It
> > > > > also seems that the managed-storage logic has been dropped for
> > > > > some reason in the new implementation.
> > > > >
> > > > > Rafael – It seems that you worked on this feature. Would you be
> > > > > able to look into this and create a PR?
> > > > >
> > > > > Thanks,
> > > > > Mike
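For a volume that is not on managed storage and whose current pool is not visible from the destination cluster, a fallback modeled on the old 4.11 allocator code quoted above might look roughly like the following. The helper name is made up and the disk-offering lookup via _diskOfferingDao is assumed; this is a sketch, not the actual fix:

    // Sketch modeled on the old getPoolListForVolumesForMigration logic: ask the
    // storage pool allocators for a pool reachable from the destination host.
    private StoragePool findSuitablePoolForMigration(VolumeVO volume, VirtualMachineProfile profile, Host targetHost) {
        DiskOfferingVO diskOffering = _diskOfferingDao.findById(volume.getDiskOfferingId());
        DiskProfile diskProfile = new DiskProfile(volume, diskOffering, profile.getHypervisorType());
        // Plan against the destination host's zone/pod/cluster, not the source's.
        DataCenterDeployment plan = new DataCenterDeployment(targetHost.getDataCenterId(), targetHost.getPodId(), targetHost.getClusterId(), targetHost.getId(), null, null);
        ExcludeList avoid = new ExcludeList();

        for (StoragePoolAllocator allocator : _storagePoolAllocators) {
            List<StoragePool> pools = allocator.allocateToPool(diskProfile, profile, plan, avoid, StoragePoolAllocator.RETURN_UPTO_ALL);
            if (pools != null && !pools.isEmpty()) {
                // First suitable pool that the destination cluster's hosts can see.
                return pools.get(0);
            }
        }
        throw new CloudRuntimeException(String.format("No suitable storage pool found for volume [%s] in the destination cluster.", volume.getUuid()));
    }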
--
Rafael Weingärtner