[ https://issues.apache.org/jira/browse/CLOUDSTACK-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214498#comment-14214498 ]
Joris van Lieshout commented on CLOUDSTACK-7857:
------------------------------------------------

Hi Anthony, I agree that there is no reliable way to do this beforehand, so wouldn't it be better to do it whenever an instance is started on, or migrated to, a host, or to recalculate the free memory metric every couple of minutes (for instance as part of the stats collection cycle)? The formula XenCenter uses for this seems pretty easy and spot on. This would also reduce the number of times the retry mechanism has to kick in for other actions. On that note, the retry mechanism you are referring to does not seem to apply to the HA-workers created by the process that puts a host in maintenance. It also feels to me that this is more of a workaround than a proper solution, mostly because host_free_mem can be recalculated quickly and easily whenever it is needed (a rough sketch follows at the bottom of this message). And concerning the allocation threshold: if I'm not mistaken, it does not apply to HA-workers, which are what is used whenever you put a host into maintenance. Additionally, the instance being migrated is already in the cluster, so this threshold is not hit during PrepareForMaintenance.

> CitrixResourceBase wrongly calculates total memory on hosts with a lot of
> memory and large Dom0
> -----------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-7857
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7857
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: Future, 4.3.0, 4.4.0, 4.5.0, 4.3.1, 4.4.1, 4.6.0
>            Reporter: Joris van Lieshout
>            Priority: Blocker
>
> We have hosts with 256GB memory and a 4GB dom0. During startup ACS calculates the available memory using this formula:
>
> CitrixResourceBase.java
> protected void fillHostInfo
>     ram = (long) ((ram - dom0Ram - _xs_memory_used) * _xs_virtualization_factor);
>
> In our situation:
> ram = 274841497600
> dom0Ram = 4269801472
> _xs_memory_used = 128 * 1024 * 1024L = 134217728
> _xs_virtualization_factor = 63.0/64.0 = 0.984375
> (274841497600 - 4269801472 - 134217728) * 0.984375 = 266211892800
>
> This is in fact not the actual amount of memory available for instances. The difference in our situation is a little less than 1GB; on this particular hypervisor Dom0+Xen uses about 9GB.
> As the comment above the definition of XsMemoryUsed already states, it is time to review this logic:
> "//Hypervisor specific params with generic value, may need to be overridden for specific versions"
> The effect of this bug is that when you put a hypervisor in maintenance it might try to move instances (usually small instances, <1GB) to a host that in fact does not have enough free memory.
> This exception is thrown:
>
> ERROR [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-09aca6e9 work-8981) Terminating HAWork[8981-Migration-4482-Running-Migrating]
> com.cloud.utils.exception.CloudRuntimeException: Unable to migrate due to Catch Exception com.cloud.utils.exception.CloudRuntimeException: Migration failed due to com.cloud.utils.exception.CloudRuntimeException: Unable to migrate VM(r-4482-VM) from host(6805d06c-4d5b-4438-a245-7915e93041d9) due to Task failed!
> Task record:
>                  uuid: 645b63c8-1426-b412-7b6a-13d61ee7ab2e
>             nameLabel: Async.VM.pool_migrate
>       nameDescription:
>     allowedOperations: []
>     currentOperations: {}
>               created: Thu Nov 06 13:44:14 CET 2014
>              finished: Thu Nov 06 13:44:14 CET 2014
>                status: failure
>            residentOn: com.xensource.xenapi.Host@b42882c6
>              progress: 1.0
>                  type: <none/>
>                result:
>             errorInfo: [HOST_NOT_ENOUGH_FREE_MEMORY, 272629760, 263131136]
>           otherConfig: {}
>             subtaskOf: com.xensource.xenapi.Task@aaf13f6f
>              subtasks: []
>
> at com.cloud.vm.VirtualMachineManagerImpl.migrate(VirtualMachineManagerImpl.java:1840)
> at com.cloud.vm.VirtualMachineManagerImpl.migrateAway(VirtualMachineManagerImpl.java:2214)
> at com.cloud.ha.HighAvailabilityManagerImpl.migrate(HighAvailabilityManagerImpl.java:610)
> at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.runWithContext(HighAvailabilityManagerImpl.java:865)
> at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.access$000(HighAvailabilityManagerImpl.java:822)
> at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread$1.run(HighAvailabilityManagerImpl.java:834)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
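As a rough illustration of the recalculation suggested above: a minimal sketch using the XenServer Java bindings (com.xensource.xenapi) that CitrixResourceBase already builds on. The helper class and method names here are hypothetical, not actual CloudStack code, and it assumes host.compute_free_memory and the host_metrics record are available on the XenServer versions in use.

import com.xensource.xenapi.Connection;
import com.xensource.xenapi.Host;
import com.xensource.xenapi.HostMetrics;

public class HostFreeMemorySketch {

    // Ask XAPI itself how much memory is free right now. Unlike the static
    // "(ram - dom0Ram - _xs_memory_used) * _xs_virtualization_factor"
    // estimate made once in fillHostInfo(), this reflects the actual
    // Dom0/hypervisor overhead at the moment of the query.
    public static long getFreeMemory(Connection conn, String hostUuid) throws Exception {
        Host host = Host.getByUuid(conn, hostUuid);
        return host.computeFreeMemory(conn);
    }

    // Alternative: read the raw counter from the host metrics record,
    // which is roughly the figure XenCenter works from.
    public static long getFreeMemoryFromMetrics(Connection conn, String hostUuid) throws Exception {
        Host host = Host.getByUuid(conn, hostUuid);
        HostMetrics metrics = host.getMetrics(conn);
        return metrics.getMemoryFree(conn);
    }
}

Refreshing host_free_mem with a value like this during the stats collection cycle, or just before an HA-worker picks a destination host, would avoid handing XenServer a migration it is bound to reject with HOST_NOT_ENOUGH_FREE_MEMORY.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)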