[ https://issues.apache.org/jira/browse/CLOUDSTACK-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209545#comment-14209545 ]
Joris van Lieshout commented on CLOUDSTACK-7857:
------------------------------------------------

Hi Rohit, I did some digging around in the XenCenter code and found a possible solution there. But there is a challenge, I think: the overhead is dynamic, based on the instances running on the host, while at the moment ACS calculates this overhead only at host thread startup. This is what I found in the XenCenter code:

https://github.com/xenserver/xenadmin/blob/a0d31920c5ac62eda9713228043a834ba7829986/XenModel/XenAPI-Extensions/Host.cs#L1071

======
public long xen_memory_calc
{
    get
    {
        if (!Helpers.MidnightRideOrGreater(Connection))
        {
            Host_metrics host_metrics = Connection.Resolve(this.metrics);
            if (host_metrics == null)
                return 0;

            long totalused = 0;
            foreach (VM vm in Connection.ResolveAll(resident_VMs))
            {
                VM_metrics vmMetrics = vm.Connection.Resolve(vm.metrics);
                if (vmMetrics != null)
                    totalused += vmMetrics.memory_actual;
            }
            return host_metrics.memory_total - totalused - host_metrics.memory_free;
        }

        long xen_mem = memory_overhead;
        foreach (VM vm in Connection.ResolveAll(resident_VMs))
        {
            xen_mem += vm.memory_overhead;
            if (vm.is_control_domain)
            {
                VM_metrics vmMetrics = vm.Connection.Resolve(vm.metrics);
                if (vmMetrics != null)
                    xen_mem += vmMetrics.memory_actual;
            }
        }
        return xen_mem;
    }
}
======

We can skip the first branch because, if I'm not mistaken, ACS only supports XS 5.6 and up, and XS 5.6 = MidnightRide. In short, the formula is:

xen_mem = host_memory_overhead + residentVMs_memory_overhead + dom0_memory_actual

Here is a list of xe commands that will get you the correct numbers to summarize.
host_mem_overhead:
xe host-list name-label=$HOSTNAME params=memory-overhead --minimal

residentVMs_memory_overhead:
xe vm-list resident-on=$(xe host-list name-label=$HOSTNAME --minimal) params=memory-overhead --minimal

dom0_memory_actual:
xe vm-list resident-on=$(xe host-list name-label=$HOSTNAME --minimal) is-control-domain=true params=memory-actual --minimal

> CitrixResourceBase wrongly calculates total memory on hosts with a lot of
> memory and large Dom0
> -----------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-7857
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7857
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: Future, 4.3.0, 4.4.0, 4.5.0, 4.3.1, 4.4.1, 4.6.0
>            Reporter: Joris van Lieshout
>            Priority: Blocker
>
> We have hosts with 256GB memory and a 4GB dom0. During startup ACS calculates
> available memory using this formula:
>
> CitrixResourceBase.java, protected void fillHostInfo:
> ram = (long) ((ram - dom0Ram - _xs_memory_used) * _xs_virtualization_factor);
>
> In our situation:
> ram = 274841497600
> dom0Ram = 4269801472
> _xs_memory_used = 128 * 1024 * 1024L = 134217728
> _xs_virtualization_factor = 63.0/64.0 = 0.984375
> (274841497600 - 4269801472 - 134217728) * 0.984375 = 266211892800
>
> This is in fact not the actual amount of memory available for instances. The
> difference in our situation is a little less than 1GB. On this particular
> hypervisor, Dom0 + Xen use about 9GB.
>
> As the comment above the definition of XsMemoryUsed already states, it's time
> to review this logic:
> "// Hypervisor specific params with generic value, may need to be overridden for specific versions"
>
> The effect of this bug is that when you put a hypervisor in maintenance it
> might try to move instances (usually small instances, <1GB) to a host that
> in fact does not have enough free memory.
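The three xe queries above can be combined into a quick sanity check on the host itself. This is only a sketch: it assumes a shell on the XenServer host with the xe CLI available, and that HOSTNAME is set to the host's name-label. Note that `xe vm-list ... params=memory-overhead --minimal` returns a comma-separated list (one value per resident VM), so those values have to be summed.

```shell
# Sum a comma-separated list of integers, as returned by `xe ... --minimal`.
sum_csv() {
    echo "$1" | tr ',' '\n' | awk '{ s += $1 } END { print s + 0 }'
}

# Only attempt the xe queries when the CLI is actually present.
if command -v xe >/dev/null 2>&1; then
    host_uuid=$(xe host-list name-label="$HOSTNAME" --minimal)

    # Static per-host hypervisor overhead.
    host_overhead=$(xe host-list uuid="$host_uuid" params=memory-overhead --minimal)

    # Summed per-VM overheads of all VMs resident on this host.
    vm_overheads=$(sum_csv "$(xe vm-list resident-on="$host_uuid" params=memory-overhead --minimal)")

    # Actual memory used by the control domain (dom0).
    dom0_actual=$(xe vm-list resident-on="$host_uuid" is-control-domain=true params=memory-actual --minimal)

    # xen_mem = host_memory_overhead + residentVMs_memory_overhead + dom0_memory_actual
    echo "xen_mem = $(( host_overhead + vm_overheads + dom0_actual )) bytes"
fi
```

Subtracting that xen_mem figure from host memory-total would give the memory genuinely available for instances, rather than the static dom0Ram + _xs_memory_used estimate.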
> This exception is thrown:
>
> ERROR [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-09aca6e9 work-8981) Terminating HAWork[8981-Migration-4482-Running-Migrating]
> com.cloud.utils.exception.CloudRuntimeException: Unable to migrate due to Catch Exception com.cloud.utils.exception.CloudRuntimeException: Migration failed due to com.cloud.utils.exception.CloudRuntimeException: Unable to migrate VM(r-4482-VM) from host(6805d06c-4d5b-4438-a245-7915e93041d9) due to Task failed! Task record:
>                  uuid: 645b63c8-1426-b412-7b6a-13d61ee7ab2e
>             nameLabel: Async.VM.pool_migrate
>       nameDescription:
>     allowedOperations: []
>     currentOperations: {}
>               created: Thu Nov 06 13:44:14 CET 2014
>              finished: Thu Nov 06 13:44:14 CET 2014
>                status: failure
>            residentOn: com.xensource.xenapi.Host@b42882c6
>              progress: 1.0
>                  type: <none/>
>                result:
>             errorInfo: [HOST_NOT_ENOUGH_FREE_MEMORY, 272629760, 263131136]
>           otherConfig: {}
>             subtaskOf: com.xensource.xenapi.Task@aaf13f6f
>              subtasks: []
>         at com.cloud.vm.VirtualMachineManagerImpl.migrate(VirtualMachineManagerImpl.java:1840)
>         at com.cloud.vm.VirtualMachineManagerImpl.migrateAway(VirtualMachineManagerImpl.java:2214)
>         at com.cloud.ha.HighAvailabilityManagerImpl.migrate(HighAvailabilityManagerImpl.java:610)
>         at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.runWithContext(HighAvailabilityManagerImpl.java:865)
>         at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.access$000(HighAvailabilityManagerImpl.java:822)
>         at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread$1.run(HighAvailabilityManagerImpl.java:834)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)