BryanMLima opened a new pull request, #8252:
URL: https://github.com/apache/cloudstack/pull/8252
### Description
## <section id="problem-description">1. Problem description</section>
In Apache CloudStack (ACS), when a VM is deployed on a host with the KVM
hypervisor, an XML file is created on the assigned host. This file has a
property <code>shares</code> that defines the weight of the VM when accessing
the host CPU. The value of this property has no unit; it is a relative measure
used to calculate how much CPU a given VM will have on the host. However, this
value has a limit, which depends on the cgroup version used by the host's
kernel, and the allowed range differs between the two versions: [2, 262144]
for cgroup v1 and [1, 10000] for cgroup v2. Currently, ACS calculates the
value of <code>shares</code> using Equation 1, presented below, where
<code>CPU</code> is the number of cores and <code>speed</code> is the CPU
frequency in MHz; both are specified in the VM's compute offering. Therefore,
if a compute offering has, for example, 6 cores at 2 GHz (2000 MHz), the
<code>shares</code> value will be 12000 and an exception will be thrown by
libvirt if the host uses cgroup v2. The second version is becoming the
default one in current Linux distributions; thus, it is necessary to address
this limitation.
- **Equation 1**
<code>shares = CPU * speed</code>
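As a quick check of Equation 1, the sketch below reproduces the 6-core example. It assumes <code>speed</code> is expressed in MHz, which is what the value 12000 implies; the function and constant names are illustrative and are not taken from the ACS code base.

```python
# Illustrative sketch of Equation 1; names are hypothetical, not ACS identifiers.
CGROUP_V2_UPPER_LIMIT = 10000  # upper bound of the cgroup v2 shares range


def requested_shares(cpu_cores: int, speed_mhz: int) -> int:
    """Equation 1: shares = CPU * speed (speed assumed to be in MHz)."""
    return cpu_cores * speed_mhz


shares = requested_shares(6, 2000)
# Prints the shares value and whether it exceeds the cgroup v2 limit.
print(shares, shares > CGROUP_V2_UPPER_LIMIT)
```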
## <section id="proposed-changes">2. Proposed changes</section>
To address the problem described, we propose to apply a scale conversion
based on the max <code>shares</code> of the host. Using the same formula
currently used by ACS, it is possible to calculate the maximum
<code>shares</code> of a VM for a given host; in other words, the number of
cores multiplied by the nominal speed of the host's CPU is the upper limit of
<code>shares</code> allowed to a VM. This value is then scaled to the
allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion.
The VM <code>shares</code> would be calculated with Equation 2, presented
below, where <code>VM requested shares</code> is the requested
<code>shares</code> value calculated using Equation 1, <code>cgroup upper
limit</code> is fixed with a value of 10000 (cgroup v2 upper limit), and
<code>host max shares</code> is the maximum <code>shares</code> value of the
host, calculated using Equation 1. Using Equation 2, the only case where a VM
exceeds the cgroup v2 limit is when the user requests more resources than the
host has, which is not possible with the current implementation of ACS.
- **Equation 2**
<code>shares = (VM requested shares * cgroup upper limit)/host max
shares</code>
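Equation 2 can be sketched as follows. This is a minimal illustration, not the code in this PR: the names are hypothetical, and rounding up with integer arithmetic is an assumption based on the rounding note later in this description.

```python
# Illustrative sketch of Equation 2; names are hypothetical, not ACS identifiers.
CGROUP_V2_UPPER_LIMIT = 10000  # cgroup v2 upper limit for shares


def scaled_shares(vm_requested_shares: int, host_max_shares: int) -> int:
    """Equation 2, rounded up (the rounding mode is an assumption)."""
    numerator = vm_requested_shares * CGROUP_V2_UPPER_LIMIT
    # Ceiling division on integers, avoiding floating-point error.
    return (numerator + host_max_shares - 1) // host_max_shares


# A VM requesting 12000 shares on a host whose max shares is 64000.
print(scaled_shares(12000, 64000))
```

A request equal to the host's max <code>shares</code> maps exactly to 10000, so the result stays inside the cgroup v2 range whenever the VM does not request more than the host has.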
To implement the proposal, the following APIs will be updated:
<code>deployVirtualMachine</code>, <code>migrateVirtualMachine</code> and
<code>scaleVirtualMachine</code>. When a VM is being deployed, a new
verification will be added to the search for a suitable host: the max
<code>shares</code> of each host will be calculated, and the VM's requested
<code>shares</code> must not surpass the host's value. Likewise, the migration
of VMs will have a similar new verification. Lastly, the scaling of VMs will
also apply the same verification against the VM's host.
To determine the max <code>shares</code> of a given host, we will use the
same equation currently used in ACS for calculating the <code>shares</code> of
VMs, presented in <a href="#problem-description" class="internal-link">Section
1</a>. When Equation 1 is used to determine the maximum <code>shares</code> of
a host, <code>CPU</code> is the number of cores of the host, and
<code>speed</code> is the nominal CPU speed, i.e., considering the CPU's base
frequency.
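The verification described above can be sketched as a comparison of two Equation 1 results. The function names are hypothetical, and the sketch covers only the <code>shares</code> comparison; it ignores every other criterion ACS uses when choosing a host.

```python
# Illustrative sketch of the host verification; names are hypothetical.
def max_shares(cores: int, speed_mhz: int) -> int:
    """Equation 1 applied to either a VM offering or a host."""
    return cores * speed_mhz


def host_is_suitable(vm_cores: int, vm_speed_mhz: int,
                     host_cores: int, host_nominal_speed_mhz: int) -> bool:
    """True when the VM's requested shares do not surpass the host's max shares."""
    vm_requested = max_shares(vm_cores, vm_speed_mhz)
    host_max = max_shares(host_cores, host_nominal_speed_mhz)
    return vm_requested <= host_max


# 8 cores at 2 GHz on a 32-core 2 GHz host, then 32 cores on a 16-core host.
print(host_is_suitable(8, 2000, 32, 2000))
print(host_is_suitable(32, 2000, 16, 2000))
```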
It is important to note that these changes are only for hosts with the KVM
hypervisor using cgroup v2 for now.
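How ACS detects the cgroup version is not described here. One common way to tell whether a host runs the unified cgroup v2 hierarchy is to check for the <code>cgroup.controllers</code> file at the cgroup mount point. The sketch below assumes the conventional mount point <code>/sys/fs/cgroup</code>; the function name is hypothetical.

```python
import os


def uses_cgroup_v2(cgroup_root: str = "/sys/fs/cgroup") -> bool:
    """Heuristic: the cgroup v2 root exposes a 'cgroup.controllers' file.

    The default mount point is an assumption; pass another path if the
    hierarchy is mounted elsewhere.
    """
    return os.path.isfile(os.path.join(cgroup_root, "cgroup.controllers"))


# Prints True on a host using the unified (v2) hierarchy, False otherwise.
print(uses_cgroup_v2())
```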
## <section id="example">Example</section>
To exemplify the proposed changes, consider a host with the following
specification: 32 CPU cores with nominal speed of 2 GHz; and a VM with a
compute offering with 8 CPU cores and with speed of 2 GHz. With the current ACS
implementation, the <code>shares</code> of the VM would be calculated as
Equation 1. Thus, the VM <code>shares</code> would be 16000, over the cgroup v2
limit of 10000.
With the proposed changes, the VM <code>shares</code> would be calculated as
Equation 2. In this example, <code>VM requested shares</code> is 16000,
<code>cgroup upper limit</code> is fixed with a value of 10000, and <code>host
max shares</code> is 64000. Therefore, the VM <code>shares</code> results in
2500, well below the cgroup v2 limit.
## <section id="real-case-scenarios">Real case scenarios</section>
To demonstrate real case scenarios, consider the following hosts:
- **Host A**
- **\# of Cores:** 32
- **CPU nominal frequency:** 2 GHz
- **Max Shares:** 64000
- **Host B**
- **\# of Cores:** 16
- **CPU nominal frequency:** 2 GHz
- **Max Shares:** 32000
Table 1 below presents a set of VMs with their requested resources,
alongside the <code>shares</code> values considering the current
implementation, and the new <code>shares</code> value, for each host,
considering the proposed change using Equation 2.
- **Table 1**
<table>
<thead>
<tr>
<th rowspan="2">VM</th>
<th rowspan="2">CPU cores</th>
<th rowspan="2">CPU frequency (GHz)</th>
<th rowspan="2">Current shares</th>
<th colspan="2">New shares</th>
</tr>
<tr>
<th>For Host A</th>
<th>For Host B</th>
</tr>
</thead>
<tbody>
<tr>
<td>VM 1</td>
<td>2</td>
<td>2</td>
<td>4000</td>
<td>625</td>
<td>1250</td>
</tr>
<tr>
<td>VM 2</td>
<td>4</td>
<td>2</td>
<td>8000</td>
<td>1250</td>
<td>2500</td>
</tr>
<tr>
<td>VM 3</td>
<td>6</td>
<td>2</td>
<td>12000</td>
<td>1875</td>
<td>3750</td>
</tr>
<tr>
<td>VM 4</td>
<td>8</td>
<td>2</td>
<td>16000</td>
<td>2500</td>
<td>5000</td>
</tr>
<tr>
<td>VM 5</td>
<td>16</td>
<td>2</td>
<td>32000</td>
<td>5000</td>
<td>10000</td>
</tr>
<tr>
<td>VM 6</td>
<td>32</td>
<td>2</td>
<td>64000</td>
<td>10000</td>
<td>20000</td>
</tr>
</tbody>
</table>
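The values in Table 1 can be recomputed with the short sketch below. It assumes frequencies in MHz and uses illustrative names; every value in this table divides evenly, so the rounding mode does not affect the result.

```python
# Recomputes Table 1; names are illustrative, not ACS identifiers.
CGROUP_V2_UPPER_LIMIT = 10000
SPEED_MHZ = 2000
HOST_A_MAX_SHARES = 32 * SPEED_MHZ  # 64000
HOST_B_MAX_SHARES = 16 * SPEED_MHZ  # 32000
VM_CORES = [2, 4, 6, 8, 16, 32]     # VM 1 through VM 6


def scaled_shares(requested: int, host_max: int) -> int:
    """Equation 2 with ceiling division (exact for every row here)."""
    return (requested * CGROUP_V2_UPPER_LIMIT + host_max - 1) // host_max


rows = []
for cores in VM_CORES:
    current = cores * SPEED_MHZ  # Equation 1
    rows.append((cores, current,
                 scaled_shares(current, HOST_A_MAX_SHARES),
                 scaled_shares(current, HOST_B_MAX_SHARES)))

for row in rows:
    print(*row)
```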
Table 2 below shows whether the same VMs from Table 1 would be allowed to be
allocated to a given host, or whether an exception would be thrown, under the
current and proposed implementations. As we can see, with the current ACS
implementation, VMs 3 through 6 would throw an exception when deploying on host
A, even though the host has enough resources. VM 6 should throw an exception
when deployed on host B in both implementations, as the host does
not have enough resources to allocate it.
- **Table 2**
<table>
<thead>
<tr>
<th rowspan="2">VM</th>
<th colspan="2">Host A</th>
<th colspan="2">Host B</th>
</tr>
<tr>
<th>Current Implementation</th>
<th>Proposed Implementation</th>
<th>Current Implementation</th>
<th>Proposed Implementation</th>
</tr>
</thead>
<tbody>
<tr>
<td>VM 1</td>
<td>Allowed</td>
<td>Allowed</td>
<td>Allowed</td>
<td>Allowed</td>
</tr>
<tr>
<td>VM 2</td>
<td>Allowed</td>
<td>Allowed</td>
<td>Allowed</td>
<td>Allowed</td>
</tr>
<tr>
<td>VM 3</td>
<td>Exception</td>
<td>Allowed</td>
<td>Exception</td>
<td>Allowed</td>
</tr>
<tr>
<td>VM 4</td>
<td>Exception</td>
<td>Allowed</td>
<td>Exception</td>
<td>Allowed</td>
</tr>
<tr>
<td>VM 5</td>
<td>Exception</td>
<td>Allowed</td>
<td>Exception</td>
<td>Allowed</td>
</tr>
<tr>
<td>VM 6</td>
<td>Exception</td>
<td>Allowed</td>
<td>Exception</td>
<td>Exception</td>
</tr>
</tbody>
</table>
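The outcomes in Table 2 follow from two simple comparisons, sketched below. The sketch assumes both hosts use cgroup v2, as the table implies, and the names are illustrative.

```python
# Recomputes Table 2; names are illustrative, not ACS identifiers.
CGROUP_V2_UPPER_LIMIT = 10000
HOST_MAX_SHARES = {"Host A": 32 * 2000, "Host B": 16 * 2000}
VM_REQUESTED_SHARES = {
    f"VM {index}": cores * 2000
    for index, cores in enumerate([2, 4, 6, 8, 16, 32], start=1)
}


def current_outcome(requested: int) -> str:
    """Current behaviour: the unscaled value must fit the cgroup v2 range."""
    return "Allowed" if requested <= CGROUP_V2_UPPER_LIMIT else "Exception"


def proposed_outcome(requested: int, host_max: int) -> str:
    """Proposed behaviour: the request must not surpass the host's max shares."""
    return "Allowed" if requested <= host_max else "Exception"


table = {}
for vm, requested in VM_REQUESTED_SHARES.items():
    row = []
    for host_max in HOST_MAX_SHARES.values():
        row += [current_outcome(requested), proposed_outcome(requested, host_max)]
    table[vm] = row
    print(vm, *row)
```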
It is important to note that Equation 2 rounds up the <code>shares</code>
value; thus, there is a precision loss with the conversion. Nevertheless, this
precision loss should not be noticeable to the end user, as it only affects
requested <code>shares</code> values that are very close together, e.g.
<code>shares</code> values of <code>3997</code>, <code>3998</code> and
<code>3999</code> would all be considered as <code>1250</code> in host B with
the new implementation. This precision loss is a small drawback compared to
the benefit of enabling support for cgroup v2 in ACS.
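To illustrate how nearby requested values collapse to the same scaled value on Host B (max <code>shares</code> of 32000), the sketch below applies Equation 2 to a few consecutive requests. It computes the result with both rounding down and rounding up so the effect can be seen under either rounding mode; the helper names are illustrative, not ACS code.

```python
# Illustrative sketch of the precision loss; names are hypothetical.
CGROUP_V2_UPPER_LIMIT = 10000
HOST_B_MAX_SHARES = 16 * 2000  # 32000


def scale_rounding_down(requested: int) -> int:
    """Equation 2 with floor division."""
    return requested * CGROUP_V2_UPPER_LIMIT // HOST_B_MAX_SHARES


def scale_rounding_up(requested: int) -> int:
    """Equation 2 with ceiling division."""
    numerator = requested * CGROUP_V2_UPPER_LIMIT
    return (numerator + HOST_B_MAX_SHARES - 1) // HOST_B_MAX_SHARES


# Each line: requested shares, result rounded down, result rounded up.
for requested in range(3997, 4002):
    print(requested, scale_rounding_down(requested), scale_rounding_up(requested))
```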
## <section id="future-works">3. Future works</section>
With the current proposal, only cgroup v2 is addressed, as its upper limit
is the more restrictive one. As future work, cgroup v1 will also be
addressed using the same strategy of linear scale conversion.
---
Fixes: #6744
### Types of changes
- [x] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
### Feature/Enhancement Scale or Bug Severity
#### Feature/Enhancement Scale
- [x] Major
- [ ] Minor
#### Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [ ] Minor
- [ ] Trivial
### How Has This Been Tested?
Consider the host `host-1` with cgroup v2, the host `host-2` with cgroup v1,
and the custom constrained compute offering below:
- **Frequency:** 2 GHz
- **Cores:** 2 - 6
- **RAM**: 1 GB
## Deploy of VMs
I created the VM `vm-tmpfs` with 5 cores and allocated it to host `host-2`.
```bash
ubuntu@host-2:~$ virsh dumpxml --domain i-2-13-VM | grep shares
<shares>10000</shares>
```
I created the VM `vm-cgroupv2` with 5 cores and allocated it to host
`host-1`.
```bash
ubuntu@host-1:~$ virsh dumpxml --domain i-2-15-VM | grep shares
<shares>6667</shares>
```
As expected, ACS considered the host resources to set the `shares` values.
When the host utilizes the cgroup v1, the default behavior is not changed.
## VM live scale
I live scaled the VM `vm-tmpfs`, changing its number of cores from 5 to 6.
```bash
ubuntu@host-2:~$ virsh dumpxml --domain i-2-13-VM | grep shares
<shares>10000</shares>
ubuntu@host-2:~$ virsh dumpxml --domain i-2-13-VM | grep shares
<shares>12000</shares>
```
I live scaled the VM `vm-cgroupv2`, changing its number of cores from 5 to 6.
```bash
ubuntu@host-1:~$ virsh dumpxml --domain i-2-15-VM | grep shares
<shares>6667</shares>
ubuntu@host-1:~$ virsh dumpxml --domain i-2-15-VM | grep shares
<shares>8000</shares>
```
As expected, ACS considered the host resources to set the `shares` values.
When the host utilizes the cgroup v1, the default behavior is not changed.
## VM migration
I migrated VM `vm-tmpfs` from host `host-2` to host `host-1` (from cgroup v1
to cgroup v2). After the migration, the `shares` value was changed to `8000`,
as expected.
```bash
ubuntu@host-1:~$ virsh dumpxml --domain i-2-13-VM | grep shares
<shares>8000</shares>
```
I migrated VM `vm-cgroupv2` from host `host-1` to host `host-2` (from
cgroup v2 to cgroup v1). After the migration, the `shares` value was changed to
`12000`, as expected.
```bash
ubuntu@host-2:~$ virsh dumpxml --domain i-2-15-VM | grep shares
<shares>12000</shares>
```
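The specification of `host-1` is not stated in this description. As a consistency check only, the sketch below assumes a hypothetical host max `shares` of 15000, a value that reproduces the observed cgroup v2 results when Equation 2 is rounded up. Both the 15000 figure and the rounding mode are assumptions, and the names are illustrative.

```python
# Consistency check of the observed values; names and host spec are hypothetical.
CGROUP_V2_UPPER_LIMIT = 10000
OFFERING_SPEED_MHZ = 2000          # from the compute offering above
ASSUMED_HOST_1_MAX_SHARES = 15000  # assumption: host-1's real spec is not given


def cgroup_v1_shares(cores: int) -> int:
    """Unchanged behaviour on cgroup v1 hosts: Equation 1."""
    return cores * OFFERING_SPEED_MHZ


def cgroup_v2_shares(cores: int, host_max: int) -> int:
    """Equation 2 with ceiling division (rounding mode is an assumption)."""
    numerator = cgroup_v1_shares(cores) * CGROUP_V2_UPPER_LIMIT
    return (numerator + host_max - 1) // host_max


# Each line: cores, shares on the cgroup v1 host, shares on the cgroup v2 host.
for cores in (5, 6):
    print(cores, cgroup_v1_shares(cores),
          cgroup_v2_shares(cores, ASSUMED_HOST_1_MAX_SHARES))
```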
#### How did you try to break this feature and the system with this change?
I migrated VMs between hosts with different cgroup versions, as described in
the VM migration section above.