BryanMLima opened a new pull request, #8252:
URL: https://github.com/apache/cloudstack/pull/8252

   ### Description
   
   ## <section id="problem-description">1. Problem description</section>
   
   In Apache CloudStack (ACS), when a VM is deployed on a host with the KVM 
hypervisor, an XML file is created on the assigned host. This file has a 
property <code>shares</code> that defines the weight of the VM when accessing 
the host CPU. The value of this property has no unit; it is a relative measure 
used to calculate how much CPU a given VM will have on the host. However, this 
value has a limit, which depends on the version of cgroup utilized by the 
host's kernel. The problem is that the allowed range of <code>shares</code> 
differs between the two versions: [2, 262144] for cgroups version 1, and 
[1, 10000] for cgroups version 2. Currently, ACS calculates the value of 
<code>shares</code> using Equation 1, presented below, where <code>CPU</code> 
is the number of cores and <code>speed</code> is the CPU frequency in MHz; both 
are specified in the VM's compute offering. Therefore, if a compute offering 
has, for example, 6 cores at 2 GHz, the <code>shares</code> value will be 12000 
and an exception will be thrown by libvirt if the host utilizes cgroup v2. The 
second version is becoming the default one in current Linux distributions; 
thus, it is necessary to address this limitation.
   
   - **Equation 1**
     <code>shares = CPU * speed</code>
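
   As a minimal sketch of Equation 1 and of the check that fails on cgroup v2, 
consider the snippet below. The function and constant names are hypothetical 
and are not taken from the ACS code base; the speed is assumed to be in MHz, 
which is what the 6 cores at 2 GHz example implies.

```python
# Illustrative sketch of Equation 1; names are not taken from the ACS code base.
CGROUP_V2_MAX_SHARES = 10000  # upper bound of the cgroup v2 range [1, 10000]


def requested_shares(cores: int, speed_mhz: int) -> int:
    """Equation 1: shares = CPU * speed (speed assumed to be in MHz)."""
    return cores * speed_mhz


def exceeds_cgroup_v2_limit(shares: int) -> bool:
    """True when the unscaled value is above the cgroup v2 upper bound."""
    return shares > CGROUP_V2_MAX_SHARES


# The compute offering from the text: 6 cores at 2 GHz.
shares = requested_shares(cores=6, speed_mhz=2000)
print(shares, exceeds_cgroup_v2_limit(shares))  # 12000 True
```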
   
   ## <section id="proposed-changes">2. Proposed changes</section>
   
   To address the problem described, we propose applying a scale conversion 
that considers the max <code>shares</code> of the host. Using the same formula 
currently utilized by ACS, it is possible to calculate the maximum 
<code>shares</code> of a VM for a given host; in other words, the number of 
cores and the nominal speed of the host's CPU define the upper limit of 
<code>shares</code> allowed to a VM. This value will then be scaled to the 
allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion.
   
   The VM <code>shares</code> would be calculated as Equation 2, presented 
below, where <code>VM requested shares</code> is the requested 
<code>shares</code> value calculated using Equation 1, <code>cgroup upper 
limit</code> is fixed with a value of 10000 (cgroups v2 upper limit), and 
<code>host max shares</code> is the maximum <code>shares</code> value of the 
host, calculated using Equation 1. Using Equation 2, the only case where a VM 
passes the cgroup v2 limit is when the user requests more resources than the 
host has, which is not possible with the current implementation of ACS.
   
   - **Equation 2**
     <code>shares = (VM requested shares * cgroup upper limit)/host max 
shares</code>
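
   Equation 2 can be sketched as follows. The names are hypothetical, the host 
in the usage lines (16 cores at 2000 MHz) is made up for illustration, and the 
result is rounded up on the assumption that this matches the rounding note 
later in this description.

```python
# Illustrative sketch of Equation 2; names are not taken from the ACS code base.
CGROUP_V2_UPPER_LIMIT = 10000


def scaled_shares(vm_requested_shares: int, host_max_shares: int,
                  cgroup_upper_limit: int = CGROUP_V2_UPPER_LIMIT) -> int:
    """Linearly scale the requested shares into [1, cgroup_upper_limit]."""
    if host_max_shares <= 0:
        raise ValueError("host_max_shares must be positive")
    numerator = vm_requested_shares * cgroup_upper_limit
    # Integer ceiling division (assumed rounding mode), avoids float error.
    return -(-numerator // host_max_shares)


host_max = 16 * 2000      # hypothetical host: 16 cores at 2000 MHz
vm_requested = 6 * 2000   # offering from Section 1: 6 cores at 2 GHz
print(scaled_shares(vm_requested, host_max))  # 3750
```

   Because the result is proportional to the share of the host that the VM 
requests, it can only exceed the upper limit when the VM requests more than the 
host's capacity.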
   
   To implement the proposal, the following APIs will be updated: 
<code>deployVirtualMachine</code>, <code>migrateVirtualMachine</code> and 
<code>scaleVirtualMachine</code>. When a VM is being deployed, a new 
verification will be added to the search for a suitable host: the max 
<code>shares</code> of each host will be calculated, and the 
<code>shares</code> calculated for the VM will be checked to ensure they do not 
surpass the host's value. Likewise, the migration of VMs will have a similar 
new verification. Lastly, the scaling of VMs will also have the same 
verification for the VM's current host.
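
   The verification described above could look roughly like the sketch below. 
The <code>Host</code> class, the host names and the function are hypothetical 
and only illustrate the comparison between the VM's requested 
<code>shares</code> and each host's max <code>shares</code>; they do not 
reflect the actual ACS allocator code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    """Hypothetical host description used only for this illustration."""
    name: str
    cores: int
    speed_mhz: int

    @property
    def max_shares(self) -> int:
        # Equation 1 applied to the host: cores * nominal speed.
        return self.cores * self.speed_mhz


def suitable_hosts(vm_cores: int, vm_speed_mhz: int,
                   hosts: List[Host]) -> List[Host]:
    """Keep only hosts whose max shares are not surpassed by the VM."""
    vm_shares = vm_cores * vm_speed_mhz
    return [host for host in hosts if vm_shares <= host.max_shares]


hosts = [Host("big-host", 32, 2000), Host("small-host", 16, 2000)]
print([h.name for h in suitable_hosts(32, 2000, hosts)])  # ['big-host']
```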
   
   To determine the max <code>shares</code> of a given host, we will use the 
same equation currently used in ACS for calculating the <code>shares</code> of 
VMs, presented in <a href="#problem-description" class="internal-link">Section 
1</a>. When Equation 1 is used to determine the maximum <code>shares</code> of 
a host, <code>CPU</code> is the number of cores of the host, and 
<code>speed</code> is the nominal CPU speed, i.e., considering the CPU's base 
frequency.
   
   It is important to note that these changes are only for hosts with the KVM 
hypervisor using cgroup v2 for now.
   
   ## <section id="example">Example</section>
   
   To exemplify the proposed changes, consider a host with the following 
specification: 32 CPU cores with nominal speed of 2 GHz; and a VM with a 
compute offering with 8 CPU cores and with speed of 2 GHz. With the current ACS 
implementation, the <code>shares</code> of the VM would be calculated as 
Equation 1. Thus, the VM <code>shares</code> would be 16000, over the cgroup v2 
limit of 10000.
   
   With the proposed changes, the VM <code>shares</code> would be calculated as 
Equation 2. In this example, <code>VM requested shares</code> is 16000, 
<code>cgroup upper limit</code> is fixed with a value of 10000, and <code>host 
max shares</code> is 64000. Therefore, the VM <code>shares</code> results in 
2500, well below the cgroup v2 limit.
   
   ## <section id="real-case-scenarios">Real case scenarios</section>
   
   To demonstrate real case scenarios, consider the following hosts:
   
   - **Host A**
     - **\# of Cores:** 32
     - **CPU nominal frequency:** 2 GHz
     - **Max Shares:** 64000
   - **Host B**
     - **\# of Cores:** 16
     - **CPU nominal frequency:** 2 GHz
     - **Max Shares:** 32000
   
   Table 1 below presents a set of VMs with their requested resources, 
alongside the <code>shares</code> values considering the current 
implementation, and the new <code>shares</code> value, for each host, 
considering the proposed change using Equation 2.
   
   - Table 1
   
   <table>
   
   <thead>
   
   <tr>
   
   <th rowspan="2">VM</th>
   
   <th rowspan="2">CPU cores</th>
   
   <th rowspan="2">CPU frequency (GHz)</th>
   
   <th rowspan="2">Current shares</th>
   
   <th colspan="2">New shares</th>
   
   </tr>
   
   <tr>
   
   <th>For Host A</th>
   
   <th>For Host B</th>
   
   </tr>
   
   </thead>
   
   <tbody>
   
   <tr>
   
   <td>VM 1</td>
   
   <td>2</td>
   
   <td>2</td>
   
   <td>4000</td>
   
   <td>625</td>
   
   <td>1250</td>
   
   </tr>
   
   <tr>
   
   <td>VM 2</td>
   
   <td>4</td>
   
   <td>2</td>
   
   <td>8000</td>
   
   <td>1250</td>
   
   <td>2500</td>
   
   </tr>
   
   <tr>
   
   <td>VM 3</td>
   
   <td>6</td>
   
   <td>2</td>
   
   <td>12000</td>
   
   <td>1875</td>
   
   <td>3750</td>
   
   </tr>
   
   <tr>
   
   <td>VM 4</td>
   
   <td>8</td>
   
   <td>2</td>
   
   <td>16000</td>
   
   <td>2500</td>
   
   <td>5000</td>
   
   </tr>
   
   <tr>
   
   <td>VM 5</td>
   
   <td>16</td>
   
   <td>2</td>
   
   <td>32000</td>
   
   <td>5000</td>
   
   <td>10000</td>
   
   </tr>
   
   <tr>
   
   <td>VM 6</td>
   
   <td>32</td>
   
   <td>2</td>
   
   <td>64000</td>
   
   <td>10000</td>
   
   <td>20000</td>
   
   </tr>
   
   </tbody>
   
   </table>
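
   The values in Table 1 can be reproduced with the short sketch below. The 
names are hypothetical, and the frequency is assumed to be expressed in MHz 
(2 GHz = 2000).

```python
# Reproduces Table 1 from Equations 1 and 2; names are illustrative only.
CGROUP_V2_UPPER_LIMIT = 10000
VM_SPEED_MHZ = 2000
HOST_MAX_SHARES = {"Host A": 32 * 2000, "Host B": 16 * 2000}
VM_CORES = {"VM 1": 2, "VM 2": 4, "VM 3": 6, "VM 4": 8, "VM 5": 16, "VM 6": 32}


def scaled_shares(requested: int, host_max: int) -> int:
    """Equation 2, rounded up with integer ceiling division."""
    return -(-(requested * CGROUP_V2_UPPER_LIMIT) // host_max)


def table_1():
    """Return rows of (VM, cores, current shares, new on A, new on B)."""
    rows = []
    for vm, cores in VM_CORES.items():
        current = cores * VM_SPEED_MHZ  # Equation 1
        new = [scaled_shares(current, m) for m in HOST_MAX_SHARES.values()]
        rows.append((vm, cores, current, *new))
    return rows


for row in table_1():
    print(row)
```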
   
   Table 2 below shows whether the same VMs in Table 1 would be allowed to be 
allocated to a given host, or whether an exception would be thrown, considering 
the current and proposed implementations. As we can see, with the current ACS 
implementation, VMs 3 through 6 would throw an exception when deployed on host 
A, even though the host has enough resources. VM 6 should throw an exception 
when deployed on host B in both implementations, as the host does not have 
enough resources to allocate it.
   
   - Table 2
   
   <table>
   
   <thead>
   
   <tr>
   
   <th rowspan="2">VM</th>
   
   <th colspan="2">Host A</th>
   
   <th colspan="2">Host B</th>
   
   </tr>
   
   <tr>
   
   <th>Current Implementation</th>
   
   <th>Proposed Implementation</th>
   
   <th>Current Implementation</th>
   
   <th>Proposed Implementation</th>
   
   </tr>
   
   </thead>
   
   <tbody>
   
   <tr>
   
   <td>VM 1</td>
   
   <td>Allowed</td>
   
   <td>Allowed</td>
   
   <td>Allowed</td>
   
   <td>Allowed</td>
   
   </tr>
   
   <tr>
   
   <td>VM 2</td>
   
   <td>Allowed</td>
   
   <td>Allowed</td>
   
   <td>Allowed</td>
   
   <td>Allowed</td>
   
   </tr>
   
   <tr>
   
   <td>VM 3</td>
   
   <td>Exception</td>
   
   <td>Allowed</td>
   
   <td>Exception</td>
   
   <td>Allowed</td>
   
   </tr>
   
   <tr>
   
   <td>VM 4</td>
   
   <td>Exception</td>
   
   <td>Allowed</td>
   
   <td>Exception</td>
   
   <td>Allowed</td>
   
   </tr>
   
   <tr>
   
   <td>VM 5</td>
   
   <td>Exception</td>
   
   <td>Allowed</td>
   
   <td>Exception</td>
   
   <td>Allowed</td>
   
   </tr>
   
   <tr>
   
   <td>VM 6</td>
   
   <td>Exception</td>
   
   <td>Allowed</td>
   
   <td>Exception</td>
   
   <td>Exception</td>
   
   </tr>
   
   </tbody>
   
   </table>
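
   The outcomes in Table 2 follow from two simple rules, sketched below with 
hypothetical names. The sketch assumes that the current implementation fails 
whenever the unscaled value is above the cgroup v2 limit or the host lacks 
capacity, and that the proposed implementation fails only when the host lacks 
capacity.

```python
# Reproduces the Allowed/Exception outcomes of Table 2; names are illustrative.
CGROUP_V2_UPPER_LIMIT = 10000
VM_SPEED_MHZ = 2000
HOST_MAX_SHARES = {"Host A": 32 * 2000, "Host B": 16 * 2000}
VM_CORES = {"VM 1": 2, "VM 2": 4, "VM 3": 6, "VM 4": 8, "VM 5": 16, "VM 6": 32}


def current_outcome(requested: int, host_max: int) -> str:
    """Unscaled shares: fails above the cgroup v2 limit or host capacity."""
    ok = requested <= CGROUP_V2_UPPER_LIMIT and requested <= host_max
    return "Allowed" if ok else "Exception"


def proposed_outcome(requested: int, host_max: int) -> str:
    """Scaled shares: fails only when the VM asks for more than the host has."""
    return "Allowed" if requested <= host_max else "Exception"


def table_2():
    """Return rows of (VM, A current, A proposed, B current, B proposed)."""
    rows = []
    for vm, cores in VM_CORES.items():
        requested = cores * VM_SPEED_MHZ
        row = [vm]
        for host_max in HOST_MAX_SHARES.values():
            row.append(current_outcome(requested, host_max))
            row.append(proposed_outcome(requested, host_max))
        rows.append(tuple(row))
    return rows


for row in table_2():
    print(row)
```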
   
   It is important to note that Equation 2 rounds up the <code>shares</code> 
value; thus, there is a precision loss with the conversion. Nevertheless, this 
precision loss should not be noticeable to the end user, as it only affects 
requested <code>shares</code> values that are very close to each other, e.g., 
requested values of <code>3997</code>, <code>3998</code> and 
<code>3999</code> would all be converted to <code>1250</code> in host B with 
the new implementation, the same result as for <code>4000</code>. This 
precision loss is a small drawback compared to enabling support of cgroup v2 in 
ACS.
   
   ## <section id="future-works">3. Future works</section>
   
   With the current proposal, only cgroups version 2 is addressed, as it has 
impactful limitations. Thus, as future work, cgroups version 1 will also be 
addressed using the same strategy of linear scale conversion.
   
   ---
   Fixes: #6744 
   
   ### Types of changes
   
   - [x] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   - [ ] build/CI
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [x] Major
   - [ ] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [ ] Minor
   - [ ] Trivial
   
   ### How Has This Been Tested?
   
   Consider the host `host-1` with cgroup v2, the host `host-2` with cgroup v1, 
and the following custom constrained compute offering:
   
   - **Frequency:** 2 GHz
   - **Cores:** 2 - 6
   - **RAM**: 1 GB
   
   ## Deploy of VMs
   
   I created the VM `vm-tmpfs` with 5 cores and allocated it to host `host-2`.
   ```bash
   ubuntu@host-2:~$ virsh dumpxml --domain i-2-13-VM | grep shares
       <shares>10000</shares>
   ```
   
   I created the VM `vm-cgroupv2` with 5 cores and allocated it to host 
`host-1`.
   ```bash
   ubuntu@host-1:~$ virsh dumpxml --domain i-2-15-VM | grep shares
       <shares>6667</shares>
   ```
   
   As expected, ACS considered the host resources to set the `shares` values. 
When the host utilizes the cgroup v1, the default behavior is not changed.
   
   ## VM live scale
   
   I live scaled the VM `vm-tmpfs`, changing its number of cores from 5 to 6.
   
   ```bash
   ubuntu@host-2:~$ virsh dumpxml --domain i-2-13-VM | grep shares
       <shares>10000</shares>
   ubuntu@host-2:~$ virsh dumpxml --domain i-2-13-VM | grep shares
       <shares>12000</shares>
   ```
   
   I live scaled the VM `vm-cgroupv2`, changing its number of cores from 5 to 6.
   ```bash
   ubuntu@host-1:~$ virsh dumpxml --domain i-2-15-VM | grep shares
       <shares>6667</shares>
   ubuntu@host-1:~$ virsh dumpxml --domain i-2-15-VM | grep shares
       <shares>8000</shares>
   ```
   
   As expected, ACS considered the host resources to set the `shares` values. 
When the host utilizes the cgroup v1, the default behavior is not changed.
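
   The values observed on `host-1` can be cross-checked against Equation 2. 
The sketch below assumes a capacity of 15000 max `shares` for `host-1`; this 
figure is not stated in this description and is inferred only because it 
reproduces both observed values when the result is rounded up. The names are 
hypothetical.

```python
# Cross-check of the values observed on host-1 (cgroup v2).
CGROUP_V2_UPPER_LIMIT = 10000
ASSUMED_HOST_1_MAX_SHARES = 15000  # assumption: not stated in the description
OFFERING_SPEED_MHZ = 2000          # 2 GHz, from the compute offering


def scaled_shares(requested: int, host_max: int) -> int:
    """Equation 2, rounded up with integer ceiling division."""
    return -(-(requested * CGROUP_V2_UPPER_LIMIT) // host_max)


for cores in (5, 6):
    requested = cores * OFFERING_SPEED_MHZ  # Equation 1
    print(cores, requested,
          scaled_shares(requested, ASSUMED_HOST_1_MAX_SHARES))
# 5 10000 6667
# 6 12000 8000
```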
   
   ## VM migration
   
   I migrated VM `vm-tmpfs` from host `host-2` to host `host-1` (from cgroup v1 
to cgroup v2). After the migration, the `shares` value was changed to `8000`, 
as expected.
   ```bash
   ubuntu@host-1:~$ virsh dumpxml --domain i-2-13-VM | grep shares
       <shares>8000</shares>
   ```
   
   I migrated VM `vm-cgroupv2` from host `host-1` to host `host-2` (from 
cgroup v2 to cgroup v1). After the migration, the `shares` value was changed to 
`12000`, as expected.
   ```bash
   ubuntu@host-2:~$ virsh dumpxml --domain i-2-15-VM | grep shares
       <shares>12000</shares>
   ```
   
   #### How did you try to break this feature and the system with this change?
   
   I migrated VMs between hosts with different cgroup versions, as described 
in the VM migration section above.
   

