The following patch-set proposes an efficient mechanism for handing freed
memory between the guest and the host. It enables guests with no page cache
to rapidly free memory to, and reclaim memory from, the host.
Benefit:
With this patch-series, in our test-case, executed on a single system and
single NUMA node with 15GB memory, we were able to successfully launch at least
5 guests when page hinting was enabled and 3 without it. (A detailed
explanation of the test procedure is provided at the bottom.)
Changelog in V8:
In this patch-series, the earlier approach [1] which was used to capture and
scan the pages freed by the guest has been changed. The new approach is briefly
described below:
The patch-set still leverages the existing arch_free_page() to add this
functionality. It maintains a per CPU array which is used to store the pages
freed by the guest. The maximum number of entries which it can hold is defined
by MAX_FGPT_ENTRIES (1000). When the array is completely filled, it is scanned
and only the pages which are still available in the buddy are retained. This
process continues until the array is filled with pages which are part of the
buddy free list, after which a per-CPU kernel thread is woken up.
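The capture step above can be modelled in user space roughly as follows. This
is only a sketch under stated assumptions, not the code from the patch-set:
struct model_page, struct freed_page_array, compact_array() and
record_freed_page() are hypothetical names, and the in_buddy flag stands in
for the real check of whether a page is still on the buddy free list. Only
MAX_FGPT_ENTRIES and its value come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FGPT_ENTRIES 1000	/* capacity named in the description */

/* Hypothetical stand-in for a guest page. */
struct model_page {
	unsigned long pfn;
	bool in_buddy;		/* models "still on the buddy free list" */
};

/* Hypothetical model of one CPU's array of freed pages. */
struct freed_page_array {
	struct model_page *entries[MAX_FGPT_ENTRIES];
	size_t count;
};

/* Drop entries that were re-allocated; keep those still in the buddy. */
static size_t compact_array(struct freed_page_array *a)
{
	size_t kept = 0;

	for (size_t i = 0; i < a->count; i++)
		if (a->entries[i]->in_buddy)
			a->entries[kept++] = a->entries[i];
	a->count = kept;
	return kept;
}

/*
 * Record one freed page. Returns true once the array holds
 * MAX_FGPT_ENTRIES pages that are all still in the buddy, i.e. the point
 * at which the per-CPU thread would be woken. The caller is assumed to
 * drain the array before recording again.
 */
static bool record_freed_page(struct freed_page_array *a,
			      struct model_page *page)
{
	assert(a->count < MAX_FGPT_ENTRIES);
	a->entries[a->count++] = page;
	if (a->count < MAX_FGPT_ENTRIES)
		return false;
	return compact_array(a) == MAX_FGPT_ENTRIES;
}
```

In this model a full array that still contains re-allocated pages is compacted
and keeps accepting new entries; the wake-up is signalled only when a scan
finds every slot occupied by a page that is still free.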
This per-CPU kernel thread rescans the per-CPU array for any re-allocation. If
a page has not been reallocated and is still present in the buddy, the kernel
thread attempts to isolate it from the buddy. If it is successfully isolated,
the page is added to another per-CPU array. Once the entire scanning process is
complete, all the isolated pages are reported to the host through the existing
virtio-balloon driver.
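The rescan-and-isolate pass can be sketched in user space as below. This is an
illustrative model rather than the patch code: struct model_page,
try_isolate() and collect_isolated() are hypothetical names, and the boolean
flags stand in for the real buddy-allocator state. Reporting to the host is
left out; the function simply returns the pages that would be reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a guest page. */
struct model_page {
	unsigned long pfn;
	bool in_buddy;		/* still on the buddy free list */
	bool isolated;		/* taken off the free list for reporting */
};

/* Isolation succeeds only if the page has not been re-allocated. */
static bool try_isolate(struct model_page *page)
{
	if (!page->in_buddy)
		return false;
	page->in_buddy = false;
	page->isolated = true;
	return true;
}

/*
 * Rescan the captured pages and move every page that can be isolated
 * into a second array. Returns the number of isolated pages, which is
 * the set that would then be reported to the host.
 */
static size_t collect_isolated(struct model_page **captured, size_t n,
			       struct model_page **isolated, size_t cap)
{
	size_t out = 0;

	assert(cap >= n);
	for (size_t i = 0; i < n; i++)
		if (try_isolate(captured[i]))
			isolated[out++] = captured[i];
	return out;
}
```

Pages that were re-allocated between capture and rescan are skipped, so the
second array only ever contains pages the guest is no longer using.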
Known Issues:
* Fixed array size: The problem with having a fixed/hardcoded array
size arises when the size of the guest varies. For example, when the guest size
increases and it starts making large allocations, the fixed size limits this
solution's ability to capture all the freed pages. This results in less
guest free memory being reported to the host.
Known code re-work:
* Plan to re-use Wei's work, which communicates the poison value to the
host.
* The nomenclature used in virtio-balloon needs to be changed so that
the code can easily be distinguished from Wei's Free Page Hint code.
* Sorting based on zonenum, to avoid repetitive zone locks for the same
zone.
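To illustrate why sorting by zonenum is expected to help, here is a small
user-space sketch. It assumes a hypothetical struct hinted_page carrying a
zone number, and uses the C library's qsort() in place of whatever sorting
helper the final kernel code would use. The counter models taking a zone lock
once per run of consecutive pages from the same zone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for a captured page and the zone it belongs to. */
struct hinted_page {
	unsigned long pfn;
	int zonenum;
};

static int cmp_zonenum(const void *a, const void *b)
{
	const struct hinted_page *x = a;
	const struct hinted_page *y = b;

	return (x->zonenum > y->zonenum) - (x->zonenum < y->zonenum);
}

/* Group pages of the same zone together. */
static void sort_by_zone(struct hinted_page *pages, size_t n)
{
	assert(pages != NULL || n == 0);
	qsort(pages, n, sizeof(*pages), cmp_zonenum);
}

/*
 * Count lock acquisitions if the zone lock is held across each run of
 * consecutive pages that share a zone.
 */
static size_t count_zone_lock_acquisitions(const struct hinted_page *pages,
					   size_t n)
{
	size_t locks = 0;

	for (size_t i = 0; i < n; i++)
		if (i == 0 || pages[i].zonenum != pages[i - 1].zonenum)
			locks++;
	return locks;
}
```

With pages from two zones interleaved, the unsorted array needs one
acquisition per page, while the sorted array needs one per zone.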
Other required work:
* Run other benchmarks to evaluate the performance/impact of this
approach.
Test case:
Setup:
Memory: 15837 MB
Guest Memory Size: 5 GB
Swap: Disabled
Test Program: Simple program which allocates 4 GB of memory via malloc, touches
it via memset and exits.
Use case: Number of guests that can be launched completely, including the
successful execution of the test program.
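A test program of the kind described in the setup could look roughly like the
sketch below. It is not the exact program used for these measurements: the
function name allocate_and_touch() and the fill value are assumptions, and the
size is a parameter so the same helper can be tried with smaller requests.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate `bytes` of memory, touch every byte so the pages are really
 * backed, then free it. Returns 0 on success and -1 on failure.
 */
static int allocate_and_touch(size_t bytes)
{
	unsigned char *buf;
	volatile unsigned char *probe;
	int ok;

	assert(bytes > 0);
	buf = malloc(bytes);
	if (!buf)
		return -1;

	memset(buf, 1, bytes);

	/* Read back through a volatile pointer so the writes are kept. */
	probe = buf;
	ok = (probe[0] == 1 && probe[bytes - 1] == 1);

	free(buf);
	return ok ? 0 : -1;
}
```

On a 64-bit guest, calling allocate_and_touch((size_t)4 << 30) from main() and
returning its result corresponds to the 4 GB request used in the test.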
Procedure:
The first guest is launched and, once its console is up, the test allocation
program is executed with a 4 GB memory request (due to this, the guest occupies
almost 4-5 GB of memory in the host on a system without page hinting). Once
this program exits, another guest is launched in the host and the same process
is followed. We continue launching guests until a guest gets killed due to a
low memory condition in the host.
Result:
Without Hinting: 3 Guests
With Hinting: 5 to 7 Guests (based on the amount of memory freed/captured).
[1] https://www.spinics.net/lists/kvm/msg170113.html