On 1/23/23 11:57 AM, David Hildenbrand wrote:
> On 20.01.23 14:47, Daniil Tatianin wrote:
>> This series introduces a new qemu_prealloc_mem_with_timeout() API,
>> which allows limiting the maximum amount of time to be spent on memory
>> preallocation. It also adds prealloc statistics collection that is
>> exposed via an optional timeout handler.
>>
>> This new API is then utilized by hostmem for guest RAM preallocation,
>> controlled via the new object properties 'prealloc-timeout' and
>> 'prealloc-timeout-fatal'.
>>
>> This is useful for limiting VM startup time on systems with
>> unpredictable page allocation delays due to memory fragmentation or the
>> backing storage. The timeout can be configured to either simply emit a
>> warning and continue VM startup without having preallocated the entire
>> guest RAM, or to abort startup entirely if that is not acceptable for
>> a specific use case.
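(For context, the intended usage is something along these lines. The
property names are the ones introduced by this series; the size, path
and timeout values, including the timeout units, are purely
illustrative:

    -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,prealloc=on,prealloc-timeout=5,prealloc-timeout-fatal=off

A management application that cannot tolerate partial preallocation
would set prealloc-timeout-fatal=on instead.)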
> The major use case for preallocation is memory resources that cannot be
> overcommitted (hugetlb, file blocks, ...), to avoid running out of such
> resources later, while the guest is already running, and crashing it.
Wouldn't you say that preallocating memory for the sake of speeding up
guest kernel startup & runtime is a valid use case of prealloc? This way
we can avoid expensive (for a multitude of reasons) page faults that
will otherwise slow down the guest significantly at runtime and affect
the user experience.
> Allocating only a fraction "because it takes too long" looks quite
> useless in that (main use-case) context. We shouldn't encourage QEMU
> users to play with fire in such a way. IOW, there should be no way
> around "prealloc-timeout-fatal". Either preallocation succeeded and the
> guest can run, or it failed, and the guest can't run.
Here we basically accept the fact that, e.g. with fragmented memory, the
kernel might take a while in the page fault handler, especially for
hugetlb, where page compaction may have to run for every fault. This way
we can prefault at least some number of pages and let the guest fault in
the rest on demand later at runtime, even if that is slow and causes
noticeable lag.
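To illustrate the semantics we're after, here is a standalone sketch of
timed prefaulting. prealloc_with_timeout() below is a hypothetical
stand-in, not the actual signature this series adds:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical stand-in for qemu_prealloc_mem_with_timeout(): touch
     * every page, but stop once the deadline expires.  Returns the
     * number of pages actually prefaulted. */
    static size_t prealloc_with_timeout(char *area, size_t npages,
                                        size_t page_size,
                                        unsigned timeout_ms)
    {
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);

        for (size_t i = 0; i < npages; i++) {
            area[i * page_size] = 0; /* fault the page in */

            clock_gettime(CLOCK_MONOTONIC, &now);
            long long elapsed_ms =
                (now.tv_sec - start.tv_sec) * 1000LL +
                (now.tv_nsec - start.tv_nsec) / 1000000;
            if (elapsed_ms >= timeout_ms) {
                return i + 1; /* timed out: partial preallocation */
            }
        }
        return npages;
    }

    int main(void)
    {
        size_t page_size = 4096, npages = (size_t)1 << 16; /* 256 MiB */
        char *area = malloc(npages * page_size);
        if (!area) {
            return 1;
        }

        size_t done = prealloc_with_timeout(area, npages, page_size, 100);
        if (done < npages) {
            /* The non-fatal case: warn and let the guest fault in the
             * rest on demand. */
            fprintf(stderr, "warning: prefaulted only %zu/%zu pages\n",
                    done, npages);
        }
        free(area);
        return 0;
    }

With 'prealloc-timeout-fatal' on, the partial-preallocation branch would
instead be a hard error that aborts startup.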
> ... but then, management tools can simply start QEMU with "-S", start
> their own timer, and zap QEMU if it didn't manage to come up in time,
> and simply start a new QEMU instance without preallocation enabled.
>
> The "good" thing about that approach is that it will also cover any
> implicit memory preallocation, like using mlock() or VFIO, that doesn't
> run in the ordinary per-hostmem preallocation context. If setting QEMU
> up takes too long, you might want to try on a different hypervisor in
> your cluster instead.
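If I understand the suggestion correctly, the wrapper logic would look
roughly like the sketch below. The readiness probe is a placeholder (a
real management tool would wait for the QMP greeting instead), and the
QEMU arguments are illustrative:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Placeholder readiness probe: a real tool would connect to the QMP
     * socket and wait for the greeting before declaring QEMU "up". */
    static int qemu_came_up(void)
    {
        struct stat st;
        return stat("/tmp/qmp.sock", &st) == 0;
    }

    int main(void)
    {
        char *argv_prealloc[] = {
            "qemu-system-x86_64", "-S",
            "-qmp", "unix:/tmp/qmp.sock,server=on,wait=off",
            "-object", "memory-backend-ram,id=m0,size=4G,prealloc=on",
            /* ... machine/device options ... */
            NULL,
        };

        pid_t pid = fork();
        if (pid == 0) {
            execvp(argv_prealloc[0], argv_prealloc);
            _exit(127);
        }

        /* Give the whole setup, including implicit preallocation via
         * mlock()/VFIO pinning, a fixed budget of 30 seconds. */
        int up = 0;
        for (int i = 0; i < 30 && !(up = qemu_came_up()); i++) {
            if (waitpid(pid, NULL, WNOHANG) == pid) {
                return 1; /* QEMU exited on its own */
            }
            sleep(1);
        }

        if (!up) {
            kill(pid, SIGKILL);
            waitpid(pid, NULL, 0);
            fprintf(stderr, "setup took too long, retrying without "
                            "preallocation\n");
            /* Re-exec a second instance with prealloc=off here. */
        }
        return 0;
    }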
This approach definitely works too, but again it assumes that we always
want 'prealloc-timeout-fatal' to be on, which is, for the most part,
only the case when working around issues that might be caused by
overcommit. I don't immediately see why we want to make our
preallocation+hostmem implementation in QEMU more complicated for such
a use case.