> On Nov 26, 2025, at 1:17 PM, Daniel P. Berrangé <[email protected]> wrote:
>
> On Wed, Nov 26, 2025 at 06:14:43PM +0000, Jon Kohler wrote:
>>
>>
>>> On Nov 6, 2025, at 10:53 AM, Daniel P. Berrangé <[email protected]> wrote:
>>>
>>> On Thu, Nov 06, 2025 at 09:31:43AM -0700, Jon Kohler wrote:
>>>> Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last
>>>> touched in 2017 [1] and, since then, physical machine sizes and VMs
>>>> therein have continued to get even bigger, both on average and on the
>>>> extremes.
>>>>
>>>> For very large VMs, using 16 threads to preallocate memory can be a
>>>> non-trivial bottleneck during VM start-up and migration. Increasing
>>>> this limit to 32 threads reduces the time taken for these operations.
>>>>
>>>> Test results from quad socket Intel 8490H (4x 60 cores) show a fairly
>>>> linear gain of 50% with the 2x thread count increase.
>>>>
>>>> ---------------------------------------------
>>>> Idle Guest w/ 2M HugePages   | Start-up time
>>>> ---------------------------------------------
>>>> 240 vCPU, 7.5TB (16 threads) | 2m41.955s
>>>> ---------------------------------------------
>>>> 240 vCPU, 7.5TB (32 threads) | 1m19.404s
>>>> ---------------------------------------------
>>>>
>>>> Note: Going above 32 threads appears to have diminishing returns at
>>>> the point where the memory bandwidth and context switching costs
>>>> appear to be a limiting factor to linear scaling. For posterity, on
>>>> the same system as above:
>>>> - 32 threads: 1m19s
>>>> - 48 threads: 1m4s
>>>> - 64 threads: 59s
>>>> - 240 threads: 50s
>>>>
>>>> Additional thread counts also get less interesting as the amount of
>>>> memory to be preallocated gets smaller. Putting that all together,
>>>> 32 threads appears to be a sane number with a solid speedup on fairly
>>>> modern hardware. To go faster, we'd either need to improve the hardware
>>>> (CPU/memory) itself or improve clear_pages_*() on the kernel side to
>>>> be more efficient.
>>>>
>>>> [1] 1e356fc14bea ("mem-prealloc: reduce large guest start-up and migration
>>>>     time.")
>>>>
>>>> Signed-off-by: Jon Kohler <[email protected]>
>>>> ---
>>>>  util/oslib-posix.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Reviewed-by: Daniel P. Berrangé <[email protected]>
>>
>> Thanks, Daniel !
>>
>> Is there anything else we need on this one? Want to
>> make sure it doesn’t get lost.
>
> Paolo (CCd) is primary maintainer for this code and should queue it.

Paolo - Pinging on this one, is there anything left on this to queue it?

Thanks,
Jon

>>>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>>>> index 3c14b72665..dc001da66d 100644
>>>> --- a/util/oslib-posix.c
>>>> +++ b/util/oslib-posix.c
>>>> @@ -61,7 +61,7 @@
>>>>  #include "qemu/memalign.h"
>>>>  #include "qemu/mmap-alloc.h"
>>>>
>>>> -#define MAX_MEM_PREALLOC_THREAD_COUNT 16
>>>> +#define MAX_MEM_PREALLOC_THREAD_COUNT 32
>>>>
>>>>  struct MemsetThread;
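
For context, the macro changed in the diff is described in the commit message as a limit, so it acts as an upper bound on the preallocation thread count rather than a fixed value. Below is a minimal sketch of how such a cap might combine with the host CPU count and the number of pages to touch. The helper name clamp_prealloc_threads and its parameters are hypothetical, chosen for illustration only; this is not the code in util/oslib-posix.c.

```c
#include <assert.h>
#include <stddef.h>

/* Value proposed by the patch; the previous value was 16. */
#define MAX_MEM_PREALLOC_THREAD_COUNT 32

/*
 * Hypothetical helper (an assumption for illustration, not QEMU's
 * actual function): pick a preallocation thread count that is no
 * larger than the requested count, the host CPU count, the
 * compile-time cap, or the number of pages to be touched.
 */
static int clamp_prealloc_threads(long host_cpus, int requested,
                                  size_t numpages)
{
    int n = requested;

    assert(requested >= 1);

    /* Never use more threads than the host has CPUs (if known). */
    if (host_cpus > 0 && host_cpus < n) {
        n = (int)host_cpus;
    }
    /* Apply the compile-time upper bound. */
    if (n > MAX_MEM_PREALLOC_THREAD_COUNT) {
        n = MAX_MEM_PREALLOC_THREAD_COUNT;
    }
    /* More threads than pages would leave some threads idle. */
    if (numpages > 0 && numpages < (size_t)n) {
        n = (int)numpages;
    }
    return n;
}
```

Under these assumptions, the 240 vCPU, 7.5TB guest from the table (3932160 pages of 2M) would get 32 threads with the new cap, and would have been held to 16 with the old one.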
