> On Nov 26, 2025, at 1:17 PM, Daniel P. Berrangé <[email protected]> wrote:
> 
> On Wed, Nov 26, 2025 at 06:14:43PM +0000, Jon Kohler wrote:
>> 
>> 
>>> On Nov 6, 2025, at 10:53 AM, Daniel P. Berrangé <[email protected]> wrote:
>>> 
>>> On Thu, Nov 06, 2025 at 09:31:43AM -0700, Jon Kohler wrote:
>>>> Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last
>>>> touched in 2017 [1] and, since then, physical machine sizes and VMs
>>>> therein have continued to get even bigger, both on average and on the
>>>> extremes.
>>>> 
>>>> For very large VMs, using 16 threads to preallocate memory can be a
>>>> non-trivial bottleneck during VM start-up and migration. Increasing
>>>> this limit to 32 threads reduces the time taken for these operations.
>>>> 
>>>> Test results from a quad-socket Intel 8490H (4x 60 cores) show a fairly
>>>> linear gain: doubling the thread count cuts start-up time by about 50%.
>>>> 
>>>> ---------------------------------------------
>>>> Idle Guest w/ 2M HugePages   | Start-up time
>>>> ---------------------------------------------
>>>> 240 vCPU, 7.5TB (16 threads) | 2m41.955s
>>>> ---------------------------------------------
>>>> 240 vCPU, 7.5TB (32 threads) | 1m19.404s
>>>> ---------------------------------------------
>>>> 
>>>> Note: Going above 32 threads shows diminishing returns; memory
>>>> bandwidth and context switching costs appear to become the limiting
>>>> factors to linear scaling. For posterity, on the same system as
>>>> above:
>>>> - 32 threads: 1m19s
>>>> - 48 threads: 1m4s
>>>> - 64 threads: 59s
>>>> - 240 threads: 50s
>>>> 
>>>> Additional threads also get less interesting as the amount of
>>>> memory to be preallocated gets smaller. Putting that all together,
>>>> 32 threads appears to be a sane number with a solid speedup on fairly
>>>> modern hardware. To go faster, we'd either need to improve the hardware
>>>> (CPU/memory) itself or improve clear_pages_*() on the kernel side to
>>>> be more efficient.
>>>> 
>>>> [1] 1e356fc14bea ("mem-prealloc: reduce large guest start-up and migration 
>>>> time.")
>>>> 
>>>> Signed-off-by: Jon Kohler <[email protected]>
>>>> ---
>>>> util/oslib-posix.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> Reviewed-by: Daniel P. Berrangé <[email protected]>
>> 
>> Thanks, Daniel!
>> 
>> Is there anything else we need on this one? Want to
>> make sure it doesn’t get lost.
> 
> Paolo (CCd) is primary maintainer for this code and should queue it.

Paolo - pinging on this one. Is there anything left to do before it can be queued?

Thanks,
Jon

>>>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>>>> index 3c14b72665..dc001da66d 100644
>>>> --- a/util/oslib-posix.c
>>>> +++ b/util/oslib-posix.c
>>>> @@ -61,7 +61,7 @@
>>>> #include "qemu/memalign.h"
>>>> #include "qemu/mmap-alloc.h"
>>>> 
>>>> -#define MAX_MEM_PREALLOC_THREAD_COUNT 16
>>>> +#define MAX_MEM_PREALLOC_THREAD_COUNT 32
>>>> 
>>>> struct MemsetThread;
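
For readers skimming the diff, here is a minimal sketch of how a
compile-time cap like this typically bounds a worker count. It is not
the code in util/oslib-posix.c: the helper names min_int() and
prealloc_thread_count(), and the choice of inputs (host CPU count and
a requested thread count), are assumptions made for illustration. Only
the macro name and its new value come from the patch.

```c
/* Cap taken from the patch above; everything else here is hypothetical. */
#define MAX_MEM_PREALLOC_THREAD_COUNT 32

/* Hypothetical helper: smaller of two ints. */
static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical helper: choose how many preallocation threads to start.
 * The result never exceeds the host CPU count, the requested count,
 * or the compile-time cap, and is never less than 1.
 */
static int prealloc_thread_count(int host_cpus, int requested_threads)
{
    int n = min_int(host_cpus, MAX_MEM_PREALLOC_THREAD_COUNT);

    n = min_int(n, requested_threads);
    return n > 0 ? n : 1;
}
```

Under these assumptions, a 240-CPU host asking for 240 threads would be
limited to 32 by the cap, while an 8-CPU host would still get 8, so
raising the cap only changes behavior on hosts with more than 16 CPUs.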
