On Tue, Nov 04, 2025 at 08:33:05PM +0000, Jon Kohler wrote:
> 
> 
> > On Nov 3, 2025, at 4:14 PM, Daniel P. Berrangé <[email protected]> wrote:
> > 
> > On Mon, Nov 03, 2025 at 11:57:50AM -0700, Jon Kohler wrote:
> >> Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last
> >> touched in 2017 [1] and, since then, physical machine sizes and VMs
> >> therein have continued to get even bigger, both on average and at the
> >> extremes.
> >> 
> >> For very large VMs, using 16 threads to preallocate memory can be a
> >> non-trivial bottleneck during VM start-up and migration. Increasing
> >> this limit to 32 threads reduces the time taken for these operations.
> >> 
> >> Test results from a quad socket Intel 8490H (4x 60 cores) show a roughly
> >> 50% reduction in start-up time with the 2x thread count increase.
> >> 
> >> ---------------------------------------------
> >> Idle Guest w/ 2M HugePages   | Start-up time
> >> ---------------------------------------------
> >> 240 vCPU, 7.5TB (16 threads) | 2m41.955s
> >> ---------------------------------------------
> >> 240 vCPU, 7.5TB (32 threads) | 1m19.404s
> >> ---------------------------------------------
> > 
> > If we're configuring a guest with 240 vCPUs, then this implies the admin
> > is expecting that the guest will consume up to 240 host CPUs' worth of
> > compute time.
> > 
> > What is the purpose of limiting the number of prealloc threads to a
> > value that is an order of magnitude less than the number of vCPUs the
> > guest has been given ?
> 
> Daniel - thanks for the quick review and thoughts here.
> 
> I looked back through the original commits that led up to the current 16
> thread max, and it wasn’t immediately clear to me why we clamped it at
> 16. Perhaps there was some other contention at the time.
> 
> > Have you measured what startup time would look like with 240 prealloc
> > threads ? Do we hit some scaling limit before that point making more
> > prealloc threads counter-productive ?
> 
> I have, and it isn’t wildly better; it comes down to about 50 seconds, as
> you start running into the practical limits of memory bandwidth, as well
> as context switching if you’re doing other things on the host at the same
> time.
> 
> In playing around with some other values, here’s how they shake out:
> 32 threads: 1m19s
> 48 threads: 1m4s
> 64 threads: 59s
> …
> 240 threads: 50s
> 
> This also looks much less exciting when the amount of memory is
> smaller. I’m testing with 7.5TB; for anything smaller than that, the
> speedup becomes progressively less significant.
> 
> Putting that all together, 32 seemed like a sane number with a solid
> speedup on fairly modern hardware.

Yep, that's useful background; I've no objection to picking 32.

Perhaps worth putting a bit more of these details into the
commit message as background.
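
For reference, the knob in question is just a compile-time cap on how many
threads QEMU spins up to touch guest pages ahead of time. A minimal sketch
of that kind of clamp (with a hypothetical helper name, prealloc_num_threads,
not the exact code in QEMU) would be something like:

  #include <unistd.h>

  /* The cap this patch raises from 16 to 32. */
  #define MAX_MEM_PREALLOC_THREAD_COUNT 32

  #define MIN(a, b) ((a) < (b) ? (a) : (b))

  /* Hypothetical helper: pick how many threads to use for touching
   * guest memory pages up front.
   */
  static int prealloc_num_threads(int requested_threads)
  {
      long host_cpus = sysconf(_SC_NPROCESSORS_ONLN);
      int threads = requested_threads > 0 ? requested_threads : 1;

      /* Never use more threads than online host CPUs ... */
      if (host_cpus > 0) {
          threads = MIN(threads, (int)host_cpus);
      }

      /* ... and never more than the compile-time cap. */
      return MIN(threads, MAX_MEM_PREALLOC_THREAD_COUNT);
  }

With the cap at 16, the 240 vCPU guest above was still limited to 16
preallocation threads; lifting it to 32 doubles that, which lines up with
the roughly 2x improvement in the start-up table.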


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|