On Wed, Sep 04, 2024 at 11:07:44PM +0200, David Hildenbrand wrote:
> On 04.09.24 21:16, Peter Xu wrote:
> > Zhiyi reported an infinite loop issue in a VFIO use case.  The cause of
> > that was a separate discussion; however, while profiling it I found a
> > regression of dirty sync slowness.
> > 
> > Each KVMMemoryListener maintains an array of kvm memslots.  Currently it
> > is statically allocated to the maximum supported by the kernel.  However,
> > after Linux commit 4fc096a99e ("KVM: Raise the maximum number of user
> > memslots"), the maximum number of memslots reported has grown large
> > enough that it may no longer be wise to always statically allocate the
> > reported maximum.
> > 
> > What's worse, the QEMU kvm code still walks all the allocated memslot
> > entries for any form of lookup.  That can drastically slow down all
> > memslot operations, because each such loop can run over 32K times on the
> > new kernels.
> > 
> > Fix this issue by allocating the memslots dynamically.
> > 
> > The initial size is set to 16 because it should cover basic VM usage, so
> > the hope is that the majority of VM use cases may not even need to grow
> > at all (e.g. a VM started with ./qemu-system-x86_64 by default consumes
> > 9 memslots), while not being so large as to waste memory.
> > 
> > There may also be even better ways to address this, but so far this is
> > the simplest, and it should already be better than before the maximum
> > supported memslots grew.  For example, in the case of the issue above,
> > with VFIO attached on a 32GB system, only ~10 memslots are used, so this
> > should be good enough for now.
> > 
> > In the above VFIO context, measurements show that the precopy dirty sync
> > shrank from ~86ms to ~3ms after this patch is applied.  It should also
> > help any KVM-enabled VM, even without VFIO.
> > 
> > Reported-by: Zhiyi Guo <zh...@redhat.com>
> > Tested-by: Zhiyi Guo <zh...@redhat.com>
> > Signed-off-by: Peter Xu <pet...@redhat.com>
> > ---
> > 
> > 
> >  {
> >      int i;
> > -    kml->slots = g_new0(KVMSlot, s->nr_slots_max);
> >      kml->as_id = as_id;
> > -    for (i = 0; i < s->nr_slots_max; i++) {
> > -        kml->slots[i].slot = i;
> > -    }
> > +    kvm_slots_grow(kml, KVM_MEMSLOTS_NUM_ALLOC_DEFAULT);
> 
> I would just keep the static initialization here, and add the additional
> 
> kml->nr_slots_allocated = KVM_MEMSLOTS_NUM_ALLOC_DEFAULT;
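[For readers without the full patch in front of them, below is a minimal,
self-contained sketch of the "grow to a requested size" shape the hunk above
implies, i.e. a kvm_slots_grow() that takes an explicit target.  The trimmed
KVMSlot/KVMMemoryListener types, the NR_SLOTS_MAX placeholder, and the plain
realloc() are illustrative stand-ins, not the actual QEMU code; the real
patch works on the full QEMU structures with glib's g_renew(), much like the
rewrite David proposes further below.]

/*
 * Simplified, self-contained sketch of a size-taking kvm_slots_grow().
 * NR_SLOTS_MAX stands in for the limit the kernel reports.
 */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define KVM_MEMSLOTS_NUM_ALLOC_DEFAULT  16
#define NR_SLOTS_MAX                    32768   /* placeholder kernel limit */

typedef struct KVMSlot {
    int slot;                   /* slot id passed to the kernel */
    unsigned long memory_size;  /* 0 means "unused", so new entries must be zeroed */
    /* ... start_addr, ram pointer, flags, ... */
} KVMSlot;

typedef struct KVMMemoryListener {
    KVMSlot *slots;
    unsigned int nr_slots_allocated;
    int as_id;
} KVMMemoryListener;

/* Grow (never shrink) the array to nr_slots_new entries, capped at the max. */
static bool kvm_slots_grow(KVMMemoryListener *kml, unsigned int nr_slots_new)
{
    unsigned int i, cur = kml->nr_slots_allocated;
    KVMSlot *slots;

    if (nr_slots_new > NR_SLOTS_MAX) {
        nr_slots_new = NR_SLOTS_MAX;
    }
    if (nr_slots_new <= cur) {
        /* Nothing to grow; already at (or beyond) the requested size. */
        return false;
    }

    /* realloc() keeps the old entries; only the new tail needs initializing. */
    slots = realloc(kml->slots, nr_slots_new * sizeof(*slots));
    if (!slots) {
        return false;
    }
    memset(&slots[cur], 0, (nr_slots_new - cur) * sizeof(*slots));
    for (i = cur; i < nr_slots_new; i++) {
        slots[i].slot = i;
    }

    kml->slots = slots;
    kml->nr_slots_allocated = nr_slots_new;
    return true;
}

[With this shape, the initialization in the hunk above is just
kvm_slots_grow(kml, KVM_MEMSLOTS_NUM_ALLOC_DEFAULT) on a zeroed listener,
which is the point Peter makes next.]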
IMHO it'll be cleaner to always allocate in the grow(), so as to avoid
details like the initialization of kml->slots[].slot above.

> 
> here.
> 
> Then, you can remove the parameter from kvm_slots_grow() completely, just
> call it kvm_slots_double(), and simplify a bit:
> 
> static bool kvm_slots_double(KVMMemoryListener *kml)
> {
>     unsigned int i, nr_slots_new, cur = kml->nr_slots_allocated;
>     KVMSlot *slots;
> 
>     nr_slots_new = MIN(cur * 2, kvm_state->nr_slots_max);
>     if (nr_slots_new == kvm_state->nr_slots_max) {
>         /* We reached the maximum */
>         return false;
>     }
> 
>     assert(kml->slots);
>     slots = g_renew(KVMSlot, kml->slots, nr_slots_new);
>     /*
>      * g_renew() doesn't initialize extended buffers, however kvm
>      * memslots require fields to be zero-initialized. E.g. pointers,
>      * memory_size field, etc.
>      */
>     memset(&slots[cur], 0x0, sizeof(slots[0]) * (nr_slots_new - cur));
> 
>     for (i = cur; i < nr_slots_new; i++) {
>         slots[i].slot = i;
>     }
> 
>     kml->slots = slots;
>     kml->nr_slots_allocated = nr_slots_new;
>     trace_kvm_slots_grow(cur, nr_slots_new);
> 
>     return true;
> }

Personally I still think it's cleaner to allow setting whatever size.  We
only have one place growing so far, and it is trivial to double there, IMO.

I'll wait for a second opinion, or let me know if you have strong
feelings..

> 
> Apart from that looks sane. On the slot freeing/allocation path, there is
> certainly more optimization potential :)
> 
> I'm surprised this 32k loop wasn't found earlier.

Yes, I guess it's in the range where it isn't big enough to be discovered
easily, but it is large enough to affect many things, so better to fix it
sooner rather than later.

This also reminded me that we should probably copy stable for this patch.
I think that means I'll try to move this patch to be the 1st patch, to
make Michael's life and downstream easier.

-- 
Peter Xu
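[As a usage note on the trade-off discussed above: with a size-taking
kvm_slots_grow(), the doubling Peter mentions lives at the single growth
site.  Continuing the simplified sketch earlier in this mail,
kvm_get_free_slot() below is an illustrative lookup loosely modeled on the
real one, not the exact QEMU code.]

/*
 * Illustrative free-slot lookup built on the earlier sketch.  The loop is
 * bounded by nr_slots_allocated (typically 16) rather than the 32K kernel
 * maximum; bounding such loops by the allocated count is where the
 * ~86ms -> ~3ms dirty-sync improvement quoted above comes from.
 */
static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
{
    unsigned int i;

    for (i = 0; i < kml->nr_slots_allocated; i++) {
        /* memory_size == 0 marks an unused, zero-initialized entry. */
        if (!kml->slots[i].memory_size) {
            return &kml->slots[i];
        }
    }

    /* No free entry: request double the current allocation and retry. */
    if (!kvm_slots_grow(kml, kml->nr_slots_allocated * 2)) {
        return NULL;            /* already at the kernel-reported maximum */
    }
    return &kml->slots[i];      /* first of the newly added, zeroed entries */
}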