On Mon, Mar 22, 2021 at 09:37:19PM +0800, Keqian Zhu wrote: > > +/* Should be with all slots_lock held for the address spaces. */ > > +static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id, > > + uint32_t slot_id, uint64_t offset) > > +{ > > + KVMMemoryListener *kml; > > + KVMSlot *mem; > > + > > + if (as_id >= s->nr_as) { > > + return; > > + } > > + > > + kml = s->as[as_id].ml; > > + mem = &kml->slots[slot_id]; > > + > > + if (!mem->memory_size || offset >= (mem->memory_size / > > TARGET_PAGE_SIZE)) { > It seems that TARGET_PAGE_SIZE should be qemu_real_host_page_size.
Fixed. [...] > > +/* > > + * Flush all the existing dirty pages to the KVM slot buffers. When > > + * this call returns, we guarantee that all the touched dirty pages > > + * before calling this function have been put into the per-kvmslot > > + * dirty bitmap. > > + * > > + * This function must be called with BQL held. > > + */ > > +static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r) > The argument is not used. Indeed, removed. > > > +{ > > + trace_kvm_dirty_ring_flush(0); > > + /* > > + * The function needs to be serialized. Since this function > > + * should always be with BQL held, serialization is guaranteed. > > + * However, let's be sure of it. > > + */ > > + assert(qemu_mutex_iothread_locked()); > > + /* > > + * First make sure to flush the hardware buffers by kicking all > > + * vcpus out in a synchronous way. > > + */ > > + kvm_cpu_synchronize_kick_all(); > Can we make this function to be architecture specific? > It seems that kick out vCPU is an architecture specific way to flush hardware > buffers > to dirty ring (for x86 PML). I can do that, but I'd say it's kind of an overkill if after all the kernel support is not there yet, so I still tend to make it as simple as possible. [...] > > +static void *kvm_dirty_ring_reaper_thread(void *data) > > +{ > > + KVMState *s = data; > > + struct KVMDirtyRingReaper *r = &s->reaper; > > + > > + rcu_register_thread(); > > + > > + trace_kvm_dirty_ring_reaper("init"); > > + > > + while (true) { > > + r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT; > > + trace_kvm_dirty_ring_reaper("wait"); > > + /* > > + * TODO: provide a smarter timeout rather than a constant? > > + */ > > + sleep(1); > > + > > + trace_kvm_dirty_ring_reaper("wakeup"); > > + r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING; > > + > > + qemu_mutex_lock_iothread(); > > + kvm_dirty_ring_reap(s); > > + qemu_mutex_unlock_iothread(); > > + > > + r->reaper_iteration++; > > + } > I don't know when does this iteration exit? > And I see that we start this reaper_thread in kvm_init(), maybe it's better > to start it > when start dirty log and stop it when stop dirty log. Yes we can make it conditional, but note that we can't hook at functions like memory_global_dirty_log_start() because that is only for migration purpose. Currently QEMU exports the dirty tracking more than that, e.g., to the VGA code. We'll need to try to detect whether there's any existing MR got its mr->dirty_log_mask set (besides global_dirty_log being set). When all of them got cleared we'll need to detect too so as to turn the thread off. It's just easier to me to run this thread with such a timeout, then when not necessary it'll see empty ring and return fast (index comparison for each ring). Not to mention the VGA dirty tracking should be on for most of the VM lifecycle, so even if we have that knob this thread will probably be running for 99% of the time as long as any MR has its DIRTY_MEMORY_VGA bit set. [...] > > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h > > index c68bc3ba8af..2f0991d93f7 100644 > > --- a/include/hw/core/cpu.h > > +++ b/include/hw/core/cpu.h > > @@ -323,6 +323,11 @@ struct qemu_work_item; > > * @ignore_memory_transaction_failures: Cached copy of the MachineState > > * flag of the same name: allows the board to suppress calling of the > > * CPU do_transaction_failed hook function. > > + * @kvm_dirty_ring_full: > > + * Whether the kvm dirty ring of this vcpu is soft-full. > > + * @kvm_dirty_ring_avail: > > + * Semaphore to be posted when the kvm dirty ring of the vcpu is > > + * available again. > The doc does not match code. Right; fixed. Thanks for taking a look, keqian. -- Peter Xu