* Peter Xu (pet...@redhat.com) wrote:
> On Wed, Mar 25, 2020 at 08:41:44PM +0000, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > +enum KVMReaperState {
> > > +    KVM_REAPER_NONE = 0,
> > > +    /* The reaper is sleeping */
> > > +    KVM_REAPER_WAIT,
> > > +    /* The reaper is reaping for dirty pages */
> > > +    KVM_REAPER_REAPING,
> > > +};
> > 
> > That probably needs to be KVMDirtyRingReaperState
> > given there are many things that could be reaped.
> 
> Sure.
> 
> > > +/*
> > > + * KVM reaper instance, responsible for collecting the KVM dirty bits
> > > + * via the dirty ring.
> > > + */
> > > +struct KVMDirtyRingReaper {
> > > +    /* The reaper thread */
> > > +    QemuThread reaper_thr;
> > > +    /*
> > > +     * Telling the reaper thread to wakeup.  This should be used as a
> > > +     * generic interface to kick the reaper thread, like, in vcpu
> > > +     * threads where it gets a exit due to ring full.
> > > +     */
> > > +    EventNotifier reaper_event;
> > 
> > I think I'd just use a simple semaphore for this type of thing.
> 
> I'm actually uncertain which is cheaper...
> 
> In the meantime, I wanted to poll two handles at the same time below
> (in kvm_dirty_ring_reaper_thread).  I don't know how to do that with
> a semaphore.  Could it?
If you're OK with EventNotifier stick with it; it's just that I'm used to
doing this sort of thing with a semaphore, e.g. a flag then the
semaphore - but that's fine.

> [...]
> 
> > > @@ -412,6 +460,18 @@ int kvm_init_vcpu(CPUState *cpu)
> > >              (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
> > >      }
> > >  
> > > +    if (s->kvm_dirty_gfn_count) {
> > > +        cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size,
> > > +                                   PROT_READ | PROT_WRITE, MAP_SHARED,
> > 
> > Is the MAP_SHARED required?
> 
> Yes, it's required.  It's the same as when we map the per-vcpu kvm_run.
> 
> If we use MAP_PRIVATE, it'll be in a COW fashion - when the userspace
> writes to the dirty gfns the 1st time, it'll copy the current dirty
> ring page in the kernel, and from then on QEMU will never be able to
> see what the kernel writes to the dirty gfn pages.  MAP_SHARED means
> the userspace and the kernel share exactly the same page(s).

OK, worth a comment.

> > > +                                   cpu->kvm_fd,
> > > +                                   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
> > > +        if (cpu->kvm_dirty_gfns == MAP_FAILED) {
> > > +            ret = -errno;
> > > +            DPRINTF("mmap'ing vcpu dirty gfns failed\n");
> > 
> > Include errno?
> 
> Will do.
> 
> [...]
> 
> > > +static uint64_t kvm_dirty_ring_reap(KVMState *s)
> > > +{
> > > +    KVMMemoryListener *kml;
> > > +    int ret, i, locked_count = s->nr_as;
> > > +    CPUState *cpu;
> > > +    uint64_t total = 0;
> > > +
> > > +    /*
> > > +     * We need to lock all kvm slots for all address spaces here,
> > > +     * because:
> > > +     *
> > > +     * (1) We need to mark dirty for dirty bitmaps in multiple slots
> > > +     *     and for tons of pages, so it's better to take the lock here
> > > +     *     once rather than once per page.  And more importantly,
> > > +     *
> > > +     * (2) We must _NOT_ publish dirty bits to the other threads
> > > +     *     (e.g., the migration thread) via the kvm memory slot dirty
> > > +     *     bitmaps before correctly re-protect those dirtied pages.
> > > +     *     Otherwise we can have potential risk of data corruption if
> > > +     *     the page data is read in the other thread before we do
> > > +     *     reset below.
> > > +     */
> > > +    for (i = 0; i < s->nr_as; i++) {
> > > +        kml = s->as[i].ml;
> > > +        if (!kml) {
> > > +            /*
> > > +             * This is tricky - we grow s->as[] dynamically now.  Take
> > > +             * care of that case.  We also assumed the as[] will fill
> > > +             * one by one starting from zero.  Without this, we race
> > > +             * with register_smram_listener.
> > > +             *
> > > +             * TODO: make all these prettier...
> > > +             */
> > > +            locked_count = i;
> > > +            break;
> > > +        }
> > > +        kvm_slots_lock(kml);
> > > +    }
> > > +
> > > +    CPU_FOREACH(cpu) {
> > > +        total += kvm_dirty_ring_reap_one(s, cpu);
> > > +    }
> > > +
> > > +    if (total) {
> > > +        ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
> > > +        assert(ret == total);
> > > +    }
> > > +
> > > +    /* Unlock whatever locks that we have locked */
> > > +    for (i = 0; i < locked_count; i++) {
> > > +        kvm_slots_unlock(s->as[i].ml);
> > > +    }
> > > +
> > > +    CPU_FOREACH(cpu) {
> > > +        if (cpu->kvm_dirty_ring_full) {
> > > +            qemu_sem_post(&cpu->kvm_dirty_ring_avail);
> > > +        }
> > 
> > Why do you need to wait until here - couldn't you release
> > each vcpu after you've reaped it?
> 
> We probably still need to wait.  Even after we reaped all the dirty
> bits we only marked the pages as "collected"; the buffers will only
> become available again once the kernel re-protects those pages (when
> the above KVM_RESET_DIRTY_RINGS completes).  Before that, continuing
> the vcpu could let it exit again with the same ring full event.

Ah OK.

> [...]
> 
> > > +static int kvm_dirty_ring_reaper_init(KVMState *s)
> > > +{
> > > +    struct KVMDirtyRingReaper *r = &s->reaper;
> > > +    int ret;
> > > +
> > > +    ret = event_notifier_init(&r->reaper_event, false);
> > > +    assert(ret == 0);
> > > +    ret = event_notifier_init(&r->reaper_flush_event, false);
> > > +    assert(ret == 0);
> > > +    qemu_sem_init(&r->reaper_flush_sem, 0);
> > > +
> > > +    qemu_thread_create(&r->reaper_thr, "kvm-reaper",
> > > +                       kvm_dirty_ring_reaper_thread,
> > > +                       s, QEMU_THREAD_JOINABLE);
> > 
> > That's marked as joinable - does it ever get joined?
> > If the reaper thread does exit on error (I can only see the poll
> > failure?) - what happens elsewhere - will things still try and kick it?
> 
> The reaper thread is not designed to fail, for sure.  If it fails, it'll
> exit without being joined, but the alternative would be to abort()
> directly in the thread, which doesn't seem any better...
> 
> Regarding "why not join() it": I think it's simply because we don't
> have an operation corresponding to AccelClass.init_machine() and
> kvm_init(). :-)  From that POV, I think I'll still use JOINABLE with
> the hope that someday the destroy_machine() hook will be ready and
> we'll be able to join it.

OK, that's fine.

Dave

> Thanks,
> 
> -- 
> Peter Xu

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK