Shaohua Li wrote:
> Hi,
> I saw some discussions on the topic but no progress. I did an
> experiment to make guest pages be allocated dynamically and swapped out.
> Please see the attached patches. It's not yet for merge but I'd like to get
> some suggestions and help. Patches (against kvm-19) work here but
> maybe not very stable as there should be some lock issues for swapout,
> which I'll check more later. If you are brave, please try :).
>

Nice work.

This is fairly different from what I had in mind - I wanted to use regular
address spaces in kvm, whereas this patchset adds swapout capability to the
kvm address space. Differences between the two approaches include:

- yours is probably simpler :)
- possibly less intrusive mm code changes when using regular address spaces
- automatic hugetlbfs support (this was my main motivation for generic
  address spaces, esp. with npt/ept). Of course hugetlbfs can be
  implemented with your approach as well
- your approach allows kvm to continue using page->private, so it saves
  memory and requires less kvm modification
- using Linux address spaces allows paging to file-backed storage, not
  just swap

Ultimately I think the balance is in favor of your approach, as it is more
tightly coupled with kvm and can therefore be faster. The simplicity also
helps a lot.

> Some
> issues I have:
> 1. there is a spinlock to protect the kvm struct, we can't sleep in it. A
> possible solution is to do a 'release lock, sleep and retry', but the
> shadow page fault path doesn't sound easy to adapt to it. The spinlock also
> prevents the vcpu from being migrated to other cpus as vmx operations must
> be done on the cpu the vcpu runs on. I changed it to a semaphore plus a cpu
> affinity setting. It's a little hacky, I'd see if there are better
> approaches.

My plan is to teach the scheduler about kvm, so it can call a callback when
a vcpu is migrated. That will allow re-enabling preemption in all kvm code
except the actual entry/exit sequence. This is an improvement all over (for
realtime, for easier coding, for latency) so I hope to do it soon.

> 2. Linux page reclaim can't tell if a guest page is referenced often.
> My current patch just blindly adds guest pages to the lru, not optimized.

Well, that will always be a problem with paging guest memory. There are
some patches floating around to allow a guest to give hints to the host
about page recency, for s390, which may help.

> 3.
> kvm_ops.tlb_flush should really send an IPI to make the vcpu flush its
> tlb, as it might be called on cpus other than the cpu the vcpu runs on.
> This makes the swapout path unable to zap shadow page tables. My
> patch just skips any guest page which a shadow page table points to.
> I assume kvm smp guest support will improve the tlb_flush.
>

Yes. The apic patchset includes mechanisms for interrupting a running vcpu
which can be used for this.

> @@ -151,9 +151,8 @@
>          walker->inherited_ar &= walker->table[index];
>          table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>          paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
> -        kunmap_atomic(walker->table, KM_USER0);
> -        walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
> -                                    KM_USER0);
> +        kunmap(walker->table);
> +        walker->table = kmap(pfn_to_page(paddr >> PAGE_SHIFT));
>

kunmap() wants a struct page IIRC. It's also much slower than the atomic
variant on i386+HIGHMEM, so I'd rather avoid it.

> @@ -1099,11 +1121,23 @@
>      }
>  }
>
> +static void mmu_zap_active_pages(struct kvm_vcpu *vcpu)
> +{
> +    struct kvm_mmu_page *page;
> +
> +    while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
> +        page = container_of(vcpu->kvm->active_mmu_pages.next,
> +                            struct kvm_mmu_page, link);
> +        kvm_mmu_zap_page(vcpu, page);
> +    }
> +}
> +
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
>  {
>      int r;
>
>      destroy_kvm_mmu(vcpu);
> +    mmu_zap_active_pages(vcpu);
>      r = init_kvm_mmu(vcpu);
>      if (r < 0)
>          goto out;
>

This is called on set_cr0(), which can be called fairly often. However, I
think the zap can be made conditional on a change in the paging-related
bits.

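A minimal sketch of that qualification, written as plain userspace C rather
than kernel code. It assumes the paging-related bits are CR0.PG and CR0.WP;
the macro names and the helper cr0_change_needs_mmu_reset() are hypothetical
and not part of the patch or of kvm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 CR0 bit positions. */
#define CR0_TS (UINT64_C(1) << 3)   /* task switched */
#define CR0_WP (UINT64_C(1) << 16)  /* write protect */
#define CR0_PG (UINT64_C(1) << 31)  /* paging enable */

/*
 * Assumption: these are the only CR0 bits whose change invalidates the
 * shadow page tables. The real set may differ.
 */
#define CR0_PAGING_BITS (CR0_PG | CR0_WP)

/*
 * Hypothetical helper: returns true only if a paging-related bit differs
 * between the old and the new CR0 value.
 */
bool cr0_change_needs_mmu_reset(uint64_t old_cr0, uint64_t new_cr0)
{
    return ((old_cr0 ^ new_cr0) & CR0_PAGING_BITS) != 0;
}
```

With something like this, set_cr0() could call kvm_mmu_reset_context() (and
so zap the active pages) only when the helper returns true, so a write that
only toggles an unrelated bit such as CR0.TS would skip the zap.
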
> Index: kvm/kernel/paging_tmpl.h
> ===================================================================
> --- kvm.orig/kernel/paging_tmpl.h 2007-05-21 09:20:11.000000000 +0800
> +++ kvm/kernel/paging_tmpl.h 2007-05-21 09:20:26.000000000 +0800
> @@ -369,7 +369,7 @@
>          *shadow_ent |= PT_WRITABLE_MASK;
>          FNAME(mark_pagetable_dirty)(vcpu->kvm, walker);
>          *guest_ent |= PT_DIRTY_MASK;
> -        rmap_add(vcpu, shadow_ent);
> +//      rmap_add(vcpu, shadow_ent);
>

??

> +
> +static void kvm_invalidatepage(struct page *page, unsigned long offset)
> +{
> +    /*
> +     * truncate_page is done after vcpu_free, that means all shadow page
> +     * table should be freed already, we should never get here
> +     */
> +    BUG();
> +}
>

Eventually we'll want to add support for invalidating a vm page, to support
ballooning and similar mechanisms.

-- 
error compiling committee.c: too many arguments to function