Shaohua Li wrote:
> Hi,
> I saw some discussions on the topic but no progress. I did an
> experiment to make guest pages be allocated dynamically and swapped
> out; please see the attached patches. They're not yet ready to merge,
> but I'd like to get some suggestions and help. The patches (against
> kvm-19) work here but may not be very stable, as there are probably
> some locking issues on the swapout path, which I'll check later. If
> you are brave, please try :).

Nice work.  This is fairly different from what I had in mind - I wanted 
to use regular address spaces in kvm, whereas this patchset adds swapout 
capability to the kvm address space.

Differences between the two approaches include:

- yours is probably simpler :)
- possibly less intrusive mm changes when using regular address spaces
- automatic hugetlbfs support (this was my main motivation for generic 
address spaces, esp. with npt/ept).  Of course hugetlbfs can be 
implemented with your approach as well
- your approach allows kvm to continue using page->private, so it saves 
memory and requires fewer changes to kvm
- using Linux address spaces allows paging to file-backed storage, not 
just swap

Ultimately I think the balance is in favor of your approach, as it is 
more tightly coupled with kvm and can therefore be faster.  The 
simplicity also helps a lot.

> Some
> issues I have:
> 1. there is a spinlock to protect the kvm struct, and we can't sleep
> while holding it. A possible solution is to do a 'release lock, sleep
> and retry', but the shadow page fault path doesn't look easy to adapt
> to that. The spinlock also prevents the vcpu from migrating to other
> cpus, as vmx operations must be done on the cpu the vcpu runs on. I
> changed it to a semaphore plus a cpu affinity setting. It's a little
> hacky; I'd like to see if there are better approaches.

My plan is to teach the scheduler about kvm, so it can call a callback 
when a vcpu is migrated.  That will allow re-enabling preemption in all 
kvm code except the actual entry/exit sequence.  This is an improvement 
all around (for realtime, for easier coding, for latency), so I hope to 
do it soon.
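
To make this concrete, a rough sketch of the hook I have in mind (all
names here are made up; no such scheduler interface exists today):

	/* hypothetical per-task notifier invoked by the scheduler */
	struct vcpu_sched_notifier {
		void (*sched_out)(struct vcpu_sched_notifier *n);
		void (*sched_in)(struct vcpu_sched_notifier *n, int cpu);
	};

	static void vcpu_sched_in(struct vcpu_sched_notifier *n, int cpu)
	{
		struct kvm_vcpu *vcpu = container_of(n, struct kvm_vcpu,
						     sched_notifier);

		/* reload guest state (vmcs etc.) on the new cpu */
		kvm_ops->vcpu_load(vcpu);
	}

With callbacks like these, the vmcs can follow the task around, and kvm
code outside the entry/exit sequence can run with preemption enabled.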

> 2. Linux page reclaim can't tell whether a guest page is referenced
> often. My current patch just blindly adds guest pages to the lru; it's
> not optimized.

Well, that will always be a problem with paging guest memory.  There are 
some patches floating around, for s390, that let the guest give the host 
hints about page recency; they may help.

> 3. kvm_ops.tlb_flush should really send an IPI to make the vcpu flush
> its tlb, as it might be called on a cpu other than the one the vcpu
> runs on. This prevents the swapout path from zapping shadow page
> tables. My patch just skips any guest page that a shadow page table
> points to. I assume kvm smp guest support will improve tlb_flush.
>

Yes.  The apic patchset includes mechanisms for interrupting a running 
vcpu which can be used for this.
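
Roughly like this (a sketch only; the request bit and fields are
invented, and smp_call_function_single()'s signature varies between
kernel versions):

	static void ack_flush(void *info)
	{
		/* the IPI itself forces a vmexit; the vcpu then sees
		 * the pending request before re-entering the guest */
	}

	static void kvm_flush_remote_tlb(struct kvm_vcpu *vcpu)
	{
		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
		smp_mb();
		if (vcpu->guest_mode && vcpu->cpu != raw_smp_processor_id())
			smp_call_function_single(vcpu->cpu, ack_flush,
						 NULL, 1);
	}

The vcpu flushes its own tlb on the next guest entry, so the swapout
path can zap shadow pages from any cpu.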

> @@ -151,9 +151,8 @@
>               walker->inherited_ar &= walker->table[index];
>               table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>               paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
> -             kunmap_atomic(walker->table, KM_USER0);
> -             walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
> -                                         KM_USER0);
> +             kunmap(walker->table);
> +             walker->table = kmap(pfn_to_page(paddr >> PAGE_SHIFT));
>   

kunmap() wants a struct page IIRC.  It's also much slower than the 
atomic variant on i386+HIGHMEM, so I'd rather avoid it.
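
For reference, the two interfaces take different arguments, which is
easy to trip over:

	void *va;

	va = kmap(page);
	...
	kunmap(page);			/* takes the struct page */

	va = kmap_atomic(page, KM_USER0);
	...
	kunmap_atomic(va, KM_USER0);	/* takes the mapped address */

If the problem is sleeping while the atomic mapping is held, it may be
better to drop the mapping around the sleep and redo it afterwards than
to switch to kmap() wholesale.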

> @@ -1099,11 +1121,23 @@
>       }
>  }
>  
> +static void mmu_zap_active_pages(struct kvm_vcpu *vcpu)
> +{
> +     struct kvm_mmu_page *page;
> +
> +     while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
> +             page = container_of(vcpu->kvm->active_mmu_pages.next,
> +                                 struct kvm_mmu_page, link);
> +             kvm_mmu_zap_page(vcpu, page);
> +     }
> +}
> +
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
>  {
>       int r;
>  
>       destroy_kvm_mmu(vcpu);
> +     mmu_zap_active_pages(vcpu);
>       r = init_kvm_mmu(vcpu);
>       if (r < 0)
>               goto out;
>   

This is called from set_cr0(), which can be called fairly often.  However, 
I think it can be qualified to run only when the paging-related bits 
change.
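
Something like this in set_cr0() would avoid most of the zaps (a
sketch; check whether these are really the only bits that matter):

	/* only rebuild the shadow mmu when the paging setup changes */
	if ((cr0 ^ vcpu->cr0) & (CR0_PE_MASK | CR0_PG_MASK | CR0_WP_MASK))
		kvm_mmu_reset_context(vcpu);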

> Index: kvm/kernel/paging_tmpl.h
> ===================================================================
> --- kvm.orig/kernel/paging_tmpl.h     2007-05-21 09:20:11.000000000 +0800
> +++ kvm/kernel/paging_tmpl.h  2007-05-21 09:20:26.000000000 +0800
> @@ -369,7 +369,7 @@
>       *shadow_ent |= PT_WRITABLE_MASK;
>       FNAME(mark_pagetable_dirty)(vcpu->kvm, walker);
>       *guest_ent |= PT_DIRTY_MASK;
> -     rmap_add(vcpu, shadow_ent);
> +//   rmap_add(vcpu, shadow_ent);
>   

??

> +
> +static void kvm_invalidatepage(struct page *page, unsigned long offset)
> +{
> +     /*
> +      * truncate_page is done after vcpu_free, which means all shadow
> +      * page tables should already be freed; we should never get here
> +      */
> +     BUG();
> +}
>   

Eventually we'll want to add support for invalidating a vm page, to 
support ballooning and similar mechanisms.
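
Very roughly, something like this (the lookup helpers are invented; we
would need a way to get from the struct page back to the kvm instance
and the gfn):

	static void kvm_invalidatepage(struct page *page,
				       unsigned long offset)
	{
		struct kvm *kvm = page_to_kvm(page);	/* hypothetical */
		gfn_t gfn = page_to_gfn(page);		/* hypothetical */

		spin_lock(&kvm->lock);
		/* drop every shadow pte mapping this gfn via the rmap,
		 * then kick the vcpus so no stale translation survives */
		kvm_mmu_zap_gfn(kvm, gfn);		/* hypothetical */
		kvm_flush_remote_tlb_all(kvm);		/* needs the ipi */
		spin_unlock(&kvm->lock);
	}

A balloon driver in the guest could then return pages by having the
host invalidate them.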


-- 
error compiling committee.c: too many arguments to function

