Re: unit tests and get_user_pages_ptes_fast()

2010-10-05 Thread Avi Kivity

 On 10/05/2010 01:59 AM, Marcelo Tosatti wrote:

Yep, the drawback is the unnecessary write fault. What i have here is:

--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -827,7 +827,7 @@ unsigned long gfn_to_hva(struct kvm *kvm
  }
  EXPORT_SYMBOL_GPL(gfn_to_hva);

-pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn, int *writable)
  {
 struct page *page[1];
 unsigned long addr;
@@ -842,8 +842,16 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t
 return page_to_pfn(bad_page);
 }

+   *writable = 1;
 npages = get_user_pages_fast(addr, 1, 1, page);

+   /* attempt to map read-only */
+   if (unlikely(npages != 1)) {
+   npages = get_user_pages_fast(addr, 1, 0, page);
+   if (npages == 1)
+   *writable = 0;
+   }
+
 if (unlikely(npages != 1)) {
 struct vm_area_struct *vma;

Can rebase and resend, if you'd like.



That will work for me but not for ksm.  I guess it's good to get things 
going, so please to post it.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unit tests and get_user_pages_ptes_fast()

2010-10-05 Thread Marcelo Tosatti
On Tue, Oct 05, 2010 at 09:36:59AM +0200, Avi Kivity wrote:
  On 10/05/2010 01:59 AM, Marcelo Tosatti wrote:
 Yep, the drawback is the unnecessary write fault. What i have here is:
 
 --- kvm.orig/virt/kvm/kvm_main.c
 +++ kvm/virt/kvm/kvm_main.c
 @@ -827,7 +827,7 @@ unsigned long gfn_to_hva(struct kvm *kvm
   }
   EXPORT_SYMBOL_GPL(gfn_to_hva);
 
 -pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 +pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn, int *writable)
   {
  struct page *page[1];
  unsigned long addr;
 @@ -842,8 +842,16 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t
  return page_to_pfn(bad_page);
  }
 
 +   *writable = 1;
  npages = get_user_pages_fast(addr, 1, 1, page);
 
 +   /* attempt to map read-only */
 +   if (unlikely(npages != 1)) {
 +   npages = get_user_pages_fast(addr, 1, 0, page);
 +   if (npages == 1)
 +   *writable = 0;
 +   }
 +
  if (unlikely(npages != 1)) {
  struct vm_area_struct *vma;
 
 Can rebase and resend, if you'd like.
 
 
 That will work for me but not for ksm.  I guess it's good to get
 things going, so please to post it.

It'll not be so advantageous for ksm because there should be read-faults
very rarely on that case.

Will post.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unit tests and get_user_pages_ptes_fast()

2010-10-05 Thread Andrea Arcangeli
On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote:
 It'll not be so advantageous for ksm because there should be read-faults
 very rarely on that case.

It'll also make all clean swapcache dirty for no good.

 Will post.

If we've to walk pagetables twice, why don't you do this:

writable=1
get_user_pages_fast(write=write_fault)
if (!write_fault)
   writable = __get_user_pages_fast(write=1)

That will solve the debugging knob and it'll solve ksm and it'll be
optimal for read swapins on exclusive clean swapcache too.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unit tests and get_user_pages_ptes_fast()

2010-10-05 Thread Avi Kivity

 On 10/05/2010 04:15 PM, Andrea Arcangeli wrote:

On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote:
  It'll not be so advantageous for ksm because there should be read-faults
  very rarely on that case.

It'll also make all clean swapcache dirty for no good.

  Will post.

If we've to walk pagetables twice, why don't you do this:

writable=1
get_user_pages_fast(write=write_fault)
if (!write_fault)
writable = __get_user_pages_fast(write=1)

That will solve the debugging knob and it'll solve ksm and it'll be
optimal for read swapins on exclusive clean swapcache too.


But it means an extra vmexit in the following case:

- read fault
- page is present and writeable in the Linux page table

which is very common.  For this you need get_user_pages_ptes_fast().

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unit tests and get_user_pages_ptes_fast()

2010-10-05 Thread Andrea Arcangeli
On Tue, Oct 05, 2010 at 04:25:05PM +0200, Avi Kivity wrote:
   On 10/05/2010 04:15 PM, Andrea Arcangeli wrote:
  On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote:
It'll not be so advantageous for ksm because there should be read-faults
very rarely on that case.
 
  It'll also make all clean swapcache dirty for no good.
 
Will post.
 
  If we've to walk pagetables twice, why don't you do this:
 
  writable=1
  get_user_pages_fast(write=write_fault)
  if (!write_fault)
  writable = __get_user_pages_fast(write=1)
 
  That will solve the debugging knob and it'll solve ksm and it'll be
  optimal for read swapins on exclusive clean swapcache too.
 
 But it means an extra vmexit in the following case:
 
 - read fault
 - page is present and writeable in the Linux page table
 
 which is very common.  For this you need get_user_pages_ptes_fast().

With a read fault, the VM already sets the pte as writable if the VM
permissions allows that and the page isn't shared (i.e. if it's an
exclusive swap page). We've just to check if it did that or not. So
when it's a read fault we've to run __get_user_pages_fast(write=1)
before we can assume the page is mapped writable in the pte.

So I don't see the problem... in terms of page faults is optimal. Only
downside is having to walk the pagetables twice, the second time to
verify if the first gup_fast has marked the host pte writable or not.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unit tests and get_user_pages_ptes_fast()

2010-10-05 Thread Avi Kivity

 On 10/05/2010 04:32 PM, Andrea Arcangeli wrote:

On Tue, Oct 05, 2010 at 04:25:05PM +0200, Avi Kivity wrote:
On 10/05/2010 04:15 PM, Andrea Arcangeli wrote:
On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote:
   It'll not be so advantageous for ksm because there should be 
read-faults
   very rarely on that case.
  
It'll also make all clean swapcache dirty for no good.
  
   Will post.
  
If we've to walk pagetables twice, why don't you do this:
  
writable=1
get_user_pages_fast(write=write_fault)
if (!write_fault)
writable = __get_user_pages_fast(write=1)
  
That will solve the debugging knob and it'll solve ksm and it'll be
optimal for read swapins on exclusive clean swapcache too.

  But it means an extra vmexit in the following case:

  - read fault
  - page is present and writeable in the Linux page table

  which is very common.  For this you need get_user_pages_ptes_fast().

With a read fault, the VM already sets the pte as writable if the VM
permissions allows that and the page isn't shared (i.e. if it's an
exclusive swap page). We've just to check if it did that or not. So
when it's a read fault we've to run __get_user_pages_fast(write=1)
before we can assume the page is mapped writable in the pte.

So I don't see the problem... in terms of page faults is optimal. Only
downside is having to walk the pagetables twice, the second time to
verify if the first gup_fast has marked the host pte writable or not.


You're right, I misread your pseudocode.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


unit tests and get_user_pages_ptes_fast()

2010-10-04 Thread Avi Kivity
 During the last kvm forum, I described a unit test framework that can 
help test the kvm APIs.  Briefly, it starts a process in host userspace, 
which sets up a memory slot mapping gpa 0:3G to hva 0:3G.  It then sets 
up guest registers for unpaged protected mode (or paged protected mode 
with 1:1 mapping).  The effect is that we have a 1:1 gva-hva 
translation, and can use KVM_RUN to run host code in guest mode.


There is a snag however.  kvm calls get_user_pages_fast(.write = 1), and 
the host process maps its code pages read-only.


The way I'd like to work around this is to map read-only accesses to 
read-only pages as read-only.  This also prevents ksm cow pages from 
being broken by read accesses.  It can also be used to get the page size 
for transparent huge pages (and later hugetlbfs too).


So, for a read fault we do:

  pte_t pte;
  get_user_pages_ptes_fast(..., page, pte, 1, .write = 0)
  ...
  if (pte_transhuge(pte)) // or however it's called
  ...
  ...
  if (pte_write(pte))
map writeable
  else
map readonly

Any snags?  or alternatives?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unit tests and get_user_pages_ptes_fast()

2010-10-04 Thread Andrea Arcangeli
Hi Avi,

On Mon, Oct 04, 2010 at 11:35:28AM +0200, Avi Kivity wrote:
   During the last kvm forum, I described a unit test framework that can 
 help test the kvm APIs.  Briefly, it starts a process in host userspace, 
 which sets up a memory slot mapping gpa 0:3G to hva 0:3G.  It then sets 
 up guest registers for unpaged protected mode (or paged protected mode 
 with 1:1 mapping).  The effect is that we have a 1:1 gva-hva 
 translation, and can use KVM_RUN to run host code in guest mode.
 
 There is a snag however.  kvm calls get_user_pages_fast(.write = 1), and 
 the host process maps its code pages read-only.
 
 The way I'd like to work around this is to map read-only accesses to 
 read-only pages as read-only.  This also prevents ksm cow pages from 
 being broken by read accesses.  It can also be used to get the page size 
 for transparent huge pages (and later hugetlbfs too).
 
 So, for a read fault we do:
 
pte_t pte;
get_user_pages_ptes_fast(..., page, pte, 1, .write = 0)
...
if (pte_transhuge(pte)) // or however it's called
...
...
if (pte_write(pte))
  map writeable
else
  map readonly
 
 Any snags?  or alternatives?

I think we tried to do this write=0 before, it provides benefits for
swapins from swapcache too (after the page is unmapped but the
swapcache is present and uptodate and clean). If I recall correctly
the only obstacle was the fact the out of sync code wants to map the
spte writable if it can, but if we use write=0 in gup, it won't know
if it can. If it'd wait a real write access from guest, we'd risk
creating a spte wrprotect fault for nothing (it should only write a
write access from the guest if the linux VM solves the page fault with
a readonly pte, even the minor-read-fault from exclusive swapcache
will not set the dirty bit but it will still set the pte_write on the
pte so the out of sync code should take advantage of that information
in the host pte). In the past you suggested some TRY_WRITE operation
instead of only read/write. We just need to pass the write bit of the
pte somehow to the gup caller.

Full agreement that once we use write=0 for read faults your problem
with the debugging userland running in guest mode will have better
chance to work too :).

Andrea
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unit tests and get_user_pages_ptes_fast()

2010-10-04 Thread Marcelo Tosatti
On Mon, Oct 04, 2010 at 03:40:52PM +0200, Andrea Arcangeli wrote:
 Hi Avi,
 
 On Mon, Oct 04, 2010 at 11:35:28AM +0200, Avi Kivity wrote:
During the last kvm forum, I described a unit test framework that can 
  help test the kvm APIs.  Briefly, it starts a process in host userspace, 
  which sets up a memory slot mapping gpa 0:3G to hva 0:3G.  It then sets 
  up guest registers for unpaged protected mode (or paged protected mode 
  with 1:1 mapping).  The effect is that we have a 1:1 gva-hva 
  translation, and can use KVM_RUN to run host code in guest mode.
  
  There is a snag however.  kvm calls get_user_pages_fast(.write = 1), and 
  the host process maps its code pages read-only.
  
  The way I'd like to work around this is to map read-only accesses to 
  read-only pages as read-only.  This also prevents ksm cow pages from 
  being broken by read accesses.  It can also be used to get the page size 
  for transparent huge pages (and later hugetlbfs too).
  
  So, for a read fault we do:
  
 pte_t pte;
 get_user_pages_ptes_fast(..., page, pte, 1, .write = 0)
 ...
 if (pte_transhuge(pte)) // or however it's called
 ...
 ...
 if (pte_write(pte))
   map writeable
 else
   map readonly
  
  Any snags?  or alternatives?
 
 I think we tried to do this write=0 before, it provides benefits for
 swapins from swapcache too (after the page is unmapped but the
 swapcache is present and uptodate and clean). If I recall correctly
 the only obstacle was the fact the out of sync code wants to map the
 spte writable if it can, but if we use write=0 in gup, it won't know
 if it can. If it'd wait a real write access from guest, we'd risk
 creating a spte wrprotect fault for nothing (it should only write a
 write access from the guest if the linux VM solves the page fault with
 a readonly pte, even the minor-read-fault from exclusive swapcache
 will not set the dirty bit but it will still set the pte_write on the
 pte so the out of sync code should take advantage of that information
 in the host pte). In the past you suggested some TRY_WRITE operation
 instead of only read/write. We just need to pass the write bit of the
 pte somehow to the gup caller.

Yep, the drawback is the unnecessary write fault. What i have here is:

--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -827,7 +827,7 @@ unsigned long gfn_to_hva(struct kvm *kvm
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);

-pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn, int *writable)
 {
struct page *page[1];
unsigned long addr;
@@ -842,8 +842,16 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t
return page_to_pfn(bad_page);
}

+   *writable = 1;
npages = get_user_pages_fast(addr, 1, 1, page);

+   /* attempt to map read-only */
+   if (unlikely(npages != 1)) {
+   npages = get_user_pages_fast(addr, 1, 0, page);
+   if (npages == 1)
+   *writable = 0;
+   }
+
if (unlikely(npages != 1)) {
struct vm_area_struct *vma;

Can rebase and resend, if you'd like.

 Full agreement that once we use write=0 for read faults your problem
 with the debugging userland running in guest mode will have better
 chance to work too :).
 
 Andrea
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html