Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-12-04 Thread Avi Kivity
On 12/03/2011 06:37 AM, Takuya Yoshikawa wrote:
 Avi Kivity a...@redhat.com wrote:
  That's true.  But some applications do require low latency, and the
  current code can impose a lot of time with the mmu spinlock held.
  
  The total amount of work actually increases slightly, from O(N) to O(N
  log N), but since the tree is so wide, the overhead is small.
  

 Controlling the latency can be achieved by making the user space limit
 the number of dirty pages to scan without hacking the core mmu code.

   The fact that we cannot transfer so many pages on the network at
   once suggests this is reasonable.

That is true.  Write protecting everything at once means that there is a
large window between the sampling the dirty log, and transferring the
page.  Any writes within that window cause a re-transfer, even when they
should not.


 With the rmap write protection method in KVM, the only thing we need is
 a new GET_DIRTY_LOG api which takes the [gfn_start, gfn_end] to scan,
 or max_write_protections optionally.

Right.


   I remember that someone suggested splitting the slot at KVM forum.
   Same effect with less effort.

 QEMU can also avoid unwanted page faults by using this api wisely.

   E.g. you can use this for Interactivity improvements TODO on
   KVM wiki, I think.

 Furthermore, QEMU may be able to use multiple threads for the memory
 copy task.

   Each thread has its own range of memory to copy, and does
   GET_DIRTY_LOG independently.  This will make things easy to
   add further optimizations in QEMU.

 In summary, my impression is that the main cause of the current latency
 problem is not KVM's write protection but the strategy which tries
 to process the whole large slot in one go.

 What do you think?

I agree.  Maybe O(1) write protection has a place, but it is secondary
to fine-grained dirty logging, and if we implement it, it should be
after your idea, and further measurements.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-12-02 Thread Takuya Yoshikawa
Avi Kivity a...@redhat.com wrote:
 That's true.  But some applications do require low latency, and the
 current code can impose a lot of time with the mmu spinlock held.
 
 The total amount of work actually increases slightly, from O(N) to O(N
 log N), but since the tree is so wide, the overhead is small.
 

Controlling the latency can be achieved by making the user space limit
the number of dirty pages to scan without hacking the core mmu code.

The fact that we cannot transfer so many pages on the network at
once suggests this is reasonable.

With the rmap write protection method in KVM, the only thing we need is
a new GET_DIRTY_LOG api which takes the [gfn_start, gfn_end] to scan,
or max_write_protections optionally.
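A user-space model can make the idea concrete.  Everything below is
hypothetical: the struct and field names (gfn_start, gfn_end,
max_write_protections) only illustrate the proposed interface, and the scan
is a toy stand-in for the kernel's rmap walk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument for a range-limited GET_DIRTY_LOG; not a real
 * KVM ioctl layout. */
struct dirty_log_range {
    uint64_t gfn_start;             /* first gfn to scan */
    uint64_t gfn_end;               /* one past the last gfn */
    uint64_t max_write_protections; /* cap on rmap work per call */
};

/*
 * Scan the dirty bitmap over the requested range only.  Each dirty
 * page found stands in for one rmap-based write protection; the cap
 * bounds how long the (real) mmu lock would be held.  Returns the
 * number of pages logged, clearing their dirty bits.
 */
static size_t get_dirty_log_range(uint8_t *bitmap,
                                  const struct dirty_log_range *r)
{
    size_t logged = 0;

    for (uint64_t gfn = r->gfn_start;
         gfn < r->gfn_end && logged < r->max_write_protections; gfn++) {
        uint8_t mask = 1u << (gfn & 7);

        if (bitmap[gfn >> 3] & mask) {
            bitmap[gfn >> 3] &= ~mask;  /* log + re-write-protect */
            logged++;
        }
    }
    return logged;
}
```

Capping the work per call bounds the time spent under the mmu lock; user
space simply calls again to pick up the rest of the range.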

I remember that someone suggested splitting the slot at KVM forum.
Same effect with less effort.

QEMU can also avoid unwanted page faults by using this api wisely.

E.g. you can use this for Interactivity improvements TODO on
KVM wiki, I think.

Furthermore, QEMU may be able to use multiple threads for the memory
copy task.

Each thread has its own range of memory to copy, and does
GET_DIRTY_LOG independently.  This will make it easy to
add further optimizations in QEMU.
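As a sketch of that idea (purely illustrative, not QEMU code; all names are
made up), each worker thread below owns a disjoint gfn range and scans its
share of a dirty bitmap independently:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Toy per-thread copy task: each worker owns a disjoint gfn range, as a
 * per-range GET_DIRTY_LOG caller would. */
struct copy_worker {
    const uint8_t *bitmap;  /* shared dirty bitmap, read-only here */
    uint64_t gfn_start;     /* inclusive */
    uint64_t gfn_end;       /* exclusive */
    size_t dirty_found;     /* out: pages this worker would copy */
};

static void *copy_range(void *arg)
{
    struct copy_worker *w = arg;

    for (uint64_t gfn = w->gfn_start; gfn < w->gfn_end; gfn++)
        if (w->bitmap[gfn >> 3] & (1u << (gfn & 7)))
            w->dirty_found++;   /* real code would transfer the page */
    return NULL;
}

/* Split [0, ngfns) across nthreads workers (<= 16) and run them. */
static size_t parallel_dirty_scan(const uint8_t *bitmap, uint64_t ngfns,
                                  size_t nthreads)
{
    pthread_t tids[16];
    struct copy_worker w[16] = { 0 };
    uint64_t per = ngfns / nthreads;
    size_t total = 0;

    for (size_t i = 0; i < nthreads; i++) {
        w[i].bitmap = bitmap;
        w[i].gfn_start = i * per;
        w[i].gfn_end = (i == nthreads - 1) ? ngfns : (i + 1) * per;
        pthread_create(&tids[i], NULL, copy_range, &w[i]);
    }
    for (size_t i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        total += w[i].dirty_found;
    }
    return total;
}
```

Because the ranges are disjoint, the workers need no locking among
themselves, which is what makes the per-range api attractive here.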

In summary, my impression is that the main cause of the current latency
problem is not KVM's write protection but the strategy which tries
to process the whole large slot in one go.

What do you think?

Takuya


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-12-01 Thread Avi Kivity
On 11/30/2011 09:03 AM, Xiao Guangrong wrote:
 On 11/29/2011 08:01 PM, Avi Kivity wrote:

  On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
  On 11/29/2011 07:20 PM, Avi Kivity wrote:
 
 
  We used to have a bitmap in a shadow page with a bit set for every slot
  pointed to by the page.  If we extend this to non-leaf pages (so, when
  we set a bit, we propagate it through its parent_ptes list), then we do
  the following on write fault:
 
 
 
  Thanks for the detail.
 
  Um, propagating slot bit to parent ptes is little slow, especially, it
  is the overload for no Xwindow guests which is dirty logged only in the
  migration(i guess most linux guests are running on this mode and migration
  is not frequent). No?
  
  You need to propagate very infrequently.  The first pte added to a page
  will need to propagate, but the second (if from the same slot, which is
  likely) will already have the bit set in the page, so we're assured it's
  set in all its parents.
  


 What will happen if a guest page is unmapped or a shadow page is zapped?
 It should immediately clear the slot bit of the shadow page and its
 parent, it means it should propagate this clear slot bit event to all
 parents, in the case of softmmu. zapping shadow page is frequently, maybe
 it is unacceptable?

You can keep the bit set.  A clear bit means there are exactly zero
pages from the slot in this mmu page or its descendents.  A set bit
means there are zero or more pages.  If we have a bit accidentally set,
nothing bad happens.
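A minimal model of this invariant, with made-up names: bits are set on the
path to the root when a pte is added, and deliberately never cleared on zap,
so a clear bit stays a reliable "exactly zero" answer while a set bit is only
a conservative "maybe":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the conservative slot bit.  Names are illustrative,
 * not KVM's. */
#define MAX_PAGES 8

struct mmu_page {
    uint32_t slot_bitmap;  /* bit i set => slot i may appear below */
    int parent;            /* index of parent page, -1 for the root */
};

static struct mmu_page pages[MAX_PAGES];

/* On mapping a pte for `slot` under `page`, set the bit and propagate
 * upward, stopping early once a parent already has it set. */
static void account_slot(int page, int slot)
{
    while (page >= 0 && !(pages[page].slot_bitmap & (1u << slot))) {
        pages[page].slot_bitmap |= 1u << slot;
        page = pages[page].parent;
    }
}

/* On zap we do NOT clear bits: a stale set bit is harmless, it only
 * costs an unnecessary descent when write protecting later. */
static bool may_contain(int page, int slot)
{
    return pages[page].slot_bitmap & (1u << slot);
}
```

The filter can give false positives after a zap, but never false negatives,
which is exactly the property the write protector needs.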

 It does not like unsync bit which can be lazily cleared, because all bits
 of hierarchy can be cleared when cr3 reload.

With tdp (and without nested virt) the mappings never change anyway. 
With shadow, they do change.  Not sure how to handle that at the higher
levels.



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-12-01 Thread Avi Kivity
On 11/30/2011 07:15 AM, Takuya Yoshikawa wrote:
 (2011/11/30 14:02), Takuya Yoshikawa wrote:

 IIUC, even though O(1) is O(1) at the timing of GET DIRTY LOG, it
 needs O(N) write protections with respect to the total number of
 dirty pages: distributed, but actually each page fault, which should
 be logged, does some write protection?

 Sorry, was not precise.  It depends on the level, and not completely
 distributed.  But I think it is O(N), and the total number of costs
 will not change so much, I guess.

That's true.  But some applications do require low latency, and the
current code can impose a lot of time with the mmu spinlock held.

The total amount of work actually increases slightly, from O(N) to O(N
log N), but since the tree is so wide, the overhead is small.



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Xiao Guangrong
Avi Kivity avi at redhat.com writes:

 
 On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
  (2011/11/14 21:39), Avi Kivity wrote:
  There was a patchset from Peter Zijlstra that converted mmu notifiers to
  be preemptible, with that, we can convert the mmu spinlock to a mutex,
  I'll see what happened to it.
 
  Interesting!
 
  There is a third method of doing write protection, and that is by
  write-protecting at the higher levels of the paging hierarchy.  The
  advantage there is that write protection is O(1) no matter how large the
  guest is, or the number of dirty pages.
 
  To write protect all guest memory, we just write protect the 512 PTEs at
  the very top, and leave the rest alone.  When the guest writes to a
  page, we allow writes for the top-level PTE that faulted, and
  write-protect all the PTEs that it points to.
 
  One important point is that the guest, not GET DIRTY LOG caller, will pay
  for the write protection at the timing of faults.
 
 I don't think there is a significant difference.  The number of write
 faults does not change.  The amount of work done per fault does, but not
 by much, thanks to the writeable bitmap.
 

Avi,

I think it needs more thought if only a few pages need to be write protected.

For example, with a framebuffer-based device used by Xwindow, only the ~64M of
framebuffer pages needs to be write protected, but in your way the guest will
take write page faults on all of memory? Hmm?

Or is there some trick that I missed?



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Xiao Guangrong
Sorry, CC list is lost. :(

On 11/29/2011 06:01 PM, Xiao Guangrong wrote:

 Avi Kivity avi at redhat.com writes:
 

 On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
 (2011/11/14 21:39), Avi Kivity wrote:
 There was a patchset from Peter Zijlstra that converted mmu notifiers to
 be preemptible, with that, we can convert the mmu spinlock to a mutex,
 I'll see what happened to it.

 Interesting!

 There is a third method of doing write protection, and that is by
 write-protecting at the higher levels of the paging hierarchy.  The
 advantage there is that write protection is O(1) no matter how large the
 guest is, or the number of dirty pages.

 To write protect all guest memory, we just write protect the 512 PTEs at
 the very top, and leave the rest alone.  When the guest writes to a
 page, we allow writes for the top-level PTE that faulted, and
 write-protect all the PTEs that it points to.

 One important point is that the guest, not GET DIRTY LOG caller, will pay
 for the write protection at the timing of faults.

 I don't think there is a significant difference.  The number of write
 faults does not change.  The amount of work done per fault does, but not
 by much, thanks to the writeable bitmap.

 
 Avi,
 
 I think it needs more thinking if only less page need be write protected.
 
 For example, framebuffer-based device used by Xwindow, only ~64M pages needs
 to be write protected, but in your way, guest will get write page fault on all
 memory? Hmm?
 
 It has some tricks but i missed?
 
 
 




Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Takuya Yoshikawa

(2011/11/29 19:09), Xiao Guangrong wrote:

Sorry, CC list is lost. :(

On 11/29/2011 06:01 PM, Xiao Guangrong wrote:


Avi Kivity avi at redhat.com writes:



On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:

(2011/11/14 21:39), Avi Kivity wrote:

There was a patchset from Peter Zijlstra that converted mmu notifiers to
be preemptible, with that, we can convert the mmu spinlock to a mutex,
I'll see what happened to it.


Interesting!


There is a third method of doing write protection, and that is by
write-protecting at the higher levels of the paging hierarchy.  The
advantage there is that write protection is O(1) no matter how large the
guest is, or the number of dirty pages.

To write protect all guest memory, we just write protect the 512 PTEs at
the very top, and leave the rest alone.  When the guest writes to a
page, we allow writes for the top-level PTE that faulted, and
write-protect all the PTEs that it points to.


One important point is that the guest, not GET DIRTY LOG caller, will pay
for the write protection at the timing of faults.


I don't think there is a significant difference.  The number of write
faults does not change.  The amount of work done per fault does, but not
by much, thanks to the writeable bitmap.



Avi,

I think it needs more thinking if only less page need be write protected.

For example, framebuffer-based device used by Xwindow, only ~64M pages needs
to be write protected, but in your way, guest will get write page fault on all
memory? Hmm?

It has some tricks but i missed?


Do you mean write protecting slot by slot is difficult in the case of O(1)?

Takuya


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Xiao Guangrong
On 11/29/2011 07:20 PM, Avi Kivity wrote:


 We used to have a bitmap in a shadow page with a bit set for every slot
 pointed to by the page.  If we extend this to non-leaf pages (so, when
 we set a bit, we propagate it through its parent_ptes list), then we do
 the following on write fault:
 


Thanks for the detail.

Um, propagating the slot bit to the parent ptes is a little slow; especially,
it is overhead for non-Xwindow guests which are dirty logged only during
migration (I guess most Linux guests run in this mode, and migration is not
frequent). No?



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Avi Kivity
On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
 On 11/29/2011 07:20 PM, Avi Kivity wrote:


  We used to have a bitmap in a shadow page with a bit set for every slot
  pointed to by the page.  If we extend this to non-leaf pages (so, when
  we set a bit, we propagate it through its parent_ptes list), then we do
  the following on write fault:
  


 Thanks for the detail.

 Um, propagating slot bit to parent ptes is little slow, especially, it
 is the overload for no Xwindow guests which is dirty logged only in the
 migration(i guess most linux guests are running on this mode and migration
 is not frequent). No?

You need to propagate very infrequently.  The first pte added to a page
will need to propagate, but the second (if from the same slot, which is
likely) will already have the bit set in the page, so we're assured it's
set in all its parents.
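The amortization can be sketched with a step counter (a user-space toy with
hypothetical names): the first pte from a slot pays a walk to the root, and
later ptes from the same slot stop at the first level that already has the
bit:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of set-bit propagation along parent links.  The returned
 * step count shows the amortized cost. */
enum { NPAGES = 4 };

static uint32_t slot_bits[NPAGES];                 /* per-page slot bitmap */
static const int parent[NPAGES] = { -1, 0, 1, 1 }; /* page 0 is the root */

static int propagate_slot_bit(int page, int slot)
{
    int steps = 0;

    /* Stop as soon as a level already has the bit: all of its own
     * parents are then guaranteed to have it too. */
    for (; page >= 0 && !(slot_bits[page] & (1u << slot));
         page = parent[page]) {
        slot_bits[page] |= 1u << slot;
        steps++;
    }
    return steps;
}
```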



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Avi Kivity
On 11/29/2011 02:01 PM, Avi Kivity wrote:
 On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
  On 11/29/2011 07:20 PM, Avi Kivity wrote:
 
 
   We used to have a bitmap in a shadow page with a bit set for every slot
   pointed to by the page.  If we extend this to non-leaf pages (so, when
   we set a bit, we propagate it through its parent_ptes list), then we do
   the following on write fault:
   
 
 
  Thanks for the detail.
 
  Um, propagating slot bit to parent ptes is little slow, especially, it
  is the overload for no Xwindow guests which is dirty logged only in the
  migration(i guess most linux guests are running on this mode and migration
  is not frequent). No?

 You need to propagate very infrequently.  The first pte added to a page
 will need to propagate, but the second (if from the same slot, which is
 likely) will already have the bit set in the page, so we're assured it's
 set in all its parents.

btw, if you plan to work on this, let's agree on pseudocode/data
structures first to minimize churn.  I'll also want this documented in
mmu.txt.  Of course we can still end up with something different than
planned, but let's at least try to think of the issues in advance.



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Takuya Yoshikawa

CCing qemu devel, Juan,

(2011/11/29 23:03), Avi Kivity wrote:

On 11/29/2011 02:01 PM, Avi Kivity wrote:

On 11/29/2011 01:56 PM, Xiao Guangrong wrote:

On 11/29/2011 07:20 PM, Avi Kivity wrote:



We used to have a bitmap in a shadow page with a bit set for every slot
pointed to by the page.  If we extend this to non-leaf pages (so, when
we set a bit, we propagate it through its parent_ptes list), then we do
the following on write fault:




Thanks for the detail.

Um, propagating slot bit to parent ptes is little slow, especially, it
is the overload for no Xwindow guests which is dirty logged only in the
migration(i guess most linux guests are running on this mode and migration
is not frequent). No?


You need to propagate very infrequently.  The first pte added to a page
will need to propagate, but the second (if from the same slot, which is
likely) will already have the bit set in the page, so we're assured it's
set in all its parents.


btw, if you plan to work on this, let's agree on pseudocode/data
structures first to minimize churn.  I'll also want this documented in
mmu.txt.  Of course we can still end up with something different than
planned, but let's at least try to think of the issues in advance.



I want to hear the overall view as well.

Now we are trying to improve cases when there are too many dirty pages during
live migration.

I did some measurements of live migration some months ago on a 10Gbps
dedicated line, two servers directly connected, and checked that transferring
only a few MBs of memory took ms order of latency, even if I excluded other
QEMU side overheads: it matches simple math calculation.

In another test, I found that even in a relatively normal workload, it needed
a few seconds of pause at the last timing.

Juan has more data?

So, the current scheme is not scalable with respect to the number of dirty
pages, and administrators should control not to migrate during such workloads
if possible.

Server consolidation at night will be OK, but dynamic load balancing
may not work well under such restrictions: I am now more interested in the
former.

Then, taking that in mind, I put the goal on 1K dirty pages, 4MB memory, when
I did the rmap optimization.  Now it takes a few ms or so for write protecting
such a number of pages, IIRC: that is not so bad compared to the overall latency?

So, though I like the O(1) method, I want to hear the expected improvements in
a bit more detail, if possible.

IIUC, even though O(1) is O(1) at the timing of GET DIRTY LOG, it needs O(N)
write protections with respect to the total number of dirty pages: distributed,
but actually each page fault, which should be logged, does some write protection?

In general, what kind of improvements are actually needed for live migration?

Thanks,
Takuya


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Takuya Yoshikawa

(2011/11/30 14:02), Takuya Yoshikawa wrote:


IIUC, even though O(1) is O(1) at the timing of GET DIRTY LOG, it needs O(N)
write protections with respect to the total number of dirty pages: distributed,
but actually each page fault, which should be logged, does some write protection?


Sorry, I was not precise.  It depends on the level, and it is not completely
distributed.  But I think it is O(N), and the total cost will not change so
much, I guess.

Takuya



In general, what kind of improvements actually needed for live migration?

Thanks,
Takuya


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Xiao Guangrong
On 11/29/2011 08:01 PM, Avi Kivity wrote:

 On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
 On 11/29/2011 07:20 PM, Avi Kivity wrote:


 We used to have a bitmap in a shadow page with a bit set for every slot
 pointed to by the page.  If we extend this to non-leaf pages (so, when
 we set a bit, we propagate it through its parent_ptes list), then we do
 the following on write fault:



 Thanks for the detail.

 Um, propagating slot bit to parent ptes is little slow, especially, it
 is the overload for no Xwindow guests which is dirty logged only in the
 migration(i guess most linux guests are running on this mode and migration
 is not frequent). No?
 
 You need to propagate very infrequently.  The first pte added to a page
 will need to propagate, but the second (if from the same slot, which is
 likely) will already have the bit set in the page, so we're assured it's
 set in all its parents.
 


What will happen if a guest page is unmapped or a shadow page is zapped?
It should immediately clear the slot bit of the shadow page and its
parents, which means it has to propagate this clear-slot-bit event to all
parents in the softmmu case.  Zapping shadow pages happens frequently, so
maybe that is unacceptable?

It is not like the unsync bit, which can be lazily cleared because all bits
in the hierarchy can be cleared on cr3 reload.



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-29 Thread Xiao Guangrong
On 11/29/2011 10:03 PM, Avi Kivity wrote:

 On 11/29/2011 02:01 PM, Avi Kivity wrote:
 On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
 On 11/29/2011 07:20 PM, Avi Kivity wrote:


 We used to have a bitmap in a shadow page with a bit set for every slot
 pointed to by the page.  If we extend this to non-leaf pages (so, when
 we set a bit, we propagate it through its parent_ptes list), then we do
 the following on write fault:



 Thanks for the detail.

 Um, propagating slot bit to parent ptes is little slow, especially, it
 is the overload for no Xwindow guests which is dirty logged only in the
 migration(i guess most linux guests are running on this mode and migration
 is not frequent). No?

 You need to propagate very infrequently.  The first pte added to a page
 will need to propagate, but the second (if from the same slot, which is
 likely) will already have the bit set in the page, so we're assured it's
 set in all its parents.
 
 btw, if you plan to work on this, let's agree on pseudocode/data
 structures first to minimize churn.  I'll also want this documented in
 mmu.txt.  Of course we can still end up with something different than
 planned, but let's at least try to think of the issues in advance.
 


Yeap, this work is interesting, I will keep researching it. ;)



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-17 Thread Avi Kivity
On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
 This is a revised version of my previous work.  I hope that
 the patches are more self explanatory than before.



Thanks, applied.



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-16 Thread Takuya Yoshikawa

Adding qemu-devel to Cc.

(2011/11/14 21:39), Avi Kivity wrote:

On 11/14/2011 12:56 PM, Takuya Yoshikawa wrote:

(2011/11/14 19:25), Avi Kivity wrote:

On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:

This is a revised version of my previous work.  I hope that
the patches are more self explanatory than before.



It looks good.  I'll let Marcelo (or anyone else?) review it as well
before applying.

Do you have performance measurements?



For VGA, 30-40us became 3-5us when the display was quiet, with a
sufficiently warmed-up guest.



That's a nice improvement.


Near the criterion, the number was not different much from the
original version.

For live migration, I forgot the number but the result was good.
But my test case was not enough to cover every pattern, so I changed
the criterion to be a bit conservative.

 More tests may be able to find a better criterion.
 I am not in a hurry about this, so it is OK to add some tests
 before merging this.


I think we can merge it as is; it's clear we get an improvement.



I did a simple test to show numbers!

Here, a 4GB guest was being migrated locally during copying a file in it.


Case 1 corresponds to the original method, and case 2 to the optimized one.

Small numbers are, probably, from VGA:

Case 1. about 30us
Case 2. about 3us

Other numbers are from the system RAM (triggered by live migration):

Case 1. about 500us, 2000us
Case 2. about  80us, 2000us (not exactly averaged, see below for details)
        * 2000us was when rmap was not used, so equal to that of case 1.

So I can say that my patch worked well for both VGA and live migration.

Takuya


=== measurement snippet ===

Case 1. kvm_mmu_slot_remove_write_access() only (same as the original method):

 qemu-system-x86-25413 [000]  6546.215009: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.215010: funcgraph_entry: ! 2039.512 us    |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.217051: funcgraph_exit:  ! 2040.487 us    |  }
 qemu-system-x86-25413 [002]  6546.217347: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [002]  6546.217349: funcgraph_entry: ! 571.121 us     |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [002]  6546.217921: funcgraph_exit:  ! 572.525 us     |  }
 qemu-system-x86-25413 [000]  6546.314583: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.314585: funcgraph_entry: + 29.598 us      |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.314616: funcgraph_exit:  + 31.053 us      |  }
 qemu-system-x86-25413 [000]  6546.314784: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.314785: funcgraph_entry: ! 2002.591 us    |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.316788: funcgraph_exit:  ! 2003.537 us    |  }
 qemu-system-x86-25413 [000]  6546.317082: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.317083: funcgraph_entry: ! 624.445 us     |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.317709: funcgraph_exit:  ! 625.861 us     |  }
 qemu-system-x86-25413 [000]  6546.414261: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.414263: funcgraph_entry: + 29.593 us      |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.414293: funcgraph_exit:  + 30.944 us      |  }
 qemu-system-x86-25413 [000]  6546.414528: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.414529: funcgraph_entry: ! 1990.363 us    |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.416520: funcgraph_exit:  ! 1991.370 us    |  }
 qemu-system-x86-25413 [000]  6546.416775: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.416776: funcgraph_entry: ! 594.333 us     |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.417371: funcgraph_exit:  ! 595.415 us     |  }
 qemu-system-x86-25413 [000]  6546.514133: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.514135: funcgraph_entry: + 24.032 us      |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.514160: funcgraph_exit:  + 25.074 us      |  }
 qemu-system-x86-25413 [000]  6546.514312: funcgraph_entry:                  |  write_protect_slot() {
 qemu-system-x86-25413 [000]  6546.514313: funcgraph_entry: ! 2035.365 us    |    kvm_mmu_slot_remove_write_access();
 qemu-system-x86-25413 [000]  6546.516349: funcgraph_exit:  ! 2036.298 us    |  }
 qemu-system-x86-25413 [000]  6546.516642: funcgraph_entry:                  |  write_protect_slot() {

Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-16 Thread Avi Kivity
On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
 (2011/11/14 21:39), Avi Kivity wrote:
 There was a patchset from Peter Zijlstra that converted mmu notifiers to
 be preemptible, with that, we can convert the mmu spinlock to a mutex,
 I'll see what happened to it.

 Interesting!

 There is a third method of doing write protection, and that is by
 write-protecting at the higher levels of the paging hierarchy.  The
 advantage there is that write protection is O(1) no matter how large the
 guest is, or the number of dirty pages.

 To write protect all guest memory, we just write protect the 512 PTEs at
 the very top, and leave the rest alone.  When the guest writes to a
 page, we allow writes for the top-level PTE that faulted, and
 write-protect all the PTEs that it points to.

 One important point is that the guest, not GET DIRTY LOG caller, will pay
 for the write protection at the timing of faults.

I don't think there is a significant difference.  The number of write
faults does not change.  The amount of work done per fault does, but not
by much, thanks to the writeable bitmap.



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-15 Thread Takuya Yoshikawa

(2011/11/14 21:39), Avi Kivity wrote:

There was a patchset from Peter Zijlstra that converted mmu notifiers to
be preemptible, with that, we can convert the mmu spinlock to a mutex,
I'll see what happened to it.


Interesting!


There is a third method of doing write protection, and that is by
write-protecting at the higher levels of the paging hierarchy.  The
advantage there is that write protection is O(1) no matter how large the
guest is, or the number of dirty pages.

To write protect all guest memory, we just write protect the 512 PTEs at
the very top, and leave the rest alone.  When the guest writes to a
page, we allow writes for the top-level PTE that faulted, and
write-protect all the PTEs that it points to.
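The scheme quoted above can be modeled with a toy two-level hierarchy
(illustrative only, not KVM code): write protecting everything touches just
the 512 top-level entries, and each fault pushes protection one level down:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy two-level "paging hierarchy" showing O(1) write protection. */
enum { ENTRIES = 512 };

static bool top_wp[ENTRIES];            /* top-level write protect bits */
static bool leaf_wp[ENTRIES][ENTRIES];  /* second-level bits */

/* O(1) in guest size: only the top level is touched. */
static void write_protect_all(void)
{
    for (int i = 0; i < ENTRIES; i++)
        top_wp[i] = true;
}

/* On a write fault through top entry i: re-allow writes at the top,
 * and write protect the 512 entries it points to instead. */
static void push_down_protection(int i)
{
    top_wp[i] = false;
    for (int j = 0; j < ENTRIES; j++)
        leaf_wp[i][j] = true;
}
```

The cost of protection thus moves from the GET_DIRTY_LOG call to the guest's
fault path, which is the trade-off discussed in this thread.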


One important point is that the guest, not the GET_DIRTY_LOG caller, will pay
for the write protection at the time of the faults.

For live migration, it may be good because we have to make the guest memory
converge anyway.


We can combine it with your method by having a small bitmap (say, just
64 bits) per shadow page.  Each bit represents 8 PTEs (total 512 PTEs)
and is set if any of those PTEs are writeable.
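That writeable bitmap can be sketched as follows (hypothetical helpers, not
KVM code): one 64-bit word per shadow page, where bit i/8 summarizes ptes
8*(i/8) .. 8*(i/8)+7; a clear bit lets write protection skip eight ptes at
once, while a set bit only means "some of the eight may be writeable":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64 bits cover 512 ptes: each bit summarizes a group of 8. */
static inline void mark_pte_writeable(uint64_t *wbitmap,
                                      unsigned pte_index)
{
    *wbitmap |= UINT64_C(1) << (pte_index / 8);
}

static inline bool group_may_be_writeable(uint64_t wbitmap,
                                          unsigned pte_index)
{
    return wbitmap & (UINT64_C(1) << (pte_index / 8));
}
```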


Yes, there seem to be some good ways to make every case work well.

Takuya


[PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-14 Thread Takuya Yoshikawa
This is a revised version of my previous work.  I hope that
the patches are more self explanatory than before.

Thanks,
Takuya


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-14 Thread Avi Kivity
On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
 This is a revised version of my previous work.  I hope that
 the patches are more self explanatory than before.


It looks good.  I'll let Marcelo (or anyone else?) review it as well
before applying.

Do you have performance measurements?



Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-14 Thread Takuya Yoshikawa

(2011/11/14 19:25), Avi Kivity wrote:

On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:

This is a revised version of my previous work.  I hope that
the patches are more self explanatory than before.



It looks good.  I'll let Marcelo (or anyone else?) review it as well
before applying.

Do you have performance measurements?



For VGA, 30-40us became 3-5us when the display was quiet, with a
sufficiently warmed-up guest.

Near the criterion, the numbers did not differ much from the
original version.

For live migration, I forget the exact numbers, but the result was good.
My test cases did not cover every pattern, though, so I made the
criterion a bit conservative.

More testing may find a better criterion.
I am not in a hurry, so it is fine to add some tests
before merging this.

What I did not like about the original version was holding the
mmu spinlock for 100us or more.  With this version, at least, that
problem should be mitigated somewhat.

Takuya


One note:

kvm-unit-tests' dirty logging test was broken on 32-bit boxes
(compile error).
I changed an idt to boot_idt to get it building.

I do not know kvm-unit-tests well, so I would like somebody to fix
that officially.


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-11-14 Thread Avi Kivity
On 11/14/2011 12:56 PM, Takuya Yoshikawa wrote:
 (2011/11/14 19:25), Avi Kivity wrote:
 On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
 This is a revised version of my previous work.  I hope that
 the patches are more self explanatory than before.


 It looks good.  I'll let Marcelo (or anyone else?) review it as well
 before applying.

 Do you have performance measurements?


 For VGA, 30-40us became 3-5us when the display was quiet, with a
 sufficiently warmed-up guest.


That's a nice improvement.

 Near the criterion, the numbers did not differ much from the
 original version.

 For live migration, I forget the exact numbers, but the result was good.
 My test cases did not cover every pattern, though, so I made the
 criterion a bit conservative.

 More testing may find a better criterion.
 I am not in a hurry, so it is fine to add some tests
 before merging this.

I think we can merge it as is; it's clear we get an improvement.


 What I did not like about the original version was holding the
 mmu spinlock for 100us or more.  With this version, at least, that
 problem should be mitigated somewhat.

There was a patchset from Peter Zijlstra that converted mmu notifiers to
be preemptible; with that, we can convert the mmu spinlock to a mutex.
I'll see what happened to it.

There is a third method of doing write protection, and that is by
write-protecting at the higher levels of the paging hierarchy.  The
advantage there is that write protection is O(1) no matter how large the
guest is, or the number of dirty pages.

To write protect all guest memory, we just write protect the 512 PTEs at
the very top, and leave the rest alone.  When the guest writes to a
page, we allow writes for the top-level PTE that faulted, and
write-protect all the PTEs that it points to.

We can combine it with your method by having a small bitmap (say, just
64 bits) per shadow page.  Each bit represents 8 PTEs (total 512 PTEs)
and is set if any of those PTEs are writeable.


 Takuya


 One note:

 kvm-unit-tests' dirty logging test was broken on 32-bit boxes
 (compile error).
 I changed an idt to boot_idt to get it building.

 I do not know kvm-unit-tests well, so I would like somebody to fix
 that officially.

I'll look into it.
