Re: [PATCH 4/5] KVM: PPC: Book3s HV: Implement get_dirty_log using hardware changed bit

2011-12-25 Thread Takuya Yoshikawa

(2011/12/26 8:35), Paul Mackerras wrote:

On Fri, Dec 23, 2011 at 02:23:30PM +0100, Alexander Graf wrote:


So if I read things correctly, this is the only case you're setting
pages as dirty. What if you have the following:

   guest adds HTAB entry x
   guest writes to page mapped by x
   guest removes HTAB entry x
   host fetches dirty log


In that case the dirtiness is preserved in the setting of the
KVMPPC_RMAP_CHANGED bit in the rmap entry.  kvm_test_clear_dirty()
returns 1 if that bit is set (and clears it).  Using the rmap entry
for this is convenient because (a) we also use it for saving the
referenced bit when a HTAB entry is removed, and we can transfer both
R and C over in one operation; (b) we need to be able to save away the
C bit in real mode, and we already need to get the real-mode address
of the rmap entry -- if we wanted to save it in a dirty bitmap we'd
have to do an extra translation to get the real-mode address of the
dirty bitmap word; (c) to avoid SMP races, if we were asynchronously
setting bits in the dirty bitmap we'd have to do the double-buffering
thing that x86 does, which seems more complicated than using the rmap
entry (which we already have a lock bit for).
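In rough code terms, the mechanism Paul describes amounts to the following sketch (the name and bit position are illustrative, not the exact kernel definitions): the rmap entry accumulates a CHANGED bit when an HTAB entry is removed, so dirtiness survives the unmapping in Alex's scenario.

```c
/* Illustrative sketch of the CHANGED-bit bookkeeping described above.
 * RMAP_CHANGED_BIT is a made-up bit position, not KVMPPC_RMAP_CHANGED. */
#define RMAP_CHANGED_BIT (1UL << 1)

static int test_clear_dirty(unsigned long *rmapp)
{
    if (*rmapp & RMAP_CHANGED_BIT) {
        *rmapp &= ~RMAP_CHANGED_BIT;    /* clear, as the real helper does */
        return 1;                       /* page(s) under this rmap were dirty */
    }
    return 0;
}
```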


From my x86 dirty logging experience I have a concern about your code:
it looks slow even when there are no or few dirty pages in the slot.

+   for (i = 0; i < memslot->npages; ++i) {
+   if (kvm_test_clear_dirty(kvm, rmapp))
+   __set_bit_le(i, map);
+   ++rmapp;
+   }

The check is done for each page, and this can be very expensive because
the number of pages is not small.

When we scan the dirty_bitmap, 64 pages are checked at once, so the
problem is not so significant.
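The point about checking 64 pages at once can be sketched like this (illustrative user-space code, not kernel code): scanning the bitmap one long at a time lets a whole word of clean pages be skipped with a single compare, whereas the per-page rmap check costs one call per page even for a completely clean slot.

```c
#include <limits.h>

/* One unsigned long of the dirty bitmap covers BITS_PER_LONG pages
 * (64 on 64-bit hosts), so a clean word skips them all in one test. */
static unsigned long count_dirty_pages(const unsigned long *bitmap,
                                       unsigned long nwords)
{
    unsigned long w, count = 0;

    for (w = 0; w < nwords; ++w) {
        if (!bitmap[w])         /* whole word clean: skip 64 pages at once */
            continue;
        count += (unsigned long)__builtin_popcountl(bitmap[w]);
    }
    return count;
}
```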

Though I do not know exactly what kvm-ppc's dirty logging is aiming at, I guess
reporting cleanliness to user space without noticeable delay is important.

E.g. for VGA, most of the cases are clean.  For live migration, the
chance of seeing a completely clean slot is small, but almost all cases
are sparse.




PS: Always CC kvm@vger for stuff that others might want to review
(basically all patches)


(Though I do sometimes check kvm-ppc in the archives.)

The GET_DIRTY_LOG work will be welcome.

Takuya



So why do we have a separate kvm-ppc list then? :)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-17 Thread Takuya Yoshikawa



User allocated bitmaps have the advantage of reducing pinned memory.
However we have plenty more pinned memory allocated in memory slots, so
by itself, user allocated bitmaps don't justify this change.


In that sense, what do you think about the question I sent last week?

=== REPOST 1 ===
>>
>> mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
>> Must find a way to move it outside of the spinlock section.
>>
>
> Oh, it's a serious problem. I have to consider it.

Avi, Marcelo,

Sorry, but I have to say that the mmu_lock spin_lock problem was completely out
of my mind.  Although I looked through the code, it does not seem easy to move
the set_bit_user outside the spinlock section without breaking the semantics of
its protection.

So this may take some time to solve.

But personally, I want to do something for x86's "vmalloc() every time" problem
even though moving dirty bitmaps to user space cannot be achieved soon.

In that sense, do you mind if we do double buffering without moving dirty
bitmaps to user space?

I know that the resource for vmalloc() is precious on x86, but even now, at the
timing of get_dirty_log, we use the same amount of memory as double buffering.
=== 1 END ===




Perhaps if we optimize memory slot write protection (I have some ideas
about this) we can make the performance improvement more pronounced.



It's really nice!

Even now we can measure the performance improvement from introducing the switch
ioctl when the guest is relatively idle, so the combination will be really
effective!

=== REPOST 2 ===
>>
>> Can you post such a test, for an idle large guest?
>
> OK, I'll do!


Result of "low workload test" (running top during migration) first,

4GB guest
picked up slots[1](len=3757047808) only
*
get.org get.opt switch.opt

1060875 310292 190335
1076754 301295 188600
 655504 318284 196029
 529769 301471325
 694796  70216 221172
 651868 353073 196184
 543339 312865 213236
1061938  72785 203090
 689527 323901 249519
 621364 323881473
1063671  70703 192958
 915903 336318 174008
1046462 332384782
1037942  72783 190655
 680122 318305 243544
 688156 314935 193526
 558658 265934 190550
 652454 372135 196270
 660140  68613352
1101947 378642 186575
.........
*

As expected, the difference shows up more clearly.

In this case, switch.opt reduced the time by about 1/3 (0.1 msec) compared to
get.opt for each iteration.

And the cleaner the slot is, the bigger the ratio becomes.
=== 2 END ===


Re: [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space

2010-05-12 Thread Takuya Yoshikawa




+static inline int set_bit_user_non_atomic(int nr, void __user *addr)
+{
+   u8 __user *p;
+   u8 val;
+
+   p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);



Does C do the + or the / first? Either way, I'd like to see brackets here :)


OK, I'll change like that! I like brackets style too :)
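To answer Alex's precedence question concretely: in C, `/` binds tighter than `+`, so the expression in the patch already computes the byte offset before adding it to the address; the brackets only make the intent explicit. A minimal sketch (function names are illustrative):

```c
/* Both forms are equivalent because '/' has higher precedence than '+'. */
#define BITS_PER_BYTE 8

static unsigned long byte_offset_implicit(unsigned long addr, int nr)
{
    return addr + nr / BITS_PER_BYTE;       /* relies on precedence */
}

static unsigned long byte_offset_bracketed(unsigned long addr, int nr)
{
    return addr + (nr / BITS_PER_BYTE);     /* the form the patch will use */
}
```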




Alex





Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-12 Thread Takuya Yoshikawa



[To ppc people]

Hi, Benjamin, Paul, Alex,

Please see patches 6 and 7 of the 12.  First, I am sorry that I have not tested
these yet; in that sense, they may not be of sufficient quality for a precise
review.  But I will be happy if you give me any comments.

Alex, could you help me?  Though I plan to get a PPC box in the future, I
currently cannot test these.



Could you please point me to a git tree where everything's readily
applied? That would make testing a lot easier.


OK, I'll prepare one. Probably on sourceforge or somewhere like Kemari.

Thanks,
  Takuya




Alex





Re: [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space

2010-05-11 Thread Takuya Yoshikawa



r = 0;
@@ -1195,11 +1232,16 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
gfn = unalias_gfn(kvm, gfn);
memslot = gfn_to_memslot_unaliased(kvm, gfn);
	if (memslot && memslot->dirty_bitmap) {
-   unsigned long rel_gfn = gfn - memslot->base_gfn;
+   int nr = generic_le_bit_offset(gfn - memslot->base_gfn);

-   generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
+   if (kvm_set_bit_user(nr, memslot->dirty_bitmap))
+   goto out_fault;


mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
Must find a way to move it outside of the spinlock section.



Oh, it's a serious problem. I have to consider it.


Thanks,
  Takuya


Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps

2010-05-11 Thread Takuya Yoshikawa



One alternative would be:

KVM_SWITCH_DIRTY_LOG passing the address of a bitmap. If the active
bitmap was clean, it returns 0, no switch performed. If the active
bitmap was dirty, the kernel switches to the new bitmap and returns 1.

And the responsibility of cleaning the new bitmap could also be left
to userspace.



That is a beautiful approach, but we can do that only if we give up the
GET api.

I will follow your and Avi's advice about that kind of maintenance policy!
What do you think?


If you introduce a switch ioctl that frees the bitmap vmalloc'ed by the
current set_memory_region (if it's not freed already), after pointing the
memslot to the user-supplied one, it should be fine?



You mean switching from the vmalloc'ed (not do_mmap'ed) one to the
user-supplied one?

It may be possible, but it makes things really complicated in my view:
until some point we use set_bit, and then use set_bit_user, etc.

IMO:
 - The number of slots is limited, and the size of the dirty_bitmap_old
   pointer is not problematic.
 - Neither the user side nor the kernel side needs to allocate buffers every
   time; once paired buffers are registered, we will reuse them until the
   user side orders us to stop logging.
 - We have a tiny advantage in not needing copy_from_user to get a bitmap
   address for the switch ioctl.

 => So I think having two __user bitmaps is not a bad thing.


Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-11 Thread Takuya Yoshikawa




In a usual workload, the number of dirty pages varies a lot for each
iteration, and we should gain a lot for relatively clean cases.


Can you post such a test, for an idle large guest?


OK, I'll do!



Result of "low workload test" (running top during migration) first,

4GB guest
picked up slots[1](len=3757047808) only
*
get.org get.opt switch.opt

1060875 310292 190335
1076754 301295 188600
 655504 318284 196029
 529769 301471325
 694796  70216 221172
 651868 353073 196184
 543339 312865 213236
1061938  72785 203090
 689527 323901 249519
 621364 323881473
1063671  70703 192958
 915903 336318 174008
1046462 332384782
1037942  72783 190655
 680122 318305 243544
 688156 314935 193526
 558658 265934 190550
 652454 372135 196270
 660140  68613352
1101947 378642 186575
.........
*

As expected, the difference shows up more clearly.

In this case, switch.opt reduced the time by about 1/3 (0.1 msec) compared to
get.opt for each iteration.

And the cleaner the slot is, the bigger the ratio becomes.












Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps

2010-05-10 Thread Takuya Yoshikawa

(2010/05/11 12:43), Marcelo Tosatti wrote:

On Tue, May 04, 2010 at 10:08:21PM +0900, Takuya Yoshikawa wrote:

+How to Get
+
+Before calling this, you have to set the slot member of kvm_user_dirty_log
+to indicate the target memory slot.
+
+struct kvm_user_dirty_log {
+   __u32 slot;
+   __u32 flags;
+   __u64 dirty_bitmap;
+   __u64 dirty_bitmap_old;
+};
+
+The addresses will be set in the paired members: dirty_bitmap and _old.


Why not pass the bitmap address to the kernel, instead of having the
kernel allocate it. Because apparently you plan to do that in a new
generation anyway?


Yes, we want to make qemu allocate and free bitmaps in the future.
But currently, those are strictly tied to memory slot registration, and
changing that in one patch set is really difficult.

Anyway, we need a kernel-side allocation mechanism to keep the current
GET_DIRTY_LOG api.  I don't mind not exporting kernel-allocated bitmaps
in this patch set and later introducing a bitmap registration mechanism
in another patch set.

As this RFC suggests, the kernel-side double buffering infrastructure is
designed to be as general as possible, and adding a new API like SWITCH can
be done naturally.



Also, why does the kernel need to know about different bitmaps?


Because we need to support the current GET API.  We don't have any way to get
a new bitmap in the case of GET, and we don't want to do_mmap() every time
for GET.



One alternative would be:

KVM_SWITCH_DIRTY_LOG passing the address of a bitmap. If the active
bitmap was clean, it returns 0, no switch performed. If the active
bitmap was dirty, the kernel switches to the new bitmap and returns 1.

And the responsibility of cleaning the new bitmap could also be left
to userspace.



That is a beautiful approach, but we can do that only if we give up the
GET api.

I will follow your and Avi's advice about that kind of maintenance policy!
What do you think?


Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-10 Thread Takuya Yoshikawa




get.org get.opt switch.opt

slots[7].len=32768 278379 66398 64024
slots[8].len=32768 181246 270 160
slots[7].len=32768 263961 64673 64494
slots[8].len=32768 181655 265 160
slots[7].len=32768 263736 64701 64610
slots[8].len=32768 182785 267 160
slots[7].len=32768 260925 65360 65042
slots[8].len=32768 182579 264 160
slots[7].len=32768 267823 65915 65682
slots[8].len=32768 186350 271 160

At a glance, we can see that our optimization improved things significantly
compared to the original get dirty log ioctl.  This is true for both get.opt
and switch.opt, and it has a really big impact for personal KVM users who
run KVM in GUI mode on their usual PCs.

Next, we notice that switch.opt improved by a hundred nanoseconds or so for
these slots.  Although this may sound like a tiny improvement, we can feel it
as a difference in GUI responses, like mouse reactions.


100 ns... this is a bit on the low side (and if you can measure it
interactively you have much better reflexes than I).


To feel the difference, please try GUI on your PC with our patch series!


No doubt get.org -> get.opt is measurable, but get.opt -> switch.opt is
problematic.  Have you tried profiling to see where the time is spent?
(Well, I can guess: clearing the write access from the sptes.)


Sorry but no, and I agree with your guess.
Anyway, I want to do some profiling to confirm this guess.


BTW, if we only think about time performance, the optimized get (get.opt)
may be enough at this stage.

But if we consider future expansion, like using user-allocated bitmaps, the
new APIs introduced for switch.opt won't go to waste, I think, because we
need a structure to get and export bitmap addresses.




In a usual workload, the number of dirty pages varies a lot for each
iteration, and we should gain a lot for relatively clean cases.


Can you post such a test, for an idle large guest?


OK, I'll do!







Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-10 Thread Takuya Yoshikawa

(2010/05/06 22:38), Arnd Bergmann wrote:

On Wednesday 05 May 2010, Takuya Yoshikawa wrote:


That's why the bitmaps are defined as little endian u64 aligned, even on
big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic,
and u64 alignment ensures we can use long-sized bitops on mixed size
systems.


Ok, I see.


There was a suggestion to propose set_le_bit_user()-style macros.
But I thought these have the constraint you two explained and seemed a
little too specific to some areas, like KVM.

So I decided to propose just the offset calculation macro.


I'm not sure I understand how this macro is going to be used though.
If you are just using this in kernel space, that's fine, please go for
it.


Yes, I'm just using it in kernel space: qemu has its own endian-related helpers.

So if you allow us to place this macro in asm-generic/bitops/* it will help us.

Avi, what do you think? Do you want to place it in kvm.h ?




However, if the intention is to use the same macro in user space, putting
it into asm-generic/bitops/* is not going to help, because those headers
are not available in user space, and I wouldn't want to change that.

The definition of the macro is not part of the ABI, so just duplicate
it in KVM if you need it there.

Arnd




Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-10 Thread Takuya Yoshikawa




Yes, I'm just using it in kernel space: qemu has its own endian-related helpers.

So if you allow us to place this macro in asm-generic/bitops/* it will help us.


No problem at all then. Thanks for the explanation.

Acked-by: Arnd Bergmann


Thank you both.  I will add your Acked-by from now on!

  Takuya


Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-04 Thread Takuya Yoshikawa
On Tue, 04 May 2010 19:08:23 +0300
Avi Kivity  wrote:

> On 05/04/2010 06:03 PM, Arnd Bergmann wrote:
> > On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
...
> >> So let us use the le bit offset calculation part by defining it as a new
> >> macro: generic_le_bit_offset() .
> >>  
> > Does this work correctly if your user space is 32 bits (i.e. unsigned long
> > is different size in user space and kernel) in both big- and little-endian
> > systems?
> >
> > I'm not sure about all the details, but I think you cannot in general share
> > bitmaps between user space and kernel because of this.
> >
> 
> That's why the bitmaps are defined as little endian u64 aligned, even on 
> big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic, 
> and u64 alignment ensures we can use long-sized bitops on mixed size 
> systems.

There was a suggestion to propose set_le_bit_user()-style macros.
But I thought these have the constraint you two explained and seemed a
little too specific to some areas, like KVM.

So I decided to propose just the offset calculation macro.

  Thanks, Takuya
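For reference, an le bit-offset calculation of the kind discussed in this thread usually follows the common byte-swizzle pattern. The sketch below is written under that assumption (the actual generic_le_bit_offset() proposed in the series may differ); on little endian the bit number is used unchanged, while on big endian the byte index within each long is mirrored. `bits_per_long` is passed as a parameter here purely so the math is host-independent.

```c
/* Assumed shape of an le bit-offset helper (illustrative only). */
static unsigned long le_bit_offset_le(unsigned long nr)
{
    return nr;                          /* little endian: identity */
}

static unsigned long le_bit_offset_be(unsigned long nr,
                                      unsigned long bits_per_long)
{
    /* Mirror the byte index within a long; keep the bit within the byte. */
    return nr ^ ((bits_per_long - 1) & ~0x7UL);
}
```

Note the big-endian form is an involution: applying it twice gives back the original bit number, which is what makes a shared little-endian bitmap layout word-size agnostic.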


[RFC][PATCH 12/12 sample] qemu-kvm: use new API for getting/switching dirty bitmaps

2010-05-04 Thread Takuya Yoshikawa
We use the new API for lightweight dirty log access if KVM supports it.

This conflicts with Marcelo's patches. So please take this as a sample patch.

Signed-off-by: Takuya Yoshikawa 
---
 kvm/include/linux/kvm.h |   11 ++
 qemu-kvm.c  |   81 ++-
 qemu-kvm.h  |1 +
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index 6485981..efd9538 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -317,6 +317,14 @@ struct kvm_dirty_log {
};
 };
 
+/* for KVM_GET_USER_DIRTY_LOG_ADDR */
+struct kvm_user_dirty_log {
+   __u32 slot;
+   __u32 flags;
+   __u64 dirty_bitmap;
+   __u64 dirty_bitmap_old;
+};
+
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
__u32 len;
@@ -499,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_PPC_SEGSTATE 43
 
 #define KVM_CAP_PCI_SEGMENT 47
+#define KVM_CAP_USER_DIRTY_LOG 55
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -595,6 +604,8 @@ struct kvm_clock_data {
struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO,  0x49, struct 
kvm_user_dirty_log)
+#define KVM_SWITCH_DIRTY_LOG  _IO(KVMIO,   0x4a)
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP_IO(KVMIO,   0x60)
 #define KVM_IRQ_LINE  _IOW(KVMIO,  0x61, struct kvm_irq_level)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..98777f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -143,6 +143,8 @@ struct slot_info {
 unsigned long userspace_addr;
 unsigned flags;
 int logging_count;
+unsigned long *dirty_bitmap;
+unsigned long *dirty_bitmap_old;
 };
 
 struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
@@ -232,6 +234,29 @@ int kvm_is_containing_region(kvm_context_t kvm, unsigned 
long phys_addr,
 return 1;
 }
 
+static int kvm_user_dirty_log_works(void)
+{
+return kvm_state->user_dirty_log;
+}
+
+static int kvm_set_user_dirty_log(int slot)
+{
+int r;
+struct kvm_user_dirty_log dlog;
+
+dlog.slot = slot;
+r = kvm_vm_ioctl(kvm_state, KVM_GET_USER_DIRTY_LOG_ADDR, &dlog);
+if (r < 0) {
+DPRINTF("KVM_GET_USER_DIRTY_LOG_ADDR failed: %s\n", strerror(-r));
+return r;
+}
+slots[slot].dirty_bitmap = (unsigned long *)
+   ((unsigned long)dlog.dirty_bitmap);
+slots[slot].dirty_bitmap_old = (unsigned long *)
+   ((unsigned long)dlog.dirty_bitmap_old);
+return r;
+}
+
 /*
  * dirty pages logging control
  */
@@ -265,8 +290,16 @@ static int kvm_dirty_pages_log_change(kvm_context_t kvm,
 DPRINTF("slot %d start %llx len %llx flags %x\n",
 mem.slot, mem.guest_phys_addr, mem.memory_size, mem.flags);
 r = kvm_vm_ioctl(kvm_state, KVM_SET_USER_MEMORY_REGION, &mem);
-if (r < 0)
+if (r < 0) {
 fprintf(stderr, "%s: %m\n", __FUNCTION__);
+return r;
+}
+}
+if (flags & KVM_MEM_LOG_DIRTY_PAGES) {
+r = kvm_set_user_dirty_log(slot);
+} else {
+slots[slot].dirty_bitmap = NULL;
+slots[slot].dirty_bitmap_old = NULL;
 }
 return r;
 }
@@ -589,7 +622,6 @@ int kvm_register_phys_mem(kvm_context_t kvm,
   unsigned long phys_start, void *userspace_addr,
   unsigned long len, int log)
 {
-
 struct kvm_userspace_memory_region memory = {
 .memory_size = len,
 .guest_phys_addr = phys_start,
@@ -608,6 +640,9 @@ int kvm_register_phys_mem(kvm_context_t kvm,
 fprintf(stderr, "create_userspace_phys_mem: %s\n", strerror(-r));
 return -1;
 }
+if (log) {
+r = kvm_set_user_dirty_log(memory.slot);
+}
 register_slot(memory.slot, memory.guest_phys_addr, memory.memory_size,
   memory.userspace_addr, memory.flags);
 return 0;
@@ -652,6 +687,8 @@ void kvm_destroy_phys_mem(kvm_context_t kvm, unsigned long 
phys_start,
 fprintf(stderr, "destroy_userspace_phys_mem: %s", strerror(-r));
 return;
 }
+slots[memory.slot].dirty_bitmap = NULL;
+slots[memory.slot].dirty_bitmap_old = NULL;
 
 free_slot(memory.slot);
 }
@@ -692,6 +729,21 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long 
phys_addr, void *buf)
 return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf);
 }
 
+static int kvm_switch_map(int slot)
+{
+int r;
+
+r = kvm_vm_ioctl(kvm_state, KVM_SWITCH_DIRTY_LOG, slot);
+if (r == 0) {
+unsigned long *dirty_bitmap;
+
+dirty_bitmap = slots[slot].dirty_bitmap;
+slots[slot].dirty_bitmap = slots[slot].dirty_bitmap_old;
+slots[slot].dirty_bitmap_old = dirty_bitmap;
+}
+   

[RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps

2010-05-04 Thread Takuya Yoshikawa
Now that dirty bitmaps are accessible from user space, we export the
addresses of these to achieve zero-copy dirty log check:

  KVM_GET_USER_DIRTY_LOG_ADDR

We also need an API for triggering dirty bitmap switch to take the full
advantage of double buffered bitmaps.

  KVM_SWITCH_DIRTY_LOG

See the documentation in this patch for precise explanations.

About the performance improvement: the most important feature of the switch
API is its lightness.  In our test, this appeared in the form of improved
responses to GUI manipulations.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
CC: Avi Kivity 
CC: Alexander Graf 
---
 Documentation/kvm/api.txt |   87 +
 arch/ia64/kvm/kvm-ia64.c  |   27 +-
 arch/powerpc/kvm/book3s.c |   18 +++--
 arch/x86/kvm/x86.c|   44 ---
 include/linux/kvm.h   |   11 ++
 include/linux/kvm_host.h  |4 ++-
 virt/kvm/kvm_main.c   |   63 +
 7 files changed, 220 insertions(+), 34 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index a237518..c106c83 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -892,6 +892,93 @@ arguments.
 This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
 irqchip, the multiprocessing state must be maintained by userspace.
 
+4.39 KVM_GET_USER_DIRTY_LOG_ADDR
+
+Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see below)
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_user_dirty_log (in/out)
+Returns: 0 on success, -1 on error
+
+This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40) instead
+of the old dirty log manipulation by KVM_GET_DIRTY_LOG.
+
+A bit about KVM_CAP_USER_DIRTY_LOG
+
+Before this ioctl was introduced, dirty bitmaps for dirty page logging were
+allocated in the kernel's memory space.  But we have now moved them to user
+space to get more flexibility and performance.  To achieve this move without
+breaking compatibility, we will split the KVM_CAP_USER_DIRTY_LOG capability
+into a few generations which can be identified by its check extension
+return values.
+
+This KVM_GET_USER_DIRTY_LOG_ADDR belongs to the first generation with the
+KVM_SWITCH_DIRTY_LOG (4.40) and must be supported by all generations.
+
+What you get
+
+By using this, you can get paired bitmap addresses which are accessible from
+user space.  See the explanation in 4.40 for the roles of these two bitmaps.
+
+How to Get
+
+Before calling this, you have to set the slot member of kvm_user_dirty_log
+to indicate the target memory slot.
+
+struct kvm_user_dirty_log {
+   __u32 slot;
+   __u32 flags;
+   __u64 dirty_bitmap;
+   __u64 dirty_bitmap_old;
+};
+
+The addresses will be set in the paired members: dirty_bitmap and _old.
+
+Note
+
+In generation 1, we support bitmaps which are created in the kernel but do
+not support bitmaps created by users.  This means bitmap creation/destruction
+are done the same as before when you instruct KVM via KVM_SET_USER_MEMORY_REGION
+(see 4.34) to start/stop logging.  Please do not try to free the exported
+bitmaps yourself, or KVM will access the freed area and fault.
+
+4.40 KVM_SWITCH_DIRTY_LOG
+
+Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see 4.39)
+Architectures: all
+Type: vm ioctl
+Parameters: memory slot id
+Returns: 0 if switched, 1 if not (slot was clean), -1 on error
+
+This ioctl allows you to switch the dirty log to the next one.  It is a newer
+way of getting dirty page logs than KVM_GET_DIRTY_LOG (see 4.7 for an
+explanation of dirty page logging; the log format is not changed).
+
+If you have the capability KVM_CAP_USER_DIRTY_LOG, using this is strongly
+recommended over KVM_GET_DIRTY_LOG because this does not need a buffer
+copy between kernel and user space.
+
+How to Use
+
+Before calling this, you have to remember the paired addresses of dirty
+bitmaps which can be obtained by KVM_GET_USER_DIRTY_LOG_ADDR (see 4.39):
+dirty_bitmap (being used now by kernel) and dirty_bitmap_old (not being
+used now and containing the last log).
+
+After calling this, the roles of these bitmaps change as follows:
+If the return value was 0, the kernel cleared dirty_bitmap_old and began to
+use it for the next logging, so you can use the cold dirty_bitmap to check
+the log since the last switch.  If the return value was 1, no pages were
+dirty and the bitmap switch was not done.  Note that remembering which
+bitmap is now active is your responsibility, so you have to update your
+record when you get a return value of 0.
+
+Note
+
+Bitmap switch may also occur when you call KVM_GET_DIRTY_LOG.  Please use
+only one of them, preferably this one, to avoid extra confusion.  We do not
+guarantee under which conditions KVM_GET_DIRTY_LOG causes a bitmap switch.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/ia64/kvm/k
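The user-space side of the switch protocol documented above boils down to pointer bookkeeping after each KVM_SWITCH_DIRTY_LOG call. A minimal sketch of that bookkeeping (the return-value convention is from the patch; the struct and helper names are illustrative):

```c
#include <stddef.h>

/* Which of the two mapped bitmaps is active is the caller's job to
 * track, as the documentation notes.  'active' is the bitmap the
 * kernel is writing now; 'cold' holds the last log. */
struct user_dirty_log_state {
    unsigned long *active;
    unsigned long *cold;
};

/* ret is KVM_SWITCH_DIRTY_LOG's return value: 0 = switched, 1 = slot
 * was clean.  Returns the bitmap to scan, or NULL if nothing was dirty. */
static unsigned long *log_to_scan_after_switch(struct user_dirty_log_state *s,
                                               int ret)
{
    unsigned long *tmp;

    if (ret != 0)
        return NULL;        /* slot was clean; no switch happened */

    tmp = s->active;        /* previously active bitmap now holds the log */
    s->active = s->cold;    /* kernel cleared and adopted the old cold one */
    s->cold = tmp;
    return s->cold;
}
```

This mirrors the documentation's note that userspace must update its own record of which bitmap is active whenever the ioctl returns 0.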

[RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space

2010-05-04 Thread Takuya Yoshikawa
We move dirty bitmaps to user space.

 - Allocation and destruction: we use do_mmap() and do_munmap().
   The new bitmap space is twice longer than the original one and we
   use the additional space for double buffering: this makes it
   possible to update the active bitmap while letting the user space
   read the other one safely. For x86, we can also remove the vmalloc()
   in kvm_vm_ioctl_get_dirty_log().

 - Bitmap manipulations: we replace all functions which access dirty
   bitmaps with *_user() functions.

 - For ia64: moving the dirty bitmaps of memory slots does not affect
   ia64 much because it's using a different place to store dirty logs
   rather than the dirty bitmaps of memory slots: all we have to change
   are sync and get of dirty log, so we don't need set_bit_user like
   functions for ia64.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
CC: Avi Kivity 
CC: Alexander Graf 
---
 arch/ia64/kvm/kvm-ia64.c  |   15 +-
 arch/powerpc/kvm/book3s.c |5 +++-
 arch/x86/kvm/x86.c|   25 --
 include/linux/kvm_host.h  |3 +-
 virt/kvm/kvm_main.c   |   62 +---
 5 files changed, 82 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 17fd65c..03503e6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1823,11 +1823,19 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
n = kvm_dirty_bitmap_bytes(memslot);
base = memslot->base_gfn / BITS_PER_LONG;
 
+   r = -EFAULT;
+   if (!access_ok(VERIFY_WRITE, memslot->dirty_bitmap, n))
+   goto out;
+
for (i = 0; i < n/sizeof(long); ++i) {
if (dirty_bitmap[base + i])
memslot->is_dirty = true;
 
-   memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
+   if (__put_user(dirty_bitmap[base + i],
+  &memslot->dirty_bitmap[i])) {
+   r = -EFAULT;
+   goto out;
+   }
dirty_bitmap[base + i] = 0;
}
r = 0;
@@ -1858,7 +1866,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (memslot->is_dirty) {
kvm_flush_remote_tlbs(kvm);
n = kvm_dirty_bitmap_bytes(memslot);
-   memset(memslot->dirty_bitmap, 0, n);
+   if (clear_user(memslot->dirty_bitmap, n)) {
+   r = -EFAULT;
+   goto out;
+   }
memslot->is_dirty = false;
}
r = 0;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 4b074f1..2a31d2f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1210,7 +1210,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
 
n = kvm_dirty_bitmap_bytes(memslot);
-   memset(memslot->dirty_bitmap, 0, n);
+   if (clear_user(memslot->dirty_bitmap, n)) {
+   r = -EFAULT;
+   goto out;
+   }
memslot->is_dirty = false;
}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 023c7f8..32a3d94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2760,40 +2760,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
/* If nothing is dirty, don't bother messing with page tables. */
if (memslot->is_dirty) {
struct kvm_memslots *slots, *old_slots;
-   unsigned long *dirty_bitmap;
+   unsigned long __user *dirty_bitmap;
+   unsigned long __user *dirty_bitmap_old;
 
spin_lock(&kvm->mmu_lock);
kvm_mmu_slot_remove_write_access(kvm, log->slot);
spin_unlock(&kvm->mmu_lock);
 
-   r = -ENOMEM;
-   dirty_bitmap = vmalloc(n);
-   if (!dirty_bitmap)
+   dirty_bitmap = memslot->dirty_bitmap;
+   dirty_bitmap_old = memslot->dirty_bitmap_old;
+   r = -EFAULT;
+   if (clear_user(dirty_bitmap_old, n))
goto out;
-   memset(dirty_bitmap, 0, n);
 
r = -ENOMEM;
slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-   if (!slots) {
-   vfree(dirty_bitmap);
+   if (!slots)
goto out;
-   }
+
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-   slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+   slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old;
+   slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap;

[RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic()

2010-05-04 Thread Takuya Yoshikawa
This is to avoid breaking the build for architectures other than x86 and ppc.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
---
 arch/ia64/include/asm/kvm_host.h|5 +
 arch/powerpc/include/asm/kvm_host.h |6 ++
 arch/s390/include/asm/kvm_host.h|6 ++
 arch/x86/include/asm/kvm_host.h |5 +
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index a362e67..938041b 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -589,6 +589,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return 0;
+}
+
 #endif /* __ASSEMBLY__*/
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 0c9ad86..9463524 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_MAX_VCPUS 1
 #define KVM_MEMORY_SLOTS 32
@@ -287,4 +288,9 @@ struct kvm_vcpu_arch {
 #endif
 };
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 27605b6..36710ee 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -238,4 +238,10 @@ struct kvm_arch{
 };
 
 extern int sie64a(struct kvm_s390_sie_block *, unsigned long *);
+
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return 0;
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..9e22df9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -795,4 +795,9 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
-- 
1.7.0.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-04 Thread Takuya Yoshikawa
Although we can use the *_le_bit() helpers to manipulate little-endian
arranged bitmaps, having the le bit offset calculation as a separate macro
gives us more freedom.

For example, KVM keeps little-endian arranged dirty bitmaps for VGA and
live migration, and they are used in user space too. To avoid bitmap copies
between kernel and user space, we want to update the bitmaps in user space
directly; the le bit offset combined with the *_user() functions helps a
lot here.

So let us expose the le bit offset calculation by defining it as a new
macro: generic_le_bit_offset().

Signed-off-by: Takuya Yoshikawa 
CC: Arnd Bergmann 
---
 include/asm-generic/bitops/le.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 80e3bf1..ee445fb 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -9,6 +9,8 @@
 
 #if defined(__LITTLE_ENDIAN)
 
+#define generic_le_bit_offset(nr)  (nr)
+
 #define generic_test_le_bit(nr, addr) test_bit(nr, addr)
 #define generic___set_le_bit(nr, addr) __set_bit(nr, addr)
 #define generic___clear_le_bit(nr, addr) __clear_bit(nr, addr)
@@ -25,6 +27,8 @@
 
 #elif defined(__BIG_ENDIAN)
 
+#define generic_le_bit_offset(nr)  ((nr) ^ BITOP_LE_SWIZZLE)
+
 #define generic_test_le_bit(nr, addr) \
test_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
 #define generic___set_le_bit(nr, addr) \
-- 
1.7.0.4



[RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space

2010-05-04 Thread Takuya Yoshikawa
During the work on KVM's dirty page logging optimization, we encountered
the need to manipulate bitmaps in user space efficiently. To achieve this,
we introduce a uaccess function for setting a bit in user space, following
Avi's suggestion.

  KVM now uses dirty bitmaps for live migration and VGA. Although we need
  to update them from the kernel side, copying them every time we update
  the dirty log is a big bottleneck. In particular, our tests showed that
  zero-copy bitmap manipulation improves the responsiveness of GUI
  interactions a lot.

We also found a similar need in drivers/vhost/vhost.c, where the author
implemented set_bit_to_user() locally using inefficient functions: see the
TODO at the top of that file.

This kind of need is probably common across the virtualization area.

So we introduce a function, set_bit_user_non_atomic().

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
CC: Alexander Graf 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
---
 arch/powerpc/include/asm/uaccess.h |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 3a01ce8..f878326 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -321,6 +321,25 @@ do {   
\
__gu_err;   \
 })
 
+static inline int set_bit_user_non_atomic(int nr, void __user *addr)
+{
+   u8 __user *p;
+   u8 val;
+
+   p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
+   if (!access_ok(VERIFY_WRITE, p, 1))
+   return -EFAULT;
+
+   if (__get_user(val, p))
+   return -EFAULT;
+
+   val |= 1U << (nr % BITS_PER_BYTE);
+   if (__put_user(val, p))
+   return -EFAULT;
+
+   return 0;
+}
+
 
 /* more complex routines */
 
-- 
1.7.0.4



[RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit

2010-05-04 Thread Takuya Yoshikawa
During the work on KVM's dirty page logging optimization, we encountered
the need for copy_in_user() on 32-bit ppc and x86: these will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit using __copy_tofrom_user().

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
CC: Alexander Graf 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
---
 arch/powerpc/include/asm/uaccess.h |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index bd0fb84..3a01ce8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -359,6 +359,23 @@ static inline unsigned long copy_to_user(void __user *to,
return n;
 }
 
+static inline unsigned long copy_in_user(void __user *to,
+   const void __user *from, unsigned long n)
+{
+   unsigned long over;
+
+   if (likely(access_ok(VERIFY_READ, from, n) &&
+   access_ok(VERIFY_WRITE, to, n)))
+   return __copy_tofrom_user(to, from, n);
+   if (((unsigned long)from < TASK_SIZE) ||
+   ((unsigned long)to < TASK_SIZE)) {
+   over = max((unsigned long)from, (unsigned long)to)
+   + n - TASK_SIZE;
+   return __copy_tofrom_user(to, from, n - over) + over;
+   }
+   return n;
+}
+
 #else /* __powerpc64__ */
 
 #define __copy_in_user(to, from, size) \
-- 
1.7.0.4



[RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space

2010-05-04 Thread Takuya Yoshikawa
During the work on KVM's dirty page logging optimization, we encountered
the need to manipulate bitmaps in user space efficiently. To achieve this,
we introduce a uaccess function for setting a bit in user space, following
Avi's suggestion.

  KVM now uses dirty bitmaps for live migration and VGA. Although we need
  to update them from the kernel side, copying them every time we update
  the dirty log is a big bottleneck. In particular, our tests showed that
  zero-copy bitmap manipulation improves the responsiveness of GUI
  interactions a lot.

We also found a similar need in drivers/vhost/vhost.c, where the author
implemented set_bit_to_user() locally using inefficient functions: see the
TODO at the top of that file.

This kind of need is probably common across the virtualization area.

So we introduce a macro, set_bit_user_non_atomic(), following the
implementation style of x86's uaccess functions.

Note: there is one restriction on this macro: bitmaps must be 64-bit
aligned (see the comment in this patch).

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
CC: Avi Kivity 
Cc: Thomas Gleixner 
CC: Ingo Molnar 
Cc: "H. Peter Anvin" 
---
 arch/x86/include/asm/uaccess.h |   39 +++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..3138e65 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -98,6 +98,45 @@ struct exception_table_entry {
 
 extern int fixup_exception(struct pt_regs *regs);
 
+/**
+ * set_bit_user_non_atomic: - set a bit of a bitmap in user space.
+ * @nr:   Bit offset.
+ * @addr: Base address of a bitmap in user space.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * This macro sets a bit of a bitmap in user space.
+ *
+ * Restriction: the bitmap pointed to by @addr must be 64-bit aligned:
+ * the kernel accesses the bitmap by its own word length, so bitmaps
+ * allocated by 32-bit processes may cause a fault.
+ *
+ * Returns zero on success, or -EFAULT on error.
+ */
+#define __set_bit_user_non_atomic_asm(nr, addr, err, errret)   \
+   asm volatile("1:bts %1,%2\n"\
+"2:\n" \
+".section .fixup,\"ax\"\n" \
+"3:mov %3,%0\n"\
+"  jmp 2b\n"   \
+".previous\n"  \
+_ASM_EXTABLE(1b, 3b)   \
+: "=r"(err)\
+: "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define set_bit_user_non_atomic(nr, addr)  \
+({ \
+   int __ret_sbu;  \
+   \
+   might_fault();  \
+   if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))\
+   __set_bit_user_non_atomic_asm(nr, addr, __ret_sbu, -EFAULT);\
+   else\
+   __ret_sbu = -EFAULT;\
+   \
+   __ret_sbu;  \
+})
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.
-- 
1.7.0.4



[RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit

2010-05-04 Thread Takuya Yoshikawa
During the work on KVM's dirty page logging optimization, we encountered
the need for copy_in_user() on 32-bit x86 and ppc: these will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit using the existing generic
copy-user helpers.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
CC: Avi Kivity 
Cc: Thomas Gleixner 
CC: Ingo Molnar 
Cc: "H. Peter Anvin" 
---
 arch/x86/include/asm/uaccess_32.h |2 ++
 arch/x86/lib/usercopy_32.c|   26 ++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_32.h 
b/arch/x86/include/asm/uaccess_32.h
index 088d09f..85d396d 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -21,6 +21,8 @@ unsigned long __must_check __copy_from_user_ll_nocache
(void *to, const void __user *from, unsigned long n);
 unsigned long __must_check __copy_from_user_ll_nocache_nozero
(void *to, const void __user *from, unsigned long n);
+unsigned long __must_check copy_in_user
+   (void __user *to, const void __user *from, unsigned n);
 
 /**
  * __copy_to_user_inatomic: - Copy a block of data into user space, with less 
checking.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e218d5d..e90ffc3 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -889,3 +889,29 @@ void copy_from_user_overflow(void)
WARN(1, "Buffer overflow detected!\n");
 }
 EXPORT_SYMBOL(copy_from_user_overflow);
+
+/**
+ * copy_in_user: - Copy a block of data from user space to user space.
+ * @to:   Destination address, in user space.
+ * @from: Source address, in user space.
+ * @n:Number of bytes to copy.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * Copy data from user space to user space.
+ *
+ * Returns number of bytes that could not be copied.
+ * On success, this will be zero.
+ */
+unsigned long
+copy_in_user(void __user *to, const void __user *from, unsigned n)
+{
+   if (access_ok(VERIFY_WRITE, to, n) && access_ok(VERIFY_READ, from, n)) {
+   if (movsl_is_ok(to, from, n))
+   __copy_user(to, from, n);
+   else
+   n = __copy_user_intel(to, (const void *)from, n);
+   }
+   return n;
+}
+EXPORT_SYMBOL(copy_in_user);
-- 
1.7.0.4



[RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps

2010-05-04 Thread Takuya Yoshikawa
We will later change the vmalloc() and vfree() calls to do_mmap() and
do_munmap(). This patch makes that change easy and cleans up the code.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
---
 virt/kvm/kvm_main.c |   27 ---
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7ab6312..3e3acad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -435,6 +435,12 @@ out_err_nodisable:
return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+   vfree(memslot->dirty_bitmap);
+   memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -447,7 +453,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot 
*free,
vfree(free->rmap);
 
if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-   vfree(free->dirty_bitmap);
+   kvm_destroy_dirty_bitmap(free);
 
 
for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -458,7 +464,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot 
*free,
}
 
free->npages = 0;
-   free->dirty_bitmap = NULL;
free->rmap = NULL;
 }
 
@@ -520,6 +525,18 @@ static int kvm_vm_release(struct inode *inode, struct file 
*filp)
return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+   unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+   memslot->dirty_bitmap = vmalloc(dirty_bytes);
+   if (!memslot->dirty_bitmap)
+   return -ENOMEM;
+
+   memset(memslot->dirty_bitmap, 0, dirty_bytes);
+   return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -653,12 +670,8 @@ skip_lpage:
 
/* Allocate page dirty bitmap if needed */
if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-   unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-   new.dirty_bitmap = vmalloc(dirty_bytes);
-   if (!new.dirty_bitmap)
+   if (kvm_create_dirty_bitmap(&new) < 0)
goto out_free;
-   memset(new.dirty_bitmap, 0, dirty_bytes);
/* destroy any largepage mappings for dirty tracking */
if (old.npages)
flush_shadow = 1;
-- 
1.7.0.4



[RFC][PATCH 2/12] KVM: introduce slot level dirty state management

2010-05-04 Thread Takuya Yoshikawa
This patch introduces an is_dirty member for each memory slot. Using this
member, we remove the dirty bitmap scans currently done in get_dirty_log().

This is important for moving dirty bitmaps to user space: we have no cheap
way to check bitmaps that live in user space, so scanning them to determine
slot dirtiness would not be acceptable.

When we mark a slot dirty:
 - x86 and ppc: in mark_page_dirty()
 - ia64: in kvm_ia64_sync_dirty_log()
ia64 stores dirty logs in a different place and synchronizes them with the
memory slots' logs right before get_dirty_log(), so we use that point to
update is_dirty.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
CC: Avi Kivity 
CC: Alexander Graf 
---
 arch/ia64/kvm/kvm-ia64.c  |   11 +++
 arch/powerpc/kvm/book3s.c |9 -
 arch/x86/kvm/x86.c|9 +++--
 include/linux/kvm_host.h  |4 ++--
 virt/kvm/kvm_main.c   |   13 +++--
 5 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d5f4e91..17fd65c 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1824,6 +1824,9 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
base = memslot->base_gfn / BITS_PER_LONG;
 
for (i = 0; i < n/sizeof(long); ++i) {
+   if (dirty_bitmap[base + i])
+   memslot->is_dirty = true;
+
memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
dirty_bitmap[base + i] = 0;
}
@@ -1838,7 +1841,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
int r;
unsigned long n;
struct kvm_memory_slot *memslot;
-   int is_dirty = 0;
 
mutex_lock(&kvm->slots_lock);
spin_lock(&kvm->arch.dirty_log_lock);
@@ -1847,16 +1849,17 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (r)
goto out;
 
-   r = kvm_get_dirty_log(kvm, log, &is_dirty);
+   r = kvm_get_dirty_log(kvm, log);
if (r)
goto out;
 
+   memslot = &kvm->memslots->memslots[log->slot];
/* If nothing is dirty, don't bother messing with page tables. */
-   if (is_dirty) {
+   if (memslot->is_dirty) {
kvm_flush_remote_tlbs(kvm);
-   memslot = &kvm->memslots->memslots[log->slot];
n = kvm_dirty_bitmap_bytes(memslot);
memset(memslot->dirty_bitmap, 0, n);
+   memslot->is_dirty = false;
}
r = 0;
 out:
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 28e785f..4b074f1 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1191,20 +1191,18 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot;
struct kvm_vcpu *vcpu;
ulong ga, ga_end;
-   int is_dirty = 0;
int r;
unsigned long n;
 
mutex_lock(&kvm->slots_lock);
 
-   r = kvm_get_dirty_log(kvm, log, &is_dirty);
+   r = kvm_get_dirty_log(kvm, log);
if (r)
goto out;
 
+   memslot = &kvm->memslots->memslots[log->slot];
/* If nothing is dirty, don't bother messing with page tables. */
-   if (is_dirty) {
-   memslot = &kvm->memslots->memslots[log->slot];
-
+   if (memslot->is_dirty) {
ga = memslot->base_gfn << PAGE_SHIFT;
ga_end = ga + (memslot->npages << PAGE_SHIFT);
 
@@ -1213,6 +1211,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
memset(memslot->dirty_bitmap, 0, n);
+   memslot->is_dirty = false;
}
 
r = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b95a211..023c7f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2740,10 +2740,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  struct kvm_dirty_log *log)
 {
-   int r, i;
+   int r;
struct kvm_memory_slot *memslot;
unsigned long n;
-   unsigned long is_dirty = 0;
 
mutex_lock(&kvm->slots_lock);
 
@@ -2758,11 +2757,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
 
-   for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-   is_dirty = memslot->dirty_bitmap[i];
-
/* If nothing is dirty, don't bother messing with page tables. */
-   if (is_dirty) {
+   if (memslot->is_dirty) {
struct kvm_memslots *slots, *old_slots;
unsigned long *dirty_bitmap;
 
@@ -2784,6 +2780,7 @@ int kvm_vm_ioctl_get_dirty_log(struct 

[RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-05-04 Thread Takuya Yoshikawa
Although x86's get_dirty_log() always allocates a new dirty bitmap, when
the memslot is clean the bitmap is only used as a zero source for
copy_to_user() and freed right afterwards. This patch uses clear_user()
instead, avoiding the unnecessary zero-source allocation.

Performance improvement: as one would expect, the bitmap allocation time
is eliminated entirely. Furthermore, we avoid the TLB flush triggered by
vmalloc(), which brings additional benefits. In my test, the improved
ioctl was about 4 to 10 times faster than the original for clean slots.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/x86.c |   37 +++--
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..b95a211 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot;
unsigned long n;
unsigned long is_dirty = 0;
-   unsigned long *dirty_bitmap = NULL;
 
mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
 
-   r = -ENOMEM;
-   dirty_bitmap = vmalloc(n);
-   if (!dirty_bitmap)
-   goto out;
-   memset(dirty_bitmap, 0, n);
-
for (i = 0; !is_dirty && i < n/sizeof(long); i++)
is_dirty = memslot->dirty_bitmap[i];
 
/* If nothing is dirty, don't bother messing with page tables. */
if (is_dirty) {
struct kvm_memslots *slots, *old_slots;
+   unsigned long *dirty_bitmap;
 
spin_lock(&kvm->mmu_lock);
kvm_mmu_slot_remove_write_access(kvm, log->slot);
spin_unlock(&kvm->mmu_lock);
 
-   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-   if (!slots)
-   goto out_free;
+   r = -ENOMEM;
+   dirty_bitmap = vmalloc(n);
+   if (!dirty_bitmap)
+   goto out;
+   memset(dirty_bitmap, 0, n);
 
+   r = -ENOMEM;
+   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+   if (!slots) {
+   vfree(dirty_bitmap);
+   goto out;
+   }
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
 
@@ -2788,13 +2790,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
synchronize_srcu_expedited(&kvm->srcu);
dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
kfree(old_slots);
+
+   r = -EFAULT;
+   if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
+   vfree(dirty_bitmap);
+   goto out;
+   }
+   vfree(dirty_bitmap);
+   } else {
+   r = -EFAULT;
+   if (clear_user(log->dirty_bitmap, n))
+   goto out;
}
 
r = 0;
-   if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-   r = -EFAULT;
-out_free:
-   vfree(dirty_bitmap);
 out:
mutex_unlock(&kvm->slots_lock);
return r;
-- 
1.7.0.4



[RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-04 Thread Takuya Yoshikawa
Hi, sorry for sending from my personal account.
The whole series is from me:

  From: Takuya Yoshikawa 

  The 3rd version of "moving dirty bitmaps to user space".

From this version on, we add the x86, ppc and asm-generic people to the CC lists.


[To KVM people]

Sorry for being late to reply your comments.

Avi,
 - I've written an answer to your question in patch 5/12: drivers/vhost/vhost.c .

 - I considered changing set_bit_user_non_atomic to an inline function,
   but did not, because the other helpers in uaccess.h are written as
   macros. Anyway, I hope the x86 people will give us appropriate
   suggestions about this.

 - I think the documentation about making bitmaps 64-bit aligned should be
   written when we add an API to register user-allocated bitmaps, so
   probably in the next series.

Avi, Alex,
 - Could you check the ia64 and ppc parts, please? I tried to keep the
   logical changes as small as possible.

   I tried to build these with cross compilers. For ia64, the build
   succeeded with my patch series applied. But book3s failed, even without
   my patch series, with the following errors:

  arch/powerpc/kvm/book3s_paired_singles.c: In function 
'kvmppc_emulate_paired_single':
  arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 
bytes is larger than 2048 bytes
  make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
  make: *** [arch/powerpc/kvm] Error 2


Changelog: there are two main changes from the 2nd version:
  1. I changed the treatment of clean slots (see patch 1/12).
 This was already applied today, thanks!
  2. I changed the switch API (see patch 11/12).

To show this API's advantage, I also did a test (see the end of this mail).


[To x86 people]

Hi, Thomas, Ingo, Peter,

Please review the patches 4,5/12. Because this is the first experience for
me to send patches to x86, please tell me if this lacks anything.


[To ppc people]

Hi, Benjamin, Paul, Alex,

Please see patches 6,7/12. First, I apologize that I have not tested these
yet; in that sense, they may not yet be of a quality worth precise review.
But I would be happy to receive any comments.

Alex, could you help me? Though I plan to get a PPC box in the future,
currently I cannot test these.



[To asm-generic people]

Hi, Arnd,

Please review patch 8/12. Is this kind of macro acceptable?





[Performance test]

We measured the TSC cycles needed for the dirty-log ioctl()s in the
kernel.

Test environment

  AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory


1. GUI test (running Ubuntu guest in graphical mode)

  sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...

We show a relatively stable portion of the log to compare how much time
the basic parts of the dirty log ioctl need.
                        get.org   get.opt  switch.opt

slots[7].len=32768       278379     66398       64024
slots[8].len=32768       181246       270         160
slots[7].len=32768       263961     64673       64494
slots[8].len=32768       181655       265         160
slots[7].len=32768       263736     64701       64610
slots[8].len=32768       182785       267         160
slots[7].len=32768       260925     65360       65042
slots[8].len=32768       182579       264         160
slots[7].len=32768       267823     65915       65682
slots[8].len=32768       186350       271         160

At a glance, we can see that our optimization is a significant improvement
over the original get-dirty-log ioctl. This is true for both get.opt and
switch.opt, and it has a really big impact for personal KVM users who run
KVM in GUI mode on their usual PCs.

Next, notice that switch.opt improved by a hundred nanoseconds or so for
these slots. Although this may sound like a tiny improvement, it is
perceptible as a difference in GUI responsiveness, such as mouse reactions.

To feel the difference, please try GUI on your PC with our patch series!


2. Live-migration test (4GB guest, write loop with 1GB buf)

We also did a live-migration test.

                         get.org   get.opt  switch.opt

slots[0].len=655360       797383    261144      222181
slots[1].len=3757047808  2186721   1965244     1842824
slots[2].len=637534208   1433562   1012723     1031213
slots[3].len=131072       216858       331         331
slots[4].len=131072       121635       225         164
slots[5].len=131072       120863       356         164
slots[6].len=16777216     121746      1133         156
slots[7].len=32768        120415       230         278
slots[8].len=32768        120368       216         149
slots[0].len=655360       806497    194710      223582
slots[1].len=3757047808  2142922   1878025     1895369
slots[2].len=637534208   1386512   1021309     1000345
slots[3].len=131072       221118       459         296
slots[4].len=131072       121516       272         166
slots[5].len=131072