Re: [PATCH 4/5] KVM: PPC: Book3s HV: Implement get_dirty_log using hardware changed bit
(2011/12/26 8:35), Paul Mackerras wrote:
> On Fri, Dec 23, 2011 at 02:23:30PM +0100, Alexander Graf wrote:
>> So if I read things correctly, this is the only case you're setting
>> pages as dirty. What if you have the following:
>>
>>   guest adds HTAB entry x
>>   guest writes to page mapped by x
>>   guest removes HTAB entry x
>>   host fetches dirty log
>
> In that case the dirtiness is preserved in the setting of the
> KVMPPC_RMAP_CHANGED bit in the rmap entry.  kvm_test_clear_dirty()
> returns 1 if that bit is set (and clears it).  Using the rmap entry
> for this is convenient because (a) we also use it for saving the
> referenced bit when a HTAB entry is removed, and we can transfer both
> R and C over in one operation; (b) we need to be able to save away the
> C bit in real mode, and we already need to get the real-mode address
> of the rmap entry -- if we wanted to save it in a dirty bitmap we'd
> have to do an extra translation to get the real-mode address of the
> dirty bitmap word; (c) to avoid SMP races, if we were asynchronously
> setting bits in the dirty bitmap we'd have to do the double-buffering
> thing that x86 does, which seems more complicated than using the rmap
> entry (which we already have a lock bit for).

From my x86 dirty logging experience, I have one concern about your
code: it looks slow even when there are no, or only a few, dirty pages
in the slot.

+	for (i = 0; i < memslot->npages; ++i) {
+		if (kvm_test_clear_dirty(kvm, rmapp))
+			__set_bit_le(i, map);
+		++rmapp;
+	}

The check is done for each page, and this can be very expensive because
the number of pages is not small. When we scan the dirty_bitmap, 64
pages are checked at once, so the problem is not so significant there.

Though I do not know well what kvm-ppc's dirty logging is aiming at, I
guess that reporting cleanliness to user space without noticeable delay
is important. E.g. for VGA, most of the cases are clean. For live
migration, the chance of seeing a completely clean slot is small, but
almost all cases are sparse.

>> PS: Always CC kvm@vger for stuff that others might want to review
>> (basically all patches)
>
> So why do we have a separate kvm-ppc list then? :)

(Though I sometimes check kvm-ppc on the archives,)
the GET_DIRTY_LOG thing will be welcome.

	Takuya
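For reference, a minimal sketch of the word-at-a-time scan contrasted
here with the per-page rmap check. The helper name and layout are
illustrative only, not the posted patch: one unsigned long covers
BITS_PER_LONG pages, so detecting a clean slot costs about npages/64
word loads instead of npages calls to kvm_test_clear_dirty().

#include <linux/kernel.h>	/* DIV_ROUND_UP */

static int slot_has_dirty_pages(const unsigned long *dirty_bitmap,
				unsigned long npages)
{
	unsigned long i, words = DIV_ROUND_UP(npages, BITS_PER_LONG);

	for (i = 0; i < words; ++i)
		if (dirty_bitmap[i])	/* tests 64 pages at once */
			return 1;
	return 0;
}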
Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
> User allocated bitmaps have the advantage of reducing pinned memory.
> However we have plenty more pinned memory allocated in memory slots,
> so by itself, user allocated bitmaps don't justify this change.

In that sense, what do you think about the question I sent last week?

=== REPOST 1 ===
>>> mark_page_dirty is called with the mmu_lock spinlock held in
>>> set_spte. Must find a way to move it outside of the spinlock section.
>>
>> Oh, it's a serious problem. I have to consider it.

Avi, Marcelo,

Sorry, but I have to say that the mmu_lock spin_lock problem was
completely out of my mind. Although I looked through the code, it seems
not easy to move the set_bit_user to outside of the spinlock section
without breaking the semantics of its protection, so this may take some
time to solve.

But personally, I want to do something for x86's "vmalloc() every time"
problem even though moving dirty bitmaps to user space cannot be
achieved soon. In that sense, do you mind if we do double buffering
without moving the dirty bitmaps to user space? I know that the
resource for vmalloc() is precious for x86, but even now, at the timing
of get_dirty_log, we use the same amount of memory as double buffering
would.
=== 1 END ===

> Perhaps if we optimize memory slot write protection (I have some ideas
> about this) we can make the performance improvement more pronounced.

That would be really nice! Even now we can measure the performance
improvement by introducing the switch ioctl when the guest is
relatively idle, so the combination should be really effective!

=== REPOST 2 ===
>>> Can you post such a test, for an idle large guest?
>>
>> OK, I'll do!

Result of "low workload test" (running top during migration) first.
4GB guest; picked up slots[1] (len=3757047808) only:

     get.org    get.opt  switch.opt
     1060875     310292      190335
     1076754     301295      188600
      655504     318284      196029
      529769     301471         325
      694796      70216      221172
      651868     353073      196184
      543339     312865      213236
     1061938      72785      203090
      689527     323901      249519
      621364     323881         473
     1063671      70703      192958
      915903     336318      174008
     1046462     332384         782
     1037942      72783      190655
      680122     318305      243544
      688156     314935      193526
      558658     265934      190550
      652454     372135      196270
      660140      68613         352
     1101947     378642      186575
     .........

As expected, we've got the difference more clearly. In this case,
switch.opt took about 1/3 less time (0.1 msec) than get.opt for each
iteration, and when the slot is cleaner, the ratio is bigger.
=== 2 END ===
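A minimal sketch of the "double buffering without user-space bitmaps"
idea discussed above, assuming two kernel-allocated bitmaps per slot.
The struct and function names are hypothetical; the point is that
GET_DIRTY_LOG can publish the active bitmap and hand logging over to a
cleared spare, instead of doing vmalloc() + memset() on every call.

#include <linux/string.h>	/* memset */

struct demo_memslot {
	unsigned long *dirty_bitmap;	 /* being written by logging  */
	unsigned long *dirty_bitmap_old; /* last published log        */
	unsigned long nbytes;		 /* size of one bitmap        */
};

/* Swap the buffers; the returned bitmap holds the log to publish. */
static unsigned long *swap_dirty_bitmap(struct demo_memslot *slot)
{
	unsigned long *full = slot->dirty_bitmap;

	memset(slot->dirty_bitmap_old, 0, slot->nbytes);
	slot->dirty_bitmap = slot->dirty_bitmap_old; /* cleared spare */
	slot->dirty_bitmap_old = full;		     /* old, full log */
	return full;
}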
Re: [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space
>> +static inline int set_bit_user_non_atomic(int nr, void __user *addr)
>> +{
>> +	u8 __user *p;
>> +	u8 val;
>> +
>> +	p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
>
> Does C do the + or the / first? Either way, I'd like to see brackets
> here :)

OK, I'll change it like that! I like the brackets style too :)

> Alex
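For the record, C gives division higher precedence than binary
addition, so the line already parses as the intended byte offset; the
bracketed form only makes that explicit. A tiny standalone
illustration:

#include <assert.h>

#define BITS_PER_BYTE 8

int main(void)
{
	unsigned long addr = 1000, nr = 19;

	/* '/' binds tighter than '+': both forms compute addr + 2 */
	assert(addr + nr / BITS_PER_BYTE ==
	       addr + (nr / BITS_PER_BYTE));
	assert(addr + nr / BITS_PER_BYTE == 1002);
	return 0;
}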
Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
>> [To ppc people]
>>
>> Hi, Benjamin, Paul, Alex,
>>
>> Please see patches 6,7/12. I first say sorry that I've not tested
>> these yet. In that sense, they may not be of the quality needed for
>> precise reviews. But I will be happy if you would give me any
>> comments.
>>
>> Alex, could you help me? Though I have a plan to get a PPC box in the
>> future, currently I cannot test these.
>
> Could you please point me to a git tree where everything's readily
> applied? That would make testing a lot easier.
>
> Alex

OK, I'll prepare one. Probably on sourceforge or somewhere like Kemari.

Thanks,
	Takuya
Re: [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space
>> 	r = 0;
>> @@ -1195,11 +1232,16 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
>> 	gfn = unalias_gfn(kvm, gfn);
>> 	memslot = gfn_to_memslot_unaliased(kvm, gfn);
>> 	if (memslot && memslot->dirty_bitmap) {
>> -		unsigned long rel_gfn = gfn - memslot->base_gfn;
>> +		int nr = generic_le_bit_offset(gfn - memslot->base_gfn);
>>
>> -		generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
>> +		if (kvm_set_bit_user(nr, memslot->dirty_bitmap))
>> +			goto out_fault;
>
> mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
> Must find a way to move it outside of the spinlock section.

Oh, it's a serious problem. I have to consider it.

Thanks,
	Takuya
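The constraint behind Marcelo's comment: writing to a user-space
bitmap via __put_user() may page-fault, and fault handling may sleep,
which is forbidden while holding a spinlock. One conceivable way out,
sketched below, is to record dirty bits in kernel memory under
mmu_lock and flush them to the user bitmap only after unlocking. This
is a hypothetical illustration under stated assumptions (it relies on
the series' set_bit_user_non_atomic()), not the fix the thread
settled on.

#define DIRTY_RING_SIZE 64

struct dirty_ring {
	int nr[DIRTY_RING_SIZE];	/* bit offsets into user bitmap */
	unsigned int count;
};

/* Called under mmu_lock: touches kernel memory only, never faults. */
static void ring_record_dirty(struct dirty_ring *ring, int nr)
{
	if (ring->count < DIRTY_RING_SIZE)
		ring->nr[ring->count++] = nr;
}

/* Called after spin_unlock(&kvm->mmu_lock): may fault and sleep. */
static int ring_flush_to_user(struct dirty_ring *ring,
			      void __user *bitmap)
{
	unsigned int i;

	for (i = 0; i < ring->count; ++i)
		if (set_bit_user_non_atomic(ring->nr[i], bitmap))
			return -EFAULT;
	ring->count = 0;
	return 0;
}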
Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
>>> One alternative would be: KVM_SWITCH_DIRTY_LOG passing the address
>>> of a bitmap. If the active bitmap was clean, it returns 0, no switch
>>> performed. If the active bitmap was dirty, the kernel switches to
>>> the new bitmap and returns 1. And the responsibility of cleaning the
>>> new bitmap could also be left to userspace.
>>
>> That is a beautiful approach, but we can do that only if we give up
>> the GET api. I will follow you and Avi's advice about that kind of
>> maintenance policy! What do you think?
>
> If you introduce a switch ioctl that frees the bitmap vmalloc'ed by
> the current set_memory_region (if it's not freed already) after
> pointing the memslot to the user-supplied one, it should be fine?

You mean switching from the vmalloc'ed (not do_mmap'ed) one to the
user-supplied one? It may be possible, but it makes things really
complicated in my view: until some point we use set_bit, and then we
use set_bit_user, etc.

IMO:
 - The number of slots is limited, and the size of the dirty_bitmap_old
   pointer is not problematic.
 - Neither the user side nor the kernel side needs to allocate buffers
   every time; once paired buffers are registered, we reuse them until
   the user side orders us to stop logging.
 - We have a tiny advantage if we need not copy_from_user to get a
   bitmap address for the switch ioctl.

=> So I think having two __user bitmaps is not a bad thing.
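A sketch of the alternative semantics Marcelo describes, written
against the fields this series adds to struct kvm_memory_slot
(is_dirty from patch 2/12, dirty_bitmap/dirty_bitmap_old as __user
pointers from patch 10/12). This is an illustration of his proposal,
not the posted implementation: userspace passes a fresh, already
cleared bitmap each call, and the kernel switches only when needed.

/* Returns 0 (clean, no switch) or 1 (switched); cleaning the new
 * bitmap is left to userspace, as Marcelo suggests. */
static int switch_dirty_log_alt(struct kvm_memory_slot *memslot,
				unsigned long __user *new_bitmap)
{
	if (!memslot->is_dirty)
		return 0;		/* clean: keep current bitmap */

	memslot->dirty_bitmap_old = memslot->dirty_bitmap; /* old log */
	memslot->dirty_bitmap = new_bitmap;	/* user-cleared buffer */
	memslot->is_dirty = false;
	return 1;
}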
Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
>> In the usual workload, the number of dirty pages varies a lot for
>> each iteration, and we should gain really a lot for relatively clean
>> cases.
>
> Can you post such a test, for an idle large guest?

OK, I'll do!

Result of "low workload test" (running top during migration) first.
4GB guest; picked up slots[1] (len=3757047808) only:

     get.org    get.opt  switch.opt
     1060875     310292      190335
     1076754     301295      188600
      655504     318284      196029
      529769     301471         325
      694796      70216      221172
      651868     353073      196184
      543339     312865      213236
     1061938      72785      203090
      689527     323901      249519
      621364     323881         473
     1063671      70703      192958
      915903     336318      174008
     1046462     332384         782
     1037942      72783      190655
      680122     318305      243544
      688156     314935      193526
      558658     265934      190550
      652454     372135      196270
      660140      68613         352
     1101947     378642      186575
     .........

As expected, we've got the difference more clearly. In this case,
switch.opt took about 1/3 less time (0.1 msec) than get.opt for each
iteration, and when the slot is cleaner, the ratio is bigger.
Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
(2010/05/11 12:43), Marcelo Tosatti wrote:
> On Tue, May 04, 2010 at 10:08:21PM +0900, Takuya Yoshikawa wrote:
>> +How to Get
>> +
>> +Before calling this, you have to set the slot member of
>> +kvm_user_dirty_log to indicate the target memory slot.
>> +
>> +struct kvm_user_dirty_log {
>> +	__u32 slot;
>> +	__u32 flags;
>> +	__u64 dirty_bitmap;
>> +	__u64 dirty_bitmap_old;
>> +};
>> +
>> +The addresses will be set in the paired members: dirty_bitmap and
>> +_old.
>
> Why not pass the bitmap address to the kernel, instead of having the
> kernel allocate it? Because apparently you plan to do that in a new
> generation anyway?

Yes, we want to make qemu allocate and free bitmaps in the future. But
currently, those are strictly tied to memory slot registration, and
changing that in one patch set is really difficult. Anyway, we need a
kernel-side allocation mechanism to keep the current GET_DIRTY_LOG api.

I don't mind not exporting the kernel-allocated bitmaps in this patch
set and later introducing a bitmap registration mechanism in another
patch set. As this RFC is suggesting, the kernel-side double buffering
infrastructure is designed to be as general as possible, and adding a
new API like SWITCH can be done naturally.

> Also, why does the kernel need to know about different bitmaps?

Because we need to support the current GET API. We don't have any way
to get a new bitmap in the case of GET, and we don't want to do_mmap()
every time for GET.

> One alternative would be: KVM_SWITCH_DIRTY_LOG passing the address of
> a bitmap. If the active bitmap was clean, it returns 0, no switch
> performed. If the active bitmap was dirty, the kernel switches to the
> new bitmap and returns 1. And the responsibility of cleaning the new
> bitmap could also be left to userspace.

That is a beautiful approach, but we can do that only if we give up
the GET api. I will follow you and Avi's advice about that kind of
maintenance policy! What do you think?
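For illustration, a minimal userspace sketch of the "How to Get" step
quoted above. It assumes a kernel tree with this RFC applied (so that
struct kvm_user_dirty_log and KVM_GET_USER_DIRTY_LOG_ADDR exist, per
patch 12/12); error handling is trimmed to the essentials.

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>	/* assumes this RFC series is applied */

/* Fetch the kernel-exported bitmap pair for one memory slot. */
static int get_dirty_log_addrs(int vm_fd, __u32 slot,
			       void **bitmap, void **bitmap_old)
{
	struct kvm_user_dirty_log dlog = { .slot = slot };

	if (ioctl(vm_fd, KVM_GET_USER_DIRTY_LOG_ADDR, &dlog) < 0) {
		perror("KVM_GET_USER_DIRTY_LOG_ADDR");
		return -1;
	}
	*bitmap     = (void *)(unsigned long)dlog.dirty_bitmap;
	*bitmap_old = (void *)(unsigned long)dlog.dirty_bitmap_old;
	return 0;
}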
Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
>>                        get.org    get.opt  switch.opt
>>  slots[7].len=32768     278379      66398       64024
>>  slots[8].len=32768     181246        270         160
>>  slots[7].len=32768     263961      64673       64494
>>  slots[8].len=32768     181655        265         160
>>  slots[7].len=32768     263736      64701       64610
>>  slots[8].len=32768     182785        267         160
>>  slots[7].len=32768     260925      65360       65042
>>  slots[8].len=32768     182579        264         160
>>  slots[7].len=32768     267823      65915       65682
>>  slots[8].len=32768     186350        271         160
>>
>> At a glance, we know our optimization improved significantly compared
>> to the original get dirty log ioctl. This is true for both get.opt
>> and switch.opt. This has a really big impact for personal KVM users
>> who drive KVM in GUI mode on their usual PCs.
>>
>> Next, we notice that switch.opt improved by a hundred nanoseconds or
>> so for these slots. Although this may sound like a tiny improvement,
>> we can feel it as a difference in the GUI's responses, like mouse
>> reactions.
>
> 100 ns... this is a bit on the low side (and if you can measure it
> interactively you have much better reflexes than I).

To feel the difference, please try the GUI on your PC with our patch
series!

> No doubt get.org -> get.opt is measurable, but get.opt -> switch.opt
> is problematic. Have you tried profiling to see where the time is
> spent (well I can guess, clearing the write access from the sptes)?

Sorry, but no, and I agree with your guess. Anyway, I want to do some
profiling to confirm it.

BTW, if we only think about the time improvement, the optimized get
(get.opt) may be enough at this stage. But if we consider future
expansion, like using user-allocated bitmaps, the new APIs introduced
for switch.opt won't become waste, I think, because we need a structure
to get and export the bitmap addresses.

>> In the usual workload, the number of dirty pages varies a lot for
>> each iteration, and we should gain really a lot for relatively clean
>> cases.
>
> Can you post such a test, for an idle large guest?

OK, I'll do!
Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
(2010/05/06 22:38), Arnd Bergmann wrote:
> On Wednesday 05 May 2010, Takuya Yoshikawa wrote:
>>> That's why the bitmaps are defined as little endian u64 aligned,
>>> even on big endian 32-bit systems. Little endian bitmaps are
>>> wordsize agnostic, and u64 alignment ensures we can use long-sized
>>> bitops on mixed size systems.
>
> Ok, I see.

>> There was a suggestion to propose set_le_bit_user() kind of macros.
>> But what I thought was that these have the constraint you two
>> explained and seemed a little bit specific to some area, like KVM.
>> So I decided to propose just the offset calculation macro.
>
> I'm not sure I understand how this macro is going to be used though.
> If you are just using this in kernel space, that's fine, please go
> for it.

Yes, I'm just using it in kernel space: qemu has its own endian-related
helpers. So if you allow us to place this macro in asm-generic/bitops/*
it will help us.

Avi, what do you think? Do you want to place it in kvm.h?

> However, if the intention is to use the same macro in user space,
> putting it into asm-generic/bitops/* is not going to help, because
> those headers are not available in user space, and I wouldn't want to
> change that. The definition of the macro is not part of the ABI, so
> just duplicate it in KVM if you need it there.
>
> 	Arnd
Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
>> Yes, I'm just using it in kernel space: qemu has its own
>> endian-related helpers. So if you allow us to place this macro in
>> asm-generic/bitops/* it will help us.
>
> No problem at all then. Thanks for the explanation.
>
> Acked-by: Arnd Bergmann

Thank you both. I will add your Acked-by from now on!

	Takuya
Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
On Tue, 04 May 2010 19:08:23 +0300
Avi Kivity wrote:

> On 05/04/2010 06:03 PM, Arnd Bergmann wrote:
>> On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
...
>>> So let us use the le bit offset calculation part by defining it as a
>>> new macro: generic_le_bit_offset().
>>
>> Does this work correctly if your user space is 32 bits (i.e. unsigned
>> long is a different size in user space and kernel) on both big- and
>> little-endian systems?
>>
>> I'm not sure about all the details, but I think you cannot in general
>> share bitmaps between user space and kernel because of this.
>
> That's why the bitmaps are defined as little endian u64 aligned, even
> on big endian 32-bit systems. Little endian bitmaps are wordsize
> agnostic, and u64 alignment ensures we can use long-sized bitops on
> mixed size systems.

There was a suggestion to propose set_le_bit_user() kind of macros.
But what I thought was that these have the constraint you two explained
and seemed a little bit specific to some area, like KVM. So I decided
to propose just the offset calculation macro.

Thanks,
	Takuya
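To make Avi's point concrete: in a little-endian layout, bit n always
lives in byte n/8 regardless of the word size used to access the
bitmap, which is what lets a 64-bit kernel and a 32-bit user process
share one bitmap; big-endian long-sized bitops would place the bit
differently for 32-bit and 64-bit longs, and the BITOP_LE_SWIZZLE XOR
in generic_le_bit_offset() compensates for exactly that. A byte-level
userspace illustration (so it runs anywhere):

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* LE bitmap layout: bit n is in byte n/8, at bit position n%8,
 * independent of the word size used for access. */
static void set_le_bit(unsigned int nr, uint8_t *bitmap)
{
	bitmap[nr / 8] |= 1u << (nr % 8);
}

int main(void)
{
	uint8_t map[16];

	memset(map, 0, sizeof(map));
	set_le_bit(70, map);		/* mark page 70 dirty */
	assert(map[8] == (1u << 6));	/* 70 = 8*8 + 6       */
	return 0;
}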
[RFC][PATCH 12/12 sample] qemu-kvm: use new API for getting/switching dirty bitmaps
We use the new API for light dirty log access if KVM supports it.

This conflicts with Marcelo's patches. So please take this as a sample
patch.

Signed-off-by: Takuya Yoshikawa
---
 kvm/include/linux/kvm.h |   11 ++
 qemu-kvm.c              |   81 ++-
 qemu-kvm.h              |    1 +
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index 6485981..efd9538 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -317,6 +317,14 @@ struct kvm_dirty_log {
 	};
 };
 
+/* for KVM_GET_USER_DIRTY_LOG_ADDR */
+struct kvm_user_dirty_log {
+	__u32 slot;
+	__u32 flags;
+	__u64 dirty_bitmap;
+	__u64 dirty_bitmap_old;
+};
+
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
 	__u32 len;
@@ -499,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_PPC_SEGSTATE 43
 #define KVM_CAP_PCI_SEGMENT 47
+#define KVM_CAP_USER_DIRTY_LOG 55
 
 #ifdef KVM_CAP_IRQ_ROUTING
@@ -595,6 +604,8 @@ struct kvm_clock_data {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO, 0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
+#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO, 0x49, struct kvm_user_dirty_log)
+#define KVM_SWITCH_DIRTY_LOG      _IO(KVMIO, 0x4a)
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP        _IO(KVMIO, 0x60)
 #define KVM_IRQ_LINE              _IOW(KVMIO, 0x61, struct kvm_irq_level)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..98777f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -143,6 +143,8 @@ struct slot_info {
     unsigned long userspace_addr;
     unsigned flags;
     int logging_count;
+    unsigned long *dirty_bitmap;
+    unsigned long *dirty_bitmap_old;
 };
 
 struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
@@ -232,6 +234,29 @@ int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr,
     return 1;
 }
 
+static int kvm_user_dirty_log_works(void)
+{
+    return kvm_state->user_dirty_log;
+}
+
+static int kvm_set_user_dirty_log(int slot)
+{
+    int r;
+    struct kvm_user_dirty_log dlog;
+
+    dlog.slot = slot;
+    r = kvm_vm_ioctl(kvm_state, KVM_GET_USER_DIRTY_LOG_ADDR, &dlog);
+    if (r < 0) {
+        DPRINTF("KVM_GET_USER_DIRTY_LOG_ADDR failed: %s\n", strerror(-r));
+        return r;
+    }
+    slots[slot].dirty_bitmap = (unsigned long *)
+                               ((unsigned long)dlog.dirty_bitmap);
+    slots[slot].dirty_bitmap_old = (unsigned long *)
+                                   ((unsigned long)dlog.dirty_bitmap_old);
+    return r;
+}
+
 /*
  * dirty pages logging control
  */
@@ -265,8 +290,16 @@ static int kvm_dirty_pages_log_change(kvm_context_t kvm,
         DPRINTF("slot %d start %llx len %llx flags %x\n",
                 mem.slot, mem.guest_phys_addr, mem.memory_size, mem.flags);
         r = kvm_vm_ioctl(kvm_state, KVM_SET_USER_MEMORY_REGION, &mem);
-        if (r < 0)
+        if (r < 0) {
             fprintf(stderr, "%s: %m\n", __FUNCTION__);
+            return r;
+        }
+    }
+    if (flags & KVM_MEM_LOG_DIRTY_PAGES) {
+        r = kvm_set_user_dirty_log(slot);
+    } else {
+        slots[slot].dirty_bitmap = NULL;
+        slots[slot].dirty_bitmap_old = NULL;
     }
     return r;
 }
@@ -589,7 +622,6 @@ int kvm_register_phys_mem(kvm_context_t kvm,
                           unsigned long phys_start, void *userspace_addr,
                           unsigned long len, int log)
 {
-
     struct kvm_userspace_memory_region memory = {
         .memory_size = len,
         .guest_phys_addr = phys_start,
@@ -608,6 +640,9 @@ int kvm_register_phys_mem(kvm_context_t kvm,
         fprintf(stderr, "create_userspace_phys_mem: %s\n", strerror(-r));
         return -1;
     }
+    if (log) {
+        r = kvm_set_user_dirty_log(memory.slot);
+    }
     register_slot(memory.slot, memory.guest_phys_addr, memory.memory_size,
                   memory.userspace_addr, memory.flags);
     return 0;
@@ -652,6 +687,8 @@ void kvm_destroy_phys_mem(kvm_context_t kvm, unsigned long phys_start,
         fprintf(stderr, "destroy_userspace_phys_mem: %s",
                 strerror(-r));
         return;
     }
+    slots[memory.slot].dirty_bitmap = NULL;
+    slots[memory.slot].dirty_bitmap_old = NULL;
     free_slot(memory.slot);
 }
 
@@ -692,6 +729,21 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf)
     return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf);
 }
 
+static int kvm_switch_map(int slot)
+{
+    int r;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_SWITCH_DIRTY_LOG, slot);
+    if (r == 0) {
+        unsigned long *dirty_bitmap;
+
+        dirty_bitmap = slots[slot].dirty_bitmap;
+        slots[slot].dirty_bitmap = slots[slot].dirty_bitmap_old;
+        slots[slot].dirty_bitmap_old = dirty_bitmap;
+    }
[RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
Now that dirty bitmaps are accessible from user space, we export their
addresses to achieve a zero-copy dirty log check:

  KVM_GET_USER_DIRTY_LOG_ADDR

We also need an API for triggering a dirty bitmap switch to take full
advantage of the double buffered bitmaps:

  KVM_SWITCH_DIRTY_LOG

See the documentation in this patch for precise explanations.

About the performance improvement: the most important feature of the
switch API is its lightness. In our test, this appeared in the form of
improved responses for GUI manipulations.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
CC: Avi Kivity
CC: Alexander Graf
---
 Documentation/kvm/api.txt |   87 +
 arch/ia64/kvm/kvm-ia64.c  |   27 +-
 arch/powerpc/kvm/book3s.c |   18 +++--
 arch/x86/kvm/x86.c        |   44 ---
 include/linux/kvm.h       |   11 ++
 include/linux/kvm_host.h  |    4 ++-
 virt/kvm/kvm_main.c       |   63 +
 7 files changed, 220 insertions(+), 34 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index a237518..c106c83 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -892,6 +892,93 @@ arguments.
 This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
 in-kernel irqchip, the multiprocessing state must be maintained by
 userspace.
 
+4.39 KVM_GET_USER_DIRTY_LOG_ADDR
+
+Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see below)
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_user_dirty_log (in/out)
+Returns: 0 on success, -1 on error
+
+This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40)
+instead of the old dirty log manipulation by KVM_GET_DIRTY_LOG.
+
+A bit about KVM_CAP_USER_DIRTY_LOG
+
+Before this ioctl was introduced, dirty bitmaps for dirty page logging
+were allocated in the kernel's memory space. But we have now moved
+them to user space to get more flexibility and performance. To achieve
+this move without breaking compatibility, we will split the
+KVM_CAP_USER_DIRTY_LOG capability into a few generations which can be
+identified by its check extension return values.
+
+This KVM_GET_USER_DIRTY_LOG_ADDR belongs to the first generation with
+KVM_SWITCH_DIRTY_LOG (4.40) and must be supported by all generations.
+
+What you get
+
+By using this, you can get paired bitmap addresses which are
+accessible from user space. See the explanation in 4.40 for the roles
+of these two bitmaps.
+
+How to Get
+
+Before calling this, you have to set the slot member of
+kvm_user_dirty_log to indicate the target memory slot.
+
+struct kvm_user_dirty_log {
+	__u32 slot;
+	__u32 flags;
+	__u64 dirty_bitmap;
+	__u64 dirty_bitmap_old;
+};
+
+The addresses will be set in the paired members: dirty_bitmap and
+_old.
+
+Note
+
+In generation 1, we support bitmaps which are created in the kernel
+but do not support bitmaps created by users. This means bitmap
+creation/destruction is done the same as before, when you instruct KVM
+by KVM_SET_USER_MEMORY_REGION (see 4.34) to start/stop logging. Please
+do not try to free the exported bitmaps by yourself, or KVM will
+access the freed area and end with a fault.
+
+4.40 KVM_SWITCH_DIRTY_LOG
+
+Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see 4.39)
+Architectures: all
+Type: vm ioctl
+Parameters: memory slot id
+Returns: 0 if switched, 1 if not (slot was clean), -1 on error
+
+This ioctl allows you to switch the dirty log to the next one: a newer
+ioctl for getting dirty page logs than KVM_GET_DIRTY_LOG (see 4.7 for
+the explanation about dirty page logging; the log format is not
+changed).
+
+If you have the capability KVM_CAP_USER_DIRTY_LOG, using this is
+strongly recommended over KVM_GET_DIRTY_LOG, because this does not
+need a buffer copy between kernel and user space.
+
+How to Use
+
+Before calling this, you have to remember the paired addresses of the
+dirty bitmaps which can be obtained by KVM_GET_USER_DIRTY_LOG_ADDR
+(see 4.39): dirty_bitmap (being used now by the kernel) and
+dirty_bitmap_old (not being used now and containing the last log).
+
+After calling this, the roles of these bitmaps will change like this:
+if the return value was 0, the kernel cleared dirty_bitmap_old and
+began to use it for the next logging, so that you can use the cold
+dirty_bitmap to check the log since the last switch. If the return
+value was 1, no pages were dirty and the bitmap switch was not done.
+Note that remembering which bitmap is now active is your
+responsibility, so you have to update your bookkeeping when you get
+the return value 0.
+
+Note
+
+Bitmap switch may also occur when you call KVM_GET_DIRTY_LOG. Please
+use only one of these two ioctls, preferably this one, to avoid extra
+confusion. We do not guarantee on which condition KVM_GET_DIRTY_LOG
+causes a bitmap switch.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/ia64/kvm/k
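A userspace sketch of the "How to Use" flow documented above, under
stated assumptions: the two addresses were fetched earlier with
KVM_GET_USER_DIRTY_LOG_ADDR, and send_dirty_pages() is a hypothetical
consumer of one round of the log. Bookkeeping of which buffer is
active is ours, as the documentation requires.

#include <sys/ioctl.h>
#include <linux/kvm.h>	/* assumes this RFC series is applied */

extern void send_dirty_pages(const unsigned long *log); /* hypothetical */

static void track_dirty(int vm_fd, int slot,
			unsigned long *bitmap, unsigned long *bitmap_old)
{
	unsigned long *active = bitmap;	/* kernel logs here       */
	unsigned long *cold = bitmap_old;	/* last published log */

	for (;;) {
		int r = ioctl(vm_fd, KVM_SWITCH_DIRTY_LOG, slot);
		unsigned long *tmp;

		if (r < 0)
			return;		/* error                        */
		if (r == 1)
			continue;	/* slot was clean, no switch    */

		/* r == 0: kernel now logs into the old 'cold' buffer */
		tmp = active;
		active = cold;
		cold = tmp;
		send_dirty_pages(cold);
	}
}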
[RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space
We move dirty bitmaps to user space.

 - Allocation and destruction: we use do_mmap() and do_munmap(). The
   new bitmap space is twice as long as the original one and we use the
   additional space for double buffering: this makes it possible to
   update the active bitmap while letting user space read the other one
   safely. For x86, we can also remove the vmalloc() in
   kvm_vm_ioctl_get_dirty_log().

 - Bitmap manipulations: we replace all functions which access dirty
   bitmaps with *_user() functions.

 - For ia64: moving the dirty bitmaps of memory slots does not affect
   ia64 much, because it is using a different place to store dirty logs
   rather than the dirty bitmaps of memory slots: all we have to change
   are the sync and get of the dirty log, so we don't need set_bit_user
   like functions for ia64.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
CC: Avi Kivity
CC: Alexander Graf
---
 arch/ia64/kvm/kvm-ia64.c  |   15 ++++++-
 arch/powerpc/kvm/book3s.c |    5 ++-
 arch/x86/kvm/x86.c        |   25 ------
 include/linux/kvm_host.h  |    3 +-
 virt/kvm/kvm_main.c       |   62 +---
 5 files changed, 82 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 17fd65c..03503e6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1823,11 +1823,19 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
 	n = kvm_dirty_bitmap_bytes(memslot);
 	base = memslot->base_gfn / BITS_PER_LONG;
 
+	r = -EFAULT;
+	if (!access_ok(VERIFY_WRITE, memslot->dirty_bitmap, n))
+		goto out;
+
 	for (i = 0; i < n/sizeof(long); ++i) {
 		if (dirty_bitmap[base + i])
 			memslot->is_dirty = true;
 
-		memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
+		if (__put_user(dirty_bitmap[base + i],
+			       &memslot->dirty_bitmap[i])) {
+			r = -EFAULT;
+			goto out;
+		}
 		dirty_bitmap[base + i] = 0;
 	}
 	r = 0;
@@ -1858,7 +1866,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (memslot->is_dirty) {
 		kvm_flush_remote_tlbs(kvm);
 		n = kvm_dirty_bitmap_bytes(memslot);
-		memset(memslot->dirty_bitmap, 0, n);
+		if (clear_user(memslot->dirty_bitmap, n)) {
+			r = -EFAULT;
+			goto out;
+		}
 		memslot->is_dirty = false;
 	}
 	r = 0;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 4b074f1..2a31d2f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1210,7 +1210,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
 
 		n = kvm_dirty_bitmap_bytes(memslot);
-		memset(memslot->dirty_bitmap, 0, n);
+		if (clear_user(memslot->dirty_bitmap, n)) {
+			r = -EFAULT;
+			goto out;
+		}
 		memslot->is_dirty = false;
 	}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 023c7f8..32a3d94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2760,40 +2760,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (memslot->is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
-		unsigned long *dirty_bitmap;
+		unsigned long __user *dirty_bitmap;
+		unsigned long __user *dirty_bitmap_old;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		r = -ENOMEM;
-		dirty_bitmap = vmalloc(n);
-		if (!dirty_bitmap)
+		dirty_bitmap = memslot->dirty_bitmap;
+		dirty_bitmap_old = memslot->dirty_bitmap_old;
+		r = -EFAULT;
+		if (clear_user(dirty_bitmap_old, n))
 			goto out;
-		memset(dirty_bitmap, 0, n);
 
 		r = -ENOMEM;
 		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots) {
-			vfree(dirty_bitmap);
+		if (!slots)
 			goto out;
-		}
+
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old;
+		slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap;
[RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic()
This is not to break the build for other architectures than x86 and
ppc.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
---
 arch/ia64/include/asm/kvm_host.h    |    5 +++++
 arch/powerpc/include/asm/kvm_host.h |    6 ++++++
 arch/s390/include/asm/kvm_host.h    |    6 ++++++
 arch/x86/include/asm/kvm_host.h     |    5 +++++
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index a362e67..938041b 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -589,6 +589,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return 0;
+}
+
 #endif /* __ASSEMBLY__*/
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0c9ad86..9463524 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #define KVM_MAX_VCPUS 1
 #define KVM_MEMORY_SLOTS 32
@@ -287,4 +288,9 @@ struct kvm_vcpu_arch {
 #endif
 };
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 27605b6..36710ee 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -238,4 +238,10 @@ struct kvm_arch{
 };
 
 extern int sie64a(struct kvm_s390_sie_block *, unsigned long *);
+
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return 0;
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..9e22df9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -795,4 +795,9 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
-- 
1.7.0.4
[RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
Although we can use *_le_bit() helpers to treat bitmaps arranged
little-endian, having the le bit offset calculation as a separate
macro gives us more freedom.

For example, KVM has le arranged dirty bitmaps for VGA and
live-migration, and they are used in user space too. To avoid bitmap
copies between kernel and user space, we want to update the bitmaps in
user space directly. To achieve this, the le bit offset combined with
*_user() functions helps us a lot.

So let us reuse the le bit offset calculation part by defining it as a
new macro: generic_le_bit_offset().

Signed-off-by: Takuya Yoshikawa
CC: Arnd Bergmann
---
 include/asm-generic/bitops/le.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 80e3bf1..ee445fb 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -9,6 +9,8 @@
 
 #if defined(__LITTLE_ENDIAN)
 
+#define generic_le_bit_offset(nr)	(nr)
+
 #define generic_test_le_bit(nr, addr) test_bit(nr, addr)
 #define generic___set_le_bit(nr, addr) __set_bit(nr, addr)
 #define generic___clear_le_bit(nr, addr) __clear_bit(nr, addr)
@@ -25,6 +27,8 @@
 
 #elif defined(__BIG_ENDIAN)
 
+#define generic_le_bit_offset(nr)	((nr) ^ BITOP_LE_SWIZZLE)
+
 #define generic_test_le_bit(nr, addr) \
 	test_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
 #define generic___set_le_bit(nr, addr) \
-- 
1.7.0.4
[RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space
During the work on KVM's dirty page logging optimization, we
encountered the need to manipulate bitmaps in user space efficiently.
To achieve this, we introduce a uaccess function for setting a bit in
user space, following Avi's suggestion.

KVM is now using dirty bitmaps for live-migration and VGA. Although we
need to update them from the kernel side, copying them every time to
update the dirty log is a big bottleneck. In particular, we found that
zero-copy bitmap manipulation improves the responses of GUI
manipulations a lot.

We also found one similar need in drivers/vhost/vhost.c, in which the
author implemented set_bit_to_user() locally using inefficient
functions: see the TODO at the top of that file.

Probably, this kind of need will be common for the virtualization
area. So we introduce a function set_bit_user_non_atomic().

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
CC: Alexander Graf
CC: Benjamin Herrenschmidt
CC: Paul Mackerras
---
 arch/powerpc/include/asm/uaccess.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 3a01ce8..f878326 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -321,6 +321,25 @@ do {								\
 	__gu_err;						\
 })
 
+static inline int set_bit_user_non_atomic(int nr, void __user *addr)
+{
+	u8 __user *p;
+	u8 val;
+
+	p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
+	if (!access_ok(VERIFY_WRITE, p, 1))
+		return -EFAULT;
+
+	if (__get_user(val, p))
+		return -EFAULT;
+
+	val |= 1U << (nr % BITS_PER_BYTE);
+	if (__put_user(val, p))
+		return -EFAULT;
+
+	return 0;
+}
+
 /* more complex routines */
-- 
1.7.0.4
[RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit
During the work on KVM's dirty page logging optimization, we
encountered the need for copy_in_user() on 32-bit ppc and x86: these
will be used for manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with __copy_tofrom_user().

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
CC: Alexander Graf
CC: Benjamin Herrenschmidt
CC: Paul Mackerras
---
 arch/powerpc/include/asm/uaccess.h |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index bd0fb84..3a01ce8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -359,6 +359,23 @@ static inline unsigned long copy_to_user(void __user *to,
 	return n;
 }
 
+static inline unsigned long copy_in_user(void __user *to,
+		const void __user *from, unsigned long n)
+{
+	unsigned long over;
+
+	if (likely(access_ok(VERIFY_READ, from, n) &&
+		   access_ok(VERIFY_WRITE, to, n)))
+		return __copy_tofrom_user(to, from, n);
+	if (((unsigned long)from < TASK_SIZE) ||
+	    ((unsigned long)to < TASK_SIZE)) {
+		over = max((unsigned long)from, (unsigned long)to) +
+			n - TASK_SIZE;
+		return __copy_tofrom_user(to, from, n - over) + over;
+	}
+	return n;
+}
+
 #else /* __powerpc64__ */
 
 #define __copy_in_user(to, from, size) \
-- 
1.7.0.4
[RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space
During the work on KVM's dirty page logging optimization, we
encountered the need to manipulate bitmaps in user space efficiently.
To achieve this, we introduce a uaccess function for setting a bit in
user space, following Avi's suggestion.

KVM is now using dirty bitmaps for live-migration and VGA. Although we
need to update them from the kernel side, copying them every time to
update the dirty log is a big bottleneck. In particular, we found that
zero-copy bitmap manipulation improves the responses of GUI
manipulations a lot.

We also found one similar need in drivers/vhost/vhost.c, in which the
author implemented set_bit_to_user() locally using inefficient
functions: see the TODO at the top of that file.

Probably, this kind of need will be common for the virtualization
area. So we introduce a macro set_bit_user_non_atomic(), following the
implementation style of x86's uaccess functions.

Note: there is one restriction on this macro: bitmaps must be 64-bit
aligned (see the comment in this patch).

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
CC: Avi Kivity
Cc: Thomas Gleixner
CC: Ingo Molnar
Cc: "H. Peter Anvin"
---
 arch/x86/include/asm/uaccess.h |   39 +++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..3138e65 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -98,6 +98,45 @@ struct exception_table_entry {
 
 extern int fixup_exception(struct pt_regs *regs);
 
+/**
+ * set_bit_user_non_atomic: - set a bit of a bitmap in user space.
+ * @nr:   Bit offset.
+ * @addr: Base address of a bitmap in user space.
+ *
+ * Context: User context only. This function may sleep.
+ *
+ * This macro sets a bit of a bitmap in user space.
+ *
+ * Restriction: the bitmap pointed to by @addr must be 64-bit aligned:
+ * the kernel accesses the bitmap by its own word length, so bitmaps
+ * allocated by 32-bit processes may cause a fault.
+ *
+ * Returns zero on success, or -EFAULT on error.
+ */
+#define __set_bit_user_non_atomic_asm(nr, addr, err, errret)		\
+	asm volatile("1:	bts %1,%2\n"				\
+		     "2:\n"						\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:	mov %3,%0\n"				\
+		     "	jmp 2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : "=r"(err)					\
+		     : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define set_bit_user_non_atomic(nr, addr)				\
+({									\
+	int __ret_sbu;							\
+									\
+	might_fault();							\
+	if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))			\
+		__set_bit_user_non_atomic_asm(nr, addr, __ret_sbu,	\
+					      -EFAULT);			\
+	else								\
+		__ret_sbu = -EFAULT;					\
+									\
+	__ret_sbu;							\
+})
+
 /*
  * These are the main single-value transfer routines. They automatically
  * use the right size if we just have the right pointer type.
-- 
1.7.0.4
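A hedged usage sketch: how a kernel path might drive this macro
against a user-space dirty bitmap while respecting the 64-bit
alignment restriction documented above. Everything except
set_bit_user_non_atomic() is hypothetical, for illustration only.

/* Hypothetical caller: publish one dirty page into a user bitmap. */
static int publish_dirty_gfn(void __user *bitmap, unsigned long base_gfn,
			     unsigned long gfn)
{
	int nr = gfn - base_gfn;	/* bit offset inside the slot */

	if ((unsigned long)bitmap & 7)	/* enforce the u64 alignment  */
		return -EINVAL;

	return set_bit_user_non_atomic(nr, bitmap); /* 0 or -EFAULT */
}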
[RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit
During the work on KVM's dirty page logging optimization, we
encountered the need for copy_in_user() on 32-bit x86 and ppc: these
will be used for manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with the existing generic
copy user helpers.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
CC: Avi Kivity
Cc: Thomas Gleixner
CC: Ingo Molnar
Cc: "H. Peter Anvin"
---
 arch/x86/include/asm/uaccess_32.h |    2 ++
 arch/x86/lib/usercopy_32.c        |   26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 088d09f..85d396d 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -21,6 +21,8 @@ unsigned long __must_check __copy_from_user_ll_nocache
 		(void *to, const void __user *from, unsigned long n);
 unsigned long __must_check __copy_from_user_ll_nocache_nozero
 		(void *to, const void __user *from, unsigned long n);
+unsigned long __must_check copy_in_user
+		(void __user *to, const void __user *from, unsigned n);
 
 /**
  * __copy_to_user_inatomic: - Copy a block of data into user space, with less checking.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e218d5d..e90ffc3 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -889,3 +889,29 @@ void copy_from_user_overflow(void)
 	WARN(1, "Buffer overflow detected!\n");
 }
 EXPORT_SYMBOL(copy_from_user_overflow);
+
+/**
+ * copy_in_user: - Copy a block of data from user space to user space.
+ * @to:   Destination address, in user space.
+ * @from: Source address, in user space.
+ * @n:    Number of bytes to copy.
+ *
+ * Context: User context only. This function may sleep.
+ *
+ * Copy data from user space to user space.
+ *
+ * Returns number of bytes that could not be copied.
+ * On success, this will be zero.
+ */
+unsigned long
+copy_in_user(void __user *to, const void __user *from, unsigned n)
+{
+	if (access_ok(VERIFY_WRITE, to, n) && access_ok(VERIFY_READ, from, n)) {
+		if (movsl_is_ok(to, from, n))
+			__copy_user(to, from, n);
+		else
+			n = __copy_user_intel(to, (const void *)from, n);
+	}
+	return n;
+}
+EXPORT_SYMBOL(copy_in_user);
-- 
1.7.0.4
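For context, a sketch of the kind of call site this helper enables in
the series: duplicating one user-space dirty bitmap into another
without bouncing through a kernel buffer. The wrapper name is
hypothetical; only copy_in_user() comes from the patch above.

/* Hypothetical caller: copy the active user bitmap into the spare
 * one directly, instead of copy_from_user() into a kernel buffer
 * followed by copy_to_user() back out. */
static int snapshot_dirty_log(void __user *dst, const void __user *src,
			      unsigned nbytes)
{
	if (copy_in_user(dst, src, nbytes))
		return -EFAULT;	/* some bytes were not copied */
	return 0;
}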
[RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps
We will change the vmalloc() and vfree() to do_mmap() and do_munmap()
later. This patch makes that easy and cleans up the code.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
---
 virt/kvm/kvm_main.c |   27 ---------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7ab6312..3e3acad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -435,6 +435,12 @@ out_err_nodisable:
 	return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	vfree(memslot->dirty_bitmap);
+	memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -447,7 +453,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 		vfree(free->rmap);
 
 	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-		vfree(free->dirty_bitmap);
+		kvm_destroy_dirty_bitmap(free);
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
 
@@ -458,7 +464,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	}
 
 	free->npages = 0;
-	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
 }
 
@@ -520,6 +525,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	memslot->dirty_bitmap = vmalloc(dirty_bytes);
+	if (!memslot->dirty_bitmap)
+		return -ENOMEM;
+
+	memset(memslot->dirty_bitmap, 0, dirty_bytes);
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -653,12 +670,8 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-		new.dirty_bitmap = vmalloc(dirty_bytes);
-		if (!new.dirty_bitmap)
+		if (kvm_create_dirty_bitmap(&new) < 0)
 			goto out_free;
-		memset(new.dirty_bitmap, 0, dirty_bytes);
 		/* destroy any largepage mappings for dirty tracking */
 		if (old.npages)
 			flush_shadow = 1;
-- 
1.7.0.4
[RFC][PATCH 2/12] KVM: introduce slot level dirty state management
This patch introduces an is_dirty member for each memory slot. Using
this member, we remove the dirty bitmap scans which were done in
get_dirty_log().

This is important for moving dirty bitmaps to user space, because we
don't have any good way to check bitmaps in user space at low cost,
and scanning bitmaps to check memory slot dirtiness would not be
acceptable.

When we mark a slot dirty:
 - x86 and ppc: at the timing of mark_page_dirty()
 - ia64: at the timing of kvm_ia64_sync_dirty_log()

ia64 uses a different place to store dirty logs and synchronizes it
with the logs of memory slots right before get_dirty_log(), so we use
this timing to update is_dirty.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
CC: Avi Kivity
CC: Alexander Graf
---
 arch/ia64/kvm/kvm-ia64.c  |   11 +++++++----
 arch/powerpc/kvm/book3s.c |    9 ++++-----
 arch/x86/kvm/x86.c        |    9 +++------
 include/linux/kvm_host.h  |    4 ++--
 virt/kvm/kvm_main.c       |   13 ++++++-------
 5 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d5f4e91..17fd65c 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1824,6 +1824,9 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
 	base = memslot->base_gfn / BITS_PER_LONG;
 
 	for (i = 0; i < n/sizeof(long); ++i) {
+		if (dirty_bitmap[base + i])
+			memslot->is_dirty = true;
+
 		memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
 		dirty_bitmap[base + i] = 0;
 	}
@@ -1838,7 +1841,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	int r;
 	unsigned long n;
 	struct kvm_memory_slot *memslot;
-	int is_dirty = 0;
 
 	mutex_lock(&kvm->slots_lock);
 	spin_lock(&kvm->arch.dirty_log_lock);
@@ -1847,16 +1849,17 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (r)
 		goto out;
 
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log);
 	if (r)
 		goto out;
 
+	memslot = &kvm->memslots->memslots[log->slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->is_dirty) {
 		kvm_flush_remote_tlbs(kvm);
-		memslot = &kvm->memslots->memslots[log->slot];
 		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
+		memslot->is_dirty = false;
 	}
 	r = 0;
 out:
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 28e785f..4b074f1 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1191,20 +1191,18 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	struct kvm_vcpu *vcpu;
 	ulong ga, ga_end;
-	int is_dirty = 0;
 	int r;
 	unsigned long n;
 
 	mutex_lock(&kvm->slots_lock);
 
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log);
 	if (r)
 		goto out;
 
+	memslot = &kvm->memslots->memslots[log->slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
-		memslot = &kvm->memslots->memslots[log->slot];
-
+	if (memslot->is_dirty) {
 		ga = memslot->base_gfn << PAGE_SHIFT;
 		ga_end = ga + (memslot->npages << PAGE_SHIFT);
 
@@ -1213,6 +1211,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
+		memslot->is_dirty = false;
 	}
 
 	r = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b95a211..023c7f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2740,10 +2740,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				      struct kvm_dirty_log *log)
 {
-	int r, i;
+	int r;
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
-	unsigned long is_dirty = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2758,11 +2757,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-		is_dirty = memslot->dirty_bitmap[i];
-
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
 		unsigned long *dirty_bitmap;
 
@@ -2784,6 +2780,7 @@ int kvm_vm_ioctl_get_dirty_log(struct
[RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
Although we always allocate a new dirty bitmap in x86's
get_dirty_log(), it is only used as a zero-source of copy_to_user()
and freed right after that when the memslot is clean. This patch uses
clear_user() instead of doing this unnecessary zero-source allocation.

Performance improvement: as we can easily expect, the time needed to
allocate a bitmap is completely eliminated. Furthermore, we can avoid
the TLB flush triggered by vmalloc() and get some additional good
effects. In my test, the improved ioctl was about 4 to 10 times faster
than the original one for clean slots.

Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/x86.c |   37 +++++++++++++++++++++----------------
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..b95a211 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
 	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
-
 	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 		is_dirty = memslot->dirty_bitmap[i];
 
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
+		unsigned long *dirty_bitmap;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots)
-			goto out_free;
+		r = -ENOMEM;
+		dirty_bitmap = vmalloc(n);
+		if (!dirty_bitmap)
+			goto out;
+		memset(dirty_bitmap, 0, n);
+
+		r = -ENOMEM;
+		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+		if (!slots) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
@@ -2788,13 +2790,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		synchronize_srcu_expedited(&kvm->srcu);
 		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
+
+		r = -EFAULT;
+		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
+		vfree(dirty_bitmap);
+	} else {
+		r = -EFAULT;
+		if (clear_user(log->dirty_bitmap, n))
+			goto out;
 	}
 
 	r = 0;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-		r = -EFAULT;
-out_free:
-	vfree(dirty_bitmap);
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
-- 
1.7.0.4
[RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
Hi, sorry for sending from my personal account.
The following series are all from me:

  From: Takuya Yoshikawa

This is the 3rd version of "moving dirty bitmaps to user space".
From this version, we add x86, ppc and asm-generic people to the CC
lists.

[To KVM people]

Sorry for being late to reply to your comments.

Avi,
 - I've written an answer to your question in patch 5/12:
   drivers/vhost/vhost.c .
 - I've considered changing set_bit_user_non_atomic to an inline
   function, but did not change it because the other helpers in
   uaccess.h are written as macros. Anyway, I hope that the x86 people
   will give us appropriate suggestions about this.
 - I thought that documentation about making bitmaps 64-bit aligned
   should be written when we add an API to register user-allocated
   bitmaps. So probably in the next series.

Avi, Alex,
 - Could you check the ia64 and ppc parts, please? I tried to keep the
   logical changes as small as possible.

   I personally tried to build these with cross compilers. For ia64, I
   could confirm build success with my patch series. But book3s, even
   without my patch series, failed with the following errors:

   arch/powerpc/kvm/book3s_paired_singles.c: In function 'kvmppc_emulate_paired_single':
   arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 bytes is larger than 2048 bytes
   make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
   make: *** [arch/powerpc/kvm] Error 2

About the changelog: there are two main changes from the 2nd version:
 1. I changed the treatment of clean slots (see patch 1/12).
    This was already applied today, thanks!
 2. I changed the switch API (see patch 11/12).
    To show this API's advantage, I also did a test (see the end of
    this mail).

[To x86 people]

Hi, Thomas, Ingo, Peter,

Please review patches 4,5/12. Because this is my first experience
sending patches to x86, please tell me if anything is lacking.

[To ppc people]

Hi, Benjamin, Paul, Alex,

Please see patches 6,7/12. I first say sorry that I've not tested
these yet. In that sense, they may not be of the quality needed for
precise reviews. But I will be happy if you would give me any comments.

Alex, could you help me? Though I have a plan to get a PPC box in the
future, currently I cannot test these.

[To asm-generic people]

Hi, Arnd,

Please review patch 8/12. Is this kind of macro acceptable?

[Performance test]

We measured the tsc needed for the ioctl()s to get the dirty logs in
the kernel.

Test environment:
  AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory

1. GUI test (running an Ubuntu guest in graphical mode)

  sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...

We show a relatively stable part to compare how much time is needed
for the basic parts of the dirty log ioctl.

                       get.org    get.opt  switch.opt
  slots[7].len=32768    278379      66398       64024
  slots[8].len=32768    181246        270         160
  slots[7].len=32768    263961      64673       64494
  slots[8].len=32768    181655        265         160
  slots[7].len=32768    263736      64701       64610
  slots[8].len=32768    182785        267         160
  slots[7].len=32768    260925      65360       65042
  slots[8].len=32768    182579        264         160
  slots[7].len=32768    267823      65915       65682
  slots[8].len=32768    186350        271         160

At a glance, we know our optimization improved significantly compared
to the original get dirty log ioctl. This is true for both get.opt and
switch.opt. This has a really big impact for personal KVM users who
drive KVM in GUI mode on their usual PCs.

Next, we notice that switch.opt improved by a hundred nanoseconds or
so for these slots. Although this may sound like a tiny improvement,
we can feel it as a difference in the GUI's responses, like mouse
reactions.

To feel the difference, please try the GUI on your PC with our patch
series!

2. Live-migration test (4GB guest, write loop with 1GB buf)

We also did a live-migration test.

                              get.org    get.opt  switch.opt
  slots[0].len=655360          797383     261144      222181
  slots[1].len=3757047808     2186721    1965244     1842824
  slots[2].len=637534208      1433562    1012723     1031213
  slots[3].len=131072          216858        331         331
  slots[4].len=131072          121635        225         164
  slots[5].len=131072          120863        356         164
  slots[6].len=16777216        121746       1133         156
  slots[7].len=32768           120415        230         278
  slots[8].len=32768           120368        216         149
  slots[0].len=655360          806497     194710      223582
  slots[1].len=3757047808     2142922    1878025     1895369
  slots[2].len=637534208      1386512    1021309     1000345
  slots[3].len=131072          221118        459         296
  slots[4].len=131072          121516        272         166
  slots[5].len=131072