Re: [Qemu-devel] Re: Another SIGFPE in display code, now in cirrus
On 05/12/2010 09:11 PM, Stefano Stabellini wrote:
On Wed, 12 May 2010, Jamie Lokier wrote:
Stefano Stabellini wrote:
On Wed, 12 May 2010, Avi Kivity wrote:

It's useful if you have a one-line horizontal pattern you want to propagate all over.

It might be useful all right, but it is not entirely clear what the hardware should do in this situation from the documentation we have, and certainly the current state of the cirrus emulation code doesn't help.

It's quite a reasonable thing for hardware to do, even if not documented. It would be surprising if the hardware didn't copy the one-line pattern.

All right then, you convinced me :)
This is my proposed solution, however it is untested with Windows NT.

Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>

---

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 9f61a01..a7f0d3c 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -676,15 +676,17 @@ static void cirrus_do_copy(CirrusVGAState *s, int dst, int src, int w, int h)
     int sx, sy;
     int dx, dy;
     int width, height;
+    uint32_t start_addr, line_offset, line_compare;
     int depth;
     int notify = 0;
 
     depth = s->vga.get_bpp(&s->vga) / 8;
     s->vga.get_resolution(&s->vga, &width, &height);
+    s->vga.get_offsets(&s->vga, &line_offset, &start_addr, &line_compare);
 
     /* extra x, y */
-    sx = (src % ABS(s->cirrus_blt_srcpitch)) / depth;
-    sy = (src / ABS(s->cirrus_blt_srcpitch));
+    sx = (src % line_offset) / depth;
+    sy = (src / line_offset);
     dx = (dst % ABS(s->cirrus_blt_dstpitch)) / depth;
     dy = (dst / ABS(s->cirrus_blt_dstpitch));

dx/dy also need to be calculated according to line_offset.

I think the rules are:

  if src_byte_range in screen and dst_byte_range in screen and srcpitch == dstpitch == line_offset:
      can use qemu_console_copy
  elif dst_byte_range overlaps screen:
      if dstpitch == line_offset:
          invalidate rectangle
      else:
          invalidate full screen

--
error compiling committee.c: too many arguments to function
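To make the proposed rules concrete, here is a rough sketch of the decision logic in the shape of cirrus_do_copy(); range_in_screen(), range_overlaps_screen() and the invalidate_* calls are placeholders for illustration, not actual qemu functions.

/* Sketch only: the range checks and invalidation calls are placeholders. */
static void cirrus_copy_or_invalidate(CirrusVGAState *s, int src, int dst,
                                      int w, int h, uint32_t line_offset)
{
    int depth = s->vga.get_bpp(&s->vga) / 8;
    int srcpitch = ABS(s->cirrus_blt_srcpitch);
    int dstpitch = ABS(s->cirrus_blt_dstpitch);
    int sx = (src % line_offset) / depth, sy = src / line_offset;
    int dx = (dst % line_offset) / depth, dy = dst / line_offset;

    if (range_in_screen(s, src, h * srcpitch) &&
        range_in_screen(s, dst, h * dstpitch) &&
        srcpitch == line_offset && dstpitch == line_offset) {
        /* both ranges are on-screen and the pitches match the scanline
         * pitch, so the console can simply move the pixels */
        qemu_console_copy(s->vga.ds, sx, sy, dx, dy, w, h);
    } else if (range_overlaps_screen(s, dst, h * dstpitch)) {
        if (dstpitch == line_offset)
            invalidate_rectangle(s, dx, dy, w, h);   /* placeholder */
        else
            invalidate_full_screen(s);               /* placeholder */
    }
}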
Re: Info on Unix Domain Socket Device
On 05/29/2010 08:11 PM, Narendra Prasad Madanapalli wrote:

Hi,
There is a small-scale TODO task in kvm: 'Add a Unix domain socket device. With this, the guest can talk to a pci device which is connected to a Unix domain socket on the host.' I would like to take this on and contribute a patch. If anyone else is also planning to do this, we can coordinate and accomplish the task.

This was already implemented in qemu (as a large-scale TODO) and is called virtio-serial.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Add Documentation/kvm/msr.txt
On 05/27/2010 07:02 PM, Glauber Costa wrote:

+
+Custom MSR list
+
+
+The current supported Custom MSR list is:
+
+MSR_KVM_WALL_CLOCK: 0x11
+
+	data: physical address of a memory area.

Which must be in guest RAM (i.e., don't point it somewhere random and expect the hypervisor to allocate it for you). Must be aligned to 4 bytes (we don't enforce it though).

I don't see the reason for it.

Insurance.

If this is a requirement, our own implementation is failing to meet it.

How are we failing to meet it? All guests align to at least four bytes.

--
error compiling committee.c: too many arguments to function
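For concreteness, a guest-side sketch of how an MSR like MSR_KVM_WALL_CLOCK (0x11 above) is typically consumed; it mirrors what a Linux guest does, but treat the structure placement and helper calls as illustrative rather than as the exact kvmclock code.

#include <asm/pvclock-abi.h>	/* struct pvclock_wall_clock */
#include <asm/msr.h>		/* wrmsrl() */
#include <asm/page.h>		/* __pa() */

/* The area handed to the hypervisor must be real guest RAM (the host will
 * not allocate it for us), and keeping it naturally aligned satisfies the
 * 4-byte-alignment expectation even though it is not enforced. */
static struct pvclock_wall_clock wall_clock __attribute__((aligned(4)));

static void kvm_register_wall_clock(void)
{
	/* tell the hypervisor where to write the wall clock data */
	wrmsrl(0x11 /* MSR_KVM_WALL_CLOCK */, __pa(&wall_clock));
}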
Re: KVM: MMU: always invalidate and flush on spte page size change
On 05/28/2010 03:44 PM, Marcelo Tosatti wrote: Always invalidate spte and flush TLBs when changing page size, to make sure different sized translations for the same address are never cached in a CPU's TLB. The first case where this occurs is when a non-leaf spte pointer is overwritten by a leaf, large spte entry. This can happen after dirty logging is disabled on a memslot, for example. The second case is a leaf, large spte entry is overwritten with a non-leaf spte pointer, in __direct_map. Note this cannot happen now because the only potential source of such overwrite is dirty logging being enabled, which zaps all MMU pages. But this might change in the future, so better be robust against it. Noticed by Andrea. KVM-Stable-Tag Signed-off-by: Marcelo Tosattimtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -1952,6 +1952,8 @@ static void mmu_set_spte(struct kvm_vcpu child = page_header(pte PT64_BASE_ADDR_MASK); mmu_page_remove_parent_pte(child, sptep); + __set_spte(sptep, shadow_trap_nonpresent_pte); + kvm_flush_remote_tlbs(vcpu-kvm); } else if (pfn != spte_to_pfn(*sptep)) { pgprintk(hfn old %lx new %lx\n, spte_to_pfn(*sptep), pfn); Applied this bit. @@ -2015,6 +2017,16 @@ static int __direct_map(struct kvm_vcpu break; } + if (is_shadow_present_pte(*iterator.sptep) + !is_large_pte(*iterator.sptep)) + continue; + + if (is_large_pte(*iterator.sptep)) { + rmap_remove(vcpu-kvm, iterator.sptep); + __set_spte(iterator.sptep, shadow_trap_nonpresent_pte); + kvm_flush_remote_tlbs(vcpu-kvm); + } + Don't we have exactly the same issue in FNAME(fetch)()? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2-RFC 0/2] virtio: put last seen used index into ring itself
On Fri, May 28, 2010 at 11:56:54AM +0200, Jes Sorensen wrote: On 05/26/10 21:50, Michael S. Tsirkin wrote: Here's a rewrite of the original patch with a new layout. I haven't tested it yet so no idea how this performs, but I think this addresses the cache bounce issue raised by Avi. Posting for early flames/comments. Generally, the Host end of the virtio ring doesn't need to see where Guest is up to in consuming the ring. However, to completely understand what's going on from the outside, this information must be exposed. For example, host can reduce the number of interrupts by detecting that the guest is currently handling previous buffers. We add a feature bit so the guest can tell the host that it's writing out the current value there, if it wants to use that. This differs from original approach in that the used index is put after avail index (they are typically written out together). To avoid cache bounces on descriptor access, and make future extensions easier, we put the ring itself at start of page, and move the control after it. Hi Michael, It looks pretty good to me, however one thing I have been thinking of while reading through it: Rather than storing a pointer within the ring struct, pointing into a position within the same struct. How about storing a byte offset instead and using a cast to get to the pointer position? That would avoid the pointer dereference, which is less effective cache wise and harder for the CPU to predict. Not sure whether it really matters performance wise, just a thought. Cheers, Jes I think this won't work: when PUBLUSH_USED_IDX is negotiated, the pointer is to within the ring. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
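To make the interrupt-reduction argument concrete, here is a host-side pseudocode sketch; the structure and field names are placeholders for illustration, not identifiers from the patch.

#include <stdint.h>
#include <stdbool.h>

/* Placeholder types: this only illustrates that the host can skip a
 * notification when the guest is visibly still catching up. */
struct vring_host {
	uint16_t used_idx;		/* how far the host has filled the used ring */
	uint16_t *guest_last_seen_used;	/* index the guest publishes, if negotiated */
};

static bool host_should_notify(struct vring_host *vq)
{
	if (!vq->guest_last_seen_used)	/* feature not negotiated */
		return true;

	/* If the guest has not yet consumed everything previously marked used,
	 * it is still inside its processing loop and will pick up the new
	 * buffers anyway, so another interrupt is unnecessary.  A real
	 * implementation must re-check after publishing to avoid races. */
	return *vq->guest_last_seen_used == vq->used_idx;
}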
Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup
On Fri, May 28, 2010 at 05:54:42PM +0200, Tejun Heo wrote: Hello, On 05/28/2010 05:08 PM, Michael S. Tsirkin wrote: Well, we have create_singlethread_workqueue, right? This is not very different ... is it? Just copying structures and code from workqueue.c, adding vhost_ in front of it will definitely work: Sure it will, but you'll probably be able to get away with much less. there is nothing magic about the workqueue library. But this just involves cut and paste which might be best avoided. What I'm saying is that some magic needs to be added to workqueue and if you add this single(!) exception, it will have to be backed out pretty soon, so it would be better to do it properly now. One final idea before we go the cut and paste way: how about 'create_workqueue_from_task' that would get a thread and have workqueue run there? You can currently depend on that implementation detail but it's not the workqueue interface is meant to do. The single threadedness is there as execution ordering and concurrency specification and it doesn't (or rather won't) necessarily mean that a specific single thread is bound to certain workqueue. Can you please direct me to have a look at the code. I'll be happy to do the conversion for you. Great, thanks! The code in question is in drivers/vhost/vhost.c It is used from drivers/vhost/net.c On top of this, we have patchset from Sridhar Samudrala, titled '0/3 Make vhost multi-threaded and associate each thread to its guest's cgroup': cgroups: Add an API to attach a task to current task's cgroup workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup vhost: make it more scalable by creating a vhost thread per device I have bounced the last three your way. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 05/30/2010 03:19 PM, Michael S. Tsirkin wrote: On Fri, May 28, 2010 at 04:07:38PM -0700, Tom Lyon wrote: The VFIO driver is used to allow privileged AND non-privileged processes to implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe devices. Signed-off-by: Tom Lyonp...@cisco.com --- This patch is the evolution of code which was first proposed as a patch to uio/uio_pci_generic, then as a more generic uio patch. Now it is taken entirely out of the uio framework, and things seem much cleaner. Of course, there is a lot of functional overlap with uio, but the previous version just seemed like a giant mode switch in the uio code that did not lead to clarity for either the new or old code. IMO this was because this driver does two things: programming iommu and handling interrupts. uio does interrupt handling. We could have moved iommu / DMA programming to a separate driver, and have uio work with it. This would solve limitation of the current driver that is needs an iommu domain per device. How do we enforce security then? We need to ensure that unprivileged users can only use the device with an iommu. [a pony for avi...] The major new functionality in this version is the ability to deal with PCI config space accesses (through read write calls) - but includes table driven code to determine whats safe to write and what is not. I don't really see why this is helpful: a driver written corrrectly will not access these addresses, and we need an iommu anyway to protect us against a drivers. Haven't reviewed the code (yet) but things like the BARs, MSI, and interrupt disable need to be protected from the guest regardless of the iommu. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] KVM: MMU: introduce some macros to clean up hlist traversal
Introduce for_each_gfn_sp(), for_each_gfn_indirect_sp() and for_each_gfn_indirect_valid_sp() to cleanup hlist traverseing Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 129 ++-- 1 files changed, 54 insertions(+), 75 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 56f8c3c..84c705e 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1200,6 +1200,22 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp) static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp); +#define for_each_gfn_sp(kvm, sp, gfn, pos, n) \ + hlist_for_each_entry_safe(sp, pos, n, \ + kvm-arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\ + if (sp-gfn == gfn) + +#define for_each_gfn_indirect_sp(kvm, sp, gfn, pos, n) \ + hlist_for_each_entry_safe(sp, pos, n, \ + kvm-arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\ + if (sp-gfn == gfn !sp-role.direct) + +#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos, n) \ + hlist_for_each_entry_safe(sp, pos, n, \ + kvm-arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\ + if (sp-gfn == gfn !sp-role.direct\ + !sp-role.invalid) + static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, bool clear_unsync) { @@ -1243,16 +1259,12 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) /* @gfn should be write-protected at the call site */ static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) { - struct hlist_head *bucket; struct kvm_mmu_page *s; struct hlist_node *node, *n; - unsigned index; bool flush = false; - index = kvm_page_table_hashfn(gfn); - bucket = vcpu-kvm-arch.mmu_page_hash[index]; - hlist_for_each_entry_safe(s, node, n, bucket, hash_link) { - if (s-gfn != gfn || !s-unsync || s-role.invalid) + for_each_gfn_indirect_valid_sp(vcpu-kvm, s, gfn, node, n) { + if (!s-unsync) continue; WARN_ON(s-role.level != PT_PAGE_TABLE_LEVEL); @@ -1364,9 +1376,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, u64 *parent_pte) { union kvm_mmu_page_role role; - unsigned index; unsigned quadrant; - struct hlist_head *bucket; struct kvm_mmu_page *sp; struct hlist_node *node, *tmp; bool need_sync = false; @@ -1382,36 +1392,36 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, quadrant = (1 ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1; role.quadrant = quadrant; } - index = kvm_page_table_hashfn(gfn); - bucket = vcpu-kvm-arch.mmu_page_hash[index]; - hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link) - if (sp-gfn == gfn) { - if (!need_sync sp-unsync) - need_sync = true; - if (sp-role.word != role.word) - continue; + for_each_gfn_sp(vcpu-kvm, sp, gfn, node, tmp) { + if (!need_sync sp-unsync) + need_sync = true; - if (sp-unsync kvm_sync_page_transient(vcpu, sp)) - break; + if (sp-role.word != role.word) + continue; - mmu_page_add_parent_pte(vcpu, sp, parent_pte); - if (sp-unsync_children) { - set_bit(KVM_REQ_MMU_SYNC, vcpu-requests); - kvm_mmu_mark_parents_unsync(sp); - } else if (sp-unsync) - kvm_mmu_mark_parents_unsync(sp); + if (sp-unsync kvm_sync_page_transient(vcpu, sp)) + break; - trace_kvm_mmu_get_page(sp, false); - return sp; - } + mmu_page_add_parent_pte(vcpu, sp, parent_pte); + if (sp-unsync_children) { + set_bit(KVM_REQ_MMU_SYNC, vcpu-requests); + kvm_mmu_mark_parents_unsync(sp); + } else if (sp-unsync) + kvm_mmu_mark_parents_unsync(sp); + + trace_kvm_mmu_get_page(sp, false); + return sp; + } ++vcpu-kvm-stat.mmu_cache_miss; sp = kvm_mmu_alloc_page(vcpu, parent_pte, 
direct); if (!sp) return sp; sp-gfn = gfn; sp-role = role; -
[PATCH 2/5] KVM: MMU: split the operations of kvm_mmu_zap_page()
Using kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() to split kvm_mmu_zap_page() function, then we can: - traverse hlist safely - easily to gather remote tlb flush which occurs during page zapped Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/mmu.c | 42 ++- arch/x86/kvm/x86.c |1 + 3 files changed, 43 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 0cd0f29..e4df1cf 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -388,6 +388,7 @@ struct kvm_arch { * Hash table of struct kvm_mmu_page. */ struct list_head active_mmu_pages; + struct list_head invalid_mmu_pages; struct list_head assigned_dev_head; struct iommu_domain *iommu_domain; int iommu_flags; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 84c705e..0c957bf 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -915,6 +915,7 @@ static int is_empty_shadow_page(u64 *spt) static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) { ASSERT(is_empty_shadow_page(sp-spt)); + hlist_del(sp-hash_link); list_del(sp-link); __free_page(virt_to_page(sp-spt)); if (!sp-role.direct) @@ -1560,6 +1561,46 @@ static int mmu_zap_unsync_children(struct kvm *kvm, return zapped; } +static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp) +{ + int ret; + + trace_kvm_mmu_zap_page(sp); + ++kvm-stat.mmu_shadow_zapped; + ret = mmu_zap_unsync_children(kvm, sp); + kvm_mmu_page_unlink_children(kvm, sp); + kvm_mmu_unlink_parents(kvm, sp); + if (!sp-role.invalid !sp-role.direct) + unaccount_shadowed(kvm, sp-gfn); + if (sp-unsync) + kvm_unlink_unsync_page(kvm, sp); + if (!sp-root_count) + /* Count self */ + ret++; + else + kvm_reload_remote_mmus(kvm); + + sp-role.invalid = 1; + list_move(sp-link, kvm-arch.invalid_mmu_pages); + kvm_mmu_reset_last_pte_updated(kvm); + return ret; +} + +static void kvm_mmu_commit_zap_page(struct kvm *kvm) +{ + struct kvm_mmu_page *sp, *n; + + if (list_empty(kvm-arch.invalid_mmu_pages)) + return; + + kvm_flush_remote_tlbs(kvm); + list_for_each_entry_safe(sp, n, kvm-arch.invalid_mmu_pages, link) { + WARN_ON(!sp-role.invalid); + if (!sp-root_count) + kvm_mmu_free_page(kvm, sp); + } +} + static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp) { int ret; @@ -1577,7 +1618,6 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp) if (!sp-root_count) { /* Count self */ ret++; - hlist_del(sp-hash_link); kvm_mmu_free_page(kvm, sp); } else { sp-role.invalid = 1; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5e5cd8d..225c3c4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5331,6 +5331,7 @@ struct kvm *kvm_arch_create_vm(void) } INIT_LIST_HEAD(kvm-arch.active_mmu_pages); + INIT_LIST_HEAD(kvm-arch.invalid_mmu_pages); INIT_LIST_HEAD(kvm-arch.assigned_dev_head); /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ -- 1.6.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] KVM: MMU: gather remote tlb flush which occurs during page zapped
Using kvm_mmu_prepare_zap_page() and kvm_mmu_zap_page() instead of kvm_mmu_zap_page() that can reduce remote tlb flush IPI Have tested with xp/vista64, fedora12 32/64 guests, and it not broken Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 75 +++ arch/x86/kvm/mmutrace.h |2 +- 2 files changed, 31 insertions(+), 46 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 0c957bf..e2b1020 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1199,7 +1199,8 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp) --kvm-stat.mmu_unsync; } -static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp); +static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp); +static void kvm_mmu_commit_zap_page(struct kvm *kvm); #define for_each_gfn_sp(kvm, sp, gfn, pos, n) \ hlist_for_each_entry_safe(sp, pos, n, \ @@ -1221,7 +1222,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, bool clear_unsync) { if (sp-role.cr4_pae != !!is_pae(vcpu)) { - kvm_mmu_zap_page(vcpu-kvm, sp); + kvm_mmu_prepare_zap_page(vcpu-kvm, sp); return 1; } @@ -1232,7 +1233,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, } if (vcpu-arch.mmu.sync_page(vcpu, sp)) { - kvm_mmu_zap_page(vcpu-kvm, sp); + kvm_mmu_prepare_zap_page(vcpu-kvm, sp); return 1; } @@ -1249,6 +1250,7 @@ static int kvm_sync_page_transient(struct kvm_vcpu *vcpu, ret = __kvm_sync_page(vcpu, sp, false); if (!ret) mmu_convert_notrap(sp); + kvm_mmu_commit_zap_page(vcpu-kvm); return ret; } @@ -1271,13 +1273,13 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) WARN_ON(s-role.level != PT_PAGE_TABLE_LEVEL); if ((s-role.cr4_pae != !!is_pae(vcpu)) || (vcpu-arch.mmu.sync_page(vcpu, s))) { - kvm_mmu_zap_page(vcpu-kvm, s); + kvm_mmu_prepare_zap_page(vcpu-kvm, s); continue; } kvm_unlink_unsync_page(vcpu-kvm, s); flush = true; } - + kvm_mmu_commit_zap_page(vcpu-kvm); if (flush) kvm_mmu_flush_tlb(vcpu); } @@ -1363,6 +1365,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu, kvm_sync_page(vcpu, sp); mmu_pages_clear_parents(parents); } + kvm_mmu_commit_zap_page(vcpu-kvm); cond_resched_lock(vcpu-kvm-mmu_lock); kvm_mmu_pages_init(parent, parents, pages); } @@ -1551,7 +1554,7 @@ static int mmu_zap_unsync_children(struct kvm *kvm, struct kvm_mmu_page *sp; for_each_sp(pages, sp, parents, i) { - kvm_mmu_zap_page(kvm, sp); + kvm_mmu_prepare_zap_page(kvm, sp); mmu_pages_clear_parents(parents); zapped++; } @@ -1565,7 +1568,7 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp) { int ret; - trace_kvm_mmu_zap_page(sp); + trace_kvm_mmu_prepare_zap_page(sp); ++kvm-stat.mmu_shadow_zapped; ret = mmu_zap_unsync_children(kvm, sp); kvm_mmu_page_unlink_children(kvm, sp); @@ -1601,33 +1604,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm) } } -static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp) -{ - int ret; - - trace_kvm_mmu_zap_page(sp); - ++kvm-stat.mmu_shadow_zapped; - ret = mmu_zap_unsync_children(kvm, sp); - kvm_mmu_page_unlink_children(kvm, sp); - kvm_mmu_unlink_parents(kvm, sp); - kvm_flush_remote_tlbs(kvm); - if (!sp-role.invalid !sp-role.direct) - unaccount_shadowed(kvm, sp-gfn); - if (sp-unsync) - kvm_unlink_unsync_page(kvm, sp); - if (!sp-root_count) { - /* Count self */ - ret++; - kvm_mmu_free_page(kvm, sp); - } else { - sp-role.invalid = 1; - list_move(sp-link, kvm-arch.active_mmu_pages); - kvm_reload_remote_mmus(kvm); - } - 
kvm_mmu_reset_last_pte_updated(kvm); - return ret; -} - /* * Changing the number of mmu pages allocated to the vm * Note: if kvm_nr_mmu_pages is too small, you will get dead lock @@ -1652,8 +1628,9 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages) page = container_of(kvm-arch.active_mmu_pages.prev,
[PATCH 4/5] KVM: MMU: traverse sp hlist safely
Now, we can safely to traverse sp hlish Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 53 +++ 1 files changed, 24 insertions(+), 29 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index e2b1020..be75cba 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1202,18 +1202,18 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp) static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp); static void kvm_mmu_commit_zap_page(struct kvm *kvm); -#define for_each_gfn_sp(kvm, sp, gfn, pos, n) \ - hlist_for_each_entry_safe(sp, pos, n, \ +#define for_each_gfn_sp(kvm, sp, gfn, pos) \ + hlist_for_each_entry(sp, pos, \ kvm-arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\ if (sp-gfn == gfn) -#define for_each_gfn_indirect_sp(kvm, sp, gfn, pos, n) \ - hlist_for_each_entry_safe(sp, pos, n, \ +#define for_each_gfn_indirect_sp(kvm, sp, gfn, pos)\ + hlist_for_each_entry(sp, pos, \ kvm-arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\ if (sp-gfn == gfn !sp-role.direct) -#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos, n) \ - hlist_for_each_entry_safe(sp, pos, n, \ +#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos) \ + hlist_for_each_entry(sp, pos, \ kvm-arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\ if (sp-gfn == gfn !sp-role.direct\ !sp-role.invalid) @@ -1263,10 +1263,10 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) { struct kvm_mmu_page *s; - struct hlist_node *node, *n; + struct hlist_node *node; bool flush = false; - for_each_gfn_indirect_valid_sp(vcpu-kvm, s, gfn, node, n) { + for_each_gfn_indirect_valid_sp(vcpu-kvm, s, gfn, node) { if (!s-unsync) continue; @@ -1382,7 +1382,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, union kvm_mmu_page_role role; unsigned quadrant; struct kvm_mmu_page *sp; - struct hlist_node *node, *tmp; + struct hlist_node *node; bool need_sync = false; role = vcpu-arch.mmu.base_role; @@ -1397,7 +1397,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, role.quadrant = quadrant; } - for_each_gfn_sp(vcpu-kvm, sp, gfn, node, tmp) { + for_each_gfn_sp(vcpu-kvm, sp, gfn, node) { if (!need_sync sp-unsync) need_sync = true; @@ -1644,18 +1644,17 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages) static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) { struct kvm_mmu_page *sp; - struct hlist_node *node, *n; + struct hlist_node *node; int r; pgprintk(%s: looking for gfn %lx\n, __func__, gfn); r = 0; -restart: - for_each_gfn_indirect_sp(kvm, sp, gfn, node, n) { + + for_each_gfn_indirect_sp(kvm, sp, gfn, node) { pgprintk(%s: gfn %lx role %x\n, __func__, gfn, sp-role.word); r = 1; - if (kvm_mmu_prepare_zap_page(kvm, sp)) - goto restart; + kvm_mmu_prepare_zap_page(kvm, sp); } kvm_mmu_commit_zap_page(kvm); return r; @@ -1664,14 +1663,12 @@ restart: static void mmu_unshadow(struct kvm *kvm, gfn_t gfn) { struct kvm_mmu_page *sp; - struct hlist_node *node, *nn; + struct hlist_node *node; -restart: - for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node, nn) { + for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) { pgprintk(%s: zap %lx %x\n, __func__, gfn, sp-role.word); - if (kvm_mmu_prepare_zap_page(kvm, sp)) - goto restart; + kvm_mmu_prepare_zap_page(kvm, sp); } kvm_mmu_commit_zap_page(kvm); } @@ -1816,9 +1813,9 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp) static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) { struct kvm_mmu_page *s; - struct hlist_node *node, *n; + struct hlist_node *node; - for_each_gfn_indirect_valid_sp(vcpu-kvm, s, gfn, node, n) { + for_each_gfn_indirect_valid_sp(vcpu-kvm, s, gfn, node) {
[PATCH 5/5] KVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write()
collect remote tlb flush in kvm_mmu_pte_write() path Have tested with xp/vista64, fedora12 32/64 guests, and it not broken Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 20 1 files changed, 16 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index be75cba..cbd8d9c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2629,11 +2629,15 @@ static bool need_remote_flush(u64 old, u64 new) return (old ~new PT64_PERM_MASK) != 0; } -static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, u64 old, u64 new) +static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page, + bool remote_flush, bool local_flush) { - if (need_remote_flush(old, new)) + if (zap_page) + return; + + if (remote_flush) kvm_flush_remote_tlbs(vcpu-kvm); - else + else if (local_flush) kvm_mmu_flush_tlb(vcpu); } @@ -2697,6 +2701,9 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, int npte; int r; int invlpg_counter; + bool remote_flush, local_flush, zap_page; + + zap_page = remote_flush = local_flush = false; pgprintk(%s: gpa %llx bytes %d\n, __func__, gpa, bytes); @@ -2770,6 +2777,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, */ pgprintk(misaligned: gpa %llx bytes %d role %x\n, gpa, bytes, sp-role.word); + zap_page = true; kvm_mmu_prepare_zap_page(vcpu-kvm, sp); ++vcpu-kvm-stat.mmu_flooded; continue; @@ -2794,16 +2802,20 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, if (quadrant != sp-role.quadrant) continue; } + + local_flush = true; spte = sp-spt[page_offset / sizeof(*spte)]; while (npte--) { entry = *spte; mmu_pte_write_zap_pte(vcpu, sp, spte); if (gentry) mmu_pte_write_new_pte(vcpu, sp, spte, gentry); - mmu_pte_write_flush_tlb(vcpu, entry, *spte); + if (!remote_flush need_remote_flush(entry, *spte)) + remote_flush = true; ++spte; } } + mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush); kvm_mmu_commit_zap_page(vcpu-kvm); kvm_mmu_audit(vcpu, post pte write); spin_unlock(vcpu-kvm-mmu_lock); -- 1.6.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Sun, May 30, 2010 at 03:27:05PM +0300, Avi Kivity wrote: On 05/30/2010 03:19 PM, Michael S. Tsirkin wrote: On Fri, May 28, 2010 at 04:07:38PM -0700, Tom Lyon wrote: The VFIO driver is used to allow privileged AND non-privileged processes to implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe devices. Signed-off-by: Tom Lyonp...@cisco.com --- This patch is the evolution of code which was first proposed as a patch to uio/uio_pci_generic, then as a more generic uio patch. Now it is taken entirely out of the uio framework, and things seem much cleaner. Of course, there is a lot of functional overlap with uio, but the previous version just seemed like a giant mode switch in the uio code that did not lead to clarity for either the new or old code. IMO this was because this driver does two things: programming iommu and handling interrupts. uio does interrupt handling. We could have moved iommu / DMA programming to a separate driver, and have uio work with it. This would solve limitation of the current driver that is needs an iommu domain per device. How do we enforce security then? We need to ensure that unprivileged users can only use the device with an iommu. Force assigning to iommu before we allow any other operation? [a pony for avi...] The major new functionality in this version is the ability to deal with PCI config space accesses (through read write calls) - but includes table driven code to determine whats safe to write and what is not. I don't really see why this is helpful: a driver written corrrectly will not access these addresses, and we need an iommu anyway to protect us against a drivers. Haven't reviewed the code (yet) but things like the BARs, MSI, and interrupt disable need to be protected from the guest regardless of the iommu. Yes but userspace can do this. As long as userspace can not crash the kernel, no reason to put this policy into kernel. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 05/29/2010 02:07 AM, Tom Lyon wrote: The VFIO driver is used to allow privileged AND non-privileged processes to implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe devices. + +Why is this interesting? Some applications, especially in the high performance +computing field, need access to hardware functions with as little overhead as +possible. Examples are in network adapters (typically non tcp/ip based) and +in compute accelerators - i.e., array processors, FPGA processors, etc. +Previous to the VFIO drivers these apps would need either a kernel-level +driver (with corrsponding overheads), or else root permissions to directly +access the hardware. The VFIO driver allows generic access to the hardware +from non-privileged apps IF the hardware is well-behaved enough for this +to be safe. + +Any SR-IOV virtual function meets the VFIO definition of well-behaved, but +there are many other non-IOV PCI devices which also meet the defintion. +Elements of this definition are: +- The size of any memory BARs to be mmap'ed into the user process space must be + a multiple of the system page size. You can relax this. - smaller than page size can be mapped if the rest of the page unused and if the platform tolerates writes to unused areas - if the rest of the page is used, we can relocate the BAR - otherwise, we can prevent mmap() but still allow mediated access via a syscall +- If MSI-X interrupts are used, the device driver must not attempt to mmap or + write the MSI-X vector area. We can allow mediated access (that's what qemu-kvm does). I guess the ioctls for setting up msi interrupts are equivalent to this mediated access. (later I see you do provide mediated access via pwrite - please confirm) +- The device must not use the PCI configuration space in any non-standard way, + i.e., the user level driver will be permitted only to read and write standard + fields of the PCI config space, and only if those fields cannot cause harm to + the system. In addition, some fields are virtualized, so that the user + driver can read/write them like a kernel driver, but they do not affect the + real device. What's wrong with nonstandard fields? + +Even with these restrictions, there are bound to be devices which are unsafe +for user level use - it is still up to the system admin to decide whether to +grant access to the device. When the vfio module is loaded, it will have +access to no devices until the desired PCI devices are bound to the driver. +First, make sure the devices are not bound to another kernel driver. You can +unload that driver if you wish to unbind all its devices, or else enter the +driver's sysfs directory, and unbind a specific device: + cd /sys/bus/pci/drivers/drivername + echo :06:02.00 unbind +(The :06:02.00 is a fully qualified PCI device name - different for each +device). Now, to bind to the vfio driver, go to /sys/bus/pci/drivers/vfio and +write the PCI device type of the target device to the new_id file: + echo 8086 10ca new_id +(8086 10ca are the vendor and device type for the Intel 82576 virtual function +devices). A /dev/vfioN entry will be created for each device bound. The final +step is to grant users permission by changing the mode and/or owner of the /dev +entry - chmod 666 /dev/vfio0. What if I have several such devices? Isn't it better to bind by topoloy (device address)? 
+ +Reads Writes: + +The user driver will typically use mmap to access the memory BAR(s) of a +device; the I/O BARs and the PCI config space may be accessed through normal +read and write system calls. Only 1 file descriptor is needed for all driver +functions -- the desired BAR for I/O, memory, or config space is indicated via +high-order bits of the file offset. My preference would be one fd per BAR, but that's a matter of personal taste. For instance, the following implements a +write to the PCI config space: + + #includelinux/vfio.h + void pci_write_config_word(int pci_fd, u16 off, u16 wd) + { + off_t cfg_off = VFIO_PCI_CONFIG_OFF + off; + + if (pwrite(pci_fd,wd, 2, cfg_off) != 2) + perror(pwrite config_dword); + } + Nice, has the benefit of avoiding endianness issues in the interface. +The routines vfio_pci_space_to_offset and vfio_offset_to_pci_space are provided +in vfio.h to convert bar numbers to file offsets and vice-versa. + +Interrupts: + +Device interrupts are translated by the vfio driver into input events on event +notification file descriptors created by the eventfd system call. The user +program must one or more event descriptors and pass them to the vfio driver +via ioctls to arrange for the interrupt mapping: +1. + efd = eventfd(0, 0); + ioctl(vfio_fd, VFIO_EVENTFD_IRQ,efd); + This provides an eventfd for traditional IRQ interrupts. + IRQs will be
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Sun, May 30, 2010 at 04:01:53PM +0300, Avi Kivity wrote: On 05/30/2010 03:49 PM, Michael S. Tsirkin wrote: On Sun, May 30, 2010 at 03:27:05PM +0300, Avi Kivity wrote: On 05/30/2010 03:19 PM, Michael S. Tsirkin wrote: On Fri, May 28, 2010 at 04:07:38PM -0700, Tom Lyon wrote: The VFIO driver is used to allow privileged AND non-privileged processes to implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe devices. Signed-off-by: Tom Lyonp...@cisco.com --- This patch is the evolution of code which was first proposed as a patch to uio/uio_pci_generic, then as a more generic uio patch. Now it is taken entirely out of the uio framework, and things seem much cleaner. Of course, there is a lot of functional overlap with uio, but the previous version just seemed like a giant mode switch in the uio code that did not lead to clarity for either the new or old code. IMO this was because this driver does two things: programming iommu and handling interrupts. uio does interrupt handling. We could have moved iommu / DMA programming to a separate driver, and have uio work with it. This would solve limitation of the current driver that is needs an iommu domain per device. How do we enforce security then? We need to ensure that unprivileged users can only use the device with an iommu. Force assigning to iommu before we allow any other operation? That means the driver must be aware of the iommu. The userspace driver? Yes. And It is a good thing to be explicit there anyway, since this lets userspace map a non-contigious virtual address list into a contiguous bus address range. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] KVM: MMU: introduce some macros to clean up hlist traversal
On 05/30/2010 03:36 PM, Xiao Guangrong wrote:

Introduce for_each_gfn_sp(), for_each_gfn_indirect_sp() and for_each_gfn_indirect_valid_sp() to clean up hlist traversal

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c | 129 ++--
 1 files changed, 54 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 56f8c3c..84c705e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1200,6 +1200,22 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);

+#define for_each_gfn_sp(kvm, sp, gfn, pos, n)				\
+	hlist_for_each_entry_safe(sp, pos, n,				\
+	  &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\
+		if (sp->gfn == gfn)

if (...)
	for_each_gfn_sp(...)
		blah();
else
	BUG();

will break.  Can do 'if ((sp)->gfn != (gfn)) ; else'.  Or call functions from the for (;;) parameters to advance the cursor.

(also use parentheses to protect macro arguments)

+
+#define for_each_gfn_indirect_sp(kvm, sp, gfn, pos, n)			\
+	hlist_for_each_entry_safe(sp, pos, n,				\
+	  &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\
+		if (sp->gfn == gfn && !sp->role.direct)
+
+#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos, n)		\
+	hlist_for_each_entry_safe(sp, pos, n,				\
+	  &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)\
+		if (sp->gfn == gfn && !sp->role.direct &&		\
+			!sp->role.invalid)

Shouldn't we always skip invalid gfns?

What about providing both gfn and role to the macro?

--
error compiling committee.c: too many arguments to function
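A standalone C illustration of the dangling-else hazard being pointed out here, together with the suggested "if (...) ; else" fix; this is generic example code, not the KVM macros themselves.

#include <stdio.h>

/* A loop macro whose expansion ends in a bare "if (cond)" will capture a
 * following "else" written by the caller (dangling else). */
#define for_each_positive(arr, n, i, v)					\
	for ((i) = 0; (i) < (n) && (((v) = (arr)[(i)]), 1); (i)++)	\
		if ((v) > 0)

/* The suggested fix: invert the test and end with "else", so any "else"
 * written by the caller still binds to the caller's own "if". */
#define for_each_positive_fixed(arr, n, i, v)				\
	for ((i) = 0; (i) < (n) && (((v) = (arr)[(i)]), 1); (i)++)	\
		if ((v) <= 0) ; else

int main(void)
{
	int a[] = { 1, -2, 3 };
	int i, v, ok = 0;

	if (ok)
		for_each_positive(a, 3, i, v)
			printf("%d\n", v);
	else
		printf("not ok\n");	/* meant to pair with "if (ok)", but it
					 * actually pairs with the macro's inner
					 * "if ((v) > 0)", so nothing is printed
					 * even though ok == 0 */

	if (ok)
		for_each_positive_fixed(a, 3, i, v)
			printf("%d\n", v);
	else
		printf("not ok\n");	/* binds to "if (ok)" as intended */

	return 0;
}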
Re: Perf trace event parse errors for KVM events
On Sun, 2010-05-30 at 11:10 +0300, Avi Kivity wrote: On 05/29/2010 04:19 PM, Steven Rostedt wrote: The pretty printing from the kernel handles this fine. But there's pressure to pass the format to userspace in binary and have the tool parse it. Currently it uses the print fmt to figure out how to parse. Using one of the examples that Stefan showed: kvmmmu/kvm_mmu_get_page: print fmt: %s %s, ({ const char *ret = p-buffer + p-len; static const char *access_str[] = { ---, --x, w--, w-x, -u-, -ux, wu-, wux }; union kvm_mmu_page_role role; role.word = REC-role; trace_seq_printf(p, sp gfn %llx %u%s q%u%s %s%s %snxe root %u %s%c, REC-gfn, role.level, role.cr4_pae ? pae : , role.quadrant, role.direct ? direct : , access_str[role.access], role.invalid ? invalid : , role.nxe ? : !, REC-root_count, REC-unsync ? unsync : sync, 0); ret; }), REC-created ? new : existing You need a full C parser/interpreter to understand the above. Right. The tools can fall back to %x/%s based on the structure descriptor if they can't parse the format string. trace-cmd has plugin support to override how to read the format and print it out. It now has the ability to write those plugins in python. -- Steve -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Perf trace event parse errors for KVM events
On 05/30/2010 05:03 PM, Steven Rostedt wrote:

Right. The tools can fall back to %x/%s based on the structure descriptor if they can't parse the format string.

trace-cmd has plugin support to override how to read the format and print it out. It now has the ability to write those plugins in python.

Cool. It may make sense to use simpler formatting in the kernel, and use trace-cmd plugins for the complicated cases. It does raise issues with ABIs, though.

Can trace-cmd read plugins from /lib/modules/*? We can then distribute the plugins with the kernel.

--
error compiling committee.c: too many arguments to function
Re: KVM: MMU: always invalidate and flush on spte page size change
On Sun, May 30, 2010 at 01:28:19PM +0300, Avi Kivity wrote: On 05/28/2010 03:44 PM, Marcelo Tosatti wrote: Always invalidate spte and flush TLBs when changing page size, to make sure different sized translations for the same address are never cached in a CPU's TLB. The first case where this occurs is when a non-leaf spte pointer is overwritten by a leaf, large spte entry. This can happen after dirty logging is disabled on a memslot, for example. The second case is a leaf, large spte entry is overwritten with a non-leaf spte pointer, in __direct_map. Note this cannot happen now because the only potential source of such overwrite is dirty logging being enabled, which zaps all MMU pages. But this might change in the future, so better be robust against it. Noticed by Andrea. KVM-Stable-Tag Signed-off-by: Marcelo Tosattimtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -1952,6 +1952,8 @@ static void mmu_set_spte(struct kvm_vcpu child = page_header(pte PT64_BASE_ADDR_MASK); mmu_page_remove_parent_pte(child, sptep); +__set_spte(sptep, shadow_trap_nonpresent_pte); +kvm_flush_remote_tlbs(vcpu-kvm); } else if (pfn != spte_to_pfn(*sptep)) { pgprintk(hfn old %lx new %lx\n, spte_to_pfn(*sptep), pfn); Applied this bit. @@ -2015,6 +2017,16 @@ static int __direct_map(struct kvm_vcpu break; } +if (is_shadow_present_pte(*iterator.sptep) +!is_large_pte(*iterator.sptep)) +continue; + +if (is_large_pte(*iterator.sptep)) { +rmap_remove(vcpu-kvm, iterator.sptep); +__set_spte(iterator.sptep, shadow_trap_nonpresent_pte); +kvm_flush_remote_tlbs(vcpu-kvm); +} + Don't we have exactly the same issue in FNAME(fetch)()? Yes and its already handled there: if (is_shadow_present_pte(*sptep) !is_large_pte(*sptep)) continue; if (is_large_pte(*sptep)) { rmap_remove(vcpu-kvm, sptep); __set_spte(sptep, shadow_trap_nonpresent_pte); kvm_flush_remote_tlbs(vcpu-kvm); } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Perf trace event parse errors for KVM events
On Sun, 2010-05-30 at 17:07 +0300, Avi Kivity wrote:
On 05/30/2010 05:03 PM, Steven Rostedt wrote:

Right. The tools can fall back to %x/%s based on the structure descriptor if they can't parse the format string.

trace-cmd has plugin support to override how to read the format and print it out. It now has the ability to write those plugins in python.

Cool. May make sense to use simpler formatting in the kernel, and use trace-cmd plugins for the complicated cases. It does raise issues with ABIs. Can trace-cmd read plugins from /lib/modules/*? We can then distribute the plugins with the kernel.

We probably could do that. Perhaps if we can port the python code to perf, then it would work for both. Then the plugins could be just python scripts (or binary .so modules) and have a tools/plugins dir? The python part probably would be easier to port, since the .so modules are a bit more specific to trace-cmd.

-- Steve
[PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup
From: Sridhar Samudrala samudrala.srid...@gmail.com Add a new kernel API to attach a task to current task's cgroup in all the active hierarchies. Signed-off-by: Sridhar Samudrala s...@us.ibm.com --- include/linux/cgroup.h |1 + kernel/cgroup.c| 23 +++ 2 files changed, 24 insertions(+) Index: work/include/linux/cgroup.h === --- work.orig/include/linux/cgroup.h +++ work/include/linux/cgroup.h @@ -570,6 +570,7 @@ struct task_struct *cgroup_iter_next(str void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it); int cgroup_scan_tasks(struct cgroup_scanner *scan); int cgroup_attach_task(struct cgroup *, struct task_struct *); +int cgroup_attach_task_current_cg(struct task_struct *); /* * CSS ID is ID for cgroup_subsys_state structs under subsys. This only works Index: work/kernel/cgroup.c === --- work.orig/kernel/cgroup.c +++ work/kernel/cgroup.c @@ -1788,6 +1788,29 @@ out: return retval; } +/** + * cgroup_attach_task_current_cg - attach task 'tsk' to current task's cgroup + * @tsk: the task to be attached + */ +int cgroup_attach_task_current_cg(struct task_struct *tsk) +{ + struct cgroupfs_root *root; + struct cgroup *cur_cg; + int retval = 0; + + cgroup_lock(); + for_each_active_root(root) { + cur_cg = task_cgroup_from_root(current, root); + retval = cgroup_attach_task(cur_cg, tsk); + if (retval) + break; + } + cgroup_unlock(); + + return retval; +} +EXPORT_SYMBOL_GPL(cgroup_attach_task_current_cg); + /* * Attach task with pid 'pid' to cgroup 'cgrp'. Call with cgroup_mutex * held. May take task_lock of task -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers
Apply the cpumask and cgroup of the initializing task to the created vhost poller. Based on Sridhar Samudrala's patch. Cc: Michael S. Tsirkin m...@redhat.com Cc: Sridhar Samudrala samudrala.srid...@gmail.com --- drivers/vhost/vhost.c | 36 +++- 1 file changed, 31 insertions(+), 5 deletions(-) Index: work/drivers/vhost/vhost.c === --- work.orig/drivers/vhost/vhost.c +++ work/drivers/vhost/vhost.c @@ -23,6 +23,7 @@ #include linux/highmem.h #include linux/slab.h #include linux/kthread.h +#include linux/cgroup.h #include linux/net.h #include linux/if_packet.h @@ -176,12 +177,30 @@ repeat: long vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue *vqs, int nvqs) { - struct task_struct *poller; - int i; + struct task_struct *poller = NULL; + cpumask_var_t mask; + int i, ret = -ENOMEM; + + if (!alloc_cpumask_var(mask, GFP_KERNEL)) + goto out; poller = kthread_create(vhost_poller, dev, vhost-%d, current-pid); - if (IS_ERR(poller)) - return PTR_ERR(poller); + if (IS_ERR(poller)) { + ret = PTR_ERR(poller); + goto out; + } + + ret = sched_getaffinity(current-pid, mask); + if (ret) + goto out; + + ret = sched_setaffinity(poller-pid, mask); + if (ret) + goto out; + + ret = cgroup_attach_task_current_cg(poller); + if (ret) + goto out; dev-vqs = vqs; dev-nvqs = nvqs; @@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de vhost_poll_init(dev-vqs[i].poll, dev-vqs[i].handle_kick, POLLIN, dev); } - return 0; + + wake_up_process(poller);/* avoid contributing to loadavg */ + ret = 0; +out: + if (ret) + kthread_stop(poller); + free_cpumask_var(mask); + return ret; } /* Caller should have device mutex */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] arch/s390/kvm: Use GFP_ATOMIC when a lock is held
From: Julia Lawall ju...@diku.dk The containing function is called from several places. At one of them, in the function __sigp_stop, the spin lock fi-lock is held. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // smpl @gfp exists@ identifier fn; position p; @@ fn(...) { ... when != spin_unlock when any gfp_ker...@p ... when any } @locked@ identifier gfp.fn; @@ spin_lock(...) ... when != spin_unlock fn(...) @depends on locked@ position gfp.p; @@ - gfp_ker...@p + GFP_ATOMIC // /smpl Signed-off-by: Julia Lawall ju...@diku.dk --- arch/s390/kvm/sigp.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -u -p a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c --- a/arch/s390/kvm/sigp.c +++ b/arch/s390/kvm/sigp.c @@ -113,7 +113,7 @@ static int __inject_sigp_stop(struct kvm { struct kvm_s390_interrupt_info *inti; - inti = kzalloc(sizeof(*inti), GFP_KERNEL); + inti = kzalloc(sizeof(*inti), GFP_ATOMIC); if (!inti) return -ENOMEM; inti-type = KVM_S390_SIGP_STOP; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
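For readers unfamiliar with the rule the semantic patch enforces: GFP_KERNEL allocations may sleep, and sleeping while holding a spinlock is a bug ("scheduling while atomic"). Below is a minimal sketch of the pattern, not the actual __inject_sigp_stop() path.

/* Sketch only: allocating under fi->lock requires GFP_ATOMIC.  Atomic
 * allocations fail more readily, so the -ENOMEM path still matters. */
static int queue_interrupt_locked(struct kvm_s390_float_interrupt *fi)
{
	struct kvm_s390_interrupt_info *inti;

	spin_lock(&fi->lock);
	inti = kzalloc(sizeof(*inti), GFP_ATOMIC);	/* GFP_KERNEL would be wrong here */
	if (!inti) {
		spin_unlock(&fi->lock);
		return -ENOMEM;
	}
	/* ... fill in inti and queue it on fi's list ... */
	spin_unlock(&fi->lock);
	return 0;
}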
About offload function of Virtio Nic
Hello all,

May I ask some questions about the offload functions of the virtio NIC? I'm interested in KVM virtualization. The virtual NIC is different from real hardware, and there is also a big difference between the virtio NIC and the e1000 NIC. The offload functions of the virtio NIC are closely related to the capabilities of the host's real hardware.

Associated feature bits: VIRTIO_NET_F_GUEST_..., VIRTIO_NET_F_HOST_...

---
Commands:
enable rx:  guest) # ethtool -K eth0
enable tx:  guest) # ethtool -K eth0
enable sg:  guest) # ethtool -K eth0
enable tso: guest) # ethtool -K eth0
enable gso: guest) # ethtool -K eth0
enable lro: guest) # ethtool -K eth0
enable gro: guest) # ethtool -K eth0

Those steps work well with the virtual e1000 NIC, but they always fail when using the virtio NIC.

---
After setup, I have two check methods. For example:
1. enable tso
2. check the output of 'ethtool -k eth0'
3. send a large file from guest to host by scp, listen with tcpdump while the file transfers, and check whether any large packets (length > 1514) appear.

The check method works with the virtual e1000 NIC, but it always fails when using the virtio NIC.

---
My questions:
1. How do I set up the offload functions of the virtio NIC?
2. Is my method of checking the offload functions reasonable (transferring a file and checking packet length with tcpdump)?
3. Is there a detailed user guide for the virtio NIC?

Best Regards,
Amos
Re: About offload function of Virtio Nic
On Mon, May 31, 2010 at 7:44 AM, Amos Kong <kongjian...@gmail.com> wrote:

Hello all,
Can I ask some questions about the offload functions of the virtio NIC?
[...]
command:
enable rx:  guest) # ethtool -K eth0
enable tx:  guest) # ethtool -K eth0
enable sg:  guest) # ethtool -K eth0
enable tso: guest) # ethtool -K eth0
enable gso: guest) # ethtool -K eth0
enable lro: guest) # ethtool -K eth0
enable gro: guest) # ethtool -K eth0

Sorry, it should be:

enable rx:  guest) # ethtool -K eth0 rx on
enable tx:  guest) # ethtool -K eth0 tx on
enable sg:  guest) # ethtool -K eth0 sg on
enable tso: guest) # ethtool -K eth0 tso on
enable gso: guest) # ethtool -K eth0 gso on
enable lro: guest) # ethtool -K eth0 lro on
enable gro: guest) # ethtool -K eth0 gro on

These also work with my real hardware.
[...]

--
My personal homepage has moved; feel free to drop by ;)
http://kongove.whostas.com/
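As a concrete way to run the check described above, something along these lines should work inside the guest; the host address 10.0.2.2 and the file name are examples, not values from the original mail.

# Turn TSO on and confirm it stuck
ethtool -K eth0 tso on
ethtool -k eth0 | grep tcp-segmentation-offload

# Capture oversized frames while pushing a big file to the host
tcpdump -i eth0 -nn 'tcp and greater 1514' -c 20 &
dd if=/dev/zero of=/tmp/bigfile bs=1M count=100
scp /tmp/bigfile user@10.0.2.2:/tmp/

# If TSO/GSO is active, tcpdump should report frames larger than 1514 bytes,
# because the stack hands whole super-packets to the virtual NIC.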
Re: [PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup
04:24, Tejun Heo wrote:

From: Sridhar Samudrala <samudrala.srid...@gmail.com>

Add a new kernel API to attach a task to current task's cgroup in all the active hierarchies.

Signed-off-by: Sridhar Samudrala <s...@us.ibm.com>

Acked-by: Li Zefan <l...@cn.fujitsu.com>

btw: you lost the Reviewed-by tag given by Paul Menage.
Issues with getting PCI pass-through to work
I have a PCI WLAN controller that I wish to pass through to the standard linux-img VM (downloaded from the qemu website). When I try

  qemu-kvm -pcidevice host=0c:00.0,name=abcd linux-0.2.img

the VM does boot up but does not see the device. On the command prompt, I see the following message:

  Failed to assign irq for abcd: Operation not supported
  Perhaps you are assigning a device that shares an IRQ with another device?

I want to know what this message means exactly. Does PCI pass-through require reconfiguring the IRQs of my host devices before I can pass them through? Can I configure qemu-kvm to emulate the IRQ (and other registers) instead of trying to pass through the IRQ, etc.?

BTW, my box has VT and VT-d enabled in BIOS.

--
Adhyas

Two types have compatible type if their types are the same.
  — ANSI C Standard, 3.1.2.6.
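The message usually means just what it suggests: with legacy INTx assignment, the device's interrupt line cannot be shared with another in-use host device. A quick way to check, assuming the device address from the command above:

# Which IRQ line does the WLAN controller use?
lspci -vv -s 0c:00.0 | grep -i 'irq\|msi'

# Which other devices sit on that IRQ line?
cat /proc/interrupts

# If another active device shares the line, legacy interrupt assignment fails.
# Moving the card to a different slot, unbinding the conflicting device, or
# using a device/driver combination that supports MSI are the usual remedies.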
[PATCH 0/2] Setup scsi-bus xfer and xfer_mode for PR IN/OUT and Maintenance IN/OUT
From: Nicholas Bellinger n...@linux-iscsi.org Greetings Gerd, Kevin and Co, Attached are two patches to add the necesary CDB parsing to determine SCSIRequest-cmd.xfer (length) and SCSIRequest-cmd.mode (direction) for Persistent Reservation IN/OUT CDBs and for Maintenance IN/OUT CDBs used for Asymmetric Logical Unit Access, et al. There is a special case for the latter Maintenance CDBs with TYPE_ROM that has been included in scsi_req_length(). Also, I should mention this is a temporary measure in order to ensure that we can actually do passthrough of these CDBs into KVM Guest for lsi and megaraid HBA emulation. What will need to eventually happen is to get rid of scsi_req_xfer_mode() all together and just setup SCSIRequest-cmd.mode based on CDB type in scsi_req_length(), instead of having to have another switch(cdb[0]) statement for every SCSI WRITE CDB on the planet to set SCSI_XFER_TO_DEV. Anyways, I will look at doing this conversion in scsi_req_length() at some point, but please apply these for the moment so folks can get access to their SPC-4 Port LUNs with QEMU. ;) Best, --nab Nicholas Bellinger (2): [scsi-bus]: Add PR-OUT and PR-IN case for SCSIRequest xfer and xfer_mode setup [scsi-bus]: Add MAINTENANCE_IN and MAINTENANCE_OUT case for SCSIRequest xfer and xfer_mode setup hw/scsi-bus.c | 19 +++ hw/scsi-defs.h |2 ++ 2 files changed, 21 insertions(+), 0 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] [scsi-bus]: Add PR-OUT and PR-IN case for SCSIRequest xfer and xfer_mode setup
From: Nicholas Bellinger n...@linux-iscsi.org This patch updates hw/scsi-bus.c to add PERSISTENT_RESERVE_OUT and PERSISTENT_RESERVE_IN cases in scsi_req_length() to extract the incoming buffer length into SCSIRequest->cmd.xfer, and adds a second PERSISTENT_RESERVE_OUT case in scsi_req_xfer_mode() in order to properly set SCSI_XFER_TO_DEV for WRITE data. Tested with Linux KVM guests, Megasas 8708EM2 HBA emulation and TCM_Loop target ports. Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
---
 hw/scsi-bus.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index b8e4b71..75ec74e 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -325,6 +325,10 @@ static int scsi_req_length(SCSIRequest *req, uint8_t *cmd)
     case INQUIRY:
         req->cmd.xfer = cmd[4] | (cmd[3] << 8);
         break;
+    case PERSISTENT_RESERVE_OUT:
+    case PERSISTENT_RESERVE_IN:
+        req->cmd.xfer = cmd[8] | (cmd[7] << 8);
+        break;
     }
     return 0;
 }
@@ -389,6 +393,7 @@ static void scsi_req_xfer_mode(SCSIRequest *req)
     case MEDIUM_SCAN:
     case SEND_VOLUME_TAG:
     case WRITE_LONG_2:
+    case PERSISTENT_RESERVE_OUT:
         req->cmd.mode = SCSI_XFER_TO_DEV;
         break;
     default:
--
1.5.6.5

-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
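For readers less familiar with the SPC CDB layout, the following stand-alone sketch (not part of the patch; it simply repeats the same byte arithmetic) shows how the 16-bit, big-endian allocation/parameter list length in bytes 7-8 of a PERSISTENT RESERVE IN/OUT CDB is decoded, matching the new case above:

    #include <stdint.h>
    #include <stdio.h>

    /* PERSISTENT RESERVE IN/OUT carry a 16-bit big-endian length in CDB bytes 7-8. */
    static uint32_t pr_cdb_xfer_len(const uint8_t *cdb)
    {
        return cdb[8] | (cdb[7] << 8);
    }

    int main(void)
    {
        /* Hypothetical PERSISTENT RESERVE IN (0x5e) CDB asking for 0x2000 bytes. */
        uint8_t cdb[10] = { 0x5e, 0x00, 0x00, 0x00, 0x00,
                            0x00, 0x00, 0x20, 0x00, 0x00 };
        printf("cmd.xfer = %u bytes\n", pr_cdb_xfer_len(cdb));   /* prints 8192 */
        return 0;
    }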
[PATCH 2/2] [scsi-bus]: Add MAINTENANCE_IN and MAINTENANCE_OUT case for SCSIRequest xfer and xfer_mode setup
From: Nicholas Bellinger n...@linux-iscsi.org This patch updates hw/scsi-bus.c to add MAINTENANCE_IN and MAINTENANCE_OUT cases in scsi_req_length() to extract the incoming buffer length into SCSIRequest->cmd.xfer, and adds a second MAINTENANCE_OUT case in scsi_req_xfer_mode() in order to properly set SCSI_XFER_TO_DEV for WRITE data. Tested with Linux KVM guests, Megasas 8708EM2 HBA emulation and TCM_Loop target ports. Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
---
 hw/scsi-bus.c  |   14 ++
 hw/scsi-defs.h |    2 ++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 75ec74e..7d80405 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -329,6 +329,17 @@ static int scsi_req_length(SCSIRequest *req, uint8_t *cmd)
     case PERSISTENT_RESERVE_IN:
         req->cmd.xfer = cmd[8] | (cmd[7] << 8);
         break;
+    case MAINTENANCE_OUT:
+    case MAINTENANCE_IN:
+        if (req->dev->type != TYPE_ROM) {
+            /* Used for MI_REPORT_TARGET_PGS, MO_SET_TARGET_PGS et al. */
+            req->cmd.xfer = cmd[9] | (cmd[8] << 8) |
+                            (cmd[7] << 16) | (cmd[6] << 24);
+        } else {
+            /* GPCMD_REPORT_KEY and GPCMD_SEND_KEY from multi media commands */
+            req->cmd.xfer = cmd[9] | (cmd[8] << 8);
+        }
+        break;
     }
     return 0;
 }
@@ -394,6 +405,7 @@ static void scsi_req_xfer_mode(SCSIRequest *req)
     case SEND_VOLUME_TAG:
     case WRITE_LONG_2:
     case PERSISTENT_RESERVE_OUT:
+    case MAINTENANCE_OUT:
         req->cmd.mode = SCSI_XFER_TO_DEV;
         break;
     default:
@@ -539,6 +551,8 @@ const char *scsi_command_name(uint8_t cmd)
     [ SPACE                  ] = "SPACE",
     [ INQUIRY                ] = "INQUIRY",
     [ RECOVER_BUFFERED_DATA  ] = "RECOVER_BUFFERED_DATA",
+    [ MAINTENANCE_IN         ] = "MAINTENANCE_IN",
+    [ MAINTENANCE_OUT        ] = "MAINTENANCE_OUT",
     [ MODE_SELECT            ] = "MODE_SELECT",
     [ RESERVE                ] = "RESERVE",
     [ RELEASE                ] = "RELEASE",
diff --git a/hw/scsi-defs.h b/hw/scsi-defs.h
index 5890094..a4a3518 100644
--- a/hw/scsi-defs.h
+++ b/hw/scsi-defs.h
@@ -84,6 +84,8 @@
 #define MODE_SENSE_10          0x5a
 #define PERSISTENT_RESERVE_IN  0x5e
 #define PERSISTENT_RESERVE_OUT 0x5f
+#define MAINTENANCE_IN         0xa3
+#define MAINTENANCE_OUT        0xa4
 #define MOVE_MEDIUM            0xa5
 #define READ_12                0xa8
 #define WRITE_12               0xaa
--
1.5.6.5

-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
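As an aside (again not part of the patch; a minimal sketch of the same decoding), the two length layouts that the TYPE_ROM check above distinguishes can be compared side by side: the SPC MAINTENANCE IN/OUT CDBs carry a 32-bit big-endian allocation length in bytes 6-9, while the overlapping MMC opcodes (GPCMD_REPORT_KEY/GPCMD_SEND_KEY) only use bytes 8-9.

    #include <stdint.h>
    #include <stdio.h>

    /* Decode the transfer length of a 12-byte 0xa3/0xa4 CDB, mirroring the
     * TYPE_ROM special case in scsi_req_length(). */
    static uint32_t maint_cdb_xfer_len(const uint8_t *cdb, int is_cdrom)
    {
        if (is_cdrom)                                   /* GPCMD_REPORT_KEY/SEND_KEY */
            return cdb[9] | (cdb[8] << 8);
        return cdb[9] | (cdb[8] << 8) |                 /* MI_REPORT_TARGET_PGS etc. */
               (cdb[7] << 16) | ((uint32_t)cdb[6] << 24);
    }

    int main(void)
    {
        /* Hypothetical MAINTENANCE IN (0xa3) CDB, service action 0x0a,
         * allocation length 0x00010400. */
        uint8_t cdb[12] = { 0xa3, 0x0a, 0x00, 0x00, 0x00, 0x00,
                            0x00, 0x01, 0x04, 0x00, 0x00, 0x00 };
        printf("disk/other xfer = %u\n", maint_cdb_xfer_len(cdb, 0));  /* 66560 */
        printf("cdrom xfer      = %u\n", maint_cdb_xfer_len(cdb, 1));  /* 1024  */
        return 0;
    }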
Re: [PATCH 1/5] KVM: MMU: introduce some macros to cleanup hlist traversing
Avi Kivity wrote: On 05/30/2010 03:36 PM, Xiao Guangrong wrote:

Introduce for_each_gfn_sp(), for_each_gfn_indirect_sp() and for_each_gfn_indirect_valid_sp() to clean up hlist traversing.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 129 ++--
 1 files changed, 54 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 56f8c3c..84c705e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1200,6 +1200,22 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
+#define for_each_gfn_sp(kvm, sp, gfn, pos, n)                            \
+    hlist_for_each_entry_safe(sp, pos, n,                                \
+        &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
+        if (sp->gfn == gfn)

Avi, Thanks for your review.

if (...)
    for_each_gfn_sp(...)
        blah();
else
    BUG();
will break. Can do 'if ((sp)->gfn != (gfn)) ; else'. Or call functions from the for (;;) parameters to advance the cursor. (Also use parentheses to protect macro arguments.)

Yeah, it's my mistake, I'll fix it in the next version.

+
+#define for_each_gfn_indirect_sp(kvm, sp, gfn, pos, n)                   \
+    hlist_for_each_entry_safe(sp, pos, n,                                \
+        &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
+        if (sp->gfn == gfn && !sp->role.direct)
+
+#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos, n)             \
+    hlist_for_each_entry_safe(sp, pos, n,                                \
+        &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
+        if (sp->gfn == gfn && !sp->role.direct &&                        \
+            !sp->role.invalid)

Shouldn't we always skip invalid gfns?

Actually, the kvm_mmu_unprotect_page() function needs to find invalid shadow pages as well:

| hlist_for_each_entry_safe(sp, node, n, bucket, hash_link)
|     if (sp->gfn == gfn && !sp->role.direct) {
|         pgprintk("%s: gfn %lx role %x\n", __func__, gfn,
|                  sp->role.word);
|         r = 1;
|         if (kvm_mmu_zap_page(kvm, sp))
|             goto restart;
|     }

I'm not sure whether we can skip invalid sp here, since it could change this function's return value. :-(

What about providing both gfn and role to the macro?

In the current code, nothing looks up an sp simply by role and gfn: in kvm_mmu_get_page() we need to do other work for an sp with 'sp->gfn == gfn && sp->role != role', and the other functions only need to compare some members of role, not all of them.

Xiao
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
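To make the macro pitfall concrete, here is a small user-space sketch (not from the patch; the example macros are made up for illustration) showing why a macro that ends in a bare if breaks a caller's if/else, and how the inverted-test idiom suggested above fixes it:

    #include <stdio.h>

    /* BROKEN: the macro ends in a bare `if`, so a caller's `else` silently
     * binds to it instead of to the caller's own `if`. */
    #define for_each_even_broken(i, n)      \
        for ((i) = 0; (i) < (n); (i)++)     \
            if ((i) % 2 == 0)

    /* FIXED: inverted test with an empty then-branch; the macro now owns its
     * own `else`, so a trailing `else` attaches to the caller's `if`. */
    #define for_each_even(i, n)             \
        for ((i) = 0; (i) < (n); (i)++)     \
            if ((i) % 2 != 0) ; else

    int main(void)
    {
        int i, n = 4;

        if (n > 0)
            for_each_even_broken(i, n)
                printf("broken: %d\n", i);
        else              /* meant for n <= 0, actually runs for every odd i */
            printf("broken: n <= 0\n");

        if (n > 0)
            for_each_even(i, n)
                printf("fixed: %d\n", i);
        else              /* now really means n <= 0 */
            printf("fixed: n <= 0\n");

        return 0;
    }

Newer compilers can flag the broken form with warnings such as -Wdangling-else, but the fixed macro avoids the problem entirely.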
Re: [PATCH 2/5] KVM: MMU: split the operations of kvm_mmu_zap_page()
Avi Kivity wrote: On 05/30/2010 03:37 PM, Xiao Guangrong wrote:

Use kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() to split the kvm_mmu_zap_page() function, so that we can:
- traverse the hlist safely
- easily gather the remote TLB flushes that occur while pages are being zapped

+static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+    int ret;
+
+    trace_kvm_mmu_zap_page(sp);
+    ++kvm->stat.mmu_shadow_zapped;
+    ret = mmu_zap_unsync_children(kvm, sp);
+    kvm_mmu_page_unlink_children(kvm, sp);
+    kvm_mmu_unlink_parents(kvm, sp);
+    if (!sp->role.invalid && !sp->role.direct)
+        unaccount_shadowed(kvm, sp->gfn);
+    if (sp->unsync)
+        kvm_unlink_unsync_page(kvm, sp);
+    if (!sp->root_count)
+        /* Count self */
+        ret++;
+    else
+        kvm_reload_remote_mmus(kvm);
+
+    sp->role.invalid = 1;
+    list_move(&sp->link, &kvm->arch.invalid_mmu_pages);
+    kvm_mmu_reset_last_pte_updated(kvm);
+    return ret;
+}
+
+static void kvm_mmu_commit_zap_page(struct kvm *kvm)
+{
+    struct kvm_mmu_page *sp, *n;
+
+    if (list_empty(&kvm->arch.invalid_mmu_pages))
+        return;
+
+    kvm_flush_remote_tlbs(kvm);
+    list_for_each_entry_safe(sp, n, &kvm->arch.invalid_mmu_pages, link) {
+        WARN_ON(!sp->role.invalid);
+        if (!sp->root_count)
+            kvm_mmu_free_page(kvm, sp);
+    }
+}
+

You're adding two new functions but not using them here? Possibly in the old kvm_mmu_zap_page()?

I use them in the next patch; they are not used inside kvm_mmu_zap_page(). The usage is like:

    hold mmu spin lock
    kvm_mmu_prepare_zap_page page A
    kvm_mmu_prepare_zap_page page B
    kvm_mmu_prepare_zap_page page C
    ...
    kvm_mmu_commit_zap_page
    release mmu spin lock

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5e5cd8d..225c3c4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5331,6 +5331,7 @@ struct kvm *kvm_arch_create_vm(void)
     }
 
     INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+    INIT_LIST_HEAD(&kvm->arch.invalid_mmu_pages);
     INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);

This is a good idea, but belongs in a separate patch? We can use it to reclaim invalid pages before allocating new ones.

This patch is very simple, and the kvm_mmu_commit_zap_page() function relies on kvm->arch.invalid_mmu_pages, so I think we don't need to separate this patch; what's your opinion? :-)

Thanks, Xiao
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
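For context, the calling pattern described above would look roughly like the sketch below. This is illustrative only, not code from the series: the lock, the macro and the two helpers are taken from these patches, while the wrapper function itself is hypothetical.

    /* Batch several zaps under one mmu_lock hold and pay for a single
     * remote TLB flush in the commit step. */
    static void zap_indirect_sps_for_gfn(struct kvm *kvm, gfn_t gfn)
    {
        struct kvm_mmu_page *sp;
        struct hlist_node *pos, *n;

        spin_lock(&kvm->mmu_lock);
        for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos, n)
            kvm_mmu_prepare_zap_page(kvm, sp);  /* unlink, mark invalid, queue */
        kvm_mmu_commit_zap_page(kvm);           /* one TLB flush, then free pages */
        spin_unlock(&kvm->mmu_lock);
    }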