[PATCH] KVM: VMX: Enforce EPT pagetable level checking
We only support 4-level EPT page tables for now.

Signed-off-by: Sheng Yang <sh...@linux.intel.com>
---
 arch/x86/kvm/vmx.c |    8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 99ae513..d400fbb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -339,6 +339,11 @@ static inline bool cpu_has_vmx_ept_1g_page(void)
 	return vmx_capability.ept & VMX_EPT_1GB_PAGE_BIT;
 }
 
+static inline bool cpu_has_vmx_ept_4levels(void)
+{
+	return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT;
+}
+
 static inline bool cpu_has_vmx_invept_individual_addr(void)
 {
 	return vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT;
@@ -1567,7 +1572,8 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_vpid())
 		enable_vpid = 0;
 
-	if (!cpu_has_vmx_ept()) {
+	if (!cpu_has_vmx_ept() ||
+	    !cpu_has_vmx_ept_4levels()) {
 		enable_ept = 0;
 		enable_unrestricted_guest = 0;
 	}
--
1.7.0.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 32-bit qemu + 64-bit kvm be a problem?
02.06.2010 09:44, Neo Jia wrote:
> On Wed, Mar 10, 2010 at 2:12 PM, Michael Tokarev <m...@tls.msk.ru> wrote:
>> [] I have used 32-bit kvm on a 64-bit kernel since day one. Nothing of interest since then; everything just works.
>
> I just came back to this thread because I am seeing that I can't run Vista 64-bit in 64/32 mode; it crashes with bugcheck 0x5D. Is this a known issue?

I haven't seen it myself, but I never tried Vista. Win7 64-bit installs and works just fine. STOP 0x5D, according to M$, means there is a processor present that is not supported. I'd say try to experiment with different -cpu options. Also see if it works in native 64-bit mode.

>> Recently (this week) I came across a situation where something does not work in 64/32 mode. Namely, it is Linux AIO (see the other thread on kvm@ a few days back) -- but this is not due to kvm, but due to another kernel subsystem (in this case AIO) which lacks proper compat handlers.
>
> Could you tell me more about the AIO issue? Will this slow down the guest if it does a lot of I/O? Will setting up coalescing help?

You can search for it yourself. In a nutshell, it's an optimisation in kvm since version 0.12 that lets raw devices be used a bit more efficiently.

/mjt
Re: [PATCH] use unfair spinlock when running on hypervisor.
On 06/01/2010 10:39 AM, valdis.kletni...@vt.edu wrote:
> On Tue, 01 Jun 2010 19:52:28 +0300, Avi Kivity said:
>> On 06/01/2010 07:38 PM, Andi Kleen wrote:
>>> Your new code would starve again, right? Try it on a NUMA system with unfair memory.
>>
>> We are running everything on NUMA (since all modern machines are now NUMA). At what scale do the issues become observable?
>
> My 6-month-old laptop is NUMA? Comes as a surprise to me, and to the perfectly-running NUMA=n kernel I'm running. Or did you mean a less broad phrase than "all modern machines"?

All modern multisocket machines, unless configured in interleaved memory mode.

	-hpa
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v3)
Hi Kevin,

2010/6/1 Kevin Wolf <kw...@redhat.com>:
> Do you have some specific tests for the driver, or should we extend qemu-iotests to work with protocols and use only that?

Right now I don't have any specific tests, but I'll take a look at qemu-iotests soon.

Christian
unhandled wrmsr
Hi,

With today's git version (qemu-kvm), I got the following message in kernel dmesg:

[168344.215605] kvm: 27289: cpu0 unhandled wrmsr: 0x198 data 0

bash-3.1$ /home/dave/tmp/qemu-system-x86_64 --version
QEMU emulator version 0.12.50 (qemu-kvm-devel), Copyright (c) 2003-2008 Fabrice Bellard
bash-3.1$ uname -a
Linux darkstar 2.6.35-rc1 #35 SMP Mon May 31 16:50:15 CST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz GenuineIntel GNU/Linux

The host distribution is slackware64-13.0. 0.12.3 works well. BTW, the guest is a tiny core linux image.

--
Regards
dave
Re: [PATCH] use unfair spinlock when running on hypervisor.
On Wed, Jun 02, 2010 at 05:51:14AM +0300, Avi Kivity wrote:
> On 06/01/2010 08:27 PM, Andi Kleen wrote:
>> On Tue, Jun 01, 2010 at 07:52:28PM +0300, Avi Kivity wrote:
>>> We are running everything on NUMA (since all modern machines are now NUMA). At what scale do the issues become observable?
>>
>> On Intel platforms it's visible starting with 4 sockets.
>
> Can you recommend a benchmark that shows bad behaviour? I'll run it with

Pretty much anything with high lock contention.

> ticket spinlocks and Gleb's patch. I have a 4-way Nehalem-EX; presumably the huge number of threads will magnify the problem even more there.

Yes, more threads cause more lock contention too.

> Do you have any idea how we can tackle both problems?

Apparently Xen has something; perhaps that can be leveraged (but I haven't looked at their solution in detail). Otherwise I would probably try to start with an adaptive spinlock that at some point calls into the HV (or updates shared memory?), like john cooper suggested. The tricky part here would be to find the thresholds and fit that state into paravirt ops and the standard spinlock_t.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
[PATCH] kvm: rework remove-write-access for a slot
Current code uses slot_bitmap to find the ptes which map a page from the memory slot, but it is not precise: some ptes in the shadow page do not map any page from the memory slot. This patch uses rmap to find the ptes precisely, and removes the now-unused slot_bitmap.

Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
---
diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 1e7ecdd..d749399 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -183,13 +183,6 @@ Shadow pages contain the following information:
     perform a reverse map from a pte to a gfn. When role.direct is set, any
     element of this array can be calculated from the gfn field when used, in
     this case, the array of gfns is not allocated. See role.direct and gfn.
-  slot_bitmap:
-    A bitmap containing one bit per memory slot. If the page contains a pte
-    mapping a page from memory slot n, then bit n of slot_bitmap will be set
-    (if a page is aliased among several slots, then it is not guaranteed that
-    all slots will be marked).
-    Used during dirty logging to avoid scanning a shadow page if none if its
-    pages need tracking.
   root_count:
     A counter keeping track of how many hardware registers (guest cr3 or
     pdptrs) are now pointing at the page. While this counter is nonzero, the
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0cd0f29..bf4f198 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -197,11 +197,6 @@ struct kvm_mmu_page {
 	u64 *spt;
 	/* hold the gfn of each spte inside spt */
 	gfn_t *gfns;
-	/*
-	 * One bit set per slot which has memory
-	 * in this shadow page.
-	 */
-	DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
 	bool multimapped;	 /* More than one parent_pte? */
 	bool unsync;
 	int root_count;		 /* Currently serving as active root */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c16c4ca..e097e81 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -941,7 +941,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 					       PAGE_SIZE);
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
 	sp->multimapped = 0;
 	sp->parent_pte = parent_pte;
 	--vcpu->kvm->arch.n_free_mmu_pages;
@@ -1660,14 +1659,6 @@ restart:
 	}
 }
 
-static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
-{
-	int slot = memslot_id(kvm, gfn);
-	struct kvm_mmu_page *sp = page_header(__pa(pte));
-
-	__set_bit(slot, sp->slot_bitmap);
-}
-
 static void mmu_convert_notrap(struct kvm_mmu_page *sp)
 {
 	int i;
@@ -1979,7 +1970,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!was_rmapped && is_large_pte(*sptep))
 		++vcpu->kvm->stat.lpages;
-	page_header_update_slot(vcpu->kvm, sptep, gfn);
 	if (!was_rmapped) {
 		rmap_count = rmap_add(vcpu, sptep, gfn);
 		kvm_release_pfn_clean(pfn);
@@ -2975,22 +2965,38 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 	mmu_free_memory_caches(vcpu);
 }
 
+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+	while (spte) {
+		/* avoid RMW */
+		if (is_writable_pte(*spte))
+			*spte &= ~PT_WRITABLE_MASK;
+		spte = rmap_next(kvm, rmapp, spte);
+	}
+}
+
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 {
-	struct kvm_mmu_page *sp;
+	int i;
+	unsigned long gfn_offset;
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot = &slots->memslots[slot];
 
-	list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
-		int i;
-		u64 *pt;
+	for (gfn_offset = 0; gfn_offset < memslot->npages; gfn_offset++) {
+		rmapp_remove_write_access(kvm, &memslot->rmap[gfn_offset]);
 
-		if (!test_bit(slot, sp->slot_bitmap))
-			continue;
+		for (i = 0; i < KVM_NR_PAGE_SIZES - 1; i++) {
+			unsigned long gfn = memslot->base_gfn + gfn_offset;
+			unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
+			int idx = gfn / huge - memslot->base_gfn / huge;
 
-		pt = sp->spt;
-		for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
-			/* avoid RMW */
-			if (is_writable_pte(pt[i]))
-				pt[i] &= ~PT_WRITABLE_MASK;
+			if (!(gfn_offset || (gfn % huge)))
+				break;
+
Re: [PATCH] use unfair spinlock when running on hypervisor.
On 06/02/2010 11:50 AM, Andi Kleen wrote:
> On Wed, Jun 02, 2010 at 05:51:14AM +0300, Avi Kivity wrote:
>> On 06/01/2010 08:27 PM, Andi Kleen wrote:
>>> On Tue, Jun 01, 2010 at 07:52:28PM +0300, Avi Kivity wrote:
>>>> We are running everything on NUMA (since all modern machines are now NUMA). At what scale do the issues become observable?
>>>
>>> On Intel platforms it's visible starting with 4 sockets.
>>
>> Can you recommend a benchmark that shows bad behaviour? I'll run it with
>
> Pretty much anything with high lock contention.

Okay, we'll try to measure it here as soon as we can switch it into numa mode.

>> Do you have any idea how we can tackle both problems?
>
> Apparently Xen has something; perhaps that can be leveraged (but I haven't looked at their solution in detail). Otherwise I would probably try to start with an adaptive spinlock that at some point calls into the HV (or updates shared memory?), like john cooper suggested. The tricky part here would be to find the thresholds and fit that state into paravirt ops and the standard spinlock_t.

There are two separate problems: the more general problem is that the hypervisor can put a vcpu to sleep while holding a lock, causing other vcpus to spin until the end of their time slice. This can only be addressed with hypervisor help. The second problem is that the extreme fairness of ticket locks causes lots of context switches if the hypervisor helps, and aggravates the first problem horribly if it doesn't (since now a vcpu will spin waiting for its ticket even if the lock is unlocked).

So yes, we'll need hypervisor assistance, but even with that we'll need to reduce ticket lock fairness (retaining global fairness but sacrificing some local fairness). I imagine that will be helpful for non-virt as well, since local unfairness reduces bounciness.
--
error compiling committee.c: too many arguments to function
[PATCH] kvm: cleanup, fix typing
The type of '*new.rmap' is not 'struct page *'; fix it.

Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9ab1a77..cab7839 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -596,7 +596,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	/* Allocate if a slot is being created */
 #ifndef CONFIG_S390
 	if (npages && !new.rmap) {
-		new.rmap = vmalloc(npages * sizeof(struct page *));
+		new.rmap = vmalloc(npages * sizeof(*new.rmap));
 		if (!new.rmap)
 			goto out_free;
[PATCH] kvm: cleanup, use linux/uaccess.h instead of asm/uaccess.h
Should use linux/uaccess.h instead of asm/uaccess.h.

Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..ea03def 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -41,13 +41,13 @@
 #include <linux/srcu.h>
 #include <linux/slab.h>
 #include <linux/perf_event.h>
+#include <linux/uaccess.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
 #include <asm/debugreg.h>
-#include <asm/uaccess.h>
 #include <asm/msr.h>
 #include <asm/desc.h>
 #include <asm/mtrr.h>
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Tue, Jun 01, 2010 at 12:55:32PM +0300, Michael S. Tsirkin wrote:
> There seems to be some misunderstanding. The userspace interface proposed forces a separate domain per device and forces userspace to repeat iommu programming for each device. We are better off sharing a domain between devices and programming the iommu once. The natural way to do this is to have an iommu driver for programming the iommu.

IMO a separate iommu-userspace driver is a nightmare for a userspace interface. It is just too complicated to use. We can solve the problem of multiple devices per domain with an ioctl which allows binding one uio device to the address space of another. That's much simpler.

	Joerg
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Tue, Jun 01, 2010 at 03:41:55PM +0300, Avi Kivity wrote:
> On 06/01/2010 01:46 PM, Michael S. Tsirkin wrote:
>> Main difference is that vhost works fine with unlocked memory, paging it in on demand. iommu needs to unmap memory when it is swapped out or relocated.
>
> So you'd just take the memory map and not pin anything. This way you can reuse the memory map. But no, it doesn't handle the dirty bitmap, so no go.

IOMMU-mapped memory can not be swapped out because we can't do demand paging on io-page-faults with current devices. We have to pin _all_ userspace memory that is mapped into an IOMMU domain.

	Joerg
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 06/02/2010 12:45 PM, Joerg Roedel wrote:
> On Tue, Jun 01, 2010 at 03:41:55PM +0300, Avi Kivity wrote:
>> So you'd just take the memory map and not pin anything. This way you can reuse the memory map. But no, it doesn't handle the dirty bitmap, so no go.
>
> IOMMU-mapped memory can not be swapped out because we can't do demand paging on io-page-faults with current devices. We have to pin _all_ userspace memory that is mapped into an IOMMU domain.

vhost doesn't pin memory. What I proposed is to describe the memory map using an object (fd), and pass it around to clients that use it: kvm, vhost, vfio. That way you maintain the memory map in a central location and broadcast changes to clients. Only a vfio client would result in memory being pinned.

It can still work, but the interface needs to be extended to include dirty bitmap logging.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 06/02/2010 12:42 PM, Joerg Roedel wrote:
> We can solve the problem of multiple devices per domain with an ioctl which allows binding one uio device to the address space of another. That's much simpler.

This is non-trivial with hotplug.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Tue, Jun 01, 2010 at 09:59:40PM -0700, Tom Lyon wrote:
> This is just what I was thinking. But rather than a get/set, just use two fds:
>
>	ioctl(vfio_fd1, VFIO_SET_DOMAIN, vfio_fd2);
>
> This may fail if there are really 2 different IOMMUs, so user code must be prepared for failure. In addition, this is strictly upwards compatible with what is there now, so maybe we can add it later.

How can this fail with multiple IOMMUs? This should be handled transparently by the IOMMU driver.

	Joerg
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 11:42:01AM +0200, Joerg Roedel wrote:
> On Tue, Jun 01, 2010 at 12:55:32PM +0300, Michael S. Tsirkin wrote:
>> There seems to be some misunderstanding. The userspace interface proposed forces a separate domain per device and forces userspace to repeat iommu programming for each device. We are better off sharing a domain between devices and programming the iommu once. The natural way to do this is to have an iommu driver for programming the iommu.
>
> IMO a separate iommu-userspace driver is a nightmare for a userspace interface. It is just too complicated to use.

One advantage would be that we can reuse the uio framework for the devices themselves. So an existing app can just program an iommu for DMA and keep using uio for interrupts and access.

> We can solve the problem of multiple devices per domain with an ioctl which allows binding one uio device to the address space of another.

This would imply switching an iommu domain for a device while it could potentially be doing DMA. No idea whether this can be done in a safe manner. Forcing iommu assignment to be done as a first step seems much saner.

> That's much simpler.

So instead of:

	dev = open();
	ioctl(dev, ASSIGN, iommu);
	mmap();

where, if we forget the ioctl, mmap will fail, we have:

	dev = open();
	if (ndevices > 0)
		ioctl(devices[0], ASSIGN, dev);
	mmap();

and if we forget the ioctl we get errors from the device. Seems more complicated to me. There will also always exist the confusion: address space for which device are we modifying? With a separate driver for the iommu, we can safely check that binding is done correctly.

--
MST
Re: [Qemu-devel] Re: disable clocksource=kvm-clock
Avi Kivity wrote:
> On 06/01/2010 04:57 PM, Peter Lieven wrote:
>> Avi Kivity wrote:
>>> On 06/01/2010 04:12 PM, Peter Lieven wrote:
>>>> hi, is it possible to avoid detection of clocksource=kvm_clock in a linux guest by patching the qemu-kvm binary? i would like to be able to avoid a guest detecting kvm-clock until bug #584516 is fixed, without modifying all guest systems and reverting that later.
>>>
>>> Newer qemu-kvm's will have this feature (using -cpu ...,-kvmclock).
>>
>> with latest git, i can supply this switch, but cat /sys/devices/system/clocksource/clocksource0/current_clocksource still shows kvm_clock in an opensuse 11.2 64-bit or an ubuntu 10.04 64-bit guest.
>
> Try again with tomorrow's git, I sent out a patchset earlier today that fixes this.

i don't see it in the git, do you mean:

[PATCH 0/5] Use more cpuid bits from upstream

thanks,
peter
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 12:49:28PM +0300, Avi Kivity wrote:
> On 06/02/2010 12:45 PM, Joerg Roedel wrote:
>> IOMMU-mapped memory can not be swapped out because we can't do demand paging on io-page-faults with current devices. We have to pin _all_ userspace memory that is mapped into an IOMMU domain.
>
> vhost doesn't pin memory. What I proposed is to describe the memory map using an object (fd), and pass it around to clients that use it: kvm, vhost, vfio. That way you maintain the memory map in a central location and broadcast changes to clients. Only a vfio client would result in memory being pinned.

Ah, ok. So it's only about the database which keeps the mapping information.

> It can still work, but the interface needs to be extended to include dirty bitmap logging.

That's hard to do. I am not sure about VT-d, but the AMD IOMMU has no dirty bits in the page table. And without demand paging we can't really tell which pages a device has written to. The only choice is to mark all IOMMU-mapped pages dirty as long as they are mapped.

	Joerg
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 12:04:04PM +0200, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 12:49:28PM +0300, Avi Kivity wrote:
>> It can still work, but the interface needs to be extended to include dirty bitmap logging.
>
> That's hard to do. I am not sure about VT-d, but the AMD IOMMU has no dirty bits in the page table. And without demand paging we can't really tell which pages a device has written to. The only choice is to mark all IOMMU-mapped pages dirty as long as they are mapped.

Or mark them dirty when they are unmapped.

--
MST
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 12:53:12PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 11:42:01AM +0200, Joerg Roedel wrote:
>> IMO a separate iommu-userspace driver is a nightmare for a userspace interface. It is just too complicated to use.
>
> One advantage would be that we can reuse the uio framework for the devices themselves. So an existing app can just program an iommu for DMA and keep using uio for interrupts and access.

The driver is called UIO and not U-INTR-MMIO ;-) So I think handling IOMMU mappings belongs there.

>> We can solve the problem of multiple devices per domain with an ioctl which allows binding one uio device to the address space of another.
>
> This would imply switching an iommu domain for a device while it could potentially be doing DMA. No idea whether this can be done in a safe manner.

It can. The worst thing that can happen is an io-page-fault.

> Forcing iommu assignment to be done as a first step seems much saner.

If we force it, there is no reason not to do it implicitly. We can do something like this then:

	dev1 = open();
	ioctl(dev1, IOMMU_MAP, ...);	/* creates an IOMMU domain and
					   assigns dev1 to it */
	dev2 = open();
	ioctl(dev2, IOMMU_MAP, ...);	/* now dev1 and dev2 are in
					   separate domains */
	ioctl(dev2, IOMMU_SHARE, dev1);	/* destroys all mappings for dev2
					   and assigns it to the same domain
					   as dev1; the domain has a
					   refcount of two now */
	close(dev1);			/* domain refcount goes down to one */
	close(dev2);			/* domain refcount is zero and the
					   domain gets destroyed */

	Joerg
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 11:45:27AM +0200, Joerg Roedel wrote:
> On Tue, Jun 01, 2010 at 03:41:55PM +0300, Avi Kivity wrote:
>> On 06/01/2010 01:46 PM, Michael S. Tsirkin wrote:
>>> Main difference is that vhost works fine with unlocked memory, paging it in on demand. iommu needs to unmap memory when it is swapped out or relocated.
>>
>> So you'd just take the memory map and not pin anything. This way you can reuse the memory map. But no, it doesn't handle the dirty bitmap, so no go.
>
> IOMMU-mapped memory can not be swapped out because we can't do demand paging on io-page-faults with current devices. We have to pin _all_ userspace memory that is mapped into an IOMMU domain.

One of the issues I see with the current patch is that it uses the mlock rlimit to do this pinning. So this wastes the rlimit for an app that did mlockall already, and also consumes this resource transparently, so an app might call mlock on a small buffer and be surprised that it fails. Using mmu notifiers might help?

--
MST
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 12:53:12PM +0300, Michael S. Tsirkin wrote:
>> This would imply switching an iommu domain for a device while it could potentially be doing DMA. No idea whether this can be done in a safe manner.
>
> It can. The worst thing that can happen is an io-page-fault.

Devices might not be able to recover from this.

>> Forcing iommu assignment to be done as a first step seems much saner.
>
> If we force it, there is no reason not to do it implicitly.

What you describe below does 3 ioctls for what can be done with 1.

> We can do something like this then:
>
>	dev1 = open();
>	ioctl(dev1, IOMMU_MAP, ...);
>	dev2 = open();
>	ioctl(dev2, IOMMU_MAP, ...);
>	ioctl(dev2, IOMMU_SHARE, dev1);	/* destroys all mappings for dev2
>					   and assigns it to the same domain
>					   as dev1; the domain has a
>					   refcount of two now */

Or maybe it destroys the mappings for dev1? How do you remember?

>	close(dev1);	/* domain refcount goes down to one */
>	close(dev2);	/* domain refcount is zero and the domain gets
>			   destroyed */

Also, no way to unshare? That seems limiting.

--
MST
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 01:15:34PM +0300, Michael S. Tsirkin wrote:
> One of the issues I see with the current patch is that it uses the mlock rlimit to do this pinning. So this wastes the rlimit for an app that did mlockall already, and also consumes this resource transparently, so an app might call mlock on a small buffer and be surprised that it fails. Using mmu notifiers might help?

MMU notifiers are problematic because they are designed for situations where we can do demand paging. The invalidate_range_start and invalidate_range_end functions are not only called on munmap; they also run when mprotect is called (in which case we don't want to tear down iommu mappings). So what may happen with mmu notifiers is that we accidentally tear down iommu mappings. With demand paging this is no problem because the io-ptes could be re-faulted. But that does not work here.

	Joerg
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 01:21:44PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote:
>> It can. The worst thing that can happen is an io-page-fault.
>
> Devices might not be able to recover from this.

With the userspace interface a process can create io-page-faults anyway if it wants. We can't protect ourselves from this. And the process is also responsible for not tearing down iommu mappings that are currently in use.

> What you describe below does 3 ioctls for what can be done with 1.

The second IOMMU_MAP ioctl was just to show that existing mappings would be destroyed if the device is assigned to another address space; it is not strictly necessary. So we have two ioctls, but save one call to create the iommu domain.

>> ioctl(dev2, IOMMU_SHARE, dev1); /* destroys all mappings for dev2 and assigns it to the same domain as dev1; the domain has a refcount of two now */
>
> Or maybe it destroys the mappings for dev1? How do you remember?

Because we express here that dev2 shares the iommu mappings of dev1. That's easy to remember.

> Also, no way to unshare? That seems limiting.

Just left out for simplicity reasons. An IOMMU_UNBIND (not IOMMU_UNSHARE, because that would require a second parameter) ioctl is certainly also required.

	Joerg
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 12:35:16PM +0200, Joerg Roedel wrote: On Wed, Jun 02, 2010 at 01:21:44PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote: It can. The worst thing that can happen is an io-page-fault. devices might not be able to recover from this. With the userspace interface a process can create io-page-faults anyway if it wants. We can't protect us from this. We could fail all operations until an iommu is bound. This will help catch bugs with access before setup. We can not do this if a domain is bound by default. And the process is also responsible to not tear down iommu-mappings that are currently in use. What you describe below does 3 ioctls for what can be done with 1. The second IOMMU_MAP ioctl is just to show that existing mappings would be destroyed if the device is assigned to another address space. Not strictly necessary. So we have two ioctls but save one call to create the iommu-domain. With 10 devices you have 10 extra ioctls. ioctl(dev2, IOMMU_SHARE, dev1); /* destroys all mapping for dev2 and assigns it to the same domain as dev1. Domain has a refcount of two now */ Or maybe it destroys mapping for dev1? How do you remember? Because we express here that dev2 shares the iommu mappings of dev1. Thats easy to remember. they both share the mappings. which one gets the iommu destroyed (breaking the device if it is now doing DMA)? Also, no way to unshare? That seems limiting. Just left out for simplicity reasons. An IOMMU_UNBIND (no IOMMU_UNSHARE because that would require a second parameter) ioctl is certainly also required. Joerg -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote: On Wed, Jun 02, 2010 at 12:53:12PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 11:42:01AM +0200, Joerg Roedel wrote: IMO a seperate iommu-userspace driver is a nightmare for a userspace interface. It is just too complicated to use. One advantage would be that we can reuse the uio framework for the devices themselves. So an existing app can just program an iommu for DMA and keep using uio for interrupts and access. The driver is called UIO and not U-INTR-MMIO ;-) So I think handling IOMMU mappings belongs there. Maybe it could be put there but the patch posted did not use uio. And one of the reasons is that uio framework provides for device access and interrupts but not for programming memory mappings. Solutions (besides giving up on uio completely) could include extending the framework in some way (which was tried, but the result was not pretty) or adding a separate driver for iommu and binding to that. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v4 0/3] Sheepdog: distributed storage system for QEMU
On 28.05.2010 04:44, MORITA Kazutaka wrote: Hi all, This patch adds a block driver for the Sheepdog distributed storage system. Please consider for inclusion. Hint for next time: You should remove the RFC from the subject line if you think the patch is ready for inclusion. Otherwise I might miss this and think you only want comments on it. MORITA Kazutaka (3): close all the block drivers before the qemu process exits block: call the snapshot handlers of the protocol drivers block: add sheepdog driver for distributed storage support Thanks, I have applied the first two patches to the block branch, they look good to me. I'll send some comments for the third one (though it's only coding style until now). Kevin
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 01:38:28PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 12:35:16PM +0200, Joerg Roedel wrote: With the userspace interface a process can create io-page-faults anyway if it wants. We can't protect us from this. We could fail all operations until an iommu is bound. This will help catch bugs with access before setup. We can not do this if a domain is bound by default. Even if it is bound to a domain the userspace driver could program the device to do dma to unmapped regions causing io-page-faults. The kernel can't do anything about it. The second IOMMU_MAP ioctl is just to show that existing mappings would be destroyed if the device is assigned to another address space. Not strictly necessary. So we have two ioctls but save one call to create the iommu-domain. With 10 devices you have 10 extra ioctls. And this works implicitly with your proposal? Remember that we still need to be able to provide separate mappings for each device to support IOMMU emulation for the guest. I think my proposal does not have any extra costs. Because we express here that dev2 shares the iommu mappings of dev1. Thats easy to remember. they both share the mappings. which one gets the iommu destroyed (breaking the device if it is now doing DMA)? As I wrote the domain has a reference count and is destroyed only when it goes down to zero. This does not happen as long as a device is bound to it. Joerg
Re: [PATCH] kvm: rework remove-write-access for a slot
On 06/02/2010 11:53 AM, Lai Jiangshan wrote: Current code uses slot_bitmap to find ptes that map a page from the memory slot, it is not precise: some ptes in the shadow page do not map any page from the memory slot. This patch uses rmap to find the ptes precisely, and removes the unused slot_bitmap. Patch looks good; a couple of comments: - We might see a slowdown with !tdp, since we no longer have locality. Each page will map to an spte in a different page. However, it's still worth it in my opinion. - I thought of a different approach to write protection: write protect the L4 sptes, on write fault add write permission to the L4 spte and write protect the L3 sptes that it points to, etc. This method can use the slot bitmap to reduce the number of write faults. However we can reintroduce the slot bitmap if/when we use the method, this shouldn't block the patch. +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp) +{ +u64 *spte = rmap_next(kvm, rmapp, NULL); + +while (spte) { +/* avoid RMW */ +if (is_writable_pte(*spte)) +*spte &= ~PT_WRITABLE_MASK; Must use an atomic operation here to avoid losing dirty or accessed bit. +spte = rmap_next(kvm, rmapp, spte); +} +} -- error compiling committee.c: too many arguments to function
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 06/02/2010 01:04 PM, Joerg Roedel wrote: On Wed, Jun 02, 2010 at 12:49:28PM +0300, Avi Kivity wrote: On 06/02/2010 12:45 PM, Joerg Roedel wrote: IOMMU mapped memory can not be swapped out because we can't do demand paging on io-page-faults with current devices. We have to pin _all_ userspace memory that is mapped into an IOMMU domain. vhost doesn't pin memory. What I proposed is to describe the memory map using an object (fd), and pass it around to clients that use it: kvm, vhost, vfio. That way you maintain the memory map in a central location and broadcast changes to clients. Only a vfio client would result in memory being pinned. Ah ok, so its only about the database which keeps the mapping information. Yes. It can still work, but the interface needs to be extended to include dirty bitmap logging. Thats hard to do. I am not sure about VT-d but the AMD IOMMU has no dirty-bits in the page-table. And without demand-paging we can't really tell what pages a device has written to. The only choice is to mark all IOMMU-mapped pages dirty as long as they are mapped. The interface would only work for clients which support it: kvm, vhost, and iommu/devices with restartable dma. Note dirty logging is not very interesting for vfio anyway, since you can't live migrate with assigned devices. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote: On Wed, Jun 02, 2010 at 01:38:28PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 12:35:16PM +0200, Joerg Roedel wrote: With the userspace interface a process can create io-page-faults anyway if it wants. We can't protect us from this. We could fail all operations until an iommu is bound. This will help catch bugs with access before setup. We can not do this if a domain is bound by default. Even if it is bound to a domain the userspace driver could program the device to do dma to unmapped regions causing io-page-faults. The kernel can't do anything about it. It can always corrupt its own memory directly as well :) But that is not a reason not to detect errors if we can, and not to make APIs hard to misuse. The second IOMMU_MAP ioctl is just to show that existing mappings would be destroyed if the device is assigned to another address space. Not strictly necessary. So we have two ioctls but save one call to create the iommu-domain. With 10 devices you have 10 extra ioctls. And this works implicitly with your proposal? Yes. so you do: iommu = open ioctl(dev1, BIND, iommu) ioctl(dev2, BIND, iommu) ioctl(dev3, BIND, iommu) ioctl(dev4, BIND, iommu) No need to add a SHARE ioctl. Remember that we still need to be able to provide seperate mappings for each device to support IOMMU emulation for the guest. Generally not true. E.g. guest can enable iommu passthrough or have domain per a group of devices. I think my proposal does not have any extra costs. with my proposal we have 1 ioctl per device + 1 per domain. with yours we have 2 ioctls per device is iommu is shared and 1 if it is not shared. as current apps share iommu it seems to make sense to optimize for that. Because we express here that dev2 shares the iommu mappings of dev1. Thats easy to remember. they both share the mappings. which one gets the iommu destroyed (breaking the device if it is now doing DMA)? 
As I wrote the domain has a reference count and is destroyed only when it goes down to zero. This does not happen as long as a device is bound to it. Joerg We were talking about the UNSHARE ioctl: ioctl(dev1, UNSHARE, dev2) Does it change the domain for dev1 or dev2? If you make a mistake you get a hard-to-debug bug. -- MST
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 02:21:00PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote: Even if it is bound to a domain the userspace driver could program the device to do dma to unmapped regions causing io-page-faults. The kernel can't do anything about it. It can always corrupt its own memory directly as well :) But that is not a reason not to detect errors if we can, and not to make APIs hard to misuse. Changing the domain of a device while dma can happen is the same type of bug as unmapping potential dma target addresses. We can't catch this kind of misuse. With 10 devices you have 10 extra ioctls. And this works implicitly with your proposal? Yes. so you do: iommu = open ioctl(dev1, BIND, iommu) ioctl(dev2, BIND, iommu) ioctl(dev3, BIND, iommu) ioctl(dev4, BIND, iommu) No need to add a SHARE ioctl. In my proposal this looks like: dev1 = open(); ioctl(dev2, SHARE, dev1); ioctl(dev3, SHARE, dev1); ioctl(dev4, SHARE, dev1); So we actually save an ioctl. Remember that we still need to be able to provide separate mappings for each device to support IOMMU emulation for the guest. Generally not true. E.g. guest can enable iommu passthrough or have domain per a group of devices. What I meant was that there may be multiple io-address spaces necessary for one process. I didn't want to say that every device _needs_ to have its own address space. As I wrote the domain has a reference count and is destroyed only when it goes down to zero. This does not happen as long as a device is bound to it. Joerg We were talking about UNSHARE ioctl: ioctl(dev1, UNSHARE, dev2) Does it change the domain for dev1 or dev2? If you make a mistake you get a hard-to-debug bug. As I already wrote we would have an UNBIND ioctl which just removes a device from its current domain. UNBIND is better than UNSHARE for exactly the reason you pointed out above. I thought I stated that already. 
Joerg
Re: Virtual disks with SCSI or virtio hang after some time
On Tuesday, 1 June 2010, you wrote: On 06/01/2010 06:59 PM, Guido Winkelmann wrote: The host OS is Fedora Core 12, with qemu-kvm 0.11.0 Please try with at least qemu-kvm-0.11.1, preferably qemu-kvm-0.12.4. Also use Linux 2.6.32.latest in the guest. Okay, I've upgraded the userspace tools to qemu-kvm-0.12.4, and so far, the problem is not resurfacing. Before, it could usually be triggered by doing hdparm -t /dev/vda (or sda, respectively) once or twice. Unfortunately, now I cannot start new VMs with libvirt anymore, and I have no idea why... :( Guido
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 06/02/2010 03:19 PM, Joerg Roedel wrote: Yes. so you do: iommu = open ioctl(dev1, BIND, iommu) ioctl(dev2, BIND, iommu) ioctl(dev3, BIND, iommu) ioctl(dev4, BIND, iommu) No need to add a SHARE ioctl. In my proposal this looks like: dev1 = open(); ioctl(dev2, SHARE, dev1); ioctl(dev3, SHARE, dev1); ioctl(dev4, SHARE, dev1); So we actually save an ioctl. The problem with this is that it is asymmetric, dev1 is treated differently from dev[234]. It's an unintuitive API. -- error compiling committee.c: too many arguments to function
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 02:19:28PM +0200, Joerg Roedel wrote: On Wed, Jun 02, 2010 at 02:21:00PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote: Even if it is bound to a domain the userspace driver could program the device to do dma to unmapped regions causing io-page-faults. The kernel can't do anything about it. It can always corrupt its own memory directly as well :) But that is not a reason not to detect errors if we can, and not to make APIs hard to misuse. Changing the domain of a device while dma can happen is the same type of bug as unmapping potential dma target addresses. We can't catch this kind of misuse. you normally need device mapped to start DMA. SHARE makes this bug more likely as you allow switching domains: mmap could be done before switching. With 10 devices you have 10 extra ioctls. And this works implicitly with your proposal? Yes. so you do: iommu = open ioctl(dev1, BIND, iommu) ioctl(dev2, BIND, iommu) ioctl(dev3, BIND, iommu) ioctl(dev4, BIND, iommu) No need to add a SHARE ioctl. In my proposal this looks like: dev1 = open(); ioctl(dev2, SHARE, dev1); ioctl(dev3, SHARE, dev1); ioctl(dev4, SHARE, dev1); So we actually save an ioctl. I thought we had a BIND ioctl? Remember that we still need to be able to provide seperate mappings for each device to support IOMMU emulation for the guest. Generally not true. E.g. guest can enable iommu passthrough or have domain per a group of devices. What I meant was that there may me multiple io-addresses spaces necessary for one process. I didn't want to say that every device _needs_ to have its own address space. As I wrote the domain has a reference count and is destroyed only when it goes down to zero. This does not happen as long as a device is bound to it. Joerg We were talking about UNSHARE ioctl: ioctl(dev1, UNSHARE, dev2) Does it change the domain for dev1 or dev2? If you make a mistake you get a hard to debug bug. 
As I already wrote we would have an UNBIND ioctl which just removes a device from its current domain. UNBIND is better than UNSHARE for exactly the reason you pointed out above. I thought I stated that already. Joerg You undo SHARE with UNBIND? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unhandled wrmsr
Dave Young wrote: Hi, With today's git version (qemu-kvm), I got following message in kernel dmesg [168344.215605] kvm: 27289: cpu0 unhandled wrmsr: 0x198 data 0 Are you sure about that? 0x198 is an Intel architectural perfmon MSR and it is read-only. The Linux kernel source I grep'ed obeys this and does only rdmsr. You can work around this by changing the error to a warning with: # modprobe kvm ignore_msrs=1 I'd like to see more details about the guest Linux kernel, at least the version you used to see why there is a wrmsr on this address. Best is you provide the kernel (just the vmlinuz file) somewhere so that we can reproduce this. Also the qemu-kvm command line would be interesting. Regards, Andre. bash-3.1$ /home/dave/tmp/qemu-system-x86_64 --version QEMU emulator version 0.12.50 (qemu-kvm-devel), Copyright (c) 2003-2008 Fabrice Bellard bash-3.1$ uname -a Linux darkstar 2.6.35-rc1 #35 SMP Mon May 31 16:50:15 CST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz GenuineIntel GNU/Linux host distribution is slackware64-13.0 0.12.3 works well BTW the guest is a tiny core linux image -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 03:25:11PM +0300, Avi Kivity wrote: On 06/02/2010 03:19 PM, Joerg Roedel wrote: Yes. so you do: iommu = open ioctl(dev1, BIND, iommu) ioctl(dev2, BIND, iommu) ioctl(dev3, BIND, iommu) ioctl(dev4, BIND, iommu) No need to add a SHARE ioctl. In my proposal this looks like: dev1 = open(); ioctl(dev2, SHARE, dev1); ioctl(dev3, SHARE, dev1); ioctl(dev4, SHARE, dev1); So we actually save an ioctl. The problem with this is that it is assymetric, dev1 is treated differently from dev[234]. It's an unintuitive API. Its by far more unintuitive that a process needs to explicitly bind a device to an iommu domain before it can do anything with it. If its required anyway the binding can happen implicitly. We could allow to do a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry. Note that this way of handling userspace iommu mappings is also a lot simpler for most use-cases outside of KVM. If a developer wants to write a userspace driver all it needs to do is: dev = open(); ioctl(dev, MAP, ...); /* use device with mappings */ close(dev); Which is much easier than the need to create a domain explicitly. Joerg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 03:34:17PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 02:19:28PM +0200, Joerg Roedel wrote: you normally need device mapped to start DMA. SHARE makes this bug more likely as you allow switching domains: mmap could be done before switching. We need to support domain switching anyway for iommu emulation in a guest. So if you consider this to be a problem (I don't) it will not go away with your proposal. dev1 = open(); ioctl(dev2, SHARE, dev1); ioctl(dev3, SHARE, dev1); ioctl(dev4, SHARE, dev1); So we actually save an ioctl. I thought we had a BIND ioctl? I can't remember a BIND ioctl in my proposal. I remember an UNBIND, but thats bad naming as you pointed out below. See my statement on this below too. You undo SHARE with UNBIND? Thats bad naming, agreed. Lets keep UNSHARE. Point is, we only need one parameter to do this which removes any ambiguity: ioctl(dev1, UNSHARE); Joerg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 06/02/2010 03:50 PM, Joerg Roedel wrote: The problem with this is that it is assymetric, dev1 is treated differently from dev[234]. It's an unintuitive API. Its by far more unintuitive that a process needs to explicitly bind a device to an iommu domain before it can do anything with it. I don't really care about the iommu domain. It's a side effect. The kernel takes care of it. I'm only worried about the API. We have a memory map that is (often) the same for a set of devices. If you were coding a non-kernel interface, how would you code it? struct memory_map; void memory_map_init(struct memory_map *mm, ...); struct device; void device_set_memory_map(struct device *device, struct memory_map *mm); or struct device; void device_init_memory_map(struct device *dev, ...); void device_clone_memory_map(struct device *dev, struct device *other); I wouldn't even think of the second one personally. If its required anyway the binding can happen implicitly. We could allow to do a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry. It's still special. You define the memory map only for the first device. You have to make sure dev1 doesn't go away while sharing it. Note that this way of handling userspace iommu mappings is also a lot simpler for most use-cases outside of KVM. If a developer wants to write a userspace driver all it needs to do is: dev = open(); ioctl(dev, MAP, ...); /* use device with mappings */ close(dev); Which is much easier than the need to create a domain explicitly. mm = open() ioctl(mm, MAP, ...) dev = open(); ioctl(dev, BIND, mm); ... close(mm); close(dev); so yes, more work, but once you have multiple devices which come and go dynamically things become simpler. The map object has global lifetime (you can even construct it if you don't assign any devices), the devices attach to it, memory hotplug updates the memory map but doesn't touch devices. 
-- error compiling committee.c: too many arguments to function
[ kvm-Bugs-2963581 ] qemu-kvm upstream crashes when using -smp 1
Bugs item #2963581, was opened at 2010-03-04 18:10 Message generated for change (Comment added) made by You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2963581&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Closed Resolution: None Priority: 5 Private: No Submitted By: Lucas Meneghel Rodrigues () Assigned to: Nobody/Anonymous (nobody) Summary: qemu-kvm upstream crashes when using -smp 1 Initial Comment: qemu-kvm.git master is crashing when using -smp 1 Relevant versions: Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e) Kernel: 2.6.31.12-174.2.22.fc12.x86_64 Steps to reproduce 1 - Clone git repo git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git 2 - Build qemu-kvm from this repo 3 - Try to start it with -smp 1, reference command line: 03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command: /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file descriptor 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 Segmentation fault (core dumped) /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide 
-net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status 139) So we have a segmentation fault. -- Comment By: Lucas Meneghel Rodrigues () Date: 2010-06-02 13:08 Message: Bug resolved a few days after it was reported. Closing it... -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2963581&group_id=180599
[CFP] Bug Day 2010
Hi everyone, Today is our first QEMU Bug Day. The goal of Bug Day is to spread some bug love and make a dent in the various bug trackers. We've set up a special channel, #qemu-bugday on OFTC, that folks can use to coordinate the bug-hunting effort. Here are some ways you can participate in the Bug Day: - Fixing bugs - Marking bugs that require more information as Incomplete - Trying to reproduce existing bugs and marking them as Confirmed - Reporting bugs that haven't yet been reported - Migrating bugs from the SourceForge tracker to Launchpad More information can be found at http://wiki.qemu.org/BugDay/June2010 Happy bug hunting! Regards, Anthony Liguori
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 02:50:50PM +0200, Joerg Roedel wrote: On Wed, Jun 02, 2010 at 03:25:11PM +0300, Avi Kivity wrote: On 06/02/2010 03:19 PM, Joerg Roedel wrote: Yes. so you do: iommu = open ioctl(dev1, BIND, iommu) ioctl(dev2, BIND, iommu) ioctl(dev3, BIND, iommu) ioctl(dev4, BIND, iommu) No need to add a SHARE ioctl. In my proposal this looks like: dev1 = open(); ioctl(dev2, SHARE, dev1); ioctl(dev3, SHARE, dev1); ioctl(dev4, SHARE, dev1); So we actually save an ioctl. The problem with this is that it is assymetric, dev1 is treated differently from dev[234]. It's an unintuitive API. Its by far more unintuitive that a process needs to explicitly bind a device to an iommu domain before it can do anything with it. The reason it is more intuitive is because it is harder to get it wrong. If you swap iommu and device in the call, you get BADF so you know you made a mistake. We can even make it work both ways if we wanted to. With ioctl(dev1, BIND, dev2) it breaks silently. If its required anyway the binding can happen implicitly. We could allow to do a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry. And then when we assign meaning to it we find that half the apps are broken because they did not call this ioctl. Note that this way of handling userspace iommu mappings is also a lot simpler for most use-cases outside of KVM. If a developer wants to write a userspace driver all it needs to do is: dev = open(); ioctl(dev, MAP, ...); /* use device with mappings */ close(dev); Which is much easier than the need to create a domain explicitly. Joerg This simple scenario ignores all the real-life corner cases. For example, with an explicit iommu open and bind application can naturally detect that: - we have run out of iommu domains - iommu is unsupported - iommu is in use by another, incompatible device - device is in bad state because each is a separate operation, so it is easy to produce meaningful errors. 
Another interesting thing that a separate iommu device supports is when application A controls the iommu and application B controls the device. This might be good to e.g. improve security (B is run by root, A is unprivileged and passes commands to/from B over a pipe). This is not possible when the same fd is used for iommu and device. -- MST
Re: unhandled wrmsr
Andre Przywara wrote: Dave Young wrote: Hi, With today's git version (qemu-kvm), I got following message in kernel dmesg [168344.215605] kvm: 27289: cpu0 unhandled wrmsr: 0x198 data 0 Are you sure about that? 0x198 is an Intel architectural perfmon MSR and it is read-only. The Linux kernel source I grep'ed obeys this and does only rdmsr. You can work around this by changing the error to a warning with: # modprobe kvm ignore_msrs=1 I'd like to see more details about the guest Linux kernel, at least the version you used to see why there is a wrmsr on this address. Best is you provide the kernel (just the vmlinuz file) somewhere so that we can reproduce this. Also the qemu-kvm command line would be interesting. I see the same with Ubuntu 10.04 64-bit (2.6.32-21-server) cmdline: /usr/bin/qemu-kvm-devel -net tap,vlan=141,script=no,downscript=no,ifname=tap0 -net nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:73 -drive format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-c0c61b105-db8000e7e954bf65-migration-test-2,if=ide,boot=on,cache=none,aio=native -m 1024 -cpu qemu64,model_id='Intel(R) Xeon(R) CPU E5520 @ 2.27GHz',-kvmclock -monitor tcp:0:4001,server,nowait -vnc :1 -name 'migration-test-10-04' -boot order=dc,menu=on -k de -pidfile /var/run/qemu/vm-156.pid -mem-path /hugepages -mem-prealloc -rtc base=utc,clock=host -usb -usbdevice tablet Regards, Andre. bash-3.1$ /home/dave/tmp/qemu-system-x86_64 --version QEMU emulator version 0.12.50 (qemu-kvm-devel), Copyright (c) 2003-2008 Fabrice Bellard bash-3.1$ uname -a Linux darkstar 2.6.35-rc1 #35 SMP Mon May 31 16:50:15 CST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz GenuineIntel GNU/Linux host distribution is slackware64-13.0 0.12.3 works well BTW the guest is a tiny core linux image -- Mit freundlichen Grüßen/Kind Regards Peter Lieven .. KAMP Netzwerkdienste GmbH Vestische Str. 
89-91 | 46117 Oberhausen Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40 mailto:p...@kamp.de | http://www.kamp.de Geschäftsführer: Heiner Lante | Michael Lante Amtsgericht Duisburg | HRB Nr. 12154 USt-Id-Nr.: DE 120607556
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 04:06:21PM +0300, Avi Kivity wrote: On 06/02/2010 03:50 PM, Joerg Roedel wrote: It's by far more unintuitive that a process needs to explicitly bind a device to an iommu domain before it can do anything with it. I don't really care about the iommu domain. It's a side effect. The kernel takes care of it. I'm only worried about the API. The proposed memory-map object is nothing else than a userspace abstraction of an iommu domain. We have a memory map that is (often) the same for a set of devices. If you were coding a non-kernel interface, how would you code it?

struct memory_map;
void memory_map_init(struct memory_map *mm, ...);

struct device;
void device_set_memory_map(struct device *device, struct memory_map *mm);

or

struct device;
void device_init_memory_map(struct device *dev, ...);
void device_clone_memory_map(struct device *dev, struct device *other);

I wouldn't even think of the second one personally. Right, a kernel interface would be designed the first way. The IOMMU API is actually designed in this manner. But I still think we should keep it simpler for userspace. If it's required anyway, the binding can happen implicitly. We could allow a no-op 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry. It's still special. You define the memory map only for the first device. You have to make sure dev1 doesn't go away while sharing it. Must be a misunderstanding. In my proposal the domain is not owned by one device. It is owned by all devices that share it and will only vanish when all devices that use it are unbound (which happens when the file descriptor is closed, for example). so yes, more work, but once you have multiple devices which come and go dynamically things become simpler. The map object has global lifetime (you can even construct it before you assign any devices), the devices attach to it, memory hotplug updates the memory map but doesn't touch devices. I still think a userspace interface should be as simple as possible.
But since both ways will work, I am not really opposed to Michael's proposal. I just think it's overkill for the common non-kvm use case (a userspace device driver). Joerg
Re: [RFC PATCH v4 3/3] block: add sheepdog driver for distributed storage support
Am 28.05.2010 04:44, schrieb MORITA Kazutaka: Sheepdog is a distributed storage system for QEMU. It provides highly available block level storage volumes to VMs like Amazon EBS. This patch adds a qemu block driver for Sheepdog. Sheepdog features are: - No node in the cluster is special (no metadata node, no control node, etc) - Linear scalability in performance and capacity - No single point of failure - Autonomous management (zero configuration) - Useful volume management support such as snapshot and cloning - Thin provisioning - Autonomous load balancing The more details are available at the project site: http://www.osrg.net/sheepdog/ Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- Makefile.objs|2 +- block/sheepdog.c | 1835 ++ 2 files changed, 1836 insertions(+), 1 deletions(-) create mode 100644 block/sheepdog.c One general thing: The code uses some mix of spaces and tabs for indentation, with the greatest part using tabs. According to CODING_STYLE it should consistently use four spaces instead. diff --git a/Makefile.objs b/Makefile.objs index 1a942e5..527a754 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -14,7 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o -block-nested-y += parallels.o nbd.o blkdebug.o +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o block-nested-$(CONFIG_WIN32) += raw-win32.o block-nested-$(CONFIG_POSIX) += raw-posix.o block-nested-$(CONFIG_CURL) += curl.o diff --git a/block/sheepdog.c b/block/sheepdog.c new file mode 100644 index 000..68545e8 --- /dev/null +++ b/block/sheepdog.c @@ -0,0 +1,1835 @@ +/* + * Copyright (C) 2009-2010 Nippon Telegraph and Telephone Corporation. 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <netdb.h>
+#include <netinet/tcp.h>
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block_int.h"
+
+#define SD_PROTO_VER 0x01
+
+#define SD_DEFAULT_ADDR "localhost:7000"
+
+#define SD_OP_CREATE_AND_WRITE_OBJ 0x01
+#define SD_OP_READ_OBJ       0x02
+#define SD_OP_WRITE_OBJ      0x03
+
+#define SD_OP_NEW_VDI        0x11
+#define SD_OP_LOCK_VDI       0x12
+#define SD_OP_RELEASE_VDI    0x13
+#define SD_OP_GET_VDI_INFO   0x14
+#define SD_OP_READ_VDIS      0x15
+
+#define SD_FLAG_CMD_WRITE    0x01
+#define SD_FLAG_CMD_COW      0x02
+
+#define SD_RES_SUCCESS       0x00 /* Success */
+#define SD_RES_UNKNOWN       0x01 /* Unknown error */
+#define SD_RES_NO_OBJ        0x02 /* No object found */
+#define SD_RES_EIO           0x03 /* I/O error */
+#define SD_RES_VDI_EXIST     0x04 /* Vdi exists already */
+#define SD_RES_INVALID_PARMS 0x05 /* Invalid parameters */
+#define SD_RES_SYSTEM_ERROR  0x06 /* System error */
+#define SD_RES_VDI_LOCKED    0x07 /* Vdi is locked */
+#define SD_RES_NO_VDI        0x08 /* No vdi found */
+#define SD_RES_NO_BASE_VDI   0x09 /* No base vdi found */
+#define SD_RES_VDI_READ      0x0A /* Cannot read requested vdi */
+#define SD_RES_VDI_WRITE     0x0B /* Cannot write requested vdi */
+#define SD_RES_BASE_VDI_READ 0x0C /* Cannot read base vdi */
+#define SD_RES_BASE_VDI_WRITE 0x0D /* Cannot write base vdi */
+#define SD_RES_NO_TAG        0x0E /* Requested tag is not found */
+#define SD_RES_STARTUP       0x0F /* Sheepdog is on starting up */
+#define SD_RES_VDI_NOT_LOCKED 0x10 /* Vdi is not locked */
+#define SD_RES_SHUTDOWN      0x11 /* Sheepdog is shutting down */
+#define SD_RES_NO_MEM        0x12 /* Cannot allocate memory */
+#define SD_RES_FULL_VDI      0x13 /* we already have the maximum vdis */
+#define SD_RES_VER_MISMATCH  0x14 /* Protocol version mismatch */
+#define SD_RES_NO_SPACE      0x15 /* Server has no room for new objects */
+#define SD_RES_WAIT_FOR_FORMAT 0x16 /* Sheepdog is waiting for a format operation */
+#define SD_RES_WAIT_FOR_JOIN 0x17 /* Sheepdog is waiting for other nodes joining */
+#define SD_RES_JOIN_FAILED   0x18 /* Target node had failed to join sheepdog */
+
+/*
+ * Object ID rules
+ *
+ *  0 - 19 (20 bits): data object space
+ * 20 - 31 (12 bits): reserved data object space
+ * 32 - 55 (24 bits): vdi object space
+ * 56 - 59 ( 4 bits): reserved vdi object space
+ * 60 - 63 ( 4 bits): object type identifier space
+ */
+
+#define VDI_SPACE_SHIFT 32
+#define VDI_BIT (UINT64_C(1) << 63)
+#define VMSTATE_BIT (UINT64_C(1) << 62)
+#define MAX_DATA_OBJS (1ULL << 20)
+#define MAX_CHILDREN 1024
+#define
KVM: read apic->irr with ioapic lock held
Read ioapic->irr inside ioapic->lock protected section. KVM-Stable-Tag Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 3bc4fdb..1149c60 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -193,12 +193,13 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
 {
-	u32 old_irr = ioapic->irr;
+	u32 old_irr;
 	u32 mask = 1 << irq;
 	union kvm_ioapic_redirect_entry entry;
 	int ret = 1;
 
 	spin_lock(&ioapic->lock);
+	old_irr = ioapic->irr;
 	if (irq >= 0 && irq < IOAPIC_NUM_PINS) {
 		entry = ioapic->redirtbl[irq];
 		level ^= entry.fields.polarity;
porting fixes regarding kvm-clock and lost irqs to stable qemu-kvm 0.12.4
Hi, I would like to get the latest stable qemu-kvm (0.12.4) to a usable state regarding live migration. The problems are fixed in git, but there is so much new stuff that has not been extensively tested, and therefore I would like to stay at 0.12.4 at the moment. I would appreciate your help regarding 2 bugs: a) -cpu xxx,-kvmclock doesn't work in 0.12.4. It started working only after applying a patchset from Avi that is not even in git yet. What needs to be done to get it working in 0.12.4? b) There is a bug in 0.12.4 which leads to lost irqs. It was reported appearing in virtio_blk (#584131) and e1000 (#585113). It is finally fixed in git. Can someone give me a hint which commit fixed this? Thanks, Peter
[PATCH] make-release: make mtime, owner, group consistent
Files from git have modification time set to that of the commit, and owner/group to root. Making it so for generated files as well makes it easier to generate an identical tarball from git. Setting owner/group to root is especially important because otherwise you must have a user/group with the same name to generate an identical tarball. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- kvm/scripts/make-release | 9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
index fdc402b..e8051f4 100755
--- a/kvm/scripts/make-release
+++ b/kvm/scripts/make-release
@@ -1,7 +1,7 @@
 #!/bin/bash -e
 
 usage() {
-    echo "usage: $0 [--upload] [--formal] commit [name] [tarball]"
+    echo "usage: $0 [--upload] [--formal] commit [name] [tarball] [user]"
     exit 1
 }
 
@@ -51,17 +51,22 @@
 cd $(dirname $0)/../..
 mkdir -p $(dirname $tarball)
 git archive --prefix=$name/ --format=tar $commit > $tarball
+mtime=`git show --format=%ct $commit --`
+tarargs="--owner=root --group=root --mtime=@$mtime"
+
 mkdir -p $tmpdir
 git cat-file -p ${commit}:roms | awk ' { print $4, $3 } ' \
     > $tmpdir/EXTERNAL_DEPENDENCIES
 tar -rf $tarball --transform s,^,$name/, -C $tmpdir \
+    $tarargs \
     EXTERNAL_DEPENDENCIES
 rm -rf $tmpdir
 
 if [[ -n $formal ]]; then
     mkdir -p $tmpdir
     echo $name > $tmpdir/KVM_VERSION
-    tar -rf $tarball --transform s,^,$name/, -C $tmpdir KVM_VERSION
+    tar -rf $tarball --transform s,^,$name/, -C $tmpdir KVM_VERSION \
+        $tarargs
     rm -rf $tmpdir
 fi
-- 
1.7.1.12.g42b7f
[PATCH] Fix bugs in msix_set/unset_mask_notifier() routines
I am hitting the following assertions in msix.c when doing a guest reboot or live migration using vhost.

qemu-kvm/hw/msix.c:375: msix_mask_all: Assertion `r >= 0' failed.
qemu-kvm/hw/msix.c:640: msix_unset_mask_notifier: Assertion `dev->msix_mask_notifier_opaque[vector]' failed.

The following patch fixes the bugs in handling the msix_is_masked() condition in the msix_set/unset_mask_notifier() routines.

Signed-off-by: Sridhar Samudrala s...@us.ibm.com

diff --git a/hw/msix.c b/hw/msix.c
index 1398680..a191df1 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -609,7 +609,7 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 {
-    int r;
+    int r = 0;
 
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
@@ -619,13 +619,15 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 
     /* Unmask the new notifier unless vector is masked. */
     if (msix_is_masked(dev, vector)) {
-        return 0;
+        goto out;
     }
     r = dev->msix_mask_notifier(dev, vector, opaque,
                                 msix_is_masked(dev, vector));
     if (r < 0) {
         return r;
     }
+
+out:
     dev->msix_mask_notifier_opaque[vector] = opaque;
     return r;
 }
@@ -640,8 +642,8 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
 
     assert(dev->msix_mask_notifier_opaque[vector]);
 
     /* Mask the old notifier unless it is already masked. */
-    if (msix_is_masked(dev, vector)) {
-        return 0;
+    if (!msix_is_masked(dev, vector)) {
+        goto out;
     }
     r = dev->msix_mask_notifier(dev, vector,
                                 dev->msix_mask_notifier_opaque[vector],
@@ -649,6 +651,8 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
     if (r < 0) {
         return r;
     }
+
+out:
     dev->msix_mask_notifier_opaque[vector] = NULL;
     return r;
 }
Re: [PATCH 0/5] Use more cpuid bits from upstream
On Tue, Jun 01, 2010 at 03:33:42PM +0300, Avi Kivity wrote: This patchset converts kvm_arch_vcpu_init()'s cpuid handling bits to use upstream code. Avi Kivity (5): Make get_para_features() similar to upstream; Use get_para_features() from upstream; Rename kvm_arch_vcpu_init()'s cenv argument to env; Use skeleton of upstream's kvm_arch_init_vcpu(); Use upstream's kvm_arch_init_vcpu()'s cpuid bits. qemu-kvm-x86.c | 148 +++-- target-i386/kvm.c | 15 - 2 files changed, 20 insertions(+), 143 deletions(-) Applied, thanks.
Re: [PATCH v4] Add Documentation/kvm/msr.txt
On Tue, Jun 01, 2010 at 08:22:48AM -0400, Glauber Costa wrote: This patch adds a file that documents the usage of KVM-specific MSRs. [ v2: added comments from Randy ] [ v3: added comments from Avi ] [ v4: added information about wallclock alignment ] Signed-off-by: Glauber Costa glom...@redhat.com Reviewed-by: Randy Dunlap randy.dun...@oracle.com Applied, thanks.
Re: [PATCH][RESEND] VMX: Properly return error to userspace on vmentry failure
On Mon, May 31, 2010 at 10:40:54PM +0300, Mohammed Gamal wrote: The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler for vmentry failures. This intercepts vmentry failures and returns KVM_FAIL_ENTRY to userspace instead. Signed-off-by: Mohammed Gamal m.gamal...@gmail.com Applied, thanks.
Re: [PATCH] KVM: PPC: elide struct thread_struct instances from stack
On Mon, May 31, 2010 at 09:59:13PM +0200, Andreas Schwab wrote: Instead of instantiating a whole thread_struct on the stack use only the required parts of it. Signed-off-by: Andreas Schwab sch...@linux-m68k.org Tested-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_fpu.h | 27 + arch/powerpc/kernel/ppc_ksyms.c | 4 - arch/powerpc/kvm/book3s.c | 49 +--- arch/powerpc/kvm/book3s_paired_singles.c | 94 -- arch/powerpc/kvm/fpu.S | 18 ++ 5 files changed, 97 insertions(+), 95 deletions(-) Applied, thanks.
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
* Avi Kivity (a...@redhat.com) wrote: The interface would only work for clients which support it: kvm, vhost, and iommu/devices with restartable dma. BTW, there is no such thing as restartable dma. There is a provision in new specs (read: no real hardware) that allows a device to request pages before using them. So it's akin to demand paging, but the demand is an explicit request rather than a page fault. thanks, -chris
[PATCH] acpi_piix4: save gpe and pci hotplug slot status
PCI hotplug currently doesn't work after a migration because we don't migrate the enable bits of the GPE state. Pull hotplug structs into vmstate.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/acpi_piix4.c | 29 -
 1 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 0fce958..091cdcd 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -282,9 +282,33 @@ static int vmstate_acpi_post_load(void *opaque, int version_id)
     return 0;
 }
 
+static const VMStateDescription vmstate_gpe = {
+    .name = "gpe",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT16(sts, struct gpe_regs),
+        VMSTATE_UINT16(en, struct gpe_regs),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_pci_status = {
+    .name = "pci_status",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT32(up, struct pci_status),
+        VMSTATE_UINT32(down, struct pci_status),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_acpi = {
     .name = "piix4_pm",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .post_load = vmstate_acpi_post_load,
@@ -296,6 +320,9 @@
         VMSTATE_STRUCT(apm, PIIX4PMState, 0, vmstate_apm, APMState),
         VMSTATE_TIMER(tmr_timer, PIIX4PMState),
         VMSTATE_INT64(tmr_overflow_time, PIIX4PMState),
+        VMSTATE_STRUCT(gpe, PIIX4PMState, 2, vmstate_gpe, struct gpe_regs),
+        VMSTATE_STRUCT(pci0_status, PIIX4PMState, 2, vmstate_pci_status,
+                       struct pci_status),
         VMSTATE_END_OF_LIST()
     }
 };
Re: [PATCH] Fix bugs in msix_set/unset_mask_notifier() routines
On Wed, Jun 02, 2010 at 09:09:37AM -0700, Sridhar Samudrala wrote: I am hitting the following assertions in msix.c when doing a guest reboot or live migration using vhost. qemu-kvm/hw/msix.c:375: msix_mask_all: Assertion `r >= 0' failed. qemu-kvm/hw/msix.c:640: msix_unset_mask_notifier: Assertion `dev->msix_mask_notifier_opaque[vector]' failed. The following patch fixes the bugs in handling msix_is_masked() condition in msix_set/unset_mask_notifier() routines. Signed-off-by: Sridhar Samudrala s...@us.ibm.com Acked-by: Michael S. Tsirkin m...@redhat.com Note: this is a qemu-kvm-only patch.
Re: [PATCH] qemu-kvm tests: make hypercall test runable on 32-bit host
On Tue, Jun 01, 2010 at 10:02:34PM +0800, Asias He wrote: Signed-off-by: Asias He asias.he...@gmail.com --- kvm/test/config-x86-common.mak | 5 +++-- kvm/test/config-x86_64.mak | 3 +-- 2 files changed, 4 insertions(+), 4 deletions(-) Applied, thanks.
Re: [PATCH] pc: push setting default cpu_model down a level
On Tue, Jun 01, 2010 at 05:03:00PM -0600, Alex Williamson wrote: Not that CPU hotplug currently works, but if you make the mistake of trying it on a VM started without specifying a -cpu value, you hit a segfault from trying to strdup(NULL) in cpu_x86_find_by_name(). Signed-off-by: Alex Williamson alex.william...@redhat.com Applied, thanks.
Re: [PATCH] kvm: cleanup, fix typing
On Wed, Jun 02, 2010 at 05:01:23PM +0800, Lai Jiangshan wrote: The type of '*new.rmap' is not 'struct page *', fix it Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com Applied this and uaccess cleanup, thanks.
Re: [PATCH] KVM: VMX: Enforce EPT pagetable level checking
On Wed, Jun 02, 2010 at 02:05:24PM +0800, Sheng Yang wrote: We only support 4 levels EPT pagetable now. Signed-off-by: Sheng Yang sh...@linux.intel.com --- arch/x86/kvm/vmx.c | 8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) Applied, thanks.
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
* Joerg Roedel (j...@8bytes.org) wrote: On Wed, Jun 02, 2010 at 02:21:00PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote: Even if it is bound to a domain the userspace driver could program the device to do dma to unmapped regions causing io-page-faults. The kernel can't do anything about it. It can always corrupt its own memory directly as well :) But that is not a reason not to detect errors if we can, and not to make APIs hard to misuse. Changing the domain of a device while dma can happen is the same type of bug as unmapping potential dma target addresses. We can't catch this kind of misuse. With 10 devices you have 10 extra ioctls. And this works implicitly with your proposal? Yes. so you do:

iommu = open
ioctl(dev1, BIND, iommu)
ioctl(dev2, BIND, iommu)
ioctl(dev3, BIND, iommu)
ioctl(dev4, BIND, iommu)

No need to add a SHARE ioctl. In my proposal this looks like:

dev1 = open();
ioctl(dev2, SHARE, dev1);
ioctl(dev3, SHARE, dev1);
ioctl(dev4, SHARE, dev1);

So we actually save an ioctl. This is not any hot path, so saving an ioctl shouldn't be a consideration. Only important consideration is a good API. I may have lost context here, but the SHARE API is limited to the vfio fd. The BIND API expects a new iommu object. Are there other uses for this object? Tom's current vfio driver exposes a dma mapping interface, would the iommu object expose one as well? Current interface is device specific DMA interface for host device drivers typically mapping in-flight dma buffers, and IOMMU specific interface for assigned devices typically mapping entire virtual address space. thanks, -chris
Re: [PATCH] Fix bugs in msix_set/unset_mask_notifier() routines
On Wed, Jun 02, 2010 at 09:09:37AM -0700, Sridhar Samudrala wrote: I am hitting the following assertions in msix.c when doing a guest reboot or live migration using vhost. qemu-kvm/hw/msix.c:375: msix_mask_all: Assertion `r >= 0' failed. qemu-kvm/hw/msix.c:640: msix_unset_mask_notifier: Assertion `dev->msix_mask_notifier_opaque[vector]' failed. The following patch fixes the bugs in handling msix_is_masked() condition in msix_set/unset_mask_notifier() routines. Signed-off-by: Sridhar Samudrala s...@us.ibm.com

I take back the ACK.

diff --git a/hw/msix.c b/hw/msix.c
index 1398680..a191df1 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -609,7 +609,7 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 {
-    int r;
+    int r = 0;
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
@@ -619,13 +619,15 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
     /* Unmask the new notifier unless vector is masked. */
     if (msix_is_masked(dev, vector)) {
-        return 0;
+        goto out;
     }
     r = dev->msix_mask_notifier(dev, vector, opaque,
                                 msix_is_masked(dev, vector));
     if (r < 0) {
         return r;
     }
+
+out:
     dev->msix_mask_notifier_opaque[vector] = opaque;
     return r;
 }
@@ -640,8 +642,8 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
     assert(dev->msix_mask_notifier_opaque[vector]);
     /* Mask the old notifier unless it is already masked. */
-    if (msix_is_masked(dev, vector)) {
-        return 0;
+    if (!msix_is_masked(dev, vector)) {
+        goto out;
     }

Wait a second, the logic is reverted here.

     r = dev->msix_mask_notifier(dev, vector,
                                 dev->msix_mask_notifier_opaque[vector],
@@ -649,6 +651,8 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
     if (r < 0) {
         return r;
     }
+
+out:
     dev->msix_mask_notifier_opaque[vector] = NULL;
     return r;
 }

I find the gotos confusing. I'll post a patch without.
[PATCH] msix: fix msix_set/unset_mask_notifier
Sridhar Samudrala reported hitting the following assertions in msix.c when doing a guest reboot or live migration using vhost.

qemu-kvm/hw/msix.c:375: msix_mask_all: Assertion `r >= 0' failed.
qemu-kvm/hw/msix.c:640: msix_unset_mask_notifier: Assertion `dev->msix_mask_notifier_opaque[vector]' failed.

The issue is that we didn't clear/set the opaque pointer when the vector is masked. The following patch fixes this.

Signed-off-by: Sridhar Samudrala s...@us.ibm.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Sridhar, could you test the following please?

 hw/msix.c | 33 -
 1 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index b400769..24ff6ae 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -610,7 +610,7 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 {
-    int r;
+    int r = 0;
 
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
@@ -619,13 +619,11 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 
     assert(!dev->msix_mask_notifier_opaque[vector]);
 
     /* Unmask the new notifier unless vector is masked. */
-    if (msix_is_masked(dev, vector)) {
-        return 0;
-    }
-    r = dev->msix_mask_notifier(dev, vector, opaque,
-                                msix_is_masked(dev, vector));
-    if (r < 0) {
-        return r;
+    if (!msix_is_masked(dev, vector)) {
+        r = dev->msix_mask_notifier(dev, vector, opaque, false);
+        if (r < 0) {
+            return r;
+        }
     }
     dev->msix_mask_notifier_opaque[vector] = opaque;
     return r;
@@ -634,21 +632,21 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
 {
     int r = 0;
+    void *opaque;
 
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
 
+    opaque = dev->msix_mask_notifier_opaque[vector];
+
     assert(dev->msix_mask_notifier);
-    assert(dev->msix_mask_notifier_opaque[vector]);
+    assert(opaque);
 
     /* Mask the old notifier unless it is already masked. */
-    if (msix_is_masked(dev, vector)) {
-        return 0;
-    }
-    r = dev->msix_mask_notifier(dev, vector,
-                                dev->msix_mask_notifier_opaque[vector],
-                                !msix_is_masked(dev, vector));
-    if (r < 0) {
-        return r;
+    if (!msix_is_masked(dev, vector)) {
+        r = dev->msix_mask_notifier(dev, vector, opaque, true);
+        if (r < 0) {
+            return r;
+        }
     }
     dev->msix_mask_notifier_opaque[vector] = NULL;
     return r;
-- 
1.7.1.12.g42b7f
[PATCHv2] virtio-net: stop vhost backend on vmstop
vhost net currently keeps running after vmstop, which causes trouble as qemu does not check for dirty pages anymore. The fix is to simply keep the vm and vhost running/stopped status in sync.

Tested-by: David L Stevens dlstev...@us.ibm.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Changes from v1: Simplify code as suggested by Amit.

 hw/virtio-net.c | 11 +--
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 28f26e1..67eebcf 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -877,12 +877,11 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 static void virtio_net_vmstate_change(void *opaque, int running, int reason)
 {
     VirtIONet *n = opaque;
-    if (!running) {
-        return;
-    }
-    /* This is called when vm is started, it will start vhost backend if
-     * appropriate e.g. after migration. */
-    virtio_net_set_status(&n->vdev, n->vdev.status);
+    uint8_t status = running ? n->vdev.status : 0;
+    /* This is called when vm is started/stopped,
+     * it will start/stop vhost backend if appropriate
+     * e.g. after migration. */
+    virtio_net_set_status(&n->vdev, status);
 }
 
 VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
-- 
1.7.1.12.g42b7f
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wednesday 02 June 2010 10:46:15 am Chris Wright wrote: Current interface is device specific DMA interface for host device drivers typically mapping in-flight dma buffers, and IOMMU specific interface for assigned devices typically mapping entire virtual address space. Actually, it's a domain object - which may be usable among iommus (Joerg?). However, you can't really do the dma mapping with just the domain because every device supports a different size address space as a master, i.e., the dma_mask.
And I don't know how kvm would deal with devices with varying dma mask support, or why they'd be in the same domain.
[PATCH] KVM test: Fixing size of NTFS partitions on windows unattended installs
While running tests on WinVista SP2 64 bits, I noticed that, even though we reserved 20GB of disk space on images, the unattended files are allocating 15GB of the effective space, causing the SP2 64 bit install to fail. Fix that, making sure all primary disks use 20GB. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- .../kvm/unattended/win2008-32-autounattend.xml | 2 +- .../kvm/unattended/win2008-64-autounattend.xml | 2 +- .../kvm/unattended/win2008-r2-autounattend.xml | 2 +- .../tests/kvm/unattended/win7-32-autounattend.xml | 2 +- .../tests/kvm/unattended/win7-64-autounattend.xml | 2 +- .../kvm/unattended/winvista-32-autounattend.xml | 2 +- .../kvm/unattended/winvista-64-autounattend.xml | 2 +- 7 files changed, 7 insertions(+), 7 deletions(-) diff --git a/client/tests/kvm/unattended/win2008-32-autounattend.xml b/client/tests/kvm/unattended/win2008-32-autounattend.xml index c6fafac..44a9fc4 100644 --- a/client/tests/kvm/unattended/win2008-32-autounattend.xml +++ b/client/tests/kvm/unattended/win2008-32-autounattend.xml @@ -25,7 +25,7 @@ <CreatePartitions> <CreatePartition wcm:action="add"> <Order>1</Order> - <Size>15000</Size> + <Size>20000</Size> <Type>Primary</Type> </CreatePartition> </CreatePartitions> diff --git a/client/tests/kvm/unattended/win2008-64-autounattend.xml b/client/tests/kvm/unattended/win2008-64-autounattend.xml index 2520a7a..ea0a524 100644 --- a/client/tests/kvm/unattended/win2008-64-autounattend.xml +++ b/client/tests/kvm/unattended/win2008-64-autounattend.xml @@ -12,7 +12,7 @@ <CreatePartitions> <CreatePartition wcm:action="add"> <Order>1</Order> - <Size>15000</Size> + <Size>20000</Size> <Type>Primary</Type> </CreatePartition> </CreatePartitions> diff --git a/client/tests/kvm/unattended/win2008-r2-autounattend.xml b/client/tests/kvm/unattended/win2008-r2-autounattend.xml index 2520a7a..ea0a524 100644 --- a/client/tests/kvm/unattended/win2008-r2-autounattend.xml +++ b/client/tests/kvm/unattended/win2008-r2-autounattend.xml @@ -12,7 +12,7 @@ <CreatePartitions> <CreatePartition wcm:action="add"> <Order>1</Order> - <Size>15000</Size> + <Size>20000</Size> <Type>Primary</Type> </CreatePartition> </CreatePartitions> diff --git a/client/tests/kvm/unattended/win7-32-autounattend.xml b/client/tests/kvm/unattended/win7-32-autounattend.xml index c37afb7..a577a91 100644 --- a/client/tests/kvm/unattended/win7-32-autounattend.xml +++ b/client/tests/kvm/unattended/win7-32-autounattend.xml @@ -26,7 +26,7 @@ <CreatePartitions> <CreatePartition wcm:action="add"> <Order>1</Order> - <Size>15000</Size> + <Size>20000</Size> <Type>Primary</Type> </CreatePartition> </CreatePartitions> diff --git a/client/tests/kvm/unattended/win7-64-autounattend.xml b/client/tests/kvm/unattended/win7-64-autounattend.xml index ad047d0..fec8017 100644 --- a/client/tests/kvm/unattended/win7-64-autounattend.xml +++ b/client/tests/kvm/unattended/win7-64-autounattend.xml @@ -26,7 +26,7 @@ <CreatePartitions> <CreatePartition wcm:action="add"> <Order>1</Order> - <Size>15000</Size> + <Size>20000</Size> <Type>Primary</Type> </CreatePartition>
Re: [PATCH] msix: fix msix_set/unset_mask_notifier
On Wed, 2010-06-02 at 20:49 +0300, Michael S. Tsirkin wrote: Sridhar Samudrala reported hitting the following assertions in msix.c when doing a guest reboot or live migration using vhost. qemu-kvm/hw/msix.c:375: msix_mask_all: Assertion `r >= 0' failed. qemu-kvm/hw/msix.c:640: msix_unset_mask_notifier: Assertion `dev->msix_mask_notifier_opaque[vector]' failed. The issue is that we didn't clear/set the opaque pointer when the vector is masked. The following patch fixes this. Signed-off-by: Sridhar Samudrala s...@us.ibm.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Sridhar, could you test the following please? Yes, looks good and works fine for me. Tested-by: Sridhar Samudrala s...@us.ibm.com hw/msix.c | 33 - 1 files changed, 16 insertions(+), 17 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index b400769..24ff6ae 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -610,7 +610,7 @@ void msix_unuse_all_vectors(PCIDevice *dev) int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) { -int r; +int r = 0; if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) return 0; @@ -619,13 +619,11 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) assert(!dev->msix_mask_notifier_opaque[vector]); /* Unmask the new notifier unless vector is masked. */ -if (msix_is_masked(dev, vector)) { -return 0; -} -r = dev->msix_mask_notifier(dev, vector, opaque, -msix_is_masked(dev, vector)); -if (r < 0) { -return r; +if (!msix_is_masked(dev, vector)) { +r = dev->msix_mask_notifier(dev, vector, opaque, false); +if (r < 0) { +return r; +} } dev->msix_mask_notifier_opaque[vector] = opaque; return r; @@ -634,21 +632,21 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector) { int r = 0; +void *opaque; if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) return 0; +opaque = dev->msix_mask_notifier_opaque[vector]; + assert(dev->msix_mask_notifier); -assert(dev->msix_mask_notifier_opaque[vector]); +assert(opaque); /* Mask the old notifier unless it is already masked. */ -if (msix_is_masked(dev, vector)) { -return 0; -} -r = dev->msix_mask_notifier(dev, vector, -dev->msix_mask_notifier_opaque[vector], -!msix_is_masked(dev, vector)); -if (r < 0) { -return r; +if (!msix_is_masked(dev, vector)) { +r = dev->msix_mask_notifier(dev, vector, opaque, true); +if (r < 0) { +return r; +} } dev->msix_mask_notifier_opaque[vector] = NULL; return r;
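The essence of the fix above is that the notifier callback should only fire for unmasked vectors, but the opaque pointer must be recorded (or cleared) either way; the old code early-returned on masked vectors and so skipped the bookkeeping. A toy standalone model of the corrected set path (names and the struct are illustrative, not the qemu-kvm API):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for one MSI-X vector's notifier state. */
struct vec { int masked; void *opaque; };

static int notify_calls;
static int notify(void *opaque, int masked)
{
    (void)opaque; (void)masked;
    notify_calls++;
    return 0;
}

/* Modeled on the patched msix_set_mask_notifier(): notify only if the
 * vector is unmasked, but record the opaque pointer unconditionally --
 * recording it only on the unmasked path was the bug. */
static int set_mask_notifier(struct vec *v, void *opaque)
{
    int r = 0;
    if (!v->masked) {
        r = notify(opaque, 0);
        if (r < 0)
            return r;
    }
    v->opaque = opaque;   /* must happen even when masked */
    return r;
}
```

With this shape, a later unset on a masked vector finds the opaque pointer it asserts on, which is exactly the assertion Sridhar hit.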
[PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
Replace vhost_workqueue with per-vhost kthread. Other than the callback argument change from struct work_struct * to struct vhost_work *, there's no visible change to the vhost_poll_*() interface. This conversion is to make each vhost use a dedicated kthread so that resource control via cgroup can be applied. Partially based on Sridhar Samudrala's patch. * Updated to use sub structure vhost_work instead of directly using vhost_poll at Michael's suggestion. * Added flusher wake_up() optimization at Michael's suggestion. Signed-off-by: Tejun Heo t...@kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Sridhar Samudrala samudrala.srid...@gmail.com --- Okay, just tested it. dev->work_lock had to be updated to use irq operations, but other than that it worked just fine. Copied a large file using scp and it seems to perform pretty well, although I don't have any reference for comparison. So, here's the updated version with the sign off. Thanks. drivers/vhost/net.c | 56 ++--- drivers/vhost/vhost.c | 111 ++ drivers/vhost/vhost.h | 38 +++-- 3 files changed, 134 insertions(+), 71 deletions(-) Index: work/drivers/vhost/net.c === --- work.orig/drivers/vhost/net.c +++ work/drivers/vhost/net.c @@ -294,54 +294,58 @@ static void handle_rx(struct vhost_net * unuse_mm(net->dev.mm); } -static void handle_tx_kick(struct work_struct *work) +static void handle_tx_kick(struct vhost_work *work) { - struct vhost_virtqueue *vq; - struct vhost_net *net; - vq = container_of(work, struct vhost_virtqueue, poll.work); - net = container_of(vq->dev, struct vhost_net, dev); + struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue, + poll.work); + struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev); + handle_tx(net); } -static void handle_rx_kick(struct work_struct *work) +static void handle_rx_kick(struct vhost_work *work) { - struct vhost_virtqueue *vq; - struct vhost_net *net; - vq = container_of(work, struct vhost_virtqueue, poll.work); - net = container_of(vq->dev, struct vhost_net, dev); + struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue, + poll.work); + struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev); + handle_rx(net); } -static void handle_tx_net(struct work_struct *work) +static void handle_tx_net(struct vhost_work *work) { - struct vhost_net *net; - net = container_of(work, struct vhost_net, poll[VHOST_NET_VQ_TX].work); + struct vhost_net *net = container_of(work, struct vhost_net, +poll[VHOST_NET_VQ_TX].work); handle_tx(net); } -static void handle_rx_net(struct work_struct *work) +static void handle_rx_net(struct vhost_work *work) { - struct vhost_net *net; - net = container_of(work, struct vhost_net, poll[VHOST_NET_VQ_RX].work); + struct vhost_net *net = container_of(work, struct vhost_net, +poll[VHOST_NET_VQ_RX].work); handle_rx(net); } static int vhost_net_open(struct inode *inode, struct file *f) { struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL); + struct vhost_dev *dev; int r; + if (!n) return -ENOMEM; + + dev = &n->dev; n->vqs[VHOST_NET_VQ_TX].handle_kick = handle_tx_kick; n->vqs[VHOST_NET_VQ_RX].handle_kick = handle_rx_kick; - r = vhost_dev_init(&n->dev, n->vqs, VHOST_NET_VQ_MAX); + r = vhost_dev_init(dev, n->vqs, VHOST_NET_VQ_MAX); if (r < 0) { kfree(n); return r; } - vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT); - vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN); + vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev); + vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev); n->tx_poll_state = VHOST_NET_POLL_DISABLED; f->private_data = n; @@ -644,25 +648,13 @@ static struct miscdevice vhost_net_misc static int vhost_net_init(void) { - int r = vhost_init(); - if (r) - goto err_init; - r = misc_register(&vhost_net_misc); - if (r) - goto err_reg; - return 0; -err_reg: - vhost_cleanup(); -err_init: - return r; - + return misc_register(&vhost_net_misc); } module_init(vhost_net_init); static void vhost_net_exit(void) { misc_deregister(&vhost_net_misc); - vhost_cleanup(); } module_exit(vhost_net_exit); Index: work/drivers/vhost/vhost.c === ---
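Most of the churn in the diff above is container_of() chains: a vhost_work pointer handed to the worker is turned back into its enclosing virtqueue, and from there into the device. A self-contained model of that pointer recovery (struct layouts simplified; the real vhost structs carry much more state):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the embedded-struct layout used by vhost. */
struct vhost_work { int pending; };
struct vhost_virtqueue {
    int id;
    struct { struct vhost_work work; } poll;  /* poll embeds the work item */
};
struct vhost_net_model { struct vhost_virtqueue vqs[2]; };

/* Same trick as the kernel's container_of(): subtract the member's
 * offset from the member's address to recover the enclosing object. */
#define my_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Analogue of handle_tx_kick()'s first container_of() step. */
static int vq_id_from_work(struct vhost_work *work)
{
    struct vhost_virtqueue *vq =
        my_container_of(work, struct vhost_virtqueue, poll.work);
    return vq->id;
}
```

Because the work item is embedded rather than heap-allocated, the worker callback needs no extra back-pointer; the offset arithmetic is resolved entirely at compile time.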
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Wed, Jun 02, 2010 at 11:09:17AM -0700, Tom Lyon wrote: On Wednesday 02 June 2010 10:46:15 am Chris Wright wrote: This is not any hot path, so saving an ioctl shouldn't be a consideration. The only important consideration is a good API. I may have lost context here, but the SHARE API is limited to the vfio fd. The BIND API expects a new iommu object. Are there other uses for this object? Tom's current vfio driver exposes a dma mapping interface, would the iommu object expose one as well? Current interface is device specific DMA interface for host device drivers typically mapping in-flight dma buffers, and IOMMU specific interface for assigned devices typically mapping entire virtual address space. Actually, it is a domain object - which may be usable among iommus (Joerg?). Yes, this 'iommu' thing would be a domain object. But sharing among iommus is not necessary because the fact that there are multiple iommus in the system is hidden by the iommu drivers. This fact is not even exposed by the iommu-api. This makes protection domains system global. However, you can't really do the dma mapping with just the domain because every device supports a different size address space as a master, i.e., the dma_mask. The dma_mask has to be handled by the device driver. With the iommu-mapping interface the driver can specify the target io-address and has to consider the dma_mask for that too. And I don't know how kvm would deal with devices with varying dma mask support, or why they'd be in the same domain. KVM does not care about these masks. This is the business of the guest device drivers. Joerg
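The constraint Joerg describes - the domain can map anywhere, but the driver must still pick io-addresses the device can reach as a bus master - boils down to a range check against the dma_mask. A sketch of that check (names are illustrative, not from the vfio patch):

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if a mapping of `size` bytes at io-address `iova`
 * is fully reachable by a master whose addressing limit is `dma_mask`
 * (an inclusive maximum address, e.g. 0xffffffff for a 32-bit device). */
static int iova_ok(uint64_t iova, uint64_t size, uint64_t dma_mask)
{
    if (size == 0)
        return 0;
    if (iova + size - 1 < iova)        /* wraparound past 2^64 */
        return 0;
    return (iova + size - 1) <= dma_mask;
}
```

This is why the check belongs in the driver choosing the io-address rather than in the domain object: two devices sharing one domain can have different masks, so the same mapping may be fine for one master and unreachable for the other.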
Re: [Qemu-devel] Re: [PATCHv2-RFC 0/2] virtio: put last seen used index into ring itself
On 05/31/10 09:46, Rusty Russell wrote: On Thu, 27 May 2010 05:20:35 am Michael S. Tsirkin wrote: Here's a rewrite of the original patch with a new layout. I haven't tested it yet so no idea how this performs, but I think this addresses the cache bounce issue raised by Avi. Posting for early flames/comments. Sorry, not without some evidence that it'll actually reduce cacheline bouncing. I *think* it will, but it's not obvious: the host may keep looking at avail_idx as we're updating last_seen. Or does qemu always look at both together anyway? Can someone convince me this is a win? Rusty. Hi Rusty, I ran some tests using the vring index publish patch with virtio_blk. The numbers are based on running IOZone on a ramdisk passed to the guest via virtio. While I didn't see any throughput improvements, I saw a 20-30% reduction in the VMExit count for the full run. This was measured by grabbing the VMExit count before and after the run and calculating the difference. I have the numbers in a PDF, so I will email it to you in private as I don't like sending PDFs to the mailing list. However if anybody else wants the numbers feel free to ping me off list and I'll forward them. Cheers, Jes
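The exit-count reduction comes from the other side being able to skip notifications once it can see how far its peer has actually read. The suppression test this line of work eventually standardized on (vring_need_event() in the later virtio event-index scheme; shown here as a self-contained sketch) is a wraparound-safe window comparison on 16-bit ring indices:

```c
#include <assert.h>
#include <stdint.h>

/* Notify the peer only if it has not yet seen an index past `event`
 * (the index the peer published as "wake me after this one").
 * `old_idx` is the index as of the last notification, `new_idx` the
 * current one. All arithmetic is mod 2^16, so the comparison stays
 * correct across ring-index wraparound. */
static int need_event(uint16_t event, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event - 1) < (uint16_t)(new_idx - old_idx);
}
```

When the consumer lags, many producer updates fall inside one (old_idx, new_idx] window and only one notification (one exit) is taken for the batch, which matches the observed drop in VMExits without a throughput change.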
Re: Clock jumps
On 05/26/2010 11:31 AM, Alexander Graf wrote: On 26.05.2010, at 19:10, Orion Poplawski wrote: On 05/25/2010 12:21 AM, Gleb Natapov wrote: Adding kvm to CC. On Mon, May 24, 2010 at 04:06:32PM +, Orion Poplawski wrote: I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5 host whose clock jumps about 8-12 hours a couple times a day. I have no idea what is causing it. Fedora 12 and Centos 5.5 KVM machines run fine on the same host. Is there any debugging I can enable to see what is jumping the clock? kvm-clock: cpu 0, msr 0:1ba4741, boot clock kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock Switching to clocksource kvm-clock rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788) Thanks, though I don't think it made it there. I'm also not sure it's completely limited to KVM, though that is the only running system I am currently seeing the problem on. I also see clock jumps during anaconda installs on physical hardware and apparently they have been present since at least F11. Might be unrelated though. I'm really at a loss of how to debug this though. Do you have ntpd running inside the guest? I have a bug report lying around about 2.6.33 with kvm-clock jumping in time when ntpd is used: https://bugzilla.novell.com/show_bug.cgi?id=582260 Alex Turning off ntpd and chronyd did not help for me. -- Orion Poplawski Technical Manager 303-415-9701 x222 NWRA/CoRA DivisionFAX: 303-415-9702 3380 Mitchell Lane or...@cora.nwra.com Boulder, CO 80301 http://www.cora.nwra.com
Re: unhandled wrmsr
On Wed, Jun 2, 2010 at 8:45 PM, Andre Przywara andre.przyw...@amd.com wrote: Dave Young wrote: Hi, With today's git version (qemu-kvm), I got following message in kernel dmesg [168344.215605] kvm: 27289: cpu0 unhandled wrmsr: 0x198 data 0 Are you sure about that? Sure 0x198 is an Intel architectural perfmon MSR and it is read-only. The Linux kernel source I grep'ed obeys this and does only rdmsr. You can work around this by changing the error to a warning with: # modprobe kvm ignore_msrs=1 with this param, appear following warning: kvm: 28520: cpu0 ignored wrmsr: 0x198 data 0 I'd like to see more details about the guest Linux kernel, at least the version you used to see why there is a wrmsr on this address. Best is you provide the kernel (just the vmlinuz file) somewhere so that we can reproduce this. Also the qemu-kvm command line would be interesting. Actually I tried different guest image, recreate this problem is easy. one of them is slackware 13.0 kernel version 2.6.29.6, you can download from: http://www.slackware.at/data/slackware64-13.0/kernels/huge.s/ kvm cmdline: qemu-system-x86_64 -net nic -net tap,ifname=tap0,script=no slack64.img Regards, Andre. 
bash-3.1$ /home/dave/tmp/qemu-system-x86_64 --version QEMU emulator version 0.12.50 (qemu-kvm-devel), Copyright (c) 2003-2008 Fabrice Bellard bash-3.1$ uname -a Linux darkstar 2.6.35-rc1 #35 SMP Mon May 31 16:50:15 CST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz GenuineIntel GNU/Linux host distribution is slackware64-13.0 0.12.3 works well BTW the guest is a tiny core linux image -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12 -- Regards dave
Re: unhandled wrmsr
On Thu, Jun 3, 2010 at 9:34 AM, Dave Young hidave.darks...@gmail.com wrote: On Wed, Jun 2, 2010 at 8:45 PM, Andre Przywara andre.przyw...@amd.com wrote: Dave Young wrote: Hi, With today's git version (qemu-kvm), I got following message in kernel dmesg [168344.215605] kvm: 27289: cpu0 unhandled wrmsr: 0x198 data 0 Are you sure about that? Sure 0x198 is an Intel architectural perfmon MSR and it is read-only. The Linux kernel source I grep'ed obeys this and does only rdmsr. You can work around this by changing the error to a warning with: # modprobe kvm ignore_msrs=1 with this param, appear following warning: kvm: 28520: cpu0 ignored wrmsr: 0x198 data 0 I'd like to see more details about the guest Linux kernel, at least the version you used to see why there is a wrmsr on this address. Best is you provide the kernel (just the vmlinuz file) somewhere so that we can reproduce this. Also the qemu-kvm command line would be interesting. Actually I tried different guest image, recreate this problem is easy. one of them is slackware 13.0 kernel version 2.6.29.6, you can download from: http://www.slackware.at/data/slackware64-13.0/kernels/huge.s/ kvm cmdline: qemu-system-x86_64 -net nic -net tap,ifname=tap0,script=no slack64.img Regards, Andre. bash-3.1$ /home/dave/tmp/qemu-system-x86_64 --version QEMU emulator version 0.12.50 (qemu-kvm-devel), Copyright (c) 2003-2008 Fabrice Bellard Another problem with this build is I can not boot from virtio-blk, cmdline: qemu-system-x86_64 -boot c -drive file=slack64.img,if-virtio,boot=on qemu just complain about: Could not read the boot disk Anything missing by me? 
bash-3.1$ uname -a Linux darkstar 2.6.35-rc1 #35 SMP Mon May 31 16:50:15 CST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz GenuineIntel GNU/Linux host distribution is slackware64-13.0 0.12.3 works well BTW the guest is a tiny core linux image -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12 -- Regards dave -- Regards dave
Re: unhandled wrmsr
On Thu, Jun 3, 2010 at 10:23 AM, Dave Young hidave.darks...@gmail.com wrote: On Thu, Jun 3, 2010 at 9:34 AM, Dave Young hidave.darks...@gmail.com wrote: On Wed, Jun 2, 2010 at 8:45 PM, Andre Przywara andre.przyw...@amd.com wrote: Dave Young wrote: Hi, With today's git version (qemu-kvm), I got following message in kernel dmesg [168344.215605] kvm: 27289: cpu0 unhandled wrmsr: 0x198 data 0 Are you sure about that? Sure 0x198 is an Intel architectural perfmon MSR and it is read-only. The Linux kernel source I grep'ed obeys this and does only rdmsr. You can work around this by changing the error to a warning with: # modprobe kvm ignore_msrs=1 with this param, appear following warning: kvm: 28520: cpu0 ignored wrmsr: 0x198 data 0 I'd like to see more details about the guest Linux kernel, at least the version you used to see why there is a wrmsr on this address. Best is you provide the kernel (just the vmlinuz file) somewhere so that we can reproduce this. Also the qemu-kvm command line would be interesting. Actually I tried different guest image, recreate this problem is easy. one of them is slackware 13.0 kernel version 2.6.29.6, you can download from: http://www.slackware.at/data/slackware64-13.0/kernels/huge.s/ kvm cmdline: qemu-system-x86_64 -net nic -net tap,ifname=tap0,script=no slack64.img Regards, Andre. bash-3.1$ /home/dave/tmp/qemu-system-x86_64 --version QEMU emulator version 0.12.50 (qemu-kvm-devel), Copyright (c) 2003-2008 Fabrice Bellard Another problem with this build is I can not boot from virtio-blk, cmdline: qemu-system-x86_64 -boot c -drive file=slack64.img,if-virtio,boot=on typo, if=virtio qemu just complain about: Could not read the boot disk Anything missing by me? 
bash-3.1$ uname -a Linux darkstar 2.6.35-rc1 #35 SMP Mon May 31 16:50:15 CST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz GenuineIntel GNU/Linux host distribution is slackware64-13.0 0.12.3 works well BTW the guest is a tiny core linux image -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12 -- Regards dave -- Regards dave -- Regards dave
Re: [PATCH] use unfair spinlock when running on hypervisor.
On Wed, Jun 02, 2010 at 12:00:27PM +0300, Avi Kivity wrote: There are two separate problems: the more general problem is that the hypervisor can put a vcpu to sleep while holding a lock, causing other vcpus to spin until the end of their time slice. This can only be addressed with hypervisor help. Fyi - I have an early patch ready to address this issue. Basically I am using host-kernel memory (mmap'ed into the guest as io-memory via the ivshmem driver) to hint the host whenever the guest is in a spin-locked section; this is read by the host scheduler to defer preemption. Guest side: static inline void spin_lock(spinlock_t *lock) { raw_spin_lock(&lock->rlock); + __get_cpu_var(gh_vcpu_ptr)->defer_preempt++; } static inline void spin_unlock(spinlock_t *lock) { + __get_cpu_var(gh_vcpu_ptr)->defer_preempt--; raw_spin_unlock(&lock->rlock); } [similar changes to other spinlock variants] Host side: @@ -860,6 +866,17 @@ check_preempt_tick(struct cfs_rq *cfs_rq ideal_runtime = sched_slice(cfs_rq, curr); delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime; if (delta_exec > ideal_runtime) { + if ((sched_feat(DEFER_PREEMPT)) && (rq_of(cfs_rq)->curr->ghptr)) { + int defer_preempt = rq_of(cfs_rq)->curr->ghptr->defer_preempt; + if (((defer_preempt & 0x) == 0xfeed) && ((defer_preempt & 0x) != 0)) { + if ((rq_of(cfs_rq)->curr->grace_defer++ < sysctl_sched_preempt_defer_count)) { + rq_of(cfs_rq)->defer_preempt++; + return; + } else + rq_of(cfs_rq)->force_preempt++; + } + } resched_task(rq_of(cfs_rq)->curr); /* * The current task ran long enough, ensure it doesn't get [similar changes introduced at other preemption points in sched_fair.c] Note that the guest can only request preemption to be deferred (and not disabled) via this mechanism. I have seen a good improvement (~15%) in a kernel compile benchmark with sysctl_sched_preempt_defer_count set to a low value of just 2 (i.e. we can defer preemption by a maximum of two ticks). I intend to clean up and post the patches pretty soon for comments.
One pathological case where this may actually hurt is routines in the guest like flush_tlb_others_ipi(), which take a spinlock and then enter a while() loop waiting for other cpus to ack something. In this case, deferring preemption just because the guest is in a critical section actually hurts! Hopefully the upper bound on deferring preemption and the fact that such routines may not be frequently hit should help alleviate such situations. - vatsa
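The host-side policy reduces to: honour the guest's lock-held hint for at most a bounded number of consecutive ticks, then preempt anyway. A minimal sketch of that decision, separated from the scheduler plumbing (names are illustrative, not from the patch; the patch's magic-cookie validation of the shared word is omitted):

```c
#include <assert.h>

/* Returns 1 if the vcpu should be preempted this tick, 0 if preemption
 * is deferred. `grace_used` counts consecutive deferrals and resets on
 * every preemption, enforcing the sysctl-style upper bound `limit` that
 * keeps a misbehaving (or pathological, e.g. spin-in-critical-section)
 * guest from deferring forever. */
static int should_preempt(int hint_set, int *grace_used, int limit)
{
    if (hint_set && *grace_used < limit) {
        (*grace_used)++;
        return 0;    /* defer this tick */
    }
    *grace_used = 0;
    return 1;        /* preempt; grace budget refilled */
}
```

The bounded budget is also what limits the damage in the flush_tlb_others_ipi()-style case: the hint is honoured, but never for more than `limit` ticks in a row.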
Re: [PATCH] use unfair spinlock when running on hypervisor.
On Thu, Jun 03, 2010 at 06:51:51AM +0200, Eric Dumazet wrote: Guest side: static inline void spin_lock(spinlock_t *lock) { raw_spin_lock(&lock->rlock); + __get_cpu_var(gh_vcpu_ptr)->defer_preempt++; 1) __this_cpu_inc() should be faster Ok, thanks for that tip. 2) Isn't it a bit late to do this increment _after_ raw_spin_lock(&lock->rlock)? I think so; my worry about doing it earlier is that we may set the defer_preempt hint for the wrong vcpu (if, let's say, the guest application thread is preempted by the guest kernel and later migrated to another vcpu after it sets the hint and before it acquires the lock). - vatsa
[PATCH] kvm/powerpc: fix a build error in e500_tlb.c
We use the wrong number of arguments when invoking trace_kvm_stlb_inval, which causes the following build error. arch/powerpc/kvm/e500_tlb.c: In function 'kvmppc_e500_stlbe_invalidate': arch/powerpc/kvm/e500_tlb.c:230: error: too many arguments to function 'trace_kvm_stlb_inval' Signed-off-by: Kevin Hao haoke...@gmail.com --- arch/powerpc/kvm/e500_tlb.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index 21011e1..1261a21 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -226,8 +226,7 @@ static void kvmppc_e500_stlbe_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500, kvmppc_e500_shadow_release(vcpu_e500, tlbsel, esel); stlbe->mas1 = 0; - trace_kvm_stlb_inval(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2, -stlbe->mas3, stlbe->mas7); + trace_kvm_stlb_inval(index_of(tlbsel, esel)); } static void kvmppc_e500_tlb1_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500, -- 1.6.3.1
Re: [PATCH] KVM: PPC: elide struct thread_struct instances from stack
On Mon, May 31, 2010 at 09:59:13PM +0200, Andreas Schwab wrote: Instead of instantiating a whole thread_struct on the stack use only the required parts of it. Signed-off-by: Andreas Schwab sch...@linux-m68k.org Tested-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_fpu.h | 27 + arch/powerpc/kernel/ppc_ksyms.c |4 - arch/powerpc/kvm/book3s.c| 49 +--- arch/powerpc/kvm/book3s_paired_singles.c | 94 -- arch/powerpc/kvm/fpu.S | 18 ++ 5 files changed, 97 insertions(+), 95 deletions(-) Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html