[kvm-devel] [ kvm-Bugs-1958464 ] Unknown symbol in module loading kvm.ko
Bugs item #1958464, was opened at 2008-05-06 14:34
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958464&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unknown symbol in module loading kvm.ko

Initial Comment:
With today's commits, kvm.git 630741928b4a7eeff27e134d7ba7bc2fc2c764c5 and kvm-userspace.git 77c9148ba4a89a8dc4ab2ecf525c2de8604ea8c3, I cannot insert PAE KVM modules. Lots of unknown symbols have been reported while inserting kvm.ko.

error inserting '/usr/kvm/kvm.ko': -1 Unknown symbol in module

dmesg:
kvm_intel: Unknown symbol kvm_set_cr4
kvm_intel: Unknown symbol kvm_set_cr0
kvm_intel: Unknown symbol kvm_set_cr8
kvm_intel: Unknown symbol kvm_lapic_enabled
kvm_intel: Unknown symbol kvm_mmu_page_fault
kvm_intel: Unknown symbol kvm_mmu_reset_context
kvm_intel: Unknown symbol kvm_queue_exception_e
kvm_intel: Unknown symbol kvm_emulate_cpuid
kvm_intel: Unknown symbol kvm_vcpu_init
kvm_intel: Unknown symbol gfn_to_hva
kvm_intel: Unknown symbol kvm_set_msr_common
kvm_intel: Unknown symbol kvm_mmu_set_base_ptes
kvm_intel: Unknown symbol kvm_cpu_get_interrupt
kvm_intel: Unknown symbol kvm_emulate_pio
kvm_intel: Unknown symbol kvm_mmu_set_mask_ptes
kvm_intel: Unknown symbol kvm_is_error_hva
kvm: Unknown symbol kvm_div64_u64
kvm: emulating preempt notifiers; do not benchmark on this machine
loaded kvm module (kvm-67-2059-g9cebc1e)
kvm: Unknown symbol kvm_div64_u64

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958464&group_id=180599

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
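A dmesg dump like the one in this report is easier to triage when the unresolved symbols are grouped by the module that reported them. The following is only a small sketch in Python, not part of any KVM tooling; the function name group_unknown_symbols and the shortened sample text are illustrative assumptions, and the pattern assumes the "module: Unknown symbol name" wording shown in the report.

```python
import re
from collections import defaultdict

# Matches fragments such as "kvm_intel: Unknown symbol kvm_set_cr4".
# search() is used so an optional leading timestamp does not matter.
UNKNOWN_SYMBOL = re.compile(r"(\w+): Unknown symbol (\w+)")


def group_unknown_symbols(dmesg_text):
    """Return {module: sorted list of distinct unresolved symbols}."""
    found = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = UNKNOWN_SYMBOL.search(line)
        if match:
            module, symbol = match.groups()
            found[module].add(symbol)
    return {module: sorted(symbols) for module, symbols in found.items()}


# Shortened sample taken from the report above (hypothetical subset).
sample = """\
kvm_intel: Unknown symbol kvm_set_cr4
kvm_intel: Unknown symbol kvm_set_cr0
kvm: Unknown symbol kvm_div64_u64
kvm: emulating preempt notifiers; do not benchmark on this machine
kvm: Unknown symbol kvm_div64_u64
"""

missing = group_unknown_symbols(sample)
for module, symbols in sorted(missing.items()):
    print(module, symbols)
```

Grouped this way, the sample shows kvm itself missing only kvm_div64_u64, while every other unresolved symbol is reported by kvm_intel. One plausible reading, which the report does not confirm, is that the kvm_intel failures are a consequence of kvm.ko not loading.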
[kvm-devel] [ kvm-Bugs-1958467 ] Fail to save restore and live migrate on 32e platform
Bugs item #1958467, was opened at 2008-05-06 14:36
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958467&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fail to save restore and live migrate on 32e platform

Initial Comment:
Environment:
Host OS: RHEL5U1 ia32e
Commits: 630741928b4a7eeff27e134d7ba7bc2fc2c764c5-77c9148ba4a89a8dc4ab2ecf525c2de8604ea8c3
Hardware:
Platform: Woodcrest
CPU: 4
Memory size: 8G

Bug detailed description:
--
Save/restore and live migration fail on the 32e platform. When restoring the guest, the new qemu process disappears and the host console prints:

qemu: warning: error while loading state for instance 0x0 of device 'cpu'
Migration failed rc=215

Reproduce steps:

1. Use a qcow-based image to boot a guest:
qemu-img create -b /share/xvs/img/app/ia32p_UP.img -f qcow2 /share/xvs/var/sr
qemu-system-x86_64 -m 256 -monitor pty -net nic,macaddr=00:16:3e:35:1c:97,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sr

2. Press ctrl+alt+2 to switch to the qemu monitor and save the guest:
migrate file:///share/xvs/var/sr123

3. Restore:
qemu-system-x86_64 -m 256 -net nic,macaddr=00:16:3e:35:1c:97,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sr -incoming file:///share/xvs/var/sr123

Current result:

Expected result:

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958467&group_id=180599
Re: [kvm-devel] [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
On Tuesday 06 May 2008 03:06:23 Kay, Allen M wrote:
> Kvm kernel changes.
>
> Signed-off-by: Allen M Kay [EMAIL PROTECTED]

> --- /dev/null
> +++ b/arch/x86/kvm/vtd.c
> @@ -0,0 +1,183 @@
> +
> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
> +
> +struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
> +struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu);
> +void iommu_free_domain(struct dmar_domain *domain);
> +int domain_init(struct dmar_domain *domain, int guest_width);
> +int domain_context_mapping(struct dmar_domain *d,
> +			struct pci_dev *pdev);
> +int domain_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
> +			u64 hpa, size_t size, int prot);
> +void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
> +struct dmar_domain * find_domain(struct pci_dev *pdev);

Please move these to a .h file and also prefix appropriate keywords: domain_context_mapping is confusing, and since it's an Intel IOMMU-only thing, use something like intel_iommu_domain_context_mapping.

> +int kvm_iommu_map_pages(struct kvm *kvm,
> +			gfn_t base_gfn, unsigned long npages)
> +{
> +	unsigned long gpa;
> +	struct page *page;
> +	hpa_t hpa;
> +	int j, write;
> +	struct vm_area_struct *vma;
> +
> +	if (!kvm->arch.domain)
> +		return 1;
> +
> +	gpa = base_gfn << PAGE_SHIFT;
> +	page = gfn_to_page(kvm, base_gfn);
> +	hpa = page_to_phys(page);
> +
> +	printk(KERN_DEBUG "kvm_iommu_map_page: gpa = %lx\n", gpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: hpa = %llx\n", hpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: size = %lx\n",
> +	       npages*PAGE_SIZE);
> +
> +	for (j = 0; j < npages; j++) {
> +		gpa += PAGE_SIZE;
> +		page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
> +		hpa = page_to_phys(page);
> +		domain_page_mapping(kvm->arch.domain, gpa, hpa, PAGE_SIZE,
> +				    DMA_PTE_READ | DMA_PTE_WRITE);
> +		vma = find_vma(current->mm, gpa);
> +		if (!vma)
> +			return 1;

*

> +		write = (vma->vm_flags & VM_WRITE) != 0;
> +		get_user_pages(current, current->mm, gpa,
> +			       PAGE_SIZE, write, 0, NULL, NULL);

You should put_page each of the user pages when freeing or exiting (in unmap_guest), else a ref is held on each page and that's a lot of memory leaked. Also, this rules out any form of guest swapping. You should put_page in case a balloon driver in the guest tries to free some pages for the host.

> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_iommu_map_pages);
> +
> +static int kvm_iommu_map_memslots(struct kvm *kvm)
> +{
> +	int i, status;
> +	for (i = 0; i < kvm->nmemslots; i++) {
> +		status = kvm_iommu_map_pages(kvm, kvm->memslots[i].base_gfn,
> +					     kvm->memslots[i].npages);
> +		if (status)
> +			return status;

*

> +	}
> +	return 0;
> +}
> +
> +int kvm_iommu_map_guest(struct kvm *kvm,
> +			struct kvm_pci_passthrough_dev *pci_pt_dev)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct dmar_domain *domain;
> +	struct intel_iommu *iommu;
> +	struct pci_dev *pdev = NULL;
> +
> +	printk(KERN_DEBUG "kvm_iommu_map_guest: host bdf = %x:%x:%x\n",
> +	       pci_pt_dev->host.busnr,
> +	       PCI_SLOT(pci_pt_dev->host.devfn),
> +	       PCI_FUNC(pci_pt_dev->host.devfn));
> +
> +	for_each_pci_dev(pdev) {
> +		if ((pdev->bus->number == pci_pt_dev->host.busnr) &&
> +		    (pdev->devfn == pci_pt_dev->host.devfn))
> +			goto found;
> +	}

You can use pci_get_device instead of going through the list yourself.

> +	goto not_found;
> +found:
> +	pci_pt_dev->pdev = pdev;
> +
> +	drhd = dmar_find_matched_drhd_unit(pdev);
> +	if (!drhd) {
> +		printk(KERN_ERR "kvm_iommu_map_guest: drhd == NULL\n");
> +		goto not_found;
> +	}
> +
> +	printk(KERN_DEBUG "kvm_iommu_map_guest: reg_base_addr = %llx\n",
> +	       drhd->reg_base_addr);
> +
> +	iommu = drhd->iommu;
> +	if (!iommu) {
> +		printk(KERN_ERR "kvm_iommu_map_guest: iommu == NULL\n");
> +		goto not_found;
> +	}
> +	domain = iommu_alloc_domain(iommu);
> +	if (!domain) {
> +		printk(KERN_ERR "kvm_iommu_map_guest: domain == NULL\n");
> +		goto not_found;
> +	}
> +	if (domain_init(domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
> +		printk(KERN_ERR "kvm_iommu_map_guest: domain_init() failed\n");
> +		goto not_found;

Memory allocated in iommu_alloc_domain is leaked in this case.

> +	}
> +	kvm->arch.domain = domain;
> +	kvm_iommu_map_memslots(kvm);

*: You don't check for failure in mapping.

> +	domain_context_mapping(kvm->arch.domain, pdev);
> +	return 0;
> +not_found:
> +	return 1;
> +}
> +EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);
> +
> +int kvm_iommu_unmap_guest(struct kvm *kvm)
> +{
> +	struct
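Two of the review comments above concern the failure paths of kvm_iommu_map_guest: the domain is leaked when domain_init fails, and the result of mapping the memslots is never checked. The control flow being asked for can be sketched in a few lines. This is a userspace Python model under stated assumptions, not kernel code: the callables alloc_domain, init_domain, map_memslots and free_domain are hypothetical stand-ins for the corresponding steps in the patch.

```python
def map_guest(alloc_domain, init_domain, map_memslots, free_domain):
    """Model of the reviewed flow: return 0 on success, 1 on failure.

    Every step after allocation is checked, and the domain is released
    on any failure so that nothing is left allocated.
    """
    domain = alloc_domain()
    if domain is None:
        return 1
    if init_domain(domain) != 0 or map_memslots(domain) != 0:
        free_domain(domain)  # undo the allocation before bailing out
        return 1
    return 0


# Hypothetical stand-ins: domains are plain strings, and "freeing" a
# domain just records it so the outcome can be inspected.
freed = []
ok = map_guest(lambda: "dom0", lambda d: 0, lambda d: 0, freed.append)
init_failed = map_guest(lambda: "dom1", lambda d: 1, lambda d: 0, freed.append)
map_failed = map_guest(lambda: "dom2", lambda d: 0, lambda d: 1, freed.append)
print(ok, init_failed, map_failed, freed)
```

In C the same shape is usually written with a second goto label that frees the domain before falling through to the existing failure return.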
Re: [kvm-devel] [RFC] [VTD][patch 0/3] vt-d support for pci passthrough
On Tuesday 06 May 2008 03:05:30 Kay, Allen M wrote:
> Following three patches contains vt-d support for pci passthrough. It contains diff's base on Amit's 4/22 passthrough tree. The hardware environment used for this work is an Intel Weybridge system (Q35). The passthrough device is an E1000 NIC.
>
> I'm still using irqhook mechanism for interrupt injection as I had problem with irqchip mechanism. Following is the command line I used to start the guest.

Can you tell me what the problem with in-kernel irqchip is? Last time you mentioned there was a warning that came up when the guest exited. That shouldn't have stopped it from working, though.

> /usr/local/bin/qemu-system-x86_64 -boot c -hda /etc/xen/fc5_32.img -m 256 -net none -pcidevice e1000/01:00.0-16 -no-kvm-irqchip
>
> Remaining tasks include:
>
> 1) Generated vtd.o with kvm-intel.ko instead of kvm.ko.
> 2) Make iommu hooks in generic code to be non-Intel specific

This is a good idea but will need collaboration with a lot of vendors.

> Let me know of your feedbacks. Thanks.
>
> Allen

Amit
[kvm-devel] KVM Test result, kernel 6307419.., userspace 77c9148.. -- 3 new issues
Hi All,

This is today's KVM test result against kvm.git 630741928b4a7eeff27e134d7ba7bc2fc2c764c5 and kvm-userspace.git 77c9148ba4a89a8dc4ab2ecf525c2de8604ea8c3.

One new issue (issue #1958464) blocked the nightly test on the ia32-pae platform.

Three New Issues:
1. Unknown symbol in module while loading kvm.ko on PAE host
   https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958464&group_id=180599
2. Fail to save restore and live migrate on 32e platform
   https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958467&group_id=180599
3. fails to build KVM modules against 2.6.26 kernel
   https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958519&group_id=180599

One Old Issue:
4. Cannot boot guests with hugetlbfs
   https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1941302&group_id=180599

Test environment
Platform: Woodcrest
CPU: 4
Memory size: 8G

Details

IA32e:
 1. boot four 32-bit guest in parallel                    PASS
 2. boot four 64-bit guest in parallel                    PASS
 3. boot 4G 64-bit guest                                  PASS
 4. boot 4G pae guest                                     PASS
 5. boot 32-bit linux and 32 bit windows guest in parallel PASS
 6. boot 32-bit guest with 1500M memory                   PASS
 7. boot 64-bit guest with 1500M memory                   PASS
 8. boot 32-bit guest with 256M memory                    PASS
 9. boot 64-bit guest with 256M memory                    PASS
10. boot two 32-bit windows xp in parallel                PASS
11. boot four 32-bit different guest in para              PASS
12. save/restore 64-bit linux guests                      FAIL
13. save/restore 32-bit linux guests                      FAIL
14. boot 32-bit SMP windows 2003 with ACPI enabled        PASS
15. boot 32-bit SMP windows 2008 with ACPI enabled        PASS
16. boot 32-bit SMP Windows 2000 with ACPI enabled        PASS
17. boot 32-bit SMP Windows xp with ACPI enabled          PASS
18. boot 32-bit Windows 2000 without ACPI                 PASS
19. boot 64-bit Windows xp with ACPI enabled              PASS
20. boot 32-bit Windows xp without ACPI                   PASS
21. boot 64-bit UP vista                                  PASS
22. boot 64-bit SMP vista                                 PASS
23. kernel build in 32-bit linux guest OS                 PASS
24. kernel build in 64-bit linux guest OS                 PASS
25. LTP on 32-bit linux guest OS                          PASS
26. LTP on 64-bit linux guest OS                          PASS
27. boot 64-bit guests with ACPI enabled                  PASS
28. boot 32-bit x-server                                  PASS
29. boot 64-bit SMP windows XP with ACPI enabled          PASS
30. boot 64-bit SMP windows 2003 with ACPI enabled        PASS
31. boot 64-bit SMP windows 2008 with ACPI enabled        PASS
32. live migration 64bit linux guests                     FAIL
33. live migration 32bit linux guests                     FAIL
34. reboot 32bit windows xp guest                         PASS
35. reboot 32bit windows xp guest                         PASS

Report Summary on IA32e

Summary Test Report of Last Session
=====================================================================
                            Total   Pass    Fail    NoResult   Crash
=====================================================================
control_panel               15      11      4       0          0
Restart                     3       3       0       0          0
gtest                       23      21      2       0          0
=====================================================================
control_panel               15      11      4       0          0
 :KVM_LM_64_g64             1       0       1       0          0
 :KVM_four_sguest_64_gPAE   1       1       0       0          0
 :KVM_4G_guest_64_g64       1       1       0       0          0
 :KVM_four_sguest_64_g64    1       1       0       0          0
 :KVM_linux_win_64_gPAE     1       1       0       0          0
 :KVM_1500M_guest_64_gPAE   1       1       0       0          0
 :KVM_SR_64_g64             1       0       1       0          0
 :KVM_LM_64_gPAE            1       0       1       0          0
 :KVM_256M_guest_64_g64     1       1       0       0          0
 :KVM_1500M_guest_64_g64    1       1       0       0
[kvm-devel] [PATCH] janitorial: remove leftovers from merge conflict
apparently harmless and unique

Signed-off-by: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
---
 qemu/Makefile.target |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index cc66651..bb4b9a3 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -190,7 +190,6 @@ all: $(PROGS)
 #
 # cpu emulator library
-<<<<<<< HEAD:qemu/Makefile.target
 LIBOBJS=exec.o kqemu.o cpu-exec.o host-utils.o
 ifeq ($(NO_CPU_EMULATION), 1)
--
1.5.3.7
Re: [kvm-devel] [ kvm-Bugs-1958519 ] fails to build KVM modules against 2.6.26 kernel
Hi all,

> Initial Comment:
> Building KVM modules against 2.6.24 kernel is ok.
> But building against 2.6.26 kernel will fail.

I got the same problem, but the following patch from Andrea helped me.

Hope this helps,
ozaki-r

-- Forwarded message --
From: Andrea Arcangeli [EMAIL PROTECTED]
Date: 2008/4/26
Subject: [kvm-devel] fix external module compile
To: kvm-devel@lists.sourceforge.net
Cc: Avi Kivity [EMAIL PROTECTED]

Hello,

after updating kvm-userland.git, kvm.git and linux-2.6-hg, and after make distclean and rebuild with slightly reduced .config, I can't compile the external module anymore. Looking into it with V=1, $(src) defines to "" and including "/external-module-compat.h" clearly fails. I fixed it like below, because it seems more consistent to enforce the ordering of the special includes in the same place, notably $(src)/include is already included as $LINUX at point 1 of the comment, so this looks like a cleanup of a superfluous line in Kconfig besides fixing my compile by moving the external-module-compat in the same place with the other includes where `pwd` works instead of $(src) that doesn't work anymore for whatever reason.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/kernel/Kbuild b/kernel/Kbuild
index cabfc75..d9245eb 100644
--- a/kernel/Kbuild
+++ b/kernel/Kbuild
@@ -1,4 +1,3 @@
-EXTRA_CFLAGS := -I$(src)/include -include $(src)/external-module-compat.h
 obj-m := kvm.o kvm-intel.o kvm-amd.o
 kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o anon_inodes.o irq.o i8259.o \
 	lapic.o ioapic.o preempt.o i8254.o external-module-compat.o
diff --git a/kernel/Makefile b/kernel/Makefile
index 78ff923..e3fccbe 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -27,7 +27,8 @@ all::
 # include header priority 1) $LINUX 2) $KERNELDIR 3) include-compat
 	$(MAKE) -C $(KERNELDIR) M=`pwd` \
 		LINUXINCLUDE=-I`pwd`/include -Iinclude -I`pwd`/include-compat \
-		-include include/linux/autoconf.h \
+		-include include/linux/autoconf.h \
+		-include `pwd`/external-module-compat.h
 		$$@
 sync: header-sync source-sync

2008/5/6 SourceForge.net [EMAIL PROTECTED]:
> Bugs item #1958519, was opened at 2008-05-06 16:05
> Message generated for change (Tracker Item Submitted) made by Item Submitter
> You can respond by visiting:
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958519&group_id=180599
>
> Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
>
> Category: None
> Group: None
> Status: Open
> Resolution: None
> Priority: 5
> Private: No
> Submitted By: yunfeng (yunfeng)
> Assigned to: Nobody/Anonymous (nobody)
> Summary: fails to build KVM modules against 2.6.26 kernel
>
> Initial Comment:
> Building KVM modules against 2.6.24 kernel is ok.
> But building against 2.6.26 kernel will fail.
>
> make -j20 -C /lib/modules/2.6.26-rc1-02049-g6307419/build M=`pwd` \
>     LINUXINCLUDE=-I`pwd`/include -Iinclude -I`pwd`/include-compat \
>     -include include/linux/autoconf.h \
>     $@
> make[1]: Entering directory `/root/kvm'
> Building modules, stage 2.
> MODPOST 3 modules
> WARNING: kvm_div64_u64 [/root/kvm-master-2.6.22-rc4-2008050601096/kvm-userspace/kernel/kvm.ko] undefined!
> CC /root/kvm-master-2.6.22-rc4-2008050601096/kvm-userspace/kernel/kvm-amd.mod.o
> CC /root/kvm-master-2.6.22-rc4-2008050601096/kvm-userspace/kernel/kvm-intel.mod.o
> CC /root/kvm-master-2.6.22-rc4-2008050601096/kvm-userspace/kernel/kvm.mod.o
> In file included from command line:1:
> ./include/linux/autoconf.h:516:1: error: /external-module-compat.h: No such file or directory
> In file included from command line:1:
> ./include/linux/autoconf.h:516:1: error: /external-module-compat.h: No such file or directory
> In file included from command line:1:
> ./include/linux/autoconf.h:516:1: error: /external-module-compat.h: No such file or directory
> make[2]: *** [/root/kvm-master-2.6.22-rc4-2008050601096/kvm-userspace/kernel/kvm-intel.mod.o] Error 1
> make[2]: *** Waiting for unfinished jobs
> make[2]: *** [/root/kvm-master-2.6.22-rc4-2008050601096/kvm-userspace/kernel/kvm-amd.mod.o] Error 1
> make[2]: *** [/root/kvm-master-2.6.22-rc4-2008050601096/kvm-userspace/kernel/kvm.mod.o] Error 1
Re: [kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>> Add three PCI bridges to support 128 slots.
>>>
>>> Changes since v1:
>>> - Remove I/O address range support (so standard PCI I/O space is used).
>>> - Verify that there's no special quirks for 82801 PCI bridge.
>>> - Introduce separate flat IRQ mapping function for non-SPARC targets.
>>
>> I've cooled off on the 128 slot stuff, mainly because most real hosts don't have them. An unusual configuration will likely lead to problems as most guest OSes and workloads will not have been tested thoroughly with them.
>>
>> - it requires a large number of interrupts, which are difficult to provide, and which it is hard to ensure all OSes support. MSI is relatively new.
>> - if only a few interrupts are available, then each interrupt requires scanning a large number of queues
>>
>> If we are to do this, then we need better tests than "80 disks show up".
>>
>> The alternative approach of having the virtio block device control up to 16 disks allows having those 80 disks with just 5 slots (and 5 interrupts). This is similar to the way traditional SCSI controllers behave, and so should not surprise the guest OS.
>
> If you have a single virtio-blk device that shows up as 8 functions, we could achieve the same thing. We can cheat with the interrupt handlers to avoid cache line bouncing too.

You can't cheat on all guests, and even on Linux, it's better to keep on doing what real hardware does than go off on a tangent that no one else uses.

You'll have to cheat on ->kick(), too. Virtio needs one exit per O(queue depth). With one spindle per ring, it doesn't make sense to have a queue depth > 4 (or latency goes to hell), so you have many exits.

> Plus, we can use PCI hotplug so we don't have to reinvent a new hotplug mechanism.

You can plug disks into a Fibre Channel mesh, so presumably that works on real hardware somehow.

> I'm inclined to think that ring sharing isn't as useful as it seems as long as we don't have indirect scatter gather lists.

I agree, but I think that indirect sg is very important for storage:

- a long sg list is cheap from the disk's point of view (the seeks are what's expensive)
- it is important to keep the queue depth meaningful and small (O(spindles * 3)), as it drastically affects latency

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
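The exit-cost argument in this reply can be made concrete with a deliberately crude model. The sketch below assumes the best case, where each guest exit submits one completely full queue, so the number of exits is the request count divided by the queue depth. The workload of 12 requests per disk is an illustrative assumption, not a figure from the thread; the depth of 4 per spindle, the 16 disks per device and the factor of 3 per spindle are taken from the discussion.

```python
import math


def exits_needed(total_requests, queue_depth):
    """Best-case exits if each exit submits one full queue of requests."""
    return math.ceil(total_requests / queue_depth)


DISKS = 80
REQUESTS_PER_DISK = 12  # illustrative workload, not from the thread

# One ring per spindle, depth capped at 4 as argued in the reply.
per_disk = DISKS * exits_needed(REQUESTS_PER_DISK, 4)

# One ring shared by 16 spindles, sized as spindles * 3.
DISKS_PER_DEVICE = 16
devices = math.ceil(DISKS / DISKS_PER_DEVICE)
shared = devices * exits_needed(DISKS_PER_DEVICE * REQUESTS_PER_DISK,
                                DISKS_PER_DEVICE * 3)

print(per_disk, shared)
```

Under these assumptions the shared rings need far fewer exits for the same number of requests, which is the amortization point being made. Real behavior depends on how full the rings are when the guest kicks, so the numbers are only an upper bound on the benefit.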
Re: [kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)
Alexander Graf wrote:
>>> Marcelo Tosatti wrote:
>>> Add three PCI bridges to support 128 slots.
>>>
>>> Changes since v1:
>>> - Remove I/O address range support (so standard PCI I/O space is used).
>>> - Verify that there's no special quirks for 82801 PCI bridge.
>>> - Introduce separate flat IRQ mapping function for non-SPARC targets.
>>
>> I've cooled off on the 128 slot stuff, mainly because most real hosts don't have them. An unusual configuration will likely lead to problems as most guest OSes and workloads will not have been tested thoroughly with them.
>
> This is more of a "let's do this conditionally" than a "let's not do it" reason imho.

Yes. More precisely, let's not do it until we're sure it works and performs.

I don't think a queue-per-disk approach will perform well, since the queue will always be very short and will not be able to amortize exit costs and ring management overhead very well.

>> - it requires a large number of interrupts, which are difficult to provide, and which it is hard to ensure all OSes support. MSI is relatively new.
>
> We could just as well extend the device layout to have every device be attached to one virtual IOAPIC pin, so we'd have like 128 / 4 = 32 IOAPICs in the system and one interrupt for each device.

That's problematic for these reasons:

- how many OSes work well with 32 IOAPICs?
- at one point, you run out of interrupt vectors (~ 220 per cpu if the OS can allocate per-cpu vectors; otherwise just ~220)
- you will have many interrupts fired, each for a single device with a few requests, reducing performance

>> - if only a few interrupts are available, then each interrupt requires scanning a large number of queues
>
> This case should be rare, basically only existent with OSs that don't support APIC properly.

Hopefully.

>> The alternative approach of having the virtio block device control up to 16 disks allows having those 80 disks with just 5 slots (and 5 interrupts). This is similar to the way traditional SCSI controllers behave, and so should not surprise the guest OS.
>
> The one thing I'm actually really missing here is use cases. What are we doing this for? And further along the line, are there other approaches to the problems for which this was supposed to be a solution? Maybe someone can raise a case where it's not virtblk / virtnet.

The requirement for lots of storage is a given. There are two ways of doing that, paying a lot of money to EMC or NetApp for a storage controller, or connecting lots of disks directly and doing the storage controller on the OS (what EMC and NetApp do anyway, inside their boxes).

zfs is a good example of a use case, and I'd guess databases could use this too if they were able to supply the redundancy.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
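The slot and interrupt figures traded in this exchange can be checked with a few lines of arithmetic. The sketch below only restates numbers from the thread (80 disks, 16 disks per virtio block device, 128 slots, the "128 / 4 = 32 IOAPICs" proposal, roughly 220 vectors); treating 220 as an exact, single shared budget is an assumption made for illustration, and the helper vectors_left is hypothetical.

```python
import math

DISKS = 80
DISKS_PER_VIRTIO_BLK = 16  # figure from the thread
VECTOR_BUDGET = 220        # "~220" in the thread, treated as exact here

# One disk per PCI slot versus 16 disks behind each virtio block device.
slots_one_disk_per_slot = DISKS
slots_multi_disk = math.ceil(DISKS / DISKS_PER_VIRTIO_BLK)

# The "128 / 4 = 32 IOAPICs" proposal from the quoted reply.
ioapics = 128 // 4


def vectors_left(devices, budget=VECTOR_BUDGET):
    """Vectors remaining if every device takes one from a single budget."""
    return budget - devices


print(slots_one_disk_per_slot, slots_multi_disk, ioapics)
print(vectors_left(128), vectors_left(slots_multi_disk))
```

With one interrupt per device, 128 devices would consume more than half of a single ~220-vector budget, whereas five multi-disk devices barely register, which is the contrast the reply draws.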
Re: [kvm-devel] [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
Kay, Allen M wrote:
> +
> +#include <linux/list.h>
> +#include <linux/kvm_host.h>
> +#include <linux/pci.h>
> +#include <linux/dmar.h>
> +#include <linux/intel-iommu.h>
> +
> +//#define DEBUG
> +
> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48

The name "domain" is too generic; please use dma_domain or io_domain or something similar.

> +static int kvm_iommu_map_memslots(struct kvm *kvm)
> +{
> +	int i, status;
> +	for (i = 0; i < kvm->nmemslots; i++) {
> +		status = kvm_iommu_map_pages(kvm, kvm->memslots[i].base_gfn,
> +					     kvm->memslots[i].npages);
> +		if (status)
> +			return status;

Need to undo in case of partial completion.

> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
> index 5f93b78..6202ed1 100644
> --- a/include/asm-x86/kvm_para.h
> +++ b/include/asm-x86/kvm_para.h
> @@ -170,5 +170,6 @@ struct kvm_pci_pt_info {
>  struct kvm_pci_passthrough_dev {
>  	struct kvm_pci_pt_info guest;
>  	struct kvm_pci_pt_info host;
> +	struct pci_dev *pdev;	/* kernel device pointer for host dev */

This should be stored somewhere private (not sure, but I think kvm_pci_passthrough_dev is a public interface).

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
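The "undo in case of partial completion" request describes a common pattern: remember what has been mapped so far and, on the first failure, unmap it in reverse order before returning the error. A minimal sketch of that pattern follows, as a userspace Python model rather than kernel code. The names map_one and unmap_one and the string slot identifiers are hypothetical; the reviewed patch has no per-slot unmap helper in the portion quoted here.

```python
def map_memslots(slots, map_one, unmap_one):
    """Map every slot; on failure, undo the slots already mapped.

    Returns 0 on success or the first non-zero status from map_one.
    """
    mapped = []
    for slot in slots:
        status = map_one(slot)
        if status:
            # Roll back in reverse order so nothing stays half-mapped.
            for done in reversed(mapped):
                unmap_one(done)
            return status
        mapped.append(slot)
    return 0


# Hypothetical backing store: the set of slots currently mapped.
currently_mapped = set()


def map_one(slot):
    if slot == "slot2":  # simulate a failure partway through
        return 1
    currently_mapped.add(slot)
    return 0


status = map_memslots(["slot0", "slot1", "slot2", "slot3"],
                      map_one, currently_mapped.discard)
print(status, currently_mapped)
```

After the simulated failure on the third slot, the two slots mapped earlier have been released again, so the caller sees either a complete mapping or none at all.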
Re: [kvm-devel] [RFC] [VTD][patch 2/3] vt-d support for pci passthrough: kvm-vtd-user.patch
Kay, Allen M wrote:
> Still todo: move vt.d to kvm-intel.ko module.

Not sure it's the right thing to do. If we get the iommus abstracted properly, we can rename vtd.c to dma.c and move it to virt/kvm/. The code is certainly a lot more about managing memory than anything vmx specific. It's hardly x86 specific, even.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [ kvm-Bugs-1958519 ] fails to build KVM modules against 2.6.26 kernel
Ryota OZAKI wrote:
> Hi all,
>
>> Initial Comment:
>> Building KVM modules against 2.6.24 kernel is ok.
>> But building against 2.6.26 kernel will fail.
>
> I got the same problem, but the following Andrea's patch helped me.
>
> Hope this helps,

Yes, while I think it's a Kbuild problem, too many people are hitting it, so I applied the patch.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [PATCH] janitorial: remove leftovers from merge conflict
Carlo Marcelo Arenas Belon wrote:
> apparently harmless and unique

Sloppy me. Applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] KVM Test result, kernel 6307419.., userspace 77c9148.. -- 3 new issues
Yunfeng Zhao wrote:
> Three New Issues:
> 1. Unknown symbol in module while loading kvm.ko on PAE host
>    https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958464&group_id=180599
> 2. Fail to save restore and live migrate on 32e platform
>    https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958467&group_id=180599
> 3. fails to build KVM modules against 2.6.26 kernel
>    https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958519&group_id=180599

Fixed and pushed all three.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [RFC] [VTD][patch 0/3] vt-d support for pci passthrough
Kay, Allen M wrote:
> Following three patches contains vt-d support for pci passthrough. It contains diff's base on Amit's 4/22 passthrough tree. The hardware environment used for this work is an Intel Weybridge system (Q35). The passthrough device is an E1000 NIC.
>
> I'm still using irqhook mechanism for interrupt injection as I had problem with irqchip mechanism. Following is the command line I used to start the guest.
>
> /usr/local/bin/qemu-system-x86_64 -boot c -hda /etc/xen/fc5_32.img -m 256 -net none -pcidevice e1000/01:00.0-16 -no-kvm-irqchip
>
> Remaining tasks include:
>
> 1) Generated vtd.o with kvm-intel.ko instead of kvm.ko.
> 2) Make iommu hooks in generic code to be non-Intel specific

Eventually we will want to make it even non-x86 specific; ia64 will probably be able to share, and maybe ppc someday. That needn't be done at once, though.

Your mail client mangles the patches; please attach them or use git send-email.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [ kvm-Bugs-1958715 ] kvm-userspace failed to start linux kernel (kernel panic)
On Tue, 06 May 2008 06:13:18 -0700 SourceForge.net [EMAIL PROTECTED] wrote: When I use the commit bae043c (kvm-userspace) I can start the liveCD but the next commit c33833a produces a kernel panic. I see the screen with different choice of installation but when I choose to install linux I get a kernel panic (see file attach). I insert the report of the kernel panic: --- [EMAIL PROTECTED]/local/kvm-userspace.git/bin]$ ./qemu-system-x86_64 -cdrom /images_iso/ubuntu-8.04-desktop-i386.iso -boot d -m 256 -serial stdio kvm_set_lapic: Bad file descriptor [0.00] Linux version 2.6.24-16-generic ([EMAIL PROTECTED]) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu Apr 10 13:23:42 UTC 2008 (Ubuntu 2.6.24-16.30-generic) [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009fc00 (usable) [0.00] BIOS-e820: 0009fc00 - 000a (reserved) [0.00] BIOS-e820: 000e8000 - 0010 (reserved) [0.00] BIOS-e820: 0010 - 0fff (usable) [0.00] BIOS-e820: 0fff - 1000 (ACPI data) [0.00] BIOS-e820: fffbd000 - 0001 (reserved) [0.00] 0MB HIGHMEM available. [0.00] 255MB LOWMEM available. [0.00] Zone PFN ranges: [0.00] DMA 0 - 4096 [0.00] Normal 4096 -65520 [0.00] HighMem 65520 -65520 [0.00] Movable zone start PFN for each node [0.00] early_node_map[1] active PFN ranges [0.00] 0:0 -65520 [0.00] DMI 2.4 present. 
> [0.00] ACPI: RSDP signature @ 0xC00FB450 checksum 0
> [0.00] ACPI: RSDP 000FB450, 0014 (r0 QEMU )
> [0.00] ACPI: RSDT 0FFF, 002C (r1 QEMU QEMURSDT1 QEMU 1)
> [0.00] ACPI: FACP 0FFF002C, 0074 (r1 QEMU QEMUFACP1 QEMU 1)
> [0.00] ACPI: DSDT 0FFF0100, 2464 (r1 BXPC BXDSDT1 INTL 20061109)
> [0.00] ACPI: FACS 0FFF00C0, 0040
> [0.00] ACPI: APIC 0FFF2568, 00E0 (r1 QEMU QEMUAPIC1 QEMU 1)
> [0.00] ACPI: PM-Timer IO Port: 0xb008
> [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [0.00] Processor #0 6:2 APIC version 20
> [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] disabled)
> [0.00] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] disabled)
> [0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
> [0.00] IOAPIC[0]: apic_id 1, version 17, address 0xfec0, GSI 0-23
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> [0.00] Enabling APIC mode: Flat. Using 1 I/O APICs
> [0.00] Using ACPI (MADT) for SMP configuration information
> [0.00] Allocating PCI resources starting at 2000 (gap: 1000:effbd000)
> [0.00] swsusp: Registered nosave memory region: 0009f000 - 000a
> [0.00] swsusp: Registered nosave memory region: 000a - 000e8000
> [0.00] swsusp: Registered nosave memory region: 000e8000 - 0010
> [0.00] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 65009
> [0.00] Kernel command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed boot=casper initrd=/casper/initrd.gz console=ttyS0
> [0.00] Enabling fast FPU save and restore... done.
> [0.00] Enabling unmasked SIMD FPU exception support... done.
> [0.00] Initializing CPU#0
> [0.00] PID hash table entries: 1024 (order: 10, 4096 bytes)
> [0.00] Detected 3002.716 MHz processor.
> [ 18.835013] Console: colour VGA+ 80x25
> [ 18.835162] console [ttyS0] enabled
> [ 18.977947] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> [ 18.980655] Inode-cache hash table entries: 16384
[kvm-devel] [ kvm-Bugs-1958725 ] openSUSE 11.0 became broken with newer KVM
Bugs item #1958725, was opened at 2008-05-06 16:21
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1958725&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: openSUSE 11.0 became broken with newer KVM

Initial Comment:
Host OS: Fedora7/x64, kernel 2.6.21
Guest OS: openSUSE 11.0 BETA2, 32-bit x86 DVD ISO, kernel 2.6.25
CPU: Intel Core 2
KVM: KVM-67 (bug also valid for KVM-68)

KVM-67 broke openSUSE 11.0 on newer KVMs-67/68 on intel.

command:
./qemu-kvm -cdrom openSUSE-11.0-BETA2-32-bit.iso -m 512 -hda myharddisk.qcow2 -boot d

Symptoms:
Red error message is displayed in the guest monitor, during setup stage2 (Yast) load.

I have bisected it. qemu-merge for KVM-67 userspace is responsible for this bug, commit:
c33833a3f98b1bb9d8208b0ed115009bc20e6e87

Works fully on KVM-66.
On KVM-67/68 it works only with -no-kvm parameter.
FAILS with default parameters, fails with -no-kvm-acpi, -no-kvm-pit, -no-kvm-irqchip,
and fails when guest is loaded with normal or FAILSAFE kernel boot parameters.
That is: fails in all cases.

-Alexey Technologov, 6.May.2008.
Re: [kvm-devel] Protected mode transitions and big real mode... still an issue
On Mon, 5 May 2008 16:29:21 +0300
Mohammed Gamal [EMAIL PROTECTED] wrote:

> On Mon, May 5, 2008 at 3:57 PM, Anthony Liguori [EMAIL PROTECTED] wrote:
> > WinXP fails to boot with your patch applied too. FWIW, Ubuntu 8.04 has a
> > fixed version of gfxboot that doesn't do nasty things with SS on privileged
> > mode transitions.
>
> WinXP fails with the patch applied too. Ubuntu 7.10 live CD and FreeDOS
> don't boot but complain about instruction mov 0x11,sreg not being emulated.

Can you try with this one please?
On my computer it boots ubuntu-8.04-desktop-i386.iso liveCD and also
openSUSE-10.3-GM-x86_64-mini.iso

I will try FreeDOS and WinXP if I can find one ;)

Regards,
Guillaume

---
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 26c4f02..6e76c2e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1272,7 +1272,9 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
 	fix_pmode_dataseg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
 	fix_pmode_dataseg(VCPU_SREG_FS, &vcpu->arch.rmode.fs);
 
+#if 0
 	vmcs_write16(GUEST_SS_SELECTOR, 0);
+#endif
 	vmcs_write32(GUEST_SS_AR_BYTES, 0x93);
 
 	vmcs_write16(GUEST_CS_SELECTOR,
@@ -2633,6 +2635,73 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int invalid_guest_state(struct kvm_vcpu *vcpu,
+				struct kvm_run *kvm_run, u32 failure_reason)
+{
+	u16 ss, cs;
+	u8 opcodes[4];
+	unsigned long rip = vcpu->arch.rip;
+	unsigned long rip_linear;
+
+	ss = vmcs_read16(GUEST_SS_SELECTOR);
+	cs = vmcs_read16(GUEST_CS_SELECTOR);
+
+	if ((ss & 0x03) != (cs & 0x03)) {
+		int err;
+		rip_linear = rip + vmx_get_segment_base(vcpu, VCPU_SREG_CS);
+		emulator_read_std(rip_linear, (void *)opcodes, 4, vcpu);
+#if 0
+		printk(KERN_INFO "emulation at (%lx) rip %lx: %02x %02x %02x %02x\n",
+				  rip_linear,
+				  rip, opcodes[0], opcodes[1], opcodes[2], opcodes[3]);
+#endif
+		err = emulate_instruction(vcpu, kvm_run, 0, 0, 0);
+		switch (err) {
+		case EMULATE_DONE:
+#if 0
+			printk(KERN_INFO "successfully emulated instruction\n");
+#endif
+			return 1;
+		case EMULATE_DO_MMIO:
+			printk(KERN_INFO "mmio?\n");
+			return 0;
+		default:
+			kvm_report_emulation_failure(vcpu, "vmentry failure");
+			break;
+		}
+	}
+
+	kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
+	kvm_run->hw.hardware_exit_reason = failure_reason;
+	return 0;
+}
+
+static int handle_vmentry_failure(struct kvm_vcpu *vcpu,
+				  struct kvm_run *kvm_run,
+				  u32 failure_reason)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+#if 0
+	printk(KERN_INFO "Failed vm entry (exit reason 0x%x) ", failure_reason);
+#endif
+	switch (failure_reason) {
+	case EXIT_REASON_INVALID_GUEST_STATE:
+#if 0
+		printk("invalid guest state \n");
+#endif
+		return invalid_guest_state(vcpu, kvm_run, failure_reason);
+	case EXIT_REASON_MSR_LOADING:
+		printk("caused by MSR entry %ld loading.\n", exit_qualification);
+		break;
+	case EXIT_REASON_MACHINE_CHECK:
+		printk("caused by machine check.\n");
+		break;
+	default:
+		printk("reason not known yet!\n");
+		break;
+	}
+	return 0;
+}
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -2694,6 +2763,12 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			exit_reason != EXIT_REASON_EPT_VIOLATION))
 		printk(KERN_WARNING "%s: unexpected, valid vectoring info and "
 		       "exit reason is 0x%x\n", __func__, exit_reason);
+
+	if ((exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
+		exit_reason &= ~VMX_EXIT_REASONS_FAILED_VMENTRY;
+		return handle_vmentry_failure(vcpu, kvm_run, exit_reason);
+	}
+
 	if (exit_reason < kvm_vmx_max_exit_handlers
 	    && kvm_vmx_exit_handlers[exit_reason])
 		return kvm_vmx_exit_handlers[exit_reason](vcpu, kvm_run);
diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index 79d94c6..2cebf48 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -238,7 +238,10 @@ enum vmcs_field {
 #define EXIT_REASON_IO_INSTRUCTION      30
 #define EXIT_REASON_MSR_READ            31
 #define
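[Editorial sketch] The patch in this message rests on two small bit tests: whether the VMX exit reason carries the "failed VM entry" flag, and whether the requested privilege levels of SS and CS disagree. The stand-alone C below only illustrates those tests outside the kernel. The helper names are hypothetical, and the constant values (bit 31 for the failure flag, basic reason 33 for invalid guest state) are restated here as assumptions rather than copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: bit 31 of the exit reason flags a failed VM entry. */
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u

/* Assumed value: basic exit reason for "invalid guest state". */
#define EXIT_REASON_INVALID_GUEST_STATE 33u

/* Hypothetical helper: does this exit report a failed VM entry? */
static bool vmentry_failed(uint32_t exit_reason)
{
	return (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) != 0;
}

/* Hypothetical helper: clear the failure flag, keep the basic reason. */
static uint32_t basic_exit_reason(uint32_t exit_reason)
{
	return exit_reason & ~VMX_EXIT_REASONS_FAILED_VMENTRY;
}

/*
 * The low two bits of a segment selector are its RPL. The patch falls
 * back to instruction emulation while SS and CS carry different RPLs.
 */
static bool rpl_mismatch(uint16_t ss, uint16_t cs)
{
	return (ss & 0x03) != (cs & 0x03);
}
```

With these helpers, an exit reason of 0x80000021 would be classified as a failed entry whose basic reason is 33, which is the case the patch routes to invalid_guest_state().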
Re: [kvm-devel] [RFC] fix VMX TSC synchronicity
Avi Kivity wrote:
> [Resurrecting post from the dead]
>
> Marcelo Tosatti wrote:
> > Forcing clustered APIC mode works only on SMP, and there was high CPU
> > consumption on Windows SMP guests due to C3 state being reported (fixed
> > in kvm-30 something).
> >
> > So perhaps:
> >
> > - Faking clustered APIC on SMP
> > - Faking C3 on UP
> >
> > And turning off the TSC bit (for 32-bit guests).
> >
> > Is the way to go?
> >
> > Avi, do you understand why C3 was causing the Windows SMP problems ?
>
> It's probably inb()ing on the port in a loop. It's not SMP causing the
> problems, but the ACPI HAL. I'll check this.

Yes, it's reading 0xb010 and 0xb014, which ought to place the cpu in
sleep mode, but doesn't.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] Protected mode transitions and big real mode... still an issue
Guillaume Thouvenin wrote:
> On Mon, 5 May 2008 16:29:21 +0300
> Mohammed Gamal [EMAIL PROTECTED] wrote:
>
> > On Mon, May 5, 2008 at 3:57 PM, Anthony Liguori [EMAIL PROTECTED] wrote:
> > > WinXP fails to boot with your patch applied too. FWIW, Ubuntu 8.04 has a
> > > fixed version of gfxboot that doesn't do nasty things with SS on privileged
> > > mode transitions.
> >
> > WinXP fails with the patch applied too. Ubuntu 7.10 live CD and FreeDOS
> > don't boot but complain about instruction mov 0x11,sreg not being emulated.
>
> Can you try with this one please?
> On my computer it boots ubuntu-8.04-desktop-i386.iso liveCD and also
> openSUSE-10.3-GM-x86_64-mini.iso

8.04 is not a good test-case. 7.10 is what you want to try.

The good news is, 7.10 appears to work! The bad news is that about 20%
of the time, it crashes and displays the following:

kvm_run: failed entry, reason 5
kvm_run returned -8

So something appears to be a bit buggy. Still, very good work!

Regards,

Anthony Liguori
Re: [kvm-devel] kvm-67: kernel panic while booting debian-40r3-i386-businesscard.iso
Jan Luebbe wrote:
> > 0f 0d 0b    prefetchw (%ebx)
> >
> > This is an AMD 3Dnow! instruction, which is not supported on Intel
> > processors. I guess the 3Dnow! cpuid bit leaked in via the qemu merge.
> >
> > I guess two fixes are needed:
> > - remove the 3Dnow! bit
> > - add emulation for prefetchw (easy, as it doesn't need to do anything)
> >   to support live migration from AMD to Intel
>
> This problem still occurs with kvm-68. Which CPUs will be affected by
> this (is it only the Core Duo)?

All Intels.

> I'm currently delaying the upload of a new kvm package to debian because
> of this.

I've fixed it for kvm-69.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] s390 kvm_virtio.c build error
Martin Schwidefsky wrote:
> I've added Heiko's patch to my patchqueue. But since this is
> drivers/s390/kvm this should go in over the kvm.git. See patch below.

Thanks, I added this to my queue as well.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [PATCH] Build fix for kvm/ia64 userspace.
Zhang, Xiantao wrote:
> Hi, Avi
> This patch should go into RC1, otherwise it will block kvm/ia64
> userspace build.
>
> diff --git a/include/asm-ia64/kvm.h b/include/asm-ia64/kvm.h
> index eb2d355..62b5fad 100644
> --- a/include/asm-ia64/kvm.h
> +++ b/include/asm-ia64/kvm.h
> @@ -22,7 +22,12 @@
>   */
>
>  #include <asm/types.h>
> +
> +#ifdef __KERNEL__
>  #include <asm/fpu.h>
> +#else
> +#include <signal.h>
> +#endif

Fishy. A kernel header including a userspace header?

Maybe you need to include linux/signal.h unconditionally?

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [PATCH 01 of 11] mmu-notifier-core
On Mon, May 05, 2008 at 02:46:25PM -0500, Jack Steiner wrote:
> If a task fails to unmap a GRU segment, they still exist at the start of

Yes, this will also happen in case the well behaved task receives
SIGKILL, so you can test it that way too.

> exit. On the ->release callout, I set a flag in the container of my
> mmu_notifier that exit has started. As VMA are cleaned up, TLB flushes
> are skipped because of the flag is set. When the GRU VMA is deleted, I free

GRU TLB flushes aren't skipped because your flag is set but because
__mmu_notifier_release already executed
list_del_init_rcu(&grunotifier->hlist) before proceeding with
unmap_vmas.

> my structure containing the notifier.

As long as nobody can write through the already established gru tlbs
and nobody can establish new tlbs after exit_mmap run you don't
strictly need ->release.

> I _think_ works. Do you see any problems?

You can remove the flag and ->release and ->clear_flush_young (if you
keep clear_flush_young implemented it should return 0). The
synchronize_rcu after mmu_notifier_register can also be dropped thanks
to mm_lock(). gru_drop_mmu_notifier should be careful with current->mm
if you're using an fd and if the fd can be passed to a different task
through unix sockets (you should probably fail any operation if
current->mm != gru->mm).

The way I use ->release in KVM is to set the root hpa to -1UL
(invalid) as a debug trap. That's only for debugging because even if
tlb entries and sptes are still established on the secondary mmu they
are only relevant when the cpu jumps to guest mode and that can never
happen again after exit_mmap is started.

> I should also mention that I have an open-coded function that possibly
> belongs in mmu_notifier.c. A user is allowed to have multiple GRU segments.
> Each GRU has a couple of data structures linked to the VMA. All, however,
> need to share the same notifier. I currently open code a function that
> scans the notifier list to determine if a GRU notifier already exists.
> If it does, I update a refcnt & use it. Otherwise, I register a new
> one. All of this is protected by the mmap_sem.
>
> Just in case I mangled the above description, I'll attach a copy of the GRU mmuops
> code.

Well that function needs fixing w.r.t. srcu. Are you sure you want to
search for mn->ops == gru_mmuops and not for mn == gmn? And if you
search for mn why can't you keep track of the mn being registered or
unregistered outside of the mmu_notifier layer? Set a bitflag in the
container after mmu_notifier_register returns and clear it after
_unregister returns. I doubt saving one bitflag is worth searching the
list and your approach makes it obvious that you've to protect the
bitflag and the register/unregister under write-mmap_sem
yourself. Otherwise the find function will return an object that can
be freed at any time if somebody calls unregister and
kfree. (synchronize_srcu in mmu_notifier_unregister won't wait for
anything but some outstanding srcu_read_lock)
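[Editorial sketch] The suggestion above is to remember in the driver's own container whether a notifier is registered, rather than searching the notifier list. A minimal user-space model of that bookkeeping might look as follows. Everything here is hypothetical (the struct, its fields, and both helpers); a real driver would embed its struct mmu_notifier in the container, call the kernel's register and unregister functions where the comments say so, and hold mmap_sem in write mode around both helpers.

```c
#include <stdbool.h>

/* Hypothetical container; a driver would embed its mmu_notifier here. */
struct gru_container {
	bool registered;	/* set after register returns, cleared after unregister */
	int refcnt;		/* number of segments sharing the one notifier */
};

/*
 * Take a reference, registering on first use.
 * The caller is assumed to hold the lock that protects the container.
 * Returns true if this call performed the registration.
 */
static bool container_get(struct gru_container *c)
{
	bool did_register = false;

	if (!c->registered) {
		/* a real driver would register its notifier here */
		c->registered = true;
		did_register = true;
	}
	c->refcnt++;
	return did_register;
}

/*
 * Drop a reference, unregistering on the last put.
 * Same locking assumption as container_get().
 * Returns true if this call performed the unregistration.
 */
static bool container_put(struct gru_container *c)
{
	if (--c->refcnt > 0)
		return false;
	/* a real driver would unregister its notifier here */
	c->registered = false;
	return true;
}
```

The flag makes the list search unnecessary: only the first get registers and only the last put unregisters, and both decisions are taken under the same lock.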
Re: [kvm-devel] problems running many guests
On Thursday 01 May 2008 7:16:53 pm Marcelo Tosatti wrote:
> Does -no-kvm-irqchip or -no-kvm-pit makes a difference? If not, please
> grab kvm_stat --once output when that happens.

Per some suggestions I have moved up to kvm-68 which is better, but still
having problems. Replicating the problem with only one guest spinning has
proven quite difficult, but attempting to boot a large smp guest can reliably
recreate the problem. Using -no-kvm-pit did not help the large guest
and -no-kvm-irqchip made it seize up even earlier with only 1 cpu spinning
instead of all of them.

> Also run readprofile -r ; readprofile -m System-map-of-guest.map with
> the host booted with profile=kvm. Make sure all guests are running the
> same kernel image.

I got this from a spinning 16-way guest with only 8 of the host CPUs online
and without either -no-kvm-irqchip or -no-kvm-pit:

[EMAIL PROTECTED] ~]# readprofile -r ; readprofile -m karl/System.map-2.6.25-03591-g873c05f
   101 native_read_tsc          3.4828
     1 read_persistent_clock    0.0192
    25 kvm_clock_read           0.2660
    95 getnstimeofday           0.7252
    13 update_wall_time         0.0138
     1 second_overflow          0.0020
readprofile: profile address out of range. Wrong map file?

The kvm_stat output during this is:

[EMAIL PROTECTED] ~]# kvm_stat --once
efer_reload            23354      0
exits                3587109   2250
fpu_reload           1934298      0
halt_exits              4583      0
halt_wakeup               42      0
host_state_reload    2165502    167
hypercalls              1482      0
insn_emulation        900199      0
insn_emulation_fail        0      0
invlpg                     0      0
io_exits             1983116      0
irq_exits             427728   2250
irq_window                 0      0
largepages                 0      0
mmio_exits            163522      0
mmu_cache_miss           176      0
mmu_flooded               99      0
mmu_pde_zapped           191      0
mmu_pte_updated           10      0
mmu_pte_write          59030      0
mmu_recycled               0      0
mmu_shadow_zapped         99      0
pf_fixed               14890      0
pf_guest                   0      0
remote_tlb_flush          29      0
request_irq                0      0
signal_exits               1      0
tlb_flush             481952      0

The output with -no-kvm-pit looked almost identical and with -no-kvm-pit there
was no samples registered for either tool.

--
Karl Rister
IBM Linux Performance Team
[EMAIL PROTECTED]
(512) 838-1553 (t/l 678)
Re: [kvm-devel] [PATCH 0 of 2] [RESEND] [PowerPC] Fix setting memory for bamboo board model
Jerone Young wrote:
> These patches fell through the cracks.

Unfortunately, the cracks are getting wider.

Anyway, applied, thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [PATCH 0/4] paravirt clock patches
Marcelo Tosatti wrote:
> F8 host, recent kvm-userspace.git (so with IO thread), recent kvm.git
> (plus your patches), haven't tried 2x but I think 4x is not necessary to
> reproduce the problem.

Ok, see it too. Seem to be actually two (maybe related) problems.

First the guest hangs hard after a while, burning 100% CPU time
(deadlocked I guess), doesn't respond to sysrq any more. Is there some
easy way to get the guest vcpu state then? EIP for starters, preferably
with stack trace?

The other one is that one ticks slower than the other. I don't see it
from start, but after a while it starts happening (unless the guest
deadlocks before ...).

cheers,
  Gerd

--
http://kraxel.fedorapeople.org/xenner/
[kvm-devel] KVM: PIT: take inject_pending into account when emulating hlt
Otherwise hlt emulation fails if PIT is not injecting IRQ's.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 1646102..07f9ff1 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -216,7 +216,7 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
 
-	if (pit && vcpu->vcpu_id == 0)
+	if (pit && vcpu->vcpu_id == 0 && pit->pit_state.inject_pending)
 		return atomic_read(&pit->pit_state.pit_timer.pending);
 
 	return 0;
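[Editorial sketch] The condition changed by this patch can be modelled outside the kernel to see its effect on a halted VCPU. The struct and function below are hypothetical simplifications, not the kernel's types: a pending tick count and an inject_pending flag stand in for the PIT state, and the function applies the same three-part test as the patched code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the in-kernel PIT state. */
struct pit_model {
	int pending;		/* timer ticks not yet delivered */
	bool inject_pending;	/* whether the PIT is injecting IRQs at all */
};

/*
 * Same shape as the patched test: a pending PIT timer is reported only
 * for VCPU 0 and only while injection is enabled. Without the third
 * condition, ticks that will never be injected would still count as a
 * wakeup reason for a halted VCPU.
 */
static int pit_model_has_pending_timer(const struct pit_model *pit, int vcpu_id)
{
	if (pit != NULL && vcpu_id == 0 && pit->inject_pending)
		return pit->pending;
	return 0;
}
```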
Re: [kvm-devel] problems running many guests
Hi Karl,

On Mon, May 05, 2008 at 08:40:22PM -0500, Karl Rister wrote:
> On Thursday 01 May 2008 7:16:53 pm Marcelo Tosatti wrote:
> > Does -no-kvm-irqchip or -no-kvm-pit makes a difference? If not, please
> > grab kvm_stat --once output when that happens.
>
> Per some suggestions I have moved up to kvm-68 which is better, but still
> having problems. Replicating the problem with only one guest spinning has
> proven quite difficult, but attempting to boot a large smp guest can reliably
> recreate the problem. Using -no-kvm-pit did not help the large guest
> and -no-kvm-irqchip made it seize up even earlier with only 1 cpu spinning
> instead of all of them.
>
> > Also run readprofile -r ; readprofile -m System-map-of-guest.map with
> > the host booted with profile=kvm. Make sure all guests are running the
> > same kernel image.
>
> I got this from a spinning 16-way guest with only 8 of the host CPUs online
> and without either -no-kvm-irqchip or -no-kvm-pit:
>
> [EMAIL PROTECTED] ~]# readprofile -r ; readprofile -m karl/System.map-2.6.25-03591-g873c05f
>    101 native_read_tsc          3.4828
>      1 read_persistent_clock    0.0192
>     25 kvm_clock_read           0.2660
>     95 getnstimeofday           0.7252
>     13 update_wall_time         0.0138
>      1 second_overflow          0.0020
> readprofile: profile address out of range. Wrong map file?

KVM clock has known problems with SMP guests, please disable it for now.

Also disable LOCKDEP on the guest if it has more VCPU's than CPU's
available in the host.
[kvm-devel] [PATCH] fixup 3dnow! support
qemu recently added support for 3dnow instructions. Because of
that, 3dnow will be featured among cpuid bits. But this will
break kvm in cpus that don't have those instructions (which
includes my laptop). So we fixup our cpuid before exposing it
to the guest.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c           |   22 ++++++++++++++++++----
 include/asm-x86/cpufeature.h |    2 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 979f983..e79fcd5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -919,7 +919,7 @@ static int is_efer_nx(void)
 	return efer & EFER_NX;
 }
 
-static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
+static void cpuid_fix_caps(struct kvm_vcpu *vcpu)
 {
 	int i;
 	struct kvm_cpuid_entry2 *e, *entry;
@@ -932,6 +932,20 @@ static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
 			break;
 		}
 	}
+
+	/* 3DNOWEXT */
+	if (entry && (entry->edx & (1 << 30)) && !cpu_has_3dnowext) {
+		entry->edx &= ~(1 << 30);
+		printk(KERN_INFO "kvm: guest 3DNOWEXT capability removed\n");
+	}
+
+	/* 3DNOW */
+	if (entry && (entry->edx & (1 << 31)) && !cpu_has_3dnow) {
+		entry->edx &= ~(1 << 31);
+		printk(KERN_INFO "kvm: guest 3DNOW capability removed\n");
+	}
+
+	/* NX */
 	if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) {
 		entry->edx &= ~(1 << 20);
 		printk(KERN_INFO "kvm: guest NX capability removed\n");
@@ -970,7 +984,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 		vcpu->arch.cpuid_entries[i].padding[2] = 0;
 	}
 	vcpu->arch.cpuid_nent = cpuid->nent;
-	cpuid_fix_nx_cap(vcpu);
+	cpuid_fix_caps(vcpu);
 	r = 0;
 
 out_free:
@@ -1061,8 +1075,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		bit(X86_FEATURE_LM) |
 #endif
 		bit(X86_FEATURE_MMXEXT) |
-		bit(X86_FEATURE_3DNOWEXT) |
-		bit(X86_FEATURE_3DNOW);
+		(bit(X86_FEATURE_3DNOWEXT) cpu_has_3dnowext) |
+		(bit(X86_FEATURE_3DNOW) cpu_has_3dnow);
 	const u32 kvm_supported_word3_x86_features =
 		bit(X86_FEATURE_XMM3) | bit(X86_FEATURE_CX16);
 	const u32 kvm_supported_word6_x86_features =
diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h
index 0d609c8..efbc5ce 100644
--- a/include/asm-x86/cpufeature.h
+++ b/include/asm-x86/cpufeature.h
@@ -187,6 +187,8 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_gbpages		boot_cpu_has(X86_FEATURE_GBPAGES)
 #define cpu_has_arch_perfmon	boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_pat		boot_cpu_has(X86_FEATURE_PAT)
+#define cpu_has_3dnow		boot_cpu_has(X86_FEATURE_3DNOW)
+#define cpu_has_3dnowext	boot_cpu_has(X86_FEATURE_3DNOWEXT)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1
--
1.5.0.6
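[Editorial sketch] The idea of the fixup in this patch, clearing guest-visible feature bits that the host cannot back, can be shown in isolation. The code below is only an illustration under stated assumptions: struct host_caps and fix_guest_edx() are hypothetical names, and the bit positions (20 for NX, 30 for 3DNow! extensions, 31 for 3DNow!) are the ones the patch applies to the guest's EDX value.

```c
#include <stdbool.h>
#include <stdint.h>

/* EDX bit positions as used in the patch above. */
#define EDX_NX		(1u << 20)
#define EDX_3DNOWEXT	(1u << 30)
#define EDX_3DNOW	(1u << 31)

/* Hypothetical summary of what the host CPU actually supports. */
struct host_caps {
	bool nx;
	bool has_3dnow;
	bool has_3dnowext;
};

/*
 * Hypothetical helper: return the guest's EDX with every bit cleared
 * that the host cannot honour; all other bits pass through unchanged.
 */
static uint32_t fix_guest_edx(uint32_t edx, const struct host_caps *host)
{
	if (!host->has_3dnowext)
		edx &= ~EDX_3DNOWEXT;
	if (!host->has_3dnow)
		edx &= ~EDX_3DNOW;
	if (!host->nx)
		edx &= ~EDX_NX;
	return edx;
}
```

On a host with NX but without 3DNow!, a guest value with all three bits set comes back with only the NX bit left, which is the behaviour the report about Intel hosts calls for.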
[kvm-devel] QEMU/KVM: fix copypaste bug in ACPI IRQ routing tables
Slots 9 and 25 were using the identifier of the previous slot.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index d2e33f4..c145c4b 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -269,10 +269,10 @@ DefinitionBlock (
                 Package() {0x0008, 3, LNKC, 0},
 
                 // PCI Slot 9
-                Package() {0x0008, 0, LNKA, 0},
-                Package() {0x0008, 1, LNKB, 0},
-                Package() {0x0008, 2, LNKC, 0},
-                Package() {0x0008, 3, LNKD, 0},
+                Package() {0x0009, 0, LNKA, 0},
+                Package() {0x0009, 1, LNKB, 0},
+                Package() {0x0009, 2, LNKC, 0},
+                Package() {0x0009, 3, LNKD, 0},
 
                 // PCI Slot 10
                 Package() {0x000a, 0, LNKB, 0},
@@ -365,10 +365,10 @@ DefinitionBlock (
                 Package() {0x0018, 3, LNKC, 0},
 
                 // PCI Slot 25
-                Package() {0x0018, 0, LNKA, 0},
-                Package() {0x0018, 1, LNKB, 0},
-                Package() {0x0018, 2, LNKC, 0},
-                Package() {0x0018, 3, LNKD, 0},
+                Package() {0x0019, 0, LNKA, 0},
+                Package() {0x0019, 1, LNKB, 0},
+                Package() {0x0019, 2, LNKC, 0},
+                Package() {0x0019, 3, LNKD, 0},
 
                 // PCI Slot 26
                 Package() {0x001a, 0, LNKB, 0},
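[Editorial sketch] The entries visible in this diff follow a rotation: slot 9 pin 0 maps to LNKA, slot 10 pin 0 to LNKB, and pin 3 of slots 8 and 24 to LNKC. Assuming that pattern holds for the rest of the table (it is inferred only from the lines quoted above, not from the full DSDT), the link for a given slot and pin could be computed as below. The function name is hypothetical.

```c
/*
 * Hypothetical helper: return 'A'..'D' for LNKA..LNKD.
 * Inferred rule: link index = (slot + pin - 1) mod 4; adding 3 instead
 * of subtracting 1 keeps the unsigned arithmetic from wrapping at slot 0.
 */
static char prt_link_for(unsigned int slot, unsigned int pin)
{
	return (char)('A' + (slot + pin + 3u) % 4u);
}
```

Under this rule slots 8 and 9 yield different links for the same pin, so entries that reuse identifier 0x0008 for slot 9 describe the wrong device, which is the copy-paste error the patch corrects.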
Re: [kvm-devel] [PATCH 0/4] paravirt clock patches
Gerd Hoffmann wrote:
> Marcelo Tosatti wrote:
> > F8 host, recent kvm-userspace.git (so with IO thread), recent kvm.git
> > (plus your patches), haven't tried 2x but I think 4x is not necessary to
> > reproduce the problem.
>
> Ok, see it too. Seem to be actually two (maybe related) problems.
>
> First the guest hangs hard after a while, burning 100% CPU time
> (deadlocked I guess), doesn't respond to sysrq any more. Is there some
> easy way to get the guest vcpu state then?

Hmm, info registers in qemu monitor hangs ...

cheers,
  Gerd

--
http://kraxel.fedorapeople.org/xenner/
Re: [kvm-devel] Protected mode transitions and big real mode... still an issue
On Tue, May 6, 2008 at 5:30 PM, Anthony Liguori [EMAIL PROTECTED] wrote:
> Guillaume Thouvenin wrote:
> > On Mon, 5 May 2008 16:29:21 +0300
> > Mohammed Gamal [EMAIL PROTECTED] wrote:
> >
> > > On Mon, May 5, 2008 at 3:57 PM, Anthony Liguori [EMAIL PROTECTED] wrote:
> > > > WinXP fails to boot with your patch applied too. FWIW, Ubuntu 8.04 has a
> > > > fixed version of gfxboot that doesn't do nasty things with SS on privileged
> > > > mode transitions.
> > >
> > > WinXP fails with the patch applied too. Ubuntu 7.10 live CD and FreeDOS
> > > don't boot but complain about instruction mov 0x11,sreg not being emulated.
> >
> > Can you try with this one please?
> > On my computer it boots ubuntu-8.04-desktop-i386.iso liveCD and also
> > openSUSE-10.3-GM-x86_64-mini.iso
>
> 8.04 is not a good test-case. 7.10 is what you want to try.
>
> The good news is, 7.10 appears to work! The bad news is that about 20% of
> the time, it crashes and displays the following:
>
> kvm_run: failed entry, reason 5
> kvm_run returned -8
>
> So something appears to be a bit buggy. Still, very good work!
>
> Regards,
>
> Anthony Liguori

7.10 liveCD doesn't work with me at all. It only works with -no-kvm
[kvm-devel] mmu notifier v15 - v16 diff
Hello everyone,

This is to allow GRU code to call __mmu_notifier_register inside the
mmap_sem (write mode is required as documented in the patch).

It also removes the requirement to implement ->release as it's not
guaranteed all users will really need it.

I didn't integrate the search function as we can sort that out after
2.6.26 is out and it wasn't entirely obvious it's really needed, as
the driver should be able to track if a mmu notifier is registered in
the container.

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -29,10 +29,25 @@ struct mmu_notifier_ops {
 	/*
 	 * Called either by mmu_notifier_unregister or when the mm is
 	 * being destroyed by exit_mmap, always before all pages are
-	 * freed. It's mandatory to implement this method. This can
-	 * run concurrently with other mmu notifier methods and it
+	 * freed. This can run concurrently with other mmu notifier
+	 * methods (the ones invoked outside the mm context) and it
 	 * should tear down all secondary mmu mappings and freeze the
-	 * secondary mmu.
+	 * secondary mmu. If this method isn't implemented you've to
+	 * be sure that nothing could possibly write to the pages
+	 * through the secondary mmu by the time the last thread with
+	 * tsk->mm == mm exits.
+	 *
+	 * As side note: the pages freed after ->release returns could
+	 * be immediately reallocated by the gart at an alias physical
+	 * address with a different cache model, so if ->release isn't
+	 * implemented because all _software_ driven memory accesses
+	 * through the secondary mmu are terminated by the time the
+	 * last thread of this mm quits, you've also to be sure that
+	 * speculative _hardware_ operations can't allocate dirty
+	 * cachelines in the cpu that could not be snooped and made
+	 * coherent with the other read and write operations happening
+	 * through the gart alias address, so leading to memory
+	 * corruption.
 	 */
 	void (*release)(struct mmu_notifier *mn,
 			struct mm_struct *mm);
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2340,13 +2340,20 @@ static inline void __mm_unlock(spinlock_
 /*
  * This operation locks against the VM for all pte/vma/mm related
  * operations that could ever happen on a certain mm. This includes
- * vmtruncate, try_to_unmap, and all page faults. The holder
- * must not hold any mm related lock. A single task can't take more
- * than one mm_lock in a row or it would deadlock.
+ * vmtruncate, try_to_unmap, and all page faults.
  *
- * The mmap_sem must be taken in write mode to block all operations
- * that could modify pagetables and free pages without altering the
- * vma layout (for example populate_range() with nonlinear vmas).
+ * The caller must take the mmap_sem in read or write mode before
+ * calling mm_lock(). The caller isn't allowed to release the mmap_sem
+ * until mm_unlock() returns.
+ *
+ * While mm_lock() itself won't strictly require the mmap_sem in write
+ * mode to be safe, in order to block all operations that could modify
+ * pagetables and free pages without need of altering the vma layout
+ * (for example populate_range() with nonlinear vmas) the mmap_sem
+ * must be taken in write mode by the caller.
+ *
+ * A single task can't take more than one mm_lock in a row or it would
+ * deadlock.
  *
  * The sorting is needed to avoid lock inversion deadlocks if two
  * tasks run mm_lock at the same time on different mm that happen to
@@ -2377,17 +2384,13 @@ int mm_lock(struct mm_struct *mm, struct
 {
 	spinlock_t **anon_vma_locks, **i_mmap_locks;
 
-	down_write(&mm->mmap_sem);
 	if (mm->map_count) {
 		anon_vma_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count);
-		if (unlikely(!anon_vma_locks)) {
-			up_write(&mm->mmap_sem);
+		if (unlikely(!anon_vma_locks))
 			return -ENOMEM;
-		}
 
 		i_mmap_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count);
 		if (unlikely(!i_mmap_locks)) {
-			up_write(&mm->mmap_sem);
 			vfree(anon_vma_locks);
 			return -ENOMEM;
 		}
@@ -2426,10 +2429,12 @@ static void mm_unlock_vfree(spinlock_t *
 /*
  * mm_unlock doesn't require any memory allocation and it won't fail.
  *
+ * The mmap_sem cannot be released until mm_unlock returns.
+ *
  * All memory has been previously allocated by mm_lock and it'll be
  * all freed before returning. Only after mm_unlock returns, the
  * caller is allowed to free and forget the mm_lock_data structure.
- *
+ *
  * mm_unlock runs in O(N) where N is the max number of VMAs in the
  * mm. The max number of vmas is defined in
  *
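The lock-ordering argument in the mm_lock comment above can be
illustrated outside the kernel. What follows is a minimal userspace
sketch, not the code from the patch: it uses pthread mutexes in place of
the anon_vma and i_mmap spinlocks, and the helper names (cmp_lock_addr,
lock_all_sorted, unlock_all_sorted) are made up for illustration. If
every task sorts the locks it needs by address before taking them, two
tasks that share some locks always acquire the shared ones in the same
order, so neither can hold one while waiting for the other. Skipping
adjacent duplicates is an assumption of this sketch for the case where
several entries point at the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Compare two pthread_mutex_t pointers by address so that qsort() can
 * impose one global acquisition order. */
static int cmp_lock_addr(const void *a, const void *b)
{
	uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
	uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;

	return (x > y) - (x < y);
}

/* Hypothetical helper: sort the array in place, then take every lock in
 * ascending address order. A lock that appears more than once is only
 * taken once. Returns the number of distinct locks taken. */
static size_t lock_all_sorted(pthread_mutex_t **locks, size_t n)
{
	size_t i, taken = 0;

	assert(locks != NULL || n == 0);
	qsort(locks, n, sizeof(*locks), cmp_lock_addr);
	for (i = 0; i < n; i++) {
		if (i > 0 && locks[i] == locks[i - 1])
			continue;	/* same lock seen again */
		pthread_mutex_lock(locks[i]);
		taken++;
	}
	return taken;
}

/* Release every distinct lock; the array must still be sorted. */
static void unlock_all_sorted(pthread_mutex_t **locks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (i > 0 && locks[i] == locks[i - 1])
			continue;
		pthread_mutex_unlock(locks[i]);
	}
}
```

A caller would fill an array with the locks it needs, call
lock_all_sorted(), do its work, and then call unlock_all_sorted() on the
same (now sorted) array.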
Re: [kvm-devel] [PATCH 1/4] Replace SIGUSR1 in io-thread with eventfd() (v2)
Looks good (the whole series). Needs some good testing of course...

Have you tested migration/loadvm?

On Mon, May 05, 2008 at 08:47:12AM -0500, Anthony Liguori wrote:
> It's a little odd to use signals to raise a notification on a file
> descriptor when we can just work directly with a file descriptor
> instead. This patch converts the SIGUSR1 based notification in the
> io-thread to instead use an eventfd file descriptor. If eventfd isn't
> available, we use a pipe() instead.
>
> The benefit of using eventfd is that multiple notifications will be
> batched into a single IO event.
>
> Signed-off-by: Anthony Liguori [EMAIL PROTECTED]
Re: [kvm-devel] [PATCH 1/4] Replace SIGUSR1 in io-thread with eventfd() (v2)
Marcelo Tosatti wrote:
> Looks good (the whole series). Needs some good testing of course...
>
> Have you tested migration/loadvm?

No, but I will before resubmitting (which should be sometime tomorrow).

Regards,
Anthony Liguori

> On Mon, May 05, 2008 at 08:47:12AM -0500, Anthony Liguori wrote:
>> It's a little odd to use signals to raise a notification on a file
>> descriptor when we can just work directly with a file descriptor
>> instead. This patch converts the SIGUSR1 based notification in the
>> io-thread to instead use an eventfd file descriptor. If eventfd isn't
>> available, we use a pipe() instead.
>>
>> The benefit of using eventfd is that multiple notifications will be
>> batched into a single IO event.
>>
>> Signed-off-by: Anthony Liguori [EMAIL PROTECTED]
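For readers unfamiliar with the mechanism discussed in this thread, here
is a minimal, Linux-only sketch of an eventfd notifier with a pipe()
fallback. It is not the code from the patch series: struct notifier,
notifier_init(), notifier_signal() and notifier_drain() are hypothetical
names, and the force_pipe argument exists only so the fallback path can
be exercised. It shows the batching described above: several signals on
an eventfd collapse into one counter that a single read() returns,
whereas the pipe carries one byte per signal.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical notifier: rfd is what the io-thread would poll, wfd is
 * written by whoever wants to wake it. With eventfd both are one fd. */
struct notifier {
	int rfd;
	int wfd;
	int is_eventfd;
};

static int notifier_init(struct notifier *n, int force_pipe)
{
	int fds[2];
	int fd;

	assert(n != NULL);
	fd = force_pipe ? -1 : eventfd(0, EFD_NONBLOCK);
	if (fd >= 0) {
		n->rfd = n->wfd = fd;
		n->is_eventfd = 1;
		return 0;
	}
	/* Fallback: a non-blocking pipe, one byte per notification. */
	if (pipe(fds) < 0)
		return -errno;
	fcntl(fds[0], F_SETFL, O_NONBLOCK);
	fcntl(fds[1], F_SETFL, O_NONBLOCK);
	n->rfd = fds[0];
	n->wfd = fds[1];
	n->is_eventfd = 0;
	return 0;
}

/* Raise one notification. */
static int notifier_signal(struct notifier *n)
{
	uint64_t one = 1;
	ssize_t len = n->is_eventfd ? write(n->wfd, &one, sizeof(one))
				    : write(n->wfd, "", 1);

	return len < 0 ? -errno : 0;
}

/* Consume everything pending and return how many notifications it
 * represented (0 if nothing was pending). */
static long notifier_drain(struct notifier *n)
{
	uint64_t count;
	char buf[64];
	long total = 0;
	ssize_t len;

	if (n->is_eventfd) {
		/* One read returns the whole counter and resets it. */
		len = read(n->rfd, &count, sizeof(count));
		return len == (ssize_t)sizeof(count) ? (long)count : 0;
	}
	while ((len = read(n->rfd, buf, sizeof(buf))) > 0)
		total += len;
	return total;
}
```

In a real io-thread the read end would sit in the select()/poll() set;
the sketch leaves that out, along with closing the descriptors.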
Re: [kvm-devel] [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
>> +
>> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
>> +
>> +struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
>> +struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu);
>> +void iommu_free_domain(struct dmar_domain *domain);
>> +int domain_init(struct dmar_domain *domain, int guest_width);
>> +int domain_context_mapping(struct dmar_domain *d,
>> +	struct pci_dev *pdev);
>> +int domain_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
>> +	u64 hpa, size_t size, int prot);
>> +void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
>> +struct dmar_domain * find_domain(struct pci_dev *pdev);
>
> Please move these to a .h file and also prefix appropriate keywords:
>
> domain_context_mapping is confusing and since it's an intel iommu-only
> thing, use something like
>
> intel_iommu_domain_context_mapping

These functions are currently just direct calls into existing functions
in drivers/pci/intel-iommu.c, hence the lack of more descriptive names
in the KVM environment. To get more relevant names in the KVM
environment, we can either create wrappers for these functions or use
an iommu function table.

Allen
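The "iommu function table" option mentioned above could look roughly
like the following. This is a userspace sketch under stated assumptions,
not an existing kernel interface: struct sketch_iommu_ops,
sketch_iommu_register(), sketch_iommu_map_page() and the fake backend
are all hypothetical names. It shows generic code calling through a
table of function pointers, so a VT-d backend can keep its own function
names private.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_domain;	/* opaque, owned by the backend */

/* Hypothetical operations table; none of these names come from the patch. */
struct sketch_iommu_ops {
	const char *name;
	int (*map_page)(struct sketch_domain *d, uint64_t iova,
			uint64_t hpa, size_t size, int prot);
};

static const struct sketch_iommu_ops *active_ops;

/* A backend registers its table once; a second registration is refused. */
static int sketch_iommu_register(const struct sketch_iommu_ops *ops)
{
	if (!ops || !ops->map_page)
		return -EINVAL;
	if (active_ops)
		return -EBUSY;
	active_ops = ops;
	return 0;
}

/* Generic entry point: callers never see backend-specific names. */
static int sketch_iommu_map_page(struct sketch_domain *d, uint64_t iova,
				 uint64_t hpa, size_t size, int prot)
{
	if (!active_ops)
		return -ENODEV;
	return active_ops->map_page(d, iova, hpa, size, prot);
}

/* Stand-in backend that only records the last request it saw. */
static uint64_t fake_last_iova;

static int fake_vtd_map_page(struct sketch_domain *d, uint64_t iova,
			     uint64_t hpa, size_t size, int prot)
{
	(void)d; (void)hpa; (void)size; (void)prot;
	fake_last_iova = iova;
	return 0;
}

static const struct sketch_iommu_ops fake_vtd_ops = {
	.name = "fake-vtd",
	.map_page = fake_vtd_map_page,
};
```

Plain wrappers would be simpler if only one IOMMU implementation is ever
expected; the table costs an indirection but lets a second backend plug
in without touching the callers.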
Re: [kvm-devel] [PATCH] fixup 3dnow! support
Alexander Graf wrote:
> On May 6, 2008, at 6:27 PM, Glauber Costa wrote:
>> qemu recently added support for 3dnow instructions. Because of that,
>> 3dnow will be featured among cpuid bits. But this will break kvm on
>> cpus that don't have those instructions (which includes my laptop).
>> So we fixup our cpuid before exposing it to the guest.
>
> I actually don't see where the problem is here. As far as I read the
> code, the CPUID feature function gets received from the host CPU and
> bitwise ANDed with a bunch of features that are known to work. What's
> wrong with that approach?

The problem is probably that, besides those known-to-work features,
there are also features that qemu puts in unconditionally. Among them,
3DNOW.

> But I'm pretty sure Dao can tell us a lot more about this.

Sure, it would be welcome.
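The fixup being discussed amounts to a bit mask. The sketch below is not
the patch itself; fixup_guest_features() and fixup_example() are
hypothetical names, and the only hardware facts it relies on are that
3DNow! and extended 3DNow! are reported in bits 31 and 30 of EDX for
CPUID leaf 0x80000001. It clears those bits from the guest's feature
word when the host does not report them, and leaves every other bit as
userspace set it.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX: AMD 3DNow! feature bits. */
#define CPUID_EXT2_3DNOWEXT	(1u << 30)
#define CPUID_EXT2_3DNOW	(1u << 31)

/* Hypothetical helper: keep a host-dependent bit in the guest's feature
 * word only if the host CPU reports it as well. */
static uint32_t fixup_guest_features(uint32_t guest_edx, uint32_t host_edx)
{
	uint32_t host_dependent = CPUID_EXT2_3DNOW | CPUID_EXT2_3DNOWEXT;

	/* Clear host-dependent bits the host lacks; leave the rest alone. */
	return guest_edx & (host_edx | ~host_dependent);
}

/* Usage: a host without 3DNow! loses the bit, other bits survive. */
static void fixup_example(void)
{
	uint32_t guest = CPUID_EXT2_3DNOW | 0x1u;

	assert(fixup_guest_features(guest, 0x1u) == 0x1u);
}
```

ANDing the whole word with the host's value would also drop the bit, but
it would additionally remove any feature userspace emulates on purpose,
which is why the sketch restricts the mask to the two 3DNow! bits.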
Re: [kvm-devel] [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
Kay, Allen M wrote:
> Kvm kernel changes.
>
> Signed-off-by: Allen M Kay [EMAIL PROTECTED]
>
> --
>  arch/x86/kvm/Makefile      |    2
>  arch/x86/kvm/vtd.c         |  183 +
>  arch/x86/kvm/x86.c         |    7 +
>  include/asm-x86/kvm_host.h |    3
>  include/asm-x86/kvm_para.h |    1
>  include/linux/kvm_host.h   |    6 +
>  virt/kvm/kvm_main.c        |    3
>  7 files changed, 204 insertions(+), 1 deletion(-)
>
> --
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index c97d35c..b1057fb 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -12,7 +12,7 @@ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
>  	i8254.o
>  obj-$(CONFIG_KVM) += kvm.o
> -kvm-intel-objs = vmx.o
> +kvm-intel-objs = vmx.o vtd.o
>  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
>  kvm-amd-objs = svm.o
>  obj-$(CONFIG_KVM_AMD) += kvm-amd.o
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> new file mode 100644
> index 000..9a080b5
> --- /dev/null
> +++ b/arch/x86/kvm/vtd.c
> @@ -0,0 +1,183 @@
> +/*
> + * Copyright (c) 2006, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + * Copyright (C) 2006-2008 Intel Corporation
> + * Author: Allen M. Kay [EMAIL PROTECTED]
> + * Author: Weidong Han [EMAIL PROTECTED]
> + */
> +
> +#include <linux/list.h>
> +#include <linux/kvm_host.h>
> +#include <linux/pci.h>
> +#include <linux/dmar.h>
> +#include <linux/intel-iommu.h>
> +
> +//#define DEBUG
> +
> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
> +
> +struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
> +struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu);
> +void iommu_free_domain(struct dmar_domain *domain);
> +int domain_init(struct dmar_domain *domain, int guest_width);
> +int domain_context_mapping(struct dmar_domain *d,
> +	struct pci_dev *pdev);
> +int domain_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
> +	u64 hpa, size_t size, int prot);
> +void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
> +struct dmar_domain * find_domain(struct pci_dev *pdev);

These definitely need to be moved to a common header.

> +
> +int kvm_iommu_map_pages(struct kvm *kvm,
> +	gfn_t base_gfn, unsigned long npages)
> +{
> +	unsigned long gpa;
> +	struct page *page;
> +	hpa_t hpa;
> +	int j, write;
> +	struct vm_area_struct *vma;
> +
> +	if (!kvm->arch.domain)
> +		return 1;

In the kernel, we should be using -errno to return error codes.

> +	gpa = base_gfn << PAGE_SHIFT;
> +	page = gfn_to_page(kvm, base_gfn);
> +	hpa = page_to_phys(page);

Please use gfn_to_pfn(). Keep in mind, by using
gfn_to_page/gfn_to_pfn, you take a reference to a page. You're leaking
that reference here.

> +	printk(KERN_DEBUG "kvm_iommu_map_page: gpa = %lx\n", gpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: hpa = %llx\n", hpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: size = %lx\n",
> +		npages*PAGE_SIZE);
> +
> +	for (j = 0; j < npages; j++) {
> +		gpa += PAGE_SIZE;
> +		page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
> +		hpa = page_to_phys(page);

Again, gfn_to_pfn() and you're taking a reference that I never see you
releasing.

> +		domain_page_mapping(kvm->arch.domain, gpa, hpa, PAGE_SIZE,
> +			DMA_PTE_READ | DMA_PTE_WRITE);
> +		vma = find_vma(current->mm, gpa);
> +		if (!vma)
> +			return 1;
> +		write = (vma->vm_flags & VM_WRITE) != 0;
> +		get_user_pages(current, current->mm, gpa,
> +			PAGE_SIZE, write, 0, NULL, NULL);

I don't quite see what you're doing here. It looks like you're trying
to pre-fault the page in? gfn_to_pfn will do that for you. You're
taking a bunch of references here that are never getting released.

I think the general approach here is a bit faulty. I think what we want
to do is mlock() from userspace to ensure all the memory is present for
the guest. We should combine this with MMU-notifiers such that whenever
the userspace mapping changes, we can reprogram the IOMMU.

In the case where we don't have MMU-notifiers, we simply hold on to the
memory forever and
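Two of the review points above, returning -errno and not leaking page
references, can be shown in a small userspace mock. This is only a
sketch: mock_get_pfn() and mock_put_pfn() are hypothetical stand-ins for
the reference that gfn_to_pfn() takes in the kernel, the page_refs[]
array replaces real struct page refcounts, and sketch_map_pages() is not
the function from the patch. The point is the shape of the error path:
every reference taken before a failure is dropped before the negative
errno is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_NPAGES 8

/* Mock refcount per guest frame. All names here are hypothetical. */
static int page_refs[SKETCH_NPAGES];

/* Take a reference on a guest frame and return a fake host frame
 * number, or a negative errno if the frame does not exist. */
static long mock_get_pfn(unsigned long gfn)
{
	if (gfn >= SKETCH_NPAGES)
		return -EFAULT;
	page_refs[gfn]++;
	return (long)(gfn + 100);
}

static void mock_put_pfn(unsigned long gfn)
{
	assert(page_refs[gfn] > 0);
	page_refs[gfn]--;
}

/* Map npages starting at base_gfn. On success each mapped page keeps
 * one reference, owned by the mapping. On failure all references taken
 * so far are dropped and a negative errno is returned. */
static int sketch_map_pages(int have_domain, unsigned long base_gfn,
			    unsigned long npages)
{
	unsigned long i;
	long pfn = 0;

	if (!have_domain)
		return -ENODEV;

	for (i = 0; i < npages; i++) {
		pfn = mock_get_pfn(base_gfn + i);
		if (pfn < 0)
			goto unwind;
		/* a real driver would program the IOMMU entry here */
	}
	return 0;

unwind:
	while (i-- > 0)
		mock_put_pfn(base_gfn + i);
	return (int)pfn;
}
```

Whether the mapping should keep a reference at all is exactly what the
rest of this thread debates; the sketch only shows that whichever
references are taken must be balanced.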
Re: [kvm-devel] [RFC] [VTD][patch 3/3] vt-d support for pci passthrough: kvm-intel-iommu.patch
Kay, Allen M wrote:
> Intel-iommu driver changes for kvm vt-d support. Important changes are
> in intel-iommu.c. The rest of the changes are for moving intel-iommu.h
> and iova.h from drivers/pci directory to include/linux directory.
>
> Signed-off-by: Allen M Kay [EMAIL PROTECTED]
>
>  b/drivers/pci/dmar.c          |    4
>  b/drivers/pci/intel-iommu.c   |   26 ++-
>  b/drivers/pci/iova.c          |    2
>  b/include/linux/intel-iommu.h |  344 ++
>  b/include/linux/iova.h        |   52 ++
>  drivers/pci/intel-iommu.h     |  344 --
>  drivers/pci/iova.h            |   52 --
>  7 files changed, 416 insertions(+), 408 deletions(-)
>
> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
> index f941f60..a58a5b0 100644
> --- a/drivers/pci/dmar.c
> +++ b/drivers/pci/dmar.c
> @@ -26,8 +26,8 @@
>
>  #include <linux/pci.h>
>  #include <linux/dmar.h>
> -#include "iova.h"
> -#include "intel-iommu.h"
> +#include <linux/iova.h>
> +#include <linux/intel-iommu.h>
>
>  #undef PREFIX
>  #define PREFIX "DMAR:"
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 4cb949f..bfa888b 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -31,8 +31,8 @@
>  #include <linux/dmar.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/mempool.h>
> -#include "iova.h"
> -#include "intel-iommu.h"
> +#include <linux/iova.h>
> +#include <linux/intel-iommu.h>
>  #include <asm/proto.h> /* force_iommu in this header in x86-64*/
>  #include <asm/cacheflush.h>
>  #include <asm/gart.h>
> @@ -1056,7 +1056,7 @@ static void free_iommu(struct intel_iommu *iommu)
>  	kfree(iommu);
>  }
>
> -static struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
> +struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
>  {
>  	unsigned long num;
>  	unsigned long ndomains;
> @@ -1086,8 +1086,9 @@ static struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
>
>  	return domain;
>  }
> +EXPORT_SYMBOL_GPL(iommu_alloc_domain);
>
> -static void iommu_free_domain(struct dmar_domain *domain)
> +void iommu_free_domain(struct dmar_domain *domain)
>  {
>  	unsigned long flags;
>
> @@ -1095,6 +1096,7 @@ static void iommu_free_domain(struct dmar_domain *domain)
>  	clear_bit(domain->id, domain->iommu->domain_ids);
>  	spin_unlock_irqrestore(&domain->iommu->lock, flags);
>  }
> +EXPORT_SYMBOL_GPL(iommu_free_domain);
>
>  static struct iova_domain reserved_iova_list;
>  static struct lock_class_key reserved_alloc_key;
> @@ -1160,7 +1162,7 @@ static inline int guestwidth_to_adjustwidth(int gaw)
>  	return agaw;
>  }
>
> -static int domain_init(struct dmar_domain *domain, int guest_width)
> +int domain_init(struct dmar_domain *domain, int guest_width)
>  {

I think it's already been mentioned, but these are pretty terrible
names if you're exporting these symbols. Linux supports other IOMMUs so
VT-d should not be hogging the iommu_* namespace.

Regards,
Anthony Liguori
Re: [kvm-devel] [RFC] [VTD][patch 2/3] vt-d support for pci passthrough: kvm-vtd-user.patch
Avi Kivity wrote:
> Kay, Allen M wrote:
>> Still todo: move vt.d to kvm-intel.ko module.
>
> Not sure it's the right thing to do. If we get the iommus abstracted
> properly, we can rename vtd.c to dma.c and move it to virt/kvm/.
> The code is certainly a lot more about managing memory than anything
> vmx specific.

It's hardly x86 specific, even. Really, an external interface to KVM
that allowed someone to query the GPA => PA mapping would suffice. It
should not fault in pages that aren't present and we should provide
notifications for when the mapping changes for a given reason.

Userspace can enforce the requirement that memory remains present via
mlock(). This allows us to implement a PV API for DMA registration
without the IOMMU code having any particular knowledge of it.

Regards,
Anthony Liguori
Re: [kvm-devel] [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
>> We have to ensure we don't swap KVM guest memory while using hardware
>> pass-through, but AFAICT, we do not need to make the memory
>> non-reclaimable. As long as we reprogram the IOMMU with a new, valid,
>> mapping everything should be fine.
>
> mlock() really gives us the right semantics. Semantically, a PV API
> that supports DMA window registration simply mlock()s the DMA regions
> on behalf of the guest. No special logic should be needed.

What should be done for an unmodified guest where there is no PV driver
in the guest? Would a call to mlock() from
qemu/hw/pci-passthrough.c/add_pci_passthrough_device() be a reasonable
thing to do?

Allen
Re: [kvm-devel] [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
Kay, Allen M wrote:
>>> We have to ensure we don't swap KVM guest memory while using hardware
>>> pass-through, but AFAICT, we do not need to make the memory
>>> non-reclaimable. As long as we reprogram the IOMMU with a new, valid,
>>> mapping everything should be fine.
>>
>> mlock() really gives us the right semantics. Semantically, a PV API
>> that supports DMA window registration simply mlock()s the DMA regions
>> on behalf of the guest. No special logic should be needed.
>
> What should be done for an unmodified guest where there is no PV driver
> in the guest? Would a call to mlock() from
> qemu/hw/pci-passthrough.c/add_pci_passthrough_device() be a reasonable
> thing to do?

Yup. The idea is to ensure that the memory is always present, without
necessarily taking a reference to it. This allows for memory reclaiming
which should allow for things like NUMA page migration. We can't swap of
course but that doesn't mean reclamation isn't useful.

Regards,
Anthony Liguori

> Allen
Re: [kvm-devel] Protected mode transitions and big real mode... still an issue
On Tue, 06 May 2008 09:30:44 -0500
Anthony Liguori [EMAIL PROTECTED] wrote:
> 8.04 is not a good test-case. 7.10 is what you want to try.

Oh yes, you're right. I tried 8.04 because Balaji had problems booting
it with the patch.

> The good news is, 7.10 appears to work! The bad news is that about 20%
> of the time, it crashes and displays the following:
>
> kvm_run: failed entry, reason 5
> kvm_run returned -8
>
> So something appears to be a bit buggy. Still, very good work!

I can see the problem with openSUSE 10.3 too, but not so often. I'm
looking into this issue.

Thank you for the help,
Regards,
Guillaume