[COMMIT stable-2.6.30] Define true and false bool constants
From: Avi Kivity a...@redhat.com Needed by iommu.h. Signed-off-by: Avi Kivity a...@redhat.com diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h index 581d867..d473ae7 100644 --- a/external-module-compat-comm.h +++ b/external-module-compat-comm.h @@ -199,6 +199,9 @@ uint64_t div64_u64(uint64_t dividend, uint64_t divisor); typedef _Bool bool; +#define false 0 +#define true 1 + #endif #endif -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT stable-2.6.30] Include asm/uaccess.h on pre-2.6.18 kernels
From: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h index d473ae7..e2342db 100644 --- a/external-module-compat-comm.h +++ b/external-module-compat-comm.h @@ -302,7 +302,11 @@ static inline void pagefault_enable(void) #endif +#if LINUX_VERSION_CODE KERNEL_VERSION(2,6,18) +#include asm/uaccess.h +#else #include linux/uaccess.h +#endif /* vm ops -fault() was introduced in 2.6.23. */ #include linux/mm.h -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT stable-2.6.30] Update source link for v2.6.30
From: Avi Kivity a...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/linux-2.6 b/linux-2.6 index 9fa7eb2..07a2039 16 --- a/linux-2.6 +++ b/linux-2.6 @@ -1 +1 @@ -Subproject commit 9fa7eb283c5cdc2b0f4a8cfe6387ed82e5e9a3d3 +Subproject commit 07a2039b8eb0af4ff464efd3dfd95de5c02648c6 -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] bios: Fix MADT table creation
From: Beth Kon e...@us.ibm.com Correct MADT table size calculation. Based on patch from Vincent Minet. Signed-off-by: Beth Kon e...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 369cbef..cdae363 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -86,6 +86,8 @@ typedef unsigned long long uint64_t; #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg)) #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1) +#define MAX_INT_OVERRIDES 16 + static inline void outl(int addr, int val) { asm volatile (outl %1, %w0 : : d (addr), a (val)); @@ -1600,7 +1602,7 @@ void acpi_bios_init(void) uint32_t hpet_addr; #endif uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr; -uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size; +uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end; uint32_t srat_addr,srat_size; uint16_t i, external_tables; int nb_numa_nodes; @@ -1668,7 +1670,7 @@ void acpi_bios_init(void) madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + #ifdef BX_QEMU -sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */; +sizeof(struct madt_io_apic) + sizeof(struct madt_int_override) * MAX_INT_OVERRIDES; #else sizeof(struct madt_io_apic); #endif @@ -1786,8 +1788,9 @@ void acpi_bios_init(void) continue; } int_override++; -madt_size += sizeof(struct madt_int_override); } +madt_end = (uint32_t)int_override; +madt_size = madt_end - madt_addr; acpi_build_table_header((struct acpi_table_header *)madt, APIC, madt_size, 1); } -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MCE: Fix compiling without CONFIG_X86_MCE
From: Huang Ying ying.hu...@intel.com Enclose do_machine_check with #ifdef CONFIG_X86_MCE. Reported-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Huang Ying ying.hu...@intel.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 673bcb3..8c60db6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2681,12 +2681,14 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu, */ static void kvm_machine_check(void) { +#if defined(CONFIG_X86_MCE) defined(CONFIG_X86_64) struct pt_regs regs = { .cs = 3, /* Fake ring 3 no matter what the guest ran on */ .flags = X86_EFLAGS_IF, }; do_machine_check(regs, 0); +#endif } static int handle_machine_check(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte
From: Avi Kivity a...@redhat.com Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 8e5003c..415630d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -240,12 +240,12 @@ static int is_writeble_pte(unsigned long pte) return pte PT_WRITABLE_MASK; } -static int is_dirty_pte(unsigned long pte) +static int is_dirty_gpte(unsigned long pte) { return pte PT_DIRTY_MASK; } -static int is_rmap_pte(u64 pte) +static int is_rmap_spte(u64 pte) { return is_shadow_present_pte(pte); } @@ -498,7 +498,7 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage) unsigned long *rmapp; int i; - if (!is_rmap_pte(*spte)) + if (!is_rmap_spte(*spte)) return; gfn = unalias_gfn(vcpu-kvm, gfn); sp = page_header(__pa(spte)); @@ -560,7 +560,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) unsigned long *rmapp; int i; - if (!is_rmap_pte(*spte)) + if (!is_rmap_spte(*spte)) return; sp = page_header(__pa(spte)); pfn = spte_to_pfn(*spte); @@ -1747,7 +1747,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, __func__, *shadow_pte, pt_access, write_fault, user_fault, gfn); - if (is_rmap_pte(*shadow_pte)) { + if (is_rmap_spte(*shadow_pte)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. @@ -1783,7 +1783,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, page_header_update_slot(vcpu-kvm, shadow_pte, gfn); if (!was_rmapped) { rmap_add(vcpu, shadow_pte, gfn, largepage); - if (!is_rmap_pte(*shadow_pte)) + if (!is_rmap_spte(*shadow_pte)) kvm_release_pfn_clean(pfn); } else { if (was_writeble) @@ -1960,7 +1960,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) ASSERT(!VALID_PAGE(root)); if (vcpu-arch.mmu.root_level == PT32E_ROOT_LEVEL) { pdptr = kvm_pdptr_read(vcpu, i); - if (!is_present_pte(pdptr)) { + if (!is_present_gpte(pdptr)) { vcpu-arch.mmu.pae_root[i] = 0; continue; } @@ -2451,7 +2451,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, if ((bytes == 4) (gpa % 4 == 0)) memcpy((void *)gpte, new, 4); } - if (!is_present_pte(gpte)) + if (!is_present_gpte(gpte)) return; gfn = (gpte PT64_BASE_ADDR_MASK) PAGE_SHIFT; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3494a2f..016bf71 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -75,7 +75,7 @@ static inline int is_paging(struct kvm_vcpu *vcpu) return vcpu-arch.cr0 X86_CR0_PG; } -static inline int is_present_pte(unsigned long pte) +static inline int is_present_gpte(unsigned long pte) { return pte PT_PRESENT_MASK; } diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 4cb1dbf..238a193 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -132,7 +132,7 @@ walk: #if PTTYPE == 64 if (!is_long_mode(vcpu)) { pte = kvm_pdptr_read(vcpu, (addr 30) 3); - if (!is_present_pte(pte)) + if (!is_present_gpte(pte)) goto not_present; --walker-level; } @@ -155,7 +155,7 @@ walk: kvm_read_guest(vcpu-kvm, pte_gpa, pte, sizeof(pte)); - if (!is_present_pte(pte)) + if (!is_present_gpte(pte)) goto not_present; rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker-level); @@ -205,7 +205,7 @@ walk: --walker-level; } - if (write_fault !is_dirty_pte(pte)) { + if (write_fault !is_dirty_gpte(pte)) { bool ret; mark_page_dirty(vcpu-kvm, table_gfn); @@ -252,7 +252,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page, gpte = *(const pt_element_t *)pte; if (~gpte (PT_PRESENT_MASK | PT_ACCESSED_MASK)) { - if (!is_present_pte(gpte)) + if (!is_present_gpte(gpte)) set_shadow_pte(spte, shadow_notrap_nonpresent_pte); return; } @@ -289,7 +289,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, pt_element_t curr_pte; struct
[COMMIT master] KVM: MMU: s/shadow_pte/spte/
From: Avi Kivity a...@redhat.com We use shadow_pte and spte inconsistently, switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness. Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 415630d..abd3a17 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -143,7 +143,7 @@ module_param(oos_shadow, bool, 0644); #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) struct kvm_rmap_desc { - u64 *shadow_ptes[RMAP_EXT]; + u64 *sptes[RMAP_EXT]; struct kvm_rmap_desc *more; }; @@ -262,7 +262,7 @@ static gfn_t pse36_gfn_delta(u32 gpte) return (gpte PT32_DIR_PSE36_MASK) shift; } -static void set_shadow_pte(u64 *sptep, u64 spte) +static void __set_spte(u64 *sptep, u64 spte) { #ifdef CONFIG_X86_64 set_64bit((unsigned long *)sptep, spte); @@ -510,21 +510,21 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage) } else if (!(*rmapp 1)) { rmap_printk(rmap_add: %p %llx 1-many\n, spte, *spte); desc = mmu_alloc_rmap_desc(vcpu); - desc-shadow_ptes[0] = (u64 *)*rmapp; - desc-shadow_ptes[1] = spte; + desc-sptes[0] = (u64 *)*rmapp; + desc-sptes[1] = spte; *rmapp = (unsigned long)desc | 1; } else { rmap_printk(rmap_add: %p %llx many-many\n, spte, *spte); desc = (struct kvm_rmap_desc *)(*rmapp ~1ul); - while (desc-shadow_ptes[RMAP_EXT-1] desc-more) + while (desc-sptes[RMAP_EXT-1] desc-more) desc = desc-more; - if (desc-shadow_ptes[RMAP_EXT-1]) { + if (desc-sptes[RMAP_EXT-1]) { desc-more = mmu_alloc_rmap_desc(vcpu); desc = desc-more; } - for (i = 0; desc-shadow_ptes[i]; ++i) + for (i = 0; desc-sptes[i]; ++i) ; - desc-shadow_ptes[i] = spte; + desc-sptes[i] = spte; } } @@ -535,14 +535,14 @@ static void rmap_desc_remove_entry(unsigned long *rmapp, { int j; - for (j = RMAP_EXT - 1; !desc-shadow_ptes[j] j i; --j) + for (j = RMAP_EXT - 1; !desc-sptes[j] j i; --j) ; - desc-shadow_ptes[i] = desc-shadow_ptes[j]; - desc-shadow_ptes[j] = NULL; + desc-sptes[i] = desc-sptes[j]; + desc-sptes[j] = NULL; if (j != 0) return; if (!prev_desc !desc-more) - *rmapp = (unsigned long)desc-shadow_ptes[0]; + *rmapp = (unsigned long)desc-sptes[0]; else if (prev_desc) prev_desc-more = desc-more; @@ -587,8 +587,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) desc = (struct kvm_rmap_desc *)(*rmapp ~1ul); prev_desc = NULL; while (desc) { - for (i = 0; i RMAP_EXT desc-shadow_ptes[i]; ++i) - if (desc-shadow_ptes[i] == spte) { + for (i = 0; i RMAP_EXT desc-sptes[i]; ++i) + if (desc-sptes[i] == spte) { rmap_desc_remove_entry(rmapp, desc, i, prev_desc); @@ -619,10 +619,10 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) prev_desc = NULL; prev_spte = NULL; while (desc) { - for (i = 0; i RMAP_EXT desc-shadow_ptes[i]; ++i) { + for (i = 0; i RMAP_EXT desc-sptes[i]; ++i) { if (prev_spte == spte) - return desc-shadow_ptes[i]; - prev_spte = desc-shadow_ptes[i]; + return desc-sptes[i]; + prev_spte = desc-sptes[i]; } desc = desc-more; } @@ -644,7 +644,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn) BUG_ON(!(*spte PT_PRESENT_MASK)); rmap_printk(rmap_write_protect: spte %p %llx\n, spte, *spte); if (is_writeble_pte(*spte)) { - set_shadow_pte(spte, *spte ~PT_WRITABLE_MASK); + __set_spte(spte, *spte ~PT_WRITABLE_MASK); write_protected = 1; } spte = rmap_next(kvm, rmapp, spte); @@ -668,7 +668,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn) if (is_writeble_pte(*spte)) { rmap_remove(kvm, spte); --kvm-stat.lpages; - set_shadow_pte(spte,
Re: TODO list for qemu+KVM networking performance v2
Rusty Russell wrote: On Fri, 5 Jun 2009 02:13:20 am Michael S. Tsirkin wrote: I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as well, and intend to dump updates there from time to time. Hi Michael, Sorry for the delay. I'm weaning myself off my virtio work, but virtio_net performance is an issue which still needs lots of love. BTW a non-wiki on the wiki?. You should probably rename it to MST_Networking_Performance or allow editing :) - skbs in flight are kept in send queue linked list, so that we can flush them when device is removed [ mst: optimization idea: virtqueue already tracks posted buffers. Add flush/purge operation and use that instead? Interesting idea, but not really an optimization. (flush_buf() which does a get_buf() but for unused buffers). ] - skb is reformatted to scattergather format [ mst: idea to try: this does a copy for skb head, which might be costly especially for small/linear packets. Try to avoid this? Might need to tweak virtio interface. ] There's no copy here that I can see? - network driver adds the packet buffer on TX ring - network driver does a kick which causes a VM exit [ mst: any way to mitigate # of VM exits here? Possibly could be done on host side as well. ] [ markmc: All of our efforts there have been on the host side, I think that's preferable than trying to do anything on the guest side. ] The current theoretical hole is that the host suppresses notifications using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of notifications in before it gets to that suppression. You can use a counter to improve this: you only notify when they're equal, and inc when you notify. That way you suppress further notifications even if the other side takes ages to wake up. In practice, this shouldn't be played with until we have full aio (or equiv in kernel) for other side: host xmit tends to be too fast at the moment and we get a notification per packet anyway. Xen ring has the exact optimization for ages. imho we should have it too, regardless of aio. It reduces #vmexits/spurious wakeups and it is very simple to implement. - Full queue: we keep a single extra skb around: if we fail to transmit, we queue it [ mst: idea to try: what does it do to performance if we queue more packets? ] Bad idea!! We already have two queues, this is a third. We should either stop the queue before it gets full, or fix TX_BUSY handling. I've been arguing on netdev for the latter (see thread[PATCH 2/4] virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb.). [ markmc: the queue might soon be going away: 200905292346.04815.ru...@rustcorp.com.au Ah, yep, that one. http://archive.netbsd.se/?ml=linux-netdeva=2009-05m=10788575 ] - We get each buffer from host as it is completed and free it - TX interrupts are only enabled when queue is stopped, and when it is originally created (we disable them on completion) [ mst: idea: second part is probably unintentional. todo: we probably should disable interrupts when device is created. ] Yep, minor wart. - We poll for buffer completions: 1. Before each TX 2. On a timer tasklet (unless 3 is supported) 3. When host sends us interrupt telling us that the queue is empty [ mst: idea to try: instead of empty, enable send interrupts on xmit when buffer is almost full (e.g. at least half empty): we are running out of buffers, it's important to free them ASAP. Can be done from host or from guest. ] [ Rusty proposing that we don't need (2) or (3) if the skbs are orphaned before start_xmit(). See subj net: skb_orphan on dev_hard_start_xmit.] [ rusty also seems to be suggesting that disabling VIRTIO_F_NOTIFY_ON_EMPTY on the host should help the case where the host out-paces the guest ] Yes, that's more fruitful. - Each skb has a 128 byte buffer at head and a single page for data. Only full pages are passed to virtio buffers. [ mst: for large packets, managing the 128 head buffers is wasted effort. Try allocating skbs on rcv path when needed. ]. [ mst: to clarify the previos suggestion: I am talking about merging here. We currently allocate skbs and pages for them. If a packet spans multiple pages, we discard the extra skbs. Instead, let's allocate pages but not skbs. Allocate and fill skbs on receive path. ] Yep. There's another issue here, which is alignment: packets which get placed into pages are misaligned (that 14 byte ethernet header). We should add a feature to allow the
RE: [PATCH RFC] qemu: fix hot remove assigned device
Gerd Hoffmann wrote: On 06/09/09 16:51, Paul Brook wrote: On Tuesday 09 June 2009, Han, Weidong wrote: Paul Brook wrote: On Monday 08 June 2009, Weidong Han wrote: When hot remove an assigned device, segmentation fault was triggered by qemu_free(pci_dev-qdev) in pci_unregister_device(). pci_register_device() doesn't initialize or set pci_dev-qdev. For an assigned device, qdev variable isn't touched at all. So segmentation fault happens when to free a non-initialized qdev. Better would be to just disable hot remove for devices still using the legacy pci_register_device API. PCI passthrough uses pci_register_device to register assigned device to qemu. Is there newer API to do so? Yes. See e.g. LSI scsi emulation. Well. Except that you can't (yet) register pci config read/write callbacks using the qdev-based API. So pci passthrough have to use pci_register_device now. I cooked a new patch (post below) to fix this issue. Thanks. When hot remove an assigned device, segmentation fault was triggered by qdev_free(pci_dev-qdev) in pci_unregister_device(). pci_register_device() doesn't initialize or set pci_dev-qdev. For an assigned device, qdev variable isn't touched at all. So segmentation fault happens when to free a non-initialized qdev. This patch adds a parameter in pci_unregister_device to check if it's an assigned device. For assgined device, free pci_dev instead of qdev. No changes for other devices. Signed-off-by: Weidong Han weidong@intel.com --- hw/device-assignment.c |2 +- hw/pci-hotplug.c |2 +- hw/pci.c |8 ++-- hw/pci.h |2 +- 4 files changed, 9 insertions(+), 5 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 624d15a..65920d0 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -581,7 +581,7 @@ static void free_assigned_device(AssignedDevInfo *adev) dev-real_device.config_fd = 0; } -pci_unregister_device(dev-dev); +pci_unregister_device(dev-dev, 1); #ifdef KVM_CAP_IRQ_ROUTING free_dev_irq_entries(dev); #endif diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c index d7c8b84..18a4912 100644 --- a/hw/pci-hotplug.c +++ b/hw/pci-hotplug.c @@ -271,6 +271,6 @@ void pci_device_hot_remove_success(int pcibus, int slot) break; } -pci_unregister_device(d); +pci_unregister_device(d, 0); } diff --git a/hw/pci.c b/hw/pci.c index 25581a4..35fd064 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -363,7 +363,7 @@ static void pci_unregister_io_regions(PCIDevice *pci_dev) } } -int pci_unregister_device(PCIDevice *pci_dev) +int pci_unregister_device(PCIDevice *pci_dev, int assigned) { int ret = 0; @@ -377,7 +377,11 @@ int pci_unregister_device(PCIDevice *pci_dev) qemu_free_irqs(pci_dev-irq); pci_irq_index--; pci_dev-bus-devices[pci_dev-devfn] = NULL; -qdev_free(pci_dev-qdev); + +if (assigned) +qemu_free(pci_dev); +else +qdev_free(pci_dev-qdev); return 0; } diff --git a/hw/pci.h b/hw/pci.h index f2dccb5..f658e78 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -199,7 +199,7 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name, int instance_size, int devfn, PCIConfigReadFunc *config_read, PCIConfigWriteFunc *config_write); -int pci_unregister_device(PCIDevice *pci_dev); +int pci_unregister_device(PCIDevice *pci_dev, int assigned); void pci_register_io_region(PCIDevice *pci_dev, int region_num, uint32_t size, int type, -- 1.6.0.4-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG at mmu.c:615 from localhost migration using ept+hugetlbfs
Avi Kivity wrote: Not really. One thing, migration should transition the shadow pagetables from large pages to small ones, maybe that bit is broken. Maybe we're looking at a largepage spte and interpreting it as a normal L2 spte, and interpreting a guest page as the L1 spt. I tried to find where we drop the mmu (or at least large sptes for the slot) when we enable dirty logging, and failed. Maybe remove_write_access() is sufficient. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] qemu: fix hot remove assigned device
Han, Weidong wrote: -int pci_unregister_device(PCIDevice *pci_dev) +int pci_unregister_device(PCIDevice *pci_dev, int assigned) { int ret = 0; @@ -377,7 +377,11 @@ int pci_unregister_device(PCIDevice *pci_dev) qemu_free_irqs(pci_dev-irq); pci_irq_index--; pci_dev-bus-devices[pci_dev-devfn] = NULL; -qdev_free(pci_dev-qdev); + +if (assigned) +qemu_free(pci_dev); +else +qdev_free(pci_dev-qdev); return 0; } Can you check pci_dev-qdev instead of assigned? A little less ugly. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
On 06/09/2009 05:44 PM, Michael Goldish wrote: The test looks pretty nicely written. Comments: 1. Consider making all the cloned VMs use image snapshots: curr_vm = vm1.clone() curr_vm.get_params()[extra_params] += -snapshot I'm not sure it's a good idea to let all VMs use the same disk image. Or maybe you shouldn't add -snapshot yourself, but rather do it in the config file for the first VM, and then all cloned VMs will have -snapshot as well. Yes I use 'image_snapshot = yes' in config file. 2. Consider changing the message Booting the %dth guest % num to Booting guest #%d % num (because there's no such thing as 2th and 3th) 3. Consider changing the message Cannot boot vm anylonger to Cannot create VM #%d % num 4. Why not add curr_vm to vms immediately after cloning it? That way you can kill it in the exception handler later, without having to send it a 'quit' if you can't login ('if not curr_vm_session'). Yes, good idea. 5. %dth guest boots up successfully % num -- again, 2th and 3th make no sense. Also, I wonder why you add those spaces before every info message. 6. %dth guest's session is not responsive -- same (maybe use Guest session #%d is not responsive % num) 7. Shut down the %dth guest -- same (maybe Shutting down guest #%d? or destroying/killing?) 8. Shouldn't we fail the test when we find an unresponsive session? It seems you just display an error message. You can simply replace logging.error( with raise error.TestFail(. 9. Consider using a stricter test than just vm_session.is_responsive(). vm_session.is_responsive() just sends ENTER to the sessions and returns True if it gets anything as a result (usually a prompt, or even just a newline echoed back). If the session passes this test it is indeed responsive, so it's a decent test, but maybe you can send some command (user configurable?) and test for some output. I'm really not sure this is important, because I can't imagine a session would respond to a newline but not to other commands, but who knows. Maybe you can send the first VM a user-specified command when the test begins, remember the output, and then send all other VMs the same command and make sure the output is the same. maybe use 'info status' and send command 'help' via session to vms and compare their output? 10. I'm not sure you should use the param kill_vm_gracefully because that's a postprocessor param (probably not your business). You can just call destroy() in the exception handler with gracefully=False, because if the VMs are non- responsive, I don't expect them to shutdown nicely with an SSH command (that's what gracefully does). Also, we're using -snapshot, so there's no reason to shut them down nicely. Yes, I agree. :) 11. Total number booted successfully: %d % (num - 1) -- why not just num? We really have num VMs including the first one. Or you can say: Total number booted successfully in addition to the first one but that's much longer. Since after the first guest booted, I set num = 1 and then 'num += 1' at first in while loop ( for the purpose of getting a new vm ). So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot up, the num booted successfully should be (num - 1). I would use enumerate(vms) that Uri suggested to make number easier to count. 12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the test will always fail (which is depressing). If params.get(threshold) is None or , or in short -- 'if not params.get(threshold)', disable this feature and keep adding VMs forever. The user can enable the feature with: max_vms = 50 or disable it with: max_vms = This is a good idea for hardware resource limit of host. 13. Why are you catching OSError? If you get OSError it might be a framework bug. Since sometimes, vm.create() successfully but failed to ssh-login since the running python cannot allocate physical memory (OSError). Add max_vms could fix this problem I think. 14. At the end of the exception handler you should proably re-raise the exception you caught. Otherwise the user won't see the error message. You can simply replace 'break' with 'raise' (no parameters), and it should work, hopefully. Yes I should if add a 'max_vms'. I know these are quite a few comments, but they're all rather minor and the test is well written in my opinion. Thank you, I will do modification according to your and Uri's comments, and will re-submit it here later. :) Thanks and Best Regards, Yolkfull Thanks, Michael - Original Message - From: Yolkfull Chowyz...@redhat.com To:kvm@vger.kernel.org Cc: Uri Lublinu...@redhat.com Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive Hi, This test will boot VMs until one of them becomes unresponsive, and records the maximum
Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
On 06/09/2009 08:45 PM, Uri Lublin wrote: On 06/09/2009 11:41 AM, Yolkfull Chow wrote: Hi, This test will boot VMs until one of them becomes unresponsive, and records the maximum number of VMs successfully started. Hello, Some more comments (in addition to previous comments by others) 1. Do not just send monitor command quit but use vm.destroy * This was mentioned by Michael, but in a different context. 2. Do not destroy main_vm (or vm1). We may want to run other tests on it. 3. You can use enumerate(vms) instead of looking for vm with index. 4. It would be nice to close all ssh sessions too. OK, I will do modification according to your comments, thank you so much. :) Best Regards, Yolkfull Regards, Uri. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC] qemu: fix hot remove assigned device
Avi Kivity wrote: Han, Weidong wrote: -int pci_unregister_device(PCIDevice *pci_dev) +int pci_unregister_device(PCIDevice *pci_dev, int assigned) { int ret = 0; @@ -377,7 +377,11 @@ int pci_unregister_device(PCIDevice *pci_dev) qemu_free_irqs(pci_dev-irq); pci_irq_index--; pci_dev-bus-devices[pci_dev-devfn] = NULL; -qdev_free(pci_dev-qdev); + +if (assigned) +qemu_free(pci_dev); +else +qdev_free(pci_dev-qdev); return 0; } Can you check pci_dev-qdev instead of assigned? A little less ugly. I tried to find an easy and clean way to check it, but I found the members of struct PCIDevice and DeviceState were not suitable for this checking, and qdev is not pointer in struct PCIDevice. Regards, Weidong-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] qemu: fix hot remove assigned device
Han, Weidong wrote: I tried to find an easy and clean way to check it, but I found the members of struct PCIDevice and DeviceState were not suitable for this checking, and qdev is not pointer in struct PCIDevice. Yes, of course. I applied the patch, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] qemu: fix hot remove assigned device
On 06/10/09 10:31, Han, Weidong wrote: Avi Kivity wrote: Can you check pci_dev-qdev instead of assigned? A little less ugly. I tried to find an easy and clean way to check it, but I found the members of struct PCIDevice and DeviceState were not suitable for this checking, and qdev is not pointer in struct PCIDevice. Well, certain DeviceState pointers (type for example) are set only in case the device was created using the qdev api. I think you certainly should get away without adding a new parameter to pci_unregister_device. Also I've just sent a patch to the qemu-devel fixing the qdev register API for pci devices, allowing to register config space callbacks. Once this is merged you can switch pci passthrough to the new qdev API. cheers, Gerd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC] qemu: fix hot remove assigned device
Gerd Hoffmann wrote: On 06/10/09 10:31, Han, Weidong wrote: Avi Kivity wrote: Can you check pci_dev-qdev instead of assigned? A little less ugly. I tried to find an easy and clean way to check it, but I found the members of struct PCIDevice and DeviceState were not suitable for this checking, and qdev is not pointer in struct PCIDevice. Well, certain DeviceState pointers (type for example) are set only in case the device was created using the qdev api. I think you certainly should get away without adding a new parameter to pci_unregister_device. Also I've just sent a patch to the qemu-devel fixing the qdev register API for pci devices, allowing to register config space callbacks. Once this is merged you can switch pci passthrough to the new qdev API. Good. Extending the qdev APIs is the most elegant solution. Thanks. Regards, Weidong -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: MCE: Fix compiling without CONFIG_X86_MCE
Huang Ying wrote: Enclose do_machine_check with #ifdef CONFIG_X86_MCE. Reported-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Huang Ying ying.hu...@intel.com Applied, thanks. I would have preferred to select X86_MCE instead, but that doesn't do anything (need to select X86_MCE_INTEL as well, and that becomes fragile). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 0/5] VMX EPT misconfigurtion handler
On Wednesday 10 June 2009 05:30:09 Marcelo Tosatti wrote: From the Intel docs: An EPT misconfiguration occurs when, in the course of translation a guest-physical address, the logical processor encounters an EPT paging-structure entry that contains an unsupported value. Handle this event and print useful information for diagnostics. Looks fine to me, thanks! -- regards Yang, Sheng -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/5] KVM: MMU: make for_each_shadow_entry aware of largepages
Marcelo Tosatti wrote: This way there is no need to add explicit checks in every for_each_shadow_entry user. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -1273,6 +1273,11 @@ static bool shadow_walk_okay(struct kvm_ { if (iterator-level PT_PAGE_TABLE_LEVEL) return false; + + if (iterator-level == PT_PAGE_TABLE_LEVEL) + if (is_large_pte(*iterator-sptep)) + return false; s/==//? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/4] udp: Handle large UFO packets from untrusted sources
On Tue, Jun 09, 2009 at 10:51:23PM -0700, Sridhar Samudrala wrote: Unfortunately, this doesn't work for UDP without any changes. skb_segment() currently adds transport header to each segmented skb. But when we are using IP fragmentation, only the first fragment should include the UDP header. We either need to fix skb_segment() to handle IP fragmentation or write a new skb_fragment(). I will look into this when i get some time. Couldn't you get around this by setting skb-transport_header to skb-network_header before getting into skb_segment? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/5] KVM: VMX: conditionally disable 2M pages
Marcelo Tosatti wrote: Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled. Index: kvm/virt/kvm/kvm_main.c === --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -85,6 +85,8 @@ static long kvm_vcpu_ioctl(struct file * static bool kvm_rebooting; +static bool largepages_disabled = false; + #ifdef KVM_CAP_DEVICE_ASSIGNMENT static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head, int assigned_dev_id) @@ -1184,6 +1186,10 @@ int __kvm_set_memory_region(struct kvm * if ((base_gfn ^ ugfn) (KVM_PAGES_PER_HPAGE - 1)) for (i = 0; i largepages; ++i) new.lpage_info[i].write_count = 1; + + if (largepages_disabled) + for (i = 0; i largepages; ++i) + new.lpage_info[i].write_count = 1; } Please reuse the loop above, it's exactly the same (think of 1GB pages). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/5] KVM: MMU: make for_each_shadow_entry aware of largepages
Avi Kivity wrote: Marcelo Tosatti wrote: This way there is no need to add explicit checks in every for_each_shadow_entry user. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -1273,6 +1273,11 @@ static bool shadow_walk_okay(struct kvm_ { if (iterator-level PT_PAGE_TABLE_LEVEL) return false; + +if (iterator-level == PT_PAGE_TABLE_LEVEL) +if (is_large_pte(*iterator-sptep)) +return false; s/==//? Ah, it's actually fine. But changing == to = will make it 1GBpage-ready. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
On Wed, Jun 10, 2009 at 12:19:42AM +0100, Paul Brook wrote: On Monday 25 May 2009, Michael S. Tsirkin wrote: Add functions implementing MSI-X support. First user will be virtio-pci. Note that platform must set a flag to declare MSI supported. For PC this will be set by APIC. This sounds wrong. The device shouldn't know or care whether the system has a MSI capable interrupt controller. That's for the guest OS to figure out. Paul You are right of course. In theory there's nothing that breaks if I set this flag to on, on all platforms. OTOH if qemu emulates some controller incorrectly, guest might misdetect MSI support in the controller, and things will break horribly. It seems safer to have a flag that can be enabled by people that know about a specific platform. What do you think? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities
On Tue, Jun 09, 2009 at 02:11:14PM -0300, Glauber Costa wrote: On Fri, Jun 05, 2009 at 01:23:15PM +0300, Michael S. Tsirkin wrote: Add routines to manage PCI capability list. First user will be MSI-X. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 98 -- hw/pci.h | 18 +++- 2 files changed, 106 insertions(+), 10 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 361d741..ed011b5 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f) int version = s-cap_present ? 3 : 2; int i; -qemu_put_be32(f, version); /* PCI device version */ +/* PCI device version and capabilities */ +qemu_put_be32(f, version); +if (version = 3) +qemu_put_be32(f, s-cap_present); qemu_put_buffer(f, s-config, 256); for (i = 0; i 4; i++) qemu_put_be32(f, s-irq_state[i]); -if (version = 3) -qemu_put_be32(f, s-cap_present); } What is it doing here? You should just do it right in the first patch, instead of doing in one way there, and fixing here. int pci_device_load(PCIDevice *s, QEMUFile *f) @@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) version_id = qemu_get_be32(f); if (version_id 3) return -EINVAL; -qemu_get_buffer(f, s-config, 256); -pci_update_mappings(s); - -if (version_id = 2) -for (i = 0; i 4; i ++) -s-irq_state[i] = qemu_get_be32(f); if (version_id = 3) s-cap_present = qemu_get_be32(f); else ditto. @@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) if (s-cap_present ~s-cap_supported) return -EINVAL; +qemu_get_buffer(f, s-config, 256); +pci_update_mappings(s); + +if (version_id = 2) +for (i = 0; i 4; i ++) +s-irq_state[i] = qemu_get_be32(f); +/* Clear wmask and used bits for capabilities. + Must be restored separately, since capabilities can + be placed anywhere in config space. */ +memset(s-used, 0, PCI_CONFIG_SPACE_SIZE); +for (i = PCI_CONFIG_HEADER_SIZE; i PCI_CONFIG_SPACE_SIZE; ++i) +s-wmask[i] = 0xff; return 0; } Sorry, I don't exactly understand it. Although it can be anywhere, what do we actually lose by keeping it at the same place in config space? We lose the ability to let user control the capabilities exposed by the device. And generally, I dislike arbitrary limitations. The PCI spec says the capability can be anywhere, implementing a linked list of caps is simple enough to not invent abritrary restrictions. @@ -870,3 +877,76 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name) return (PCIDevice *)dev; } + +static int pci_find_space(PCIDevice *pdev, uint8_t size) +{ +int offset = PCI_CONFIG_HEADER_SIZE; +int i; +for (i = PCI_CONFIG_HEADER_SIZE; i PCI_CONFIG_SPACE_SIZE; ++i) +if (pdev-used[i]) +offset = i + 1; +else if (i - offset + 1 == size) +return offset; +return 0; +} + +static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id, +uint8_t *prev_p) +{ +uint8_t next, prev; + +if (!(pdev-config[PCI_STATUS] PCI_STATUS_CAP_LIST)) +return 0; + +for (prev = PCI_CAPABILITY_LIST; (next = pdev-config[prev]); + prev = next + PCI_CAP_LIST_NEXT) +if (pdev-config[next + PCI_CAP_LIST_ID] == cap_id) +break; + +*prev_p = prev; +return next; +} I'd prefer to do: if (prev_p) *prev_p = prev; so we don't have to always pass a prev_p pointer. You have yourself a user where you don't need it in this very patch. Good idea. + +/* Reserve space and add capability to the linked list in pci config space */ +int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) +{ +uint8_t offset = pci_find_space(pdev, size); +uint8_t *config = pdev-config + offset; +if (!offset) +return -ENOSPC; +config[PCI_CAP_LIST_ID] = cap_id; +config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST]; +pdev-config[PCI_CAPABILITY_LIST] = offset; +pdev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; +memset(pdev-used + offset, 0xFF, size); +/* Make capability read-only by default */ +memset(pdev-wmask + offset, 0, size); +return offset; +} + +/* Unlink capability from the pci config space. */ +void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) +{ +uint8_t prev, offset = pci_find_capability_list(pdev, cap_id, prev); +if (!offset) +return; +pdev-config[prev] = pdev-config[offset +
Re: [Qemu-devel] [PATCHv3 05/13] qemu: MSI-X support functions
On Tue, Jun 09, 2009 at 02:26:27PM -0300, Glauber Costa wrote: On Fri, Jun 05, 2009 at 01:23:31PM +0300, Michael S. Tsirkin wrote: Add functions implementing MSI-X support. First user will be virtio-pci. Note that platform must set a flag to declare MSI supported. For PC this will be set by APIC. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile.target |2 +- hw/msix.c | 423 +++ hw/msix.h | 35 + hw/pci.h| 20 +++ 4 files changed, 479 insertions(+), 1 deletions(-) create mode 100644 hw/msix.c create mode 100644 hw/msix.h diff --git a/Makefile.target b/Makefile.target index 664a1e3..87b2859 100644 --- a/Makefile.target +++ b/Makefile.target @@ -486,7 +486,7 @@ endif #CONFIG_BSD_USER ifndef CONFIG_USER_ONLY OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o \ - gdbstub.o gdbstub-xml.o + gdbstub.o gdbstub-xml.o msix.o # virtio has to be here due to weird dependency between PCI and virtio-net. # need to fix this properly OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o diff --git a/hw/msix.c b/hw/msix.c new file mode 100644 index 000..1b5aec8 --- /dev/null +++ b/hw/msix.c @@ -0,0 +1,423 @@ +/* + * MSI-X device support + * + * This module includes support for MSI-X in pci devices. + * + * Author: Michael S. Tsirkin m...@redhat.com + * + * Copyright (c) 2009, Red Hat Inc, Michael S. Tsirkin (m...@redhat.com) + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include hw.h +#include msix.h +#include pci.h + +/* Declaration from linux/pci_regs.h */ +#define PCI_CAP_ID_MSIX 0x11 /* MSI-X */ +#define PCI_MSIX_FLAGS 2 /* Table at lower 11 bits */ +#define PCI_MSIX_FLAGS_QSIZE 0x7FF +#define PCI_MSIX_FLAGS_ENABLE (1 15) +#define PCI_MSIX_FLAGS_BIRMASK(7 0) + +/* MSI-X capability structure */ +#define MSIX_TABLE_OFFSET 4 +#define MSIX_PBA_OFFSET 8 +#define MSIX_CAP_LENGTH 12 + +/* MSI enable bit is in byte 1 in FLAGS register */ +#define MSIX_ENABLE_OFFSET (PCI_MSIX_FLAGS + 1) +#define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE 8) + +/* MSI-X table format */ +#define MSIX_MSG_ADDR 0 +#define MSIX_MSG_UPPER_ADDR 4 +#define MSIX_MSG_DATA 8 +#define MSIX_VECTOR_CTRL 12 +#define MSIX_ENTRY_SIZE 16 +#define MSIX_VECTOR_MASK 0x1 + +/* How much space does an MSIX table need. */ +/* The spec requires giving the table structure + * a 4K aligned region all by itself. Align it to + * target pages so that drivers can do passthrough + * on the rest of the region. */ +#define MSIX_PAGE_SIZE TARGET_PAGE_ALIGN(0x1000) +/* Reserve second half of the page for pending bits */ +#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2) +#define MSIX_MAX_ENTRIES 32 + + +#ifdef MSIX_DEBUG +#define DEBUG(fmt, ...) \ +do { \ + fprintf(stderr, %s: fmt, __func__ , __VA_ARGS__);\ +} while (0) +#else +#define DEBUG(fmt, ...) do { } while(0) +#endif + +/* Flag to globally disable MSI-X support */ +int msix_disable; + +/* Flag for interrupt controller to declare MSI-X support */ +int msix_supported; maybe better to make it static, It's not read-only either. and provide msi_state() returning -1 for disabled, 0 for supported, etc... Matter of taste, I prefer a set of binary flags rather than yet another enum: msix_disable is controlled by user, msix_supported is a safety valve for non-PC platforms. It's easier to keep them separate IMO. + +/* Add MSI-X capability to the config space for the device. */ +/* Given a bar and its size, add MSI-X table on top of it + * and fill MSI-X capability in the config space. + * Original bar size must be a power of 2 or 0. + * New bar size is returned. */ +static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries, + unsigned bar_nr, unsigned bar_size) +{ +int config_offset; +uint8_t *config; +uint32_t new_size; + +if (nentries 1 || nentries PCI_MSIX_FLAGS_QSIZE + 1) +return -EINVAL; +if (bar_size 0x8000) +return -ENOSPC; + +/* Add space for MSI-X structures */ +if (!bar_size) +new_size = MSIX_PAGE_SIZE; +else if (bar_size MSIX_PAGE_SIZE) { +bar_size = MSIX_PAGE_SIZE; +new_size = MSIX_PAGE_SIZE * 2; +} else +new_size = bar_size * 2; + +pdev-msix_bar_size = new_size; +config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH); +if (config_offset 0) +return config_offset; +config = pdev-config + config_offset; + +
Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
- Yolkfull Chow yz...@redhat.com wrote: On 06/09/2009 05:44 PM, Michael Goldish wrote: The test looks pretty nicely written. Comments: 1. Consider making all the cloned VMs use image snapshots: curr_vm = vm1.clone() curr_vm.get_params()[extra_params] += -snapshot I'm not sure it's a good idea to let all VMs use the same disk image. Or maybe you shouldn't add -snapshot yourself, but rather do it in the config file for the first VM, and then all cloned VMs will have -snapshot as well. Yes I use 'image_snapshot = yes' in config file. 2. Consider changing the message Booting the %dth guest % num to Booting guest #%d % num (because there's no such thing as 2th and 3th) 3. Consider changing the message Cannot boot vm anylonger to Cannot create VM #%d % num 4. Why not add curr_vm to vms immediately after cloning it? That way you can kill it in the exception handler later, without having to send it a 'quit' if you can't login ('if not curr_vm_session'). Yes, good idea. 5. %dth guest boots up successfully % num -- again, 2th and 3th make no sense. Also, I wonder why you add those spaces before every info message. 6. %dth guest's session is not responsive -- same (maybe use Guest session #%d is not responsive % num) 7. Shut down the %dth guest -- same (maybe Shutting down guest #%d? or destroying/killing?) 8. Shouldn't we fail the test when we find an unresponsive session? It seems you just display an error message. You can simply replace logging.error( with raise error.TestFail(. 9. Consider using a stricter test than just vm_session.is_responsive(). vm_session.is_responsive() just sends ENTER to the sessions and returns True if it gets anything as a result (usually a prompt, or even just a newline echoed back). If the session passes this test it is indeed responsive, so it's a decent test, but maybe you can send some command (user configurable?) and test for some output. I'm really not sure this is important, because I can't imagine a session would respond to a newline but not to other commands, but who knows. Maybe you can send the first VM a user-specified command when the test begins, remember the output, and then send all other VMs the same command and make sure the output is the same. maybe use 'info status' and send command 'help' via session to vms and compare their output? I'm not sure I understand. What does 'info status' do? We're talking about an SSH shell, not the monitor. You can do whatever you like, like 'uname -a', and 'ls /', but you should leave it up to the user to decide, so he/she can specify different commands for different guests. Linux commands won't work under Windows, so Linux and Windows must have different commands in the config file. In the Linux section, under '- @Linux:' you can add something like: stress_boot: stress_boot_test_command = uname -a and under '- @Windows:': stress_boot: stress_boot_test_command = ver vol These commands are just naive suggestions. I'm sure someone can think of much more informative commands. 10. I'm not sure you should use the param kill_vm_gracefully because that's a postprocessor param (probably not your business). You can just call destroy() in the exception handler with gracefully=False, because if the VMs are non- responsive, I don't expect them to shutdown nicely with an SSH command (that's what gracefully does). Also, we're using -snapshot, so there's no reason to shut them down nicely. Yes, I agree. :) 11. Total number booted successfully: %d % (num - 1) -- why not just num? We really have num VMs including the first one. Or you can say: Total number booted successfully in addition to the first one but that's much longer. Since after the first guest booted, I set num = 1 and then 'num += 1' at first in while loop ( for the purpose of getting a new vm ). So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot up, the num booted successfully should be (num - 1). I would use enumerate(vms) that Uri suggested to make number easier to count. OK, I didn't notice that. 12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the test will always fail (which is depressing). If params.get(threshold) is None or , or in short -- 'if not params.get(threshold)', disable this feature and keep adding VMs forever. The user can enable the feature with: max_vms = 50 or disable it with: max_vms = This is a good idea for hardware resource limit of host. 13. Why are you catching OSError? If you get OSError it might be a framework bug. Since sometimes, vm.create() successfully but failed to ssh-login since the running python cannot allocate physical memory (OSError). Add max_vms could fix this problem I
Re: [Qemu-devel] [PATCHv3 08/13] qemu: add support for resizing regions
On Tue, Jun 09, 2009 at 02:36:21PM -0300, Glauber Costa wrote: On Fri, Jun 05, 2009 at 01:23:55PM +0300, Michael S. Tsirkin wrote: Make it possible to resize PCI regions. This will be used by virtio with MSI-X, where the region size depends on whether MSI-X is enabled, and can change across load/save. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 54 -- hw/pci.h |3 +++ 2 files changed, 39 insertions(+), 18 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index ed011b5..042a216 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -392,6 +392,41 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num, *(uint32_t *)(pci_dev-wmask + addr) = cpu_to_le32(wmask); } +static void pci_unmap_region(PCIDevice *d, PCIIORegion *r) +{ +if (r-addr == -1) +return; +if (r-type PCI_ADDRESS_SPACE_IO) { +int class; +/* NOTE: specific hack for IDE in PC case: + only one byte must be mapped. */ +class = pci_get_word(d-config + PCI_CLASS_DEVICE); +if (class == 0x0101 r-size == 4) { +isa_unassign_ioport(r-addr + 2, 1); +} else { +isa_unassign_ioport(r-addr, r-size); +} +} else { +cpu_register_physical_memory(pci_to_cpu_addr(r-addr), + r-size, + IO_MEM_UNASSIGNED); +qemu_unregister_coalesced_mmio(r-addr, r-size); +} +} + this is a good cleanup... +void pci_resize_io_region(PCIDevice *pci_dev, int region_num, + uint32_t size) +{ + +PCIIORegion *r = pci_dev-io_regions[region_num]; +if (r-size == size) +return; +r-size = size; +pci_unmap_region(pci_dev, r); +r-addr = -1; +pci_update_mappings(pci_dev); +} + but the only user of this one seem to be commented out, and later removed. Why is this needed? Um, I think this needs to be called on load: virtio has a memmory region if and only if it has MSI-X. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCHv3 12/13] qemu: virtio save/load bindings
On Tue, Jun 09, 2009 at 02:45:54PM -0300, Glauber Costa wrote: duplicated save config. diff --git a/hw/virtio.h b/hw/virtio.h index 04a3c3d..ce05517 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -72,6 +72,10 @@ typedef struct VirtQueueElement typedef struct { void (*notify)(void * opaque, uint16_t vector); +void (*save_config)(void * opaque, QEMUFile *f); +void (*save_queue)(void * opaque, int n, QEMUFile *f); +int (*load_config)(void * opaque, QEMUFile *f); +int (*load_queue)(void * opaque, int n, QEMUFile *f); } VirtIOBindings; So, what's the overall effect on a virtual machine that gets migrated, of a certain device not implementing one of those functions? Those are implemented by a transport (e.g. virtio_pci) not the device. Will it work? Will it break? It will work - assuming there's nothing transport-specific you need to save and load. If there is - this patch is not breaking anything that isn't already broken ... -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
On 06/10/2009 06:03 PM, Michael Goldish wrote: - Yolkfull Chowyz...@redhat.com wrote: On 06/09/2009 05:44 PM, Michael Goldish wrote: The test looks pretty nicely written. Comments: 1. Consider making all the cloned VMs use image snapshots: curr_vm = vm1.clone() curr_vm.get_params()[extra_params] += -snapshot I'm not sure it's a good idea to let all VMs use the same disk image. Or maybe you shouldn't add -snapshot yourself, but rather do it in the config file for the first VM, and then all cloned VMs will have -snapshot as well. Yes I use 'image_snapshot = yes' in config file. 2. Consider changing the message Booting the %dth guest % num to Booting guest #%d % num (because there's no such thing as 2th and 3th) 3. Consider changing the message Cannot boot vm anylonger to Cannot create VM #%d % num 4. Why not add curr_vm to vms immediately after cloning it? That way you can kill it in the exception handler later, without having to send it a 'quit' if you can't login ('if not curr_vm_session'). Yes, good idea. 5. %dth guest boots up successfully % num -- again, 2th and 3th make no sense. Also, I wonder why you add those spaces before every info message. 6. %dth guest's session is not responsive -- same (maybe use Guest session #%d is not responsive % num) 7. Shut down the %dth guest -- same (maybe Shutting down guest #%d? or destroying/killing?) 8. Shouldn't we fail the test when we find an unresponsive session? It seems you just display an error message. You can simply replace logging.error( with raise error.TestFail(. 9. Consider using a stricter test than just vm_session.is_responsive(). vm_session.is_responsive() just sends ENTER to the sessions and returns True if it gets anything as a result (usually a prompt, or even just a newline echoed back). If the session passes this test it is indeed responsive, so it's a decent test, but maybe you can send some command (user configurable?) and test for some output. I'm really not sure this is important, because I can't imagine a session would respond to a newline but not to other commands, but who knows. Maybe you can send the first VM a user-specified command when the test begins, remember the output, and then send all other VMs the same command and make sure the output is the same. maybe use 'info status' and send command 'help' via session to vms and compare their output? I'm not sure I understand. What does 'info status' do? We're talking about an SSH shell, not the monitor. You can do whatever you like, like 'uname -a', and 'ls /', but you should leave it up to the user to decide, so he/she can specify different commands for different guests. Linux commands won't work under Windows, so Linux and Windows must have different commands in the config file. In the Linux section, under '- @Linux:' you can add something like: stress_boot: stress_boot_test_command = uname -a and under '- @Windows:': stress_boot: stress_boot_test_command = ver vol These commands are just naive suggestions. I'm sure someone can think of much more informative commands. That's really good suggestions. Thanks, Michael. And can I use 'migration_test_command' instead? 10. I'm not sure you should use the param kill_vm_gracefully because that's a postprocessor param (probably not your business). You can just call destroy() in the exception handler with gracefully=False, because if the VMs are non- responsive, I don't expect them to shutdown nicely with an SSH command (that's what gracefully does). Also, we're using -snapshot, so there's no reason to shut them down nicely. Yes, I agree. :) 11. Total number booted successfully: %d % (num - 1) -- why not just num? We really have num VMs including the first one. Or you can say: Total number booted successfully in addition to the first one but that's much longer. Since after the first guest booted, I set num = 1 and then 'num += 1' at first in while loop ( for the purpose of getting a new vm ). So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot up, the num booted successfully should be (num - 1). I would use enumerate(vms) that Uri suggested to make number easier to count. OK, I didn't notice that. 12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the test will always fail (which is depressing). If params.get(threshold) is None or , or in short -- 'if not params.get(threshold)', disable
Re: [PATCH 0/4] Arguments to skip boot menu and to hide bios output
Just wondered... what version of qemu was this for, has it gotten anywhere, etc? I've tried to apply these patches to 0.10.5, 0.10.0, 0.9.1 and the current Git (as of June 10 2009) but none seem to include rombios.c or vgabios.c :( Thanks! -dav7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ANNOUNCE] kvm-kmod-2.6.30
This is a package containing the kvm external module, based on the 2.6.30 series. kvm-kmod-2.6.30 contains the kvm code that is present in Linux 2.6.30, except that it can run on older kernels. It is a good companion to the qemu-kvm-0.10 series. Note that performance and features will vary with the host kernel used. Changes from kvm-kmod-2.6.30-rc8 - update source to Linux 2.6.30 - fix oops during host reboot if CONFIG_MAXSMP is selected - backport new required API zalloc_cpumask_var() http://www.linux-kvm.org -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCHv3 08/13] qemu: add support for resizing regions
On Tue, Jun 09, 2009 at 02:36:21PM -0300, Glauber Costa wrote: On Fri, Jun 05, 2009 at 01:23:55PM +0300, Michael S. Tsirkin wrote: Make it possible to resize PCI regions. This will be used by virtio with MSI-X, where the region size depends on whether MSI-X is enabled, and can change across load/save. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 54 -- hw/pci.h |3 +++ 2 files changed, 39 insertions(+), 18 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index ed011b5..042a216 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -392,6 +392,41 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num, *(uint32_t *)(pci_dev-wmask + addr) = cpu_to_le32(wmask); } +static void pci_unmap_region(PCIDevice *d, PCIIORegion *r) +{ +if (r-addr == -1) +return; +if (r-type PCI_ADDRESS_SPACE_IO) { +int class; +/* NOTE: specific hack for IDE in PC case: + only one byte must be mapped. */ +class = pci_get_word(d-config + PCI_CLASS_DEVICE); +if (class == 0x0101 r-size == 4) { +isa_unassign_ioport(r-addr + 2, 1); +} else { +isa_unassign_ioport(r-addr, r-size); +} +} else { +cpu_register_physical_memory(pci_to_cpu_addr(r-addr), + r-size, + IO_MEM_UNASSIGNED); +qemu_unregister_coalesced_mmio(r-addr, r-size); +} +} + this is a good cleanup... +void pci_resize_io_region(PCIDevice *pci_dev, int region_num, + uint32_t size) +{ + +PCIIORegion *r = pci_dev-io_regions[region_num]; +if (r-size == size) +return; +r-size = size; +pci_unmap_region(pci_dev, r); +r-addr = -1; +pci_update_mappings(pci_dev); +} + but the only user of this one seem to be commented out, and later removed. Why is this needed? This was the missing bit: Set correct size for msi-x memory region when loading the device. Signed-off-by: Michael S. Tsirkin m...@redhat.com diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 589fbb1..f657364 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -133,6 +133,8 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f) return ret; if (msix_present(proxy-pci_dev)) qemu_get_be16s(f, proxy-vdev-config_vector); + +pci_resize_io_region(proxy-pci_dev, 1, msix_bar_size(proxy-pci_dev)); return 0; } -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 1/4] Make all programs on kvm test use /usr/bin/python
Even better would be to use /usr/bin/python2. This is because future distros will include python3, which is incompatible with python2 code. python will be symlink of python3. -Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] pte accessor fixes
A minor fix to the pte accessors, followed by a cosmetic fix so we don't hit the same problem next time, followed by a cosmetic patch just because I had my search-and-replace warmed up. Please review. Avi Kivity (3): KVM: MMU: Fix is_dirty_pte() KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte KVM: MMU: s/shadow_pte/spte/ arch/x86/kvm/mmu.c | 116 ++-- arch/x86/kvm/mmu.h |2 +- arch/x86/kvm/paging_tmpl.h | 38 +++--- arch/x86/kvm/x86.c |2 +- 4 files changed, 79 insertions(+), 79 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] KVM: MMU: Fix is_dirty_pte()
is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid shadow_dirty_mask and use PT_DIRTY_MASK instead. Misdetecting dirty pages could lead to unnecessarily setting the dirty bit under EPT. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/mmu.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 809cce0..8e5003c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -242,7 +242,7 @@ static int is_writeble_pte(unsigned long pte) static int is_dirty_pte(unsigned long pte) { - return pte shadow_dirty_mask; + return pte PT_DIRTY_MASK; } static int is_rmap_pte(u64 pte) -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte
Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/mmu.c | 16 arch/x86/kvm/mmu.h |2 +- arch/x86/kvm/paging_tmpl.h | 22 +++--- arch/x86/kvm/x86.c |2 +- 4 files changed, 21 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 8e5003c..415630d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -240,12 +240,12 @@ static int is_writeble_pte(unsigned long pte) return pte PT_WRITABLE_MASK; } -static int is_dirty_pte(unsigned long pte) +static int is_dirty_gpte(unsigned long pte) { return pte PT_DIRTY_MASK; } -static int is_rmap_pte(u64 pte) +static int is_rmap_spte(u64 pte) { return is_shadow_present_pte(pte); } @@ -498,7 +498,7 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage) unsigned long *rmapp; int i; - if (!is_rmap_pte(*spte)) + if (!is_rmap_spte(*spte)) return; gfn = unalias_gfn(vcpu-kvm, gfn); sp = page_header(__pa(spte)); @@ -560,7 +560,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) unsigned long *rmapp; int i; - if (!is_rmap_pte(*spte)) + if (!is_rmap_spte(*spte)) return; sp = page_header(__pa(spte)); pfn = spte_to_pfn(*spte); @@ -1747,7 +1747,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, __func__, *shadow_pte, pt_access, write_fault, user_fault, gfn); - if (is_rmap_pte(*shadow_pte)) { + if (is_rmap_spte(*shadow_pte)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. @@ -1783,7 +1783,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, page_header_update_slot(vcpu-kvm, shadow_pte, gfn); if (!was_rmapped) { rmap_add(vcpu, shadow_pte, gfn, largepage); - if (!is_rmap_pte(*shadow_pte)) + if (!is_rmap_spte(*shadow_pte)) kvm_release_pfn_clean(pfn); } else { if (was_writeble) @@ -1960,7 +1960,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) ASSERT(!VALID_PAGE(root)); if (vcpu-arch.mmu.root_level == PT32E_ROOT_LEVEL) { pdptr = kvm_pdptr_read(vcpu, i); - if (!is_present_pte(pdptr)) { + if (!is_present_gpte(pdptr)) { vcpu-arch.mmu.pae_root[i] = 0; continue; } @@ -2451,7 +2451,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, if ((bytes == 4) (gpa % 4 == 0)) memcpy((void *)gpte, new, 4); } - if (!is_present_pte(gpte)) + if (!is_present_gpte(gpte)) return; gfn = (gpte PT64_BASE_ADDR_MASK) PAGE_SHIFT; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3494a2f..016bf71 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -75,7 +75,7 @@ static inline int is_paging(struct kvm_vcpu *vcpu) return vcpu-arch.cr0 X86_CR0_PG; } -static inline int is_present_pte(unsigned long pte) +static inline int is_present_gpte(unsigned long pte) { return pte PT_PRESENT_MASK; } diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 4cb1dbf..238a193 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -132,7 +132,7 @@ walk: #if PTTYPE == 64 if (!is_long_mode(vcpu)) { pte = kvm_pdptr_read(vcpu, (addr 30) 3); - if (!is_present_pte(pte)) + if (!is_present_gpte(pte)) goto not_present; --walker-level; } @@ -155,7 +155,7 @@ walk: kvm_read_guest(vcpu-kvm, pte_gpa, pte, sizeof(pte)); - if (!is_present_pte(pte)) + if (!is_present_gpte(pte)) goto not_present; rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker-level); @@ -205,7 +205,7 @@ walk: --walker-level; } - if (write_fault !is_dirty_pte(pte)) { + if (write_fault !is_dirty_gpte(pte)) { bool ret; mark_page_dirty(vcpu-kvm, table_gfn); @@ -252,7 +252,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page, gpte = *(const pt_element_t *)pte; if (~gpte (PT_PRESENT_MASK | PT_ACCESSED_MASK)) { - if (!is_present_pte(gpte)) + if (!is_present_gpte(gpte))
[PATCH 3/3] KVM: MMU: s/shadow_pte/spte/
We use shadow_pte and spte inconsistently, switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/mmu.c | 102 ++-- arch/x86/kvm/paging_tmpl.h | 16 +++--- 2 files changed, 59 insertions(+), 59 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 415630d..abd3a17 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -143,7 +143,7 @@ module_param(oos_shadow, bool, 0644); #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) struct kvm_rmap_desc { - u64 *shadow_ptes[RMAP_EXT]; + u64 *sptes[RMAP_EXT]; struct kvm_rmap_desc *more; }; @@ -262,7 +262,7 @@ static gfn_t pse36_gfn_delta(u32 gpte) return (gpte PT32_DIR_PSE36_MASK) shift; } -static void set_shadow_pte(u64 *sptep, u64 spte) +static void __set_spte(u64 *sptep, u64 spte) { #ifdef CONFIG_X86_64 set_64bit((unsigned long *)sptep, spte); @@ -510,21 +510,21 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage) } else if (!(*rmapp 1)) { rmap_printk(rmap_add: %p %llx 1-many\n, spte, *spte); desc = mmu_alloc_rmap_desc(vcpu); - desc-shadow_ptes[0] = (u64 *)*rmapp; - desc-shadow_ptes[1] = spte; + desc-sptes[0] = (u64 *)*rmapp; + desc-sptes[1] = spte; *rmapp = (unsigned long)desc | 1; } else { rmap_printk(rmap_add: %p %llx many-many\n, spte, *spte); desc = (struct kvm_rmap_desc *)(*rmapp ~1ul); - while (desc-shadow_ptes[RMAP_EXT-1] desc-more) + while (desc-sptes[RMAP_EXT-1] desc-more) desc = desc-more; - if (desc-shadow_ptes[RMAP_EXT-1]) { + if (desc-sptes[RMAP_EXT-1]) { desc-more = mmu_alloc_rmap_desc(vcpu); desc = desc-more; } - for (i = 0; desc-shadow_ptes[i]; ++i) + for (i = 0; desc-sptes[i]; ++i) ; - desc-shadow_ptes[i] = spte; + desc-sptes[i] = spte; } } @@ -535,14 +535,14 @@ static void rmap_desc_remove_entry(unsigned long *rmapp, { int j; - for (j = RMAP_EXT - 1; !desc-shadow_ptes[j] j i; --j) + for (j = RMAP_EXT - 1; !desc-sptes[j] j i; --j) ; - desc-shadow_ptes[i] = desc-shadow_ptes[j]; - desc-shadow_ptes[j] = NULL; + desc-sptes[i] = desc-sptes[j]; + desc-sptes[j] = NULL; if (j != 0) return; if (!prev_desc !desc-more) - *rmapp = (unsigned long)desc-shadow_ptes[0]; + *rmapp = (unsigned long)desc-sptes[0]; else if (prev_desc) prev_desc-more = desc-more; @@ -587,8 +587,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) desc = (struct kvm_rmap_desc *)(*rmapp ~1ul); prev_desc = NULL; while (desc) { - for (i = 0; i RMAP_EXT desc-shadow_ptes[i]; ++i) - if (desc-shadow_ptes[i] == spte) { + for (i = 0; i RMAP_EXT desc-sptes[i]; ++i) + if (desc-sptes[i] == spte) { rmap_desc_remove_entry(rmapp, desc, i, prev_desc); @@ -619,10 +619,10 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) prev_desc = NULL; prev_spte = NULL; while (desc) { - for (i = 0; i RMAP_EXT desc-shadow_ptes[i]; ++i) { + for (i = 0; i RMAP_EXT desc-sptes[i]; ++i) { if (prev_spte == spte) - return desc-shadow_ptes[i]; - prev_spte = desc-shadow_ptes[i]; + return desc-sptes[i]; + prev_spte = desc-sptes[i]; } desc = desc-more; } @@ -644,7 +644,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn) BUG_ON(!(*spte PT_PRESENT_MASK)); rmap_printk(rmap_write_protect: spte %p %llx\n, spte, *spte); if (is_writeble_pte(*spte)) { - set_shadow_pte(spte, *spte ~PT_WRITABLE_MASK); + __set_spte(spte, *spte ~PT_WRITABLE_MASK); write_protected = 1; } spte = rmap_next(kvm, rmapp, spte); @@ -668,7 +668,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn) if (is_writeble_pte(*spte)) {
Re: [Qemu-devel] [PATCHv3 12/13] qemu: virtio save/load bindings
On Tue, Jun 09, 2009 at 02:45:54PM -0300, Glauber Costa wrote: duplicated save config. Good catch. Fixed. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: KVM VServer coexistence
Should work, we use KVM and openVZ in Proxmox VE. Best Regards, Martin Maurer mar...@proxmox.com http://pve.proxmox.com -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Ajay EP Sent: Dienstag, 09. Juni 2009 15:37 To: kvm@vger.kernel.org Subject: KVM VServer coexistence Hi , I would like to know if both container based (Vserver ) and para virtualization (KVM) co-exist at the same time on the host machine ( INTEL VT) ? Thanks Ajay __ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
- Yolkfull Chow yz...@redhat.com wrote: On 06/10/2009 06:03 PM, Michael Goldish wrote: - Yolkfull Chowyz...@redhat.com wrote: On 06/09/2009 05:44 PM, Michael Goldish wrote: The test looks pretty nicely written. Comments: 1. Consider making all the cloned VMs use image snapshots: curr_vm = vm1.clone() curr_vm.get_params()[extra_params] += -snapshot I'm not sure it's a good idea to let all VMs use the same disk image. Or maybe you shouldn't add -snapshot yourself, but rather do it in the config file for the first VM, and then all cloned VMs will have -snapshot as well. Yes I use 'image_snapshot = yes' in config file. 2. Consider changing the message Booting the %dth guest % num to Booting guest #%d % num (because there's no such thing as 2th and 3th) 3. Consider changing the message Cannot boot vm anylonger to Cannot create VM #%d % num 4. Why not add curr_vm to vms immediately after cloning it? That way you can kill it in the exception handler later, without having to send it a 'quit' if you can't login ('if not curr_vm_session'). Yes, good idea. 5. %dth guest boots up successfully % num -- again, 2th and 3th make no sense. Also, I wonder why you add those spaces before every info message. 6. %dth guest's session is not responsive -- same (maybe use Guest session #%d is not responsive % num) 7. Shut down the %dth guest -- same (maybe Shutting down guest #%d? or destroying/killing?) 8. Shouldn't we fail the test when we find an unresponsive session? It seems you just display an error message. You can simply replace logging.error( with raise error.TestFail(. 9. Consider using a stricter test than just vm_session.is_responsive(). vm_session.is_responsive() just sends ENTER to the sessions and returns True if it gets anything as a result (usually a prompt, or even just a newline echoed back). If the session passes this test it is indeed responsive, so it's a decent test, but maybe you can send some command (user configurable?) and test for some output. I'm really not sure this is important, because I can't imagine a session would respond to a newline but not to other commands, but who knows. Maybe you can send the first VM a user-specified command when the test begins, remember the output, and then send all other VMs the same command and make sure the output is the same. maybe use 'info status' and send command 'help' via session to vms and compare their output? I'm not sure I understand. What does 'info status' do? We're talking about an SSH shell, not the monitor. You can do whatever you like, like 'uname -a', and 'ls /', but you should leave it up to the user to decide, so he/she can specify different commands for different guests. Linux commands won't work under Windows, so Linux and Windows must have different commands in the config file. In the Linux section, under '- @Linux:' you can add something like: stress_boot: stress_boot_test_command = uname -a and under '- @Windows:': stress_boot: stress_boot_test_command = ver vol These commands are just naive suggestions. I'm sure someone can think of much more informative commands. That's really good suggestions. Thanks, Michael. And can I use 'migration_test_command' instead? Not really. Why would you want to use another test's param? 1. There's no guarantee that 'migration_test_command' is defined for your boot stress test. In fact, it is probably only defined for migration tests, so you probably won't be able to access it. Try params.get('migration_test_command') in your test and you'll probably get None. 2. The user may not want to run migration at all, and then he/she will probably not define 'migration_test_command'. 3. The user might want to use different test commands for migration and for the boot stress test. 10. I'm not sure you should use the param kill_vm_gracefully because that's a postprocessor param (probably not your business). You can just call destroy() in the exception handler with gracefully=False, because if the VMs are non- responsive, I don't expect them to shutdown nicely with an SSH command (that's what gracefully does). Also, we're using -snapshot, so there's no reason to shut them down nicely. Yes, I agree. :) 11. Total number booted successfully: %d % (num - 1) -- why not just num? We really
[PATCH 0/2] *** SUBJECT HERE ***
RFC: move the dirty page tracking to use dirty bit Well, i was bored this morning and had this idea for a while, didnt test it to much..., first i want to hear what ppl think? Thanks. Izik Eidus (2): kvm: fix dirty bit tracking for slots with large pages kvm: change the dirty page tracking to work with dirty bit instead of page fault arch/ia64/kvm/kvm-ia64.c|4 arch/powerpc/kvm/powerpc.c |4 arch/s390/kvm/kvm-s390.c|4 arch/x86/include/asm/kvm_host.h |3 +++ arch/x86/kvm/mmu.c | 32 +--- arch/x86/kvm/svm.c |7 +++ arch/x86/kvm/vmx.c |7 +++ arch/x86/kvm/x86.c | 21 ++--- include/linux/kvm_host.h|1 + virt/kvm/kvm_main.c | 17 - 10 files changed, 89 insertions(+), 11 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] kvm: fix dirty bit tracking for slots with large pages
When slot is already allocted and being asked to be tracked we need to break the large pages. This code flush the mmu when someone ask a slot to start dirty bit tracking. Signed-off-by: Izik Eidus iei...@redhat.com --- virt/kvm/kvm_main.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 669eb4a..4a60c72 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1160,6 +1160,8 @@ int __kvm_set_memory_region(struct kvm *kvm, new.userspace_addr = mem-userspace_addr; else new.userspace_addr = 0; + + kvm_arch_flush_shadow(kvm); } if (npages !new.lpage_info) { largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE; -- 1.5.6.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] kvm: change the dirty page tracking to work with dirty bit instead of page fault
right now the dirty page tracking work with the help of page faults, when we want to track a page for being dirty, we write protect it and we mark it dirty when we have write page fault, this code move into looking at the dirty bit of the spte. Signed-off-by: Izik Eidus iei...@redhat.com --- arch/ia64/kvm/kvm-ia64.c|4 arch/powerpc/kvm/powerpc.c |4 arch/s390/kvm/kvm-s390.c|4 arch/x86/include/asm/kvm_host.h |3 +++ arch/x86/kvm/mmu.c | 32 +--- arch/x86/kvm/svm.c |7 +++ arch/x86/kvm/vmx.c |7 +++ arch/x86/kvm/x86.c | 21 ++--- include/linux/kvm_host.h|1 + virt/kvm/kvm_main.c | 15 ++- 10 files changed, 87 insertions(+), 11 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 3199221..5914128 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1809,6 +1809,10 @@ void kvm_arch_exit(void) kvm_vmm_info = NULL; } +void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) +{ +} + static int kvm_ia64_sync_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) { diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2cf915e..6beb368 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -418,6 +418,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) return -ENOTSUPP; } +void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) +{ +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 981ab04..ab6f115 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -130,6 +130,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, return 0; } +void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) +{ +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c7b0cc2..8a24149 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -527,6 +527,7 @@ struct kvm_x86_ops { int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); + int (*dirty_bit_support)(void); }; extern struct kvm_x86_ops *kvm_x86_ops; @@ -796,4 +797,6 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); int kvm_age_hva(struct kvm *kvm, unsigned long hva); int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); +int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp); + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 809cce0..3ec6a7d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -140,6 +140,8 @@ module_param(oos_shadow, bool, 0644); #define ACC_USER_MASKPT_USER_MASK #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) +#define SPTE_DONT_DIRTY (1ULL PT_FIRST_AVAIL_BITS_SHIFT) + #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) struct kvm_rmap_desc { @@ -629,6 +631,25 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) return NULL; } +int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte; + int dirty = 0; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + if (*spte PT_DIRTY_MASK) { + set_shadow_pte(spte, (*spte = ~PT_DIRTY_MASK) | + SPTE_DONT_DIRTY); + dirty = 1; + } + spte = rmap_next(kvm, rmapp, spte); + } + + return dirty; +} + + static int rmap_write_protect(struct kvm *kvm, u64 gfn) { unsigned long *rmapp; @@ -1676,7 +1697,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, * whether the guest actually used the pte (in order to detect * demand paging). */ - spte = shadow_base_present_pte | shadow_dirty_mask; + spte = shadow_base_present_pte; + if (!(spte SPTE_DONT_DIRTY)) + spte |= shadow_dirty_mask; + if (!speculative) spte |= shadow_accessed_mask; if (!dirty) @@ -1725,8 +1749,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, } } - if (pte_access ACC_WRITE_MASK) - mark_page_dirty(vcpu-kvm, gfn); + if (!shadow_dirty_mask) { + if (pte_access ACC_WRITE_MASK) + mark_page_dirty(vcpu-kvm, gfn); + } set_pte:
Re: [PATCH 1/2] kvm: fix dirty bit tracking for slots with large pages
Izik Eidus wrote: When slot is already allocted and being asked to be tracked we need to break the large pages. This code flush the mmu when someone ask a slot to start dirty bit tracking. Signed-off-by: Izik Eidus iei...@redhat.com --- virt/kvm/kvm_main.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 669eb4a..4a60c72 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1160,6 +1160,8 @@ int __kvm_set_memory_region(struct kvm *kvm, new.userspace_addr = mem-userspace_addr; else new.userspace_addr = 0; + + kvm_arch_flush_shadow(kvm); } if (npages !new.lpage_info) { largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE; Ryan, can you try this out with your large page migration failures? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bit instead of page fault
Few quick thoughts: +void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) +{ +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c7b0cc2..8a24149 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -527,6 +527,7 @@ struct kvm_x86_ops { int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); + int (*dirty_bit_support)(void); }; extern struct kvm_x86_ops *kvm_x86_ops; @@ -796,4 +797,6 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); int kvm_age_hva(struct kvm *kvm, unsigned long hva); int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); +int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp); + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 809cce0..3ec6a7d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -140,6 +140,8 @@ module_param(oos_shadow, bool, 0644); #define ACC_USER_MASKPT_USER_MASK #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) +#define SPTE_DONT_DIRTY (1ULL PT_FIRST_AVAIL_BITS_SHIFT) + #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) struct kvm_rmap_desc { @@ -629,6 +631,25 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) return NULL; } +int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte; + int dirty = 0; + Here we should add: if (!shadow_dirty_mask) return 0; + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + if (*spte PT_DIRTY_MASK) { + set_shadow_pte(spte, (*spte = ~PT_DIRTY_MASK) | + SPTE_DONT_DIRTY); + dirty = 1; + } + spte = rmap_next(kvm, rmapp, spte); + } + + return dirty; +} + */ @@ -1982,9 +1995,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, /* If nothing is dirty, don't bother messing with page tables. */ if (is_dirty) { - spin_lock(kvm-mmu_lock); - kvm_mmu_slot_remove_write_access(kvm, log-slot); - spin_unlock(kvm-mmu_lock); + if (kvm_x86_ops-dirty_bit_support()) { This should be if (kvm_x86_ops-dirty_bit_support() - if (!kvm_x86_ops-dirty_bit_support()) + spin_lock(kvm-mmu_lock); + kvm_mmu_slot_remove_write_access(kvm, log-slot); + spin_unlock(kvm-mmu_lock); + } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: fix dirty bit tracking for slots with large pages
Avi Kivity wrote: Izik Eidus wrote: When slot is already allocted and being asked to be tracked we need to break the large pages. This code flush the mmu when someone ask a slot to start dirty bit tracking. Signed-off-by: Izik Eidus iei...@redhat.com --- virt/kvm/kvm_main.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 669eb4a..4a60c72 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1160,6 +1160,8 @@ int __kvm_set_memory_region(struct kvm *kvm, new.userspace_addr = mem-userspace_addr; else new.userspace_addr = 0; + +kvm_arch_flush_shadow(kvm); } if (npages !new.lpage_info) { largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE; Ryan, can you try this out with your large page migration failures? Wait, i think it is in the wrong place., i am sending a second seires :( -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG at mmu.c:615 from localhost migration using ept+hugetlbfs
On Wed, Jun 10, 2009 at 11:08:14AM +0300, Avi Kivity wrote: Avi Kivity wrote: Not really. One thing, migration should transition the shadow pagetables from large pages to small ones, maybe that bit is broken. Maybe we're looking at a largepage spte and interpreting it as a normal L2 spte, and interpreting a guest page as the L1 spt. I tried to find where we drop the mmu (or at least large sptes for the slot) when we enable dirty logging, and failed. Maybe remove_write_access() is sufficient. I believe you have to break down large pages into 4k pages for migration to work reliably. Was tempted to copypaste the hugetlbfs file ram alloc code into user/main.c to use with user/vm.c (which then can also be used to test TLB flushes on 2M-4k transition which are lacking). Regarding the bogus spte, could not reproduce yesterday with kvm.git, but in the worst case the audit code will catch it. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper
On Wed, Jun 10, 2009 at 12:17:09PM +0300, Avi Kivity wrote: Marcelo Tosatti wrote: Required by EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -3013,6 +3013,26 @@ out: return r; } +void kvm_mmu_shadow_walk(struct kvm_vcpu *vcpu, u64 addr, + struct mmu_shadow_walk *walk) +{ +struct kvm_shadow_walk_iterator iterator; + +spin_lock(vcpu-kvm-mmu_lock); +for_each_shadow_entry(vcpu, addr, iterator) { +int err; + +err = walk-fn(vcpu, iterator.sptep, iterator.level, walk); +if (err) +break; + +if (!is_shadow_present_pte(*iterator.sptep)) +break; +} +spin_unlock(vcpu-kvm-mmu_lock); +} +EXPORT_SYMBOL(kvm_mmu_shadow_walk); + Isn't it simpler to invoke for_each_shadow_entry(), instead of defining a callback and calling it? We had those callbacks once, then switched to for_each. The point is its exported to use in a external module (kvm-intel.ko), so you hide the details (such as locking) in the kvm_mmu_shadow_walk helper. Let me know how do you prefer this to be. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 2/4] Make kvm_config.py to use internal/standard exeptions
Looks fine to me. BTW, I think debug_print() should be prefixed by a single underscore, not two. A double underscore should be used only when name mangling is required -- at least that's what I understood from PEP 8. Let me know what you think. Thanks, Michael - Original Message - From: Lucas Meneghel Rodrigues l...@redhat.com To: autot...@test.kernel.org Cc: kvm@vger.kernel.org Sent: Tuesday, June 9, 2009 7:33:27 PM (GMT+0200) Auto-Detected Subject: [Autotest] [KVM-AUTOTEST PATCH 2/4] Make kvm_config.py to use internal/standard exeptions Instead of resorting to internal autotest exception types, use either python standard exceptions or an internally defined ConfigError exception. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/kvm_config.py | 13 + 1 files changed, 9 insertions(+), 4 deletions(-) diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py index a3467a0..cce931a 100755 --- a/client/tests/kvm/kvm_config.py +++ b/client/tests/kvm/kvm_config.py @@ -8,6 +8,11 @@ KVM configuration file utility functions. @copyright: Red Hat 2008-2009 + +class ConfigError(Exception): +pass + + class config: Parse an input file or string that follows the KVM Test Config File format @@ -47,7 +52,7 @@ class config: @param filename: Path of the configuration file. if not os.path.exists(filename): -raise Exception, File %s not found % filename +raise IOError(File %s not found % filename) self.filename = filename file = open(filename, r) self.list = self.parse(file, self.list) @@ -357,7 +362,7 @@ class config: # (inside an exception or inside subvariants) if restricted: e_msg = Using variants in this context is not allowed -raise error.AutotestError() +raise ConfigError(e_msg) if self.debug and not restricted: self.__debug_print(indented_line, Entering variants block (%d dicts in @@ -402,7 +407,7 @@ class config: words[1]) if not os.path.exists(filename): e_msg = Cannot include %s -- file not found % filename -raise error.AutotestError(e_msg) +raise ConfigError(e_msg) new_file = open(filename, r) list = self.parse(new_file, list, restricted) new_file.close() @@ -410,7 +415,7 @@ class config: self.__debug_print(, Leaving file %s % words[1]) else: e_msg = Cannot include anything because no file is open -raise error.AutotestError(e_msg) +raise ConfigError(e_msg) # Parse multi-line exceptions # (the block is parsed for each dict separately) -- 1.6.2.2 ___ Autotest mailing list autot...@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper
Marcelo Tosatti wrote: Isn't it simpler to invoke for_each_shadow_entry(), instead of defining a callback and calling it? We had those callbacks once, then switched to for_each. The point is its exported to use in a external module (kvm-intel.ko), so you hide the details (such as locking) in the kvm_mmu_shadow_walk helper. Let me know how do you prefer this to be. Ah, you're right. I don't think it's worthwhile to add all this just for debugging. You can add a function that dumps the spte chain as well as the features MSR, and we can decode it by hand when we see it. Disadvantage is more work for us when we hit the bug, but at least that function is reusable in other contexts. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Clean up MADT Table Creation (v2)
Beth Kon wrote: This patch is based on the recent patch from Vincent Minet. I split Vincent's changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by Marcelo) and added a bit to them. There has been much ado over the acpi_bios_init function recently. I had actually done a rewrite very similar to Gleb's, but Avi argued that the rewrite has to be more incremental. This patch contains minimal changes without any rewrite because the changes are kvm-only. The rewrite would better be a separate step, submitted to qemu and then merged into kvm. I am submitting the RSDT fix to kvm because the kvm and qemu RSDT implementation differs. Again, as a separate rewrite effort, the kvm and qemu RSDT manipulation could be merged into one base as a later, separate step. This patch will get MADT into reasonable enough shape for me to resubmit hpet patches on top of it. After that, I'd be willing to submit incremental rewrite patches for acpi_bios_init to qemu, starting with MADT and RSDT. Applied both, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 4/4] Fix bad logging calls
Looks fine to me. - Original Message - From: Lucas Meneghel Rodrigues l...@redhat.com To: autot...@test.kernel.org Cc: kvm@vger.kernel.org Sent: Tuesday, June 9, 2009 7:33:29 PM (GMT+0200) Auto-Detected Subject: [Autotest] [KVM-AUTOTEST PATCH 4/4] Fix bad logging calls During the conversion of kvm autotest to upstream coding standards, some bad logging calls were left behind. This patch fixes them. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/kvm_utils.py |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py index 0f4c770..70a49c9 100644 --- a/client/tests/kvm/kvm_utils.py +++ b/client/tests/kvm/kvm_utils.py @@ -304,7 +304,7 @@ class kvm_spawn: # Print some debugging info if match == None and self.poll() != 0: -logging.debug(Timeout elapsed or process terminated. Output:, +logging.debug(Timeout elapsed or process terminated. Output:%s, format_str_for_message(data.strip())) return (match, data) @@ -465,8 +465,8 @@ class kvm_spawn: # Print some debugging info if status != 0: -logging.debug(Command failed; status: %d, output: % status \ -+ format_str_for_message(output.strip())) +logging.debug(Command failed; status: %d, output:%s, status, + format_str_for_message(output.strip())) return (status, output) -- 1.6.2.2 ___ Autotest mailing list autot...@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 00/13] qemu: MSI-X support
Here is the port of MSI-X support patches to upstream qemu. Please comment or commit. This patchset adds generic support for MSI-X, adds implementation in APIC, and uses MSI-X in virtio-net. Changelog: - since v3 call to resize_region on load split patches a bit differently to address style comments by Glauber update commit message to clarify what msix_support flag does - since v2 rename mask - wmask to avoid conflict with work by Yamahata - since v1 At Paul's suggestion, use stl_phy to decouple APIC and MSI-X implementation This uses the mask table patch that I posted previously, and which is included in the series. -- MST Michael S. Tsirkin (13): qemu: make default_write_config use mask table qemu: capability bits in pci save/restore qemu: add routines to manage PCI capabilities qemu: helper routines for pci access qemu: MSI-X support functions qemu: add flag to disable MSI-X by default qemu: minimal MSI/MSI-X implementation for PC qemu: add support for resizing regions qemu: virtio support for many interrupt vectors qemu: MSI-X support in virtio PCI qemu: request 3 vectors in virtio-net qemu: virtio save/load bindings qemu: add pci_get/set_byte Makefile.target|2 +- hw/apic.c | 43 +- hw/msix.c | 420 hw/msix.h | 35 + hw/pci.c | 295 - hw/pci.h | 105 +- hw/syborg_virtio.c | 13 ++- hw/virtio-net.c|1 + hw/virtio-pci.c| 216 ++- hw/virtio.c| 70 ++--- hw/virtio.h| 14 ++- qemu-options.hx|2 + vl.c |3 + 13 files changed, 1005 insertions(+), 214 deletions(-) create mode 100644 hw/msix.c create mode 100644 hw/msix.h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 01/13] qemu: make default_write_config use mask table
Change much of hw/pci to use symbolic constants and a table-driven design: add a mask table with writable bits set and readonly bits unset. Detect change by comparing original and new registers. This makes it easy to support capabilities where read-only/writeable bit layout differs between devices, depending on capabilities present. As a result, writing a single byte in BAR registers now works as it should. Writing to upper limit registers in the bridge also works as it should. Code is also shorter. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 145 - hw/pci.h | 18 +++- 2 files changed, 46 insertions(+), 117 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 8c904ba..766e0c6 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -238,6 +238,17 @@ int pci_assign_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp) return pci_parse_devaddr(devaddr, domp, busp, slotp); } +static void pci_init_mask(PCIDevice *dev) +{ +int i; +dev-wmask[PCI_CACHE_LINE_SIZE] = 0xff; +dev-wmask[PCI_INTERRUPT_LINE] = 0xff; +dev-wmask[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY + | PCI_COMMAND_MASTER; +for (i = PCI_CONFIG_HEADER_SIZE; i PCI_CONFIG_SPACE_SIZE; ++i) +dev-wmask[i] = 0xff; +} + /* -1 for devfn means auto assign */ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, const char *name, int devfn, @@ -257,6 +268,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, pstrcpy(pci_dev-name, sizeof(pci_dev-name), name); memset(pci_dev-irq_state, 0, sizeof(pci_dev-irq_state)); pci_set_default_subsystem_id(pci_dev); +pci_init_mask(pci_dev); if (!config_read) config_read = pci_default_read_config; @@ -328,6 +340,7 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num, { PCIIORegion *r; uint32_t addr; +uint32_t wmask; if ((unsigned int)region_num = PCI_NUM_REGIONS) return; @@ -343,12 +356,17 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num, r-size = size; r-type = type; r-map_func = map_func; + +wmask = ~(size - 1); if (region_num == PCI_ROM_SLOT) { addr = 0x30; +/* ROM enable bit is writeable */ +wmask |= 1; } else { addr = 0x10 + region_num * 4; } *(uint32_t *)(pci_dev-config + addr) = cpu_to_le32(type); +*(uint32_t *)(pci_dev-wmask + addr) = cpu_to_le32(wmask); } static void pci_update_mappings(PCIDevice *d) @@ -457,118 +475,21 @@ uint32_t pci_default_read_config(PCIDevice *d, return val; } -void pci_default_write_config(PCIDevice *d, - uint32_t address, uint32_t val, int len) +void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) { -int can_write, i; -uint32_t end, addr; - -if (len == 4 ((address = 0x10 address 0x10 + 4 * 6) || - (address = 0x30 address 0x34))) { -PCIIORegion *r; -int reg; +uint8_t orig[PCI_CONFIG_SPACE_SIZE]; +int i; -if ( address = 0x30 ) { -reg = PCI_ROM_SLOT; -}else{ -reg = (address - 0x10) 2; -} -r = d-io_regions[reg]; -if (r-size == 0) -goto default_config; -/* compute the stored value */ -if (reg == PCI_ROM_SLOT) { -/* keep ROM enable bit */ -val = (~(r-size - 1)) | 1; -} else { -val = ~(r-size - 1); -val |= r-type; -} -*(uint32_t *)(d-config + address) = cpu_to_le32(val); -pci_update_mappings(d); -return; -} - default_config: /* not efficient, but simple */ -addr = address; -for(i = 0; i len; i++) { -/* default read/write accesses */ -switch(d-config[0x0e]) { -case 0x00: -case 0x80: -switch(addr) { -case 0x00: -case 0x01: -case 0x02: -case 0x03: -case 0x06: -case 0x07: -case 0x08: -case 0x09: -case 0x0a: -case 0x0b: -case 0x0e: -case 0x10 ... 0x27: /* base */ -case 0x2c ... 0x2f: /* read-only subsystem ID vendor ID */ -case 0x30 ... 0x33: /* rom */ -case 0x3d: -can_write = 0; -break; -default: -can_write = 1; -break; -} -break; -default: -case 0x01: -switch(addr) { -case 0x00: -case 0x01: -case 0x02: -case 0x03: -case 0x06: -case 0x07: -case 0x08: -case 0x09: -case 0x0a: -case 0x0b: -
[PATCHv4 02/13] qemu: capability bits in pci save/restore
Add support for capability bits in save/restore for pci. These will be used for MSI, where the capability might be present or not as requested by user, which does not map well into a single version number. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 17 ++--- hw/pci.h |4 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 766e0c6..6740b07 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -126,9 +126,13 @@ int pci_bus_num(PCIBus *s) void pci_device_save(PCIDevice *s, QEMUFile *f) { +int version = s-cap_present ? 3 : 2; int i; -qemu_put_be32(f, 2); /* PCI device version */ +/* PCI device version and capabilities */ +qemu_put_be32(f, version); +if (version = 3) +qemu_put_be32(f, s-cap_present); qemu_put_buffer(f, s-config, 256); for (i = 0; i 4; i++) qemu_put_be32(f, s-irq_state[i]); @@ -140,15 +144,22 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) int i; version_id = qemu_get_be32(f); -if (version_id 2) +if (version_id 3) return -EINVAL; +if (version_id = 3) +s-cap_present = qemu_get_be32(f); +else +s-cap_present = 0; + +if (s-cap_present ~s-cap_supported) +return -EINVAL; + qemu_get_buffer(f, s-config, 256); pci_update_mappings(s); if (version_id = 2) for (i = 0; i 4; i ++) s-irq_state[i] = qemu_get_be32(f); - return 0; } diff --git a/hw/pci.h b/hw/pci.h index c50ea50..656badc 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -174,6 +174,10 @@ struct PCIDevice { /* Current IRQ levels. Used internally by the generic PCI code. */ int irq_state[4]; + +/* Capability bits for save/load */ +uint32_t cap_supported; +uint32_t cap_present; }; PCIDevice *pci_register_device(PCIBus *bus, const char *name, -- 1.6.3.1.56.g79e1.dirty -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 03/13] qemu: add routines to manage PCI capabilities
Add routines to manage PCI capability list. First user will be MSI-X. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 79 ++ hw/pci.h | 18 +- 2 files changed, 96 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 6740b07..a3f3fd3 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -160,6 +160,12 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) if (version_id = 2) for (i = 0; i 4; i ++) s-irq_state[i] = qemu_get_be32(f); +/* Clear wmask and used bits for capabilities. + Must be restored separately, since capabilities can + be placed anywhere in config space. */ +memset(s-used, 0, PCI_CONFIG_SPACE_SIZE); +for (i = PCI_CONFIG_HEADER_SIZE; i PCI_CONFIG_SPACE_SIZE; ++i) +s-wmask[i] = 0xff; return 0; } @@ -865,3 +871,76 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name) return (PCIDevice *)dev; } + +static int pci_find_space(PCIDevice *pdev, uint8_t size) +{ +int offset = PCI_CONFIG_HEADER_SIZE; +int i; +for (i = PCI_CONFIG_HEADER_SIZE; i PCI_CONFIG_SPACE_SIZE; ++i) +if (pdev-used[i]) +offset = i + 1; +else if (i - offset + 1 == size) +return offset; +return 0; +} + +static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id, +uint8_t *prev_p) +{ +uint8_t next, prev; + +if (!(pdev-config[PCI_STATUS] PCI_STATUS_CAP_LIST)) +return 0; + +for (prev = PCI_CAPABILITY_LIST; (next = pdev-config[prev]); + prev = next + PCI_CAP_LIST_NEXT) +if (pdev-config[next + PCI_CAP_LIST_ID] == cap_id) +break; + +if (prev_p) +*prev_p = prev; +return next; +} + +/* Reserve space and add capability to the linked list in pci config space */ +int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) +{ +uint8_t offset = pci_find_space(pdev, size); +uint8_t *config = pdev-config + offset; +if (!offset) +return -ENOSPC; +config[PCI_CAP_LIST_ID] = cap_id; +config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST]; +pdev-config[PCI_CAPABILITY_LIST] = offset; +pdev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; +memset(pdev-used + offset, 0xFF, size); +/* Make capability read-only by default */ +memset(pdev-wmask + offset, 0, size); +return offset; +} + +/* Unlink capability from the pci config space. */ +void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) +{ +uint8_t prev, offset = pci_find_capability_list(pdev, cap_id, prev); +if (!offset) +return; +pdev-config[prev] = pdev-config[offset + PCI_CAP_LIST_NEXT]; +/* Make capability writeable again */ +memset(pdev-wmask + offset, 0xff, size); +memset(pdev-used + offset, 0, size); + +if (!pdev-config[PCI_CAPABILITY_LIST]) +pdev-config[PCI_STATUS] = ~PCI_STATUS_CAP_LIST; +} + +/* Reserve space for capability at a known offset (to call after load). */ +void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size) +{ +memset(pdev-used + offset, 0xff, size); +} + +uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id) +{ +return pci_find_capability_list(pdev, cap_id, NULL); +} diff --git a/hw/pci.h b/hw/pci.h index 656badc..2ae72f0 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -121,6 +121,10 @@ typedef struct PCIIORegion { #define PCI_MIN_GNT0x3e/* 8 bits */ #define PCI_MAX_LAT0x3f/* 8 bits */ +/* Capability lists */ +#define PCI_CAP_LIST_ID0 /* Capability ID */ +#define PCI_CAP_LIST_NEXT 1 /* Next capability in the list */ + #define PCI_REVISION0x08/* obsolete, use PCI_REVISION_ID */ #define PCI_SUBVENDOR_ID0x2c/* obsolete, use PCI_SUBSYSTEM_VENDOR_ID */ #define PCI_SUBDEVICE_ID0x2e/* obsolete, use PCI_SUBSYSTEM_ID */ @@ -128,7 +132,7 @@ typedef struct PCIIORegion { /* Bits in the PCI Status Register (PCI 2.3 spec) */ #define PCI_STATUS_RESERVED1 0x007 #define PCI_STATUS_INT_STATUS 0x008 -#define PCI_STATUS_CAPABILITIES0x010 +#define PCI_STATUS_CAP_LIST0x010 #define PCI_STATUS_66MHZ 0x020 #define PCI_STATUS_RESERVED2 0x040 #define PCI_STATUS_FAST_BACK 0x080 @@ -158,6 +162,9 @@ struct PCIDevice { /* Used to implement R/W bytes */ uint8_t wmask[PCI_CONFIG_SPACE_SIZE]; +/* Used to allocate config space for capabilities. */ +uint8_t used[PCI_CONFIG_SPACE_SIZE]; + /* the following fields are read only */ PCIBus *bus; int devfn; @@ -190,6 +197,15 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num, uint32_t size, int type, PCIMapIORegionFunc *map_func); +int pci_add_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size); + +void
[PATCHv4 04/13] qemu: helper routines for pci access
Add inline routines for convenient access to pci devices with correct (little) endianness. Will be used by MSI-X support. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.h | 30 +++--- 1 files changed, 27 insertions(+), 3 deletions(-) diff --git a/hw/pci.h b/hw/pci.h index 2ae72f0..c603384 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -236,21 +236,45 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did, pci_map_irq_fn map_irq, const char *name); static inline void +pci_set_word(uint8_t *config, uint16_t val) +{ +cpu_to_le16wu((uint16_t *)config, val); +} + +static inline uint16_t +pci_get_word(uint8_t *config) +{ +return le16_to_cpupu((uint16_t *)config); +} + +static inline void +pci_set_long(uint8_t *config, uint32_t val) +{ +cpu_to_le32wu((uint32_t *)config, val); +} + +static inline uint32_t +pci_get_long(uint8_t *config) +{ +return le32_to_cpupu((uint32_t *)config); +} + +static inline void pci_config_set_vendor_id(uint8_t *pci_config, uint16_t val) { -cpu_to_le16wu((uint16_t *)pci_config[PCI_VENDOR_ID], val); +pci_set_word(pci_config[PCI_VENDOR_ID], val); } static inline void pci_config_set_device_id(uint8_t *pci_config, uint16_t val) { -cpu_to_le16wu((uint16_t *)pci_config[PCI_DEVICE_ID], val); +pci_set_word(pci_config[PCI_DEVICE_ID], val); } static inline void pci_config_set_class(uint8_t *pci_config, uint16_t val) { -cpu_to_le16wu((uint16_t *)pci_config[PCI_CLASS_DEVICE], val); +pci_set_word(pci_config[PCI_CLASS_DEVICE], val); } typedef void (*pci_qdev_initfn)(PCIDevice *dev); -- 1.6.3.1.56.g79e1.dirty -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 05/13] qemu: MSI-X support functions
Add functions implementing MSI-X support. First user will be virtio-pci. Note that platform must set a flag to declare MSI supported: this is a safety measure to avoid breaking platforms which should support MSI-X but currently lack this in the interrupt controller emulation. For PC this will be set by APIC. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile.target |2 +- hw/msix.c | 417 +++ hw/msix.h | 35 + hw/pci.h| 20 +++ 4 files changed, 473 insertions(+), 1 deletions(-) create mode 100644 hw/msix.c create mode 100644 hw/msix.h diff --git a/Makefile.target b/Makefile.target index 27de4b9..85d8a4a 100644 --- a/Makefile.target +++ b/Makefile.target @@ -494,7 +494,7 @@ endif #CONFIG_BSD_USER ifndef CONFIG_USER_ONLY OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o \ - gdbstub.o gdbstub-xml.o + gdbstub.o gdbstub-xml.o msix.o # virtio has to be here due to weird dependency between PCI and virtio-net. # need to fix this properly OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o diff --git a/hw/msix.c b/hw/msix.c new file mode 100644 index 000..a448eab --- /dev/null +++ b/hw/msix.c @@ -0,0 +1,417 @@ +/* + * MSI-X device support + * + * This module includes support for MSI-X in pci devices. + * + * Author: Michael S. Tsirkin m...@redhat.com + * + * Copyright (c) 2009, Red Hat Inc, Michael S. Tsirkin (m...@redhat.com) + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include hw.h +#include msix.h +#include pci.h + +/* Declaration from linux/pci_regs.h */ +#define PCI_CAP_ID_MSIX 0x11 /* MSI-X */ +#define PCI_MSIX_FLAGS 2 /* Table at lower 11 bits */ +#define PCI_MSIX_FLAGS_QSIZE 0x7FF +#define PCI_MSIX_FLAGS_ENABLE (1 15) +#define PCI_MSIX_FLAGS_BIRMASK(7 0) + +/* MSI-X capability structure */ +#define MSIX_TABLE_OFFSET 4 +#define MSIX_PBA_OFFSET 8 +#define MSIX_CAP_LENGTH 12 + +/* MSI enable bit is in byte 1 in FLAGS register */ +#define MSIX_ENABLE_OFFSET (PCI_MSIX_FLAGS + 1) +#define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE 8) + +/* MSI-X table format */ +#define MSIX_MSG_ADDR 0 +#define MSIX_MSG_UPPER_ADDR 4 +#define MSIX_MSG_DATA 8 +#define MSIX_VECTOR_CTRL 12 +#define MSIX_ENTRY_SIZE 16 +#define MSIX_VECTOR_MASK 0x1 + +/* How much space does an MSIX table need. */ +/* The spec requires giving the table structure + * a 4K aligned region all by itself. Align it to + * target pages so that drivers can do passthrough + * on the rest of the region. */ +#define MSIX_PAGE_SIZE TARGET_PAGE_ALIGN(0x1000) +/* Reserve second half of the page for pending bits */ +#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2) +#define MSIX_MAX_ENTRIES 32 + + +#ifdef MSIX_DEBUG +#define DEBUG(fmt, ...) \ +do { \ + fprintf(stderr, %s: fmt, __func__ , __VA_ARGS__);\ +} while (0) +#else +#define DEBUG(fmt, ...) do { } while(0) +#endif + +/* Flag to globally disable MSI-X support */ +int msix_disable; + +/* Flag for interrupt controller to declare MSI-X support */ +int msix_supported; + +/* Add MSI-X capability to the config space for the device. */ +/* Given a bar and its size, add MSI-X table on top of it + * and fill MSI-X capability in the config space. + * Original bar size must be a power of 2 or 0. + * New bar size is returned. */ +static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries, + unsigned bar_nr, unsigned bar_size) +{ +int config_offset; +uint8_t *config; +uint32_t new_size; + +if (nentries 1 || nentries PCI_MSIX_FLAGS_QSIZE + 1) +return -EINVAL; +if (bar_size 0x8000) +return -ENOSPC; + +/* Add space for MSI-X structures */ +if (!bar_size) +new_size = MSIX_PAGE_SIZE; +else if (bar_size MSIX_PAGE_SIZE) { +bar_size = MSIX_PAGE_SIZE; +new_size = MSIX_PAGE_SIZE * 2; +} else +new_size = bar_size * 2; + +pdev-msix_bar_size = new_size; +config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH); +if (config_offset 0) +return config_offset; +config = pdev-config + config_offset; + +pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1); +/* Table on top of BAR */ +pci_set_long(config + MSIX_TABLE_OFFSET, bar_size | bar_nr); +/* Pending bits on top of that */ +pci_set_long(config + MSIX_PBA_OFFSET, (bar_size + MSIX_PAGE_PENDING) | + bar_nr); +pdev-msix_cap = config_offset; +/* Make flags bit writeable. */ +pdev-wmask[config_offset + MSIX_ENABLE_OFFSET] |= MSIX_ENABLE_MASK; +return 0; +} + +static void msix_free_irq_entries(PCIDevice *dev) +{ +int vector; + +for (vector = 0; vector dev-msix_entries_nr; ++vector) +
[PATCHv4 06/13] qemu: add flag to disable MSI-X by default
Add global flag to disable MSI-X by default. This is useful primarily to make images loadable by older qemu (without msix). Even when MSI-X is disabled by flag, you can still load images that have MSI-X enabled. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/msix.c |3 +++ qemu-options.hx |2 ++ vl.c|3 +++ 3 files changed, 8 insertions(+), 0 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index a448eab..ce4e6ba 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -122,6 +122,9 @@ void msix_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len) { unsigned enable_pos = dev-msix_cap + MSIX_ENABLE_OFFSET; +if (!(dev-cap_present QEMU_PCI_CAP_MSIX)) +return; + if (addr + len = enable_pos || addr enable_pos) return; diff --git a/qemu-options.hx b/qemu-options.hx index fa549c3..ed92ec2 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1575,3 +1575,5 @@ DEF(semihosting, 0, QEMU_OPTION_semihosting, DEF(old-param, 0, QEMU_OPTION_old_param, -old-param old param mode\n) #endif +DEF(disable-msix, 0, QEMU_OPTION_disable_msix, +-disable-msix disable msix support for PCI devices (enabled by default)\n) diff --git a/vl.c b/vl.c index f08f0f3..8c116c7 100644 --- a/vl.c +++ b/vl.c @@ -135,6 +135,7 @@ int main(int argc, char **argv) #include hw/usb.h #include hw/pcmcia.h #include hw/pc.h +#include hw/msix.h #include hw/audiodev.h #include hw/isa.h #include hw/baum.h @@ -5681,6 +5682,8 @@ int main(int argc, char **argv, char **envp) xen_mode = XEN_ATTACH; break; #endif +case QEMU_OPTION_disable_msix: +msix_disable = 1; } } } -- 1.6.3.1.56.g79e1.dirty -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 07/13] qemu: minimal MSI/MSI-X implementation for PC
Implement MSI support in APIC. Note that MSI and MMIO APIC registers are at the same memory location, but actually not on the global bus: MSI is on PCI bus, APIC is connected directly to the CPU. We map them on the global bus at the same address which happens to work because MSI registers are reserved in APIC MMIO and vice versa. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/apic.c | 43 +++ 1 files changed, 39 insertions(+), 4 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 8c8b2de..ed03a36 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -19,6 +19,8 @@ */ #include hw.h #include pc.h +#include pci.h +#include msix.h #include qemu-timer.h #include host-utils.h @@ -63,6 +65,19 @@ #define MAX_APICS 255 #define MAX_APIC_WORDS 8 +/* Intel APIC constants: from include/asm/msidef.h */ +#define MSI_DATA_VECTOR_SHIFT 0 +#define MSI_DATA_VECTOR_MASK 0x00ff +#define MSI_DATA_DELIVERY_MODE_SHIFT 8 +#define MSI_DATA_TRIGGER_SHIFT 15 +#define MSI_DATA_LEVEL_SHIFT 14 +#define MSI_ADDR_DEST_MODE_SHIFT 2 +#define MSI_ADDR_DEST_ID_SHIFT 12 +#defineMSI_ADDR_DEST_ID_MASK 0x000 + +#define MSI_ADDR_BASE 0xfee0 +#define MSI_ADDR_SIZE 0x10 + typedef struct APICState { CPUState *cpu_env; uint32_t apicbase; @@ -712,11 +727,31 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr) return val; } +static void apic_send_msi(target_phys_addr_t addr, uint32 data) +{ +uint8_t dest = (addr MSI_ADDR_DEST_ID_MASK) MSI_ADDR_DEST_ID_SHIFT; +uint8_t vector = (data MSI_DATA_VECTOR_MASK) MSI_DATA_VECTOR_SHIFT; +uint8_t dest_mode = (addr MSI_ADDR_DEST_MODE_SHIFT) 0x1; +uint8_t trigger_mode = (data MSI_DATA_TRIGGER_SHIFT) 0x1; +uint8_t delivery = (data MSI_DATA_DELIVERY_MODE_SHIFT) 0x7; +/* XXX: Ignore redirection hint. */ +apic_deliver_irq(dest, dest_mode, delivery, vector, 0, trigger_mode); +} + static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val) { CPUState *env; APICState *s; -int index; +int index = (addr 4) 0xff; +if (addr 0xfff || !index) { +/* MSI and MMIO APIC are at the same memory location, + * but actually not on the global bus: MSI is on PCI bus + * APIC is connected directly to the CPU. + * Mapping them on the global bus happens to work because + * MSI registers are reserved in APIC MMIO and vice versa. */ +apic_send_msi(addr, val); +return; +} env = cpu_single_env; if (!env) @@ -727,7 +762,6 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val) printf(APIC write: %08x = %08x\n, (uint32_t)addr, val); #endif -index = (addr 4) 0xff; switch(index) { case 0x02: s-id = (val 24); @@ -911,6 +945,7 @@ int apic_init(CPUState *env) s-cpu_env = env; apic_reset(s); +msix_supported = 1; /* XXX: mapping more APICs at the same memory location */ if (apic_io_memory == 0) { @@ -918,7 +953,8 @@ int apic_init(CPUState *env) on the global memory bus. */ apic_io_memory = cpu_register_io_memory(0, apic_mem_read, apic_mem_write, NULL); -cpu_register_physical_memory(s-apicbase ~0xfff, 0x1000, +/* XXX: what if the base changes? */ +cpu_register_physical_memory(MSI_ADDR_BASE, MSI_ADDR_SIZE, apic_io_memory); } s-timer = qemu_new_timer(vm_clock, apic_timer, s); @@ -929,4 +965,3 @@ int apic_init(CPUState *env) local_apics[s-id] = s; return 0; } - -- 1.6.3.1.56.g79e1.dirty -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 09/13] qemu: virtio support for many interrupt vectors
Extend virtio to support many interrupt vectors, and rearrange code in preparation for multi-vector support (mostly move reset out to bindings, because we will have to reset the vectors in transport-specific code). Actual bindings in pci, and use in net, to follow. Load and save are not connected to bindings yet, so they are left stubbed out for now. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/syborg_virtio.c | 13 -- hw/virtio-pci.c| 24 hw/virtio.c| 59 ++- hw/virtio.h| 10 +++- 4 files changed, 81 insertions(+), 25 deletions(-) diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c index 37c219c..d8c978a 100644 --- a/hw/syborg_virtio.c +++ b/hw/syborg_virtio.c @@ -134,7 +134,10 @@ static void syborg_virtio_writel(void *opaque, target_phys_addr_t offset, vdev-features = value; break; case SYBORG_VIRTIO_QUEUE_BASE: -virtio_queue_set_addr(vdev, vdev-queue_sel, value); +if (value == 0) +virtio_reset(vdev); +else +virtio_queue_set_addr(vdev, vdev-queue_sel, value); break; case SYBORG_VIRTIO_QUEUE_SEL: if (value VIRTIO_PCI_QUEUE_MAX) @@ -228,7 +231,7 @@ static CPUWriteMemoryFunc *syborg_virtio_writefn[] = { syborg_virtio_writel }; -static void syborg_virtio_update_irq(void *opaque) +static void syborg_virtio_update_irq(void *opaque, uint16_t vector) { SyborgVirtIOProxy *proxy = opaque; int level; @@ -239,7 +242,7 @@ static void syborg_virtio_update_irq(void *opaque) } static VirtIOBindings syborg_virtio_bindings = { -.update_irq = syborg_virtio_update_irq +.notify = syborg_virtio_update_irq }; static void syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev) @@ -248,6 +251,8 @@ static void syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev) proxy-vdev = vdev; +/* Don't support multiple vectors */ +proxy-vdev-nvectors = 0; sysbus_init_irq(proxy-busdev, proxy-irq); iomemtype = cpu_register_io_memory(0, syborg_virtio_readfn, syborg_virtio_writefn, proxy); @@ -255,6 +260,8 @@ static void syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev) proxy-id = ((uint32_t)0x1af4 16) | vdev-device_id; +qemu_register_reset(virtio_reset, 0, vdev); + virtio_bind_device(vdev, syborg_virtio_bindings, proxy); } diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index c072423..7dfdd80 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -78,13 +78,19 @@ typedef struct { /* virtio device */ -static void virtio_pci_update_irq(void *opaque) +static void virtio_pci_notify(void *opaque, uint16_t vector) { VirtIOPCIProxy *proxy = opaque; qemu_set_irq(proxy-pci_dev.irq[0], proxy-vdev-isr 1); } +static void virtio_pci_reset(void *opaque) +{ +VirtIOPCIProxy *proxy = opaque; +virtio_reset(proxy-vdev); +} + static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; @@ -108,7 +114,10 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) break; case VIRTIO_PCI_QUEUE_PFN: pa = (target_phys_addr_t)val VIRTIO_PCI_QUEUE_ADDR_SHIFT; -virtio_queue_set_addr(vdev, vdev-queue_sel, pa); +if (pa == 0) +virtio_pci_reset(proxy); +else +virtio_queue_set_addr(vdev, vdev-queue_sel, pa); break; case VIRTIO_PCI_QUEUE_SEL: if (val VIRTIO_PCI_QUEUE_MAX) @@ -120,7 +129,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) case VIRTIO_PCI_STATUS: vdev-status = val 0xFF; if (vdev-status == 0) -virtio_reset(vdev); +virtio_pci_reset(proxy); break; } } @@ -158,7 +167,7 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t addr) /* reading from the ISR also clears it. */ ret = vdev-isr; vdev-isr = 0; -virtio_update_irq(vdev); +qemu_set_irq(proxy-pci_dev.irq[0], 0); break; default: break; @@ -243,7 +252,7 @@ static void virtio_map(PCIDevice *pci_dev, int region_num, } static const VirtIOBindings virtio_pci_bindings = { -.update_irq = virtio_pci_update_irq +.notify = virtio_pci_notify }; static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, @@ -255,6 +264,9 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, proxy-vdev = vdev; +/* No support for multiple vectors yet. */ +proxy-vdev-nvectors = 0; + config = proxy-pci_dev.config; pci_config_set_vendor_id(config, vendor); pci_config_set_device_id(config, device); @@ -279,6 +291,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, pci_register_io_region(proxy-pci_dev, 0,
Re: [KVM-AUTOTEST PATCH 3/4] Fix bad line breaks
Looks fine to me. - Original Message - From: Lucas Meneghel Rodrigues l...@redhat.com To: autot...@test.kernel.org Cc: kvm@vger.kernel.org, Lucas Meneghel Rodrigues l...@redhat.com Sent: Tuesday, June 9, 2009 7:33:28 PM (GMT+0200) Auto-Detected Subject: [KVM-AUTOTEST PATCH 3/4] Fix bad line breaks During the conversion of the kvm_runtest_2 to the kvm test, some bad line breaks were introduced. Started using parenthesis implicit line continuation instead of backslash continuation in assignments that were using it. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/kvm_guest_wizard.py |8 client/tests/kvm/kvm_tests.py|4 ++-- client/tests/kvm/kvm_utils.py| 12 ++-- client/tests/kvm/kvm_vm.py |4 ++-- client/tests/kvm/make_html_report.py |6 ++ 5 files changed, 16 insertions(+), 18 deletions(-) diff --git a/client/tests/kvm/kvm_guest_wizard.py b/client/tests/kvm/kvm_guest_wizard.py index 01aeb97..2dd9be5 100644 --- a/client/tests/kvm/kvm_guest_wizard.py +++ b/client/tests/kvm/kvm_guest_wizard.py @@ -105,8 +105,8 @@ def barrier_2(vm, words, fail_if_stuck_for, stuck_detection_history, time.sleep(sleep_duration) # Failure -message = Barrier failed at step %s after %.2f seconds (%s) % \ -(current_step_num, time.time() - start_time, failure_message) +message = (Barrier failed at step %s after %.2f seconds (%s) % + (current_step_num, time.time() - start_time, failure_message)) # What should we do with this failure? if words[-1] == optional: @@ -201,9 +201,9 @@ def run_steps(test, params, env): logging.error(Variable not defined: %s % words[1]) elif words[0] == barrier_2: if current_screendump: -scrdump_filename = \ +scrdump_filename = ( os.path.join(ppm_utils.get_data_dir(steps_filename), - current_screendump) + current_screendump)) else: scrdump_filename = None if not barrier_2(vm, words, fail_if_stuck_for, diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py index 48e..54d2a7a 100644 --- a/client/tests/kvm/kvm_tests.py +++ b/client/tests/kvm/kvm_tests.py @@ -321,8 +321,8 @@ def run_autotest(test, params, env): status_fail = False if result_list == []: status_fail = True -message_fail = Test '%s' did not produce any recognizable - results % test_name +message_fail = (Test '%s' did not produce any recognizable +results % test_name) for result in result_list: logging.info(str(result)) if result[1] == FAIL: diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py index 434190d..0f4c770 100644 --- a/client/tests/kvm/kvm_utils.py +++ b/client/tests/kvm/kvm_utils.py @@ -644,8 +644,8 @@ def scp_to_remote(host, port, username, password, local_path, remote_path, @return: True on success and False on failure. -command = scp -o UserKnownHostsFile=/dev/null -r -P %s %s %...@%s:%s % \ -(port, local_path, username, host, remote_path) +command = (scp -o UserKnownHostsFile=/dev/null -r -P %s %s %...@%s:%s % + (port, local_path, username, host, remote_path)) return remote_scp(command, password, timeout) @@ -664,8 +664,8 @@ def scp_from_remote(host, port, username, password, remote_path, local_path, @return: True on success and False on failure. -command = scp -o UserKnownHostsFile=/dev/null -r -P %s %...@%s:%s %s % \ -(port, username, host, remote_path, local_path) +command = (scp -o UserKnownHostsFile=/dev/null -r -P %s %...@%s:%s %s % + (port, username, host, remote_path, local_path)) return remote_scp(command, password, timeout) @@ -681,8 +681,8 @@ def ssh(host, port, username, password, prompt, timeout=10): @return: kvm_spawn object on success and None on failure. -command = ssh -o UserKnownHostsFile=/dev/null -p %s %...@%s % \ -(port, username, host) +command = (ssh -o UserKnownHostsFile=/dev/null -p %s %...@%s % + (port, username, host)) return remote_login(command, password, prompt, \n, timeout) diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py index eb9717b..de21b2f 100644 --- a/client/tests/kvm/kvm_vm.py +++ b/client/tests/kvm/kvm_vm.py @@ -118,8 +118,8 @@ class VM: # Find available monitor filename while True: # The monitor filename should be unique -self.instance = time.strftime(%Y%m%d-%H%M%S-) + \ -kvm_utils.generate_random_string(4) +self.instance = (time.strftime(%Y%m%d-%H%M%S-) + + kvm_utils.generate_random_string(4)) self.monitor_file_name =
[PATCHv4 11/13] qemu: request 3 vectors in virtio-net
Request up to 3 vectors in virtio-net. Actual bindings might supply less. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio-net.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/hw/virtio-net.c b/hw/virtio-net.c index 60aa6da..6118fe3 100644 --- a/hw/virtio-net.c +++ b/hw/virtio-net.c @@ -621,6 +621,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev) n-mac_table.macs = qemu_mallocz(MAC_TABLE_ENTRIES * ETH_ALEN); n-vlans = qemu_mallocz(MAX_VLAN 3); +n-vdev.nvectors = 3; register_savevm(virtio-net, virtio_net_id++, VIRTIO_NET_VM_VERSION, virtio_net_save, virtio_net_load, n); -- 1.6.3.1.56.g79e1.dirty -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 12/13] qemu: virtio save/load bindings
Implement bindings for virtio save/load. Use them in virtio pci. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio-pci.c | 50 +- hw/virtio.c | 33 - hw/virtio.h |4 3 files changed, 69 insertions(+), 18 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 294f4c7..5a7bdcc 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -105,6 +105,50 @@ static void virtio_pci_notify(void *opaque, uint16_t vector) qemu_set_irq(proxy-pci_dev.irq[0], proxy-vdev-isr 1); } +static void virtio_pci_save_config(void * opaque, QEMUFile *f) +{ +VirtIOPCIProxy *proxy = opaque; +pci_device_save(proxy-pci_dev, f); +msix_save(proxy-pci_dev, f); +if (msix_present(proxy-pci_dev)) +qemu_put_be16(f, proxy-vdev-config_vector); +} + +static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f) +{ +VirtIOPCIProxy *proxy = opaque; +if (msix_present(proxy-pci_dev)) +qemu_put_be16(f, virtio_queue_vector(proxy-vdev, n)); +} + +static int virtio_pci_load_config(void * opaque, QEMUFile *f) +{ +VirtIOPCIProxy *proxy = opaque; +int ret; +ret = pci_device_load(proxy-pci_dev, f); +if (ret) +return ret; +ret = msix_load(proxy-pci_dev, f); +if (ret) +return ret; +if (msix_present(proxy-pci_dev)) +qemu_get_be16s(f, proxy-vdev-config_vector); + +pci_resize_io_region(proxy-pci_dev, 1, msix_bar_size(proxy-pci_dev)); +return 0; +} + +static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f) +{ +VirtIOPCIProxy *proxy = opaque; +uint16_t vector; +if (!msix_present(proxy-pci_dev)) +return 0; +qemu_get_be16s(f, vector); +virtio_queue_set_vector(proxy-vdev, n, vector); +return 0; +} + static void virtio_pci_reset(void *opaque) { VirtIOPCIProxy *proxy = opaque; @@ -317,7 +361,11 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address, } static const VirtIOBindings virtio_pci_bindings = { -.notify = virtio_pci_notify +.notify = virtio_pci_notify, +.save_config = virtio_pci_save_config, +.load_config = virtio_pci_load_config, +.save_queue = virtio_pci_save_queue, +.load_queue = virtio_pci_load_queue, }; static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, diff --git a/hw/virtio.c b/hw/virtio.c index fe9f793..b773dff 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -568,9 +568,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f) { int i; -/* FIXME: load/save binding. */ -//pci_device_save(vdev-pci_dev, f); -//msix_save(vdev-pci_dev, f); +if (vdev-binding-save_config) +vdev-binding-save_config(vdev-binding_opaque, f); qemu_put_8s(f, vdev-status); qemu_put_8s(f, vdev-isr); @@ -596,18 +595,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f) qemu_put_be32(f, vdev-vq[i].vring.num); qemu_put_be64(f, vdev-vq[i].pa); qemu_put_be16s(f, vdev-vq[i].last_avail_idx); -if (vdev-nvectors) -qemu_put_be16s(f, vdev-vq[i].vector); +if (vdev-binding-save_queue) +vdev-binding-save_queue(vdev-binding_opaque, i, f); } } -void virtio_load(VirtIODevice *vdev, QEMUFile *f) +int virtio_load(VirtIODevice *vdev, QEMUFile *f) { -int num, i; +int num, i, ret; -/* FIXME: load/save binding. */ -//pci_device_load(vdev-pci_dev, f); -//r = msix_load(vdev-pci_dev, f); +if (vdev-binding-load_config) { +ret = vdev-binding-load_config(vdev-binding_opaque, f); +if (ret) +return ret; +} qemu_get_8s(f, vdev-status); qemu_get_8s(f, vdev-isr); @@ -616,10 +617,6 @@ void virtio_load(VirtIODevice *vdev, QEMUFile *f) vdev-config_len = qemu_get_be32(f); qemu_get_buffer(f, vdev-config, vdev-config_len); -if (vdev-nvectors) { -qemu_get_be16s(f, vdev-config_vector); -//msix_vector_use(vdev-pci_dev, vdev-config_vector); -} num = qemu_get_be32(f); for (i = 0; i num; i++) { @@ -630,13 +627,15 @@ void virtio_load(VirtIODevice *vdev, QEMUFile *f) if (vdev-vq[i].pa) { virtqueue_init(vdev-vq[i]); } -if (vdev-nvectors) { -qemu_get_be16s(f, vdev-vq[i].vector); -//msix_vector_use(vdev-pci_dev, vdev-config_vector); +if (vdev-binding-load_queue) { +ret = vdev-binding-load_queue(vdev-binding_opaque, i, f); +if (ret) +return ret; } } virtio_notify_vector(vdev, VIRTIO_NO_VECTOR); +return 0; } void virtio_cleanup(VirtIODevice *vdev) diff --git a/hw/virtio.h b/hw/virtio.h index 04a3c3d..ce05517 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -72,6 +72,10 @@ typedef struct VirtQueueElement typedef struct { void (*notify)(void * opaque, uint16_t vector); +void (*save_config)(void *
[PATCHv4 13/13] qemu: add pci_get/set_byte
Add pci_get/set_byte to keep *_word and *_long access functions company. They are unused for now. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.h | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/hw/pci.h b/hw/pci.h index 9455e4c..07404c5 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -259,6 +259,18 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did, pci_map_irq_fn map_irq, const char *name); static inline void +pci_set_byte(uint8_t *config, uint8_t val) +{ +*config = val; +} + +static inline uint8_t +pci_get_byte(uint8_t *config) +{ +return *config; +} + +static inline void pci_set_word(uint8_t *config, uint16_t val) { cpu_to_le16wu((uint16_t *)config, val); -- 1.6.3.1.56.g79e1.dirty -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] pte accessor fixes
On Wed, Jun 10, 2009 at 02:28:19PM +0300, Avi Kivity wrote: A minor fix to the pte accessors, followed by a cosmetic fix so we don't hit the same problem next time, followed by a cosmetic patch just because I had my search-and-replace warmed up. Please review. Looks good. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] pte accessor fixes
Marcelo Tosatti wrote: On Wed, Jun 10, 2009 at 02:28:19PM +0300, Avi Kivity wrote: A minor fix to the pte accessors, followed by a cosmetic fix so we don't hit the same problem next time, followed by a cosmetic patch just because I had my search-and-replace warmed up. Please review. Looks good. Thanks, merged. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper
On Wed, Jun 10, 2009 at 03:23:47PM +0300, Avi Kivity wrote: Marcelo Tosatti wrote: Isn't it simpler to invoke for_each_shadow_entry(), instead of defining a callback and calling it? We had those callbacks once, then switched to for_each. The point is its exported to use in a external module (kvm-intel.ko), so you hide the details (such as locking) in the kvm_mmu_shadow_walk helper. Let me know how do you prefer this to be. Ah, you're right. I don't think it's worthwhile to add all this just for debugging. You can add a function that dumps the spte chain as well as the features MSR, and we can decode it by hand when we see it. Disadvantage is more work for us when we hit the bug, but at least that function is reusable in other contexts. The problem is if someone hits the bug who has no possibility of reporting the printks. Nice thing about the WARN_ON's there is you can look up kerneloops.org, match the line number to the kernel version, and narrow down what bits are wrong (you still need reporter to send contents of dmesg for full spte value). Also the bit-by-bit validity checks (the inspect_spte function in vmx.c) can be used in the mmu audit code (thats the reason for print=0 in the callback parameters), so it is reusable in other contexes. What you dislike is hardcoding the bits checked in C code? Don't worry about the level stuff, will be handled next. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network throughput limits for local VM - VM communication
On Tuesday 09 June 2009, Fischer, Anna wrote: I have tried using virtio and using the emulated QEMU virtual NICs. It does not make a difference. It seems as if there is an overflow somewhere when QEMU/virtio cannot cope with the network load any more, and then the virtual interfaces don't seem to transmit anything anymore. It seems to mostly work again when I shut down and start up the interfaces of the router inside of the guest. I use two bridges (and VLANs) that pass packets between sending/receiving guests and the routing guest. The set-up works fine for simple ping and other communication that is low-throughput type traffic. Have you tried eliminating VLAN to simplify the setup? Does it change when the guests communicate over a -net socket interface with your router instead of the -net tap + bridge in the host? Arnd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Network throughput limits for local VM - VM communication
Subject: Re: Network throughput limits for local VM - VM communication On Tuesday 09 June 2009, Fischer, Anna wrote: I have tried using virtio and using the emulated QEMU virtual NICs. It does not make a difference. It seems as if there is an overflow somewhere when QEMU/virtio cannot cope with the network load any more, and then the virtual interfaces don't seem to transmit anything anymore. It seems to mostly work again when I shut down and start up the interfaces of the router inside of the guest. I use two bridges (and VLANs) that pass packets between sending/receiving guests and the routing guest. The set-up works fine for simple ping and other communication that is low-throughput type traffic. Have you tried eliminating VLAN to simplify the setup? No - but there is a relating bug in the tun/tap interface (well, it is not really a bug but simply the way tun/tap works) that will cause packets to be replicated on all the tap interfaces (across all bridges attached to those) if I do not configure VLANs. This will result in a system that is even more overloaded. I had discovered this a while back when running UDP stress tests under 10G. Does it change when the guests communicate over a -net socket interface with your router instead of the -net tap + bridge in the host? I have not tried this - I need the bridge in the network data path for some testing, so using the -net socket interface would not solve my problem. However, I have just today managed to get around this bug by using the e1000 QEMU emulated NIC model and this seems to do the trick. Now the throughput is still very low, but that might simply be because my system is too weak. When using the e1000 model instead of rtl8139 or virtio, I do not have any network crashes any more. I have been looking through the RedHat bug lists to search for some hints today, and it seems as if there are a lot of people seeing the network under KVM break down under heavy load. I think that this is something that needs some further investigation. I can provide more detailed system set-up etc, it should be easy to reproduce this. Thanks for your help, Anna -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
Note that platform must set a flag to declare MSI supported. For PC this will be set by APIC. This sounds wrong. The device shouldn't know or care whether the system has a MSI capable interrupt controller. That's for the guest OS to figure out. You are right of course. In theory there's nothing that breaks if I set this flag to on, on all platforms. OTOH if qemu emulates some controller incorrectly, guest might misdetect MSI support in the controller, and things will break horribly. It seems safer to have a flag that can be enabled by people that know about a specific platform. No. The solution is to fix whatever is broken. If we really need to avoid MSI-X capable devices then that should be done explicity per-device. i.e. you have a different virtio-net device that does not use MSI-X. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 1/4] Make all programs on kvm test use /usr/bin/python
On Wed, Jun 10, 2009 at 4:01 AM, Alexey Eromenkoaerom...@redhat.com wrote: Even better would be to use /usr/bin/python2. That doesn't seem to exist, on Ubuntu at least. This is because future distros will include python3, which is incompatible with python2 code. python will be symlink of python3. -Alexey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
On Wed, Jun 10, 2009 at 03:07:34PM +0100, Paul Brook wrote: Note that platform must set a flag to declare MSI supported. For PC this will be set by APIC. This sounds wrong. The device shouldn't know or care whether the system has a MSI capable interrupt controller. That's for the guest OS to figure out. You are right of course. In theory there's nothing that breaks if I set this flag to on, on all platforms. OTOH if qemu emulates some controller incorrectly, guest might misdetect MSI support in the controller, and things will break horribly. It seems safer to have a flag that can be enabled by people that know about a specific platform. No. The solution is to fix whatever is broken. That's easy enough then. Patch below. If we really need to avoid MSI-X capable devices then that should be done explicity per-device. i.e. you have a different virtio-net device that does not use MSI-X. Paul Why should it be done per-device? --- Don't add an option for platforms to disable MSI-X in all devices. Paul Brook will find and fix all platforms that have broken MSI-X emulation. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- diff --git a/hw/apic.c b/hw/apic.c index ed03a36..9d44061 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -945,7 +945,6 @@ int apic_init(CPUState *env) s-cpu_env = env; apic_reset(s); -msix_supported = 1; /* XXX: mapping more APICs at the same memory location */ if (apic_io_memory == 0) { diff --git a/hw/msix.c b/hw/msix.c index ce4e6ba..16efb27 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -62,9 +62,6 @@ /* Flag to globally disable MSI-X support */ int msix_disable; -/* Flag for interrupt controller to declare MSI-X support */ -int msix_supported; - /* Add MSI-X capability to the config space for the device. */ /* Given a bar and its size, add MSI-X table on top of it * and fill MSI-X capability in the config space. @@ -232,10 +229,7 @@ void msix_mmio_map(PCIDevice *d, int region_num, int msix_init(struct PCIDevice *dev, unsigned short nentries, unsigned bar_nr, unsigned bar_size) { -int ret = -ENOMEM; -/* Nothing to do if MSI is not supported by interrupt controller */ -if (!msix_supported) -return -ENOTTY; +int ret; if (nentries MSIX_MAX_ENTRIES) return -EINVAL; diff --git a/hw/msix.h b/hw/msix.h index 79e84a3..2fcadd3 100644 --- a/hw/msix.h +++ b/hw/msix.h @@ -30,6 +30,5 @@ void msix_notify(PCIDevice *dev, unsigned vector); void msix_reset(PCIDevice *dev); extern int msix_disable; -extern int msix_supported; #endif -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
If we really need to avoid MSI-X capable devices then that should be done explicity per-device. i.e. you have a different virtio-net device that does not use MSI-X. Paul Why should it be done per-device? Because otherwise you end up with the horrible hacks that you're currently tripping over: devices have to magically morph into a different device when you load a VM. That's seems just plain wrong to me. Loading a VM shouldn't not do anything that can't happen during normal operation. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TODO list for qemu+KVM networking performance v2
On Wed, 10 Jun 2009 03:56:31 pm Dor Laor wrote: Rusty Russell wrote: The current theoretical hole is that the host suppresses notifications using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of notifications in before it gets to that suppression. You can use a counter to improve this: you only notify when they're equal, and inc when you notify. That way you suppress further notifications even if the other side takes ages to wake up. In practice, this shouldn't be played with until we have full aio (or equiv in kernel) for other side: host xmit tends to be too fast at the moment and we get a notification per packet anyway. Xen ring has the exact optimization for ages. imho we should have it too, regardless of aio. It reduces #vmexits/spurious wakeups and it is very simple to implement. But look at number of wakeups received vs notifications sent: I just don't see any benefit there at the moment. As I said, improving the host code might change that significantly. And implementing it the other way is v. v. hard given the nature of interrupts (shared and coalesced). Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
On Wed, Jun 10, 2009 at 03:39:05PM +0100, Paul Brook wrote: If we really need to avoid MSI-X capable devices then that should be done explicity per-device. i.e. you have a different virtio-net device that does not use MSI-X. Paul Why should it be done per-device? Because otherwise you end up with the horrible hacks that you're currently tripping over: devices have to magically morph into a different device when you load a VM. No, the hacks are there so I that I can support loading and saving from non-MSI setups in a backward-compatible way. The flag we are discussing is set at qemu startup and can't change across load/store. That's seems just plain wrong to me. Loading a VM shouldn't not do anything that can't happen during normal operation. At least wrt pci, we are very far from this state: load just overwrites all registers, readonly or not, which can never happen during normal operation. And if we fix it, and only edit BAR registers, the happy result will be that we can't add functionality to PCI devices without breaking guests and/or breaking loading from old images each time. Paul -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities
On Wed, Jun 10, 2009 at 12:54:15PM +0300, Michael S. Tsirkin wrote: On Tue, Jun 09, 2009 at 02:11:14PM -0300, Glauber Costa wrote: On Fri, Jun 05, 2009 at 01:23:15PM +0300, Michael S. Tsirkin wrote: Add routines to manage PCI capability list. First user will be MSI-X. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 98 -- hw/pci.h | 18 +++- 2 files changed, 106 insertions(+), 10 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 361d741..ed011b5 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f) int version = s-cap_present ? 3 : 2; int i; -qemu_put_be32(f, version); /* PCI device version */ +/* PCI device version and capabilities */ +qemu_put_be32(f, version); +if (version = 3) +qemu_put_be32(f, s-cap_present); qemu_put_buffer(f, s-config, 256); for (i = 0; i 4; i++) qemu_put_be32(f, s-irq_state[i]); -if (version = 3) -qemu_put_be32(f, s-cap_present); } What is it doing here? You should just do it right in the first patch, instead of doing in one way there, and fixing here. int pci_device_load(PCIDevice *s, QEMUFile *f) @@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) version_id = qemu_get_be32(f); if (version_id 3) return -EINVAL; -qemu_get_buffer(f, s-config, 256); -pci_update_mappings(s); - -if (version_id = 2) -for (i = 0; i 4; i ++) -s-irq_state[i] = qemu_get_be32(f); if (version_id = 3) s-cap_present = qemu_get_be32(f); else ditto. @@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) if (s-cap_present ~s-cap_supported) return -EINVAL; +qemu_get_buffer(f, s-config, 256); +pci_update_mappings(s); + +if (version_id = 2) +for (i = 0; i 4; i ++) +s-irq_state[i] = qemu_get_be32(f); +/* Clear wmask and used bits for capabilities. + Must be restored separately, since capabilities can + be placed anywhere in config space. */ +memset(s-used, 0, PCI_CONFIG_SPACE_SIZE); +for (i = PCI_CONFIG_HEADER_SIZE; i PCI_CONFIG_SPACE_SIZE; ++i) +s-wmask[i] = 0xff; return 0; } Sorry, I don't exactly understand it. Although it can be anywhere, what do we actually lose by keeping it at the same place in config space? We lose the ability to let user control the capabilities exposed by the device. And generally, I dislike arbitrary limitations. The PCI spec says the capability can be anywhere, implementing a linked list of caps is simple enough to not invent abritrary restrictions. yes, but this is migration time, right? caps can be anywhere, but we don't expect it to change during machine execution lifetime. Or I am just confused by the name pci_device_load ? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TODO list for qemu+KVM networking performance v2
On Thu, Jun 11, 2009 at 12:09:33AM +0930, Rusty Russell wrote: On Wed, 10 Jun 2009 03:56:31 pm Dor Laor wrote: Rusty Russell wrote: The current theoretical hole is that the host suppresses notifications using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of notifications in before it gets to that suppression. You can use a counter to improve this: you only notify when they're equal, and inc when you notify. That way you suppress further notifications even if the other side takes ages to wake up. In practice, this shouldn't be played with until we have full aio (or equiv in kernel) for other side: host xmit tends to be too fast at the moment and we get a notification per packet anyway. Xen ring has the exact optimization for ages. imho we should have it too, regardless of aio. It reduces #vmexits/spurious wakeups and it is very simple to implement. But look at number of wakeups received vs notifications sent: I just don't see any benefit there at the moment. As I said, improving the host code might change that significantly. And implementing it the other way is v. v. hard given the nature of interrupts (shared and coalesced). I agree it's not such a simple thing to implement race-free, so I do buy the argument that we shouldn't unless it gives a performance benefit. But I don't understand how aio will make implementing it easier - or are you merely saying that it will make it worthwhile? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] qemu-kvm BIOS move _PR to SSDT
Hi, Per Avi's request, here's a patch which moves the current _PR section from the DSDT table into a new SSDT table. The long term idea is to be able to have multiple different SSDTs for different sized systems, and in particular allow to keep the legacy one for supporting legacy operating systems which cannot handle more then 15 vcpus. Cheers, Jes Move _PR block from the DSDT to a new SSDT in the KVM BIOS. This will make it possible to plug in different SSDTs with different processor counts in the future. Signed-off-by: Jes Sorensen j...@sgi.com --- kvm/bios/Makefile | 10 ++- kvm/bios/acpi-dsdt.dsl | 115 kvm/bios/acpi-ssdt.dsl | 140 + kvm/bios/rombios32.c | 16 - 4 files changed, 162 insertions(+), 119 deletions(-) Index: qemu-kvm/kvm/bios/Makefile === --- qemu-kvm.orig/kvm/bios/Makefile +++ qemu-kvm/kvm/bios/Makefile @@ -108,13 +108,19 @@ rombios32.bin: rombios32.out rombios.h rombios32.out: rombios32start.o rombios32.o vapic.o rombios32.ld ld -o $@ -T rombios32.ld rombios32start.o vapic.o rombios32.o -rombios32.o: rombios32.c acpi-dsdt.hex +rombios32.o: rombios32.c acpi-dsdt.hex acpi-ssdt.hex $(GCC) -m32 -O2 -Wall -c -o $@ $ acpi-dsdt.hex: acpi-dsdt.dsl cpp -P $ $.i iasl -tc -p $@ $.i - sed -i -e's/^unsigned/const unsigned/' $@ + sed -i -e's/^unsigned char AmlCode/const unsigned char DSDTCode/' $@ + rm $.i + +acpi-ssdt.hex: acpi-ssdt.dsl + cpp -P $ $.i + iasl -tc -p $@ $.i + sed -i -e's/^unsigned char AmlCode/const unsigned char SSDTCode/' $@ rm $.i rombios32start.o: rombios32start.S Index: qemu-kvm/kvm/bios/acpi-dsdt.dsl === --- qemu-kvm.orig/kvm/bios/acpi-dsdt.dsl +++ qemu-kvm/kvm/bios/acpi-dsdt.dsl @@ -25,119 +25,6 @@ DefinitionBlock ( 0x1 // OEM Revision ) { - Scope (\_PR) - { - /* pointer to first element of MADT APIC structures */ - OperationRegion(ATPR, SystemMemory, 0x0514, 4) - Field (ATPR, DwordAcc, NoLock, Preserve) - { - ATP, 32 - } - -#define madt_addr(nr) Add (ATP, Multiply(nr, 8)) - -#define gen_processor(nr, name) \ - Processor (CPU##name, nr, 0xb010, 0x06) { \ - OperationRegion (MATR, SystemMemory, madt_addr(nr), 8) \ - Field (MATR, ByteAcc, NoLock, Preserve) \ - { \ - MAT, 64 \ - } \ - Field (MATR, ByteAcc, NoLock, Preserve) \ - { \ - Offset(4), \ - FLG, 1 \ - } \ -Method(_MAT, 0) { \ - Return(MAT) \ -} \ -Method (_STA) { \ -If (FLG) { Return(0xF) } Else { Return(0x9) } \ -} \ -} \ - - - gen_processor(0, 0) - gen_processor(1, 1) - gen_processor(2, 2) - gen_processor(3, 3) - gen_processor(4, 4) - gen_processor(5, 5) - gen_processor(6, 6) - gen_processor(7, 7) - gen_processor(8, 8) - gen_processor(9, 9) - gen_processor(10, A) - gen_processor(11, B) - gen_processor(12, C) - gen_processor(13, D) - gen_processor(14, E) - - Method (NTFY, 2) { -#define gen_ntfy(nr)\ - If (LEqual(Arg0, 0x##nr)) { \ - If (LNotEqual(Arg1, \_PR.CPU##nr.FLG)) {\ - Store (Arg1, \_PR.CPU##nr.FLG) \ - If (LEqual(Arg1, 1)) { \ -Notify(CPU##nr, 1) \ - } Else {\ -Notify(CPU##nr, 3) \ - } \ - } \ - } - gen_ntfy(0) - gen_ntfy(1) - gen_ntfy(2) - gen_ntfy(3) - gen_ntfy(4) - gen_ntfy(5) - gen_ntfy(6) - gen_ntfy(7) - gen_ntfy(8) - gen_ntfy(9) - gen_ntfy(A) - gen_ntfy(B) - gen_ntfy(C) - gen_ntfy(D) - gen_ntfy(E) - Return(One) - } - - OperationRegion(PRST, SystemIO, 0xaf00, 32) - Field (PRST, ByteAcc, NoLock, Preserve) - { - PRS, 256 - } - - Method(PRSC, 0) { - Store(PRS, Local3) - Store(Zero, Local0) - While(LLess(Local0, 32)) { - Store(Zero, Local1) - Store(DerefOf(Index(Local3, Local0)), Local2) - While(LLess(Local1, 8)) { -
Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities
On Wed, Jun 10, 2009 at 11:55:40AM -0300, Glauber Costa wrote: On Wed, Jun 10, 2009 at 12:54:15PM +0300, Michael S. Tsirkin wrote: On Tue, Jun 09, 2009 at 02:11:14PM -0300, Glauber Costa wrote: On Fri, Jun 05, 2009 at 01:23:15PM +0300, Michael S. Tsirkin wrote: Add routines to manage PCI capability list. First user will be MSI-X. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci.c | 98 -- hw/pci.h | 18 +++- 2 files changed, 106 insertions(+), 10 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 361d741..ed011b5 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f) int version = s-cap_present ? 3 : 2; int i; -qemu_put_be32(f, version); /* PCI device version */ +/* PCI device version and capabilities */ +qemu_put_be32(f, version); +if (version = 3) +qemu_put_be32(f, s-cap_present); qemu_put_buffer(f, s-config, 256); for (i = 0; i 4; i++) qemu_put_be32(f, s-irq_state[i]); -if (version = 3) -qemu_put_be32(f, s-cap_present); } What is it doing here? You should just do it right in the first patch, instead of doing in one way there, and fixing here. int pci_device_load(PCIDevice *s, QEMUFile *f) @@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) version_id = qemu_get_be32(f); if (version_id 3) return -EINVAL; -qemu_get_buffer(f, s-config, 256); -pci_update_mappings(s); - -if (version_id = 2) -for (i = 0; i 4; i ++) -s-irq_state[i] = qemu_get_be32(f); if (version_id = 3) s-cap_present = qemu_get_be32(f); else ditto. @@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f) if (s-cap_present ~s-cap_supported) return -EINVAL; +qemu_get_buffer(f, s-config, 256); +pci_update_mappings(s); + +if (version_id = 2) +for (i = 0; i 4; i ++) +s-irq_state[i] = qemu_get_be32(f); +/* Clear wmask and used bits for capabilities. + Must be restored separately, since capabilities can + be placed anywhere in config space. */ +memset(s-used, 0, PCI_CONFIG_SPACE_SIZE); +for (i = PCI_CONFIG_HEADER_SIZE; i PCI_CONFIG_SPACE_SIZE; ++i) +s-wmask[i] = 0xff; return 0; } Sorry, I don't exactly understand it. Although it can be anywhere, what do we actually lose by keeping it at the same place in config space? We lose the ability to let user control the capabilities exposed by the device. And generally, I dislike arbitrary limitations. The PCI spec says the capability can be anywhere, implementing a linked list of caps is simple enough to not invent abritrary restrictions. yes, but this is migration time, right? I think so, yes. caps can be anywhere, but we don't expect it to change during machine execution lifetime. Or I am just confused by the name pci_device_load ? Right. So I want to load an image and it has capability X at offset Y. wmask has to match. I don't want to assume that we never change Y for the device without breaking old images, so I clear wmask here and set it up again after looking up capabilities that I loaded. Maybe this explanation should go into the comment above? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH] Make all programs on kvm test use /usr/bin/python - take 2
Looks good. On Tue, Jun 9, 2009 at 5:57 PM, Lucas Meneghel Rodriguesl...@redhat.com wrote: All kvm modules that can be used as stand alone programs were updated to use #!/usr/bin/python instead of #!/usr/bin/env python, complying with the rest of the autotest code base. As suggested by Martin, common.py was added. With this, the stand alone programs will be able to use the autotest library namespace and choose the best python interpreter available in the system. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/common.py | 8 client/tests/kvm/fix_cdkeys.py | 3 ++- client/tests/kvm/kvm_config.py | 4 +++- client/tests/kvm/make_html_report.py | 5 +++-- client/tests/kvm/stepeditor.py | 4 ++-- client/tests/kvm/stepmaker.py | 4 +++- 6 files changed, 21 insertions(+), 7 deletions(-) create mode 100644 client/tests/kvm/common.py diff --git a/client/tests/kvm/common.py b/client/tests/kvm/common.py new file mode 100644 index 000..ce78b85 --- /dev/null +++ b/client/tests/kvm/common.py @@ -0,0 +1,8 @@ +import os, sys +dirname = os.path.dirname(sys.modules[__name__].__file__) +client_dir = os.path.abspath(os.path.join(dirname, .., ..)) +sys.path.insert(0, client_dir) +import setup_modules +sys.path.pop(0) +setup_modules.setup(base_path=client_dir, + root_module_name=autotest_lib.client) diff --git a/client/tests/kvm/fix_cdkeys.py b/client/tests/kvm/fix_cdkeys.py index 4f7a824..7a821fa 100755 --- a/client/tests/kvm/fix_cdkeys.py +++ b/client/tests/kvm/fix_cdkeys.py @@ -1,5 +1,6 @@ -#!/usr/bin/env python +#!/usr/bin/python import shutil, os, sys +import common Program that replaces the CD keys present on a KVM autotest configuration file. diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py index 40f16f1..13fdac2 100755 --- a/client/tests/kvm/kvm_config.py +++ b/client/tests/kvm/kvm_config.py @@ -1,4 +1,6 @@ +#!/usr/bin/python import re, os, sys, StringIO +import common from autotest_lib.client.common_lib import error @@ -356,7 +358,7 @@ class config: # (inside an exception or inside subvariants) if restricted: e_msg = Using variants in this context is not allowed - raise error.AutotestError() + raise error.AutotestError(e_msg) if self.debug and not restricted: self.__debug_print(indented_line, Entering variants block (%d dicts in diff --git a/client/tests/kvm/make_html_report.py b/client/tests/kvm/make_html_report.py index 6aed39e..e69367b 100755 --- a/client/tests/kvm/make_html_report.py +++ b/client/tests/kvm/make_html_report.py @@ -1,4 +1,7 @@ #!/usr/bin/python +import os, sys, re, getopt, time, datetime, commands +import common + Script used to parse the test results and generate an HTML report. @@ -7,8 +10,6 @@ Script used to parse the test results and generate an HTML report. �...@author: Dror Russo (dru...@redhat.com) -import os, sys, re, getopt, time, datetime, commands - format_css= html,body { diff --git a/client/tests/kvm/stepeditor.py b/client/tests/kvm/stepeditor.py index 9669200..e7794ac 100755 --- a/client/tests/kvm/stepeditor.py +++ b/client/tests/kvm/stepeditor.py @@ -1,6 +1,6 @@ -#!/usr/bin/env python +#!/usr/bin/python import pygtk, gtk, os, glob, shutil, sys, logging -import ppm_utils +import common, ppm_utils pygtk.require('2.0') diff --git a/client/tests/kvm/stepmaker.py b/client/tests/kvm/stepmaker.py index 2b7fd54..8f16ffd 100644 --- a/client/tests/kvm/stepmaker.py +++ b/client/tests/kvm/stepmaker.py @@ -1,8 +1,10 @@ -#!/usr/bin/env python +#!/usr/bin/python import pygtk, gtk, gobject, time, os, commands +import common from autotest_lib.client.common_lib import error import kvm_utils, logging, ppm_utils, stepeditor pygtk.require('2.0') + Step file creator/editor. -- 1.6.2.2 ___ Autotest mailing list autot...@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] qemu-kvm BIOS move _PR to SSDT
On Wed, Jun 10, 2009 at 05:01:18PM +0200, Jes Sorensen wrote: Index: qemu-kvm/kvm/bios/acpi-ssdt.dsl === --- /dev/null +++ qemu-kvm/kvm/bios/acpi-ssdt.dsl @@ -0,0 +1,140 @@ +/* + * Bochs/QEMU ACPI SSDT ASL definition + * + * Copyright (c) 2006 Fabrice Bellard I am not sure that there is even one bit of code below written by Fabrice. + * Copyright (c) 2009 SGI, Jes Sorensen j...@sgi.com + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License version 2 as published by the Free Software Foundation. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ +DefinitionBlock ( +acpi-ssdt.aml,// Output Filename +SSDT, // Signature +0x01, // DSDT Compliance Revision +BXPC, // OEMID +BXSSDT, // TABLE ID +0x1 // OEM Revision +) +{ + Scope (\_PR) + { + /* pointer to first element of MADT APIC structures */ + OperationRegion(ATPR, SystemMemory, 0x0514, 4) + Field (ATPR, DwordAcc, NoLock, Preserve) + { + ATP, 32 + } + +#define madt_addr(nr) Add (ATP, Multiply(nr, 8)) + +#define gen_processor(nr, name) \ + Processor (C##name, nr, 0xb010, 0x06) { \ + OperationRegion (MATR, SystemMemory, madt_addr(nr), 8) \ + Field (MATR, ByteAcc, NoLock, Preserve) \ + { \ + MAT, 64 \ + } \ + Field (MATR, ByteAcc, NoLock, Preserve) \ + { \ + Offset(4), \ + FLG, 1 \ + } \ +Method(_MAT, 0) { \ + Return(MAT) \ +} \ +Method (_STA) { \ +If (FLG) { Return(0xF) } Else { Return(0x9) } \ +} \ +} \ + + + gen_processor(0, 0) + gen_processor(1, 1) + gen_processor(2, 2) + gen_processor(3, 3) + gen_processor(4, 4) + gen_processor(5, 5) + gen_processor(6, 6) + gen_processor(7, 7) + gen_processor(8, 8) + gen_processor(9, 9) + gen_processor(10, A) + gen_processor(11, B) + gen_processor(12, C) + gen_processor(13, D) + gen_processor(14, E) + + Method (NTFY, 2) { +#define gen_ntfy(nr)\ + If (LEqual(Arg0, 0x##nr)) { \ + If (LNotEqual(Arg1, \_PR.C##nr.FLG)) { \ + Store (Arg1, \_PR.C##nr.FLG)\ + If (LEqual(Arg1, 1)) { \ + Notify(C##nr, 1)\ + } Else {\ + Notify(C##nr, 3)\ + } \ + } \ + } + gen_ntfy(0) + gen_ntfy(1) + gen_ntfy(2) + gen_ntfy(3) + gen_ntfy(4) + gen_ntfy(5) + gen_ntfy(6) + gen_ntfy(7) + gen_ntfy(8) + gen_ntfy(9) + gen_ntfy(A) + gen_ntfy(B) + gen_ntfy(C) + gen_ntfy(D) + gen_ntfy(E) + Return(One) + } + + OperationRegion(PRST, SystemIO, 0xaf00, 32) + Field (PRST, ByteAcc, NoLock, Preserve) + { + PRS, 256 + } + + Method(PRSC, 0) { + Store(PRS, Local3) + Store(Zero, Local0) + While(LLess(Local0,
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
That's seems just plain wrong to me. Loading a VM shouldn't not do anything that can't happen during normal operation. At least wrt pci, we are very far from this state: load just overwrites all registers, readonly or not, which can never happen during normal operation. IMO that code is wrong. We should only be loading things that the guest can change (directly or indirectly). Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS-devel] Latest kvm packages for CentOS 5.3
On Wed, 10 Jun 2009, Federico Simoncelli wrote: I've been working quite extensively with kvm on CentOS 5.3 lately. If you are interested in the latest rpm of kvm-kmod-2.6.30-rc8, qemu-kvm-0.10.5 and libvirt-0.6.4 you can temporary find them here: http://update.nethesis.it/kvm/ I've had no problem so far using these packages. Feedback is welcome. RHEL5.4 is expected to have KVM support, so it would be nice to know in advance which version is being included with RHEL 5.4. Then we can update our own CentOS kvm kmod for testing and reporting upstream the issue(s) we still find. Today the kvm module is still missing from the upstream 2.6.18-152.el5 kernel, just as fuse which is also expected to appear. -- -- dag wieers, d...@centos.org, http://dag.wieers.com/ -- [Any errors in spelling, tact or fact are transmission errors] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote: But I don't understand how aio will make implementing it easier - or are you merely saying that it will make it worthwhile? If you have aio, the the NIC and the guest proceed in parallel. If the guest is faster (likely), then when it sends the next packet it will see that interrupts are disabled and not notify again. Once aio complete we can recheck the queue; if it's empty we reenable notifications. If there's still stuff in it we submit it with notifications disabled. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/5] KVM: MMU: add kvm_mmu_shadow_walk helper
Marcelo Tosatti wrote: Ah, you're right. I don't think it's worthwhile to add all this just for debugging. You can add a function that dumps the spte chain as well as the features MSR, and we can decode it by hand when we see it. Disadvantage is more work for us when we hit the bug, but at least that function is reusable in other contexts. The problem is if someone hits the bug who has no possibility of reporting the printks. Nice thing about the WARN_ON's there is you can look up kerneloops.org, match the line number to the kernel version, and narrow down what bits are wrong (you still need reporter to send contents of dmesg for full spte value). Well, we can KERN_WARNING instead of KERN_DEBUG in that printout. Also the bit-by-bit validity checks (the inspect_spte function in vmx.c) can be used in the mmu audit code (thats the reason for print=0 in the callback parameters), so it is reusable in other contexes. What you dislike is hardcoding the bits checked in C code? Don't worry about the level stuff, will be handled next. I just don't want to introduce yet another function-callback pair for such a small thing. No objection to the code itself. Printing out the spte hierarchy seems a good idea in other bug contexts, so at least that function is reusable. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] qemu-kvm BIOS move _PR to SSDT
On 06/10/2009 05:13 PM, Gleb Natapov wrote: + * Bochs/QEMU ACPI SSDT ASL definition + * + * Copyright (c) 2006 Fabrice Bellard I am not sure that there is even one bit of code below written by Fabrice. Well better safe than sorry, since I copied over a bunch of stuff, I figured it was simpler to just leave it intact. +Scope (\) +{ +/* Debug Output */ +OperationRegion (DBG, SystemIO, 0xb044, 0x04) +Field (DBG, DWordAcc, NoLock, Preserve) +{ +DBGL, 32, +} +} Except may be this one, but it should stay in DSDT. Hmmm, good point. I think I just moved it when I had zero idea what was going on at first in this AML hell :-) Try this one. Cheers, Jes Move _PR block from the DSDT to a new SSDT in the KVM BIOS. This will make it possible to plug in different SSDTs with different processor counts in the future. Signed-off-by: Jes Sorensen j...@sgi.com --- kvm/bios/Makefile | 10 +++ kvm/bios/acpi-dsdt.dsl | 105 --- kvm/bios/acpi-ssdt.dsl | 130 + kvm/bios/rombios32.c | 16 -- 4 files changed, 152 insertions(+), 109 deletions(-) Index: qemu-kvm/kvm/bios/Makefile === --- qemu-kvm.orig/kvm/bios/Makefile +++ qemu-kvm/kvm/bios/Makefile @@ -108,13 +108,19 @@ rombios32.bin: rombios32.out rombios.h rombios32.out: rombios32start.o rombios32.o vapic.o rombios32.ld ld -o $@ -T rombios32.ld rombios32start.o vapic.o rombios32.o -rombios32.o: rombios32.c acpi-dsdt.hex +rombios32.o: rombios32.c acpi-dsdt.hex acpi-ssdt.hex $(GCC) -m32 -O2 -Wall -c -o $@ $ acpi-dsdt.hex: acpi-dsdt.dsl cpp -P $ $.i iasl -tc -p $@ $.i - sed -i -e's/^unsigned/const unsigned/' $@ + sed -i -e's/^unsigned char AmlCode/const unsigned char DSDTCode/' $@ + rm $.i + +acpi-ssdt.hex: acpi-ssdt.dsl + cpp -P $ $.i + iasl -tc -p $@ $.i + sed -i -e's/^unsigned char AmlCode/const unsigned char SSDTCode/' $@ rm $.i rombios32start.o: rombios32start.S Index: qemu-kvm/kvm/bios/acpi-dsdt.dsl === --- qemu-kvm.orig/kvm/bios/acpi-dsdt.dsl +++ qemu-kvm/kvm/bios/acpi-dsdt.dsl @@ -25,108 +25,6 @@ DefinitionBlock ( 0x1 // OEM Revision ) { - Scope (\_PR) - { - /* pointer to first element of MADT APIC structures */ - OperationRegion(ATPR, SystemMemory, 0x0514, 4) - Field (ATPR, DwordAcc, NoLock, Preserve) - { - ATP, 32 - } - -#define madt_addr(nr) Add (ATP, Multiply(nr, 8)) - -#define gen_processor(nr, name) \ - Processor (CPU##name, nr, 0xb010, 0x06) { \ - OperationRegion (MATR, SystemMemory, madt_addr(nr), 8) \ - Field (MATR, ByteAcc, NoLock, Preserve) \ - { \ - MAT, 64 \ - } \ - Field (MATR, ByteAcc, NoLock, Preserve) \ - { \ - Offset(4), \ - FLG, 1 \ - } \ -Method(_MAT, 0) { \ - Return(MAT) \ -} \ -Method (_STA) { \ -If (FLG) { Return(0xF) } Else { Return(0x9) } \ -} \ -} \ - - - gen_processor(0, 0) - gen_processor(1, 1) - gen_processor(2, 2) - gen_processor(3, 3) - gen_processor(4, 4) - gen_processor(5, 5) - gen_processor(6, 6) - gen_processor(7, 7) - gen_processor(8, 8) - gen_processor(9, 9) - gen_processor(10, A) - gen_processor(11, B) - gen_processor(12, C) - gen_processor(13, D) - gen_processor(14, E) - - Method (NTFY, 2) { -#define gen_ntfy(nr)\ - If (LEqual(Arg0, 0x##nr)) { \ - If (LNotEqual(Arg1, \_PR.CPU##nr.FLG)) {\ - Store (Arg1, \_PR.CPU##nr.FLG) \ - If (LEqual(Arg1, 1)) { \ -Notify(CPU##nr, 1) \ - } Else {\ -Notify(CPU##nr, 3) \ - } \ - } \ - } - gen_ntfy(0) - gen_ntfy(1) - gen_ntfy(2) - gen_ntfy(3) - gen_ntfy(4) - gen_ntfy(5) - gen_ntfy(6) - gen_ntfy(7) - gen_ntfy(8) - gen_ntfy(9) - gen_ntfy(A) - gen_ntfy(B) -
[patch 5/6] KVM: MMU audit: audit_mappings tweaks
- Fail early in case gfn_to_pfn returns is_error_pfn. - For the pre pte write case, avoid spurious gva is valid but spte is notrap messages (the emulation code does the guest write first, so this particular case is OK). Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -3092,6 +3092,11 @@ static void audit_mappings_page(struct k pfn_t pfn = gfn_to_pfn(vcpu-kvm, gfn); hpa_t hpa = (hpa_t)pfn PAGE_SHIFT; + if (is_error_pfn(pfn)) { + kvm_release_pfn_clean(pfn); + continue; + } + if (is_shadow_present_pte(ent) (ent PT64_BASE_ADDR_MASK) != hpa) printk(KERN_ERR xx audit error: (%s) levels %d @@ -3263,7 +3268,8 @@ static void kvm_mmu_audit(struct kvm_vcp audit_msg = msg; audit_rmap(vcpu); audit_write_protection(vcpu); - audit_mappings(vcpu); + if (strcmp(pre pte write, audit_msg) != 0) + audit_mappings(vcpu); audit_writable_sptes_have_rmaps(vcpu); dbg = olddbg; } -- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 3/6] KVM: MMU audit: update audit_write_protection
- Unsync pages contain writable sptes in the rmap. - rmaps do not exclusively contain writable sptes anymore. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -3235,20 +3235,28 @@ static void audit_write_protection(struc struct kvm_mmu_page *sp; struct kvm_memory_slot *slot; unsigned long *rmapp; + u64 *spte; gfn_t gfn; list_for_each_entry(sp, vcpu-kvm-arch.active_mmu_pages, link) { if (sp-role.direct) continue; + if (sp-unsync) + continue; gfn = unalias_gfn(vcpu-kvm, sp-gfn); slot = gfn_to_memslot_unaliased(vcpu-kvm, sp-gfn); rmapp = slot-rmap[gfn - slot-base_gfn]; - if (*rmapp) - printk(KERN_ERR %s: (%s) shadow page has writable - mappings: gfn %lx role %x\n, + + spte = rmap_next(vcpu-kvm, rmapp, NULL); + while (spte) { + if (*spte PT_WRITABLE_MASK) + printk(KERN_ERR %s: (%s) shadow page has + writable mappings: gfn %lx role %x\n, __func__, audit_msg, sp-gfn, sp-role.word); + spte = rmap_next(vcpu-kvm, rmapp, spte); + } } } -- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 0/6] mmu audit update v4
Addressing comments, introducing a new helper, handling largepages. -- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 2/6] KVM: MMU audit: update count_writable_mappings / count_rmaps
Under testing, count_writable_mappings returns a value that is 2 integers larger than what count_rmaps returns. Suspicion is that either of the two functions is counting a duplicate (either positively or negatively). Modifying check_writable_mappings_rmap to check for rmap existance on all present MMU pages fails to trigger an error, which should keep Avi happy. Also introduce mmu_spte_walk to invoke a callback on all present sptes visible to the current vcpu, might be useful in the future. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -3020,6 +3020,55 @@ static gva_t canonicalize(gva_t gva) return gva; } + +typedef void (*inspect_spte_fn) (struct kvm *kvm, struct kvm_mmu_page *sp, +u64 *sptep); + +static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp, + inspect_spte_fn fn) +{ + int i; + + for (i = 0; i PT64_ENT_PER_PAGE; ++i) { + u64 ent = sp-spt[i]; + + if (is_shadow_present_pte(ent)) { + if (sp-role.level 1 !is_large_pte(ent)) { + struct kvm_mmu_page *child; + child = page_header(ent PT64_BASE_ADDR_MASK); + __mmu_spte_walk(kvm, child, fn); + } + if (sp-role.level == 1) + fn(kvm, sp, sp-spt[i]); + } + } +} + +static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn) +{ + int i; + struct kvm_mmu_page *sp; + + if (!VALID_PAGE(vcpu-arch.mmu.root_hpa)) + return; + if (vcpu-arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { + hpa_t root = vcpu-arch.mmu.root_hpa; + sp = page_header(root); + __mmu_spte_walk(vcpu-kvm, sp, fn); + return; + } + for (i = 0; i 4; ++i) { + hpa_t root = vcpu-arch.mmu.pae_root[i]; + + if (root VALID_PAGE(root)) { + root = PT64_BASE_ADDR_MASK; + sp = page_header(root); + __mmu_spte_walk(vcpu-kvm, sp, fn); + } + } + return; +} + static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte, gva_t va, int level) { @@ -3112,9 +3161,47 @@ static int count_rmaps(struct kvm_vcpu * return nmaps; } -static int count_writable_mappings(struct kvm_vcpu *vcpu) +void inspect_spte_has_rmap(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *sptep) +{ + unsigned long *rmapp; + struct kvm_mmu_page *rev_sp; + gfn_t gfn; + + if (*sptep PT_WRITABLE_MASK) { + rev_sp = page_header(__pa(sptep)); + gfn = rev_sp-gfns[sptep - rev_sp-spt]; + + if (!gfn_to_memslot(kvm, gfn)) { + if (!printk_ratelimit()) + return; + printk(KERN_ERR %s: no memslot for gfn %ld\n, +audit_msg, gfn); + printk(KERN_ERR %s: index %ld of sp (gfn=%lx)\n, + audit_msg, sptep - rev_sp-spt, + rev_sp-gfn); + dump_stack(); + return; + } + + rmapp = gfn_to_rmap(kvm, rev_sp-gfns[sptep - rev_sp-spt], 0); + if (!*rmapp) { + if (!printk_ratelimit()) + return; + printk(KERN_ERR %s: no rmap for writable spte %llx\n, +audit_msg, *sptep); + dump_stack(); + } + } + +} + +void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu) +{ + mmu_spte_walk(vcpu, inspect_spte_has_rmap); +} + +static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu) { - int nmaps = 0; struct kvm_mmu_page *sp; int i; @@ -3131,20 +3218,16 @@ static int count_writable_mappings(struc continue; if (!(ent PT_WRITABLE_MASK)) continue; - ++nmaps; + inspect_spte_has_rmap(vcpu-kvm, sp, pt[i]); } } - return nmaps; + return; } static void audit_rmap(struct kvm_vcpu *vcpu) { - int n_rmap = count_rmaps(vcpu); - int n_actual = count_writable_mappings(vcpu); - - if (n_rmap != n_actual) - printk(KERN_ERR %s: (%s) rmap %d actual %d\n, - __func__, audit_msg, n_rmap, n_actual); + check_writable_mappings_rmap(vcpu); + count_rmaps(vcpu); }
[patch 1/6] KVM: MMU: introduce is_last_spte helper
Hiding some of the last largepage / level interaction (which is useful for gbpages and for zero based levels). Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -250,6 +250,15 @@ static int is_rmap_spte(u64 pte) return is_shadow_present_pte(pte); } +static int is_last_spte(u64 pte, int level) +{ + if (level == PT_PAGE_TABLE_LEVEL) + return 1; + if (level == PT_DIRECTORY_LEVEL is_large_pte(pte)) + return 1; + return 0; +} + static pfn_t spte_to_pfn(u64 pte) { return (pte PT64_BASE_ADDR_MASK) PAGE_SHIFT; @@ -1293,25 +1302,17 @@ static void kvm_mmu_page_unlink_children pt = sp-spt; - if (sp-role.level == PT_PAGE_TABLE_LEVEL) { - for (i = 0; i PT64_ENT_PER_PAGE; ++i) { - if (is_shadow_present_pte(pt[i])) - rmap_remove(kvm, pt[i]); - pt[i] = shadow_trap_nonpresent_pte; - } - return; - } - for (i = 0; i PT64_ENT_PER_PAGE; ++i) { ent = pt[i]; if (is_shadow_present_pte(ent)) { - if (!is_large_pte(ent)) { + if (!is_last_spte(ent, sp-role.level)) { ent = PT64_BASE_ADDR_MASK; mmu_page_remove_parent_pte(page_header(ent), pt[i]); } else { - --kvm-stat.lpages; + if (is_large_pte(ent)) + --kvm-stat.lpages; rmap_remove(kvm, pt[i]); } } @@ -2357,8 +2358,7 @@ static void mmu_pte_write_zap_pte(struct pte = *spte; if (is_shadow_present_pte(pte)) { - if (sp-role.level == PT_PAGE_TABLE_LEVEL || - is_large_pte(pte)) + if (is_last_spte(pte, sp-role.level)) rmap_remove(vcpu-kvm, spte); else { child = page_header(pte PT64_BASE_ADDR_MASK); -- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 4/6] KVM: MMU audit: nontrapping ptes in nonleaf level
It is valid to set non leaf sptes as notrap. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -3084,12 +3084,7 @@ static void audit_mappings_page(struct k va = canonicalize(va); if (level 1) { - if (ent == shadow_notrap_nonpresent_pte) - printk(KERN_ERR audit: (%s) nontrapping pte - in nonleaf level: levels %d gva %lx - level %d pte %llx\n, audit_msg, - vcpu-arch.mmu.root_level, va, level, ent); - else + if (is_shadow_present_pte(ent)) audit_mappings_page(vcpu, ent, va, level - 1); } else { gpa_t gpa = vcpu-arch.mmu.gva_to_gpa(vcpu, va); -- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities
On Wed, Jun 10, 2009 at 04:24:28PM +0100, Paul Brook wrote: caps can be anywhere, but we don't expect it to change during machine execution lifetime. Or I am just confused by the name pci_device_load ? Right. So I want to load an image and it has capability X at offset Y. wmask has to match. I don't want to assume that we never change Y for the device without breaking old images, so I clear wmask here and set it up again after looking up capabilities that I loaded. We should not be loading state into a different device (or a similar device with a different set of capabilities). If you want to provide backwards compatibility then you should do that by creating a device that is the same as the original. As I mentioned in my earlier mail, loading a snapshot should never do anything that can not be achieved through normal operation. Paul Why shouldn't it? You don't load a snapshot while guest is running. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TODO list for qemu+KVM networking performance v2
On Wed, Jun 10, 2009 at 06:18:01PM +0300, Avi Kivity wrote: Michael S. Tsirkin wrote: But I don't understand how aio will make implementing it easier - or are you merely saying that it will make it worthwhile? If you have aio, the the NIC and the guest proceed in parallel. If the guest is faster (likely), then when it sends the next packet it will see that interrupts are disabled and not notify again. Once aio complete we can recheck the queue; if it's empty we reenable notifications. If there's still stuff in it we submit it with notifications disabled. So you are saying that with aio we won't need this optimization at all? I guess it's late in the day, and my mind is fuzzy... -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote: That's seems just plain wrong to me. Loading a VM shouldn't not do anything that can't happen during normal operation. At least wrt pci, we are very far from this state: load just overwrites all registers, readonly or not, which can never happen during normal operation. IMO that code is wrong. We should only be loading things that the guest can change (directly or indirectly). Paul Making it work this way will mean that minor changes to a device can break backwards compatibility with old images, often in surprising ways. What are the advantages? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions
On Wednesday 10 June 2009, Michael S. Tsirkin wrote: On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote: That's seems just plain wrong to me. Loading a VM shouldn't not do anything that can't happen during normal operation. At least wrt pci, we are very far from this state: load just overwrites all registers, readonly or not, which can never happen during normal operation. IMO that code is wrong. We should only be loading things that the guest can change (directly or indirectly). Making it work this way will mean that minor changes to a device can break backwards compatibility with old images, often in surprising ways. What are the advantages? If you can't create an identical machine from scratch then I don't consider snapshot/migration to be a useful feature. i.e. as soon as you shutdown and restart the guest it is liable to break anyway. It may be that the snapshot/migration code wants to include a machine config, and create a new machine from that. However this is a separate issue, and arguably something your VM manager should be handling for you. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote: On Wed, Jun 10, 2009 at 06:18:01PM +0300, Avi Kivity wrote: Michael S. Tsirkin wrote: But I don't understand how aio will make implementing it easier - or are you merely saying that it will make it worthwhile? If you have aio, the the NIC and the guest proceed in parallel. If the guest is faster (likely), then when it sends the next packet it will see that interrupts are disabled and not notify again. Once aio complete we can recheck the queue; if it's empty we reenable notifications. If there's still stuff in it we submit it with notifications disabled. So you are saying that with aio we won't need this optimization at all? I guess it's late in the day, and my mind is fuzzy... No, I'm saying with aio the optimization becomes worthwhile. But I joined late in the thread so we may be talking about different things. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] qemu-kvm BIOS move _PR to SSDT
On Wed, Jun 10, 2009 at 05:30:40PM +0200, Jes Sorensen wrote: On 06/10/2009 05:13 PM, Gleb Natapov wrote: + * Bochs/QEMU ACPI SSDT ASL definition + * + * Copyright (c) 2006 Fabrice Bellard I am not sure that there is even one bit of code below written by Fabrice. Well better safe than sorry, since I copied over a bunch of stuff, I figured it was simpler to just leave it intact. +Scope (\) +{ +/* Debug Output */ +OperationRegion (DBG, SystemIO, 0xb044, 0x04) +Field (DBG, DWordAcc, NoLock, Preserve) +{ +DBGL, 32, +} +} Except may be this one, but it should stay in DSDT. Hmmm, good point. I think I just moved it when I had zero idea what was going on at first in this AML hell :-) Try this one. WindowXP/Windows7 BSOD - The system is not (MS)ACPI compliant. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html