[COMMIT master] make-release: fix mtime for a wider range of git versions
From: Bernhard Kohl <bernhard.k...@nsn.com>

With the latest git versions, e.g. 1.7.2.3, git still prints out the tag
info in addition to the requested format. So let's simply fetch the first
line from the output. In addition I use the --pretty option instead of
--format, which is not recognized in very old git versions, e.g. 1.5.5.6.

Tested with git versions 1.5.5.6 and 1.7.2.3.

Signed-off-by: Bernhard Kohl <bernhard.k...@nsn.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
index 56302c3..2d050fc 100755
--- a/kvm/scripts/make-release
+++ b/kvm/scripts/make-release
@@ -51,7 +51,7 @@ cd $(dirname $0)/../..
 mkdir -p $(dirname $tarball)
 git archive --prefix=$name/ --format=tar $commit > $tarball
 
-mtime=`git show --format=%ct $commit^{commit} --`
+mtime=`git show --pretty=format:%ct $commit^{commit} -- | head -n 1`
 tarargs="--owner=root --group=root"
 
 mkdir -p $tmpdir/$name
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] device-assignment: register a reset function
From: Bernhard Kohl <bernhard.k...@nsn.com>

This is necessary because during reboot of a VM the assigned devices
continue DMA transfers, which causes memory corruption.

Acked-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Thomas Ostler <thomas.ost...@nsn.com>
Signed-off-by: Bernhard Kohl <bernhard.k...@nsn.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index c2a7b27..369bff9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1438,6 +1438,17 @@ static const VMStateDescription vmstate_assigned_device = {
     .name = "pci-assign"
 };
 
+static void reset_assigned_device(DeviceState *dev)
+{
+    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
+
+    /*
+     * When a 0 is written to the command register, the device is logically
+     * disconnected from the PCI bus. This avoids further DMA transfers.
+     */
+    assigned_dev_pci_write_config(d, PCI_COMMAND, 0, 2);
+}
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1555,6 +1566,7 @@ static PCIDeviceInfo assign_info = {
     .qdev.name    = "pci-assign",
     .qdev.desc    = "pass through host pci devices to the guest",
     .qdev.size    = sizeof(AssignedDevice),
+    .qdev.reset   = reset_assigned_device,
     .init         = assigned_initfn,
     .exit         = assigned_exitfn,
     .config_read  = assigned_dev_pci_read_config,
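The effect of writing 0 to the command register can be modelled in a few lines. This is only a sketch: the toy_* names and the 256-byte array standing in for configuration space are assumptions made for illustration, not QEMU code. The register offset and bit values are the standard PCI ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND        0x04 /* offset of the 16-bit command register */
#define PCI_COMMAND_IO     0x1  /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x2  /* respond to memory space accesses */
#define PCI_COMMAND_MASTER 0x4  /* device may act as bus master (DMA) */

/* Hypothetical stand-in for a device's configuration space. */
struct toy_pci_dev {
    uint8_t config[256];
};

/* Store a 16-bit value in little-endian order, as config space does. */
static void toy_write_config16(struct toy_pci_dev *d, unsigned off,
                               uint16_t val)
{
    d->config[off] = (uint8_t)(val & 0xff);
    d->config[off + 1] = (uint8_t)(val >> 8);
}

static uint16_t toy_read_config16(const struct toy_pci_dev *d, unsigned off)
{
    return (uint16_t)(d->config[off] | (d->config[off + 1] << 8));
}

/* A device can only start DMA while its bus master bit is set. */
static bool toy_can_dma(const struct toy_pci_dev *d)
{
    return (toy_read_config16(d, PCI_COMMAND) & PCI_COMMAND_MASTER) != 0;
}

/* Same idea as the reset hook: clear the whole command register. */
static void toy_reset(struct toy_pci_dev *d)
{
    toy_write_config16(d, PCI_COMMAND, 0);
}
```

Clearing the full register, rather than only the bus master bit, also stops the device from decoding I/O and memory accesses, which matches the "logically disconnected" wording in the patch comment.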
[COMMIT master] device-assignment: Register as un-migratable
From: Alex Williamson <alex.william...@redhat.com>

Use register_device_unmigratable() to declare ourselves as non-migratable.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 5f5bde1..c2a7b27 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1434,6 +1434,10 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
     dev->msix_table_page = NULL;
 }
 
+static const VMStateDescription vmstate_assigned_device = {
+    .name = "pci-assign"
+};
+
 static int assigned_initfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1495,6 +1499,12 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
     assigned_dev_load_option_rom(dev);
     QLIST_INSERT_HEAD(&devs, dev, next);
+
+    /* Register a vmsd so that we can mark it unmigratable. */
+    vmstate_register(&dev->dev.qdev, 0, &vmstate_assigned_device, dev);
+    register_device_unmigratable(&dev->dev.qdev,
+                                 vmstate_assigned_device.name, dev);
+
     return 0;
 
 assigned_out:
@@ -1508,6 +1518,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
+    vmstate_unregister(&dev->dev.qdev, &vmstate_assigned_device, dev);
     QLIST_REMOVE(dev, next);
     deassign_device(dev);
     free_assigned_device(dev);
[COMMIT master] KVM: Mask KVM_GET_SUPPORTED_CPUID data with Linux cpuid info
From: Avi Kivity <a...@redhat.com>

This allows Linux to mask cpuid bits if, for example, nx is enabled on only
some cpus.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 003a0ca..410d2d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2232,6 +2232,11 @@ out:
 	return r;
 }
 
+static void cpuid_mask(u32 *word, int wordnum)
+{
+	*word &= boot_cpu_data.x86_capability[wordnum];
+}
+
 static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			   u32 index)
 {
@@ -2306,7 +2311,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		break;
 	case 1:
 		entry->edx &= kvm_supported_word0_x86_features;
+		cpuid_mask(&entry->edx, 0);
 		entry->ecx &= kvm_supported_word4_x86_features;
+		cpuid_mask(&entry->ecx, 4);
 		/* we support x2apic emulation even if host does not support
 		 * it since we emulate x2apic in software */
 		entry->ecx |= F(X2APIC);
@@ -2397,7 +2404,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		break;
 	case 0x80000001:
 		entry->edx &= kvm_supported_word1_x86_features;
+		cpuid_mask(&entry->edx, 1);
 		entry->ecx &= kvm_supported_word6_x86_features;
+		cpuid_mask(&entry->ecx, 6);
 		break;
 	}
 
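The masking step is a plain bitwise AND against the host's view of the same feature word. A minimal sketch, assuming made-up feature bits (TOY_FEAT_A and TOY_FEAT_B are hypothetical; bit 20 is where NX sits in the EDX word of CPUID leaf 0x80000001):

```c
#include <stdint.h>

/* Illustrative feature bits; only the NX position reflects real CPUID. */
#define TOY_FEAT_A  (1u << 0)
#define TOY_FEAT_B  (1u << 1)
#define TOY_FEAT_NX (1u << 20)

/* Keep only those bits of *word that the host word reports as well. */
static void toy_cpuid_mask(uint32_t *word, uint32_t host_word)
{
    *word &= host_word;
}
```

If the hypervisor would offer A, B and NX but the host word has NX cleared, the guest-visible word ends up with A and B only. A bit the host reports but the hypervisor does not support stays cleared, since AND can only remove bits.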
[COMMIT master] KVM: VMX: Fix host userspace gsbase corruption
From: Avi Kivity <a...@redhat.com>

We now use load_gs_index() to load gs safely; unfortunately this also
changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted
in confusion and breakage running 32-bit host userspace on a 64-bit kernel.

Fix by
- saving guest MSR_KERNEL_GS_BASE before we reload the host's gs
- doing the host save/load unconditionally, instead of only when in guest
  long mode

Things can be cleaned up further, but this is the minimal fix for now.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9367abc..0badeac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -821,10 +821,9 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 #endif
 
 #ifdef CONFIG_X86_64
-	if (is_long_mode(&vmx->vcpu)) {
-		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+	rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+	if (is_long_mode(&vmx->vcpu))
 		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
-	}
 #endif
 	for (i = 0; i < vmx->save_nmsrs; ++i)
 		kvm_set_shared_msr(vmx->guest_msrs[i].index,
@@ -839,11 +838,14 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 
 	++vmx->vcpu.stat.host_state_reload;
 	vmx->host_state.loaded = 0;
+#ifdef CONFIG_X86_64
+	if (is_long_mode(&vmx->vcpu))
+		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+#endif
 	if (vmx->host_state.gs_ldt_reload_needed) {
 		kvm_load_ldt(vmx->host_state.ldt_sel);
 #ifdef CONFIG_X86_64
 		load_gs_index(vmx->host_state.gs_sel);
-		wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
 #else
 		loadsegment(gs, vmx->host_state.gs_sel);
 #endif
@@ -852,10 +854,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 		loadsegment(fs, vmx->host_state.fs_sel);
 	reload_tss();
 #ifdef CONFIG_X86_64
-	if (is_long_mode(&vmx->vcpu)) {
-		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
-		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
-	}
+	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
 	if (current_thread_info()->status & TS_USEDFPU)
 		clts();
[COMMIT master] KVM: Clear assigned guest IRQ on release
From: Jan Kiszka <jan.kis...@siemens.com>

When we deassign a guest IRQ, clear the potentially asserted guest line.
There might be no chance for the guest to do this, specifically if we
switch from INTx to MSI mode.

Acked-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7c98928..ecc4419 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -141,6 +141,9 @@ static void deassign_guest_irq(struct kvm *kvm,
 	kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
 	assigned_dev->ack_notifier.gsi = -1;
 
+	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
+		    assigned_dev->guest_irq, 0);
+
 	if (assigned_dev->irq_source_id != -1)
 		kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
 	assigned_dev->irq_source_id = -1;
[COMMIT master] KVM: Refactor IRQ names of assigned devices
From: Jan Kiszka <jan.kis...@siemens.com>

Cosmetic change, but it helps to correlate IRQs with PCI devices.

Acked-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9fe7fef..4bd663d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -489,6 +489,7 @@ struct kvm_assigned_dev_kernel {
 	struct pci_dev *dev;
 	struct kvm *kvm;
 	spinlock_t intx_lock;
+	char irq_name[32];
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 1d77ce1..7623408 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -231,8 +231,7 @@ static int assigned_device_enable_host_intx(struct kvm *kvm,
 	 * are going to be long delays in accepting, acking, etc.
 	 */
 	if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-				 IRQF_ONESHOT, "kvm_assigned_intx_device",
-				 (void *)dev))
+				 IRQF_ONESHOT, dev->irq_name, (void *)dev))
 		return -EIO;
 	return 0;
 }
@@ -251,7 +250,7 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 
 	dev->host_irq = dev->dev->irq;
 	if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-				 0, "kvm_assigned_msi_device", (void *)dev)) {
+				 0, dev->irq_name, (void *)dev)) {
 		pci_disable_msi(dev->dev);
 		return -EIO;
 	}
@@ -278,8 +277,7 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
 	for (i = 0; i < dev->entries_nr; i++) {
 		r = request_threaded_irq(dev->host_msix_entries[i].vector,
 					 NULL, kvm_assigned_dev_thread,
-					 0, "kvm_assigned_msix_device",
-					 (void *)dev);
+					 0, dev->irq_name, (void *)dev);
 		if (r)
 			goto err;
 	}
@@ -336,6 +334,9 @@ static int assign_host_irq(struct kvm *kvm,
 	if (dev->irq_requested_type & KVM_DEV_IRQ_HOST_MASK)
 		return r;
 
+	snprintf(dev->irq_name, sizeof(dev->irq_name), "kvm:%s",
+		 pci_name(dev->dev));
+
 	switch (host_irq_type) {
 	case KVM_DEV_IRQ_HOST_INTX:
 		r = assigned_device_enable_host_intx(kvm, dev);
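The naming scheme can be tried out in userspace. This sketch assumes a hypothetical helper, toy_build_irq_name(), and a made-up PCI address; only the "kvm:%s" format and the 32-byte buffer size come from the patch.

```c
#include <stdio.h>

#define TOY_IRQ_NAME_LEN 32 /* same size as irq_name[] in the patch */

/*
 * Build "kvm:<pci name>" into buf. snprintf() truncates if needed, always
 * NUL-terminates for len > 0, and returns the untruncated length.
 */
static int toy_build_irq_name(char *buf, size_t len, const char *pci_name)
{
    return snprintf(buf, len, "kvm:%s", pci_name);
}
```

For a device at 0000:00:19.0 this yields kvm:0000:00:19.0, which is the string that would then show up next to the interrupt count in /proc/interrupts instead of one generic name shared by every assigned device.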
[COMMIT master] KVM: Save/restore state of assigned PCI device
From: Jan Kiszka <jan.kis...@siemens.com>

The guest may change states that pci_reset_function does not touch. So we
better save/restore the assigned device across guest usage.

Acked-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7623408..d389207 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -197,7 +197,8 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 {
 	kvm_free_assigned_irq(kvm, assigned_dev);
 
-	pci_reset_function(assigned_dev->dev);
+	__pci_reset_function(assigned_dev->dev);
+	pci_restore_state(assigned_dev->dev);
 
 	pci_release_regions(assigned_dev->dev);
 	pci_disable_device(assigned_dev->dev);
@@ -514,6 +515,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	}
 
 	pci_reset_function(dev);
+	pci_save_state(dev);
 
 	match->assigned_dev_id = assigned_dev->assigned_dev_id;
 	match->host_segnr = assigned_dev->segnr;
@@ -544,6 +546,7 @@ out:
 	mutex_unlock(&kvm->lock);
 	return r;
 out_list_del:
+	pci_restore_state(dev);
 	list_del(&match->list);
 	pci_release_regions(dev);
 out_disable:
[COMMIT master] KVM: Document device assignment API
From: Jan Kiszka <jan.kis...@siemens.com>

Adds API documentation for KVM_[DE]ASSIGN_PCI_DEVICE,
KVM_[DE]ASSIGN_DEV_IRQ, KVM_SET_GSI_ROUTING, KVM_ASSIGN_SET_MSIX_NR, and
KVM_ASSIGN_SET_MSIX_ENTRY.

Acked-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index b336266..e1a9297 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1085,6 +1085,184 @@ of 4 instructions that make up a hypercall.
 If any additional field gets added to this structure later on, a bit for that
 additional piece of information will be set in the flags bitmap.
 
+4.47 KVM_ASSIGN_PCI_DEVICE
+
+Capability: KVM_CAP_DEVICE_ASSIGNMENT
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Assigns a host PCI device to the VM.
+
+struct kvm_assigned_pci_dev {
+	__u32 assigned_dev_id;
+	__u32 busnr;
+	__u32 devfn;
+	__u32 flags;
+	__u32 segnr;
+	union {
+		__u32 reserved[11];
+	};
+};
+
+The PCI device is specified by the triple segnr, busnr, and devfn.
+Identification in succeeding service requests is done via assigned_dev_id. The
+following flags are specified:
+
+/* Depends on KVM_CAP_IOMMU */
+#define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
+
+4.48 KVM_DEASSIGN_PCI_DEVICE
+
+Capability: KVM_CAP_DEVICE_DEASSIGNMENT
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Ends PCI device assignment, releasing all associated resources.
+
+See KVM_CAP_DEVICE_ASSIGNMENT for the data structure. Only assigned_dev_id is
+used in kvm_assigned_pci_dev to identify the device.
+
+4.49 KVM_ASSIGN_DEV_IRQ
+
+Capability: KVM_CAP_ASSIGN_DEV_IRQ
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_irq (in)
+Returns: 0 on success, -1 on error
+
+Assigns an IRQ to a passed-through device.
+
+struct kvm_assigned_irq {
+	__u32 assigned_dev_id;
+	__u32 host_irq;
+	__u32 guest_irq;
+	__u32 flags;
+	union {
+		struct {
+			__u32 addr_lo;
+			__u32 addr_hi;
+			__u32 data;
+		} guest_msi;
+		__u32 reserved[12];
+	};
+};
+
+The following flags are defined:
+
+#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
+#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
+#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)
+
+#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
+#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
+#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)
+
+It is not valid to specify multiple types per host or guest IRQ. However, the
+IRQ type of host and guest can differ or can even be null.
+
+4.50 KVM_DEASSIGN_DEV_IRQ
+
+Capability: KVM_CAP_ASSIGN_DEV_IRQ
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_irq (in)
+Returns: 0 on success, -1 on error
+
+Ends an IRQ assignment to a passed-through device.
+
+See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
+by assigned_dev_id, flags must correspond to the IRQ type specified on
+KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
+
+4.51 KVM_SET_GSI_ROUTING
+
+Capability: KVM_CAP_IRQ_ROUTING
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_irq_routing (in)
+Returns: 0 on success, -1 on error
+
+Sets the GSI routing table entries, overwriting any previously set entries.
+
+struct kvm_irq_routing {
+	__u32 nr;
+	__u32 flags;
+	struct kvm_irq_routing_entry entries[0];
+};
+
+No flags are specified so far, the corresponding field must be set to zero.
+
+struct kvm_irq_routing_entry {
+	__u32 gsi;
+	__u32 type;
+	__u32 flags;
+	__u32 pad;
+	union {
+		struct kvm_irq_routing_irqchip irqchip;
+		struct kvm_irq_routing_msi msi;
+		__u32 pad[8];
+	} u;
+};
+
+/* gsi routing entry types */
+#define KVM_IRQ_ROUTING_IRQCHIP 1
+#define KVM_IRQ_ROUTING_MSI 2
+
+No flags are specified so far, the corresponding field must be set to zero.
+
+struct kvm_irq_routing_irqchip {
+	__u32 irqchip;
+	__u32 pin;
+};
+
+struct kvm_irq_routing_msi {
+	__u32 address_lo;
+	__u32 address_hi;
+	__u32 data;
+	__u32 pad;
+};
+
+4.52 KVM_ASSIGN_SET_MSIX_NR
+
+Capability: KVM_CAP_DEVICE_MSIX
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_nr (in)
+Returns: 0 on success, -1 on error
+
+Set the number of MSI-X interrupts for an assigned device. This service can
+only be called once in the lifetime of an assigned device.
+
+struct kvm_assigned_msix_nr {
+	__u32 assigned_dev_id;
+	__u16 entry_nr;
+	__u16 padding;
+};
+
+#define KVM_MAX_MSIX_PER_DEV		256
+
+4.53
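The rule from section 4.49, at most one type per host IRQ and at most one per guest IRQ with either side allowed to be empty, can be checked with a small bit trick. This is a sketch rather than the kernel's validation code: the toy_* helpers and the two mask values (low byte for host types, next byte for guest types) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_DEV_IRQ_HOST_INTX  (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI   (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX  (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI  (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX (1 << 10)

/* Assumed grouping of the flag bits, for this sketch only. */
#define TOY_HOST_MASK  0x00ffu
#define TOY_GUEST_MASK 0xff00u

/* True when x has zero or one bit set. */
static bool toy_at_most_one_bit(uint32_t x)
{
    return (x & (x - 1)) == 0;
}

/* Host and guest types are checked independently of each other. */
static bool toy_irq_flags_valid(uint32_t flags)
{
    return toy_at_most_one_bit(flags & TOY_HOST_MASK) &&
           toy_at_most_one_bit(flags & TOY_GUEST_MASK);
}
```

With this check, KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX passes, as does a value with only a guest type or no type at all, while KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_HOST_MSI is rejected. Bits outside both masks are ignored by the sketch.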
[COMMIT master] KVM: MMU: don't mark spte notrap if reserved bit set
From: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>

If a reserved bit is set, we need to inject the #PF with PFEC.RSVD=1, but
shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index ba00eef..590bf12 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -395,8 +395,10 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 
 		gpte = gptep[i];
 
-		if (!is_present_gpte(gpte) ||
-		      is_rsvd_bits_set(mmu, gpte, PT_PAGE_TABLE_LEVEL)) {
+		if (is_rsvd_bits_set(mmu, gpte, PT_PAGE_TABLE_LEVEL))
+			continue;
+
+		if (!is_present_gpte(gpte)) {
 			if (!sp->unsync)
 				__set_spte(spte, shadow_notrap_nonpresent_pte);
 			continue;
@@ -760,6 +762,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		pt_element_t gpte;
 		gpa_t pte_gpa;
 		gfn_t gfn;
+		bool rsvd_bits_set;
 
 		if (!is_shadow_present_pte(sp->spt[i]))
 			continue;
@@ -771,12 +774,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			return -EINVAL;
 
 		gfn = gpte_to_gfn(gpte);
-		if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL)
-		      || gfn != sp->gfns[i] || !is_present_gpte(gpte)
-		      || !(gpte & PT_ACCESSED_MASK)) {
+		rsvd_bits_set = is_rsvd_bits_set(&vcpu->arch.mmu, gpte,
+						 PT_PAGE_TABLE_LEVEL);
+		if (rsvd_bits_set || gfn != sp->gfns[i] ||
+		      !is_present_gpte(gpte) || !(gpte & PT_ACCESSED_MASK)) {
 			u64 nonpresent;
 
-			if (is_present_gpte(gpte) || !clear_unsync)
+			if (rsvd_bits_set || is_present_gpte(gpte) ||
+			      !clear_unsync)
 				nonpresent = shadow_trap_nonpresent_pte;
 			else
 				nonpresent = shadow_notrap_nonpresent_pte;
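The decision made in the sync_page hunk can be written as a small pure function. The enum and the function name are hypothetical; the condition itself is copied from the patched code.

```c
#include <stdbool.h>

/* Hypothetical labels for the two kinds of non-present shadow PTE. */
enum toy_nonpresent {
    TOY_TRAP,   /* fault exits to the hypervisor, which builds the error code */
    TOY_NOTRAP  /* fault goes straight to the guest with PFEC.RSVD=0 */
};

/*
 * Same condition as the patched sync_page(): a guest PTE with reserved
 * bits set must never get the notrap value, because only the trapping
 * path can report PFEC.RSVD=1.
 */
static enum toy_nonpresent toy_pick_nonpresent(bool rsvd_bits_set,
                                               bool present,
                                               bool clear_unsync)
{
    if (rsvd_bits_set || present || !clear_unsync)
        return TOY_TRAP;
    return TOY_NOTRAP;
}
```

Before the patch the first operand was missing, so a non-present guest PTE with reserved bits set and clear_unsync true would have taken the notrap branch.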
[COMMIT master] KVM: take kvm_lock for hardware_disable() during cpu hotplug
From: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>

In kvm_cpu_hotplug(), only the CPU_STARTING case is protected by kvm_lock.
This patch adds the missing protection for the CPU_DYING case.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 339dd43..0fdd911 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2148,7 +2148,9 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
 	case CPU_DYING:
 		printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
 		       cpu);
+		spin_lock(&kvm_lock);
 		hardware_disable(NULL);
+		spin_unlock(&kvm_lock);
 		break;
 	case CPU_STARTING:
 		printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
[COMMIT master] KVM: x86 emulator: drop unused #ifndef __KERNEL__
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 38b6e8d..ffd6e01 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -20,16 +20,9 @@
  * From: xen-unstable 10676:af9809f51f81a3c43f276f00c81a52ef558afda4
  */
 
-#ifndef __KERNEL__
-#include <stdio.h>
-#include <stdint.h>
-#include <public/xen.h>
-#define DPRINTF(_f, _a ...) printf(_f , ## _a)
-#else
 #include <linux/kvm_host.h>
 #include "kvm_cache_regs.h"
 #define DPRINTF(x...) do {} while (0)
-#endif
 #include <linux/module.h>
 #include <asm/kvm_emulate.h>
 
[COMMIT master] KVM: rename hardware_[dis|en]able() to *_nolock() and add locking wrappers
From: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>

The naming convention of the hardware_[dis|en]able family is a little bit
confusing because only hardware_[dis|en]able_all are using the _nolock
suffix.

Renaming the current hardware_[dis|en]able() to *_nolock() and using
hardware_[dis|en]able() as wrapper functions which take kvm_lock for them
reduces the confusion.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0fdd911..fb93ff9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2067,7 +2067,7 @@ static struct miscdevice kvm_dev = {
 	&kvm_chardev_ops,
 };
 
-static void hardware_enable(void *junk)
+static void hardware_enable_nolock(void *junk)
 {
 	int cpu = raw_smp_processor_id();
 	int r;
@@ -2087,7 +2087,14 @@ static void hardware_enable(void *junk)
 	}
 }
 
-static void hardware_disable(void *junk)
+static void hardware_enable(void *junk)
+{
+	spin_lock(&kvm_lock);
+	hardware_enable_nolock(junk);
+	spin_unlock(&kvm_lock);
+}
+
+static void hardware_disable_nolock(void *junk)
 {
 	int cpu = raw_smp_processor_id();
 
@@ -2097,13 +2104,20 @@ static void hardware_disable(void *junk)
 	kvm_arch_hardware_disable(NULL);
 }
 
+static void hardware_disable(void *junk)
+{
+	spin_lock(&kvm_lock);
+	hardware_disable_nolock(junk);
+	spin_unlock(&kvm_lock);
+}
+
 static void hardware_disable_all_nolock(void)
 {
 	BUG_ON(!kvm_usage_count);
 
 	kvm_usage_count--;
 	if (!kvm_usage_count)
-		on_each_cpu(hardware_disable, NULL, 1);
+		on_each_cpu(hardware_disable_nolock, NULL, 1);
 }
 
 static void hardware_disable_all(void)
@@ -2122,7 +2136,7 @@ static int hardware_enable_all(void)
 	kvm_usage_count++;
 	if (kvm_usage_count == 1) {
 		atomic_set(&hardware_enable_failed, 0);
-		on_each_cpu(hardware_enable, NULL, 1);
+		on_each_cpu(hardware_enable_nolock, NULL, 1);
 
 		if (atomic_read(&hardware_enable_failed)) {
 			hardware_disable_all_nolock();
@@ -2148,16 +2162,12 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
 	case CPU_DYING:
 		printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
 		       cpu);
-		spin_lock(&kvm_lock);
 		hardware_disable(NULL);
-		spin_unlock(&kvm_lock);
 		break;
 	case CPU_STARTING:
 		printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
 		       cpu);
-		spin_lock(&kvm_lock);
 		hardware_enable(NULL);
-		spin_unlock(&kvm_lock);
 		break;
 	}
 	return NOTIFY_OK;
@@ -2188,7 +2198,7 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
 	 */
 	printk(KERN_INFO "kvm: exiting hardware virtualization\n");
 	kvm_rebooting = true;
-	on_each_cpu(hardware_disable, NULL, 1);
+	on_each_cpu(hardware_disable_nolock, NULL, 1);
 	return NOTIFY_OK;
 }
 
@@ -2358,7 +2368,7 @@ static void kvm_exit_debug(void)
 static int kvm_suspend(struct sys_device *dev, pm_message_t state)
 {
 	if (kvm_usage_count)
-		hardware_disable(NULL);
+		hardware_disable_nolock(NULL);
 	return 0;
 }
 
@@ -2366,7 +2376,7 @@ static int kvm_resume(struct sys_device *dev)
 {
 	if (kvm_usage_count) {
 		WARN_ON(spin_is_locked(&kvm_lock));
-		hardware_enable(NULL);
+		hardware_enable_nolock(NULL);
 	}
 	return 0;
 }
@@ -2543,7 +2553,7 @@ void kvm_exit(void)
 	sysdev_class_unregister(&kvm_sysdev_class);
 	unregister_reboot_notifier(&kvm_reboot_notifier);
 	unregister_cpu_notifier(&kvm_cpu_notifier);
-	on_each_cpu(hardware_disable, NULL, 1);
+	on_each_cpu(hardware_disable_nolock, NULL, 1);
 	kvm_arch_hardware_unsetup();
 	kvm_arch_exit();
 	free_cpumask_var(cpus_hardware_enabled);
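The wrapper pattern introduced here can be shown with a single-threaded toy model. Everything named toy_* is an assumption for illustration: the "lock" is just a flag that asserts on double acquisition, which is roughly how a non-recursive spinlock would misbehave if a locking wrapper were called with the lock already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: acquiring it twice is a bug. */
struct toy_lock {
    bool held;
};

static struct toy_lock toy_kvm_lock;
static int toy_enabled_count;

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held); /* a real spinlock would deadlock here */
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Does the work without touching the lock; the caller handles exclusion. */
static void toy_hardware_enable_nolock(void)
{
    toy_enabled_count++;
}

/* Convenience wrapper for callers that do not hold the lock yet. */
static void toy_hardware_enable(void)
{
    toy_lock_acquire(&toy_kvm_lock);
    toy_hardware_enable_nolock();
    toy_lock_release(&toy_kvm_lock);
}
```

A caller that already holds the lock, or that runs in a context where taking it is not wanted, uses the _nolock variant directly; everyone else calls the wrapper. That split is what lets the hotplug notifier drop its open-coded lock and unlock calls in the patch.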
[COMMIT master] KVM: Switch assigned device IRQ forwarding to threaded handler
From: Jan Kiszka <jan.kis...@siemens.com>

This improves the IRQ forwarding for assigned devices: By using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.

Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lengthy operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, and is thus renamed to intx_lock.

Acked-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2d63f2c..9fe7fef 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -470,16 +470,8 @@ struct kvm_irq_ack_notifier {
 	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
-#define KVM_ASSIGNED_MSIX_PENDING 0x1
-struct kvm_guest_msix_entry {
-	u32 vector;
-	u16 entry;
-	u16 flags;
-};
-
 struct kvm_assigned_dev_kernel {
 	struct kvm_irq_ack_notifier ack_notifier;
-	struct work_struct interrupt_work;
 	struct list_head list;
 	int assigned_dev_id;
 	int host_segnr;
@@ -490,13 +482,13 @@ struct kvm_assigned_dev_kernel {
 	bool host_irq_disabled;
 	struct msix_entry *host_msix_entries;
 	int guest_irq;
-	struct kvm_guest_msix_entry *guest_msix_entries;
+	struct msix_entry *guest_msix_entries;
 	unsigned long irq_requested_type;
 	int irq_source_id;
 	int flags;
 	struct pci_dev *dev;
 	struct kvm *kvm;
-	spinlock_t assigned_dev_lock;
+	spinlock_t intx_lock;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ecc4419..1d77ce1 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -55,58 +55,31 @@ static int find_index_from_host_irq(struct kvm_assigned_dev_kernel
 	return index;
 }
 
-static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
+static irqreturn_t kvm_assigned_dev_thread(int irq, void *dev_id)
 {
-	struct kvm_assigned_dev_kernel *assigned_dev;
-	int i;
+	struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+	u32 vector;
+	int index;
 
-	assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
-				    interrupt_work);
+	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_INTX) {
+		spin_lock(&assigned_dev->intx_lock);
+		disable_irq_nosync(irq);
+		assigned_dev->host_irq_disabled = true;
+		spin_unlock(&assigned_dev->intx_lock);
+	}
 
-	spin_lock_irq(&assigned_dev->assigned_dev_lock);
 	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
-		struct kvm_guest_msix_entry *guest_entries =
-			assigned_dev->guest_msix_entries;
-		for (i = 0; i < assigned_dev->entries_nr; i++) {
-			if (!(guest_entries[i].flags &
-					KVM_ASSIGNED_MSIX_PENDING))
-				continue;
-			guest_entries[i].flags &= ~KVM_ASSIGNED_MSIX_PENDING;
+		index = find_index_from_host_irq(assigned_dev, irq);
+		if (index >= 0) {
+			vector = assigned_dev->
+					guest_msix_entries[index].vector;
 			kvm_set_irq(assigned_dev->kvm,
-				    assigned_dev->irq_source_id,
-				    guest_entries[i].vector, 1);
+				    assigned_dev->irq_source_id, vector, 1);
 		}
 	} else
 		kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
 			    assigned_dev->guest_irq, 1);
 
-	spin_unlock_irq(&assigned_dev->assigned_dev_lock);
-}
-
-static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
-{
-	unsigned long flags;
-	struct kvm_assigned_dev_kernel *assigned_dev =
-		(struct kvm_assigned_dev_kernel *) dev_id;
-
-	spin_lock_irqsave(&assigned_dev->assigned_dev_lock, flags);
-	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
-		int index = find_index_from_host_irq(assigned_dev, irq);
-		if (index < 0)
-			goto out;
-		assigned_dev->guest_msix_entries[index].flags |=
-			KVM_ASSIGNED_MSIX_PENDING;
-	}
-
-	schedule_work(&assigned_dev->interrupt_work);
-
-	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_INTX) {
-		disable_irq_nosync(irq);
-		assigned_dev->host_irq_disabled = true;
-	}
-
-out:
-
[COMMIT master] KVM: x86 emulator: drop DPRINTF()
From: Avi Kivity <a...@redhat.com>

Failed emulation is reported via a tracepoint; the cmps printk is pointless.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ffd6e01..3325b47 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -22,7 +22,6 @@
 
 #include <linux/kvm_host.h>
 #include "kvm_cache_regs.h"
-#define DPRINTF(x...) do {} while (0)
 #include <linux/module.h>
 #include <asm/kvm_emulate.h>
 
@@ -2796,10 +2795,8 @@ done_prefixes:
 	c->execute = opcode.u.execute;
 
 	/* Unrecognised? */
-	if (c->d == 0 || (c->d & Undefined)) {
-		DPRINTF("Cannot emulate %02x\n", c->b);
+	if (c->d == 0 || (c->d & Undefined))
 		return -1;
-	}
 
 	if (mode == X86EMUL_MODE_PROT64 && (c->d & Stack))
 		c->op_bytes = 8;
@@ -3261,7 +3258,6 @@ special_insn:
 		break;
 	case 0xa6 ... 0xa7:	/* cmps */
 		c->dst.type = OP_NONE; /* Disable writeback. */
-		DPRINTF("cmps: mem1=0x%p mem2=0x%p\n", c->src.addr.mem, c->dst.addr.mem);
 		goto cmp;
 	case 0xa8 ... 0xa9:	/* test ax, imm */
 		goto test;
@@ -3778,6 +3774,5 @@ twobyte_insn:
 	goto writeback;
 
 cannot_emulate:
-	DPRINTF("Cannot emulate %02x\n", c->b);
 	return -1;
 }
[COMMIT master] KVM: x86 emulator: do not perform address calculations on linear addresses
From: Avi Kivity <a...@redhat.com>

Linear addresses are supposed to already have segment checks performed on
them; if we play with these addresses the checks become invalid.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e967055..bdbbb18 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -568,7 +568,8 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
 			   ctxt->vcpu, NULL);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
-	rc = ops->read_std(linear(ctxt, addr) + 2, address, op_bytes,
+	addr.ea += 2;
+	rc = ops->read_std(linear(ctxt, addr), address, op_bytes,
 			   ctxt->vcpu, NULL);
 	return rc;
 }
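Why the offset has to be applied before linearization can be seen in a toy model where the translation step also performs a limit check. This is a sketch under simplifying assumptions: the toy_* types are hypothetical, only expand-up segments with a byte-granular limit are modelled, and the emulator at this commit does not necessarily perform such a check yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective address plus the segment it is relative to. */
struct toy_segmented_address {
    uint32_t ea;
    unsigned seg;
};

/* Simplified segment: base plus highest valid offset. */
struct toy_segment {
    uint32_t base;
    uint32_t limit;
};

/*
 * Check that [ea, ea + size - 1] lies within the segment limit, then
 * produce the linear address. Returns false if the access is out of
 * bounds. Any offset added to the result afterwards would bypass the check.
 */
static bool toy_linearize(const struct toy_segment *segs,
                          struct toy_segmented_address addr,
                          unsigned size, uint32_t *linear)
{
    const struct toy_segment *s = &segs[addr.seg];

    if (size == 0 || addr.ea > s->limit || size - 1 > s->limit - addr.ea)
        return false;
    *linear = s->base + addr.ea;
    return true;
}
```

With a segment of limit 0xf, a 2-byte access at ea 0xe passes. Adding 2 to the returned linear address would silently read past the segment, whereas adding 2 to ea first, as the patch does, makes the second access fail the check.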
[COMMIT master] KVM: x86 emulator: preserve an operand's segment identity
From: Avi Kivity a...@redhat.com

Currently the x86 emulator converts the segment register associated with
an operand into a segment base which is added into the operand address.
This loss of information results in us not doing segment limit checks
properly.

Replace struct operand's addr.mem field by a segmented_address structure
which holds both the effective address and segment. This will allow us to
do the limit check at the point of access.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b36c6b3..b48c133 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -159,7 +159,10 @@ struct operand {
 	};
 	union {
 		unsigned long *reg;
-		unsigned long mem;
+		struct segmented_address {
+			ulong ea;
+			unsigned seg;
+		} mem;
 	} addr;
 	union {
 		unsigned long val;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3325b47..e967055 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -410,9 +410,9 @@ address_mask(struct decode_cache *c, unsigned long reg)
 }
 
 static inline unsigned long
-register_address(struct decode_cache *c, unsigned long base, unsigned long reg)
+register_address(struct decode_cache *c, unsigned long reg)
 {
-	return base + address_mask(c, reg);
+	return address_mask(c, reg);
 }
 
 static inline void
@@ -444,26 +444,26 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
 	return ops->get_cached_segment_base(seg, ctxt->vcpu);
 }
 
-static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
-				       struct x86_emulate_ops *ops,
-				       struct decode_cache *c)
+static unsigned seg_override(struct x86_emulate_ctxt *ctxt,
+			     struct x86_emulate_ops *ops,
+			     struct decode_cache *c)
 {
 	if (!c->has_seg_override)
 		return 0;
 
-	return seg_base(ctxt, ops, c->seg_override);
+	return c->seg_override;
 }
 
-static unsigned long es_base(struct x86_emulate_ctxt *ctxt,
-			     struct x86_emulate_ops *ops)
+static ulong linear(struct x86_emulate_ctxt *ctxt, + struct segmented_address addr) { - return seg_base(ctxt, ops, VCPU_SREG_ES); -} + struct decode_cache *c = ctxt-decode; + ulong la; -static unsigned long ss_base(struct x86_emulate_ctxt *ctxt, -struct x86_emulate_ops *ops) -{ - return seg_base(ctxt, ops, VCPU_SREG_SS); + la = seg_base(ctxt, ctxt-ops, addr.seg) + addr.ea; + if (c-ad_bytes != 8) + la = (u32)-1; + return la; } static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec, @@ -556,7 +556,7 @@ static void *decode_register(u8 modrm_reg, unsigned long *regs, static int read_descriptor(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, - ulong addr, + struct segmented_address addr, u16 *size, unsigned long *address, int op_bytes) { int rc; @@ -564,10 +564,12 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt, if (op_bytes == 2) op_bytes = 3; *address = 0; - rc = ops-read_std(addr, (unsigned long *)size, 2, ctxt-vcpu, NULL); + rc = ops-read_std(linear(ctxt, addr), (unsigned long *)size, 2, + ctxt-vcpu, NULL); if (rc != X86EMUL_CONTINUE) return rc; - rc = ops-read_std(addr + 2, address, op_bytes, ctxt-vcpu, NULL); + rc = ops-read_std(linear(ctxt, addr) + 2, address, op_bytes, + ctxt-vcpu, NULL); return rc; } @@ -760,7 +762,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt, break; } } - op-addr.mem = modrm_ea; + op-addr.mem.ea = modrm_ea; done: return rc; } @@ -775,13 +777,13 @@ static int decode_abs(struct x86_emulate_ctxt *ctxt, op-type = OP_MEM; switch (c-ad_bytes) { case 2: - op-addr.mem = insn_fetch(u16, 2, c-eip); + op-addr.mem.ea = insn_fetch(u16, 2, c-eip); break; case 4: - op-addr.mem = insn_fetch(u32, 4, c-eip); + op-addr.mem.ea = insn_fetch(u32, 4, c-eip); break; case 8: - op-addr.mem = insn_fetch(u64, 8, c-eip); + op-addr.mem.ea = insn_fetch(u64, 8, c-eip); break; } done: @@ -800,7 +802,7 @@ static void fetch_bit_operand(struct decode_cache *c) else if (c-src.bytes == 4)
[COMMIT master] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/
From: Marcelo Tosatti mtosa...@redhat.com

Conflicts:
	arch/x86/kvm/svm.c
	kernel/sched.c

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
[COMMIT master] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()
From: Avi Kivity a...@redhat.com cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run() to prevent multiple copies of the context switch from being generated (causing problems due to a label). This patch folds them back together again and adds the __noclone attribute to prevent the label from being duplicated. Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a9ad174..58e5913 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3904,17 +3904,33 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu) #define Q l #endif -/* - * We put this into a separate noinline function to prevent the compiler - * from duplicating the code. This is needed because this code - * uses non local labels that cannot be duplicated. - * Do not put any flow control into this function. - * Better would be to put this whole monstrosity into a .S file. - */ -static void noinline do_vmx_vcpu_run(struct kvm_vcpu *vcpu) +static void vmx_vcpu_run(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - asm volatile( + + /* Record the guest's net vcpu time for enforced NMI injections. */ + if (unlikely(!cpu_has_virtual_nmis() vmx-soft_vnmi_blocked)) + vmx-entry_time = ktime_get(); + + /* Don't enter VMX if guest state is invalid, let the exit handler + start emulation until we arrive back to a valid state */ + if (vmx-emulation_required emulate_invalid_guest_state) + return; + + if (test_bit(VCPU_REGS_RSP, (unsigned long *)vcpu-arch.regs_dirty)) + vmcs_writel(GUEST_RSP, vcpu-arch.regs[VCPU_REGS_RSP]); + if (test_bit(VCPU_REGS_RIP, (unsigned long *)vcpu-arch.regs_dirty)) + vmcs_writel(GUEST_RIP, vcpu-arch.regs[VCPU_REGS_RIP]); + + /* When single-stepping over STI and MOV SS, we must clear the +* corresponding interruptibility bits in the guest state. Otherwise +* vmentry fails as it then expects bit 14 (BS) in pending debug +* exceptions being set, but that's not correct for the guest debugging +* case. 
*/ + if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) + vmx_set_interrupt_shadow(vcpu, 0); + + asm( /* Store host registers */ push %%Rdx; push %%Rbp; push %%Rcx \n\t @@ -4009,35 +4025,6 @@ static void noinline do_vmx_vcpu_run(struct kvm_vcpu *vcpu) , r8, r9, r10, r11, r12, r13, r14, r15 #endif ); -} - -static void vmx_vcpu_run(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx = to_vmx(vcpu); - - /* Record the guest's net vcpu time for enforced NMI injections. */ - if (unlikely(!cpu_has_virtual_nmis() vmx-soft_vnmi_blocked)) - vmx-entry_time = ktime_get(); - - /* Don't enter VMX if guest state is invalid, let the exit handler - start emulation until we arrive back to a valid state */ - if (vmx-emulation_required emulate_invalid_guest_state) - return; - - if (test_bit(VCPU_REGS_RSP, (unsigned long *)vcpu-arch.regs_dirty)) - vmcs_writel(GUEST_RSP, vcpu-arch.regs[VCPU_REGS_RSP]); - if (test_bit(VCPU_REGS_RIP, (unsigned long *)vcpu-arch.regs_dirty)) - vmcs_writel(GUEST_RIP, vcpu-arch.regs[VCPU_REGS_RIP]); - - /* When single-stepping over STI and MOV SS, we must clear the -* corresponding interruptibility bits in the guest state. Otherwise -* vmentry fails as it then expects bit 14 (BS) in pending debug -* exceptions being set, but that's not correct for the guest debugging -* case. */ - if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) - vmx_set_interrupt_shadow(vcpu, 0); - - do_vmx_vcpu_run(vcpu); vcpu-arch.regs_avail = ~((1 VCPU_REGS_RIP) | (1 VCPU_REGS_RSP) | (1 VCPU_EXREG_PDPTR)); -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: VMX: Inform user about INTEL_TXT dependency
From: Shane Wang shane.w...@intel.com

Inform user to either disable TXT in the BIOS or do TXT launch with tboot
before enabling KVM since some BIOSes do not set
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled.

Signed-off-by: Shane Wang shane.w...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0badeac..a9ad174 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1305,8 +1305,11 @@ static __init int vmx_disabled_by_bios(void)
 			&& tboot_enabled())
 			return 1;
 		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
-			&& !tboot_enabled())
+			&& !tboot_enabled()) {
+			printk(KERN_WARNING "kvm: disable TXT in the BIOS or "
+				"activate TXT before enabling KVM\n");
 			return 1;
+		}
 	}
 
 	return 0;
[COMMIT master] KVM: Add instruction-set-specific exit qualifications to kvm_exit trace
From: Avi Kivity a...@redhat.com

The exit reason alone is insufficient to understand exactly why an exit
occurred; add ISA-specific trace parameters for additional information.

Because fetching these parameters is expensive on vmx, and because
ordinary trace arguments are evaluated even if tracing is disabled, we
fetch the parameters via a callback instead of passing them as
traditional trace arguments.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b04c0fa..54e42c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -594,6 +594,7 @@ struct kvm_x86_ops {
 
 	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
+	void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
 	const struct trace_print_flags *exit_reasons_str;
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b83954e..2fd2f4d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2974,6 +2974,14 @@ void dump_vmcb(struct kvm_vcpu *vcpu)
 
 }
 
+static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
+{
+	struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control;
+
+	*info1 = control->exit_info_1;
+	*info2 = control->exit_info_2;
+}
+
 static int handle_exit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3678,7 +3686,9 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.get_tdp_level = get_npt_level,
 	.get_mt_mask = svm_get_mt_mask,
 
+	.get_exit_info = svm_get_exit_info,
 	.exit_reasons_str = svm_exit_reasons_str,
+
 	.get_lpage_level = svm_get_lpage_level,
 
 	.cpuid_update = svm_cpuid_update,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 1061022..1357d7c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -192,18 +192,22 @@ TRACE_EVENT(kvm_exit,
 		__field(	unsigned int,	exit_reason	)
 		__field(	unsigned long,	guest_rip	)
 		__field(	u32,		isa		)
+		__field(	u64,		info1		)
+		__field(	u64,		info2		)
 	),
 
 	TP_fast_assign(
 		__entry->exit_reason	= exit_reason;
 		__entry->guest_rip	= kvm_rip_read(vcpu);
 		__entry->isa		= isa;
+		kvm_x86_ops->get_exit_info(vcpu, &__entry->info1,
+					   &__entry->info2);
 	),
 
-	TP_printk("reason %s rip 0x%lx",
+	TP_printk("reason %s rip 0x%lx info %llx %llx",
 		 ftrace_print_symbols_seq(p, __entry->exit_reason,
 					  kvm_x86_ops->exit_reasons_str),
-		 __entry->guest_rip)
+		 __entry->guest_rip, __entry->info1, __entry->info2)
 );
 
 /*
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4e2b8f3..caa967e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3690,6 +3690,12 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 static const int kvm_vmx_max_exit_handlers =
 	ARRAY_SIZE(kvm_vmx_exit_handlers);
 
+static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
+{
+	*info1 = vmcs_readl(EXIT_QUALIFICATION);
+	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
+}
+
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -4339,7 +4345,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.get_tdp_level = get_ept_level,
 	.get_mt_mask = vmx_get_mt_mask,
 
+	.get_exit_info = vmx_get_exit_info,
 	.exit_reasons_str = vmx_exit_reasons_str,
+
 	.get_lpage_level = vmx_get_lpage_level,
 
 	.cpuid_update = vmx_cpuid_update,
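The reasoning behind the callback can be shown with a small userspace sketch. It assumes nothing about the real tracing infrastructure: the toy_* types, the tracing_enabled flag and the fetch counter are all hypothetical and exist only to make the effect visible. The point it illustrates is that the (expensive) fetch runs inside the trace path, so a disabled tracepoint never pays for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vcpu: two values that are costly to read, plus a counter
 * recording how often they were actually fetched. */
struct toy_vcpu {
    uint64_t qualification;
    uint64_t intr_info;
    unsigned fetches;
};

/* Per-ISA operations table with the exit-info callback. */
struct toy_isa_ops {
    void (*get_exit_info)(struct toy_vcpu *vcpu,
                          uint64_t *info1, uint64_t *info2);
};

struct toy_trace_entry {
    unsigned exit_reason;
    uint64_t info1;
    uint64_t info2;
};

static void toy_get_exit_info(struct toy_vcpu *vcpu,
                              uint64_t *info1, uint64_t *info2)
{
    vcpu->fetches++;            /* stands in for the expensive reads */
    *info1 = vcpu->qualification;
    *info2 = vcpu->intr_info;
}

/* Record an exit.  The callback is only invoked once we know the entry
 * will really be written; returns whether an entry was recorded. */
static bool toy_trace_exit(bool tracing_enabled,
                           const struct toy_isa_ops *ops,
                           struct toy_vcpu *vcpu, unsigned exit_reason,
                           struct toy_trace_entry *entry)
{
    assert(ops && ops->get_exit_info);
    if (!tracing_enabled)
        return false;
    entry->exit_reason = exit_reason;
    ops->get_exit_info(vcpu, &entry->info1, &entry->info2);
    return true;
}
```

Had info1 and info2 been plain arguments of toy_trace_exit(), the caller would have had to compute them before the enabled check, which is the cost the commit message wants to avoid.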
[COMMIT master] KVM: Record instruction set in kvm_exit tracepoint
From: Avi Kivity a...@redhat.com

exit_reason's meaning depends on the instruction set; record it so a trace
taken on one machine can be interpreted on another.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c6a7798..b83954e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2980,7 +2980,7 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 	struct kvm_run *kvm_run = vcpu->run;
 	u32 exit_code = svm->vmcb->control.exit_code;
 
-	trace_kvm_exit(exit_code, vcpu);
+	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
 	if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR0_MASK))
 		vcpu->arch.cr0 = svm->vmcb->save.cr0;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index a6544b8..1061022 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -178,21 +178,26 @@ TRACE_EVENT(kvm_apic,
 #define trace_kvm_apic_read(reg, val)		trace_kvm_apic(0, reg, val)
 #define trace_kvm_apic_write(reg, val)		trace_kvm_apic(1, reg, val)
 
+#define KVM_ISA_VMX   1
+#define KVM_ISA_SVM   2
+
 /*
  * Tracepoint for kvm guest exit:
  */
 TRACE_EVENT(kvm_exit,
-	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),
-	TP_ARGS(exit_reason, vcpu),
+	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu, u32 isa),
+	TP_ARGS(exit_reason, vcpu, isa),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	exit_reason	)
 		__field(	unsigned long,	guest_rip	)
+		__field(	u32,		isa		)
 	),
 
 	TP_fast_assign(
 		__entry->exit_reason	= exit_reason;
 		__entry->guest_rip	= kvm_rip_read(vcpu);
+		__entry->isa		= isa;
 	),
 
 	TP_printk("reason %s rip 0x%lx",
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 58e5913..4e2b8f3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3700,7 +3700,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	u32 exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
 
-	trace_kvm_exit(exit_reason, vcpu);
+	trace_kvm_exit(exit_reason, vcpu, KVM_ISA_VMX);
 
 	/* If guest state is invalid, start emulating */
 	if (vmx->emulation_required && emulate_invalid_guest_state)
[COMMIT master] KVM: fast-path msi injection with irqfd
From: Michael S. Tsirkin m...@redhat.com Store irq routing table pointer in the irqfd object, and use that to inject MSI directly without bouncing out to a kernel thread. While we touch this structure, rearrange irqfd fields to make fastpath better packed for better cache utilization. This also adds some comments about locking rules and rcu usage in code. Some notes on the design: - Use pointer into the rt instead of copying an entry, to make it possible to use rcu, thus side-stepping locking complexities. We also save some memory this way. - Old workqueue code is still used for level irqs. I don't think we DTRT with level anyway, however, it seems easier to keep the code around as it has been thought through and debugged, and fix level later than rip out and re-instate it later. Signed-off-by: Michael S. Tsirkin m...@redhat.com Acked-by: Marcelo Tosatti mtosa...@redhat.com Acked-by: Gregory Haskins ghask...@novell.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4bd663d..f17beae 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -17,6 +17,7 @@ #include linux/preempt.h #include linux/msi.h #include linux/slab.h +#include linux/rcupdate.h #include asm/signal.h #include linux/kvm.h @@ -240,6 +241,10 @@ struct kvm { struct mutex irq_lock; #ifdef CONFIG_HAVE_KVM_IRQCHIP + /* +* Update side is protected by irq_lock and, +* if configured, irqfds.lock. 
+*/ struct kvm_irq_routing_table __rcu *irq_routing; struct hlist_head mask_notifier_list; struct hlist_head irq_ack_notifier_list; @@ -511,6 +516,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, unsigned long *deliver_bitmask); #endif int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, + int irq_source_id, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); @@ -652,17 +659,26 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} void kvm_eventfd_init(struct kvm *kvm); int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags); void kvm_irqfd_release(struct kvm *kvm); +void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *); int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args); #else static inline void kvm_eventfd_init(struct kvm *kvm) {} + static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) { return -EINVAL; } static inline void kvm_irqfd_release(struct kvm *kvm) {} + +static inline void kvm_irq_routing_update(struct kvm *kvm, + struct kvm_irq_routing_table *irq_rt) +{ + rcu_assign_pointer(kvm-irq_routing, irq_rt); +} + static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index c1f1e3c..2ca4535 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -44,14 +44,19 @@ */ struct _irqfd { - struct kvm *kvm; - struct eventfd_ctx *eventfd; - int gsi; - struct list_head list; - poll_tablept; - wait_queue_t wait; - struct work_structinject; - struct work_structshutdown; + /* Used for MSI fast-path */ + struct kvm *kvm; + wait_queue_t wait; + /* Update side is protected by irqfds.lock */ + struct kvm_kernel_irq_routing_entry __rcu *irq_entry; + /* Used for level IRQ fast-path */ 
+ int gsi; + struct work_struct inject; + /* Used for setup/shutdown */ + struct eventfd_ctx *eventfd; + struct list_head list; + poll_table pt; + struct work_struct shutdown; }; static struct workqueue_struct *irqfd_cleanup_wq; @@ -125,14 +130,22 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) { struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); unsigned long flags = (unsigned long)key; + struct kvm_kernel_irq_routing_entry *irq; + struct kvm *kvm = irqfd-kvm; - if (flags POLLIN) + if (flags POLLIN) { + rcu_read_lock(); + irq = rcu_dereference(irqfd-irq_entry); /* An event has been signaled, inject an interrupt */ - schedule_work(irqfd-inject); + if (irq) + kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); + else + schedule_work(irqfd-inject); + rcu_read_unlock(); + } if (flags POLLHUP)
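The fast-path decision described in the commit message (inject directly when a cached MSI route is published, otherwise fall back to the work item) can be approximated in userspace. The sketch below is an analogy under loud assumptions: it uses C11 atomics in place of rcu_assign_pointer()/rcu_dereference(), it has no grace period or reclamation at all, and the toy_* names and counters are hypothetical. It only shows the publish/read pattern, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical routing entry; 'injected' counts direct injections. */
struct toy_routing_entry {
    unsigned gsi;
    unsigned injected;
};

struct toy_irqfd {
    /* Non-NULL only while the GSI maps to a single MSI route. */
    _Atomic(struct toy_routing_entry *) irq_entry;
    /* How many signals were deferred to the slow path. */
    unsigned deferred;
};

/* Update side: publish a new cached route, or NULL to disable the
 * fast path (for example when the GSI is a level interrupt). */
static void toy_irqfd_update(struct toy_irqfd *irqfd,
                             struct toy_routing_entry *entry)
{
    assert(irqfd != NULL);
    atomic_store_explicit(&irqfd->irq_entry, entry, memory_order_release);
}

/* Signal side: returns 1 if the interrupt was injected directly and
 * 0 if it had to be deferred (schedule_work() in the patch). */
static int toy_irqfd_signal(struct toy_irqfd *irqfd)
{
    struct toy_routing_entry *entry;

    assert(irqfd != NULL);
    entry = atomic_load_explicit(&irqfd->irq_entry, memory_order_acquire);
    if (entry) {
        entry->injected++;
        return 1;
    }
    irqfd->deferred++;
    return 0;
}
```

Keeping a pointer into the table rather than a copy is what lets the update side swap routes with a single pointer store, which matches the design note about side-stepping locking on the signal path.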
[COMMIT master] apic: test nmi-after-sti
From: Avi Kivity a...@redhat.com While not required by the spec, some guests (Linux) rely on nmi being blocked by an IF-enabling sti. Add a unit test for this condition. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/x86/apic.c b/x86/apic.c index 165f820..2207040 100644 --- a/x86/apic.c +++ b/x86/apic.c @@ -1,6 +1,7 @@ #include libcflat.h #include apic.h #include vm.h +#include smp.h typedef struct { unsigned short offset0; @@ -274,9 +275,74 @@ static void test_ioapic_simultaneous(void) g_66 g_78 g_66_after_78 g_66_rip == g_78_rip); } +volatile int nmi_counter_private, nmi_counter, nmi_hlt_counter, sti_loop_active; + +void sti_nop(char *p) +{ +asm volatile ( + .globl post_sti \n\t + sti \n + /* + * vmx won't exit on external interrupt if blocked-by-sti, + * so give it a reason to exit by accessing an unmapped page. + */ + post_sti: testb $0, %0 \n\t + nop \n\t + cli + : : m(*p) + ); +nmi_counter = nmi_counter_private; +} + +static void sti_loop(void *ignore) +{ +unsigned k = 0; + +while (sti_loop_active) { + sti_nop((char *)(ulong)((k++ * 4096) % (128 * 1024 * 1024))); +} +} + +static void nmi_handler(isr_regs_t *regs) +{ +extern void post_sti(void); +++nmi_counter_private; +nmi_hlt_counter += regs-rip == (ulong)post_sti; +} + +static void update_cr3(void *cr3) +{ +write_cr3((ulong)cr3); +} + +static void test_sti_nmi(void) +{ +unsigned old_counter; + +if (cpu_count() 2) { + return; +} + +set_idt_entry(2, nmi_handler); +on_cpu(1, update_cr3, (void *)read_cr3()); + +sti_loop_active = 1; +on_cpu_async(1, sti_loop, 0); +while (nmi_counter 3) { + old_counter = nmi_counter; + apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_NMI | APIC_INT_ASSERT, 1); + while (nmi_counter == old_counter) { + ; + } +} +sti_loop_active = 0; +report(nmi-after-sti, nmi_hlt_counter == 0); +} + int main() { setup_vm(); +smp_init(); test_lapic_existence(); @@ -288,6 +354,7 @@ int main() test_ioapic_intr(); test_ioapic_simultaneous(); 
+    test_sti_nmi();
     printf("\nsummary: %d tests, %d failures\n", g_tests, g_fail);
[COMMIT master] apic: use boot idt instead of a locally allocated idt
From: Avi Kivity a...@redhat.com

This allows the smp support, which uses the boot idt, to work.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/x86/apic.c b/x86/apic.c
index 48fa0f7..165f820 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -89,7 +89,7 @@ asm (
 #endif
     );
 
-static idt_entry_t idt[256];
+static idt_entry_t *idt = 0;
 
 static int g_fail;
 static int g_tests;
@@ -127,19 +127,6 @@ void test_enable_x2apic(void)
     }
 }
 
-static void init_idt(void)
-{
-    struct {
-        u16 limit;
-        ulong idt;
-    } __attribute__((packed)) idt_ptr = {
-        sizeof(idt_entry_t) * 256 - 1,
-        (ulong)idt,
-    };
-
-    asm volatile("lidt %0" : : "m"(idt_ptr));
-}
-
 static void set_idt_entry(unsigned vec, void (*func)(isr_regs_t *regs))
 {
     u8 *thunk = vmalloc(50);
@@ -296,7 +283,6 @@ int main()
     mask_pic_interrupts();
     enable_apic();
     test_enable_x2apic();
-    init_idt();
 
     test_self_ipi();
 
Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
On 11/23/2010 08:41 AM, Avi Kivity wrote:
On 11/23/2010 01:00 AM, Anthony Liguori wrote:

qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT. Instead of teaching them to respond to these signals, introduce monitor commands that stop and start individual vcpus.

The purpose of these commands is to implement CPU hard limits using an external tool that watches the CPU consumption and stops the CPU as appropriate.

Why not use cgroup for that?

The monitor commands provide a more elegant solution than signals because they ensure that a stopped vcpu isn't holding the qemu_mutex.

From signal(7):

  The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Perhaps this is a bug in kvm?

If we could catch SIGSTOP, then it would be easy to unblock it only while running in guest context. It would then stop on exit to userspace.

Using monitor commands is fairly heavyweight for something as high frequency as this. What control period do you see people using?

Maybe we should define USR1 for vcpu start/stop.

What happens if one vcpu is stopped while another is running? Spin loops, synchronous IPIs will take forever. Maybe we need to stop the entire process.
Re: Mask bit support's API
On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
On 11/23/2010 08:35 AM, Yang, Sheng wrote:
On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
On 11/23/2010 08:09 AM, Yang, Sheng wrote:

Hi Avi,

I've proposed the following API for mask bit support.

The main point is, QEmu can know which entries are enabled (by pci_enable_msix()). And for enabled entries, the kernel owns them, including MSI data/address and mask bit (routing table and mask bitmap). QEmu should use the KVM_GET_MSIX_ENTRY ioctl to get them (and it can sync with them if it wants to do so).

Before entries are enabled, QEmu can still use its own MSI table (because the kernel doesn't hold this kind of information, and it's unnecessary for the kernel).

The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu wants to query an entry that doesn't exist in the kernel - or we can simply return -EINVAL for it.

I suppose it would be rare for QEmu to use this interface to get the context of an entry (the only case I can think of is when MSI-X is disabled and QEmu needs to sync the context), so performance should not be an issue.

What's your opinion?

#define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)

Need SET_MSIX_ENTRY for live migration as well.

Currently we don't support LM with VT-d...

Isn't this work useful for virtio as well?

Yeah, but it won't be included in this patchset.

#define KVM_UPDATE_MSIX_MMIO _IOW(KVMIO, 0x7e, struct kvm_msix_mmio)

#define KVM_MSIX_TYPE_ASSIGNED_DEV 1

#define KVM_MSIX_FLAG_MASKBIT (1 << 0)
#define KVM_MSIX_FLAG_QUERY_MASKBIT (1 << 0)
#define KVM_MSIX_FLAG_ENTRY (1 << 1)
#define KVM_MSIX_FLAG_QUERY_ENTRY (1 << 1)

Why is there a need for the flag? If we simply get/set entire entries, that includes the mask bits?

We still want QEmu to cover the part of the entries which hasn't been enabled yet (which won't exist in the routing table), but the kernel would cover all mask bits regardless of whether the entry is enabled. So QEmu can query any entry to check the mask bit, but not address/data.

Don't understand. If we support reading/writing entire entries, that works for both enabled and disabled entries?

What about the pending bits?

We didn't cover it here - and it's in another MMIO space (PBA). Of course we can add more flags for it later.

When an entry is masked, we need to set the pending bit for it somewhere. I guess this is broken in the existing code (without your patches)?

Even with my patch, we don't support the pending bit. It would always return 0 now. What we are supposed to do (after my patch is checked in) is to check the IRQ_PENDING flag of irq_desc->status (if the entry is masked), and return the result to userspace. That would involve some core changes, like exporting irq_to_desc(). I don't think it would be accepted soon, so I would push mask bit first.

Also need a new exit reason to tell userspace that an msix entry has changed, so userspace can update mappings.

I think we don't need it. Whenever userspace wants to get a mapping which is an enabled MSI-X entry, it can check it with the API above (which is quite rare, because the kernel would handle all of them when the guest is accessing them). If it's a disabled entry, the context inside the userspace MMIO record is the correct one (and the only one). The only place I think QEmu needs to sync is when MSI-X is about to be disabled; QEmu needs to update its own MMIO record.

So in-kernel handling of mmio would be decided per entry? I'm trying to simplify this, and simplest thing is - all or nothing.

So you would like to handle all MSI-X MMIO in kernel?

--
regards
Yang, Sheng
Re: buildbot for kvm.git
On 11/23/2010 02:11 AM, Daniel Gollub wrote:
On Monday, November 22, 2010 10:37:05 pm Avi Kivity wrote:
On 11/11/2010 11:22 AM, Daniel Gollub wrote:
On Thursday, November 11, 2010 02:31:06 am Avi Kivity wrote:

Daniel, the buildbot has been fairly effective in keeping qemu-kvm.git building. I'd like to extend that to kvm.git, especially for non-x86 architectures.

[...]

Can you help with this?

Sure. I'll look into that next week.

Daniel, any news about this?

Currently I'm applying your recipe to the buildmaster configuration. Besides that, the buildmaster and a small x86_64 buildslave have been set up and are available at: http://buildbot.b1-systems.de/kvm/

Once I'm done with the buildmaster configuration (and some more testing), kvm.git continuous build testing could be ready within the next days. (I'm travelling right now, but that shouldn't block me from getting this done.)

If you like, you can already set up the git post-receive hook in the kvm.git repo to trigger the buildmaster. As for qemu-kvm.git, you need to copy git_buildbot.py (preferably a copy of the one which is used for qemu-kvm.git) and change the master port to 9991 ( master = :9991 ).

In hooks/post-receive you add:

  /path/to/git_buildbot.py $1 $2 $3

Thanks, done. Will you set up crossbuilders for ppc/ia64/s390, or will I contribute a builder?

--
error compiling committee.c: too many arguments to function
Re: trace_printk() support in trace-cmd
On 11/16/2010 05:12 PM, Steven Rostedt wrote:

Hmm, I'll try it out on the latest kernel. Would you be able to upload the trace.dat that does not work someplace where I can get it? I'd like to take a look at it. If you don't have a place to put it, I could give you access to my box, and you can scp it there.

Hmm, I still can not reproduce. But as a workaround, here's what you can do for now. Instead of using trace_printk() use:

  __trace_printk(_THIS_IP_, format, args);

This will force the snprintf into the buffer and skip the bprintk trick to post process at read time.

I see a trace_printk() commit in trace-cmd.git. Is that related? If not, I'll work on getting a small sample of the problem.

--
error compiling committee.c: too many arguments to function
[PATCH trace-cmd 0/3] kvm plugin updates
Currently the kvm plugin only decodes vmx exit reasons; the first patch in
this series adds support for the svm instruction set. Second patch fixes a
typo. A couple of fields were added to the kvm_exit tracepoint; the third
patch prints them out.

Avi Kivity (3):
  kvm: parse svm exit reason
  kvm: fix typo UNKOWN
  kvm: display the new kvm_exit info1 and info2 fields, if available

 plugin_kvm.c |  121 ++
 1 files changed, 113 insertions(+), 8 deletions(-)
[PATCH trace-cmd 2/3] kvm: fix typo UNKOWN
Signed-off-by: Avi Kivity a...@redhat.com
---
 plugin_kvm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index c8e8b8c..659b27f 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -236,7 +236,7 @@ static const char *find_exit_reason(unsigned isa, int val)
 			break;
 	if (strings[i].str)
 		return strings[i].str;
-	return "UNKOWN";
+	return "UNKNOWN";
 }
 
 static int kvm_exit_handler(struct trace_seq *s, struct record *record,
--
1.7.1
[PATCH trace-cmd 1/3] kvm: parse svm exit reason
svm exit reasons use different codes than vmx; use the new isa trace field to select the instruction set and display the strings accordingly.

Signed-off-by: Avi Kivity a...@redhat.com
---
 plugin_kvm.c | 114 ++
 1 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index 724143d..c8e8b8c 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -120,6 +120,80 @@ static const char *disassemble(unsigned char *insn, int len, uint64_t rip,
 	_ER(EPT_MISCONFIG, 49) \
 	_ER(WBINVD, 54)
 
+#define SVM_EXIT_REASONS \
+	_ER(EXIT_READ_CR0, 0x000) \
+	_ER(EXIT_READ_CR3, 0x003) \
+	_ER(EXIT_READ_CR4, 0x004) \
+	_ER(EXIT_READ_CR8, 0x008) \
+	_ER(EXIT_WRITE_CR0, 0x010) \
+	_ER(EXIT_WRITE_CR3, 0x013) \
+	_ER(EXIT_WRITE_CR4, 0x014) \
+	_ER(EXIT_WRITE_CR8, 0x018) \
+	_ER(EXIT_READ_DR0, 0x020) \
+	_ER(EXIT_READ_DR1, 0x021) \
+	_ER(EXIT_READ_DR2, 0x022) \
+	_ER(EXIT_READ_DR3, 0x023) \
+	_ER(EXIT_READ_DR4, 0x024) \
+	_ER(EXIT_READ_DR5, 0x025) \
+	_ER(EXIT_READ_DR6, 0x026) \
+	_ER(EXIT_READ_DR7, 0x027) \
+	_ER(EXIT_WRITE_DR0, 0x030) \
+	_ER(EXIT_WRITE_DR1, 0x031) \
+	_ER(EXIT_WRITE_DR2, 0x032) \
+	_ER(EXIT_WRITE_DR3, 0x033) \
+	_ER(EXIT_WRITE_DR4, 0x034) \
+	_ER(EXIT_WRITE_DR5, 0x035) \
+	_ER(EXIT_WRITE_DR6, 0x036) \
+	_ER(EXIT_WRITE_DR7, 0x037) \
+	_ER(EXIT_EXCP_BASE, 0x040) \
+	_ER(EXIT_INTR, 0x060) \
+	_ER(EXIT_NMI, 0x061) \
+	_ER(EXIT_SMI, 0x062) \
+	_ER(EXIT_INIT, 0x063) \
+	_ER(EXIT_VINTR, 0x064) \
+	_ER(EXIT_CR0_SEL_WRITE, 0x065) \
+	_ER(EXIT_IDTR_READ, 0x066) \
+	_ER(EXIT_GDTR_READ, 0x067) \
+	_ER(EXIT_LDTR_READ, 0x068) \
+	_ER(EXIT_TR_READ, 0x069) \
+	_ER(EXIT_IDTR_WRITE, 0x06a) \
+	_ER(EXIT_GDTR_WRITE, 0x06b) \
+	_ER(EXIT_LDTR_WRITE, 0x06c) \
+	_ER(EXIT_TR_WRITE, 0x06d) \
+	_ER(EXIT_RDTSC, 0x06e) \
+	_ER(EXIT_RDPMC, 0x06f) \
+	_ER(EXIT_PUSHF, 0x070) \
+	_ER(EXIT_POPF, 0x071) \
+	_ER(EXIT_CPUID, 0x072) \
+	_ER(EXIT_RSM, 0x073) \
+	_ER(EXIT_IRET, 0x074) \
+	_ER(EXIT_SWINT, 0x075) \
+	_ER(EXIT_INVD, 0x076) \
+	_ER(EXIT_PAUSE, 0x077) \
+	_ER(EXIT_HLT, 0x078) \
+	_ER(EXIT_INVLPG, 0x079) \
+	_ER(EXIT_INVLPGA, 0x07a) \
+	_ER(EXIT_IOIO, 0x07b) \
+	_ER(EXIT_MSR, 0x07c) \
+	_ER(EXIT_TASK_SWITCH, 0x07d) \
+	_ER(EXIT_FERR_FREEZE, 0x07e) \
+	_ER(EXIT_SHUTDOWN, 0x07f) \
+	_ER(EXIT_VMRUN, 0x080) \
+	_ER(EXIT_VMMCALL, 0x081) \
+	_ER(EXIT_VMLOAD, 0x082) \
+	_ER(EXIT_VMSAVE, 0x083) \
+	_ER(EXIT_STGI, 0x084) \
+	_ER(EXIT_CLGI, 0x085) \
+	_ER(EXIT_SKINIT, 0x086) \
+	_ER(EXIT_RDTSCP, 0x087) \
+	_ER(EXIT_ICEBP, 0x088) \
+	_ER(EXIT_WBINVD, 0x089) \
+	_ER(EXIT_MONITOR, 0x08a) \
+	_ER(EXIT_MWAIT, 0x08b) \
+	_ER(EXIT_MWAIT_COND, 0x08c) \
+	_ER(EXIT_NPF, 0x400) \
+	_ER(EXIT_ERR, -1)
+
 #define _ER(reason, val) { #reason, val },
 struct str_values {
 	const char *str;
@@ -131,27 +205,53 @@ static struct str_values vmx_exit_reasons[] = {
 	{ NULL, -1}
 };
 
-static const char *find_vmx_reason(int val)
+static struct str_values svm_exit_reasons[] = {
+	SVM_EXIT_REASONS
+	{ NULL, -1}
+};
+
+static struct isa_exit_reasons {
+	unsigned isa;
+	struct str_values *strings;
+} isa_exit_reasons[] = {
+	{ .isa = 1, .strings = vmx_exit_reasons },
+	{ .isa = 2, .strings = svm_exit_reasons },
+	{ }
+};
+
+static const char *find_exit_reason(unsigned isa, int val)
 {
+	struct str_values *strings = NULL;
 	int i;
 
-	for (i = 0; vmx_exit_reasons[i].val >= 0; i++)
-
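The two-level lookup this patch introduces (pick a table by isa, then scan it for the exit code) can be sketched as below. This is only an illustration under assumptions: the tables are tiny subsets of the ones in the patch, the names lookup_exit_reason, isa_tables, vmx_reasons and svm_reasons are hypothetical, and the scan stops on a NULL name rather than on a negative value so that the -1 code of EXIT_ERR remains matchable. The archived patch is cut off before its own loop, so this is not necessarily how the patch does it.

```c
#include <stddef.h>

/* One name/value pair; a NULL name terminates a table. */
struct str_values {
	const char *str;
	int val;
};

/* Small illustrative subsets; the tables in the patch are much longer. */
static const struct str_values vmx_reasons[] = {
	{ "EPT_MISCONFIG", 49 },
	{ "WBINVD", 54 },
	{ NULL, -1 }
};

static const struct str_values svm_reasons[] = {
	{ "EXIT_HLT", 0x078 },
	{ "EXIT_NPF", 0x400 },
	{ "EXIT_ERR", -1 },
	{ NULL, -1 }
};

/* isa 1 selects the VMX table and isa 2 the SVM table, as in the patch. */
struct isa_table {
	unsigned isa;
	const struct str_values *strings;
};

static const struct isa_table isa_tables[] = {
	{ 1, vmx_reasons },
	{ 2, svm_reasons },
	{ 0, NULL }
};

/* Hypothetical helper: map (isa, exit code) to a printable name. */
static const char *lookup_exit_reason(unsigned isa, int val)
{
	const struct str_values *strings = NULL;
	int i;

	/* Step 1: find the string table for this instruction set. */
	for (i = 0; isa_tables[i].strings; i++) {
		if (isa_tables[i].isa == isa) {
			strings = isa_tables[i].strings;
			break;
		}
	}
	if (!strings)
		return "UNKNOWN-ISA";

	/* Step 2: linear scan of that table for the exit code. */
	for (i = 0; strings[i].str; i++)
		if (strings[i].val == val)
			return strings[i].str;

	return "UNKNOWN";
}
```

Keeping one table per instruction set means the same numeric code can carry different names on VMX and SVM without any special cases in the handler.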
[PATCH trace-cmd 3/3] kvm: display the new kvm_exit info1 and info2 fields, if available
Signed-off-by: Avi Kivity a...@redhat.com
---
 plugin_kvm.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index 659b27f..c1cb2e4 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -244,6 +244,7 @@ static int kvm_exit_handler(struct trace_seq *s, struct record *record,
 {
 	unsigned long long isa;
 	unsigned long long val;
+	unsigned long long info1 = 0, info2 = 0;
 
 	if (pevent_get_field_val(s, event, "exit_reason", record, &val, 1) < 0)
 		return -1;
@@ -255,6 +256,10 @@ static int kvm_exit_handler(struct trace_seq *s, struct record *record,
 
 	pevent_print_num_field(s, "rip 0x%lx", event, "guest_rip", record, 1);
 
+	if (pevent_get_field_val(s, event, "info1", record, &info1, 1) >= 0
+	    && pevent_get_field_val(s, event, "info2", record, &info2, 1) >= 0)
+		trace_seq_printf(s, "info %llx %llx\n", info1, info2);
+
 	return 0;
 }
 
-- 
1.7.1
Re: trace_printk() support in trace-cmd
On 11/16/2010 05:13 PM, Steven Rostedt wrote:
> BTW, what does /debug/tracing/printk_formats show?

Empty.

-- 
error compiling committee.c: too many arguments to function
Re: Performance test result between per-vhost kthread disable and enable
On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
> I test the performance between per-vhost kthread disable and enable.
>
> Test method:
> Send the same traffic load between per-vhost kthread disable and enable, and compare the cpu rate of host os.
> I run five vm on kvm, each of them have five nic.
> the vhost version which per-vhost kthread disable we used is rhel6 beta 2(2.6.32.60).
> the vhost version which per-vhost kthread enable we used is rhel6 (2.6.32-71).

At this point, I'd suggest testing vhost-net on the upstream kernel, not on rhel kernels. The change that introduced per-device threads is:
c23f3445e68e1db0e74099f264bc5ff5d55ebdeb

> Test result:
> with per-vhost kthread disable, the cpu rate of host os is 110%.
> with per-vhost kthread enable, the cpu rate of host os is 130%.

Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with vhost-net.

> In 2.6.32.60,the whole system only have a kthread.
> [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
> root 973 2 0 Nov22 ? 00:00:00 [vhost]
>
> In 2.6.32.71,the whole system have 25 kthread.
> [r...@kvm-4slot ~]# ps -ef | grep vhost-
> root 12896 2 0 10:26 ? 00:00:00 [vhost-12842]
> root 12897 2 0 10:26 ? 00:00:00 [vhost-12842]
> root 12898 2 0 10:26 ? 00:00:00 [vhost-12842]
> root 12899 2 0 10:26 ? 00:00:00 [vhost-12842]
> root 12900 2 0 10:26 ? 00:00:00 [vhost-12842]
> root 13022 2 0 10:26 ? 00:00:00 [vhost-12981]
> root 13023 2 0 10:26 ? 00:00:00 [vhost-12981]
> root 13024 2 0 10:26 ? 00:00:00 [vhost-12981]
> root 13025 2 0 10:26 ? 00:00:00 [vhost-12981]
> root 13026 2 0 10:26 ? 00:00:00 [vhost-12981]
> root 13146 2 0 10:26 ? 00:00:00 [vhost-13088]
> root 13147 2 0 10:26 ? 00:00:00 [vhost-13088]
> root 13148 2 0 10:26 ? 00:00:00 [vhost-13088]
> root 13149 2 0 10:26 ? 00:00:00 [vhost-13088]
> root 13150 2 0 10:26 ? 00:00:00 [vhost-13088]
> ...
>
> Code difference:
> In 2.6.32.60,in function vhost_init, create the kthread for vhost.
> vhost_workqueue = create_singlethread_workqueue("vhost");
>
> In 2.6.32.71,in function vhost_dev_set_owner, create the kthread for each nic interface.
> dev->wq = create_singlethread_workqueue(vhost_name);
>
> Conclusion:
> with per-vhost kthread enable, the system can more throughput.
> but deal the same traffic load with per-vhost kthread enable, it waste more cpu resource.
>
> In my application scene, the cpu resource is more important, and one kthread for deal with traffic load is enough.
>
> So i think we should add a param to control this.
> for the CPU-bound system, this param disable per-vhost kthread.
> for the I/O-bound system, this param enable per-vhost kthread.
> the default value of this param is enable.
>
> If my opinion is right, i will give a patch for this.

Let's try to figure out what the issue is, first.

-- 
MST
Re: Mask bit support's API
On Tue, Nov 23, 2010 at 02:09:52PM +0800, Yang, Sheng wrote:
> Hi Avi,
>
> I've purposed the following API for mask bit support. The main point is, QEmu can know which entries are enabled(by pci_enable_msix()).

Unfortunately, it can't, I think, unless all your guests are Linux. "Enabled entries" is a Linux kernel concept. The MSI-X spec only tells you which entries are masked and which are unmasked.

-- 
MST
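To illustrate the point about masked versus enabled entries, here is a minimal sketch of one MSI-X table entry as the PCI specification lays it out: four 32-bit words, with the per-vector mask bit in bit 0 of the last word. The struct and helper names are hypothetical and chosen for this example only. Note that there is no "enabled" field to read; the only per-entry state visible in the table is masked or unmasked.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 16-byte MSI-X table entry (hypothetical name, layout per the PCI spec). */
struct msix_entry_sketch {
	uint32_t addr_lo;     /* Message Address, lower 32 bits */
	uint32_t addr_hi;     /* Message Upper Address */
	uint32_t data;        /* Message Data */
	uint32_t vector_ctrl; /* Vector Control; bit 0 is the mask bit */
};

/* Hypothetical constant for this sketch: mask bit within Vector Control. */
#define SKETCH_MSIX_CTRL_MASKBIT 0x1u

/* True when the vector is masked, i.e. the device must not deliver it. */
static bool msix_entry_masked(const struct msix_entry_sketch *e)
{
	return (e->vector_ctrl & SKETCH_MSIX_CTRL_MASKBIT) != 0;
}
```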
Re: [Qemu-devel] KVM call agenda for Nov 23
On Mon, 22 Nov 2010 17:00:41 -0600 Anthony Liguori anth...@codemonkey.ws wrote:
> On 11/22/2010 03:45 PM, Chris Wright wrote:
>> * Juan Quintela (quint...@redhat.com) wrote:
>>> Please send in any agenda items you are interested in covering.
>> usb-ccid
> - vcpu hard limits

- 0.14 (release date, bug day, -rc planning, etc)
buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/643

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/592

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_i386_debian_5_0
The Buildbot has detected a new failure of disable_kvm_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_debian_5_0/builds/644

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on default_x86_64_debian_5_0
The Buildbot has detected a new failure of default_x86_64_debian_5_0 on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/653

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on default_x86_64_out_of_tree
The Buildbot has detected a new failure of default_x86_64_out_of_tree on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/594

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree
The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/592

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
Re: Mask bit support's API
On 11/23/2010 10:30 AM, Yang, Sheng wrote:
> On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
>> On 11/23/2010 08:35 AM, Yang, Sheng wrote:
>>> On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
>>>> On 11/23/2010 08:09 AM, Yang, Sheng wrote:
>>>>> Hi Avi,
>>>>>
>>>>> I've purposed the following API for mask bit support.
>>>>>
>>>>> The main point is, QEmu can know which entries are enabled(by pci_enable_msix()). And for enabled entries, kernel own it, including MSI data/address and mask bit(routing table and mask bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get them(and it can sync with them if it want to do so).
>>>>>
>>>>> Before entries are enabled, QEmu can still use it's own MSI table(because we didn't contain these kind of information in kernel, and it's unnecessary for kernel).
>>>>>
>>>>> The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query one entry didn't exist in kernel - or we can simply return -EINVAL for it.
>>>>>
>>>>> I suppose it would be rare for QEmu to use this interface to get the context of entry(the only case I think is when MSI-X disable and QEmu need to sync the context), so performance should not be an issue.
>>>>>
>>>>> What's your opinion?
>>>>>
>>>>> #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)
>>>>
>>>> Need SET_MSIX_ENTRY for live migration as well.
>>>
>>> Current we don't support LM with VT-d...
>>
>> Isn't this work useful for virtio as well?
>
> Yeah, but won't be included in this patchset.

What API changes are needed? I'd like to see the complete API.

>>>> What about the pending bits?
>>>
>>> We didn't cover it here - and it's in another MMIO space(PBA). Of course we can add more flags for it later.
>>
>> When an entry is masked, we need to set the pending bit for it somewhere. I guess this is broken in the existing code (without your patches)?
>
> Even with my patch, we didn't support the pending bit. It would always return 0 now. What we supposed to do(after my patch checked in) is to check IRQ_PENDING flag of irq_desc->status(if the entry is masked), and return the result to userspace.
>
> That would involve some core change, like to export irq_to_desc(). I don't think it would be accepted soon, so would push mask bit first.

The API needs to be compatible with the pending bit, even if we don't implement it now. I want to reduce the rate of API changes.

>>>> Also need a new exit reason to tell userspace that an msix entry has changed, so userspace can update mappings.
>>>
>>> I think we don't need it. Whenever userspace want to get one mapping which is an enabled MSI-X entry, it can check it with the API above(which is quite rare, because kernel would handle all of them when guest is accessing them). If it's a disabled entry, the context inside userspace MMIO record is the correct one(and only one). The only place I think QEmu need to sync is when MSI-X is about to disabled, QEmu need to update it's own MMIO record.
>>
>> So in-kernel handling of mmio would be decided per entry? I'm trying to simplify this, and simplest thing is - all or nothing.
>
> So you would like to handle all MSI-X MMIO in kernel?

Yes. Writes to address or data would be handled by:

- recording it into the shadow msix table
- notifying userspace that msix entry x changed

Reads would be handled in kernel from the shadow msix table.

So instead of

- guest reads/writes msix
- kvm filters mmio, implements some, passes others to userspace

we have

- guest reads/writes msix
- kvm implements all
- some writes generate an additional notification to userspace

-- 
error compiling committee.c: too many arguments to function
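The "kvm implements all" scheme described above can be sketched in plain C as follows. This is a userspace model under stated assumptions, not kernel code: the structure and function names are hypothetical, the table holds only four entries, and the notification is modelled as a per-entry flag. That writes to the vector control word (where the mask bit lives) do not raise a notification is also an assumption, made to match the remark that only some writes notify userspace.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MSIX_ENTRIES    4  /* small table, for illustration only */
#define SKETCH_ENTRY_SIZE     16  /* bytes per MSI-X table entry */
#define SKETCH_VECTOR_CTRL_OFF 12 /* offset of vector control in an entry */

/* Hypothetical shadow copy of the guest-visible MSI-X table. */
struct shadow_msix {
	uint8_t table[SKETCH_MSIX_ENTRIES * SKETCH_ENTRY_SIZE];
	bool notify_pending[SKETCH_MSIX_ENTRIES]; /* "entry x changed" */
};

/*
 * Handle an aligned 32-bit guest write at byte offset `offset`.
 * Every valid write is recorded in the shadow table. Writes to the
 * address or data words also flag the entry for userspace; writes to
 * the vector control word do not. Returns true when a notification
 * was queued.
 */
static bool shadow_msix_write(struct shadow_msix *s, uint32_t offset,
			      uint32_t val)
{
	uint32_t entry = offset / SKETCH_ENTRY_SIZE;
	uint32_t field = offset % SKETCH_ENTRY_SIZE;

	if (offset % 4 != 0 || entry >= SKETCH_MSIX_ENTRIES)
		return false; /* ignore malformed accesses in this sketch */

	memcpy(&s->table[offset], &val, sizeof(val));

	if (field == SKETCH_VECTOR_CTRL_OFF)
		return false; /* mask bit change: no userspace round trip */

	s->notify_pending[entry] = true;
	return true;
}

/* Reads are served straight from the shadow table. */
static uint32_t shadow_msix_read(const struct shadow_msix *s, uint32_t offset)
{
	uint32_t val = 0;

	if (offset % 4 == 0 && offset / SKETCH_ENTRY_SIZE < SKETCH_MSIX_ENTRIES)
		memcpy(&val, &s->table[offset], sizeof(val));
	return val;
}
```

With this split, every access is resolved against one table, and userspace only has to react to a notification when an address or data word changes.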
buildbot failure in qemu-kvm on default_i386_debian_5_0
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/655

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on default_i386_out_of_tree
The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm.
Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/592

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason:
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot
Re: [Qemu-devel] KVM call agenda for Nov 23
23.11.2010 15:08, Luiz Capitulino wrote:
[]
> - 0.14 (release date, bug day, -rc planning, etc)

Um, can we have some 0.13.x before, please?.. :)

/mjt
Re: Mask bit support's API
On Tue, Nov 23, 2010 at 02:47:33PM +0200, Avi Kivity wrote:
> On 11/23/2010 10:30 AM, Yang, Sheng wrote:
>> On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
>>> On 11/23/2010 08:35 AM, Yang, Sheng wrote:
>>>> On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
>>>>> On 11/23/2010 08:09 AM, Yang, Sheng wrote:
>>>>>> Hi Avi,
>>>>>>
>>>>>> I've purposed the following API for mask bit support.
>>>>>>
>>>>>> The main point is, QEmu can know which entries are enabled(by pci_enable_msix()). And for enabled entries, kernel own it, including MSI data/address and mask bit(routing table and mask bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get them(and it can sync with them if it want to do so).
>>>>>>
>>>>>> Before entries are enabled, QEmu can still use it's own MSI table(because we didn't contain these kind of information in kernel, and it's unnecessary for kernel).
>>>>>>
>>>>>> The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query one entry didn't exist in kernel - or we can simply return -EINVAL for it.
>>>>>>
>>>>>> I suppose it would be rare for QEmu to use this interface to get the context of entry(the only case I think is when MSI-X disable and QEmu need to sync the context), so performance should not be an issue.
>>>>>>
>>>>>> What's your opinion?
>>>>>>
>>>>>> #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)
>>>>>
>>>>> Need SET_MSIX_ENTRY for live migration as well.
>>>>
>>>> Current we don't support LM with VT-d...
>>>
>>> Isn't this work useful for virtio as well?
>>
>> Yeah, but won't be included in this patchset.
>
> What API changes are needed? I'd like to see the complete API.
>
>>>>> What about the pending bits?
>>>>
>>>> We didn't cover it here - and it's in another MMIO space(PBA). Of course we can add more flags for it later.
>>>
>>> When an entry is masked, we need to set the pending bit for it somewhere. I guess this is broken in the existing code (without your patches)?
>>
>> Even with my patch, we didn't support the pending bit. It would always return 0 now. What we supposed to do(after my patch checked in) is to check IRQ_PENDING flag of irq_desc->status(if the entry is masked), and return the result to userspace.
>>
>> That would involve some core change, like to export irq_to_desc(). I don't think it would be accepted soon, so would push mask bit first.
>
> The API needs to be compatible with the pending bit, even if we don't implement it now. I want to reduce the rate of API changes.
>
>>>>> Also need a new exit reason to tell userspace that an msix entry has changed, so userspace can update mappings.
>>>>
>>>> I think we don't need it. Whenever userspace want to get one mapping which is an enabled MSI-X entry, it can check it with the API above(which is quite rare, because kernel would handle all of them when guest is accessing them). If it's a disabled entry, the context inside userspace MMIO record is the correct one(and only one). The only place I think QEmu need to sync is when MSI-X is about to disabled, QEmu need to update it's own MMIO record.
>>>
>>> So in-kernel handling of mmio would be decided per entry? I'm trying to simplify this, and simplest thing is - all or nothing.
>>
>> So you would like to handle all MSI-X MMIO in kernel?
>
> Yes. Writes to address or data would be handled by:
>
> - recording it into the shadow msix table
> - notifying userspace that msix entry x changed
>
> Reads would be handled in kernel from the shadow msix table.
>
> So instead of
>
> - guest reads/writes msix
> - kvm filters mmio, implements some, passes others to userspace
>
> we have
>
> - guest reads/writes msix
> - kvm implements all
> - some writes generate an additional notification to userspace

One small proposal in addition: since all accesses are done from guest anyway, the shadow table can/should be stored using userspace memory, reducing the kernel memory overhead of the feature from up to 4K per MSIX table to just 8 bytes. Active entries can be cached in kernel memory.

> -- 
> error compiling committee.c: too many arguments to function
Re: Performance test result between per-vhost kthread disable and enable
> At this point, I'd suggest testing vhost-net on the upstream kernel, not on rhel kernels. The change that introduced per-device threads is:
> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb

i will try this tomorrow.

> Is CONFIG_SCHED_DEBUG set?

yes. CONFIG_SCHED_DEBUG=y.

2010/11/23 Michael S. Tsirkin m...@redhat.com:
> On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
>> I test the performance between per-vhost kthread disable and enable.
>>
>> Test method:
>> Send the same traffic load between per-vhost kthread disable and enable, and compare the cpu rate of host os.
>> I run five vm on kvm, each of them have five nic.
>> the vhost version which per-vhost kthread disable we used is rhel6 beta 2(2.6.32.60).
>> the vhost version which per-vhost kthread enable we used is rhel6 (2.6.32-71).
>
> At this point, I'd suggest testing vhost-net on the upstream kernel, not on rhel kernels. The change that introduced per-device threads is:
> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>
>> Test result:
>> with per-vhost kthread disable, the cpu rate of host os is 110%.
>> with per-vhost kthread enable, the cpu rate of host os is 130%.
>
> Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with vhost-net.
>
>> In 2.6.32.60,the whole system only have a kthread.
>> [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
>> root 973 2 0 Nov22 ? 00:00:00 [vhost]
>>
>> In 2.6.32.71,the whole system have 25 kthread.
>> [r...@kvm-4slot ~]# ps -ef | grep vhost-
>> root 12896 2 0 10:26 ? 00:00:00 [vhost-12842]
>> root 12897 2 0 10:26 ? 00:00:00 [vhost-12842]
>> root 12898 2 0 10:26 ? 00:00:00 [vhost-12842]
>> root 12899 2 0 10:26 ? 00:00:00 [vhost-12842]
>> root 12900 2 0 10:26 ? 00:00:00 [vhost-12842]
>> root 13022 2 0 10:26 ? 00:00:00 [vhost-12981]
>> root 13023 2 0 10:26 ? 00:00:00 [vhost-12981]
>> root 13024 2 0 10:26 ? 00:00:00 [vhost-12981]
>> root 13025 2 0 10:26 ? 00:00:00 [vhost-12981]
>> root 13026 2 0 10:26 ? 00:00:00 [vhost-12981]
>> root 13146 2 0 10:26 ? 00:00:00 [vhost-13088]
>> root 13147 2 0 10:26 ? 00:00:00 [vhost-13088]
>> root 13148 2 0 10:26 ? 00:00:00 [vhost-13088]
>> root 13149 2 0 10:26 ? 00:00:00 [vhost-13088]
>> root 13150 2 0 10:26 ? 00:00:00 [vhost-13088]
>> ...
>>
>> Code difference:
>> In 2.6.32.60,in function vhost_init, create the kthread for vhost.
>> vhost_workqueue = create_singlethread_workqueue("vhost");
>>
>> In 2.6.32.71,in function vhost_dev_set_owner, create the kthread for each nic interface.
>> dev->wq = create_singlethread_workqueue(vhost_name);
>>
>> Conclusion:
>> with per-vhost kthread enable, the system can more throughput.
>> but deal the same traffic load with per-vhost kthread enable, it waste more cpu resource.
>>
>> In my application scene, the cpu resource is more important, and one kthread for deal with traffic load is enough.
>>
>> So i think we should add a param to control this.
>> for the CPU-bound system, this param disable per-vhost kthread.
>> for the I/O-bound system, this param enable per-vhost kthread.
>> the default value of this param is enable.
>>
>> If my opinion is right, i will give a patch for this.
>
> Let's try to figure out what the issue is, first.
>
> -- 
> MST
[PATCH v3 02/22] bitops: rename generic little-endian bitops functions
As a preparation for providing little-endian bitops for all architectures, this removes the generic_ prefix from the little-endian bitops function names in asm-generic/bitops/le.h.

s/generic_find_next_le_bit/find_next_le_bit/
s/generic_find_next_zero_le_bit/find_next_zero_le_bit/
s/generic_find_first_zero_le_bit/find_first_zero_le_bit/
s/generic___test_and_set_le_bit/__test_and_set_le_bit/
s/generic___test_and_clear_le_bit/__test_and_clear_le_bit/
s/generic_test_le_bit/test_le_bit/
s/generic___set_le_bit/__set_le_bit/
s/generic___clear_le_bit/__clear_le_bit/
s/generic_test_and_set_le_bit/test_and_set_le_bit/
s/generic_test_and_clear_le_bit/test_and_clear_le_bit/

Signed-off-by: Akinobu Mita akinobu.m...@gmail.com
Acked-by: Arnd Bergmann a...@arndb.de
Acked-by: Hans-Christian Egtvedt hans-christian.egtv...@atmel.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: Roman Zippel zip...@linux-m68k.org
Cc: Andreas Schwab sch...@linux-m68k.org
Cc: linux-m...@lists.linux-m68k.org
Cc: Greg Ungerer g...@uclinux.org
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Andy Grover andy.gro...@oracle.com
Cc: rds-de...@oss.oracle.com
Cc: David S.
Miller da...@davemloft.net Cc: net...@vger.kernel.org Cc: Avi Kivity a...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Cc: kvm@vger.kernel.org --- No change from previous submission arch/avr32/kernel/avr32_ksyms.c |4 ++-- arch/avr32/lib/findbit.S |4 ++-- arch/m68k/include/asm/bitops_mm.h|8 arch/m68k/include/asm/bitops_no.h|2 +- arch/powerpc/include/asm/bitops.h| 11 ++- include/asm-generic/bitops/ext2-non-atomic.h | 12 ++-- include/asm-generic/bitops/le.h | 26 +- include/asm-generic/bitops/minix-le.h| 10 +- lib/find_next_bit.c |9 - net/rds/cong.c |6 +++--- virt/kvm/kvm_main.c |2 +- 11 files changed, 47 insertions(+), 47 deletions(-) diff --git a/arch/avr32/kernel/avr32_ksyms.c b/arch/avr32/kernel/avr32_ksyms.c index 11e310c..c63b943 100644 --- a/arch/avr32/kernel/avr32_ksyms.c +++ b/arch/avr32/kernel/avr32_ksyms.c @@ -58,8 +58,8 @@ EXPORT_SYMBOL(find_first_zero_bit); EXPORT_SYMBOL(find_next_zero_bit); EXPORT_SYMBOL(find_first_bit); EXPORT_SYMBOL(find_next_bit); -EXPORT_SYMBOL(generic_find_next_le_bit); -EXPORT_SYMBOL(generic_find_next_zero_le_bit); +EXPORT_SYMBOL(find_next_le_bit); +EXPORT_SYMBOL(find_next_zero_le_bit); /* I/O primitives (lib/io-*.S) */ EXPORT_SYMBOL(__raw_readsb); diff --git a/arch/avr32/lib/findbit.S b/arch/avr32/lib/findbit.S index 997b33b..6880d85 100644 --- a/arch/avr32/lib/findbit.S +++ b/arch/avr32/lib/findbit.S @@ -123,7 +123,7 @@ ENTRY(find_next_bit) brgt1b retal r11 -ENTRY(generic_find_next_le_bit) +ENTRY(find_next_le_bit) lsr r8, r10, 5 sub r9, r11, r10 retle r11 @@ -153,7 +153,7 @@ ENTRY(generic_find_next_le_bit) brgt1b retal r11 -ENTRY(generic_find_next_zero_le_bit) +ENTRY(find_next_zero_le_bit) lsr r8, r10, 5 sub r9, r11, r10 retle r11 diff --git a/arch/m68k/include/asm/bitops_mm.h b/arch/m68k/include/asm/bitops_mm.h index b4ecdaa..f1010ab 100644 --- a/arch/m68k/include/asm/bitops_mm.h +++ b/arch/m68k/include/asm/bitops_mm.h @@ -366,9 +366,9 @@ static inline int minix_test_bit(int nr, const void *vaddr) #define ext2_clear_bit(nr, 
addr) __test_and_clear_bit((nr) ^ 24, (unsigned long *)(addr)) #define ext2_clear_bit_atomic(lock, nr, addr) test_and_clear_bit((nr) ^ 24, (unsigned long *)(addr)) #define ext2_find_next_zero_bit(addr, size, offset) \ - generic_find_next_zero_le_bit((unsigned long *)addr, size, offset) + find_next_zero_le_bit((unsigned long *)addr, size, offset) #define ext2_find_next_bit(addr, size, offset) \ - generic_find_next_le_bit((unsigned long *)addr, size, offset) + find_next_le_bit((unsigned long *)addr, size, offset) static inline int ext2_test_bit(int nr, const void *vaddr) { @@ -398,7 +398,7 @@ static inline int ext2_find_first_zero_bit(const void *vaddr, unsigned size) return (p - addr) * 32 + res; } -static inline unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, +static inline unsigned long find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset) { const unsigned long *p = addr + (offset 5); @@ -440,7 +440,7 @@ static inline int ext2_find_first_bit(const void *vaddr, unsigned size) return (p - addr) * 32 + res; } -static inline unsigned long generic_find_next_le_bit(const unsigned long *addr, +static inline unsigned long find_next_le_bit(const unsigned long *addr, unsigned long size,
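For readers unfamiliar with what the renamed helpers compute, the following is a minimal byte-at-a-time sketch of little-endian bit numbering: bit nr lives in byte nr / 8 at position nr % 8, whatever the host byte order. The sketch_ names are hypothetical and the code is a plain userspace illustration, not the word-at-a-time or assembly implementations touched by this patch. Like the kernel's find helpers, the search returns size when no zero bit is found.

```c
#include <stddef.h>

/* Test bit `nr` of a bitmap stored in little-endian bit order. */
static int sketch_test_le_bit(unsigned long nr, const void *addr)
{
	const unsigned char *p = addr;

	return (p[nr / 8] >> (nr % 8)) & 1;
}

/* Set bit `nr` of a bitmap stored in little-endian bit order. */
static void sketch_set_le_bit(unsigned long nr, void *addr)
{
	unsigned char *p = addr;

	p[nr / 8] |= (unsigned char)(1u << (nr % 8));
}

/*
 * Return the index of the first zero bit at or after `offset`,
 * or `size` when every bit in [offset, size) is set.
 */
static unsigned long sketch_find_next_zero_le_bit(const void *addr,
						  unsigned long size,
						  unsigned long offset)
{
	for (; offset < size; offset++)
		if (!sketch_test_le_bit(offset, addr))
			return offset;
	return size;
}
```

Because the sketch indexes bytes rather than native words, it gives the same answer on big-endian and little-endian hosts, which is the property on-disk bitmaps such as ext2's rely on.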
Re: Performance test result between per-vhost kthread disable and enable
On Tue, Nov 23, 2010 at 09:23:41PM +0800, lidong chen wrote:
>> At this point, I'd suggest testing vhost-net on the upstream kernel, not on rhel kernels. The change that introduced per-device threads is:
>> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>
> i will try this tomorrow.
>
>> Is CONFIG_SCHED_DEBUG set?
>
> yes. CONFIG_SCHED_DEBUG=y.

Disable it. Either debug scheduler or perf-test it :)

> 2010/11/23 Michael S. Tsirkin m...@redhat.com:
>> On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
>>> I test the performance between per-vhost kthread disable and enable.
>>>
>>> Test method:
>>> Send the same traffic load between per-vhost kthread disable and enable, and compare the cpu rate of host os.
>>> I run five vm on kvm, each of them have five nic.
>>> the vhost version which per-vhost kthread disable we used is rhel6 beta 2(2.6.32.60).
>>> the vhost version which per-vhost kthread enable we used is rhel6 (2.6.32-71).
>>
>> At this point, I'd suggest testing vhost-net on the upstream kernel, not on rhel kernels. The change that introduced per-device threads is:
>> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>>
>>> Test result:
>>> with per-vhost kthread disable, the cpu rate of host os is 110%.
>>> with per-vhost kthread enable, the cpu rate of host os is 130%.
>>
>> Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with vhost-net.
>>
>>> In 2.6.32.60,the whole system only have a kthread.
>>> [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
>>> root 973 2 0 Nov22 ? 00:00:00 [vhost]
>>>
>>> In 2.6.32.71,the whole system have 25 kthread.
>>> [r...@kvm-4slot ~]# ps -ef | grep vhost-
>>> root 12896 2 0 10:26 ? 00:00:00 [vhost-12842]
>>> root 12897 2 0 10:26 ? 00:00:00 [vhost-12842]
>>> root 12898 2 0 10:26 ? 00:00:00 [vhost-12842]
>>> root 12899 2 0 10:26 ? 00:00:00 [vhost-12842]
>>> root 12900 2 0 10:26 ? 00:00:00 [vhost-12842]
>>> root 13022 2 0 10:26 ? 00:00:00 [vhost-12981]
>>> root 13023 2 0 10:26 ? 00:00:00 [vhost-12981]
>>> root 13024 2 0 10:26 ? 00:00:00 [vhost-12981]
>>> root 13025 2 0 10:26 ? 00:00:00 [vhost-12981]
>>> root 13026 2 0 10:26 ? 00:00:00 [vhost-12981]
>>> root 13146 2 0 10:26 ? 00:00:00 [vhost-13088]
>>> root 13147 2 0 10:26 ? 00:00:00 [vhost-13088]
>>> root 13148 2 0 10:26 ? 00:00:00 [vhost-13088]
>>> root 13149 2 0 10:26 ? 00:00:00 [vhost-13088]
>>> root 13150 2 0 10:26 ? 00:00:00 [vhost-13088]
>>> ...
>>>
>>> Code difference:
>>> In 2.6.32.60,in function vhost_init, create the kthread for vhost.
>>> vhost_workqueue = create_singlethread_workqueue("vhost");
>>>
>>> In 2.6.32.71,in function vhost_dev_set_owner, create the kthread for each nic interface.
>>> dev->wq = create_singlethread_workqueue(vhost_name);
>>>
>>> Conclusion:
>>> with per-vhost kthread enable, the system can more throughput.
>>> but deal the same traffic load with per-vhost kthread enable, it waste more cpu resource.
>>>
>>> In my application scene, the cpu resource is more important, and one kthread for deal with traffic load is enough.
>>>
>>> So i think we should add a param to control this.
>>> for the CPU-bound system, this param disable per-vhost kthread.
>>> for the I/O-bound system, this param enable per-vhost kthread.
>>> the default value of this param is enable.
>>>
>>> If my opinion is right, i will give a patch for this.
>>
>> Let's try to figure out what the issue is, first.
>>
>> -- 
>> MST
[PATCH v3 09/22] kvm: stop including asm-generic/bitops/le.h
No need to include asm-generic/bitops/le.h as all architectures provide little-endian bit operations now.

Signed-off-by: Akinobu Mita akinobu.m...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
---
No change from previous submission

 virt/kvm/kvm_main.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index da16155..57a7e3d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -52,7 +52,6 @@
 #include <asm/io.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
-#include <asm-generic/bitops/le.h>
 
 #include "coalesced_mmio.h"
 
-- 
1.7.3.2
Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
On 11/23/2010 12:41 AM, Avi Kivity wrote:
> On 11/23/2010 01:00 AM, Anthony Liguori wrote:
>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT. Instead of
>> teaching them to respond to these signals, introduce monitor commands
>> that stop and start individual vcpus.
>>
>> The purpose of these commands are to implement CPU hard limits using
>> an external tool that watches the CPU consumption and stops the CPU
>> as appropriate.
>>
>> The monitor commands provide a more elegant solution that signals
>> because it ensures that a stopped vcpu isn't holding the qemu_mutex.
>
> From signal(7):
>
>   The signals SIGKILL and SIGSTOP cannot be caught, blocked, or
>   ignored.
>
> Perhaps this is a bug in kvm?

I need to dig deeper then.

Maybe it's something about sending SIGSTOP to a process?

> If we could catch SIGSTOP, then it would be easy to unblock it only
> while running in guest context. It would then stop on exit to
> userspace.

Yeah, that's not a bad idea.

> Using monitor commands is fairly heavyweight for something as high
> frequency as this. What control period do you see people using?
> Maybe we should define USR1 for vcpu start/stop.
>
> What happens if one vcpu is stopped while another is running? Spin
> loops, synchronous IPIs will take forever. Maybe we need to stop the
> entire process.

It's the same problem if a VCPU is descheduled while another is
running.

The problem with stopping the entire process is that a big motivation
for this is to ensure that benchmarks have consistent results
regardless of CPU capacity. If you just monitor the full process, then
one VCPU may dominate the entitlement resulting in very erratic
benchmarking.

Regards,

Anthony Liguori
Re: Mask bit support's API
On Tuesday 23 November 2010 20:47:33 Avi Kivity wrote:
> On 11/23/2010 10:30 AM, Yang, Sheng wrote:
>> On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
>>> On 11/23/2010 08:35 AM, Yang, Sheng wrote:
>>>> On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
>>>>> On 11/23/2010 08:09 AM, Yang, Sheng wrote:
>>>>>> Hi Avi,
>>>>>>
>>>>>> I've purposed the following API for mask bit support.
>>>>>>
>>>>>> The main point is, QEmu can know which entries are enabled(by
>>>>>> pci_enable_msix()). And for enabled entries, kernel own it,
>>>>>> including MSI data/address and mask bit(routing table and mask
>>>>>> bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get
>>>>>> them(and it can sync with them if it want to do so).
>>>>>>
>>>>>> Before entries are enabled, QEmu can still use it's own MSI
>>>>>> table(because we didn't contain these kind of information in
>>>>>> kernel, and it's unnecessary for kernel).
>>>>>>
>>>>>> The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to
>>>>>> query one entry didn't exist in kernel - or we can simply
>>>>>> return -EINVAL for it.
>>>>>>
>>>>>> I suppose it would be rare for QEmu to use this interface to
>>>>>> get the context of entry(the only case I think is when MSI-X
>>>>>> disable and QEmu need to sync the context), so performance
>>>>>> should not be an issue.
>>>>>>
>>>>>> What's your opinion?
>>>>>>
>>>>>> #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct kvm_msix_entry)
>>>>>
>>>>> Need SET_MSIX_ENTRY for live migration as well.
>>>>
>>>> Current we don't support LM with VT-d...
>>>
>>> Isn't this work useful for virtio as well?
>>
>> Yeah, but won't be included in this patchset.
>
> What API changes are needed? I'd like to see the complete API.

I am not sure about it. But I suppose the structure should be the
same? In fact it's pretty hard for me to imagine what's needed for
virtio in the future, especially as there is no such code now. I
really prefer to deal with assigned device and virtio separately,
which would make the work much easier. But it seems you won't agree
on that.

>>>>> What about the pending bits?
>>>>
>>>> We didn't cover it here - and it's in another MMIO space(PBA). Of
>>>> course we can add more flags for it later.
>>>
>>> When an entry is masked, we need to set the pending bit for it
>>> somewhere. I guess this is broken in the existing code (without
>>> your patches)?
>>
>> Even with my patch, we didn't support the pending bit. It would
>> always return 0 now. What we supposed to do(after my patch checked
>> in) is to check IRQ_PENDING flag of irq_desc->status(if the entry is
>> masked), and return the result to userspace.
>>
>> That would involve some core change, like to export irq_to_desc().
>> I don't think it would be accepted soon, so would push mask bit
>> first.
>
> The API needs to be compatible with the pending bit, even if we don't
> implement it now. I want to reduce the rate of API changes.

This can be implemented by this API, just adding a flag for it. And I
would still take this into consideration in the next API proposal.

>>>>> Also need a new exit reason to tell userspace that an msix entry
>>>>> has changed, so userspace can update mappings.
>>>>
>>>> I think we don't need it. Whenever userspace want to get one
>>>> mapping which is an enabled MSI-X entry, it can check it with the
>>>> API above(which is quite rare, because kernel would handle all of
>>>> them when guest is accessing them). If it's a disabled entry, the
>>>> context inside userspace MMIO record is the correct one(and only
>>>> one). The only place I think QEmu need to sync is when MSI-X is
>>>> about to disabled, QEmu need to update it's own MMIO record.
>>>
>>> So in-kernel handling of mmio would be decided per entry? I'm
>>> trying to simplify this, and simplest thing is - all or nothing.
>>
>> So you would like to handle all MSI-X MMIO in kernel?
>
> Yes. Writes to address or data would be handled by:
> - recording it into the shadow msix table
> - notifying userspace that msix entry x changed
> Reads would be handled in kernel from the shadow msix table.
>
> So instead of
>
> - guest reads/writes msix
> - kvm filters mmio, implements some, passes others to userspace
>
> we have
>
> - guest reads/writes msix
> - kvm implements all
> - some writes generate an additional notification to userspace

I suppose we don't need to generate a notification to userspace?
Because every read/write is handled by the kernel, and userspace just
needs an interface to the kernel to get/set the entry - and well, does
userspace need to do it when the kernel can handle all of them? Maybe
not...

--
regards
Yang, Sheng
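Editorial sketch of the "kvm implements all" variant discussed above, written as plain user-space C rather than kernel code. The 16-byte entry layout (address low, address high, data, vector control) and the mask bit in bit 0 of Vector Control follow the PCI MSI-X table format. Everything else is an assumption made for illustration: the names shadow_msix_init, shadow_msix_write, shadow_msix_read and shadow_msix_masked are hypothetical, the table is zero-initialised instead of modelling reset values, and none of this is the proposed KVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE   16u /* addr low, addr high, data, vector control */
#define MSIX_VECTOR_CTRL  12u /* byte offset of Vector Control in an entry */
#define MSIX_CTRL_MASKBIT 1u  /* bit 0 of Vector Control masks the vector */
#define SHADOW_NR_ENTRIES 8u  /* hypothetical table size for this sketch */

/* Shadow copy of the MSI-X table, one row of four dwords per entry. */
struct shadow_msix_table {
    uint32_t dword[SHADOW_NR_ENTRIES][4];
};

/* Zero the table; real hardware reset values are not modelled here. */
static void shadow_msix_init(struct shadow_msix_table *t)
{
    memset(t, 0, sizeof(*t));
}

/*
 * Record an aligned 4-byte guest write at byte offset `off` in the table.
 * Returns true when address or data changed, which is the case where the
 * thread proposes an extra notification to userspace. Mask bit writes are
 * absorbed silently.
 */
static bool shadow_msix_write(struct shadow_msix_table *t,
                              uint32_t off, uint32_t val)
{
    uint32_t entry = off / MSIX_ENTRY_SIZE;
    uint32_t field = off % MSIX_ENTRY_SIZE;

    assert(entry < SHADOW_NR_ENTRIES && field % 4 == 0);
    t->dword[entry][field / 4] = val;
    return field != MSIX_VECTOR_CTRL;
}

/* Reads are always served from the shadow table. */
static uint32_t shadow_msix_read(const struct shadow_msix_table *t,
                                 uint32_t off)
{
    uint32_t entry = off / MSIX_ENTRY_SIZE;
    uint32_t field = off % MSIX_ENTRY_SIZE;

    assert(entry < SHADOW_NR_ENTRIES && field % 4 == 0);
    return t->dword[entry][field / 4];
}

/* True when the entry's mask bit is set. */
static bool shadow_msix_masked(const struct shadow_msix_table *t,
                               uint32_t entry)
{
    assert(entry < SHADOW_NR_ENTRIES);
    return (t->dword[entry][3] & MSIX_CTRL_MASKBIT) != 0;
}
```

In this sketch the boolean return value of shadow_msix_write stands in for the "additional notification to userspace"; whether such a notification is needed at all is exactly the open question in the thread.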
Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
On 11/23/2010 02:16 AM, Dor Laor wrote:
> On 11/23/2010 08:41 AM, Avi Kivity wrote:
>> On 11/23/2010 01:00 AM, Anthony Liguori wrote:
>>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT. Instead of
>>> teaching them to respond to these signals, introduce monitor
>>> commands that stop and start individual vcpus.
>>>
>>> The purpose of these commands are to implement CPU hard limits
>>> using an external tool that watches the CPU consumption and stops
>>> the CPU as appropriate.
>
> Why not use cgroup for that?

This is a stop-gap.

The cgroup solution isn't perfect. It doesn't know anything about
guest time versus hypervisor time so it can't account just the guest
time like we do with this implementation. Also, since it may
deschedule the vcpu thread while it's holding the qemu_mutex, it may
unfairly tax other vcpu threads by creating additional lock
contention.

This is all solvable but if there's an alternative that just requires
a small change to qemu, it's worth doing in the short term.

Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
On 11/23/2010 03:51 PM, Anthony Liguori wrote:
> On 11/23/2010 12:41 AM, Avi Kivity wrote:
>> On 11/23/2010 01:00 AM, Anthony Liguori wrote:
>>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT. Instead of
>>> teaching them to respond to these signals, introduce monitor
>>> commands that stop and start individual vcpus.
>>>
>>> The purpose of these commands are to implement CPU hard limits
>>> using an external tool that watches the CPU consumption and stops
>>> the CPU as appropriate.
>>>
>>> The monitor commands provide a more elegant solution that signals
>>> because it ensures that a stopped vcpu isn't holding the
>>> qemu_mutex.
>>
>> From signal(7):
>>
>>   The signals SIGKILL and SIGSTOP cannot be caught, blocked, or
>>   ignored.
>>
>> Perhaps this is a bug in kvm?
>
> I need to dig deeper than.

Signals are a bottomless pit.

> Maybe its something about sending SIGSTOP to a process?

AFAIK sending SIGSTOP to a process should stop all of its threads?
SIGSTOPping a thread should also work.

>> If we could catch SIGSTOP, then it would be easy to unblock it only
>> while running in guest context. It would then stop on exit to
>> userspace.
>
> Yeah, that's not a bad idea.

Except we can't.

>> Using monitor commands is fairly heavyweight for something as high
>> frequency as this. What control period do you see people using?
>> Maybe we should define USR1 for vcpu start/stop.
>>
>> What happens if one vcpu is stopped while another is running? Spin
>> loops, synchronous IPIs will take forever. Maybe we need to stop
>> the entire process.
>
> It's the same problem if a VCPU is descheduled while another is
> running.

We can fix that with directed yield or lock holder preemption
prevention. But if a vcpu is stopped by qemu, we suddenly can't.

> The problem with stopping the entire process is that a big
> motivation for this is to ensure that benchmarks have consistent
> results regardless of CPU capacity. If you just monitor the full
> process, then one VCPU may dominate the entitlement resulting in
> very erratic benchmarking.

What's the desired behaviour? Give each vcpu 300M cycles per second,
or give a 2vcpu guest 600M cycles per second?

You could monitor threads separately but stop the entire process.
Stopping individual threads will break apart as soon as they start
taking locks.

--
error compiling committee.c: too many arguments to function
Re: Mask bit support's API
On Tuesday 23 November 2010 20:04:16 Michael S. Tsirkin wrote:
> On Tue, Nov 23, 2010 at 02:09:52PM +0800, Yang, Sheng wrote:
>> Hi Avi,
>>
>> I've purposed the following API for mask bit support.
>>
>> The main point is, QEmu can know which entries are enabled(by
>> pci_enable_msix()).
>
> Unfortunately, it can't I think, unless all your guests are linux.
> enabled entries is a linux kernel concept. The MSIX spec only tells
> you which entries are masked and which are unmasked.

I can't understand what you are talking about, or how it is related
to the guest OS. I was talking about pci_enable_msix() in the host
Linux.

--
regards
Yang, Sheng
Re: Mask bit support's API
On 11/23/2010 03:57 PM, Yang, Sheng wrote:
>>> Yeah, but won't be included in this patchset.
>>
>> What API changes are needed? I'd like to see the complete API.
>
> I am not sure about it. But I suppose the structure should be the
> same? In fact it's pretty hard for me to image what's needed for
> virtio in the future, especially there is no such code now. I really
> prefer to deal with assigned device and virtio separately, which
> would make the work much easier. But seems you won't agree on that.

First, I don't really see why the two cases are different (but I
don't do a lot in this space). Surely between you and Michael, you
have all the information?

Second, my worry is a huge number of ABI variants that come from
incrementally adding features. I want to implement bigger chunks of
functionality. So I'd like to see all potential users addressed, at
least from the ABI point of view if not the implementation.

>> The API needs to be compatible with the pending bit, even if we
>> don't implement it now. I want to reduce the rate of API changes.
>
> This can be implemented by this API, just adding a flag for it. And
> I would still take this into consideration in the next API purposal.

Shouldn't kvm also service reads from the pending bitmask?

>> So instead of
>>
>> - guest reads/writes msix
>> - kvm filters mmio, implements some, passes others to userspace
>>
>> we have
>>
>> - guest reads/writes msix
>> - kvm implements all
>> - some writes generate an additional notification to userspace
>
> I suppose we don't need to generate notification to userspace?
> Because every read/write is handled by kernel, and userspace just
> need interface to kernel to get/set the entry - and well, does
> userspace need to do it when kernel can handle all of them? Maybe
> not...

We could have the kernel handle addr/data writes by setting up an
internal interrupt routing. A disadvantage is that more work is
needed if we emulate interrupt remapping in qemu.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
On 11/23/2010 08:00 AM, Avi Kivity wrote:
>>> If we could catch SIGSTOP, then it would be easy to unblock it only
>>> while running in guest context. It would then stop on exit to
>>> userspace.
>>
>> Yeah, that's not a bad idea.
>
> Except we can't.

Yeah, I s:SIGSTOP:SIGUSR1:g.

>>> Using monitor commands is fairly heavyweight for something as high
>>> frequency as this. What control period do you see people using?
>>> Maybe we should define USR1 for vcpu start/stop.
>>>
>>> What happens if one vcpu is stopped while another is running? Spin
>>> loops, synchronous IPIs will take forever. Maybe we need to stop
>>> the entire process.
>>
>> It's the same problem if a VCPU is descheduled while another is
>> running.
>
> We can fix that with directed yield or lock holder preemption
> prevention. But if a vcpu is stopped by qemu, we suddenly can't.

That only works for spin locks.

Here's the scenario:

1) VCPU 0 drops to userspace and acquires qemu_mutex
2) VCPU 0 gets descheduled
3) VCPU 1 needs to drop to userspace and acquire qemu_mutex, gets
   blocked and yields
4) If we're lucky, VCPU 0 gets scheduled but it depends on how busy
   the system is

With CFS hard limits, once (2) happens, we're boned for (3) because
(4) cannot happen. By having QEMU know about (2), it can choose to
run just a little bit longer in order to drop qemu_mutex such that
(3) never happens.

>> The problem with stopping the entire process is that a big
>> motivation for this is to ensure that benchmarks have consistent
>> results regardless of CPU capacity. If you just monitor the full
>> process, then one VCPU may dominate the entitlement resulting in
>> very erratic benchmarking.
>
> What's the desired behaviour? Give each vcpu 300M cycles per second,
> or give a 2vcpu guest 600M cycles per second?

Each vcpu gets 300M cycles per second.

> You could monitor threads separately but stop the entire process.
> Stopping individual threads will break apart as soon as they start
> taking locks.

I don't think so.. PLE should work as expected. It's no different
than a normally contended system.

Regards,

Anthony Liguori
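Editorial sketch of the per-vcpu hard limit discussed in this thread ("each vcpu gets 300M cycles per second"). It shows only the arithmetic an external tool might use to turn a cycle entitlement into a run budget per control period. The helper names are hypothetical, and the host clock rate and control period are assumptions the caller must supply; this is not the qemu-kvm implementation and it says nothing about how the vcpu is actually stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Microseconds a vcpu may run in each control period.
 * cap_hz:    entitlement in cycles per second (hypothetical parameter)
 * host_hz:   host clock rate in cycles per second (assumed known)
 * period_us: control period in microseconds
 * The multiplication is done in 64 bits and assumes
 * period_us * cap_hz does not overflow.
 */
static uint64_t vcpu_run_budget_us(uint64_t cap_hz, uint64_t host_hz,
                                   uint64_t period_us)
{
    assert(host_hz > 0);
    if (cap_hz >= host_hz)
        return period_us;        /* the cap is not limiting */
    return period_us * cap_hz / host_hz;
}

/* True once the vcpu has used up its budget for the current period. */
static bool vcpu_over_budget(uint64_t used_us, uint64_t budget_us)
{
    return used_us >= budget_us;
}
```

For example, with an assumed 3 GHz host and a 100 ms control period, a 300M cycles per second entitlement corresponds to 10 ms of run time per period. The scenario with qemu_mutex above is the reason the stop itself should happen at a point where the vcpu holds no lock, which this arithmetic does not address.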
Re: trace_printk() support in trace-cmd
On Tue, 2010-11-23 at 13:04 +0200, Avi Kivity wrote:
> On 11/16/2010 05:13 PM, Steven Rostedt wrote:
>> BTW, what does /debug/tracing/printk_formats show?
>
> Empty.

So you have real trace_printk's not bprintk's?

That is, if the format is not a const, then we fall back to

  __trace_printk(_THIS_IP_, fmt, args);

And this is a different object. I have not tested these in a while,
I'll give it a try.

But if your printks are bprintks, then the bug is in the kernel,
since that printk_formats needs to show something.

-- Steve
Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
On 11/23/2010 04:24 PM, Anthony Liguori wrote:
>>>> Using monitor commands is fairly heavyweight for something as
>>>> high frequency as this. What control period do you see people
>>>> using? Maybe we should define USR1 for vcpu start/stop.
>>>>
>>>> What happens if one vcpu is stopped while another is running?
>>>> Spin loops, synchronous IPIs will take forever. Maybe we need to
>>>> stop the entire process.
>>>
>>> It's the same problem if a VCPU is descheduled while another is
>>> running.
>>
>> We can fix that with directed yield or lock holder preemption
>> prevention. But if a vcpu is stopped by qemu, we suddenly can't.
>
> That only works for spin locks.
>
> Here's the scenario:
>
> 1) VCPU 0 drops to userspace and acquires qemu_mutex
> 2) VCPU 0 gets descheduled
> 3) VCPU 1 needs to drop to userspace and acquire qemu_mutex, gets
>    blocked and yields
> 4) If we're lucky, VCPU 0 gets scheduled but it depends on how busy
>    the system is
>
> With CFS hard limits, once (2) happens, we're boned for (3) because
> (4) cannot happen. By having QEMU know about (2), it can choose to
> run just a little bit longer in order to drop qemu_mutex such that
> (3) never happens.

There's some support for futex priority inheritance, perhaps we can
leverage that. It's supposed to be for realtime threads, but perhaps
we can hook the priority booster to directed yield.

It's really the same problem -- preempted lock holder -- only in
userspace. We should be able to use the same solution.

>>> The problem with stopping the entire process is that a big
>>> motivation for this is to ensure that benchmarks have consistent
>>> results regardless of CPU capacity. If you just monitor the full
>>> process, then one VCPU may dominate the entitlement resulting in
>>> very erratic benchmarking.
>>
>> What's the desired behaviour? Give each vcpu 300M cycles per
>> second, or give a 2vcpu guest 600M cycles per second?
>
> Each vcpu gets 300M cycles per second.
>
>> You could monitor threads separately but stop the entire process.
>> Stopping individual threads will break apart as soon as they start
>> taking locks.
>
> I don't think so.. PLE should work as expected. It's no different
> than a normally contended system.

PLE without directed yield is useless. With directed yield, it may
work, but if the vcpu is stopped, it becomes ineffective.

Directed yield allows the scheduler to follow a bouncing lock around
by increasing the priority (or decreasing vruntime) of the immediate
lock holder at the expense of waiters. SIGSTOP may drop the priority
of the lock holder to zero without giving PLE a way to adjust.

--
error compiling committee.c: too many arguments to function
Re: KVM call agenda for Nov 23
On 22.11.2010 14:55, Stefan Hajnoczi wrote:
> On Mon, Nov 22, 2010 at 1:38 PM, Juan Quintela quint...@redhat.com wrote:
>>
>> Please send in any agenda items you are interested in covering.
>
> QCOW2 performance roadmap:
> * What can be done to achieve near-raw image format performance?
> * Benchmark results from an ideal QCOW2 model.

Some thoughts on qcow2 performance:

== Fully allocated image ==

Should be able to perform similar to raw because there is very little
handling of metadata. Additional I/O only if an L2 table must be read
from the disk.

* Should we increase the L2 table cache size to make it happen less
  often? (Currently 16 * 512 MB, QED uses more)

Known problems:

* Synchronous read of L2 tables; should be made async
** General thought on making things async: Coroutines? What happened
   to that proposal?
* We may want to have online defragmentation eventually

== Growing stand-alone image ==

Stand-alone images (i.e. images without a backing file) aren't that
interesting because you would use raw for them anyway if you needed
optimal performance. We need to be good enough here.

However, all of the problems that arise from dealing with metadata
apply for the really interesting third case, so optimizing them is an
important step on the way.

Known problems:

* Needs a bdrv_flush between refcount table and L2 table write
* Synchronous metadata updates
* Both to be solved by block-queue
** Batches writes and makes them async, can greatly reduce number of
   bdrv_flush calls
** Except for cache=writethrough, but this is secondary
** Should we make cache=off the default caching mode in qemu?
   writethrough seems to be a bit too much anyway irrespective of the
   image format.
* Synchronous refcount table reads
** How frequent are cache misses?
** Making this one async is much harder than L2 table reads. We can
   make it a goal for mid-term, but short term we should make it hurt
   less if it's a problem in practice.
*** It's probably not, because (without internal snapshots or
    compression) we never free clusters, so we fill it sequentially
    and only load a new one when the old one is full - and that one
    we don't even read, but write, so block-queue will help
* Things like refcount table growth are completely synchronous.
** Not a real problem, because it happens approximately never.

== Growing image with backing file ==

This is the really interesting scenario where you need an image
format that provides some features. For qcow2, it's mostly the same
as above.

See stand-alone, plus:

* Needs a bdrv_flush between COW and writing to the L2 table
** qcow2 has already one after refcount table write, so no additional
   overhead
* Synchronous COW
** Should be fairly easy to make async
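Editorial note on the "16 * 512 MB" figure for the L2 table cache. In qcow2 an L2 table occupies one cluster and each L2 entry is 8 bytes, so one table maps (cluster_size / 8) clusters. The sketch below works that out, assuming the default 64 KiB cluster size; the function names are hypothetical and are not taken from the qemu source.

```c
#include <assert.h>
#include <stdint.h>

#define QCOW2_L2_ENTRY_SIZE 8u /* bytes per L2 table entry */

/* Number of entries in one L2 table (a table fills exactly one cluster). */
static uint64_t qcow2_l2_entries(uint64_t cluster_size)
{
    assert(cluster_size >= QCOW2_L2_ENTRY_SIZE);
    return cluster_size / QCOW2_L2_ENTRY_SIZE;
}

/* Guest bytes mapped by a single L2 table. */
static uint64_t qcow2_l2_coverage(uint64_t cluster_size)
{
    return qcow2_l2_entries(cluster_size) * cluster_size;
}

/* Guest bytes mapped by a cache holding `tables` L2 tables. */
static uint64_t qcow2_l2_cache_coverage(uint64_t cluster_size,
                                        uint64_t tables)
{
    return tables * qcow2_l2_coverage(cluster_size);
}
```

With 64 KiB clusters one table holds 8192 entries and maps 512 MiB, so a 16-table cache maps 8 GiB of guest address space; random I/O spread over a larger range than that is where the extra L2 reads would come from.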
Re: trace_printk() support in trace-cmd
On 11/23/2010 04:30 PM, Steven Rostedt wrote:
> On Tue, 2010-11-23 at 13:04 +0200, Avi Kivity wrote:
>> On 11/16/2010 05:13 PM, Steven Rostedt wrote:
>>> BTW, what does /debug/tracing/printk_formats show?
>>
>> Empty.
>
> So you have real trace_printk's not bprintk's?

What are bprintk()s?

> That is, if the format is not a const, then we fall back to
>
>   __trace_printk(_THIS_IP_, fmt, args);
>
> And this is a different object. I have not tested these in a while,
> I'll give it a try.
>
> But if your printks are bprintks, then the bug is in the kernel,
> since that printk_formats needs to show something.

What I do is sprinkle trace_printk()s around my code and expect to
see them interspersed with enabled tracepoints in 'trace-cmd report'.
Is that not the intended behaviour?

--
error compiling committee.c: too many arguments to function
Re: KVM call agenda for Nov 23
On Tue, Nov 23, 2010 at 2:37 PM, Kevin Wolf kw...@redhat.com wrote:
> On 22.11.2010 14:55, Stefan Hajnoczi wrote:
>> On Mon, Nov 22, 2010 at 1:38 PM, Juan Quintela quint...@redhat.com wrote:
>>>
>>> Please send in any agenda items you are interested in covering.
>>
>> QCOW2 performance roadmap:
>> * What can be done to achieve near-raw image format performance?
>> * Benchmark results from an ideal QCOW2 model.

Performance figures from a series of I/O scenarios:

http://wiki.qemu.org/Qcow2/PerformanceRoadmap

Stefan
[PATCH] qemu-kvm: remove unused setupcpuid
kvm_setup_cpuid seems unused, so remove it.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/kvm/libkvm/libkvm-x86.c b/kvm/libkvm/libkvm-x86.c
index f1aef76..2b12408 100644
--- a/kvm/libkvm/libkvm-x86.c
+++ b/kvm/libkvm/libkvm-x86.c
@@ -466,45 +466,6 @@ __u64 kvm_get_cr8(kvm_context_t kvm, int vcpu)
 	return kvm->run[vcpu]->cr8;
 }
 
-int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent,
-		    struct kvm_cpuid_entry *entries)
-{
-	struct kvm_cpuid *cpuid;
-	int r;
-
-	cpuid = malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-	if (!cpuid)
-		return -ENOMEM;
-
-	cpuid->nent = nent;
-	memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-	r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID, cpuid);
-
-	free(cpuid);
-	return r;
-}
-
-int kvm_setup_cpuid2(kvm_context_t kvm, int vcpu, int nent,
-		     struct kvm_cpuid_entry2 *entries)
-{
-	struct kvm_cpuid2 *cpuid;
-	int r;
-
-	cpuid = malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-	if (!cpuid)
-		return -ENOMEM;
-
-	cpuid->nent = nent;
-	memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-	r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID2, cpuid);
-	if (r == -1) {
-		fprintf(stderr, "kvm_setup_cpuid2: %m\n");
-		r = -errno;
-	}
-	free(cpuid);
-	return r;
-}
-
 int kvm_set_shadow_pages(kvm_context_t kvm, unsigned int nrshadow_pages)
 {
 #ifdef KVM_CAP_MMU_SHADOW_CACHE_CONTROL
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index 4821a1e..a70945d 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -359,36 +359,6 @@ int kvm_set_guest_debug(kvm_context_t, int vcpu, struct kvm_guest_debug *dbg);
 
 #if defined(__i386__) || defined(__x86_64__)
 /*!
- * \brief Setup a vcpu's cpuid instruction emulation
- *
- * Set up a table of cpuid function to cpuid outputs.\n
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should be initialized
- * \param nent number of entries to be installed
- * \param entries cpuid function entries table
- * \return 0 on success, or -errno on error
- */
-int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent,
-		    struct kvm_cpuid_entry *entries);
-
-/*!
- * \brief Setup a vcpu's cpuid instruction emulation
- *
- * Set up a table of cpuid function to cpuid outputs.
- * This call replaces the older kvm_setup_cpuid interface by adding a few
- * parameters to support cpuid functions that have sub-leaf values.
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should be initialized
- * \param nent number of entries to be installed
- * \param entries cpuid function entries table
- * \return 0 on success, or -errno on error
- */
-int kvm_setup_cpuid2(kvm_context_t kvm, int vcpu, int nent,
-		     struct kvm_cpuid_entry2 *entries);
-
-/*!
  * \brief Setting the number of shadow pages to be allocated to the vm
  *
  * \param kvm pointer to kvm_context
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 20b7d6d..672bcbf 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -418,37 +418,6 @@ static void kvm_set_cr8(CPUState *env, uint64_t cr8)
     env->kvm_run->cr8 = cr8;
 }
 
-int kvm_setup_cpuid(CPUState *env, int nent,
-                    struct kvm_cpuid_entry *entries)
-{
-    struct kvm_cpuid *cpuid;
-    int r;
-
-    cpuid = qemu_malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-
-    cpuid->nent = nent;
-    memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-    r = kvm_vcpu_ioctl(env, KVM_SET_CPUID, cpuid);
-
-    free(cpuid);
-    return r;
-}
-
-int kvm_setup_cpuid2(CPUState *env, int nent,
-                     struct kvm_cpuid_entry2 *entries)
-{
-    struct kvm_cpuid2 *cpuid;
-    int r;
-
-    cpuid = qemu_malloc(sizeof(*cpuid) + nent * sizeof(*entries));
-
-    cpuid->nent = nent;
-    memcpy(cpuid->entries, entries, nent * sizeof(*entries));
-    r = kvm_vcpu_ioctl(env, KVM_SET_CPUID2, cpuid);
-    free(cpuid);
-    return r;
-}
-
 int kvm_set_shadow_pages(kvm_context_t kvm, unsigned int nrshadow_pages)
 {
 #ifdef KVM_CAP_MMU_SHADOW_CACHE_CONTROL
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 0f3fb50..7e6edfb 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -219,6 +219,7 @@ int kvm_get_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
 int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
 #endif
 
+#if defined(__i386__) || defined(__x86_64__)
 /*!
  * \brief Simulate an external vectored interrupt
  *
@@ -231,36 +232,6 @@ int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
  */
 int kvm_inject_irq(CPUState *env, unsigned irq);
 
-#if defined(__i386__) || defined(__x86_64__)
-/*!
- * \brief Setup a vcpu's cpuid instruction emulation
- *
- * Set up a table of cpuid function to cpuid outputs.\n
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which
Re: Mask bit support's API
On Tue, Nov 23, 2010 at 04:06:20PM +0200, Avi Kivity wrote:
>>> So instead of
>>>
>>> - guest reads/writes msix
>>> - kvm filters mmio, implements some, passes others to userspace
>>>
>>> we have
>>>
>>> - guest reads/writes msix
>>> - kvm implements all
>>> - some writes generate an additional notification to userspace
>>
>> I suppose we don't need to generate notification to userspace?
>> Because every read/write is handled by kernel, and userspace just
>> need interface to kernel to get/set the entry - and well, does
>> userspace need to do it when kernel can handle all of them? Maybe
>> not...
>
> We could have the kernel handle addr/data writes by setting up an
> internal interrupt routing. A disadvantage is that more work is
> needed if we emulator interrupt remapping in qemu.

As an alternative, interrupt remapping will need some API rework,
right? Existing APIs only pass address/data for msi.

--
MST
Re: Mask bit support's API
On Tue, Nov 23, 2010 at 05:11:19PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 23, 2010 at 04:06:20PM +0200, Avi Kivity wrote:
>>>> So instead of
>>>>
>>>> - guest reads/writes msix
>>>> - kvm filters mmio, implements some, passes others to userspace
>>>>
>>>> we have
>>>>
>>>> - guest reads/writes msix
>>>> - kvm implements all
>>>> - some writes generate an additional notification to userspace
>>>
>>> I suppose we don't need to generate notification to userspace?
>>> Because every read/write is handled by kernel, and userspace just
>>> need interface to kernel to get/set the entry - and well, does
>>> userspace need to do it when kernel can handle all of them? Maybe
>>> not...
>>
>> We could have the kernel handle addr/data writes by setting up an
>> internal interrupt routing. A disadvantage is that more work is
>> needed if we emulator interrupt remapping in qemu.
>
> As an alternative, interrupt remapping will need some API rework,
> right? Existing APIs only pass address/data for msi.

IIRC interrupt remapping works with address/data too. It just
interprets it differently from the apic.

--
Gleb.
Re: [PATCHv6 00/16] boot order specification
Anthony, Blue

No comments on this patch series for almost a week. Can it be applied?

On Wed, Nov 17, 2010 at 06:43:47PM +0200, Gleb Natapov wrote:
> I am using open firmware naming scheme to specify device path names.
> In this version: added SCSI bus support. Pass boot order list as file
> to firmware.
>
> Names look like this on pci machine:
> /p...@i0cf8/i...@1,1/dr...@1/d...@0
> /p...@i0cf8/i...@1/f...@03f1/flo...@1
> /p...@i0cf8/i...@1/f...@03f1/flo...@0
> /p...@i0cf8/i...@1,1/dr...@1/d...@1
> /p...@i0cf8/i...@1,1/dr...@0/d...@0
> /p...@i0cf8/s...@3/d...@0,0
> /p...@i0cf8/ether...@4/ethernet-...@0
> /p...@i0cf8/ether...@5/ethernet-...@0
> /p...@i0cf8/i...@1,1/dr...@0/d...@1
> /p...@i0cf8/i...@1/i...@01e8/dr...@0/d...@0
> /p...@i0cf8/u...@1,2/netw...@0/ether...@0
> /p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0
> /r...@genroms/linuxboot.bin
>
> and on isa machine:
> /isa/i...@0170/dr...@0/d...@0
> /isa/f...@03f1/flo...@1
> /isa/f...@03f1/flo...@0
> /isa/i...@0170/dr...@0/d...@1
>
> Instead of using the get_dev_path() callback I introduce another one,
> get_fw_dev_path. Unfortunately the way the get_dev_path() callback is
> used in migration code makes it hard to reuse it for other purposes.
> First of all, it is not called recursively, so the caller expects it to
> provide a unique name by itself. A device path, though, is inherently
> recursive. Each individual element may not be unique, but the whole
> path will be. On the other hand, to call get_dev_path() recursively in
> migration code we should implement it for all possible buses first.
> The other problem is compatibility. If we change the get_dev_path()
> output format now, we will not be able to migrate from old qemu to new
> one without some additional compatibility layer.
>
> Gleb Natapov (16):
>   Introduce fw_name field to DeviceInfo structure.
>   Introduce new BusInfo callback get_fw_dev_path.
>   Keep track of ISA ports ISA device is using in qdev.
>   Add get_fw_dev_path callback to ISA bus in qdev.
>   Store IDE bus id in IDEBus structure for easy access.
>   Add get_fw_dev_path callback to IDE bus.
>   Add get_dev_path callback for system bus.
>   Add get_fw_dev_path callback for pci bus.
>   Record which USBDevice USBPort belongs too.
>   Add get_dev_path callback for usb bus.
>   Add get_dev_path callback to scsi bus.
>   Add bootindex parameter to net/block/fd device
>   Change fw_cfg_add_file() to get full file path as a parameter.
>   Add bootindex for option roms.
>   Add notifier that will be called when machine is fully created.
>   Pass boot device list to firmware.
>
>  block_int.h       |   4 +-
>  hw/cs4231a.c      |   1 +
>  hw/e1000.c        |   4 ++
>  hw/eepro100.c     |   3 +
>  hw/fdc.c          |  12 ++
>  hw/fw_cfg.c       |  30 --
>  hw/fw_cfg.h       |   4 +-
>  hw/gus.c          |   4 ++
>  hw/ide/cmd646.c   |   4 +-
>  hw/ide/internal.h |   3 +-
>  hw/ide/isa.c      |   5 ++-
>  hw/ide/piix.c     |   4 +-
>  hw/ide/qdev.c     |  22 ++-
>  hw/ide/via.c      |   4 +-
>  hw/isa-bus.c      |  42 +++
>  hw/isa.h          |   4 ++
>  hw/lance.c        |   1 +
>  hw/loader.c       |  32 ---
>  hw/loader.h       |   8 ++--
>  hw/m48t59.c       |   1 +
>  hw/mc146818rtc.c  |   1 +
>  hw/multiboot.c    |   3 +-
>  hw/ne2000-isa.c   |   3 +
>  hw/ne2000.c       |   5 ++-
>  hw/nseries.c      |   4 +-
>  hw/palm.c         |   6 +-
>  hw/parallel.c     |   5 ++
>  hw/pc.c           |   7 ++-
>  hw/pci.c          | 110 ---
>  hw/pci_host.c     |   2 +
>  hw/pckbd.c        |   3 +
>  hw/pcnet.c        |   6 ++-
>  hw/piix_pci.c     |   1 +
>  hw/qdev.c         |  32 +++
>  hw/qdev.h         |   9
>  hw/rtl8139.c      |   4 ++
>  hw/sb16.c         |   4 ++
>  hw/scsi-bus.c     |  23 +++
>  hw/scsi-disk.c    |   2 +
>  hw/serial.c       |   1 +
>  hw/sysbus.c       |  30 ++
>  hw/sysbus.h       |   4 ++
>  hw/usb-bus.c      |  45 -
>  hw/usb-hub.c      |   3 +-
>  hw/usb-musb.c     |   2 +-
>  hw/usb-net.c      |   3 +
>  hw/usb-ohci.c     |   2 +-
>  hw/usb-uhci.c     |   2 +-
>  hw/usb.h          |   3 +-
>  hw/virtio-blk.c   |   2 +
>  hw/virtio-net.c   |   2 +
>  hw/virtio-pci.c   |   1 +
>  net.h             |   4 +-
>  qemu-config.c     |  17
>  sysemu.h          |  11 +-
>  vl.c              | 114 -
>  56 files changed, 588 insertions(+), 80 deletions(-)
>
> --
> 1.7.2.3

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
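The point in the cover letter that a device path is inherently recursive (each element need not be unique, but the whole path is) can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct node, fw_path() and the element names are hypothetical and are not the qdev structures or the get_fw_dev_path callback from the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device node: it knows only its own path element and its
 * parent. None of these names come from QEMU. */
struct node {
    const char *elem;          /* own element; may repeat under other parents */
    const struct node *parent; /* NULL for the root */
};

/* Write "/<elem>" for every node from the root down to n.
 * Returns the path length (excluding the NUL), or -1 if buf is too small. */
static int fw_path(const struct node *n, char *buf, size_t size)
{
    int used = 0;

    assert(n != NULL && buf != NULL);
    if (n->parent) {
        /* Recurse first so that the parent's elements come first. */
        used = fw_path(n->parent, buf, size);
        if (used < 0)
            return -1;
    }

    size_t room = size - (size_t)used;
    int len = snprintf(buf + used, room, "/%s", n->elem);
    if (len < 0 || (size_t)len >= room)
        return -1;
    return used + len;
}
```

Two disks that are both called "disk@0" still get distinct full paths as long as they hang off different controllers, which is the property the migration-oriented get_dev_path() cannot provide when it is called non-recursively.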
Re: trace_printk() support in trace-cmd
On Tue, 2010-11-23 at 16:37 +0200, Avi Kivity wrote:
> On 11/23/2010 04:30 PM, Steven Rostedt wrote:
>> On Tue, 2010-11-23 at 13:04 +0200, Avi Kivity wrote:
>>> On 11/16/2010 05:13 PM, Steven Rostedt wrote:
>>>> BTW, what does /debug/tracing/printk_formats show?
>>>
>>> Empty.
>>
>> So you have real trace_printk's not bprintk's?
>
> What are bprintk()s?

trace_printk() tries to be clever. If it detects that the format is
constant, instead of doing the sprintf at the tracepoint, it copies a
pointer to the format, and then copies the args to the stack (although
I'm not sure how much quicker this is). It just saves on the format in
the ring buffer.

If the format is not static, then it simply calls __trace_printk(),
which does the sprintf() and writes that output into the buffer.

That is, if the format is not a const, then we fall back to

  __trace_printk(_THIS_IP_, fmt, args);

And this is a different object.

>> I have not tested these in a while, I'll give it a try. But if your
>> printks are bprintks, then the bug is in the kernel, since that
>> printk_formats needs to show something.
>
> What I do is sprinkle trace_printk()s around my code and expect to see
> them interspersed with enabled tracepoints in 'trace-cmd report'. Is
> that not the intended behaviour?

No, that is exactly the intended behavior. But the problem is, for some
reason, the bprintk's (the default that trace_printk() uses) are not
having the format exported. Remember, only the pointer to the format is
stored in the ring buffer (and thus exported by trace-cmd). If that
format is not shown in printk_formats, then trace-cmd has no way to
determine what that trace_printk's format was.

I guess the question is, why did it not show up?

Again, the workaround is to replace your trace_printk()s with
__trace_printk(_THIS_IP_, ...), or just modify the trace_printk() macro
in include/linux/kernel.h to always use the __trace_printk() version.

-- Steve
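The difference Steve describes, a record that carries only the address of its format versus a record that was formatted at the call site, can be modelled in user space. The sketch below is a toy: struct record, log_bprint(), log_print() and decode() are invented for illustration and do not mirror the kernel's ring buffer layout or the trace-cmd parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the two record kinds; not the kernel's actual layout. */
enum kind { REC_BPRINT, REC_PRINT };

struct record {
    enum kind kind;
    const char *fmt;  /* REC_BPRINT: only the format's address is kept */
    long arg;         /* REC_BPRINT: raw argument (one, for simplicity) */
    char text[64];    /* REC_PRINT: text formatted at the call site */
};

/* Constant format: store pointer and raw argument, format later. */
static struct record log_bprint(const char *fmt, long arg)
{
    struct record r = { .kind = REC_BPRINT, .fmt = fmt, .arg = arg };
    return r;
}

/* Fallback path: format immediately and store the resulting text. */
static struct record log_print(const char *fmt, long arg)
{
    struct record r = { .kind = REC_PRINT };
    snprintf(r.text, sizeof(r.text), fmt, arg);
    return r;
}

/* A reader can decode a REC_BPRINT record only if the format address
 * appears in the exported table (the stand-in for printk_formats). */
static int decode(const struct record *r, const char *const *exported,
                  size_t count, char *out, size_t size)
{
    assert(r != NULL && out != NULL);
    if (r->kind == REC_PRINT) {
        snprintf(out, size, "%s", r->text);
        return 0;
    }
    for (size_t i = 0; i < count; i++) {
        if (exported[i] == r->fmt) {
            snprintf(out, size, r->fmt, r->arg);
            return 0;
        }
    }
    return -1; /* format was never exported: nothing to print */
}
```

With an empty export table the pointer-only record cannot be decoded while the pre-formatted one can, which matches the symptom in the thread and explains why falling back to the formatting path works around it.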
Re: Performance test result between per-vhost kthread disable and enable
On 11/23/2010 5:41 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 23, 2010 at 09:23:41PM +0800, lidong chen wrote:
>>> At this point, I'd suggest testing vhost-net on the upstream kernel,
>>> not on rhel kernels. The change that introduced per-device threads is:
>>> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>>
>> i will try this tomorrow.
>>
>>> Is CONFIG_SCHED_DEBUG set?
>>
>> yes. CONFIG_SCHED_DEBUG=y.
>
> Disable it. Either debug scheduler or perf-test it :)

Another debug option to disable is CONFIG_WORKQUEUE_TRACER, if it is
set, when using old rhel6 kernels.

-Sridhar

>> 2010/11/23 Michael S. Tsirkin <m...@redhat.com>:
>>> On Tue, Nov 23, 2010 at 10:13:43AM +0800, lidong chen wrote:
>>>> I tested the performance with per-vhost kthreads disabled and
>>>> enabled.
>>>>
>>>> Test method:
>>>> Send the same traffic load with per-vhost kthreads disabled and
>>>> enabled, and compare the cpu rate of the host os.
>>>> I run five vms on kvm, each of them has five nics.
>>>> The vhost version with per-vhost kthreads disabled is rhel6 beta 2
>>>> (2.6.32.60).
>>>> The vhost version with per-vhost kthreads enabled is rhel6
>>>> (2.6.32-71).
>>>
>>> At this point, I'd suggest testing vhost-net on the upstream kernel,
>>> not on rhel kernels. The change that introduced per-device threads is:
>>> c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>>>
>>>> Test result:
>>>> with per-vhost kthreads disabled, the cpu rate of the host os is 110%.
>>>> with per-vhost kthreads enabled, the cpu rate of the host os is 130%.
>>>
>>> Is CONFIG_SCHED_DEBUG set? We are stressing the scheduler a lot with
>>> vhost-net.
>>>
>>>> In 2.6.32.60, the whole system has only one kthread.
>>>> [r...@rhel6-kvm1 ~]# ps -ef | grep vhost
>>>> root       973     2  0 Nov22 ?        00:00:00 [vhost]
>>>>
>>>> In 2.6.32.71, the whole system has 25 kthreads.
>>>> [r...@kvm-4slot ~]# ps -ef | grep vhost-
>>>> root     12896     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12897     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12898     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12899     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     12900     2  0 10:26 ?        00:00:00 [vhost-12842]
>>>> root     13022     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13023     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13024     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13025     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13026     2  0 10:26 ?        00:00:00 [vhost-12981]
>>>> root     13146     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13147     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13148     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13149     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> root     13150     2  0 10:26 ?        00:00:00 [vhost-13088]
>>>> ...
>>>>
>>>> Code difference:
>>>> In 2.6.32.60, function vhost_init creates the kthread for vhost:
>>>> vhost_workqueue = create_singlethread_workqueue("vhost");
>>>>
>>>> In 2.6.32.71, function vhost_dev_set_owner creates the kthread for
>>>> each nic interface:
>>>> dev->wq = create_singlethread_workqueue(vhost_name);
>>>>
>>>> Conclusion:
>>>> With per-vhost kthreads enabled, the system can reach more
>>>> throughput, but handling the same traffic load with per-vhost
>>>> kthreads enabled wastes more cpu resource.
>>>>
>>>> In my application scenario, the cpu resource is more important, and
>>>> one kthread to deal with the traffic load is enough.
>>>>
>>>> So i think we should add a param to control this.
>>>> For a CPU-bound system, this param disables per-vhost kthreads.
>>>> For an I/O-bound system, this param enables per-vhost kthreads.
>>>> The default value of this param is enabled.
>>>>
>>>> If my opinion is right, i will give a patch for this.
>>>
>>> Let's try to figure out what the issue is, first.
>>>
>>> --
>>> MST
KVM call minutes for Nov 23
qcow2 performance roadmap
- What can be done to achieve near-raw image format performance?
- some discussion points from Kevin on list
  http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg02126.html
- please follow up on the list
- some perf numbers (latest upstream qcow2 compared with qed)
- qed is fully async, added unconditional flush to model qcow2
- http://wiki.qemu.org/Qcow2/PerformanceRoadmap
- qcow2 not scaling as well
- metadata handling still quite sync
- sequential reads not scaling at all (a
- only serialization point is two accesses to same block and need to
  allocate
- template based backing file is common (esp. in cloud)
- perf data suggests that data/table format dictates performance ceiling
- barriers off on underlying fs, cache=writethrough
- raw backing file (sparse) grows with basic tools like cp
- suggestion: qed == qcow2 v3
- wouldn't support encryption and compression (Kevin won't do this)

usb-ccid
- concern about external library implementation
- hard to add device features, enhancements, live migration protocol
  changes
- external library
- will resend patch to

vcpu hard limits
- will continue discussion on list

0.14 (release date, bug day, -rc planning, etc)
- aiming for dec 15th
- will send note out after call with release schedule

0.13.x
- will connect with jforbes regarding -stable maintenance

gPXE vs. iPXE
- ipxe is new fork
- ipxe looking more active (including original gpxe developers)
- which is a better choice?
- iPXE more active, gPXE stalled
- some concern about where the community sits (gPXE has irc, bug
  reports, etc)
- some concern about boot delay with iPXE
- qemu not updating roms that frequently, next time we need to update,
  can evaluate
- syslinux still using gPXE
Re: Mask bit support's API
On Tue, Nov 23, 2010 at 05:24:44PM +0200, Gleb Natapov wrote:
> On Tue, Nov 23, 2010 at 05:11:19PM +0200, Michael S. Tsirkin wrote:
>> On Tue, Nov 23, 2010 at 04:06:20PM +0200, Avi Kivity wrote:
>>>>> So instead of
>>>>>
>>>>> - guest reads/writes msix
>>>>> - kvm filters mmio, implements some, passes others to userspace
>>>>>
>>>>> we have
>>>>>
>>>>> - guest reads/writes msix
>>>>> - kvm implements all
>>>>> - some writes generate an additional notification to userspace
>>>>
>>>> I suppose we don't need to generate notification to userspace?
>>>> Because every read/write is handled by kernel, and userspace just
>>>> needs an interface to the kernel to get/set the entry - and well,
>>>> does userspace need to do it when kernel can handle all of them?
>>>> Maybe not...
>>>
>>> We could have the kernel handle addr/data writes by setting up an
>>> internal interrupt routing. A disadvantage is that more work is
>>> needed if we emulate interrupt remapping in qemu.
>>
>> As an alternative, interrupt remapping will need some API rework,
>> right? Existing APIs only pass address/data for msi.
>
> IIRC interrupt remapping works with address/data too. It just
> interprets it differently from apic.
>
> --
> 			Gleb.

Yes. So since our APIs use address/data, this is an argument for doing
the remapping in kernel.
Re: [PATCHv6 00/16] boot order specification
On 11/23/2010 09:31 AM, Gleb Natapov wrote:
> Anthony, Blue
>
> No comments on this patch series for almost a week. Can it be applied?

Does that mean everyone's happy or have folks not gotten around to
review it?

IOW, last call if you have objections :-)

Regards,

Anthony Liguori

> On Wed, Nov 17, 2010 at 06:43:47PM +0200, Gleb Natapov wrote:
>> I am using open firmware naming scheme to specify device path names.
>> In this version: added SCSI bus support. Pass boot order list as file
>> to firmware.
>> [...]
>> 56 files changed, 588 insertions(+), 80 deletions(-)
[PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT. Instead of
teaching them to respond to these signals (which cannot be trapped), use
SIGUSR1 to approximate the behavior of SIGSTOP/SIGCONT.

The purpose of this is to implement CPU hard limits using an external
tool that watches the CPU consumption and stops the VCPU as appropriate.

This provides a more elegant solution in that it allows the VCPU thread
to release qemu_mutex before going to sleep.

This current implementation uses a single signal. I think this is too
racy in the long term so I think we should introduce a second signal.
If two signals get coalesced into one, it could confuse the monitoring
tool into giving the VCPU the inverse of its entitlement.

It might be better to simply move this logic entirely into QEMU to make
this more robust--the question is whether we think this is a good long
term feature to carry in QEMU?

Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>

diff --git a/cpu-defs.h b/cpu-defs.h
index 51533c6..6434dca 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -220,6 +220,7 @@ struct KVMCPUState {
     const char *cpu_model_str;                                          \
     struct KVMState *kvm_state;                                         \
     struct kvm_run *kvm_run;                                            \
+    int sigusr1_fd;                                                     \
     int kvm_fd;                                                         \
     int kvm_vcpu_dirty;                                                 \
     struct KVMCPUState kvm_cpu_state;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 471306b..354109f 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1351,6 +1351,29 @@ static void pause_all_threads(void)
     }
 }

+static void vcpu_stop(CPUState *env)
+{
+    if (env != cpu_single_env) {
+        env->stop = 1;
+        pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
+    } else {
+        env->stop = 0;
+        env->stopped = 1;
+        cpu_exit(env);
+    }
+
+    while (!env->stopped) {
+        qemu_cond_wait(&qemu_pause_cond);
+    }
+}
+
+static void vcpu_start(CPUState *env)
+{
+    env->stop = 0;
+    env->stopped = 0;
+    pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
+}
+
 static void resume_all_threads(void)
 {
     CPUState *penv = first_cpu;
@@ -1426,6 +1449,37 @@ static int kvm_main_loop_cpu(CPUState *env)
     return 0;
 }

+static __thread int sigusr1_wfd;
+
+static void on_sigusr1(int signo)
+{
+    char ch = 0;
+    if (write(sigusr1_wfd, &ch, 1) < 0) {
+        /* who cares */
+    }
+}
+
+static void sigusr1_read(void *opaque)
+{
+    CPUState *env = opaque;
+    ssize_t len;
+    int caught_signal = 0;
+
+    do {
+        char buffer[256];
+        len = read(env->sigusr1_fd, buffer, sizeof(buffer));
+        caught_signal = 1;
+    } while (len > 0);
+
+    if (caught_signal) {
+        if (env->stopped) {
+            vcpu_start(env);
+        } else {
+            vcpu_stop(env);
+        }
+    }
+}
+
 static void *ap_main_loop(void *_env)
 {
     CPUState *env = _env;
@@ -1433,10 +1487,12 @@ static void *ap_main_loop(void *_env)
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
     struct ioperm_data *data = NULL;
 #endif
+    int fds[2];

     current_env = env;
     env->thread_id = kvm_get_thread_id();
     sigfillset(&signals);
+    sigdelset(&signals, SIGUSR1);
     sigprocmask(SIG_BLOCK, &signals, NULL);

 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
@@ -1451,6 +1507,18 @@ static void *ap_main_loop(void *_env)
     kvm_create_vcpu(env, env->cpu_index);
     setup_kernel_sigmask(env);

+    if (pipe(fds) == -1) {
+        /* do nothing */
+    }
+
+    fcntl(fds[0], F_SETFL, O_NONBLOCK);
+    fcntl(fds[1], F_SETFL, O_NONBLOCK);
+
+    env->sigusr1_fd = fds[0];
+    sigusr1_wfd = fds[1];
+
+    qemu_set_fd_handler2(fds[0], NULL, sigusr1_read, NULL, env);
+
     /* signal VCPU creation */
     current_env->created = 1;
     pthread_cond_signal(&qemu_vcpu_cond);
@@ -1463,6 +1531,8 @@ static void *ap_main_loop(void *_env)
     /* re-initialize cpu_single_env after re-acquiring qemu_mutex */
     cpu_single_env = env;

+    signal(SIGUSR1, on_sigusr1);
+
     kvm_main_loop_cpu(env);
     return NULL;
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 0f3fb50..3addc77 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -783,6 +783,7 @@ struct KVMState {
     int irqchip_in_kernel;
     int pit_in_kernel;
     int xsave, xcrs;
+    int sigusr2_fd;
     struct kvm_context kvm_context;
 };
--
1.7.0.4
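The signal handling in this patch follows the self-pipe pattern: the handler only writes one byte to a non-blocking pipe, and ordinary code drains the pipe later. A stand-alone sketch of that pattern is below. It is not the patch itself: wake_init() and wake_drain() are hypothetical helpers, a plain global replaces the per-thread write descriptor, and nothing here touches QEMU's main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int wake_rfd = -1; /* read end, polled by normal code */
static int wake_wfd = -1; /* write end, used only by the handler */

/* Handler: do the minimum, one async-signal-safe write(). */
static void on_sigusr1(int signo)
{
    char ch = 0;
    int saved_errno = errno;

    (void)signo;
    if (write(wake_wfd, &ch, 1) < 0) {
        /* pipe full: a wakeup is already pending, nothing is lost */
    }
    errno = saved_errno;
}

static int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe and install the handler. Returns 0 on success. */
static int wake_init(void)
{
    int fds[2];
    struct sigaction sa;

    if (pipe(fds) < 0)
        return -1;
    if (set_nonblock(fds[0]) < 0 || set_nonblock(fds[1]) < 0)
        return -1;
    wake_rfd = fds[0];
    wake_wfd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Drain the pipe; returns how many handler invocations were recorded. */
static int wake_drain(void)
{
    char buf[256];
    ssize_t len;
    int total = 0;

    while ((len = read(wake_rfd, buf, sizeof(buf))) > 0)
        total += (int)len;
    return total;
}
```

Counting the bytes read lets the caller tell an empty pipe from a delivered signal. It does not remove the coalescing concern raised in the commit message: signals that arrive while SIGUSR1 is blocked or already pending still collapse into a single handler call.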
Re: [PATCH trace-cmd 0/3] kvm plugin updates
On Tue, 2010-11-23 at 12:58 +0200, Avi Kivity wrote:
> Currently the kvm plugin only decodes vmx exit reasons; the first patch
> in this series adds support for the svm instruction set.
>
> Second patch fixes a typo.
>
> A couple of fields were added to the kvm_exit tracepoint; the third
> patch prints them out.
>
> Avi Kivity (3):
>   kvm: parse svm exit reason
>   kvm: fix typo UNKOWN
>   kvm: display the new kvm_exit info1 and info2 fields, if available
>
>  plugin_kvm.c | 121 ++
>  1 files changed, 113 insertions(+), 8 deletions(-)

Applied, Thanks Avi!

-- Steve
Re: [PATCHv6 00/16] boot order specification
On Tue, Nov 23, 2010 at 4:12 PM, Anthony Liguori
<aligu...@linux.vnet.ibm.com> wrote:
> On 11/23/2010 09:31 AM, Gleb Natapov wrote:
>> Anthony, Blue
>>
>> No comments on this patch series for almost a week. Can it be applied?
>
> Does that mean everyone's happy or have folks not gotten around to
> review it?
>
> IOW, last call if you have objections :-)

I'm happy with the patch set in general, I've just been very busy IRL.

More experiments with Sparc32 device paths would not hurt, but bugs (if
any) can be fixed later.
Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
On Tue, Nov 23, 2010 at 4:49 PM, Anthony Liguori <aligu...@us.ibm.com> wrote:
> qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT. Instead of
> teaching them to respond to these signals (which cannot be trapped),
> use SIGUSR1 to approximate the behavior of SIGSTOP/SIGCONT.
>
> The purpose of this is to implement CPU hard limits using an external
> tool that watches the CPU consumption and stops the VCPU as appropriate.
>
> This provides a more elegant solution in that it allows the VCPU thread
> to release qemu_mutex before going to sleep.
>
> This current implementation uses a single signal. I think this is too
> racy in the long term so I think we should introduce a second signal.
> If two signals get coalesced into one, it could confuse the monitoring
> tool into giving the VCPU the inverse of its entitlement.
>
> It might be better to simply move this logic entirely into QEMU to make
> this more robust--the question is whether we think this is a good long
> term feature to carry in QEMU?

> +static __thread int sigusr1_wfd;

While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
series, thread local storage is still not supported:

$ cat thread.c
static __thread int sigusr1_wfd;
$ gcc thread.c -c
thread.c:1: error: thread-local storage not supported for this target
$ gcc -v
Reading specs from /usr/lib/gcc-lib/sparc64-unknown-openbsd4.8/4.2.1/specs
Target: sparc64-unknown-openbsd4.8
Configured with: OpenBSD/sparc64 system compiler
Thread model: posix
gcc version 4.2.1 20070719
Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
On 11/23/2010 01:35 PM, Blue Swirl wrote:
> On Tue, Nov 23, 2010 at 4:49 PM, Anthony Liguori <aligu...@us.ibm.com> wrote:
>> qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT. Instead of
>> teaching them to respond to these signals (which cannot be trapped),
>> use SIGUSR1 to approximate the behavior of SIGSTOP/SIGCONT.
>>
>> The purpose of this is to implement CPU hard limits using an external
>> tool that watches the CPU consumption and stops the VCPU as
>> appropriate.
>>
>> This provides a more elegant solution in that it allows the VCPU
>> thread to release qemu_mutex before going to sleep.
>>
>> This current implementation uses a single signal. I think this is too
>> racy in the long term so I think we should introduce a second signal.
>> If two signals get coalesced into one, it could confuse the monitoring
>> tool into giving the VCPU the inverse of its entitlement.
>>
>> It might be better to simply move this logic entirely into QEMU to
>> make this more robust--the question is whether we think this is a good
>> long term feature to carry in QEMU?
>
>> +static __thread int sigusr1_wfd;
>
> While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
> series, thread local storage is still not supported:

Hrm, is there a portable way to do this (distinguish a signal on a
particular thread)?

Regards,

Anthony Liguori

> $ cat thread.c
> static __thread int sigusr1_wfd;
> $ gcc thread.c -c
> thread.c:1: error: thread-local storage not supported for this target
> $ gcc -v
> Reading specs from /usr/lib/gcc-lib/sparc64-unknown-openbsd4.8/4.2.1/specs
> Target: sparc64-unknown-openbsd4.8
> Configured with: OpenBSD/sparc64 system compiler
> Thread model: posix
> gcc version 4.2.1 20070719
Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
On 11/23/2010 10:46 PM, Anthony Liguori wrote:
>>> +static __thread int sigusr1_wfd;
>>
>> While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
>> series, thread local storage is still not supported:
>
> Hrm, is there a portable way to do this (distinguish a signal on a
> particular thread)?

You can use pthread_getspecific/pthread_setspecific.

Paolo
Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
On 11/23/2010 05:43 PM, Paolo Bonzini wrote:
> On 11/23/2010 10:46 PM, Anthony Liguori wrote:
>>>> +static __thread int sigusr1_wfd;
>>>
>>> While OpenBSD finally updated the default compiler to 4.2.1 from 3.x
>>> series, thread local storage is still not supported:
>>
>> Hrm, is there a portable way to do this (distinguish a signal on a
>> particular thread)?
>
> You can use pthread_getspecific/pthread_setspecific.

Is it signal safe?

BTW, this is all only theoretical. This is in the KVM io thread code
which is already highly unportable.

Regards,

Anthony Liguori

> Paolo
Re: [PATCHv6 00/16] boot order specification
Hi Gleb,

On Tue, Nov 23, 2010 at 05:31:41PM +0200, Gleb Natapov wrote:
> Anthony, Blue
>
> No comments on this patch series for almost a week. Can it be applied?

My apologies - I haven't had time to review.

> On Wed, Nov 17, 2010 at 06:43:47PM +0200, Gleb Natapov wrote:
>> I am using open firmware naming scheme to specify device path names.
>> In this version: added SCSI bus support. Pass boot order list as file
>> to firmware.
>>
>> Names look like this on pci machine:
> [...]
>> /p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0
>> /r...@genroms/linuxboot.bin

What's the plan for handling optionroms (ie, BCVs and BEVs)? This is
an area which is a bit tricky - mainly due to legacy BIOS crud.

An option rom can register either a BEV (eg, gpxe on a network card),
or it can register one or more BCVs (eg, a scsi card registering two
drives). How do we say boot from the optionrom on the second nic
card? If you have a scsi card, how do we communicate that its second
drive should be the c: drive?

The ugly thing about BCVs is that they are not necessarily registered
in the rom for the device that controls it. So, if you have two of
the same type of scsi card, each with two drives, it's possible for
the optionrom to put all four drives in the rom of the first scsi
card.

>> Gleb Natapov (16):
> [...]
>>   Pass boot device list to firmware.

It looks like you went with a newline separated list. Thanks.

-Kevin
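On the firmware side, a newline separated list means a device's boot priority is simply the position of its path in the file. The sketch below shows one way to look that up. It assumes nothing about the real file beyond "one path per line"; boot_priority() and the paths in the usage are hypothetical and do not come from SeaBIOS or the patch series.

```c
#include <assert.h>
#include <string.h>

/* Return the 0-based position of `path` in a newline separated `list`,
 * or -1 if it is not listed. Only whole-line matches count. */
static int boot_priority(const char *list, const char *path)
{
    size_t want;
    int index = 0;

    assert(list != NULL && path != NULL);
    want = strlen(path);

    while (*list) {
        size_t len = strcspn(list, "\n"); /* length of the current line */

        if (len == want && memcmp(list, path, len) == 0)
            return index;
        list += len;
        if (*list == '\n')
            list++; /* step over the separator */
        index++;
    }
    return -1;
}
```

A lookup like this covers devices the firmware can name by path. It does not answer the BCV question raised above, where several drives may be registered by the option rom of a different card than the one that controls them.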
Re: Mask bit support's API
On Tuesday 23 November 2010 22:06:20 Avi Kivity wrote:
> On 11/23/2010 03:57 PM, Yang, Sheng wrote:
>>>> Yeah, but won't be included in this patchset.
>>>
>>> What API changes are needed? I'd like to see the complete API.
>>
>> I am not sure about it. But I suppose the structure should be the
>> same? In fact it's pretty hard for me to imagine what's needed for
>> virtio in the future, especially as there is no such code now. I
>> really prefer to deal with assigned device and virtio separately,
>> which would make the work much easier. But it seems you won't agree
>> on that.
>
> First, I don't really see why the two cases are different (but I don't
> do a lot in this space). Surely between you and Michael, you have all
> the information?
>
> Second, my worry is a huge number of ABI variants that come from
> incrementally adding features. I want to implement bigger chunks of
> functionality. So I'd like to see all potential users addressed, at
> least from the ABI point of view if not the implementation.
>
>>> The API needs to be compatible with the pending bit, even if we
>>> don't implement it now. I want to reduce the rate of API changes.
>>
>> This can be implemented by this API, just adding a flag for it. And I
>> would still take this into consideration in the next API proposal.
>>
>>> Shouldn't kvm also service reads from the pending bitmask?
>>
>> Of course KVM should service reading from the pending bitmask. For
>> assigned devices, it's the kernel that would set the pending bit; but
>> I am not sure for virtio. This interface is GET_ENTRY, so reading is
>> fine with it.
>>
>>> So instead of
>>>
>>> - guest reads/writes msix
>>> - kvm filters mmio, implements some, passes others to userspace
>>>
>>> we have
>>>
>>> - guest reads/writes msix
>>> - kvm implements all
>>> - some writes generate an additional notification to userspace
>>
>> I suppose we don't need to generate notification to userspace?
>> Because every read/write is handled by kernel, and userspace just
>> needs an interface to the kernel to get/set the entry - and well,
>> does userspace need to do it when kernel can handle all of them?
>> Maybe not...
>
> We could have the kernel handle addr/data writes by setting up an
> internal interrupt routing. A disadvantage is that more work is
> needed if we emulate interrupt remapping in qemu.

In fact modifying irq routing in the kernel is also the thing I want to
avoid.

So, the flow would be:

- kernel gets MMIO write, records it in its own MSI table
- KVM exits to QEmu, with one specific exit reason
- QEmu knows it has to sync the MSI table, then reads the entries from
  kernel
- QEmu finds it's a write, so it needs to reprogram the irq routing
  table using the entries above
- done

But wait, why should qemu read entries from kernel? With the default
exit we already have the information about which entry to modify and
what to write, so we can use them directly. This way, we also don't
need a specific exit reason - just exiting to qemu in the normal way is
fine.

Then it would be:

- kernel gets MMIO write, records it in its own MSI table
- KVM exits to QEmu, indicating MMIO exit
- QEmu finds it's a write; it updates its own MSI table (may need to
  query the mask bit from kernel), and reprograms the irq routing table
  using the entries above
- done

Then why should the kernel keep its own MSI table? I think the only
reason is that we can speed up reading that way - but the reading we
want to speed up is mostly on an enabled entry (the first entry), which
is already in the IRQ routing table...

And for enabled/disabled entries, you can see it like this: the entries
inside the routing table we consider enabled; otherwise they are
disabled. Then you don't need to be bothered by pci_enable_msix().

So our strategy for accelerating reads can be:

If the entry is contained in the irq routing table, then use it;
otherwise let qemu deal with it. Because it's QEmu that owns the irq
routing table, the synchronization is guaranteed. We don't need the MSI
table in the kernel then.

And for writing, we just want to cover all of the mask bit, but none of
the others.

I think the concept here is more acceptable?

The issue here is that the MSI table and the irq routing table hold
duplicate information on some entries. My initial proposal is to use
the irq routing table in the kernel, so we don't need to duplicate
information.

--
regards
Yang, Sheng
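The split proposed above (the kernel covers mask-bit writes and reads of entries already present in the irq routing table, everything else goes to qemu) can be written down as a small decision function. The field offsets follow the MSI-X table entry layout in the PCI specification (16 bytes per entry, vector control at offset 12, mask in bit 0). The function route_access() and its entry_routed parameter are hypothetical and only restate the policy from this message; they are not a KVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSI-X table entry layout (PCI specification). */
enum {
    MSIX_ENTRY_SIZE   = 16,
    MSIX_ADDR_LO      = 0,
    MSIX_ADDR_HI      = 4,
    MSIX_DATA         = 8,
    MSIX_VECTOR_CTRL  = 12,
    MSIX_CTRL_MASKBIT = 1  /* bit 0 of the vector control dword */
};

enum handler { IN_KERNEL, IN_USERSPACE };

/* Hypothetical policy restating the proposal:
 * - writes: only the vector control dword (mask bit) stays in kernel;
 * - reads: served in kernel only when the entry is already in the irq
 *   routing table (entry_routed), otherwise left to userspace. */
static enum handler route_access(uint32_t table_off, bool is_write,
                                 bool entry_routed)
{
    uint32_t field = table_off % MSIX_ENTRY_SIZE;

    if (is_write)
        return field == MSIX_VECTOR_CTRL ? IN_KERNEL : IN_USERSPACE;
    return entry_routed ? IN_KERNEL : IN_USERSPACE;
}
```

Under this policy an address or data write always reaches userspace, which is what lets qemu remain the single owner of the irq routing table.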
Re: [Qemu-devel] [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
On 11/24/2010 02:15 AM, Anthony Liguori wrote:
> Is it signal safe?

Yes, at heart it is just a somewhat more expensive access to
pthread_self()->some_array[key].

> BTW, this is all only theoretical. This is in the KVM io thread code
> which is already highly unportable.

True, and newer versions of GCC emulate __thread even on Windows.

Paolo
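For reference, the pthread_getspecific/pthread_setspecific replacement for a __thread variable discussed in this thread could look roughly like the sketch below. The helper names set_thread_wfd() and get_thread_wfd() are made up for illustration. One caveat on the signal-safety question: POSIX does not list pthread_getspecific() among the async-signal-safe functions, so calling it from a handler relies on the implementation behaving as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_key_t wfd_key;
static pthread_once_t wfd_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, whichever thread gets here first. */
static void wfd_key_create(void)
{
    int rc = pthread_key_create(&wfd_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Store this thread's descriptor in its own slot. */
static int set_thread_wfd(int fd)
{
    pthread_once(&wfd_once, wfd_key_create);
    return pthread_setspecific(wfd_key, (void *)(intptr_t)fd);
}

/* Read this thread's descriptor; a thread that never stored one
 * reads 0, because an unset slot holds NULL. */
static int get_thread_wfd(void)
{
    pthread_once(&wfd_once, wfd_key_create);
    return (int)(intptr_t)pthread_getspecific(wfd_key);
}

/* Example thread body: store a value, then report what it reads back. */
static void *worker(void *arg)
{
    set_thread_wfd(*(int *)arg);
    return (void *)(intptr_t)get_thread_wfd();
}
```

Each thread sees only the value it stored itself, which is the property the patch gets from __thread. Since 0 is a valid descriptor, real code would probably store fd + 1 or a pointer to a per-thread structure to tell "unset" from descriptor 0.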