Re: [PATCH/RFC] kvm: fix refcounting race release vs. module unload
>> --- kvm.orig/virt/kvm/kvm_main.c
>> +++ kvm/virt/kvm/kvm_main.c
>> @@ -1303,7 +1303,7 @@ static int kvm_vcpu_release(struct inode
>>  	return 0;
>>  }
>>
>> -static const struct file_operations kvm_vcpu_fops = {
>> +static struct file_operations kvm_vcpu_fops = {
>>  	.release        = kvm_vcpu_release,
>>  	.unlocked_ioctl = kvm_vcpu_ioctl,
>>  	.compat_ioctl   = kvm_vcpu_ioctl,
>> @@ -1318,6 +1318,7 @@ static int create_vcpu_fd(struct kvm_vcp
>>  	int fd = anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, 0);
>>  	if (fd < 0)
>>  		kvm_put_kvm(vcpu->kvm);
>> +	__module_get(kvm_vcpu_fops.owner);
>>  	return fd;
>>  }
>> @@ -2061,6 +2062,7 @@ int kvm_init(void *opaque, unsigned int
>>  	}
>>
>>  	kvm_chardev_ops.owner = module;
>> +	kvm_vcpu_fops.owner = module;
>>
>>  	r = misc_register(&kvm_dev);
>>  	if (r) {
>
> Messing with module counts is slightly ugly. How about having a vm fd
> fget() the /dev/kvm fd() instead?

I personally find fget (and fput) slightly more ugly than handling the
module reference count. Especially if the problem is module unloading...
the module refcount looks so natural.

I am also a bit worried by fget/fput, since we would call fput in the
release function - which is part of the module. Wouldn't that open
another very small race?

In addition, we would need variables containing the fd and the file
pointer for /dev/kvm, since fget/fput need some parameters, no?
(Is there an easy way to get the fd from the struct file *filp?
Searching current->files seems to be the only method I know.)

To me, the fget approach looks more complicated and less safe.

Christian
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
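
The refcount argument in the mail above can be followed with a small userspace toy model. This is only a sketch and not kernel code: `toy_module`, `toy_fd_create`, `toy_fd_release` and `toy_module_unload` are invented names that stand in for `struct module`, the fd creation path with `__module_get()`, the final release with `module_put()`, and `rmmod`. The point it illustrates is that every open vcpu fd pins the owner, so unload is refused until the last fd is gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct module: only the pin count matters. */
struct toy_module {
    int refcount;
    bool loaded;
};

/* Models __module_get(): the caller already knows the module is alive. */
static void toy_module_get(struct toy_module *m)
{
    assert(m->loaded);
    m->refcount++;
}

/* Models module_put(): drops one pin. */
static void toy_module_put(struct toy_module *m)
{
    assert(m->refcount > 0);
    m->refcount--;
}

/* Models creating a vcpu fd: each open fd pins the owning module. */
static void toy_fd_create(struct toy_module *owner)
{
    toy_module_get(owner);
}

/* Models the last close of a vcpu fd: the pin is dropped again. */
static void toy_fd_release(struct toy_module *owner)
{
    toy_module_put(owner);
}

/* Models module unload: refused while any fd still pins the module. */
static bool toy_module_unload(struct toy_module *m)
{
    if (m->refcount != 0)
        return false;
    m->loaded = false;
    return true;
}
```

In the real kernel the reference taken through `file_operations.owner` is dropped by the VFS after the `release` callback has returned, which is what keeps the release path itself out of the race; the toy model does not try to reproduce that ordering.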
Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI
Avi Kivity wrote:
> Jan Kiszka wrote:
>>> But I think I see a bigger issue - if we inject a regular interrupt
>>> while another is pending, then we will encounter this problem. Looks
>>> like we have to enable the interrupt window after injecting an
>>> interrupt if there are still pending interrupts.
>>
>> Yeah, probably. I'm just wondering now if we can set
>> exit-on-interrupt-window while the vcpu state is interruptible (ie.
>> _before_ the injection). There is some entry check like this for NMIs,
>> but maybe not for interrupts. Need to check.
>
> Turns out it's not necessary, since the guest eoi will cause an exit and
> allow the code to request an interrupt window.

But you added explicit handling now nevertheless?

> I've added an apic test program so we can track these issues
> (user/test/x86/apic.c).

That's good.

BTW, your NMI race fix is still lacking support for the -no-kvm-irqchip
case. Will post an according patch later today.

Jan

--
Siemens AG, Corporate Technology, CT SE 2 ES-OS
Corporate Competence Center Embedded Linux
[PATCH] kvm-testsuite: Fix halt callback
Change halt callback in testsuite to conform with latest refactorings.

Signed-off-by: Jan Kiszka [EMAIL PROTECTED]
---
 user/main.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/user/main.c b/user/main.c
index a00b073..55639b5 100644
--- a/user/main.c
+++ b/user/main.c
@@ -304,13 +304,12 @@ static int test_debug(void *opaque, void *vcpu)
 	return 0;
 }

-static int test_halt(void *opaque, void *_vcpu)
+static int test_halt(void *opaque, int vcpu)
 {
-	struct vcpu_info *vcpu = _vcpu;
 	int n;

 	sigwait(&ipi_sigmask, &n);
-	kvm_inject_irq(kvm, vcpu->id, apic_ipi_vector);
+	kvm_inject_irq(kvm, vcpus[vcpu].id, apic_ipi_vector);
 	return 0;
 }

--
Siemens AG, Corporate Technology, CT SE 2 ES-OS
Corporate Competence Center Embedded Linux
[PATCH] KVM: x86: Cleanup user space NMI injection
There is no point in doing the ready_for_nmi_injection/
request_nmi_window dance with user space. First, we don't do this for
in-kernel irqchip anyway, while the code path is the same as for user
space irqchip mode. And second, there is nothing to lose if a pending
NMI is overwritten by another one (in contrast to IRQs where we have to
save the number). Actually, there is even the risk of raising spurious
NMIs this way because the reason for the held-back NMI might already be
handled while processing the first one.

[ Avi, how to deal with the fields in struct kvm_run and the exit
  reason? They are not mainline yet, neither in linux nor in qemu, and I
  don't think they should ever be pushed in their current form. Simply
  revert them? ]

Signed-off-by: Jan Kiszka [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c  |   24 ++--
 arch/x86/kvm/x86.c  |   34 --
 include/linux/kvm.h |    6 +++---
 3 files changed, 13 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 775a140..6fbff55 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2498,15 +2498,13 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	}
 	if (vcpu->arch.nmi_injected) {
 		vmx_inject_nmi(vcpu);
-		if (vcpu->arch.nmi_pending || kvm_run->request_nmi_window)
+		if (vcpu->arch.nmi_pending)
 			enable_nmi_window(vcpu);
 		else if (vcpu->arch.irq_summary
 			 || kvm_run->request_interrupt_window)
 			enable_irq_window(vcpu);
 		return;
 	}
-	if (!vcpu->arch.nmi_window_open || kvm_run->request_nmi_window)
-		enable_nmi_window(vcpu);

 	if (vcpu->arch.interrupt_window_open) {
 		if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
@@ -3040,14 +3038,6 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
 	++vcpu->stat.nmi_window_exits;

-	/*
-	 * If the user space waits to inject a NMI, exit as soon as possible
-	 */
-	if (kvm_run->request_nmi_window && !vcpu->arch.nmi_pending) {
-		kvm_run->exit_reason = KVM_EXIT_NMI_WINDOW_OPEN;
-		return 0;
-	}
-
 	return 1;
 }

@@ -3162,7 +3152,7 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			vmx->soft_vnmi_blocked = 0;
 			vcpu->arch.nmi_window_open = 1;
 		} else if (vmx->vnmi_blocked_time > 1000000000LL &&
-		    (kvm_run->request_nmi_window || vcpu->arch.nmi_pending)) {
+		    vcpu->arch.nmi_pending) {
 			/*
 			 * This CPU don't support us in finding the end of an
 			 * NMI-blocked window if the guest runs with IRQs
@@ -3175,16 +3165,6 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			vmx->soft_vnmi_blocked = 0;
 			vmx->vcpu.arch.nmi_window_open = 1;
 		}
-
-		/*
-		 * If the user space waits to inject an NNI, exit ASAP
-		 */
-		if (vcpu->arch.nmi_window_open && kvm_run->request_nmi_window
-		    && !vcpu->arch.nmi_pending) {
-			kvm_run->exit_reason = KVM_EXIT_NMI_WINDOW_OPEN;
-			++vcpu->stat.nmi_window_exits;
-			return 0;
-		}
 	}

 	if (exit_reason < kvm_vmx_max_exit_handlers
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7a2aeba..a5da129 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2885,37 +2885,18 @@ static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
 		(kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF));
 }

-/*
- * Check if userspace requested a NMI window, and that the NMI window
- * is open.
- *
- * No need to exit to userspace if we already have a NMI queued.
- */
-static int dm_request_for_nmi_injection(struct kvm_vcpu *vcpu,
-					struct kvm_run *kvm_run)
-{
-	return (!vcpu->arch.nmi_pending &&
-		kvm_run->request_nmi_window &&
-		vcpu->arch.nmi_window_open);
-}
-
 static void post_kvm_run_save(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	kvm_run->if_flag = (kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
 	kvm_run->cr8 = kvm_get_cr8(vcpu);
 	kvm_run->apic_base = kvm_get_apic_base(vcpu);
-	if (irqchip_in_kernel(vcpu->kvm)) {
+	if (irqchip_in_kernel(vcpu->kvm))
 		kvm_run->ready_for_interrupt_injection = 1;
-		kvm_run->ready_for_nmi_injection = 1;
-	} else {
+	else
 		kvm_run->ready_for_interrupt_injection =
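
The remark in the commit message that a pending NMI can be overwritten while IRQs "have to save the number" can be pictured with a toy state structure. This is a userspace sketch with invented names (`toy_irq_state`, `toy_queue_nmi`, `toy_queue_irq`); it is not the layout KVM uses, it only shows why a second NMI needs no extra bookkeeping while a second IRQ does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vcpu state: one flag for NMI, one bit per IRQ vector. */
struct toy_irq_state {
    bool nmi_pending;
    uint32_t irq_pending[8];    /* 256 vectors, 32 per word */
};

/* NMIs carry no number: queueing a second one just sets the same flag. */
static void toy_queue_nmi(struct toy_irq_state *s)
{
    s->nmi_pending = true;
}

/* IRQs are distinguished by vector, so each one needs its own bit. */
static void toy_queue_irq(struct toy_irq_state *s, uint8_t vector)
{
    s->irq_pending[vector / 32] |= (uint32_t)1 << (vector % 32);
}

/* Counts how many distinct IRQ vectors are currently pending. */
static int toy_count_pending_irqs(const struct toy_irq_state *s)
{
    int count = 0;
    for (int word = 0; word < 8; word++)
        for (int bit = 0; bit < 32; bit++)
            if (s->irq_pending[word] & ((uint32_t)1 << bit))
                count++;
    return count;
}
```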
[PATCH] KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
Push b55a50582030cf294a675492d7ab2e235b965cc8 and
d3a2c20c9b850d92dae383fd6a64840de2687cd6 also to the user space irqchip
path.

Signed-off-by: Jan Kiszka [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7ea4855..775a140 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2486,7 +2486,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	vmx_update_window_states(vcpu);

 	if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
-		if (vcpu->arch.nmi_window_open) {
+		if (vcpu->arch.interrupt.pending) {
+			enable_nmi_window(vcpu);
+		} else if (vcpu->arch.nmi_window_open) {
 			vcpu->arch.nmi_pending = false;
 			vcpu->arch.nmi_injected = true;
 		} else {
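
The decision the hunk above introduces can be written down as a small pure function. This is a userspace sketch with invented names (`toy_vcpu`, `toy_handle_pending_nmi`, `enum toy_nmi_action`). The body of the final `else` branch is not visible in the mail, so treating it as "wait for the NMI window" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_nmi_action {
    TOY_NMI_NONE,             /* nothing to do for NMIs */
    TOY_NMI_INJECT,           /* move the NMI from pending to injected */
    TOY_NMI_WAIT_FOR_WINDOW   /* request an NMI-window exit and retry later */
};

/* Hypothetical subset of the vcpu state the patch looks at. */
struct toy_vcpu {
    bool nmi_pending;
    bool nmi_injected;
    bool nmi_window_open;
    bool irq_pending;         /* an IRQ injection is still in flight */
};

static enum toy_nmi_action toy_handle_pending_nmi(struct toy_vcpu *v)
{
    if (!v->nmi_pending || v->nmi_injected)
        return TOY_NMI_NONE;

    /* New case from the patch: do not inject on top of an in-flight IRQ. */
    if (v->irq_pending)
        return TOY_NMI_WAIT_FOR_WINDOW;

    if (v->nmi_window_open) {
        v->nmi_pending = false;
        v->nmi_injected = true;
        return TOY_NMI_INJECT;
    }

    /* Assumed: the remaining branch also waits for the window to open. */
    return TOY_NMI_WAIT_FOR_WINDOW;
}
```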
[PATCH 1/5] kvm: Replace force type convert with container_of()
Signed-off-by: Sheng Yang [EMAIL PROTECTED] --- qemu/hw/device-assignment.c | 20 1 files changed, 12 insertions(+), 8 deletions(-) diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c index 9a790c6..786b2f0 100644 --- a/qemu/hw/device-assignment.c +++ b/qemu/hw/device-assignment.c @@ -144,7 +144,7 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr) static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num, uint32_t e_phys, uint32_t e_size, int type) { -AssignedDevice *r_dev = (AssignedDevice *) pci_dev; +AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev); AssignedDevRegion *region = r_dev-v_addrs[region_num]; uint32_t old_ephys = region-e_physbase; uint32_t old_esize = region-e_size; @@ -172,7 +172,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num, static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num, uint32_t addr, uint32_t size, int type) { -AssignedDevice *r_dev = (AssignedDevice *) pci_dev; +AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev); AssignedDevRegion *region = r_dev-v_addrs[region_num]; int first_map = (region-e_size == 0); CPUState *env; @@ -221,6 +221,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, { int fd; ssize_t ret; +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n, ((d-devfn 3) 0x1F), (d-devfn 0x7), @@ -242,7 +243,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, ((d-devfn 3) 0x1F), (d-devfn 0x7), (uint16_t) address, val, len); -fd = ((AssignedDevice *)d)-real_device.config_fd; +fd = pci_dev-real_device.config_fd; again: ret = pwrite(fd, val, len, address); @@ -263,6 +264,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, uint32_t val = 0; int fd; ssize_t ret; +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); if ((address = 0x10 address = 0x24) || 
address == 0x34 || address == 0x3c || address == 0x3d) { @@ -276,7 +278,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, if (address == 0xFC) goto do_log; -fd = ((AssignedDevice *)d)-real_device.config_fd; +fd = pci_dev-real_device.config_fd; again: ret = pread(fd, val, len, address); @@ -489,16 +491,18 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus) { int r; AssignedDevice *dev; +PCIDevice *pci_dev; uint8_t e_device, e_intx; struct kvm_assigned_pci_dev assigned_dev_data; DEBUG(Registering real physical device %s (bus=%x dev=%x func=%x)\n, adev-name, adev-bus, adev-dev, adev-func); -dev = (AssignedDevice *) -pci_register_device(bus, adev-name, sizeof(AssignedDevice), --1, assigned_dev_pci_read_config, -assigned_dev_pci_write_config); +pci_dev = pci_register_device(bus, adev-name, + sizeof(AssignedDevice), -1, assigned_dev_pci_read_config, + assigned_dev_pci_write_config); +dev = container_of(pci_dev, AssignedDevice, dev); + if (NULL == dev) { fprintf(stderr, %s: Error: Couldn't register real device %s\n, __func__, adev-name); -- 1.5.4.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
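
For readers unfamiliar with the idiom this patch switches to, here is a minimal userspace sketch of a container_of-style macro. The names `toy_container_of`, `toy_pci_device` and `toy_assigned_device` are invented for illustration and are not the QEMU definitions; the embedded member is deliberately not the first field, which is exactly the case where a plain cast would give the wrong address.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for PCIDevice. */
struct toy_pci_device {
    int devfn;
};

/* Hypothetical stand-in for AssignedDevice. */
struct toy_assigned_device {
    int config_fd;                /* placed before the embedded member on purpose */
    struct toy_pci_device dev;
};

/* Typed wrapper so callers cannot pass the wrong member name. */
static struct toy_assigned_device *toy_to_assigned(struct toy_pci_device *d)
{
    return toy_container_of(d, struct toy_assigned_device, dev);
}
```

One consequence worth keeping in mind: the macro only subtracts an offset, so applying it to a NULL member pointer yields NULL only when the member sits at offset zero. Checking the member pointer before converting it avoids relying on the layout.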
[PATCH 4/5] Support for device capability
This framework can be easily extended to support device capability, like MSI/MSI-x. Signed-off-by: Sheng Yang [EMAIL PROTECTED] --- qemu/hw/pci.c | 85 + qemu/hw/pci.h | 30 2 files changed, 115 insertions(+), 0 deletions(-) diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c index 75bc9a9..73f73da 100644 --- a/qemu/hw/pci.c +++ b/qemu/hw/pci.c @@ -339,11 +339,65 @@ static void pci_update_mappings(PCIDevice *d) } } +int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len) +{ +if (pci_dev-cap.supported address = pci_dev-cap.start +(address + len) pci_dev-cap.start + pci_dev-cap.length) +return 1; +return 0; +} + +uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, + uint32_t address, int len) +{ +uint32_t val = 0; + +if (pci_access_cap_config(pci_dev, address, len)) { +switch(len) { +default: +case 4: +if (address pci_dev-cap.start + pci_dev-cap.length - 4) { +val = le32_to_cpu(*(uint32_t *)(pci_dev-cap.config ++ address - pci_dev-cap.start)); +break; +} +/* fall through */ +case 2: +if (address pci_dev-cap.start + pci_dev-cap.length - 2) { +val = le16_to_cpu(*(uint16_t *)(pci_dev-cap.config ++ address - pci_dev-cap.start)); +break; +} +/* fall through */ +case 1: +val = pci_dev-cap.config[address - pci_dev-cap.start]; +break; +} +} +return val; +} + +void pci_default_cap_write_config(PCIDevice *pci_dev, + uint32_t address, uint32_t val, int len) +{ +if (pci_access_cap_config(pci_dev, address, len)) { +int i; +for (i = 0; i len; i++) { +pci_dev-cap.config[address + i - pci_dev-cap.start] = val; +val = 8; +} +return; +} +} + uint32_t pci_default_read_config(PCIDevice *d, uint32_t address, int len) { uint32_t val; +if (pci_access_cap_config(d, address, len)) +return d-cap.config_read(d, address, len); + switch(len) { default: case 4: @@ -397,6 +451,11 @@ void pci_default_write_config(PCIDevice *d, return; } default_config: +if (pci_access_cap_config(d, address, len)) { +d-cap.config_write(d, address, val, len); +return; +} + /* not efficient, but simple */ 
addr = address; for(i = 0; i len; i++) { @@ -802,3 +861,29 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint32_t id, s-bus = pci_register_secondary_bus(s-dev, map_irq); return s-bus; } + +void pci_enable_capability_support(PCIDevice *pci_dev, + uint32_t config_start, + PCICapConfigReadFunc *config_read, + PCICapConfigWriteFunc *config_write, + PCICapConfigInitFunc *config_init) +{ +if (!pci_dev) +return; + +if (config_start = 0x40 config_start 0xff) +pci_dev-cap.start = config_start; +else +pci_dev-cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR; +if (config_read) +pci_dev-cap.config_read = config_read; +else +pci_dev-cap.config_read = pci_default_cap_read_config; +if (config_write) +pci_dev-cap.config_write = config_write; +else +pci_dev-cap.config_write = pci_default_cap_write_config; +pci_dev-cap.supported = 1; +pci_dev-config[0x34] = pci_dev-cap.start; +config_init(pci_dev); +} diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h index e11fbbf..86b4ae5 100644 --- a/qemu/hw/pci.h +++ b/qemu/hw/pci.h @@ -19,6 +19,12 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num, uint32_t addr, uint32_t size, int type); typedef int PCIUnregisterFunc(PCIDevice *pci_dev); +typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev, + uint32_t address, uint32_t val, int len); +typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev, + uint32_t address, int len); +typedef void PCICapConfigInitFunc(PCIDevice *pci_dev); + #define PCI_ADDRESS_SPACE_MEM 0x00 #define PCI_ADDRESS_SPACE_IO 0x01 #define PCI_ADDRESS_SPACE_MEM_PREFETCH 0x08 @@ -46,6 +52,10 @@ typedef struct PCIIORegion { #define PCI_MIN_GNT0x3e/* 8 bits */ #define PCI_MAX_LAT0x3f/* 8 bits */ +#define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60 +#define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40 +#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10 + struct PCIDevice { /* PCI config space */ uint8_t config[256]; @@ -68,6 +78,15 @@
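
The capability window logic in this patch (a range check, then byte-wise little-endian access into a separate buffer) can be tried out in isolation. The sketch below uses invented names (`toy_cap`, `toy_access_cap`, `toy_cap_write`, `toy_cap_read`). The comparison operators in the archived patch text are not legible, so the inclusive upper bound used here is an assumption, as is the requirement that `length` never exceeds the buffer size.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CAP_MAX_LEN 0x60

/* Hypothetical capability window: config[0] corresponds to address `start`. */
struct toy_cap {
    int supported;
    uint32_t start;
    uint32_t length;              /* assumed to be <= TOY_CAP_MAX_LEN */
    uint8_t config[TOY_CAP_MAX_LEN];
};

/* Returns 1 when [address, address + len) lies entirely inside the window. */
static int toy_access_cap(const struct toy_cap *c, uint32_t address, int len)
{
    return c->supported && len > 0 &&
           address >= c->start &&
           address + (uint32_t)len <= c->start + c->length;
}

/* Stores the low `len` bytes of val, least significant byte first. */
static void toy_cap_write(struct toy_cap *c, uint32_t address,
                          uint32_t val, int len)
{
    if (!toy_access_cap(c, address, len))
        return;
    for (int i = 0; i < len; i++) {
        c->config[address - c->start + i] = (uint8_t)val;
        val >>= 8;
    }
}

/* Reassembles `len` bytes into a value; returns 0 outside the window. */
static uint32_t toy_cap_read(const struct toy_cap *c, uint32_t address, int len)
{
    uint32_t val = 0;

    if (!toy_access_cap(c, address, len))
        return 0;
    for (int i = len - 1; i >= 0; i--)
        val = (val << 8) | c->config[address - c->start + i];
    return val;
}
```

Reading byte by byte avoids both the unaligned pointer casts and the host-endianness conversion that a word-sized load would need.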
[PATCH 0/5][v2] Userspace for MSI support of KVM
Hi Avi, Anthony,

Here is the userspace for MSI support of KVM.

Main changes from v1:
Make device assignment depend on libpci.
Move the capability framework to pci.c (this patch could possibly be
accepted by QEMU).

Thanks!
--
regards
Yang, Sheng
[PATCH 2/5] Make device assignment depend on libpci
libpci is used later for capability detection.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 qemu/Makefile.target |    1 +
 qemu/configure       |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index 05ace8e..59653ba 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -735,6 +735,7 @@ OBJS += device-hotplug.o

 ifeq ($(USE_KVM_DEVICE_ASSIGNMENT), 1)
 OBJS+= device-assignment.o
+LIBS+=-lpci
 endif

 ifeq ($(TARGET_BASE_ARCH), i386)
diff --git a/qemu/configure b/qemu/configure
index 18ef980..bdde5ed 100755
--- a/qemu/configure
+++ b/qemu/configure
@@ -808,6 +808,26 @@ EOF
     fi
 fi

+# libpci probe for kvm_cap_device_assignment
+if test $kvm_cap_device_assignment = "yes" ; then
+cat > $TMPC << EOF
+#include <pci/pci.h>
+#ifndef PCI_VENDOR_ID
+#error NO LIBPCI
+#endif
+int main(void) { return 0; }
+EOF
+    if $cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $TMPC 2>/dev/null ; then
+        :
+    else
+        echo
+        echo "Error: libpci check failed"
+        echo "Disable KVM Device Assignment capability."
+        echo
+        kvm_cap_device_assignment="no"
+    fi
+fi
+
 ##########################################
 # zlib check

--
1.5.4.5
[PATCH 5/5] kvm: expose MSI capability to guest
Signed-off-by: Sheng Yang [EMAIL PROTECTED] --- qemu/hw/device-assignment.c | 90 +++--- qemu/hw/device-assignment.h |2 + 2 files changed, 85 insertions(+), 7 deletions(-) diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c index d3105bc..67bd6b3 100644 --- a/qemu/hw/device-assignment.c +++ b/qemu/hw/device-assignment.c @@ -262,7 +262,8 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, } if ((address = 0x10 address = 0x24) || address == 0x34 || -address == 0x3c || address == 0x3d) { +address == 0x3c || address == 0x3d || +pci_access_cap_config(d, address, len)) { /* used for update-mappings (BAR emulation) */ pci_default_write_config(d, address, val, len); return; @@ -296,7 +297,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); if ((address = 0x10 address = 0x24) || address == 0x34 || -address == 0x3c || address == 0x3d) { +address == 0x3c || address == 0x3d || +pci_access_cap_config(d, address, len)) { val = pci_default_read_config(d, address, len); DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n, (d-devfn 3) 0x1F, (d-devfn 0x7), address, val, len); @@ -325,11 +327,13 @@ do_log: DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n, (d-devfn 3) 0x1F, (d-devfn 0x7), address, val, len); -/* kill the special capabilities */ -if (address == 4 len == 4) -val = ~0x10; -else if (address == 6) -val = ~0x10; +if (!pci_dev-cap.available) { +/* kill the special capabilities */ +if (address == 4 len == 4) +val = ~0x10; +else if (address == 6) +val = ~0x10; +} return val; } @@ -537,6 +541,73 @@ void assigned_dev_update_irq(PCIDevice *d) } } +#ifdef KVM_CAP_DEVICE_MSI +static void assigned_dev_enable_msi(PCIDevice *pci_dev) +{ +int r; +struct kvm_assigned_irq assigned_irq_data; +AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev); + +memset(assigned_irq_data, 0, sizeof assigned_irq_data); +assigned_irq_data.assigned_dev_id 
= +calc_assigned_dev_id(assigned_dev-h_busnr, +(uint8_t)assigned_dev-h_devfn); +assigned_irq_data.guest_msi.addr_lo = *(uint32_t *) +(pci_dev-cap.config + 4); +assigned_irq_data.guest_msi.data = *(uint16_t *) +(pci_dev-cap.config + 8); +assigned_irq_data.flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSI; +r = kvm_assign_irq(kvm_context, assigned_irq_data); +if (r 0) { +perror(assigned_dev_enable_msi); +assigned_dev-cap.enabled = ~ASSIGNED_DEVICE_MSI_ENABLED; +/* Fail to enable MSI, enable INTx instead */ +assigned_dev_update_irq(pci_dev); +} +} +#endif + +void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address, + uint32_t val, int len) +{ +AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev); +uint32_t pos = pci_dev-cap.start; +uint8_t target_byte, target_position; + +pci_default_cap_write_config(pci_dev, address, val, len); +#ifdef KVM_CAP_DEVICE_MSI +/* Check if guest want to enable MSI */ +if (assigned_dev-cap.available ASSIGNED_DEVICE_CAP_MSI) { +target_position = pos + 2; +if (address = target_position address + len target_position) { +target_byte = (uint8_t)(val (target_position - address)); +if (target_byte == 1) { +assigned_dev-cap.enabled |= ASSIGNED_DEVICE_MSI_ENABLED; +assigned_dev_enable_msi(pci_dev); +if (!assigned_dev-cap.enabled ASSIGNED_DEVICE_MSI_ENABLED) +pci_dev-cap.config[target_position - pos] = 0; +} +} +pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH; +} +#endif +return; +} + +void assigned_device_pci_cap_init(PCIDevice *pci_dev) +{ +AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev); + +#ifdef KVM_CAP_DEVICE_MSI +/* Expose MSI capability + * MSI capability is the 1st capability in cap.config */ +if (dev-cap.available ASSIGNED_DEVICE_CAP_MSI) { +pci_dev-cap.config[0] = 0x5; +pci_dev-cap.length += PCI_CAPABILITY_CONFIG_MSI_LENGTH; +} +#endif +} + struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus) { int r; @@ -580,6 +651,11 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo 
*adev, PCIBus *bus) dev-h_busnr = adev-bus; dev-h_devfn = PCI_DEVFN(adev-dev, adev-func); +if (dev-cap.available) +pci_enable_capability_support(pci_dev, 0, NULL, + assigned_device_pci_cap_write_config, +
[PATCH 3/5] Figure out device capability
Try to figure out device capability in update_dev_cap(). For now we only
care about the MSI capability.

The function pci_find_cap_offset was originally written by Allen for
Xen. Note that the function needs root privilege to work.

This depends on libpci to work.

Signed-off-by: Allen Kay [EMAIL PROTECTED]
Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 qemu/hw/device-assignment.c |   50 +++
 qemu/hw/device-assignment.h |    5 
 2 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 786b2f0..d3105bc 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -216,6 +216,35 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
                           (r_dev->v_addrs + region_num));
 }

+uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
+{
+    int id;
+    int max_cap = 48;
+    int pos = PCI_CAPABILITY_LIST;
+    int status;
+
+    status = pci_read_byte(pci_dev, PCI_STATUS);
+    if ((status & PCI_STATUS_CAP_LIST) == 0)
+        return 0;
+
+    while (max_cap--) {
+        pos = pci_read_byte(pci_dev, pos);
+        if (pos < 0x40)
+            break;
+
+        pos &= ~3;
+        id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
+
+        if (id == 0xff)
+            break;
+        if (id == cap)
+            return pos;
+
+        pos += PCI_CAP_LIST_NEXT;
+    }
+    return 0;
+}
+
 static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
                                           uint32_t val, int len)
 {
@@ -367,6 +396,25 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }

+static void update_dev_cap(AssignedDevice *pci_dev, uint8_t r_bus,
+                           uint8_t r_dev, uint8_t r_func)
+{
+#ifdef KVM_CAP_DEVICE_MSI
+    struct pci_access *pacc;
+    struct pci_dev *pdev;
+    int r;
+
+    pacc = pci_alloc();
+    pci_init(pacc);
+    pdev = pci_get_dev(pacc, 0, r_bus, r_dev, r_func);
+    pci_cleanup(pacc);
+    r = pci_find_cap_offset(pdev, PCI_CAP_ID_MSI);
+    if (r)
+        pci_dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
+    pci_free_dev(pdev);
+#endif
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
@@ -436,6 +484,8 @@ again:

     fclose(f);
     dev->region_number = r;
+
+    update_dev_cap(pci_dev, r_bus, r_dev, r_func);
     return 0;
 }

diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index d6caa67..de60988 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -29,6 +29,7 @@
 #define __DEVICE_ASSIGNMENT_H__

 #include <sys/mman.h>
+#include <pci/pci.h>
 #include "qemu-common.h"
 #include "sys-queue.h"
 #include "pci.h"
@@ -80,6 +81,10 @@ typedef struct {
     unsigned char h_busnr;
     unsigned int h_devfn;
     int bound;
+    struct {
+#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
+        int available;
+    } cap;
 } AssignedDevice;

 typedef struct AssignedDevInfo AssignedDevInfo;
--
1.5.4.5
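
The capability list walk in pci_find_cap_offset() can be exercised without root privilege or libpci by running the same traversal over an in-memory copy of config space. The sketch below is such a stand-in: `toy_find_cap` and the `TOY_` constants are invented names, the register offsets and IDs are the standard PCI values, and reading from a plain byte array instead of `pci_read_byte()` is the simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PCI_STATUS          0x06   /* status register (low byte) */
#define TOY_PCI_STATUS_CAP_LIST 0x10   /* "capability list present" bit */
#define TOY_PCI_CAPABILITY_LIST 0x34   /* pointer to the first capability */
#define TOY_PCI_CAP_ID_PM       0x01
#define TOY_PCI_CAP_ID_MSI      0x05

/*
 * Walks the capability linked list in a 256-byte config space image.
 * Returns the offset of the capability with the given ID, or 0 if absent.
 */
static uint8_t toy_find_cap(const uint8_t cfg[256], uint8_t cap)
{
    int ttl = 48;                      /* bounds the walk on malformed lists */
    uint8_t pos;

    if (!(cfg[TOY_PCI_STATUS] & TOY_PCI_STATUS_CAP_LIST))
        return 0;

    pos = cfg[TOY_PCI_CAPABILITY_LIST];
    while (ttl-- && pos >= 0x40) {     /* capabilities live above the header */
        uint8_t id;

        pos &= 0xfc;                   /* entries are dword aligned */
        id = cfg[pos];                 /* byte 0 of an entry: capability ID */
        if (id == 0xff)
            break;
        if (id == cap)
            return pos;
        pos = cfg[pos + 1];            /* byte 1 of an entry: next pointer */
    }
    return 0;
}
```

The iteration limit matters in practice: a device (or a bad image) whose next pointers form a cycle would otherwise keep the loop running forever.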
Re: [PATCH] always assign userspace_addr
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b1953ee..f605bba 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -735,11 +735,17 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
>  	npages = mem->memory_size >> PAGE_SHIFT;
>
> -	if (!npages)
> -		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
> -
>  	new = old = *memslot;
>
> +	if (!npages) {
> +		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
> +		kvm_arch_flush_shadow(kvm);
> +		kvm_free_physmem_slot(memslot, NULL);
> +		kvm_arch_set_memory_region(kvm, mem, old, user_alloc);
> +		goto out;
> +	}
> +
> +
>  	new.base_gfn = base_gfn;

Any comments about this version? In the absence of any, I'll submit a
version with a SoB for inclusion.

>  	new.npages = npages;
>  	new.flags = mem->flags;
> @@ -812,9 +818,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	}
>  #endif /* not defined CONFIG_S390 */
>
> -	if (!npages)
> -		kvm_arch_flush_shadow(kvm);
> -
>  	spin_lock(&kvm->mmu_lock);
>  	if (mem->slot >= kvm->nmemslots)
>  		kvm->nmemslots = mem->slot + 1;

--
Glauber Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.
[PATCH] Prevent trace call into unloaded module text
Add marker_synchronize_unregister() before module unloading.
This prevents possible trace calls into unloaded module text.

Signed-off-by: Wu Fengguang [EMAIL PROTECTED]
---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a87f45e..64f38b3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2102,5 +2102,6 @@ void kvm_exit(void)
 	kvm_arch_exit();
 	kvm_exit_debug();
 	__free_page(bad_page);
+	marker_synchronize_unregister();
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
Re: Is this a bug in qemu-img?
walt wrote:
> ...
> BTW, I've been through the same steps twice and get the same results,
> so I don't think it's flakey hardware. OTOH today is a new day, so I'll
> try it again to triple check.

Tried again all the way from the beginning and got the same result. The
commit step is where things go wrong every time.

I know qcow2 is not considered quite ready for prime time, but having
that commit feature is important to me so I'd love to see it work
correctly.

Any chance that 'commit' could be added to raw as well as qcow2?

Thanks.
Re: 2.6.27.5 guest boot failure using in-kernel PIT
On Fri, Nov 21, 2008 at 3:10 PM, Marcelo Tosatti [EMAIL PROTECTED] wrote:
> Hi Jan,
>
> On Fri, Nov 21, 2008 at 08:54:56AM +0100, Jan Kiszka wrote:
>> Eduardo Habkost wrote:
>>> On Thu, Nov 20, 2008 at 12:22:53PM -0200, Eduardo Habkost wrote:
>>>> Hi,
>>>>
>>>> When using a kvm.git kernel as host, I am getting guest boot
>>>> failures when booting Fedora Rawhide kernel
>>>> (2.6.27.5-117.fc10.x86_64). Guest stops booting at:
>>>>
>>>> ENABLING IO-APIC IRQs
>>>> ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
>>>> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>>>> ...trying to set up timer (IRQ0) through the 8259A ...
>>>> . (found apic 0 pin 0) ...
>>>> ... failed.
>>>> ...trying to set up timer as Virtual Wire IRQ...
>>>> . failed.
>>>> ...trying to set up timer as ExtINT IRQ...
>>>
>>> I've just found out this problem happens because the guest has
>>> HZ=1000 and the host had HZ=250 and no CONFIG_HIGH_RES_TIMERS. With
>>> this setup, the host is not managing to inject enough timer
>>> interrupts during the mdelay() loop on timer_irq_works().
>>
>> Interesting, and plausible. My observation so far is a sporadic test
>> failure, often correlating with some raised host OS load. I'm running
>> a high-res kernel, but that cannot prevent that this only 10 ticks
>> long loop of the guest may obtain too few CPU cycles to handle enough
>> of them once in a while (IIRC, it needs 4 out of the 10 ticks to
>> declare the timer routing functional).
>
> Using in-kernel PIT? This is a potential problem which can be worked
> around by disabling the whole thing either via no_timer_check or
> paravirt equivalent (Glauber?) but for the non-paravirt case it seems
> it's not the culprit.
>
> Possible failure scenarios:

For the KVM_CLOCK case, I believe there's absolutely no reason to be
more complicated than that:

+extern int no_timer_check;
+
 void __init kvmclock_init(void)
 {
 	if (!kvm_para_available())
@@ -178,6 +180,8 @@ void __init kvmclock_init(void)
 	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
 		if (kvm_register_clock("boot clock"))
 			return;
+
+		no_timer_check = 1;
 		pv_time_ops.get_wallclock = kvm_get_wallclock;
 		pv_time_ops.set_wallclock = kvm_set_wallclock;
 		pv_time_ops.sched_clock = kvm_clock_read;
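
The HZ mismatch described in this thread can be checked with a few lines of arithmetic. The sketch below rests on two assumptions taken from the discussion rather than from the kernel source: that a host without high-resolution timers can deliver at most one guest timer interrupt per host tick, and that the guest check watches a window of 10 guest ticks and needs 4 of them (the figure Jan recalls). The function names are invented.

```c
#include <assert.h>

/* Whole host timer ticks that fit into a window of guest ticks. */
static long toy_host_ticks_in_window(long guest_hz, long host_hz,
                                     long window_guest_ticks)
{
    long window_us = window_guest_ticks * 1000000L / guest_hz;

    return window_us * host_hz / 1000000L;
}

/* Returns 1 when the host can deliver enough interrupts for the check. */
static int toy_timer_check_passes(long guest_hz, long host_hz)
{
    const long window_guest_ticks = 10;   /* length of the guest's test loop */
    const long needed = 4;                /* assumed threshold, per the thread */

    return toy_host_ticks_in_window(guest_hz, host_hz,
                                    window_guest_ticks) >= needed;
}
```

With a guest at HZ=1000 the window is 10 ms; a host at HZ=250 ticks only 2.5 times in that span, so the guest sees at most 2 interrupts and the check fails. A host at HZ=1000 would tick 10 times in the same window.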
Re: [PATCH 00/12] x86: disable virt on kdump and emergency_restart (v4)
On Fri, Nov 21, 2008 at 06:07:36PM +0200, Avi Kivity wrote:
<snip>
> Eduardo, please check the merge (there was a small conflict in reboot.c
> which I fixed) once I push it.
>
> Also, when generating patches that move files, use the -M switch: this
> makes it easier to review, and also handles files that change better.

The merge looks ok. I didn't know about -M, I will use it next time.

Thanks!

--
Eduardo
Re: KVM: MMU: optimize set_spte for page sync
On Sun, Nov 23, 2008 at 12:36:29PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> The cost of hash table and memslot lookups are quite significant if the
>> workload is pagetable write intensive resulting in increased mmu_lock
>> contention.
>>
>> @@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcpu *vcp
>>  		spte |= PT_WRITABLE_MASK;
>>
>> -		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>> +		/*
>> +		 * Optimization: for pte sync, if spte was writable the hash
>> +		 * lookup is unnecessary (and expensive). Write protection
>> +		 * is responsibility of mmu_get_page / kvm_sync_page.
>> +		 * Same reasoning can be applied to dirty page accounting.
>> +		 */
>> +		if (sync_page && is_writeble_pte(*shadow_pte))
>> +			goto set_pte;
>
> What if *shadow_pte points at a different page? Is that possible?

To a different gfn? Then sync_page will have nuked the spte:

	if (gpte_to_gfn(gpte) != gfn || !is_present_pte(gpte) ||
	    !(gpte & PT_ACCESSED_MASK)) {
		u64 nonpresent;
		..
		set_shadow_pte(&sp->spt[i], nonpresent);
	}

Otherwise:

/*
 * Using the cached information from sp->gfns is safe because:
 * - The spte has a reference to the struct page, so the pfn for a given
 *   gfn can't change unless all sptes pointing to it are nuked first.
[RFT] Rebased gdb/debug register patches
Hi,

this is not yet the official submission, but a request for testing: I'm happy to announce the availability of a rebased patch series to enhance KVM's guest debugging support as well as to add debug register emulation. It was rebased because QEMU mainline recently accepted the core of my corresponding bits and KVM has merged them over. A few patches are still awaiting QEMU merge, and two of them are mandatory to provide a clean foundation for the KVM changes - therefore this intermediate step.

To test the series, check out the kernel bits from

    git://git.kiszka.org/linux-kvm.git gdb-queue

and the user space part from

    git://git.kiszka.org/kvm-userspace.git gdb-queue

Early feedback welcome, also before the final submission. And if someone could look into the AMD/SVM implementation, this would also be great (unfortunately, there is no customer need for it ATM, thus no resources).

Enjoy,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2 ES-OS
Corporate Competence Center Embedded Linux
[PATCH] kvm-userspace: Cleanup user space NMI injection
Cleanup redundant check for an open NMI window before injecting. This will no longer be supported by the kernel, and it was broken by design anyway.

This change still allows to run the user space against older kernel modules.

Signed-off-by: Jan Kiszka [EMAIL PROTECTED]
---
 libkvm/libkvm.c     | 20 +++-
 libkvm/libkvm.h     | 13 +
 qemu/qemu-kvm-x86.c | 16 ++--
 qemu/qemu-kvm.c     |  6 +++---
 qemu/qemu-kvm.h     |  2 +-
 user/main.c         |  5 ++---
 6 files changed, 16 insertions(+), 46 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index f6948f5..40c95ce 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -832,9 +832,9 @@ int try_push_interrupts(kvm_context_t kvm)
 	return kvm->callbacks->try_push_interrupts(kvm->opaque);
 }
 
-int try_push_nmi(kvm_context_t kvm)
+void push_nmi(kvm_context_t kvm)
 {
-	return kvm->callbacks->try_push_nmi(kvm->opaque);
+	kvm->callbacks->push_nmi(kvm->opaque);
 }
 
 void post_kvm_run(kvm_context_t kvm, void *env)
@@ -861,17 +861,6 @@ int kvm_is_ready_for_interrupt_injection(kvm_context_t kvm, int vcpu)
 	return run->ready_for_interrupt_injection;
 }
 
-int kvm_is_ready_for_nmi_injection(kvm_context_t kvm, int vcpu)
-{
-#ifdef KVM_CAP_NMI
-	struct kvm_run *run = kvm->run[vcpu];
-
-	return run->ready_for_nmi_injection;
-#else
-	return 0;
-#endif
-}
-
 int kvm_run(kvm_context_t kvm, int vcpu, void *env)
 {
 	int r;
@@ -880,7 +869,7 @@ int kvm_run(kvm_context_t kvm, int vcpu, void *env)
 
 again:
 #ifdef KVM_CAP_NMI
-	run->request_nmi_window = try_push_nmi(kvm);
+	push_nmi(kvm);
 #endif
 #if !defined(__s390__)
 	if (!kvm->irqchip_in_kernel)
@@ -957,9 +946,6 @@ again:
 		r = handle_halt(kvm, vcpu);
 		break;
 	case KVM_EXIT_IRQ_WINDOW_OPEN:
-#ifdef KVM_CAP_NMI
-	case KVM_EXIT_NMI_WINDOW_OPEN:
-#endif
 		break;
 	case KVM_EXIT_SHUTDOWN:
 		r = handle_shutdown(kvm, env);
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index aae9f03..aaad4fb 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -66,7 +66,7 @@ struct kvm_callbacks {
 	int (*shutdown)(void *opaque, void *env);
 	int (*io_window)(void *opaque);
 	int (*try_push_interrupts)(void *opaque);
-	int (*try_push_nmi)(void *opaque);
+	void (*push_nmi)(void *opaque);
 	void (*post_kvm_run)(void *opaque, void *env);
 	int (*pre_kvm_run)(void *opaque, void *env);
 	int (*tpr_access)(void *opaque, int vcpu, uint64_t rip, int is_write);
@@ -217,17 +217,6 @@ uint64_t kvm_get_apic_base(kvm_context_t kvm, int vcpu);
 int kvm_is_ready_for_interrupt_injection(kvm_context_t kvm, int vcpu);
 
 /*!
- * \brief Check if a vcpu is ready for NMI injection
- *
- * This checks if vcpu is not already running in NMI context.
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should get dumped
- * \return boolean indicating NMI injection readiness
- */
-int kvm_is_ready_for_nmi_injection(kvm_context_t kvm, int vcpu);
-
-/*!
  * \brief Read VCPU registers
  *
  * This gets the GP registers from the VCPU and outputs them
diff --git a/qemu/qemu-kvm-x86.c b/qemu/qemu-kvm-x86.c
index a4ae7ed..671b5b3 100644
--- a/qemu/qemu-kvm-x86.c
+++ b/qemu/qemu-kvm-x86.c
@@ -667,22 +667,18 @@ int kvm_arch_try_push_interrupts(void *opaque)
     return (env->interrupt_request & CPU_INTERRUPT_HARD) != 0;
 }
 
-int kvm_arch_try_push_nmi(void *opaque)
+void kvm_arch_push_nmi(void *opaque)
 {
     CPUState *env = cpu_single_env;
     int r;
 
     if (likely(!(env->interrupt_request & CPU_INTERRUPT_NMI)))
-        return 0;
-
-    if (kvm_is_ready_for_nmi_injection(kvm_context, env->cpu_index)) {
-        env->interrupt_request &= ~CPU_INTERRUPT_NMI;
-        r = kvm_inject_nmi(kvm_context, env->cpu_index);
-        if (r < 0)
-            printf("cpu %d fail inject NMI\n", env->cpu_index);
-    }
+        return;
 
-    return (env->interrupt_request & CPU_INTERRUPT_NMI) != 0;
+    env->interrupt_request &= ~CPU_INTERRUPT_NMI;
+    r = kvm_inject_nmi(kvm_context, env->cpu_index);
+    if (r < 0)
+        printf("cpu %d fail inject NMI\n", env->cpu_index);
 }
 
 void kvm_arch_update_regs_for_sipi(CPUState *env)
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 8b4cdd6..cf0e85d 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -154,9 +154,9 @@ static int try_push_interrupts(void *opaque)
     return kvm_arch_try_push_interrupts(opaque);
 }
 
-static int try_push_nmi(void *opaque)
+static void push_nmi(void *opaque)
 {
-    return kvm_arch_try_push_nmi(opaque);
+    kvm_arch_push_nmi(opaque);
 }
 
 static void post_kvm_run(void *opaque, void *data)
@@ -742,7 +742,7 @@ static struct kvm_callbacks qemu_kvm_ops = {
     .shutdown = kvm_shutdown,
     .io_window = kvm_io_window,
     .try_push_interrupts = try_push_interrupts,
-    .try_push_nmi = try_push_nmi,
+    .push_nmi =
[ kvm-Bugs-2327497 ] NFS copy makes guest network unstable
Bugs item #2327497, was opened at 2008-11-22 17:53
Message generated for change (Comment added) made by avik
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2327497&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: NFS copy makes guest network unstable

Initial Comment:
The NFS network of a KVM guest is very unstable. When we copy a 600M file to the guest over an NFS mount, the guest's network goes down after about 500M has been copied. The host also cannot use ping or scp. Sometimes the host also complains: "ping: sendmsg: No buffer space available". Checking memory with 'free' shows only 69MB free (out of 8GB in total on the machine!). Using scp to copy the file does not reproduce it. This issue is very easy to reproduce (50%).

Reproduce steps:
1. Create a guest and configure an NFS shared folder on it
2. Mount the NFS folder to a local folder --- /media
3. cp xxx /media
4. After some time, the guest network is down

Comment By: Avi Kivity (avik)
Date: 2008-11-24 17:31
Message:
Seems to be a bug in the 8139too driver. Please try with the 8139cp driver (which has much better performance).

Comment By: Jiajun Xu (jiajun)
Date: 2008-11-24 09:01
Message:
We did not test such a case before. I think the issue also existed before.

Comment By: Avi Kivity (avik)
Date: 2008-11-23 23:13
Message:
It's almost certainly a problem with the qemu process, not the bridge.

Comment By: Fabio Coatti (cova)
Date: 2008-11-23 22:15
Message:
I can't easily find out which kvm version worked (nor be sure that it is the kvm executable itself that has issues), as quite a lot of subsystems are involved and some time passed before we spotted the problem (kvm itself, the network bridge, and the host kernel may be involved, of course). Now I'm trying to find the combination that worked, but at the same time I'm willing to do some tests to discover (on the actual non-working setup) some hints, as the bisection can be a very daunting task. (This issue has been noticed after several upgrades.)

Comment By: Avi Kivity (avik)
Date: 2008-11-23 20:40
Message:
Is this a regression, or a new test? If it is a regression, what was the last version that worked?

Comment By: Fabio Coatti (cova)
Date: 2008-11-23 17:12
Message:
I can confirm a similar behaviour: a kvm machine gets large amounts of data via the http protocol and saves those files over NFS. (File sizes are in the range of 4-20 MB approx and the machine downloads several of those files.) After some time (I don't have a precise figure, but some hundreds of MB) the guest network goes down. No answers even to ping coming from outside. The guest uses virtio network drivers (as normal drivers are way too slow).

host machine: 64 bit AMD dual quad core, 16GB, tried with several kernels ranging from 2.6.27.4 to 2.6.25.19
guest: 32 bit kvm machines (tried 76/77/78), both UP and SMP configuration. kernels: same as host machine
network setup: bridged network with br0 device on host machine.

We are using 2 vlans for the guest and we have tried all the configurations (single tap and vlans resolved on the guest side, then two taps and so two interfaces on the guest machine, and so on) without any improvement. I can exclude MTU issues, as we have seen and solved those; this issue is completely different. At some point, sniffing traffic on the host interfaces, we are able to see only ARP requests coming from the guest, nothing more.

I understand that this data is in no way complete, but I'm willing to do any debugging if someone gives me a hint on how to do so correctly. Thanks.
[ kvm-Bugs-2327497 ] NFS copy makes guest network unstable
Bugs item #2327497, was opened at 2008-11-22 17:53
Message generated for change (Settings changed) made by avik
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2327497&group_id=180599

Category: None
Group: None
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: NFS copy makes guest network unstable
[ kvm-Bugs-2327497 ] NFS copy makes guest network unstable
Bugs item #2327497, was opened at 2008-11-22 16:53
Message generated for change (Comment added) made by cova
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2327497&group_id=180599

Category: None
Group: None
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: NFS copy makes guest network unstable

Comment By: Fabio Coatti (cova)
Date: 2008-11-24 16:47
Message:
I wouldn't be so sure that the 8139 is the culprit. We are seeing this with the e1000 and virtio drivers...
[PATCH] really remove a slot when a user asks us to
Right now, KVM does not remove a slot when we do a register ioctl for size 0 (would be the expected behaviour). Instead, we only mark it as empty, but keep all bitmaps and allocated data structures present. It completely nullifies our chances of reusing that same slot again for mapping a different piece of memory.

In this patch, we destroy rmaps, and vfree() the pointers that used to hold the dirty bitmap, rmap and lpage_info structures.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 virt/kvm/kvm_main.c | 15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b1953ee..f605bba 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -735,11 +735,17 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
 	npages = mem->memory_size >> PAGE_SHIFT;
 
-	if (!npages)
-		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
-
 	new = old = *memslot;
 
+	if (!npages) {
+		mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
+		kvm_arch_flush_shadow(kvm);
+		kvm_free_physmem_slot(memslot, NULL);
+		kvm_arch_set_memory_region(kvm, mem, old, user_alloc);
+		goto out;
+	}
+
+
 	new.base_gfn = base_gfn;
 	new.npages = npages;
 	new.flags = mem->flags;
@@ -812,9 +818,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	}
 #endif /* not defined CONFIG_S390 */
 
-	if (!npages)
-		kvm_arch_flush_shadow(kvm);
-
 	spin_lock(&kvm->mmu_lock);
 	if (mem->slot >= kvm->nmemslots)
 		kvm->nmemslots = mem->slot + 1;
-- 
1.5.6.5
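The behaviour the patch description asks for can be pictured with a small userspace model. This is only a sketch under stated assumptions: `toy_slot`, `toy_free_slot` and `toy_set_region` are hypothetical names, the fields only loosely mirror what the description mentions (rmap and dirty bitmap), and none of this is the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a memory slot (hypothetical, not the kernel struct). */
struct toy_slot {
    unsigned long npages;
    unsigned long *rmap;      /* one entry per page */
    uint64_t *dirty_bitmap;   /* one bit per page, in 64-bit words */
};

/* Release everything the slot owns so that it can be reused later. */
static void toy_free_slot(struct toy_slot *slot)
{
    assert(slot != NULL);
    free(slot->rmap);
    free(slot->dirty_bitmap);
    slot->rmap = NULL;
    slot->dirty_bitmap = NULL;
    slot->npages = 0;
}

/*
 * Register npages pages in the slot. npages == 0 means "remove the slot":
 * the slot is really emptied, not just marked as empty.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int toy_set_region(struct toy_slot *slot, unsigned long npages)
{
    /* Drop the old contents first, for removal and for re-registration. */
    toy_free_slot(slot);
    if (npages == 0)
        return 0;

    slot->rmap = calloc(npages, sizeof(*slot->rmap));
    slot->dirty_bitmap = calloc((npages + 63) / 64, sizeof(*slot->dirty_bitmap));
    if (slot->rmap == NULL || slot->dirty_bitmap == NULL) {
        toy_free_slot(slot);
        return -1;
    }
    slot->npages = npages;
    return 0;
}
```

In this model a size-0 registration leaves the slot with no allocations at all, so a later registration with a different size starts from a clean state.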
[PATCH] sign kvmclock as paravirt
Currently, we only set the KVM paravirt signature in case of CONFIG_KVM_GUEST. However, it is possible to have it turned off, while CONFIG_KVM_CLOCK is turned on. This is also a paravirt case, and should be shown accordingly.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/kvmclock.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1c9cc43..4a1ee5a 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -194,5 +194,7 @@ void __init kvmclock_init(void)
 #endif
 		kvm_get_preset_lpj();
 		clocksource_register(&kvm_clock);
+		pv_info.paravirt_enabled = 1;
+		pv_info.name = "KVM";
 	}
 }
-- 
1.5.6.5
Re: KVM: MMU: optimize set_spte for page sync
On Mon, Nov 24, 2008 at 01:04:23PM +0100, Marcelo Tosatti wrote:
> On Sun, Nov 23, 2008 at 12:36:29PM +0200, Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>> The cost of hash table and memslot lookups are quite significant if the
>>> workload is pagetable write intensive resulting in increased mmu_lock
>>> contention.
>>>
>>> @@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcpu *vcp
>>>  		spte |= PT_WRITABLE_MASK;
>>>
>>> -		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>>> +		/*
>>> +		 * Optimization: for pte sync, if spte was writable the hash
>>> +		 * lookup is unnecessary (and expensive). Write protection
>>> +		 * is responsibility of mmu_get_page / kvm_sync_page.
>>> +		 * Same reasoning can be applied to dirty page accounting.
>>> +		 */
>>> +		if (sync_page && is_writeble_pte(*shadow_pte))
>>> +			goto set_pte;
>>
>> What if *shadow_pte points at a different page? Is that possible?
>
> To a different gfn? Then sync_page will have nuked the spte:
>
> 	if (gpte_to_gfn(gpte) != gfn || !is_present_pte(gpte) ||
> 	    !(gpte & PT_ACCESSED_MASK)) {
> 		u64 nonpresent;
> 		..
> 		set_shadow_pte(&sp->spt[i], nonpresent);
> 	}
>
> Otherwise:
>
> /*
>  * Using the cached information from sp->gfns is safe because:
>  * - The spte has a reference to the struct page, so the pfn for a given
>  *   gfn can't change unless all sptes pointing to it are nuked first.

*shadow_pte can point to a different page if the guest updates the pagetable and there is a fault before resync: the fault updates the spte with the new gfn (and pfn) via mmu_set_spte. In which case the gfn cache is updated since:

	} else if (pfn != spte_to_pfn(*shadow_pte)) {
		printk("hfn old %lx new %lx\n",
		       spte_to_pfn(*shadow_pte), pfn);
		rmap_remove(vcpu->kvm, shadow_pte);
hrtimer_forward() semantics when using non-high-res timers
Hi, Thomas,

I've been looking at a timer problem on KVM recently[1] and I've got a question about the expected semantics of hrtimer_forward().

The problem I am looking at is related to having proper accounting of missed ticks in the KVM timer code when the host has lost timer ticks because of high CPU load, or because it doesn't have hrtimers enabled. The hrtimer_forward_now() overrun accounting looked perfect for the task of checking how many ticks we have lost. However, hrtimer_forward() limits the interval parameter to the timer resolution, making it useless for calculating how many timer periods we've lost because of too-low timer resolution. I am even a bit surprised that no other code has needed a hrtimer_forward-like function for that yet.

For example: if we want to account for a tick every 1 ms and the host has HZ=250 and no high-resolution timers, calling hrtimer_forward_now() on every timer tick will normally return 1, because it will count how many 4 ms periods were added to the timer expiration time. However, I would like to calculate how many 1 ms periods I've lost, no matter what the real timer resolution is.

I could do my own missed-ticks calculation, but the hrtimer_forward() logic would be perfect for my needs if it didn't have the resolution check code, and I don't feel like duplicating part of hrtimer_forward().

Do you think it would make sense to have in the timers API a hrtimer_forward-like function that doesn't have the interval lower limit?

[1] http://marc.info/?l=kvm&m=122728725028262&w=2

-- 
Eduardo
[PATCH] kvm: ppc: stop leaking host memory on VM exit
When the VM exits, we must call put_page() for every page referenced in the shadow TLB.

Without this patch, we usually leak 30-50 host pages (120 - 200 KiB with 4 KiB pages). The maximum number of pages leaked is the size of our shadow TLB, 64 pages.

Signed-off-by: Hollis Blanchard [EMAIL PROTECTED]
---
The obvious question is why didn't we see this before? Basically, we'd never looked for it, and since most of our work was in the kernel we always ended up rebooting before exhausting host memory.

Since it's such a large leak, and a simple fix, please commit this for 2.6.28. This patch does apply to kvm.git with fuzz, but if you prefer I can send a separate patch for that later.

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -104,4 +104,6 @@ static inline void kvmppc_set_pid(struct
 	}
 }
 
+extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
+
 #endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -124,6 +124,14 @@ static void kvmppc_44x_shadow_release(st
 	}
 }
 
+void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+	for (i = 0; i <= tlb_44x_hwater; i++)
+		kvmppc_44x_shadow_release(vcpu, i);
+}
+
 void kvmppc_tlbe_set_modified(struct kvm_vcpu *vcpu, unsigned int i)
 {
 	vcpu->arch.shadow_tlb_mod[i] = 1;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -238,6 +238,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *
 
 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
+	kvmppc_core_destroy_mmu(vcpu);
 }
 
 /* Note: clearing MSR[DE] just means that the debug interrupt will not be

-- 
Hollis Blanchard
IBM Linux Technology Center
Re: Direct kernel boot without harddrive image
- Daire Byrne [EMAIL PROTECTED] wrote:
> I tried with -no-kvm and I get the same crash when I reboot the VM. I
> suppose it's a qemu bug then. I tried with the latest kvm-qemu (78) but
> perhaps I should try the latest Qemu and if it still breaks report the
> bug on the Qemu mailing list? It is like it forgets to boot the kernel
> and initrd again after a reboot and tries to boot from the harddrive
> instead.

More weirdness with direct booting - using more than 2048MB causes the BIOS to repeatedly crash out. This only happens using -kernel and -initrd.

Daire
gettimeofday slow in RHEL4 guests
I noticed that gettimeofday in RHEL4.6 guests is taking much longer than with RHEL3.8 guests. I wrote a simple program (see below) to call gettimeofday in a loop 1,000,000 times and then used time to measure how long it took.

For the RHEL3.8 guest:

time -p ./timeofday_bench
real 0.99
user 0.12
sys 0.24

For the RHEL4.6 guest with the default clock source (pmtmr):

time -p ./timeofday_bench
real 15.65
user 0.18
sys 15.46

and RHEL4.6 guest with PIT as the clock source (clock=pit kernel parameter):

time -p ./timeofday_bench
real 13.67
user 0.21
sys 13.45

So, basically gettimeofday() takes about 50 times as long on a RHEL4 guest.

Host is a DL380G5, 2 dual-core Xeon 5140 processors, 4 GB of RAM. It's running kvm.git tree as of 11/18/08 with kvm-75 userspace. Guest in both RHEL3 and RHEL4 cases has 4 vcpus, 3.5GB of RAM.

david

--

timeofday_bench.c:

#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	int rc = 0, n;
	struct timeval tv;
	int iter = 100;	/* number of times to call gettimeofday */

	if (argc > 1)
		iter = atoi(argv[1]);

	if (iter == 0) {
		fprintf(stderr, "invalid number of iterations\n");
		return 1;
	}

	printf("starting ");
	for (n = 0; n < iter; ++n) {
		if (gettimeofday(&tv, NULL) != 0) {
			fprintf(stderr, "\ngettimeofday failed\n");
			rc = 1;
			break;
		}
	}

	if (!rc)
		printf("done\n");

	return rc;
}
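For comparing guests it can be handy to get a per-call figure directly instead of dividing the `time` output by hand. The following is only a sketch under stated assumptions: `avg_gettimeofday_ns` is a hypothetical helper, it relies on the POSIX `clock_gettime()` call with `CLOCK_MONOTONIC` being available in the guest, and the measurement itself reads the guest clock, so it is subject to the same clock source being examined.

```c
#define _XOPEN_SOURCE 700   /* for clock_gettime() and gettimeofday() */

#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Average cost of one gettimeofday() call in nanoseconds, measured over
 * iter calls. Returns a negative value if iter is not positive or if any
 * of the clock calls fails.
 */
static double avg_gettimeofday_ns(long iter)
{
    struct timespec start, end;
    struct timeval tv;
    double elapsed_ns;
    long n;

    if (iter <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (n = 0; n < iter; ++n) {
        if (gettimeofday(&tv, NULL) != 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                 (double)(end.tv_nsec - start.tv_nsec);
    assert(elapsed_ns >= 0.0);   /* a monotonic clock must not go backwards */
    return elapsed_ns / (double)iter;
}
```

Calling it with the same iteration count in both guests and printing the result gives one number per guest that can be compared directly.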
Re: Is this a bug in qemu-img?
walt wrote:
> Any chance that 'commit' could be added to raw as well as qcow2?

Raw images by their nature can't contain metadata -- they have only the exact contents of the virtual drive, which is what makes them raw -- so they by definition can't support copy-on-write (and thus commit) or other functionality requiring metadata.
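The point about metadata can be made concrete with a toy model. This is a sketch, not the qcow2 format: `toy_raw`, `toy_overlay` and the helper functions are hypothetical names, and the "allocation map" is reduced to one flag per cluster. It only shows that copy-on-write and commit need a record of which clusters the overlay owns, which a data-only raw image has no place to store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CLUSTERS     4
#define TOY_CLUSTER_SIZE 8

/* A "raw" image in this model is nothing but the guest-visible data. */
struct toy_raw {
    uint8_t data[TOY_CLUSTERS][TOY_CLUSTER_SIZE];
};

/* A copy-on-write overlay carries metadata next to its data. */
struct toy_overlay {
    struct toy_raw *backing;
    bool allocated[TOY_CLUSTERS];   /* metadata: clusters owned by the overlay */
    uint8_t data[TOY_CLUSTERS][TOY_CLUSTER_SIZE];
};

/* Read a cluster from the overlay if it owns it, else from the backing image. */
static const uint8_t *toy_read(const struct toy_overlay *ov, unsigned int c)
{
    assert(c < TOY_CLUSTERS);
    return ov->allocated[c] ? ov->data[c] : ov->backing->data[c];
}

/* Writes never touch the backing image; they land in the overlay. */
static void toy_write(struct toy_overlay *ov, unsigned int c, const uint8_t *buf)
{
    assert(c < TOY_CLUSTERS);
    memcpy(ov->data[c], buf, TOY_CLUSTER_SIZE);
    ov->allocated[c] = true;
}

/* Commit: push every overlay-owned cluster down into the backing image. */
static void toy_commit(struct toy_overlay *ov)
{
    unsigned int c;

    for (c = 0; c < TOY_CLUSTERS; c++) {
        if (!ov->allocated[c])
            continue;
        memcpy(ov->backing->data[c], ov->data[c], TOY_CLUSTER_SIZE);
        ov->allocated[c] = false;
    }
}
```

In this model the raw image can be the target of a commit (it is the backing store), but it cannot itself be the overlay, because it has no `allocated` map to consult.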
[PATCH 1/2] Enable Pass Through Feature in Intel IOMMU
The patch set adds kernel parameter intel_iommu=pt to set up pass through mode in context mapping entry. This disables DMAR in linux kernel; but KVM still runs on VT-d. In this mode, kernel uses swiotlb for DMA API functions but other VT-d functionalities are enabled for KVM. KVM always uses multi level translation page table in VT-d. By default, pass through mode is disabled in kernel.

This is useful when people don't want to enable VT-d DMAR in kernel for reasons like kernel iommu performance concern or debug purpose but still want to use KVM.

Thanks.

-Fenghua

Signed-off-by: Fenghua Yu [EMAIL PROTECTED]
Signed-off-by: Weidong Han [EMAIL PROTECTED]
Signed-off-by: Allen Kay [EMAIL PROTECTED]
Signed-off-by: David Woodhouse [EMAIL PROTECTED]
---
 Documentation/kernel-parameters.txt |  5 +++
 arch/ia64/include/asm/iommu.h       |  1
 arch/ia64/kernel/pci-swiotlb.c      |  2 -
 arch/x86/include/asm/iommu.h        |  1
 arch/x86/kernel/pci-swiotlb_64.c    |  4 ++-
 drivers/pci/intel-iommu.c           | 47 ++--
 include/linux/dma_remapping.h       |  3 ++
 include/linux/intel-iommu.h         |  3 +-
 8 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e0f346d..b966185 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -931,6 +931,11 @@ and is between 256 and 4096 characters. It is defined in the file
 			With this option on every unmap_single operation will
 			result in a hardware IOTLB flush operation as opposed
 			to batching them for performance.
+		pt	[Default no Pass Through]
+			This option enables Pass Through in context mapping if
+			Pass Through is supported in hardware. With this option
+			DMAR is disabled in kernel and kernel uses swiotlb, but
+			KVM still uses VT-d hardware.
 
 	io_delay=	[X86-32,X86-64] I/O delay method
 		0x80
diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 0490794..37d41ca 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -9,6 +9,7 @@ extern void pci_iommu_shutdown(void);
 extern void no_iommu_init(void);
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
+extern int iommu_pass_through;
 extern void iommu_dma_init(void);
 extern void machvec_init(const char *name);
 
diff --git a/arch/ia64/kernel/pci-swiotlb.c b/arch/ia64/kernel/pci-swiotlb.c
index 16c5051..69135b0 100644
--- a/arch/ia64/kernel/pci-swiotlb.c
+++ b/arch/ia64/kernel/pci-swiotlb.c
@@ -32,7 +32,7 @@ struct dma_mapping_ops swiotlb_dma_ops = {
 
 void __init pci_swiotlb_init(void)
 {
-	if (!iommu_detected) {
+	if (!iommu_detected || iommu_pass_through) {
 #ifdef CONFIG_IA64_GENERIC
 		swiotlb = 1;
 		printk(KERN_INFO "PCI-DMA: Re-initialize machine vector.\n");
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 0b500c5..014e94f 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -6,6 +6,7 @@ extern void no_iommu_init(void);
 extern struct dma_mapping_ops nommu_dma_ops;
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
+extern int iommu_pass_through;
 
 extern unsigned long iommu_nr_pages(unsigned long addr, unsigned long len);
 
diff --git a/arch/x86/kernel/pci-swiotlb_64.c b/arch/x86/kernel/pci-swiotlb_64.c
index 3c539d1..4af2425 100644
--- a/arch/x86/kernel/pci-swiotlb_64.c
+++ b/arch/x86/kernel/pci-swiotlb_64.c
@@ -50,8 +50,10 @@ struct dma_mapping_ops swiotlb_dma_ops = {
 void __init pci_swiotlb_init(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
-	if (!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN)
+	if ((!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN) ||
+		iommu_pass_through)
 		swiotlb = 1;
+
 	if (swiotlb_force)
 		swiotlb = 1;
 	if (swiotlb) {
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index aec60ad..f164a3c 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -120,7 +120,6 @@ struct context_entry {
 		(c).lo &= (((u64)-1) << 4) | 3; \
 		(c).lo |= ((val) & 3) << 2; \
 	} while (0)
-#define CONTEXT_TT_MULTI_LEVEL 0
 #define context_set_address_root(c, val) \
 	do {(c).lo |= (val) & VTD_PAGE_MASK; } while (0)
 #define context_set_address_width(c, val) do {(c).hi |= (val) & 7;} while (0)
@@ -203,6 +202,7 @@ static long list_size;
 static void domain_remove_dev_info(struct dmar_domain *domain);
 
 int dmar_disabled;
+int iommu_pass_through;
 static int __initdata dmar_map_gfx = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
@@ -231,6 +231,9 @@ static int __init
[PATCH 2/2] Enable Pass Through Feature in Intel IOMMU
The patch set adds the kernel parameter intel_iommu=pt to set up pass-through
mode in the context mapping entry. This disables DMAR in the Linux kernel, but
KVM still runs on VT-d. In this mode, the kernel uses swiotlb for DMA API
functions, while the other VT-d functionality stays enabled for KVM. By
default, pass-through mode is disabled in the kernel.

This second patch changes the context mapping interface called in KVM's vtd.c.
KVM always uses a multi-level translation page table in VT-d.

Signed-off-by: Fenghua Yu [EMAIL PROTECTED]
Signed-off-by: Weidong Han [EMAIL PROTECTED]
Signed-off-by: Allen Kay [EMAIL PROTECTED]
Signed-off-by: David Woodhouse [EMAIL PROTECTED]
---
 vtd.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/vtd.c b/virt/kvm/vtd.c
index a770874..7b753d7 100644
--- a/virt/kvm/vtd.c
+++ b/virt/kvm/vtd.c
@@ -124,7 +124,7 @@ int kvm_iommu_map_guest(struct kvm *kvm,
 		pdev->bus->number, pdev->devfn);

 	r = intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
-					pdev);
+					pdev, CONTEXT_TT_MULTI_LEVEL);
 	if (r) {
 		printk(KERN_ERR "Domain context map for %s failed",
 		       pci_name(pdev));
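The statement that the kernel falls back to swiotlb in pass-through mode can be
sketched as a small userspace model of the x86 pci_swiotlb_init() decision from
patch 1/2. This is only an illustration, not kernel code: struct
dma_boot_state, model_use_swiotlb() and MODEL_MAX_DMA32_PFN are hypothetical
names, the && and > operators are assumed because the archived diff lost them,
and a swiotlb flag already set on the command line is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the boot-time flags the patch consults. */
struct dma_boot_state {
    bool iommu_detected;      /* a hardware IOMMU was found */
    bool no_iommu;            /* iommu=off was given */
    bool iommu_pass_through;  /* intel_iommu=pt was given */
    bool swiotlb_force;       /* swiotlb=force was given */
    unsigned long max_pfn;    /* highest page frame number */
};

/* Assumed stand-in for MAX_DMA32_PFN: 4 GiB expressed in 4 KiB pages. */
#define MODEL_MAX_DMA32_PFN (1UL << 20)

/* Returns true when the modelled kernel would select swiotlb. */
static bool model_use_swiotlb(const struct dma_boot_state *s)
{
    assert(s != NULL);

    /* Pre-patch condition, extended by "|| iommu_pass_through". */
    if ((!s->iommu_detected && !s->no_iommu &&
         s->max_pfn > MODEL_MAX_DMA32_PFN) || s->iommu_pass_through)
        return true;

    return s->swiotlb_force;
}
```

In this model a machine with a detected IOMMU selects swiotlb only when
iommu_pass_through (or swiotlb_force) is set, which matches the behaviour the
cover text describes.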
[PATCH] kvm: external module: fix kernel header rsync
When the shell encounters a glob it can't expand (like
arch/powerpc/include/asm/vmx*.h), it leaves the raw pattern behind. rsync then
looks for a file named arch/powerpc/include/asm/vmx*.h (without trying to do
its own globbing) and fails.

Fix by using make's $(wildcard) function for the expansion, which does not
leave unexpanded patterns behind.

Signed-off-by: Hollis Blanchard [EMAIL PROTECTED]

diff --git a/kernel/Makefile b/kernel/Makefile
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -57,22 +57,22 @@ header-link:

 T = $(subst -sync,,$@)-tmp

-headers-old = $(LINUX)/./include/asm-$(ARCH_DIR)/kvm*.h
-headers-new = $(LINUX)/arch/$(ARCH_DIR)/include/asm/./kvm*.h \
+headers-old = $(wildcard $(LINUX)/./include/asm-$(ARCH_DIR)/kvm*.h)
+headers-new = $(wildcard $(LINUX)/arch/$(ARCH_DIR)/include/asm/./kvm*.h \
 	$(LINUX)/arch/$(ARCH_DIR)/include/asm/./vmx*.h \
 	$(LINUX)/arch/$(ARCH_DIR)/include/asm/./svm*.h \
-	$(LINUX)/arch/$(ARCH_DIR)/include/asm/./virtext*.h
+	$(LINUX)/arch/$(ARCH_DIR)/include/asm/./virtext*.h)

 header-sync:
 	rm -rf $T
 	rsync -R \
 	     $(LINUX)/./include/linux/kvm*.h \
-	     $(if $(wildcard $(headers-old)), $(headers-old)) \
-	     $T/
-	$(if $(wildcard $(headers-new)), \
+	     $(headers-old) \
+	     $T/
+	$(if $(headers-new), \
 	rsync -R \
 	     $(headers-new) \
-	     $T/include/asm-$(ARCH_DIR)/)
+	     $T/include/asm-$(ARCH_DIR)/)

 	for i in $$(find $T -name '*.h'); do \
 		$(call unifdef,$$i); done
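The difference between the two expansion behaviours described above can be
reproduced outside of make with POSIX glob(3). The sketch below is only an
analogy under stated assumptions: expanded_word_count() is a hypothetical
helper, GLOB_NOCHECK stands in for the shell's "keep the raw pattern" rule, and
a plain glob() call stands in for $(wildcard), which yields nothing when no
file matches.

```c
#include <assert.h>
#include <glob.h>
#include <stddef.h>

/*
 * Count how many words a pattern would contribute to a command line.
 * shell_like != 0: unmatched patterns are kept verbatim (GLOB_NOCHECK).
 * shell_like == 0: unmatched patterns expand to nothing.
 */
static size_t expanded_word_count(const char *pattern, int shell_like)
{
    glob_t g;
    size_t n = 0;

    assert(pattern != NULL);

    if (glob(pattern, shell_like ? GLOB_NOCHECK : 0, NULL, &g) == 0) {
        n = g.gl_pathc;   /* matches, or the raw pattern itself */
        globfree(&g);
    }
    return n;             /* GLOB_NOMATCH and errors count as zero words */
}
```

With a pattern that matches no file, the shell-like call reports one word (the
literal pattern that rsync then fails to find), while the other call reports
zero.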
RE: [PATCH] kvm: ppc: stop leaking host memory on VM exit
Good catch.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hollis Blanchard
Sent: Tuesday, November 25, 2008 1:38 AM
To: Avi Kivity
Cc: kvm-ppc; kvm
Subject: [PATCH] kvm: ppc: stop leaking host memory on VM exit

When the VM exits, we must call put_page() for every page referenced in the
shadow TLB.

Without this patch, we usually leak 30-50 host pages (120 - 200 KiB with 4 KiB
pages). The maximum number of pages leaked is the size of our shadow TLB, 64
pages.

Signed-off-by: Hollis Blanchard [EMAIL PROTECTED]

---
The obvious question is "why didn't we see this before?" Basically, we'd never
looked for it, and since most of our work was in the kernel we always ended up
rebooting before exhausting host memory.

Since it's such a large leak, and a simple fix, please commit this for 2.6.28.
This patch does apply to kvm.git with fuzz, but if you prefer I can send a
separate patch for that later.
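The patch body is not included in this reply, so the following is only a
userspace sketch of the idea it describes: walk the shadow TLB on VM exit and
drop one reference per page still held. All names here (model_page,
model_vcpu, model_put_page, model_release_shadow_tlb, model_leak_kib) are
hypothetical stand-ins, not the kvm-ppc code; only the 64-entry TLB size and
the 4 KiB page size come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SHADOW_TLB_SIZE 64  /* shadow TLB entries, per the message */
#define MODEL_PAGE_KIB        4   /* 4 KiB host pages, per the message */

struct model_page { int refcount; };

struct model_vcpu {
    /* One held page reference per valid shadow TLB entry, else NULL. */
    struct model_page *shadow_pages[MODEL_SHADOW_TLB_SIZE];
};

/* Stand-in for put_page(): drop one reference. */
static void model_put_page(struct model_page *p)
{
    assert(p != NULL && p->refcount > 0);
    p->refcount--;
}

/* Release every page still referenced; returns how many were released. */
static size_t model_release_shadow_tlb(struct model_vcpu *vcpu)
{
    size_t i, released = 0;

    assert(vcpu != NULL);
    for (i = 0; i < MODEL_SHADOW_TLB_SIZE; i++) {
        if (vcpu->shadow_pages[i] != NULL) {
            model_put_page(vcpu->shadow_pages[i]);
            vcpu->shadow_pages[i] = NULL;
            released++;
        }
    }
    return released;
}

/* Host memory that would stay pinned if `pages` references leaked. */
static size_t model_leak_kib(size_t pages)
{
    return pages * MODEL_PAGE_KIB;
}
```

The arithmetic matches the figures quoted: 30 and 50 pages correspond to 120
and 200 KiB, and the 64-page upper bound corresponds to 256 KiB.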
Re: gettimeofday slow in RHEL4 guests
Some more data on this overhead.

RHEL3 (which is based on the 2.4.21 kernel) gets microsecond resolution by
reading the TSC. Reading the TSC from within a guest is very fast on kvm.

RHEL4 (which is based on the 2.6.9 kernel) allows multiple time sources: pmtmr
(the ACPI power management timer, which is the default), pit, hpet and TSC.

The pmtmr and pit both do ioport reads to get microsecond resolution (see
read_pmtmr and get_offset_pit, respectively). With the tsc as the timer source,
gettimeofday is *very* lightweight, but time drifts very badly and ntpd cannot
acquire a sync.

I believe someone is working on the HPET for guests, and I know from bare-metal
performance that it is a much lighter-weight time source, but with RHEL4 the
HPET breaks the ability to use the RTC. So, I'm running out of options for
reliable and lightweight time sources. Any chance the pit or pmtmr options can
be optimized a bit?

thanks,

david

PS. Yes, I did try the userspace PIT, and its performance is worse than the
in-kernel PIT.

David S. Ahern wrote:
> I noticed that gettimeofday in RHEL4.6 guests is taking much longer than
> with RHEL3.8 guests. I wrote a simple program (see below) to call
> gettimeofday in a loop 1,000,000 times and then used time to measure how
> long it took.
>
> For the RHEL3.8 guest:
> time -p ./timeofday_bench
> real 0.99
> user 0.12
> sys 0.24
>
> For the RHEL4.6 guest with the default clock source (pmtmr):
> time -p ./timeofday_bench
> real 15.65
> user 0.18
> sys 15.46
>
> and RHEL4.6 guest with PIT as the clock source (clock=pit kernel parameter):
> time -p ./timeofday_bench
> real 13.67
> user 0.21
> sys 13.45
>
> So, basically gettimeofday() takes about 50 times as long on a RHEL4 guest.
>
> Host is a DL380G5, 2 dual-core Xeon 5140 processors, 4 GB of RAM. It's
> running kvm.git tree as of 11/18/08 with kvm-75 userspace. Guest in both
> RHEL3 and RHEL4 cases has 4 vcpus, 3.5GB of RAM.
>
> david
>
> --
>
> timeofday_bench.c:
>
> #include <sys/time.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char *argv[])
> {
>     int rc = 0, n;
>     struct timeval tv;
>     int iter = 1000000; /* number of times to call gettimeofday */
>
>     /* an optional first argument overrides the iteration count */
>     if (argc > 1)
>         iter = atoi(argv[1]);
>
>     if (iter == 0) {
>         fprintf(stderr, "invalid number of iterations\n");
>         return 1;
>     }
>
>     printf("starting ");
>     for (n = 0; n < iter; ++n) {
>         if (gettimeofday(&tv, NULL) != 0) {
>             fprintf(stderr, "\ngettimeofday failed\n");
>             rc = 1;
>             break;
>         }
>     }
>
>     if (!rc)
>         printf("done\n");
>
>     return rc;
> }
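To put the timings above on a per-call basis, the elapsed time can be divided
by the 1,000,000 iterations. The helpers below are a minimal sketch with
hypothetical names (usec_per_call, time_ratio); the input figures are the ones
reported in the message, and nothing here was measured independently.

```c
#include <assert.h>

/* Average cost of one call in microseconds, given total seconds. */
static double usec_per_call(double seconds, long calls)
{
    assert(calls > 0);
    return seconds * 1e6 / (double)calls;
}

/* How many times slower one configuration is than a baseline. */
static double time_ratio(double slow_seconds, double base_seconds)
{
    assert(base_seconds > 0.0);
    return slow_seconds / base_seconds;
}
```

For example, usec_per_call(15.65, 1000000) gives roughly 15.65 microseconds of
wall-clock time per gettimeofday() call with pmtmr, against roughly 0.99
microseconds on the RHEL3.8 guest. Comparing system time instead,
time_ratio(15.46, 0.24) is about 64 and time_ratio(13.45, 0.24) is about 56.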
RE: Weekly KVM Test report, kernel 30d95f ... userspace fc94d1 ...
On Monday, November 24, 2008 12:57 AM Avi Kivity wrote:
> Xu, Jiajun wrote:
>> 2. failure to migrate guests with more than 4GB of RAM
>> https://sourceforge.net/tracker/index.php?func=detail&aid=1971512&group_id=180599&atid=893831
>
> Can you retest this? I successfully migrated a 5G guest (from a 4G host
> to itself; slo...)/

I tried the latest commit, userspace.git
6e63ba19476753595e508713eb9daf559dc50bf6, with a 64-bit RHEL5.1 guest. My host
kernel is 2.6.26.2, and the host has 8GB of memory and 4GB of swap. The guest
can be live migrated, but afterwards it hits a call trace. Maybe we can compare
our environments.

My steps were as follows:
1. qemu-system-x86_64 -incoming tcp:localhost: -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
2. qemu-system-x86_64 -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
3. In qemu console, type migrate tcp:localhost:

The call trace messages in the guest:

###
Kernel BUG at block/elevator.c:560
invalid opcode: [1] SMP
last sysfs file: /block/hda/removable
CPU 0
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
ib_iser libiscsi scsi_transport_iscsi rdma_ucm ib_ucm ib_srp ib_sdp rdma_cm
ib_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_sa ib_uverbs ib_umad ib_mad ib_core
dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button
battery asus_acpi acpi_memhotplug ac lp floppy pcspkr serio_raw 8139cp 8139too
parport_pc parport mii ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-53.el5 #1
RIP: 0010:[80134673] [80134673] elv_dequeue_request+0x8/0x3c
RSP: 0018:8040ddc0 EFLAGS: 00010046
RAX: 0001 RBX: 81011381b398 RCX:
RDX: 81011381b398 RSI: 81011381b398 RDI: 81011fb912c0
RBP: 804abe18 R08: 80304108 R09: 0012
R10: 0022 R11: R12:
R13: 0001 R14: 0086 R15: 8040deb8
FS: () GS:80396000() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ad6f4d0 CR3: 0001126cc000 CR4: 06e0
Process swapper (pid: 0, threadinfo 803c6000, task 802dcae0)
Stack: 8000ae3c 804abe18 804abe50 804abd00
 0246 8003ba73 8003ba0c 804abe18
 81011fbe5800 8000d2a5 81011fb8c5c0
Call Trace:
 <IRQ> [8000ae3c] ide_end_request+0xc6/0xfc
 [8003ba73] ide_dma_intr+0x67/0xab
 [8003ba0c] ide_dma_intr+0x0/0xab
 [8000d2a5] ide_intr+0x16f/0x1df
 [800107a0] handle_IRQ_event+0x29/0x58
 [800b5482] __do_IRQ+0xa4/0x105
 [8006a3bd] do_IRQ+0xe7/0xf5
 [8005b615] ret_from_intr+0x0/0xa
 [80011ca9] __do_softirq+0x53/0xd5
 [8005c2fc] call_softirq+0x1c/0x28
 [8006a53a] do_softirq+0x2c/0x85
 [80068d0e] default_idle+0x0/0x50
 [8005bc8e] apic_timer_interrupt+0x66/0x6c
 <EOI> [80068d37] default_idle+0x29/0x50
 [80046f8d] cpu_idle+0x95/0xb8
 [803d1806] start_kernel+0x220/0x225
 [803d1237] _sinittext+0x237/0x23e

Code: 0f 0b 68 25 50 29 80 c2 30 02 48 8b 46 08 48 89 42 08 48 89
RIP [80134673] elv_dequeue_request+0x8/0x3c
 RSP 8040ddc0
 <0>Kernel panic - not syncing: Fatal exception
BUG: warning at kernel/panic.c:137/panic() (Not tainted)

Call Trace:
 <IRQ> [8008ccca] panic+0x1e3/0x1f4
 [80196ae8] do_unblank_screen+0x1b/0x132
 [800631aa] oops_end+0x51/0x53
 [80069689] die+0x3a/0x44
 [80069c37] do_invalid_op+0xad/0xb7
 [80134673] elv_dequeue_request+0x8/0x3c
 [80092dd4] do_timer+0x2e8/0x53c
 [8006c0cc] main_timer_handler+0x23d/0x3f4
 [8005bde9] error_exit+0x0/0x84
 [80134673] elv_dequeue_request+0x8/0x3c
 [8000ae3c] ide_end_request+0xc6/0xfc
 [8003ba73] ide_dma_intr+0x67/0xab
 [8003ba0c] ide_dma_intr+0x0/0xab
 [8000d2a5] ide_intr+0x16f/0x1df
 [800107a0] handle_IRQ_event+0x29/0x58
 [800b5482] __do_IRQ+0xa4/0x105
 [8006a3bd] do_IRQ+0xe7/0xf5
 [8005b615] ret_from_intr+0x0/0xa
 [80011ca9] __do_softirq+0x53/0xd5
 [8005c2fc] call_softirq+0x1c/0x28
 [8006a53a] do_softirq+0x2c/0x85
 [80068d0e] default_idle+0x0/0x50
 [8005bc8e] apic_timer_interrupt+0x66/0x6c
 <EOI> [80068d37] default_idle+0x29/0x50
 [80046f8d] cpu_idle+0x95/0xb8
 [803d1806] start_kernel+0x220/0x225
 [803d1237] _sinittext+0x237/0x23e

BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Not tainted)

Call Trace:
 <IRQ> [801ee9b8]
[ kvm-Bugs-1971512 ] failure to migrate guests with more than 4GB of RAM
Bugs item #1971512, was opened at 2008-05-24 14:45
Message generated for change (Comment added) made by jiajun
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1971512&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Pending
Resolution: Fixed
Priority: 3
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Anthony Liguori (aliguori)
Summary: failure to migrate guests with more than 4GB of RAM

Initial Comment:

The migration code assumes linear phys_ram_base:

[EMAIL PROTECTED] kvm-userspace.tip]# qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/images/marcelo5-io-test.img -m 4097 -net nic,model=rtl8139 -net tap,script=/root/iptables/ifup -incoming tcp://0:/
audit_log_user_command(): Connection refused
audit_log_user_command(): Connection refused
migration: memory size mismatch: recv 22032384 mine 4316999680
migrate_incoming_fd failed (rc=232)

----------------------------------------------------------------------

Comment By: Jiajun Xu (jiajun)
Date: 2008-11-24 21:52

Message:
I tried the latest commit, userspace.git
6e63ba19476753595e508713eb9daf559dc50bf6, with a 64-bit RHEL5.1 guest. My host
kernel is 2.6.26.2, and the host has 8GB of memory and 4GB of swap. The guest
can be live migrated, but afterwards it hits a call trace. Maybe we can compare
our environments.

My steps were as follows:
1. qemu-system-x86_64 -incoming tcp:localhost: -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
2. qemu-system-x86_64 -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
3. In qemu console, type migrate tcp:localhost:

The call trace messages in the guest:

###
Kernel BUG at block/elevator.c:560
invalid opcode: [1] SMP
last sysfs file: /block/hda/removable
CPU 0
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
ib_iser libiscsi scsi_transport_iscsi rdma_ucm ib_ucm ib_srp ib_sdp rdma_cm
ib_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_sa ib_uverbs ib_umad ib_mad ib_core
dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button
battery asus_acpi acpi_memhotplug ac lp floppy pcspkr serio_raw 8139cp 8139too
parport_pc parport mii ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-53.el5 #1
RIP: 0010:[80134673] [80134673] elv_dequeue_request+0x8/0x3c
RSP: 0018:8040ddc0 EFLAGS: 00010046
RAX: 0001 RBX: 81011381b398 RCX:
RDX: 81011381b398 RSI: 81011381b398 RDI: 81011fb912c0
RBP: 804abe18 R08: 80304108 R09: 0012
R10: 0022 R11: R12:
R13: 0001 R14: 0086 R15: 8040deb8
FS: () GS:80396000() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ad6f4d0 CR3: 0001126cc000 CR4: 06e0
Process swapper (pid: 0, threadinfo 803c6000, task 802dcae0)
Stack: 8000ae3c 804abe18 804abe50 804abd00
 0246 8003ba73 8003ba0c 804abe18
 81011fbe5800 8000d2a5 81011fb8c5c0
Call Trace:
 <IRQ> [8000ae3c] ide_end_request+0xc6/0xfc
 [8003ba73] ide_dma_intr+0x67/0xab
 [8003ba0c] ide_dma_intr+0x0/0xab
 [8000d2a5] ide_intr+0x16f/0x1df
 [800107a0] handle_IRQ_event+0x29/0x58
 [800b5482] __do_IRQ+0xa4/0x105
 [8006a3bd] do_IRQ+0xe7/0xf5
 [8005b615] ret_from_intr+0x0/0xa
 [80011ca9] __do_softirq+0x53/0xd5
 [8005c2fc] call_softirq+0x1c/0x28
 [8006a53a] do_softirq+0x2c/0x85
 [80068d0e] default_idle+0x0/0x50
 [8005bc8e] apic_timer_interrupt+0x66/0x6c
 <EOI> [80068d37] default_idle+0x29/0x50
 [80046f8d] cpu_idle+0x95/0xb8
 [803d1806] start_kernel+0x220/0x225
 [803d1237] _sinittext+0x237/0x23e

Code: 0f 0b 68 25 50 29 80 c2 30 02 48 8b 46 08 48 89 42 08 48 89
RIP [80134673] elv_dequeue_request+0x8/0x3c
 RSP 8040ddc0
 <0>Kernel panic - not syncing: Fatal exception
BUG: warning at kernel/panic.c:137/panic() (Not tainted)

Call Trace:
 <IRQ> [8008ccca] panic+0x1e3/0x1f4
 [80196ae8] do_unblank_screen+0x1b/0x132
 [800631aa] oops_end+0x51/0x53
 [80069689] die+0x3a/0x44
 [80069c37] do_invalid_op+0xad/0xb7
 [80134673] elv_dequeue_request+0x8/0x3c
 [80092dd4] do_timer+0x2e8/0x53c
 [8006c0cc] main_timer_handler+0x23d/0x3f4
 [8005bde9] error_exit+0x0/0x84
 [80134673]
[PATCH 3/5] Figure out device capability
Try to figure out the device capabilities in update_dev_cap(). For now we only
care about the MSI capability.

The function pci_find_cap_offset() was originally written by Allen for Xen.
Note that the function needs root privileges to work.

This depends on libpci.

(Update: Make update_dev_cap() more generic.)

Signed-off-by: Allen Kay [EMAIL PROTECTED]
Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 qemu/hw/device-assignment.c |   50 +++
 qemu/hw/device-assignment.h |    5
 2 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 786b2f0..f79cc67 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -216,6 +216,35 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
                           (r_dev->v_addrs + region_num));
 }

+uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
+{
+    int id;
+    int max_cap = 48;
+    int pos = PCI_CAPABILITY_LIST;
+    int status;
+
+    status = pci_read_byte(pci_dev, PCI_STATUS);
+    if ((status & PCI_STATUS_CAP_LIST) == 0)
+        return 0;
+
+    while (max_cap--) {
+        pos = pci_read_byte(pci_dev, pos);
+        if (pos < 0x40)
+            break;
+
+        pos &= ~3;
+        id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
+
+        if (id == 0xff)
+            break;
+        if (id == cap)
+            return pos;
+
+        pos += PCI_CAP_LIST_NEXT;
+    }
+    return 0;
+}
+
 static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
                                           uint32_t val, int len)
 {
@@ -367,6 +396,25 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }

+static void update_dev_cap(AssignedDevice *pci_dev, uint8_t r_bus,
+                           uint8_t r_dev, uint8_t r_func)
+{
+    struct pci_access *pacc;
+    struct pci_dev *pdev;
+    int r;
+
+    pacc = pci_alloc();
+    pci_init(pacc);
+    pdev = pci_get_dev(pacc, 0, r_bus, r_dev, r_func);
+    pci_cleanup(pacc);
+#ifdef KVM_CAP_DEVICE_MSI
+    r = pci_find_cap_offset(pdev, PCI_CAP_ID_MSI);
+    if (r)
+        pci_dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
+#endif
+    pci_free_dev(pdev);
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
@@ -436,6 +484,8 @@ again:
     fclose(f);

     dev->region_number = r;
+
+    update_dev_cap(pci_dev, r_bus, r_dev, r_func);
     return 0;
 }

diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index d6caa67..de60988 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -29,6 +29,7 @@
 #define __DEVICE_ASSIGNMENT_H__

 #include <sys/mman.h>
+#include <pci/pci.h>
 #include "qemu-common.h"
 #include "sys-queue.h"
 #include "pci.h"
@@ -80,6 +81,10 @@ typedef struct {
     unsigned char h_busnr;
     unsigned int h_devfn;
     int bound;
+    struct {
+#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
+        int available;
+    } cap;
 } AssignedDevice;

 typedef struct AssignedDevInfo AssignedDevInfo;
--
1.5.4.5
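The capability-list walk in pci_find_cap_offset() can be tried without libpci
or root privileges by running the same loop over an in-memory copy of the
256-byte configuration space. This is a sketch for illustration only:
find_cap_offset() and the CFG_* names are hypothetical, although their values
follow the standard PCI register layout (status at 0x06, capability pointer at
0x34, MSI capability ID 0x05).

```c
#include <assert.h>
#include <stdint.h>

#define CFG_STATUS           0x06  /* status register (low byte) */
#define CFG_STATUS_CAP_LIST  0x10  /* "capability list present" bit */
#define CFG_CAPABILITY_LIST  0x34  /* pointer to the first capability */
#define CFG_CAP_LIST_ID      0     /* offset of the ID byte in an entry */
#define CFG_CAP_LIST_NEXT    1     /* offset of the next pointer */
#define CFG_CAP_ID_MSI       0x05  /* MSI capability ID */

/* Returns the offset of capability `cap`, or 0 if it is not present. */
static uint8_t find_cap_offset(const uint8_t cfg[256], uint8_t cap)
{
    int id;
    int max_cap = 48;              /* guards against looping lists */
    int pos = CFG_CAPABILITY_LIST;

    assert(cfg != 0);

    if ((cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST) == 0)
        return 0;

    while (max_cap--) {
        pos = cfg[pos];            /* follow the pointer */
        if (pos < 0x40)            /* capabilities live above the header */
            break;

        pos &= ~3;                 /* entries are dword aligned */
        id = cfg[pos + CFG_CAP_LIST_ID];

        if (id == 0xff)
            break;
        if (id == cap)
            return (uint8_t)pos;

        pos += CFG_CAP_LIST_NEXT;  /* step to this entry's next pointer */
    }
    return 0;
}
```

A nonzero result is what update_dev_cap() treats as "MSI is available"; a
device whose status register lacks the capability-list bit yields 0
immediately.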
Re: [Patch 4/5] x86_emulator: add the assembler code for three operands
On Tue, 04 Nov 2008 12:21:30 +0200
Avi Kivity [EMAIL PROTECTED] wrote:

> Guillaume Thouvenin wrote:
>> Add the assembler code for three operands
>>
>> +/* Instruction has three operands */
>> +/* In the switch we only implement case 4 because we know that for shld instruction
>> + * bytes are equal to 4. When eveything will be fine, we will add others cases.
>
> No, shld is defined for 16, 32, and 64 bit operands. Need to implement
> those too.

I tried something like:

+/* Instruction has three operands */
+/* In the switch we only implement case 4 because we know that for shld instruction
+ * bytes are equal to 4. When eveything will be fine, we will add others cases.
+ */
+#define __emulate_2op_cl(_op,_src,_src2,_dst,_eflags,_by,_bx,_wx,_wy,_lx,_ly,_qx,_qy) \
+	do { \
+		unsigned long _tmp; \
+ \
+		switch((_dst).bytes) { \
+		case 2: \
+			__asm__ __volatile__ ( \
+				_PRE_EFLAGS("0", "5", "2") \
+				"mov %4, %%rcx \n\t" \
+				_op"w %%cl,%3,%1; \n\t" \
+				_POST_EFLAGS("0", "5", "2") \
+				: "=m" (_eflags), "=m" ((_dst).val), \
+				  "=r" (_tmp) \
+				: _wy ((_src).val) , _wy ((_src2).val), "i" (EFLAGS_MASK) \
+				: "%rcx" ); \
+			break; \
+		case 4: \
+			__asm__ __volatile__ ( \
+				_PRE_EFLAGS("0", "5", "2") \
+				"mov %4, %%rcx \n\t" \
+				_op"l %%cl,%3,%1; \n\t" \
+				_POST_EFLAGS("0", "5", "2") \
+				: "=m" (_eflags), "=m" ((_dst).val), \
+				  "=r" (_tmp) \
+				: _ly ((_src).val) , _ly ((_src2).val), "i" (EFLAGS_MASK) \
+				: "%rcx" ); \
+			break; \
+		case 8: \
+			__asm__ __volatile__ ( \
+				_PRE_EFLAGS("0", "5", "2") \
+				"mov %4, %%rcx \n\t" \
+				_op"q %%cl,%3,%1; \n\t" \
+				_POST_EFLAGS("0", "5", "2") \
+				: "=m" (_eflags), "=m" ((_dst).val), \
+				  "=r" (_tmp) \
+				: _ly ((_src).val) , _ly ((_src2).val), "i" (EFLAGS_MASK) \
+				: "%rcx" ); \
+			break; \
+		} \
+	} while (0)
+
+#define emulate_2op_cl(_op, _src, _src2, _dst, _eflags) \
+	__emulate_2op_cl(_op, _src, _src2, _dst, _eflags, \
+			 "b", "r", "b", "r", "b", "r", "b", "r")
+

but it doesn't work because shld cannot be used with a suffix such as 'l' or
'w'.

Is the solution to have a single case for all operand sizes, like:

+#define __emulate_2op_cl(_op,_src,_src2,_dst,_eflags,_wx,_wy) \
+	do { \
+		unsigned long _tmp; \
+ \
+		__asm__ __volatile__ ( \
+			_PRE_EFLAGS("0", "5", "2") \
+
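For reference while reading the macro discussion, the data result of shld for
the three operand sizes Avi mentions can be written in plain C. This is a
hedged sketch, not the emulator's approach: model_shld() is a hypothetical
helper, it ignores EFLAGS entirely, and for a count larger than the operand
size (where the hardware result is undefined) it simply returns the
destination unchanged as a modelling choice.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift `dst` left by `count` bits and fill the vacated low bits from the
 * high bits of `src`. `bits` is the operand size: 16, 32 or 64.
 */
static uint64_t model_shld(uint64_t dst, uint64_t src,
                           unsigned count, unsigned bits)
{
    uint64_t mask;

    assert(bits == 16 || bits == 32 || bits == 64);

    mask = (bits == 64) ? ~(uint64_t)0 : (((uint64_t)1 << bits) - 1);
    dst &= mask;
    src &= mask;

    /* The count is masked to 6 bits for 64-bit operands, else 5 bits. */
    count &= (bits == 64) ? 63u : 31u;

    if (count == 0)
        return dst;            /* no operation */
    if (count == bits)
        return src;            /* every bit comes from the source */
    if (count > bits)
        return dst;            /* undefined on hardware; model keeps dst */

    return ((dst << count) | (src >> (bits - count))) & mask;
}
```

For example, model_shld(0x12345678, 0x9ABCDEF0, 8, 32) yields 0x3456789A: the
top byte of the source slides into the bottom byte of the destination.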
Re: [PATCH 0 of 2] libcflat test for PowerPC
On Sat, 2008-11-22 at 13:17 -0600, Deepa Srinivasan wrote:
> Add Hello world test for libcflat. Also, fix CFLAGS issue in
> config-powerpc.mak.

These look good, except int main() should be int main(void). I'll fix that
myself and commit.

Thanks.

--
Hollis Blanchard
IBM Linux Technology Center