Question: data consistency on fail-over using shared disk
Hi,

We are now looking into what we should do on VM fail-over. Does anybody know of any data consistency risks when a shared disk is used?

My concern is that if the host on the crashed VM's side is still holding buffered data, starting a new VM instance on another node may result in file system corruption. This problem may be similar to live migration, but it is a little different in that the VM has crashed and nothing more can be done from that point.

How about the combination of an old or new guest OS and the following cache settings?

- writethrough
- writeback
- none

If needed, we'll do a sync from HA-side scripts before starting a new VM instance.

Thanks,
Takuya

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Freezing Windows 2008 x64bit guest
Gleb Natapov wrote:

On Mon, Jul 19, 2010 at 10:17:02AM +0300, Harri Olin wrote:

Gleb Natapov wrote:

On Thu, Jul 15, 2010 at 03:19:44PM +0200, Christoph Adomeit wrote:

But one Windows 2008 64 Bit Server Standard is freezing regularly. This happens sometimes 3 times a day, sometimes it takes 2 days until freeze. The Windows Machine is a clean fresh install.

I think I have seen the same problem occur on my Windows 2008 SBS SP2 64bit system, but a bit less often, only about once a week. Now I haven't seen crashes but only freezes, with qemu at 100% and the virtual system unresponsive.

qemu command line:

/usr/local/qemu-kvm-0.11.1/bin/qemu-system-x86_64 -drive file=/dev/rigelvg/w2k8system,cache=none,boot=on -drive file=/dev/rigelvg/w2k8data,cache=none -m 6144 -vnc :1 -net nic,macaddr=C0:FF:12:FB:AA:01,model=e1000 -net tap -smp 4 -localtime

Try with a different model than e1000 please. The default driver that comes with Windows is known to have problems.

Didn't help; I changed to the default realtek emulation, but the system froze again.

Command line:

/usr/local/qemu-kvm-0.11.1/bin/qemu-system-x86_64 -drive file=/dev/rigelvg/w2k8system,cache=none,boot=on -drive file=/dev/rigelvg/w2k8data,cache=none -m 6144 -vnc :1 -net nic,macaddr=C0:FF:12:FB:AA:01 -net tap -smp 4 -localtime

This time the virtual system was not totally unresponsive: it pinged just fine and I think the DNS server worked too. However, on the console the mouse moved but didn't react to clicking, ctrl-alt-del, etc.

When the hang occurs, ensure that the problematic VM is the only one on the server and run kvm_stat. Send the output here. Also do the following:

# mount -t debugfs debugfs /sys/kernel/debug
# echo kvm > /sys/kernel/debug/tracing/set_event
# sleep 1
# cat /sys/kernel/debug/tracing/trace > /tmp/trace

Send /tmp/trace here too, but it may be huge, so send only the last 1000 lines.
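Trimming the trace to its last 1000 lines, as requested above, is a one-liner with `tail -n 1000 /tmp/trace`. For anyone scripting the collection, here is a minimal Python sketch of the same step. The helper name `last_lines` is made up for illustration and is not from the thread.

```python
from collections import deque


def last_lines(path, count=1000):
    """Return the final `count` lines of a text file (hypothetical helper)."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque drops the oldest line each time a new one arrives,
        # so memory use stays proportional to `count`, not to the file size.
        return list(deque(handle, maxlen=count))
```

Calling `last_lines("/tmp/trace")` and writing the result to a second file gives something small enough to attach to a mail.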
Stats from this hang:

log style: http://mizar.remote.agasha.com/k/kvm/kvm_stat_2_log.txt

kvm statistics

 efer_reload 0 0
 exits 5368720296 21258
 fpu_reload 28267507212216
 halt_exits 695286653 0
 halt_wakeup 685187661 0
 host_state_reload 30103996402216
 hypercalls 0 0
 insn_emulation 27653520326542
 insn_emulation_fail 477 0
 invlpg 42417110 204
 io_exits 818764026 122
 irq_exits 1096595729870
 irq_injections 7889920324643
 irq_window 221702771718
 largepages 0 0
 mmio_exits 1490912756 69
 mmu_cache_miss 2471963 0
 mmu_flooded 905690 0
 mmu_pde_zapped 1821325 0
 mmu_pte_updated 3256583 0
 mmu_pte_write 2896199 0
 mmu_recycled 259845 0

full trace: http://mizar.remote.agasha.com/k/kvm/kvm_trace_2_full.txt

qemu-system-x86-5155 [002] 1248198.487147: kvm_entry: vcpu 3
qemu-system-x86-5152 [000] 1248198.487147: kvm_inj_virq: irq 209
qemu-system-x86-5152 [000] 1248198.487148: kvm_entry: vcpu 0
qemu-system-x86-5154 [001] 1248198.487148: kvm_exit: reason apic_access rip 0xf80001c9df13
qemu-system-x86-5155 [002] 1248198.487148: kvm_exit: reason apic_access rip 0xf80001c9df13
qemu-system-x86-5152 [000] 1248198.487150: kvm_exit: reason apic_access rip 0xf80001c9d5c4
qemu-system-x86-5154 [001] 1248198.487150: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
qemu-system-x86-5154 [001] 1248198.487150: kvm_apic: apic_write APIC_EOI = 0x0
qemu-system-x86-5155 [002] 1248198.487150: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
qemu-system-x86-5155 [002] 1248198.487151: kvm_apic: apic_write APIC_EOI = 0x0
qemu-system-x86-5154 [001] 1248198.487151: kvm_ack_irq: irqchip IOAPIC pin 2
qemu-system-x86-5155 [002] 1248198.487151: kvm_ack_irq: irqchip IOAPIC pin 2
qemu-system-x86-5152 [000] 1248198.487151: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
qemu-system-x86-5152 [000] 1248198.487152: kvm_apic: apic_write APIC_EOI = 0x0
qemu-system-x86-5154 [001] 1248198.487152: kvm_entry: vcpu 2
qemu-system-x86-5155 [002] 1248198.487152: kvm_entry: vcpu 3
qemu-system-x86-5152 [000] 1248198.487152: kvm_ack_irq: irqchip IOAPIC pin 2
qemu-system-x86-5153 [003] 1248198.487153: kvm_exit: reason ext_irq rip 0xf80001d04a00
qemu-system-x86-5152 [000] 1248198.487153: kvm_entry: vcpu 0
qemu-system-x86-5153 [003] 1248198.487154: kvm_inj_virq: irq 209
qemu-system-x86-5153 [003] 1248198.487155: kvm_entry: vcpu 1
qemu-system-x86-5153 [003] 1248198.487157: kvm_exit: reason apic_access rip 0xf80001c9df13
qemu-system-x86-5153 [003] 1248198.487159: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
qemu-system-x86-5153 [003] 1248198.487159: kvm_apic: apic_write APIC_EOI = 0x0
qemu-system-x86-5153 [003] 1248198.487160: kvm_ack_irq: irqchip IOAPIC pin 2
qemu-system-x86-5153 [003] 1248198.
Re: [PATCH v2 3/3] KVM: Non-atomic interrupt injection
On 07/21/2010 03:55 AM, Marcelo Tosatti wrote:

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(r))
 		goto out;
 
+	inject_pending_event(vcpu);
+
+	/* enable NMI/IRQ window open exits if needed */
+	if (vcpu->arch.nmi_pending)
+		kvm_x86_ops->enable_nmi_window(vcpu);
+	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+		kvm_x86_ops->enable_irq_window(vcpu);
+
+	if (kvm_lapic_enabled(vcpu)) {
+		update_cr8_intercept(vcpu);
+		kvm_lapic_sync_to_vapic(vcpu);
+	}
+
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		smp_wmb();
 		local_irq_enable();
 		preempt_enable();
+		kvm_x86_ops->cancel_injection(vcpu);
 		r = 1;
 		goto out;
 	}
 
-	inject_pending_event(vcpu);
-
-	/* enable NMI/IRQ window open exits if needed */
-	if (vcpu->arch.nmi_pending)
-		kvm_x86_ops->enable_nmi_window(vcpu);
-	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-		kvm_x86_ops->enable_irq_window(vcpu);
-
-	if (kvm_lapic_enabled(vcpu)) {
-		update_cr8_intercept(vcpu);
-		kvm_lapic_sync_to_vapic(vcpu);
-	}
-
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
 	kvm_guest_enter();

This breaks

int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
	struct kvm_lapic *apic = vcpu->arch.apic;
	int highest_irr;

	/* This may race with setting of irr in __apic_accept_irq() and
	 * value returned may be wrong, but kvm_vcpu_kick() in
	 * __apic_accept_irq
	 * will cause vmexit immediately and the value will be
	 * recalculated
	 * on the next vmentry.
	 */

(also valid for nmi_pending and PIC).

Can't simply move atomic_set(guest_mode, 1) in preemptible section as that would make it possible for kvm_vcpu_kick to IPI stale vcpu->cpu.

Right. Can fix by adding a kvm_make_request() to force the retry loop.

Also should undo vmx.rmode.* ?

Elaborate?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
[PATCH v2] device-assignment: Use PCI I/O port sysfs resource file when available
When supported by the host kernel, we can use read/write on the PCI sysfs resource file for I/O port regions. This allows us to avoid raw in/out commands and works with deprivileged guests via libvirt. For uid 0 callers, we use in/out directly to avoid any compatibility issues.

Signed-off-by: Alex Williamson
---

Required kernel patch pending here:
http://www.spinics.net/lists/linux-pci/msg09389.html

v2:
- Drop getuid() since it doesn't guarantee permissions
- Don't use in/out as a fallback since we don't have permissions
- Consolidate ioport read/write functions

 hw/device-assignment.c | 205
 hw/device-assignment.h |   1
 2 files changed, 120 insertions(+), 86 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2bba22f..2e141ac 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -62,93 +62,100 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
 
 static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
 
-static uint32_t guest_to_host_ioport(AssignedDevRegion *region, uint32_t addr)
+static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
+                                       uint32_t addr, int len, uint32_t *val)
 {
-    return region->u.r_baseport + (addr - region->e_physbase);
+    uint32_t ret = 0;
+    uint32_t offset = addr - dev_region->e_physbase;
+    int fd = dev_region->region->resource_fd;
+
+    if (fd >= 0) {
+        if (val) {
+            DEBUG("pwrite val=%x, len=%d, e_phys=%x, offset=%x\n",
+                  *val, len, addr, offset);
+            if (pwrite(fd, val, len, offset) != len) {
+                fprintf(stderr, "%s - pwrite failed %s\n",
+                        __func__, strerror(errno));
+            }
+        } else {
+            if (pread(fd, &ret, len, offset) != len) {
+                fprintf(stderr, "%s - pread failed %s\n",
+                        __func__, strerror(errno));
+                ret = (1UL << (len * 8)) - 1;
+            }
+            DEBUG("pread ret=%x, len=%d, e_phys=%x, offset=%x\n",
+                  ret, len, addr, offset);
+        }
+    } else {
+        uint32_t port = offset + dev_region->u.r_baseport;
+
+        if (val) {
+            DEBUG("out val=%x, len=%d, e_phys=%x, host=%x\n",
+                  *val, len, addr, port);
+            switch (len) {
+            case 1:
+                outb(*val, port);
+                break;
+            case 2:
+                outw(*val, port);
+                break;
+            case 4:
+                outl(*val, port);
+                break;
+            }
+        } else {
+            switch (len) {
+            case 1:
+                ret = inb(port);
+                break;
+            case 2:
+                ret = inw(port);
+                break;
+            case 4:
+                ret = inl(port);
+                break;
+            }
+            DEBUG("in val=%x, len=%d, e_phys=%x, host=%x\n",
+                  ret, len, addr, port);
+        }
+    }
+    return ret;
 }
 
 static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
                                        uint32_t value)
 {
-    AssignedDevRegion *r_access = opaque;
-    uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-
-    DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
-          r_pio, (int)r_access->e_physbase,
-          (unsigned long)r_access->u.r_baseport, value);
-
-    outb(value, r_pio);
+    assigned_dev_ioport_rw(opaque, addr, 1, &value);
+    return;
 }
 
 static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
                                        uint32_t value)
 {
-    AssignedDevRegion *r_access = opaque;
-    uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-
-    DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
-          r_pio, (int)r_access->e_physbase,
-          (unsigned long)r_access->u.r_baseport, value);
-
-    outw(value, r_pio);
+    assigned_dev_ioport_rw(opaque, addr, 2, &value);
+    return;
 }
 
 static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
                                        uint32_t value)
 {
-    AssignedDevRegion *r_access = opaque;
-    uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-
-    DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
-          r_pio, (int)r_access->e_physbase,
-          (unsigned long)r_access->u.r_baseport, value);
-
-    outl(value, r_pio);
+    assigned_dev_ioport_rw(opaque, addr, 4, &value);
+    return;
 }
 
 static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
 {
-    AssignedDevRegion *r_access = opaque;
-    uint32_t r_pio = guest_to_host_ioport(r_access, addr);
-    uint32_t value;
-
-    value = inb(r_pio);
-
-    DEBUG("r_pio=%08x e_physbase=%08x r_=%08lx value=%08x\n",
-          r_pio, (int)r_
Re: [BUG?] vhost assert error with < 4GB of RAM
On Tue, Jul 20, 2010 at 02:42:19PM -0600, Cam Macdonell wrote:
> I think I've found a bug when running a guest with vhost with less
> than 4GB of RAM.
>
> If a guest has less than 4GB of RAM, then above_4g_mem_size is 0 for
> this call to cpu_register_physical_memory() in pc_memory_init() from
> hw/pc.c:922
>
> #if TARGET_PHYS_ADDR_BITS > 32
>     cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
>                                  ram_addr + below_4g_mem_size);
> #endif

Yes, the fix is in qemu already, it's a matter of merging into qemu-kvm.

> this leads to vhost_client_set_memory being called with size == 0
>
> #3 0x004301f3 in vhost_client_set_memory (client=0x113b010,
> start_addr=4294967296, size=0, phys_offset=3221225472)
> at /home/cam/research/KVM/qemu-kvm/hw/vhost.c:312
>
> which trips the assert at hw/vhost.c:312
>
> static void vhost_client_set_memory(CPUPhysMemoryClient *client,
>                                     target_phys_addr_t start_addr,
>                                     ram_addr_t size,
>                                     ram_addr_t phys_offset)
> {
>
> ..
>
>     assert(size);
> ...
>
> something like the following fixes the problem but I'm not sure if
> it's the proper way to handle it.
>
> diff --git a/exec.c b/exec.c
> index 5e9a5b7..991abfc 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2592,7 +2592,9 @@ void cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
>      ram_addr_t orig_size = size;
>      subpage_t *subpage;
>
> -    cpu_notify_set_memory(start_addr, size, phys_offset);
> +    if (size > 0) {
> +        cpu_notify_set_memory(start_addr, size, phys_offset);
> +    }
>
>      if (phys_offset == IO_MEM_UNASSIGNED) {
>          region_offset = start_addr;
Re: [PATCH v2 3/3] KVM: Non-atomic interrupt injection
On Tue, Jul 20, 2010 at 04:17:07PM +0300, Avi Kivity wrote:
> Change the interrupt injection code to work from preemptible, interrupts
> enabled context. This works by adding a ->cancel_injection() operation
> that undoes an injection in case we were not able to actually enter the guest
> (this condition could never happen with atomic injection).
>
> Signed-off-by: Avi Kivity
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm.c              | 12
>  arch/x86/kvm/vmx.c              | 11 +++
>  arch/x86/kvm/x86.c              | 27 ++-
>  4 files changed, 38 insertions(+), 13 deletions(-)
>
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	if (unlikely(r))
>  		goto out;
>
> +	inject_pending_event(vcpu);
> +
> +	/* enable NMI/IRQ window open exits if needed */
> +	if (vcpu->arch.nmi_pending)
> +		kvm_x86_ops->enable_nmi_window(vcpu);
> +	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> +		kvm_x86_ops->enable_irq_window(vcpu);
> +
> +	if (kvm_lapic_enabled(vcpu)) {
> +		update_cr8_intercept(vcpu);
> +		kvm_lapic_sync_to_vapic(vcpu);
> +	}
> +
>  	preempt_disable();
>
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
> @@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		smp_wmb();
>  		local_irq_enable();
>  		preempt_enable();
> +		kvm_x86_ops->cancel_injection(vcpu);
>  		r = 1;
>  		goto out;
>  	}
>
> -	inject_pending_event(vcpu);
> -
> -	/* enable NMI/IRQ window open exits if needed */
> -	if (vcpu->arch.nmi_pending)
> -		kvm_x86_ops->enable_nmi_window(vcpu);
> -	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> -		kvm_x86_ops->enable_irq_window(vcpu);
> -
> -	if (kvm_lapic_enabled(vcpu)) {
> -		update_cr8_intercept(vcpu);
> -		kvm_lapic_sync_to_vapic(vcpu);
> -	}
> -
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>
>  	kvm_guest_enter();

This breaks

int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
	struct kvm_lapic *apic = vcpu->arch.apic;
	int highest_irr;

	/* This may race with setting of irr in __apic_accept_irq() and
	 * value returned may be wrong, but kvm_vcpu_kick() in
	 * __apic_accept_irq
	 * will cause vmexit immediately and the value will be
	 * recalculated
	 * on the next vmentry.
	 */

(also valid for nmi_pending and PIC).

Can't simply move atomic_set(guest_mode, 1) in preemptible section as that would make it possible for kvm_vcpu_kick to IPI stale vcpu->cpu.

Also should undo vmx.rmode.* ?
Re: [PATCH v2 1/6] KVM: MMU: fix forgot reserved bits check in speculative path
Xiao Guangrong wrote:
> In the speculative path, we should check guest pte's reserved bits just as
> the real processor does
>

Ping..?
Re: [PATCH] device-assignment: Use PCI I/O port sysfs resource file when available
* Alex Williamson (alex.william...@redhat.com) wrote:
> When supported by the host kernel, we can use read/write on the
> PCI sysfs resource file for I/O port regions. This allows us to
> avoid raw in/out commands and works with deprivileged guests via
> libvirt. For uid 0 callers, we use in/out directly to avoid any
> compatibility issues.

Won't the uid 0 test fail if libvirt launches qemu with the user set to root (capabilities still get dropped)?

thanks,
-chris
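The access pattern in the patch description, plain read/write on a file descriptor at an offset relative to the region base, can be sketched in Python. This is an illustration only, not the qemu code: `ioport_rw` is a hypothetical helper, an ordinary temporary file stands in for the sysfs resource file, and little-endian byte order is assumed. The all-ones value on a failed read follows the v2 patch; raising on a short write is a choice made for this sketch.

```python
import os
import tempfile


def ioport_rw(fd, base, addr, length, value=None):
    """Read (value is None) or write `length` bytes at offset addr - base.

    Hypothetical helper for illustration; a short read yields all ones.
    """
    offset = addr - base
    if value is not None:
        data = value.to_bytes(length, "little")
        if os.pwrite(fd, data, offset) != length:
            raise OSError("short pwrite")
        return value
    data = os.pread(fd, length, offset)
    if len(data) != length:
        # Same error value as the patch: (1 << (len * 8)) - 1.
        return (1 << (length * 8)) - 1
    return int.from_bytes(data, "little")


# Stand-in for an I/O port region: an 8-byte scratch file.
with tempfile.TemporaryFile() as region:
    region.write(bytes(8))
    region.flush()
    fd = region.fileno()
    ioport_rw(fd, base=0xC000, addr=0xC002, length=2, value=0xBEEF)
    readback = ioport_rw(fd, base=0xC000, addr=0xC002, length=2)
    # Reading past the end of the file returns fewer bytes than requested.
    past_end = ioport_rw(fd, base=0xC000, addr=0xC010, length=4)
```

Because access goes through an already opened descriptor, the process doing the reads and writes only needs that descriptor, which fits the deprivileged-guest case the description mentions.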
[PATCH] device-assignment: Use PCI I/O port sysfs resource file when available
When supported by the host kernel, we can use read/write on the PCI sysfs resource file for I/O port regions. This allows us to avoid raw in/out commands and works with deprivileged guests via libvirt. For uid 0 callers, we use in/out directly to avoid any compatibility issues.

Signed-off-by: Alex Williamson
---

Required kernel patch pending here:
http://www.spinics.net/lists/linux-pci/msg09389.html

 hw/device-assignment.c | 131
 hw/device-assignment.h |   1
 2 files changed, 99 insertions(+), 33 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2bba22f..37c1278 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -67,6 +67,28 @@ static uint32_t guest_to_host_ioport(AssignedDevRegion *region, uint32_t addr)
     return region->u.r_baseport + (addr - region->e_physbase);
 }
 
+static int assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
+                                  uint32_t addr, int len, uint32_t *val,
+                                  int write)
+{
+    if (dev_region->region->resource_fd == -1)
+        return -1;
+
+    if (write) {
+        if (pwrite(dev_region->region->resource_fd, val, len,
+                   (addr - dev_region->e_physbase)) != len) {
+            return -1;
+        }
+    } else {
+        if (pread(dev_region->region->resource_fd, val, len,
+                  (addr - dev_region->e_physbase)) != len) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
                                        uint32_t value)
 {
@@ -77,7 +99,9 @@ static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
           r_pio, (int)r_access->e_physbase,
           (unsigned long)r_access->u.r_baseport, value);
 
-    outb(value, r_pio);
+    if (assigned_dev_ioport_rw(r_access, addr, 1, &value, 1) != 0) {
+        outb(value, r_pio);
+    }
 }
 
 static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
@@ -90,7 +114,9 @@ static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
           r_pio, (int)r_access->e_physbase,
           (unsigned long)r_access->u.r_baseport, value);
 
-    outw(value, r_pio);
+    if (assigned_dev_ioport_rw(r_access, addr, 2, &value, 1) != 0) {
+        outw(value, r_pio);
+    }
 }
 
 static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
@@ -103,7 +129,9 @@ static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
           r_pio, (int)r_access->e_physbase,
           (unsigned long)r_access->u.r_baseport, value);
 
-    outl(value, r_pio);
+    if (assigned_dev_ioport_rw(r_access, addr, 4, &value, 1) != 0) {
+        outl(value, r_pio);
+    }
 }
 
 static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
@@ -112,7 +140,9 @@ static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
     uint32_t r_pio = guest_to_host_ioport(r_access, addr);
     uint32_t value;
 
-    value = inb(r_pio);
+    if (assigned_dev_ioport_rw(r_access, addr, 1, &value, 0) != 0) {
+        value = inb(r_pio);
+    }
 
     DEBUG("r_pio=%08x e_physbase=%08x r_=%08lx value=%08x\n",
           r_pio, (int)r_access->e_physbase,
@@ -127,7 +157,9 @@ static uint32_t assigned_dev_ioport_readw(void *opaque, uint32_t addr)
     uint32_t r_pio = guest_to_host_ioport(r_access, addr);
     uint32_t value;
 
-    value = inw(r_pio);
+    if (assigned_dev_ioport_rw(r_access, addr, 2, &value, 0) != 0) {
+        value = inw(r_pio);
+    }
 
     DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
           r_pio, (int)r_access->e_physbase,
@@ -142,7 +174,9 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr)
     uint32_t r_pio = guest_to_host_ioport(r_access, addr);
     uint32_t value;
 
-    value = inl(r_pio);
+    if (assigned_dev_ioport_rw(r_access, addr, 4, &value, 0) != 0) {
+        value = inl(r_pio);
+    }
 
     DEBUG("r_pio=%08x e_physbase=%08x r_baseport=%08lx value=%08x\n",
           r_pio, (int)r_access->e_physbase,
@@ -305,7 +339,7 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
     DEBUG("e_phys=0x%" FMT_PCIBUS " r_baseport=%x type=0x%x len=%" FMT_PCIBUS " region_num=%d \n",
           addr, region->u.r_baseport, type, size, region_num);
 
-    if (first_map) {
+    if (first_map && region->region->resource_fd < 0) {
         struct ioperm_data *data;
 
         data = qemu_mallocz(sizeof(struct ioperm_data));
@@ -586,19 +620,46 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
                                  slow_map ? assigned_dev_iomem_map_slow : assigned_dev_iomem_map);
             continue;
+        } else {
+            /* handle port io regions */
+            uint32_t val;
+            int ret;
+
+            /* Test kernel support for ioport resource read/write. Old
+             * kerne
Re: [PATCH 04/18] Make cpu_tsc_khz updates use local CPU
On 07/19/2010 10:53 PM, Avi Kivity wrote:

On 07/19/2010 11:06 PM, Zachary Amsden wrote:

+static void tsc_khz_changed(void *data)
 {
-	/* nothing */
+	struct cpufreq_freqs *freq = data;
+	unsigned long khz = 0;
+
+	if (data)
+		khz = freq->new;
+	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+		khz = cpufreq_quick_get(raw_smp_processor_id());
+	if (!khz)
+		khz = tsc_khz;
+	__get_cpu_var(cpu_tsc_khz) = khz;
 }

Do we really need to cache cpufreq_quick_get()? If it's really quick, why not just use it everywhere instead of caching it? Not a comment on this patch.

If cpufreq is compiled in, but disabled, it returns zero, so we need some sort of logic.

Maybe it's better to put it into cpufreq_quick_get(). Inconsistent APIs that appear to work are bad.

I don't think it's quite so simple; cpufreq is platform independent and tsc_khz is a platform specific export. It seems cpufreq is designed to return zero when disabled and we're the unusual ones for wanting to use it.

Zach
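The selection order under discussion (notifier data first, then cpufreq_quick_get() on CPUs without a constant TSC, then tsc_khz when the result is zero) can be written out as a small Python sketch. It only mirrors the control flow of the quoted hunk. `pick_tsc_khz` and its parameters are hypothetical stand-ins, not kernel interfaces.

```python
def pick_tsc_khz(freq_new, constant_tsc, quick_get, default_khz):
    """Mirror the selection order in the quoted tsc_khz_changed() hunk.

    freq_new:     new frequency from a cpufreq notification, or None
    constant_tsc: True if the CPU reports a constant-rate TSC
    quick_get:    stand-in callable for cpufreq_quick_get(); may return 0
    default_khz:  stand-in for the global tsc_khz
    """
    khz = 0
    if freq_new is not None:
        khz = freq_new
    elif not constant_tsc:
        khz = quick_get()
    if not khz:
        # Zero means no usable answer (for example cpufreq disabled),
        # so fall back to the default.
        khz = default_khz
    return khz
```

The final zero check is the "some sort of logic" Zach refers to: without it, a disabled cpufreq would leave the per-CPU value at zero.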
[BUG?] vhost assert error with < 4GB of RAM
I think I've found a bug when running a guest with vhost with less than 4GB of RAM.

If a guest has less than 4GB of RAM, then above_4g_mem_size is 0 for this call to cpu_register_physical_memory() in pc_memory_init() from hw/pc.c:922

#if TARGET_PHYS_ADDR_BITS > 32
    cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
                                 ram_addr + below_4g_mem_size);
#endif

this leads to vhost_client_set_memory being called with size == 0

#3 0x004301f3 in vhost_client_set_memory (client=0x113b010,
start_addr=4294967296, size=0, phys_offset=3221225472)
at /home/cam/research/KVM/qemu-kvm/hw/vhost.c:312

which trips the assert at hw/vhost.c:312

static void vhost_client_set_memory(CPUPhysMemoryClient *client,
                                    target_phys_addr_t start_addr,
                                    ram_addr_t size,
                                    ram_addr_t phys_offset)
{

..

    assert(size);
...

something like the following fixes the problem but I'm not sure if it's the proper way to handle it.

diff --git a/exec.c b/exec.c
index 5e9a5b7..991abfc 100644
--- a/exec.c
+++ b/exec.c
@@ -2592,7 +2592,9 @@ void cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
     ram_addr_t orig_size = size;
     subpage_t *subpage;
 
-    cpu_notify_set_memory(start_addr, size, phys_offset);
+    if (size > 0) {
+        cpu_notify_set_memory(start_addr, size, phys_offset);
+    }
 
     if (phys_offset == IO_MEM_UNASSIGNED) {
         region_offset = start_addr;
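The failure and the proposed guard can be reproduced in miniature with a Python sketch. Everything here is a stand-in: `AssertingClient` plays the role of a client that asserts on size the way vhost does, and `register_physical_memory` applies the same `size > 0` check as the diff above. It illustrates the idea only and makes no claim about how the fix was finally done in qemu.

```python
class AssertingClient:
    """Stand-in for a memory client that, like vhost, rejects empty regions."""

    def __init__(self):
        self.regions = []

    def set_memory(self, start_addr, size, phys_offset):
        assert size, "client was notified about a zero-sized region"
        self.regions.append((start_addr, size, phys_offset))


def register_physical_memory(clients, start_addr, size, phys_offset):
    """Notify clients only for non-empty regions, as the proposed diff does."""
    if size > 0:
        for client in clients:
            client.set_memory(start_addr, size, phys_offset)


client = AssertingClient()
# Values from the backtrace: start at 4 GiB, size 0, offset 3 GiB.
register_physical_memory([client], 4294967296, 0, 3221225472)
# A non-empty region is still passed through to the client.
register_physical_memory([client], 0, 3221225472, 0)
```

With the guard removed, the first call would hit the assertion in `set_memory`, which is the situation the backtrace shows.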
Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers
On Sat, Jul 17, 2010 at 10:45:23AM +0200, Piotr Jaroszyński wrote:
> On 16 July 2010 23:58, Tom Lyon wrote:
> > The VFIO "driver" is used to allow privileged AND non-privileged processes to
> > implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
> > devices.
>
> Thanks for working on that! I wonder whether it's possible to say what
> are the chances of it being merged to mainline and which version we
> might be talking about?

We still have a long way to go before you need to worry about what kernel version it's going to show up in...

thanks,

greg k-h
Re: Swap usage with KVM
> Yes, we are using Virtio drivers for networking and storage in both VMs
> with cache=none. Both VMs are running Linux 2.6.32-bpo.5-amd64 from
> Lenny Backports repositories. For VMHost, we are using a stable version
> of KVM with Linux 2.6.32.12 compiled from source code of kernel.org and
> qemu-kvm 0.12.3 compiled with the source code obtained from the official
> site of KVM.
>

Afaik this should be this bug:
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599

Try upgrading to 0.12.4 or backport this commit:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=012d4869c1eb195e83f159ed7b2bced33f37f960

David
Re: [Qemu-devel] KVM call minutes for July 20
On 07/20/2010 11:29 AM, Aurelien Jarno wrote:

It's a pity I can't easily attend this conference call, as it seems a lot of decisions are taken there. Anyway let me comment on the part concerning 0.12 stable:

Is it a matter of time zone or conflict? The call has historically been centered around KVM issues but these days it's hard to make such a clear distinction.

On Tue, Jul 20, 2010 at 07:45:51AM -0700, Chris Wright wrote:

0.12.stable
- start w/ git tree + pull requests
- release process is separate from commit access
- justin will put up a tree for pull requests
- there's current backlog, what about that?

I think someone should actively follow the patches committed to HEAD and backport them when they seem to be stable material. I guess it's what Justin plans to do. OTOH, it might be useful if people sending patches to HEAD add a small comment about cherry-picking the patch to stable if it applies.

My big concern with -stable is testing. For folks interested in helping out, what I'd really like to see is people explicitly testing their patches on -stable. IOW, just saying "this is probably stable material" is not nearly as helpful as saying, "I've verified this cherry picks cleanly to stable and tested there."

- anthony's concern with -stable is the testing (upstream tree gets more testing than -stable)

Debian gets regular uploads with the contents of the -stable tree between two releases. Also patches from trunk are all cherry-picked from HEAD.

That's good to know. My main point was that proportionately speaking, the master branch gets considerably more testing than the stable branch. Considering that there is a higher expectation of stable too, the testing requirement for it is pretty high in my opinion.

Regards,

Anthony Liguori

- 0.12.5?
- planning to do next w/ 0.13 release
- aurelien may cut a release

Following the minutes from last week, I sent a call for release, with a deadline today. I only got the patch series from Kevin. There are currently 44 patches waiting in the stable tree, so I guess we can go for a release. I plan to do that later this week if nobody opposes.
Re: KVM call minutes for July 20
On 07/20/10 08:45, Chris Wright wrote:
> 0.13
> - rc RSN (hopefully this week, top priority for anthony)

Can Cam's inter-vm shared memory device get committed for 0.13? It's been stagnant on the list for a while now waiting for inclusion (or NAK comments).

David
Re: [Qemu-devel] KVM call minutes for July 20
It's a pity I can't easily attend this conference call, as it seems a lot of decisions are taken there. Anyway let me comment on the part concerning 0.12 stable:

On Tue, Jul 20, 2010 at 07:45:51AM -0700, Chris Wright wrote:
> 0.12.stable
> - start w/ git tree + pull requests
> - release process is separate from commit access
> - justin will put up a tree for pull requests
> - there's current backlog, what about that?

I think someone should actively follow the patches committed to HEAD and backport them when they seem to be stable material. I guess it's what Justin plans to do. OTOH, it might be useful if people sending patches to HEAD add a small comment about cherry-picking the patch to stable if it applies.

> - anthony's concern with -stable is the testing (upstream tree gets more
>   testing than -stable)

Debian gets regular uploads with the contents of the -stable tree between two releases. Also patches from trunk are all cherry-picked from HEAD.

> - 0.12.5?
> - planning to do next w/ 0.13 release
> - aurelien may cut a release

Following the minutes from last week, I sent a call for release, with a deadline today. I only got the patch series from Kevin. There are currently 44 patches waiting in the stable tree, so I guess we can go for a release. I plan to do that later this week if nobody opposes.

--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm
On 07/20/2010 04:44 PM, Amos Kong wrote: > On Tue, Jul 20, 2010 at 01:19:39PM +0300, Michael Goldish wrote: >> > > Michael, > > Thanks for your comments. Let's simplify this method together. > >> On 07/20/2010 04:34 AM, Amos Kong wrote: >>> Old method uses the mac address in the configuration files which could >>> lead serious problem when multiple tests running in different hosts. >>> >>> This patch adds a new macaddress pool algorithm, it generates the mac prefix >>> based on mac address of the host which could eliminate the duplicated mac >>> addresses between machines. >>> >>> When user have set the mac_prefix in the configuration file, we should use >>> it >>> in stead of the dynamic generated mac prefix. >>> >>> Other change: >>> . Fix randomly generating mac address so that it correspond to IEEE802. >>> . Update clone function to decide clone mac address or not. >>> . Update get_macaddr function. >>> . Add set_mac_address function. >>> >>> New auto mac address pool algorithm: >>> If address_index is defined, VM will get mac from config file then record >>> mac >>> in to address_pool. If address_index is not defined, VM will call >>> get_mac_from_pool to auto create mac then recored mac to address_pool in >>> following format: >>> {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}} >>> >>> AE:9D:94:6A:9b:f9: mac address >>> 20100310-165222-Wt7l : instance attribute of VM >>> 0: index of NIC >> >> Why do you use the mac address as a key, instead of the instance string >> + nic index? When the mac address is used as a key, each key has a list >> of values instead of just one value. This order seems unnatural. If it >> were the other way around (i.e. key = VM instance + nic index, value = >> mac address), then each key would have exactly one value, and I think >> this patch would be shorter and simpler. > > One mac address may be used by two VMs, eg. migration. 
Sure, that's why I thought the opposite direction would be better: keys = VMs (nics), values = mac addresses. That way we have one value per key, instead of a list of values per key. To clarify, instead of using: {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0', '20100310-165222-Wt7l:1', '20100310-165222-Wt7l:2']} I suggest: {'20100310-165222-Wt7l:0': 'AE:9D:94:6A:9b:f9', '20100310-165222-Wt7l:1': 'AE:9D:94:6A:9b:f9', '20100310-165222-Wt7l:2': 'AE:9D:94:6A:9b:f9'} >>> Signed-off-by: Jason Wang >>> Signed-off-by: Feng Yang >>> Signed-off-by: Amos Kong >>> --- >>> 0 files changed, 0 insertions(+), 0 deletions(-) >>> >>> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py >>> index fb2d1c2..7c0946e 100644 >>> --- a/client/tests/kvm/kvm_utils.py >>> +++ b/client/tests/kvm/kvm_utils.py >>> @@ -5,6 +5,7 @@ KVM test utility functions. >>> """ >>> >>> import time, string, random, socket, os, signal, re, logging, commands, >>> cPickle >>> +import fcntl, shelve >>> from autotest_lib.client.bin import utils >>> from autotest_lib.client.common_lib import error, logging_config >>> import kvm_subprocess >>> @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword): >>> >>> # Functions related to MAC/IP addresses >>> >>> +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'): >> >> The name of this function is confusing because it does the exact >> opposite: it puts a mac address in address_pool. Maybe the pool you're >> referring to in the name isn't address_pool, but still a less confusing >> name should probably be used. > > How about allocate_mac(...) ? > address_pool -> address_container > > Allocate mac address and record into address_container. Yes, something like that, sounds less confusing. >>> +""" >>> +random generated mac address. >>> + >>> +1) First try to generate macaddress based on the mac address prefix. >>> +2) And then try to use total random generated mac address. 
>>> + >>> +@param root_dir: Root dir for kvm >>> +@param vm: Here we use instance of vm >>> +@param nic_index: The index of nic. >>> +@param prefix: Prefix of mac address. >>> +@Return: Return mac address. >>> +""" >>> + >>> +lock_filename = os.path.join(root_dir, "mac_lock") >>> +lock_file = open(lock_filename, 'w') >>> +fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX) >>> +mac_filename = os.path.join(root_dir, "address_pool") >> >> Maybe it makes sense to put address_pool and the lock file in /tmp, >> where they can be shared by more than a single autotest instance running >> on the same host (unlikely, but theoretically possible). > > good idea. > >>> +mac_shelve = shelve.open(mac_filename, writeback=False) >>> + >>> +mac_pool = mac_shelve.get("macpool") >> >> Why is this 'macpool' needed? Why not put the keys directly in the >> shelve object? > > yes, put keys directly in the shelve object is better. > >>> +if not mac_pool: >>> +mac_pool = {} >>> +found = False >>> + >>> +v
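The layout suggested in this reply (one shelve key per VM NIC, the MAC address as the value, keys stored directly in the shelve object, and the files kept in a shared temporary directory under a lock) could look roughly like the sketch below. This is not the code from the patch: the function names `record_mac` and `lookup_mac`, the `base_dir` parameter and the `_locked_pool` helper are made up for illustration, and it is written for Python 3 rather than the Python 2 of the original kvm_utils.py.

```python
import contextlib
import fcntl
import os
import shelve
import tempfile


@contextlib.contextmanager
def _locked_pool(base_dir=None):
    """Open the shared address pool while holding an exclusive file lock."""
    base_dir = base_dir or tempfile.gettempdir()
    lock_path = os.path.join(base_dir, "mac_lock")
    pool_path = os.path.join(base_dir, "address_pool")
    with open(lock_path, "w") as lock_file:
        fcntl.lockf(lock_file, fcntl.LOCK_EX)
        try:
            with shelve.open(pool_path) as pool:
                yield pool
        finally:
            fcntl.lockf(lock_file, fcntl.LOCK_UN)


def record_mac(vm_instance, nic_index, mac, base_dir=None):
    """Store exactly one MAC per "instance:nic_index" key."""
    with _locked_pool(base_dir) as pool:
        pool["%s:%s" % (vm_instance, nic_index)] = mac


def lookup_mac(vm_instance, nic_index, base_dir=None):
    """Return the recorded MAC for this NIC, or None if there is none."""
    with _locked_pool(base_dir) as pool:
        return pool.get("%s:%s" % (vm_instance, nic_index))
```

With this direction, two VMs sharing a MAC (the migration case) are simply two keys with the same value, so no per-key list needs to be maintained.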
Re: Swap usage with KVM
On Sunday, 11 July 2010 19:08:58 -0300, Daniel Bareiro wrote: > > > I have an installation with Debian GNU/Linux 5.0.4 amd64 with > > > qemu-kvm 0.12.3 compiled with the source code obtained from the > > > official site of KVM and Linux 2.6.32.12 compiled from source code > > > of kernel.org. All this is installed on an HP Proliant DL380 G6 > > > with two Xeon E5530 quadcore processors and 16 GiB of RAM which > > > has two VMs with the following configuration of memory: > > Are you using virtio drivers in the VMs? > > > > There was an issue with KVM-72 and virtio that leaks memory in the > > host until all RAM and swap is used (inside the VMs, no swap is > > used). It was supposed to be fixed in KVM-80-something, though. > > > > Perhaps something similar is happening again? If you switch the > > disks to scsi instead of virtio, does the problem go away? > > > > We are running KVM-72 on Debian 5.0 and have run into this issue. > > We'll be upgrading our hosts this month to fix this. > Yes, we are using Virtio drivers for networking and storage in both > VMs with cache=none. Both VMs are running Linux 2.6.32-bpo.5-amd64 > from Lenny Backports repositories. For VMHost, we are using a stable > version of KVM with Linux 2.6.32.12 compiled from source code of > kernel.org and qemu-kvm 0.12.3 compiled with the source code obtained > from the official site of KVM. 
> > This is the syntax I'm using to boot the virtual machines: > > > 8587 ?Sl 6515:25 /usr/local/qemu-kvm/bin/qemu-system-x86_64 -drive > file=/dev/vm/aps4-raiz,cache=none,if=virtio,boot=on -drive > file=/dev/vm/aps4-cache,cache=none,if=virtio -drive > file=/dev/vm/aps4-index,cache=none,if=virtio > -drive file=/dev/vm/aps4-space,cache=none,if=virtio -m 7168 -smp 4 -net > nic,model=virtio,macaddr=00:16:3e:00:00:95 -net tap -daemonize -vnc :3 -k es > -localtime -monitor > telnet:localhost:4003,server,nowait -serial > telnet:localhost:4043,server,nowait > > 9769 ?Rl 11968:47 /usr/local/qemu-kvm/bin/qemu-system-x86_64 -drive > file=/dev/vm/leela-raiz,cache=none,if=virtio,boot=on -drive > file=/dev/vm/leela-u01,cache=none,if=virtio -drive > file=/dev/vm/leela-u02,cache=none,if=virtio > -drive file=/dev/vm/leela-u03,cache=none,if=virtio -drive > file=/dev/vm/leela-u04,cache=none,if=virtio -drive > file=/dev/vm/leela-u05,cache=none,if=virtio > -drive file=/dev/vm/leela-u06,cache=none,if=virtio -drive > file=/dev/vm/leela-u07,cache=none,if=virtio -drive > file=/dev/vm/leela-u08,cache=none,if=virtio > -drive file=/dev/vm/leela-u09,cache=none,if=virtio -drive > file=/dev/vm/leela-space,cache=none,if=virtio -m 7168 -smp 8 -net > nic,model=virtio,macaddr=00:16:3e:00:00:96 -net tap -daemonize -vnc :4 -k es > -localtime -monitor > telnet:localhost:4004,server,nowait -serial > telnet:localhost:4044,server,nowait > To make the switch from Virtio to SCSI I would have to shut down the > hosts, which would not be a good idea whereas are two productive > systems. At least, before doing so I would be sure of what might be > the problem. 
>
> Taking a current measurement in VMHost with free, I got the following:
>
>
> ss04:~# free
>              total       used       free     shared    buffers     cached
> Mem:      16461588   16406504      55084          0       2920      21504
> -/+ buffers/cache:   16382080      79508
> Swap:      2028492     983140    1045352
>
>
> It draws attention to me that thinking about initially leaving a margin
> of 2 GB of RAM for the VMHost, already it has used almost half of swap.

This is a current measurement I've taken in both the VMs and in VMHost:

* VMHost:

ss04:~# free
             total       used       free     shared    buffers     cached
Mem:      16461588   16405140      56448          0       3496      18604
-/+ buffers/cache:   16383040      78548
Swap:      5174220    2401552    2772668

* Aps4:

aps4:~# free
             total       used       free     shared    buffers     cached
Mem:       7164300    7120192      44108          0      23108     239076
-/+ buffers/cache:    6858008     306292
Swap:      2931820      14084    2917736

* Leela:

leela:~# free
             total       used       free     shared    buffers     cached
Mem:       7163836    6905224     258612          0     123380    6282816
-/+ buffers/cache:     499028    6664808
Swap:       979924      35640     944284

As you can see, I added more swap in VMHost for more margin, but currently only 54% is free.

Thanks in advance for your replies.

Regards,
Daniel
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598
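As a rough cross-check of the figures in this message, the arithmetic can be written out as below. It uses only numbers quoted above (the host's `free` totals in KiB and the two `-m 7168` guest sizes); the helper names are invented for this sketch. It deliberately ignores per-process qemu overhead, which would likely make the real margin smaller than the number computed here.

```python
KIB_PER_MIB = 1024


def percent_free(total_kib, free_kib):
    """Share of a resource that is still free, in percent."""
    return 100.0 * free_kib / total_kib


def host_headroom_kib(host_total_kib, guest_mib_sizes):
    """Host RAM left over once every guest's -m size is subtracted."""
    guests_kib = sum(size * KIB_PER_MIB for size in guest_mib_sizes)
    return host_total_kib - guests_kib


# Host swap line: total 5174220, free 2772668.
swap_free_pct = percent_free(5174220, 2772668)

# Host Mem total 16461588 KiB; both guests are started with -m 7168.
headroom_kib = host_headroom_kib(16461588, [7168, 7168])

print("swap free: %.1f%%" % swap_free_pct)
print("RAM not assigned to guests: %d KiB (%.2f GiB)"
      % (headroom_kib, headroom_kib / 1024.0 ** 2))
```

The first figure matches the "54% is free" statement; the second shows that the configured guest memory alone leaves somewhat less than the intended 2 GB margin on the host.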
KVM call minutes for July 20
0.12.stable
- start w/ git tree + pull requests
- release process is separate from commit access
- justin will put up a tree for pull requests
- there's current backlog, what about that?
- anthony's concern with -stable is the testing (upstream tree gets more
  testing than -stable)
- 0.12.5?
- planning to do next w/ 0.13 release
- aurelien may cut a release
- justin will do some sanity testing, most patches are in fedora anyway

0.13
- rc RSN (hopefully this week, top priority for anthony)

kvm testsuite
- was planning to clean up and contribute to qemu
- now thinking perhaps just split it out to its own repo
- not really qemu code, not really kvm code, not cross compile, etc..
- could use std serial device
- could use vga (needs mmio space)
- would like to add nested svm and (more important) nested vmx
- small bit to copy l1 to l2 state, to make guest nested
- need framework, can then require nested patches come w/ regression tests
- current testsuite failing on qemu (shows softmmu issues, any takers?)

fw_cfg issues
- mostly on list
- concerns about dma interface (too close to use case specific hack)
- rep could be optimized in general
- each byte == function call
- possible pull in 4k (instead of 1k) on each exit
- bar for changes should be no new interfaces
Re: does sidt get correct start address of IDT in guest?
On 07/20/2010 05:04 PM, 吴忠远 wrote:
> In the guest OS, a module executes the sidt instruction to get the start
> address of the IDT. Does this return the correct address of the IDT in the
> guest OS? Thanks.

Yes.

-- 
error compiling committee.c: too many arguments to function
does sidt get correct start address of IDT in guest?
In the guest OS, a module executes the sidt instruction to get the start address of the IDT. Does this return the correct address of the IDT in the guest OS? Thanks.
Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm
On Tue, Jul 20, 2010 at 01:19:39PM +0300, Michael Goldish wrote: > Michael, Thanks for your comments. Let's simplify this method together. > On 07/20/2010 04:34 AM, Amos Kong wrote: > > Old method uses the mac address in the configuration files which could > > lead serious problem when multiple tests running in different hosts. > > > > This patch adds a new macaddress pool algorithm, it generates the mac prefix > > based on mac address of the host which could eliminate the duplicated mac > > addresses between machines. > > > > When user have set the mac_prefix in the configuration file, we should use > > it > > in stead of the dynamic generated mac prefix. > > > > Other change: > > . Fix randomly generating mac address so that it correspond to IEEE802. > > . Update clone function to decide clone mac address or not. > > . Update get_macaddr function. > > . Add set_mac_address function. > > > > New auto mac address pool algorithm: > > If address_index is defined, VM will get mac from config file then record > > mac > > in to address_pool. If address_index is not defined, VM will call > > get_mac_from_pool to auto create mac then recored mac to address_pool in > > following format: > > {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}} > > > > AE:9D:94:6A:9b:f9: mac address > > 20100310-165222-Wt7l : instance attribute of VM > > 0: index of NIC > > Why do you use the mac address as a key, instead of the instance string > + nic index? When the mac address is used as a key, each key has a list > of values instead of just one value. This order seems unnatural. If it > were the other way around (i.e. key = VM instance + nic index, value = > mac address), then each key would have exactly one value, and I think > this patch would be shorter and simpler. One mac address may be used by two VMs, eg. migration. 
> > Signed-off-by: Jason Wang > > Signed-off-by: Feng Yang > > Signed-off-by: Amos Kong > > --- > > 0 files changed, 0 insertions(+), 0 deletions(-) > > > > diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py > > index fb2d1c2..7c0946e 100644 > > --- a/client/tests/kvm/kvm_utils.py > > +++ b/client/tests/kvm/kvm_utils.py > > @@ -5,6 +5,7 @@ KVM test utility functions. > > """ > > > > import time, string, random, socket, os, signal, re, logging, commands, > > cPickle > > +import fcntl, shelve > > from autotest_lib.client.bin import utils > > from autotest_lib.client.common_lib import error, logging_config > > import kvm_subprocess > > @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword): > > > > # Functions related to MAC/IP addresses > > > > +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'): > > The name of this function is confusing because it does the exact > opposite: it puts a mac address in address_pool. Maybe the pool you're > referring to in the name isn't address_pool, but still a less confusing > name should probably be used. How about allocate_mac(...) ? address_pool -> address_container Allocate mac address and record into address_container. > > +""" > > +random generated mac address. > > + > > +1) First try to generate macaddress based on the mac address prefix. > > +2) And then try to use total random generated mac address. > > + > > +@param root_dir: Root dir for kvm > > +@param vm: Here we use instance of vm > > +@param nic_index: The index of nic. > > +@param prefix: Prefix of mac address. > > +@Return: Return mac address. 
> > +""" > > + > > +lock_filename = os.path.join(root_dir, "mac_lock") > > +lock_file = open(lock_filename, 'w') > > +fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX) > > +mac_filename = os.path.join(root_dir, "address_pool") > > Maybe it makes sense to put address_pool and the lock file in /tmp, > where they can be shared by more than a single autotest instance running > on the same host (unlikely, but theoretically possible). good idea. > > +mac_shelve = shelve.open(mac_filename, writeback=False) > > + > > +mac_pool = mac_shelve.get("macpool") > > Why is this 'macpool' needed? Why not put the keys directly in the > shelve object? yes, put keys directly in the shelve object is better. > > +if not mac_pool: > > +mac_pool = {} > > +found = False > > + > > +val = "%s:%s" % (vm, nic_index) > > +for key in mac_pool.keys(): > > +if val in mac_pool[key]: > > +mac_pool[key].append(val) > > Why append val to mac_pool[key] if val is already in mac_pool[key]? need drop it. > > +found = True > > +mac = key > > + > > +while not found: > > +postfix = "%02x:%02x" % (random.randint(0x00,0xfe), > > +random.randint(0x00,0xfe)) > > +mac = prefix + postfix > > +mac_list = mac.split(":") > > +# Clear multicast bit > > +mac_list[0] = int(mac_list[0],16)
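The quoted hunk is cut off right where it clears the multicast bit, so the remainder of the patch is not visible here. As an illustration only, the step it describes (append two random octets to a prefix, then force the first octet to be unicast) might be sketched as follows. The function name `random_mac`, the `rng` parameter and the `& 0xfe` mask are assumptions of this sketch, not taken from the patch; the mask relies on the IEEE 802 convention that the least significant bit of the first octet marks a group (multicast) address.

```python
import random


def random_mac(prefix="00:11:22:33:", rng=random):
    """Return prefix + two random octets, with the multicast bit cleared."""
    postfix = "%02x:%02x" % (rng.randint(0x00, 0xfe),
                             rng.randint(0x00, 0xfe))
    octets = (prefix + postfix).split(":")
    # Bit 0 of the first octet is the individual/group bit; 0 means unicast.
    octets[0] = "%02x" % (int(octets[0], 16) & 0xfe)
    return ":".join(octets)
```

A caller would still need to check the result against the addresses already recorded in the pool and retry on a collision, as the `while not found` loop in the patch suggests.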
[PATCH v2 3/3] KVM: Non-atomic interrupt injection
Change the interrupt injection code to work from preemptible, interrupts enabled context. This works by adding a ->cancel_injection() operation that undoes an injection in case we were not able to actually enter the guest (this condition could never happen with atomic injection). Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/svm.c | 12 arch/x86/kvm/vmx.c | 11 +++ arch/x86/kvm/x86.c | 27 ++- 4 files changed, 38 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 502e53f..5dd797c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -505,6 +505,7 @@ struct kvm_x86_ops { void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr, bool has_error_code, u32 error_code, bool reinject); + void (*cancel_injection)(struct kvm_vcpu *vcpu); int (*interrupt_allowed)(struct kvm_vcpu *vcpu); int (*nmi_allowed)(struct kvm_vcpu *vcpu); bool (*get_nmi_mask)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 56c9b6b..46d068e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3135,6 +3135,17 @@ static void svm_complete_interrupts(struct vcpu_svm *svm) } } +static void svm_cancel_injection(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *svm = to_svm(vcpu); + struct vmcb_control_area *control = &svm->vmcb->control; + + control->exit_int_info = control->event_inj; + control->exit_int_info_err = control->event_inj_err; + control->event_inj = 0; + svm_complete_interrupts(svm); +} + #ifdef CONFIG_X86_64 #define R "r" #else @@ -3493,6 +3504,7 @@ static struct kvm_x86_ops svm_x86_ops = { .set_irq = svm_set_irq, .set_nmi = svm_inject_nmi, .queue_exception = svm_queue_exception, + .cancel_injection = svm_cancel_injection, .interrupt_allowed = svm_interrupt_allowed, .nmi_allowed = svm_nmi_allowed, .get_nmi_mask = svm_get_nmi_mask, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 53b6fc0..72381b7 100644 --- 
a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3906,6 +3906,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) IDT_VECTORING_ERROR_CODE); } +static void vmx_cancel_injection(struct kvm_vcpu *vcpu) +{ + __vmx_complete_interrupts(to_vmx(vcpu), + vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), + VM_ENTRY_INSTRUCTION_LEN, + VM_ENTRY_EXCEPTION_ERROR_CODE); + + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); +} + /* * Failure to inject an interrupt should give us the information * in IDT_VECTORING_INFO_FIELD. However, if the failure occurs @@ -4360,6 +4370,7 @@ static struct kvm_x86_ops vmx_x86_ops = { .set_irq = vmx_inject_irq, .set_nmi = vmx_inject_nmi, .queue_exception = vmx_queue_exception, + .cancel_injection = vmx_cancel_injection, .interrupt_allowed = vmx_interrupt_allowed, .nmi_allowed = vmx_nmi_allowed, .get_nmi_mask = vmx_get_nmi_mask, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 84bfb51..1040d3f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(r)) goto out; + inject_pending_event(vcpu); + + /* enable NMI/IRQ window open exits if needed */ + if (vcpu->arch.nmi_pending) + kvm_x86_ops->enable_nmi_window(vcpu); + else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) + kvm_x86_ops->enable_irq_window(vcpu); + + if (kvm_lapic_enabled(vcpu)) { + update_cr8_intercept(vcpu); + kvm_lapic_sync_to_vapic(vcpu); + } + preempt_disable(); kvm_x86_ops->prepare_guest_switch(vcpu); @@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) smp_wmb(); local_irq_enable(); preempt_enable(); + kvm_x86_ops->cancel_injection(vcpu); r = 1; goto out; } - inject_pending_event(vcpu); - - /* enable NMI/IRQ window open exits if needed */ - if (vcpu->arch.nmi_pending) - kvm_x86_ops->enable_nmi_window(vcpu); - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) - kvm_x86_ops->enable_irq_window(vcpu); - - if (kvm_lapic_enabled(vcpu)) { - update_cr8_intercept(vcpu); - 
kvm_lapic_sync_to_vapic(vcpu); - } - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); kvm_guest_enter(); -- 1.7.1
[PATCH v2 1/3] KVM: VMX: Split up vmx_complete_interrupts()
vmx_complete_interrupts() does too much, split it up: - vmx_vcpu_run() gets the "cache important vmcs fields" part - a new vmx_complete_atomic_exit() gets the parts that must be done atomically - a new vmx_recover_nmi_blocking() does what its name says - vmx_complete_interrupts() retains the event injection recovery code This helps in reducing the work done in atomic context. Signed-off-by: Avi Kivity --- arch/x86/kvm/vmx.c | 39 +++ 1 files changed, 27 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 2fdcc98..1a35964 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -125,6 +125,7 @@ struct vcpu_vmx { unsigned long host_rsp; int launched; u8fail; + u32 exit_intr_info; u32 idt_vectoring_info; struct shared_msr_entry *guest_msrs; int nmsrs; @@ -3792,18 +3793,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) vmcs_write32(TPR_THRESHOLD, irr); } -static void vmx_complete_interrupts(struct vcpu_vmx *vmx) +static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) { - u32 exit_intr_info; - u32 idt_vectoring_info = vmx->idt_vectoring_info; - bool unblock_nmi; - u8 vector; - int type; - bool idtv_info_valid; - - exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); - - vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); + u32 exit_intr_info = vmx->exit_intr_info; /* Handle machine checks before interrupts are enabled */ if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY) @@ -3818,8 +3810,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) asm("int $2"); kvm_after_handle_nmi(&vmx->vcpu); } +} - idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK; +static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx) +{ + u32 exit_intr_info = vmx->exit_intr_info; + bool unblock_nmi; + u8 vector; + bool idtv_info_valid; + + idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK; if (cpu_has_virtual_nmis()) { unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0; @@ 
-3841,6 +3841,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) } else if (unlikely(vmx->soft_vnmi_blocked)) vmx->vnmi_blocked_time += ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time)); +} + +static void vmx_complete_interrupts(struct vcpu_vmx *vmx) +{ + u32 idt_vectoring_info = vmx->idt_vectoring_info; + u8 vector; + int type; + bool idtv_info_valid; + + idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK; vmx->vcpu.arch.nmi_injected = false; kvm_clear_exception_queue(&vmx->vcpu); @@ -4051,6 +4061,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu) asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS)); vmx->launched = 1; + vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); + vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); + + vmx_complete_atomic_exit(vmx); + vmx_recover_nmi_blocking(vmx); vmx_complete_interrupts(vmx); } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/3] KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry
Currently vmx_complete_interrupts() can decode event information from vmx exit fields into the generic kvm event queues. Make it able to decode the information from the entry fields as well by parametrizing it. Signed-off-by: Avi Kivity --- arch/x86/kvm/vmx.c | 19 ++- 1 files changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1a35964..53b6fc0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3843,9 +3843,11 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx) ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time)); } -static void vmx_complete_interrupts(struct vcpu_vmx *vmx) +static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, + u32 idt_vectoring_info, + int instr_len_field, + int error_code_field) { - u32 idt_vectoring_info = vmx->idt_vectoring_info; u8 vector; int type; bool idtv_info_valid; @@ -3875,18 +3877,18 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) break; case INTR_TYPE_SOFT_EXCEPTION: vmx->vcpu.arch.event_exit_inst_len = - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); + vmcs_read32(instr_len_field); /* fall through */ case INTR_TYPE_HARD_EXCEPTION: if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) { - u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE); + u32 err = vmcs_read32(error_code_field); kvm_queue_exception_e(&vmx->vcpu, vector, err); } else kvm_queue_exception(&vmx->vcpu, vector); break; case INTR_TYPE_SOFT_INTR: vmx->vcpu.arch.event_exit_inst_len = - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); + vmcs_read32(instr_len_field); /* fall through */ case INTR_TYPE_EXT_INTR: kvm_queue_interrupt(&vmx->vcpu, vector, @@ -3897,6 +3899,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) } } +static void vmx_complete_interrupts(struct vcpu_vmx *vmx) +{ + __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, + VM_EXIT_INSTRUCTION_LEN, + IDT_VECTORING_ERROR_CODE); +} + /* * Failure to inject an interrupt should give us the information * in IDT_VECTORING_INFO_FIELD. 
However, if the failure occurs -- 1.7.1
[PATCH v2 0/3] Nonatomic interrupt injection
This patchset changes interrupt injection to be done from normal process context instead of interrupts disabled context. This is useful for real mode interrupt injection on Intel without the current hacks (injecting as a software interrupt of a vm86 task), reducing latencies, and later, for allowing nested virtualization code to use kvm_read_guest()/kvm_write_guest() instead of kmap() to access the guest vmcb/vmcs.

Seems to survive a hack that cancels every 16th entry, after injection has already taken place.

v2: svm support (easier than expected)
    fix silly vmx warning

Avi Kivity (3):
  KVM: VMX: Split up vmx_complete_interrupts()
  KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry
  KVM: Non-atomic interrupt injection

 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/svm.c              |   12 +++
 arch/x86/kvm/vmx.c              |   65 ++-
 arch/x86/kvm/x86.c              |   27
 4 files changed, 77 insertions(+), 28 deletions(-)
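The inject-then-maybe-cancel flow described in this cover letter can be pictured with a toy model. The sketch below is plain Python, not kernel code: `ToyVcpu`, its fields and the `abort_entry` flag are invented for illustration and only mimic the ordering (inject while still preemptible, then undo the injection if the guest entry has to be abandoned).

```python
from collections import deque


class ToyVcpu:
    """Toy stand-in for a vcpu; not the kernel's struct kvm_vcpu."""

    def __init__(self):
        self.pending = deque()   # events waiting in the generic queue
        self.entry_field = None  # event programmed for the next guest entry
        self.delivered = []      # events the "guest" actually received

    def inject_pending_event(self):
        # Done early, in preemptible context: program one event for entry.
        if self.entry_field is None and self.pending:
            self.entry_field = self.pending.popleft()

    def cancel_injection(self):
        # Undo the injection: requeue the event at the head of the queue.
        if self.entry_field is not None:
            self.pending.appendleft(self.entry_field)
            self.entry_field = None

    def enter_guest(self, abort_entry):
        self.inject_pending_event()
        if abort_entry:  # e.g. a request or signal showed up before entry
            self.cancel_injection()
            return False
        if self.entry_field is not None:
            self.delivered.append(self.entry_field)
            self.entry_field = None
        return True
```

In this model an aborted entry leaves the queue exactly as it was before the attempt, which is the property the "cancel every 16th entry" stress hack mentioned above is exercising.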
Re: [Qemu-devel] Re: KVM call agenda for July 20
On Tue, 20 Jul 2010 09:07:11 +0300 Avi Kivity wrote:
> On 07/20/2010 12:46 AM, Chris Wright wrote:
> > Please send in any agenda items you are interested in covering.
>
> Last week's agenda, minus the item that we started to discuss.

(includes 0.13)
Re: Allow a user to stop and start one guest VM
On Tue, Jul 20, 2010 at 08:01:15AM -0500, Neil Aggarwal wrote:
> Hello:
>
> One of my customers asked for access to stop and start
> their guest VM.
>
> Right now, I can do that using virsh, but I do not want
> to give this customer the ability to stop and start
> all VMs running on the host.
>
> Is there a way to give stop and start control of one
> VM to someone?

Fine-grained role-based access control is not available at the libvirt/virsh level. It is currently something that must be provided by the management layer above libvirt. We intend to add this capability directly into libvirt in the future, but there's no firm ETA. So in the immediate term you'd need to write a small tool using libvirt APIs to delegate stop/start operations to the users you choose.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-       http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
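A "small tool using libvirt APIs" along these lines might be sketched as below, using the libvirt Python bindings. Everything policy-related here is an assumption of the sketch: the `ALLOWED_GUESTS` table, the user and guest names, the `control_guest` and `main` functions, and the idea of running the tool through sudo and reading `SUDO_USER`. It is written for Python 3, which a CentOS 5.5 host of that era would not ship by default. Only `libvirt.open()`, `lookupByName()`, `create()`, `shutdown()` and `close()` are libvirt calls.

```python
# Hypothetical policy table (not part of libvirt): user -> guests they control.
ALLOWED_GUESTS = {
    "customer1": {"customer1-vm"},
}


def is_allowed(user, guest, policy=ALLOWED_GUESTS):
    """Return True if `user` may start/stop `guest` under `policy`."""
    return guest in policy.get(user, set())


def control_guest(conn, user, guest, action, policy=ALLOWED_GUESTS):
    """Start or stop one guest on behalf of `user`, after a policy check.

    `conn` is expected to behave like a libvirt connection object
    (anything with a lookupByName() method works for testing).
    """
    if action not in ("start", "stop"):
        raise ValueError("unsupported action: %r" % (action,))
    if not is_allowed(user, guest, policy):
        raise PermissionError("%s may not %s %s" % (user, action, guest))
    dom = conn.lookupByName(guest)
    if action == "start":
        dom.create()    # boot a defined but inactive domain
    else:
        dom.shutdown()  # ask the guest OS to shut down cleanly


def main(argv, environ):
    """Assumed entry point, invoked via sudo as: tool start|stop GUEST."""
    import libvirt  # imported here so the policy logic is testable without it
    user = environ.get("SUDO_USER", "")
    conn = libvirt.open("qemu:///system")
    try:
        control_guest(conn, user, argv[2], argv[1])
    finally:
        conn.close()
```

The wrapper would be wired up with something like `main(sys.argv, os.environ)` and exposed to customers through a sudo rule that permits only this script, so they never get a general virsh connection.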
Allow a user to stop and start one guest VM
Hello:

One of my customers asked for access to stop and start their guest VM.

Right now, I can do that using virsh, but I do not want to give this customer the ability to stop and start all VMs running on the host.

Is there a way to give stop and start control of one VM to someone?

I am using KVM on a CentOS 5.5 host.

Thanks,
Neil

-- 
Neil Aggarwal, (281)846-8957
FREE trial: Virtualmin VPS with unmetered bandwidth
http://UnmeteredVPS.net/virtualmin
[PATCH 2/3] KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry
Currently vmx_complete_interrupts() can decode event information from vmx exit fields into the generic kvm event queues. Make it able to decode the information from the entry fields as well by parametrizing it. Signed-off-by: Avi Kivity --- arch/x86/kvm/vmx.c | 19 ++- 1 files changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1a35964..53b6fc0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3843,9 +3843,11 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx) ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time)); } -static void vmx_complete_interrupts(struct vcpu_vmx *vmx) +static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, + u32 idt_vectoring_info, + int instr_len_field, + int error_code_field) { - u32 idt_vectoring_info = vmx->idt_vectoring_info; u8 vector; int type; bool idtv_info_valid; @@ -3875,18 +3877,18 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) break; case INTR_TYPE_SOFT_EXCEPTION: vmx->vcpu.arch.event_exit_inst_len = - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); + vmcs_read32(instr_len_field); /* fall through */ case INTR_TYPE_HARD_EXCEPTION: if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) { - u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE); + u32 err = vmcs_read32(error_code_field); kvm_queue_exception_e(&vmx->vcpu, vector, err); } else kvm_queue_exception(&vmx->vcpu, vector); break; case INTR_TYPE_SOFT_INTR: vmx->vcpu.arch.event_exit_inst_len = - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); + vmcs_read32(instr_len_field); /* fall through */ case INTR_TYPE_EXT_INTR: kvm_queue_interrupt(&vmx->vcpu, vector, @@ -3897,6 +3899,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) } } +static void vmx_complete_interrupts(struct vcpu_vmx *vmx) +{ + __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, + VM_EXIT_INSTRUCTION_LEN, + IDT_VECTORING_ERROR_CODE); +} + /* * Failure to inject an interrupt should give us the information * in IDT_VECTORING_INFO_FIELD. 
However, if the failure occurs -- 1.7.1
[PATCH 1/3] KVM: VMX: Split up vmx_complete_interrupts()
vmx_complete_interrupts() does too much, split it up:

 - vmx_vcpu_run() gets the "cache important vmcs fields" part
 - a new vmx_complete_atomic_exit() gets the parts that must be done atomically
 - a new vmx_recover_nmi_blocking() does what its name says
 - vmx_complete_interrupts() retains the event injection recovery code

This helps in reducing the work done in atomic context.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/vmx.c |   39 +++
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fdcc98..1a35964 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,6 +125,7 @@ struct vcpu_vmx {
 	unsigned long host_rsp;
 	int launched;
 	u8 fail;
+	u32 exit_intr_info;
 	u32 idt_vectoring_info;
 	struct shared_msr_entry *guest_msrs;
 	int nmsrs;
@@ -3792,18 +3793,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 	vmcs_write32(TPR_THRESHOLD, irr);
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
-	u32 exit_intr_info;
-	u32 idt_vectoring_info = vmx->idt_vectoring_info;
-	bool unblock_nmi;
-	u8 vector;
-	int type;
-	bool idtv_info_valid;
-
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	u32 exit_intr_info = vmx->exit_intr_info;
 
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
@@ -3818,8 +3810,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 		asm("int $2");
 		kvm_after_handle_nmi(&vmx->vcpu);
 	}
+}
 
-	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
+{
+	u32 exit_intr_info = vmx->exit_intr_info;
+	bool unblock_nmi;
+	u8 vector;
+	bool idtv_info_valid;
+
+	idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
 	if (cpu_has_virtual_nmis()) {
 		unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
@@ -3841,6 +3841,16 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	} else if (unlikely(vmx->soft_vnmi_blocked))
 		vmx->vnmi_blocked_time +=
 			ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
+}
+
+static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+{
+	u32 idt_vectoring_info = vmx->idt_vectoring_info;
+	u8 vector;
+	int type;
+	bool idtv_info_valid;
+
+	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
 	vmx->vcpu.arch.nmi_injected = false;
 	kvm_clear_exception_queue(&vmx->vcpu);
@@ -4051,6 +4061,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 	vmx->launched = 1;
 
+	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
+	vmx_complete_atomic_exit(vmx);
+	vmx_recover_nmi_blocking(vmx);
 	vmx_complete_interrupts(vmx);
 }
 
-- 
1.7.1
[PATCH 3/3] KVM: Non-atomic interrupt injection
Change the interrupt injection code to work from preemptible, interrupts
enabled context.  This works by adding a ->cancel_injection() operation
that undoes an injection in case we were not able to actually enter the
guest (this condition could never happen with atomic injection).

Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/vmx.c              |   10 ++
 arch/x86/kvm/x86.c              |   27 ++-
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..5dd797c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_x86_ops {
 	void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
 				bool has_error_code, u32 error_code,
 				bool reinject);
+	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
 	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 53b6fc0..a039af2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3906,6 +3906,15 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 				  IDT_VECTORING_ERROR_CODE);
 }
 
+static void vmx_cancel_injection(struct vcpu_vmx *vmx)
+{
+	__vmx_complete_interrupts(vmx, vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
+				  VM_ENTRY_INSTRUCTION_LEN,
+				  VM_ENTRY_EXCEPTION_ERROR_CODE);
+
+	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+}
+
 /*
  * Failure to inject an interrupt should give us the information
  * in IDT_VECTORING_INFO_FIELD.  However, if the failure occurs
@@ -4360,6 +4369,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.set_irq = vmx_inject_irq,
 	.set_nmi = vmx_inject_nmi,
 	.queue_exception = vmx_queue_exception,
+	.cancel_injection = vmx_cancel_injection,
 	.interrupt_allowed = vmx_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
 	.get_nmi_mask = vmx_get_nmi_mask,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84bfb51..1040d3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4709,6 +4709,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(r))
 		goto out;
 
+	inject_pending_event(vcpu);
+
+	/* enable NMI/IRQ window open exits if needed */
+	if (vcpu->arch.nmi_pending)
+		kvm_x86_ops->enable_nmi_window(vcpu);
+	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+		kvm_x86_ops->enable_irq_window(vcpu);
+
+	if (kvm_lapic_enabled(vcpu)) {
+		update_cr8_intercept(vcpu);
+		kvm_lapic_sync_to_vapic(vcpu);
+	}
+
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -4727,23 +4740,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		smp_wmb();
 		local_irq_enable();
 		preempt_enable();
+		kvm_x86_ops->cancel_injection(vcpu);
 		r = 1;
 		goto out;
 	}
 
-	inject_pending_event(vcpu);
-
-	/* enable NMI/IRQ window open exits if needed */
-	if (vcpu->arch.nmi_pending)
-		kvm_x86_ops->enable_nmi_window(vcpu);
-	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-		kvm_x86_ops->enable_irq_window(vcpu);
-
-	if (kvm_lapic_enabled(vcpu)) {
-		update_cr8_intercept(vcpu);
-		kvm_lapic_sync_to_vapic(vcpu);
-	}
-
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
 	kvm_guest_enter();
-- 
1.7.1
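For readers who want to see the inject-then-cancel ordering in isolation,
here is a toy user-space model in Python. It is only a sketch and assumes
nothing about the real KVM data structures: ToyVcpu, pending, entry_field
and run() are hypothetical names, and entry_field merely stands in for
VM_ENTRY_INTR_INFO_FIELD. The abort_every parameter imitates the kind of
test hack described for this series, where an entry is cancelled after
injection has already taken place.

```python
class ToyVcpu:
    """Hypothetical stand-in for a vcpu; not a KVM structure."""

    def __init__(self, events):
        self.pending = list(events)   # events waiting to be injected
        self.entry_field = None       # stands in for VM_ENTRY_INTR_INFO_FIELD

    def inject_pending_event(self):
        # Done early, while still preemptible: stage one event for entry.
        if self.entry_field is None and self.pending:
            self.entry_field = self.pending.pop(0)

    def cancel_injection(self):
        # Undo the injection: put the staged event back at the queue head.
        if self.entry_field is not None:
            self.pending.insert(0, self.entry_field)
            self.entry_field = None

    def enter_guest(self, abort):
        """Return the delivered event, or None if the entry was abandoned."""
        self.inject_pending_event()
        if abort:                     # entry abandoned after injection
            self.cancel_injection()
            return None
        delivered, self.entry_field = self.entry_field, None
        return delivered


def run(events, abort_every=16):
    """Deliver all events, abandoning every `abort_every`-th entry attempt."""
    if abort_every < 2:
        raise ValueError("abort_every must be at least 2")
    vcpu, delivered, attempt = ToyVcpu(events), [], 0
    while vcpu.pending:
        attempt += 1
        event = vcpu.enter_guest(abort=(attempt % abort_every == 0))
        if event is not None:
            delivered.append(event)
    return delivered
```

In this model no event is lost or reordered when an entry is abandoned,
which is the property the cancel operation is meant to preserve.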
[PATCH 0/3] Nonatomic interrupt injection
This patchset changes interrupt injection to be done from normal process
context instead of interrupts disabled context.  This is useful for real
mode interrupt injection on Intel without the current hacks (injecting as
a software interrupt of a vm86 task), reducing latencies, and later, for
allowing nested virtualization code to use kvm_read_guest()/kvm_write_guest()
instead of kmap() to access the guest vmcb/vmcs.

Seems to survive a hack that cancels every 16th entry, after injection has
already taken place.

TODO: svm support, more complicated due to debug and nsvm handling

Avi Kivity (3):
  KVM: VMX: Split up vmx_complete_interrupts()
  KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry
  KVM: Non-atomic interrupt injection

 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/vmx.c              |   64 +-
 arch/x86/kvm/x86.c              |   27
 3 files changed, 64 insertions(+), 28 deletions(-)
Re: [Autotest][RFC PATCH 00/14] Patchset of network related subtests
On Tue, 2010-07-20 at 09:34 +0800, Amos Kong wrote:
> The following series contain 11 network related subtests, welcome to give me
> some suggestions about correctness, design, enhancement.

Awesome work, will start to review them today. Thanks!

> Thank you so much!
>
> ---
>
> Amos Kong (14):
>   KVM-test: Add a new macaddress pool algorithm
>   KVM Test: Add a function get_interface_name() to kvm_net_utils.py
>   KVM Test: Add a common ping module for network related tests
>   KVM-test: Add a new subtest ping
>   KVM-test: Add a subtest jumbo
>   KVM-test: Add basic file transfer test
>   KVM-test: Add a subtest of load/unload nic driver
>   KVM-test: Add a subtest of nic promisc
>   KVM-test: Add a subtest of multicast
>   KVM-test: Add a subtest of pxe
>   KVM-test: Add a subtest of changing mac address
>   KVM-test: Add a subtest of netperf
>   KVM-test: Improve vlan subtest
>   KVM-test: Add subtest of testing offload by ethtool
>
>
>  0 files changed, 0 insertions(+), 0 deletions(-)
>
Re: [Qemu-devel] [RFC PATCH 01/14] KVM-test: Add a new macaddress pool algorithm
On 07/20/2010 04:34 AM, Amos Kong wrote:
> Old method uses the mac address in the configuration files which could
> lead serious problem when multiple tests running in different hosts.
>
> This patch adds a new macaddress pool algorithm, it generates the mac prefix
> based on mac address of the host which could eliminate the duplicated mac
> addresses between machines.
>
> When user have set the mac_prefix in the configuration file, we should use it
> in stead of the dynamic generated mac prefix.
>
> Other change:
> . Fix randomly generating mac address so that it correspond to IEEE802.
> . Update clone function to decide clone mac address or not.
> . Update get_macaddr function.
> . Add set_mac_address function.
>
> New auto mac address pool algorithm:
> If address_index is defined, VM will get mac from config file then record mac
> in to address_pool. If address_index is not defined, VM will call
> get_mac_from_pool to auto create mac then recored mac to address_pool in
> following format:
> {'macpool': {'AE:9D:94:6A:9b:f9': ['20100310-165222-Wt7l:0']}}
>
> AE:9D:94:6A:9b:f9: mac address
> 20100310-165222-Wt7l : instance attribute of VM
> 0: index of NIC

Why do you use the mac address as a key, instead of the instance string
+ nic index? When the mac address is used as a key, each key has a list
of values instead of just one value. This order seems unnatural. If it
were the other way around (i.e. key = VM instance + nic index, value =
mac address), then each key would have exactly one value, and I think
this patch would be shorter and simpler.

> Signed-off-by: Jason Wang
> Signed-off-by: Feng Yang
> Signed-off-by: Amos Kong
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
>
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index fb2d1c2..7c0946e 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -5,6 +5,7 @@ KVM test utility functions.
>  """
>
>  import time, string, random, socket, os, signal, re, logging, commands, cPickle
> +import fcntl, shelve
>  from autotest_lib.client.bin import utils
>  from autotest_lib.client.common_lib import error, logging_config
>  import kvm_subprocess
> @@ -82,6 +83,104 @@ def get_sub_dict_names(dict, keyword):
>
>  # Functions related to MAC/IP addresses
>
> +def get_mac_from_pool(root_dir, vm, nic_index, prefix='00:11:22:33:'):

The name of this function is confusing because it does the exact
opposite: it puts a mac address in address_pool. Maybe the pool you're
referring to in the name isn't address_pool, but still a less confusing
name should probably be used.

> +    """
> +    random generated mac address.
> +
> +    1) First try to generate macaddress based on the mac address prefix.
> +    2) And then try to use total random generated mac address.
> +
> +    @param root_dir: Root dir for kvm
> +    @param vm: Here we use instance of vm
> +    @param nic_index: The index of nic.
> +    @param prefix: Prefix of mac address.
> +    @Return: Return mac address.
> +    """
> +
> +    lock_filename = os.path.join(root_dir, "mac_lock")
> +    lock_file = open(lock_filename, 'w')
> +    fcntl.lockf(lock_file.fileno() ,fcntl.LOCK_EX)
> +    mac_filename = os.path.join(root_dir, "address_pool")

Maybe it makes sense to put address_pool and the lock file in /tmp,
where they can be shared by more than a single autotest instance running
on the same host (unlikely, but theoretically possible).

> +    mac_shelve = shelve.open(mac_filename, writeback=False)
> +
> +    mac_pool = mac_shelve.get("macpool")

Why is this 'macpool' needed? Why not put the keys directly in the
shelve object?

> +
> +    if not mac_pool:
> +        mac_pool = {}
> +    found = False
> +
> +    val = "%s:%s" % (vm, nic_index)
> +    for key in mac_pool.keys():
> +        if val in mac_pool[key]:
> +            mac_pool[key].append(val)

Why append val to mac_pool[key] if val is already in mac_pool[key]?

> +            found = True
> +            mac = key
> +
> +    while not found:
> +        postfix = "%02x:%02x" % (random.randint(0x00,0xfe),
> +                                 random.randint(0x00,0xfe))
> +        mac = prefix + postfix
> +        mac_list = mac.split(":")
> +        # Clear multicast bit
> +        mac_list[0] = int(mac_list[0],16) & 0xfe
> +        # Set local assignment bit (IEEE802)
> +        mac_list[0] = mac_list[0] | 0x02
> +        mac_list[0] = "%02x" % mac_list[0]

Why is this needed? Most mac addresses begin with 00. If the mac address
is generated from the address of eth0 (using the method in this patch)
it begins with 00, which is fine. If the prefix is set by the user using
mac_prefix, I don't think we should modify it.

> +        mac = ":".join(mac_list)
> +        if mac not in mac_pool.keys() or 0 == len(mac_pool[mac]):
> +            mac_pool[mac] = ["%s:%s" % (vm,nic_index)]
> +            found = True
> +    mac_shelve["macpool"] =
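The review suggestion above (key = VM instance + nic index, value = mac
address) could look roughly like the sketch below. It is not the patch's
code: allocate_mac, make_local_unicast, pool, max_tries and rng are
hypothetical names, the pool is a plain in-memory dict, and the file
locking and shelve persistence from the quoted patch are left out on
purpose. The separate helper only shows what the quoted bit operations
do to the first octet, so the effect on a 00 prefix is easy to see.

```python
import random


def make_local_unicast(first_octet):
    """Clear the multicast bit (0x01) and set the locally administered bit (0x02)."""
    return (first_octet & 0xfe) | 0x02


def allocate_mac(pool, vm_instance, nic_index, prefix="00:11:22:33:",
                 max_tries=1000, rng=random):
    """Return the MAC for (vm_instance, nic_index), allocating one if needed.

    `pool` maps "instance:nic_index" -> MAC string, so every key has exactly
    one value. The prefix is used unmodified (hypothetical design choice).
    """
    key = "%s:%s" % (vm_instance, nic_index)
    if key in pool:
        return pool[key]              # already allocated, nothing to append
    in_use = set(pool.values())
    for _ in range(max_tries):
        mac = prefix + "%02x:%02x" % (rng.randint(0x00, 0xfe),
                                      rng.randint(0x00, 0xfe))
        if mac not in in_use:
            pool[key] = mac
            return mac
    raise RuntimeError("no free MAC address found for prefix %s" % prefix)
```

With this layout a lookup is a single dict access, and a first octet of
0x00 would become 0x02 only if make_local_unicast() were applied to it.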
Re: [PATCH 04/18] Make cpu_tsc_khz updates use local CPU
On 07/19/2010 11:06 PM, Zachary Amsden wrote:

+static void tsc_khz_changed(void *data)
 {
-	/* nothing */
+	struct cpufreq_freqs *freq = data;
+	unsigned long khz = 0;
+
+	if (data)
+		khz = freq->new;
+	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+		khz = cpufreq_quick_get(raw_smp_processor_id());
+	if (!khz)
+		khz = tsc_khz;
+	__get_cpu_var(cpu_tsc_khz) = khz;
 }

Do we really need to cache cpufreq_quick_get()? If it's really quick, why
not just use it everywhere instead of caching it? Not a comment on this
patch.

If cpufreq is compiled in, but disabled, it returns zero, so we need some
sort of logic.

Maybe it's better to put it into cpufreq_quick_get(). Inconsistent APIs
that appear to work are bad.

-- 
error compiling committee.c: too many arguments to function
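The fallback order in the quoted tsc_khz_changed() can be restated as a
small Python model. This is only an illustration of the selection logic,
not kernel code: pick_tsc_khz and its parameters are hypothetical names,
and quick_get stands in for cpufreq_quick_get(), which per the message
above returns zero when cpufreq is compiled in but disabled.

```python
def pick_tsc_khz(notified_khz, constant_tsc, quick_get, boot_tsc_khz):
    """Model of the fallback chain; all names are hypothetical.

    notified_khz: new frequency from a cpufreq notification, or None
    constant_tsc: True if the CPU has a constant-rate TSC
    quick_get:    callable standing in for cpufreq_quick_get(); may return 0
    boot_tsc_khz: the globally calibrated tsc_khz value
    """
    khz = 0
    if notified_khz is not None:
        khz = notified_khz            # a notifier told us the new frequency
    elif not constant_tsc:
        khz = quick_get()             # ask cpufreq; 0 means "don't know"
    if not khz:
        khz = boot_tsc_khz            # last resort: the calibrated value
    return khz
```

The zero check is the "some sort of logic" mentioned in the thread: a
caller that used the quick_get result directly would see 0 kHz whenever
cpufreq is disabled.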
Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings while Building a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled
On 20.07.2010, at 09:27, Milton Miller wrote:

> On Mon, 19 Jul 2010 about 14:00:56 +0200, Alexander Graf wrote:
>> Milton Miller wrote:
>>> I wrote:
>>>
>>> Oh yea, and for book-3s, the code copies from 0x100 to __end_interrupts
>>> in arch/powerpc/kernel/exceptions-64s.h down to the real 0, but the rest
>>> of the kernel is at some disjointed address. The interrupt will go to
>>> the copy at the real zero. Any references to code outside that region
>>> must be done via a full indrect branch (not a relative one), simiar to
>>> the secondary startup (via following the function pointer in a descriptor
>>> set in very low memory), or syscall entry and exception vectors via paca.
>>>
>>
>> That would still break on normal PPC boxes, as any address accessed in
>> real mode has to be inside the RMA. And the #include for
>> kvm/book3s_rmhandlers.S happens after __end_interrupts. So I'd end up
>> with code that gets executed outside of the RMA after a relocation, right?
>>
>> Alex
>>
>
> Weither its outside of the RMA or not, DO_KVM is creating a branch outside
> of code copied to lowmem.
>
> This is BROKEN.
>
> We have a hard limit that we can't extend _end_interrupts past 0x7000, and
> a soft limit that we can't exceed 0x6000. If there is space, we could
> move the real mode handler extensions inside end_interrupts in
> exceptions-64s.S, and store the full address in a .quad so it gets
> relocated properly. Don't subtract the start, we have designed the kernel
> to run with start at a VA that can be used as a EA in real mode.

Moving everything to exceptions-64s.S sounds like the best thing to do.
All the code in real mode really is there so it stays inside the RMA. I
don't think we can guarantee that for any code that is not copied, right?

> Otherwise we need to mark KVM_BOOK3S_64 depends on (!RELOCATABLE ||
> BROKEN) for 2.6.35 until we get fixes.

Well - it's only broken when really getting relocated. But I agree, the
current state doesn't cope with Linux's relocation logic.

> I took a read though the book3s code as of 2.6.34. A few things I noticed:
>
> (1) The code is using slb large to control the segment size. It should
> be using SLB B field (or just impliment 256M segments only).

I'm not sure I understand this part? We only use 256MB segments for now.

> (2) It appears that the mtspr and mfspr code is using the same storage for
> bats 4-7 as 0-3 ... I would have expected a 4 + a few places.

Yes, that one is fixed in more recent versions already.

> (3) Its not clear to me that you clear RI when transitioning to the guest
> but its obviously required because you place state in srr0 & srr1.

Uh - do I have to clear RI? I'm not prepared to take an interrupt anyways
and RI is just a soft flag for Linux's handlers, right?

> (4) I don't understand why __kvmppc_vcpu_run turns on interrupts so that
> __kvmppc_vcpu_entry can turn them back off. Something to do with
> irq trace annotations?

__kvmppc_vcpu_run turns on soft interrupts while __kvmppc_vcpu_entry turns
them off in MSR. This is so that when enabling interrupts again on guest
exit, we have the soft enable bit set.

Alex