[COMMIT master] Release script: document external dependencies in tarball
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
index 3b1dccf..11d9c27 100755
--- a/kvm/scripts/make-release
+++ b/kvm/scripts/make-release
@@ -45,6 +45,13 @@ tarball="$releasedir/$name.tar"
 cd "$(dirname "$0")"/../..
 git archive --prefix="$name/" --format=tar "$commit" > "$tarball"
 
+mkdir -p "$tmpdir"
+git cat-file -p "${commit}:roms" | awk ' { print $4, $3 } ' \
+    > "$tmpdir/EXTERNAL_DEPENDENCIES"
+tar -rf "$tarball" --transform "s,^,$name/," -C "$tmpdir" \
+    "EXTERNAL_DEPENDENCIES"
+rm -rf "$tmpdir"
+
 if [[ -n "$formal" ]]; then
     mkdir -p "$tmpdir"
     echo "$name" > "$tmpdir/KVM_VERSION"
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] Don't load options roms intended to be loaded by the bios in qemu
From: Avi Kivity <a...@redhat.com>

The first such option rom will load at address 0, which isn't very nice,
and the second will report a conflict and abort, which is horrible.

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/hw/loader.c b/hw/loader.c
index 2ceb8eb..eef385e 100644
--- a/hw/loader.c
+++ b/hw/loader.c
@@ -636,6 +636,9 @@ static void rom_reset(void *unused)
     Rom *rom;
 
     QTAILQ_FOREACH(rom, &roms, next) {
+        if (rom->fw_file) {
+            continue;
+        }
         if (rom->data == NULL)
             continue;
         cpu_physical_memory_write_rom(rom->addr, rom->data, rom->romsize);
@@ -654,6 +657,9 @@ int rom_load_all(void)
     Rom *rom;
 
     QTAILQ_FOREACH(rom, &roms, next) {
+        if (rom->fw_file) {
+            continue;
+        }
         if (addr > rom->addr) {
             fprintf(stderr, "rom: requested regions overlap "
                     "(rom %s. free=0x" TARGET_FMT_plx
@@ -752,7 +758,7 @@ void do_info_roms(Monitor *mon)
     Rom *rom;
 
     QTAILQ_FOREACH(rom, &roms, next) {
-        if (rom->addr) {
+        if (!rom->fw_file) {
             monitor_printf(mon, "addr=" TARGET_FMT_plx
                            " size=0x%06zx mem=%s name=\"%s\" \n",
                            rom->addr, rom->romsize,
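The failure mode described in the commit message can be reproduced in a few lines. This is only a sketch under assumptions: `struct rom_entry`, its `bios_loaded` flag (standing in for the `fw_file` check) and `first_overlap()` are hypothetical names, not qemu's data structures. It walks a list sorted by address and reports the first entry that starts below the end of the previous one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a ROM list entry. */
struct rom_entry {
    uint64_t addr;     /* load address; 0 when the BIOS places the ROM */
    uint64_t size;     /* ROM size in bytes */
    int bios_loaded;   /* stand-in for the fw_file check in the patch */
};

/* Walk ROMs sorted by address and return the index of the first entry
 * that overlaps the previous one, or -1 if none do. Entries the BIOS
 * loads itself are skipped when skip_bios_loaded is nonzero. */
int first_overlap(const struct rom_entry *roms, size_t n, int skip_bios_loaded)
{
    uint64_t next_free = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (skip_bios_loaded && roms[i].bios_loaded) {
            continue;
        }
        if (next_free > roms[i].addr) {
            return (int)i;
        }
        next_free = roms[i].addr + roms[i].size;
    }
    return -1;
}
```

With two BIOS-loaded ROMs that both carry address 0, the unfiltered walk flags the second one as a conflict, while the filtered walk does not.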
[COMMIT master] Split off sysfs id retrieval
From: Alexander Graf <ag...@suse.de>

To retrieve device and vendor ID from a PCI device, we need to read a sysfs
file. That code is currently hand written at least two times, the later patch
introducing two more calls.

So let's move that out to a function.

Signed-off-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 4b902d7..35c8812 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -562,14 +562,46 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }
 
+static int get_real_id(const char *devpath, const char *idname, uint16_t *val)
+{
+    FILE *f;
+    char name[128];
+    long id;
+
+    snprintf(name, sizeof(name), "%s%s", devpath, idname);
+    f = fopen(name, "r");
+    if (f == NULL) {
+        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+        return -1;
+    }
+    if (fscanf(f, "%li\n", &id) == 1) {
+        *val = id;
+    } else {
+        return -1;
+    }
+    fclose(f);
+
+    return 0;
+}
+
+static int get_real_vendor_id(const char *devpath, uint16_t *val)
+{
+    return get_real_id(devpath, "vendor", val);
+}
+
+static int get_real_device_id(const char *devpath, uint16_t *val)
+{
+    return get_real_id(devpath, "device", val);
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
     char dir[128], name[128];
-    int fd, r = 0;
+    int fd, r = 0, v;
     FILE *f;
     unsigned long long start, end, size, flags;
-    unsigned long id;
+    uint16_t id;
     struct stat statbuf;
     PCIRegion *rp;
     PCIDevRegions *dev = &pci_dev->real_device;
@@ -635,31 +667,21 @@ again:
 
     fclose(f);
 
-    /* read and fill device ID */
-    snprintf(name, sizeof(name), "%svendor", dir);
-    f = fopen(name, "r");
-    if (f == NULL) {
-        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+    /* read and fill vendor ID */
+    v = get_real_vendor_id(dir, &id);
+    if (v) {
         return 1;
     }
-    if (fscanf(f, "%li\n", &id) == 1) {
-        pci_dev->dev.config[0] = id & 0xff;
-        pci_dev->dev.config[1] = (id & 0xff00) >> 8;
-    }
-    fclose(f);
+    pci_dev->dev.config[0] = id & 0xff;
+    pci_dev->dev.config[1] = (id & 0xff00) >> 8;
 
-    /* read and fill vendor ID */
-    snprintf(name, sizeof(name), "%sdevice", dir);
-    f = fopen(name, "r");
-    if (f == NULL) {
-        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+    /* read and fill device ID */
+    v = get_real_device_id(dir, &id);
+    if (v) {
         return 1;
     }
-    if (fscanf(f, "%li\n", &id) == 1) {
-        pci_dev->dev.config[2] = id & 0xff;
-        pci_dev->dev.config[3] = (id & 0xff00) >> 8;
-    }
-    fclose(f);
+    pci_dev->dev.config[2] = id & 0xff;
+    pci_dev->dev.config[3] = (id & 0xff00) >> 8;
 
     /* dealing with virtual function device */
     snprintf(name, sizeof(name), "%sphysfn/", dir);
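For readers who want to try the helper outside qemu, here is a standalone sketch of the same idea. The names `read_id_from_stream()`, `read_sysfs_id()` and `store_le16()` are hypothetical and not part of the patch. One deliberate difference, stated as an observation rather than a fix to the commit: the posted `get_real_id()` returns from the parse-failure branch without calling `fclose()`, while this sketch closes the file on both paths.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse a sysfs-style ID such as "0x8086\n" from an
 * already-open stream. Returns 0 on success, -1 on parse failure. */
int read_id_from_stream(FILE *f, uint16_t *val)
{
    long id;

    if (fscanf(f, "%li", &id) != 1) {
        return -1;
    }
    *val = (uint16_t)id;
    return 0;
}

/* Hypothetical wrapper: open devpath + idname, parse, and always close. */
int read_sysfs_id(const char *devpath, const char *idname, uint16_t *val)
{
    char name[128];
    FILE *f;
    int r;

    snprintf(name, sizeof(name), "%s%s", devpath, idname);
    f = fopen(name, "r");
    if (f == NULL) {
        return -1;
    }
    r = read_id_from_stream(f, val);
    fclose(f);   /* closed on both the success and the failure path */
    return r;
}

/* Store a 16-bit ID into two config-space bytes, low byte first, the way
 * the patch fills config[0..1] and config[2..3]. */
void store_le16(uint8_t *cfg, uint16_t id)
{
    cfg[0] = id & 0xff;
    cfg[1] = (id & 0xff00) >> 8;
}
```

The `%li` conversion accepts the `0x` prefix that sysfs uses for these files, which is why no explicit base is needed.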
[COMMIT master] Inform users about busy device assignment attempt
From: Alexander Graf <ag...@suse.de>

When using -pcidevice on a device that is already in use by a kernel driver
all the user gets is the following (very useful) information:

  Failed to assign device "04:00.0" : Device or resource busy
  Failed to deassign device "04:00.0" : Invalid argument
  Error initializing device pci-assign

Since I usually prefer to have my computer do the thinking for me, I figured
it might be a good idea to check and see if a device is actually used by a
driver. If so, tell the user.

So with this patch applied you get the following output:

  Failed to assign device "04:00.0" : Device or resource busy
  *** The driver 'igb' is occupying your device 04:00.0.
  ***
  *** You can try the following commands to free it:
  ***
  *** $ echo "8086 150a" > /sys/bus/pci/drivers/pci-stub/new_id
  *** $ echo "0000:04:00.0" > /sys/bus/pci/drivers/igb/unbind
  *** $ echo "0000:04:00.0" > /sys/bus/pci/drivers/pci-stub/bind
  *** $ echo "8086 150a" > /sys/bus/pci/drivers/pci-stub/remove_id
  ***
  Failed to deassign device "04:00.0" : Invalid argument
  Error initializing device pci-assign

That should keep people like me from doing the most obvious misuses :-).

CC: Daniel P. Berrange <berra...@redhat.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 35c8812..02d23d8 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -756,6 +756,54 @@ static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn)
     return (uint32_t)bus << 8 | (uint32_t)devfn;
 }
 
+static void assign_failed_examine(AssignedDevice *dev)
+{
+    char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
+    uint16_t vendor_id, device_id;
+    int r;
+
+    /* XXX implement multidomain */
+    sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%01x/",
+            dev->host.bus, dev->host.dev, dev->host.func);
+
+    sprintf(name, "%sdriver", dir);
+
+    r = readlink(name, driver, sizeof(driver));
+    if ((r <= 0) || r >= sizeof(driver) || !(ns = strrchr(driver, '/'))) {
+        goto fail;
+    }
+
+    ns++;
+
+    if (get_real_vendor_id(dir, &vendor_id) ||
+        get_real_device_id(dir, &device_id)) {
+        goto fail;
+    }
+
+    fprintf(stderr, "*** The driver '%s' is occupying your device "
+                    "%02x:%02x.%x.\n",
+            ns, dev->host.bus, dev->host.dev, dev->host.func);
+    fprintf(stderr, "***\n");
+    fprintf(stderr, "*** You can try the following commands to free it:\n");
+    fprintf(stderr, "***\n");
+    fprintf(stderr, "*** $ echo \"%04x %04x\" > /sys/bus/pci/drivers/pci-stub/"
+                    "new_id\n", vendor_id, device_id);
+    fprintf(stderr, "*** $ echo \"0000:%02x:%02x.%x\" > /sys/bus/pci/drivers/"
+                    "%s/unbind\n",
+            dev->host.bus, dev->host.dev, dev->host.func, ns);
+    fprintf(stderr, "*** $ echo \"0000:%02x:%02x.%x\" > /sys/bus/pci/drivers/"
+                    "pci-stub/bind\n",
+            dev->host.bus, dev->host.dev, dev->host.func);
+    fprintf(stderr, "*** $ echo \"%04x %04x\" > /sys/bus/pci/drivers/pci-stub"
+                    "/remove_id\n", vendor_id, device_id);
+    fprintf(stderr, "***\n");
+
+    return;
+
+fail:
+    fprintf(stderr, "Couldn't find out why.\n");
+}
+
 static int assign_device(AssignedDevice *dev)
 {
     struct kvm_assigned_pci_dev assigned_dev_data;
@@ -781,9 +829,18 @@ static int assign_device(AssignedDevice *dev)
 #endif
 
     r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
-    if (r < 0)
-        fprintf(stderr, "Failed to assign device \"%s\" : %s\n",
+    if (r < 0) {
+        fprintf(stderr, "Failed to assign device \"%s\" : %s\n",
                 dev->dev.qdev.id, strerror(-r));
+
+        switch (r) {
+        case -EBUSY:
+            assign_failed_examine(dev);
+            break;
+        default:
+            break;
+        }
+    }
 
     return r;
 }
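The two string-handling steps in the patch, taking the driver name from the `driver` symlink target and building the hint lines, can be tried in isolation. The sketch below uses hypothetical names (`driver_name_from_link()`, `format_unbind_hint()`) and hardcodes PCI domain 0000 as an assumption, in line with the patch's "XXX implement multidomain" note. Worth keeping in mind: `readlink()` does not NUL-terminate its output, so the patch depends on the zero-initialized `driver` buffer and on the `r >= sizeof(driver)` check.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: given the target of a sysfs "driver" symlink,
 * return the driver name (the text after the last '/'), or NULL if the
 * target contains no '/'. */
const char *driver_name_from_link(const char *target)
{
    const char *slash = strrchr(target, '/');

    return slash ? slash + 1 : NULL;
}

/* Hypothetical helper: format the unbind hint for a device in PCI
 * domain 0000 (assumed). Returns 0 on success, -1 if buf is too small. */
int format_unbind_hint(char *buf, size_t len, unsigned bus, unsigned dev,
                       unsigned func, const char *driver)
{
    int n = snprintf(buf, len,
                     "echo \"0000:%02x:%02x.%x\" > "
                     "/sys/bus/pci/drivers/%s/unbind",
                     bus, dev, func, driver);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Using `snprintf()` with a length check, instead of the unbounded `sprintf()` in the patch, is a choice made for this sketch only.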
[COMMIT master] Don't leak kvm_save_mpstate() to main qemu code
From: Avi Kivity <a...@redhat.com>

It doesn't exist outside x86, and breaks the build. Move it to
cpu_synchronize_state() instead (only reading, not writing).

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/monitor.c b/monitor.c
index 5a9fae6..6ff6e1f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -677,7 +677,6 @@ static CPUState *mon_get_cpu(void)
         mon_set_cpu(0);
     }
     cpu_synchronize_state(cur_mon->mon_cpu);
-    kvm_save_mpstate(cur_mon->mon_cpu);
     return cur_mon->mon_cpu;
 }
 
@@ -780,7 +779,6 @@ static void do_info_cpus(Monitor *mon, QObject **ret_data)
         QObject *obj;
 
         cpu_synchronize_state(env);
-        kvm_save_mpstate(env);
 
         obj = qobject_from_jsonf("{ 'CPU': %d, 'current': %i, 'halted': %i }",
                                  env->cpu_index, env == mon->mon_cpu,
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 7b7bc0f..82e362c 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1217,6 +1217,7 @@ void kvm_arch_save_regs(CPUState *env)
             return;
         }
     }
+    kvm_arch_save_mpstate(env);
 }
 
 static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function,
[COMMIT master] KVM: powerpc: Move vector to irqprio resolving to separate function
From: Alexander Graf <ag...@suse.de>

We're using a switch table to find the irqprio that belongs to a specific
interrupt vector. This table is part of the interrupt inject logic.

Since we'll add a new function to stop interrupts, let's move this table
out of the injection logic into a separate function.

Signed-off-by: Alexander Graf <ag...@suse.de>
Acked-by: Hollis Blanchard <hol...@penguinppc.org>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 3e294bd..241795b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -125,11 +125,10 @@ void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
 	vcpu->arch.mmu.reset_msr(vcpu);
 }
 
-void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+static int kvmppc_book3s_vec2irqprio(unsigned int vec)
 {
 	unsigned int prio;
 
-	vcpu->stat.queue_intr++;
 	switch (vec) {
 	case 0x100: prio = BOOK3S_IRQPRIO_SYSTEM_RESET;		break;
 	case 0x200: prio = BOOK3S_IRQPRIO_MACHINE_CHECK;	break;
@@ -149,7 +148,15 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 	default:    prio = BOOK3S_IRQPRIO_MAX;			break;
 	}
 
-	set_bit(prio, &vcpu->arch.pending_exceptions);
+	return prio;
+}
+
+void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+{
+	vcpu->stat.queue_intr++;
+
+	set_bit(kvmppc_book3s_vec2irqprio(vec),
+		&vcpu->arch.pending_exceptions);
 #ifdef EXIT_DEBUG
 	printk(KERN_INFO "Queueing interrupt %x\n", vec);
 #endif
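The benefit of splitting the lookup from the queueing is easier to see in a userspace miniature. Everything below is a sketch under assumptions: the `PRIO_*` values and function names are hypothetical, only three vectors are mapped, and a plain `unsigned long` with shifts stands in for the kernel's `set_bit()`/`clear_bit()` on `pending_exceptions`.

```c
/* Hypothetical priority numbers; the real BOOK3S_IRQPRIO_* values differ. */
enum {
    PRIO_SYSTEM_RESET,
    PRIO_MACHINE_CHECK,
    PRIO_DECREMENTER,
    PRIO_MAX
};

/* Pure lookup: interrupt vector to priority, with no side effects. */
unsigned int vec_to_prio(unsigned int vec)
{
    switch (vec) {
    case 0x100: return PRIO_SYSTEM_RESET;
    case 0x200: return PRIO_MACHINE_CHECK;
    case 0x900: return PRIO_DECREMENTER;   /* PowerPC decrementer vector */
    default:    return PRIO_MAX;
    }
}

/* Queueing and dequeueing both reuse the same lookup. */
void queue_vec(unsigned long *pending, unsigned int vec)
{
    *pending |= 1UL << vec_to_prio(vec);
}

void dequeue_vec(unsigned long *pending, unsigned int vec)
{
    *pending &= ~(1UL << vec_to_prio(vec));
}
```

Once the table lives in its own function, a dequeue operation costs three lines instead of a second copy of the switch.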
[COMMIT master] KVM: powerpc: Remove AGGRESSIVE_DEC
From: Alexander Graf <ag...@suse.de>

Because we now emulate the DEC interrupt according to real life behavior,
there's no need to keep the AGGRESSIVE_DEC hack around.

Let's just remove it.

Signed-off-by: Alexander Graf <ag...@suse.de>
Acked-by: Hollis Blanchard <hol...@penguinppc.org>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index fd3ad6c..803505d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -34,12 +34,6 @@
 /* #define EXIT_DEBUG */
 /* #define EXIT_DEBUG_SIMPLE */
 
-/* Without AGGRESSIVE_DEC we only fire off a DEC interrupt when DEC turns 0.
- * When set, we retrigger a DEC interrupt after that if DEC <= 0.
- * PPC32 Linux runs faster without AGGRESSIVE_DEC, PPC64 Linux requires it. */
-
-/* #define AGGRESSIVE_DEC */
-
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "exits",       VCPU_STAT(sum_exits) },
 	{ "mmio",        VCPU_STAT(mmio_exits) },
@@ -81,7 +75,7 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
 	to_book3s(vcpu)->slb_shadow_max = get_paca()->kvm_slb_max;
 }
 
-#if defined(AGGRESSIVE_DEC) || defined(EXIT_DEBUG)
+#if defined(EXIT_DEBUG)
 static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
 {
 	u64 jd = mftb() - vcpu->arch.dec_jiffies;
@@ -273,14 +267,6 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 	unsigned long *pending = &vcpu->arch.pending_exceptions;
 	unsigned int priority;
 
-	/* XXX be more clever here - no need to mftb() on every entry */
-	/* Issue DEC again if it's still active */
-#ifdef AGGRESSIVE_DEC
-	if (vcpu->arch.msr & MSR_EE)
-		if (kvmppc_get_dec(vcpu) & 0x80000000)
-			kvmppc_core_queue_dec(vcpu);
-#endif
-
 #ifdef EXIT_DEBUG
 	if (vcpu->arch.pending_exceptions)
 		printk(KERN_EMERG "KVM: Check pending: %lx\n", vcpu->arch.pending_exceptions);
[COMMIT master] KVM: powerpc: Improve DEC handling
From: Alexander Graf <ag...@suse.de>

We treated the DEC interrupt like an edge based one. This is not true for
Book3s. The DEC keeps firing until mtdec is issued again and thus clears
the interrupt line.

So let's implement this logic in KVM too. This patch moves the line clearing
from the firing of the interrupt to the mtdec emulation.

This makes PPC64 guests work without AGGRESSIVE_DEC defined.

Signed-off-by: Alexander Graf <ag...@suse.de>
Acked-by: Hollis Blanchard <hol...@penguinppc.org>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 269ee46..abfd0c4 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -82,6 +82,7 @@ extern void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                        struct kvm_interrupt *irq);
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 241795b..fd3ad6c 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -151,6 +151,13 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
 	return prio;
 }
 
+static void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
+					  unsigned int vec)
+{
+	clear_bit(kvmppc_book3s_vec2irqprio(vec),
+		  &vcpu->arch.pending_exceptions);
+}
+
 void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 {
 	vcpu->stat.queue_intr++;
@@ -178,6 +185,11 @@ int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
 	return test_bit(BOOK3S_INTERRUPT_DECREMENTER >> 7, &vcpu->arch.pending_exceptions);
 }
 
+void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
+}
+
 void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                 struct kvm_interrupt *irq)
 {
@@ -275,7 +287,9 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 #endif
 	priority = __ffs(*pending);
 	while (priority <= (sizeof(unsigned int) * 8)) {
-		if (kvmppc_book3s_irqprio_deliver(vcpu, priority)) {
+		if (kvmppc_book3s_irqprio_deliver(vcpu, priority) &&
+		    (priority != BOOK3S_IRQPRIO_DECREMENTER)) {
+			/* DEC interrupts get cleared by mtdec */
 			clear_bit(priority, &vcpu->arch.pending_exceptions);
 			break;
 		}
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 06f5a9e..d8b6342 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -97,6 +97,11 @@ int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
 	return test_bit(BOOKE_IRQPRIO_DECREMENTER, &vcpu->arch.pending_exceptions);
 }
 
+void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu)
+{
+	clear_bit(BOOKE_IRQPRIO_DECREMENTER, &vcpu->arch.pending_exceptions);
+}
+
 void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                 struct kvm_interrupt *irq)
 {
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 4a9ac66..303457b 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -83,6 +83,9 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 
 	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
 #ifdef CONFIG_PPC64
+	/* mtdec lowers the interrupt line when positive. */
+	kvmppc_core_dequeue_dec(vcpu);
+
 	/* POWER4+ triggers a dec interrupt if the value is < 0 */
 	if (vcpu->arch.dec & 0x80000000) {
 		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
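The change from edge to level semantics can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the kernel code: `struct vcpu_sketch`, `deliver_one()` and `emulate_mtdec()` are hypothetical names, only two priorities exist, and the model ignores that a real delivery also disables interrupts in the guest MSR until the guest re-enables them.

```c
#include <stdbool.h>
#include <stdint.h>

enum { IRQ_EXTERNAL = 0, IRQ_DECREMENTER = 1, IRQ_COUNT = 2 };

/* Hypothetical miniature of the per-vcpu interrupt state. */
struct vcpu_sketch {
    unsigned long pending;   /* one bit per priority, lowest bit first */
    bool irqs_enabled;       /* stand-in for MSR[EE] */
};

/* Deliver the highest-priority pending interrupt. Returns the delivered
 * priority, or -1 if nothing was delivered. */
int deliver_one(struct vcpu_sketch *v)
{
    int prio;

    if (!v->irqs_enabled) {
        return -1;
    }
    for (prio = 0; prio < IRQ_COUNT; prio++) {
        if (v->pending & (1UL << prio)) {
            if (prio != IRQ_DECREMENTER) {
                /* edge-like: cleared as soon as it is delivered */
                v->pending &= ~(1UL << prio);
            }
            /* level-like DEC stays pending until mtdec */
            return prio;
        }
    }
    return -1;
}

/* mtdec lowers the line; a negative value raises it again at once. */
void emulate_mtdec(struct vcpu_sketch *v, int32_t value)
{
    v->pending &= ~(1UL << IRQ_DECREMENTER);
    if (value < 0) {
        v->pending |= 1UL << IRQ_DECREMENTER;
    }
}
```

In this model a pending DEC is delivered again on every call until `emulate_mtdec()` runs with a non-negative value, which is the behavior the commit message describes for Book3s.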
[COMMIT master] KVM: powerpc: Change maintainer
From: Alexander Graf <ag...@suse.de>

Progress on KVM for Embedded PowerPC has stalled, but for Book3S there's
quite a lot of work to do and going on.

So in agreement with Hollis and Avi, we should switch maintainers for PowerPC.

Signed-off-by: Alexander Graf <ag...@suse.de>
Acked-by: Hollis Blanchard <hol...@penguinppc.org>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/MAINTAINERS b/MAINTAINERS
index efd2ef2..0c1a696 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3139,7 +3139,7 @@ F:	arch/x86/include/asm/svm.h
 F:	arch/x86/kvm/svm.c
 
 KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
-M:	Hollis Blanchard <holl...@us.ibm.com>
+M:	Alexander Graf <ag...@suse.de>
 L:	kvm-...@vger.kernel.org
 W:	http://kvm.qumranet.com
 S:	Supported
[COMMIT master] KVM: powerpc: Show timing option only on embedded
From: Alexander Graf <ag...@suse.de>

Embedded PowerPC KVM has an exit timing implementation to track and evaluate
how much time was spent in which exit path.

For Book3S, we don't implement it. So let's not expose it as a config option
either.

Signed-off-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 7635ba2..be28968 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -54,7 +54,7 @@ config KVM_440
 
 config KVM_EXIT_TIMING
 	bool "Detailed exit timing"
-	depends on KVM
+	depends on KVM_440 || KVM_E500
 	---help---
 	  Calculate elapsed time for every exit/enter cycle. A per-vcpu
 	  report is available in debugfs kvm/vm#_vcpu#_timing.
Re: Qemu vs Qemu-KVM
On 12/22/2009 12:21 AM, Mikolaj Kucharski wrote:
> On Mon, Dec 21, 2009 at 09:22:52AM +0200, Gleb Natapov wrote:
> > > I have personal interest in resolving RedHat's bz #508801, unfortunately
> > > I cannot do that myself, so I wanted to ask on the list for help, but
> > > now I'm confused where should I go.
> >
> > Can you try kvm modules from latest kvm.git please? It looks like
> > emulation of push %ds fails and it was added after 2.6.32.
>
> Having following GIT repositories:
>
> git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
>
> Which one I should use to build my modules from? I would like to keep my
> system (Fedora 12) consistent and I don't want to have any parts built
> outside of rpm. I would like to contribute/help for RedHat's bz #508801
> resolution, but I need some directions.

git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
git submodule init
git submodule update
./configure
make sync
make
make install

> Is kvm.git whole Linux kernel

Yes. kvm-kmod downloads it (via git submodule) and extracts the kvm bits
(via make sync).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Improve Decrementor Implementation v2
On 12/21/2009 09:21 PM, Alexander Graf wrote:
> We currently have an ugly hack called AGGRESSIVE_DEC that makes the
> Decrementor either work well for PPC32 or PPC64 targets.
>
> This patchset removes that hack, bringing the decrementor implementation
> closer to real hardware.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Show KVM timing option only on embedded
On 12/20/2009 11:24 PM, Alexander Graf wrote:
> Embedded PowerPC KVM has an exit timing implementation to track and evaluate
> how much time was spent in which exit path.
>
> For Book3S, we don't implement it. So let's not expose it as a config option
> either.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Change PowerPC KVM maintainer
On 12/20/2009 11:24 PM, Alexander Graf wrote:
> Progress on KVM for Embedded PowerPC has stalled, but for Book3S there's
> quite a lot of work to do and going on.
>
> So in agreement with Hollis and Avi, we should switch maintainers for PowerPC.
>
> I'll still demand Acks from Hollis for code that changes BookE parts when I
> can't say for sure if the change is ok.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/3] Device Assignment fixes
On 12/17/2009 05:04 PM, Alexander Graf wrote:
> While trying out I stumbled across several issues in the device assignment
> code. This set addresses the most major ones. Namely allowing passthrough
> of non page aligned BAR region (like on lpfc) and telling users what to
> do when their device is in use.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
[PATCH] slow_map: minor improvements to ROM BAR handling
ROM BAR can be handled same as regular BAR:
load_option_roms utility will take care of
copying it to RAM as appropriate.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---

This patch applies on top of agraf's one, it takes care of non-page aligned ROM
BARs as well: they mostly are taken care of, we just do not need to warn user
about them.

 hw/device-assignment.c |   20 +++++++++-----------
 1 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 000fa61..066fdb6 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -486,25 +486,23 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
                 : PCI_BASE_ADDRESS_SPACE_MEMORY;
 
             if (cur_region->size & 0xFFF) {
-                fprintf(stderr, "PCI region %d at address 0x%llx "
-                        "has size 0x%x, which is not a multiple of 4K. "
-                        "You might experience some performance hit due to that.\n",
-                        i, (unsigned long long)cur_region->base_addr,
-                        cur_region->size);
+                if (i != PCI_ROM_SLOT) {
+                    fprintf(stderr, "PCI region %d at address 0x%llx "
+                            "has size 0x%x, which is not a multiple of 4K. "
+                            "You might experience some performance hit "
+                            "due to that.\n",
+                            i, (unsigned long long)cur_region->base_addr,
+                            cur_region->size);
+                }
                 slow_map = 1;
             }
 
-            if (slow_map && (i == PCI_ROM_SLOT)) {
-                fprintf(stderr, "ROM not aligned - can't continue\n");
-                return -1;
-            }
-
             /* map physical memory */
             pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
             if (i == PCI_ROM_SLOT) {
                 pci_dev->v_addrs[i].u.r_virtbase =
                     mmap(NULL,
-                         (cur_region->size + 0xFFF) & 0xFFFFF000,
+                         cur_region->size,
                          PROT_WRITE | PROT_READ,
                          MAP_ANONYMOUS | MAP_PRIVATE,
                          0, (off_t) 0);
--
1.6.6.rc1.43.gf55cc
qemu-kvm-0.12 bug?
Hi,

I've switched from -0.11.0 to -0.12 and from 2.6.31.6 to 2.6.32.2 to try the
new virtio-memory-API introduced in latest libvirt from git. I can start VMs,
e.g. by

kvm -cdrom $someiso --enable-kvm

but my domain configs do not work anymore.

Domain config: http://pastebin.com/f66324669

Qemu log when doing "virsh start wp01":

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1000 -smp 1 -name wp01 -uuid 4c34e9aa-2bcd-fa8e-bc98-4417c1cb779a -chardev socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/wp01.monitor,server,nowait -monitor chardev:monitor -boot c -kernel /boot/vmlinuz-2.6.32.2 -initrd /boot/initrd.img-2.6.32.2 -append root=/dev/vdb noresume2 -drive file=/dev/disk/by-path/ip-10.0.1.1:3260-iscsi-iqn.2009-09.local:store.wp01-swap-lun-0,if=virtio,index=0,boot=on -drive file=/dev/disk/by-path/ip-10.0.1.1:3260-iscsi-iqn.2009-09.local:store.wp01-disk-lun-0,if=virtio,index=1 -net nic,macaddr=00:16:3e:00:01:01,vlan=0,model=virtio,name=virtio.0 -net tap,fd=18,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vnc 0.0.0.0:0 -k de -vga cirrus
char device redirected to /dev/pts/3
Option 'ipv4': Use 'on' or 'off'
Failed to parse "yes" for "dummy.ipv4"
rom: requested regions overlap (rom vapic.bin. free=0x0600, addr=0x)
rom loading failed

Is this a bug or am I missing something?

Thanks, kr
tom
Re: qemu-kvm-0.12 bug?
On Tuesday 22 December 2009 13:02:28 Thomas Treutner wrote:
> Hi,
>
> I've switched from -0.11.0 to -0.12 and from 2.6.31.6 to 2.6.32.2 to try
> the new virtio-memory-API introduced in latest libvirt from git. I can
> start VMs, e.g. by
>
> kvm -cdrom $someiso --enable-kvm
>
> but my domain configs do not work anymore.

Just found the other bug report about this.
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
On Tue, Dec 22, 2009 at 01:05:23PM +0100, Alexander Graf wrote:
> Michael S. Tsirkin wrote:
> > ROM BAR can be handled same as regular BAR:
> > load_option_roms utility will take care of
> > copying it to RAM as appropriate.
> >
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > ---
> >
> > This patch applies on top of agraf's one, it takes care of non-page aligned ROM
> > BARs as well: they mostly are taken care of, we just do not need to warn user
> > about them.
> >
> >  hw/device-assignment.c |   20 +++++++++-----------
> >  1 files changed, 9 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 000fa61..066fdb6 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -486,25 +486,23 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
> >                  : PCI_BASE_ADDRESS_SPACE_MEMORY;
> >
> >              if (cur_region->size & 0xFFF) {
> > -                fprintf(stderr, "PCI region %d at address 0x%llx "
> > -                        "has size 0x%x, which is not a multiple of 4K. "
> > -                        "You might experience some performance hit due to that.\n",
> > -                        i, (unsigned long long)cur_region->base_addr,
> > -                        cur_region->size);
> > +                if (i != PCI_ROM_SLOT) {
> > +                    fprintf(stderr, "PCI region %d at address 0x%llx "
> > +                            "has size 0x%x, which is not a multiple of 4K. "
> > +                            "You might experience some performance hit "
> > +                            "due to that.\n",
> > +                            i, (unsigned long long)cur_region->base_addr,
> > +                            cur_region->size);
> > +                }
> >                  slow_map = 1;
>
> This is wrong. You're setting slow_map = 1 on code that is very likely
> to be executed inside the guest. That doesn't work.

It is? Can you really run code directly from a PCI card?
I looked at BIOS boot specification and it always talks about
shadowing PCI ROMs.

> Better pad the ROM size to page boundary and use the shadow mapping we
> have in place already.

Changing BAR size might break some drivers.
Our BIOS seems to shadow ROM instead of running it directly, so we should be
fine I think?

> Alex
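For reference, the page padding being debated here is the usual round-up-to-4K idiom, the same arithmetic as the `(size + 0xFFF)` masking in the removed mmap() argument. A minimal sketch, with hypothetical names (`PAGE_SIZE_4K`, `pad_to_page()`, `is_page_multiple()`) and 32-bit sizes assumed:

```c
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u   /* assumed page size for this sketch */

/* Round size up to the next multiple of 4K. Sizes within 0xFFF of
 * UINT32_MAX wrap around; a real caller would have to reject those. */
uint32_t pad_to_page(uint32_t size)
{
    return (size + (PAGE_SIZE_4K - 1)) & ~(PAGE_SIZE_4K - 1);
}

/* Nonzero when size is already a multiple of 4K, i.e. when the
 * "size & 0xFFF" test in the patch would be false. */
int is_page_multiple(uint32_t size)
{
    return (size & (PAGE_SIZE_4K - 1)) == 0;
}
```

A 0x1800-byte ROM would be padded to 0x2000 bytes this way, which is the size change Michael is worried a guest driver might notice.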
[ANNOUNCE] qemu-kvm-0.12.1.1
qemu-kvm-0.12.1.1 is now available. This release is based on the upstream
qemu 0.12.1, plus kvm-specific enhancements. Please see the original qemu
0.12.1 release announcement for details.

This release can be used with the kvm kernel modules provided by your
distribution kernel, or by the modules in the kvm-kmod package, such as
kvm-kmod-2.6.32.

Changes from qemu-kvm-0.12.1
  - fix build error due to missing kvm_save_mpstate() on some configurations
  - fix option rom loading (fixes boot=on)

http://www.linux-kvm.org

--
error compiling committee.c: too many arguments to function
Re: [ANNOUNCE] qemu-kvm-0.12.1.1
sorry to bring bad news, but it still doesn't compile (at least for me):

[r...@vmdev03 qemu-kvm-0.12.1.1]# make
  LINK  x86_64-softmmu/qemu-system-x86_64
machine.o: In function `cpu_pre_save':
/usr/src/redhat/BUILD/qemu-kvm-0.12.1.1/target-i386/machine.c:326: undefined reference to `kvm_save_mpstate'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

n.

On Tue, Dec 22, 2009 at 02:50:03PM +0200, Avi Kivity wrote:
> qemu-kvm-0.12.1.1 is now available. This release is based on the upstream
> qemu 0.12.1, plus kvm-specific enhancements. Please see the original qemu
> 0.12.1 release announcement for details.
>
> This release can be used with the kvm kernel modules provided by your
> distribution kernel, or by the modules in the kvm-kmod package, such as
> kvm-kmod-2.6.32.
>
> Changes from qemu-kvm-0.12.1
>   - fix build error due to missing kvm_save_mpstate() on some configurations
>   - fix option rom loading (fixes boot=on)
>
> http://www.linux-kvm.org
>
> --
> error compiling committee.c: too many arguments to function

--
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
Re: [ANNOUNCE] qemu-kvm-0.12.1.1
On 12/22/2009 03:35 PM, Nikola Ciprich wrote:
> sorry to bring bad news, but it still doesn't compile (at least for me):
>
> [r...@vmdev03 qemu-kvm-0.12.1.1]# make
>   LINK  x86_64-softmmu/qemu-system-x86_64
> machine.o: In function `cpu_pre_save':
> /usr/src/redhat/BUILD/qemu-kvm-0.12.1.1/target-i386/machine.c:326: undefined reference to `kvm_save_mpstate'
> collect2: ld returned 1 exit status
> make[1]: *** [qemu-system-x86_64] Error 1
> make: *** [subdir-x86_64-softmmu] Error 2

Please provide details about your host.

--
error compiling committee.c: too many arguments to function
Re: [ANNOUNCE] qemu-kvm-0.12.1.1
CentOS5, all packages updated. But I just noticed that the build fails only when configured with multiple targets, including some more exotic ones (based on the Fedora spec). Building with just x86_64-softmmu works fine for me.

n.

On Tue, Dec 22, 2009 at 03:43:32PM +0200, Avi Kivity wrote:
> On 12/22/2009 03:35 PM, Nikola Ciprich wrote:
>> sorry to bring bad news, but it still doesn't compile (at least for me):
>>
>> [r...@vmdev03 qemu-kvm-0.12.1.1]# make
>>   LINK  x86_64-softmmu/qemu-system-x86_64
>> machine.o: In function `cpu_pre_save':
>> /usr/src/redhat/BUILD/qemu-kvm-0.12.1.1/target-i386/machine.c:326: undefined reference to `kvm_save_mpstate'
>> collect2: ld returned 1 exit status
>> make[1]: *** [qemu-system-x86_64] Error 1
>> make: *** [subdir-x86_64-softmmu] Error 2
>
> Please provide details about your host.
>
> --
> error compiling committee.c: too many arguments to function
Re: [ANNOUNCE] qemu-kvm-0.12.1.1
On 12/22/2009 04:07 PM, Nikola Ciprich wrote:
> CentOS5, all packages updated. But I just noticed build fails only when configured with multiple targets, including some more exotic (based on fedora spec). building just using x86_64-softmmu works fine for me.

Okay, so at least it works out of the box (no special configure option). Hopefully someone will figure out why other targets cause it to fail - looks like some files are compiled with the wrong config.h.

--
error compiling committee.c: too many arguments to function
Re: Qemu vs Qemu-KVM
On Tue, Dec 22, 2009 at 09:15:57AM +0200, Gleb Natapov wrote: On Mon, Dec 21, 2009 at 10:21:14PM +, Mikolaj Kucharski wrote: On Mon, Dec 21, 2009 at 09:22:52AM +0200, Gleb Natapov wrote: I have personal interest in resolving RedHat's bz #508801, unfortunately I cannot do that myself, so I wanted to ask on the list for help, but now I'm confused where should I go. Can you try kvm modules from latest kvm.git please? It looks like emulation of push %ds fails and it was added after 2.6.32. Having following GIT repositories: git://git.kernel.org/pub/scm/virt/kvm/kvm.git git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git Which one I should use to build my modules from? I would like to keep my system (Fedora 12) consistent and I don't want to have any parts built outside of rpm. I would like to contribute/help for RedHat's bz #508801 resolution, but I need some directions. Is kvm.git whole Linux kernel? Don't bother, I already tested upstream with OpenBSD and, as Avi said, the problem is somewhere else. For some strange reason openbsd configures gsi 4 (com0) and gsi 12 (i8042) to be level triggered active high in ioapic. That causes KVM to send endless stream of interrupts into the guest, so guest's stack overflows into framebuffer area. At this point KVM start to emulate instructions and fails. I don't know why openbsd configures those interrupts incorrectly. It also ignores my attempts to override interrupt polarity/type with ACPI tables. Somebody knowledgeable in openbsd should look into why it configures interrupt controller incorrectly. If I override wrong settings inside KVM like in the patch below openbsd boots, but I doubt com port is usable. 
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 38a2d20..fc37eac 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -121,6 +121,8 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 	default:
 		index = (ioapic->ioregsel - 0x10) >> 1;
+		if (!(ioapic->ioregsel & 1))
+			val &= ~0xa000;
 
 		ioapic_debug("change redir index %x val %x\n", index, val);
 		if (index >= IOAPIC_NUM_PINS)
 			return;

--
			Gleb.
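The mask in the patch above can be read against the IOAPIC redirection table layout: in the low dword of an entry, bit 13 is the pin polarity and bit 15 is the trigger mode, and together they make up 0xa000. The following is a minimal standalone sketch of that masking, not the kernel code; the macro and function names are invented for illustration, and it assumes (as the patch's `ioregsel & 1` test suggests) that even register selects address the low dword of an entry.

```c
#include <stdint.h>

/* Bits in the low dword of an IOAPIC redirection table entry. */
#define EXAMPLE_REDIR_POLARITY_LOW  (1u << 13) /* 0 = active high, 1 = active low */
#define EXAMPLE_REDIR_TRIGGER_LEVEL (1u << 15) /* 0 = edge, 1 = level */

/*
 * Hypothetical helper mirroring the override in the patch: when the guest
 * writes the low dword of a redirection entry (even register select),
 * force the entry to edge-triggered, active high. Writes to the high
 * dword (odd register select) pass through unchanged.
 */
static uint32_t example_force_edge_active_high(uint32_t ioregsel, uint32_t val)
{
    if (!(ioregsel & 1))
        val &= ~(EXAMPLE_REDIR_POLARITY_LOW | EXAMPLE_REDIR_TRIGGER_LEVEL);
    return val;
}
```

For example, a low-dword write of 0xa034 to register 0x18 would be stored as 0x0034, while the same value written to register 0x19 is left alone.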
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
On Tue, Dec 22, 2009 at 02:34:42PM +0100, Alexander Graf wrote:
> Michael S. Tsirkin wrote:
>> On Tue, Dec 22, 2009 at 01:05:23PM +0100, Alexander Graf wrote:
>>> Michael S. Tsirkin wrote:
>>>> ROM BAR can be handled same as regular BAR: load_option_roms utility will take care of copying it to RAM as appropriate.
>>>>
>>>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>>>> ---
>>>>
>>>> This patch applies on top of agraf's one, it takes care of non-page aligned ROM BARs as well: they mostly are taken care of, we just do not need to warn user about them.
>>>>
>>>>  hw/device-assignment.c | 20 +---
>>>>  1 files changed, 9 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>>>> index 000fa61..066fdb6 100644
>>>> --- a/hw/device-assignment.c
>>>> +++ b/hw/device-assignment.c
>>>> @@ -486,25 +486,23 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
>>>>                  : PCI_BASE_ADDRESS_SPACE_MEMORY;
>>>>
>>>>              if (cur_region->size & 0xFFF) {
>>>> -                fprintf(stderr, "PCI region %d at address 0x%llx "
>>>> -                        "has size 0x%x, which is not a multiple of 4K. "
>>>> -                        "You might experience some performance hit due to that.\n",
>>>> -                        i, (unsigned long long)cur_region->base_addr,
>>>> -                        cur_region->size);
>>>> +                if (i != PCI_ROM_SLOT) {
>>>> +                    fprintf(stderr, "PCI region %d at address 0x%llx "
>>>> +                            "has size 0x%x, which is not a multiple of 4K. "
>>>> +                            "You might experience some performance hit "
>>>> +                            "due to that.\n",
>>>> +                            i, (unsigned long long)cur_region->base_addr,
>>>> +                            cur_region->size);
>>>> +                }
>>>>                  slow_map = 1;
>>>
>>> This is wrong. You're setting slow_map = 1 on code that is very likely to be executed inside the guest. That doesn't work.
>>
>> It is? Can you really run code directly from a PCI card? I looked at BIOS boot specification and it always talks about shadowing PCI ROMs.
>
> I'm not sure the BIOS is the only one executing ROMs. If it is, then I'm good with the change. Maybe it'd make sense to also add a read only flag so we don't accidentally try to write to the ROM region with slow_map.
>
> Alex

Correct: I think it's made readonly down the road with mprotect, so attempt to do so will crash qemu :)

--
MST
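The size test in the quoted patch relies on the fact that a size which is not a multiple of 4K has at least one of its low 12 bits set. A small sketch of that check, and of the narrower warning condition the patch introduces, is below. The function names are hypothetical, and `EXAMPLE_ROM_SLOT` is an assumed stand-in for QEMU's `PCI_ROM_SLOT`; its value here is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for PCI_ROM_SLOT; the value is for illustration only. */
#define EXAMPLE_ROM_SLOT 6

/* A size that is not a multiple of 4 KiB has some of its low 12 bits set. */
static bool example_size_not_4k_multiple(uint64_t size)
{
    return (size & 0xFFF) != 0;
}

/*
 * Every such region still takes the slow mapping path; the patch only
 * suppresses the warning when the region is the ROM BAR.
 */
static bool example_should_warn(int region, uint64_t size)
{
    return example_size_not_4k_multiple(size) && region != EXAMPLE_ROM_SLOT;
}
```

Under these assumptions, a 2 KiB regular BAR would warn, a 2 KiB ROM BAR would be slow-mapped silently, and a 4 KiB BAR would do neither.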
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
On 12/22/2009 05:19 PM, Michael S. Tsirkin wrote:
>> I'm not sure the BIOS is the only one executing ROMs. If it is, then I'm good with the change. Maybe it'd make sense to also add a read only flag so we don't accidentally try to write to the ROM region with slow_map.
>>
>> Alex
>
> Correct: I think it's made readonly down the road with mprotect, so attempt to do so will crash qemu :)

Alex, are you happy with this? I'd like to apply it.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
Avi Kivity wrote:
> On 12/22/2009 05:19 PM, Michael S. Tsirkin wrote:
>>> I'm not sure the BIOS is the only one executing ROMs. If it is, then I'm good with the change. Maybe it'd make sense to also add a read only flag so we don't accidentally try to write to the ROM region with slow_map.
>>>
>>> Alex
>>
>> Correct: I think it's made readonly down the road with mprotect, so attempt to do so will crash qemu :)
>
> Alex, are you happy with this? I'd like to apply it.

I'd like to see the read-only protection in. Apart from that I'm good on checking it in, though I'm only awaiting the day someone runs code off such a ROM region ;-).

Alex
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
Avi Kivity wrote:
> On 12/22/2009 05:36 PM, Alexander Graf wrote:
>>> Is there a way to trap this and fprintf something?
>>
>> I don't think so. KVM will just trap on execution outside of RAM and either fail badly or throw something bad into the guest. MMIO access works by analyzing the instruction that accesses the MMIO address. That just doesn't work when we don't have an instruction to analyze.
>
> We could certainly extend emulate.c to fetch instruction bytes from userspace. It uses ->read_std() now, so we'd need to switch to ->read_emulated() and add appropriate buffering.

I thought the policy on emulate.c was to not have a full instruction emulator but only emulate instructions that do PT modifications or MMIO access?

Btw, we're in the same situation with PowerPC here. The instruction emulator is _really_ small. It only does a few MMU specific instructions, a couple of privileged ones and MMIO accessing ones.

Alex
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
On 12/22/2009 05:41 PM, Alexander Graf wrote:
>> We could certainly extend emulate.c to fetch instruction bytes from userspace. It uses ->read_std() now, so we'd need to switch to ->read_emulated() and add appropriate buffering.
>
> I thought the policy on emulate.c was to not have a full instruction emulator but only emulate instructions that do PT modifications or MMIO access?

It's not a policy, just laziness. With emulate_invalid_guest_state=1 we need many more instructions. Of course I don't want to add instructions just for the sake of it, since they will be untested. I'd much prefer not to run from mmio if possible - just pointing out it's doable.

> Btw, we're in the same situation with PowerPC here. The instruction emulator is _really_ small. It only does a few MMU specific instructions, a couple of privileged ones and MMIO accessing ones.

Plus, you have a fixed length instruction length, likely more regular too. I imagine powerpc is load/store, so you don't have to emulate a zillion ALU instructions?

--
error compiling committee.c: too many arguments to function
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On Tuesday 22 December 2009 04:31:32 pm Anthony Liguori wrote:
> I think the comparison would be if someone submitted a second e1000 driver that happened to do better on one netperf test than the current e1000 driver.
>
> You can argue, hey, choice is good, let's let a user choose if they want to use the faster e1000 driver. But surely, the best thing for a user is to figure out why the second e1000 driver is better on that one test, integrate that change into the current e1000 driver, or decided that the

Even though this is "Won't somebody please think of the users?" argument such work would be much welcomed. Sending patches would be a great start..

> new e1000 driver is more superior in architecture and do the required work to make the new e1000 driver a full replacement for the old one.

Right, like everyone actually does things this way..

I wonder why do we have OSS, old Firewire and IDE stacks still around then?

> Regards,
>
> Anthony Liguori

> Unwritten code tends to always sound nicer, but it remains to be seen if it can deliver what it promises.
>
> From a abstract stand point having efficient paravirtual IO interfaces seem attractive.
>
> I also personally don't see a big problem in having another set of virtual drivers -- Linux already has plenty (vmware, xen, virtio, power, s390-vm, ...) and it's not that they would be a particular maintenance burden impacting the kernel core.

Exactly, I also don't see any problem here, especially since AlacrityVM drivers have much cleaner design / internal architecture than some of their competitors..

--
Bartlomiej Zolnierkiewicz
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
On Tue, Dec 22, 2009 at 05:00:52PM +0100, Alexander Graf wrote:
> Avi Kivity wrote:
>> On 12/22/2009 05:41 PM, Alexander Graf wrote:
>>>> We could certainly extend emulate.c to fetch instruction bytes from userspace. It uses ->read_std() now, so we'd need to switch to ->read_emulated() and add appropriate buffering.
>>>
>>> I thought the policy on emulate.c was to not have a full instruction emulator but only emulate instructions that do PT modifications or MMIO access?
>>
>> It's not a policy, just laziness. With emulate_invalid_guest_state=1 we need many more instructions. Of course I don't want to add instructions just for the sake of it, since they will be untested. I'd much prefer not to run from mmio if possible - just pointing out it's doable.
>
> Right...
>
>>> Btw, we're in the same situation with PowerPC here. The instruction emulator is _really_ small. It only does a few MMU specific instructions, a couple of privileged ones and MMIO accessing ones.
>>
>> Plus, you have a fixed length instruction length, likely more regular too. I imagine powerpc is load/store, so you don't have to emulate a zillion ALU instructions?
>
> Well, it's certainly doable (and easier than on x86). But I'm on the same position as you on the x86 side. Why increase the emulator size at least 10 times if we don't have to?
>
> Either way, people will report bugs when / if they actually start executing code off MMIO. So let's not care too much about it for now. Just make sure the read-only check is in.
>
> Alex

So I think all we need is this on top?

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 066fdb6..0c3c8f4 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -233,7 +233,8 @@ static void assigned_dev_iomem_map_slow(PCIDevice *pci_dev, int region_num,
     int m;

     DEBUG("slow map\n");
-    m = cpu_register_io_memory(slow_bar_read, slow_bar_write, region);
+    m = cpu_register_io_memory(slow_bar_read, region_num == PCI_ROM_SLOT ?
+                               NULL : slow_bar_write, region);
     cpu_register_physical_memory(e_phys, e_size, m);

     /* MSI-X MMIO page */
Re: [PATCH] slow_map: minor improvements to ROM BAR handling
Michael S. Tsirkin wrote:
> On Tue, Dec 22, 2009 at 05:00:52PM +0100, Alexander Graf wrote:
>> Avi Kivity wrote:
>>> On 12/22/2009 05:41 PM, Alexander Graf wrote:
>>>>> We could certainly extend emulate.c to fetch instruction bytes from userspace. It uses ->read_std() now, so we'd need to switch to ->read_emulated() and add appropriate buffering.
>>>>
>>>> I thought the policy on emulate.c was to not have a full instruction emulator but only emulate instructions that do PT modifications or MMIO access?
>>>
>>> It's not a policy, just laziness. With emulate_invalid_guest_state=1 we need many more instructions. Of course I don't want to add instructions just for the sake of it, since they will be untested. I'd much prefer not to run from mmio if possible - just pointing out it's doable.
>>
>> Right...
>>
>>>> Btw, we're in the same situation with PowerPC here. The instruction emulator is _really_ small. It only does a few MMU specific instructions, a couple of privileged ones and MMIO accessing ones.
>>>
>>> Plus, you have a fixed length instruction length, likely more regular too. I imagine powerpc is load/store, so you don't have to emulate a zillion ALU instructions?
>>
>> Well, it's certainly doable (and easier than on x86). But I'm on the same position as you on the x86 side. Why increase the emulator size at least 10 times if we don't have to?
>>
>> Either way, people will report bugs when / if they actually start executing code off MMIO. So let's not care too much about it for now. Just make sure the read-only check is in.
>>
>> Alex
>
> So I think all we need is this on top?
>
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 066fdb6..0c3c8f4 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -233,7 +233,8 @@ static void assigned_dev_iomem_map_slow(PCIDevice *pci_dev, int region_num,
>      int m;
>
>      DEBUG("slow map\n");
> -    m = cpu_register_io_memory(slow_bar_read, slow_bar_write, region);
> +    m = cpu_register_io_memory(slow_bar_read, region_num == PCI_ROM_SLOT ?
> +                               NULL : slow_bar_write, region);
>      cpu_register_physical_memory(e_phys, e_size, m);
>
>      /* MSI-X MMIO page */

I guess so, yes. I'd prefer a written out if statement though, but that's probably personal preference.

Alex
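For comparison, here is one way the "written out if statement" could look. This is a standalone sketch rather than QEMU code: the handler type, the handler table, and `EXAMPLE_ROM_SLOT` are all hypothetical stand-ins for `slow_bar_write` and `PCI_ROM_SLOT`, and the real registration call is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for PCI_ROM_SLOT; the value is for illustration only. */
#define EXAMPLE_ROM_SLOT 6

/* Hypothetical write-handler type; not the QEMU signature. */
typedef void (*example_write_fn)(void *opaque, uint32_t addr, uint32_t val);

static void example_slow_write(void *opaque, uint32_t addr, uint32_t val)
{
    (void)opaque;
    (void)addr;
    (void)val;
}

/* Stand-in for the slow_bar_write handler table (byte, word, dword). */
static example_write_fn const example_slow_bar_write[3] = {
    example_slow_write, example_slow_write, example_slow_write,
};

/* Same decision as the ternary in the patch, written as an if statement. */
static example_write_fn const *example_pick_write_handlers(int region_num)
{
    if (region_num == EXAMPLE_ROM_SLOT) {
        /* ROM BAR: register no write handlers, so the region is read-only. */
        return NULL;
    }
    return example_slow_bar_write;
}
```

The result would then be passed where the patch passes the ternary expression. Both forms make the same choice; the difference is only readability.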
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/2009 06:21 PM, Andi Kleen wrote:
>> So far, the only actual technical advantage I've seen is that vbus avoids EOI exits.
>
> The technical advantage is that it's significantly faster today.
>
> Maybe your proposed alternative is as fast, or maybe it's not. Who knows?

We're working on numbers for the proposed alternative, so we should know soon. Are the AlacrityVM folks working on having all the virtio drivers for all the virtio archs?

>> We shouldn't drop everything and switch to new code just because someone came up with a new idea. The default should be to enhance the existing code.
>>
>> We think we understand why vbus does better than the current userspace virtio backend. That's why we're building vhost-net. It's not done yet, but our expectation is that it will do just as well if not better.
>
> That's the vapourware vs working code disconnect I mentioned. One side has hard numbers and working code and the other has expectations. I usually find it sad when the vapourware holds up the working code.

vhost-net is working code and is queued for 2.6.33.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/6 v2] Add support for RDTSCP in VMX
On Sun, Dec 20, 2009 at 11:30:00AM +0200, Avi Kivity wrote:
> On 12/18/2009 10:48 AM, Sheng Yang wrote:
>
> Applied all, thanks.

Need to save/restore MSR_TSC_AUX in qemu-kvm.
[PATCH] vhost-net: comment use of invalid fd when setting vhost backend
This looks like an error case, but it's just a special case to shutdown the backend. Clarify with a comment.

Signed-off-by: Chris Wright chr...@redhat.com
---
 drivers/vhost/net.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 22d5fef..cc92086 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -465,6 +465,7 @@ static struct socket *get_tun_socket(int fd)
 static struct socket *get_socket(int fd)
 {
 	struct socket *sock;
+	/* special case to disable backend */
 	if (fd == -1)
 		return NULL;
 	sock = get_raw_socket(fd);
Re: [PATCH 0/6 v2] Add support for RDTSCP in VMX
On 12/22/2009 07:26 PM, Marcelo Tosatti wrote:
> On Sun, Dec 20, 2009 at 11:30:00AM +0200, Avi Kivity wrote:
>> On 12/18/2009 10:48 AM, Sheng Yang wrote:
>>
>> Applied all, thanks.
>
> Need to save/restore MSR_TSC_AUX in qemu-kvm.

Should be automatic. After all, we expose all the MSR list, so qemu can read it and save everything there.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/2009 07:36 PM, Gregory Haskins wrote:
>> Gregory, it would be nice if you worked _much_ harder with the KVM folks before giving up.
>
> I think the 5+ months that I politely tried to convince the KVM folks that this was a good idea was pretty generous of my employer. The KVM maintainers have ultimately made it clear they are not interested in directly supporting this concept (which is their prerogative), but are perhaps willing to support the peripheral logic needed to allow it to easily interface with KVM. I can accept that, and thus AlacrityVM was born.

Review pointed out locking issues with xinterface which I have not seen addressed. I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/09 1:53 PM, Avi Kivity wrote:
> On 12/22/2009 07:36 PM, Gregory Haskins wrote:
>>> Gregory, it would be nice if you worked _much_ harder with the KVM folks before giving up.
>>
>> I think the 5+ months that I politely tried to convince the KVM folks that this was a good idea was pretty generous of my employer. The KVM maintainers have ultimately made it clear they are not interested in directly supporting this concept (which is their prerogative), but are perhaps willing to support the peripheral logic needed to allow it to easily interface with KVM. I can accept that, and thus AlacrityVM was born.
>
> Review pointed out locking issues with xinterface which I have not seen addressed. I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.

Yes, I understand. I've been too busy to rework the code for an upstream push. I will certainly address those questions when I make the next attempt, but they weren't relevant to the guest side.
[PATCH] vhost-net: defer f->private_data until setup succeeds
Trivial change, just for readability. The filp is not installed on failure, so the current code is not incorrect (also vhost_dev_init currently has no failure case). This just treats setting f->private_data as something with global scope (sure, true only after fd_install).

Signed-off-by: Chris Wright chr...@redhat.com
---
 drivers/vhost/net.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 22d5fef..0697ab2 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -326,7 +326,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	int r;
 	if (!n)
 		return -ENOMEM;
-	f->private_data = n;
 	n->vqs[VHOST_NET_VQ_TX].handle_kick = handle_tx_kick;
 	n->vqs[VHOST_NET_VQ_RX].handle_kick = handle_rx_kick;
 	r = vhost_dev_init(&n->dev, n->vqs, VHOST_NET_VQ_MAX);
@@ -338,6 +337,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
 	n->tx_poll_state = VHOST_NET_POLL_DISABLED;
+
+	f->private_data = n;
+
 	return 0;
 }
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/09 1:53 PM, Avi Kivity wrote:
> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.

BTW: the ioeventfd issue just fell through the cracks, so sorry about that. Note that I have no specific issue with irqfd ever since the lockless IRQ injection code was added.

ioeventfd turned out to be suboptimal for me in the fast path for two reasons:

1) the underlying eventfd is called in atomic context. I had posted patches to Davide to address that limitation, but I believe he rejected them on the grounds that they are only relevant to KVM.

2) it cannot retain the data field passed in the PIO. I wanted to have one vector that could tell me what value was written, and this cannot be expressed in ioeventfd.

Based on this, it was a better decision to add a ioevent interface to xinterface. It neatly solves both problems.

Kind Regards,
-Greg
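Point 2 above follows from how a plain eventfd behaves: it is a 64-bit counter, so a reader learns that something was signalled (and how much was added in total), not which individual values were written. The userspace sketch below illustrates that counter behaviour on Linux. It is not KVM's ioeventfd code; the function name is made up for the example, and how KVM signals the eventfd internally is not shown here.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Hypothetical demo: write two values to a fresh (non-semaphore) eventfd
 * and read it back once. The single read returns the sum of both writes
 * and resets the counter, so the individual values are not recoverable.
 * Returns 0 on success, -1 on any failure.
 */
static int example_eventfd_two_writes(uint64_t first, uint64_t second,
                                      uint64_t *seen)
{
    int rc = -1;
    int fd = eventfd(0, 0);

    if (fd < 0)
        return -1;

    if (write(fd, &first, sizeof(first)) == (ssize_t)sizeof(first) &&
        write(fd, &second, sizeof(second)) == (ssize_t)sizeof(second) &&
        read(fd, seen, sizeof(*seen)) == (ssize_t)sizeof(*seen))
        rc = 0;

    close(fd);
    return rc;
}
```

Writing 5 and then 7 yields a single read of 12. A consumer that needs the written value itself would have to get it through some other channel, which is the gap the message describes.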
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/2009 09:15 PM, Gregory Haskins wrote:
> On 12/22/09 1:53 PM, Avi Kivity wrote:
>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>
> BTW: the ioeventfd issue just fell through the cracks, so sorry about that. Note that I have no specific issue with irqfd ever since the lockless IRQ injection code was added.
>
> ioeventfd turned out to be suboptimal for me in the fast path for two reasons:
>
> 1) the underlying eventfd is called in atomic context. I had posted patches to Davide to address that limitation, but I believe he rejected them on the grounds that they are only relevant to KVM.

If you're not doing something pretty minor, you're better off waking up a thread (perhaps _sync if you want to keep on the same cpu). With the new user return notifier thingie, that's pretty cheap.

> 2) it cannot retain the data field passed in the PIO. I wanted to have one vector that could tell me what value was written, and this cannot be expressed in ioeventfd.

It would be easier to add data logging support to ioeventfd, if it was needed that badly.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/09 2:32 PM, Gregory Haskins wrote:
> On 12/22/09 2:25 PM, Avi Kivity wrote:
>> If you're not doing something pretty minor, you're better off waking up a thread (perhaps _sync if you want to keep on the same cpu). With the new user return notifier thingie, that's pretty cheap.
>
> We have exploits that take advantage of IO heuristics. When triggered they do more work in vcpu context than normal, which reduces latency under certain circumstances. But you definitely do _not_ want to do them in-atomic ;)

And I almost forgot: dev->call() is an RPC to the backend device. Therefore, it must be synchronous, yet we don't want it locked either. I think that was actually the primary motivation for the change, now that I think about it.

-Greg
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/09 2:38 PM, Avi Kivity wrote:
> On 12/22/2009 09:32 PM, Gregory Haskins wrote:
>> xinterface, as it turns out, is a great KVM interface for me and easy to extend, all without conflicting with the changes in upstream. The old way was via the kvm ioctl interface, but that sucked as the ABI was always moving. Where is the problem? ioeventfd still works fine as it is.
>
> It means that kvm locking suddenly affects more of the kernel.

Thats ok. This would only be w.r.t. devices that are bound to the KVM instance anyway, so they better know what they are doing (and they do).

-Greg
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/2009 09:32 PM, Gregory Haskins wrote:
> Besides, Davide has already expressed dissatisfaction with the KVM-isms creeping into eventfd, so its not likely to ever be accepted regardless of your own disposition.

Why don't you duplicate eventfd, then, should be easier than duplicating virtio.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/2009 09:41 PM, Gregory Haskins wrote:
>> It means that kvm locking suddenly affects more of the kernel.
>
> Thats ok. This would only be w.r.t. devices that are bound to the KVM instance anyway, so they better know what they are doing (and they do).

It's okay to the author of that device. It's not okay to the kvm developers who are still evolving the locking and have to handle all devices that use xinterface.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/09 2:43 PM, Avi Kivity wrote:
> On 12/22/2009 09:41 PM, Gregory Haskins wrote:
>>> It means that kvm locking suddenly affects more of the kernel.
>>
>> Thats ok. This would only be w.r.t. devices that are bound to the KVM instance anyway, so they better know what they are doing (and they do).
>
> It's okay to the author of that device. It's not okay to the kvm developers who are still evolving the locking and have to handle all devices that use xinterface.

Perhaps, but like it or not, if you want to do in-kernel you need to invoke backends. And if you want to invoke backends, limiting it to thread wakeups is, well, limiting. For one, you miss out on that exploit I mentioned earlier which can help sometimes.

Besides, the direction that Marcelo and I left the mmio/pio bus was that it would go lockless eventually, not more lockful ;)

Has that changed? I honestly haven't followed whats going on in the io-bus code in a while.

-Greg
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/09 2:39 PM, Davide Libenzi wrote:
> On Tue, 22 Dec 2009, Gregory Haskins wrote:
>> On 12/22/09 1:53 PM, Avi Kivity wrote:
>>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>>
>> BTW: the ioeventfd issue just fell through the cracks, so sorry about that. Note that I have no specific issue with irqfd ever since the lockless IRQ injection code was added.
>>
>> ioeventfd turned out to be suboptimal for me in the fast path for two reasons:
>>
>> 1) the underlying eventfd is called in atomic context. I had posted patches to Davide to address that limitation, but I believe he rejected them on the grounds that they are only relevant to KVM.
>
> I thought we addressed this already, in the few hundreds of email we exchanged back then :)

We addressed the race conditions, but not the atomic callbacks. I can't remember exactly what you said, but the effect was "no", so I dropped it. ;)

This was the thread.

http://www.archivum.info/linux-ker...@vger.kernel.org/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

>> 2) it cannot retain the data field passed in the PIO. I wanted to have one vector that could tell me what value was written, and this cannot be expressed in ioeventfd.
>
> Like might have hinted in his reply, couldn't you add data support to the ioeventfd bits in KVM, instead of leaking them into mainline eventfd?

Perhaps, or even easier I could extend xinterface. Which is what I did ;)

The problem with the first proposal is that you would no longer actually have an eventfd based mechanism...so any code using ioeventfd (like Michael Tsirkin's for instance) would break.

Kind Regards,
-Greg
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/21/09 7:12 PM, Anthony Liguori wrote: On 12/21/2009 11:44 AM, Gregory Haskins wrote:

Well, surely something like SR-IOV is moving in that direction, no?

Not really, but that's a different discussion.

Ok, but my general point still stands. At some level, some crafty hardware engineer may invent something that obsoletes the need for, say, PV 802.x drivers because it can hit 40GE line rate at the same performance level of bare metal with some kind of pass-through trick. But I still do not see that as an excuse for sloppy software in the meantime, as there will always be older platforms, older IO cards, or different IO types that are not beneficiaries of said hw based optimizations.

But let's focus on concrete data. For a given workload, how many exits do you see due to EOI?

It's of course highly workload dependent, and I've published these details in the past, I believe. Off the top of my head, I recall that virtio-pci tends to throw about 65k exits per second, vs about 32k/s for venet on a 10GE box, but I don't recall what ratio of those exits are EOI.

Was this userspace virtio-pci or was this vhost-net?

Both, actually, though userspace is obviously even worse.

If it was the former, then were you using MSI-X?

MSI-X

If you weren't, there would be an additional (rather heavy) exit per-interrupt to clear the ISR which would certainly account for a large portion of the additional exits.

Yep, if you don't use MSI it is significantly worse as expected.

To be perfectly honest, I don't care. I do not discriminate against the exit type...I want to eliminate as many as possible, regardless of the type. That's how you go fast and yet use less CPU.

It's important to understand why one mechanism is better than another.

Agreed, but note _I_ already understand why. I've certainly spent countless hours/emails trying to get others to understand as well, but it seems most are too busy to actually listen.
All I'm looking for is a set of bullet points that say, vbus does this, vhost-net does that, therefore vbus is better. We would then either say, oh, that's a good idea, let's change vhost-net to do that, or we would say, hrm, well, we can't change vhost-net to do that because of some fundamental flaw, let's drop it and adopt vbus. It's really that simple :-)

This has all been covered ad nauseam, directly with yourself in many cases. Google is your friend. Here are some tips while you research: Do not fall into the trap of vhost-net vs vbus, or venet vs virtio-net, or you miss the point entirely. Recall that venet was originally crafted to demonstrate the virtues of my three performance objectives (kill exits, reduce exit overhead, and run concurrently). Then there is all the stuff we are laying on top, like qos, real-time, advanced fabrics, and easy adoption for various environments (so it doesn't need to be redefined each time). Therefore if you only look at the limited feature set of virtio-net, you will miss the majority of the points of the framework. virtio tried to capture some of these ideas, but it missed the mark on several levels and was only partially defined. Incidentally, you can still run virtio over vbus if desired, but so far no one has tried to use my transport.

They should be relatively rare because obtaining good receive batching is pretty easy.

Batching is poor man's throughput (it's easy when you don't care about latency), so we generally avoid it as much as possible.

Fair enough.

Considering these are lightweight exits (on the order of 1-2us),

APIC EOIs on x86 are MMIO based, so they are generally much heavier than that. I measure at least 4-5us just for the MMIO exit on my Woodcrest, never mind executing the locking/apic-emulation code.

You won't like to hear me say this, but Woodcrests are pretty old and clunky as far as VT goes :-)

Fair enough.

On a modern Nehalem, I would be surprised if an MMIO exit handled in the kernel was much more than 2us.
The hardware is getting very, very fast. The trends here are very important to consider when we're looking at architectures that we potentially are going to support for a long time.

The exit you do not take will always be infinitely faster.

you need an awfully large amount of interrupts before you get really significant performance impact. You would think NAPI would kick in at this point anyway.

Whether NAPI can kick in or not is workload dependent, and it also does not address coincident events. But on that topic, you can think of AlacrityVM's interrupt controller as NAPI for interrupts, because it operates on the same principle. For what it's worth, it also operates on a NAPI for hypercalls concept too.

The concept of always batching hypercalls has certainly been explored within the context of Xen.

I am not talking about batching, which again is a poor man's throughput trick at the
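The "NAPI for interrupts" principle mentioned above can be sketched as a small model: the first event raises an interrupt and masks further ones, the consumer then polls until the queue is drained, and only then are interrupts unmasked again. This is a hypothetical illustration of the general NAPI-style idea; the struct and function names are invented here, and it is not AlacrityVM's interrupt controller or any kernel code.

```c
#include <stddef.h>

/* Hypothetical model of NAPI-style interrupt mitigation. */
struct coalescer {
    size_t pending;     /* events queued but not yet consumed */
    int    irq_masked;  /* nonzero while the consumer is in polling mode */
    size_t interrupts;  /* interrupts actually delivered */
};

/* Producer side: queue one event, interrupt only if the consumer is idle. */
void post_event(struct coalescer *c)
{
    c->pending++;
    if (!c->irq_masked) {
        c->interrupts++;
        c->irq_masked = 1;  /* consumer polls from here on */
    }
}

/* Consumer side: handle up to `budget` events, unmask once drained. */
size_t poll_events(struct coalescer *c, size_t budget)
{
    size_t done = 0;

    while (c->pending > 0 && done < budget) {
        c->pending--;
        done++;
    }
    if (c->pending == 0)
        c->irq_masked = 0;  /* queue empty: go back to interrupt mode */
    return done;
}

/* Post `bursts` bursts of `burst_len` events, draining after each burst. */
size_t interrupts_for_bursts(size_t bursts, size_t burst_len)
{
    struct coalescer c = {0, 0, 0};

    for (size_t b = 0; b < bursts; b++) {
        for (size_t i = 0; i < burst_len; i++)
            post_event(&c);
        while (c.pending > 0)
            poll_events(&c, 64);
    }
    return c.interrupts;
}
```

In this model, ten bursts of 100 events cost ten interrupts rather than 1000, while 1000 isolated events still cost 1000 interrupts, which matches the remark that whether the mitigation kicks in is workload dependent.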
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On 12/22/2009 11:33 AM, Andi Kleen wrote:

We're not talking about vaporware. vhost-net exists.

Is it as fast as the alacrityvm setup then e.g. for network traffic? Last I heard the first could do wirespeed 10Gbit/s on standard hardware.

I'm very wary of any such claims. As far as I know, no one has done an exhaustive study of vbus and published the results. This is why it's so important to understand why the results are what they are when we see numbers posted.

For instance, check out http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf slide 32. These benchmarks show KVM without vhost-net pretty closely pacing native. With large message sizes, it's awfully close to line rate.

Comparatively speaking, consider http://developer.novell.com/wiki/index.php/AlacrityVM/Results vbus here is pretty far off of native and virtio-net is ridiculous.

Why are the results so different? Because benchmarking is fickle and networking performance is complicated. No one benchmarking scenario is going to give you a very good picture overall. It's also relatively easy to stack the cards in favor of one approach versus another. The virtio-net setup probably made extensive use of pinning and other tricks to make things faster than a normal user would see them. It ends up creating a perfect combination of batching which is pretty much just cooking the mitigation schemes to do extremely well for one benchmark.

This is why it's so important to look at vbus from the perspective of critically asking, what precisely makes it better than virtio. A couple benchmarks on a single piece of hardware does not constitute an existence proof that it's better overall. There are a ton of differences between virtio and vbus because vbus was written in a vacuum wrt virtio. I'm not saying we are totally committed to virtio no matter what, but it should take a whole lot more than a couple netperf runs on a single piece of hardware for a single kind of driver to justify replacing it.
Can vhost-net do the same thing?

I think the fundamental question is, what makes vbus better than vhost-net? vhost-net exists and is further along upstream than vbus is at the moment. If that question cannot be answered with technical facts and numbers to back them up, then we're just arguing for the sake of arguing.

Regards, Anthony Liguori

-Andi

-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On Tue, 22 Dec 2009, Gregory Haskins wrote: On 12/22/09 2:39 PM, Davide Libenzi wrote: On Tue, 22 Dec 2009, Gregory Haskins wrote: On 12/22/09 1:53 PM, Avi Kivity wrote:

I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.

BTW: the ioeventfd issue just fell through the cracks, so sorry about that. Note that I have no specific issue with irqfd ever since the lockless IRQ injection code was added. ioeventfd turned out to be suboptimal for me in the fast path for two reasons:

1) the underlying eventfd is called in atomic context. I had posted patches to Davide to address that limitation, but I believe he rejected them on the grounds that they are only relevant to KVM.

I thought we addressed this already, in the few hundreds of emails we exchanged back then :)

We addressed the race conditions, but not the atomic callbacks. I can't remember exactly what you said, but the effect was "no", so I dropped it. ;)

This was the thread. http://www.archivum.info/linux-ker...@vger.kernel.org/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

Didn't that end up in schedule_work() being just fine, and no need was there for pre-emptible callbacks?

2) it cannot retain the data field passed in the PIO. I wanted to have one vector that could tell me what value was written, and this cannot be expressed in ioeventfd.

Like might have hinted in his reply, couldn't you add data support to the ioeventfd bits in KVM, instead of leaking them into mainline eventfd?

Perhaps, or even easier I could extend xinterface. Which is what I did ;)

The problem with the first proposal is that you would no longer actually have an eventfd based mechanism...so any code using ioeventfd (like Michael Tsirkin's for instance) would break.

At that point, the KVM eventfd can take care of things so that Michael's bits do not break.
- Davide
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
On Tue, Dec 22, 2009 at 12:36, Gregory Haskins gregory.hask...@gmail.com wrote: On 12/22/09 2:57 AM, Ingo Molnar wrote: * Gregory Haskins gregory.hask...@gmail.com wrote:

Actually, these patches have nothing to do with the KVM folks. [...]

That claim is curious to me - the AlacrityVM host

It's quite simple, really. These drivers support accessing vbus, and vbus is hypervisor agnostic. In fact, vbus isn't necessarily even hypervisor related. It may be used anywhere where a Linux kernel is the io backend, which includes hypervisors like AlacrityVM, but also userspace apps, and interconnected physical systems as well.

The vbus-core on the backend, and the drivers on the frontend operate completely independent of the underlying hypervisor. A glue piece called a connector ties them together, and any hypervisor specific details are encapsulated in the connector module. In this case, the connector surfaces to the guest side as a pci-bridge, so even that is not hypervisor specific per se. It will work with any pci-bridge that exposes a compatible ABI, which conceivably could be actual hardware.

This is actually something that is of particular interest to me. I have a few prototype boards right now with programmable PCI-E host/device links on them; one of my long-term plans is to finagle vbus into providing multiple virtual devices across that single PCI-E interface. Specifically, I want to be able to provide virtual NIC(s), serial ports and serial consoles, virtual block storage, and possibly other kinds of interfaces.

My big problem with existing virtio right now (although I would be happy to be proven wrong) is that it seems to need some sort of out-of-band communication channel for setting up devices, not to mention it seems to need one PCI device per virtual device. So I would love to be able to port something like vbus to my nifty PCI hardware and write some backend drivers...
then my PCI-E connected systems would dynamically provide a list of highly-efficient virtual devices to each other, with only one 4-lane PCI-E bus.

Cheers, Kyle Moffett
Re: [PATCH] vhost: fix high 32 bit in FEATURES ioctls
On Tue, 22 Dec 2009 10:09:33 pm Michael S. Tsirkin wrote:

From: David Stevens dlstev...@us.ibm.com Subject: vhost: fix high 32 bit in FEATURES ioctls

Thanks, applied.

Rusty.
Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33
* Anthony Liguori anth...@codemonkey.ws wrote: On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:

new e1000 driver is more superior in architecture and do the required work to make the new e1000 driver a full replacement for the old one.

Right, like everyone actually does things this way.. I wonder why do we have OSS, old Firewire and IDE stacks still around then?

And it's always a source of pain, isn't it.

Even putting aside the fact that such overlap sucks and is a pain to users (and that 98% of driver and subsystem version transitions are done completely seamlessly to users - the examples that were cited were the odd ones out of 150K commits in the past 4 years, 149K+ of which are seamless), the comparison does not even apply really.

e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not fully understood externality, with its inevitable set of compatibility woes. There are often situations where one piece of hardware still works better with the old driver, for some odd (or not so odd) reason.

Also, note that the 'new' hw drivers are generally intended and are maintained as clear replacements for the old ones, and do so with minimal ABI changes - or preferably with no ABI changes at all. Most driver developers just switch from old to new and the old bits are left around and are phased out. We phased out old OSS recently.

That is a very different situation from the AlacrityVM patches, which:

- Are a pure software concept and any compatibility mismatch is self-inflicted. The patches are in fact breaking the ABI to KVM intentionally (for better or worse).

- Gregory claims that the AlacrityVM patches are not a replacement for KVM. I.e. there's no intention to phase out any 'old stuff' and it splits the pool of driver developers.

i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing is, if AlacrityVM is better, and KVM developers are not willing to fix their stuff, replace KVM with it.
It's a bit as if someone found a performance problem with sys_open() and came up with sys_open_v2() and claimed that he wants to work with the VFS developers while not really doing so but advances sys_open_v2() all the time.

Do we allow sys_open_v2() upstream, in the name of openness and diversity, letting some apps use that syscall while other apps still use sys_open()? Or do we say "enough is enough of this stupidity, come up with some strong reasons to replace sys_open, and if so, replace the thing and be done with the pain!".

Overlap and forking can still be done in special circumstances, when a project splits and a hostile fork is inevitable due to prolonged and irreconcilable differences between the parties and if there's no strong technical advantage on either side. I haven't seen evidence of this yet though: Gregory claims that he wants to 'work with the community' and the KVM guys seem to agree violently that performance can be improved - and are doing so (and are asking Gregory to take part in that effort).

The main difference is that Gregory claims that improved performance is not possible within the existing KVM framework, while the KVM developers disagree. The good news is that this is a hard, testable fact.

I think we should try _much_ harder before giving up and forking the ABI of a healthy project and intentionally inflicting pain on our users.

And, at minimum, such kinds of things _have to_ be pointed out in pull requests, because it's like utterly important. In fact I couldn't list any more important thing to point out in a pull request.

Ingo
Re: [PATCH 0/3] Improve Decrementor Implementation v2
On 12/21/2009 09:21 PM, Alexander Graf wrote:

We currently have an ugly hack called AGGRESSIVE_DEC that makes the Decrementor either work well for PPC32 or PPC64 targets. This patchset removes that hack, bringing the decrementor implementation closer to real hardware.

Applied, thanks.

-- error compiling committee.c: too many arguments to function
Re: [PATCH] Change PowerPC KVM maintainer
On 12/20/2009 11:24 PM, Alexander Graf wrote:

Progress on KVM for Embedded PowerPC has stalled, but for Book3S there's quite a lot of work to do and going on. So in agreement with Hollis and Avi, we should switch maintainers for PowerPC. I'll still demand Acks from Hollis for code that changes BookE parts when I can't say for sure if the change is ok.

Applied, thanks.

-- error compiling committee.c: too many arguments to function