date:20091222

From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
index 3b1dccf..11d9c27 100755
--- a/kvm/scripts/make-release
+++ b/kvm/scripts/make-release
@@ -45,6 +45,13 @@ tarball=$releasedir/$name.tar
 cd $(dirname $0)/../..
 git archive --prefix=$name/ --format=tar $commit > $tarball
 
+mkdir -p $tmpdir
+git cat-file -p ${commit}:roms | awk ' { print $4, $3 } ' \
+    > $tmpdir/EXTERNAL_DEPENDENCIES
+tar -rf $tarball --transform s,^,$name/, -C $tmpdir \
+    EXTERNAL_DEPENDENCIES
+rm -rf $tmpdir
+
 if [[ -n $formal ]]; then
     mkdir -p $tmpdir
     echo $name > $tmpdir/KVM_VERSION
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] Don't load options roms intended to be loaded by the bios in qemu

From: Avi Kivity a...@redhat.com

The first such option rom will load at address 0, which isn't very nice,
and the second will report a conflict and abort, which is horrible.

Signed-off-by: Avi Kivity a...@redhat.com
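
For readers who want to see the failure mode in isolation, here is a minimal sketch of the overlap check described above. It is not the qemu code: struct demo_rom, demo_first_overlap() and the skip_fw switch are hypothetical names made up for illustration. The assumption is that BIOS-loaded option roms carry no fixed address (so they appear at address 0), which is why the second one trips the check unless such entries are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a ROM list entry (not QEMU's Rom). */
struct demo_rom {
    const char *name;
    const char *fw_file;   /* non-NULL: the BIOS loads it, no fixed address */
    uint64_t addr;
    uint64_t size;
};

/*
 * Walk the entries in list order and return the index of the first one whose
 * start address lies below the end of the previously placed one, or -1 if
 * there is no conflict.  With skip_fw set, BIOS-loaded entries are ignored.
 */
static int demo_first_overlap(const struct demo_rom *roms, size_t n, int skip_fw)
{
    uint64_t free_addr = 0;
    size_t i;

    assert(roms != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (skip_fw && roms[i].fw_file) {
            continue;
        }
        if (free_addr > roms[i].addr) {
            return (int)i;
        }
        free_addr = roms[i].addr + roms[i].size;
    }
    return -1;
}
```

With two BIOS-loaded entries both sitting at address 0, the unfiltered walk reports the second entry as a conflict, while the filtered walk only looks at entries that really have an address.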

diff --git a/hw/loader.c b/hw/loader.c
index 2ceb8eb..eef385e 100644
--- a/hw/loader.c
+++ b/hw/loader.c
@@ -636,6 +636,9 @@ static void rom_reset(void *unused)
     Rom *rom;
 
     QTAILQ_FOREACH(rom, &roms, next) {
+        if (rom->fw_file) {
+            continue;
+        }
         if (rom->data == NULL)
             continue;
         cpu_physical_memory_write_rom(rom->addr, rom->data, rom->romsize);
@@ -654,6 +657,9 @@ int rom_load_all(void)
     Rom *rom;
 
     QTAILQ_FOREACH(rom, &roms, next) {
+        if (rom->fw_file) {
+            continue;
+        }
         if (addr > rom->addr) {
             fprintf(stderr, "rom: requested regions overlap "
                     "(rom %s. free=0x" TARGET_FMT_plx
@@ -752,7 +758,7 @@ void do_info_roms(Monitor *mon)
     Rom *rom;
 
     QTAILQ_FOREACH(rom, &roms, next) {
-        if (rom->addr) {
+        if (!rom->fw_file) {
             monitor_printf(mon, "addr=" TARGET_FMT_plx
                            " size=0x%06zx mem=%s name=\"%s\" \n",
                            rom->addr, rom->romsize,

[COMMIT master] Split off sysfs id retrieval

From: Alexander Graf ag...@suse.de

To retrieve the device and vendor ID from a PCI device, we need to read a
sysfs file. That code is currently hand-written at least twice, and a
later patch introduces two more calls.

So let's move that out to a function.

Signed-off-by: Alexander Graf ag...@suse.de
Signed-off-by: Avi Kivity a...@redhat.com
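
As a standalone illustration of the idea, the following sketch reads a numeric ID from a file named by a directory prefix plus a file name, then splits it into the two little-endian config-space bytes. The names demo_read_id() and demo_store_id_le() are hypothetical and not part of qemu-kvm. Unlike the helper in the diff below, this variant closes the file on every path and rejects values that do not fit in 16 bits; treat that as one possible choice, not as the committed behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read an integer (decimal, octal or 0x-prefixed hex, as %li accepts) from
 * the file <devpath><idname>.  Returns 0 on success, -1 on any failure.
 */
static int demo_read_id(const char *devpath, const char *idname, uint16_t *val)
{
    char name[256];
    long id;
    FILE *f;
    int ok;

    assert(devpath != NULL && idname != NULL && val != NULL);
    snprintf(name, sizeof(name), "%s%s", devpath, idname);
    f = fopen(name, "r");
    if (f == NULL) {
        return -1;
    }
    ok = (fscanf(f, "%li", &id) == 1);
    fclose(f);   /* closed before deciding success or failure */
    if (!ok || id < 0 || id > 0xffff) {
        return -1;
    }
    *val = (uint16_t)id;
    return 0;
}

/* Store a 16-bit ID as two bytes, low byte first, as PCI config space does. */
static void demo_store_id_le(uint8_t *config, uint16_t id)
{
    config[0] = id & 0xff;
    config[1] = (id & 0xff00) >> 8;
}
```

A caller would pass a directory path ending in a slash and a file name such as "vendor" or "device", mirroring how the two wrappers in the patch call the shared helper.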

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 4b902d7..35c8812 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -562,14 +562,46 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }
 
+static int get_real_id(const char *devpath, const char *idname, uint16_t *val)
+{
+    FILE *f;
+    char name[128];
+    long id;
+
+    snprintf(name, sizeof(name), "%s%s", devpath, idname);
+    f = fopen(name, "r");
+    if (f == NULL) {
+        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+        return -1;
+    }
+    if (fscanf(f, "%li\n", &id) == 1) {
+        *val = id;
+    } else {
+        return -1;
+    }
+    fclose(f);
+
+    return 0;
+}
+
+static int get_real_vendor_id(const char *devpath, uint16_t *val)
+{
+    return get_real_id(devpath, "vendor", val);
+}
+
+static int get_real_device_id(const char *devpath, uint16_t *val)
+{
+    return get_real_id(devpath, "device", val);
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
     char dir[128], name[128];
-    int fd, r = 0;
+    int fd, r = 0, v;
     FILE *f;
     unsigned long long start, end, size, flags;
-    unsigned long id;
+    uint16_t id;
     struct stat statbuf;
     PCIRegion *rp;
     PCIDevRegions *dev = &pci_dev->real_device;
@@ -635,31 +667,21 @@ again:
 
     fclose(f);
 
-    /* read and fill device ID */
-    snprintf(name, sizeof(name), "%svendor", dir);
-    f = fopen(name, "r");
-    if (f == NULL) {
-        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+    /* read and fill vendor ID */
+    v = get_real_vendor_id(dir, &id);
+    if (v) {
         return 1;
     }
-    if (fscanf(f, "%li\n", &id) == 1) {
-       pci_dev->dev.config[0] = id & 0xff;
-       pci_dev->dev.config[1] = (id & 0xff00) >> 8;
-    }
-    fclose(f);
+    pci_dev->dev.config[0] = id & 0xff;
+    pci_dev->dev.config[1] = (id & 0xff00) >> 8;
 
-    /* read and fill vendor ID */
-    snprintf(name, sizeof(name), "%sdevice", dir);
-    f = fopen(name, "r");
-    if (f == NULL) {
-        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+    /* read and fill device ID */
+    v = get_real_device_id(dir, &id);
+    if (v) {
         return 1;
     }
-    if (fscanf(f, "%li\n", &id) == 1) {
-       pci_dev->dev.config[2] = id & 0xff;
-       pci_dev->dev.config[3] = (id & 0xff00) >> 8;
-    }
-    fclose(f);
+    pci_dev->dev.config[2] = id & 0xff;
+    pci_dev->dev.config[3] = (id & 0xff00) >> 8;
 
     /* dealing with virtual function device */
     snprintf(name, sizeof(name), "%sphysfn/", dir);

[COMMIT master] Inform users about busy device assignment attempt

From: Alexander Graf ag...@suse.de

When using -pcidevice on a device that is already in use by a kernel driver
all the user gets is the following (very useful) information:

  Failed to assign device 04:00.0 : Device or resource busy
  Failed to deassign device 04:00.0 : Invalid argument
  Error initializing device pci-assign

Since I usually prefer to have my computer do the thinking for me, I figured
it might be a good idea to check and see if a device is actually used by a
driver. If so, tell the user.

So with this patch applied you get the following output:

  Failed to assign device 04:00.0 : Device or resource busy
  *** The driver 'igb' is occupying your device 04:00.0.
  ***
  *** You can try the following commands to free it:
  ***
  *** $ echo "8086 150a" > /sys/bus/pci/drivers/pci-stub/new_id
  *** $ echo "0000:04:00.0" > /sys/bus/pci/drivers/igb/unbind
  *** $ echo "0000:04:00.0" > /sys/bus/pci/drivers/pci-stub/bind
  *** $ echo "8086 150a" > /sys/bus/pci/drivers/pci-stub/remove_id
  ***
  Failed to deassign device 04:00.0 : Invalid argument
  Error initializing device pci-assign

That should keep people like me from doing the most obvious misuses :-).

CC: Daniel P. Berrange berra...@redhat.com
Signed-off-by: Alexander Graf ag...@suse.de
Signed-off-by: Avi Kivity a...@redhat.com
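
The two string operations this relies on can be tried out separately: taking the driver name as the last component of the symlink target, and formatting the device address used in the suggested commands. The sketch below assumes PCI domain 0000 (the patch itself notes that multidomain is not handled), and the function names are hypothetical. Note that readlink() does not NUL-terminate its result, so a real caller has to terminate or pre-zero the buffer before using helpers like these.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return a pointer to the text after the last '/' in path, or NULL when the
 * path contains no '/' at all.
 */
static const char *demo_last_component(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : NULL;
}

/*
 * Format "0000:bb:dd.f" into buf.  Returns 0 on success, -1 if the buffer
 * is too small (the output would have been truncated).
 */
static int demo_format_pci_addr(char *buf, size_t len,
                                unsigned bus, unsigned dev, unsigned func)
{
    int n = snprintf(buf, len, "0000:%02x:%02x.%x", bus, dev, func);

    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

For a link target such as ../../../bus/pci/drivers/igb (an illustrative value), the first helper yields igb, which is the name that goes into the unbind path.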

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 35c8812..02d23d8 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -756,6 +756,54 @@ static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn)
     return (uint32_t)bus << 8 | (uint32_t)devfn;
 }
 
+static void assign_failed_examine(AssignedDevice *dev)
+{
+    char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
+    uint16_t vendor_id, device_id;
+    int r;
+
+    /* XXX implement multidomain */
+    sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%01x/",
+            dev->host.bus, dev->host.dev, dev->host.func);
+
+    sprintf(name, "%sdriver", dir);
+
+    r = readlink(name, driver, sizeof(driver));
+    if ((r <= 0) || r >= sizeof(driver) || !(ns = strrchr(driver, '/'))) {
+        goto fail;
+    }
+
+    ns++;
+
+    if (get_real_vendor_id(dir, &vendor_id) ||
+        get_real_device_id(dir, &device_id)) {
+        goto fail;
+    }
+
+    fprintf(stderr, "*** The driver '%s' is occupying your device "
+                    "%02x:%02x.%x.\n",
+            ns, dev->host.bus, dev->host.dev, dev->host.func);
+    fprintf(stderr, "***\n");
+    fprintf(stderr, "*** You can try the following commands to free it:\n");
+    fprintf(stderr, "***\n");
+    fprintf(stderr, "*** $ echo \"%04x %04x\" > /sys/bus/pci/drivers/pci-stub/"
+                    "new_id\n", vendor_id, device_id);
+    fprintf(stderr, "*** $ echo \"0000:%02x:%02x.%x\" > /sys/bus/pci/drivers/"
+                    "%s/unbind\n",
+            dev->host.bus, dev->host.dev, dev->host.func, ns);
+    fprintf(stderr, "*** $ echo \"0000:%02x:%02x.%x\" > /sys/bus/pci/drivers/"
+                    "pci-stub/bind\n",
+            dev->host.bus, dev->host.dev, dev->host.func);
+    fprintf(stderr, "*** $ echo \"%04x %04x\" > /sys/bus/pci/drivers/pci-stub"
+                    "/remove_id\n", vendor_id, device_id);
+    fprintf(stderr, "***\n");
+
+    return;
+
+fail:
+    fprintf(stderr, "Couldn't find out why.\n");
+}
+
 static int assign_device(AssignedDevice *dev)
 {
     struct kvm_assigned_pci_dev assigned_dev_data;
@@ -781,9 +829,18 @@ static int assign_device(AssignedDevice *dev)
 #endif
 
     r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
-    if (r < 0)
-       fprintf(stderr, "Failed to assign device \"%s\" : %s\n",
+    if (r < 0) {
+        fprintf(stderr, "Failed to assign device \"%s\" : %s\n",
                 dev->dev.qdev.id, strerror(-r));
+
+        switch (r) {
+            case -EBUSY:
+                assign_failed_examine(dev);
+                break;
+            default:
+                break;
+        }
+    }
     return r;
 }
 
 

[COMMIT master] Don't leak kvm_save_mpstate() to main qemu code

From: Avi Kivity a...@redhat.com

It doesn't exist outside x86, and breaks the build.  Move it to
cpu_synchronize_state() instead (only reading, not writing).

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/monitor.c b/monitor.c
index 5a9fae6..6ff6e1f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -677,7 +677,6 @@ static CPUState *mon_get_cpu(void)
         mon_set_cpu(0);
     }
     cpu_synchronize_state(cur_mon->mon_cpu);
-    kvm_save_mpstate(cur_mon->mon_cpu);
     return cur_mon->mon_cpu;
 }
 
@@ -780,7 +779,6 @@ static void do_info_cpus(Monitor *mon, QObject **ret_data)
         QObject *obj;
 
         cpu_synchronize_state(env);
-        kvm_save_mpstate(env);
 
         obj = qobject_from_jsonf("{ 'CPU': %d, 'current': %i, 'halted': %i }",
                                  env->cpu_index, env == mon->mon_cpu,
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 7b7bc0f..82e362c 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1217,6 +1217,7 @@ void kvm_arch_save_regs(CPUState *env)
 return;
 }
 }
+kvm_arch_save_mpstate(env);
 }
 
 static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function,

[COMMIT master] KVM: powerpc: Move vector to irqprio resolving to separate function

From: Alexander Graf ag...@suse.de

We're using a switch table to find the irqprio that belongs to a specific
interrupt vector. This table is part of the interrupt inject logic.

Since we'll add a new function to stop interrupts, let's move this table
out of the injection logic into a separate function.

Signed-off-by: Alexander Graf ag...@suse.de
Acked-by: Acked-by: Hollis Blanchard hol...@penguinppc.org
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 3e294bd..241795b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -125,11 +125,10 @@ void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
 	vcpu->arch.mmu.reset_msr(vcpu);
 }
 
-void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+static int kvmppc_book3s_vec2irqprio(unsigned int vec)
 {
 	unsigned int prio;
 
-	vcpu->stat.queue_intr++;
 	switch (vec) {
 	case 0x100: prio = BOOK3S_IRQPRIO_SYSTEM_RESET;		break;
 	case 0x200: prio = BOOK3S_IRQPRIO_MACHINE_CHECK;	break;
@@ -149,7 +148,15 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 	default:    prio = BOOK3S_IRQPRIO_MAX;			break;
 	}
 
-	set_bit(prio, &vcpu->arch.pending_exceptions);
+	return prio;
+}
+
+void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+{
+	vcpu->stat.queue_intr++;
+
+	set_bit(kvmppc_book3s_vec2irqprio(vec),
+		&vcpu->arch.pending_exceptions);
 #ifdef EXIT_DEBUG
 	printk(KERN_INFO "Queueing interrupt %x\n", vec);
 #endif

[COMMIT master] KVM: powerpc: Remove AGGRESSIVE_DEC

From: Alexander Graf ag...@suse.de

Because we now emulate the DEC interrupt according to real life behavior,
there's no need to keep the AGGRESSIVE_DEC hack around.

Let's just remove it.

Signed-off-by: Alexander Graf ag...@suse.de
Acked-by: Acked-by: Hollis Blanchard hol...@penguinppc.org
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index fd3ad6c..803505d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -34,12 +34,6 @@
 /* #define EXIT_DEBUG */
 /* #define EXIT_DEBUG_SIMPLE */
 
-/* Without AGGRESSIVE_DEC we only fire off a DEC interrupt when DEC turns 0.
- * When set, we retrigger a DEC interrupt after that if DEC <= 0.
- * PPC32 Linux runs faster without AGGRESSIVE_DEC, PPC64 Linux requires it. */
-
-/* #define AGGRESSIVE_DEC */
-
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "exits",       VCPU_STAT(sum_exits) },
 	{ "mmio",        VCPU_STAT(mmio_exits) },
@@ -81,7 +75,7 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
 	to_book3s(vcpu)->slb_shadow_max = get_paca()->kvm_slb_max;
 }
 
-#if defined(AGGRESSIVE_DEC) || defined(EXIT_DEBUG)
+#if defined(EXIT_DEBUG)
 static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
 {
 	u64 jd = mftb() - vcpu->arch.dec_jiffies;
@@ -273,14 +267,6 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 	unsigned long *pending = &vcpu->arch.pending_exceptions;
 	unsigned int priority;
 
-	/* XXX be more clever here - no need to mftb() on every entry */
-	/* Issue DEC again if it's still active */
-#ifdef AGGRESSIVE_DEC
-	if (vcpu->arch.msr & MSR_EE)
-		if (kvmppc_get_dec(vcpu) & 0x80000000)
-			kvmppc_core_queue_dec(vcpu);
-#endif
-
 #ifdef EXIT_DEBUG
 	if (vcpu->arch.pending_exceptions)
 		printk(KERN_EMERG "KVM: Check pending: %lx\n", vcpu->arch.pending_exceptions);

[COMMIT master] KVM: powerpc: Improve DEC handling

From: Alexander Graf ag...@suse.de

We treated the DEC interrupt like an edge based one. This is not true for
Book3s. The DEC keeps firing until mtdec is issued again and thus clears
the interrupt line.

So let's implement this logic in KVM too. This patch moves the line clearing
from the firing of the interrupt to the mtdec emulation.

This makes PPC64 guests work without AGGRESSIVE_DEC defined.

Signed-off-by: Alexander Graf ag...@suse.de
Acked-by: Acked-by: Hollis Blanchard hol...@penguinppc.org
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 269ee46..abfd0c4 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -82,6 +82,7 @@ extern void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                        struct kvm_interrupt *irq);
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 241795b..fd3ad6c 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -151,6 +151,13 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
 	return prio;
 }
 
+static void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
+					  unsigned int vec)
+{
+	clear_bit(kvmppc_book3s_vec2irqprio(vec),
+		  &vcpu->arch.pending_exceptions);
+}
+
 void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 {
 	vcpu->stat.queue_intr++;
@@ -178,6 +185,11 @@ int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
 	return test_bit(BOOK3S_INTERRUPT_DECREMENTER >> 7, &vcpu->arch.pending_exceptions);
 }
 
+void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
+}
+
 void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                 struct kvm_interrupt *irq)
 {
@@ -275,7 +287,9 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
 #endif
 	priority = __ffs(*pending);
 	while (priority <= (sizeof(unsigned int) * 8)) {
-		if (kvmppc_book3s_irqprio_deliver(vcpu, priority)) {
+		if (kvmppc_book3s_irqprio_deliver(vcpu, priority) &&
+		    (priority != BOOK3S_IRQPRIO_DECREMENTER)) {
+			/* DEC interrupts get cleared by mtdec */
 			clear_bit(priority, &vcpu->arch.pending_exceptions);
 			break;
 		}
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 06f5a9e..d8b6342 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -97,6 +97,11 @@ int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
 	return test_bit(BOOKE_IRQPRIO_DECREMENTER, &vcpu->arch.pending_exceptions);
 }
 
+void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu)
+{
+	clear_bit(BOOKE_IRQPRIO_DECREMENTER, &vcpu->arch.pending_exceptions);
+}
+
 void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                 struct kvm_interrupt *irq)
 {
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 4a9ac66..303457b 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -83,6 +83,9 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 
 	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
 #ifdef CONFIG_PPC64
+	/* mtdec lowers the interrupt line when positive. */
+	kvmppc_core_dequeue_dec(vcpu);
+
 	/* POWER4+ triggers a dec interrupt if the value is < 0 */
 	if (vcpu->arch.dec & 0x80000000) {
 		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[COMMIT master] KVM: powerpc: Change maintainer

From: Alexander Graf ag...@suse.de

Progress on KVM for Embedded PowerPC has stalled, but for Book3S there's quite
a lot of work to do and going on.

So in agreement with Hollis and Avi, we should switch maintainers for PowerPC.

Signed-off-by: Alexander Graf ag...@suse.de
Acked-by: Hollis Blanchard hol...@penguinppc.org
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/MAINTAINERS b/MAINTAINERS
index efd2ef2..0c1a696 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3139,7 +3139,7 @@ F:arch/x86/include/asm/svm.h
 F: arch/x86/kvm/svm.c
 
 KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
-M: Hollis Blanchard holl...@us.ibm.com
+M: Alexander Graf ag...@suse.de
 L: kvm-...@vger.kernel.org
 W: http://kvm.qumranet.com
 S: Supported

[COMMIT master] KVM: powerpc: Show timing option only on embedded

From: Alexander Graf ag...@suse.de

Embedded PowerPC KVM has an exit timing implementation to track and evaluate
how much time was spent in which exit path.

For Book3S, we don't implement it. So let's not expose it as a config option
either.

Signed-off-by: Alexander Graf ag...@suse.de
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 7635ba2..be28968 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -54,7 +54,7 @@ config KVM_440
 
 config KVM_EXIT_TIMING
 	bool "Detailed exit timing"
-   depends on KVM
+   depends on KVM_440 || KVM_E500
---help---
  Calculate elapsed time for every exit/enter cycle. A per-vcpu
  report is available in debugfs kvm/vm#_vcpu#_timing.

Re: Qemu vs Qemu-KVM


On 12/22/2009 12:21 AM, Mikolaj Kucharski wrote:

On Mon, Dec 21, 2009 at 09:22:52AM +0200, Gleb Natapov wrote:
   

I have personal interest in resolving RedHat's bz #508801, unfortunately
I cannot do that myself, so I wanted to ask on the list for help, but now
I'm confused where should I go.

   

Can you try kvm modules from latest kvm.git please? It looks like emulation of
push %ds fails and it was added after 2.6.32.
 

Having following GIT repositories:

git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git

Which one I should use to build my modules from? I would like to keep my
system (Fedora 12) consistent and I don't want to have any parts built
outside of rpm. I would like to contribute/help for RedHat's bz #508801
resolution, but I need some directions.

   


git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git

git submodule init
git submodule update
./configure
make sync
make
make install


Is kvm.git whole Linux kernel


Yes.  kvm-kmod downloads it (via git submodule) and extracts the kvm 
bits (via make sync).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Improve Decrementor Implementation v2


On 12/21/2009 09:21 PM, Alexander Graf wrote:

We currently have an ugly hack called AGGRESSIVE_DEC that makes the Decrementor
either work well for PPC32 or PPC64 targets.

This patchset removes that hack, bringing the decrementor implementation closer
to real hardware.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Show KVM timing option only on embedded


On 12/20/2009 11:24 PM, Alexander Graf wrote:

Embedded PowerPC KVM has an exit timing implementation to track and evaluate
how much time was spent in which exit path.

For Book3S, we don't implement it. So let's not expose it as a config option
either.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] Change PowerPC KVM maintainer


On 12/20/2009 11:24 PM, Alexander Graf wrote:

Progress on KVM for Embedded PowerPC has stalled, but for Book3S there's quite
a lot of work to do and going on.

So in agreement with Hollis and Avi, we should switch maintainers for PowerPC.

I'll still demand Acks from Hollis for code that changes BookE parts when I
can't say for sure if the change is ok.

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function


Re: [PATCH 0/3] Device Assignment fixes


On 12/17/2009 05:04 PM, Alexander Graf wrote:

While trying out I stumbled across several issues in the device assignment
code.

This set addresses the most major ones. Namely allowing passthrough of non page
aligned BAR region (like on lpfc) and telling users what to do when their
device is in use.

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function


[PATCH] slow_map: minor improvements to ROM BAR handling

ROM BAR can be handled same as regular BAR:
load_option_roms utility will take care of
copying it to RAM as appropriate.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

This patch applies on top of agraf's one,
it takes care of non-page aligned ROM BARs as well:
they mostly are taken care of, we just do not
need to warn user about them.
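
For reference, the 4K alignment test and the usual way of rounding a size up to the next page boundary can be sketched as below. The helper names and DEMO_PAGE_SIZE are hypothetical, and the sketch assumes 32-bit region sizes and 4K pages; it is not taken from hw/device-assignment.c.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x1000u   /* assumed 4K page size */

/* True when size is an exact multiple of the page size. */
static int demo_is_page_multiple(uint32_t size)
{
    return (size & (DEMO_PAGE_SIZE - 1)) == 0;
}

/* Round size up to the next page boundary (sizes already aligned stay put). */
static uint32_t demo_round_up_to_page(uint32_t size)
{
    /* Larger inputs would wrap around in the addition below. */
    assert(size <= 0xFFFFF000u);
    return (size + (DEMO_PAGE_SIZE - 1)) & ~(DEMO_PAGE_SIZE - 1);
}
```

A 2K ROM BAR, for example, fails the multiple-of-4K test and rounds up to one full page.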

 hw/device-assignment.c |   20 +++++++++-----------
 1 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 000fa61..066fdb6 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -486,25 +486,23 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
                 : PCI_BASE_ADDRESS_SPACE_MEMORY;
 
             if (cur_region->size & 0xFFF) {
-                fprintf(stderr, "PCI region %d at address 0x%llx "
-                        "has size 0x%x, which is not a multiple of 4K. "
-                        "You might experience some performance hit due to that.\n",
-                        i, (unsigned long long)cur_region->base_addr,
-                        cur_region->size);
+                if (i != PCI_ROM_SLOT) {
+                    fprintf(stderr, "PCI region %d at address 0x%llx "
+                            "has size 0x%x, which is not a multiple of 4K. "
+                            "You might experience some performance hit "
+                            "due to that.\n",
+                            i, (unsigned long long)cur_region->base_addr,
+                            cur_region->size);
+                }
                 slow_map = 1;
             }
 
-            if (slow_map && (i == PCI_ROM_SLOT)) {
-                fprintf(stderr, "ROM not aligned - can't continue\n");
-                return -1;
-            }
-
             /* map physical memory */
             pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
             if (i == PCI_ROM_SLOT) {
                 pci_dev->v_addrs[i].u.r_virtbase =
                     mmap(NULL,
-                         (cur_region->size + 0xFFF) & 0xFFFFF000,
+                         cur_region->size,
                          PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE,
                          0, (off_t) 0);
 
-- 
1.6.6.rc1.43.gf55cc

qemu-kvm-0.12 bug?

2009-12-22 Thread Thomas Treutner

Hi, 

I've switched from -0.11.0 to -0.12 and from 2.6.31.6 to 2.6.32.2 to try the 
new virtio-memory-API introduced in latest libvirt from git. I can start VMs, 
e.g. with kvm -cdrom $someiso --enable-kvm, but my domain configs do not work 
anymore. 


Domain config: http://pastebin.com/f66324669

Qemu log when doing virsh start wp01:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin 
HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M 
pc-0.11 -enable-kvm -m 1000 -smp 1 -name wp01 -uuid 
4c34e9aa-2bcd-fa8e-bc98-4417c1cb779a -chardev 
socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/wp01.monitor,server,nowait
 -monitor 
chardev:monitor -boot 
c -kernel /boot/vmlinuz-2.6.32.2 -initrd /boot/initrd.img-2.6.32.2 -append 
root=/dev/vdb noresume2 -drive 
file=/dev/disk/by-path/ip-10.0.1.1:3260-iscsi-iqn.2009-09.local:store.wp01-swap-lun-0,if=virtio,index=0,boot=on
 -drive 
file=/dev/disk/by-path/ip-10.0.1.1:3260-iscsi-iqn.2009-09.local:store.wp01-disk-lun-0,if=virtio,index=1
 -net 
nic,macaddr=00:16:3e:00:01:01,vlan=0,model=virtio,name=virtio.0 -net 
tap,fd=18,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial 
chardev:serial0 -parallel none -usb -vnc 0.0.0.0:0 -k de -vga cirrus
char device redirected to /dev/pts/3
Option 'ipv4': Use 'on' or 'off'
Failed to parse yes for dummy.ipv4
rom: requested regions overlap (rom vapic.bin. free=0x0600, 
addr=0x)
rom loading failed


Is this a bug or am I missing something?


Thanks, kr
tom


Re: qemu-kvm-0.12 bug?

2009-12-22 Thread Thomas Treutner

On Tuesday 22 December 2009 13:02:28 Thomas Treutner wrote:
> Hi,
>
> I've switched from -0.11.0 to -0.12 and from 2.6.31.6 to 2.6.32.2 to try
> the new virtio-memory-API introduced in latest libvirt from git. I can
> start VMs, e.g. with kvm -cdrom $someiso --enable-kvm, but my domain
> configs do not work anymore.

Just found the other bug report about this

Re: [PATCH] slow_map: minor improvements to ROM BAR handling

On Tue, Dec 22, 2009 at 01:05:23PM +0100, Alexander Graf wrote:
> Michael S. Tsirkin wrote:
> > ROM BAR can be handled same as regular BAR:
> > load_option_roms utility will take care of
> > copying it to RAM as appropriate.
> >
> > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > ---
> >
> > This patch applies on top of agraf's one,
> > it takes care of non-page aligned ROM BARs as well:
> > they mostly are taken care of, we just do not
> > need to warn user about them.
> >
> >  hw/device-assignment.c |   20 +++++++++-----------
> >  1 files changed, 9 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 000fa61..066fdb6 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -486,25 +486,23 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
> >                  : PCI_BASE_ADDRESS_SPACE_MEMORY;
> >  
> >              if (cur_region->size & 0xFFF) {
> > -                fprintf(stderr, "PCI region %d at address 0x%llx "
> > -                        "has size 0x%x, which is not a multiple of 4K. "
> > -                        "You might experience some performance hit due to that.\n",
> > -                        i, (unsigned long long)cur_region->base_addr,
> > -                        cur_region->size);
> > +                if (i != PCI_ROM_SLOT) {
> > +                    fprintf(stderr, "PCI region %d at address 0x%llx "
> > +                            "has size 0x%x, which is not a multiple of 4K. "
> > +                            "You might experience some performance hit "
> > +                            "due to that.\n",
> > +                            i, (unsigned long long)cur_region->base_addr,
> > +                            cur_region->size);
> > +                }
> >                  slow_map = 1;
>
> This is wrong. You're setting slow_map = 1 on code that is very likely
> to be executed inside the guest. That doesn't work.

It is? Can you really run code directly from a PCI card?
I looked at BIOS boot specification and it always talks
about shadowing PCI ROMs.

> Better pad the ROM size to page boundary and use the shadow mapping we
> have in place already.

Changing BAR size might break some drivers.
Our BIOS seems to shadow ROM instead of running it directly,
so we should be fine I think?

> Alex

[ANNOUNCE] qemu-kvm-0.12.1.1

qemu-kvm-0.12.1.1 is now available.  This release is based on the 
upstream qemu 0.12.1, plus kvm-specific enhancements.  Please see the 
original qemu 0.12.1 release announcement for details.


This release can be used with the kvm kernel modules provided by your 
distribution kernel, or by the modules in the kvm-kmod package, such as 
kvm-kmod-2.6.32.


Changes from qemu-kvm-0.12.1
- fix build error due to missing kvm_save_mpstate() on some configurations
- fix option rom loading (fixes boot=on)


http://www.linux-kvm.org

--
error compiling committee.c: too many arguments to function


Re: [ANNOUNCE] qemu-kvm-0.12.1.1

2009-12-22 Thread Nikola Ciprich

sorry to bring bad news, but it still doesn't compile (at least for me):

[r...@vmdev03 qemu-kvm-0.12.1.1]# make
  LINK  x86_64-softmmu/qemu-system-x86_64
machine.o: In function `cpu_pre_save':
/usr/src/redhat/BUILD/qemu-kvm-0.12.1.1/target-i386/machine.c:326: undefined 
reference to `kvm_save_mpstate'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

n.

On Tue, Dec 22, 2009 at 02:50:03PM +0200, Avi Kivity wrote:
> qemu-kvm-0.12.1.1 is now available.  This release is based on the
> upstream qemu 0.12.1, plus kvm-specific enhancements.  Please see
> the original qemu 0.12.1 release announcement for details.
>
> This release can be used with the kvm kernel modules provided by
> your distribution kernel, or by the modules in the kvm-kmod package,
> such as kvm-kmod-2.6.32.
>
> Changes from qemu-kvm-0.12.1
> - fix build error due to missing kvm_save_mpstate() on some configurations
> - fix option rom loading (fixes boot=on)
>
> http://www.linux-kvm.org

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-

Re: [ANNOUNCE] qemu-kvm-0.12.1.1


On 12/22/2009 03:35 PM, Nikola Ciprich wrote:

sorry to bring bad news, but it still doesn't compile (at least for me):

[r...@vmdev03 qemu-kvm-0.12.1.1]# make
   LINK  x86_64-softmmu/qemu-system-x86_64
machine.o: In function `cpu_pre_save':
/usr/src/redhat/BUILD/qemu-kvm-0.12.1.1/target-i386/machine.c:326: undefined reference to `kvm_save_mpstate'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2
   


Please provide details about your host.

--
error compiling committee.c: too many arguments to function


Re: [ANNOUNCE] qemu-kvm-0.12.1.1

2009-12-22 Thread Nikola Ciprich

CentOS5, all packages updated.
But I just noticed build fails only when configured with multiple targets,
including some more exotic (based on fedora spec).
building just using x86_64-softmmu works fine for me.
n.

On Tue, Dec 22, 2009 at 03:43:32PM +0200, Avi Kivity wrote:
 On 12/22/2009 03:35 PM, Nikola Ciprich wrote:
 sorry to bring bad news, but it still doesn't compile (at least for me):
 
 [r...@vmdev03 qemu-kvm-0.12.1.1]# make
LINK  x86_64-softmmu/qemu-system-x86_64
 machine.o: In function `cpu_pre_save':
 /usr/src/redhat/BUILD/qemu-kvm-0.12.1.1/target-i386/machine.c:326: undefined reference to `kvm_save_mpstate'
 collect2: ld returned 1 exit status
 make[1]: *** [qemu-system-x86_64] Error 1
 make: *** [subdir-x86_64-softmmu] Error 2
 
 Please provide details about your host.
 
 -- 
 error compiling committee.c: too many arguments to function
 
 


Re: [ANNOUNCE] qemu-kvm-0.12.1.1


On 12/22/2009 04:07 PM, Nikola Ciprich wrote:

CentOS5, all packages updated.
But I just noticed build fails only when configured with multiple targets,
including some more exotic (based on fedora spec).
building just using x86_64-softmmu works fine for me.
   


Okay, so at least it works out of the box (no special configure option).

Hopefully someone will figure out why other targets cause it to fail - 
looks like some files are compiled with the wrong config.h.


--
error compiling committee.c: too many arguments to function


Re: Qemu vs Qemu-KVM

2009-12-22 Thread Gleb Natapov

On Tue, Dec 22, 2009 at 09:15:57AM +0200, Gleb Natapov wrote:
 On Mon, Dec 21, 2009 at 10:21:14PM +, Mikolaj Kucharski wrote:
  On Mon, Dec 21, 2009 at 09:22:52AM +0200, Gleb Natapov wrote:
I have personal interest in resolving RedHat's bz #508801, unfortunately
I cannot do that myself, so I wanted to ask on the list for help, but now
I'm confused where should I go.

   Can you try kvm modules from latest kvm.git please? It looks like
   emulation of push %ds fails and it was added after 2.6.32.
  
  Having following GIT repositories:
  
  git://git.kernel.org/pub/scm/virt/kvm/kvm.git
  git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
  
  Which one I should use to build my modules from? I would like to keep my
  system (Fedora 12) consistent and I don't want to have any parts built
  outside of rpm. I would like to contribute/help for RedHat's bz #508801
  resolution, but I need some directions.
  
  Is kvm.git whole Linux kernel?
  
 Don't bother, I already tested upstream with OpenBSD and, as Avi said,
 the problem is somewhere else.
 
For some strange reason OpenBSD configures gsi 4 (com0) and gsi 12
(i8042) to be level triggered, active high in the ioapic. That causes KVM
to send an endless stream of interrupts into the guest, so the guest's stack
overflows into the framebuffer area. At this point KVM starts to emulate
instructions and fails. I don't know why OpenBSD configures those
interrupts incorrectly. It also ignores my attempts to override interrupt
polarity/type with ACPI tables. Somebody knowledgeable in OpenBSD should
look into why it configures the interrupt controller incorrectly. If I
override the wrong settings inside KVM, as in the patch below, OpenBSD boots,
but I doubt the com port is usable.


diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 38a2d20..fc37eac 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -121,6 +121,8 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 	default:
 		index = (ioapic->ioregsel - 0x10) >> 1;
 
+		if (!(ioapic->ioregsel & 1))
+			val &= ~0xa000;
 		ioapic_debug("change redir index %x val %x\n", index, val);
 		if (index >= IOAPIC_NUM_PINS)
 			return;
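
For readers unfamiliar with the 0xa000 mask used in the patch: in the low
dword of an IOAPIC redirection table entry, bit 13 selects the pin polarity
and bit 15 selects the trigger mode, so 0xa000 covers exactly those two bits.
The following is only an illustrative sketch; the macro and helper names are
made up here and are not taken from virt/kvm/ioapic.c.

```c
#include <stdint.h>

/* Low dword of an IOAPIC redirection table entry (82093AA layout). */
#define IOAPIC_REDIR_POLARITY_LOW   (1u << 13)  /* 0 = active high, 1 = active low */
#define IOAPIC_REDIR_TRIGGER_LEVEL  (1u << 15)  /* 0 = edge, 1 = level */

/* Same value as the 0xa000 literal in the patch. */
#define IOAPIC_REDIR_FORCE_MASK \
    (IOAPIC_REDIR_POLARITY_LOW | IOAPIC_REDIR_TRIGGER_LEVEL)

/* Hypothetical helper: force an entry to edge triggered, active high,
 * leaving the vector and all other bits untouched. */
static uint32_t ioapic_force_edge_active_high(uint32_t low_dword)
{
    return low_dword & ~IOAPIC_REDIR_FORCE_MASK;
}
```

Clearing both bits means whatever polarity and trigger mode the guest
programs, the entry ends up edge triggered and active high, which would be
consistent with the doubt above about whether the com port stays usable.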
--
Gleb.

Re: [PATCH] slow_map: minor improvements to ROM BAR handling

On Tue, Dec 22, 2009 at 02:34:42PM +0100, Alexander Graf wrote:
 Michael S. Tsirkin wrote:
  On Tue, Dec 22, 2009 at 01:05:23PM +0100, Alexander Graf wrote:

  Michael S. Tsirkin wrote:
  
  ROM BAR can be handled same as regular BAR:
  load_option_roms utility will take care of
  copying it to RAM as appropriate.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
 
  This patch applies on top of agraf's one,
  it takes care of non-page aligned ROM BARs as well:
  they mostly are taken care of, we just do not
  need to warn user about them.
 
   hw/device-assignment.c |   20 +---
   1 files changed, 9 insertions(+), 11 deletions(-)
 
  diff --git a/hw/device-assignment.c b/hw/device-assignment.c
  index 000fa61..066fdb6 100644
  --- a/hw/device-assignment.c
  +++ b/hw/device-assignment.c
  @@ -486,25 +486,23 @@ static int assigned_dev_register_regions(PCIRegion 
  *io_regions,
   : PCI_BASE_ADDRESS_SPACE_MEMORY;
   
   if (cur_region-size  0xFFF) {
  -fprintf(stderr, PCI region %d at address 0x%llx 
  -has size 0x%x, which is not a multiple of 4K. 
  -You might experience some performance hit due 
  to that.\n,
  -i, (unsigned long long)cur_region-base_addr,
  -cur_region-size);
  +if (i != PCI_ROM_SLOT) {
  +fprintf(stderr, PCI region %d at address 0x%llx 
  +has size 0x%x, which is not a multiple of 
  4K. 
  +You might experience some performance hit 
  +due to that.\n,
  +i, (unsigned long long)cur_region-base_addr,
  +cur_region-size);
  +}
   slow_map = 1;


  This is wrong. You're setting slow_map = 1 on code that is very likely
  to be executed inside the guest. That doesn't work.
  
 
  It is? Can you really run code directly from a PCI card?
  I looked at BIOS boot specification and it always talks
  about shadowing PCI ROMs.

 
 I'm not sure the BIOS is the only one executing ROMs. If it is, then I'm
 good with the change.
 Maybe it'd make sense to also add a read only flag so we don't
 accidentally try to write to the ROM region with slow_map.
 
 Alex

Correct: I think it's made readonly down the road with mprotect,
so attempt to do so will crash qemu :)
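
As a rough illustration of the mprotect behaviour mentioned here (not the
actual qemu code path, which this thread does not show), the sketch below
copies a buffer into an anonymous mapping and then drops write permission.
The helper name map_readonly_copy is hypothetical. After it returns, reads
work, but any store into the region raises SIGSEGV in the process.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: return a read-only private copy of src, or NULL
 * on failure. The caller releases it with munmap(). */
static void *map_readonly_copy(const void *src, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, src, len);

    /* From here on, writes to the mapping fault instead of succeeding. */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```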

-- 
MST

Re: [PATCH] slow_map: minor improvements to ROM BAR handling


On 12/22/2009 05:19 PM, Michael S. Tsirkin wrote:



I'm not sure the BIOS is the only one executing ROMs. If it is, then I'm
good with the change.
Maybe it'd make sense to also add a read only flag so we don't
accidentally try to write to the ROM region with slow_map.

Alex
 

Correct: I think it's made readonly down the road with mprotect,
so attempt to do so will crash qemu :)
   


Alex, are you happy with this?  I'd like to apply it.

--
error compiling committee.c: too many arguments to function


Re: [PATCH] slow_map: minor improvements to ROM BAR handling

2009-12-22 Thread Alexander Graf

Avi Kivity wrote:
 On 12/22/2009 05:19 PM, Michael S. Tsirkin wrote:

 I'm not sure the BIOS is the only one executing ROMs. If it is, then
 I'm
 good with the change.
 Maybe it'd make sense to also add a read only flag so we don't
 accidentally try to write to the ROM region with slow_map.

 Alex
  
 Correct: I think it's made readonly down the road with mprotect,
 so attempt to do so will crash qemu :)


 Alex, are you happy with this?  I'd like to apply it.

I'd like to see the read-only protection in. Apart from that I'm good on
checking it in, though I'm only awaiting the day someone runs code off
such a ROM region ;-).

Alex


Re: [PATCH] slow_map: minor improvements to ROM BAR handling

2009-12-22 Thread Alexander Graf

Avi Kivity wrote:
 On 12/22/2009 05:36 PM, Alexander Graf wrote:

 Is there a way to trap this and fprintf something?
  
 I don't think so. KVM will just trap on execution outside of RAM and
 either fail badly or throw something bad into the guest. MMIO access
 works by analyzing the instruction that accesses the MMIO address. That
 just doesn't work when we don't have an instruction to analyze.


 We could certainly extend emulate.c to fetch instruction bytes from
 userspace.  It uses ->read_std() now, so we'd need to switch to
 ->read_emulated() and add appropriate buffering.

I thought the policy on emulate.c was to not have a full instruction
emulator but only emulate instructions that do PT modifications or MMIO
access?

Btw, we're in the same situation with PowerPC here. The instruction
emulator is _really_ small. It only does a few MMU specific
instructions, a couple of privileged ones and MMIO accessing ones.

Alex

Re: [PATCH] slow_map: minor improvements to ROM BAR handling


On 12/22/2009 05:41 PM, Alexander Graf wrote:



We could certainly extend emulate.c to fetch instruction bytes from
userspace.  It uses ->read_std() now, so we'd need to switch to
->read_emulated() and add appropriate buffering.
 

I thought the policy on emulate.c was to not have a full instruction
emulator but only emulate instructions that do PT modifications or MMIO
access?
   


It's not a policy, just laziness.  With emulate_invalid_guest_state=1 we 
need many more instructions.  Of course I don't want to add instructions 
just for the sake of it, since they will be untested.


I'd much prefer not to run from mmio if possible - just pointing out 
it's doable.



Btw, we're in the same situation with PowerPC here. The instruction
emulator is _really_ small. It only does a few MMU specific
instructions, a couple of privileged ones and MMIO accessing ones.
   


Plus, you have a fixed instruction length, likely more regular
too.  I imagine powerpc is load/store, so you don't have to emulate a 
zillion ALU instructions?


--
error compiling committee.c: too many arguments to function


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

2009-12-22 Thread Bartlomiej Zolnierkiewicz

On Tuesday 22 December 2009 04:31:32 pm Anthony Liguori wrote:

 I think the comparison would be if someone submitted a second e1000 
 driver that happened to do better on one netperf test than the current 
 e1000 driver.
 
 You can argue, hey, choice is good, let's let a user choose if they want 
 to use the faster e1000 driver.  But surely, the best thing for a user 
 is to figure out why the second e1000 driver is better on that one test, 
 integrate that change into the current e1000 driver, or decide that the

Even though this is a "Won't somebody please think of the users?" argument,
such work would be much welcomed.  Sending patches would be a great start..

 new e1000 driver is superior in architecture and do the required
 work to make the new e1000 driver a full replacement for the old one.

Right, like everyone actually does things this way..

I wonder why do we have OSS, old Firewire and IDE stacks still around then?

 Regards,
 
 Anthony Liguori
 
  Unwritten code tends to always sound nicer, but it remains to be seen
  if it can deliver what it promises.
 
  From an abstract standpoint, having efficient paravirtual IO interfaces
  seems attractive.
 
  I also personally don't see a big problem in having another set of
  virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
  s390-vm, ...) and it's not that they would be a particular maintenance
  burden impacting the kernel core.

Exactly, I also don't see any problem here, especially since AlacrityVM
drivers have much cleaner design / internal architecture than some of their
competitors..

--
Bartlomiej Zolnierkiewicz

Re: [PATCH] slow_map: minor improvements to ROM BAR handling

On Tue, Dec 22, 2009 at 05:00:52PM +0100, Alexander Graf wrote:
 Avi Kivity wrote:
  On 12/22/2009 05:41 PM, Alexander Graf wrote:
 
  We could certainly extend emulate.c to fetch instruction bytes from
  userspace.  It uses ->read_std() now, so we'd need to switch to
  ->read_emulated() and add appropriate buffering.
   
  I thought the policy on emulate.c was to not have a full instruction
  emulator but only emulate instructions that do PT modifications or MMIO
  access?
 
 
  It's not a policy, just laziness.  With emulate_invalid_guest_state=1
  we need many more instructions.  Of course I don't want to add
  instructions just for the sake of it, since they will be untested.
 
  I'd much prefer not to run from mmio if possible - just pointing out
  it's doable.
 
 Right...
 
  Btw, we're in the same situation with PowerPC here. The instruction
  emulator is _really_ small. It only does a few MMU specific
  instructions, a couple of privileged ones and MMIO accessing ones.
 
  Plus, you have a fixed instruction length, likely more regular
  too.  I imagine powerpc is load/store, so you don't have to emulate a
  zillion ALU instructions?
 
 Well, it's certainly doable (and easier than on x86). But I'm on the
 same position as you on the x86 side. Why increase the emulator size at
 least 10 times if we don't have to?
 
 Either way, people will report bugs when / if they actually start
 executing code off MMIO. So let's not care too much about it for now.
 Just make sure the read-only check is in.
 
 Alex

So I think all we need is this on top?

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 066fdb6..0c3c8f4 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -233,7 +233,8 @@ static void assigned_dev_iomem_map_slow(PCIDevice *pci_dev, int region_num,
     int m;
 
     DEBUG("slow map\n");
-    m = cpu_register_io_memory(slow_bar_read, slow_bar_write, region);
+    m = cpu_register_io_memory(slow_bar_read, region_num == PCI_ROM_SLOT ?
+                               NULL : slow_bar_write, region);
     cpu_register_physical_memory(e_phys, e_size, m);
 
     /* MSI-X MMIO page */

Re: [PATCH] slow_map: minor improvements to ROM BAR handling

2009-12-22 Thread Alexander Graf

Michael S. Tsirkin wrote:
 On Tue, Dec 22, 2009 at 05:00:52PM +0100, Alexander Graf wrote:
   
 Avi Kivity wrote:
 
 On 12/22/2009 05:41 PM, Alexander Graf wrote:
   
 We could certainly extend emulate.c to fetch instruction bytes from
 userspace.  It uses ->read_std() now, so we'd need to switch to
 ->read_emulated() and add appropriate buffering.
  
   
 I thought the policy on emulate.c was to not have a full instruction
 emulator but only emulate instructions that do PT modifications or MMIO
 access?

 
 It's not a policy, just laziness.  With emulate_invalid_guest_state=1
 we need many more instructions.  Of course I don't want to add
 instructions just for the sake of it, since they will be untested.

 I'd much prefer not to run from mmio if possible - just pointing out
 it's doable.
   
 Right...

 
 Btw, we're in the same situation with PowerPC here. The instruction
 emulator is _really_ small. It only does a few MMU specific
 instructions, a couple of privileged ones and MMIO accessing ones.

 Plus, you have a fixed instruction length, likely more regular
 too.  I imagine powerpc is load/store, so you don't have to emulate a
 zillion ALU instructions?
   
 Well, it's certainly doable (and easier than on x86). But I'm on the
 same position as you on the x86 side. Why increase the emulator size at
 least 10 times if we don't have to?

 Either way, people will report bugs when / if they actually start
 executing code off MMIO. So let's not care too much about it for now.
 Just make sure the read-only check is in.

 Alex
 

 So I think all we need is this on top?

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 066fdb6..0c3c8f4 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -233,7 +233,8 @@ static void assigned_dev_iomem_map_slow(PCIDevice 
 *pci_dev, int region_num,
  int m;
  
  DEBUG(slow map\n);
 -m = cpu_register_io_memory(slow_bar_read, slow_bar_write, region);
 +m = cpu_register_io_memory(slow_bar_read, region_num == PCI_ROM_SLOT ?
 +   NULL : slow_bar_write, region);
  cpu_register_physical_memory(e_phys, e_size, m);
  
  /* MSI-X MMIO page */
   

I guess so, yes. I'd prefer a written out if statement though, but
that's probably personal preference.
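
To make the preference concrete, here is a minimal standalone sketch of the
same selection written as an if statement. The typedef is a simplified
stand-in for QEMU's write handler table type, select_write_handlers is a
hypothetical name, and the value of PCI_ROM_SLOT is assumed to match QEMU's
hw/pci.h; none of this is the code that was actually applied.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_ROM_SLOT 6  /* assumed to match QEMU's hw/pci.h */

/* Simplified stand-in for QEMU's CPUWriteMemoryFunc. */
typedef void slow_write_fn(void *opaque, uint64_t addr, uint32_t val);

/* Hypothetical helper: pick the write handler table for a region.
 * The ROM BAR gets no write handlers at all. */
static slow_write_fn *const *select_write_handlers(int region_num,
                                                   slow_write_fn *const *slow_bar_write)
{
    if (region_num == PCI_ROM_SLOT) {
        return NULL;
    }
    return slow_bar_write;
}
```

The result would then be passed as the second argument of
cpu_register_io_memory() in place of the conditional expression.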

Alex

Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


On 12/22/2009 06:21 PM, Andi Kleen wrote:

So far, the only actual technical advantage I've seen is that vbus avoids
EOI exits.
 

The technical advantage is that it's significantly faster today.

Maybe your proposed alternative is as fast, or maybe it's not. Who knows?
   


We're working on numbers for the proposed alternative, so we should know 
soon.  Are the AlacrityVM folks working on having all the virtio drivers 
for all the virtio archs?


We shouldn't drop everything and switch to new code just because someone 
came up with a new idea.  The default should be to enhance the existing 
code.



We think we understand why vbus does better than the current userspace
virtio backend.  That's why we're building vhost-net.  It's not done yet,
but our expectation is that it will do just as well if not better.
 

That's the vapourware vs working code disconnect I mentioned. One side has hard
numbers and working code and the other has expectations. I usually find it sad
when the vapourware holds up the working code.
   


vhost-net is working code and is queued for 2.6.33.

--
error compiling committee.c: too many arguments to function


Re: [PATCH 0/6 v2] Add support for RDTSCP in VMX

2009-12-22 Thread Marcelo Tosatti

On Sun, Dec 20, 2009 at 11:30:00AM +0200, Avi Kivity wrote:
 On 12/18/2009 10:48 AM, Sheng Yang wrote:

 Applied all, thanks.

Need to save/restore MSR_TSC_AUX in qemu-kvm.


[PATCH] vhost-net: comment use of invalid fd when setting vhost backend

2009-12-22 Thread Chris Wright

This looks like an error case, but it's just a special case to shutdown
the backend.  Clarify with a comment.

Signed-off-by: Chris Wright chr...@redhat.com
---
 drivers/vhost/net.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 22d5fef..cc92086 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -465,6 +465,7 @@ static struct socket *get_tun_socket(int fd)
 static struct socket *get_socket(int fd)
 {
struct socket *sock;
+   /* special case to disable backend */
if (fd == -1)
return NULL;
sock = get_raw_socket(fd);

Re: [PATCH 0/6 v2] Add support for RDTSCP in VMX


On 12/22/2009 07:26 PM, Marcelo Tosatti wrote:

On Sun, Dec 20, 2009 at 11:30:00AM +0200, Avi Kivity wrote:
   

On 12/18/2009 10:48 AM, Sheng Yang wrote:

Applied all, thanks.
 

Need to save/restore MSR_TSC_AUX in qemu-kvm.
   


Should be automatic.  After all, we expose all the MSR list, so qemu can 
read it and save everything there.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


On 12/22/2009 07:36 PM, Gregory Haskins wrote:



Gregory, it would be nice if you worked _much_ harder with the KVM folks
before giving up.
 

I think the 5+ months that I politely tried to convince the KVM folks
that this was a good idea was pretty generous of my employer.  The KVM
maintainers have ultimately made it clear they are not interested in
directly supporting this concept (which is their prerogative), but are
perhaps willing to support the peripheral logic needed to allow it to
easily interface with KVM.  I can accept that, and thus AlacrityVM was born.
   


Review pointed out locking issues with xinterface which I have not seen 
addressed.  I asked why the irqfd/ioeventfd mechanisms are insufficient, 
and you did not reply.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 1:53 PM, Avi Kivity wrote:
 On 12/22/2009 07:36 PM, Gregory Haskins wrote:

 Gregory, it would be nice if you worked _much_ harder with the KVM folks
 before giving up.
  
 I think the 5+ months that I politely tried to convince the KVM folks
 that this was a good idea was pretty generous of my employer.  The KVM
 maintainers have ultimately made it clear they are not interested in
 directly supporting this concept (which is their prerogative), but are
 perhaps willing to support the peripheral logic needed to allow it to
 easily interface with KVM.  I can accept that, and thus AlacrityVM was
 born.

 
 Review pointed out locking issues with xinterface which I have not seen
 addressed.  I asked why the irqfd/ioeventfd mechanisms are insufficient,
 and you did not reply.
 

Yes, I understand.  I've been too busy to rework the code for an
upstream push.  I will certainly address those questions when I make the
next attempt, but they weren't relevant to the guest side.




[PATCH] vhost-net: defer f->private_data until setup succeeds

2009-12-22 Thread Chris Wright

Trivial change, just for readability.  The filp is not installed on
failure, so the current code is not incorrect (also vhost_dev_init
currently has no failure case).  This just treats setting f->private_data
as something with global scope (sure, true only after fd_install).

Signed-off-by: Chris Wright chr...@redhat.com
---
 drivers/vhost/net.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 22d5fef..0697ab2 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -326,7 +326,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	int r;
 	if (!n)
 		return -ENOMEM;
-	f->private_data = n;
 	n->vqs[VHOST_NET_VQ_TX].handle_kick = handle_tx_kick;
 	n->vqs[VHOST_NET_VQ_RX].handle_kick = handle_rx_kick;
 	r = vhost_dev_init(&n->dev, n->vqs, VHOST_NET_VQ_MAX);
@@ -338,6 +337,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
 	n->tx_poll_state = VHOST_NET_POLL_DISABLED;
+
+	f->private_data = n;
+
 	return 0;
 }
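
The ordering in this patch (allocate, initialise, and only then store the
pointer where others can see it) can be shown with a small userspace
analogue. Everything below is a hypothetical stand-in: fake_file, fake_net
and fake_dev_init are not kernel types or functions, and the failure switch
exists only to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

struct fake_file { void *private_data; };   /* stand-in for struct file */
struct fake_net  { int ready; };            /* stand-in for struct vhost_net */

/* Hypothetical init step; fails on request to exercise the error path. */
static int fake_dev_init(struct fake_net *n, int fail)
{
    if (fail)
        return -EINVAL;
    n->ready = 1;
    return 0;
}

static int fake_net_open(struct fake_file *f, int fail_init)
{
    struct fake_net *n = malloc(sizeof *n);
    int r;

    if (!n)
        return -ENOMEM;
    n->ready = 0;

    r = fake_dev_init(n, fail_init);
    if (r < 0) {
        free(n);
        return r;           /* f->private_data was never touched */
    }

    f->private_data = n;    /* publish only after setup succeeded */
    return 0;
}
```

With this ordering the error path never leaves a dangling pointer in the
file structure, which is the readability point the commit message makes.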
 

Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 1:53 PM, Avi Kivity wrote:
 I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not 
 reply.
 

BTW: the ioeventfd issue just fell through the cracks, so sorry about
that.  Note that I have no specific issue with irqfd ever since the
lockless IRQ injection code was added.

ioeventfd turned out to be suboptimal for me in the fast path for two
reasons:

1) the underlying eventfd is called in atomic context.  I had posted
patches to Davide to address that limitation, but I believe he rejected
them on the grounds that they are only relevant to KVM.

2) it cannot retain the data field passed in the PIO.  I wanted to have
one vector that could tell me what value was written, and this cannot be
expressed in ioeventfd.

Based on this, it was a better decision to add a ioevent interface to
xinterface.  It neatly solves both problems.
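
For context on point 2, a plain userspace eventfd illustrates the limitation:
the object carries only a 64-bit counter, so a reader learns how many signals
arrived but not what value a writer intended to convey. This is a minimal
sketch using eventfd(2) directly, not the KVM ioeventfd path, and the helper
names are made up.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the eventfd once; the 8-byte value is added to its counter. */
static int event_signal(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Read and reset the counter. Returns the accumulated count, or 0 on
 * error; individual written values are not recoverable. */
static uint64_t event_drain(int efd)
{
    uint64_t count = 0;
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
        return 0;
    return count;
}
```

Two signals followed by one drain yield a count of 2 and nothing else, which
is why a single vector cannot report the value written by the guest.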

Kind Regards,
-Greg




Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


On 12/22/2009 09:15 PM, Gregory Haskins wrote:

On 12/22/09 1:53 PM, Avi Kivity wrote:
   

I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not 
reply.

 

BTW: the ioeventfd issue just fell through the cracks, so sorry about
that.  Note that I have no specific issue with irqfd ever since the
lockless IRQ injection code was added.

ioeventfd turned out to be suboptimal for me in the fast path for two
reasons:

1) the underlying eventfd is called in atomic context.  I had posted
patches to Davide to address that limitation, but I believe he rejected
them on the grounds that they are only relevant to KVM.
   


If you're not doing something pretty minor, you're better off waking up a
thread (perhaps _sync if you want to keep on the same cpu).  With the 
new user return notifier thingie, that's pretty cheap.



2) it cannot retain the data field passed in the PIO.  I wanted to have
one vector that could tell me what value was written, and this cannot be
expressed in ioeventfd.

   


It would be easier to add data logging support to ioeventfd, if it was 
needed that badly.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:32 PM, Gregory Haskins wrote:
 On 12/22/09 2:25 PM, Avi Kivity wrote:


 If you're not doing something pretty minor, you're better off waking up a
 thread (perhaps _sync if you want to keep on the same cpu).  With the
 new user return notifier thingie, that's pretty cheap.
 
 We have exploits that take advantage of IO heuristics.  When triggered
 they do more work in vcpu context than normal, which reduces latency
 under certain circumstances.  But you definitely do _not_ want to do
 them in-atomic ;)

And I almost forgot:  dev->call() is an RPC to the backend device.
Therefore, it must be synchronous, yet we don't want it locked either.  I
think that was actually the primary motivation for the change, now that
I think about it.

-Greg




Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:38 PM, Avi Kivity wrote:
 On 12/22/2009 09:32 PM, Gregory Haskins wrote:
 xinterface, as it turns out, is a great KVM interface for me and easy to
 extend, all without conflicting with the changes in upstream.  The old
 way was via the kvm ioctl interface, but that sucked as the ABI was
 always moving.  Where is the problem?  ioeventfd still works fine as
 it is.

 
 It means that kvm locking suddenly affects more of the kernel.
 

That's ok.  This would only be w.r.t. devices that are bound to the KVM
instance anyway, so they better know what they are doing (and they do).

-Greg




Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


On 12/22/2009 09:32 PM, Gregory Haskins wrote:

Besides, Davide has
already expressed dissatisfaction with the KVM-isms creeping into
eventfd, so it's not likely to ever be accepted regardless of your own
disposition.
   


Why don't you duplicate eventfd, then, should be easier than duplicating 
virtio.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


On 12/22/2009 09:41 PM, Gregory Haskins wrote:



It means that kvm locking suddenly affects more of the kernel.

 

That's ok.  This would only be w.r.t. devices that are bound to the KVM
instance anyway, so they better know what they are doing (and they do).

   


It's okay to the author of that device.  It's not okay to the kvm 
developers who are still evolving the locking and have to handle all 
devices that use xinterface.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:43 PM, Avi Kivity wrote:
 On 12/22/2009 09:41 PM, Gregory Haskins wrote:

 It means that kvm locking suddenly affects more of the kernel.

  
 That's ok.  This would only be w.r.t. devices that are bound to the KVM
 instance anyway, so they better know what they are doing (and they do).


 
 It's okay to the author of that device.  It's not okay to the kvm
 developers who are still evolving the locking and have to handle all
 devices that use xinterface.

Perhaps, but like it or not, if you want to do in-kernel you need to
invoke backends.  And if you want to invoke backends, limiting it to
thread wakeups is, well, limiting.  For one, you miss out on that
exploit I mentioned earlier which can help sometimes.

Besides, the direction that Marcelo and I left the mmio/pio bus was that
it would go lockless eventually, not more lockful ;)

Has that changed?  I honestly haven't followed what's going on in the
io-bus code in a while.

-Greg





Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:39 PM, Davide Libenzi wrote:
On Tue, 22 Dec 2009, Gregory Haskins wrote:

On 12/22/09 1:53 PM, Avi Kivity wrote:
I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did
not reply.

BTW: the ioeventfd issue just fell through the cracks, so sorry about
that. Note that I have no specific issue with irqfd ever since the
lockless IRQ injection code was added.

ioeventfd turned out to be suboptimal for me in the fast path for two
reasons:

1) the underlying eventfd is called in atomic context. I had posted
patches to Davide to address that limitation, but I believe he rejected
them on the grounds that they are only relevant to KVM.

I thought we addressed this already, in the few hundred emails we
exchanged back then :)

We addressed the race conditions, but not the atomic callbacks. I can't
remember exactly what you said, but the effect was no, so I dropped it. ;)

This was the thread.

http://www.archivum.info/linux-ker...@vger.kernel.org/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

2) it cannot retain the data field passed in the PIO. I wanted to have
one vector that could tell me what value was written, and this cannot be
expressed in ioeventfd.

Like might have hinted in his reply, couldn't you add data support to the
ioeventfd bits in KVM, instead of leaking them into mainline eventfd?

Perhaps, or even easier I could extend xinterface. Which is what I did ;)

The problem with the first proposal is that you would no longer actually
have an eventfd based mechanism...so any code using ioeventfd (like
Michael Tsirkin's for instance) would break.
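
A minimal sketch of point 2 above, assuming the usual eventfd semantics
(each signal adds to a counter, and a read returns the counter and clears
it). EventfdModel and pio_write are hypothetical names used only for
illustration; this is a toy model, not the kernel API or the KVM code.

```python
class EventfdModel:
    """Toy model of an eventfd counter (illustration only, not the kernel API)."""

    def __init__(self):
        self.count = 0

    def signal(self, n=1):
        # A signal only adds to the counter; it carries no payload.
        self.count += n

    def read(self):
        # A read returns the accumulated count and resets it to zero.
        value, self.count = self.count, 0
        return value


def pio_write(efd, value):
    # Assumption: the trapped write is turned into a plain event, so the
    # value the guest wrote is dropped at this point.
    efd.signal(1)


efd = EventfdModel()
pio_write(efd, 5)
pio_write(efd, 7)
events = efd.read()
print(events)  # number of writes seen; the values 5 and 7 are not recoverable
```

The consumer learns only how many writes happened, which is why a single
vector cannot report what value was written without changing the mechanism.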

Kind Regards,
-Greg


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 7:12 PM, Anthony Liguori wrote:
 On 12/21/2009 11:44 AM, Gregory Haskins wrote:
 Well, surely something like SR-IOV is moving in that direction, no?

 
 Not really, but that's a different discussion.

Ok, but my general point still stands.  At some level, some crafty
hardware engineer may invent something that obsoletes the
need for, say, PV 802.x drivers because it can hit 40GE line rate at the
same performance level of bare metal with some kind of pass-through
trick.  But I still do not see that as an excuse for sloppy software in
the meantime, as there will always be older platforms, older IO cards,
or different IO types that are not beneficiaries of said hw based
optimizations.

 
 But let's focus on concrete data.  For a given workload,
 how many exits do you see due to EOI?
  
 It's of course highly workload dependent, and I've published these
 details in the past, I believe.  Off the top of my head, I recall that
 virtio-pci tends to throw about 65k exits per second, vs about 32k/s for
 venet on a 10GE box, but I don't recall what ratio of those exits are
 EOI.
 
 Was this userspace virtio-pci or was this vhost-net?

Both, actually, though userspace is obviously even worse.

  If it was the
 former, then were you using MSI-X?

MSI-X

  If you weren't, there would be an
 additional (rather heavy) exit per-interrupt to clear the ISR which
 would certainly account for a large portion of the additional exits.


Yep, if you don't use MSI it is significantly worse as expected.


To be perfectly honest, I don't care.  I do not discriminate
 against the exit type...I want to eliminate as many as possible,
 regardless of the type.  That's how you go fast and yet use less CPU.

 
 It's important to understand why one mechanism is better than another. 

Agreed, but note _I_ already understand why.  I've certainly spent
countless hours/emails trying to get others to understand as well, but
it seems most are too busy to actually listen.


 All I'm looking for is a set of bullet points that say, vbus does this,
 vhost-net does that, therefore vbus is better.  We would then either
 say, oh, that's a good idea, let's change vhost-net to do that, or we
 would say, hrm, well, we can't change vhost-net to do that because of
 some fundamental flaw, let's drop it and adopt vbus.
 
 It's really that simple :-)

This has all been covered ad nauseam, directly with yourself in many
cases.  Google is your friend.

Here are some tips while you research:  Do not fall into the trap of
vhost-net vs vbus, or venet vs virtio-net, or you miss the point
entirely.  Recall that venet was originally crafted to demonstrate the
virtues of my three performance objectives (kill exits, reduce exit
overhead, and run concurrently). Then there is all the stuff we are
laying on top, like qos, real-time, advanced fabrics, and easy adoption
for various environments (so it doesn't need to be redefined each time).

Therefore if you only look at the limited feature set of virtio-net, you
will miss the majority of the points of the framework.  virtio tried to
capture some of these ideas, but it missed the mark on several levels
and was only partially defined.  Incidentally, you can still run virtio
over vbus if desired, but so far no one has tried to use my transport.

 
 
   They should be relatively rare
 because obtaining good receive batching is pretty easy.
  
 Batching is poor man's throughput (it's easy when you don't care about
 latency), so we generally avoid as much as possible.

 
 Fair enough.
 
 Considering
 these are lightweight exits (on the order of 1-2us),
  
 APIC EOIs on x86 are MMIO based, so they are generally much heavier than
 that.  I measure at least 4-5us just for the MMIO exit on my Woodcrest,
 never mind executing the locking/apic-emulation code.

 
 You won't like to hear me say this, but Woodcrests are pretty old and
 clunky as far as VT goes :-)

Fair enough.

 
 On a modern Nehalem, I would be surprised if an MMIO exit handled in the
 kernel was much more than 2us.  The hardware is getting very, very
 fast.  The trends here are very important to consider when we're looking
 at architectures that we potentially are going to support for a long time.

The exit you do not take will always be infinitely faster.
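
For a rough sense of scale, the figures quoted above (about 65k exits/s
for virtio-pci versus about 32k/s for venet, and somewhere between 2us and
5us per MMIO exit) can be turned into a CPU-time estimate. This is only
back-of-envelope arithmetic: it assumes every exit costs the same, which
the thread does not claim, and it ignores the emulation work done after
the exit. The function name is made up for illustration.

```python
def exit_overhead_fraction(exits_per_sec, exit_cost_us):
    """Fraction of one CPU spent in exits, assuming a uniform per-exit cost."""
    return exits_per_sec * exit_cost_us / 1_000_000


# Exit rates and per-exit costs are the approximate figures quoted in the thread.
for label, rate in (("virtio-pci", 65_000), ("venet", 32_000)):
    for cost_us in (2, 5):
        frac = exit_overhead_fraction(rate, cost_us)
        print(f"{label}: {rate} exits/s at {cost_us}us each -> {frac:.1%} of one CPU")
```

Under these assumptions, both halving the exit rate and halving the
per-exit cost cut the overhead by the same factor, so the two approaches
being argued over here are complementary rather than exclusive.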

 
 you need an awfully
 large amount of interrupts before you get really significant performance
 impact.  You would think NAPI would kick in at this point anyway.

  
 Whether NAPI can kick in or not is workload dependent, and it also does
 not address coincident events.  But on that topic, you can think of
 AlacrityVM's interrupt controller as NAPI for interrupts, because it
 operates on the same principle.  For what it's worth, it also operates on
 a NAPI for hypercalls concept too.

 
 The concept of always batching hypercalls has certainly been explored
 within the context of Xen.

I am not talking about batching, which again is a poor man's throughput
trick at the

Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

2009-12-22 Thread Anthony Liguori


On 12/22/2009 11:33 AM, Andi Kleen wrote:

We're not talking about vaporware.  vhost-net exists.
 

Is it as fast as the alacrityvm setup then e.g. for network traffic?

Last I heard the first could do wirespeed 10Gbit/s on standard hardware.
   


I'm very wary of any such claims.  As far as I know, no one has done an 
exhaustive study of vbus and published the results.  This is why it's so 
important to understand why the results are what they are when we see 
numbers posted.


For instance, check out 
http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf slide 32.


These benchmarks show KVM without vhost-net pretty closely pacing 
native.  With large message sizes, it's awfully close to line rate.


Comparatively speaking, consider 
http://developer.novell.com/wiki/index.php/AlacrityVM/Results


vbus here is pretty far off of native and virtio-net is ridiculous.

Why are the results so different?  Because benchmarking is fickle and 
networking performance is complicated.  No one benchmarking scenario is 
going to give you a very good picture overall.  It's also relatively 
easy to stack the cards in favor of one approach versus another.  The 
virtio-net setup probably made extensive use of pinning and other tricks 
to make things faster than a normal user would see them.  It ends up 
creating a perfect combination of batching which is pretty much just 
cooking the mitigation schemes to do extremely well for one benchmark.


This is why it's so important to look at vbus from the perspective of 
critically asking, what precisely makes it better than virtio.  A couple 
benchmarks on a single piece of hardware does not constitute an 
existence proof that it's better overall.


There are a ton of differences between virtio and vbus because vbus was 
written in a vacuum wrt virtio.  I'm not saying we are totally committed 
to virtio no matter what, but it should take a whole lot more than a 
couple netperf runs on a single piece of hardware for a single kind of 
driver to justify replacing it.



Can vhost-net do the same thing?


I think the fundamental question is, what makes vbus better than 
vhost-net?  vhost-net exists and is further along upstream than vbus is 
at the moment.  If that question cannot be answered with technical facts 
and numbers to back them up, then we're just arguing for the sake of 
arguing.


Regards,

Anthony Liguori


-Andi
   



Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

2009-12-22 Thread Davide Libenzi

On Tue, 22 Dec 2009, Gregory Haskins wrote:

On 12/22/09 2:39 PM, Davide Libenzi wrote:
On Tue, 22 Dec 2009, Gregory Haskins wrote:

On 12/22/09 1:53 PM, Avi Kivity wrote:
I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did
not reply.

BTW: the ioeventfd issue just fell through the cracks, so sorry about
that. Note that I have no specific issue with irqfd ever since the
lockless IRQ injection code was added.

ioeventfd turned out to be suboptimal for me in the fast path for two
reasons:

1) the underlying eventfd is called in atomic context. I had posted
patches to Davide to address that limitation, but I believe he rejected
them on the grounds that they are only relevant to KVM.

I thought we addressed this already, in the few hundred emails we
exchanged back then :)

We addressed the race conditions, but not the atomic callbacks. I can't
remember exactly what you said, but the effect was no, so I dropped it. ;)

This was the thread.

http://www.archivum.info/linux-ker...@vger.kernel.org/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

Didn't that end up with schedule_work() being just fine, and no need
for pre-emptible callbacks?

2) it cannot retain the data field passed in the PIO. I wanted to have
one vector that could tell me what value was written, and this cannot be
expressed in ioeventfd.

Like might have hinted in his reply, couldn't you add data support to the
ioeventfd bits in KVM, instead of leaking them into mainline eventfd?

Perhaps, or even easier I could extend xinterface. Which is what I did ;)

The problem with the first proposal is that you would no longer actually
have an eventfd based mechanism...so any code using ioeventfd (like
Michael Tsirkin's for instance) would break.

At that point, the KVM eventfd can take care of things so that Michael's
bits do not break.

- Davide


Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

2009-12-22 Thread Kyle Moffett

On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
gregory.hask...@gmail.com wrote:
 On 12/22/09 2:57 AM, Ingo Molnar wrote:
 * Gregory Haskins gregory.hask...@gmail.com wrote:
 Actually, these patches have nothing to do with the KVM folks. [...]

 That claim is curious to me - the AlacrityVM host

 It's quite simple, really.  These drivers support accessing vbus, and
 vbus is hypervisor agnostic.  In fact, vbus isn't necessarily even
 hypervisor related.  It may be used anywhere where a Linux kernel is the
 io backend, which includes hypervisors like AlacrityVM, but also
 userspace apps, and interconnected physical systems as well.

 The vbus-core on the backend, and the drivers on the frontend operate
 completely independent of the underlying hypervisor.  A glue piece
 called a connector ties them together, and any hypervisor specific
 details are encapsulated in the connector module.  In this case, the
 connector surfaces to the guest side as a pci-bridge, so even that is
 not hypervisor specific per se.  It will work with any pci-bridge that
 exposes a compatible ABI, which conceivably could be actual hardware.

This is actually something that is of particular interest to me.  I
have a few prototype boards right now with programmable PCI-E
host/device links on them; one of my long-term plans is to finagle
vbus into providing multiple virtual devices across that single
PCI-E interface.

Specifically, I want to be able to provide virtual NIC(s), serial
ports and serial consoles, virtual block storage, and possibly other
kinds of interfaces.  My big problem with existing virtio right now
(although I would be happy to be proven wrong) is that it seems to
need some sort of out-of-band communication channel for setting up
devices, not to mention it seems to need one PCI device per virtual
device.

So I would love to be able to port something like vbus to my nifty PCI
hardware and write some backend drivers... then my PCI-E connected
systems would dynamically provide a list of highly-efficient virtual
devices to each other, with only one 4-lane PCI-E bus.

Cheers,
Kyle Moffett

Re: [PATCH] vhost: fix high 32 bit in FEATURES ioctls

2009-12-22 Thread Rusty Russell

On Tue, 22 Dec 2009 10:09:33 pm Michael S. Tsirkin wrote:
 From: David Stevens dlstev...@us.ibm.com
 Subject: vhost: fix high 32 bit in FEATURES ioctls

Thanks, applied.

Rusty.

Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

2009-12-22 Thread Ingo Molnar


* Anthony Liguori anth...@codemonkey.ws wrote:

 On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
 new e1000 driver is more superior in architecture and do the required
 work to make the new e1000 driver a full replacement for the old one.
 Right, like everyone actually does things this way..
 
 I wonder why do we have OSS, old Firewire and IDE stacks still around then?
 
 And it's always a source of pain, isn't it.

Even putting aside the fact that such overlap sucks and is a pain to users 
(and that 98% of driver and subsystem version transitions are done completely 
seamlessly to users - the examples that were cited were the odd ones out of 
150K commits in the past 4 years, 149K+ of which are seamless), the comparison 
does not even apply really.

e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not 
fully understood externality, with its inevitable set of compatibility woes. 
There's often situations where one piece of hardware still works better with 
the old driver, for some odd (or not so odd) reason.

Also, note that the 'new' hw drivers are generally intended and are maintained 
as clear replacements for the old ones, and do so with minimal ABI changes - 
or preferably with no ABI changes at all. Most driver developers just switch 
from old to new and the old bits are left around and are phased out. We phased 
out old OSS recently.

That is a very different situation from the AlacrityVM patches, which:

 - Are a pure software concept and any compatibility mismatch is
   self-inflicted. The patches are in fact breaking the ABI to KVM 
   intentionally (for better or worse).

 - Gregory claims that the AlacrityVM patches are not a replacement for KVM.
   I.e. there's no intention to phase out any 'old stuff' and it splits the 
   pool of driver developers.

i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing 
is, if AlacrityVM is better, and KVM developers are not willing to fix their 
stuff, replace KVM with it.

It's a bit as if someone found a performance problem with sys_open() and came 
up with sys_open_v2() and claimed that he wants to work with the VFS 
developers while not really doing so but advances sys_open_v2() all the time. 

Do we allow sys_open_v2() upstream, in the name of openness and diversity, 
letting some apps use that syscall while other apps still use sys_open()? Or 
do we say "enough is enough of this stupidity, come up with some strong 
reasons to replace sys_open, and if so, replace the thing and be done with the 
pain!".

Overlap and forking can still be done in special circumstances, when a project 
splits and a hostile fork is inevitable due to prolonged and irreconcilable 
differences between the parties and if there's no strong technical advantage 
on either side. I haven't seen evidence of this yet though: Gregory claims that 
he wants to 'work with the community' and the KVM guys seem to agree violently 
that performance can be improved - and are doing so (and are asking Gregory to 
take part in that effort).

The main difference is that Gregory claims that improved performance is not 
possible within the existing KVM framework, while the KVM developers disagree. 
The good news is that this is a hard, testable fact.

I think we should try _much_ harder before giving up and forking the ABI of a 
healthy project and intentionally inflicting pain on our users.

And, at minimum, such kinds of things _have to_ be pointed out in pull 
requests, because it's like utterly important. In fact i couldn't list any more 
important thing to point out in a pull request.

Ingo

Re: [PATCH 0/3] Improve Decrementor Implementation v2


On 12/21/2009 09:21 PM, Alexander Graf wrote:

We currently have an ugly hack called AGGRESSIVE_DEC that makes the Decrementor
either work well for PPC32 or PPC64 targets.

This patchset removes that hack, bringing the decrementor implementation closer
to real hardware.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Change PowerPC KVM maintainer