RE: qemu-kvm-0.14.0 msix_mask_notifier failed.

2011-03-21 Thread Dietmar Maurer
I simply defined the syscall myself - but this only works if your kernel
supports the eventfd syscall.
Note: you need to adjust the syscall number for your architecture.

-
  glibc < 2.8 does not have eventfd support. So we define the 
  syscall manually. That is required for vhost=on.

Index: new/hw/event_notifier.c
===
--- new.orig/hw/event_notifier.c 2011-03-01 12:16:30.0 +0100
+++ new/hw/event_notifier.c 2011-03-01 12:36:02.0 +0100
@@ -14,6 +14,13 @@
 #include "event_notifier.h"
 #ifdef CONFIG_EVENTFD
 #include <sys/eventfd.h>
+#else
+#include <unistd.h>
+#include <sys/syscall.h>
+/* __NR_eventfd2 on x86_64 (AMD64); adjust for other architectures */
+#define __NR_eventfd2  290
+#define EFD_NONBLOCK  04000
+#define EFD_CLOEXEC 02000000
 #endif
 
 int event_notifier_init(EventNotifier *e, int active)
@@ -25,7 +32,11 @@
 e->fd = fd;
 return 0;
 #else
-return -ENOSYS;
+int fd = syscall(__NR_eventfd2, !!active, EFD_NONBLOCK | EFD_CLOEXEC);
+if (fd < 0)
+return -errno;
+e->fd = fd;
+return 0;
 #endif
 }

> -Original Message-
> From: Nikola Ciprich [mailto:extmaill...@linuxbox.cz]
> Sent: Monday, March 21, 2011 21:48
> To: Dietmar Maurer
> Cc: kvm@vger.kernel.org; nikola.cipr...@linuxbox.cz
> Subject: Re: qemu-kvm-0.14.0 msix_mask_notifier failed.
> 
> Hello Dietmar,
> did you manage to fix this problem?
> BR
> nik
> 
> On Tue, Mar 01, 2011 at 11:43:39AM +, Dietmar Maurer wrote:
> > Seems the bug is related to EVENTFD support.
> >
> > int event_notifier_init(EventNotifier *e, int active)
> >
> > That simply return -ENOSYS on older glibc (< 2.8)
> >
> > I can define the syscall manually, but I wonder if there is a better way?
> >
> > - Dietmar
> >
> >
> > > I get this error when I run new kvm 0.14.0 on kernel 2.6.35 (vhost=on):
> > >
> > > # kvm -netdev type=tap,id=n0,vhost=on -device
> > > virtio-net-pci,netdev=n0 -cdrom ubuntu-10.10-desktop-amd64.iso
> > >
> > > kvm: /home/code/qemu-kvm/hw/msix.c:639: msix_unset_mask_notifier:
> > > Assertion `dev->msix_mask_notifier' failed.
> > >
> > > Any ideas?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] Do not use sysdevs for implementing "core" PM operations on x86

2011-03-21 Thread Rafael J. Wysocki
Hi,

On Saturday, March 12, 2011, Rafael J. Wysocki wrote:
> On Thursday, March 10, 2011, Rafael J. Wysocki wrote:
> > There are multiple problems with sysdevs, or struct sys_device objects to
> > be precise, that are so annoying that some people have started to think
> > of removind them entirely from the kernel.  To me, personally, the most
> > obvious issue is the way sysdevs are used for defining suspend/resume
> > callbacks to be executed with one CPU on-line and interrupts disabled.
> > Greg and Kay may tell you more about the other problems with sysdevs. :-)
> > 
> > Some subsystems need to carry out certain operations during suspend after
> > we've disabled non-boot CPUs and interrupts have been switched off on the
> > only on-line one.  Currently, the only way to achieve that is to define
> > sysdev suspend/resume callbacks, but this is cumbersome and inefficient.
> > Namely, to do that, one has to define a sysdev class providing the callbacks
> > and a sysdev actually using them, which is excessively complicated.  Moreover,
> > the sysdev suspend/resume callbacks take arguments that are not really used
> > by the majority of subsystems defining sysdev suspend/resume callbacks
> > (or even if they are used, they don't really _need_ to be used, so they
> > are simply unnecessary).  Of course, if a sysdev is only defined to provide
> > suspend/resume (and maybe shutdown) callbacks, there's no real reason why
> > it should show up in sysfs.
> > 
> > For this reason, I thought it would be a good idea to provide a simpler
> > interface for subsystems to define "very late" suspend callbacks and
> > "very early" resume callbacks (and "very late" shutdown callbacks as well)
> > without the entire bloat related to sysdevs.  The interface is introduced
> > by the first of the following patches, while the second patch converts some
> > sysdev users related to the x86 architecture to using the new interface.
> > 
> > I believe that all sysdev users who need to define suspend/resume/shutdown
> > callbacks may be converted to using the interface provided by the first patch,
> > which in turn should allow us to convert the remaining sysdev functionality
> > into "normal" struct device interfaces.  Still, even if that turns out to be
> > too complicated, the bloat reduction resulting from the second patch kind of
> > shows that moving at least some sysdev users to a simpler interface (like in
> > the first patch) is a good idea anyway.
> > 
> > This is a proof of concept, so the patches have not been tested.  Please be
> > extremely careful, because they touch sensitive code, so to speak.  In the
> > majority of cases the changes are rather straightforward, but there are some
> > more interesting cases as well (io_apic.c most importantly).
> 
> Since Greg likes the idea and there haven't been any objections so far, here's
> the official submission.  The patches have been tested on HP nx6325 and
> Toshiba Portege R500.
> 
> Patch [1/8] is regarded as 2.6.38 material, following Greg's advice.  The
> other patches in the set are regarded as 2.6.39 material.  The last one
> obviously depends on all of the previous ones.
> 
> [1/8] - Introduce struct syscore_ops for registering operations to be run on
> one CPU during suspend/resume/shutdown.
> 
> [2/8] - Convert sysdev users in arch/x86 to using struct syscore_ops.
> 
> [3/8] - Make ACPI use struct syscore_ops for irqrouter_resume().
> 
> [4/8] - Make timekeeping use struct syscore_ops for suspend/resume.
> 
> [5/8] - Make Intel IOMMU use struct syscore_ops for suspend/resume.
> 
> [6/8] - Make KVM use struct syscore_ops for suspend/resume.
> 
> [7/8] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.
> 
> [8/8] - Introduce config switch allowing architectures to skip sysdev
> suspend/resume/shutdown code.
> 
> If there are no objections, I'd like to push these patches through the
> suspend tree.

[1/8] has been merged in the meantime and [3/8] has been included in the
ACPI tree.  If there are no objections, I'm going to push the following
patches to Linus this week through the suspend-2.6 tree:

[1/6] - Convert sysdev users in arch/x86 to using struct syscore_ops.

[2/6] - Make timekeeping use struct syscore_ops for suspend/resume.
 
[3/6] - Make Intel IOMMU use struct syscore_ops for suspend/resume.

[4/6] - Make KVM use struct syscore_ops for suspend/resume.

[5/6] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.

[6/6] - Introduce config switch allowing architectures to skip sysdev
suspend/resume/shutdown code.

Thanks,
Rafael



[PATCH 1/6] x86: Use syscore_ops instead of sysdev classes and sysdevs

2011-03-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Some subsystems in the x86 tree need to carry out suspend/resume and
shutdown operations with one CPU on-line and interrupts disabled and
they define sysdev classes and sysdevs or sysdev drivers for this
purpose.  This leads to unnecessarily complicated code and excessive
memory usage, so switch them to using struct syscore_ops objects for
this purpose instead.

Generally, there are three categories of subsystems that use
sysdevs for implementing PM operations: (1) subsystems whose
suspend/resume callbacks ignore their arguments entirely (the
majority), (2) subsystems whose suspend/resume callbacks use their
struct sys_device argument, but don't really need to do that,
because they can be implemented differently in an arguably simpler
way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
use their struct sys_device argument, but the value of that argument
is always the same and could be ignored (microcode_core.c).  In all
of these cases the subsystems in question may be readily converted to
using struct syscore_ops objects for power management and shutdown.

Signed-off-by: Rafael J. Wysocki 
Reviewed-by: Thomas Gleixner 
---
 arch/x86/kernel/amd_iommu_init.c |   26 ++
 arch/x86/kernel/apic/apic.c  |   33 +++--
 arch/x86/kernel/apic/io_apic.c   |   97 ++-
 arch/x86/kernel/cpu/mcheck/mce.c |   21 
 arch/x86/kernel/cpu/mtrr/main.c  |   10 ++--
 arch/x86/kernel/i8237.c  |   30 ++--
 arch/x86/kernel/i8259.c  |   33 -
 arch/x86/kernel/microcode_core.c |   34 +
 arch/x86/kernel/pci-gart_64.c|   32 ++--
 arch/x86/oprofile/nmi_int.c  |   44 -
 10 files changed, 128 insertions(+), 232 deletions(-)

Index: linux-2.6/arch/x86/kernel/amd_iommu_init.c
===
--- linux-2.6.orig/arch/x86/kernel/amd_iommu_init.c
+++ linux-2.6/arch/x86/kernel/amd_iommu_init.c
@@ -21,7 +21,7 @@
 #include 
 #include 
 #include 
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
 #include 
 #include 
 #include 
@@ -1260,7 +1260,7 @@ static void disable_iommus(void)
  * disable suspend until real resume implemented
  */
 
-static int amd_iommu_resume(struct sys_device *dev)
+static void amd_iommu_resume(void)
 {
struct amd_iommu *iommu;
 
@@ -1276,11 +1276,9 @@ static int amd_iommu_resume(struct sys_d
 */
amd_iommu_flush_all_devices();
amd_iommu_flush_all_domains();
-
-   return 0;
 }
 
-static int amd_iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int amd_iommu_suspend(void)
 {
/* disable IOMMUs to go out of the way for BIOS */
disable_iommus();
@@ -1288,17 +1286,11 @@ static int amd_iommu_suspend(struct sys_
return 0;
 }
 
-static struct sysdev_class amd_iommu_sysdev_class = {
-   .name = "amd_iommu",
+static struct syscore_ops amd_iommu_syscore_ops = {
.suspend = amd_iommu_suspend,
.resume = amd_iommu_resume,
 };
 
-static struct sys_device device_amd_iommu = {
-   .id = 0,
-   .cls = &amd_iommu_sysdev_class,
-};
-
 /*
  * This is the core init function for AMD IOMMU hardware in the system.
  * This function is called from the generic x86 DMA layer initialization
@@ -1415,14 +1407,6 @@ static int __init amd_iommu_init(void)
goto free;
}
 
-   ret = sysdev_class_register(&amd_iommu_sysdev_class);
-   if (ret)
-   goto free;
-
-   ret = sysdev_register(&device_amd_iommu);
-   if (ret)
-   goto free;
-
ret = amd_iommu_init_devices();
if (ret)
goto free;
@@ -1441,6 +1425,8 @@ static int __init amd_iommu_init(void)
 
amd_iommu_init_notifier();
 
+   register_syscore_ops(&amd_iommu_syscore_ops);
+
if (iommu_pass_through)
goto out;
 
Index: linux-2.6/arch/x86/kernel/apic/apic.c
===
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -24,7 +24,7 @@
 #include 
 #include 
 #include 
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
 #include 
 #include 
 #include 
@@ -2046,7 +2046,7 @@ static struct {
unsigned int apic_thmr;
 } apic_pm_state;
 
-static int lapic_suspend(struct sys_device *dev, pm_message_t state)
+static int lapic_suspend(void)
 {
unsigned long flags;
int maxlvt;
@@ -2084,23 +2084,21 @@ static int lapic_suspend(struct sys_devi
return 0;
 }
 
-static int lapic_resume(struct sys_device *dev)
+static void lapic_resume(void)
 {
unsigned int l, h;
unsigned long flags;
-   int maxlvt;
-   int ret = 0;
+   int maxlvt, ret;
struct IO_APIC_route_entry **ioapic_entries = NULL;
 
if (!apic_pm_state.active)
-   return 0;
+   return;
 
local_irq_save(flags);
if (intr_remapping_enabled) {
ioapic_ent

[PATCH 4/6] KVM: Use syscore_ops instead of sysdev class and sysdev

2011-03-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

KVM uses a sysdev class and a sysdev for executing kvm_suspend()
after interrupts have been turned off on the boot CPU (during system
suspend) and for executing kvm_resume() before turning on interrupts
on the boot CPU (during system resume).  However, since both of these
functions ignore their arguments, the entire mechanism may be
replaced with a struct syscore_ops object which is simpler.

Signed-off-by: Rafael J. Wysocki 
---
 virt/kvm/kvm_main.c |   34 --
 1 file changed, 8 insertions(+), 26 deletions(-)

Index: linux-2.6/virt/kvm/kvm_main.c
===
--- linux-2.6.orig/virt/kvm/kvm_main.c
+++ linux-2.6/virt/kvm/kvm_main.c
@@ -30,7 +30,7 @@
 #include 
 #include 
 #include 
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
 #include 
 #include 
 #include 
@@ -2447,33 +2447,26 @@ static void kvm_exit_debug(void)
debugfs_remove(kvm_debugfs_dir);
 }
 
-static int kvm_suspend(struct sys_device *dev, pm_message_t state)
+static int kvm_suspend(void)
 {
if (kvm_usage_count)
hardware_disable_nolock(NULL);
return 0;
 }
 
-static int kvm_resume(struct sys_device *dev)
+static void kvm_resume(void)
 {
if (kvm_usage_count) {
WARN_ON(raw_spin_is_locked(&kvm_lock));
hardware_enable_nolock(NULL);
}
-   return 0;
 }
 
-static struct sysdev_class kvm_sysdev_class = {
-   .name = "kvm",
+static struct syscore_ops kvm_syscore_ops = {
.suspend = kvm_suspend,
.resume = kvm_resume,
 };
 
-static struct sys_device kvm_sysdev = {
-   .id = 0,
-   .cls = &kvm_sysdev_class,
-};
-
 struct page *bad_page;
 pfn_t bad_pfn;
 
@@ -2557,14 +2550,6 @@ int kvm_init(void *opaque, unsigned vcpu
goto out_free_2;
register_reboot_notifier(&kvm_reboot_notifier);
 
-   r = sysdev_class_register(&kvm_sysdev_class);
-   if (r)
-   goto out_free_3;
-
-   r = sysdev_register(&kvm_sysdev);
-   if (r)
-   goto out_free_4;
-
/* A kmem cache lets us meet the alignment requirements of fx_save. */
if (!vcpu_align)
vcpu_align = __alignof__(struct kvm_vcpu);
@@ -2572,7 +2557,7 @@ int kvm_init(void *opaque, unsigned vcpu
   0, NULL);
if (!kvm_vcpu_cache) {
r = -ENOMEM;
-   goto out_free_5;
+   goto out_free_3;
}
 
r = kvm_async_pf_init();
@@ -2589,6 +2574,8 @@ int kvm_init(void *opaque, unsigned vcpu
goto out_unreg;
}
 
+   register_syscore_ops(&kvm_syscore_ops);
+
kvm_preempt_ops.sched_in = kvm_sched_in;
kvm_preempt_ops.sched_out = kvm_sched_out;
 
@@ -2600,10 +2587,6 @@ out_unreg:
kvm_async_pf_deinit();
 out_free:
kmem_cache_destroy(kvm_vcpu_cache);
-out_free_5:
-   sysdev_unregister(&kvm_sysdev);
-out_free_4:
-   sysdev_class_unregister(&kvm_sysdev_class);
 out_free_3:
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
@@ -2631,8 +2614,7 @@ void kvm_exit(void)
misc_deregister(&kvm_dev);
kmem_cache_destroy(kvm_vcpu_cache);
kvm_async_pf_deinit();
-   sysdev_unregister(&kvm_sysdev);
-   sysdev_class_unregister(&kvm_sysdev_class);
+   unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
on_each_cpu(hardware_disable_nolock, NULL, 1);



[PATCH 5/6] cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)

2011-03-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU.  In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
only called for the boot CPU in fact, which is quite confusing.

To remove the confusion and to prepare for eliminating sysdev
suspend and resume operations from the kernel entirely, convert
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose.  In addition, put some explanatory remarks
into their kerneldoc comments.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/cpufreq/cpufreq.c |   66 ++
 1 file changed, 26 insertions(+), 40 deletions(-)

Index: linux-2.6/drivers/cpufreq/cpufreq.c
===
--- linux-2.6.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6/drivers/cpufreq/cpufreq.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include <linux/syscore_ops.h>
 
 #include 
 
@@ -1340,35 +1341,31 @@ out:
 }
 EXPORT_SYMBOL(cpufreq_get);
 
+static struct sysdev_driver cpufreq_sysdev_driver = {
+   .add= cpufreq_add_dev,
+   .remove = cpufreq_remove_dev,
+};
+
 
 /**
- * cpufreq_suspend - let the low level driver prepare for suspend
+ * cpufreq_bp_suspend - Prepare the boot CPU for system suspend.
+ *
+ * This function is only executed for the boot processor.  The other CPUs
+ * have been put offline by means of CPU hotplug.
  */
-
-static int cpufreq_suspend(struct sys_device *sysdev, pm_message_t pmsg)
+static int cpufreq_bp_suspend(void)
 {
int ret = 0;
 
-   int cpu = sysdev->id;
+   int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
 
dprintk("suspending cpu %u\n", cpu);
 
-   if (!cpu_online(cpu))
-   return 0;
-
-   /* we may be lax here as interrupts are off. Nonetheless
-* we need to grab the correct cpu policy, as to check
-* whether we really run on this CPU.
-*/
-
+   /* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
if (!cpu_policy)
-   return -EINVAL;
-
-   /* only handle each CPU group once */
-   if (unlikely(cpu_policy->cpu != cpu))
-   goto out;
+   return 0;
 
if (cpufreq_driver->suspend) {
ret = cpufreq_driver->suspend(cpu_policy);
@@ -1377,13 +1374,12 @@ static int cpufreq_suspend(struct sys_de
"step on CPU %u\n", cpu_policy->cpu);
}
 
-out:
cpufreq_cpu_put(cpu_policy);
return ret;
 }
 
 /**
- * cpufreq_resume -  restore proper CPU frequency handling after resume
+ * cpufreq_bp_resume - Restore proper frequency handling of the boot CPU.
  *
  * 1.) resume CPUfreq hardware support (cpufreq_driver->resume())
  * 2.) schedule call cpufreq_update_policy() ASAP as interrupts are
@@ -1391,31 +1387,23 @@ out:
  * what we believe it to be. This is a bit later than when it
  * should be, but nonethteless it's better than calling
  * cpufreq_driver->get() here which might re-enable interrupts...
+ *
+ * This function is only executed for the boot CPU.  The other CPUs have not
+ * been turned on yet.
  */
-static int cpufreq_resume(struct sys_device *sysdev)
+static void cpufreq_bp_resume(void)
 {
int ret = 0;
 
-   int cpu = sysdev->id;
+   int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
 
dprintk("resuming cpu %u\n", cpu);
 
-   if (!cpu_online(cpu))
-   return 0;
-
-   /* we may be lax here as interrupts are off. Nonetheless
-* we need to grab the correct cpu policy, as to check
-* whether we really run on this CPU.
-*/
-
+   /* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
if (!cpu_policy)
-   return -EINVAL;
-
-   /* only handle each CPU group once */
-   if (unlikely(cpu_policy->cpu != cpu))
-   goto fail;
+   return;
 
if (cpufreq_driver->resume) {
ret = cpufreq_driver->resume(cpu_policy);
@@ -1430,14 +1418,11 @@ static int cpufreq_resume(struct sys_dev
 
 fail:
cpufreq_cpu_put(cpu_policy);
-   return ret;
 }
 
-static struct sysdev_driver cpufreq_sysdev_driver = {
-   .add= cpufreq_add_dev,
-   .r

[PATCH 6/6] Introduce ARCH_NO_SYSDEV_OPS config option (v2)

2011-03-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Introduce a Kconfig option that allows architectures, where the sysdev
operations used during system suspend, resume and shutdown have been
completely replaced with struct syscore_ops operations, to avoid
building sysdev code that will never be used.

Make the callbacks in struct sys_device and struct sysdev_driver depend
on ARCH_NO_SYSDEV_OPS, which allows us to verify that all of the references
have actually been removed from the code the given architecture
depends on.

Make x86 select ARCH_NO_SYSDEV_OPS.

Signed-off-by: Rafael J. Wysocki 
---
 arch/x86/Kconfig   |1 +
 drivers/base/Kconfig   |7 +++
 drivers/base/sys.c |3 ++-
 include/linux/device.h |4 
 include/linux/pm.h |   10 --
 include/linux/sysdev.h |7 +--
 6 files changed, 27 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/pm.h
===
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -529,13 +529,19 @@ struct dev_power_domain {
  */
 
 #ifdef CONFIG_PM_SLEEP
-extern void device_pm_lock(void);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
+extern int sysdev_suspend(pm_message_t state);
 extern int sysdev_resume(void);
+#else
+static inline int sysdev_suspend(pm_message_t state) { return 0; }
+static inline int sysdev_resume(void) { return 0; }
+#endif
+
+extern void device_pm_lock(void);
 extern void dpm_resume_noirq(pm_message_t state);
 extern void dpm_resume_end(pm_message_t state);
 
 extern void device_pm_unlock(void);
-extern int sysdev_suspend(pm_message_t state);
 extern int dpm_suspend_noirq(pm_message_t state);
 extern int dpm_suspend_start(pm_message_t state);
 
Index: linux-2.6/include/linux/sysdev.h
===
--- linux-2.6.orig/include/linux/sysdev.h
+++ linux-2.6/include/linux/sysdev.h
@@ -33,12 +33,13 @@ struct sysdev_class {
const char *name;
struct list_headdrivers;
struct sysdev_class_attribute **attrs;
-
+   struct kset kset;
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* Default operations for these types of devices */
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
-   struct kset kset;
+#endif
 };
 
 struct sysdev_class_attribute {
@@ -76,9 +77,11 @@ struct sysdev_driver {
struct list_headentry;
int (*add)(struct sys_device *);
int (*remove)(struct sys_device *);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
+#endif
 };
 
 
Index: linux-2.6/include/linux/device.h
===
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -633,8 +633,12 @@ static inline int devtmpfs_mount(const c
 /* drivers/base/power/shutdown.c */
 extern void device_shutdown(void);
 
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
 /* drivers/base/sys.c */
 extern void sysdev_shutdown(void);
+#else
+static inline void sysdev_shutdown(void) { }
+#endif
 
 /* debugging and troubleshooting/diagnostic helpers. */
 extern const char *dev_driver_string(const struct device *dev);
Index: linux-2.6/arch/x86/Kconfig
===
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -71,6 +71,7 @@ config X86
select GENERIC_IRQ_SHOW
select IRQ_FORCED_THREADING
select USE_GENERIC_SMP_HELPERS if SMP
+   select ARCH_NO_SYSDEV_OPS
 
 config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS)
Index: linux-2.6/drivers/base/sys.c
===
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -329,7 +329,7 @@ void sysdev_unregister(struct sys_device
 }
 
 
-
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
 /**
  * sysdev_shutdown - Shut down all system devices.
  *
@@ -524,6 +524,7 @@ int sysdev_resume(void)
return 0;
 }
 EXPORT_SYMBOL_GPL(sysdev_resume);
+#endif /* CONFIG_ARCH_NO_SYSDEV_OPS */
 
 int __init system_bus_init(void)
 {
Index: linux-2.6/drivers/base/Kconfig
===
--- linux-2.6.orig/drivers/base/Kconfig
+++ linux-2.6/drivers/base/Kconfig
@@ -168,4 +168,11 @@ config SYS_HYPERVISOR
bool
default n
 
+config ARCH_NO_SYSDEV_OPS
+   bool
+   ---help---
+ To be selected by architectures that don't use sysdev class or
+ sysdev driver power management (suspend/resume) and shutdown
+ operations.
+
 endmenu


[PATCH 3/6] PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev

2011-03-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

The Intel IOMMU subsystem uses a sysdev class and a sysdev for
executing iommu_suspend() after interrupts have been turned off
on the boot CPU (during system suspend) and for executing
iommu_resume() before turning on interrupts on the boot CPU
(during system resume).  However, since both of these functions
ignore their arguments, the entire mechanism may be replaced with a
struct syscore_ops object which is simpler.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/pci/intel-iommu.c |   38 +-
 1 file changed, 9 insertions(+), 29 deletions(-)

Index: linux-2.6/drivers/pci/intel-iommu.c
===
--- linux-2.6.orig/drivers/pci/intel-iommu.c
+++ linux-2.6/drivers/pci/intel-iommu.c
@@ -36,7 +36,7 @@
 #include 
 #include 
 #include 
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
 #include 
 #include 
 #include 
@@ -3135,7 +3135,7 @@ static void iommu_flush_all(void)
}
 }
 
-static int iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int iommu_suspend(void)
 {
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3175,7 +3175,7 @@ nomem:
return -ENOMEM;
 }
 
-static int iommu_resume(struct sys_device *dev)
+static void iommu_resume(void)
 {
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3183,7 +3183,7 @@ static int iommu_resume(struct sys_devic
 
if (init_iommu_hw()) {
WARN(1, "IOMMU setup failed, DMAR can not resume!\n");
-   return -EIO;
+   return;
}
 
for_each_active_iommu(iommu, drhd) {
@@ -3204,40 +3204,20 @@ static int iommu_resume(struct sys_devic
 
for_each_active_iommu(iommu, drhd)
kfree(iommu->iommu_state);
-
-   return 0;
 }
 
-static struct sysdev_class iommu_sysclass = {
-   .name   = "iommu",
+static struct syscore_ops iommu_syscore_ops = {
.resume = iommu_resume,
.suspend= iommu_suspend,
 };
 
-static struct sys_device device_iommu = {
-   .cls= &iommu_sysclass,
-};
-
-static int __init init_iommu_sysfs(void)
+static void __init init_iommu_pm_ops(void)
 {
-   int error;
-
-   error = sysdev_class_register(&iommu_sysclass);
-   if (error)
-   return error;
-
-   error = sysdev_register(&device_iommu);
-   if (error)
-   sysdev_class_unregister(&iommu_sysclass);
-
-   return error;
+   register_syscore_ops(&iommu_syscore_ops);
 }
 
 #else
-static int __init init_iommu_sysfs(void)
-{
-   return 0;
-}
+static inline void init_iommu_pm_ops(void) { }
 #endif /* CONFIG_PM */
 
 /*
@@ -3320,7 +3300,7 @@ int __init intel_iommu_init(void)
 #endif
dma_ops = &intel_dma_ops;
 
-   init_iommu_sysfs();
+   init_iommu_pm_ops();
 
register_iommu(&intel_iommu_ops);
 



[PATCH 2/6] timekeeping: Use syscore_ops instead of sysdev class and sysdev

2011-03-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

The timekeeping subsystem uses a sysdev class and a sysdev for
executing timekeeping_suspend() after interrupts have been turned off
on the boot CPU (during system suspend) and for executing
timekeeping_resume() before turning on interrupts on the boot CPU
(during system resume).  However, since both of these functions
ignore their arguments, the entire mechanism may be replaced with a
struct syscore_ops object which is simpler.

Signed-off-by: Rafael J. Wysocki 
Reviewed-by: Thomas Gleixner 
---
 kernel/time/timekeeping.c |   27 ---
 1 file changed, 8 insertions(+), 19 deletions(-)

Index: linux-2.6/kernel/time/timekeeping.c
===
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
 #include 
 #include 
 #include 
@@ -597,13 +597,12 @@ static struct timespec timekeeping_suspe
 
 /**
  * timekeeping_resume - Resumes the generic timekeeping subsystem.
- * @dev:   unused
  *
  * This is for the generic clocksource timekeeping.
  * xtime/wall_to_monotonic/jiffies/etc are
  * still managed by arch specific suspend/resume code.
  */
-static int timekeeping_resume(struct sys_device *dev)
+static void timekeeping_resume(void)
 {
unsigned long flags;
struct timespec ts;
@@ -632,11 +631,9 @@ static int timekeeping_resume(struct sys
 
/* Resume hrtimers */
hres_timers_resume();
-
-   return 0;
 }
 
-static int timekeeping_suspend(struct sys_device *dev, pm_message_t state)
+static int timekeeping_suspend(void)
 {
unsigned long flags;
 
@@ -654,26 +651,18 @@ static int timekeeping_suspend(struct sy
 }
 
 /* sysfs resume/suspend bits for timekeeping */
-static struct sysdev_class timekeeping_sysclass = {
-   .name   = "timekeeping",
+static struct syscore_ops timekeeping_syscore_ops = {
.resume = timekeeping_resume,
.suspend= timekeeping_suspend,
 };
 
-static struct sys_device device_timer = {
-   .id = 0,
-   .cls= &timekeeping_sysclass,
-};
-
-static int __init timekeeping_init_device(void)
+static int __init timekeeping_init_ops(void)
 {
-   int error = sysdev_class_register(&timekeeping_sysclass);
-   if (!error)
-   error = sysdev_register(&device_timer);
-   return error;
+   register_syscore_ops(&timekeeping_syscore_ops);
+   return 0;
 }
 
-device_initcall(timekeeping_init_device);
+device_initcall(timekeeping_init_ops);
 
 /*
  * If the error is already larger, we look ahead even further



[RFC] posix-timers: RCU conversion

2011-03-21 Thread Eric Dumazet
On Monday 21 March 2011 at 23:57 +0545, Ben Nagy wrote:
> On Mon, Mar 21, 2011 at 10:57 PM, Eric Dumazet  wrote:
> > On Monday 21 March 2011 at 19:02 +0200, Avi Kivity wrote:
> >
> >> Any ideas on how to fix it?  We could pre-allocate IDs and batch them in
> >> per-cpu caches, but it seems like a lot of work.
> >>
> >
> > Hmm, I don't know what syscalls kvm does, but even a timer_gettime() has to
> > lock this idr_lock.
> >
> > Sounds crazy, and should be fixed in kernel code, not kvm ;)
> 
> Hi,
> 
> I'll need to work out a way I can make the perf.data available
> (~100M), but here's the summary
> 
> http://paste.ubuntu.com/583425/
> 
> And here's the summary of the summary
> 
> # Overhead  Command Shared Object
> Symbol
> #   ...  
> ..
> #
> 71.86%  kvm  [kernel.kallsyms] [k] __ticket_spin_lock
> |
> --- __ticket_spin_lock
>|
>|--54.19%-- default_spin_lock_flags
>|  _raw_spin_lock_irqsave
>|  |
>|   --54.14%-- __lock_timer
>| |
>| |--31.92%-- sys_timer_gettime
>| |  
> system_call_fastpath
>| |
>|  --22.22%-- sys_timer_settime
>|
> system_call_fastpath
>|
>|--15.66%-- _raw_spin_lock
> [...]
> 
> Which I guess seems to match what Eric just said.
> 
> I'll post a link to the full data tomorrow. Many thanks for the help
> so far. If it's a kernel scaling limit then I guess we just wait until
> the kernel gets better. :S I'll check it out with a linux guest
> tomorrow anyway.

Here is a quick & dirty patch to reduce idr_lock use. You could give it
a try ;)

Thanks

[RFC] posix-timers: RCU conversion

Ben Nagy reported a scalability problem with KVM/QEMU that hits a
single spinlock (idr_lock) in the posix-timers code very hard.

Even on a 16 cpu machine (2x4x2), a single test can show 98% of cpu time
used in ticket_spin_lock

Switching to RCU should be quite easy, IDR being already RCU ready.

idr_lock should be locked only before an insert/delete, not a find.
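Since the quoted diff below is cut off, the shape of the converted lookup path is worth spelling out (a sketch mirroring the patch, kernel code, not compilable on its own):

```c
/* After the conversion, __lock_timer() no longer touches the global
 * idr_lock: RCU protects the idr_find() walk, the per-timer it_lock
 * serializes against concurrent deletion, and release_posix_timer()
 * frees the timer via call_rcu() so a racing reader never sees
 * freed memory. */
static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags)
{
	struct k_itimer *timr;

	rcu_read_lock();
	timr = idr_find(&posix_timers_id, (int)timer_id);
	if (timr) {
		spin_lock_irqsave(&timr->it_lock, *flags);
		if (timr->it_signal == current->signal) {
			rcu_read_unlock();
			return timr;
		}
		spin_unlock_irqrestore(&timr->it_lock, *flags);
	}
	rcu_read_unlock();
	return NULL;
}
```

With this, timer_gettime()/timer_settime() from many vcpus no longer serialize on one global lock; idr_lock is taken only on timer create and delete.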

Reported-by: Ben Nagy 
Signed-off-by: Eric Dumazet 
Cc: Avi Kivity 
Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
---
 include/linux/posix-timers.h |1 +
 kernel/posix-timers.c|   26 ++
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index d51243a..5dc27ca 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -81,6 +81,7 @@ struct k_itimer {
unsigned long expires;
} mmtimer;
} it;
+   struct rcu_head rcu;
 };
 
 struct k_clock {
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 4c01249..e2a823a 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -491,6 +491,11 @@ static struct k_itimer * alloc_posix_timer(void)
return tmr;
 }
 
+static void k_itimer_rcu_free(struct rcu_head *head)
+{
+   kmem_cache_free(posix_timers_cache, container_of(head, struct k_itimer, rcu));
+}
+
 #define IT_ID_SET  1
 #define IT_ID_NOT_SET  0
 static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
@@ -499,11 +504,12 @@ static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
unsigned long flags;
spin_lock_irqsave(&idr_lock, flags);
idr_remove(&posix_timers_id, tmr->it_id);
+   tmr->it_id = -1;
spin_unlock_irqrestore(&idr_lock, flags);
}
put_pid(tmr->it_pid);
sigqueue_free(tmr->sigq);
-   kmem_cache_free(posix_timers_cache, tmr);
+   call_rcu(&tmr->rcu, k_itimer_rcu_free);
 }
 
 static struct k_clock *clockid_to_kclock(const clockid_t id)
@@ -631,22 +637,18 @@ out:
 static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags)
 {
struct k_itimer *timr;
-   /*
-* Watch out here.  We do a irqsave on the idr_lock and pass the
-* flags part over to the timer lock.  Must not let interrupts in
-* while we are moving the lock.
-*/
-   spin_lock_irqsave(&idr_lock, *flags);
+
+   rcu_read_lock();
timr = idr_find(&posix_timers_id, (int)timer_id);
if (timr) {
-   spin_lock(&timr->it_lock);
-   if (timr->it_signal == current->signal) {
-   spin_unlock(&idr_lock);
+   spin_lock_irqsave(&

Re: "KVM internal error. Suberror: 1" with ancient 2.4 kernel as guest

2011-03-21 Thread Wei Xu
Avi and Jiri:

I implemented emulation of movq (64-bit) and movdqa (128-bit). If you guys
still need it, let me know and I can post it somewhere...

Wei Xu


On 8/31/10 9:30 AM, "Avi Kivity"  wrote:

> 
>   On 08/31/2010 06:49 PM, Avi Kivity wrote:
>>  On 08/31/2010 05:32 PM, Jiri Kosina wrote:
>>> (qemu) x/5i $eip
>>> 0xc027a841:  movq   (%esi),%mm0
>>> 0xc027a844:  movq   0x8(%esi),%mm1
>>> 0xc027a848:  movq   0x10(%esi),%mm2
>>> 0xc027a84c:  movq   0x18(%esi),%mm3
>>> 0xc027a850:  movq   %mm0,(%edx)
>>> ===
>>> 
>>> Is there any issue with emulating MMX?
>>> 
>> 
>> Yes.  MMX is not currently emulated.
>> 
>> If there's a command line option to disable the use of MMX you can try
>> it, otherwise wait for it to be implemented (or implement it
>> yourself).  I'll try to do it for 2.6.37, but can't promise anything.
> 
> You can also run qemu with -cpu qemu32,-mmx.  That will expose a cpu
> without mmx support; hopefully the guest kernel will see that and avoid
> mmx instructions.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qemu-kvm-0.14.0 msix_mask_notifier failed.

2011-03-21 Thread Nikola Ciprich
Hello Dietmar,
did you manage to fix this problem?
BR
nik

On Tue, Mar 01, 2011 at 11:43:39AM +, Dietmar Maurer wrote:
> Seems the bug is related to EVENTFD support.
> 
> int event_notifier_init(EventNotifier *e, int active)
> 
> That simply return -ENOSYS on older glibc (< 2.8)
> 
> I can define the syscall manually, but I wonder if there is a better way? 
> 
> - Dietmar
> 
> 
> > I get this error when I run new kvm 0.14.0 on kernel 2.6.35 (vhost=on):
> > 
> > # kvm -netdev type=tap,id=n0,vhost=on -device virtio-net-pci,netdev=n0 
> > -cdrom
> > ubuntu-10.10-desktop-amd64.iso
> > 
> > kvm: /home/code/qemu-kvm/hw/msix.c:639: msix_unset_mask_notifier:
> > Assertion `dev->msix_mask_notifier' failed.
> > 
> > Any ideas?
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Avi Kivity

On 03/21/2011 08:48 PM, Michael Tokarev wrote:

21.03.2011 20:12, Eric Dumazet wrote:
>  On Monday, 21 March 2011 at 19:02 +0200, Avi Kivity wrote:
>
>>  Any ideas on how to fix it?  We could pre-allocate IDs and batch them in
>>  per-cpu caches, but it seems like a lot of work.
>>
>
>  Hmm, I dont know what syscalls kvm do, but even a timer_gettime() has to
>  lock this idr_lock.

I wonder why my testcase - 160 kvm guests on a 2-core AMD AthlonII CPU -
shows no contention of that sort, not at all...  Yes sure, all these
(linux) guests are idling at login: prompt, but I'd expect at least
some gettimeofday() and some load average, -- everything stays close
to zero.


For this workload spinlock contention only happens on large machines; it 
(mostly) doesn't matter how many guests you have on a small host.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Michael Tokarev
21.03.2011 20:12, Eric Dumazet wrote:
> On Monday, 21 March 2011 at 19:02 +0200, Avi Kivity wrote:
> 
>> Any ideas on how to fix it?  We could pre-allocate IDs and batch them in 
>> per-cpu caches, but it seems like a lot of work.
>>
> 
> Hmm, I dont know what syscalls kvm do, but even a timer_gettime() has to
> lock this idr_lock.

I wonder why my testcase - 160 kvm guests on a 2-core AMD AthlonII CPU -
shows no contention of that sort, not at all...  Yes sure, all these
(linux) guests are idling at login: prompt, but I'd expect at least
some gettimeofday() and some load average, -- everything stays close
to zero.

/mjt


Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Avi Kivity

On 03/21/2011 07:12 PM, Eric Dumazet wrote:

On Monday, 21 March 2011 at 19:02 +0200, Avi Kivity wrote:

>  Any ideas on how to fix it?  We could pre-allocate IDs and batch them in
>  per-cpu caches, but it seems like a lot of work.
>

Hmm, I dont know what syscalls kvm do, but even a timer_gettime() has to
lock this idr_lock.


It's qemu playing with posix timers, nothing to do with kvm per se.


Sounds crazy, and should be fixed in kernel code, not kvm ;)



Oh yes, I couldn't fix it in kvm if I wanted to (and I don't want to).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Ben Nagy
On Mon, Mar 21, 2011 at 10:57 PM, Eric Dumazet  wrote:
> On Monday, 21 March 2011 at 19:02 +0200, Avi Kivity wrote:
>
>> Any ideas on how to fix it?  We could pre-allocate IDs and batch them in
>> per-cpu caches, but it seems like a lot of work.
>>
>
> Hmm, I dont know what syscalls kvm do, but even a timer_gettime() has to
> lock this idr_lock.
>
> Sounds crazy, and should be fixed in kernel code, not kvm ;)

Hi,

I'll need to work out a way I can make the perf.data available
(~100M), but here's the summary

http://paste.ubuntu.com/583425/

And here's the summary of the summary

# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ......................
#
71.86%  kvm  [kernel.kallsyms] [k] __ticket_spin_lock
|
--- __ticket_spin_lock
   |
   |--54.19%-- default_spin_lock_flags
   |  _raw_spin_lock_irqsave
   |  |
   |   --54.14%-- __lock_timer
   | |
   | |--31.92%-- sys_timer_gettime
   | |  system_call_fastpath
   | |
   |  --22.22%-- sys_timer_settime
   |system_call_fastpath
   |
   |--15.66%-- _raw_spin_lock
[...]

Which I guess seems to match what Eric just said.

I'll post a link to the full data tomorrow. Many thanks for the help
so far. If it's a kernel scaling limit then I guess we just wait until
the kernel gets better. :S I'll check it out with a linux guest
tomorrow anyway.

Cheers,

ben


Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop

2011-03-21 Thread Shirley Ma
On Fri, 2011-03-18 at 18:41 -0700, Shirley Ma wrote:
> > > +   /* Drop packet instead of stop queue for better
> performance
> > */
> > 
> > I would like to see some justification as to why this is the right
> > way to go and not just papering over the real problem. 
> 
> Fair. KVM guest virtio_net TX queue stop/restart is pretty expensive,
> which involves:
> 
> 1. Guest enable callback: one memory barrier, interrupt flag set

Missed this cost: for historical reasons, it also involves a guest exit from
I/O write (PCI_QUEUE_NOTIFY).

> 2. Host signals guest: one memory barrier, and a TX interrupt from
> host
> to KVM guest through evenfd_signal.
> 
> 
> Most of the workload so far we barely see TX over run, except for
> small
> messages TCP_STREAM. 
> 
> For small message size TCP_STREAM workload, no matter how big the TX
> queue size is, it always causes overrun. I even re-enable the TX queue
> when it's empty, it still hits TX overrun again and again.
> 
> Somehow KVM guest and host is not in pace on processing small packets.
> I
> tried to pin each thread to different CPU, it didn't help. So it
> didn't
> seem to be scheduling related.
> 
> >From the performance results, we can see dramatically performance
> gain
> with this patch.
> 
> I would like to dig out the real reason why host can't in pace with
> guest, but haven't figured it out in month, that's the reason I held
> this patch for a while. However if anyone can give me any ideas on how
> to debug the real problem, I am willing to try it out. 



Re: Biweekly KVM Test report, kernel a72e315c... qemu b73357ec...

2011-03-21 Thread Avi Kivity
On 03/21/2011 02:35 PM, Avi Kivity wrote:
> On 03/21/2011 04:47 AM, Ren, Yongjie wrote:
> > Hi all,
> > This is KVM test result against kvm.git 
> > a72e315c509376bbd1e121219c3ad9f23973923f based on kernel 2.6.38-rc6+, and 
> > qemu-kvm.git b73357ecd2b14c057134cb71d29447b5b988c516.
> >
> > The VT-d bug 730441 concerning the "nomsi NIC" has existed for two weeks.
> > We found another two issues: one about "Save/Restore" , the other about 
> > "Live Migration".
> >
> > New issues:
> > 1. [KVM-User] I/O errors after "Save/Restore"
> >  https://bugs.launchpad.net/qemu/+bug/739088.
>
> Probably the same as the next issue.
>
> > 2. [KVM-User] guest hangs when using network after live migration
> >   https://bugs.launchpad.net/qemu/+bug/739092
>
> Confirmed, autotest Fedora.32.migrate.tmp also fails. Looks like the
> recent qemu.git merge. I am bisecting to find out more.
>

82fa39b75181b730d6d4d09f443bd26bcfcd045c is the first bad commit
commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
Author: Juan Quintela 
Date: Thu Mar 10 12:33:49 2011 +0100

vmstate: Fix varrays with uint8 indexes

Signed-off-by: Juan Quintela 
Signed-off-by: Anthony Liguori 


Juan, any insight?

-- 
error compiling committee.c: too many arguments to function



Re: Biweekly KVM Test report, kernel a72e315c... qemu b73357ec...

2011-03-21 Thread Avi Kivity
On 03/21/2011 06:26 PM, Avi Kivity wrote:
> On 03/21/2011 02:35 PM, Avi Kivity wrote:
> > On 03/21/2011 04:47 AM, Ren, Yongjie wrote:
> > > Hi all,
> > > This is KVM test result against kvm.git 
> > > a72e315c509376bbd1e121219c3ad9f23973923f based on kernel 2.6.38-rc6+, and 
> > > qemu-kvm.git b73357ecd2b14c057134cb71d29447b5b988c516.
> > >
> > > The VT-d bug 730441 concerning the "nomsi NIC" has existed for two weeks.
> > > We found another two issues: one about "Save/Restore" , the other about 
> > > "Live Migration".
> > >
> > > New issues:
> > > 1. [KVM-User] I/O errors after "Save/Restore"
> > >  https://bugs.launchpad.net/qemu/+bug/739088.
> >
> > Probably the same as the next issue.
> >
> > > 2. [KVM-User] guest hangs when using network after live migration
> > >   https://bugs.launchpad.net/qemu/+bug/739092
> >
> > Confirmed, autotest Fedora.32.migrate.tmp also fails. Looks like the
> > recent qemu.git merge. I am bisecting to find out more.
> >
>
> 82fa39b75181b730d6d4d09f443bd26bcfcd045c is the first bad commit
> commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
> Author: Juan Quintela 
> Date: Thu Mar 10 12:33:49 2011 +0100
>
> vmstate: Fix varrays with uint8 indexes

With this reverted, qemu-kvm.git master appears to work.

-- 
error compiling committee.c: too many arguments to function



Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Eric Dumazet
On Monday, 21 March 2011 at 19:02 +0200, Avi Kivity wrote:

> Any ideas on how to fix it?  We could pre-allocate IDs and batch them in 
> per-cpu caches, but it seems like a lot of work.
> 

Hmm, I dont know what syscalls kvm do, but even a timer_gettime() has to
lock this idr_lock.

Sounds crazy, and should be fixed in kernel code, not kvm ;)





Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Avi Kivity

On 03/21/2011 06:54 PM, Eric Dumazet wrote:

On Monday, 21 March 2011 at 22:01 +0545, Ben Nagy wrote:
>  >>  On Mon, Mar 21, 2011 at 7:38 PM, Avi Kivity   wrote:
>  >>  >   In the future, please post the binary perf.dat.
>  >>
>  >>  Hi Avi,
>  >>
>  >>  How do I do that?
>  >
>  >  'make nconfig' and go to the kernel hacking section.
>
>  Imprecise question sorry, I meant how do I get the perf.dat not how do
>  I disable the debugging.
>
>  On the non-debug kernel: Linux eax 2.6.38-7-server #36-Ubuntu SMP Fri
>  Mar 18 23:36:13 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> 150512.00 67.8% __ticket_spin_lock
>  [kernel.kallsyms]
>  11126.00  5.0% memcmp_pages
>  [kernel.kallsyms]
>   9563.00  4.3% native_safe_halt
>  [kernel.kallsyms]
>   8965.00  4.0% svm_vcpu_run
>  /lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm-amd.ko
>   6489.00  2.9% tg_load_down
>  [kernel.kallsyms]
>   4676.00  2.1% kvm_get_cs_db_l_bits
>  /lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm.ko
>   1931.00  0.9% load_balance_fair
>  [kernel.kallsyms]
>   1917.00  0.9% ktime_get
>  [kernel.kallsyms]
>   1624.00  0.7% walk_tg_tree.clone.129
>  [kernel.kallsyms]
>   1542.00  0.7% find_busiest_group
>  [kernel.kallsyms]
>   1326.00  0.6% find_next_bit
>  [kernel.kallsyms]
>673.00  0.3% lock_hrtimer_base.clone.25
>  [kernel.kallsyms]
>624.00  0.3% copy_user_generic_string[kernel.kallsyms]
>
>  top now says:
>
>  top - 00:11:35 up 22 min,  4 users,  load average: 0.11, 6.15, 7.78
>  Tasks: 491 total,   3 running, 488 sleeping,   0 stopped,   0 zombie
>  Cpu(s):  0.9%us, 15.4%sy,  0.0%ni, 83.7%id,  0.0%wa,  0.0%hi,  0.0%si,  
0.0%st
>  Mem:  99068660k total, 70831760k used, 28236900k free,10036k buffers
>  Swap:  2438140k total,  2173652k used,   264488k free,  3396144k cached
>
>  With average 'resting cpu' per idle guest 8%, 96 guests running.
>
>  Is this as good as I am going to get? It seems like I can't really
>  debug that lock contention without blowing stuff up because of the
>  load of the lock debugging...
>
>  Don't know if I mentioned this before, but those guests are each
>  pinned to a cpu (cpu guestnum%48) with taskset.
>
>  Cheers,

It seems you hit idr_lock contention (used in kernel/posix-timers.c)



Any ideas on how to fix it?  We could pre-allocate IDs and batch them in 
per-cpu caches, but it seems like a lot of work.


--
error compiling committee.c: too many arguments to function



Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Eric Dumazet
On Monday, 21 March 2011 at 22:01 +0545, Ben Nagy wrote:
> >> On Mon, Mar 21, 2011 at 7:38 PM, Avi Kivity  wrote:
> >> >  In the future, please post the binary perf.dat.
> >>
> >> Hi Avi,
> >>
> >> How do I do that?
> >
> > 'make nconfig' and go to the kernel hacking section.
> 
> Imprecise question sorry, I meant how do I get the perf.dat not how do
> I disable the debugging.
> 
> On the non-debug kernel: Linux eax 2.6.38-7-server #36-Ubuntu SMP Fri
> Mar 18 23:36:13 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
> 
>150512.00 67.8% __ticket_spin_lock
> [kernel.kallsyms]
> 11126.00  5.0% memcmp_pages
> [kernel.kallsyms]
>  9563.00  4.3% native_safe_halt
> [kernel.kallsyms]
>  8965.00  4.0% svm_vcpu_run
> /lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm-amd.ko
>  6489.00  2.9% tg_load_down
> [kernel.kallsyms]
>  4676.00  2.1% kvm_get_cs_db_l_bits
> /lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm.ko
>  1931.00  0.9% load_balance_fair
> [kernel.kallsyms]
>  1917.00  0.9% ktime_get
> [kernel.kallsyms]
>  1624.00  0.7% walk_tg_tree.clone.129
> [kernel.kallsyms]
>  1542.00  0.7% find_busiest_group
> [kernel.kallsyms]
>  1326.00  0.6% find_next_bit
> [kernel.kallsyms]
>   673.00  0.3% lock_hrtimer_base.clone.25
> [kernel.kallsyms]
>   624.00  0.3% copy_user_generic_string[kernel.kallsyms]
> 
> top now says:
> 
> top - 00:11:35 up 22 min,  4 users,  load average: 0.11, 6.15, 7.78
> Tasks: 491 total,   3 running, 488 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.9%us, 15.4%sy,  0.0%ni, 83.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  99068660k total, 70831760k used, 28236900k free,10036k buffers
> Swap:  2438140k total,  2173652k used,   264488k free,  3396144k cached
> 
> With average 'resting cpu' per idle guest 8%, 96 guests running.
> 
> Is this as good as I am going to get? It seems like I can't really
> debug that lock contention without blowing stuff up because of the
> load of the lock debugging...
> 
> Don't know if I mentioned this before, but those guests are each
> pinned to a cpu (cpu guestnum%48) with taskset.
> 
> Cheers,

It seems you hit idr_lock contention (used in kernel/posix-timers.c)





Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Avi Kivity

On 03/21/2011 06:16 PM, Ben Nagy wrote:

>>  On Mon, Mar 21, 2011 at 7:38 PM, Avi Kivitywrote:
>>  >In the future, please post the binary perf.dat.
>>
>>  Hi Avi,
>>
>>  How do I do that?
>
>  'make nconfig' and go to the kernel hacking section.

Imprecise question sorry, I meant how do I get the perf.dat not how do
I disable the debugging.



'perf record' dumps perf.dat in the current directory.


On the non-debug kernel: Linux eax 2.6.38-7-server #36-Ubuntu SMP Fri
Mar 18 23:36:13 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

150512.00 67.8% __ticket_spin_lock
[kernel.kallsyms]
 11126.00  5.0% memcmp_pages
[kernel.kallsyms]
  9563.00  4.3% native_safe_halt
[kernel.kallsyms]
  8965.00  4.0% svm_vcpu_run
/lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm-amd.ko
  6489.00  2.9% tg_load_down
[kernel.kallsyms]
  4676.00  2.1% kvm_get_cs_db_l_bits
/lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm.ko
  1931.00  0.9% load_balance_fair
[kernel.kallsyms]
  1917.00  0.9% ktime_get
[kernel.kallsyms]
  1624.00  0.7% walk_tg_tree.clone.129
[kernel.kallsyms]
  1542.00  0.7% find_busiest_group
[kernel.kallsyms]
  1326.00  0.6% find_next_bit
[kernel.kallsyms]
   673.00  0.3% lock_hrtimer_base.clone.25
[kernel.kallsyms]
   624.00  0.3% copy_user_generic_string[kernel.kallsyms]

top now says:

top - 00:11:35 up 22 min,  4 users,  load average: 0.11, 6.15, 7.78
Tasks: 491 total,   3 running, 488 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us, 15.4%sy,  0.0%ni, 83.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  99068660k total, 70831760k used, 28236900k free,10036k buffers
Swap:  2438140k total,  2173652k used,   264488k free,  3396144k cached

With average 'resting cpu' per idle guest 8%, 96 guests running.

Is this as good as I am going to get? It seems like I can't really
debug that lock contention without blowing stuff up because of the
load of the lock debugging...



Post perf.dat for the non-debug kernel, and we'll see.  You're probably 
hitting some Linux scaling limit, and these are continuously expanded as 
big hardware becomes more common.


--
error compiling committee.c: too many arguments to function



Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Ben Nagy
>> On Mon, Mar 21, 2011 at 7:38 PM, Avi Kivity  wrote:
>> >  In the future, please post the binary perf.dat.
>>
>> Hi Avi,
>>
>> How do I do that?
>
> 'make nconfig' and go to the kernel hacking section.

Imprecise question sorry, I meant how do I get the perf.dat not how do
I disable the debugging.

On the non-debug kernel: Linux eax 2.6.38-7-server #36-Ubuntu SMP Fri
Mar 18 23:36:13 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

   150512.00 67.8% __ticket_spin_lock
[kernel.kallsyms]
11126.00  5.0% memcmp_pages
[kernel.kallsyms]
 9563.00  4.3% native_safe_halt
[kernel.kallsyms]
 8965.00  4.0% svm_vcpu_run
/lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm-amd.ko
 6489.00  2.9% tg_load_down
[kernel.kallsyms]
 4676.00  2.1% kvm_get_cs_db_l_bits
/lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm.ko
 1931.00  0.9% load_balance_fair
[kernel.kallsyms]
 1917.00  0.9% ktime_get
[kernel.kallsyms]
 1624.00  0.7% walk_tg_tree.clone.129
[kernel.kallsyms]
 1542.00  0.7% find_busiest_group
[kernel.kallsyms]
 1326.00  0.6% find_next_bit
[kernel.kallsyms]
  673.00  0.3% lock_hrtimer_base.clone.25
[kernel.kallsyms]
  624.00  0.3% copy_user_generic_string[kernel.kallsyms]

top now says:

top - 00:11:35 up 22 min,  4 users,  load average: 0.11, 6.15, 7.78
Tasks: 491 total,   3 running, 488 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us, 15.4%sy,  0.0%ni, 83.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  99068660k total, 70831760k used, 28236900k free,10036k buffers
Swap:  2438140k total,  2173652k used,   264488k free,  3396144k cached

With average 'resting cpu' per idle guest 8%, 96 guests running.

Is this as good as I am going to get? It seems like I can't really
debug that lock contention without blowing stuff up because of the
load of the lock debugging...

Don't know if I mentioned this before, but those guests are each
pinned to a cpu (cpu guestnum%48) with taskset.

Cheers,

ben


Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Avi Kivity

(restored kvm@vger)


On 03/21/2011 05:44 PM, Ben Nagy wrote:

On Mon, Mar 21, 2011 at 7:38 PM, Avi Kivity  wrote:
>  In the future, please post the binary perf.dat.

Hi Avi,

Thanks for the help!

How do I do that?

I'll try again on the non lock debug kernel and let you know.



'make nconfig' and go to the kernel hacking section.

--
error compiling committee.c: too many arguments to function



[PATCH] This patch adds a migration test (tmigrate). It also daemonizes the virtio_console_guest.py script, which is necessary for the migration test.

2011-03-21 Thread Lukas Doktor
From: Jiří Župka 

[virtio_console_guest.py]
It is split into 2 parts; the first one is the client and the second one is the daemon.
* The daemon part is forked and works like the previous script, except that it
redirects STD{IN,OUT,ERR} into named pipes.
* The client part connects to the named pipes and handles the communication with
the host OS. When the client is executed multiple times, it finds the previous
client and destroys it, then opens the pipes and takes over the communication. The
daemon is NOT executed twice.

[tmigrate]
This test transfers and checks data through the virtio_console loopback while
migrating the guest. It's split into 2 tests: tmigrate_online and
tmigrate_offline.
* tmigrate_{offline,online} - parses test parameters and executes the actual 
test __tmigrate()
* test parameters: virtio_console_migration_{offline,online} = 
"[{serialport,console}]:$no_migrations:send_buf_len:recv_buf_len:loopback_buf_len;..."
  ; multiple tests are allowed and separated by ';'

It sends the data from the first {console,serialport} and receives it on all
remaining ports of the same type. By default it adds 2 {console,serialport}s
depending on the test parameters, but you can override this using
no_{consoles,serialports}, or the number can be higher according to other tests.

[BUGs]
Tested on: HOST=F13 (2.6.36-0.20.rc3.git4.fc15.x86_64), GUEST=F14 
(2.6.35.6-45.fc14.x86_64)
The developer was notified simultaneously with this patch for an explanation.
1) The console very often won't survive the migration (it stops transferring data).
virtio_console_migration_offline = "console:5:2048:2048:2048"
2a) offline migration: drops 1-2 send-buffers of data, which is for now 
accepted as success.
virtio_console_migration_offline = "serialport:5:2048:2048:2048"
2b) Using smp2 the data loss is 1-4 send-buffers of data.
only smp2
virtio_console_migration_offline = "serialport:5:2048:2048:2048"
3) online migration: similar problem as offline, but the data loss is much
higher and depends on the console throughput. In this version a very high value
is accepted as a pass. You should check the message "Maximal loss was %s per one
migration"; it should be orders of magnitude lower than the transferred data.
virtio_console_migration_online = "serialport:5:2048:2048:2048"
4) ThSendCheck sometimes gets stuck in send. The test handles this; the problem
is similar to the "FIXME: workaround the problem with qemu-kvm stall when too
much data is sent without receiving". With migration this occurs with even
smaller slices.
virtio_console_migration_offline = "serialport:5:4096:1:1"

Signed-off-by: Jiří Župka 
Signed-off-by: Lukas Doktor 
---
 client/tests/kvm/scripts/virtio_console_guest.py |  231 +++--
 client/tests/kvm/tests/virtio_console.py |  385 +++---
 client/tests/kvm/tests_base.cfg.sample   |   14 +-
 3 files changed, 545 insertions(+), 85 deletions(-)

diff --git a/client/tests/kvm/scripts/virtio_console_guest.py b/client/tests/kvm/scripts/virtio_console_guest.py
index 8bf5f8b..24c368c 100755
--- a/client/tests/kvm/scripts/virtio_console_guest.py
+++ b/client/tests/kvm/scripts/virtio_console_guest.py
@@ -9,8 +9,8 @@ Auxiliary script used to send data between ports on guests.
 """
 import threading
 from threading import Thread
-import os, select, re, random, sys, array
-import fcntl, traceback, signal
+import os, select, re, random, sys, array, stat
+import fcntl, traceback, signal, time
 
 DEBUGPATH = "/sys/kernel/debug"
 SYSFSPATH = "/sys/class/virtio-ports/"
@@ -703,7 +703,6 @@ def compile():
 def guest_exit():
 global exiting
 exiting = True
-os.kill(os.getpid(), signal.SIGUSR1)
 
 
 def worker(virt):
@@ -711,48 +710,220 @@ def worker(virt):
 Worker thread (infinite) loop of virtio_guest.
 """
 global exiting
-print "PASS: Start"
-
+print "PASS: Daemon start."
+p = select.poll()
+p.register(sys.stdin.fileno())
 while not exiting:
-str = raw_input()
-try:
-exec str
-except:
-exc_type, exc_value, exc_traceback = sys.exc_info()
-print "On Guest exception from: \n" + "".join(
-traceback.format_exception(exc_type,
-   exc_value,
-   exc_traceback))
-print "FAIL: Guest command exception."
+d = p.poll()
+if (d[0][1] == select.POLLIN):
+str = raw_input()
+try:
+exec str
+except:
+exc_type, exc_value, exc_traceback = sys.exc_info()
+print "On Guest exception from: \n" + "".join(
+traceback.format_exception(exc_type,
+   exc_value,
+   exc_traceback))
+print "FAIL: Guest command exception."
+elif (d[0][1] & select.POLLHUP):
+ti

[PATCH][KVM-Autotest] virtio-console: Offline and online migration test

2011-03-21 Thread Lukas Doktor

It adds the online and offline migration tests for virtio_console. It uses the
loopback inside the guest. The host sends the data and checks its validity
while the guest is migrated (on localhost, multiple times).

You can find more info in the patch itself.

Regards,
Lukáš Doktor


Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Avi Kivity

On 03/21/2011 03:41 PM, Ben Nagy wrote:

perf report output (huge, sorry)
http://paste.ubuntu.com/583311/



In the future, please post the binary perf.dat.

Anyway, I see

39.87%  kvm  [kernel.kallsyms]   [k] delay_tsc
|
--- delay_tsc
   |
   |--98.20%-- __delay
   |  do_raw_spin_lock
   |  |
   |  |--90.51%-- _raw_spin_lock_irqsave
   |  |  |
   |  |  |--99.78%-- __lock_timer
   |  |  |  |
   |  |  |  |--62.85%-- sys_timer_gettime
   |  |  |  |  system_call_fastpath
   |  |  |  |  |
   |  |  |  |  |--1.59%-- 0x7ff6c20d26fc
   |  |  |  |  |
   |  |  |  |  |--1.55%-- 0x7f16d89866fc


Please disable all spinlock debugging and try again.



--
error compiling committee.c: too many arguments to function



Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Ben Nagy
On Mon, Mar 21, 2011 at 5:28 PM, Ben Nagy  wrote:

> Bit of a status update:
[...]
> Since the underlying configs have changed so much, I will set up, run
> all the tests again and then followup if things are still locky.

Which they are.

I will need to get the git sources in order to build with
disable-strip as Avi suggested, which I can start doing tomorrow, but
in the meantime, I brought up 96 guests.

Command:
00:02:47 /usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 1024 -smp
1,sockets=1,cores=1,threads=1 -name fbn-95 -uuid
2c7bfda3-525f-9a4f-901c-7e139dfc236f -nographic -nodefconfig
-nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/fbn-95.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=readline -rtc base=localtime
-boot cd -drive
file=/mnt/big/bigfiles/kvm_disks/eax/fbn-95.ovl,if=none,id=drive-virtio-disk0,boot=on,format=qed
-device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
-drive if=none,media=cdrom,id=drive-ide0-0-1,readonly=on,format=raw
-device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1
-netdev tap,fd=16,id=hostnet0 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:01:41:1c,bus=pci.0,addr=0x3
-vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
-snapshot

perf report output (huge, sorry)
http://paste.ubuntu.com/583311/

perf top output
http://paste.ubuntu.com/583313/

lock_stat contents
http://paste.ubuntu.com/583314/

The guests are xpsp3, ACPI Uniprocessor HAL, no /usepmtimer switch
anymore (standard boot.ini), fully updated, with version 1.1.16 of the
redhat virtio drivers for NIC and block.

Major changes I made:
- disabled usb
- changed from cirrus to std-vga
- changed .ovl format to qed
- changed backing disk format to raw

The machines now boot much MUCH faster, I can bring up 96 machines in
a few minutes instead of hours, but the system is at around 43% system
time, doing nothing, and each guest reports ~25% CPU from top.

If anybody can see anything from the stripped perf report output that
would be great, and if anyone has any other suggestions as to things I
can try to debug, I'm all ears. :) My existing plans for further
investigation are to build with --disable-strip and retest, and also to
test with similarly configured linux guests.

Thanks a lot,

ben


KVM call agenda for March 21st

2011-03-21 Thread Juan Quintela

Please send in any agenda items you are interested in covering.

- Merge patch speed.  I just "feel" that patches are not being
  handled fast enough, so I looked at how many patches have been
  integrated since March 1st:

(master)$ g log --pretty=fuller ed02c54d1fd5d35c06e62e3dae84ee5bb90097ef.. | grep "^Commit: " | wc -l
121

121 patches total.

Now, I looked at how many patches Anthony committed himself:

(master)$ g log --pretty=fuller ed02c54d1fd5d35c06e62e3dae84ee5bb90097ef.. | grep "^Commit: " | grep Anthony | wc -l
13
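The committer breakdown can also be computed in one pass with git's `%cn` pretty format instead of grepping the fuller output. A self-contained sketch on a throwaway repository (in the thread the log range would be `ed02c54d1fd5d35c06e62e3dae84ee5bb90097ef..` on qemu.git, not the full history; names and paths here are illustrative):

```shell
set -e
# Build a tiny throwaway repository so the pipeline can be demonstrated
# without cloning qemu.git.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=Anthony -c user.email=a@example.org commit -q --allow-empty -m c1
git -c user.name=Juan    -c user.email=j@example.org commit -q --allow-empty -m c2
git -c user.name=Juan    -c user.email=j@example.org commit -q --allow-empty -m c3
# Commits per committer, busiest first; %cn prints the committer name,
# matching what the "^Commit: " grep above counts.
git log --pretty='%cn' | sort | uniq -c | sort -rn
```

On the sample repository this lists Juan with 2 commits and Anthony with 1; on qemu.git the same pipeline gives the full per-committer table at once.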

This feels pretty bad, taking into account that I have almost 80 patches
on the list waiting for action.  As it "appears" that more people are
also waiting for Anthony to merge things, I think we have a problem
here.

And no, I don't want to make Anthony feel bad, but I know that the kind
of patches that I send are for subsystems that only he merges.  I think
that we really have a problem integrating patches; it just takes too
long.

Thanks, Juan.


Re: Biweekly KVM Test report, kernel a72e315c... qemu b73357ec...

2011-03-21 Thread Avi Kivity
On 03/21/2011 04:47 AM, Ren, Yongjie wrote:
> Hi all,
> This is KVM test result against kvm.git 
> a72e315c509376bbd1e121219c3ad9f23973923f based on kernel 2.6.38-rc6+, and 
> qemu-kvm.git b73357ecd2b14c057134cb71d29447b5b988c516.
>
> The VT-d bug 730441 concerning "nomsi NIC" has existed for two weeks.
> We found another two issues: one about "Save/Restore" , the other about "Live 
> Migration".
>
> New issues:
> 1. [KVM-User] I/O errors after "Save/Restore"
>  https://bugs.launchpad.net/qemu/+bug/739088.

Probably the same as the next issue.

> 2. [KVM-User] guest hangs when using network after live migration
>   https://bugs.launchpad.net/qemu/+bug/739092

Confirmed, autotest Fedora.32.migrate.tmp also fails. Looks like the
recent qemu.git merge. I am bisecting to find out more.

-- 
error compiling committee.c: too many arguments to function



Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Ben Nagy
On Mon, Mar 21, 2011 at 3:35 PM, Avi Kivity  wrote:
> It's actually not kvm code, but timer code.  Please build qemu with
> ./configure --disable-strip and redo.

Bit of a status update:

Removing the usb tablet device helped a LOT - raised the number of
machines we could bring up before the lock % got too high.
Removing the -no-acpi option helped quite a lot - lowered the
delay_tsc calls to a 'normal' number.

I am currently building a new XP template guest with the ACPI HAL,
using /usepmtimer, and disabling AMD PowerNow in the host BIOS, but
it's a lot of changes so it's taking a while to roll out and test. On
the IO side we moved to .raw disk images backing the files, and I
installed the latest virtio NIC and block drivers in the XP guest.
hdparm -t in the XP guest (running off the .raw over the iscsi) gets
437MB/s read, but I haven't tested it yet under contention. We did
verify (again) that the performance issues remain more or less the
same off the local SSD as off the infiniband / iSCSI.

Would anyone care to give me any tips regarding the best way to share
a single template disk for many guests based off that template?
Currently we plan to create a .raw template, keep it on a separate
physical device on the SAN and build qed files using that file as the
backing, then run the guests off those in snapshot mode (it's fine
that the changes get discarded, for our app).
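That template-plus-overlay layout can be expressed with qemu-img directly. A sketch, shown with qcow2 syntax (the qed invocation from the thread is the same with `-f qed`; `-F raw` names the backing format explicitly, which newer qemu-img versions require; sizes and file names are illustrative):

```shell
set -e
# Degrade gracefully where qemu-img is not installed.
command -v qemu-img >/dev/null || { echo "qemu-img not installed"; exit 0; }
work=$(mktemp -d)
cd "$work"
# One shared read-only template (on the SAN, in the setup described above)...
qemu-img create -f raw template.raw 64M
# ...and one thin per-guest overlay that records only that guest's writes.
# Combined with -snapshot on the kvm command line, even those writes are
# discarded when the guest exits.
qemu-img create -f qcow2 -F raw -b template.raw fbn-95.ovl
qemu-img info fbn-95.ovl
```

The info output should show template.raw as the backing file; every guest then points its -drive at its own overlay while the template is only ever read.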

Since the underlying configs have changed so much, I will set up, run
all the tests again and then followup if things are still locky.

Many thanks,

ben


Re: openbsd system_powerdown: "KVM internal error. Suberror: 1"

2011-03-21 Thread Gleb Natapov
On Mon, Mar 21, 2011 at 12:47:39PM +0200, Avi Kivity wrote:
> On 03/21/2011 12:41 PM, Gleb Natapov wrote:
> >On Mon, Mar 21, 2011 at 12:28:56PM +0200, Avi Kivity wrote:
> >>  >(This all is not to say I wont help resolving these issues -
> >>  >quite the opposite, I'm willing to help, but I think my help
> >>  >in a form of broken phone isn't of much value :)
> >>
> >>  Yes, it's better if we can reproduce this, and I understand from
> >>  Gleb's message that we can.
> >>
> >https://bugzilla.redhat.com/show_bug.cgi?id=508801#c34
> >
> 
> Well, that says it's fixed, no?
> 
No. It says it is fixed for everything but the SCI interrupt. See comment 32,
which the link above points to.

--
Gleb.


Re: openbsd system_powerdown: "KVM internal error. Suberror: 1"

2011-03-21 Thread Avi Kivity

On 03/21/2011 12:41 PM, Gleb Natapov wrote:

> On Mon, Mar 21, 2011 at 12:28:56PM +0200, Avi Kivity wrote:
> >  >(This all is not to say I wont help resolving these issues -
> >  >quite the opposite, I'm willing to help, but I think my help
> >  >in a form of broken phone isn't of much value :)
> >
> >  Yes, it's better if we can reproduce this, and I understand from
> >  Gleb's message that we can.
> >
> https://bugzilla.redhat.com/show_bug.cgi?id=508801#c34



Well, that says it's fixed, no?

--
error compiling committee.c: too many arguments to function



Re: openbsd system_powerdown: "KVM internal error. Suberror: 1"

2011-03-21 Thread Gleb Natapov
On Mon, Mar 21, 2011 at 12:28:56PM +0200, Avi Kivity wrote:
> >(This all is not to say I wont help resolving these issues -
> >quite the opposite, I'm willing to help, but I think my help
> >in a form of broken phone isn't of much value :)
> 
> Yes, it's better if we can reproduce this, and I understand from
> Gleb's message that we can.
> 
https://bugzilla.redhat.com/show_bug.cgi?id=508801#c34

--
Gleb.


Re: openbsd system_powerdown: "KVM internal error. Suberror: 1"

2011-03-21 Thread Avi Kivity

On 03/21/2011 12:12 PM, Michael Tokarev wrote:

> 21.03.2011 12:43, Avi Kivity wrote:
> >  On 03/17/2011 10:18 PM, Michael Tokarev wrote:
> [...]
> I'm not sure how important OpenBSD support for kvm is, and if
> it's something which better be done in OpenBSD itself instead
> of kvm.


There's no such thing as OpenBSD support.  We emulate a PC, and OpenBSD 
runs on a PC.  If it doesn't work well, there's a bug in one or the other.


It's true that the bug has relatively low priority if it's just OpenBSD 
that triggers it.



> (This all is not to say I wont help resolving these issues -
> quite the opposite, I'm willing to help, but I think my help
> in a form of broken phone isn't of much value :)


Yes, it's better if we can reproduce this, and I understand from Gleb's 
message that we can.


--
error compiling committee.c: too many arguments to function


Re: openbsd system_powerdown: "KVM internal error. Suberror: 1"

2011-03-21 Thread Michael Tokarev
21.03.2011 12:43, Avi Kivity wrote:
> On 03/17/2011 10:18 PM, Michael Tokarev wrote:

[]
>> 47965.428791: kvm_exit: reason npf rip 0xd020203a
>> 47965.428791: kvm_page_fault: address bfff0 error_code 4
>> 47965.428792: kvm_emulate_insn: 0:d020203a: 5a (prot32)
>> 47965.428792: kvm_mmio: mmio unsatisfied-read len 4 gpa 0xbfff0 val 0x0
>> 47965.428793: kvm_mmio: mmio read len 4 gpa 0xbfff0 val 0xb100
>> 47965.428794: kvm_entry: vcpu 0
>> 47965.428795: kvm_exit: reason npf rip 0xd020203b
>> 47965.428795: kvm_page_fault: address bfff4 error_code 4
>> 47965.428795: kvm_emulate_insn: 0:d020203b: 59 (prot32)
>> 47965.428796: kvm_mmio: mmio unsatisfied-read len 4 gpa 0xbfff4 val 0x0
>> 47965.428797: kvm_mmio: mmio read len 4 gpa 0xbfff4 val 0x0
>> 47965.428797: kvm_entry: vcpu 0
>> 47965.428798: kvm_exit: reason npf rip 0xd020203c
>> 47965.428798: kvm_page_fault: address bfff8 error_code 4
>> 47965.428799: kvm_emulate_insn: 0:d020203c: 58 (prot32)
> 
> That's a POP instruction.  So openbsd mapped the stack into the
> framebuffer, and kvm has to emulate everything.
> 
> Please post a complete binary trace from bootup until the
> host_state_reload issue appears.

http://95.84.243.119:8000/tmp/kvm-obsd/ -- that's the whole
thing.

There, trace.dat.gz and trace.txt.gz are the complete traces
from trace-cmd from the beginning to the login prompt.

I don't think it's easy to catch the place where host_state_reloads
starts increasing - it happens during boot, before userspace takes
control, while still in the kernel (during this time the text on the
screen - the bootup progress - is on a strange background color). The
difficulty is that during that time there is a lot of other activity on
the kvm side - a large number of exits and emulations, for example.

Also, obsd4.8-32bit.qcow2.xz is the disk image of openbsd install
which I used for all these tests - it's 112 Mb compressed, about
600Mb uncompressed, 2Gb virtual size.  This is here in order for
me to stop acting as a broken phone (but I can continue doing so
just fine - I just think it's a bit less productive this way :)

I've shown this file to Gleb Natapov (Cc'd) before too (he tried
to debug the "insane amount of host_state_reload" issue). This is
a default openbsd install from their current installation cdrom,
so anyone can create their own disk image too, obviously.

I run it just like "kvm -hda obsd4.8-32bit.qcow2 -snapshot -net none"
(and it can use an rtl8139 NIC).  Root password is "12", but there's no
need to log in since all the problems happen before login.

In order to trigger the error in $subject, wait till it's idle
(which happens right after the "login:" prompt) and send the
"system_powerdown" command (I use -monitor stdio) - in about
5 seconds from there it'll error out.  In order to catch this
error it's better to use kvm 0.12 (it works with current seabios
under freebsd since graphics mode isn't used) -- current 0.14
behaves badly after "KVM internal error" (which needs to be
improved a bit too, I think).

>> 47965.428799: kvm_mmio: mmio unsatisfied-read len 4 gpa 0xbfff8 val 0x0
>> 47965.428801: kvm_mmio: mmio read len 4 gpa 0xbfff8 val 0x30
>> 47965.428801: kvm_entry: vcpu 0
>> 47965.428802: kvm_exit: reason vintr rip 0xd0202041
>> 47965.428802: kvm_inj_virq: irq 81
>> 47965.428802: kvm_inj_virq: irq 81
>> 47965.428803: kvm_entry: vcpu 0
>> 47965.428803: kvm_exit: reason npf rip 0xd0202041
>> 47965.428804: kvm_page_fault: address bfffc error_code 6
>> 47965.428804: kvm_emulate_insn: 0:d0202041: cf (prot32)
>> 47965.428805: kvm_emulate_insn: 0:d0202041: cf (prot32) failed
> 
> We don't emulate IRET-with-mmio-stack.

Note that this whole story - two issues with OpenBSD - is
purely my "luck" - I don't use openbsd, never used it before, and
don't actually plan to use it in the near future.  We debugged an
unrelated problem (a bug in the linux nfs server) and tried to
perform various interoperability tests (installing various
operating systems in kvm), and found out that OpenBSD behaves
somewhat.. unexpectedly in kvm.  So I went on and performed
a few tests locally (installing OpenBSD for the first time
ever), which resulted in my 2 "bugreports".

I'm not sure how important OpenBSD support for kvm is, and if
it's something which better be done in OpenBSD itself instead
of kvm.

(This all is not to say I wont help resolving these issues -
quite the opposite, I'm willing to help, but I think my help
in a form of broken phone isn't of much value :)

Thanks!

/mjt


Re: openbsd system_powerdown: "KVM internal error. Suberror: 1"

2011-03-21 Thread Gleb Natapov
On Mon, Mar 21, 2011 at 11:43:32AM +0200, Avi Kivity wrote:
> >47965.428797: kvm_mmio: mmio read len 4 gpa 0xbfff4 val 0x0
> >47965.428797: kvm_entry: vcpu 0
> >47965.428798: kvm_exit: reason npf rip 0xd020203c
> >47965.428798: kvm_page_fault: address bfff8 error_code 4
> >47965.428799: kvm_emulate_insn: 0:d020203c: 58 (prot32)
> 
> That's a POP instruction.  So openbsd mapped the stack into the
> framebuffer, and kvm has to emulate everything.
> 
IIRC openbsd has always had this problem with powerdown. Last time
I looked at it I found that when openbsd receives an ACPI interrupt it
enters some kind of interrupt injection loop where the stack grows a
little bit with each received interrupt. When the stack starts to
overlap with the frame buffer, the emulation error happens. Maybe
something is wrong with our MP tables, but I couldn't figure out what.

> Please post a complete binary trace from bootup until the
> host_state_reload issue appears.
> 
> >47965.428799: kvm_mmio: mmio unsatisfied-read len 4 gpa 0xbfff8 val 0x0
> >47965.428801: kvm_mmio: mmio read len 4 gpa 0xbfff8 val 0x30
> >47965.428801: kvm_entry: vcpu 0
> >47965.428802: kvm_exit: reason vintr rip 0xd0202041
> >47965.428802: kvm_inj_virq: irq 81
> >47965.428802: kvm_inj_virq: irq 81
> >47965.428803: kvm_entry: vcpu 0
> >47965.428803: kvm_exit: reason npf rip 0xd0202041
> >47965.428804: kvm_page_fault: address bfffc error_code 6
> >47965.428804: kvm_emulate_insn: 0:d0202041: cf (prot32)
> >47965.428805: kvm_emulate_insn: 0:d0202041: cf (prot32) failed
> 
> We don't emulate IRET-with-mmio-stack.
> 

--
Gleb.


Re: KVM lock contention on 48 core AMD machine

2011-03-21 Thread Avi Kivity

On 03/19/2011 06:45 AM, Ben Nagy wrote:

> On Fri, Mar 18, 2011 at 7:30 PM, Joerg Roedel  wrote:
> >  Hi Ben,
> >
> >  Can you try to run
> >
> >  # perf record -a -g
> >
> >  for a while when your VMs are up and unresponsive?
>
> Hi Joerg,
>
> Thanks for the response. Here's that output:
>
> http://paste.ubuntu.com/582345/
>
> Looks like it's spending all its time in kvm code, but there are some
> symbols that are unresolved. Is the report still useful?



It's actually not kvm code, but timer code.  Please build qemu with 
./configure --disable-strip and redo.


--
error compiling committee.c: too many arguments to function



Re: openbsd system_powerdown: "KVM internal error. Suberror: 1"

2011-03-21 Thread Avi Kivity

On 03/17/2011 10:18 PM, Michael Tokarev wrote:

17.03.2011 20:52, Marcelo Tosatti wrote:
[]
>  iret emulation is only partially implemented. Why is iret faulting
>  in the first place i don't know. Can you enable tracing with
>
>  echo kvm>  /$debugfs/tracing/set_event
>
>  And save the tail of the log, including events at $RIP?

Something like the one below (with the error at the end)?

What do you mean "events at $RIP" ?  I see 2 patterns here
with references to $RIP.  Many like this:

   kvm-0.12.5-1301  [001] 47965.427622: kvm_page_fault: address fee00080 error_code 6
   kvm-0.12.5-1301  [001] 47965.427622: kvm_emulate_insn: 0:d0202002: 89 1d 80 00 99 d0 (prot32)
   kvm-0.12.5-1301  [001] 47965.427623: kvm_mmio: mmio write len 4 gpa 0xfee00080 val 0x30
   kvm-0.12.5-1301  [001] 47965.427623: kvm_apic: apic_write APIC_TASKPRI = 0x30
   kvm-0.12.5-1301  [001] 47965.427624: kvm_entry: vcpu 0
   kvm-0.12.5-1301  [001] 47965.427625: kvm_exit: reason vintr rip 0xd0202041
   kvm-0.12.5-1301  [001] 47965.427625: kvm_inj_virq: irq 81
   kvm-0.12.5-1301  [001] 47965.427625: kvm_inj_virq: irq 81
   kvm-0.12.5-1301  [001] 47965.427626: kvm_entry: vcpu 0
   kvm-0.12.5-1301  [001] 47965.427627: kvm_exit: reason npf rip 0xd02024f1
   kvm-0.12.5-1301  [001] 47965.427627: kvm_page_fault: address fee00080 error_code 4

which are repeated without changes over and over again.
And at the end, several like this:

   kvm-0.12.5-1301  [001] 47965.428634: kvm_entry: vcpu 0
   kvm-0.12.5-1301  [001] 47965.428635: kvm_exit: reason npf rip 0xd020203a
   kvm-0.12.5-1301  [001] 47965.428635: kvm_page_fault: address bfffc error_code 4
   kvm-0.12.5-1301  [001] 47965.428635: kvm_emulate_insn: 0:d020203a: 5a (prot32)
   kvm-0.12.5-1301  [001] 47965.428636: kvm_mmio: mmio unsatisfied-read len 4 gpa 0xbfffc val 0x0
   kvm-0.12.5-1301  [001] 47965.428637: kvm_mmio: mmio read len 4 gpa 0xbfffc val 0xb100
   kvm-0.12.5-1301  [001] 47965.428637: kvm_entry: vcpu 0
   kvm-0.12.5-1301  [001] 47965.428638: kvm_exit: reason vintr rip 0xd0202041
   kvm-0.12.5-1301  [001] 47965.428638: kvm_inj_virq: irq 81
   kvm-0.12.5-1301  [001] 47965.428638: kvm_inj_virq: irq 81
   kvm-0.12.5-1301  [001] 47965.428639: kvm_entry: vcpu 0
   kvm-0.12.5-1301  [001] 47965.428640: kvm_exit: reason npf rip 0xd02024cc
   kvm-0.12.5-1301  [001] 47965.428640: kvm_page_fault: address bfffc error_code 6
   kvm-0.12.5-1301  [001] 47965.428640: kvm_emulate_insn: 0:d02024cc: 6a 03 (prot32)
   kvm-0.12.5-1301  [001] 47965.428641: kvm_mmio: mmio write len 4 gpa 0xbfffc val 0x3

(with different kvm_emulate_insn each time), which finally ends with
   kvm-0.12.5-1301  [001] 47965.428804: kvm_emulate_insn: 0:d0202041: cf (prot32)
   kvm-0.12.5-1301  [001] 47965.428805: kvm_emulate_insn: 0:d0202041: cf (prot32) failed

Note it's the same openbsd which triggers insane amount of
host_state_reloads, so the trace is quite, well, large :)

Thanks!

/mjt

$ kvm-0.12.5 -drive file=obsd.raw,snapshot=on -net none -monitor stdio
QEMU 0.12.5 monitor - type 'help' for more information
(qemu) system_powerdown
(qemu) KVM internal error. Suberror: 1
rax 0030 rbx  rcx  rdx b100
rsi d0201fc6 rdi d0ac1ad4 rsp d438d004 rbp d4492e1c
r8   r9   r10  r11
r12  r13  r14  r15
rip d0202041 rflags 0292
cs 0050 (/ p 1 dpl 0 db 1 s 1 type b l 0 g 1 avl 0)
ds 0010 (/ p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
es 0010 (/ p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0010 (/ p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 0058 (d0ac1aa0/03db p 1 dpl 0 db 0 s 1 type 3 l 0 g 0 avl 0)
gs 0010 (/ p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
tr 0078 (d4491000/0333 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0018 (d0a31580/0087 p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
gdt d42b1000/
idt d0a31620/7ff
cr0 8001003b cr2 8adaa850 cr3 737 cr4 780 cr8 3 efer 0
emulation failure, check dmesg for details

(qemu) x/20i 0xd0202036
0xd0202036:  pop    %edi
0xd0202037:  pop    %esi
0xd0202038:  pop    %ebp
0xd0202039:  pop    %ebx
0xd020203a:  pop    %edx
0xd020203b:  pop    %ecx
0xd020203c:  pop    %eax
0xd020203d:  sti
0xd020203e:  add    $0x8,%esp
0xd0202041:  iret
0xd0202042:  mov    %esi,%esi
0xd0202044:  mov    $0x70,%eax
0xd0202049:  mov    %eax,0xd0990080
0xd020204e:  sti
0xd020204f:  push   $0x2
0xd0202051:  call   0xd0570470
0xd0202056:  add    $0x4,%esp
0xd0202059:  jmp    *%esi
0xd020205b:  nop
0xd020205c:  mov

Re: [PATCH 7/9] KVM: VMX: Refactor vmx_complete_atomic_exit()

2011-03-21 Thread Avi Kivity

On 03/15/2011 11:40 PM, Marcelo Tosatti wrote:

> On Tue, Mar 08, 2011 at 03:57:43PM +0200, Avi Kivity wrote:
> >  Move the exit reason checks to the front of the function, for early
> >  exit in the common case.
> >
> >   static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
> >   {
> >  - u32 exit_intr_info = vmx->exit_intr_info;
> >  + u32 exit_intr_info;
> >  +
> >  + if (!(vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY
> >  +   || vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI))
> >  + return;
> >  +
> >  + exit_intr_info = vmx->exit_intr_info;
>
> So you're saving a vmread on exception and mce_during_vmentry exits?

Well, the real goal is patch 8, which removes the unconditional read of
VM_EXIT_INTR_INFO.  Right now we have 2 reads on exceptions and one on
other exits, we drop both by 1.

> Managing nmi_known_unmasked appears tricky... perhaps worthwhile to
> have it in a helper that sets the bit in interruptibility info +
> nmi_known_unmasked.

I'll look into it, likely as an add-on patch.

--
error compiling committee.c: too many arguments to function
