Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Peter Lieven

On 14.08.2015 at 22:01, Alex Bennée wrote:

Peter Lieven p...@kamp.de writes:


Hi,

some time ago I stumbled across a regression in the KVM module that was
introduced somewhere
between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a 3.12
host kernel and kvm-kmod 3.19,
but I have now found that it is also still present with a 3.19 host kernel and
its included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all
commits to the kvm-related parts.

Now my question is: how do I best bisect only the kvm-related changes (those
that go into kvm-kmod)?

In general I don't bother. As it is a bisection you eliminate half the
commits at a time, so you get there fairly quickly anyway. However, you can
tell bisect which parts of the tree you care about:

   git bisect start -- arch/arm64/kvm include/linux/kvm* 
include/uapi/linux/kvm* virt/kvm/


After some experiments I was able to find the bad commit that introduced
the regression:

commit f30ebc312ca9def25650b4e1d01cdb425c310dca
Author: Radim Krčmář rkrc...@redhat.com
Date:   Thu Oct 30 15:06:47 2014 +0100

It seems that this optimisation does not work reliably after live migration.
I can't reproduce the crash if
I take a 3.19 kernel and revert this single commit.

Peter


Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Peter Lieven


 On 18.08.2015 at 17:25, Radim Krčmář rkrc...@redhat.com wrote:
 
 2015-08-18 16:54+0200, Peter Lieven:
 After some experiments I was able to find the bad commit that introduced
 the regression:
 
 commit f30ebc312ca9def25650b4e1d01cdb425c310dca
 Author: Radim Krčmář rkrc...@redhat.com
 Date:   Thu Oct 30 15:06:47 2014 +0100
 
 It seems that this optimisation does not work reliably after live
 migration. I can't reproduce the crash if
 I take a 3.19 kernel and revert this single commit.
 
 Hello, this bug has gone unnoticed for a long time so it is fixed only
 since v4.1 (and v3.19.stable was dead at that point).

Thanks for the pointer. I noticed the regression some time ago, but never found
the time to debug it. Some distros rely on 3.19, e.g. Ubuntu 14.04.2 LTS. I will
try to ping the maintainer.

Peter

 
 commit b6ac069532218027f2991cba01d7a72a200688b0
 Author: Radim Krčmář rkrc...@redhat.com
 Date:   Fri Jun 5 20:57:41 2015 +0200
 
KVM: x86: fix lapic.timer_mode on restore
 
lapic.timer_mode was not properly initialized after migration, which
broke few useful things, like login, by making every sleep eternal.
 
Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.
 
There are other slowpaths that update lvtt, so this patch makes sure
something similar doesn't happen again by calling apic_update_lvtt
after every modification.
 
Cc: sta...@vger.kernel.org
Fixes: f30ebc312ca9 (KVM: x86: optimize some accesses to LVTT and SPIV)
Signed-off-by: Radim Krčmář rkrc...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
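
[For readers following along, here is a minimal sketch of the shape of that fix,
reconstructed from the commit message above rather than copied from the upstream
patch; helper and field names follow arch/x86/kvm/lapic.c of that era and are
quoted from memory. The idea is that the cached timer mode is re-derived from the
LVTT register in a single helper, and that helper is called from every path that
modifies LVTT, including the post-restore path used by live migration.]

    static void apic_update_lvtt(struct kvm_lapic *apic)
    {
            /* Re-derive the cached mode from the LVTT register itself. */
            u32 timer_mode = kvm_apic_get_reg(apic, APIC_LVTT) &
                             apic->lapic_timer.timer_mode_mask;

            if (apic->lapic_timer.timer_mode != timer_mode) {
                    hrtimer_cancel(&apic->lapic_timer.timer);
                    apic->lapic_timer.timer_mode = timer_mode;
            }
    }

    void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
                                     struct kvm_lapic_state *s)
    {
            struct kvm_lapic *apic = vcpu->arch.apic;

            /* ... copy the saved register page into the APIC ... */
            apic_update_lvtt(apic); /* without this, timer_mode stays stale */
            /* ... */
    }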


Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-17 Thread Peter Lieven

On 14.08.2015 at 22:01, Alex Bennée wrote:

Peter Lieven p...@kamp.de writes:


Hi,

some time ago I stumbled across a regression in the KVM module that was
introduced somewhere
between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a 3.12
host kernel and kvm-kmod 3.19,
but I have now found that it is also still present with a 3.19 host kernel and
its included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all
commits to the kvm-related parts.

Now my question is: how do I best bisect only the kvm-related changes (those
that go into kvm-kmod)?

In general I don't bother. As it is a bisection you eliminate half the
commits at a time, so you get there fairly quickly anyway. However, you can
tell bisect which parts of the tree you care about:

   git bisect start -- arch/arm64/kvm include/linux/kvm* 
include/uapi/linux/kvm* virt/kvm/


Yes, I just have to find out how exactly that works if I want to bisect the
linux submodule
of the kvm-kmod repository. But thanks for the pointer on how to limit the
directories.

Thanks,
Peter



Help debugging a regression in KVM Module

2015-08-14 Thread Peter Lieven
Hi,

some time ago I stumbled across a regression in the KVM module that was
introduced somewhere
between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a 3.12
host kernel and kvm-kmod 3.19,
but I have now found that it is also still present with a 3.19 host kernel and
its included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all
commits to the kvm-related parts.

Now my question is: how do I best bisect only the kvm-related changes (those that go
into kvm-kmod)?

Thanks,
Peter



Re: Help debugging a regression in KVM Module

2015-08-14 Thread Peter Lieven
On 14.08.2015 at 15:01, Paolo Bonzini wrote:

 - Original Message -
 From: Peter Lieven p...@kamp.de
 To: qemu-de...@nongnu.org, kvm@vger.kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Sent: Friday, August 14, 2015 1:11:34 PM
 Subject: Help debugging a regression in KVM Module

 Hi,

 some time ago I stumbled across a regression in the KVM module that was
 introduced somewhere
 between 3.17 and 3.19.

 I have a rather old openSUSE guest with an XFS filesystem which reliably
 crashes after some live migrations.
 I originally believed that the issue might be related to my setup with a 3.12
 host kernel and kvm-kmod 3.19,
 but I have now found that it is also still present with a 3.19 host kernel and
 its included 3.19 kvm module.

 My idea was to continue testing on a 3.12 host kernel and then bisect all
 commits to the kvm-related parts.

 Now my question is: how do I best bisect only the kvm-related changes (those that go
 into kvm-kmod)?
 I haven't forgotten this.  Sorry. :(

 Unfortunately I'll be away for three weeks, but I'll make it a priority
 when I'm back.

It's not time critical, but I think it's worth investigating as it might affect
other systems as well; maybe XFS is just particularly sensitive.

I suppose you are going on vacation. Enjoy!

Peter


Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-12 Thread Peter Lieven

 On 12.01.2014 at 13:08, Vadim Rozenfeld vroze...@redhat.com wrote:

 On Wed, 2014-01-08 at 23:20 +0100, Peter Lieven wrote:
  On 08.01.2014 21:08, Vadim Rozenfeld wrote:
 On Wed, 2014-01-08 at 15:54 +0100, Peter Lieven wrote:
 On 08.01.2014 13:12, Vadim Rozenfeld wrote:
 On Wed, 2014-01-08 at 12:48 +0100, Peter Lieven wrote:
 On 08.01.2014 11:44, Vadim Rozenfeld wrote:
 On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote:
 On 08.01.2014 10:40, Vadim Rozenfeld wrote:
 On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote:
  On 07.01.2014 10:36, Vadim Rozenfeld wrote:
 On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote:
  On 11.12.2013 19:59, Marcelo Tosatti wrote:
 On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote:
 On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:
 Signed-off: Peter Lieven p...@dlh.net
 Signed-off: Gleb Natapov g...@redhat.com
 Signed-off: Vadim Rozenfeld vroze...@redhat.com
 
 v1 - v2
 1. mark TSC page dirty as suggested by
   Eric Northup digitale...@google.com and Gleb
 2. disable local irq when calling get_kernel_ns,
   as it was done by Peter Lieven p...@dlhnet.de
 3. move check for TSC page enable from second patch
   to this one.
 
 ---
arch/x86/include/asm/kvm_host.h|  2 ++
arch/x86/include/uapi/asm/hyperv.h | 13 +
arch/x86/kvm/x86.c | 39 
 +-
include/uapi/linux/kvm.h   |  1 +
4 files changed, 54 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ae5d783..2fd0753 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -605,6 +605,8 @@ struct kvm_arch {
 /* fields used by HYPER-V emulation */
 u64 hv_guest_os_id;
 u64 hv_hypercall;
 +   u64 hv_ref_count;
 +   u64 hv_tsc_page;
 
 #ifdef CONFIG_KVM_MMU_AUDIT
 int audit_point;
 diff --git a/arch/x86/include/uapi/asm/hyperv.h 
 b/arch/x86/include/uapi/asm/hyperv.h
 index b8f1c01..462efe7 100644
 --- a/arch/x86/include/uapi/asm/hyperv.h
 +++ b/arch/x86/include/uapi/asm/hyperv.h
 @@ -28,6 +28,9 @@
/* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) 
 available*/
#define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE  (1  1)
 
 +/* A partition's reference time stamp counter (TSC) page */
 +#define HV_X64_MSR_REFERENCE_TSC   0x4021
 +
/*
 * There is a single feature flag that signifies the 
 presence of the MSR
 * that can be used to retrieve both the local APIC Timer 
 frequency as
 @@ -198,6 +201,9 @@
#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
 (~((1ull  
 HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
 
 +#define HV_X64_MSR_TSC_REFERENCE_ENABLE
 0x0001
 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
 +
#define HV_PROCESSOR_POWER_STATE_C0  0
#define HV_PROCESSOR_POWER_STATE_C1  1
#define HV_PROCESSOR_POWER_STATE_C2  2
 @@ -210,4 +216,11 @@
#define HV_STATUS_INVALID_ALIGNMENT  4
#define HV_STATUS_INSUFFICIENT_BUFFERS   19
 
 +typedef struct _HV_REFERENCE_TSC_PAGE {
 +   __u32 tsc_sequence;
 +   __u32 res1;
 +   __u64 tsc_scale;
 +   __s64 tsc_offset;
 +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 +
#endif
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 21ef1ba..5e4e495a 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
static u32 msrs_to_save[] = {
 MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 -   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 +   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, 
 HV_X64_MSR_TIME_REF_COUNT,
 HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, 
 MSR_KVM_STEAL_TIME,
 MSR_KVM_PV_EOI_EN,
 MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, 
 MSR_IA32_SYSENTER_EIP,
 @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 
 msr)
 switch (msr) {
 case HV_X64_MSR_GUEST_OS_ID:
 case HV_X64_MSR_HYPERCALL:
 +   case HV_X64_MSR_REFERENCE_TSC:
 +   case HV_X64_MSR_TIME_REF_COUNT:
 r = true;
 break;
 }
 @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct 
 kvm_vcpu *vcpu, u32 msr, u64 data)
 if (__copy_to_user((void __user *)addr, 
 instructions, 4))
 return 1;
 kvm->arch.hv_hypercall = data;
 +   local_irq_disable();
 +   kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
 +   local_irq_enable();
 
 Where does the docs say that HV_X64_MSR_HYPERCALL is the where 
 the clock
 starts counting?
 
 No need to store kvmclock_offset in hv_ref_count? (moreover
 the name is weird, better name would

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-08 Thread Peter Lieven

On 08.01.2014 10:40, Vadim Rozenfeld wrote:

On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote:

On 07.01.2014 10:36, Vadim Rozenfeld wrote:

On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote:

On 11.12.2013 19:59, Marcelo Tosatti wrote:

On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote:

On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:

Signed-off: Peter Lieven p...@dlh.net
Signed-off: Gleb Natapov g...@redhat.com
Signed-off: Vadim Rozenfeld vroze...@redhat.com

v1 - v2
1. mark TSC page dirty as suggested by
 Eric Northup digitale...@google.com and Gleb
2. disable local irq when calling get_kernel_ns,
 as it was done by Peter Lieven p...@dlhnet.de
3. move check for TSC page enable from second patch
 to this one.

---
  arch/x86/include/asm/kvm_host.h|  2 ++
  arch/x86/include/uapi/asm/hyperv.h | 13 +
  arch/x86/kvm/x86.c | 39 +-
  include/uapi/linux/kvm.h   |  1 +
  4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae5d783..2fd0753 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -605,6 +605,8 @@ struct kvm_arch {
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
+   u64 hv_ref_count;
+   u64 hv_tsc_page;

#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
diff --git a/arch/x86/include/uapi/asm/hyperv.h 
b/arch/x86/include/uapi/asm/hyperv.h
index b8f1c01..462efe7 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -28,6 +28,9 @@
  /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
  #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE   (1  1)

+/* A partition's reference time stamp counter (TSC) page */
+#define HV_X64_MSR_REFERENCE_TSC   0x4021
+
  /*
   * There is a single feature flag that signifies the presence of the MSR
   * that can be used to retrieve both the local APIC Timer frequency as
@@ -198,6 +201,9 @@
  #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK  \
(~((1ull  HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))

+#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001
+#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
+
  #define HV_PROCESSOR_POWER_STATE_C0   0
  #define HV_PROCESSOR_POWER_STATE_C1   1
  #define HV_PROCESSOR_POWER_STATE_C2   2
@@ -210,4 +216,11 @@
  #define HV_STATUS_INVALID_ALIGNMENT   4
  #define HV_STATUS_INSUFFICIENT_BUFFERS19

+typedef struct _HV_REFERENCE_TSC_PAGE {
+   __u32 tsc_sequence;
+   __u32 res1;
+   __u64 tsc_scale;
+   __s64 tsc_offset;
+} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
+
  #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ef1ba..5e4e495a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_KVM_PV_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
switch (msr) {
case HV_X64_MSR_GUEST_OS_ID:
case HV_X64_MSR_HYPERCALL:
+   case HV_X64_MSR_REFERENCE_TSC:
+   case HV_X64_MSR_TIME_REF_COUNT:
r = true;
break;
}
@@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 
msr, u64 data)
if (__copy_to_user((void __user *)addr, instructions, 4))
return 1;
kvm->arch.hv_hypercall = data;
+   local_irq_disable();
+   kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
+   local_irq_enable();


Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock
starts counting?

No need to store kvmclock_offset in hv_ref_count? (moreover
the name is weird, better name would be hv_ref_start_time.


Just add kvmclock_offset when reading the values (otherwise you have a
stale copy of kvmclock_offset in hv_ref_count).



After some experiments I think we do not need kvm->arch.hv_ref_count at all.

I was debugging some weird clock-jump issues and I think the problem is that
after live migration
kvm->arch.hv_ref_count is initialized to 0. Depending on the uptime of the
vServer when the
hypercall was set up, this can lead to serious jumps.

So I would suggest completely dropping kvm->arch.hv_ref_count.

And simply use this in get_msr_hyperv_pw():

 case

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-08 Thread Peter Lieven

On 08.01.2014 11:44, Vadim Rozenfeld wrote:

On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote:

On 08.01.2014 10:40, Vadim Rozenfeld wrote:

On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote:

On 07.01.2014 10:36, Vadim Rozenfeld wrote:

On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote:

On 11.12.2013 19:59, Marcelo Tosatti wrote:

On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote:

On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:

Signed-off: Peter Lieven p...@dlh.net
Signed-off: Gleb Natapov g...@redhat.com
Signed-off: Vadim Rozenfeld vroze...@redhat.com

v1 - v2
1. mark TSC page dirty as suggested by
  Eric Northup digitale...@google.com and Gleb
2. disable local irq when calling get_kernel_ns,
  as it was done by Peter Lieven p...@dlhnet.de
3. move check for TSC page enable from second patch
  to this one.

---
   arch/x86/include/asm/kvm_host.h|  2 ++
   arch/x86/include/uapi/asm/hyperv.h | 13 +
   arch/x86/kvm/x86.c | 39 
+-
   include/uapi/linux/kvm.h   |  1 +
   4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae5d783..2fd0753 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -605,6 +605,8 @@ struct kvm_arch {
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
+   u64 hv_ref_count;
+   u64 hv_tsc_page;

#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
diff --git a/arch/x86/include/uapi/asm/hyperv.h 
b/arch/x86/include/uapi/asm/hyperv.h
index b8f1c01..462efe7 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -28,6 +28,9 @@
   /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
   #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE  (1  1)

+/* A partition's reference time stamp counter (TSC) page */
+#define HV_X64_MSR_REFERENCE_TSC   0x4021
+
   /*
* There is a single feature flag that signifies the presence of the MSR
* that can be used to retrieve both the local APIC Timer frequency as
@@ -198,6 +201,9 @@
   #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
(~((1ull  HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))

+#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001
+#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
+
   #define HV_PROCESSOR_POWER_STATE_C0  0
   #define HV_PROCESSOR_POWER_STATE_C1  1
   #define HV_PROCESSOR_POWER_STATE_C2  2
@@ -210,4 +216,11 @@
   #define HV_STATUS_INVALID_ALIGNMENT  4
   #define HV_STATUS_INSUFFICIENT_BUFFERS   19

+typedef struct _HV_REFERENCE_TSC_PAGE {
+   __u32 tsc_sequence;
+   __u32 res1;
+   __u64 tsc_scale;
+   __s64 tsc_offset;
+} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
+
   #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ef1ba..5e4e495a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
   static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_KVM_PV_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
switch (msr) {
case HV_X64_MSR_GUEST_OS_ID:
case HV_X64_MSR_HYPERCALL:
+   case HV_X64_MSR_REFERENCE_TSC:
+   case HV_X64_MSR_TIME_REF_COUNT:
r = true;
break;
}
@@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 
msr, u64 data)
if (__copy_to_user((void __user *)addr, instructions, 4))
return 1;
kvm->arch.hv_hypercall = data;
+   local_irq_disable();
+   kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
+   local_irq_enable();


Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock
starts counting?

No need to store kvmclock_offset in hv_ref_count? (moreover
the name is weird, better name would be hv_ref_start_time.


Just add kvmclock_offset when reading the values (otherwise you have a
stale copy of kvmclock_offset in hv_ref_count).



After some experiments I think we do not need kvm->arch.hv_ref_count at all.

I was debugging some weird clock-jump issues and I think the problem is that
after live migration
kvm->arch.hv_ref_count is initialized to 0. Depending on the uptime of the
vServer when the
hypercall was set up, this can lead to serious jumps.

So I

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-08 Thread Peter Lieven

On 08.01.2014 13:12, Vadim Rozenfeld wrote:

On Wed, 2014-01-08 at 12:48 +0100, Peter Lieven wrote:

On 08.01.2014 11:44, Vadim Rozenfeld wrote:

On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote:

On 08.01.2014 10:40, Vadim Rozenfeld wrote:

On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote:

On 07.01.2014 10:36, Vadim Rozenfeld wrote:

On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote:

On 11.12.2013 19:59, Marcelo Tosatti wrote:

On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote:

On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:

Signed-off: Peter Lieven p...@dlh.net
Signed-off: Gleb Natapov g...@redhat.com
Signed-off: Vadim Rozenfeld vroze...@redhat.com

v1 - v2
1. mark TSC page dirty as suggested by
   Eric Northup digitale...@google.com and Gleb
2. disable local irq when calling get_kernel_ns,
   as it was done by Peter Lieven p...@dlhnet.de
3. move check for TSC page enable from second patch
   to this one.

---
arch/x86/include/asm/kvm_host.h|  2 ++
arch/x86/include/uapi/asm/hyperv.h | 13 +
arch/x86/kvm/x86.c | 39 
+-
include/uapi/linux/kvm.h   |  1 +
4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae5d783..2fd0753 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -605,6 +605,8 @@ struct kvm_arch {
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
+   u64 hv_ref_count;
+   u64 hv_tsc_page;

#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
diff --git a/arch/x86/include/uapi/asm/hyperv.h 
b/arch/x86/include/uapi/asm/hyperv.h
index b8f1c01..462efe7 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -28,6 +28,9 @@
/* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
#define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1  1)

+/* A partition's reference time stamp counter (TSC) page */
+#define HV_X64_MSR_REFERENCE_TSC   0x4021
+
/*
 * There is a single feature flag that signifies the presence of the MSR
 * that can be used to retrieve both the local APIC Timer frequency as
@@ -198,6 +201,9 @@
#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK\
(~((1ull  HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))

+#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001
+#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
+
#define HV_PROCESSOR_POWER_STATE_C0 0
#define HV_PROCESSOR_POWER_STATE_C1 1
#define HV_PROCESSOR_POWER_STATE_C2 2
@@ -210,4 +216,11 @@
#define HV_STATUS_INVALID_ALIGNMENT 4
#define HV_STATUS_INSUFFICIENT_BUFFERS  19

+typedef struct _HV_REFERENCE_TSC_PAGE {
+   __u32 tsc_sequence;
+   __u32 res1;
+   __u64 tsc_scale;
+   __s64 tsc_offset;
+} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
+
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ef1ba..5e4e495a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_KVM_PV_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
switch (msr) {
case HV_X64_MSR_GUEST_OS_ID:
case HV_X64_MSR_HYPERCALL:
+   case HV_X64_MSR_REFERENCE_TSC:
+   case HV_X64_MSR_TIME_REF_COUNT:
r = true;
break;
}
@@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 
msr, u64 data)
if (__copy_to_user((void __user *)addr, instructions, 4))
return 1;
kvm->arch.hv_hypercall = data;
+   local_irq_disable();
+   kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
+   local_irq_enable();


Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock
starts counting?

No need to store kvmclock_offset in hv_ref_count? (moreover
the name is weird, better name would be hv_ref_start_time.


Just add kvmclock_offset when reading the values (otherwise you have a
stale copy of kvmclock_offset in hv_ref_count).



After some experiments I think we do not need kvm->arch.hv_ref_count at all.

I was debugging some weird clock-jump issues and I think the problem is that
after live migration
kvm->arch.hv_ref_count

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-08 Thread Peter Lieven
On 08.01.2014 21:08, Vadim Rozenfeld wrote:
 On Wed, 2014-01-08 at 15:54 +0100, Peter Lieven wrote:
 On 08.01.2014 13:12, Vadim Rozenfeld wrote:
 On Wed, 2014-01-08 at 12:48 +0100, Peter Lieven wrote:
 On 08.01.2014 11:44, Vadim Rozenfeld wrote:
 On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote:
 On 08.01.2014 10:40, Vadim Rozenfeld wrote:
 On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote:
 On 07.01.2014 10:36, Vadim Rozenfeld wrote:
 On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote:
 On 11.12.2013 19:59, Marcelo Tosatti wrote:
 On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote:
 On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:
 Signed-off: Peter Lieven p...@dlh.net
 Signed-off: Gleb Natapov g...@redhat.com
 Signed-off: Vadim Rozenfeld vroze...@redhat.com

 v1 - v2
 1. mark TSC page dirty as suggested by
Eric Northup digitale...@google.com and Gleb
 2. disable local irq when calling get_kernel_ns,
as it was done by Peter Lieven p...@dlhnet.de
 3. move check for TSC page enable from second patch
to this one.

 ---
 arch/x86/include/asm/kvm_host.h|  2 ++
 arch/x86/include/uapi/asm/hyperv.h | 13 +
 arch/x86/kvm/x86.c | 39 
 +-
 include/uapi/linux/kvm.h   |  1 +
 4 files changed, 54 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ae5d783..2fd0753 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -605,6 +605,8 @@ struct kvm_arch {
   /* fields used by HYPER-V emulation */
   u64 hv_guest_os_id;
   u64 hv_hypercall;
 + u64 hv_ref_count;
 + u64 hv_tsc_page;

   #ifdef CONFIG_KVM_MMU_AUDIT
   int audit_point;
 diff --git a/arch/x86/include/uapi/asm/hyperv.h 
 b/arch/x86/include/uapi/asm/hyperv.h
 index b8f1c01..462efe7 100644
 --- a/arch/x86/include/uapi/asm/hyperv.h
 +++ b/arch/x86/include/uapi/asm/hyperv.h
 @@ -28,6 +28,9 @@
 /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) 
 available*/
 #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE   (1  1)

 +/* A partition's reference time stamp counter (TSC) page */
 +#define HV_X64_MSR_REFERENCE_TSC 0x4021
 +
 /*
  * There is a single feature flag that signifies the presence 
 of the MSR
  * that can be used to retrieve both the local APIC Timer 
 frequency as
 @@ -198,6 +201,9 @@
 #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK  \
   (~((1ull  
 HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))

 +#define HV_X64_MSR_TSC_REFERENCE_ENABLE  0x0001
 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT   12
 +
 #define HV_PROCESSOR_POWER_STATE_C0   0
 #define HV_PROCESSOR_POWER_STATE_C1   1
 #define HV_PROCESSOR_POWER_STATE_C2   2
 @@ -210,4 +216,11 @@
 #define HV_STATUS_INVALID_ALIGNMENT   4
 #define HV_STATUS_INSUFFICIENT_BUFFERS19

 +typedef struct _HV_REFERENCE_TSC_PAGE {
 + __u32 tsc_sequence;
 + __u32 res1;
 + __u64 tsc_scale;
 + __s64 tsc_offset;
 +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 +
 #endif
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 21ef1ba..5e4e495a 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
 static u32 msrs_to_save[] = {
   MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
   MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, 
 HV_X64_MSR_TIME_REF_COUNT,
   HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, 
 MSR_KVM_STEAL_TIME,
   MSR_KVM_PV_EOI_EN,
   MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, 
 MSR_IA32_SYSENTER_EIP,
 @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 
 msr)
   switch (msr) {
   case HV_X64_MSR_GUEST_OS_ID:
   case HV_X64_MSR_HYPERCALL:
 + case HV_X64_MSR_REFERENCE_TSC:
 + case HV_X64_MSR_TIME_REF_COUNT:
   r = true;
   break;
   }
 @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct 
 kvm_vcpu *vcpu, u32 msr, u64 data)
   if (__copy_to_user((void __user *)addr, 
 instructions, 4))
   return 1;
   kvm->arch.hv_hypercall = data;
 + local_irq_disable();
 + kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
 + local_irq_enable();

 Where does the docs say that HV_X64_MSR_HYPERCALL is the where the 
 clock
 starts counting?

 No need to store kvmclock_offset in hv_ref_count? (moreover
 the name is weird, better name would be hv_ref_start_time.

 Just add kvmclock_offset when reading the values (otherwise you 
 have a
 stale copy of kvmclock_offset in hv_ref_count).


 After some

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-07 Thread Peter Lieven
On 07.01.2014 10:36, Vadim Rozenfeld wrote:
 On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote:
 On 11.12.2013 19:59, Marcelo Tosatti wrote:
 On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote:
 On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:
 Signed-off: Peter Lieven p...@dlh.net
 Signed-off: Gleb Natapov g...@redhat.com
 Signed-off: Vadim Rozenfeld vroze...@redhat.com

 v1 - v2
 1. mark TSC page dirty as suggested by 
 Eric Northup digitale...@google.com and Gleb
 2. disable local irq when calling get_kernel_ns, 
 as it was done by Peter Lieven p...@dlhnet.de
 3. move check for TSC page enable from second patch
 to this one.

 ---
  arch/x86/include/asm/kvm_host.h|  2 ++
  arch/x86/include/uapi/asm/hyperv.h | 13 +
  arch/x86/kvm/x86.c | 39 
 +-
  include/uapi/linux/kvm.h   |  1 +
  4 files changed, 54 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ae5d783..2fd0753 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -605,6 +605,8 @@ struct kvm_arch {
   /* fields used by HYPER-V emulation */
   u64 hv_guest_os_id;
   u64 hv_hypercall;
 + u64 hv_ref_count;
 + u64 hv_tsc_page;
  
   #ifdef CONFIG_KVM_MMU_AUDIT
   int audit_point;
 diff --git a/arch/x86/include/uapi/asm/hyperv.h 
 b/arch/x86/include/uapi/asm/hyperv.h
 index b8f1c01..462efe7 100644
 --- a/arch/x86/include/uapi/asm/hyperv.h
 +++ b/arch/x86/include/uapi/asm/hyperv.h
 @@ -28,6 +28,9 @@
  /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
  #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE  (1  1)
  
 +/* A partition's reference time stamp counter (TSC) page */
 +#define HV_X64_MSR_REFERENCE_TSC 0x4021
 +
  /*
   * There is a single feature flag that signifies the presence of the MSR
   * that can be used to retrieve both the local APIC Timer frequency as
 @@ -198,6 +201,9 @@
  #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
   (~((1ull  HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
  
 +#define HV_X64_MSR_TSC_REFERENCE_ENABLE  0x0001
 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT   12
 +
  #define HV_PROCESSOR_POWER_STATE_C0  0
  #define HV_PROCESSOR_POWER_STATE_C1  1
  #define HV_PROCESSOR_POWER_STATE_C2  2
 @@ -210,4 +216,11 @@
  #define HV_STATUS_INVALID_ALIGNMENT  4
  #define HV_STATUS_INSUFFICIENT_BUFFERS   19
  
 +typedef struct _HV_REFERENCE_TSC_PAGE {
 + __u32 tsc_sequence;
 + __u32 res1;
 + __u64 tsc_scale;
 + __s64 tsc_offset;
 +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 +
  #endif
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 21ef1ba..5e4e495a 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  static u32 msrs_to_save[] = {
   MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
   MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
   HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
   MSR_KVM_PV_EOI_EN,
   MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
   switch (msr) {
   case HV_X64_MSR_GUEST_OS_ID:
   case HV_X64_MSR_HYPERCALL:
 + case HV_X64_MSR_REFERENCE_TSC:
 + case HV_X64_MSR_TIME_REF_COUNT:
   r = true;
   break;
   }
 @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu 
 *vcpu, u32 msr, u64 data)
   if (__copy_to_user((void __user *)addr, instructions, 4))
   return 1;
  kvm->arch.hv_hypercall = data;
 +   local_irq_disable();
 +   kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
 +   local_irq_enable();

 Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock
 starts counting?

 No need to store kvmclock_offset in hv_ref_count? (moreover
 the name is weird, better name would be hv_ref_start_time.

 Just add kvmclock_offset when reading the values (otherwise you have a
 stale copy of kvmclock_offset in hv_ref_count).


 After some experiments I think we do not need kvm->arch.hv_ref_count at all.

 I was debugging some weird clock-jump issues and I think the problem is that
 after live migration
 kvm->arch.hv_ref_count is initialized to 0. Depending on the uptime of the
 vServer when the
 hypercall was set up, this can lead to serious jumps.

 So I would suggest completely dropping kvm->arch.hv_ref_count.

 And simply use this in get_msr_hyperv_pw():

 case HV_X64_MSR_TIME_REF_COUNT: {
         data = div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100);
         break;
 }

 
 Agreed. It should work as long as we rely

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-02 Thread Peter Lieven
On 11.12.2013 19:53, Marcelo Tosatti wrote:
 On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:
 Signed-off: Peter Lieven p...@dlh.net
 Signed-off: Gleb Natapov g...@redhat.com
 Signed-off: Vadim Rozenfeld vroze...@redhat.com

 v1 - v2
 1. mark TSC page dirty as suggested by 
 Eric Northup digitale...@google.com and Gleb
 2. disable local irq when calling get_kernel_ns, 
 as it was done by Peter Lieven p...@dlhnet.de
 3. move check for TSC page enable from second patch
 to this one.

 ---
  arch/x86/include/asm/kvm_host.h|  2 ++
  arch/x86/include/uapi/asm/hyperv.h | 13 +
  arch/x86/kvm/x86.c | 39 
 +-
  include/uapi/linux/kvm.h   |  1 +
  4 files changed, 54 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ae5d783..2fd0753 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -605,6 +605,8 @@ struct kvm_arch {
  /* fields used by HYPER-V emulation */
  u64 hv_guest_os_id;
  u64 hv_hypercall;
 +u64 hv_ref_count;
 +u64 hv_tsc_page;
  
  #ifdef CONFIG_KVM_MMU_AUDIT
  int audit_point;
 diff --git a/arch/x86/include/uapi/asm/hyperv.h 
 b/arch/x86/include/uapi/asm/hyperv.h
 index b8f1c01..462efe7 100644
 --- a/arch/x86/include/uapi/asm/hyperv.h
 +++ b/arch/x86/include/uapi/asm/hyperv.h
 @@ -28,6 +28,9 @@
  /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
  #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1  1)
  
 +/* A partition's reference time stamp counter (TSC) page */
 +#define HV_X64_MSR_REFERENCE_TSC0x4021
 +
  /*
   * There is a single feature flag that signifies the presence of the MSR
   * that can be used to retrieve both the local APIC Timer frequency as
 @@ -198,6 +201,9 @@
  #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK\
  (~((1ull  HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
  
 +#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x0001
 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT  12
 +
  #define HV_PROCESSOR_POWER_STATE_C0 0
  #define HV_PROCESSOR_POWER_STATE_C1 1
  #define HV_PROCESSOR_POWER_STATE_C2 2
 @@ -210,4 +216,11 @@
  #define HV_STATUS_INVALID_ALIGNMENT 4
  #define HV_STATUS_INSUFFICIENT_BUFFERS  19
  
 +typedef struct _HV_REFERENCE_TSC_PAGE {
 +__u32 tsc_sequence;
 +__u32 res1;
 +__u64 tsc_scale;
 +__s64 tsc_offset;
 +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 +
  #endif
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 21ef1ba..5e4e495a 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  static u32 msrs_to_save[] = {
  MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
  MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 -HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 +HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
  HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
  MSR_KVM_PV_EOI_EN,
  MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
  switch (msr) {
  case HV_X64_MSR_GUEST_OS_ID:
  case HV_X64_MSR_HYPERCALL:
 +case HV_X64_MSR_REFERENCE_TSC:
 +case HV_X64_MSR_TIME_REF_COUNT:
  r = true;
  break;
  }
 @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, 
 u32 msr, u64 data)
  if (__copy_to_user((void __user *)addr, instructions, 4))
  return 1;
  kvm->arch.hv_hypercall = data;
 +local_irq_disable();
 +kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
 +local_irq_enable();
 
 Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock
 starts counting?
 
 No need to store kvmclock_offset in hv_ref_count? (moreover
 the name is weird, better name would be hv_ref_start_time.
 
 +break;
 +}
 +case HV_X64_MSR_REFERENCE_TSC: {
 +u64 gfn;
 +unsigned long addr;
 +HV_REFERENCE_TSC_PAGE tsc_ref;
 +tsc_ref.tsc_sequence = 0;
 +if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) {
 +kvm->arch.hv_tsc_page = data;
 +break;
 +}
 +gfn = data >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 +addr = gfn_to_hva(kvm, data >>
 +HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT);
 +if (kvm_is_error_hva(addr))
 +return 1;
 +if (__copy_to_user((void __user *)addr, &tsc_ref,
 +sizeof(tsc_ref)))
 +return 1;
 +mark_page_dirty(kvm, gfn);
 +kvm->arch.hv_tsc_page = data;
  break;
  }
  default

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-02 Thread Peter Lieven
On 02.01.2014 14:57, Marcelo Tosatti wrote:
 On Thu, Jan 02, 2014 at 02:15:48PM +0100, Peter Lieven wrote:
 On 11.12.2013 19:53, Marcelo Tosatti wrote:
 On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:
 Signed-off: Peter Lieven p...@dlh.net
 Signed-off: Gleb Natapov g...@redhat.com
 Signed-off: Vadim Rozenfeld vroze...@redhat.com

 v1 - v2
 1. mark TSC page dirty as suggested by 
 Eric Northup digitale...@google.com and Gleb
 2. disable local irq when calling get_kernel_ns, 
 as it was done by Peter Lieven p...@dlhnet.de
 3. move check for TSC page enable from second patch
 to this one.

 ---
  arch/x86/include/asm/kvm_host.h|  2 ++
  arch/x86/include/uapi/asm/hyperv.h | 13 +
  arch/x86/kvm/x86.c | 39 
 +-
  include/uapi/linux/kvm.h   |  1 +
  4 files changed, 54 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ae5d783..2fd0753 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -605,6 +605,8 @@ struct kvm_arch {
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
 +  u64 hv_ref_count;
 +  u64 hv_tsc_page;
  
#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
 diff --git a/arch/x86/include/uapi/asm/hyperv.h 
 b/arch/x86/include/uapi/asm/hyperv.h
 index b8f1c01..462efe7 100644
 --- a/arch/x86/include/uapi/asm/hyperv.h
 +++ b/arch/x86/include/uapi/asm/hyperv.h
 @@ -28,6 +28,9 @@
  /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
  #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE   (1  1)
  
 +/* A partition's reference time stamp counter (TSC) page */
 +#define HV_X64_MSR_REFERENCE_TSC  0x4021
 +
  /*
   * There is a single feature flag that signifies the presence of the MSR
   * that can be used to retrieve both the local APIC Timer frequency as
 @@ -198,6 +201,9 @@
  #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK  \
(~((1ull  HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
  
 +#define HV_X64_MSR_TSC_REFERENCE_ENABLE   0x0001
 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT12
 +
  #define HV_PROCESSOR_POWER_STATE_C0   0
  #define HV_PROCESSOR_POWER_STATE_C1   1
  #define HV_PROCESSOR_POWER_STATE_C2   2
 @@ -210,4 +216,11 @@
  #define HV_STATUS_INVALID_ALIGNMENT   4
  #define HV_STATUS_INSUFFICIENT_BUFFERS19
  
 +typedef struct _HV_REFERENCE_TSC_PAGE {
 +  __u32 tsc_sequence;
 +  __u32 res1;
 +  __u64 tsc_scale;
 +  __s64 tsc_offset;
 +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 +
  #endif
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 21ef1ba..5e4e495a 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 -  HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 +  HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_KVM_PV_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
switch (msr) {
case HV_X64_MSR_GUEST_OS_ID:
case HV_X64_MSR_HYPERCALL:
 +  case HV_X64_MSR_REFERENCE_TSC:
 +  case HV_X64_MSR_TIME_REF_COUNT:
r = true;
break;
}
 @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, 
 u32 msr, u64 data)
if (__copy_to_user((void __user *)addr, instructions, 4))
return 1;
kvm->arch.hv_hypercall = data;
 +  local_irq_disable();
 +  kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
 +  local_irq_enable();

 Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock
 starts counting?

 No need to store kvmclock_offset in hv_ref_count? (moreover
 the name is weird, better name would be hv_ref_start_time.

 +  break;
 +  }
 +  case HV_X64_MSR_REFERENCE_TSC: {
 +  u64 gfn;
 +  unsigned long addr;
 +  HV_REFERENCE_TSC_PAGE tsc_ref;
 +  tsc_ref.tsc_sequence = 0;
 +  if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) {
 +  kvm->arch.hv_tsc_page = data;
 +  break;
 +  }
 +  gfn = data >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 +  addr = gfn_to_hva(kvm, data >>
 +  HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT);
 +  if (kvm_is_error_hva(addr))
 +  return 1;
 +  if (__copy_to_user((void __user *)addr, &tsc_ref,
 +  sizeof(tsc_ref)))
 +  return 1;
 +  mark_page_dirty(kvm, gfn);
 +  kvm->arch.hv_tsc_page = data;
break

Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter

2014-01-02 Thread Peter Lieven
On 11.12.2013 19:59, Marcelo Tosatti wrote:
 On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote:
 On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote:
 Signed-off: Peter Lieven p...@dlh.net
 Signed-off: Gleb Natapov g...@redhat.com
 Signed-off: Vadim Rozenfeld vroze...@redhat.com

 v1 - v2
 1. mark TSC page dirty as suggested by 
 Eric Northup digitale...@google.com and Gleb
 2. disable local irq when calling get_kernel_ns, 
 as it was done by Peter Lieven p...@dlhnet.de
 3. move check for TSC page enable from second patch
 to this one.

 ---
  arch/x86/include/asm/kvm_host.h|  2 ++
  arch/x86/include/uapi/asm/hyperv.h | 13 +
  arch/x86/kvm/x86.c | 39 
 +-
  include/uapi/linux/kvm.h   |  1 +
  4 files changed, 54 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index ae5d783..2fd0753 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -605,6 +605,8 @@ struct kvm_arch {
 /* fields used by HYPER-V emulation */
 u64 hv_guest_os_id;
 u64 hv_hypercall;
 +   u64 hv_ref_count;
 +   u64 hv_tsc_page;
  
 #ifdef CONFIG_KVM_MMU_AUDIT
 int audit_point;
 diff --git a/arch/x86/include/uapi/asm/hyperv.h 
 b/arch/x86/include/uapi/asm/hyperv.h
 index b8f1c01..462efe7 100644
 --- a/arch/x86/include/uapi/asm/hyperv.h
 +++ b/arch/x86/include/uapi/asm/hyperv.h
 @@ -28,6 +28,9 @@
  /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
  #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE(1  1)
  
 +/* A partition's reference time stamp counter (TSC) page */
 +#define HV_X64_MSR_REFERENCE_TSC   0x4021
 +
  /*
   * There is a single feature flag that signifies the presence of the MSR
   * that can be used to retrieve both the local APIC Timer frequency as
 @@ -198,6 +201,9 @@
  #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK   \
 (~((1ull  HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
  
 +#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001
 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
 +
  #define HV_PROCESSOR_POWER_STATE_C00
  #define HV_PROCESSOR_POWER_STATE_C11
  #define HV_PROCESSOR_POWER_STATE_C22
 @@ -210,4 +216,11 @@
  #define HV_STATUS_INVALID_ALIGNMENT4
  #define HV_STATUS_INSUFFICIENT_BUFFERS 19
  
 +typedef struct _HV_REFERENCE_TSC_PAGE {
 +   __u32 tsc_sequence;
 +   __u32 res1;
 +   __u64 tsc_scale;
 +   __s64 tsc_offset;
 +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 +
  #endif
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 21ef1ba..5e4e495a 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  static u32 msrs_to_save[] = {
 MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 -   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 +   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
 HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
 MSR_KVM_PV_EOI_EN,
 MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 switch (msr) {
 case HV_X64_MSR_GUEST_OS_ID:
 case HV_X64_MSR_HYPERCALL:
 +   case HV_X64_MSR_REFERENCE_TSC:
 +   case HV_X64_MSR_TIME_REF_COUNT:
 r = true;
 break;
 }
 @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, 
 u32 msr, u64 data)
 if (__copy_to_user((void __user *)addr, instructions, 4))
 return 1;
 kvm->arch.hv_hypercall = data;
 +   local_irq_disable();
 +   kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset;
 +   local_irq_enable();

 Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock
 starts counting?

 No need to store kvmclock_offset in hv_ref_count? (moreover
 the name is weird, better name would be hv_ref_start_time.
 
 Just add kvmclock_offset when reading the values (otherwise you have a
 stale copy of kvmclock_offset in hv_ref_count).
 

After some experiments I think we do not need kvm->arch.hv_ref_count at all.

I was debugging some weird clock-jump issues and I think the problem is that
after live migration
kvm->arch.hv_ref_count is initialized to 0. Depending on the uptime of the
vServer when the
hypercall was set up, this can lead to serious jumps.

So I would suggest completely dropping kvm->arch.hv_ref_count.

And simply use this in get_msr_hyperv_pw():

case HV_X64_MSR_TIME_REF_COUNT: {
        data = div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100);
        break;
}

It seems that get_kernel_ns() + kvm->arch.kvmclock_offset is exactly the
vServer uptime.

Re: [Qemu-devel] [Bug 1100843] Re: Live Migration Causes Performance Issues

2013-10-10 Thread Peter Lieven

On 07.10.2013 11:55, Paolo Bonzini wrote:

On 07/10/2013 11:49, Peter Lieven wrote:

It's in general not easy to do this if you take non-x86 targets into
account.

What about the dirty way of zeroing out all non-zero pages at the beginning of
ram_load?

I'm not sure I follow?

Something like this for each RAM block at the beginning of ram_load:

 
+base = memory_region_get_ram_ptr(block->mr);
+for (offset = 0; offset < block->length;
+ offset += TARGET_PAGE_SIZE) {
+if (!is_zero_page(base + offset)) {
+memset(base + offset, 0x00, TARGET_PAGE_SIZE);
+}
+}
+

Then add a capability skip_zero_pages which does not send them on the source
and enables this zeroing. It would also be possible to skip the zero check
for each incoming compressed page.
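
[For completeness, a rough sketch of what the source side of such a capability
could look like. This is only an illustration of the idea under stated
assumptions; migrate_skip_zero_pages() and send_compressed_zero_page() are
made-up placeholder names, not existing QEMU functions.]

    /* Inside the RAM save loop, p points at the current page (sketch only). */
    if (is_zero_page(p)) {
        if (migrate_skip_zero_pages()) {
            /* The destination pre-zeroed its RAM blocks in ram_load(),
             * so zero pages do not have to be transferred at all. */
            continue;
        }
        /* Legacy behaviour: transfer it as a compressed zero page.
         * (send_compressed_zero_page is a hypothetical helper.) */
        send_compressed_zero_page(f, block, offset);
    }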

Peter




Re: [Qemu-devel] [Bug 1100843] Re: Live Migration Causes Performance Issues

2013-10-07 Thread Peter Lieven

On 06.10.2013 15:57, Zhang Haoyu wrote:

From my testing this has been fixed in the saucy version (1.5.0) of

qemu. It is fixed by this patch:

f1c72795af573b24a7da5eb52375c9aba8a37972

However later in the history this commit was reverted, and again broke

this. The other commit that fixes this is:

211ea74022f51164a7729030b28eec90b6c99a08


See below post,please.
https://lists.gnu.org/archive/html/qemu-devel/2013-08/msg05062.html


I would still like to fix qemu to not load ROMs etc. if we are set up as a migration
target. In this case
we could drop the madvise, skip the check for zero pages and also
avoid sending
zero pages at all. It would be the cleanest solution.

Peter


Re: [Qemu-devel] [Bug 1100843] Re: Live Migration Causes Performance Issues

2013-10-07 Thread Peter Lieven

On 07.10.2013 11:37, Paolo Bonzini wrote:

On 07/10/2013 08:38, Peter Lieven wrote:

On 06.10.2013 15:57, Zhang Haoyu wrote:

From my testing this has been fixed in the saucy version (1.5.0) of

qemu. It is fixed by this patch:

f1c72795af573b24a7da5eb52375c9aba8a37972

However later in the history this commit was reverted, and again broke

this. The other commit that fixes this is:

211ea74022f51164a7729030b28eec90b6c99a08


See below post,please.
https://lists.gnu.org/archive/html/qemu-devel/2013-08/msg05062.html

I would still like to fix qemu to not load ROMs etc. if we are set up as a
migration target. In this case
we could drop the madvise, skip the check for zero pages and also
avoid sending
zero pages at all. It would be the cleanest solution.

It's in general not easy to do this if you take non-x86 targets into
account.

What about the dirty way of zeroing out all non-zero pages at the beginning of
ram_load?

Peter



Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter

2013-05-23 Thread Peter Lieven

On 22.05.2013 23:55, Paolo Bonzini wrote:

On 22/05/2013 09:32, Vadim Rozenfeld wrote:

@@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 
msr, u64 data)
if (__copy_to_user((void __user *)addr, instructions, 4))
return 1;
kvm->arch.hv_hypercall = data;
+   local_irq_disable();
+   kvm->arch.hv_ref_count = get_kernel_ns();
+   local_irq_enable();
+   break;

local_irq_disable/local_irq_enable not needed.


What is the reasoning behind reading this time value at msr write time?
[VR] Windows writes this MSR only once, during HAL initialization.
So, I decided to treat this call as a partition create event.



But is it expected by Windows that the reference count starts counting
up from 0 at partition creation time?  If you could just use
(get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be
simpler for migration purposes.


I can just report that I have used the patch that does it that way, and it
works.
Maybe Windows is calculating the uptime from the reference counter?

Peter
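
[For reference, the variant Paolo suggests here is essentially the read-side
handler Peter later proposes in the 2014 thread earlier in this archive;
dividing nanoseconds by 100 yields the 100 ns units the Hyper-V reference
counter is defined in. A sketch, not the committed upstream code:]

    case HV_X64_MSR_TIME_REF_COUNT:
            /* The guest clock base already survives migration, so no
             * per-partition start time (hv_ref_count) is needed. */
            data = div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100);
            break;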



Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC

2013-05-23 Thread Peter Lieven

On 22.05.2013 23:23, Marcelo Tosatti wrote:

On Wed, May 22, 2013 at 03:22:55AM -0400, Vadim Rozenfeld wrote:



- Original Message -
From: Marcelo Tosatti mtosa...@redhat.com
To: Vadim Rozenfeld vroze...@redhat.com
Cc: kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net
Sent: Wednesday, May 22, 2013 10:50:46 AM
Subject: Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC

On Sun, May 19, 2013 at 05:06:37PM +1000, Vadim Rozenfeld wrote:

The following patch allows to activate a partition reference
time enlightenment that is based on the host platform's support
for an Invariant Time Stamp Counter (iTSC).
NOTE: This code will survive migration due to lack of VM stop/resume
handlers, when offset, scale and sequence should be
readjusted.

---
  arch/x86/kvm/x86.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9645dab..b423fe4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1838,7 +1838,6 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 
msr, u64 data)
u64 gfn;
unsigned long addr;
HV_REFERENCE_TSC_PAGE tsc_ref;
-   tsc_ref.TscSequence = 0;
if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) {
kvm->arch.hv_tsc_page = data;
break;
@@ -1848,6 +1847,11 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 
msr, u64 data)
HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT);
if (kvm_is_error_hva(addr))
return 1;
+   tsc_ref.TscSequence =
+   boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ? 1 : 0;


1) You want NONSTOP_TSC (see 40fb1715 commit) which matches INVARIANT TSC.
[VR]
Thank you for reviewing. Will fix it.
2) TscSequence should increase?
This field serves as a sequence number that is incremented whenever...
[VR]
Yes, on every VM resume, including migration. After migration we also need
to recalculate scale and adjust offset.
 3) 0xFFFFFFFF is the value for invalid source of reference time?
 [VR] Yes, on boot-up. In this case the guest will go with the PM timer (not sure about
 HPET,
 but I can check). But if we set the sequence to 0xFFFFFFFF after migration, it
 probably will not work.


Reference TSC during Save and Restore and Migration

To address migration scenarios to physical platforms that do not support
iTSC, the TscSequence field is used. In the event that a guest partition
is  migrated from an iTSC capable host to a non-iTSC capable host, the
hypervisor sets TscSequence to the special value of 0xFFFFFFFF, which
directs the guest operating system to fall back to a different clock
source (for example, the virtual PM timer).

Why would it not (or does it not) work after migration?




What exactly do we need the reference TSC for? The reference counter alone
works great, and it seems
that there is a lot of trouble and crash potential involved with the
reference TSC.

Peter
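
[To make the discussion concrete, here is a sketch of how a guest is expected to
consume the reference TSC page, following the TLFS algorithm referenced above
and the HV_REFERENCE_TSC_PAGE layout from the v3 patch quoted earlier in this
archive. rdtsc() and read_time_ref_count_msr() are placeholders, and newer TLFS
revisions use 0 rather than 0xFFFFFFFF as the invalid sequence. It also shows
why tsc_scale and tsc_offset must be recomputed on resume or migration: with
stale values the computed reference time jumps.]

    static u64 read_reference_time(volatile HV_REFERENCE_TSC_PAGE *tsc_pg)
    {
            u32 seq;
            u64 tsc, time;

            do {
                    seq = tsc_pg->tsc_sequence;
                    if (seq == 0xFFFFFFFF)  /* invalid: fall back to the   */
                            return read_time_ref_count_msr(); /* MSR/PM timer */
                    tsc = rdtsc();
                    /* scale is a 64.64 fixed-point multiplier:
                     * time = ((tsc * scale) >> 64) + offset, in 100 ns units */
                    time = (u64)(((unsigned __int128)tsc * tsc_pg->tsc_scale) >> 64)
                           + tsc_pg->tsc_offset;
                    /* retry if the hypervisor updated the page meanwhile */
            } while (tsc_pg->tsc_sequence != seq);

            return time;
    }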



Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter

2013-05-23 Thread Peter Lieven

On 23.05.2013 11:54, Paolo Bonzini wrote:

On 23/05/2013 08:17, Peter Lieven wrote:

On 22.05.2013 23:55, Paolo Bonzini wrote:

On 22/05/2013 09:32, Vadim Rozenfeld wrote:

@@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
        if (__copy_to_user((void __user *)addr, instructions, 4))
                return 1;
        kvm->arch.hv_hypercall = data;
+       local_irq_disable();
+       kvm->arch.hv_ref_count = get_kernel_ns();
+       local_irq_enable();
+       break;

local_irq_disable/local_irq_enable not needed.


What is the reasoning behind reading this time value at MSR write time?
[VR] Windows writes this MSR only once, during HAL initialization.
So, I decided to treat this call as a partition create event.



But is it expected by Windows that the reference count starts counting
up from 0 at partition creation time?  If you could just use
(get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be
simpler for migration purposes.
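
A minimal sketch of what this suggestion amounts to on the host side (my illustration in the
style of the get_msr handlers quoted elsewhere in this thread, not an actual patch): the
counter is derived purely from the kvmclock time base, so no extra hv_ref_count state is
needed and the value stays monotonic across migration.

/* Sketch only: value for a HV_X64_MSR_TIME_REF_COUNT read, following the
 * formula above. The Hyper-V reference counter counts in 100ns units. */
static u64 hv_time_ref_count(struct kvm *kvm)
{
        /* get_kernel_ns() + kvmclock_offset is the same time base the
         * guest's kvmclock sees, which is what migration preserves. */
        return div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100);
}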


I can just report that I have used the patch that does it that way, and
it works.


What do you mean by that way? :)


Oops, sorry… I meant the way it was implemented in the old patch (the one I sent a few
days ago).

@@ -1426,6 +1428,21 @@ static int set_msr_hyperv_pw(struct kvm_
        if (__copy_to_user((void __user *)addr, instructions, 4))
                return 1;
        kvm->arch.hv_hypercall = data;
+       kvm->arch.hv_ref_count = get_kernel_ns();
+       break;
+       }
+       case HV_X64_MSR_REFERENCE_TSC: {
+       u64 gfn;
+       unsigned long addr;
+       u32 hv_tsc_sequence;
+       gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT;
+       addr = gfn_to_hva(kvm, gfn);
+       if (kvm_is_error_hva(addr))
+               return 1;
+       hv_tsc_sequence = 0x0; /* invalid */
+       if (__copy_to_user((void __user *)addr, &hv_tsc_sequence, sizeof(hv_tsc_sequence)))
+               return 1;
+       kvm->arch.hv_reference_tsc = data;
        break;
        }
        default:
@@ -1826,6 +1843,17 @@ static int get_msr_hyperv_pw(struct kvm_
        case HV_X64_MSR_HYPERCALL:
                data = kvm->arch.hv_hypercall;
                break;
+       case HV_X64_MSR_TIME_REF_COUNT: {
+       u64 now_ns;
+       local_irq_disable();
+       now_ns = get_kernel_ns();
+       data = div_u64(now_ns + kvm->arch.kvmclock_offset - kvm->arch.hv_ref_count, 100);
+       local_irq_enable();
+       break;
+       }
+       case HV_X64_MSR_REFERENCE_TSC:
+       data = kvm->arch.hv_reference_tsc;
+       break;
        default:
                pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
                return 1;


Peter


Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC

2013-05-23 Thread Peter Lieven

On 23.05.2013 14:33, Vadim Rozenfeld wrote:



- Original Message -
From: Peter Lieven p...@dlhnet.de
To: Marcelo Tosatti mtosa...@redhat.com
Cc: Vadim Rozenfeld vroze...@redhat.com, kvm@vger.kernel.org, 
g...@redhat.com, p...@dlh.net
Sent: Thursday, May 23, 2013 4:18:55 PM
Subject: Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC

On 22.05.2013 23:23, Marcelo Tosatti wrote:

On Wed, May 22, 2013 at 03:22:55AM -0400, Vadim Rozenfeld wrote:



- Original Message -
From: Marcelo Tosatti mtosa...@redhat.com
To: Vadim Rozenfeld vroze...@redhat.com
Cc: kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net
Sent: Wednesday, May 22, 2013 10:50:46 AM
Subject: Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC

On Sun, May 19, 2013 at 05:06:37PM +1000, Vadim Rozenfeld wrote:

The following patch allows to activate a partition reference
time enlightenment that is based on the host platform's support
for an Invariant Time Stamp Counter (iTSC).
NOTE: This code will survive migration due to lack of VM stop/resume
handlers, when offset, scale and sequence should be
readjusted.

---
   arch/x86/kvm/x86.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9645dab..b423fe4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1838,7 +1838,6 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
        u64 gfn;
        unsigned long addr;
        HV_REFERENCE_TSC_PAGE tsc_ref;
-       tsc_ref.TscSequence = 0;
        if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) {
                kvm->arch.hv_tsc_page = data;
                break;
@@ -1848,6 +1847,11 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
                HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT);
        if (kvm_is_error_hva(addr))
                return 1;
+       tsc_ref.TscSequence =
+               boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ? 1 : 0;


1) You want NONSTOP_TSC (see 40fb1715 commit) which matches INVARIANT TSC.
[VR]
Thank you for reviewing. Will fix it.
2) TscSequence should increase?
This field serves as a sequence number that is incremented whenever...
[VR]
Yes, on every VM resume, including migration. After migration we also need
to recalculate scale and adjust offset.
3) 0xFFFFFFFF is the value for an invalid source of reference time?
[VR] Yes, on boot-up. In this case the guest will go with the PM timer (not sure about
HPET, but I can check). But if we set the sequence to 0xFFFFFFFF after migration, it
probably will not work.


Reference TSC during Save and Restore and Migration

To address migration scenarios to physical platforms that do not support
iTSC, the TscSequence field is used. In the event that a guest partition
is migrated from an iTSC-capable host to a non-iTSC-capable host, the
hypervisor sets TscSequence to the special value of 0xFFFFFFFF, which
directs the guest operating system to fall back to a different clock
source (for example, the virtual PM timer).

Why would it not / does it not work after migration?




What exactly do we need the reference TSC for? The reference counter alone
works great, and it seems that there are a lot of trouble and crash
possibilities involved with the reference TSC.

[VR]
Because it is incredibly light and fast.
A simple test which calls QueryPerformanceCounter in a
loop 10 million times gives us the following results:
PMTimer   32269 ms
HPET      38466 ms
Ref Count  6499 ms
iTSC       1169 ms
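
For reference, the kind of microbenchmark Vadim describes can be reproduced with a few lines
on Windows (my reconstruction, not his actual test program):

#include <stdio.h>
#include <windows.h>

int main(void)
{
        LARGE_INTEGER freq, start, end, dummy;
        long i;
        const long iterations = 10 * 1000 * 1000;

        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);
        for (i = 0; i < iterations; i++)
                QueryPerformanceCounter(&dummy);   /* the call under test */
        QueryPerformanceCounter(&end);

        printf("10M QPC calls: %lld ms\n",
               (long long)((end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart));
        return 0;
}

Which of the clock sources QueryPerformanceCounter ends up using depends on the
enlightenments exposed to the guest, which is exactly what the numbers above compare.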


is the ref_count with local_irq_disable or preempt_disable?

Peter



Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter

2013-05-23 Thread Peter Lieven

On 23.05.2013 15:18, Paolo Bonzini wrote:

Il 23/05/2013 14:25, Vadim Rozenfeld ha scritto:



- Original Message -
From: Peter Lieven p...@dlhnet.de
To: Paolo Bonzini pbonz...@redhat.com
Cc: Vadim Rozenfeld vroze...@redhat.com, Marcelo Tosatti 
mtosa...@redhat.com, kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net
Sent: Thursday, May 23, 2013 4:17:57 PM
Subject: Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter

On 22.05.2013 23:55, Paolo Bonzini wrote:

Il 22/05/2013 09:32, Vadim Rozenfeld ha scritto:

@@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
        if (__copy_to_user((void __user *)addr, instructions, 4))
                return 1;
        kvm->arch.hv_hypercall = data;
+       local_irq_disable();
+       kvm->arch.hv_ref_count = get_kernel_ns();
+       local_irq_enable();
+       break;

local_irq_disable/local_irq_enable not needed.


What is the reasoning behind reading this time value at MSR write time?
[VR] Windows writes this MSR only once, during HAL initialization.
So, I decided to treat this call as a partition create event.



But is it expected by Windows that the reference count starts counting
up from 0 at partition creation time?  If you could just use
(get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be
simpler for migration purposes.


I can just report that I have used the patch that does it that way, and it
works.
Maybe Windows is calculating the uptime from the reference counter?

[VR]
Windows uses it (reference counter/iTSC/PMTimer/HPET) as a time-stamp source
for the (Ke)QueryPerformanceCounter function.


So I would prefer to remove kvm->arch.hv_ref_count altogether.


But only if the migration support is guaranteed.
And what if we have a host which lacks invariant TSC support?

Peter




Paolo





Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter

2013-05-23 Thread Peter Lieven

On 23.05.2013 15:23, Paolo Bonzini wrote:

Il 23/05/2013 15:20, Peter Lieven ha scritto:

On 23.05.2013 15:18, Paolo Bonzini wrote:

Il 23/05/2013 14:25, Vadim Rozenfeld ha scritto:



- Original Message -
From: Peter Lieven p...@dlhnet.de
To: Paolo Bonzini pbonz...@redhat.com
Cc: Vadim Rozenfeld vroze...@redhat.com, Marcelo Tosatti
mtosa...@redhat.com, kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net
Sent: Thursday, May 23, 2013 4:17:57 PM
Subject: Re: [RFC PATCH v2 1/2] add support for Hyper-V reference
time counter

On 22.05.2013 23:55, Paolo Bonzini wrote:

Il 22/05/2013 09:32, Vadim Rozenfeld ha scritto:

@@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
        if (__copy_to_user((void __user *)addr, instructions, 4))
                return 1;
        kvm->arch.hv_hypercall = data;
+       local_irq_disable();
+       kvm->arch.hv_ref_count = get_kernel_ns();
+       local_irq_enable();
+       break;

local_irq_disable/local_irq_enable not needed.


What is the reasoning behind reading this time value at MSR write
time?
[VR] Windows writes this MSR only once, during HAL initialization.
So, I decided to treat this call as a partition create event.



But is it expected by Windows that the reference count starts counting
up from 0 at partition creation time?  If you could just use
(get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be
simpler for migration purposes.


I can just report, that I have used the patch that does it that way
and it works.
Maybe Windows is calculating the uptime by the reference counter?

[VR]
Windows uses it (reference counter/iTSC/PMTimer/HPET) as a time-stamp source
for the (Ke)QueryPerformanceCounter function.


So I would prefer to remove kvm->arch.hv_ref_count altogether.


But only if the migration support is guaranteed.


Migration support wouldn't work yet anyway, you need to recompute the
scale and sequence.  But that could be done by KVM_SET_CLOCK.
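
As a sketch of what "recompute the scale and sequence" could look like on the restore path
(purely illustrative; the page layout and helper names are assumptions, not an existing KVM
interface):

#include <linux/types.h>
#include <linux/math64.h>

/* Assumed guest-visible reference TSC page layout (illustration only). */
struct ref_tsc_page {
        u32 tsc_sequence;
        u32 reserved;
        u64 tsc_scale;
        s64 tsc_offset;
};

/* Re-arm the page after restore: time_now_100ns is the partition reference
 * time the guest should read right now, tsc_now/tsc_khz describe the TSC
 * the guest sees on the new host. */
static void refresh_ref_tsc_page(struct ref_tsc_page *p, u64 time_now_100ns,
                                 u64 tsc_now, u64 tsc_khz)
{
        /* We want ((tsc * scale) >> 64) in 100ns units, i.e.
         * scale = (10MHz << 64) / tsc_hz; computed with reduced precision
         * here so the intermediate value fits in 64 bits. */
        u64 scale = div64_u64(10000000ULL << 32, tsc_khz * 1000) << 32;

        p->tsc_scale = scale;
        p->tsc_offset = time_now_100ns - mul_u64_u64_shr(tsc_now, scale, 64);

        /* Bump the sequence, skipping the 0 / 0xFFFFFFFF "invalid" values
         * discussed in this thread, so the guest notices the new parameters. */
        if (++p->tsc_sequence == 0xFFFFFFFF || p->tsc_sequence == 0)
                p->tsc_sequence = 1;
}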


The hv_ref_counter does work out of the box. What I was trying to say is:
even if it is slower than iTSC, it is significantly faster than HPET
or the PM timer, and I can confirm it works flawlessly with migration.




And what if we have a host which lacks invariant TSC support?


Then the sequence must be set to 0 or 0xFFFFFFFF, I still haven't
understood. :)


Yes, but Windows then falls back to the PM timer or HPET, which is much slower
than the reference counter.

Peter


Re: [RFC PATCH 1/2] Hyper-H reference counter

2013-05-20 Thread Peter Lieven
Hi all,

sorry that I am a bit unresponsive about this series. I have a few days off and
can't spend much time on this.
If I read that the REFERENCE TSC breaks migration, I don't think it's a good
option to include it at all.
I have this hyperv_refcnt MSR in an internal patch I sent over about 1.5 years
ago, and it's been working flawlessly
with Win2k8R2, Win7, Win8 + Win2012. I set the reference TSC to 0x00 and this
seems to work with all
of the above Windows versions. Some of the early alphas of Windows 8 didn't work
with this patch, but the
final version runs smoothly, also with migration etc.

I crafted this patch to avoid the heavy calls to the PM timer during high I/O, which
slowed Windows down
by approx. 30% compared to Hyper-V.

I reinclude this patch for reference. It's unchanged since mid-2012, so it might
not apply cleanly.

Cheers,
Peter

diff -Npur kvm-kmod-3.3/include/asm-x86/hyperv.h kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/hyperv.h
--- kvm-kmod-3.3/include/asm-x86/hyperv.h       2012-03-19 23:00:49.0 +0100
+++ kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/hyperv.h 2012-03-28 12:23:02.0 +0200
@@ -169,7 +169,8 @@

/* MSR used to read the per-partition time reference counter */
#define HV_X64_MSR_TIME_REF_COUNT   0x4020
-
+#define HV_X64_MSR_REFERENCE_TSC   0x4021
+
/* Define the virtual APIC registers */
#define HV_X64_MSR_EOI  0x4070
#define HV_X64_MSR_ICR  0x4071
diff -Npur kvm-kmod-3.3/include/asm-x86/kvm_host.h kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/kvm_host.h
--- kvm-kmod-3.3/include/asm-x86/kvm_host.h     2012-03-19 23:00:49.0 +0100
+++ kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/kvm_host.h       2012-03-28 15:08:24.0 +0200
@@ -553,6 +553,8 @@ struct kvm_arch {
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
+   u64 hv_ref_count;
+   u64 hv_reference_tsc;

atomic_t reader_counter;

diff -Npur kvm-kmod-3.3/x86/x86.c kvm-kmod-3.3-hyperv-refcnt/x86/x86.c
--- kvm-kmod-3.3/x86/x86.c      2012-03-19 23:00:56.0 +0100
+++ kvm-kmod-3.3-hyperv-refcnt/x86/x86.c        2012-03-28 16:27:46.0 +0200
@@ -826,7 +826,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
@@ -1387,6 +1387,8 @@ static bool kvm_hv_msr_partition_wide(u3
switch (msr) {
case HV_X64_MSR_GUEST_OS_ID:
case HV_X64_MSR_HYPERCALL:
+   case HV_X64_MSR_REFERENCE_TSC:
+   case HV_X64_MSR_TIME_REF_COUNT:
r = true;
break;
}
@@ -1426,6 +1428,21 @@ static int set_msr_hyperv_pw(struct kvm_
        if (__copy_to_user((void __user *)addr, instructions, 4))
                return 1;
        kvm->arch.hv_hypercall = data;
+       kvm->arch.hv_ref_count = get_kernel_ns();
+       break;
+       }
+       case HV_X64_MSR_REFERENCE_TSC: {
+       u64 gfn;
+       unsigned long addr;
+       u32 hv_tsc_sequence;
+       gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT;
+       addr = gfn_to_hva(kvm, gfn);
+       if (kvm_is_error_hva(addr))
+               return 1;
+       hv_tsc_sequence = 0x0; /* invalid */
+       if (__copy_to_user((void __user *)addr, &hv_tsc_sequence, sizeof(hv_tsc_sequence)))
+               return 1;
+       kvm->arch.hv_reference_tsc = data;
        break;
        }
        default:
@@ -1826,6 +1843,17 @@ static int get_msr_hyperv_pw(struct kvm_
        case HV_X64_MSR_HYPERCALL:
                data = kvm->arch.hv_hypercall;
                break;
+       case HV_X64_MSR_TIME_REF_COUNT: {
+       u64 now_ns;
+       local_irq_disable();
+       now_ns = get_kernel_ns();
+       data = div_u64(now_ns + kvm->arch.kvmclock_offset - kvm->arch.hv_ref_count, 100);
+       local_irq_enable();
+       break;
+       }
+       case HV_X64_MSR_REFERENCE_TSC:
+       data = kvm->arch.hv_reference_tsc;
+       break;
        default:
                pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
                return 1;


Am 20.05.2013 um 11:41 schrieb Gleb Natapov g...@redhat.com:

 On Mon, May 20, 2013 at 11:32:27AM +0200, Paolo Bonzini wrote:
 Il 20/05/2013 11:25, Gleb Natapov ha scritto:
 So in Hyper-V spec they
 say:
 
  Special value of 0xFFFFFFFF is used to indicate that this facility is no
  

Re: [RFC PATCH 1/2] Hyper-H reference counter

2013-05-14 Thread Peter Lieven

On 13.05.2013 13:45, Vadim Rozenfeld wrote:

Signed-off: Peter Lieven p...@dlh.net
Signed-off: Gleb Natapov g...@redhat.com
Signed-off: Vadim Rozenfeld vroze...@redhat.com

The following patch allows to activate Hyper-V
reference time counter
---
  arch/x86/include/asm/kvm_host.h|  2 ++
  arch/x86/include/uapi/asm/hyperv.h |  3 +++
  arch/x86/kvm/x86.c | 25 -
  3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3741c65..f0fee35 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -575,6 +575,8 @@ struct kvm_arch {
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
+   u64 hv_ref_count;
+   u64 hv_tsc_page;

#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
diff --git a/arch/x86/include/uapi/asm/hyperv.h 
b/arch/x86/include/uapi/asm/hyperv.h
index b80420b..9711819 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -136,6 +136,9 @@
  /* MSR used to read the per-partition time reference counter */
  #define HV_X64_MSR_TIME_REF_COUNT 0x4020

+/* A partition's reference time stamp counter (TSC) page */
+#define HV_X64_MSR_REFERENCE_TSC   0x4021
+
  /* Define the virtual APIC registers */
  #define HV_X64_MSR_EOI0x4070
  #define HV_X64_MSR_ICR0x4071
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 094b5d9..1a4036d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -843,7 +843,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,HV_X64_MSR_TIME_REF_COUNT,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_KVM_PV_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1764,6 +1764,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
switch (msr) {
case HV_X64_MSR_GUEST_OS_ID:
case HV_X64_MSR_HYPERCALL:
+   case HV_X64_MSR_REFERENCE_TSC:
+   case HV_X64_MSR_TIME_REF_COUNT:
r = true;
break;
}
@@ -1803,6 +1805,21 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
        if (__copy_to_user((void __user *)addr, instructions, 4))
                return 1;
        kvm->arch.hv_hypercall = data;
+       kvm->arch.hv_ref_count = get_kernel_ns();
+       break;
+       }
+       case HV_X64_MSR_REFERENCE_TSC: {
+       u64 gfn;
+       unsigned long addr;
+       u32 tsc_ref;
+       gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT;
+       addr = gfn_to_hva(kvm, gfn);
+       if (kvm_is_error_hva(addr))
+               return 1;
+       tsc_ref = 0;
+       if (__copy_to_user((void __user *)addr, &tsc_ref, sizeof(tsc_ref)))
+               return 1;
+       kvm->arch.hv_tsc_page = data;
        break;
        }
        default:
@@ -2229,6 +2246,12 @@ static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
        case HV_X64_MSR_HYPERCALL:
                data = kvm->arch.hv_hypercall;
                break;
+       case HV_X64_MSR_TIME_REF_COUNT:
+       data = div_u64(get_kernel_ns() - kvm->arch.hv_ref_count, 100);
+       break;



In an earlier version of this patch I had the following:

+       case HV_X64_MSR_TIME_REF_COUNT: {
+       u64 now_ns;
+       local_irq_disable();
+       now_ns = get_kernel_ns();
+       data = div_u64(now_ns + kvm->arch.kvmclock_offset - kvm->arch.hv_ref_count, 100);
+       local_irq_enable();
+       break;
+       }

I do not know if this is right, but I can report that this one has been working
without any flaws for approx. 1.5 years.

Peter


Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-11-22 Thread Peter Lieven


On 19.11.2012 18:20, Stefan Hajnoczi wrote:

On Thu, Nov 8, 2012 at 4:26 PM, Peter Lieven p...@dlhnet.de wrote:

Has anyone any other idea what the cause could be or where to start?


Hi Peter,
I suggested posting the source tree you are building.  Since you have
applied patches yourself no one else is able to follow along with the
gdb output or reproduce the issue accurately.


Sorry for the late reply, I used qemu git at 
e24dc9feb0d68142d54dc3c097f57588836d1338
and libiscsi git at 3b3036b9dae55f0c3eef9d75db89c7b78f637a12.

The cmdline:
qemu-system-x86_64 -enable-kvm -m 1024 -drive 
if=virtio,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-62ff4e007-e4a3c8908af50839-test-3000g/0
 -cdrom ubuntu-12.04.1-server-amd64.iso -vnc :1

The vm crashes with:

Bad ram pointer 0x7fd220008000

after the user settings and timezone config when loading the module
libdmraid1.0.0.rc16-udeb

I hope this helps to reproduce.

Peter



Stefan





Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-11-08 Thread Peter Lieven
Has anyone any other idea what the cause could be or where to start?

Peter

Am 31.10.2012 um 15:08 schrieb ronnie sahlberg:

 On Tue, Oct 30, 2012 at 10:48 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 10:09 PM, ronnie sahlberg
 ronniesahlb...@gmail.com wrote:
 About half a year ago there was an issue where recent kernels had added
 support for new SCSI opcodes, but the qemu functions that
 determine which transfer direction is used for an opcode had not
 yet been updated, so the opcode was sent with the wrong transfer
 direction.
 
 That caused the guest's memory to be overwritten and crash.
 
 I dont have (easy) access to the git tree right now, but it was a
 patch for the ATA_PASSTHROUGH command that fixed that.
 
 This patch?
 
 http://patchwork.ozlabs.org/patch/174946/
 
 Stefan
 
 This is the one I was thinking about :
 381b634c275ca1a2806e97392527bbfc01bcb333
 
 But that also crashed when using local /dev/sg* devices.
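
To illustrate the class of bug being described here (purely illustrative; this is not QEMU's
actual scsi_cmd_xfer_mode() code): a static opcode-to-direction guess is necessarily wrong
for commands like ATA PASS-THROUGH, whose data direction is encoded in the CDB itself, and a
wrong guess lets the device write into guest memory that should only have been read.

#include <stdint.h>

enum xfer_dir { XFER_NONE, XFER_TO_DEV, XFER_FROM_DEV };

/* Illustrative only: deciding the data transfer direction from the CDB. */
static enum xfer_dir cdb_xfer_dir(const uint8_t *cdb)
{
        switch (cdb[0]) {
        case 0x28:      /* READ(10) */
                return XFER_FROM_DEV;
        case 0x2a:      /* WRITE(10) */
                return XFER_TO_DEV;
        case 0x85:      /* ATA PASS-THROUGH(16) */
                /* The direction is not implied by the opcode: it is carried
                 * in the T_DIR bit of CDB byte 2, so a fixed table entry is
                 * wrong for half of the cases. */
                return (cdb[2] & 0x08) ? XFER_FROM_DEV : XFER_TO_DEV;
        default:
                return XFER_NONE;
        }
}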



Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-11-05 Thread Peter Lieven

Am 31.10.2012 um 15:08 schrieb ronnie sahlberg:

 On Tue, Oct 30, 2012 at 10:48 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 10:09 PM, ronnie sahlberg
 ronniesahlb...@gmail.com wrote:
 About half a year ago there was an issue where recent kernels had added
 support for new SCSI opcodes, but the qemu functions that
 determine which transfer direction is used for an opcode had not
 yet been updated, so the opcode was sent with the wrong transfer
 direction.
 
 That caused the guest's memory to be overwritten and crash.
 
 I dont have (easy) access to the git tree right now, but it was a
 patch for the ATA_PASSTHROUGH command that fixed that.
 
 This patch?
 
 http://patchwork.ozlabs.org/patch/174946/
 
 Stefan
 
 This is the one I was thinking about :
 381b634c275ca1a2806e97392527bbfc01bcb333
 
 But that also crashed when using local /dev/sg* devices.

I was using a local LVM Volume not an iSCSI disk. 

I added debugging output and breakpoints to scsi_cmd_xfer_mode().
The function is not called.

Peter



Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-BLK - Bad ram pointer

2012-10-30 Thread Peter Lieven

On 30.10.2012 09:32, Stefan Hajnoczi wrote:

On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote:

Hi,

Bug subject should be virtio-blk, not virtio-scsi.  virtio-scsi is a
different virtio device type from virtio-blk and is not present in the
backtrace you posted.

you are right, sorry for that.


Sounds pedantic but I want to make sure this gets chalked up against the
right device :).


If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio
storage backend that supports iSCSI
qemu-kvm crashes reliably with the following error:

Are you using vanilla qemu-kvm-1.2.0 or are there patches applied?
I use vanilla qemu-kvm 1.2.0 except for one virtio-blk related patch 
(CVE-2011-4127):

http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=1ba1f2e319afdcb485963cd3f426fdffd1b725f2
that for some reason did not make it into qemu-kvm 1.2.0, and two aio-related
patches:

http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=00f78533326c5ba2e62fafada16655aa558a5520
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=2db2bfc0ccac5fd68dbf0ceb70fbc372c5d8a8c7

this is why I can circumvent the issue with scsi=off i guess.


Have you tried qemu-kvm.git/master?

not yet.


Have you tried a local raw disk image to check whether libiscsi is
involved?
I have; here it does not happen. For a raw device, scsi is effectively off (scsi=off),
isn't it?



Bad ram pointer 0x3039303620008000

This happens directly after the confirmation of the Timezone before
the Disk is partitioned.

If I specify  -global virtio-blk-pci.scsi=off in the cmdline this
does not happen.

Here is a stack trace:

Thread 1 (Thread 0x77fee700 (LWP 8226)):
#0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x557b751d in qemu_ram_addr_from_host_nofail (
    ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835
        ram_addr = 0
#2  0x557b9177 in cpu_physical_memory_unmap (
    buffer=0x3039303620008000, len=4986663671065686081, is_write=1,
    access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645

buffer and len are ASCII junk.  It appears to be hex digits and it's not
clear where they come from.

It would be interesting to print *elem one stack frame up in #3
virtqueue_fill() to show the iovecs and in/out counts.

I will collect that info for you.

Peter



Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-10-30 Thread Peter Lieven

On 30.10.2012 09:32, Stefan Hajnoczi wrote:

On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote:

Hi,

Bug subject should be virtio-blk, not virtio-scsi.  virtio-scsi is a
different virtio device type from virtio-blk and is not present in the
backtrace you posted.

Sounds pedantic but I want to make sure this gets chalked up against the
right device :).


If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio
storage backend that supports iSCSI
qemu-kvm crashes reliably with the following error:

Are you using vanilla qemu-kvm-1.2.0 or are there patches applied?

Have you tried qemu-kvm.git/master?

Have you tried a local raw disk image to check whether libiscsi is
involved?


Bad ram pointer 0x3039303620008000

This happens directly after the confirmation of the Timezone before
the Disk is partitioned.

If I specify  -global virtio-blk-pci.scsi=off in the cmdline this
does not happen.

Here is a stack trace:

Thread 1 (Thread 0x77fee700 (LWP 8226)):
#0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x557b751d in qemu_ram_addr_from_host_nofail (
    ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835
        ram_addr = 0
#2  0x557b9177 in cpu_physical_memory_unmap (
    buffer=0x3039303620008000, len=4986663671065686081, is_write=1,
    access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645

buffer and len are ASCII junk.  It appears to be hex digits and it's not
clear where they come from.

It would be interesting to print *elem one stack frame up in #3
virtqueue_fill() to show the iovecs and in/out counts.


(gdb) print *elem
$6 = {index = 3, out_num = 2, in_num = 4, in_addr = {1914920960, 1916656688,
2024130072, 2024130088, 0 repeats 508 times, 4129, 93825009696000,
140737328183160, 0 repeats 509 times}, out_addr = {2024130056,
2038414056, 0, 8256, 4128, 93824999311936, 0, 3, 0 repeats 512 times,
12385, 93825009696000, 140737328183160, 0 repeats 501 times}, 
in_sg = {{

  iov_base = 0x3039303620008000, iov_len = 4986663671065686081}, {
  iov_base = 0x383038454635, iov_len = 3544389261899019573}, {
  iov_base = 0x2aab32443039, iov_len = 16}, {iov_base = 0x2aab2365c628,
  iov_len = 1}, {iov_base = 0x0, iov_len = 0}, {iov_base = 0x0,
  iov_len = 0}, {iov_base = 0x2041, iov_len = 93825010788016}, {
  iov_base = 0x7673f778, iov_len = 0}, {iov_base = 0x0,
  iov_len = 0} repeats 256 times, {iov_base = 0x1021,
  iov_len = 93825010788016}, {iov_base = 0x7673f778, iov_len = 
0}, {

  iov_base = 0x0, iov_len = 0} repeats 255 times, {iov_base = 0x0,
  iov_len = 24768}, {iov_base = 0x1020, iov_len = 93824999311936}, {
  iov_base = 0x0, iov_len = 2}, {iov_base = 0x0,
  iov_len = 0} repeats 256 times, {iov_base = 0x1021,
  iov_len = 93825009696000}, {iov_base = 0x7673f778, iov_len = 
0}, {

  iov_base = 0x0, iov_len = 0} repeats 242 times}, out_sg = {{
  iov_base = 0x2aab2365c608, iov_len = 16}, {iov_base = 0x2aab243fbae8,
  iov_len = 6}, {iov_base = 0x0, iov_len = 0} repeats 11 times, {
  iov_base = 0x0, iov_len = 33024}, {iov_base = 0x30,
  iov_len = 93825010821424}, {iov_base = 0x5670d7a0, iov_len = 
0}, {

  iov_base = 0x5670cbb0, iov_len = 0}, {iov_base = 0x71,
  iov_len = 93825008729792}, {iov_base = 0x5670e960, iov_len = 
0}, {
  iov_base = 0x31, iov_len = 140737328183192}, {iov_base = 
0x7673f798,

  iov_len = 0}, {iov_base = 0x56711e20, iov_len = 80}, {
  iov_base = 0x20, iov_len = 93825010821584}, {iov_base = 0x0,
  iov_len = 33184}, {iov_base = 0x30, iov_len = 93825010821536}, {
  iov_base = 0x5670e840, iov_len = 0}, {iov_base = 0x5670e1b0,
  iov_len = 0}, {iov_base = 0x41, iov_len = 93825010821584}, {
  iov_base = 0x5670eb20, iov_len = 32}, {iov_base = 0x20,
  iov_len = 93825010821920}, {iov_base = 0x0, iov_len = 33296}, {
  iov_base = 0x30, iov_len = 93825010821872}, {iov_base = 
0x5670e8b0,

  iov_len = 0}, {iov_base = 0x5670dc68, iov_len = 0}, {
  iov_base = 0x191, iov_len = 93825009696736}, {iov_base = 
0x5670eb20,

  iov_len = 0}, {iov_base = 0x21, iov_len = 93825010826352}, {
  iov_base = 0x5670e880, iov_len = 64}, {iov_base = 0x30,
  iov_len = 93825010821200}, {iov_base = 0x5670e920, iov_len = 
0}, {

  iov_base = 0x5670e5c8, iov_len = 0}, {iov_base = 0x41,
  iov_len = 93825008729792}, {iov_base = 0x5670e9d0, iov_len = 
32}, {

  iov_base = 0x20, iov_len = 93825010821696}, {iov_base = 0x0,
  iov_len = 176}, {iov_base = 0x30, iov_len = 93825010821648}, {
  iov_base = 0x5670e990, iov_len = 0}, {iov_base = 0x5670e080,
  iov_len = 0}, {iov_base = 0x41, iov_len = 93825008729792}, {
  iov_base = 0x5670eb20, iov_len = 32}, {iov_base = 0x20,
  iov_len

Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-10-30 Thread Peter Lieven

Am 30.10.2012 19:27, schrieb Stefan Hajnoczi:

On Tue, Oct 30, 2012 at 4:56 PM, Peter Lieven p...@dlhnet.de wrote:

On 30.10.2012 09:32, Stefan Hajnoczi wrote:

On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote:

Hi,

Bug subject should be virtio-blk, not virtio-scsi.  virtio-scsi is a
different virtio device type from virtio-blk and is not present in the
backtrace you posted.

Sounds pedantic but I want to make sure this gets chalked up against the
right device :).


If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio
storage backend that supports iSCSI
qemu-kvm crashes reliably with the following error:

Are you using vanilla qemu-kvm-1.2.0 or are there patches applied?

Have you tried qemu-kvm.git/master?

Have you tried a local raw disk image to check whether libiscsi is
involved?


Bad ram pointer 0x3039303620008000

This happens directly after the confirmation of the Timezone before
the Disk is partitioned.

If I specify  -global virtio-blk-pci.scsi=off in the cmdline this
does not happen.

Here is a stack trace:

Thread 1 (Thread 0x77fee700 (LWP 8226)):
#0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x557b751d in qemu_ram_addr_from_host_nofail (
    ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835
        ram_addr = 0
#2  0x557b9177 in cpu_physical_memory_unmap (
    buffer=0x3039303620008000, len=4986663671065686081, is_write=1,
    access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645

buffer and len are ASCII junk.  It appears to be hex digits and it's not
clear where they come from.

It would be interesting to print *elem one stack frame up in #3
virtqueue_fill() to show the iovecs and in/out counts.


(gdb) print *elem

Great, thanks for providing this info:


$6 = {index = 3, out_num = 2, in_num = 4, in_addr = {1914920960, 1916656688,
 2024130072, 2024130088, 0 <repeats 508 times>, 4129, 93825009696000,
 140737328183160, 0 <repeats 509 times>}, out_addr = {2024130056,
 2038414056, 0, 8256, 4128, 93824999311936, 0, 3, 0 <repeats 512 times>,
 12385, 93825009696000, 140737328183160, 0 <repeats 501 times>},

Up to here everything is fine.


in_sg =
{{
   iov_base = 0x3039303620008000, iov_len = 4986663671065686081}, {
   iov_base = 0x383038454635, iov_len = 3544389261899019573}, {

The fields are bogus, in_sg has been overwritten with ASCII data.
Unfortunately I don't see any hint of where this ASCII data came from
yet.

The hdr fields you provided in stack frame #6 show that in_sg was
overwritten during or after the bdrv_ioctl() call.  We pulled valid
data out of the vring and mapped buffers correctly.  But something is
overwriting in_sg and when we complete the request we blow up due to
the bogus values.

OK, one thing I have to mention: I've been testing with qemu-kvm 1.2.0
and libiscsi for a few weeks now. It's been very stable. The only place
it blows up is during the Debian/Ubuntu installer. Ubuntu itself, for
instance, runs flawlessly. My guess is that the installer is probing
for something. The installer also runs flawlessly when I disable
SCSI passthrough with scsi=off.


Please post your full qemu-kvm command-line.
/usr/bin/qemu-kvm-1.2.0  -net 
tap,vlan=164,script=no,downscript=no,ifname=tap0  -net 
nic,vlan=164,model=e1000,macaddr=52:54:00:ff:01:35   -iscsi 
initiator-name=iqn.2005-03.org.virtual-core:0025b51f001c  -drive 
format=iscsi,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-335f4e007-d29001a3355508e8-libiscsi-test-hd0/0,if=virtio,cache=none,aio=native 
-m 2048 -smp 2,sockets=1,cores=2,threads=1  -monitor 
tcp:0:4002,server,nowait -vnc :2 -qmp tcp:0:3002,server,nowait -name 
'libiscsi-debug'  -boot order=dc,menu=off  -k de  -pidfile 
/var/run/qemu/vm-280.pid  -mem-path /hugepages  -mem-prealloc  -cpu 
host,+x2apic,model_id='Intel(R) Xeon(R) CPU   L5640  @ 
2.27GHz',-tsc  -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus


Please also post the exact qemu-kvm version you are using.  I can see
it's based on qemu-kvm-1.2.0 but are there any patches applied (e.g.
distro packages may carry patches so the full package version
information would be useful)?

I use vanilla qemu-kvm 1.2.0 with some cherry-picked patches. I will
retry with untouched qemu-kvm 1.2.0 and latest git tomorrow at the latest.


Thanks,
Stefan

Thank you, too
Peter


Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-10-30 Thread Peter Lieven

Am 30.10.2012 19:27, schrieb Stefan Hajnoczi:

On Tue, Oct 30, 2012 at 4:56 PM, Peter Lieven p...@dlhnet.de wrote:

On 30.10.2012 09:32, Stefan Hajnoczi wrote:

On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote:

Hi,

Bug subject should be virtio-blk, not virtio-scsi.  virtio-scsi is a
different virtio device type from virtio-blk and is not present in the
backtrace you posted.

Sounds pedantic but I want to make sure this gets chalked up against the
right device :).


If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio
storage backend that supports iSCSI
qemu-kvm crashes reliably with the following error:

Are you using vanilla qemu-kvm-1.2.0 or are there patches applied?

Have you tried qemu-kvm.git/master?

Have you tried a local raw disk image to check whether libiscsi is
involved?


Bad ram pointer 0x3039303620008000

This happens directly after the confirmation of the Timezone before
the Disk is partitioned.

If I specify  -global virtio-blk-pci.scsi=off in the cmdline this
does not happen.

Here is a stack trace:

Thread 1 (Thread 0x77fee700 (LWP 8226)):
#0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x557b751d in qemu_ram_addr_from_host_nofail (
    ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835
        ram_addr = 0
#2  0x557b9177 in cpu_physical_memory_unmap (
    buffer=0x3039303620008000, len=4986663671065686081, is_write=1,
    access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645

buffer and len are ASCII junk.  It appears to be hex digits and it's not
clear where they come from.

It would be interesting to print *elem one stack frame up in #3
virtqueue_fill() to show the iovecs and in/out counts.


(gdb) print *elem

Great, thanks for providing this info:


$6 = {index = 3, out_num = 2, in_num = 4, in_addr = {1914920960, 1916656688,
 2024130072, 2024130088, 0 <repeats 508 times>, 4129, 93825009696000,
 140737328183160, 0 <repeats 509 times>}, out_addr = {2024130056,
 2038414056, 0, 8256, 4128, 93824999311936, 0, 3, 0 <repeats 512 times>,
 12385, 93825009696000, 140737328183160, 0 <repeats 501 times>},

Up to here everything is fine.


in_sg =
{{
   iov_base = 0x3039303620008000, iov_len = 4986663671065686081}, {
   iov_base = 0x383038454635, iov_len = 3544389261899019573}, {

The fields are bogus, in_sg has been overwritten with ASCII data.
Unfortunately I don't see any hint of where this ASCII data came from
yet.

The hdr fields you provided in stack frame #6 show that in_sg was
overwritten during or after the bdrv_ioctl() call.  We pulled valid
data out of the vring and mapped buffers correctly.  But something is
overwriting in_sg and when we complete the request we blow up due to
the bogus values.

Please post your full qemu-kvm command-line.

Please also post the exact qemu-kvm version you are using.  I can see
it's based on qemu-kvm-1.2.0 but are there any patches applied (e.g.
distro packages may carry patches so the full package version
information would be useful)?
Stefan, Ronnie, if I remove the following patch from my cherry-picked
patches, it is working again:
working again:

iSCSI: We need to support SG_IO also from iscsi_ioctl()

Peter


Re: [Qemu-devel] Block Migration and xbzrle

2012-10-02 Thread Peter Lieven

Am 02.10.2012 um 11:28 schrieb Orit Wasserman:

 On 10/02/2012 10:33 AM, lieven-li...@dlh.net wrote:
 Orit Wasserman wrote:
 On 09/16/2012 01:39 PM, Peter Lieven wrote:
 Hi,
 
 I remember that this was broken some time ago and currently with
 qemu-kvm 1.2.0 I am still not able to use
 block migration plus xbzrle. The migration fails if both are used
 together. XBZRLE without block migration works.
 
 Can someone please advise what is the current expected behaviour?
 XBZRLE only work on guest memory so it shouldn't be effected by block
 migration.
 What is the error you are getting?
 What command line ?
 
 Meanwhile I can confirm that it happens with and without block migration.
 I observe 2 errors:
 a)
 qemu: warning: error while loading state section id 2
 load of migration failed
 b)
 the vm does not enter running state after migration.
 
 The command-line:
 /usr/bin/qemu-kvm-1.2.0  -net
 tap,vlan=798,script=no,downscript=no,ifname=tap1  -net
 nic,vlan=798,model=e1000,macaddr=52:54:00:ff:01:15   -drive
 format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-d85f4e007-3f30017ce11505df-ubuntu-tools-hd0,if=virtio,cache=none,aio=native
 -m 4096 -smp 2,sockets=1,cores=2,threads=1  -monitor
 tcp:0:4002,server,nowait -vnc :2 -qmp tcp:0:3002,server,nowait  -name
 'Ubuntu-Tools'  -boot order=dc,menu=off  -k de  -incoming
 tcp:172.21.55.34:5002  -pidfile /var/run/qemu/vm-250.pid  -mem-path
 /hugepages  -mem-prealloc  -rtc base=utc -usb -usbdevice tablet -no-hpet
 -vga cirrus  -cpu host,+x2apic,model_id='Intel(R) Xeon(R) CPU 
 Migration with -cpu host is very problematic, because the source and destination can
 have different CPUs, resulting in different CPU features.
 Does regular migration works with this setup?
 Can you try with a different cpu type?
 What are the source and destination /proc/cpuinfo output ?

The CPUs are identical; we also check that flags and CPU types match if the CPU type
is set to host.
Regular migration does work.

BR,
Peter

 
 Cheers,
 Orit
 
 L5640  @ 2.27GHz',-tsc
 
 Thanks,
 Peter
 
 
 Regards,
 Orit
 
 Thanks,
 Peter
 
 
 
 
 
 
 
 
 



Re: Block Migration and xbzrle

2012-10-02 Thread Peter Lieven

Am 02.10.2012 um 11:38 schrieb Paolo Bonzini:

 Il 16/09/2012 12:39, Peter Lieven ha scritto:
 
 I remember that this was broken some time ago and currently with
 qemu-kvm 1.2.0 I am still not able to use
 block migration plus xbzrle. The migration fails if both are used
 together. XBZRLE without block migration works.
 
 Can someone please advise what is the current expected behaviour?
 
 Block migration is broken by design.  It will converge really slowly as
 soon as you have real load in the VMs, and it will hamper the
 convergence of RAM as well.
 
 Hopefully a real alternative will be in 1.3 (based on drive-mirror on
 the source + an embedded NBD server running on the destination), then in
 1.4 we can reimplement the block migration monitor commands using the
 alternative.

Hi Paolo, I know that block migration is not that good, but it seems that
there is a bug in XBZRLE that is independent of block migration.

Peter

 
 Paolo



Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-18 Thread Peter Lieven

On 09/17/12 22:12, Peter Lieven wrote:

On 09/17/12 10:41, Kevin Wolf wrote:

Am 16.09.2012 12:13, schrieb Peter Lieven:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this sth already addresses/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's you command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?
Kevin, it seems that I can very easily force a crash if I cancel a
running block migration.
If I understand correctly what happens, there are AIO callbacks coming in
after blk_mig_cleanup() has been called.

What is the proper way to detect this in blk_mig_read_cb()?

Thanks,
Peter



Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-18 Thread Peter Lieven

On 09/18/12 12:31, Kevin Wolf wrote:

Am 18.09.2012 12:28, schrieb Peter Lieven:

On 09/17/12 22:12, Peter Lieven wrote:

On 09/17/12 10:41, Kevin Wolf wrote:

Am 16.09.2012 12:13, schrieb Peter Lieven:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this sth already addresses/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's you command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?

kevin, it seems that i can very easily force a crash if I cancel a
running block migration.

if I understand correctly what happens there are aio callbacks coming in
after
blk_mig_cleanup() has been called.

what is the proper way to detect this in blk_mig_read_cb()?

You could try this, it doesn't detect the situation in
blk_mig_read_cb(), but ensures that all callbacks happen before we do
the actual cleanup (completely untested):

After testing it for half an hour I can say it seems to fix the problem:
no segfaults and also no other assertions.

While searching I have seen that the queues blk_list and bmds_list are
initialized at qemu startup. Wouldn't it be better to initialize them in
init_blk_migration, or at least check that they are really empty? I have
also seen that prev_time_offset is not initialized.

thank you,
peter

Something like this:

--- qemu-kvm-1.2.0/block-migration.c.orig       2012-09-17 21:14:44.458429855 +0200
+++ qemu-kvm-1.2.0/block-migration.c    2012-09-17 21:15:40.599736962 +0200
@@ -311,8 +311,12 @@ static void init_blk_migration(QEMUFile
 block_mig_state.prev_progress = -1;
 block_mig_state.bulk_completed = 0;
 block_mig_state.total_time = 0;
+block_mig_state.prev_time_offset = 0;
 block_mig_state.reads = 0;

+QSIMPLEQ_INIT(block_mig_state.bmds_list);
+QSIMPLEQ_INIT(block_mig_state.blk_list);
+
 bdrv_iterate(init_blk_migration_it, NULL);
 }

@@ -760,9 +764,6 @@ SaveVMHandlers savevm_block_handlers = {

 void blk_mig_init(void)
 {
-QSIMPLEQ_INIT(block_mig_state.bmds_list);
-QSIMPLEQ_INIT(block_mig_state.blk_list);
-
 register_savevm_live(NULL, block, 0, 1, savevm_block_handlers,
block_mig_state);
 }


diff --git a/block-migration.c b/block-migration.c
index 7def8ab..ed93301 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -519,6 +519,8 @@ static void blk_mig_cleanup(void)
  BlkMigDevState *bmds;
  BlkMigBlock *blk;

+bdrv_drain_all();
+
  set_dirty_tracking(0);

  while ((bmds = QSIMPLEQ_FIRST(block_mig_state.bmds_list)) != NULL) {




Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-17 Thread Peter Lieven

On 09/17/12 10:41, Kevin Wolf wrote:

Am 16.09.2012 12:13, schrieb Peter Lieven:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this sth already addresses/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.
It seems that it only happens if a vServer that has been block migrated
earlier is block migrated a second time.

So we need some more information: What's you command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?

Here is my cmdline:
/usr/bin/qemu-kvm-1.2.0  -net 
tap,vlan=164,script=no,downscript=no,ifname=tap0  -net nic,vlan
=164,model=e1000,macaddr=52:54:00:ff:01:19   -drive 
format=host_device,file=/dev/7cf58855099771c2/lieven-storage-migration-t-hd0,if=virtio,cache=none,aio=nat
ive  -m 2048 -smp 2,sockets=1,cores=2,threads=1  -monitor 
tcp:0:4001,server,nowait -vnc :1 -qmp tcp:0:3001,server,nowait  -name 
'lieven-storage-migration-test'  -boot or
der=dc,menu=off  -k de  -incoming tcp:172.21.55.34:5001  -pidfile 
/var/run/qemu/vm-254.pid  -mem-path /hugepages  -mem-prealloc  -rtc 
base=utc -usb -usbdevice tablet -no
-hpet -vga cirrus  -cpu host,+x2apic,model_id='Intel(R) Xeon(R) 
CPU   L5640  @ 2.27GHz',-tsc


I have seen other errors as well in the meantime:
block-migration.c:471: flush_blks: Assertion `block_mig_state.read_done >= 0' failed.
qemu-kvm-1.2.0[27851]: segfault at 7f00746e78d7 ip 7f67eca6226d sp 
7fff56ae3340 error 4 in qemu-system-x86_64[7f67ec9e9000+418000]


I will now try to catch the situation in the debugger.

Thanks,
Peter


Kevin




Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-17 Thread Peter Lieven

On 09/17/12 10:41, Kevin Wolf wrote:

Am 16.09.2012 12:13, schrieb Peter Lieven:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this sth already addresses/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's you command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?
I was also able to reproduce a flush_blks: Assertion
`block_mig_state.read_done >= 0' failed. by cancelling a block migration
and restarting it afterwards.
However, how can I grab a stack trace after an assert?

thanks,
peter


Kevin




Block Migration Assertion in qemu-kvm 1.2.0

2012-09-16 Thread Peter Lieven

Hi,

when trying to block migrate a VM from one node to another, the source 
VM crashed with the following assertion:

block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this sth already addresses/known?

Thanks,
Peter



qemu-kvm and XenServer missing MSRs

2012-09-16 Thread Peter Lieven

Hi,

I have seen some recent threads about running Xen as a guest. For me it is
still not working, but I have read that Avi is working on some fixes.
I have seen in the logs that the following MSRs are missing. Maybe this
is related:

 cpu0 unhandled rdmsr: 0xce
 cpu0 disabled perfctr wrmsr: 0xc1 data 0x0
 cpu0 disabled perfctr wrmsr: 0xc2 data 0x0
 cpu0 disabled perfctr wrmsr: 0x186 data 0x13003c
 cpu0 disabled perfctr wrmsr: 0xc1 data 0xfea6c644
 cpu0 disabled perfctr wrmsr: 0x186 data 0x53003c

I had started a different thread dealing with memtest not working on
Nehalem CPUs; at least 0xce was also involved there.

Peter



Block Migration and xbzrle

2012-09-16 Thread Peter Lieven

Hi,

I remember that this was broken some time ago and currently with 
qemu-kvm 1.2.0 I am still not able to use
block migration plus xbzrle. The migration fails if both are used 
together. XBZRLE without block migration works.


Can someone please advise what is the current expected behaviour?

Thanks,
Peter



Re: memtest 4.20+ does not work with -cpu host

2012-09-13 Thread Peter Lieven

On 10.09.2012 14:32, Avi Kivity wrote:

On 09/10/2012 03:29 PM, Peter Lieven wrote:

On 09/10/12 14:21, Gleb Natapov wrote:

On Mon, Sep 10, 2012 at 02:15:49PM +0200, Paolo Bonzini wrote:

Il 10/09/2012 13:52, Peter Lieven ha scritto:

dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd
dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd

It only works without the skip, but then the msr device returns all zeroes.

Hmm, the strange API of the MSR device doesn't work well with dd (dd
skips to 0x194 * 8 because bs is 8).  You can try this program:
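
(The program itself is not quoted in this reply; a minimal equivalent, which seeks to the MSR
index in /dev/cpu/0/msr instead of letting dd multiply skip by bs, could look like the sketch
below. This is my reconstruction, not Paolo's original attachment.)

/* Read one MSR via the msr(4) device: the file offset is the MSR index.
 * Run as root on the host with the msr module loaded, e.g. ./rdmsr 0x194 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        uint32_t reg = argc > 1 ? strtoul(argv[1], NULL, 0) : 0x194;
        uint64_t val;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);

        if (fd < 0 || pread(fd, &val, sizeof(val), reg) != sizeof(val)) {
                perror("rdmsr");
                return 1;
        }
        printf("MSR 0x%x = 0x%016llx\n", reg, (unsigned long long)val);
        return 0;
}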


There is rdmsr/wrmsr in msr-tools.

rdmsr reports that it cannot read those MSRs, regardless of whether I use -cpu host
or -cpu qemu64.

On the host.



did you get my output?

#rdmsr -0 0x194
00011100
#rdmsr -0 0xce
0c0004011103

cheers,
peter



Re: memtest 4.20+ does not work with -cpu host

2012-09-13 Thread Peter Lieven

On 13.09.2012 10:05, Gleb Natapov wrote:

On Thu, Sep 13, 2012 at 10:00:26AM +0200, Paolo Bonzini wrote:

Il 13/09/2012 09:57, Gleb Natapov ha scritto:

#rdmsr -0 0x194
00011100
#rdmsr -0 0xce
0c0004011103

Yes, that can help implementing it in KVM.  But without a spec to
understand what the bits actually mean, it's just as risky...

Peter, do you have any idea where to get the spec of the memory
controller MSRs in Nehalem and newer processors?  Apparently, memtest is
using them (and in particular 0x194) to find the speed of the FSB, or
something like that.


Why would anyone want to run memtest in a VM? Maybe just add those
MSRs to the ignore list and that's it.

From the output it looks like it's basically a list of bits.  Returning
something sensible is better, same as for the speed scaling MSRs.


Everything is a list of bits in computers :) At least 0xce is documented in the SDM.
It cannot be implemented in a migration-safe manner.

What do you suggest, just say memtest does not work?
I am wondering why it works with -cpu qemu64.
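
For the record, the "ignore list" idea Gleb mentions would amount to something like the sketch
below (illustration only, not a patch against any particular kvm.git revision): let rdmsr of
these registers succeed with a harmless value instead of injecting a fault.

/* Sketch: treat the Nehalem platform-info MSRs that memtest pokes as
 * "ignored, read as zero" in the rdmsr path. */
static bool ignored_platform_msr(u32 msr, u64 *data)
{
        switch (msr) {
        case 0xce:      /* MSR_PLATFORM_INFO */
        case 0x194:     /* MSR_FLEX_RATIO */
                *data = 0;      /* not migration safe to report real values */
                return true;
        default:
                return false;
        }
}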

Peter



--
Gleb.




Re: memtest 4.20+ does not work with -cpu host

2012-09-13 Thread Peter Lieven

On 13.09.2012 14:42, Gleb Natapov wrote:

On Thu, Sep 13, 2012 at 02:05:23PM +0200, Peter Lieven wrote:

On 13.09.2012 10:05, Gleb Natapov wrote:

On Thu, Sep 13, 2012 at 10:00:26AM +0200, Paolo Bonzini wrote:

Il 13/09/2012 09:57, Gleb Natapov ha scritto:

#rdmsr -0 0x194
00011100
#rdmsr -0 0xce
0c0004011103

Yes, that can help implementing it in KVM.  But without a spec to
understand what the bits actually mean, it's just as risky...

Peter, do you have any idea where to get the spec of the memory
controller MSRs in Nehalem and newer processors?  Apparently, memtest is
using them (and in particular 0x194) to find the speed of the FSB, or
something like that.


Why would anyone want to run memtest in a VM? Maybe just add those
MSRs to the ignore list and that's it.

From the output it looks like it's basically a list of bits.  Returning
something sensible is better, same as for the speed scaling MSRs.


Everything is a list of bits in computers :) At least 0xce is documented in the SDM.
It cannot be implemented in a migration-safe manner.

What do you suggest, just say memtest does not work?

Why do you want to run it in a guest?
Testing memory throughput of different host memory layouts/settings
(hugepages, ksm etc.).

Stress testing new settings and qemu-kvm builds.
Testing new nodes with a VM which claims all available pages. It's a lot
easier than booting a node from a CD and attaching to the console.

This, of course, is all not mission critical and can also be done with
the qemu64 CPU model. I just came across memtest no longer working and
was wondering if there is a general regression.


BTW, from 
http://opensource.apple.com/source/xnu/xnu-1228.15.4/osfmk/i386/tsc.c?txt


#define MSR_FLEX_RATIO  0x194
#define MSR_PLATFORM_INFO   0x0ce
#define BASE_NHM_CLOCK_SOURCE   1ULL
#define CPUID_MODEL_NEHALEM 26

switch (cpuid_info()-cpuid_model) {
case CPUID_MODEL_NEHALEM: {
uint64_t cpu_mhz;
uint64_t msr_flex_ratio;
uint64_t msr_platform_info;

/* See if FLEX_RATIO is being used */
msr_flex_ratio = rdmsr64(MSR_FLEX_RATIO);
msr_platform_info = rdmsr64(MSR_PLATFORM_INFO);
flex_ratio_min = (uint32_t)bitfield(msr_platform_info, 47, 40);
flex_ratio_max = (uint32_t)bitfield(msr_platform_info, 15, 8);
/* No BIOS-programed flex ratio. Use hardware max as default */
tscGranularity = flex_ratio_max;
if (msr_flex_ratio  bit(16)) {
/* Flex Enabled: Use this MSR if less than max */
flex_ratio = (uint32_t)bitfield(msr_flex_ratio, 15, 8);
if (flex_ratio  flex_ratio_max)
tscGranularity = flex_ratio;
}

/* If EFI isn't configured correctly, use a constant
 * value. See 6036811.
 */
if (busFreq == 0)
busFreq = BASE_NHM_CLOCK_SOURCE;

cpu_mhz = tscGranularity * BASE_NHM_CLOCK_SOURCE;

kprintf("[NHM] Maximum Non-Turbo Ratio = [%d]\n",
(uint32_t)tscGranularity);
kprintf("[NHM] CPU: Frequency  = %6d.%04dMhz\n",
(uint32_t)(cpu_mhz / Mega), (uint32_t)(cpu_mhz % Mega));
break;
}
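As a quick sanity check, plugging the host values quoted earlier in this thread
(rdmsr 0x194 = 0x11100, rdmsr 0xce = 0x0c0004011103) into the same bit layout,
and assuming the nominal 133.33 MHz Nehalem bus clock, gives the expected result.
This small standalone program is illustrative only, not part of memtest or xnu:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t msr_flex = 0x11100ULL;        /* host rdmsr 0x194 */
    uint64_t msr_plat = 0x0c0004011103ULL; /* host rdmsr 0xce  */
    unsigned coef;

    /* same decision memtest/xnu make: use the flex ratio if bit 16 is set */
    if ((msr_flex >> 16) & 1)
        coef = (msr_flex >> 8) & 0xFF;
    else
        coef = (msr_plat >> 8) & 0xFF;

    /* 17 * 133.33 MHz ~= 2267 MHz, i.e. the host's 2.27 GHz Xeon L5640 */
    printf("multiplier %u -> ~%.0f MHz\n", coef, coef * 133.33);
    return 0;
}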



Peter


Re: memtest 4.20+ does not work with -cpu host

2012-09-10 Thread Peter Lieven

On 09/06/12 16:58, Avi Kivity wrote:

On 08/22/2012 06:06 PM, Peter Lieven wrote:

Hi,

has anyone ever tested to run memtest with -cpu host flag passed to
qemu-kvm?
For me it resets when probing the chipset. With -cpu qemu64 it works
just fine.

Maybe this is specific to memtest, but it might be something that can
happen in other applications too.

Any thoughts?

Try to identify the cpu flag that causes this by removing them
successively (-cpu host,-flag...).  Alternatively capture a trace
(http://www.linux-kvm.org/page/Tracing) look for TRIPLE_FAULT (Intel),
and post the few hundred lines preceding it.
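For anyone reproducing this, the capture/filter step can look roughly like the
following (trace-cmd syntax assumed from its man page; -e all, as suggested
elsewhere in the thread, records more than just the kvm events):

trace-cmd record -e kvm qemu-system-x86_64 [usual options]   # reproduce the reset, then stop with Ctrl-C
trace-cmd report | grep -B 300 TRIPLE_FAULT > trace-excerpt.txt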


Here we go:

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason EXCEPTION_NMI 
rip 0xd185 info 0 8307

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: load
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc60 info cf80003 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 
size 4 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: unload
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc29 info cfc0009 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO 
PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfc size 
2 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc60 info cf80003 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 
size 4 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc29 info cfe0009 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO 
PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfe size 
2 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason EXCEPTION_NMI 
rip 0xd185 info 0 8307

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: load
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc60 info cf80003 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 
size 4 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: unload
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc29 info cfc0009 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO 
PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfc size 
2 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc60 info cf80003 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 
size 4 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION 
rip 0xcc29 info cfc0009 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO 
PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfc size 
2 count 1
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2)

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason EPT_MISCONFIG 
rip 0x86e0 info 0 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO 
PARSE] rip=34528 csbase=0 len=3 insn=ˆF@Š„Òuõ‰L$ flags=5 failed=0
qemu-kvm-1.0.1-5107 [007] 410771.148000: vcpu_match_mmio: gva 0xb873c 
gpa 0xb873c Write GPA
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_mmio: mmio write len 1 gpa 
0xb873c val 0x6f

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit

Re: memtest 4.20+ does not work with -cpu host

2012-09-10 Thread Peter Lieven

On 09/10/12 13:29, Paolo Bonzini wrote:

Il 10/09/2012 13:06, Peter Lieven ha scritto:

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason MSR_READ rip
0x11478 info 0 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_msr: msr_read 194 = 0x0 (#GP)
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_inj_exception: #GP (0x0)
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason TRIPLE_FAULT
rip 0x11478 info 0 0

Memory controller MSR:

static float getNHMmultiplier(void)
{
 unsigned int msr_lo, msr_hi;
 float coef;

 /* Find multiplier (by MSR) */
 /* First, check if Flexible Ratio is Enabled */
 rdmsr(0x194, msr_lo, msr_hi);
 if ((msr_lo >> 16) & 1) {
 coef = (msr_lo >> 8) & 0xFF;
  } else {
 rdmsr(0xCE, msr_lo, msr_hi);
 coef = (msr_lo >> 8) & 0xFF;
  }

 return coef;
}

Looks like we need to emulate it since memtest only looks at the cpuid
to detect an integrated memory controller.  What does this return for you?

dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd
dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd

I/O error.

Peter



Re: memtest 4.20+ does not work with -cpu host

2012-09-10 Thread Peter Lieven

On 09/10/12 13:29, Paolo Bonzini wrote:

Il 10/09/2012 13:06, Peter Lieven ha scritto:

qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason MSR_READ rip
0x11478 info 0 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_msr: msr_read 194 = 0x0 (#GP)
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_inj_exception: #GP (0x0)
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0
qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason TRIPLE_FAULT
rip 0x11478 info 0 0

Memory controller MSR:

static float getNHMmultiplier(void)
{
 unsigned int msr_lo, msr_hi;
 float coef;

 /* Find multiplier (by MSR) */
 /* First, check if Flexible Ratio is Enabled */
 rdmsr(0x194, msr_lo, msr_hi);
 if ((msr_lo >> 16) & 1) {
 coef = (msr_lo >> 8) & 0xFF;
  } else {
 rdmsr(0xCE, msr_lo, msr_hi);
 coef = (msr_lo >> 8) & 0xFF;
  }

 return coef;
}

Looks like we need to emulate it since memtest only looks at the cpuid
to detect an integrated memory controller.  What does this return for you?

dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd
dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd

it only works without the skip. but the msr device returns all zeroes.

peter



Re: memtest 4.20+ does not work with -cpu host

2012-09-10 Thread Peter Lieven

On 09/10/12 14:21, Gleb Natapov wrote:

On Mon, Sep 10, 2012 at 02:15:49PM +0200, Paolo Bonzini wrote:

Il 10/09/2012 13:52, Peter Lieven ha scritto:

dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd
dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd

it only works without the skip. but the msr device returns all zeroes.

Hmm, the strange API of the MSR device doesn't work well with dd (dd
skips to 0x194 * 8 because bs is 8).  You can try this program:


There is rdmsr/wrmsr in msr-tools.
rdmsr reports that it cannot read those MSRs, regardless of whether I use
-cpu host or -cpu qemu64.


peter

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>  /* pread() */

int rdmsr(int fd, long reg)
{
 char msg[40];
 long long val;
 sprintf(msg, "rdmsr(%#lx)", reg);
 if (pread(fd, &val, 8, reg) < 0) {
 perror(msg);
 } else {
 printf("%s: %#016llx\n", msg, val);
 fflush(stdout);
 }
 return 0;
}


int main()
{
 int fd = open("/dev/cpu/0/msr", O_RDONLY);
 if (fd < 0) { perror("open"); exit(1); }
 rdmsr(fd, 0x194);
 rdmsr(fd, 0xCE);
 return 0;
}
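For reference, building and running it (the file name rdmsr-test.c is arbitrary;
the msr module must be loaded on the host):

gcc -o rdmsr-test rdmsr-test.c
sudo modprobe msr
sudo ./rdmsr-test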

Paolo

--
Gleb.




Re: memtest 4.20+ does not work with -cpu host

2012-09-10 Thread Peter Lieven

On 09/10/12 14:32, Avi Kivity wrote:

On 09/10/2012 03:29 PM, Peter Lieven wrote:

On 09/10/12 14:21, Gleb Natapov wrote:

On Mon, Sep 10, 2012 at 02:15:49PM +0200, Paolo Bonzini wrote:

Il 10/09/2012 13:52, Peter Lieven ha scritto:

dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd
dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd

it only works without the skip. but the msr device returns all zeroes.

Hmm, the strange API of the MSR device doesn't work well with dd (dd
skips to 0x194 * 8 because bs is 8).  You can try this program:


There is rdmsr/wrmsr in msr-tools.

rdmsr reports that it cannot read those MSRs, regardless of whether I use -cpu host
or -cpu qemu64.

On the host.

aaah ok:

#rdmsr -0 0x194
00011100
#rdmsr -0 0xce
0c0004011103

Peter







Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-22 Thread Peter Lieven

On 08/21/12 10:23, Stefan Hajnoczi wrote:

On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszkajan.kis...@siemens.com  wrote:

On 2012-08-19 11:42, Avi Kivity wrote:

On 08/17/2012 06:04 PM, Jan Kiszka wrote:

Can anyone imagine that such a barrier may actually be required? If it
is currently possible that env->stop is evaluated before we called into
sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
signal without properly processing its reason (stop).

Should not be required (TM): Both signal eating / stop checking and stop
setting / signal generation happens under the BQL, thus the ordering
must not make a difference here.

Agree.



Don't see where we could lose a signal. Maybe due to a subtle memory
corruption that sets thread_kicked to non-zero, preventing the kicking
this way.

Cannot be ruled out, yet too much of a coincidence.

Could be a kernel bug (either in kvm or elsewhere), we've had several
before in this area.

Is this reproducible?

Not for me. Peter only hit it very rarely, Peter obviously more easily.

I have only hit this once and was not able to reproduce it.

For me it was very reproducible, but my issue was fixed by:

http://www.mail-archive.com/kvm@vger.kernel.org/msg70908.html

Never seen this since then,
Peter


Stefan




memtest 4.20+ does not work with -cpu host

2012-08-22 Thread Peter Lieven

Hi,

has anyone ever tested to run memtest with -cpu host flag passed to 
qemu-kvm?
For me it resets when probing the chipset. With -cpu qemu64 it works 
just fine.


Maybe this is specific to memtest, but it might be something that can happen
in other applications too.

Any thoughts?

Thanks,
Peter



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-05 Thread Peter Lieven

On 05.07.2012 10:51, Xiao Guangrong wrote:

On 06/28/2012 05:11 PM, Peter Lieven wrote:


that here is basically what's going on:

   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 3 
gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 
3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 
3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 
3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)


There are two mmio emulations after the userspace exit; they are caused by an mmio
read access which spans two pages. But it should be fixed by:

commit f78146b0f9230765c6315b2e14f56112513389ad
Author: Avi Kivitya...@redhat.com
Date:   Wed Apr 18 19:22:47 2012 +0300

 KVM: Fix page-crossing MMIO

 MMIO that are split across a page boundary are currently broken - the
 code does not expect to be aborted by the exit to userspace for the
 first MMIO fragment.

 This patch fixes the problem by generalizing the current code for handling
 16-byte MMIOs to handle a number of fragments, and changes the MMIO
 code to create those fragments.

 Signed-off-by: Avi Kivitya...@redhat.com
 Signed-off-by: Marcelo Tosattimtosa...@redhat.com

Could you please pull the code from:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
and trace it again?
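For reference, one way to confirm that the fix above is contained in the tree
being built (plain git, nothing KVM-specific; the short hash is taken from the
commit quoted above):

git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git
cd kvm
git log --oneline -1 f78146b0f923   # errors out if the commit is not in the clone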

Thank you very much, this fixes the issue I have seen.

Thanks,
Peter



Re: race condition in qemu-kvm-1.0.1

2012-07-04 Thread Peter Lieven

On 07/03/12 17:54, Marcelo Tosatti wrote:

On Wed, Jun 27, 2012 at 12:35:22PM +0200, Peter Lieven wrote:

Hi,

we recently came across multiple VMs racing and stopping working. It
seems to happen when the system is at 100% cpu.
One way to reproduce this is:
qemu-kvm-1.0.1 with vnc-thread enabled

cmdline (or similar):
/usr/bin/qemu-kvm-1.0.1 -net
tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net
nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive 
format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native
-m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor
tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait
-name 02-debug-race -boot order=dc,menu=off -cdrom
/home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile
/var/run/qemu/vm-221.pid -mem-prealloc -cpu
host,+x2apic,model_id=Intel(R) Xeon(R) CPU   L5640  @
2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga
cirrus

Is it reproducible without vnc thread enabled?
Yes, it is. I tried it with and without. It is also even happening with
0.12.5 where no vnc thread (and i think also iothread) is available.

Thanks,
Peter



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-04 Thread Peter Lieven

On 07/03/12 15:25, Avi Kivity wrote:

On 07/03/2012 04:15 PM, Peter Lieven wrote:

On 03.07.2012 15:13, Avi Kivity wrote:

On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?


A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

 git bisect v3.2 v3.0 virt/kvm arch/x86/kvm



would it also be ok to bisect kvm-kmod?

Yes, but note that kvm-kmod is spread across two repositories which are
not often tested out of sync, so you may get build failures.

ok, i just started with this with a 3.0 (good) and 3.2 (bad) vanilla
kernel. i can confirm the bug and i am now starting to bisect. it will
take a while with my equipment; if anyone has a powerful testbed to run
this i would greatly appreciate help.
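For reference, a path-limited bisect session looks roughly like this (syntax as
per current git; the kernel has to be rebuilt and the reproducer rerun at every
step):

git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm
# build, install and boot the kernel, run the reproducer, then:
git bisect good   # or: git bisect bad
# repeat until git names the first bad commit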

if anyone wants to reproduce:

a) v3.2 from git.kernel.org
b) qemu-kvm 1.0.1 from sourceforge
c) ubuntu 64-bit 12.04 server cd
d) empty (e.g. all zero) hard disk image

cmdline:
./qemu-system-x86_64 -m 512 -cdrom 
/home/lieven/Downloads/ubuntu-12.04-server-amd64.iso -hda 
/dev/hd1/vmtest -vnc :1 -monitor stdio -boot dc


then choose boot from first harddisk and try to quit the qemu monitor
with 'quit' - the hypervisor hangs.


peter





Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-04 Thread Peter Lieven

On 07/03/12 15:13, Avi Kivity wrote:

On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?


A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

git bisect v3.2 v3.0 virt/kvm arch/x86/kvm

here we go:

commit ca7d58f375c650cf36900cb1da1ca2cc99b13393
Author: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Date:   Wed Jul 13 14:31:08 2011 +0800

KVM: x86: fix broken read emulation spans a page boundary




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Peter Lieven

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?

Thanks,
Peter


On 02.07.2012 17:05, Avi Kivity wrote:

On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know whats that here in handle_mmio?

 /* hack: Red Hat 7.1 generates these weird accesses. */
     if ((addr > 0xa0000-4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
 return 0;


Just what it says.  There is a 4-byte access to address 0x9.  The
first byte lies in RAM, the next three bytes are in mmio.  qemu is
geared to power-of-two accesses even though x86 can generate accesses to
any number of bytes between 1 and 8.

It appears that this has happened with your guest.  It's not impossible
that it's genuine.





Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Peter Lieven

On 03.07.2012 15:13, Avi Kivity wrote:

On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?


A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

git bisect v3.2 v3.0 virt/kvm arch/x86/kvm



would it also be ok to bisect kvm-kmod?

thanks,
peter



Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-02 Thread Peter Lieven

On 02.07.2012 09:05, Jan Kiszka wrote:

On 2012-07-01 21:18, Peter Lieven wrote:

Am 01.07.2012 um 10:19 schrieb Avi Kivity:


On 06/28/2012 10:27 PM, Peter Lieven wrote:

Am 28.06.2012 um 18:32 schrieb Avi Kivity:


On 06/28/2012 07:29 PM, Peter Lieven wrote:

Yes. A signal is sent, and KVM returns from the guest to userspace on
pending signals.

is there a description available how this process exactly works?

The kernel part is in vcpu_enter_guest(), see the check for
signal_pending().  But this hasn't seen changes for quite a long while.

Thank you, i will have a look. I noticed a few patches that where submitted
during the last year, maybe one of them is related:

Switch SIG_IPI to SIGUSR1
Fix signal handling of SIG_IPI when io-thread is enabled

In the first commit there is mentioned a 32-on-64-bit Linux kernel bug
is there any reference to that?


http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
running 32-on-64?

I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik,
the isolinux loader where I see the race is 32-bit although it is a 64-bit Ubuntu
LTS CD image. The second case where I have seen the race is on shutdown of a
Windows 2000 Server which is also 32-bit.

32-on-64 particularly means using a 32-bit QEMU[-kvm] binary on a
64-bit host kernel. What does file qemu-system-x86_64 report about yours?
It's a custom build on 64-bit Linux as a 64-bit application. I will try to
continue to find out today what's going wrong. Any help or hints appreciated ;-)

Thanks,
Peter


Jan





Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-02 Thread Peter Lieven

On 02.07.2012 17:05, Avi Kivity wrote:

On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know whats that here in handle_mmio?

 /* hack: Red Hat 7.1 generates these weird accesses. */
     if ((addr > 0xa0000-4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
 return 0;


Just what it says.  There is a 4-byte access to address 0x9.  The
first byte lies in RAM, the next three bytes are in mmio.  qemu is
geared to power-of-two accesses even though x86 can generate accesses to
any number of bytes between 1 and 8.

I just stumbled across the word hack in the comment. When the race
occurs the CPU is basically reading from 0xa in an endless loop.

It appears that this has happened with your guest.  It's not impossible
that it's genuine.


I had a lot to do over the last days, but I updated our build environment to
Ubuntu LTS 12.04 64-bit Server which is based on Linux 3.2.0. I still
see the issue. If I use the kvm Module provided with the kernel it is
working correctly. If I use kvm-kmod-3.4 with qemu-kvm-1.0.1 (both
from sourceforge) I can reproduce the race condition.

I will keep you posted when I have more evidence.

Thanks,
Peter


Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-01 Thread Peter Lieven

Am 01.07.2012 um 10:19 schrieb Avi Kivity:

 On 06/28/2012 10:27 PM, Peter Lieven wrote:
 
 Am 28.06.2012 um 18:32 schrieb Avi Kivity:
 
 On 06/28/2012 07:29 PM, Peter Lieven wrote:
 Yes. A signal is sent, and KVM returns from the guest to userspace on
 pending signals.
 
 is there a description available how this process exactly works?
 
 The kernel part is in vcpu_enter_guest(), see the check for
 signal_pending().  But this hasn't seen changes for quite a long while.
 
 Thank you, i will have a look. I noticed a few patches that where submitted
 during the last year, maybe one of them is related:
 
 Switch SIG_IPI to SIGUSR1
 Fix signal handling of SIG_IPI when io-thread is enabled
 
 In the first commit there is mentioned a 32-on-64-bit Linux kernel bug
 is there any reference to that?
 
 
 http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
 running 32-on-64?

I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik,
the isolinux loader where I see the race is 32-bit although it is a 64-bit Ubuntu
LTS CD image. The second case where I have seen the race is on shutdown of a
Windows 2000 Server which is also 32-bit.

Peter

 
 
 -- 
 error compiling committee.c: too many arguments to function
 
 



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood something.
I was believing that kvm-kmod-3.0 is basically what is in the vanilla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works; if I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig too deep - see below.

Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is basically what's going on:

  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)


its doing that forever. this is tracing the kvm module. doing the 
qemu-system-x86_64 trace is a bit complicated, but
maybe this is already sufficient. otherwise i will of course gather this 
info as well.


thanks
peter



Jan

[1] http://www.linux-kvm.org/page/Tracing


---

Hi,

we recently came across multiple VMs racing and stopping working. It
seems to happen when the system is at 100% cpu.
One way to reproduce this is:
qemu-kvm-1.0.1 with vnc-thread enabled

cmdline (or similar):
/usr/bin/qemu-kvm-1.0.1 -net
tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net
nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive
format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native
-m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor
tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name
02-debug-race -boot order=dc,menu=off -cdrom
/home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile
/var/run/qemu/vm-221.pid -mem-prealloc -cpu
host,+x2apic,model_id=Intel(R) Xeon(R) CPU   L5640  @
2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus

it is important that the attached virtio image contains only zeroes. if
the system boots from cd, select boot from first harddisk.
the hypervisor then hangs at 100% cpu and neither monitor nor qmp are
responsive anymore.

i have also seen customers reporting this when a VM is shut down.

if this is connected to the threaded vnc server it might be important to
be connected at this time.

debug backtrace attached.

Thanks,
Peter

--

(gdb) file /usr/bin/qemu-kvm-1.0.1
Reading symbols from /usr/bin/qemu-kvm-1.0.1...done.
(gdb) attach 5145
Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7f54d08b9700 (LWP 5253)]
[New Thread 0x7f5552757700 (LWP 5152)]
[New Thread 0x7f5552f58700 (LWP 5151)]
0x7f5553c6b5a3 in select () from /lib/libc.so.6
(gdb) info threads

Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood something.
I was believing that kvm-kmod-3.0 is basically what is in the vanilla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works; if I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig too deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather this
info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. its basically booting from an empty 
harddisk.


if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just too short.
if i understand further correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spent in this state might be rather short.

my concern is not that the machine hangs, just that the hypervisor is
unresponsive and it's impossible to reset or quit gracefully. the only way
to get the hypervisor ended is via SIGKILL.

thanks
peter


Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

does anyone know whats that here in handle_mmio?

/* hack: Red Hat 7.1 generates these weird accesses. */
if ((addr > 0xa0000-4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
return 0;

thanks,
peter

On 28.06.2012 11:31, Peter Lieven wrote:

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working 
with

qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace 
(qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu 
is in

an infinite loop?
Before accusing kvm-kmod ;), can you check if the effect is visible 
with

an original Linux 3.3.x or 3.4.x kernel as well?
sorry, i should have been more specific. maybe I also misunderstood 
sth.
I was believing that kvm-kmod-3.0 is basically what is in vanialla 
kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it 
works, if

I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be 
helpful

(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read

len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather 
this

info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. its basically booting from an empty 
harddisk.


if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just to short.
if i understand furhter correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spend in this state might be rather short.

my concern is not that the machine hangs, just the the hypervisor is 
unresponsive
and its impossible to reset or quit gracefully. the only way to get 
the hypervisor

ended is via SIGKILL.

thanks
peter




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 28.06.2012 11:39, Jan Kiszka wrote:

On 2012-06-28 11:31, Peter Lieven wrote:

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth.
I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather this
info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. its basically booting from an empty
harddisk.

if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just to short.
if i understand furhter correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spend in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.

I have a 1.1GB (85MB compressed) trace-file. If you have time to
look at it I could drop it somewhere.

We currently run all VMs with nice 1 because we observed that
this improves the controllability of the node in case all VMs
have excessive CPU load. Running the VM unniced does
not change the behaviour, unfortunately.

Peter

my concern is not that the machine hangs, just the the hypervisor is
unresponsive
and its impossible to reset or quit gracefully. the only way to get the
hypervisor
ended is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the vm
via the monitor etc. If not, that's a bug.

Jan




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 28.06.2012 11:39, Jan Kiszka wrote:

On 2012-06-28 11:31, Peter Lieven wrote:

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth.
I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather this
info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. its basically booting from an empty
harddisk.

if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just to short.
if i understand furhter correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spend in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.


my concern is not that the machine hangs, just the the hypervisor is
unresponsive
and its impossible to reset or quit gracefully. the only way to get the
hypervisor
ended is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the vm
via the monitor etc. If not, that's a bug.

what i observed just now is that the monitor is working up to the
point i try to quit the hypervisor or try to reset the cpu.

so we were looking at a completely wrong place...

it seems from this short excerpt that the deadlock appears not
on execution but when the vcpus shall be paused.

Program received signal SIGINT, Interrupt.
0x7fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/libpthread.so.0

(gdb) thread apply all bt

Thread 4 (Thread

qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

Hi,

i debugged my initial problem further and found out that the problem 
happens to be that
the main thread is stuck in pause_all_vcpus() on reset or quit commands 
in the monitor
if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the 
condition from while (ret == 0)

to while ((ret == 0) && !env->stop); it works, but is this the right fix?
The quit command seems to work, but on reset the VM enters pause state.

Thanks,
Peter



Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

On 28.06.2012 15:25, Jan Kiszka wrote:

On 2012-06-28 15:05, Peter Lieven wrote:

Hi,

i debugged my initial problem further and found out that the problem
happens to be that
the main thread is stuck in pause_all_vcpus() on reset or quit commands
in the monitor
if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
condition from while (ret == 0)
to while ((ret == 0) && !env->stop); it works, but is this the right fix?
The quit command seems to work, but on reset the VM enters pause state.

Before entering the wait loop in pause_all_vcpus, there are kicks sent
to all vcpus. Now we need to find out why some of those kicks apparently
don't reach the destination.

can you explain in short what exactly these kicks do? do these kicks lead
to leaving kernel mode and returning to userspace?

Again:
  - on which host kernels does this occur, and which change may have
changed it?

I do not see it in 3.0.0 and have also not seen it in 2.6.38. both
the mainline 64-bit ubuntu-server kernels (for natty / oneiric 
respectively).

If I compile a more recent kvm-kmod 3.3 or 3.4 on these machines,
it is no longer working.

  - with which qemu-kvm version is it reproducible, and which commit
introduced or fixed it?

qemu-kvm-1.0.1 from sourceforge. to get into the scenario it
is not sufficient to boot from an empty harddisk. to reproduce
i have to use a live cd like ubuntu-server 12.04 and choose to
boot from the first harddisk. i think the isolinux loader does
not check for a valid bootsector and just executes what is found
in sector 0. this leads to the mmio reads i posted and 100%
cpu load (most spent in kernel). at that time the monitor/qmp
is still responsive. if i send a command that pauses all vcpus,
the first cpu is looping in kvm_cpu_exec and the main thread
is waiting. at that time the monitor stops responding.
i have also seen this issue on very old windows 2000 servers
where the system fails to power off and is just halted. maybe
this is also a busy loop.

i will try to bisect this asap and let you know, maybe the above
info helps you already to reproduce.

thanks,
peter


I failed reproducing so far.

Jan





Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

On 28.06.2012 17:22, Jan Kiszka wrote:

On 2012-06-28 17:02, Peter Lieven wrote:

On 28.06.2012 15:25, Jan Kiszka wrote:

On 2012-06-28 15:05, Peter Lieven wrote:

Hi,

i debugged my initial problem further and found out that the problem
happens to be that
the main thread is stuck in pause_all_vcpus() on reset or quit commands
in the monitor
if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
condition from while (ret == 0)
to while ((ret == 0) && !env->stop); it works, but is this the right fix?
The quit command seems to work, but on reset the VM enters pause state.

Before entering the wait loop in pause_all_vcpus, there are kicks sent
to all vcpus. Now we need to find out why some of those kicks apparently
don't reach the destination.

can you explain in short what exactly these kicks do? do these kicks lead
to leaving kernel mode and returning to userspace?

Yes. A signal is sent, and KVM returns from the guest to userspace on
pending signals.
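A toy illustration of that mechanism, completely independent of KVM and qemu (a
no-op SIGUSR1 handler makes a blocking call in the target thread return with
EINTR, which is essentially what the vcpu kick relies on; build with gcc -pthread):

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void noop(int sig) { (void)sig; }  /* only purpose: interrupt the blocking call */

static void *vcpu_like_thread(void *arg)
{
    (void)arg;
    /* stands in for the blocking KVM_RUN ioctl */
    if (pause() < 0 && errno == EINTR)
        printf("blocking call interrupted, thread is back in userspace code\n");
    return NULL;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop;      /* no SA_RESTART, so the syscall returns EINTR */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    pthread_t t;
    pthread_create(&t, NULL, vcpu_like_thread, NULL);
    sleep(1);                  /* let the thread block */
    pthread_kill(t, SIGUSR1);  /* the "kick" */
    pthread_join(t, NULL);
    return 0;
}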

is there a description available how this process exactly works?

thanks
peter



race condition in qemu-kvm-1.0.1

2012-06-27 Thread Peter Lieven

Hi,

we recently came across multiple VMs racing and stopping working. It 
seems to happen when the system is at 100% cpu.

One way to reproduce this is:
qemu-kvm-1.0.1 with vnc-thread enabled

cmdline (or similar):
/usr/bin/qemu-kvm-1.0.1 -net 
tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net 
nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive 
format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native 
-m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor 
tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name 
02-debug-race -boot order=dc,menu=off -cdrom 
/home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile 
/var/run/qemu/vm-221.pid -mem-prealloc -cpu 
host,+x2apic,model_id=Intel(R) Xeon(R) CPU   L5640  @ 
2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus


it is important that the attached virtio image contains only zeroes. if 
the system boots from cd, select boot from first harddisk.
the hypervisor then hangs at 100% cpu and neither monitor nor qmp are 
responsive anymore.


i have also seen customers reporting this when a VM is shut down.

if this is connected to the threaded vnc server it might be important that
a vnc client was connected at this time.


debug backtrace attached.

Thanks,
Peter

--

(gdb) file /usr/bin/qemu-kvm-1.0.1
Reading symbols from /usr/bin/qemu-kvm-1.0.1...done.
(gdb) attach 5145
Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
found)...done.

Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7f54d08b9700 (LWP 5253)]
[New Thread 0x7f5552757700 (LWP 5152)]
[New Thread 0x7f5552f58700 (LWP 5151)]
0x7f5553c6b5a3 in select () from /lib/libc.so.6
(gdb) info threads
  4 Thread 0x7f5552f58700 (LWP 5151)  0x7f5553c6a747 in ioctl () 
from /lib/libc.so.6
  3 Thread 0x7f5552757700 (LWP 5152)  0x7f5553c6a747 in ioctl () 
from /lib/libc.so.6
  2 Thread 0x7f54d08b9700 (LWP 5253)  0x7f5553f1a85c in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
* 1 Thread 0x7f50d700 (LWP 5145)  0x7f5553c6b5a3 in select () 
from /lib/libc.so.6

(gdb) thread apply all bt

Thread 4 (Thread 0x7f5552f58700 (LWP 5151)):
#0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
#1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, 
type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
#2  0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at 
/usr/src/qemu-kvm-1.0.1/kvm-all.c:987
#3  0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f5557652f10) at 
/usr/src/qemu-kvm-1.0.1/cpus.c:740

#4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
#5  0x7f5553c72cdd in clone () from /lib/libc.so.6
#6  0x in ?? ()

Thread 3 (Thread 0x7f5552757700 (LWP 5152)):
#0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
#1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f555766ae60, 
type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
#2  0x7f72728a in kvm_cpu_exec (env=0x7f555766ae60) at 
/usr/src/qemu-kvm-1.0.1/kvm-all.c:987
#3  0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f555766ae60) at 
/usr/src/qemu-kvm-1.0.1/cpus.c:740

#4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
#5  0x7f5553c72cdd in clone () from /lib/libc.so.6
#6  0x in ?? ()

Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)):
#0  0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/libpthread.so.0
#1  0x7f679f5d in qemu_cond_wait (cond=0x7f5557ede1e0, 
mutex=0x7f5557ede210) at qemu-thread-posix.c:113
#2  0x7f6b06a1 in vnc_worker_thread_loop (queue=0x7f5557ede1e0) 
at ui/vnc-jobs-async.c:222
#3  0x7f6b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at 
ui/vnc-jobs-async.c:318

#4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
#5  0x7f5553c72cdd in clone () from /lib/libc.so.6
#6  0x in ?? ()

Thread 1 (Thread 0x7f50d700 (LWP 5145)):
#0  0x7f5553c6b5a3 in select () from /lib/libc.so.6
#1  0x7f6516be in main_loop_wait (nonblocking=0) at main-loop.c:456
#2  0x7f647ad0 in main_loop () at /usr/src/qemu-kvm-1.0.1/vl.c:1482
#3  0x7f64c698 in main (argc=38, argv=0x79d894a8, 
envp=0x79d895e0) at /usr/src/qemu-kvm-1.0.1/vl.c:3523

(gdb) thread apply all bt full

Thread 4 (Thread 0x7f5552f58700 (LWP 5151)):
#0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
No symbol table info available.
#1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, 
type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101

ret = 32597
arg = 0x0
ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 
0x7f5552f57e50, reg_save_area = 0x7f5552f57d90}}
#2  0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at 

Re: qemu-kvm-1.0 crashes with threaded vnc server?

2012-06-26 Thread Peter Lieven

On 13.03.2012 16:06, Alexander Graf wrote:

On 13.03.2012, at 16:05, Corentin Chary wrote:


On Tue, Mar 13, 2012 at 12:29 PM, Peter Lievenp...@dlh.net  wrote:

On 11.02.2012 09:55, Corentin Chary wrote:

On Thu, Feb 9, 2012 at 7:08 PM, Peter Lievenp...@dlh.net   wrote:

Hi,

is anyone aware if there are still problems when enabling the threaded
vnc
server?
I saw some VMs crashing when using a qemu-kvm build with
--enable-vnc-thread.

qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp
7fec19d056d0
error 6 in libz.so.1.2.3.3[7fec1ca75000+16000]
qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp
7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000]

I had no time to debug further. It seems to happen shortly after
migrating, but that's uncertain. At least the segfault in libz seems to
point to VNC since I cannot think of any other part of qemu-kvm
using libz except for the VNC server.

Thanks,
Peter



Hi Peter,
I found two patches on my git tree that I sent long ago but somehow
get lost on the mailing list. I rebased the tree but did not have the
time (yet) to test them.
http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip
Feel free to try them. If QEMU segfault again, please send a full gdb
backtrace / valgrind trace / way to reproduce :).
Thanks,


I have seen no more crashes with these two patches applied. I would
suggest pushing them to the master repository.

Thank you,
Peter


Ccing Alexander,

Ah, cool. Corentin, I think you're right now the closest thing we have to a 
maintainer for VNC. Could you please just send out a pull request for those?

hi all,

i suspect there is still a problem with the threaded vnc server. its
just a guess, but we saw a reasonable number of vms hanging in the
last weeks. hanging meaning the emulation is stopped and the qemu-kvm
process no longer reacts, not on the monitor, not on vnc, not on qmp.
why i suspect the threaded vnc server is that in all cases we have
analyzed, this happened with an open vnc session and only on nodes with
the threaded vnc server enabled. it might also be the case that this
happens at a resolution change. is there anything known or does someone
have an idea?


we are running qemu-kvm 1.0.1 with

  vnc: don't mess up with iohandlers in the vnc thread

  vnc: Limit r/w access to size of allocated memory

compiled in.

unfortunately, i was not yet able to reproduce this with a debugger 
attached.


thanks,
peter



Alex



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Assertion after changing display resolution

2012-04-24 Thread Peter Lieven

Hi all,

I saw the following assert after changing the display resolution. This might
be the cause, but i am not sure. Threaded VNC is enabled.

Anyone ever seen this?

 qemu-kvm-1.0: malloc.c:3096: sYSMALLOc: Assertion `(old_top == 
(((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - 
__builtin_offsetof (struct malloc_chunk, fd))) && old_size == 0) || 
((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof 
(struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 
* (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned 
long)old_end & pagemask) == 0)' failed.


Thanks,
Peter

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Assertion after changing display resolution

2012-04-24 Thread Peter Lieven

On 24.04.2012 15:34, Alon Levy wrote:

On Tue, Apr 24, 2012 at 03:24:31PM +0200, Peter Lieven wrote:

Hi all,

I saw the following assert after changing the display resolution. This might be
the cause, but i am not sure. Threaded VNC is enabled.
Anyone ever seen this?

  qemu-kvm-1.0: malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr)
(((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct
malloc_chunk, fd))) && old_size == 0) || ((unsigned long) (old_size) >=
(unsigned long)((((__builtin_offsetof (struct malloc_chunk,
fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) -
1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) ==
0)' failed.

A shot in the dark - does valgrind show anything wrong?
The problem is i cannot reproduce this, but I can try running the VM in
valgrind and check if there is any problem.

Peter

Thanks,
Peter




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


qemu-kvm 1.0.1?

2012-04-16 Thread Peter Lieven

Hi,

i was wondering if there will be a qemu-kvm version 1.0.1?

The last tag I see here is 1.0:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=summary

Any hints?

Thanks,
Peter

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-28 Thread Peter Lieven

On 27.03.2012 19:06, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 06:16:11 PM Peter Lieven wrote:

On 27.03.2012 18:12, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 05:58:01 PM Peter Lieven wrote:

On 27.03.2012 17:44, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 04:06:13 PM Peter Lieven wrote:

On 27.03.2012 14:29, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote:

On 27.03.2012 14:26, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote:

On 27.03.2012 12:00, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld

wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld

wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven

wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov

wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter
Lieven

wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb
Natapov

ecrivait :

Try to addfeature policy='disable'
name='hypervisor'/to cpu definition in XML
and check command line.


ok I try this but I can't usecpu model
to map the host cpu

(my libvirt is 0.9.8) so I use :
 cpu match='exact'

   modelOpteron_G3/model
   feature policy='disable'
   name='hypervisor'/

 /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-ne
t- 1v cpu-cp u.tx t.gz

And now with only 1 vcpu, the response time is
8.5s, great

improvment. We keep this configuration for
production

: we check the response time when some other users

are connected.

please keep in mind, that setting -hypervisor,
disabling hpet and only one vcpu
makes windows use tsc as clocksource. you have to
make sure, that your vm is not switching between
physical sockets on your system and that you have
constant_tsc feature to have a stable tsc between
the cores in the same socket. its also likely that
the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify
where we loose performance. Since you get good
result with it frequent access to PM timer is
probably the reason. I do not recommend using
-hypervisor for production!


@gleb: do you know whats the state of in-kernel
hyper-v timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported.
But,

at the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with
this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/f
f5 42 633%28 v=vs .85 %29.aspx

This one is pretty simple to support. Please see
attachments for more details. I was thinking about
synthetic  timers http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is
implemented by the patch Vadim attached to his previous email.
But I believe that additional qemu patch is needed for Windows
to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?

No, they are for kernel.

i meant the qemu.diff file.

Yes, I missed the second attachment.


if i understand correctly i have to pass -cpu host,+hv_refcnt to
qemu?

Looks like it.

ok, so it would be interesting if it helps to avoid the pmtimer
reads we observed earlier. right?

Yes.

first feedback: performance seems to be amazing. i cannot confirm that
it breaks hv_spinlocks, hv_vapic and hv_relaxed.
why did you assume this?

I didn't mean that hv_refcnt will break any other hyper-v features.
I just want to say that turning hv_refcnt on (as any other hv_ option)
will crash Win8 on boot-up.

yes, i got it meanwhile ;-)

let me know what you think should be done to further test
the refcnt implementation.

i would suggest to return at least 0x if msr 0x4021
is read.

IIRC Win7(W2k8R2) only reads this MSR. Win8 reads and writes.

you mean win7 only writes, don't

Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu model  to map the
host cpu

(my libvirt is 0.9.8) so I use :
  cpu match='exact'

modelOpteron_G3/model
feature policy='disable' name='hypervisor'/

  /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp
u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we
check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling hpet
and only one vcpu
makes windows use tsc as clocksource. you have to make sure,
that your vm is not switching between physical sockets on
your system and that you have constant_tsc feature to have a
stable tsc between the cores in the same socket. its also
likely that the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we
loose performance. Since you get good result with it frequent
access to PM timer is probably the reason. I do not recommend
using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at
the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28
v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for more
details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented by
the patch Vadim attached to his previous email. But I believe that
additional qemu patch is needed for Windows to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.


could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?

peter


--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 12:40, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 11:26:29 AM Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu model   to map the
host cpu

(my libvirt is 0.9.8) so I use :
   cpu match='exact'

 modelOpteron_G3/model
 feature policy='disable' name='hypervisor'/

   /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp
u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we
check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling hpet
and only one vcpu
makes windows use tsc as clocksource. you have to make sure,
that your vm is not switching between physical sockets on
your system and that you have constant_tsc feature to have a
stable tsc between the cores in the same socket. its also
likely that the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we
loose performance. Since you get good result with it frequent
access to PM timer is probably the reason. I do not recommend
using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at
the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28
v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for more
details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented by
the patch Vadim attached to his previous email. But I believe that
additional qemu patch is needed for Windows to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

Gleb mentioned that it is properly handled upstream;
otherwise just comment out the entire HPET section in
the acpi-dsdt.dsl file.

i have the upstream bios installed, so -no-hpet should disable hpet completely.
can you give a hint what
"bits 1 and 9 must be set to on in leaf 0x4003" means?


the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?

Yes, but don't forget about kvm patch as well.

ok, i will try my best. would you consider your patch a quick hack
or do you think it would be worth uploading to the upstream repository?

peter

peter


--

Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 12:00, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu model   to map the
host cpu

(my libvirt is 0.9.8) so I use :
  cpu match='exact'

modelOpteron_G3/model
feature policy='disable' name='hypervisor'/

  /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp
u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we
check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling hpet
and only one vcpu
makes windows use tsc as clocksource. you have to make sure,
that your vm is not switching between physical sockets on
your system and that you have constant_tsc feature to have a
stable tsc between the cores in the same socket. its also
likely that the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we
loose performance. Since you get good result with it frequent
access to PM timer is probably the reason. I do not recommend
using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at
the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28
v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for more
details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented by
the patch Vadim attached to his previous email. But I believe that
additional qemu patch is needed for Windows to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?


No, they are for kernel.

i meant the qemu.diff file.

if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu?

peter




--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 14:26, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote:

On 27.03.2012 12:00, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu modelto map the
host cpu

(my libvirt is 0.9.8) so I use :
  cpu match='exact'

modelOpteron_G3/model
feature policy='disable' name='hypervisor'/

  /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp
u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we
check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling hpet
and only one vcpu
makes windows use tsc as clocksource. you have to make sure,
that your vm is not switching between physical sockets on
your system and that you have constant_tsc feature to have a
stable tsc between the cores in the same socket. its also
likely that the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we
loose performance. Since you get good result with it frequent
access to PM timer is probably the reason. I do not recommend
using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at
the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28
v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for more
details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented by
the patch Vadim attached to his previous email. But I believe that
additional qemu patch is needed for Windows to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?


No, they are for kernel.

i meant the qemu.diff file.


Yes, I missed the second attachment.


if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu?


Looks like it.

ok, so it would be interesting if it helps to avoid the pmtimer reads
we observed earlier. right?

peter


--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 14:29, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote:

On 27.03.2012 14:26, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote:

On 27.03.2012 12:00, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu model to map the
host cpu

(my libvirt is 0.9.8) so I use :
  cpu match='exact'

modelOpteron_G3/model
feature policy='disable' name='hypervisor'/

  /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp
u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we
check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling hpet
and only one vcpu
makes windows use tsc as clocksource. you have to make sure,
that your vm is not switching between physical sockets on
your system and that you have constant_tsc feature to have a
stable tsc between the cores in the same socket. its also
likely that the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we
loose performance. Since you get good result with it frequent
access to PM timer is probably the reason. I do not recommend
using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at
the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28
v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for more
details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented by
the patch Vadim attached to his previous email. But I believe that
additional qemu patch is needed for Windows to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?


No, they are for kernel.

i meant the qemu.diff file.


Yes, I missed the second attachment.


if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu?


Looks like it.

ok, so it would be interesting if it helps to avoid the pmtimer reads
we observed earlier. right?


Yes.

ok, will try it.

can you give me a short advice if the patch has to applied to qemu-kvm 
or qemu latest

from git?

peter


--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 14:29, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote:

On 27.03.2012 14:26, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote:

On 27.03.2012 12:00, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu model to map the
host cpu

(my libvirt is 0.9.8) so I use :
  cpu match='exact'

modelOpteron_G3/model
feature policy='disable' name='hypervisor'/

  /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp
u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we
check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling hpet
and only one vcpu
makes windows use tsc as clocksource. you have to make sure,
that your vm is not switching between physical sockets on
your system and that you have constant_tsc feature to have a
stable tsc between the cores in the same socket. its also
likely that the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we
loose performance. Since you get good result with it frequent
access to PM timer is probably the reason. I do not recommend
using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at
the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28
v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for more
details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented by
the patch Vadim attached to his previous email. But I believe that
additional qemu patch is needed for Windows to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?


No, they are for kernel.

i meant the qemu.diff file.


Yes, I missed the second attachment.


if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu?


Looks like it.

ok, so it would be interesting if it helps to avoid the pmtimer reads
we observed earlier. right?


Yes.
first feedback: performance seems to be amazing. i cannot confirm that 
it breaks hv_spinlocks, hv_vapic and hv_relaxed.

why did you assume this?

no more pmtimer reads. i can now almost fully utilize a 1GBit
interface with a file transfer, without a single cpu core being
fully utilized as was the case with pmtimer. some live migration
tests revealed that it did not crash even under load.


@vadim: i think we need a proper patch for the others to test this ;-)

what i observed: is it right that HV_X64_MSR_TIME_REF_COUNT is missing
in msrs_to_save[] in x86/x86.c of the kernel module?
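
If that is indeed the missing piece, the change would be a one-line
addition along these lines (a sketch only; the surrounding entries are
elided and the exact placement in the list is an assumption):

/* x86.c: export the Hyper-V reference counter MSR to userspace so that
 * qemu saves and restores it like the other MSRs on live migration */
static u32 msrs_to_save[] = {
    /* ... existing MSR indices ... */
    HV_X64_MSR_TIME_REF_COUNT,      /* proposed addition */
    /* ... */
};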


thanks for your help,
peter


--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 13:43, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 12:49:58 PM Peter Lieven wrote:

On 27.03.2012 12:40, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 11:26:29 AM Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu modelto map

the

host cpu

(my libvirt is 0.9.8) so I use :
cpu match='exact'

  modelOpteron_G3/model
  feature policy='disable' name='hypervisor'/

/cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-
cp u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s,
great

improvment. We keep this configuration for production : we
check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling
hpet and only one vcpu
makes windows use tsc as clocksource. you have to make
sure, that your vm is not switching between physical
sockets on your system and that you have constant_tsc
feature to have a stable tsc between the cores in the same
socket. its also likely that the vm will crash when live
migrated.

All true. I asked to try -hypervisor only to verify where we
loose performance. Since you get good result with it
frequent access to PM timer is probably the reason. I do
not recommend using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at
the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%
28 v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for
more details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented by
the patch Vadim attached to his previous email. But I believe that
additional qemu patch is needed for Windows to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

Gleb mentioned that it properly handled in upstream,
otherwise just comment the entire HPET section in
acpi-dsdt.dsl file.

i have upstream bios installed. so -no-hpet should disable hpet completely.
can you give a hint, what
bits 1 and 9 must be set to on in leaf 0x4003 means?

I mean the following code:
+if (hyperv_ref_counter_enabled()) {
+    c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;  /* bit 1 */
+    c->eax |= 0x200;                                /* bit 9 */
+}

Please see attached file for more information.


the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?

Yes, but don't forget about kvm patch as well.

ok, i will try my best. would you consider your patch a quick hack
or do you think it would be worth to be uploaded to the upstream
repository?

It was just a brief attempt from my side, mostly inspired by my
conversation with Gleb, to see whether it is worth turning this option on.
It is not fully tested. It will crash Win8 (as will any of the
currently introduced hyper-v features).

i can confirm that the windows 8 installer does not start and resets the
vm continuously. it tries to access hv msr 0x4021

http://msdn.microsoft.com/en-us/library/windows/hardware/ff542648%28v=vs.85%29.aspx

it is possible to tell the guest that the host is not iTSC-capable (as
they call it). i will try to hack up a patch for this.

peter


I wouldn't commit this code without

Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 17:37, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 04:44:51 PM Peter Lieven wrote:

On 27.03.2012 13:43, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 12:49:58 PM Peter Lieven wrote:

On 27.03.2012 12:40, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 11:26:29 AM Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven

wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable' name='hypervisor'/
to cpu definition in XML and check command line.


ok I try this but I can't usecpu model to map

the


host cpu

(my libvirt is 0.9.8) so I use :
 cpu match='exact'

   modelOpteron_G3/model
   feature policy='disable' name='hypervisor'/

 /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcp
u- cp u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s,
great

improvment. We keep this configuration for production :
we check the response time when some other users are
connected.

please keep in mind, that setting -hypervisor, disabling
hpet and only one vcpu
makes windows use tsc as clocksource. you have to make
sure, that your vm is not switching between physical
sockets on your system and that you have constant_tsc
feature to have a stable tsc between the cores in the
same socket. its also likely that the vm will crash when
live migrated.

All true. I asked to try -hypervisor only to verify where
we loose performance. Since you get good result with it
frequent access to PM timer is probably the reason. I do
not recommend using -hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,
at the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff54263
3% 28 v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments for
more details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is implemented
by the patch Vadim attached to his previous email. But I believe
that additional qemu patch is needed for Windows to actually use
it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

Gleb mentioned that it properly handled in upstream,
otherwise just comment the entire HPET section in
acpi-dsdt.dsl file.

i have upstream bios installed. so -no-hpet should disable hpet
completely. can you give a hint, what
bits 1 and 9 must be set to on in leaf 0x4003 means?

I mean the following code:
+if (hyperv_ref_counter_enabled()) {
+    c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;  /* bit 1 */
+    c->eax |= 0x200;                                /* bit 9 */
+}

Please see attached file for more information.


the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?

Yes, but don't forget about kvm patch as well.

ok, i will try my best. would you consider your patch a quick hack
or do you think it would be worth to be uploaded to the upstream
repository?

It was just a brief attempt from my side, mostly inspired by my
conversation with Gleb, to see whether it is worth turning this option on.
It is not fully tested. It will crash Win8 (as will any of the
currently introduced hyper-v features).

i can confirm that the windows 8 installer does not start and resets the
vm continuously. it tries to access hv msr 0x4021


Win8 needs more comprehensive Hyper-V support.

yes, so it seems. i read your comment wrong. i believed that
hv_refcnt breaks the other hv_ features

Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 17:44, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 04:06:13 PM Peter Lieven wrote:

On 27.03.2012 14:29, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote:

On 27.03.2012 14:26, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote:

On 27.03.2012 12:00, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld

wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov

wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven

wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov

ecrivait :

Try to addfeature policy='disable'
name='hypervisor'/  to cpu definition in XML and
check command line.


ok I try this but I can't usecpu model  to
map the host cpu

(my libvirt is 0.9.8) so I use :
   cpu match='exact'

 modelOpteron_G3/model
 feature policy='disable' name='hypervisor'/

   /cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1v
cpu-cp u.tx t.gz

And now with only 1 vcpu, the response time is 8.5s,
great

improvment. We keep this configuration for production
: we check the response time when some other users
are connected.

please keep in mind, that setting -hypervisor,
disabling hpet and only one vcpu
makes windows use tsc as clocksource. you have to make
sure, that your vm is not switching between physical
sockets on your system and that you have constant_tsc
feature to have a stable tsc between the cores in the
same socket. its also likely that the vm will crash
when live migrated.

All true. I asked to try -hypervisor only to verify
where we loose performance. Since you get good result
with it frequent access to PM timer is probably the
reason. I do not recommend using -hypervisor for
production!


@gleb: do you know whats the state of in-kernel hyper-v
timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,
  at the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542
633%28 v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments
for more details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is
implemented by the patch Vadim attached to his previous email.
But I believe that additional qemu patch is needed for Windows to
actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x4003 and HPET
should be completely removed from ACPI.

could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?

No, they are for kernel.

i meant the qemu.diff file.

Yes, I missed the second attachment.


if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu?

Looks like it.

ok, so it would be interesting if it helps to avoid the pmtimer reads
we observed earlier. right?

Yes.

first feedback: performance seems to be amazing. i cannot confirm that
it breaks hv_spinlocks, hv_vapic and hv_relaxed.
why did you assume this?

I didn't mean that hv_refcnt will break any other hyper-v features.
I just want to say that turning hv_refcnt on (as any other hv_ option) will
crash Win8 on boot-up.

yes, i got it meanwhile ;-)

let me know what you think should be done to further test
the refcnt implementation.

i would suggest to return at least 0x if msr 0x4021
is read.

peter

Cheers,
Vadim.


no more pmtimer reads. i can now almost fully utilize a 1GBit
interface with a file transfer, without a single cpu core being
fully utilized as was the case with pmtimer. some live migration
tests revealed that it did not crash even under load.

@vadim: i think we need a proper patch for the others to test

Re: performance trouble

2012-03-27 Thread Peter Lieven

On 27.03.2012 18:12, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 05:58:01 PM Peter Lieven wrote:

On 27.03.2012 17:44, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 04:06:13 PM Peter Lieven wrote:

On 27.03.2012 14:29, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote:

On 27.03.2012 14:26, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote:

On 27.03.2012 12:00, Gleb Natapov wrote:

On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:

On 27.03.2012 11:23, Vadim Rozenfeld wrote:

On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld

wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven

wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov

wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven

wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb
Natapov

ecrivait :

Try to addfeature policy='disable'
name='hypervisor'/   to cpu definition in XML and
check command line.


ok I try this but I can't usecpu model
to map the host cpu

(my libvirt is 0.9.8) so I use :
cpu match='exact'

  modelOpteron_G3/model
  feature policy='disable'
  name='hypervisor'/

/cpu

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-
1v cpu-cp u.tx t.gz

And now with only 1 vcpu, the response time is
8.5s, great

improvment. We keep this configuration for
production

: we check the response time when some other users

are connected.

please keep in mind, that setting -hypervisor,
disabling hpet and only one vcpu
makes windows use tsc as clocksource. you have to
make sure, that your vm is not switching between
physical sockets on your system and that you have
constant_tsc feature to have a stable tsc between
the cores in the same socket. its also likely that
the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify
where we loose performance. Since you get good result
with it frequent access to PM timer is probably the
reason. I do not recommend using -hypervisor for
production!


@gleb: do you know whats the state of in-kernel
hyper-v timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported.
But,

   at the moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with
this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff5
42 633%28 v=vs .85 %29.aspx

This one is pretty simple to support. Please see attachments
for more details. I was thinking about synthetic  timers
http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx

is this what microsoft qpc uses as clocksource in hyper-v?

Yes, it should be enough for Win7 / W2K8R2.

To clarify the thing that microsoft qpc uses is what is
implemented by the patch Vadim attached to his previous email.
But I believe that additional qemu patch is needed for Windows
to actually use it.

You are right.
bits 1 and 9 must be set to on in leaf 0x40000003 and HPET
should be completely removed from ACPI.
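
For readers following along, a rough sketch of what setting those two bits
could look like on the QEMU/KVM side. The constant names and the helper are
made up for illustration; only the bit positions (Hyper-V features leaf
0x40000003, EAX bits 1 and 9) come from Vadim's statement above. This is not
the patch he attached.

/* illustrative only: advertise the Hyper-V reference counter and
 * reference TSC page in CPUID leaf 0x40000003 (EAX), bits 1 and 9 */
#include <stdint.h>

#define HV_CPUID_FEATURES               0x40000003u
#define HV_MSR_TIME_REF_COUNT_AVAILABLE (1u << 1)
#define HV_MSR_REFERENCE_TSC_AVAILABLE  (1u << 9)

static void hv_refcnt_set_cpuid_bits(uint32_t leaf, uint32_t *eax)
{
    if (leaf == HV_CPUID_FEATURES)
        *eax |= HV_MSR_TIME_REF_COUNT_AVAILABLE |
                HV_MSR_REFERENCE_TSC_AVAILABLE;
}

The HPET side of it is a separate change: either drop the HPET block from the
ACPI tables the guest sees, or simply start the guest with -no-hpet as done
elsewhere in this thread.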

could you advise how to do this and/or make a patch?

the stuff you send yesterday is for qemu, right? would
it be possible to use it in qemu-kvm also?

No, they are for kernel.

i meant the qemu.diff file.

Yes, I missed the second attachment.


if i understand correctly i have to pass -cpu host,+hv_refcnt to
qemu?

Looks like it.

ok, so it would be interesting if it helps to avoid the pmtimer reads
we observed earlier. right?

Yes.

first feedback: performance seems to be amazing. i cannot confirm that
it breaks hv_spinlocks, hv_vapic and hv_relaxed.
why did you assume this?

I didn't mean that hv_refcnt will break any other hyper-v features.
I just want to say that turning hv_refcnt on (as any other hv_ option)
will crash Win8 on boot-up.

yes, i got it meanwhile ;-)

let me know what you think should be done to further test
the refcnt implementation.

i would suggest to return at least 0x if msr 0x4021
is read.

IIRC Win7(W2k8R2) only reads this MSR. Win8 reads and writes.

you mean win7 only writes, don't you?
at least you put a break in set_msr_hyperv for this msr.

i just thought that it would be ok to return

Re: performance trouble

2012-03-26 Thread Peter Lieven

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait :

Try to add <feature policy='disable' name='hypervisor'/> to cpu
definition in XML and check command line.


ok I try this but I can't use cpu model to map the host cpu

(my libvirt is 0.9.8) so I use :
<cpu match='exact'>
  <model>Opteron_G3</model>
  <feature policy='disable' name='hypervisor'/>
</cpu>

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we check the
response time when some other users are connected.

please keep in mind, that setting -hypervisor, disabling hpet and
only one vcpu
makes windows use tsc as clocksource. you have to make sure, that your
vm is not switching between physical sockets on your system and that
you have constant_tsc feature to have a stable tsc between the cores
in the same socket. its also likely that the vm will crash when live
migrated.

All true. I asked to try -hypervisor only to verify where we loose
performance. Since you get good result with it frequent access to PM
timer is probably the reason. I do not recommend using -hypervisor for
production!


@gleb: do you know whats the state of in-kernel hyper-v timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at the moment,
I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Peter

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-26 Thread Peter Lieven

On 26.03.2012 20:36, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:

On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote:

On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:

On 22.03.2012 10:38, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait :

Try to add <feature policy='disable' name='hypervisor'/> to
cpu definition in XML and check command line.


ok I try this but I can't use cpu model to map the host cpu

(my libvirt is 0.9.8) so I use :
 <cpu match='exact'>
   <model>Opteron_G3</model>
   <feature policy='disable' name='hypervisor'/>
 </cpu>

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we check
the response time when some other users are connected.

please keep in mind, that setting -hypervisor, disabling hpet and
only one vcpu
makes windows use tsc as clocksource. you have to make sure, that
your vm is not switching between physical sockets on your system
and that you have constant_tsc feature to have a stable tsc
between the cores in the same socket. its also likely that the
vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we loose
performance. Since you get good result with it frequent access to
PM timer is probably the reason. I do not recommend using
-hypervisor for production!


@gleb: do you know whats the state of in-kernel hyper-v timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at the
moment, I'm only researching  this feature.

So it will take months at least?

I would say weeks.

Is there a way, we could contribute and help you with this?

Hi Peter,
You are welcome to add  an appropriate handler.

I think Vadim refers to this HV MSR
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28v=vs.85
%29.aspx

This one is pretty simple to support. Please see attachments for more details.
I was thinking about synthetic  timers http://msdn.microsoft.com/en-
us/library/windows/hardware/ff542758(v=vs.85).aspx
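
For anyone following along, a rough sketch of the kind of read handler the
partition reference counter MSR (the hv_refcnt feature discussed here) needs
on the KVM side. The MSR index and the 100 ns granularity come from the
Hyper-V spec; deriving the value from the kvmclock mirrors what today's KVM
does, but the function below is only an illustration, not the attachment
Vadim refers to.

#include <linux/kvm_host.h>
#include <linux/math64.h>

#define HV_X64_MSR_TIME_REF_COUNT 0x40000020

/* return the partition reference counter: 100 ns units since guest boot;
 * get_kvmclock_ns() is KVM-internal (arch/x86/kvm) */
static int kvm_hv_get_ref_count(struct kvm *kvm, u32 msr, u64 *pdata)
{
    if (msr != HV_X64_MSR_TIME_REF_COUNT)
        return 1;   /* not handled here */

    *pdata = div_u64(get_kvmclock_ns(kvm), 100);
    return 0;
}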

is this what microsoft qpc uses as clocksource in hyper-v?
i will check tomorrow.

thanks
vadim


--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-22 Thread Peter Lieven

On 22.03.2012 08:53, Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait :

Try to add <feature policy='disable' name='hypervisor'/> to cpu
definition in XML and check command line.

ok I try this but I can't use cpu model to map the host cpu
(my libvirt is 0.9.8) so I use :

   <cpu match='exact'>
     <model>Opteron_G3</model>
     <feature policy='disable' name='hypervisor'/>
   </cpu>

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz

And now with only 1 vcpu, the response time is 8.5s, great
improvment. We keep this configuration for production : we check the
response time when some other users are connected.

please keep in mind, that setting -hypervisor, disabling hpet and
only one vcpu
makes windows use tsc as clocksource. you have to make sure, that your vm
is not switching between physical sockets on your system and that you have
constant_tsc feature to have a stable tsc between the cores in the
same socket. its also likely that the vm will crash when live migrated.


All true. I asked to try -hypervisor only to verify where we loose
performance. Since you get good result with it frequent access to PM
timer is probably the reason. I do not recommend using -hypervisor for
production!


@gleb: do you know whats the state of in-kernel hyper-v timers?


Vadim is working on it. I'll let him answer.
@avi, gleb: another option would be to revisit the old in-kernel
pm-timer implementation and check if it's feasible to use it as an
alternative. it would also help systems that are not hyper-v aware
(i think bsd and old windows like xp). i rebased this old
implementation and can confirm that it also solves the performance
slowdown.

peter



peter


David.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-22 Thread Peter Lieven

On 22.03.2012 09:31, David Cure wrote:

Le Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven ecrivait :

please keep in mind, that setting -hypervisor, disabling hpet and only
one vcpu
makes windows use tsc as clocksource. you have to make sure, that your vm
is not switching between physical sockets on your system and that you have
constant_tsc feature to have a stable tsc between the cores in the
same socket. its also likely that the vm will crash when live migrated.

ok, yet I only have disable hpet and use 1vcpu.

you have to use 1 vcpu on 32-bit windows. 64-bit seems to
work with more than 1 vcpu. why all those limitations:
windows avoids using the tsc under a hypervisor, which is a good decision.
the problem is that it falls back to pm_timer or hpet. both of them are
very expensive in emulation currently because kvm exits kernel mode and
the userspace qemu-kvm handles the access. i have done experiments where
i saw ~20,000 userspace exits just for pmtimer reads; this made up
~30-40% of the whole processing power. every call to a QPC timer in
windows causes a pm_timer/hpet read. in particular, each i/o request
seems to cause a QPC timer read and, oddly, also a lazy fpu call (but
this is a different issue), which is also very expensive to emulate
(currently).


for the switching I need to pin the vcpu on 1 physical proc,
right ?

you need 1 vcpu for 32-bit windows and disabling hypervisor to
cheat windows and make it think it runs on real hardware. it then
uses tsc.

for constant_tsc, how can I check if I use it ?

cat /proc/cpuinfo on the host. there should be a flag 'constant_tsc'.
it might be that rdtscp is also necessary.

for live migration : what is the feature that cause trouble :
-hypervisor, hpet, vcpu or all ?

using tsc as clocksource is the problem not the features themselves.

peter

David.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-22 Thread Peter Lieven

On 22.03.2012 09:33, David Cure wrote:

Le Thu, Mar 22, 2012 at 09:53:45AM +0200, Gleb Natapov ecrivait :

All true. I asked to try -hypervisor only to verify where we loose
performance. Since you get good result with it frequent access to PM
timer is probably the reason. I do not recommend using -hypervisor for
production!

so if I leave cpu as previous (not defined) and only disable
hpet and use 1 vcpu, it's ok for production ?

this is ok, but windows will use pm timer so you will have bad performance.

Is there a workaround for this PM access ?
there exist old patches from 2010 for an in-kernel pmtimer. they work,
but only partly. the problem is that windows enables the pmtimer
overflow interrupt, which this patch did not address (amongst other
things). i simply ignored it and windows ran nevertheless, but i would
not do this in production because i do not know which side effects it
might have.

there are two possible solutions:

a) a real in-kernel pmtimer implementation (which also helps other
systems, not only windows) - a rough sketch of this follows below
b) hyper-v support in-kernel, at least partly (for the timing stuff).
this is being worked on by Vadim.
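
To make (a) a bit more concrete, here is a rough sketch of the core of such an
in-kernel pmtimer read. The function name and the way the guest's boot time is
obtained are made up; the 3.579545 MHz rate and the 24-bit width are the ACPI
PM timer's, and the overflow/SCI interrupt mentioned above is deliberately not
handled here.

#include <linux/ktime.h>
#include <linux/math64.h>

#define PMTMR_TICKS_PER_SEC 3579545u   /* ACPI PM timer frequency */
#define PMTMR_MASK          0xffffffu  /* 24-bit counter */

/* value to return for a guest PIO read of the PM timer port */
static u32 kvm_pmtmr_read(ktime_t guest_boot_time)
{
    u64 ns = ktime_to_ns(ktime_sub(ktime_get(), guest_boot_time));

    /* elapsed ns -> PM timer ticks, truncated to the 24-bit counter */
    return (u32)(mul_u64_u32_div(ns, PMTMR_TICKS_PER_SEC, NSEC_PER_SEC) &
                 PMTMR_MASK);
}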


Peter

David.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-22 Thread Peter Lieven

On 22.03.2012 09:48, Vadim Rozenfeld wrote:

On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:

On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait :

Try to add <feature policy='disable' name='hypervisor'/> to cpu
definition in XML and check command line.


ok I try this but I can't use cpu model to map the host cpu

(my libvirt is 0.9.8) so I use :
   <cpu match='exact'>
     <model>Opteron_G3</model>
     <feature policy='disable' name='hypervisor'/>
   </cpu>

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz

And now with only 1 vcpu, the response time is 8.5s, great

improvment. We keep this configuration for production : we check the
response time when some other users are connected.

please keep in mind, that setting -hypervisor, disabling hpet and
only one vcpu
makes windows use tsc as clocksource. you have to make sure, that your vm
is not switching between physical sockets on your system and that you
have constant_tsc feature to have a stable tsc between the cores in the
same socket. its also likely that the vm will crash when live migrated.

All true. I asked to try -hypervisor only to verify where we loose
performance. Since you get good result with it frequent access to PM
timer is probably the reason. I do not recommend using -hypervisor for
production!


@gleb: do you know whats the state of in-kernel hyper-v timers?

Vadim is working on it. I'll let him answer.

It would be nice to have synthetic timers supported. But,  at the moment,
I'm only researching  this feature.


So it will take months at least?

What do the others think: would it be feasible to build a proper in-kernel
pmtimer solution in the meantime?

I think Windows guest performance is very important for the success of KVM.

Peter


peter


David.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: performance trouble

2012-03-21 Thread Peter Lieven

On 21.03.2012 12:10, David Cure wrote:

hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait :

Try to add <feature policy='disable' name='hypervisor'/> to cpu
definition in XML and check command line.

ok I try this but I can't use cpu model to map the host cpu
(my libvirt is 0.9.8) so I use :

   <cpu match='exact'>
     <model>Opteron_G3</model>
     <feature policy='disable' name='hypervisor'/>
   </cpu>

(the physical server use Opteron CPU).

The log is here :
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz

And now with only 1 vcpu, the response time is 8.5s, great
improvment. We keep this configuration for production : we check the
response time when some other users are connected.
please keep in mind that setting -hypervisor, disabling hpet and using
only one vcpu makes windows use the tsc as clocksource. you have to make
sure that your vm is not switching between physical sockets on your
system and that you have the constant_tsc feature, so that the tsc is
stable between the cores in the same socket. it's also likely that the
vm will crash when live migrated.

@gleb: do you know what's the state of the in-kernel hyper-v timers?

peter


David.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qemu-kvm-1.0 crashes with threaded vnc server?

2012-03-13 Thread Peter Lieven

On 11.02.2012 09:55, Corentin Chary wrote:

On Thu, Feb 9, 2012 at 7:08 PM, Peter Lievenp...@dlh.net  wrote:

Hi,

is anyone aware if there are still problems when enabling the threaded vnc
server?
I saw some VMs crashing when using a qemu-kvm build with
--enable-vnc-thread.

qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0
error 6 in libz.so.1.2.3.3[7fec1ca75000+16000]
qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp
7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000]

I had no time to debug further. It seems to happen shortly after migrating,
but that's uncertain. At least the segfault in libz seems to
point to VNC, since I cannot think of any other part of qemu-kvm using
libz except for the VNC server.

Thanks,
Peter



Hi Peter,
I found two patches on my git tree that I sent long ago but somehow
get lost on the mailing list. I rebased the tree but did not have the
time (yet) to test them.
http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip
Feel free to try them. If QEMU segfault again, please send a full gdb
backtrace / valgrind trace / way to reproduce :).
Thanks,


I have seen no more crashes with these two patches applied. I would suggest
pushing them to the master repository.

Thank you,
Peter

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Peter Lieven

On 24.02.2012 08:23, Stefan Hajnoczi wrote:

On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczistefa...@gmail.com  wrote:

On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczistefa...@gmail.com  wrote:

On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.comp...@dlh.net  wrote:

Stefan Hajnoczistefa...@gmail.com  schrieb:


On Thu, Feb 23, 2012 at 3:40 PM, Peter Lievenp...@dlh.net  wrote:

However, in a virtual machine I have not observed the above slow down

to

that extend
while the benefit of zero after free in a virtualisation environment

is

obvious:

1) zero pages can easily be merged by ksm or other technique.
2) zero (dup) pages are a lot faster to transfer in case of

migration.

The other approach is a memory page discard mechanism - which
obviously requires more code changes than zeroing freed pages.

The advantage is that we don't take the brute-force and CPU intensive
approach of zeroing pages.  It would be like a fine-grained ballooning
feature.


I dont think that it is cpu intense. All user pages are zeroed anyway, but at 
allocation time it shouldnt be a big difference in terms of cpu power.

It's easy to find a scenario where eagerly zeroing pages is wasteful.
Imagine a process that uses all of physical memory.  Once it
terminates the system is going to run processes that only use a small
set of pages.  It's pointless zeroing all those pages if we're not
going to use them anymore.

Perhaps the middle path is to zero pages but do it after a grace
timeout.  I wonder if this helps eliminate the 2-3% slowdown you
noticed when compiling.

Gah, it's too early in the morning.  I don't think this timer actually
makes sense.


do you think it then makes sense to make a patchset/proposal to notify a
guest kernel about the presence of ksm on the host and switch to zero
after free?

peter


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qemu-kvm-1.0 crashes with threaded vnc server?

2012-02-28 Thread Peter Lieven

On 28.02.2012 09:37, Corentin Chary wrote:

On Mon, Feb 13, 2012 at 10:24 AM, Peter Lievenp...@dlh.net  wrote:

Am 11.02.2012 um 09:55 schrieb Corentin Chary:


On Thu, Feb 9, 2012 at 7:08 PM, Peter Lievenp...@dlh.net  wrote:

Hi,

is anyone aware if there are still problems when enabling the threaded vnc
server?
I saw some VMs crashing when using a qemu-kvm build with
--enable-vnc-thread.

qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0
error 6 in libz.so.1.2.3.3[7fec1ca75000+16000]
qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp
7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000]

I had no time to debug further. It seems to happen shortly after migrating,
but that's uncertain. At least the segfault in libz seems to
point to VNC, since I cannot think of any other part of qemu-kvm using
libz except for the VNC server.

Thanks,
Peter



Hi Peter,
I found two patches on my git tree that I sent long ago but somehow
get lost on the mailing list. I rebased the tree but did not have the
time (yet) to test them.
http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip
Feel free to try them. If QEMU segfault again, please send a full gdb
backtrace / valgrind trace / way to reproduce :).
Thanks,

Hi Corentin,

thanks for rebasing those patches. I remember that I have seen them the
last time I noticed (about 1 year ago) that the threaded VNC is crashing.
I'm on vacation this week, but I will test them next week
and let you know if I can force a crash with them applied. If not we should
consider to include them asap.

Hi Peter, any news on that ?
sorry, i had a lot of trouble debugging nasty slow windows vm problems over
the last 2 weeks.

but it's still on my list. i'll keep you all posted.

peter






--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Peter Lieven

On 28.02.2012 13:05, Stefan Hajnoczi wrote:

On Tue, Feb 28, 2012 at 11:46 AM, Peter Lievenp...@dlh.net  wrote:

On 24.02.2012 08:23, Stefan Hajnoczi wrote:

On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczistefa...@gmail.com
  wrote:

On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczistefa...@gmail.com
  wrote:

On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.comp...@dlh.net
  wrote:

Stefan Hajnoczistefa...@gmail.comschrieb:


On Thu, Feb 23, 2012 at 3:40 PM, Peter Lievenp...@dlh.netwrote:

However, in a virtual machine I have not observed the above slow down

to

that extend
while the benefit of zero after free in a virtualisation environment

is

obvious:

1) zero pages can easily be merged by ksm or other technique.
2) zero (dup) pages are a lot faster to transfer in case of

migration.

The other approach is a memory page discard mechanism - which
obviously requires more code changes than zeroing freed pages.

The advantage is that we don't take the brute-force and CPU intensive
approach of zeroing pages.  It would be like a fine-grained ballooning
feature.


I dont think that it is cpu intense. All user pages are zeroed anyway,
but at allocation time it shouldnt be a big difference in terms of cpu
power.

It's easy to find a scenario where eagerly zeroing pages is wasteful.
Imagine a process that uses all of physical memory.  Once it
terminates the system is going to run processes that only use a small
set of pages.  It's pointless zeroing all those pages if we're not
going to use them anymore.

Perhaps the middle path is to zero pages but do it after a grace
timeout.  I wonder if this helps eliminate the 2-3% slowdown you
noticed when compiling.

Gah, it's too early in the morning.  I don't think this timer actually
makes sense.


do you think it then makes sense to make a patchset/proposal to notify a
guest kernel about the presence of ksm on the host and switch to zero
after free?

I think your idea is interesting - whether or not people are happy
with it will depend on the performance impact.  It seems reasonable to
me.

could you support/help me in implementing and publishing this approach?

Peter
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux guests and ksm performance

2012-02-28 Thread Peter Lieven

On 28.02.2012 14:16, Avi Kivity wrote:

On 02/24/2012 08:41 AM, Stefan Hajnoczi wrote:

I dont think that it is cpu intense. All user pages are zeroed anyway, but at 
allocation time it shouldnt be a big difference in terms of cpu power.

It's easy to find a scenario where eagerly zeroing pages is wasteful.
Imagine a process that uses all of physical memory.  Once it
terminates the system is going to run processes that only use a small
set of pages.  It's pointless zeroing all those pages if we're not
going to use them anymore.

In the long term, we will use them, except if the guest is completely idle.

The scenario in which zeroing is expensive is when the page is refilled
through DMA.  In that case the zeroing was wasted.  This is a pretty
common scenario in pagecache intensive workloads.


Avi, what do you think of the proposal to give the guest vm a hint
that the host is running ksm? In that case the administrator
has already chosen that saving physical memory is more important
than performance to him?

Peter
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


linux guests and ksm performance

2012-02-23 Thread Peter Lieven
Hi,

i have recently been playing with an old idea (originally in grsecurity
for security reasons) to change the policy from zero on allocate to zero
after free in the linux page allocator. My concern is that linux leaves
a lot of stale data in physical memory, unlike Windows, which by default
zeros pages after they are freed.

I have run some tests and I can confirm some old results: a hardware
Linux machine is approximately 2-3% slower with zero after free on big
compilation jobs. This might be either because pages are only zeroed on
allocation if GFP_ZERO is set, or because of caching benefits.

However, in a virtual machine I have not observed a slowdown to that
extent, while the benefit of zero after free in a virtualisation
environment is obvious:

1) zero pages can easily be merged by ksm or other technique.
2) zero (dup) pages are a lot faster to transfer in case of migration.

Therefore I would like to hear your thoughts on whether it would be a
good idea to change the strategy in the Linux kernel from zero on
allocate to zero after free automatically if the 'hypervisor' cpu
feature is set, or even to have another technique to tell a linux guest
that ksm is running on the host.

If this is not feasible, can someone think of a kernel module or
userspace program that zeroes out unused pages periodically?
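
Not a kernel module, but a crude userspace illustration of the effect being
asked for: run this on a host with KSM enabled (/sys/kernel/mm/ksm/run set to
1) and watch pages_sharing grow, because the zeroed, MADV_MERGEABLE-marked
pages are all identical. The region size and the sleep below are arbitrary.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = 64UL << 20;              /* 64 MiB of anonymous memory */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memset(buf, 0xaa, len);               /* simulate freed-but-dirty pages */
    memset(buf, 0x00, len);               /* ...then zero them after "free" */

    /* let ksmd consider the region for merging */
    if (madvise(buf, len, MADV_MERGEABLE) != 0)
        perror("madvise(MADV_MERGEABLE)");

    printf("watch /sys/kernel/mm/ksm/pages_sharing while this sleeps\n");
    sleep(60);
    munmap(buf, len);
    return 0;
}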

Peter


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux guests and ksm performance

2012-02-23 Thread Peter Lieven

Am 24.02.2012 um 08:23 schrieb Stefan Hajnoczi:

 On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com p...@dlh.net 
 wrote:
 Stefan Hajnoczi stefa...@gmail.com schrieb:
 
 On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven p...@dlh.net wrote:
 However, in a virtual machine I have not observed the above slow down
 to
 that extend
 while the benefit of zero after free in a virtualisation environment
 is
 obvious:
 
 1) zero pages can easily be merged by ksm or other technique.
 2) zero (dup) pages are a lot faster to transfer in case of
 migration.
 
 The other approach is a memory page discard mechanism - which
 obviously requires more code changes than zeroing freed pages.
 
 The advantage is that we don't take the brute-force and CPU intensive
 approach of zeroing pages.  It would be like a fine-grained ballooning
 feature.
 
 
 I dont think that it is cpu intense. All user pages are zeroed anyway, but 
 at allocation time it shouldnt be a big difference in terms of cpu power.
 
 It's easy to find a scenario where eagerly zeroing pages is wasteful.
 Imagine a process that uses all of physical memory.  Once it
 terminates the system is going to run processes that only use a small
 set of pages.  It's pointless zeroing all those pages if we're not
 going to use them anymore.
 
 Perhaps the middle path is to zero pages but do it after a grace
 timeout.  I wonder if this helps eliminate the 2-3% slowdown you
 noticed when compiling.
 
 Gah, it's too early in the morning.  I don't think this timer actually
 makes sense.

ok, that would be the idea of asynchronous page zeroing in the guest. i
also think this is too complicated.

maybe the other idea is too simple:
is it possible to give the guest a hint that ksm is enabled on the host
(let's say in a way like it's done with kvmclock)? if ksm is enabled on
the host, the administrator has already made the decision that
performance is not so important and he/she is eager to save physical
memory. then, if and only if this flag is set, switch from zero on
allocate to zero after free. i think the whole thing is less than 10-20
lines of code, and it's code that has been proven to work well in
grsecurity for ages.

this might introduce a little (2-3%) overhead, but only if a lot of
non GFP_FREE memory is allocated, and it's definitely faster than
swapping. of course, it has to be guaranteed that this code does not
slow down normal systems due to additional branches (would it be enough
to mark the if statements as unlikely?)
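
To give a feel for the size of the change, a hedged sketch of the guest-side
switch, written with today's static key API so the extra branch folds to a
nop when the host gives no hint. The key name and the hook point are
invented; only the general shape reflects the proposal.

#include <linux/jump_label.h>
#include <linux/mm.h>
#include <linux/highmem.h>

/* enabled once, at boot, if the host advertises the ksm hint */
static DEFINE_STATIC_KEY_FALSE(zero_after_free_key);

static inline void zero_after_free_hint(struct page *page, int order)
{
    int i;

    /* no cost on normal systems: patched out unless the key is enabled */
    if (!static_branch_unlikely(&zero_after_free_key))
        return;

    /* zero the pages as they are freed so ksmd on the host can merge them */
    for (i = 0; i < (1 << order); i++)
        clear_highpage(page + i);
}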

peter


peter





 
 Stefan

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven

On 20.02.2012 21:45, Gleb Natapov wrote:

On Mon, Feb 20, 2012 at 08:59:38PM +0100, Peter Lieven wrote:

On 20.02.2012 20:04, Gleb Natapov wrote:

On Mon, Feb 20, 2012 at 08:40:08PM +0200, Gleb Natapov wrote:

On Mon, Feb 20, 2012 at 07:17:55PM +0100, Peter Lieven wrote:

Hi,

I came across an issue with a Windows 7 (32-bit) as well as with a
Windows 2008 R2 (64-bit) guest.

If I transfer a file from the VM via CIFS or FTP to a remote machine,
I get very poor read performance (around 13MB/s). The VM peaks at 100%
cpu and I see a lot of insn_emulations and all kinds of exits in kvm_stat

efer_reload 0 0
exits 2260976 79620
fpu_reload 619711
halt_exits 114734 5011
halt_wakeup 95 4876
host_state_reload 1499659 60962
hypercalls 0 0
insn_emulation 1577325 58488
insn_emulation_fail 0 0
invlpg 0 0
io_exits 943949 40249

Hmm, too many of those.


irq_exits 108679 5434
irq_injections 236545 10788
irq_window 7606 246
largepages 672 5
mmio_exits 460020 16082
mmu_cache_miss 119 0
mmu_flooded 0 0
mmu_pde_zapped 0 0
mmu_pte_updated 0 0
mmu_pte_write 13474 9
mmu_recycled 0 0
mmu_shadow_zapped 141 0
mmu_unsync 0 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed 2280335
pf_guest 0 0
remote_tlb_flush 239 2
request_irq 0 0
signal_exits 0 0
tlb_flush 20933 0

If I run the same VM with an Ubuntu 10.04.4 guest I get around 60MB/s
throughput. The kvm_stat output looks a lot more sane.

efer_reload 0 0
exits 6132004 17931
fpu_reload 19863 3
halt_exits 264961 3083
halt_wakeup 236468 2959
host_state_reload 1104468 3104
hypercalls 0 0
insn_emulation 1417443 7518
insn_emulation_fail 0 0
invlpg 0 0
io_exits 869380 2795
irq_exits 253501 2362
irq_injections 616967 6804
irq_window 201186 2161
largepages 1019 0
mmio_exits 205268 0
mmu_cache_miss 192 0
mmu_flooded 0 0
mmu_pde_zapped 0 0
mmu_pte_updated 0 0
mmu_pte_write 7440546 0
mmu_recycled 0 0
mmu_shadow_zapped 259 0
mmu_unsync 0 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed 3852930
pf_guest 0 0
remote_tlb_flush 761 1
request_irq 0 0
signal_exits 0 0
tlb_flush 0 0

I use virtio-net (with vhost-net) and virtio-blk. I tried disabling
hpet (which basically eliminated the mmio_exits, but does not increase
performance) and also commit 39a7a362e16bb27e98738d63f24d1ab5811e26a8 -
no improvement.

My commandline:
/usr/bin/qemu-kvm-1.0 -netdev
type=tap,id=guest8,script=no,downscript=no,ifname=tap0,vhost=on
-device virtio-net-pci,netdev=guest8,mac=52:54:00:ff:00:d3 -drive 
format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-eeef4e007-a8a9f3818674f2fc-lieven-windows7-vc-r80788,if=virtio,cache=none,aio=native
-m 2048 -smp 2 -monitor tcp:0:4001,server,nowait -vnc :1 -name
lieven-win7-vc -boot order=dc,menu=off -k de -pidfile
/var/run/qemu/vm-187.pid -mem-path /hugepages -mem-prealloc -cpu
host -rtc base=localtime -vga std -usb -usbdevice tablet -no-hpet

What further information is needed to debug this further?


Which kernel version (looks like something recent)?
Which host CPU (looks like something old)?

Output of cat /proc/cpuinfo


Which Windows' virtio drivers are you using?

Take a trace like described here http://www.linux-kvm.org/page/Tracing
(with -no-hpet please).


And also info pci output from qemu monitor while we are at it.

here is the output while i was tracing. you can download the trace
i took while i did a ftp transfer from the vm:

-  http://82.141.21.156/report.txt.gz


Windows reads PM timer. A lot. 15152 times per second.

Can you try to run this command in Windows guest:

   bcdedit /set {default

Re: win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven

On 21.02.2012 11:56, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:

I hope it will make Windows use TSC instead, but you can't be sure
about anything with Windows :(

Whatever it does now, it eats more CPU, has an almost equal
number of exits, and throughput is about the same (15MB/s).
If the pmtimer is at 0xb008, it still reads it like hell.

I checked with bcdedit /v that useplatformclock is set to No.

Yeah, today I noticed that it is likely virtio drivers that hammer
on PM timer (at least rip of the instruction that access it is
very close to rip of the instruction that access virtio pio).
Vadim, Windows driver developer,  is CCed.
Ok, I will switch to IDE and e1000 to confirm this? Or does it not make 
sense?


Peter

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >