Re: [Qemu-devel] Help debugging a regression in KVM Module
> On 18.08.2015 at 17:25, Radim Krčmář wrote:
> 2015-08-18 16:54+0200, Peter Lieven:
>> After some experiments I was able to find out the bad commit that
>> introduced the regression:
>>
>> commit f30ebc312ca9def25650b4e1d01cdb425c310dca
>> Author: Radim Krčmář
>> Date: Thu Oct 30 15:06:47 2014 +0100
>>
>> It seems that this optimisation is not working reliably after live
>> migration. I can't reproduce it if I take a 3.19 kernel and revert this
>> single commit.
>
> Hello, this bug has gone unnoticed for a long time so it is fixed only
> since v4.1 (and v3.19.stable was dead at that point).

Thanks for the pointer. I noticed the regression some time ago but never found
the time to debug it. Some distros rely on 3.19, e.g. Ubuntu LTS 14.04.2.
I will try to ping the maintainer.

Peter

> commit b6ac069532218027f2991cba01d7a72a200688b0
> Author: Radim Krčmář
> Date: Fri Jun 5 20:57:41 2015 +0200
>
>     KVM: x86: fix lapic.timer_mode on restore
>
>     lapic.timer_mode was not properly initialized after migration, which
>     broke few useful things, like login, by making every sleep eternal.
>
>     Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.
>
>     There are other slowpaths that update lvtt, so this patch makes sure
>     something similar doesn't happen again by calling apic_update_lvtt
>     after every modification.
>
>     Cc: sta...@vger.kernel.org
>     Fixes: f30ebc312ca9 ("KVM: x86: optimize some accesses to LVTT and SPIV")
>     Signed-off-by: Radim Krčmář
>     Signed-off-by: Marcelo Tosatti
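For context, the fix described in the commit message above boils down to re-deriving the cached timer mode from the guest-visible LVTT register whenever lapic state is modified or restored. A rough sketch of that idea in C, reconstructed from the commit message; helper and field names follow the lapic code of that kernel era and may not match the merged patch exactly:

    static void apic_update_lvtt(struct kvm_lapic *apic)
    {
            u32 timer_mode = kvm_apic_get_reg(apic, APIC_LVTT) &
                             apic->lapic_timer.timer_mode_mask;

            /* Resync the cached timer_mode with LVTT; a stale value after
             * migration is what turned every guest sleep into an eternal one. */
            if (apic->lapic_timer.timer_mode != timer_mode) {
                    hrtimer_cancel(&apic->lapic_timer.timer);
                    apic->lapic_timer.timer_mode = timer_mode;
            }
    }

    /* ...called from kvm_apic_post_state_restore() and from every other
     * path that writes LVTT, so the cached mode can no longer go stale. */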
Re: [Qemu-devel] Help debugging a regression in KVM Module
On 14.08.2015 at 22:01, Alex Bennée wrote:
> Peter Lieven writes:
>> Hi,
>>
>> some time ago I stumbled across a regression in the KVM module that has
>> been introduced somewhere between 3.17 and 3.19.
>>
>> I have a rather old openSUSE guest with an XFS filesystem which reliably
>> crashes after some live migrations. I originally believed that the issue
>> might be related to my setup with a 3.12 host kernel and kvm-kmod 3.19,
>> but I now found that it is also still present with a 3.19 host kernel
>> with the included 3.19 kvm module.
>>
>> My idea was to continue testing on a 3.12 host kernel and then bisect all
>> commits to the kvm related parts. Now my question is how to best bisect
>> only kvm related changes (those that go into kvm-kmod)?
>
> In general I don't bother. As it is a bisection you eliminate half the
> commits at a time, so you get there fairly quickly anyway. However you can
> tell bisect which parts of the tree you care about:
>
> git bisect start -- arch/arm64/kvm include/linux/kvm* include/uapi/linux/kvm* virt/kvm/

After some experiments I was able to find out the bad commit that introduced
the regression:

commit f30ebc312ca9def25650b4e1d01cdb425c310dca
Author: Radim Krčmář
Date: Thu Oct 30 15:06:47 2014 +0100

It seems that this optimisation is not working reliably after live migration.
I can't reproduce it if I take a 3.19 kernel and revert this single commit.

Peter
Re: [Qemu-devel] Help debugging a regression in KVM Module
On 14.08.2015 at 22:01, Alex Bennée wrote:
> Peter Lieven writes:
>> Hi,
>>
>> some time ago I stumbled across a regression in the KVM module that has
>> been introduced somewhere between 3.17 and 3.19.
>>
>> I have a rather old openSUSE guest with an XFS filesystem which reliably
>> crashes after some live migrations. I originally believed that the issue
>> might be related to my setup with a 3.12 host kernel and kvm-kmod 3.19,
>> but I now found that it is also still present with a 3.19 host kernel
>> with the included 3.19 kvm module.
>>
>> My idea was to continue testing on a 3.12 host kernel and then bisect all
>> commits to the kvm related parts. Now my question is how to best bisect
>> only kvm related changes (those that go into kvm-kmod)?
>
> In general I don't bother. As it is a bisection you eliminate half the
> commits at a time, so you get there fairly quickly anyway. However you can
> tell bisect which parts of the tree you care about:
>
> git bisect start -- arch/arm64/kvm include/linux/kvm* include/uapi/linux/kvm* virt/kvm/

Yes, I just have to find out how exactly that works when I want to bisect the
linux submodule of the kvm-kmod repository. But thanks for the pointer on how
to limit the directories.

Thanks,
Peter
Re: Help debugging a regression in KVM Module
On 14.08.2015 at 15:01, Paolo Bonzini wrote:
> ----- Original Message -----
>> From: "Peter Lieven"
>> To: qemu-de...@nongnu.org, kvm@vger.kernel.org
>> Cc: "Paolo Bonzini"
>> Sent: Friday, August 14, 2015 1:11:34 PM
>> Subject: Help debugging a regression in KVM Module
>>
>> Hi,
>>
>> some time ago I stumbled across a regression in the KVM module that has
>> been introduced somewhere between 3.17 and 3.19.
>>
>> I have a rather old openSUSE guest with an XFS filesystem which reliably
>> crashes after some live migrations. I originally believed that the issue
>> might be related to my setup with a 3.12 host kernel and kvm-kmod 3.19,
>> but I now found that it is also still present with a 3.19 host kernel
>> with the included 3.19 kvm module.
>>
>> My idea was to continue testing on a 3.12 host kernel and then bisect all
>> commits to the kvm related parts.
>>
>> Now my question is how to best bisect only kvm related changes (those
>> that go into kvm-kmod)?
>
> I haven't forgotten this. Sorry. :(
>
> Unfortunately I'll be away for three weeks, but I'll make it a priority
> when I'm back.

It's not time critical, but I think it's worth investigating as it might
affect other systems as well - maybe XFS is just particularly sensitive.

I suppose you are going on vacation. Enjoy!

Peter
Help debugging a regression in KVM Module
Hi,

some time ago I stumbled across a regression in the KVM module that has been
introduced somewhere between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations. I originally believed that the issue might
be related to my setup with a 3.12 host kernel and kvm-kmod 3.19, but I now
found that it is also still present with a 3.19 host kernel with the included
3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all
commits to the kvm related parts. Now my question is how to best bisect only
kvm related changes (those that go into kvm-kmod)?

Thanks,
Peter
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
Am 12.01.2014 um 13:08 schrieb Vadim Rozenfeld : > On Wed, 2014-01-08 at 23:20 +0100, Peter Lieven wrote: >> Am 08.01.2014 21:08, schrieb Vadim Rozenfeld: >>> On Wed, 2014-01-08 at 15:54 +0100, Peter Lieven wrote: >>>> On 08.01.2014 13:12, Vadim Rozenfeld wrote: >>>>> On Wed, 2014-01-08 at 12:48 +0100, Peter Lieven wrote: >>>>>> On 08.01.2014 11:44, Vadim Rozenfeld wrote: >>>>>>> On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote: >>>>>>>> On 08.01.2014 10:40, Vadim Rozenfeld wrote: >>>>>>>>> On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote: >>>>>>>>>> Am 07.01.2014 10:36, schrieb Vadim Rozenfeld: >>>>>>>>>>> On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote: >>>>>>>>>>>> Am 11.12.2013 19:59, schrieb Marcelo Tosatti: >>>>>>>>>>>>> On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote: >>>>>>>>>>>>>> On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: >>>>>>>>>>>>>>> Signed-off: Peter Lieven >>>>>>>>>>>>>>> Signed-off: Gleb Natapov >>>>>>>>>>>>>>> Signed-off: Vadim Rozenfeld >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> v1 -> v2 >>>>>>>>>>>>>>> 1. mark TSC page dirty as suggested by >>>>>>>>>>>>>>> Eric Northup and Gleb >>>>>>>>>>>>>>> 2. disable local irq when calling get_kernel_ns, >>>>>>>>>>>>>>> as it was done by Peter Lieven >>>>>>>>>>>>>>> 3. move check for TSC page enable from second patch >>>>>>>>>>>>>>> to this one. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>arch/x86/include/asm/kvm_host.h| 2 ++ >>>>>>>>>>>>>>>arch/x86/include/uapi/asm/hyperv.h | 13 + >>>>>>>>>>>>>>>arch/x86/kvm/x86.c | 39 >>>>>>>>>>>>>>> +- >>>>>>>>>>>>>>>include/uapi/linux/kvm.h | 1 + >>>>>>>>>>>>>>>4 files changed, 54 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>>>> b/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>>>> index ae5d783..2fd0753 100644 >>>>>>>>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>>>> @@ -605,6 +605,8 @@ struct kvm_arch { >>>>>>>>>>>>>>> /* fields used by HYPER-V emulation */ >>>>>>>>>>>>>>> u64 hv_guest_os_id; >>>>>>>>>>>>>>> u64 hv_hypercall; >>>>>>>>>>>>>>> + u64 hv_ref_count; >>>>>>>>>>>>>>> + u64 hv_tsc_page; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #ifdef CONFIG_KVM_MMU_AUDIT >>>>>>>>>>>>>>> int audit_point; >>>>>>>>>>>>>>> diff --git a/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>>>> b/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>>>> index b8f1c01..462efe7 100644 >>>>>>>>>>>>>>> --- a/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>>>> +++ b/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>>>> @@ -28,6 +28,9 @@ >>>>>>>>>>>>>>&g
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
Am 08.01.2014 21:08, schrieb Vadim Rozenfeld: > On Wed, 2014-01-08 at 15:54 +0100, Peter Lieven wrote: >> On 08.01.2014 13:12, Vadim Rozenfeld wrote: >>> On Wed, 2014-01-08 at 12:48 +0100, Peter Lieven wrote: >>>> On 08.01.2014 11:44, Vadim Rozenfeld wrote: >>>>> On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote: >>>>>> On 08.01.2014 10:40, Vadim Rozenfeld wrote: >>>>>>> On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote: >>>>>>>> Am 07.01.2014 10:36, schrieb Vadim Rozenfeld: >>>>>>>>> On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote: >>>>>>>>>> Am 11.12.2013 19:59, schrieb Marcelo Tosatti: >>>>>>>>>>> On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote: >>>>>>>>>>>> On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: >>>>>>>>>>>>> Signed-off: Peter Lieven >>>>>>>>>>>>> Signed-off: Gleb Natapov >>>>>>>>>>>>> Signed-off: Vadim Rozenfeld >>>>>>>>>>>>> >>>>>>>>>>>>> v1 -> v2 >>>>>>>>>>>>> 1. mark TSC page dirty as suggested by >>>>>>>>>>>>>Eric Northup and Gleb >>>>>>>>>>>>> 2. disable local irq when calling get_kernel_ns, >>>>>>>>>>>>>as it was done by Peter Lieven >>>>>>>>>>>>> 3. move check for TSC page enable from second patch >>>>>>>>>>>>>to this one. >>>>>>>>>>>>> >>>>>>>>>>>>> --- >>>>>>>>>>>>> arch/x86/include/asm/kvm_host.h| 2 ++ >>>>>>>>>>>>> arch/x86/include/uapi/asm/hyperv.h | 13 + >>>>>>>>>>>>> arch/x86/kvm/x86.c | 39 >>>>>>>>>>>>> +- >>>>>>>>>>>>> include/uapi/linux/kvm.h | 1 + >>>>>>>>>>>>> 4 files changed, 54 insertions(+), 1 deletion(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>> b/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>> index ae5d783..2fd0753 100644 >>>>>>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h >>>>>>>>>>>>> @@ -605,6 +605,8 @@ struct kvm_arch { >>>>>>>>>>>>> /* fields used by HYPER-V emulation */ >>>>>>>>>>>>> u64 hv_guest_os_id; >>>>>>>>>>>>> u64 hv_hypercall; >>>>>>>>>>>>> + u64 hv_ref_count; >>>>>>>>>>>>> + u64 hv_tsc_page; >>>>>>>>>>>>> >>>>>>>>>>>>> #ifdef CONFIG_KVM_MMU_AUDIT >>>>>>>>>>>>> int audit_point; >>>>>>>>>>>>> diff --git a/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>> b/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>> index b8f1c01..462efe7 100644 >>>>>>>>>>>>> --- a/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>> +++ b/arch/x86/include/uapi/asm/hyperv.h >>>>>>>>>>>>> @@ -28,6 +28,9 @@ >>>>>>>>>>>>> /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) >>>>>>>>>>>>> available*/ >>>>>>>>>>>>> #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1) >>>>>>>>>>>>> >>>>>>>>>>>>> +/* A partition's reference time stamp counter (TSC) page */ >>>>>>>>>>>>> +#define HV_X64_MSR_REFERENCE_TSC 0x4021 >>>>>>>&
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
On 08.01.2014 13:12, Vadim Rozenfeld wrote: On Wed, 2014-01-08 at 12:48 +0100, Peter Lieven wrote: On 08.01.2014 11:44, Vadim Rozenfeld wrote: On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote: On 08.01.2014 10:40, Vadim Rozenfeld wrote: On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote: Am 07.01.2014 10:36, schrieb Vadim Rozenfeld: On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote: Am 11.12.2013 19:59, schrieb Marcelo Tosatti: On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote: On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: Signed-off: Peter Lieven Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven 3. move check for TSC page enable from second patch to this one. --- arch/x86/include/asm/kvm_host.h| 2 ++ arch/x86/include/uapi/asm/hyperv.h | 13 + arch/x86/kvm/x86.c | 39 +- include/uapi/linux/kvm.h | 1 + 4 files changed, 54 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ae5d783..2fd0753 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -605,6 +605,8 @@ struct kvm_arch { /* fields used by HYPER-V emulation */ u64 hv_guest_os_id; u64 hv_hypercall; + u64 hv_ref_count; + u64 hv_tsc_page; #ifdef CONFIG_KVM_MMU_AUDIT int audit_point; diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h index b8f1c01..462efe7 100644 --- a/arch/x86/include/uapi/asm/hyperv.h +++ b/arch/x86/include/uapi/asm/hyperv.h @@ -28,6 +28,9 @@ /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1) +/* A partition's reference time stamp counter (TSC) page */ +#define HV_X64_MSR_REFERENCE_TSC 0x4021 + /* * There is a single feature flag that signifies the presence of the MSR * that can be used to retrieve both the local APIC Timer frequency as @@ -198,6 +201,9 @@ #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK\ (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) +#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12 + #define HV_PROCESSOR_POWER_STATE_C0 0 #define HV_PROCESSOR_POWER_STATE_C1 1 #define HV_PROCESSOR_POWER_STATE_C2 2 @@ -210,4 +216,11 @@ #define HV_STATUS_INVALID_ALIGNMENT 4 #define HV_STATUS_INSUFFICIENT_BUFFERS 19 +typedef struct _HV_REFERENCE_TSC_PAGE { + __u32 tsc_sequence; + __u32 res1; + __u64 tsc_scale; + __s64 tsc_offset; +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; + #endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 21ef1ba..5e4e495a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); static u32 msrs_to_save[] = { MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, MSR_KVM_PV_EOI_EN, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr) switch (msr) { case HV_X64_MSR_GUEST_OS_ID: case HV_X64_MSR_HYPERCALL: + case HV_X64_MSR_REFERENCE_TSC: + case HV_X64_MSR_TIME_REF_COUNT: r = true; break; } @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 
data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; + local_irq_disable(); + kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset; + local_irq_enable(); Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock starts counting? No need to store kvmclock_offset in hv_ref_count? (moreover the name is weird, better name would be "hv_ref_start_time". Just add kvmclock_offset when reading the values (otherwise you have a "stale copy" of kvmclock_offset in hv_ref_count). After some experiments I think we do no need kvm->arch.hv_ref_count at all. I was debugging some weird clockjump issues and I think the problem is that after live migration kvm->arch.hv_ref_count is initialized to 0. Dep
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
On 08.01.2014 11:44, Vadim Rozenfeld wrote: On Wed, 2014-01-08 at 11:15 +0100, Peter Lieven wrote: On 08.01.2014 10:40, Vadim Rozenfeld wrote: On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote: Am 07.01.2014 10:36, schrieb Vadim Rozenfeld: On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote: Am 11.12.2013 19:59, schrieb Marcelo Tosatti: On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote: On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: Signed-off: Peter Lieven Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven 3. move check for TSC page enable from second patch to this one. --- arch/x86/include/asm/kvm_host.h| 2 ++ arch/x86/include/uapi/asm/hyperv.h | 13 + arch/x86/kvm/x86.c | 39 +- include/uapi/linux/kvm.h | 1 + 4 files changed, 54 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ae5d783..2fd0753 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -605,6 +605,8 @@ struct kvm_arch { /* fields used by HYPER-V emulation */ u64 hv_guest_os_id; u64 hv_hypercall; + u64 hv_ref_count; + u64 hv_tsc_page; #ifdef CONFIG_KVM_MMU_AUDIT int audit_point; diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h index b8f1c01..462efe7 100644 --- a/arch/x86/include/uapi/asm/hyperv.h +++ b/arch/x86/include/uapi/asm/hyperv.h @@ -28,6 +28,9 @@ /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1) +/* A partition's reference time stamp counter (TSC) page */ +#define HV_X64_MSR_REFERENCE_TSC 0x4021 + /* * There is a single feature flag that signifies the presence of the MSR * that can be used to retrieve both the local APIC Timer frequency as @@ -198,6 +201,9 @@ #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \ (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) +#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12 + #define HV_PROCESSOR_POWER_STATE_C0 0 #define HV_PROCESSOR_POWER_STATE_C1 1 #define HV_PROCESSOR_POWER_STATE_C2 2 @@ -210,4 +216,11 @@ #define HV_STATUS_INVALID_ALIGNMENT 4 #define HV_STATUS_INSUFFICIENT_BUFFERS 19 +typedef struct _HV_REFERENCE_TSC_PAGE { + __u32 tsc_sequence; + __u32 res1; + __u64 tsc_scale; + __s64 tsc_offset; +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; + #endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 21ef1ba..5e4e495a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); static u32 msrs_to_save[] = { MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, MSR_KVM_PV_EOI_EN, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr) switch (msr) { case HV_X64_MSR_GUEST_OS_ID: case HV_X64_MSR_HYPERCALL: + case HV_X64_MSR_REFERENCE_TSC: + case HV_X64_MSR_TIME_REF_COUNT: r = true; break; } @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = 
data; + local_irq_disable(); + kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset; + local_irq_enable(); Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock starts counting? No need to store kvmclock_offset in hv_ref_count? (moreover the name is weird, better name would be "hv_ref_start_time". Just add kvmclock_offset when reading the values (otherwise you have a "stale copy" of kvmclock_offset in hv_ref_count). After some experiments I think we do no need kvm->arch.hv_ref_count at all. I was debugging some weird clockjump issues and I think the problem is that after live migration kvm->arch.hv_ref_count is initialized to 0. Depending on the uptime of the vServer when the hypercall was set up this can lead to series jumps. So I would sugg
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
On 08.01.2014 10:40, Vadim Rozenfeld wrote: On Tue, 2014-01-07 at 18:52 +0100, Peter Lieven wrote: Am 07.01.2014 10:36, schrieb Vadim Rozenfeld: On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote: Am 11.12.2013 19:59, schrieb Marcelo Tosatti: On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote: On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: Signed-off: Peter Lieven Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven 3. move check for TSC page enable from second patch to this one. --- arch/x86/include/asm/kvm_host.h| 2 ++ arch/x86/include/uapi/asm/hyperv.h | 13 + arch/x86/kvm/x86.c | 39 +- include/uapi/linux/kvm.h | 1 + 4 files changed, 54 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ae5d783..2fd0753 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -605,6 +605,8 @@ struct kvm_arch { /* fields used by HYPER-V emulation */ u64 hv_guest_os_id; u64 hv_hypercall; + u64 hv_ref_count; + u64 hv_tsc_page; #ifdef CONFIG_KVM_MMU_AUDIT int audit_point; diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h index b8f1c01..462efe7 100644 --- a/arch/x86/include/uapi/asm/hyperv.h +++ b/arch/x86/include/uapi/asm/hyperv.h @@ -28,6 +28,9 @@ /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1) +/* A partition's reference time stamp counter (TSC) page */ +#define HV_X64_MSR_REFERENCE_TSC 0x4021 + /* * There is a single feature flag that signifies the presence of the MSR * that can be used to retrieve both the local APIC Timer frequency as @@ -198,6 +201,9 @@ #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \ (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) +#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001 +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12 + #define HV_PROCESSOR_POWER_STATE_C0 0 #define HV_PROCESSOR_POWER_STATE_C1 1 #define HV_PROCESSOR_POWER_STATE_C2 2 @@ -210,4 +216,11 @@ #define HV_STATUS_INVALID_ALIGNMENT 4 #define HV_STATUS_INSUFFICIENT_BUFFERS19 +typedef struct _HV_REFERENCE_TSC_PAGE { + __u32 tsc_sequence; + __u32 res1; + __u64 tsc_scale; + __s64 tsc_offset; +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; + #endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 21ef1ba..5e4e495a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); static u32 msrs_to_save[] = { MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, MSR_KVM_PV_EOI_EN, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr) switch (msr) { case HV_X64_MSR_GUEST_OS_ID: case HV_X64_MSR_HYPERCALL: + case HV_X64_MSR_REFERENCE_TSC: + case HV_X64_MSR_TIME_REF_COUNT: r = true; break; } @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; + local_irq_disable(); + kvm->arch.hv_ref_count = get_kernel_ns() + kvm->arch.kvmclock_offset; + 
local_irq_enable(); Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock starts counting? No need to store kvmclock_offset in hv_ref_count? (moreover the name is weird, better name would be "hv_ref_start_time". Just add kvmclock_offset when reading the values (otherwise you have a "stale copy" of kvmclock_offset in hv_ref_count). After some experiments I think we do no need kvm->arch.hv_ref_count at all. I was debugging some weird clockjump issues and I think the problem is that after live migration kvm->arch.hv_ref_count is initialized to 0. Depending on the uptime of the vServer when the hypercall was set up this can lead to series jumps. So I would suggest to completely drop kvm->arch.hv_ref_count. And use simply this in get_msr_hyperv_pw(). case HV
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
Am 07.01.2014 10:36, schrieb Vadim Rozenfeld: > On Thu, 2014-01-02 at 17:52 +0100, Peter Lieven wrote: >> Am 11.12.2013 19:59, schrieb Marcelo Tosatti: >>> On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote: >>>> On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: >>>>> Signed-off: Peter Lieven >>>>> Signed-off: Gleb Natapov >>>>> Signed-off: Vadim Rozenfeld >>>>> >>>>> v1 -> v2 >>>>> 1. mark TSC page dirty as suggested by >>>>> Eric Northup and Gleb >>>>> 2. disable local irq when calling get_kernel_ns, >>>>> as it was done by Peter Lieven >>>>> 3. move check for TSC page enable from second patch >>>>> to this one. >>>>> >>>>> --- >>>>> arch/x86/include/asm/kvm_host.h| 2 ++ >>>>> arch/x86/include/uapi/asm/hyperv.h | 13 + >>>>> arch/x86/kvm/x86.c | 39 >>>>> +- >>>>> include/uapi/linux/kvm.h | 1 + >>>>> 4 files changed, 54 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/arch/x86/include/asm/kvm_host.h >>>>> b/arch/x86/include/asm/kvm_host.h >>>>> index ae5d783..2fd0753 100644 >>>>> --- a/arch/x86/include/asm/kvm_host.h >>>>> +++ b/arch/x86/include/asm/kvm_host.h >>>>> @@ -605,6 +605,8 @@ struct kvm_arch { >>>>> /* fields used by HYPER-V emulation */ >>>>> u64 hv_guest_os_id; >>>>> u64 hv_hypercall; >>>>> + u64 hv_ref_count; >>>>> + u64 hv_tsc_page; >>>>> >>>>> #ifdef CONFIG_KVM_MMU_AUDIT >>>>> int audit_point; >>>>> diff --git a/arch/x86/include/uapi/asm/hyperv.h >>>>> b/arch/x86/include/uapi/asm/hyperv.h >>>>> index b8f1c01..462efe7 100644 >>>>> --- a/arch/x86/include/uapi/asm/hyperv.h >>>>> +++ b/arch/x86/include/uapi/asm/hyperv.h >>>>> @@ -28,6 +28,9 @@ >>>>> /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ >>>>> #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1) >>>>> >>>>> +/* A partition's reference time stamp counter (TSC) page */ >>>>> +#define HV_X64_MSR_REFERENCE_TSC 0x4021 >>>>> + >>>>> /* >>>>> * There is a single feature flag that signifies the presence of the MSR >>>>> * that can be used to retrieve both the local APIC Timer frequency as >>>>> @@ -198,6 +201,9 @@ >>>>> #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \ >>>>> (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) >>>>> >>>>> +#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x0001 >>>>> +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12 >>>>> + >>>>> #define HV_PROCESSOR_POWER_STATE_C0 0 >>>>> #define HV_PROCESSOR_POWER_STATE_C1 1 >>>>> #define HV_PROCESSOR_POWER_STATE_C2 2 >>>>> @@ -210,4 +216,11 @@ >>>>> #define HV_STATUS_INVALID_ALIGNMENT 4 >>>>> #define HV_STATUS_INSUFFICIENT_BUFFERS 19 >>>>> >>>>> +typedef struct _HV_REFERENCE_TSC_PAGE { >>>>> + __u32 tsc_sequence; >>>>> + __u32 res1; >>>>> + __u64 tsc_scale; >>>>> + __s64 tsc_offset; >>>>> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; >>>>> + >>>>> #endif >>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>>> index 21ef1ba..5e4e495a 100644 >>>>> --- a/arch/x86/kvm/x86.c >>>>> +++ b/arch/x86/kvm/x86.c >>>>> @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); >>>>> static u32 msrs_to_save[] = { >>>>> MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, >>>>> MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, >>>>> - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, >>>>> + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, >>>>> HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, >>>>> MSR_KVM_PV_EOI_EN, >>>>> MSR_IA32_SYSENTER_CS, MSR_IA32_SYSE
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
Am 11.12.2013 19:59, schrieb Marcelo Tosatti: > On Wed, Dec 11, 2013 at 04:53:05PM -0200, Marcelo Tosatti wrote: >> On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: >>> Signed-off: Peter Lieven >>> Signed-off: Gleb Natapov >>> Signed-off: Vadim Rozenfeld >>> >>> v1 -> v2 >>> 1. mark TSC page dirty as suggested by >>> Eric Northup and Gleb >>> 2. disable local irq when calling get_kernel_ns, >>> as it was done by Peter Lieven >>> 3. move check for TSC page enable from second patch >>> to this one. >>> >>> --- >>> arch/x86/include/asm/kvm_host.h| 2 ++ >>> arch/x86/include/uapi/asm/hyperv.h | 13 + >>> arch/x86/kvm/x86.c | 39 >>> +- >>> include/uapi/linux/kvm.h | 1 + >>> 4 files changed, 54 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/include/asm/kvm_host.h >>> b/arch/x86/include/asm/kvm_host.h >>> index ae5d783..2fd0753 100644 >>> --- a/arch/x86/include/asm/kvm_host.h >>> +++ b/arch/x86/include/asm/kvm_host.h >>> @@ -605,6 +605,8 @@ struct kvm_arch { >>> /* fields used by HYPER-V emulation */ >>> u64 hv_guest_os_id; >>> u64 hv_hypercall; >>> + u64 hv_ref_count; >>> + u64 hv_tsc_page; >>> >>> #ifdef CONFIG_KVM_MMU_AUDIT >>> int audit_point; >>> diff --git a/arch/x86/include/uapi/asm/hyperv.h >>> b/arch/x86/include/uapi/asm/hyperv.h >>> index b8f1c01..462efe7 100644 >>> --- a/arch/x86/include/uapi/asm/hyperv.h >>> +++ b/arch/x86/include/uapi/asm/hyperv.h >>> @@ -28,6 +28,9 @@ >>> /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ >>> #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE(1 << 1) >>> >>> +/* A partition's reference time stamp counter (TSC) page */ >>> +#define HV_X64_MSR_REFERENCE_TSC 0x4021 >>> + >>> /* >>> * There is a single feature flag that signifies the presence of the MSR >>> * that can be used to retrieve both the local APIC Timer frequency as >>> @@ -198,6 +201,9 @@ >>> #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \ >>> (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) >>> >>> +#define HV_X64_MSR_TSC_REFERENCE_ENABLE0x0001 >>> +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12 >>> + >>> #define HV_PROCESSOR_POWER_STATE_C00 >>> #define HV_PROCESSOR_POWER_STATE_C11 >>> #define HV_PROCESSOR_POWER_STATE_C22 >>> @@ -210,4 +216,11 @@ >>> #define HV_STATUS_INVALID_ALIGNMENT4 >>> #define HV_STATUS_INSUFFICIENT_BUFFERS 19 >>> >>> +typedef struct _HV_REFERENCE_TSC_PAGE { >>> + __u32 tsc_sequence; >>> + __u32 res1; >>> + __u64 tsc_scale; >>> + __s64 tsc_offset; >>> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; >>> + >>> #endif >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index 21ef1ba..5e4e495a 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); >>> static u32 msrs_to_save[] = { >>> MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, >>> MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, >>> - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, >>> + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, >>> HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, >>> MSR_KVM_PV_EOI_EN, >>> MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, >>> @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr) >>> switch (msr) { >>> case HV_X64_MSR_GUEST_OS_ID: >>> case HV_X64_MSR_HYPERCALL: >>> + case HV_X64_MSR_REFERENCE_TSC: >>> + case HV_X64_MSR_TIME_REF_COUNT: >>> r = true; >>> break; >>> } >>> @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, >>> u32 msr, u64 data) >>> if 
(__copy_to_user((void __user *)addr, instructions, 4)) >>> return 1; >>> kvm->arch.hv_hypercall = data; >>> +
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
Am 02.01.2014 14:57, schrieb Marcelo Tosatti: > On Thu, Jan 02, 2014 at 02:15:48PM +0100, Peter Lieven wrote: >> Am 11.12.2013 19:53, schrieb Marcelo Tosatti: >>> On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: >>>> Signed-off: Peter Lieven >>>> Signed-off: Gleb Natapov >>>> Signed-off: Vadim Rozenfeld >>>> >>>> v1 -> v2 >>>> 1. mark TSC page dirty as suggested by >>>> Eric Northup and Gleb >>>> 2. disable local irq when calling get_kernel_ns, >>>> as it was done by Peter Lieven >>>> 3. move check for TSC page enable from second patch >>>> to this one. >>>> >>>> --- >>>> arch/x86/include/asm/kvm_host.h| 2 ++ >>>> arch/x86/include/uapi/asm/hyperv.h | 13 + >>>> arch/x86/kvm/x86.c | 39 >>>> +- >>>> include/uapi/linux/kvm.h | 1 + >>>> 4 files changed, 54 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/arch/x86/include/asm/kvm_host.h >>>> b/arch/x86/include/asm/kvm_host.h >>>> index ae5d783..2fd0753 100644 >>>> --- a/arch/x86/include/asm/kvm_host.h >>>> +++ b/arch/x86/include/asm/kvm_host.h >>>> @@ -605,6 +605,8 @@ struct kvm_arch { >>>>/* fields used by HYPER-V emulation */ >>>>u64 hv_guest_os_id; >>>>u64 hv_hypercall; >>>> + u64 hv_ref_count; >>>> + u64 hv_tsc_page; >>>> >>>>#ifdef CONFIG_KVM_MMU_AUDIT >>>>int audit_point; >>>> diff --git a/arch/x86/include/uapi/asm/hyperv.h >>>> b/arch/x86/include/uapi/asm/hyperv.h >>>> index b8f1c01..462efe7 100644 >>>> --- a/arch/x86/include/uapi/asm/hyperv.h >>>> +++ b/arch/x86/include/uapi/asm/hyperv.h >>>> @@ -28,6 +28,9 @@ >>>> /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ >>>> #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1) >>>> >>>> +/* A partition's reference time stamp counter (TSC) page */ >>>> +#define HV_X64_MSR_REFERENCE_TSC 0x4021 >>>> + >>>> /* >>>> * There is a single feature flag that signifies the presence of the MSR >>>> * that can be used to retrieve both the local APIC Timer frequency as >>>> @@ -198,6 +201,9 @@ >>>> #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \ >>>>(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) >>>> >>>> +#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x0001 >>>> +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT12 >>>> + >>>> #define HV_PROCESSOR_POWER_STATE_C0 0 >>>> #define HV_PROCESSOR_POWER_STATE_C1 1 >>>> #define HV_PROCESSOR_POWER_STATE_C2 2 >>>> @@ -210,4 +216,11 @@ >>>> #define HV_STATUS_INVALID_ALIGNMENT 4 >>>> #define HV_STATUS_INSUFFICIENT_BUFFERS19 >>>> >>>> +typedef struct _HV_REFERENCE_TSC_PAGE { >>>> + __u32 tsc_sequence; >>>> + __u32 res1; >>>> + __u64 tsc_scale; >>>> + __s64 tsc_offset; >>>> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; >>>> + >>>> #endif >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>> index 21ef1ba..5e4e495a 100644 >>>> --- a/arch/x86/kvm/x86.c >>>> +++ b/arch/x86/kvm/x86.c >>>> @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); >>>> static u32 msrs_to_save[] = { >>>>MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, >>>>MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, >>>> - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, >>>> + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, >>>>HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, >>>>MSR_KVM_PV_EOI_EN, >>>>MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, >>>> @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr) >>>>switch (msr) { >>>>case HV_X64_MSR_GUEST_OS_ID: >>>>case HV_X64_MSR_HYPERCALL: >>>> + case HV_X64_MSR_REFERENCE_TSC: >>>> + case HV_X64_MSR_TIME_REF_COUNT: >>>>
Re: [RFC PATCH v3 1/2] add support for Hyper-V reference time counter
Am 11.12.2013 19:53, schrieb Marcelo Tosatti: > On Sun, Dec 08, 2013 at 10:33:38PM +1100, Vadim Rozenfeld wrote: >> Signed-off: Peter Lieven >> Signed-off: Gleb Natapov >> Signed-off: Vadim Rozenfeld >> >> v1 -> v2 >> 1. mark TSC page dirty as suggested by >> Eric Northup and Gleb >> 2. disable local irq when calling get_kernel_ns, >> as it was done by Peter Lieven >> 3. move check for TSC page enable from second patch >> to this one. >> >> --- >> arch/x86/include/asm/kvm_host.h| 2 ++ >> arch/x86/include/uapi/asm/hyperv.h | 13 + >> arch/x86/kvm/x86.c | 39 >> +- >> include/uapi/linux/kvm.h | 1 + >> 4 files changed, 54 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h >> index ae5d783..2fd0753 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -605,6 +605,8 @@ struct kvm_arch { >> /* fields used by HYPER-V emulation */ >> u64 hv_guest_os_id; >> u64 hv_hypercall; >> +u64 hv_ref_count; >> +u64 hv_tsc_page; >> >> #ifdef CONFIG_KVM_MMU_AUDIT >> int audit_point; >> diff --git a/arch/x86/include/uapi/asm/hyperv.h >> b/arch/x86/include/uapi/asm/hyperv.h >> index b8f1c01..462efe7 100644 >> --- a/arch/x86/include/uapi/asm/hyperv.h >> +++ b/arch/x86/include/uapi/asm/hyperv.h >> @@ -28,6 +28,9 @@ >> /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/ >> #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1) >> >> +/* A partition's reference time stamp counter (TSC) page */ >> +#define HV_X64_MSR_REFERENCE_TSC0x4021 >> + >> /* >> * There is a single feature flag that signifies the presence of the MSR >> * that can be used to retrieve both the local APIC Timer frequency as >> @@ -198,6 +201,9 @@ >> #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK\ >> (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) >> >> +#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x0001 >> +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12 >> + >> #define HV_PROCESSOR_POWER_STATE_C0 0 >> #define HV_PROCESSOR_POWER_STATE_C1 1 >> #define HV_PROCESSOR_POWER_STATE_C2 2 >> @@ -210,4 +216,11 @@ >> #define HV_STATUS_INVALID_ALIGNMENT 4 >> #define HV_STATUS_INSUFFICIENT_BUFFERS 19 >> >> +typedef struct _HV_REFERENCE_TSC_PAGE { >> +__u32 tsc_sequence; >> +__u32 res1; >> +__u64 tsc_scale; >> +__s64 tsc_offset; >> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; >> + >> #endif >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index 21ef1ba..5e4e495a 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -840,7 +840,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); >> static u32 msrs_to_save[] = { >> MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, >> MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, >> -HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, >> +HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, >> HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, >> MSR_KVM_PV_EOI_EN, >> MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, >> @@ -1826,6 +1826,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr) >> switch (msr) { >> case HV_X64_MSR_GUEST_OS_ID: >> case HV_X64_MSR_HYPERCALL: >> +case HV_X64_MSR_REFERENCE_TSC: >> +case HV_X64_MSR_TIME_REF_COUNT: >> r = true; >> break; >> } >> @@ -1865,6 +1867,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, >> u32 msr, u64 data) >> if (__copy_to_user((void __user *)addr, instructions, 4)) >> return 1; >> kvm->arch.hv_hypercall = data; >> +local_irq_disable(); >> +kvm->arch.hv_ref_count = get_kernel_ns() + >> 
kvm->arch.kvmclock_offset; >> +local_irq_enable(); > > Where does the docs say that HV_X64_MSR_HYPERCALL is the where the clock > starts counting? > > No need to store kvmclock_offset in hv_ref_count? (moreover > the name is weird, better name would be "hv_ref_start_time". > >> +break; >> +
Re: [Qemu-devel] [Bug 1100843] Re: Live Migration Causes Performance Issues
On 07.10.2013 11:55, Paolo Bonzini wrote:
> On 07/10/2013 11:49, Peter Lieven wrote:
>>> It's in general not easy to do this if you take non-x86 targets into
>>> account.
>>
>> What about the dirty way to zero out all non-zero pages at the beginning
>> of ram_load?
>
> I'm not sure I follow?

Something like this for each ram block at the beginning of ram_load:

    base = memory_region_get_ram_ptr(block->mr);
    for (offset = 0; offset < block->length;
         offset += TARGET_PAGE_SIZE) {
        if (!is_zero_page(base + offset)) {
            memset(base + offset, 0x00, TARGET_PAGE_SIZE);
        }
    }

Then add a capability "skip_zero_pages" which does not send them on the source
and enables this zeroing. It would also be possible to skip the zero check for
each incoming compressed page.

Peter
Re: [Qemu-devel] [Bug 1100843] Re: Live Migration Causes Performance Issues
On 07.10.2013 11:37, Paolo Bonzini wrote:
> On 07/10/2013 08:38, Peter Lieven wrote:
>> On 06.10.2013 15:57, Zhang Haoyu wrote:
>>> From my testing this has been fixed in the saucy version (1.5.0) of qemu.
>>> It is fixed by this patch:
>>> f1c72795af573b24a7da5eb52375c9aba8a37972
>>> However later in the history this commit was reverted, and again broke
>>> this. The other commit that fixes this is:
>>> 211ea74022f51164a7729030b28eec90b6c99a08
>>> See below post, please.
>>> https://lists.gnu.org/archive/html/qemu-devel/2013-08/msg05062.html
>>
>> I would still like to fix qemu to not load roms etc. if we set up a
>> migration target. In this case we could drop the madvise, skip the
>> checking for zero pages and also avoid sending zero pages at all. It
>> would be the cleanest solution.
>
> It's in general not easy to do this if you take non-x86 targets into
> account.

What about the dirty way to zero out all non-zero pages at the beginning of
ram_load?

Peter
Re: [Qemu-devel] [Bug 1100843] Re: Live Migration Causes Performance Issues
On 06.10.2013 15:57, Zhang Haoyu wrote:
> From my testing this has been fixed in the saucy version (1.5.0) of qemu.
> It is fixed by this patch:
> f1c72795af573b24a7da5eb52375c9aba8a37972
> However later in the history this commit was reverted, and again broke
> this. The other commit that fixes this is:
> 211ea74022f51164a7729030b28eec90b6c99a08
> See below post, please.
> https://lists.gnu.org/archive/html/qemu-devel/2013-08/msg05062.html

I would still like to fix qemu to not load roms etc. if we set up a migration
target. In this case we could drop the madvise, skip the checking for zero
pages and also avoid sending zero pages at all. It would be the cleanest
solution.

Peter
Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter
On 23.05.2013 15:23, Paolo Bonzini wrote: Il 23/05/2013 15:20, Peter Lieven ha scritto: On 23.05.2013 15:18, Paolo Bonzini wrote: Il 23/05/2013 14:25, Vadim Rozenfeld ha scritto: - Original Message - From: "Peter Lieven" To: "Paolo Bonzini" Cc: "Vadim Rozenfeld" , "Marcelo Tosatti" , kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net Sent: Thursday, May 23, 2013 4:17:57 PM Subject: Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter On 22.05.2013 23:55, Paolo Bonzini wrote: Il 22/05/2013 09:32, Vadim Rozenfeld ha scritto: @@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; +local_irq_disable(); +kvm->arch.hv_ref_count = get_kernel_ns(); +local_irq_enable(); +break; local_irq_disable/local_irq_enable not needed. What is the reasoning behind reading this time value at msr write time? [VR] Windows writs this MSR only once, during HAL initialization. So, I decided to treat this call as a partition crate event. But is it expected by Windows that the reference count starts counting up from 0 at partition creation time? If you could just use (get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be simpler for migration purposes. I can just report, that I have used the patch that does it that way and it works. Maybe Windows is calculating the uptime by the reference counter? [VR] Windows use it (reference counters/iTSC/PMTimer/HPET) as a time-stamp source for (Ke)QueryPerformanceCounter function. So I would prefer to remove kvm->arch.hv_ref_count altogether. But only if the migration support is guaranteed. Migration support wouldn't work yet anyway, you need to recompute the scale and sequence. But that could be done by KVM_SET_CLOCK. hv_ref_counter does work out of the box. what I was trying to say is even it is slower than iTSC, it is significantly faster than hpet or pmtimer and I can confirm it works flawlessly with migration. And what if we have a host which lacks invariant TSC support? Then the sequence must be set to 0 or 0x, I still haven't understood. :) yes, but windows does then fall back to pmtimer or hpet which is much slower then reference counter. Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter
On 23.05.2013 15:18, Paolo Bonzini wrote: Il 23/05/2013 14:25, Vadim Rozenfeld ha scritto: - Original Message - From: "Peter Lieven" To: "Paolo Bonzini" Cc: "Vadim Rozenfeld" , "Marcelo Tosatti" , kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net Sent: Thursday, May 23, 2013 4:17:57 PM Subject: Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter On 22.05.2013 23:55, Paolo Bonzini wrote: Il 22/05/2013 09:32, Vadim Rozenfeld ha scritto: @@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; + local_irq_disable(); + kvm->arch.hv_ref_count = get_kernel_ns(); + local_irq_enable(); + break; local_irq_disable/local_irq_enable not needed. What is the reasoning behind reading this time value at msr write time? [VR] Windows writs this MSR only once, during HAL initialization. So, I decided to treat this call as a partition crate event. But is it expected by Windows that the reference count starts counting up from 0 at partition creation time? If you could just use (get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be simpler for migration purposes. I can just report, that I have used the patch that does it that way and it works. Maybe Windows is calculating the uptime by the reference counter? [VR] Windows use it (reference counters/iTSC/PMTimer/HPET) as a time-stamp source for (Ke)QueryPerformanceCounter function. So I would prefer to remove kvm->arch.hv_ref_count altogether. But only if the migration support is guaranteed. And what if we have a host which lacks invariant TSC support? Peter Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC
On 23.05.2013 14:33, Vadim Rozenfeld wrote: - Original Message - From: "Peter Lieven" To: "Marcelo Tosatti" Cc: "Vadim Rozenfeld" , kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net Sent: Thursday, May 23, 2013 4:18:55 PM Subject: Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC On 22.05.2013 23:23, Marcelo Tosatti wrote: On Wed, May 22, 2013 at 03:22:55AM -0400, Vadim Rozenfeld wrote: - Original Message - From: "Marcelo Tosatti" To: "Vadim Rozenfeld" Cc: kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net Sent: Wednesday, May 22, 2013 10:50:46 AM Subject: Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC On Sun, May 19, 2013 at 05:06:37PM +1000, Vadim Rozenfeld wrote: The following patch allows to activate a partition reference time enlightenment that is based on the host platform's support for an Invariant Time Stamp Counter (iTSC). NOTE: This code will survive migration due to lack of VM stop/resume handlers, when offset, scale and sequence should be readjusted. --- arch/x86/kvm/x86.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9645dab..b423fe4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1838,7 +1838,6 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) u64 gfn; unsigned long addr; HV_REFERENCE_TSC_PAGE tsc_ref; - tsc_ref.TscSequence = 0; if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) { kvm->arch.hv_tsc_page = data; break; @@ -1848,6 +1847,11 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT); if (kvm_is_error_hva(addr)) return 1; + tsc_ref.TscSequence = + boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ? 1 : 0; 1) You want NONSTOP_TSC (see 40fb1715 commit) which matches INVARIANT TSC. [VR] Thank you for reviewing. Will fix it. 2) TscSequence should increase? "This field serves as a sequence number that is incremented whenever..." [VR] Yes, on every VM resume, including migration. After migration we also need to recalculate scale and adjust offset. 3) 0x is the value for invalid source of reference time? [VR] Yes, on boot-up. In this case guest will go with PMTimer (not sure about HPET but I can check). But if we set sequence to 0x after migration - it's probably will not work. "Reference TSC during Save and Restore and Migration To address migration scenarios to physical platforms that do not support iTSC, the TscSequence field is used. In the event that a guest partition is migrated from an iTSC capable host to a non-iTSC capable host, the hypervisor sets TscSequence to the special value of 0x, which directs the guest operating system to fall back to a different clock source (for example, the virtual PM timer)." Why it would not/does not work after migration? what exactly do we heed the reference TSC for? the reference counter alone works great and it seems that there is a lot of trouble and crash possibilities involved with the referece tsc. [VR] Because it is incredibly light and fast. The simple test which calls QueryPerformanceCounter in a loop 10 millions times gives we the following results: PMTimer 32269 ms HPET38466 ms Ref Count 6499 ms iTSC1169 ms is the ref_count with local_irq_disable or preempt_disable? Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
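To put Vadim's numbers into perspective: 10 million QueryPerformanceCounter calls in 32269 ms is roughly 3.2 µs per call via PMTimer and about 3.8 µs via HPET, versus roughly 0.65 µs via the reference counter MSR and about 0.12 µs via the iTSC-backed reference TSC page. So the reference counter already cuts the per-call cost by about 5x, and the TSC page gains another 5-6x on top of that, presumably because it is read from guest memory without a VM exit while the reference counter is an MSR read.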
Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter
On 23.05.2013 11:54, Paolo Bonzini wrote: Il 23/05/2013 08:17, Peter Lieven ha scritto: On 22.05.2013 23:55, Paolo Bonzini wrote: Il 22/05/2013 09:32, Vadim Rozenfeld ha scritto: @@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; +local_irq_disable(); +kvm->arch.hv_ref_count = get_kernel_ns(); +local_irq_enable(); +break; local_irq_disable/local_irq_enable not needed. What is the reasoning behind reading this time value at msr write time? [VR] Windows writs this MSR only once, during HAL initialization. So, I decided to treat this call as a partition crate event. But is it expected by Windows that the reference count starts counting up from 0 at partition creation time? If you could just use (get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be simpler for migration purposes. I can just report, that I have used the patch that does it that way and it works. What do you mean by "that way"? :) Ups sorry… I meant the way it was implemented in the old patch (I sent a few days ago). @@ -1426,6 +1428,21 @@ static int set_msr_hyperv_pw(struct kvm_ if (__copy_to_user((void *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; + kvm->arch.hv_ref_count = get_kernel_ns(); + break; + } + case HV_X64_MSR_REFERENCE_TSC: { + u64 gfn; + unsigned long addr; + u32 hv_tsc_sequence; + gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT; + addr = gfn_to_hva(kvm, gfn); + if (kvm_is_error_hva(addr)) + return 1; + hv_tsc_sequence = 0x0; //invalid + if (__copy_to_user((void *)addr, (void __user *) &hv_tsc_sequence, sizeof(hv_tsc_sequence))) + return 1; + kvm->arch.hv_reference_tsc = data; break; } default: @@ -1826,6 +1843,17 @@ static int get_msr_hyperv_pw(struct kvm_ case HV_X64_MSR_HYPERCALL: data = kvm->arch.hv_hypercall; break; + case HV_X64_MSR_TIME_REF_COUNT: { + u64 now_ns; + local_irq_disable(); + now_ns = get_kernel_ns(); + data = div_u64(now_ns + kvm->arch.kvmclock_offset - kvm->arch.hv_ref_count,100); + local_irq_enable(); + break; + } + case HV_X64_MSR_REFERENCE_TSC: + data = kvm->arch.hv_reference_tsc; + break; default: pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); return 1; Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC
On 22.05.2013 23:23, Marcelo Tosatti wrote: On Wed, May 22, 2013 at 03:22:55AM -0400, Vadim Rozenfeld wrote: - Original Message - From: "Marcelo Tosatti" To: "Vadim Rozenfeld" Cc: kvm@vger.kernel.org, g...@redhat.com, p...@dlh.net Sent: Wednesday, May 22, 2013 10:50:46 AM Subject: Re: [RFC PATCH v2 2/2] add support for Hyper-V invariant TSC On Sun, May 19, 2013 at 05:06:37PM +1000, Vadim Rozenfeld wrote: The following patch allows to activate a partition reference time enlightenment that is based on the host platform's support for an Invariant Time Stamp Counter (iTSC). NOTE: This code will survive migration due to lack of VM stop/resume handlers, when offset, scale and sequence should be readjusted. --- arch/x86/kvm/x86.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9645dab..b423fe4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1838,7 +1838,6 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) u64 gfn; unsigned long addr; HV_REFERENCE_TSC_PAGE tsc_ref; - tsc_ref.TscSequence = 0; if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) { kvm->arch.hv_tsc_page = data; break; @@ -1848,6 +1847,11 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT); if (kvm_is_error_hva(addr)) return 1; + tsc_ref.TscSequence = + boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ? 1 : 0; 1) You want NONSTOP_TSC (see 40fb1715 commit) which matches INVARIANT TSC. [VR] Thank you for reviewing. Will fix it. 2) TscSequence should increase? "This field serves as a sequence number that is incremented whenever..." [VR] Yes, on every VM resume, including migration. After migration we also need to recalculate scale and adjust offset. 3) 0x is the value for invalid source of reference time? [VR] Yes, on boot-up. In this case guest will go with PMTimer (not sure about HPET but I can check). But if we set sequence to 0x after migration - it's probably will not work. "Reference TSC during Save and Restore and Migration To address migration scenarios to physical platforms that do not support iTSC, the TscSequence field is used. In the event that a guest partition is migrated from an iTSC capable host to a non-iTSC capable host, the hypervisor sets TscSequence to the special value of 0x, which directs the guest operating system to fall back to a different clock source (for example, the virtual PM timer)." Why it would not/does not work after migration? what exactly do we heed the reference TSC for? the reference counter alone works great and it seems that there is a lot of trouble and crash possibilities involved with the referece tsc. Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
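For readers following the TscSequence discussion above: the reference TSC page is read entirely from guest memory, which is both why it is so much cheaper than the reference counter MSR and why migration needs the sequence, scale and offset to be readjusted. The following is an illustrative C sketch of the guest-side algorithm as the spec text quoted above describes it - it is not Windows code; INVALID_SEQUENCE stands in for whichever special value (0 or the all-ones value, per the unresolved question in this thread) means "fall back to another clock source", and rdtsc()/mul_u64_u64_shr() are used as convenient stand-ins; a real reader would also need memory barriers around the sequence checks:

    /* Sketch: convert a raw TSC read into 100 ns reference time using the
     * scale/offset published by the hypervisor, retrying if the page was
     * updated concurrently. */
    static int read_reference_time(volatile HV_REFERENCE_TSC_PAGE *p, __u64 *time)
    {
            __u32 seq;

            do {
                    seq = p->tsc_sequence;
                    if (seq == INVALID_SEQUENCE)
                            return -1;      /* fall back to PMTimer/HPET */
                    /* reference time = ((TSC * tsc_scale) >> 64) + tsc_offset */
                    *time = mul_u64_u64_shr(rdtsc(), p->tsc_scale, 64) +
                            p->tsc_offset;
            } while (p->tsc_sequence != seq);

            return 0;
    }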
Re: [RFC PATCH v2 1/2] add support for Hyper-V reference time counter
On 22.05.2013 23:55, Paolo Bonzini wrote: Il 22/05/2013 09:32, Vadim Rozenfeld ha scritto: @@ -1827,6 +1829,29 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; + local_irq_disable(); + kvm->arch.hv_ref_count = get_kernel_ns(); + local_irq_enable(); + break; local_irq_disable/local_irq_enable not needed. What is the reasoning behind reading this time value at msr write time? [VR] Windows writs this MSR only once, during HAL initialization. So, I decided to treat this call as a partition crate event. But is it expected by Windows that the reference count starts counting up from 0 at partition creation time? If you could just use (get_kernel_ns() + kvm->arch.kvmclock_offset) / 100, it would also be simpler for migration purposes. I can just report, that I have used the patch that does it that way and it works. Maybe Windows is calculating the uptime by the reference counter? Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] Hyper-V reference counter
Hi all, sorry that I am a bit unresponsive about this series. I have a few days off and can't spend much time in this. If I read that the REFERENCE TSC breaks migration I don't think its a good option to include it at all. I have this hyperv_refcnt MSR in an internal patch I sent over about 1.5 years ago and its working flawlessly with Win2k8R2, Win7, Win8 + Win2012. I set the reference TSC to 0x00 and this seems to work with all the above Windows versions. Some of the early Alphas of Windows 8 didn't work with this patch, but the final is running smoothly also with migration etc. I crafted this patch to avoid the heavy calls to PM Timer during high I/O which slowed down Windows approx. by 30% compared to Hyper-V. I reinclude this patch for reference. Its unchanged since mid 2012 and it might not apply. Cheers, Peter diff -Npur kvm-kmod-3.3/include/asm-x86/hyperv.h kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/hyperv.h --- kvm-kmod-3.3/include/asm-x86/hyperv.h 2012-03-19 23:00:49.0 +0100 +++ kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/hyperv.h 2012-03-28 12:23:02.0 +0200 @@ -169,7 +169,8 @@ /* MSR used to read the per-partition time reference counter */ #define HV_X64_MSR_TIME_REF_COUNT 0x4020 - +#define HV_X64_MSR_REFERENCE_TSC 0x4021 + /* Define the virtual APIC registers */ #define HV_X64_MSR_EOI 0x4070 #define HV_X64_MSR_ICR 0x4071 diff -Npur kvm-kmod-3.3/include/asm-x86/kvm_host.h kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/kvm_host.h --- kvm-kmod-3.3/include/asm-x86/kvm_host.h 2012-03-19 23:00:49.0 +0100 +++ kvm-kmod-3.3-hyperv-refcnt/include/asm-x86/kvm_host.h 2012-03-28 15:08:24.0 +0200 @@ -553,6 +553,8 @@ struct kvm_arch { /* fields used by HYPER-V emulation */ u64 hv_guest_os_id; u64 hv_hypercall; + u64 hv_ref_count; + u64 hv_reference_tsc; atomic_t reader_counter; diff -Npur kvm-kmod-3.3/x86/x86.c kvm-kmod-3.3-hyperv-refcnt/x86/x86.c --- kvm-kmod-3.3/x86/x86.c 2012-03-19 23:00:56.0 +0100 +++ kvm-kmod-3.3-hyperv-refcnt/x86/x86.c2012-03-28 16:27:46.0 +0200 @@ -826,7 +826,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); static u32 msrs_to_save[] = { MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, MSR_STAR, @@ -1387,6 +1387,8 @@ static bool kvm_hv_msr_partition_wide(u3 switch (msr) { case HV_X64_MSR_GUEST_OS_ID: case HV_X64_MSR_HYPERCALL: + case HV_X64_MSR_REFERENCE_TSC: + case HV_X64_MSR_TIME_REF_COUNT: r = true; break; } @@ -1426,6 +1428,21 @@ static int set_msr_hyperv_pw(struct kvm_ if (__copy_to_user((void *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; + kvm->arch.hv_ref_count = get_kernel_ns(); + break; + } + case HV_X64_MSR_REFERENCE_TSC: { + u64 gfn; + unsigned long addr; + u32 hv_tsc_sequence; + gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT; + addr = gfn_to_hva(kvm, gfn); + if (kvm_is_error_hva(addr)) + return 1; + hv_tsc_sequence = 0x0; //invalid + if (__copy_to_user((void *)addr, (void __user *) &hv_tsc_sequence, sizeof(hv_tsc_sequence))) + return 1; + kvm->arch.hv_reference_tsc = data; break; } default: @@ -1826,6 +1843,17 @@ static int get_msr_hyperv_pw(struct kvm_ case HV_X64_MSR_HYPERCALL: data = kvm->arch.hv_hypercall; break; + case HV_X64_MSR_TIME_REF_COUNT: { + u64 now_ns; + local_irq_disable(); + now_ns = get_kernel_ns(); + data = div_u64(now_ns + 
kvm->arch.kvmclock_offset - kvm->arch.hv_ref_count,100); + local_irq_enable(); + break; + } + case HV_X64_MSR_REFERENCE_TSC: + data = kvm->arch.hv_reference_tsc; + break; default: pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); return 1; Am 20.05.2013 um 11:41 schrieb Gleb Natapov : > On Mon, May 20, 2013 at 11:32:27AM +0200, Paolo Bonzini wrote: >> Il 20/05/2013 11:25, Gleb Natapov ha scritto: >>> So in Hyper-V spec they >>> say: >>> >>> Special value of 0x is used to indicate that this facili
Re: [RFC PATCH 1/2] Hyper-V reference counter
On 13.05.2013 13:45, Vadim Rozenfeld wrote: Signed-off: Peter Lieven Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld The following patch allows to activate Hyper-V reference time counter --- arch/x86/include/asm/kvm_host.h| 2 ++ arch/x86/include/uapi/asm/hyperv.h | 3 +++ arch/x86/kvm/x86.c | 25 - 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3741c65..f0fee35 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -575,6 +575,8 @@ struct kvm_arch { /* fields used by HYPER-V emulation */ u64 hv_guest_os_id; u64 hv_hypercall; + u64 hv_ref_count; + u64 hv_tsc_page; #ifdef CONFIG_KVM_MMU_AUDIT int audit_point; diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h index b80420b..9711819 100644 --- a/arch/x86/include/uapi/asm/hyperv.h +++ b/arch/x86/include/uapi/asm/hyperv.h @@ -136,6 +136,9 @@ /* MSR used to read the per-partition time reference counter */ #define HV_X64_MSR_TIME_REF_COUNT 0x4020 +/* A partition's reference time stamp counter (TSC) page */ +#define HV_X64_MSR_REFERENCE_TSC 0x4021 + /* Define the virtual APIC registers */ #define HV_X64_MSR_EOI0x4070 #define HV_X64_MSR_ICR0x4071 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 094b5d9..1a4036d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -843,7 +843,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); static u32 msrs_to_save[] = { MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, MSR_KVM_PV_EOI_EN, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, @@ -1764,6 +1764,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr) switch (msr) { case HV_X64_MSR_GUEST_OS_ID: case HV_X64_MSR_HYPERCALL: + case HV_X64_MSR_REFERENCE_TSC: + case HV_X64_MSR_TIME_REF_COUNT: r = true; break; } @@ -1803,6 +1805,21 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; kvm->arch.hv_hypercall = data; + kvm->arch.hv_ref_count = get_kernel_ns(); + break; + } + case HV_X64_MSR_REFERENCE_TSC: { + u64 gfn; + unsigned long addr; + u32 tsc_ref; + gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT; + addr = gfn_to_hva(kvm, gfn); + if (kvm_is_error_hva(addr)) + return 1; + tsc_ref = 0; + if(__copy_to_user((void __user *)addr, &tsc_ref, sizeof(tsc_ref))) + return 1; + kvm->arch.hv_tsc_page = data; break; } default: @@ -2229,6 +2246,12 @@ static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case HV_X64_MSR_HYPERCALL: data = kvm->arch.hv_hypercall; break; + case HV_X64_MSR_TIME_REF_COUNT: + data = div_u64(get_kernel_ns() - kvm->arch.hv_ref_count,100); + break; in an earlier version of this patch I have the following: + case HV_X64_MSR_TIME_REF_COUNT: { + u64 now_ns; + local_irq_disable(); + now_ns = get_kernel_ns(); + data = div_u64(now_ns + kvm->arch.kvmclock_offset - kvm->arch.hv_ref_count,100); + local_irq_enable(); + break; + } I do not know if this is right, but I can report that this one is working without any flaws since approx. 1.5 years. Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI -> Bad ram pointer
On 19.11.2012 18:20, Stefan Hajnoczi wrote:
On Thu, Nov 8, 2012 at 4:26 PM, Peter Lieven wrote:
Has anyone any other idea what the cause could be or where to start?
Hi Peter, I suggested posting the source tree you are building. Since you have applied patches yourself no one else is able to follow along with the gdb output or reproduce the issue accurately.

Sorry for the late reply. I used qemu git at e24dc9feb0d68142d54dc3c097f57588836d1338 and libiscsi git at 3b3036b9dae55f0c3eef9d75db89c7b78f637a12.

The cmdline:

qemu-system-x86_64 -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-62ff4e007-e4a3c8908af50839-test-3000g/0 -cdrom ubuntu-12.04.1-server-amd64.iso -vnc :1

The vm crashes with:

Bad ram pointer 0x7fd220008000

after the user settings and timezone config when loading the module libdmraid1.0.0.rc16-udeb. I hope this helps to reproduce.

Peter

Stefan
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI -> Bad ram pointer
Has anyone any other idea what the cause could be or where to start? Peter Am 31.10.2012 um 15:08 schrieb ronnie sahlberg: > On Tue, Oct 30, 2012 at 10:48 PM, Stefan Hajnoczi wrote: >> On Tue, Oct 30, 2012 at 10:09 PM, ronnie sahlberg >> wrote: >>> About half a year there was an issue where recent kernels had added >>> support to start using new scsi opcodes, but the qemu functions that >>> determine "which transfer direction is used for this opcode" had not >>> yet been updated, so that the opcode was sent with the wrong transfer >>> direction. >>> >>> That caused the guests memory to be overwritten and crash. >>> >>> I dont have (easy) access to the git tree right now, but it was a >>> patch for the ATA_PASSTHROUGH command that fixed that. >> >> This patch? >> >> http://patchwork.ozlabs.org/patch/174946/ >> >> Stefan > > This is the one I was thinking about : > 381b634c275ca1a2806e97392527bbfc01bcb333 > > But that also crashed when using local /dev/sg* devices. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI -> Bad ram pointer
Am 31.10.2012 um 15:08 schrieb ronnie sahlberg: > On Tue, Oct 30, 2012 at 10:48 PM, Stefan Hajnoczi wrote: >> On Tue, Oct 30, 2012 at 10:09 PM, ronnie sahlberg >> wrote: >>> About half a year there was an issue where recent kernels had added >>> support to start using new scsi opcodes, but the qemu functions that >>> determine "which transfer direction is used for this opcode" had not >>> yet been updated, so that the opcode was sent with the wrong transfer >>> direction. >>> >>> That caused the guests memory to be overwritten and crash. >>> >>> I dont have (easy) access to the git tree right now, but it was a >>> patch for the ATA_PASSTHROUGH command that fixed that. >> >> This patch? >> >> http://patchwork.ozlabs.org/patch/174946/ >> >> Stefan > > This is the one I was thinking about : > 381b634c275ca1a2806e97392527bbfc01bcb333 > > But that also crashed when using local /dev/sg* devices. I was using a local LVM Volume not an iSCSI disk. I added debugging output and breakpoints to scsi_cmd_xfer_mode(). The function is not called. Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI -> Bad ram pointer
Am 30.10.2012 19:27, schrieb Stefan Hajnoczi: On Tue, Oct 30, 2012 at 4:56 PM, Peter Lieven wrote: On 30.10.2012 09:32, Stefan Hajnoczi wrote: On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote: Hi, Bug subject should be virtio-blk, not virtio-scsi. virtio-scsi is a different virtio device type from virtoi-blk and is not present in the backtrace you posted. Sounds pedantic but I want to make sure this gets chalked up against the right device :). If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio storage backend that supports iSCSI qemu-kvm crashes reliably with the following error: Are you using vanilla qemu-kvm-1.2.0 or are there patches applied? Have you tried qemu-kvm.git/master? Have you tried a local raw disk image to check whether libiscsi is involved? Bad ram pointer 0x3039303620008000 This happens directly after the confirmation of the Timezone before the Disk is partitioned. If I specify -global virtio-blk-pci.scsi=off in the cmdline this does not happen. Here is a stack trace: Thread 1 (Thread 0x77fee700 (LWP 8226)): #0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 <https://github.com/sahlberg/libiscsi/issues/1> 0x557b751d in qemu_ram_addr_from_host_nofail ( ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835 ram_addr = 0 #2 <https://github.com/sahlberg/libiscsi/issues/2> 0x557b9177 in cpu_physical_memory_unmap ( buffer=0x3039303620008000, len=4986663671065686081, is_write=1, access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645 buffer and len are ASCII junk. It appears to be hex digits and it's not clear where they come from. It would be interesting to print *elem one stack frame up in #3 virtqueue_fill() to show the iovecs and in/out counts. (gdb) print *elem Great, thanks for providing this info: $6 = {index = 3, out_num = 2, in_num = 4, in_addr = {1914920960, 1916656688, 2024130072, 2024130088, 0 , 4129, 93825009696000, 140737328183160, 0 }, out_addr = {2024130056, 2038414056, 0, 8256, 4128, 93824999311936, 0, 3, 0 , 12385, 93825009696000, 140737328183160, 0 }, Up to here everything is fine. in_sg = {{ iov_base = 0x3039303620008000, iov_len = 4986663671065686081}, { iov_base = 0x383038454635, iov_len = 3544389261899019573}, { The fields are bogus, in_sg has been overwritten with ASCII data. Unfortunately I don't see any hint of where this ASCII data came from yet. The hdr fields you provided in stack frame #6 show that in_sg was overwritten during or after the bdrv_ioctl() call. We pulled valid data out of the vring and mapped buffers correctly. But something is overwriting in_sg and when we complete the request we blow up due to the bogus values. Please post your full qemu-kvm command-line. Please also post the exact qemu-kvm version you are using. I can see it's based on qemu-kvm-1.2.0 but are there any patches applied (e.g. distro packages may carry patches so the full package version information would be useful)? Stefan, Ronnie, if I do remove the following patch from my cherry-picked patches its working again: iSCSI: We need to support SG_IO also from iscsi_ioctl() Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
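Until the SG_IO path in the iscsi driver is sorted out, the workaround already mentioned in this thread can be applied per drive; an illustrative, shortened command line (iSCSI URL in the same format as the ones posted in this thread) would be:

qemu-kvm -drive format=iscsi,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-62ff4e007-e4a3c8908af50839-test-3000g/0,if=virtio,cache=none,aio=native \
    -global virtio-blk-pci.scsi=off ...

With SCSI passthrough disabled on virtio-blk, the installer's probing never reaches the pass-through path, which matches the observation that everything runs through with scsi=off.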
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI -> Bad ram pointer
Am 30.10.2012 19:27, schrieb Stefan Hajnoczi: On Tue, Oct 30, 2012 at 4:56 PM, Peter Lieven wrote: On 30.10.2012 09:32, Stefan Hajnoczi wrote: On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote: Hi, Bug subject should be virtio-blk, not virtio-scsi. virtio-scsi is a different virtio device type from virtoi-blk and is not present in the backtrace you posted. Sounds pedantic but I want to make sure this gets chalked up against the right device :). If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio storage backend that supports iSCSI qemu-kvm crashes reliably with the following error: Are you using vanilla qemu-kvm-1.2.0 or are there patches applied? Have you tried qemu-kvm.git/master? Have you tried a local raw disk image to check whether libiscsi is involved? Bad ram pointer 0x3039303620008000 This happens directly after the confirmation of the Timezone before the Disk is partitioned. If I specify -global virtio-blk-pci.scsi=off in the cmdline this does not happen. Here is a stack trace: Thread 1 (Thread 0x77fee700 (LWP 8226)): #0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 <https://github.com/sahlberg/libiscsi/issues/1> 0x557b751d in qemu_ram_addr_from_host_nofail ( ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835 ram_addr = 0 #2 <https://github.com/sahlberg/libiscsi/issues/2> 0x557b9177 in cpu_physical_memory_unmap ( buffer=0x3039303620008000, len=4986663671065686081, is_write=1, access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645 buffer and len are ASCII junk. It appears to be hex digits and it's not clear where they come from. It would be interesting to print *elem one stack frame up in #3 virtqueue_fill() to show the iovecs and in/out counts. (gdb) print *elem Great, thanks for providing this info: $6 = {index = 3, out_num = 2, in_num = 4, in_addr = {1914920960, 1916656688, 2024130072, 2024130088, 0 , 4129, 93825009696000, 140737328183160, 0 }, out_addr = {2024130056, 2038414056, 0, 8256, 4128, 93824999311936, 0, 3, 0 , 12385, 93825009696000, 140737328183160, 0 }, Up to here everything is fine. in_sg = {{ iov_base = 0x3039303620008000, iov_len = 4986663671065686081}, { iov_base = 0x383038454635, iov_len = 3544389261899019573}, { The fields are bogus, in_sg has been overwritten with ASCII data. Unfortunately I don't see any hint of where this ASCII data came from yet. The hdr fields you provided in stack frame #6 show that in_sg was overwritten during or after the bdrv_ioctl() call. We pulled valid data out of the vring and mapped buffers correctly. But something is overwriting in_sg and when we complete the request we blow up due to the bogus values. Ok. What I have to mention. I've been testing with qemu-kvm 1.2.0 and libiscsi for a few weeks now. Its been very stable. The only thing it blows up is during the debian/ubuntu installer. Ubuntu itself for instance is running flawlessly. My guess is that the installer is probing for something. The installer itself also runs flawlessly when I disable scsi passthru with scsi=off. Please post your full qemu-kvm command-line. 
/usr/bin/qemu-kvm-1.2.0 -net tap,vlan=164,script=no,downscript=no,ifname=tap0 -net nic,vlan=164,model=e1000,macaddr=52:54:00:ff:01:35 -iscsi initiator-name=iqn.2005-03.org.virtual-core:0025b51f001c -drive format=iscsi,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-335f4e007-d29001a3355508e8-libiscsi-test-hd0/0,if=virtio,cache=none,aio=native -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4002,server,nowait -vnc :2 -qmp tcp:0:3002,server,nowait -name 'libiscsi-debug' -boot order=dc,menu=off -k de -pidfile /var/run/qemu/vm-280.pid -mem-path /hugepages -mem-prealloc -cpu host,+x2apic,model_id='Intel(R) Xeon(R) CPU L5640 @ 2.27GHz',-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus Please also post the exact qemu-kvm version you are using. I can see it's based on qemu-kvm-1.2.0 but are there any patches applied (e.g. distro packages may carry patches so the full package version information would be useful)? I use vanilly qemu-kvm 1.2.0 with some cherry picked patches. I will retry with untouched qemu-kvm 1.2.0 and latest git tomorrow at latest. Thanks, Stefan Thank you, too Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI -> Bad ram pointer
On 30.10.2012 09:32, Stefan Hajnoczi wrote: On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote: Hi, Bug subject should be virtio-blk, not virtio-scsi. virtio-scsi is a different virtio device type from virtoi-blk and is not present in the backtrace you posted. Sounds pedantic but I want to make sure this gets chalked up against the right device :). If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio storage backend that supports iSCSI qemu-kvm crashes reliably with the following error: Are you using vanilla qemu-kvm-1.2.0 or are there patches applied? Have you tried qemu-kvm.git/master? Have you tried a local raw disk image to check whether libiscsi is involved? Bad ram pointer 0x3039303620008000 This happens directly after the confirmation of the Timezone before the Disk is partitioned. If I specify -global virtio-blk-pci.scsi=off in the cmdline this does not happen. Here is a stack trace: Thread 1 (Thread 0x77fee700 (LWP 8226)): #0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 <https://github.com/sahlberg/libiscsi/issues/1> 0x557b751d in qemu_ram_addr_from_host_nofail ( ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835 ram_addr = 0 #2 <https://github.com/sahlberg/libiscsi/issues/2> 0x557b9177 in cpu_physical_memory_unmap ( buffer=0x3039303620008000, len=4986663671065686081, is_write=1, access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645 buffer and len are ASCII junk. It appears to be hex digits and it's not clear where they come from. It would be interesting to print *elem one stack frame up in #3 virtqueue_fill() to show the iovecs and in/out counts. (gdb) print *elem $6 = {index = 3, out_num = 2, in_num = 4, in_addr = {1914920960, 1916656688, 2024130072, 2024130088, 0 , 4129, 93825009696000, 140737328183160, 0 }, out_addr = {2024130056, 2038414056, 0, 8256, 4128, 93824999311936, 0, 3, 0 , 12385, 93825009696000, 140737328183160, 0 }, in_sg = {{ iov_base = 0x3039303620008000, iov_len = 4986663671065686081}, { iov_base = 0x383038454635, iov_len = 3544389261899019573}, { iov_base = 0x2aab32443039, iov_len = 16}, {iov_base = 0x2aab2365c628, iov_len = 1}, {iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {iov_base = 0x2041, iov_len = 93825010788016}, { iov_base = 0x7673f778, iov_len = 0}, {iov_base = 0x0, iov_len = 0} , {iov_base = 0x1021, iov_len = 93825010788016}, {iov_base = 0x7673f778, iov_len = 0}, { iov_base = 0x0, iov_len = 0} , {iov_base = 0x0, iov_len = 24768}, {iov_base = 0x1020, iov_len = 93824999311936}, { iov_base = 0x0, iov_len = 2}, {iov_base = 0x0, iov_len = 0} , {iov_base = 0x1021, iov_len = 93825009696000}, {iov_base = 0x7673f778, iov_len = 0}, { iov_base = 0x0, iov_len = 0} }, out_sg = {{ iov_base = 0x2aab2365c608, iov_len = 16}, {iov_base = 0x2aab243fbae8, iov_len = 6}, {iov_base = 0x0, iov_len = 0} , { iov_base = 0x0, iov_len = 33024}, {iov_base = 0x30, iov_len = 93825010821424}, {iov_base = 0x5670d7a0, iov_len = 0}, { iov_base = 0x5670cbb0, iov_len = 0}, {iov_base = 0x71, iov_len = 93825008729792}, {iov_base = 0x5670e960, iov_len = 0}, { iov_base = 0x31, iov_len = 140737328183192}, {iov_base = 0x7673f798, iov_len = 0}, {iov_base = 0x56711e20, iov_len = 80}, { iov_base = 0x20, iov_len = 93825010821584}, {iov_base = 0x0, iov_len = 33184}, {iov_base = 0x30, iov_len = 93825010821536}, { iov_base = 0x5670e840, iov_len = 0}, {iov_base = 0x5670e1b0, iov_len = 0}, {iov_base = 0x41, iov_len = 93825010821584}, { iov_base = 0x5670eb20, iov_len = 32}, {iov_base = 0x20, iov_len = 
93825010821920}, {iov_base = 0x0, iov_len = 33296}, { iov_base = 0x30, iov_len = 93825010821872}, {iov_base = 0x5670e8b0, iov_len = 0}, {iov_base = 0x5670dc68, iov_len = 0}, { iov_base = 0x191, iov_len = 93825009696736}, {iov_base = 0x5670eb20, iov_len = 0}, {iov_base = 0x21, iov_len = 93825010826352}, { iov_base = 0x5670e880, iov_len = 64}, {iov_base = 0x30, iov_len = 93825010821200}, {iov_base = 0x5670e920, iov_len = 0}, { iov_base = 0x5670e5c8, iov_len = 0}, {iov_base = 0x41, iov_len = 93825008729792}, {iov_base = 0x5670e9d0, iov_len = 32}, { iov_base = 0x20, iov_len = 93825010821696}, {iov_base = 0x0, iov_len = 176}, {iov_base = 0x30, iov_len = 93825010821648}, { iov_base = 0x5670e990, iov_len = 0}, {iov_base = 0x5670e080, iov_len = 0}, {iov_base = 0x41, iov_len = 93825008729792}, { iov_base = 0x5670eb20, iov_len = 32}, {iov_base = 0x20, iov_len = 93825010822032}, {iov_base = 0x0, iov_len = 288}, { iov_base = 0x30, iov_len = 93825010821984}, {iov_base = 0x5670ea00, iov_le
Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-BLK -> Bad ram pointer
On 30.10.2012 09:32, Stefan Hajnoczi wrote: On Mon, Oct 29, 2012 at 03:09:37PM +0100, Peter Lieven wrote: Hi, Bug subject should be virtio-blk, not virtio-scsi. virtio-scsi is a different virtio device type from virtoi-blk and is not present in the backtrace you posted. you are right, sorry for that. Sounds pedantic but I want to make sure this gets chalked up against the right device :). If I try to Install Ubuntu 12.04 LTS / 12.10 64-bit on a virtio storage backend that supports iSCSI qemu-kvm crashes reliably with the following error: Are you using vanilla qemu-kvm-1.2.0 or are there patches applied? I use vanilla qemu-kvm 1.2.0 except for one virtio-blk related patch (CVE-2011-4127): http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=1ba1f2e319afdcb485963cd3f426fdffd1b725f2 that for some reason did not made it into qemu-kvm 1.2.0 and two aio related patchs: http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=00f78533326c5ba2e62fafada16655aa558a5520 http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=2db2bfc0ccac5fd68dbf0ceb70fbc372c5d8a8c7 this is why I can circumvent the issue with scsi=off i guess. Have you tried qemu-kvm.git/master? not yet. Have you tried a local raw disk image to check whether libiscsi is involved? I have, here it does not happen. For a raw device scsi is scsi=off, isn't it? Bad ram pointer 0x3039303620008000 This happens directly after the confirmation of the Timezone before the Disk is partitioned. If I specify -global virtio-blk-pci.scsi=off in the cmdline this does not happen. Here is a stack trace: Thread 1 (Thread 0x77fee700 (LWP 8226)): #0 0x763c0a10 in abort () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 <https://github.com/sahlberg/libiscsi/issues/1> 0x557b751d in qemu_ram_addr_from_host_nofail ( ptr=0x3039303620008000) at /usr/src/qemu-kvm-1.2.0/exec.c:2835 ram_addr = 0 #2 <https://github.com/sahlberg/libiscsi/issues/2> 0x557b9177 in cpu_physical_memory_unmap ( buffer=0x3039303620008000, len=4986663671065686081, is_write=1, access_len=1) at /usr/src/qemu-kvm-1.2.0/exec.c:3645 buffer and len are ASCII junk. It appears to be hex digits and it's not clear where they come from. It would be interesting to print *elem one stack frame up in #3 virtqueue_fill() to show the iovecs and in/out counts. I will collect that info for you. Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
kvm-kmod 3.6 on linux 3.2.0
Hi,

kvm-kmod 3.6 fails to compile against a 3.2.0 kernel with the following error:

/usr/src/kvm-kmod-3.6/x86/x86.c: In function ‘get_msr_mce’:
/usr/src/kvm-kmod-3.6/x86/x86.c:1908:27: error: ‘kvm’ undeclared (first use in this function)
/usr/src/kvm-kmod-3.6/x86/x86.c:1908:27: note: each undeclared identifier is reported only once for each function it appears in

Any ideas?

Peter
Re: [Qemu-devel] Block Migration and xbzrle
Am 02.10.2012 um 12:40 schrieb Orit Wasserman: > On 10/02/2012 11:30 AM, Peter Lieven wrote: >> >> Am 02.10.2012 um 11:28 schrieb Orit Wasserman: >> >>> On 10/02/2012 10:33 AM, lieven-li...@dlh.net wrote: >>>> Orit Wasserman wrote: >>>>> On 09/16/2012 01:39 PM, Peter Lieven wrote: >>>>>> Hi, >>>>>> >>>>>> I remember that this was broken some time ago and currently with >>>>>> qemu-kvm 1.2.0 I am still not able to use >>>>>> block migration plus xbzrle. The migration fails if both are used >>>>>> together. XBZRLE without block migration works. >>>>>> >>>>>> Can someone please advise what is the current expected behaviour? >>>>> XBZRLE only work on guest memory so it shouldn't be effected by block >>>>> migration. >>>>> What is the error you are getting? >>>>> What command line ? >>>> >>>> Meanwhile I can confirm that it happens with and without block migration. >>>> I I observe 2 errors: >>>> a) >>>> qemu: warning: error while loading state section id 2 >>>> load of migration failed > Did you enabled XBZRLE on the destination also? > (migrate_set_capability xbzrle on) I was not aware that I have to enable it on both sides. I thought it had to be enabled only on the source side. However, it seems that it is enabled by default in 1.2.0?! I will retry with enabling it with the above command on both sides. Peter > > Orit >>>> b) >>>> the vm does not enter running state after migration. >>>> >>>> The command-line: >>>> /usr/bin/qemu-kvm-1.2.0 -net >>>> tap,vlan=798,script=no,downscript=no,ifname=tap1 -net >>>> nic,vlan=798,model=e1000,macaddr=52:54:00:ff:01:15 -drive >>>> format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-d85f4e007-3f30017ce11505df-ubuntu-tools-hd0,if=virtio,cache=none,aio=native >>>> -m 4096 -smp 2,sockets=1,cores=2,threads=1 -monitor >>>> tcp:0:4002,server,nowait -vnc :2 -qmp tcp:0:3002,server,nowait -name >>>> 'Ubuntu-Tools' -boot order=dc,menu=off -k de -incoming >>>> tcp:172.21.55.34:5002 -pidfile /var/run/qemu/vm-250.pid -mem-path >>>> /hugepages -mem-prealloc -rtc base=utc -usb -usbdevice tablet -no-hpet >>>> -vga cirrus -cpu host,+x2apic,model_id='Intel(R) Xeon(R) CPU >>> Migration with -cpu host is very problemtic, because the source and >>> destination can >>> have different cpu resulting in different cpu features. >>> Does regular migration works with this setup? >>> Can you try with a different cpu type? >>> What are the source and destination /proc/cpuinfo output ? > >> >> The CPUs are identical, we also check if flags and cpu types match if cpu >> type is set to host. >> Regular migration does work. > > > >> >> BR, >> Peter >> >>> >>> Cheers, >>> Orit >>> >>>> L5640 @ 2.27GHz',-tsc >>>> >>>> Thanks, >>>> Peter >>>> >>>>> >>>>> Regards, >>>>> Orit >>>>>> >>>>>> Thanks, >>>>>> Peter >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe kvm" in >>>> the body of a message to majord...@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> >> > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
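For anyone hitting the same "error while loading state section" failure: enabling the capability on both sides looks like this on the HMP monitor (illustrative; the destination address is the one from the command line above, and both commands must be issued before the migration is started).

# destination, started with -incoming tcp:172.21.55.34:5002
(qemu) migrate_set_capability xbzrle on

# source
(qemu) migrate_set_capability xbzrle on
(qemu) migrate -d tcp:172.21.55.34:5002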
Re: Block Migration and xbzrle
Am 02.10.2012 um 11:38 schrieb Paolo Bonzini: > Il 16/09/2012 12:39, Peter Lieven ha scritto: >> >> I remember that this was broken some time ago and currently with >> qemu-kvm 1.2.0 I am still not able to use >> block migration plus xbzrle. The migration fails if both are used >> together. XBZRLE without block migration works. >> >> Can someone please advise what is the current expected behaviour? > > Block migration is broken by design. It will converge really slowly as > soon as you have real load in the VMs, and it will hamper the > convergence of RAM as well. > > Hopefully a real alternative will be in 1.3 (based on drive-mirror on > the source + an embedded NBD server running on the destination), then in > 1.4 we can reimplement the block migration monitor commands using the > alternative. Hi Paolo, i know that block migration is not that good, but it seems that there is a bug in XBZRLE that is independent of block migration. Peter > > Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Block Migration and xbzrle
Am 02.10.2012 um 11:28 schrieb Orit Wasserman: > On 10/02/2012 10:33 AM, lieven-li...@dlh.net wrote: >> Orit Wasserman wrote: >>> On 09/16/2012 01:39 PM, Peter Lieven wrote: >>>> Hi, >>>> >>>> I remember that this was broken some time ago and currently with >>>> qemu-kvm 1.2.0 I am still not able to use >>>> block migration plus xbzrle. The migration fails if both are used >>>> together. XBZRLE without block migration works. >>>> >>>> Can someone please advise what is the current expected behaviour? >>> XBZRLE only work on guest memory so it shouldn't be effected by block >>> migration. >>> What is the error you are getting? >>> What command line ? >> >> Meanwhile I can confirm that it happens with and without block migration. >> I I observe 2 errors: >> a) >> qemu: warning: error while loading state section id 2 >> load of migration failed >> b) >> the vm does not enter running state after migration. >> >> The command-line: >> /usr/bin/qemu-kvm-1.2.0 -net >> tap,vlan=798,script=no,downscript=no,ifname=tap1 -net >> nic,vlan=798,model=e1000,macaddr=52:54:00:ff:01:15 -drive >> format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-d85f4e007-3f30017ce11505df-ubuntu-tools-hd0,if=virtio,cache=none,aio=native >> -m 4096 -smp 2,sockets=1,cores=2,threads=1 -monitor >> tcp:0:4002,server,nowait -vnc :2 -qmp tcp:0:3002,server,nowait -name >> 'Ubuntu-Tools' -boot order=dc,menu=off -k de -incoming >> tcp:172.21.55.34:5002 -pidfile /var/run/qemu/vm-250.pid -mem-path >> /hugepages -mem-prealloc -rtc base=utc -usb -usbdevice tablet -no-hpet >> -vga cirrus -cpu host,+x2apic,model_id='Intel(R) Xeon(R) CPU > Migration with -cpu host is very problemtic, because the source and > destination can > have different cpu resulting in different cpu features. > Does regular migration works with this setup? > Can you try with a different cpu type? > What are the source and destination /proc/cpuinfo output ? The CPUs are identical, we also check if flags and cpu types match if cpu type is set to host. Regular migration does work. BR, Peter > > Cheers, > Orit > >> L5640 @ 2.27GHz',-tsc >> >> Thanks, >> Peter >> >>> >>> Regards, >>> Orit >>>> >>>> Thanks, >>>> Peter >>>> >>>> >>> >>> >>> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0
On 09/18/12 12:31, Kevin Wolf wrote: Am 18.09.2012 12:28, schrieb Peter Lieven: On 09/17/12 22:12, Peter Lieven wrote: On 09/17/12 10:41, Kevin Wolf wrote: Am 16.09.2012 12:13, schrieb Peter Lieven: Hi, when trying to block migrate a VM from one node to another, the source VM crashed with the following assertion: block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed. Is this sth already addresses/known? Not that I'm aware of, at least. Block migration doesn't seem to check whether the device is already in use, maybe this is the problem. Not sure why it would be in use, though, and in my quick test it didn't crash. So we need some more information: What's you command line, did you do anything specific in the monitor with block devices, what does the stacktrace look like, etc.? kevin, it seems that i can very easily force a crash if I cancel a running block migration. if I understand correctly what happens there are aio callbacks coming in after blk_mig_cleanup() has been called. what is the proper way to detect this in blk_mig_read_cb()? You could try this, it doesn't detect the situation in blk_mig_read_cb(), but ensures that all callbacks happen before we do the actual cleanup (completely untested): after testing it for half an hour i can say, it seems to fix the problem. no segfaults and also no other assertions. while searching I have seen that the queses blk_list and bmds_list are initialized at qemu startup. wouldn't it be better to initialize them at init_blk_migration or at least check that they are really empty? i have also seen that prev_time_offset is not initialized. thank you, peter sth like this: --- qemu-kvm-1.2.0/block-migration.c.orig2012-09-17 21:14:44.458429855 +0200 +++ qemu-kvm-1.2.0/block-migration.c2012-09-17 21:15:40.599736962 +0200 @@ -311,8 +311,12 @@ static void init_blk_migration(QEMUFile block_mig_state.prev_progress = -1; block_mig_state.bulk_completed = 0; block_mig_state.total_time = 0; +block_mig_state.prev_time_offset = 0; block_mig_state.reads = 0; +QSIMPLEQ_INIT(&block_mig_state.bmds_list); +QSIMPLEQ_INIT(&block_mig_state.blk_list); + bdrv_iterate(init_blk_migration_it, NULL); } @@ -760,9 +764,6 @@ SaveVMHandlers savevm_block_handlers = { void blk_mig_init(void) { -QSIMPLEQ_INIT(&block_mig_state.bmds_list); -QSIMPLEQ_INIT(&block_mig_state.blk_list); - register_savevm_live(NULL, "block", 0, 1, &savevm_block_handlers, &block_mig_state); } diff --git a/block-migration.c b/block-migration.c index 7def8ab..ed93301 100644 --- a/block-migration.c +++ b/block-migration.c @@ -519,6 +519,8 @@ static void blk_mig_cleanup(void) BlkMigDevState *bmds; BlkMigBlock *blk; +bdrv_drain_all(); + set_dirty_tracking(0); while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) { -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0
On 09/17/12 22:12, Peter Lieven wrote: On 09/17/12 10:41, Kevin Wolf wrote: Am 16.09.2012 12:13, schrieb Peter Lieven: Hi, when trying to block migrate a VM from one node to another, the source VM crashed with the following assertion: block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed. Is this sth already addresses/known? Not that I'm aware of, at least. Block migration doesn't seem to check whether the device is already in use, maybe this is the problem. Not sure why it would be in use, though, and in my quick test it didn't crash. So we need some more information: What's you command line, did you do anything specific in the monitor with block devices, what does the stacktrace look like, etc.? kevin, it seems that i can very easily force a crash if I cancel a running block migration. if I understand correctly what happens there are aio callbacks coming in after blk_mig_cleanup() has been called. what is the proper way to detect this in blk_mig_read_cb()? Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0
On 09/17/12 10:41, Kevin Wolf wrote:
Am 16.09.2012 12:13, schrieb Peter Lieven:
Hi, when trying to block migrate a VM from one node to another, the source VM crashed with the following assertion: block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed. Is this something already addressed/known?
Not that I'm aware of, at least. Block migration doesn't seem to check whether the device is already in use, maybe this is the problem. Not sure why it would be in use, though, and in my quick test it didn't crash. So we need some more information: What's your command line, did you do anything specific in the monitor with block devices, what does the stacktrace look like, etc.?

I was also able to reproduce a "flush_blks: Assertion `block_mig_state.read_done >= 0' failed." by cancelling a block migration and restarting it afterwards. However, how can I grab a stack trace after an assert?

Thanks,
Peter

Kevin
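One generic way to get the stack trace Kevin asks for (an illustrative gdb session, not specific to this setup): run the source qemu-kvm under gdb and let it stop on the SIGABRT raised by the failed assertion.

gdb --args /usr/bin/qemu-kvm-1.2.0 <usual options>
(gdb) run
    ... reproduce the block migration cancel ...
(gdb) bt full                 # backtrace of the aborting thread
(gdb) thread apply all bt     # backtraces of all threads

Alternatively, raise the core file limit with "ulimit -c unlimited", reproduce the abort, and open the dump afterwards with "gdb /usr/bin/qemu-kvm-1.2.0 core".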
Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0
On 09/17/12 10:41, Kevin Wolf wrote: Am 16.09.2012 12:13, schrieb Peter Lieven: Hi, when trying to block migrate a VM from one node to another, the source VM crashed with the following assertion: block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed. Is this sth already addresses/known? Not that I'm aware of, at least. Block migration doesn't seem to check whether the device is already in use, maybe this is the problem. Not sure why it would be in use, though, and in my quick test it didn't crash. It seems that it only happens if a vServer that has been block migrated earlier is block migrated the next time. So we need some more information: What's you command line, did you do anything specific in the monitor with block devices, what does the stacktrace look like, etc.? Here is my cmdline: /usr/bin/qemu-kvm-1.2.0 -net tap,vlan=164,script=no,downscript=no,ifname=tap0 -net nic,vlan =164,model=e1000,macaddr=52:54:00:ff:01:19 -drive format=host_device,file=/dev/7cf58855099771c2/lieven-storage-migration-t-hd0,if=virtio,cache=none,aio=nat ive -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4001,server,nowait -vnc :1 -qmp tcp:0:3001,server,nowait -name 'lieven-storage-migration-test' -boot or der=dc,menu=off -k de -incoming tcp:172.21.55.34:5001 -pidfile /var/run/qemu/vm-254.pid -mem-path /hugepages -mem-prealloc -rtc base=utc -usb -usbdevice tablet -no -hpet -vga cirrus -cpu host,+x2apic,model_id='Intel(R) Xeon(R) CPU L5640 @ 2.27GHz',-tsc I have seen other errors as well in the meantime: block-migration.c:471: flush_blks: Assertion `block_mig_state.read_done >= 0' failed. qemu-kvm-1.2.0[27851]: segfault at 7f00746e78d7 ip 7f67eca6226d sp 7fff56ae3340 error 4 in qemu-system-x86_64[7f67ec9e9000+418000] I will now try to catch the situation in the debugger. Thanks, Peter Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Block Migration and xbzrle
Hi, I remember that this was broken some time ago and currently with qemu-kvm 1.2.0 I am still not able to use block migration plus xbzrle. The migration fails if both are used together. XBZRLE without block migration works. Can someone please advise what is the current expected behaviour? Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
qemu-kvm and XenServer missing MSRs
Hi,

I have seen some recent threads about running Xen as a guest. For me it is still not working, but I have read that Avi is working on some fixes. I have seen in the logs that the following MSRs are missing. Maybe this is related:

cpu0 unhandled rdmsr: 0xce
cpu0 disabled perfctr wrmsr: 0xc1 data 0x0
cpu0 disabled perfctr wrmsr: 0xc2 data 0x0
cpu0 disabled perfctr wrmsr: 0x186 data 0x13003c
cpu0 disabled perfctr wrmsr: 0xc1 data 0xfea6c644
cpu0 disabled perfctr wrmsr: 0x186 data 0x53003c

I had started a different thread dealing with memtest not working on Nehalem CPUs; at least 0xce was also involved there.

Peter
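As a stopgap while these MSRs are unimplemented, the kvm module has an ignore_msrs parameter that turns unhandled rdmsr into read-as-zero and drops unhandled wrmsr instead of injecting #GP into the guest. This only hides the accesses, it does not implement them, and whether that is enough for Xen as a guest is untested here:

# at module load time
modprobe kvm ignore_msrs=1

# or at runtime
echo 1 > /sys/module/kvm/parameters/ignore_msrs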
Block Migration Assertion in qemu-kvm 1.2.0
Hi,

when trying to block migrate a VM from one node to another, the source VM crashed with the following assertion:

block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this something already addressed/known?

Thanks,
Peter
Re: memtest 4.20+ does not work with -cpu host
On 13.09.2012 14:42, Gleb Natapov wrote: On Thu, Sep 13, 2012 at 02:05:23PM +0200, Peter Lieven wrote: On 13.09.2012 10:05, Gleb Natapov wrote: On Thu, Sep 13, 2012 at 10:00:26AM +0200, Paolo Bonzini wrote: Il 13/09/2012 09:57, Gleb Natapov ha scritto: #rdmsr -0 0x194 00011100 #rdmsr -0 0xce 0c0004011103 Yes, that can help implementing it in KVM. But without a spec to understand what the bits actually mean, it's just as risky... Peter, do you have any idea where to get the spec of the memory controller MSRs in Nehalem and newer processors? Apparently, memtest is using them (and in particular 0x194) to find the speed of the FSB, or something like that. Why would anyone will want to run memtest in a vm? May be just add those MSRs to ignore list and that's it. >From the output it looks like it's basically a list of bits. Returning something sensible is better, same as for the speed scaling MSRs. Everything is list of bits in computers :) At least 0xce is documented in SDM. It cannot be implemented in a migration safe manner. What do you suggest just say memtest does not work? Why do you want to run it in a guest? Testing memory thorughput of different host memory layouts/settings (hugepages, ksm etc.). Stress testing new settings and qemu-kvm builds. Testing new nodes with a VM which claims all available pages. Its a lot easier than booting a node with a CD and attaching to the Console. This, of course, is all not missing critical and call also be done with cpu model qemu64. I just came across memtest no longer working and where wondering if there is a general regressing. BTW, from http://opensource.apple.com/source/xnu/xnu-1228.15.4/osfmk/i386/tsc.c?txt #define MSR_FLEX_RATIO 0x194 #define MSR_PLATFORM_INFO 0x0ce #define BASE_NHM_CLOCK_SOURCE 1ULL #define CPUID_MODEL_NEHALEM 26 switch (cpuid_info()->cpuid_model) { case CPUID_MODEL_NEHALEM: { uint64_t cpu_mhz; uint64_t msr_flex_ratio; uint64_t msr_platform_info; /* See if FLEX_RATIO is being used */ msr_flex_ratio = rdmsr64(MSR_FLEX_RATIO); msr_platform_info = rdmsr64(MSR_PLATFORM_INFO); flex_ratio_min = (uint32_t)bitfield(msr_platform_info, 47, 40); flex_ratio_max = (uint32_t)bitfield(msr_platform_info, 15, 8); /* No BIOS-programed flex ratio. Use hardware max as default */ tscGranularity = flex_ratio_max; if (msr_flex_ratio & bit(16)) { /* Flex Enabled: Use this MSR if less than max */ flex_ratio = (uint32_t)bitfield(msr_flex_ratio, 15, 8); if (flex_ratio < flex_ratio_max) tscGranularity = flex_ratio; } /* If EFI isn't configured correctly, use a constant * value. See 6036811. */ if (busFreq == 0) busFreq = BASE_NHM_CLOCK_SOURCE; cpu_mhz = tscGranularity * BASE_NHM_CLOCK_SOURCE; kprintf("[NHM] Maximum Non-Turbo Ratio = [%d]\n", (uint32_t)tscGranularity); kprintf("[NHM] CPU: Frequency = %6d.%04dMhz\n", (uint32_t)(cpu_mhz / Mega), (uint32_t)(cpu_mhz % Mega)); break; } Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
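Gleb's "add those MSRs to the ignore list" would look roughly like the sketch below in kvm_get_msr_common(), with the raw MSR numbers taken from the memtest/xnu snippets above (illustrative only, not a patch). Note that reading back 0 avoids the #GP and the triple fault, but gives memtest a ratio of 0, so the frequency calculation may still be off; a sensible non-zero value that stays migration-safe is exactly the open question in this thread.

	case 0xce:	/* MSR_PLATFORM_INFO in the code above */
	case 0x194:	/* MSR_FLEX_RATIO */
		/* read as zero instead of injecting #GP */
		data = 0;
		break;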
Re: memtest 4.20+ does not work with -cpu host
On 13.09.2012 10:05, Gleb Natapov wrote: On Thu, Sep 13, 2012 at 10:00:26AM +0200, Paolo Bonzini wrote: Il 13/09/2012 09:57, Gleb Natapov ha scritto: #rdmsr -0 0x194 00011100 #rdmsr -0 0xce 0c0004011103 Yes, that can help implementing it in KVM. But without a spec to understand what the bits actually mean, it's just as risky... Peter, do you have any idea where to get the spec of the memory controller MSRs in Nehalem and newer processors? Apparently, memtest is using them (and in particular 0x194) to find the speed of the FSB, or something like that. Why would anyone will want to run memtest in a vm? May be just add those MSRs to ignore list and that's it. >From the output it looks like it's basically a list of bits. Returning something sensible is better, same as for the speed scaling MSRs. Everything is list of bits in computers :) At least 0xce is documented in SDM. It cannot be implemented in a migration safe manner. What do you suggest just say memtest does not work? I am wondering why it is working with -cpu qemu64. Peter -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: memtest 4.20+ does not work with -cpu host
On 10.09.2012 14:32, Avi Kivity wrote: On 09/10/2012 03:29 PM, Peter Lieven wrote: On 09/10/12 14:21, Gleb Natapov wrote: On Mon, Sep 10, 2012 at 02:15:49PM +0200, Paolo Bonzini wrote: Il 10/09/2012 13:52, Peter Lieven ha scritto: dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd it only works without the skip. but the msr device returns all zeroes. Hmm, the strange API of the MSR device doesn't work well with dd (dd skips to 0x194 * 8 because bs is 8. You can try this program: There is rdmsr/wrmsr in msr-tools. rdmsr returns it cannot read those MSRs. regardless if I use -cpu host or -cpu qemu64. On the host. did you get my output? #rdmsr -0 0x194 00011100 #rdmsr -0 0xce 0c0004011103 cheers, peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: memtest 4.20+ does not work with -cpu host
On 09/10/12 14:32, Avi Kivity wrote: On 09/10/2012 03:29 PM, Peter Lieven wrote: On 09/10/12 14:21, Gleb Natapov wrote: On Mon, Sep 10, 2012 at 02:15:49PM +0200, Paolo Bonzini wrote: Il 10/09/2012 13:52, Peter Lieven ha scritto: dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd it only works without the skip. but the msr device returns all zeroes. Hmm, the strange API of the MSR device doesn't work well with dd (dd skips to 0x194 * 8 because bs is 8. You can try this program: There is rdmsr/wrmsr in msr-tools. rdmsr returns it cannot read those MSRs. regardless if I use -cpu host or -cpu qemu64. On the host. aaah ok: #rdmsr -0 0x194 00011100 #rdmsr -0 0xce 0c0004011103 Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: memtest 4.20+ does not work with -cpu host
On 09/10/12 14:21, Gleb Natapov wrote:
On Mon, Sep 10, 2012 at 02:15:49PM +0200, Paolo Bonzini wrote:
Il 10/09/2012 13:52, Peter Lieven ha scritto:
dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd
dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd
it only works without the skip. but the msr device returns all zeroes.
Hmm, the strange API of the MSR device doesn't work well with dd (dd skips to 0x194 * 8 because bs is 8). You can try this program:
There is rdmsr/wrmsr in msr-tools.

rdmsr returns it cannot read those MSRs, regardless if I use -cpu host or -cpu qemu64.

peter

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

/* Read one MSR via /dev/cpu/0/msr: an 8-byte pread at an offset equal to
   the MSR number. */
int rdmsr(int fd, long reg)
{
    char msg[40];
    long long val;

    sprintf(msg, "rdmsr(%#lx)", reg);
    if (pread(fd, &val, 8, reg) < 0) {
        perror(msg);
    } else {
        printf("%s: %#016llx\n", msg, val);
        fflush(stdout);
    }
    return 0;
}

int main()
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);

    if (fd < 0) {
        perror("open");
        exit(1);
    }
    rdmsr(fd, 0x194);
    rdmsr(fd, 0xCE);
    return 0;
}

Paolo
--
Gleb.
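To build and run Paolo's helper on the host (the file name is arbitrary; the msr module must be loaded so that /dev/cpu/0/msr exists):

modprobe msr
gcc -o rdmsr-test rdmsr-test.c
sudo ./rdmsr-test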
Re: memtest 4.20+ does not work with -cpu host
On 09/10/12 13:29, Paolo Bonzini wrote: Il 10/09/2012 13:06, Peter Lieven ha scritto: qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason MSR_READ rip 0x11478 info 0 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_msr: msr_read 194 = 0x0 (#GP) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_inj_exception: #GP (0x0) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason TRIPLE_FAULT rip 0x11478 info 0 0 Memory controller MSR: static float getNHMmultiplier(void) { unsigned int msr_lo, msr_hi; float coef; /* Find multiplier (by MSR) */ /* First, check if Flexible Ratio is Enabled */ rdmsr(0x194, msr_lo, msr_hi); if((msr_lo>> 16)& 1){ coef = (msr_lo>> 8)& 0xFF; } else { rdmsr(0xCE, msr_lo, msr_hi); coef = (msr_lo>> 8)& 0xFF; } return coef; } Looks like we need to emulate it since memtest only looks at the cpuid to detect an integrated memory controller. What does this return for you? dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd it only works without the skip. but the msr device returns all zeroes. peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: memtest 4.20+ does not work with -cpu host
On 09/10/12 13:29, Paolo Bonzini wrote: Il 10/09/2012 13:06, Peter Lieven ha scritto: qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason MSR_READ rip 0x11478 info 0 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_msr: msr_read 194 = 0x0 (#GP) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_inj_exception: #GP (0x0) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason TRIPLE_FAULT rip 0x11478 info 0 0 Memory controller MSR: static float getNHMmultiplier(void) { unsigned int msr_lo, msr_hi; float coef; /* Find multiplier (by MSR) */ /* First, check if Flexible Ratio is Enabled */ rdmsr(0x194, msr_lo, msr_hi); if((msr_lo>> 16)& 1){ coef = (msr_lo>> 8)& 0xFF; } else { rdmsr(0xCE, msr_lo, msr_hi); coef = (msr_lo>> 8)& 0xFF; } return coef; } Looks like we need to emulate it since memtest only looks at the cpuid to detect an integrated memory controller. What does this return for you? dd if=/dev/cpu/0/msr skip=$((0x194)) bs=8 count=1 | xxd dd if=/dev/cpu/0/msr skip=$((0xCE)) bs=8 count=1 | xxd I/O error. Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: memtest 4.20+ does not work with -cpu host
On 09/06/12 16:58, Avi Kivity wrote: On 08/22/2012 06:06 PM, Peter Lieven wrote: Hi, has anyone ever tested to run memtest with -cpu host flag passed to qemu-kvm? For me it resets when probing the chipset. With -cpu qemu64 it works just fine. Maybe this is specific to memtest, but it might be sth that can happen in other applications to. Any thoughts? Try to identify the cpu flag that causes this by removing them successively (-cpu host,-flag...). Alternatively capture a trace (http://www.linux-kvm.org/page/Tracing) look for TRIPLE_FAULT (Intel), and post the few hundred lines preceding it. Here we go: qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason EXCEPTION_NMI rip 0xd185 info 0 8307 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: load qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc60 info cf80003 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 size 4 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: unload qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc29 info cfc0009 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfc size 2 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc60 info cf80003 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 size 4 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc29 info cfe0009 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfe size 2 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason EXCEPTION_NMI rip 0xd185 info 0 8307 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: load qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc60 info cf80003 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 size 4 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_fpu: unload qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc29 info cfc0009 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfc size 2 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason 
KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc60 info cf80003 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_write at 0xcf8 size 4 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason IO_INSTRUCTION rip 0xcc29 info cfc0009 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO PARSE] rip=52265 csbase=0 len=2 insn=fí%ÿÿ flags=5 failed=0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_pio: pio_read at 0xcfc size 2 count 1 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_userspace_exit: reason KVM_EXIT_IO (2) qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit: reason EPT_MISCONFIG rip 0x86e0 info 0 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_emulate_insn: [FAILED TO PARSE] rip=34528 csbase=0 len=3 insn=F@ÒuõL$ flags=5 failed=0 qemu-kvm-1.0.1-5107 [007] 410771.148000: vcpu_match_mmio: gva 0xb873c gpa 0xb873c Write GPA qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_mmio: mmio write len 1 gpa 0xb873c val 0x6f qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_entry: vcpu 0 qemu-kvm-1.0.1-5107 [007] 410771.148000: kvm_exit
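For context, the repeating pattern in the trace above, a 4-byte pio_write to 0xcf8 followed by a 2-byte pio_read from 0xcfc or 0xcfe, is PCI configuration mechanism #1, which is how memtest's chipset probe walks config space. A minimal illustrative sketch of that access pattern follows; it is not memtest's actual code, and the iopl() setup is an assumption for a Linux user-space demo:

/*
 * Illustrative sketch of PCI configuration mechanism #1, the source of the
 * 0xcf8 writes and 0xcfc/0xcfe reads in the trace above.  Requires root
 * and an x86 Linux host; not memtest's actual probing code.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>     /* outl(), inw(), iopl() */

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Read a 16-bit value from PCI config space of bus/dev/fn at 'offset'. */
static uint16_t pci_config_read16(uint8_t bus, uint8_t dev, uint8_t fn,
                                  uint8_t offset)
{
    uint32_t address = (1u << 31)            /* enable bit             */
                     | ((uint32_t)bus << 16)
                     | ((uint32_t)dev << 11)
                     | ((uint32_t)fn  << 8)
                     | (offset & 0xFC);      /* dword-aligned register */

    outl(address, PCI_CONFIG_ADDRESS);       /* 4-byte write to 0xcf8  */
    /* 2-byte read from 0xcfc or 0xcfe, depending on the offset parity */
    return inw(PCI_CONFIG_DATA + (offset & 2));
}

int main(void)
{
    if (iopl(3) < 0) {
        perror("iopl");
        return 1;
    }
    /* Vendor ID (offset 0) and device ID (offset 2) of bus 0, dev 0, fn 0,
     * typically the host bridge and the first thing a chipset probe reads. */
    printf("vendor=%04x device=%04x\n",
           pci_config_read16(0, 0, 0, 0),
           pci_config_read16(0, 0, 0, 2));
    return 0;
}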
memtest 4.20+ does not work with -cpu host
Hi, has anyone ever tried running memtest with the -cpu host flag passed to qemu-kvm? For me it resets when probing the chipset. With -cpu qemu64 it works just fine. Maybe this is specific to memtest, but it might be something that can happen in other applications too. Any thoughts? Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
On 08/21/12 10:23, Stefan Hajnoczi wrote: On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka wrote: On 2012-08-19 11:42, Avi Kivity wrote: On 08/17/2012 06:04 PM, Jan Kiszka wrote: Can anyone imagine that such a barrier may actually be required? If it is currently possible that env->stop is evaluated before we called into sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the signal without properly processing its reason (stop). Should not be required (TM): Both signal eating / stop checking and stop setting / signal generation happens under the BQL, thus the ordering must not make a difference here. Agree. Don't see where we could lose a signal. Maybe due to a subtle memory corruption that sets thread_kicked to non-zero, preventing the kicking this way. Cannot be ruled out, yet too much of a coincidence. Could be a kernel bug (either in kvm or elsewhere), we've had several before in this area. Is this reproducible? Not for me. Peter only hit it very rarely, Peter obviously more easily. I have only hit this once and was not able to reproduce it. For me it was very reproducible, but my issue was fixed by: http://www.mail-archive.com/kvm@vger.kernel.org/msg70908.html Never seen this since then, Peter Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
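To make the ordering question concrete, here is a minimal sketch of the pattern being discussed: drain any pending kick with a zero-timeout sigtimedwait(), then inspect the stop flag, with both sides holding the same lock (the role the BQL plays above). This is not the actual qemu_kvm_eat_signals() code; SIG_IPI is assumed to be SIGUSR1 and the data structures are simplified:

#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

#define SIG_IPI SIGUSR1   /* assumption: the kick signal */

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool vcpu_stop_requested;         /* only written under big_lock */

/* Drain pending kick signals without blocking (zero timeout).  Assumes
 * SIG_IPI is blocked in the vcpu thread's mask so kicks stay pending. */
static void eat_kick_signals(void)
{
    sigset_t set;
    siginfo_t info;
    struct timespec zero = { 0, 0 };

    sigemptyset(&set);
    sigaddset(&set, SIG_IPI);
    while (sigtimedwait(&set, &info, &zero) == SIG_IPI) {
        /* nothing to do: the kick only exists to force an exit */
    }
}

/* vcpu thread, after returning from KVM_RUN: eat the kick, then look at
 * why it was sent.  Both steps happen under the same lock ... */
static bool check_stop(void)
{
    pthread_mutex_lock(&big_lock);
    eat_kick_signals();
    bool stop = vcpu_stop_requested;
    pthread_mutex_unlock(&big_lock);
    return stop;
}

/* ... as do setting the reason and sending the signal on the
 * pause_all_vcpus() side, which is why no extra barrier is needed. */
static void request_stop(pthread_t vcpu_thread)
{
    pthread_mutex_lock(&big_lock);
    vcpu_stop_requested = true;
    pthread_kill(vcpu_thread, SIG_IPI);
    pthread_mutex_unlock(&big_lock);
}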
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 05.07.2012 10:51, Xiao Guangrong wrote: On 06/28/2012 05:11 PM, Peter Lieven wrote: that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) There are two mmio emulation after user-space-exit, it is caused by mmio read access which spans two pages. But it should be fixed by: commit f78146b0f9230765c6315b2e14f56112513389ad Author: Avi Kivity Date: Wed Apr 18 19:22:47 2012 +0300 KVM: Fix page-crossing MMIO MMIO that are split across a page boundary are currently broken - the code does not expect to be aborted by the exit to userspace for the first MMIO fragment. This patch fixes the problem by generalizing the current code for handling 16-byte MMIOs to handle a number of "fragments", and changes the MMIO code to create those fragments. Signed-off-by: Avi Kivity Signed-off-by: Marcelo Tosatti Could you please pull the code from: https://git.kernel.org/pub/scm/virt/kvm/kvm.git and trace it again? Thank you very much, this fixes the issue I have seen. Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
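The "fragments" in the commit quoted above come down to splitting one guest access at page boundaries so that each piece can be completed by its own exit to userspace. Below is a minimal sketch of just that splitting arithmetic; the structure and function names are illustrative and not KVM's own:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL

struct mmio_fragment {
    uint64_t gpa;
    unsigned len;
};

/* Split [gpa, gpa+len) at page boundaries; returns the fragment count. */
static int split_mmio(uint64_t gpa, unsigned len,
                      struct mmio_fragment *frags, int max_frags)
{
    int n = 0;

    while (len && n < max_frags) {
        unsigned in_page = PAGE_SIZE - (gpa & (PAGE_SIZE - 1));
        unsigned now = len < in_page ? len : in_page;

        frags[n].gpa = gpa;
        frags[n].len = now;
        n++;
        gpa += now;
        len -= now;
    }
    return n;
}

int main(void)
{
    /* Example from the thread: a 4-byte access starting one byte below
     * 0xa0000, giving a 1-byte RAM piece and a 3-byte MMIO piece. */
    struct mmio_fragment f[2];
    int n = split_mmio(0x9ffff, 4, f, 2);

    for (int i = 0; i < n; i++)
        printf("fragment %d: gpa=0x%llx len=%u\n",
               i, (unsigned long long)f[i].gpa, f[i].len);
    return 0;
}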
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 07/03/12 15:13, Avi Kivity wrote: On 07/03/2012 04:01 PM, Peter Lieven wrote: Further output from my testing. Working: Linux 2.6.38 with included kvm module Linux 3.0.0 with included kvm module Not-Working: Linux 3.2.0 with included kvm module Linux 2.6.28 with kvm-kmod 3.4 Linux 3.0.0 with kvm-kmod 3.4 Linux 3.2.0 with kvm-kmod 3.4 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints? A bisect could tell us where the problem is. To avoid bisecting all of linux, try git bisect v3.2 v3.0 virt/kvm arch/x86/kvm here we go: commit ca7d58f375c650cf36900cb1da1ca2cc99b13393 Author: Xiao Guangrong Date: Wed Jul 13 14:31:08 2011 +0800 KVM: x86: fix broken read emulation spans a page boundary -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 07/03/12 15:25, Avi Kivity wrote: On 07/03/2012 04:15 PM, Peter Lieven wrote: On 03.07.2012 15:13, Avi Kivity wrote: On 07/03/2012 04:01 PM, Peter Lieven wrote: Further output from my testing. Working: Linux 2.6.38 with included kvm module Linux 3.0.0 with included kvm module Not-Working: Linux 3.2.0 with included kvm module Linux 2.6.28 with kvm-kmod 3.4 Linux 3.0.0 with kvm-kmod 3.4 Linux 3.2.0 with kvm-kmod 3.4 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints? A bisect could tell us where the problem is. To avoid bisecting all of linux, try git bisect v3.2 v3.0 virt/kvm arch/x86/kvm would it also be ok to bisect kvm-kmod? Yes, but note that kvm-kmod is spread across two repositories which are not often tested out of sync, so you may get build failures. ok, i just started with this with a 3.0 (good) and 3.2 (bad) vanilla kernel. i can confirm the bug and i am no starting to bisect. it will take while with my equipment if anyone has a powerful testbed to run this i would greatly appreciate help. if anyone wants to reproduce: a) v3.2 from git.kernel.org b) qemu-kvm 1.0.1 from sourceforge c) ubuntu 64-bit 12.04 server cd d) empty (e.g. all zero) hard disk image cmdline: ./qemu-system-x86_64 -m 512 -cdrom /home/lieven/Downloads/ubuntu-12.04-server-amd64.iso -hda /dev/hd1/vmtest -vnc :1 -monitor stdio -boot dc then choose boot from first harddisk and try to quit the qemu monitor with 'quit'. -> hypervisor hangs. peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race condition in qemu-kvm-1.0.1
On 07/03/12 17:54, Marcelo Tosatti wrote: On Wed, Jun 27, 2012 at 12:35:22PM +0200, Peter Lieven wrote: Hi, we recently came across multiple VMs racing and stopping working. It seems to happen when the system is at 100% cpu. One way to reproduce this is: qemu-kvm-1.0.1 with vnc-thread enabled cmdline (or similar): /usr/bin/qemu-kvm-1.0.1 -net tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name 02-debug-race -boot order=dc,menu=off -cdrom /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile /var/run/qemu/vm-221.pid -mem-prealloc -cpu host,+x2apic,model_id=Intel(R) Xeon(R) CPU L5640 @ 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus Is it reproducible without vnc thread enabled? Yes, it is. I tried it with and without. It is also even happnig with 0.12.5 where no vnc thread (and i think also iothread) is available. Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 03.07.2012 15:13, Avi Kivity wrote: On 07/03/2012 04:01 PM, Peter Lieven wrote: Further output from my testing. Working: Linux 2.6.38 with included kvm module Linux 3.0.0 with included kvm module Not-Working: Linux 3.2.0 with included kvm module Linux 2.6.28 with kvm-kmod 3.4 Linux 3.0.0 with kvm-kmod 3.4 Linux 3.2.0 with kvm-kmod 3.4 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints? A bisect could tell us where the problem is. To avoid bisecting all of linux, try git bisect v3.2 v3.0 virt/kvm arch/x86/kvm would it also be ok to bisect kvm-kmod? thanks, peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
Further output from my testing. Working: Linux 2.6.38 with included kvm module Linux 3.0.0 with included kvm module Not-Working: Linux 3.2.0 with included kvm module Linux 2.6.28 with kvm-kmod 3.4 Linux 3.0.0 with kvm-kmod 3.4 Linux 3.2.0 with kvm-kmod 3.4 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints? Thanks, Peter On 02.07.2012 17:05, Avi Kivity wrote: On 06/28/2012 12:38 PM, Peter Lieven wrote: does anyone know whats that here in handle_mmio? /* hack: Red Hat 7.1 generates these weird accesses. */ if ((addr> 0xa-4&& addr<= 0xa)&& kvm_run->mmio.len == 3) return 0; Just what it says. There is a 4-byte access to address 0x9. The first byte lies in RAM, the next three bytes are in mmio. qemu is geared to power-of-two accesses even though x86 can generate accesses to any number of bytes between 1 and 8. It appears that this has happened with your guest. It's not impossible that it's genuine. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 02.07.2012 17:05, Avi Kivity wrote: On 06/28/2012 12:38 PM, Peter Lieven wrote: does anyone know whats that here in handle_mmio? /* hack: Red Hat 7.1 generates these weird accesses. */ if ((addr> 0xa-4&& addr<= 0xa)&& kvm_run->mmio.len == 3) return 0; Just what it says. There is a 4-byte access to address 0x9. The first byte lies in RAM, the next three bytes are in mmio. qemu is geared to power-of-two accesses even though x86 can generate accesses to any number of bytes between 1 and 8. I just stumbled across the word "hack" in the comment. When the race occurs the CPU is basically reading from 0xa in an endless loop. It appears that this has happened with your guest. It's not impossible that it's genuine. I had a lot to do the last days, but I update our build environment to Ubuntu LTS 12.04 64-bit Server which is based on Linux 3.2.0. I still see the issue. If I use the kvm Module provided with the kernel it is working correctly. If I use kvm-kmod-3.4 with qemu-kvm-1.0.1 (both from sourceforge) I can reproduce the race condition. I will keep you posted when I have more evidence. Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
On 02.07.2012 09:05, Jan Kiszka wrote: On 2012-07-01 21:18, Peter Lieven wrote: Am 01.07.2012 um 10:19 schrieb Avi Kivity: On 06/28/2012 10:27 PM, Peter Lieven wrote: Am 28.06.2012 um 18:32 schrieb Avi Kivity: On 06/28/2012 07:29 PM, Peter Lieven wrote: Yes. A signal is sent, and KVM returns from the guest to userspace on pending signals. is there a description available how this process exactly works? The kernel part is in vcpu_enter_guest(), see the check for signal_pending(). But this hasn't seen changes for quite a long while. Thank you, i will have a look. I noticed a few patches that where submitted during the last year, maybe one of them is related: Switch SIG_IPI to SIGUSR1 Fix signal handling of SIG_IPI when io-thread is enabled In the first commit there is mentioned a "32-on-64-bit Linux kernel bug" is there any reference to that? http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K. Are you running 32-on-64? I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik, the isolinux loader where is see the race is 32-bit altough it is a 64-bit ubuntu lts cd image. The second case where i have seen the race is on shutdown of a Windows 2000 Server which is also 32-bit. "32-on-64" particularly means using a 32-bit QEMU[-kvm] binary on a 64-bit host kernel. What does "file qemu-system-x86_64" report about yours? Its custom build on a 64-bit linux as 64-bit application. I will try to continue to find out today whats going wrong. Any help or hints appreciated ;-) Thanks, Peter Jan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
Am 01.07.2012 um 10:19 schrieb Avi Kivity: > On 06/28/2012 10:27 PM, Peter Lieven wrote: >> >> Am 28.06.2012 um 18:32 schrieb Avi Kivity: >> >>> On 06/28/2012 07:29 PM, Peter Lieven wrote: >>>>> Yes. A signal is sent, and KVM returns from the guest to userspace on >>>>> pending signals. >>> >>>> is there a description available how this process exactly works? >>> >>> The kernel part is in vcpu_enter_guest(), see the check for >>> signal_pending(). But this hasn't seen changes for quite a long while. >> >> Thank you, i will have a look. I noticed a few patches that where submitted >> during the last year, maybe one of them is related: >> >> Switch SIG_IPI to SIGUSR1 >> Fix signal handling of SIG_IPI when io-thread is enabled >> >> In the first commit there is mentioned a "32-on-64-bit Linux kernel bug" >> is there any reference to that? > > > http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K. Are you > running 32-on-64? I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik, the isolinux loader where is see the race is 32-bit altough it is a 64-bit ubuntu lts cd image. The second case where i have seen the race is on shutdown of a Windows 2000 Server which is also 32-bit. Peter > > > -- > error compiling committee.c: too many arguments to function > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
On 28.06.2012 17:22, Jan Kiszka wrote: On 2012-06-28 17:02, Peter Lieven wrote: On 28.06.2012 15:25, Jan Kiszka wrote: On 2012-06-28 15:05, Peter Lieven wrote: Hi, i debugged my initial problem further and found out that the problem happens to be that the main thread is stuck in pause_all_vcpus() on reset or quit commands in the monitor if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the condition from while (ret == 0) to while ((ret == 0)&& !env->stop); it works, but is this the right fix? "Quit" command seems to work, but on "Reset" the VM enterns pause state. Before entering the wait loop in pause_all_vcpus, there are kicks sent to all vcpus. Now we need to find out why some of those kicks apparently don't reach the destination. can you explain shot what exactly these kicks do? does these kicks lead to leaving the kernel mode and returning to userspace? Yes. A signal is sent, and KVM returns from the guest to userspace on pending signals. is there a description available how this process exactly works? thanks peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
On 28.06.2012 15:25, Jan Kiszka wrote: On 2012-06-28 15:05, Peter Lieven wrote: Hi, i debugged my initial problem further and found out that the problem happens to be that the main thread is stuck in pause_all_vcpus() on reset or quit commands in the monitor if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the condition from while (ret == 0) to while ((ret == 0)&& !env->stop); it works, but is this the right fix? "Quit" command seems to work, but on "Reset" the VM enterns pause state. Before entering the wait loop in pause_all_vcpus, there are kicks sent to all vcpus. Now we need to find out why some of those kicks apparently don't reach the destination. can you explain shot what exactly these kicks do? does these kicks lead to leaving the kernel mode and returning to userspace? Again: - on which host kernels does this occur, and which change may have changed it? I do not see it in 3.0.0 and have also not seen it in 2.6.38. both the mainline 64-bit ubuntu-server kernels (for natty / oneiric respectively). If I compile a more recent kvm-kmod 3.3 or 3.4 on these machines, it is no longer working. - with which qemu-kvm version is it reproducible, and which commit introduced or fixed it? qemu-kvm-1.0.1 from sourceforge. to get into the scenario it is not sufficient to boot from an empty harddisk. to reproduce i have use a live cd like ubuntu-server 12.04 and choose to boot from the first harddisk. i think the isolinux loader does not check for a valid bootsector and just executes what is found in sector 0. this leads to the mmio reads i posted and 100% cpu load (most spent in kernel). at that time the monitor/qmp is still responsible. if i sent a command that pauses all vcpus, the first cpu is looping in kvm_cpu_exec and the main thread is waiting. at that time the monitor stops responding. i have also seen this issue on very old windows 2000 servers where the system fails to power off and is just halted. maybe this is also a busy loop. i will try to bisect this asap and let you know, maybe the above info helps you already to reproduce. thanks, peter I failed reproducing so far. Jan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
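Regarding the question of how this process exactly works: the kick is a signal delivered to the vcpu thread, and the kernel's run loop tests for pending signals before re-entering the guest, so KVM_RUN returns to userspace with -EINTR / KVM_EXIT_INTR. The following is only a hedged sketch of that control flow, mirroring the shape of the loop around vcpu_enter_guest() rather than the actual kernel source; the helper functions are stand-ins:

#include <errno.h>
#include <stdbool.h>

struct vcpu;

/* Stand-ins for kernel facilities. */
extern bool signal_pending_on_this_thread(void);
extern int  enter_guest_once(struct vcpu *v);     /* one VMENTER/VMEXIT cycle */
extern void set_exit_reason_intr(struct vcpu *v); /* -> KVM_EXIT_INTR         */

/* Simplified shape of the loop behind the KVM_RUN ioctl. */
int vcpu_run(struct vcpu *v)
{
    int r = 1;

    while (r > 0) {
        /*
         * The kick: pthread_kill(vcpu_thread, SIG_IPI) in userspace makes
         * this test true, so the loop stops re-entering the guest and the
         * ioctl returns -EINTR; QEMU then gets a chance to process
         * stop/reset requests before calling KVM_RUN again.
         */
        if (signal_pending_on_this_thread()) {
            set_exit_reason_intr(v);
            r = -EINTR;
            break;
        }
        r = enter_guest_once(v);   /* returns >0 to keep looping */
    }
    return r;
}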
qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
Hi, I debugged my initial problem further and found out that the problem is that the main thread is stuck in pause_all_vcpus() on reset or quit commands in the monitor if one vcpu is stuck in the do-while loop of kvm_cpu_exec. If I modify the condition from while (ret == 0) to while ((ret == 0) && !env->stop); it works, but is this the right fix? The "Quit" command seems to work, but on "Reset" the VM enters the pause state. Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
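The loop in question has roughly the following shape. This is a hedged sketch rather than a copy of qemu-kvm 1.0.1's kvm_cpu_exec(), with simplified stand-ins for the helper functions and CPUState fields, so the proposed extra condition can be seen in context:

#include <errno.h>
#include <stdbool.h>

typedef struct CPUState {
    bool stop;      /* set (under the global lock) by pause_all_vcpus() */
    bool stopped;
} CPUState;

extern int kvm_vcpu_ioctl_run(CPUState *env);     /* KVM_RUN wrapper    */
extern int handle_kvm_exit(CPUState *env);        /* MMIO/PIO/... exits */

int kvm_cpu_exec_sketch(CPUState *env)
{
    int ret;

    do {
        ret = kvm_vcpu_ioctl_run(env);
        if (ret == -EINTR || ret == -EAGAIN) {
            /* Kicked by a signal: return so the caller can notice
             * env->stop and park the vcpu for pause_all_vcpus(). */
            ret = 0;
            break;
        }
        ret = handle_kvm_exit(env);   /* 0 means "keep running" */
        /*
         * The proposed change from the mail: also leave the loop when a
         * stop was requested, even if the exit handler returned 0.
         */
    } while (ret == 0 && !env->stop);

    return ret;
}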
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 28.06.2012 11:39, Jan Kiszka wrote: On 2012-06-28 11:31, Peter Lieven wrote: On 28.06.2012 11:21, Jan Kiszka wrote: On 2012-06-28 11:11, Peter Lieven wrote: On 27.06.2012 18:54, Jan Kiszka wrote: On 2012-06-27 17:39, Peter Lieven wrote: Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop? Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well? sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig to deep - see below. kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 working on an older kernel. This step may introduce bugs of its own. Therefore my suggestion to use a "real" 3.x kernel to exclude that risk first of all. Then, bisection the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...). that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit compilcated, but maybe this is already sufficient. otherwise i will of course gather this info as well. That's only tracing KVM event, and it's tracing when things went wrong already. We may need a full trace (-e all) specifically for the period when this pattern above started. i will do that. maybe i should explain that the vcpu is executing garbage when this above starts. its basically booting from an empty harddisk. 
if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just to short. if i understand furhter correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spend in this state might be rather short. Unless you played with priorities and affinities, the Linux scheduler should provide the required time to the iothread. my concern is not that the machine hangs, just the the hypervisor is unresponsive and its impossible to reset or quit gracefully. the only way to get the hypervisor ended is via SIGKILL. Right. Even if the guest runs wild, you must be able to control the vm via the monitor etc. If not, that's a bug. what i observed just know is that the monitor is working up to the point i try to quit the hypervisor or try to reset the cpu. so we where looking at a completely wrong place... it seems that in this short excerpt, that the deadlock appears not on excution but when the vcpus shall be paused. Program received signal SIGINT, Interrupt. 0x7fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 (gdb) th
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 28.06.2012 11:39, Jan Kiszka wrote: On 2012-06-28 11:31, Peter Lieven wrote: On 28.06.2012 11:21, Jan Kiszka wrote: On 2012-06-28 11:11, Peter Lieven wrote: On 27.06.2012 18:54, Jan Kiszka wrote: On 2012-06-27 17:39, Peter Lieven wrote: Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop? Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well? sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig to deep - see below. kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 working on an older kernel. This step may introduce bugs of its own. Therefore my suggestion to use a "real" 3.x kernel to exclude that risk first of all. Then, bisection the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...). that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit compilcated, but maybe this is already sufficient. otherwise i will of course gather this info as well. That's only tracing KVM event, and it's tracing when things went wrong already. We may need a full trace (-e all) specifically for the period when this pattern above started. i will do that. maybe i should explain that the vcpu is executing garbage when this above starts. its basically booting from an empty harddisk. 
if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just to short. if i understand furhter correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spend in this state might be rather short. Unless you played with priorities and affinities, the Linux scheduler should provide the required time to the iothread. I have a 1.1GB (85MB compressed) trace-file. If you have time to look at it I could drop it somewhere. We currently run all VMs with nice 1 because we observed that this improves that controlability of the Node in case all VMs have execessive CPU load. Running the VM unniced does not change the behaviour unfortunately. Peter my concern is not that the machine hangs, just the the hypervisor is unresponsive and its impossible to reset or quit gracefully. the only way to get the hypervisor ended is via SIGKILL. Right. Even if the guest runs wild, you must be able to control the vm via the monitor etc. If not, that's a bug. Jan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body o
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
does anyone know whats that here in handle_mmio? /* hack: Red Hat 7.1 generates these weird accesses. */ if ((addr > 0xa-4 && addr <= 0xa) && kvm_run->mmio.len == 3) return 0; thanks, peter On 28.06.2012 11:31, Peter Lieven wrote: On 28.06.2012 11:21, Jan Kiszka wrote: On 2012-06-28 11:11, Peter Lieven wrote: On 27.06.2012 18:54, Jan Kiszka wrote: On 2012-06-27 17:39, Peter Lieven wrote: Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop? Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well? sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig to deep - see below. kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 working on an older kernel. This step may introduce bugs of its own. Therefore my suggestion to use a "real" 3.x kernel to exclude that risk first of all. Then, bisection the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...). that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit compilcated, but maybe this is already sufficient. otherwise i will of course gather this info as well. That's only tracing KVM event, and it's tracing when things went wrong already. We may need a full trace (-e all) specifically for the period when this pattern above started. i will do that. maybe i should explain that the vcpu is executing garbage when this above starts. 
its basically booting from an empty harddisk. if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just to short. if i understand furhter correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spend in this state might be rather short. my concern is not that the machine hangs, just the the hypervisor is unresponsive and its impossible to reset or quit gracefully. the only way to get the hypervisor ended is via SIGKILL. thanks peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 28.06.2012 11:21, Jan Kiszka wrote: On 2012-06-28 11:11, Peter Lieven wrote: On 27.06.2012 18:54, Jan Kiszka wrote: On 2012-06-27 17:39, Peter Lieven wrote: Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop? Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well? sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig to deep - see below. kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 working on an older kernel. This step may introduce bugs of its own. Therefore my suggestion to use a "real" 3.x kernel to exclude that risk first of all. Then, bisection the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...). that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit compilcated, but maybe this is already sufficient. otherwise i will of course gather this info as well. That's only tracing KVM event, and it's tracing when things went wrong already. We may need a full trace (-e all) specifically for the period when this pattern above started. i will do that. maybe i should explain that the vcpu is executing garbage when this above starts. its basically booting from an empty harddisk. if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just to short. 
If I understand further correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am I wrong here? The time spent in this state might be rather short. My concern is not that the machine hangs, just that the hypervisor is unresponsive and it is impossible to reset or quit gracefully. The only way to end the hypervisor is via SIGKILL. Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 27.06.2012 18:54, Jan Kiszka wrote: On 2012-06-27 17:39, Peter Lieven wrote: Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop? Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well? sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig to deep - see below. Then, bisection the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...). that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit compilcated, but maybe this is already sufficient. otherwise i will of course gather this info as well. thanks peter Jan [1] http://www.linux-kvm.org/page/Tracing --- Hi, we recently came across multiple VMs racing and stopping working. It seems to happen when the system is at 100% cpu. 
One way to reproduce this is: qemu-kvm-1.0.1 with vnc-thread enabled cmdline (or similar): /usr/bin/qemu-kvm-1.0.1 -net tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name 02-debug-race -boot order=dc,menu=off -cdrom /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile /var/run/qemu/vm-221.pid -mem-prealloc -cpu host,+x2apic,model_id=Intel(R) Xeon(R) CPU L5640 @ 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus it is important that the attached virtio image contains only zeroes. if the system boots from cd, select boot from first harddisk. the hypervisor then hangs at 100% cpu and neither monitor nor qmp are responsive anymore. i have also seen customers reporting this when a VM is shut down. if this is connected to the threaded vnc server it might be important to connected at this time. debug backtrace attached. Thanks, Peter -- (gdb) file /usr/bin/qemu-kvm-1.0.1 Reading symbols from /usr/bin/qemu-kvm-1.0.1...done. (gdb) attach 5145 Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 [Thread debugging using libthread_db enabled] [New Thread 0x7f54d08b9700 (LWP 5253)] [New Thread 0x7f5552757700 (LWP 5152)] [New Thread 0x7f5552f58700 (LWP 5151)] 0x7f5553c6b5a3 in select () from /lib/libc.so.6 (gdb) in
race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop? Thanks, Peter --- Hi, we recently came across multiple VMs racing and stopping working. It seems to happen when the system is at 100% cpu. One way to reproduce this is: qemu-kvm-1.0.1 with vnc-thread enabled cmdline (or similar): /usr/bin/qemu-kvm-1.0.1 -net tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name 02-debug-race -boot order=dc,menu=off -cdrom /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile /var/run/qemu/vm-221.pid -mem-prealloc -cpu host,+x2apic,model_id=Intel(R) Xeon(R) CPU L5640 @ 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus it is important that the attached virtio image contains only zeroes. if the system boots from cd, select boot from first harddisk. the hypervisor then hangs at 100% cpu and neither monitor nor qmp are responsive anymore. i have also seen customers reporting this when a VM is shut down. if this is connected to the threaded vnc server it might be important to connected at this time. debug backtrace attached. Thanks, Peter -- (gdb) file /usr/bin/qemu-kvm-1.0.1 Reading symbols from /usr/bin/qemu-kvm-1.0.1...done. (gdb) attach 5145 Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 [Thread debugging using libthread_db enabled] [New Thread 0x7f54d08b9700 (LWP 5253)] [New Thread 0x7f5552757700 (LWP 5152)] [New Thread 0x7f5552f58700 (LWP 5151)] 0x7f5553c6b5a3 in select () from /lib/libc.so.6 (gdb) info threads 4 Thread 0x7f5552f58700 (LWP 5151) 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 3 Thread 0x7f5552757700 (LWP 5152) 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 2 Thread 0x7f54d08b9700 (LWP 5253) 0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 * 1 Thread 0x7f50d700 (LWP 5145) 0x7f5553c6b5a3 in select () from /lib/libc.so.6 (gdb) thread apply all bt Thread 4 (Thread 0x7f5552f58700 (LWP 5151)): #0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 #1 0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 #2 0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 #3 0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f5557652f10) at /usr/src/qemu-kvm-1.0.1/cpus.c:740 #4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0 #5 0x7f5553c72cdd in clone () from /lib/libc.so.6 #6 0x in ?? 
() Thread 3 (Thread 0x7f5552757700 (LWP 5152)): #0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 #1 0x7f727830 in kvm_vcpu_ioctl (env=0x7f555766ae60, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 #2 0x7f72728a in kvm_cpu_exec (env=0x7f555766ae60) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 #3 0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f555766ae60) at /usr/src/qemu-kvm-1.0.1/cpus.c:740 #4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0 #5 0x7f5553c72cdd in clone () from /lib/libc.so.6 #6 0x in ?? () Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)): #0 0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x7f679f5d in qemu_cond_wait (cond=0x7f5557ede1e0, mutex=0x7f5557ede210) at qemu-thread-posix.c:113 #2 0x7f6b06a1 in vnc_worker_thread_loop (queue=0x7f5557ede1e0) at ui/vnc-jobs-async.c:222 #3 0x7f6b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at ui/vnc-jobs-async.c:318 #4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0 #5 0x7f5553c72cdd in clone () from /lib/libc.so.6 #6 0x in ?? () Thread 1 (Thread 0x7f50d700 (LWP 5145)): #0 0x7f5553c6b5a3 in select () from /lib/libc.so.6 #1 0x7f6516be in main_loop_wait (nonblocking=0) at main-loop.c:456 #2 0x7f647ad0 in main_loop () at /usr/src/qemu-kvm-1.0.1/vl.c:1482 #3 0x7f64c698 in main (argc=38, argv=0x79d894a8, envp=0x79d895e0) at /usr/src/qemu-kvm-1.0.1/vl.c:3523 (gdb) thread apply all bt full Thread 4 (Thread 0x7f5552f58700 (LWP 5151)): #0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 No symbol table info available. #1 0x7f7278
race condition in qemu-kvm-1.0.1
Hi, we recently came across multiple VMs racing and stopping working. It seems to happen when the system is at 100% cpu. One way to reproduce this is: qemu-kvm-1.0.1 with vnc-thread enabled cmdline (or similar): /usr/bin/qemu-kvm-1.0.1 -net tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name 02-debug-race -boot order=dc,menu=off -cdrom /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile /var/run/qemu/vm-221.pid -mem-prealloc -cpu host,+x2apic,model_id=Intel(R) Xeon(R) CPU L5640 @ 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus it is important that the attached virtio image contains only zeroes. if the system boots from cd, select boot from first harddisk. the hypervisor then hangs at 100% cpu and neither monitor nor qmp are responsive anymore. i have also seen customers reporting this when a VM is shut down. if this is connected to the threaded vnc server it might be important to connected at this time. debug backtrace attached. Thanks, Peter -- (gdb) file /usr/bin/qemu-kvm-1.0.1 Reading symbols from /usr/bin/qemu-kvm-1.0.1...done. (gdb) attach 5145 Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 [Thread debugging using libthread_db enabled] [New Thread 0x7f54d08b9700 (LWP 5253)] [New Thread 0x7f5552757700 (LWP 5152)] [New Thread 0x7f5552f58700 (LWP 5151)] 0x7f5553c6b5a3 in select () from /lib/libc.so.6 (gdb) info threads 4 Thread 0x7f5552f58700 (LWP 5151) 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 3 Thread 0x7f5552757700 (LWP 5152) 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 2 Thread 0x7f54d08b9700 (LWP 5253) 0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 * 1 Thread 0x7f50d700 (LWP 5145) 0x7f5553c6b5a3 in select () from /lib/libc.so.6 (gdb) thread apply all bt Thread 4 (Thread 0x7f5552f58700 (LWP 5151)): #0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 #1 0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 #2 0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 #3 0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f5557652f10) at /usr/src/qemu-kvm-1.0.1/cpus.c:740 #4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0 #5 0x7f5553c72cdd in clone () from /lib/libc.so.6 #6 0x in ?? () Thread 3 (Thread 0x7f5552757700 (LWP 5152)): #0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 #1 0x7f727830 in kvm_vcpu_ioctl (env=0x7f555766ae60, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 #2 0x7f72728a in kvm_cpu_exec (env=0x7f555766ae60) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 #3 0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f555766ae60) at /usr/src/qemu-kvm-1.0.1/cpus.c:740 #4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0 #5 0x7f5553c72cdd in clone () from /lib/libc.so.6 #6 0x in ?? 
() Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)): #0 0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x7f679f5d in qemu_cond_wait (cond=0x7f5557ede1e0, mutex=0x7f5557ede210) at qemu-thread-posix.c:113 #2 0x7f6b06a1 in vnc_worker_thread_loop (queue=0x7f5557ede1e0) at ui/vnc-jobs-async.c:222 #3 0x7f6b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at ui/vnc-jobs-async.c:318 #4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0 #5 0x7f5553c72cdd in clone () from /lib/libc.so.6 #6 0x in ?? () Thread 1 (Thread 0x7f50d700 (LWP 5145)): #0 0x7f5553c6b5a3 in select () from /lib/libc.so.6 #1 0x7f6516be in main_loop_wait (nonblocking=0) at main-loop.c:456 #2 0x7f647ad0 in main_loop () at /usr/src/qemu-kvm-1.0.1/vl.c:1482 #3 0x7f64c698 in main (argc=38, argv=0x79d894a8, envp=0x79d895e0) at /usr/src/qemu-kvm-1.0.1/vl.c:3523 (gdb) thread apply all bt full Thread 4 (Thread 0x7f5552f58700 (LWP 5151)): #0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6 No symbol table info available. #1 0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 ret = 32597 arg = 0x0 ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f5552f57e50, reg_save_area = 0x7f5552f57d90}} #2 0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at /usr/src/qemu-kvm-1.0
Re: qemu-kvm-1.0 crashes with threaded vnc server?
On 13.03.2012 16:06, Alexander Graf wrote: On 13.03.2012, at 16:05, Corentin Chary wrote: On Tue, Mar 13, 2012 at 12:29 PM, Peter Lieven wrote: On 11.02.2012 09:55, Corentin Chary wrote: On Thu, Feb 9, 2012 at 7:08 PM, Peter Lieven wrote: Hi, is anyone aware if there are still problems when enabling the threaded vnc server? I saw some VMs crashing when using a qemu-kvm build with --enable-vnc-thread. qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0 error 6 in libz.so.1.2.3.3[7fec1ca75000+16000] qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp 7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000] I had no time to debug further. It seems to happen shortly after migrating, but thats uncertain. At least the segfault in libz seems to give a hint to VNC since I cannot image of any other part of qemu-kvm using libz except for VNC server. Thanks, Peter Hi Peter, I found two patches on my git tree that I sent long ago but somehow get lost on the mailing list. I rebased the tree but did not have the time (yet) to test them. http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip Feel free to try them. If QEMU segfault again, please send a full gdb backtrace / valgrind trace / way to reproduce :). Thanks, I have seen no more crashes with these to patches applied. I would suggest it would be good to push them to the master repository. Thank you, Peter Ccing Alexander, Ah, cool. Corentin, I think you're right now the closest thing we have to a maintainer for VNC. Could you please just send out a pull request for those? hi all, i suspect there is still a problem with the threaded vnc server. its just a guess, but we saw a resonable number of vms hanging in the last weeks. hanging meaning the emulation is stopped and the qemu-kvm process does no longer react, not on monitor, not on vnc, not on qmp. why i suspect the threaded vnc server is that in all cases we have analyzed this happened with an open vnc session and only on nodes with the threaded vnc server enabled. it might also be the case that this happens at a resolution change. is there anything known or has someone an idea? we are running qemu-kvm 1.0.1 with vnc: don't mess up with iohandlers in the vnc thread vnc: Limit r/w access to size of allocated memory compiled in. unfortunately, i was not yet able to reproduce this with a debugger attached. thanks, peter Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Assertion after changing display resolution
On 24.04.2012 15:34, Alon Levy wrote: On Tue, Apr 24, 2012 at 03:24:31PM +0200, Peter Lieven wrote: Hi all, I saw the following assert after chaning display resolution. This might be the cause, but i am not sure. Threaded VNC is enabled. Anyone ever seen this? qemu-kvm-1.0: malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *)&((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd&& old_size == 0) || ((unsigned long) (old_size)>= (unsigned long)__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1))& ~((2 * (sizeof(size_t))) - 1)))&& ((old_top)->size& 0x1)&& ((unsigned long)old_end& pagemask) == 0)' failed. A shot in the dark - does valgrind show anything wrong? Problem is i cannot reproduce this, but I can try running the VM in valgrind and check if there is any problem. Peter Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Assertion after changing display resolution
Hi all, I saw the following assert after changing display resolution. This might be the cause, but I am not sure. Threaded VNC is enabled. Anyone ever seen this? qemu-kvm-1.0: malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed. Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
qemu-kvm 1.0.1?
Hi, i was wondering if there will be a qemu-kvm version 1.0.1? The last tag I see here is 1.0: http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=summary Any hints? Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 27.03.2012 19:06, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 06:16:11 PM Peter Lieven wrote: On 27.03.2012 18:12, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 05:58:01 PM Peter Lieven wrote: On 27.03.2012 17:44, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 04:06:13 PM Peter Lieven wrote: On 27.03.2012 14:29, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote: On 27.03.2012 14:26, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote: On 27.03.2012 12:00, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to addto cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-ne t- 1v cpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/f f5 42 633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. 
bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? No, they are for kernel. i meant the qemu.diff file. Yes, I missed the second attachment. if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu? Looks like it. ok, so it would be interesting if it helps to avoid the pmtimer reads we observed earlier. right? Yes. first feedback: performance seems to be amazing. i cannot confirm that it breaks hv_spinlocks, hv_vapic and hv_relaxed. why did you assume this? I didn't mean that hv_refcnt will break any other hyper-v features. I just want to say that turning hv_refcnt on (as any other hv_ option) will crash Win8 on boot-up. yes, i got it meanwhile ;-) let me know what you think should be done to further test the refcnt implementation. i would suggest to return at least 0x if msr 0x4021 is read. IIRC Win7(W2k8R2) only reads this MSR. Win8 reads and writes. you mean win7 only writes, don't you? Oh, yes. It only writes. Actually it works this way: kernel allocates one page, maps it into the system
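The MSR being discussed here (0x4021 in the truncated quotes, presumably the Hyper-V reference TSC MSR) points the hypervisor at a single guest page holding a sequence number, a scale and an offset; a guest that finds an invalid sequence in that page falls back to the reference counter MSR, which is the "iTSC not supported" escape hatch Peter is after. Below is a hedged sketch of that page and the guest-side read, assuming the layout from Microsoft's public Hyper-V specification; the names and the __int128 multiply are illustrative assumptions, not code from this thread.

#include <stdint.h>

/* Reference TSC page layout, per the public Hyper-V TLFS (assumed). */
struct hv_reference_tsc_page {
    volatile uint32_t tsc_sequence;  /* invalid sequence => guest falls back
                                        to the reference counter MSR */
    uint32_t reserved1;
    volatile uint64_t tsc_scale;     /* 64.64 fixed-point multiplier */
    volatile int64_t  tsc_offset;
};

/* Guest-side read: reference time = ((tsc * scale) >> 64) + offset,
 * reported in 100 ns units. */
static uint64_t hv_read_tsc_page(const struct hv_reference_tsc_page *p,
                                 uint64_t tsc)
{
    return (uint64_t)(((unsigned __int128)tsc * p->tsc_scale) >> 64)
           + (uint64_t)p->tsc_offset;
}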
Re: performance trouble
On 27.03.2012 18:12, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 05:58:01 PM Peter Lieven wrote: On 27.03.2012 17:44, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 04:06:13 PM Peter Lieven wrote: On 27.03.2012 14:29, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote: On 27.03.2012 14:26, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote: On 27.03.2012 12:00, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net- 1v cpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff5 42 633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. 
could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? No, they are for kernel. i meant the qemu.diff file. Yes, I missed the second attachment. if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu? Looks like it. ok, so it would be interesting if it helps to avoid the pmtimer reads we observed earlier. right? Yes. first feedback: performance seems to be amazing. i cannot confirm that it breaks hv_spinlocks, hv_vapic and hv_relaxed. why did you assume this? I didn't mean that hv_refcnt will break any other hyper-v features. I just want to say that turning hv_refcnt on (as any other hv_ option) will crash Win8 on boot-up. yes, i got it meanwhile ;-) let me know what you think should be done to further test the refcnt implementation. i would suggest to return at least 0x if msr 0x4021 is read. IIRC Win7(W2k8R2) only reads this MSR. Win8 reads and writes. you mean win7 only writes, don't you? at least you put a break in set_msr_hyperv for this msr. i just thought that it would be ok to return the value that is defined for iTSC is not supported? peter peter Cheers, Vadim. no more pmtimer reads. i can now
Re: performance trouble
On 27.03.2012 17:44, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 04:06:13 PM Peter Lieven wrote: On 27.03.2012 14:29, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote: On 27.03.2012 14:26, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote: On 27.03.2012 12:00, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1v cpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542 633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? 
would it be possible to use it in qemu-kvm also? No, they are for kernel. i meant the qemu.diff file. Yes, I missed the second attachment. if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu? Looks like it. ok, so it would be interesting if it helps to avoid the pmtimer reads we observed earlier. right? Yes. first feedback: performance seems to be amazing. i cannot confirm that it breaks hv_spinlocks, hv_vapic and hv_relaxed. why did you assume this? I didn't mean that hv_refcnt will break any other hyper-v features. I just want to say that turning hv_refcnt on (as any other hv_ option) will crash Win8 on boot-up. yes, i got it meanwhile ;-) let me know what you think should be done to further test the refcnt implementation. i would suggest to return at least 0x if msr 0x4021 is read. peter Cheers, Vadim. no more pmtimer reads. i can now almost fully utililizy a 1GBit interface with a file transfer while there was not one cpu core fully utilized as observed with pmtimer. some live migration tests revealed that it did not crash even under load. @vadim: i think we need a proper patch for the others to test this ;-) what i observed: is it right, that HV_X64_MSR_TIME_REF_COUNT is missing in msrs_to_save[] in x86
Re: performance trouble
On 27.03.2012 17:37, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 04:44:51 PM Peter Lieven wrote: On 27.03.2012 13:43, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 12:49:58 PM Peter Lieven wrote: On 27.03.2012 12:40, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 11:26:29 AM Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcp u- cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff54263 3% 28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? Gleb mentioned that it properly handled in upstream, otherwise just comment the entire HPET section in acpi-dsdt.dsl file. i have upstream bios installed. 
so -no-hpet should disable hpet completely. can you give a hint, what "bits 1 and 9 must be set to on in leaf 0x4003" means? I mean the following code: +if (hyperv_ref_counter_enabled()) { +c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE; +c->eax |= 0x200; +} Please see attached file for more information. the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? Yes, but don't forget about kvm patch as well. ok, i will try my best. would you consider your patch a quick hack or do you think it would be worth to be uploaded to the upstream repository? It was just a brief attempt from my side, mostly inspirited by our with Gleb conversation, to see what it worth to turn this option on. It is not fully tested. It will crash Win8 (as well as the rest of the currently introduced hyper-v features). i can confirm that windows 8 installer does not start and resets the vm continously. it tries to access hv msr 0x4021 Win8 needs more comprehensive Hyper-V support. yes it seems. i read your comment wrong. i was believing the hv_refcnt breaks the other hv_features and windows 8, but i guess you said any of the hv_features will break win 8?! peter
Re: performance trouble
On 27.03.2012 13:43, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 12:49:58 PM Peter Lieven wrote: On 27.03.2012 12:40, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 11:26:29 AM Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't useto map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu- cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633% 28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? Gleb mentioned that it properly handled in upstream, otherwise just comment the entire HPET section in acpi-dsdt.dsl file. i have upstream bios installed. so -no-hpet should disable hpet completely. can you give a hint, what "bits 1 and 9 must be set to on in leaf 0x4003" means? 
I mean the following code: +if (hyperv_ref_counter_enabled()) { +c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE; +c->eax |= 0x200; +} Please see attached file for more information. the stuff you sent yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? Yes, but don't forget about the kvm patch as well. ok, i will try my best. would you consider your patch a quick hack or do you think it would be worth uploading to the upstream repository? It was just a brief attempt from my side, mostly inspired by my conversation with Gleb, to see what it is worth to turn this option on. It is not fully tested. It will crash Win8 (as well as the rest of the currently introduced hyper-v features). i can confirm that the windows 8 installer does not start and resets the vm continuously. it tries to access hv msr 0x4021 http://msdn.microsoft.com/en-us/library/windows/hardware/ff542648%28v=vs.85%29.aspx it is possible to tell the guest that the host is not iTSC (how they call it) capable. i will try to hack a patch for this. peter I wouldn't commit this code without comprehensive testing. Vadim. peter peter -- Gleb. -
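For readers wondering what "bits 1 and 9 in leaf 0x4003" (0x40000003 written out) actually advertise: in the Hyper-V CPUID feature leaf, EAX bit 1 announces the partition reference counter MSR and bit 9 (the literal 0x200 in the diff above) announces the reference TSC page MSR. The sketch below only spells that reading out; the leaf number and macro names follow the public Hyper-V documentation and are assumptions here, not text from the patch.

/* Assumed meaning of the two bits set in CPUID leaf 0x40000003, EAX. */
#define HV_CPUID_FEATURES                0x40000003u
#define HV_MSR_TIME_REF_COUNT_AVAILABLE  (1u << 1)   /* reference counter MSR */
#define HV_MSR_REFERENCE_TSC_AVAILABLE   (1u << 9)   /* == 0x200, TSC page MSR */

static void hv_fill_features_eax(unsigned int *eax)
{
    *eax |= HV_MSR_TIME_REF_COUNT_AVAILABLE;
    *eax |= HV_MSR_REFERENCE_TSC_AVAILABLE;
}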
Re: performance trouble
On 27.03.2012 14:29, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote: On 27.03.2012 14:26, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote: On 27.03.2012 12:00, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? No, they are for kernel. i meant the qemu.diff file. Yes, I missed the second attachment. 
if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu? Looks like it. ok, so it would be interesting if it helps to avoid the pmtimer reads we observed earlier. right? Yes. first feedback: performance seems to be amazing. i cannot confirm that it breaks hv_spinlocks, hv_vapic and hv_relaxed. why did you assume this? no more pmtimer reads. i can now almost fully utililizy a 1GBit interface with a file transfer while there was not one cpu core fully utilized as observed with pmtimer. some live migration tests revealed that it did not crash even under load. @vadim: i think we need a proper patch for the others to test this ;-) what i observed: is it right, that HV_X64_MSR_TIME_REF_COUNT is missing in msrs_to_save[] in x86/x86.c of the kernel module? thanks for you help, peter -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
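For context on what an in-kernel handler for HV_X64_MSR_TIME_REF_COUNT has to produce: the Hyper-V reference counter reports guest time in 100 ns units, which is why serving it from the kernel removes the PM-timer userspace exits measured above. The fragment below is only a guess at the shape of such a read handler in arch/x86/kvm/x86.c of that era; get_kernel_ns() and kvm->arch.kvmclock_offset are assumptions modelled on contemporary KVM code, not Vadim's actual patch, and exposing the MSR through msrs_to_save[] (as Peter asks) would be a separate small change.

/* Hypothetical read handler for HV_X64_MSR_TIME_REF_COUNT (0x40000020):
 * guest time since partition creation, in 100 ns ticks.  Kernel-context
 * sketch; helper and field names are assumptions. */
static u64 hv_get_time_ref_count(struct kvm *kvm)
{
        u64 now_ns = get_kernel_ns() + kvm->arch.kvmclock_offset;

        return div_u64(now_ns, 100);    /* nanoseconds -> 100 ns ticks */
}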
Re: performance trouble
On 27.03.2012 14:29, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote: On 27.03.2012 14:26, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote: On 27.03.2012 12:00, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? No, they are for kernel. i meant the qemu.diff file. Yes, I missed the second attachment. 
if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu? Looks like it. ok, so it would be interesting if it helps to avoid the pmtimer reads we observed earlier. right? Yes. ok, will try it. can you give me a quick hint whether the patch has to be applied to qemu-kvm or to the latest qemu from git? peter -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 27.03.2012 14:26, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote: On 27.03.2012 12:00, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't useto map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? No, they are for kernel. i meant the qemu.diff file. Yes, I missed the second attachment. if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu? Looks like it. 
ok, so it would be interesting if it helps to avoid the pmtimer reads we observed earlier. right? peter -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 27.03.2012 12:00, Gleb Natapov wrote: On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? No, they are for kernel. i meant the qemu.diff file. if i understand correctly i have to pass -cpu host,+hv_refcnt to qemu? peter -- Gleb. 
-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 27.03.2012 12:40, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 11:26:29 AM Peter Lieven wrote: On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? Gleb mentioned that it properly handled in upstream, otherwise just comment the entire HPET section in acpi-dsdt.dsl file. i have upstream bios installed. so -no-hpet should disable hpet completely. can you give a hint, what "bits 1 and 9 must be set to on in leaf 0x4003" means? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? 
Yes, but don't forget about kvm patch as well. ok, i will try my best. would you consider your patch a quick hack or do you think it would be worth to be uploaded to the upstream repository? peter peter -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 27.03.2012 11:23, Vadim Rozenfeld wrote: On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote: On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cp u.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28 v=vs .85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? Yes, it should be enough for Win7 / W2K8R2. To clarify the thing that microsoft qpc uses is what is implemented by the patch Vadim attached to his previous email. But I believe that additional qemu patch is needed for Windows to actually use it. You are right. bits 1 and 9 must be set to on in leaf 0x4003 and HPET should be completely removed from ACPI. could you advise how to do this and/or make a patch? the stuff you send yesterday is for qemu, right? would it be possible to use it in qemu-kvm also? peter -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 26.03.2012 20:36, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote: On Mon, Mar 26, 2012 at 07:46:03PM +0200, Vadim Rozenfeld wrote: On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote: On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.tx t.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Hi Peter, You are welcome to add an appropriate handler. I think Vadim refers to this HV MSR http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28v=vs.85 %29.aspx This one is pretty simple to support. Please see attachments for more details. I was thinking about synthetic timers http://msdn.microsoft.com/en- us/library/windows/hardware/ff542758(v=vs.85).aspx is this what microsoft qpc uses as clocksource in hyper-v? i will check tomorrow. thanks vadim -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 22.03.2012 10:38, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote: On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to addto cpu definition in XML and check command line. ok I try this but I can't useto map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? I would say weeks. Is there a way, we could contribute and help you with this? Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 22.03.2012 09:48, Vadim Rozenfeld wrote: On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov ecrivait : Try to add to cpu definition in XML and check command line. ok I try this but I can't use to map the host cpu (my libvirt is 0.9.8) so I use : Opteron_G3 (the physical server use Opteron CPU). The log is here : http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz And now with only 1 vcpu, the response time is 8.5s, great improvment. We keep this configuration for production : we check the response time when some other users are connected. please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know whats the state of in-kernel hyper-v timers? Vadim is working on it. I'll let him answer. It would be nice to have synthetic timers supported. But, at the moment, I'm only researching this feature. So it will take months at least? What do the others think, would it be feasible to make a proper in-kernel pmtimer solution in the meantime. I think Windows guest performance is very important for the success of KVM. Peter peter David. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On 22.03.2012 09:33, David Cure wrote: Le Thu, Mar 22, 2012 at 09:53:45AM +0200, Gleb Natapov ecrivait : All true. I asked to try -hypervisor only to verify where we loose performance. Since you get good result with it frequent access to PM timer is probably the reason. I do not recommend using -hypervisor for production! so if I leave cpu as previous (not defined) and only disable hpet and use 1 vcpu, it's ok for production ? this is ok, but windows will use the pm timer, so you will have bad performance. Is there a workaround for this PM access ? there exist old patches from 2010 for an in-kernel pmtimer. they work, but only partly. the problem here is that windows enables the pmtimer overflow interrupt, which that patch did not address (amongst other things). i simply ignored it and windows ran nevertheless, but i would not do this in production because i do not know which side effects it might have. there are two possible solutions: a) a real in-kernel pmtimer implementation (which would also help other systems, not only windows) b) hyper-v support in-kernel, at least partly (for the timing stuff). this is being worked on by Vadim. Peter David. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
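To make the in-kernel pmtimer idea concrete: the ACPI PM timer is a free-running 3.579545 MHz counter, commonly exposed with 24 significant bits, so the core of an emulation is just scaling the guest's nanosecond clock into those ticks; the part the old 2010 patches reportedly skipped is raising the SCI when the counter's most significant bit toggles. The snippet below illustrates only that scaling, as a standalone sketch under those assumptions (QEMU itself has a muldiv64() helper for this kind of arithmetic); it is not the old patch.

#include <stdint.h>

#define PM_TIMER_HZ    3579545ULL
#define PM_TIMER_MASK  0x00ffffffULL            /* 24-bit counter */

static uint32_t pm_timer_ticks(uint64_t guest_ns)
{
    /* split the conversion so guest_ns * PM_TIMER_HZ cannot overflow */
    uint64_t sec   = guest_ns / 1000000000ULL;
    uint64_t rem   = guest_ns % 1000000000ULL;
    uint64_t ticks = sec * PM_TIMER_HZ + (rem * PM_TIMER_HZ) / 1000000000ULL;

    return (uint32_t)(ticks & PM_TIMER_MASK);   /* wraps roughly every 4.7 s */
}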
Re: performance trouble
On 22.03.2012 09:31, David Cure wrote: Le Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven ecrivait : please keep in mind, that setting -hypervisor, disabling hpet and only one vcpu makes windows use tsc as clocksource. you have to make sure, that your vm is not switching between physical sockets on your system and that you have constant_tsc feature to have a stable tsc between the cores in the same socket. its also likely that the vm will crash when live migrated. ok, yet I "only" have disable hpet and use 1vcpu. you have to use 1 vcpu on 32-bit windows. 64-bit seems to work with more than 1 vcpu. why all those limitations: windows avoids using the tsc in a hypervisor, which is a good decision. the problem is that it falls back to pm_timer or hpet. both of them are very expensive to emulate currently, because kvm exits kernel mode and the userspace qemu-kvm handles the access. i have done experiments where i saw ~20,000 userspace exits just for pmtimer reads. this made up ~30-40% of the whole processing power. every call to a QPC timer in windows causes a pm_timer/hpet read. in particular, each i/o request seems to cause a QPC timer read and, oddly, also a lazy fpu call (but that is a different issue), which is likewise very expensive to emulate (currently). for the switching I need to pin the vcpu on 1 physical proc, right ? you need 1 vcpu for 32-bit windows and to disable -hypervisor to make windows think it runs on real hardware. it then uses the tsc. for constant_tsc, how can I check if I use it ? cat /proc/cpuinfo on the host. there should be a flag 'constant_tsc'. it might be that rdtscp is also necessary. for live migration : what is the "feature" that cause trouble : -hypervisor, hpet, vcpu or all ? using tsc as clocksource is the problem, not the features themselves. peter David. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
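A small illustration of why the constant_tsc requirement above matters: with tsc as clocksource, a QPC-style timestamp is essentially a raw rdtsc read whose differences are later divided by an assumed fixed frequency, so any per-core offset or frequency change shows up directly as a time jump in the guest. A minimal userspace sketch, assuming GCC/Clang on x86; it models the idea only and is not Windows' actual implementation.

#include <stdint.h>
#include <x86intrin.h>

/* raw counter read, the moral equivalent of a QPC sample on the tsc path */
static inline uint64_t qpc_raw(void)
{
    return __rdtsc();
}

/* scale a tick delta by an assumed fixed TSC frequency; meaningless if the
 * thread moved to a core whose TSC is offset or runs at another rate */
static uint64_t ticks_to_ns(uint64_t delta, uint64_t tsc_hz)
{
    return (delta * 1000000000ULL) / tsc_hz;    /* fine for short intervals */
}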
Re: performance trouble
On 22.03.2012 08:53, Gleb Natapov wrote: On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote: On 21.03.2012 12:10, David Cure wrote: hello, On Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov wrote: Try to add to cpu definition in XML and check command line. ok, I tried this but I can't use it to map the host cpu (my libvirt is 0.9.8), so I use Opteron_G3 (the physical server uses Opteron CPUs). The log is here: http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz And now with only 1 vcpu, the response time is 8.5s, a great improvement. We keep this configuration for production; we will check the response time when some other users are connected. Please keep in mind that setting -hypervisor, disabling hpet and using only one vcpu makes Windows use the TSC as clocksource. You have to make sure that your VM is not switching between physical sockets on your system and that you have the constant_tsc feature, to have a stable TSC between the cores in the same socket. It is also likely that the VM will crash when live migrated. All true. I asked to try -hypervisor only to verify where we lose performance. Since you get good results with it, frequent access to the PM timer is probably the reason. I do not recommend using -hypervisor for production! @gleb: do you know what the state of the in-kernel Hyper-V timers is? Vadim is working on it. I'll let him answer. @avi, gleb: another option would be to revisit the old in-kernel pm-timer implementation and check whether it is feasible to use this as an alternative. It would also help systems that are not Hyper-V aware (I think BSD and old Windows like XP). I rebased this old implementation and can confirm that it also solves the performance slowdown. Peter
Re: performance trouble
On 21.03.2012 12:10, David Cure wrote: hello, On Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov wrote: Try to add to cpu definition in XML and check command line. ok, I tried this but I can't use it to map the host cpu (my libvirt is 0.9.8), so I use Opteron_G3 (the physical server uses Opteron CPUs). The log is here: http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz And now with only 1 vcpu, the response time is 8.5s, a great improvement. We keep this configuration for production; we will check the response time when some other users are connected. Please keep in mind that setting -hypervisor, disabling hpet and using only one vcpu makes Windows use the TSC as clocksource. You have to make sure that your VM is not switching between physical sockets on your system and that you have the constant_tsc feature, to have a stable TSC between the cores in the same socket. It is also likely that the VM will crash when live migrated. @gleb: do you know what the state of the in-kernel Hyper-V timers is? Peter
Re: qemu-kvm-1.0 crashes with threaded vnc server?
On 11.02.2012 09:55, Corentin Chary wrote: On Thu, Feb 9, 2012 at 7:08 PM, Peter Lieven wrote: Hi, is anyone aware if there are still problems when enabling the threaded VNC server? I saw some VMs crashing when using a qemu-kvm build with --enable-vnc-thread. qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0 error 6 in libz.so.1.2.3.3[7fec1ca75000+16000] qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp 7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000] I had no time to debug further. It seems to happen shortly after migrating, but that's uncertain. At least the segfault in libz seems to give a hint to VNC, since I cannot imagine any other part of qemu-kvm using libz except for the VNC server. Thanks, Peter Hi Peter, I found two patches on my git tree that I sent long ago but that somehow got lost on the mailing list. I rebased the tree but did not have the time (yet) to test them. http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip Feel free to try them. If QEMU segfaults again, please send a full gdb backtrace / valgrind trace / way to reproduce :). Thanks, I have seen no more crashes with these two patches applied. I would suggest pushing them to the master repository. Thank you, Peter
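For anyone hitting this, one way to capture the backtrace Corentin asks for (a sketch; the PID, binary path and core location are placeholders for the local setup):

    # Option 1: attach gdb to the running qemu-kvm process (1234 is a placeholder PID)
    # and wait for the segfault, then dump all thread backtraces.
    gdb -p 1234 -ex 'handle SIGSEGV stop nopass' -ex continue
    # after the crash, inside gdb:
    #   (gdb) thread apply all bt full

    # Option 2: allow core dumps and inspect the core afterwards.
    ulimit -c unlimited
    gdb /usr/bin/qemu-kvm /path/to/core -ex 'thread apply all bt full' -ex quit

    # Option 3: run the VM under valgrind (very slow, but catches the invalid write early).
    valgrind --trace-children=yes --log-file=qemu-valgrind.log qemu-kvm [usual options]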
Re: linux guests and ksm performance
On 28.02.2012 14:16, Avi Kivity wrote: On 02/24/2012 08:41 AM, Stefan Hajnoczi wrote: I don't think that it is CPU intensive. All user pages are zeroed anyway, but at allocation time it shouldn't be a big difference in terms of CPU power. It's easy to find a scenario where eagerly zeroing pages is wasteful. Imagine a process that uses all of physical memory. Once it terminates the system is going to run processes that only use a small set of pages. It's pointless zeroing all those pages if we're not going to use them anymore. In the long term, we will use them, except if the guest is completely idle. The scenario in which zeroing is expensive is when the page is refilled through DMA. In that case the zeroing was wasted. This is a pretty common scenario in pagecache intensive workloads. Avi, what do you think of the proposal to give the guest VM a hint that the host is running KSM? In that case the administrator has already chosen that saving physical memory is more important to him than performance. Peter
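For context, the host-side KSM state such a hint would reflect is already exposed in sysfs; a minimal sketch of checking and enabling it on the host (paths as in mainline kernels of that era):

    # Is KSM running? (0 = off, 1 = run, 2 = unmerge everything and stop)
    cat /sys/kernel/mm/ksm/run

    # Enable it and see how much is actually being merged.
    echo 1 > /sys/kernel/mm/ksm/run
    grep . /sys/kernel/mm/ksm/pages_shared \
           /sys/kernel/mm/ksm/pages_sharing \
           /sys/kernel/mm/ksm/full_scans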
Re: [Qemu-devel] linux guests and ksm performance
On 28.02.2012 13:05, Stefan Hajnoczi wrote: On Tue, Feb 28, 2012 at 11:46 AM, Peter Lieven wrote: On 24.02.2012 08:23, Stefan Hajnoczi wrote: On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi wrote: On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi wrote: On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com wrote: Stefan Hajnoczi wrote: On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven wrote: However, in a virtual machine I have not observed the above slowdown to that extent, while the benefit of zero after free in a virtualisation environment is obvious: 1) zero pages can easily be merged by KSM or another technique. 2) zero (dup) pages are a lot faster to transfer in case of migration. The other approach is a memory page "discard" mechanism - which obviously requires more code changes than zeroing freed pages. The advantage is that we don't take the brute-force and CPU intensive approach of zeroing pages. It would be like a fine-grained ballooning feature. I don't think that it is CPU intensive. All user pages are zeroed anyway, but at allocation time it shouldn't be a big difference in terms of CPU power. It's easy to find a scenario where eagerly zeroing pages is wasteful. Imagine a process that uses all of physical memory. Once it terminates the system is going to run processes that only use a small set of pages. It's pointless zeroing all those pages if we're not going to use them anymore. Perhaps the middle path is to zero pages but do it after a grace timeout. I wonder if this helps eliminate the 2-3% slowdown you noticed when compiling. Gah, it's too early in the morning. I don't think this timer actually makes sense. Do you think it then makes sense to make a patchset/proposal to notify a guest kernel about the presence of KSM in the host and switch to zero after free? I think your idea is interesting - whether or not people are happy with it will depend on the performance impact. It seems reasonable to me. Could you support/help me in implementing and publishing this approach? Peter
Re: qemu-kvm-1.0 crashes with threaded vnc server?
On 28.02.2012 09:37, Corentin Chary wrote: On Mon, Feb 13, 2012 at 10:24 AM, Peter Lieven wrote: On 11.02.2012 at 09:55, Corentin Chary wrote: On Thu, Feb 9, 2012 at 7:08 PM, Peter Lieven wrote: Hi, is anyone aware if there are still problems when enabling the threaded VNC server? I saw some VMs crashing when using a qemu-kvm build with --enable-vnc-thread. qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0 error 6 in libz.so.1.2.3.3[7fec1ca75000+16000] qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp 7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000] I had no time to debug further. It seems to happen shortly after migrating, but that's uncertain. At least the segfault in libz seems to give a hint to VNC, since I cannot imagine any other part of qemu-kvm using libz except for the VNC server. Thanks, Peter Hi Peter, I found two patches on my git tree that I sent long ago but that somehow got lost on the mailing list. I rebased the tree but did not have the time (yet) to test them. http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip Feel free to try them. If QEMU segfaults again, please send a full gdb backtrace / valgrind trace / way to reproduce :). Thanks, Hi Corentin, thanks for rebasing those patches. I remember that I have seen them the last time I noticed (about 1 year ago) that the threaded VNC was crashing. I'm on vacation this week, but I will test them next week and let you know if I can force a crash with them applied. If not, we should consider including them asap. Hi Peter, any news on that? Sorry, I had much trouble debugging nasty slow Windows VM problems during the last 2 weeks, but it's still on my list. I'll keep you all posted. Peter
Re: [Qemu-devel] linux guests and ksm performance
On 24.02.2012 08:23, Stefan Hajnoczi wrote: On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi wrote: On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi wrote: On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com wrote: Stefan Hajnoczi wrote: On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven wrote: However, in a virtual machine I have not observed the above slowdown to that extent, while the benefit of zero after free in a virtualisation environment is obvious: 1) zero pages can easily be merged by KSM or another technique. 2) zero (dup) pages are a lot faster to transfer in case of migration. The other approach is a memory page "discard" mechanism - which obviously requires more code changes than zeroing freed pages. The advantage is that we don't take the brute-force and CPU intensive approach of zeroing pages. It would be like a fine-grained ballooning feature. I don't think that it is CPU intensive. All user pages are zeroed anyway, but at allocation time it shouldn't be a big difference in terms of CPU power. It's easy to find a scenario where eagerly zeroing pages is wasteful. Imagine a process that uses all of physical memory. Once it terminates the system is going to run processes that only use a small set of pages. It's pointless zeroing all those pages if we're not going to use them anymore. Perhaps the middle path is to zero pages but do it after a grace timeout. I wonder if this helps eliminate the 2-3% slowdown you noticed when compiling. Gah, it's too early in the morning. I don't think this timer actually makes sense. Do you think it then makes sense to make a patchset/proposal to notify a guest kernel about the presence of KSM in the host and switch to zero after free? Peter
Re: linux guests and ksm performance
On 24.02.2012 at 08:23, Stefan Hajnoczi wrote: > On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi wrote: >> On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi wrote: >>> On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com >>> wrote: >>>> Stefan Hajnoczi wrote: >>>> >>>>> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven wrote: >>>>>> However, in a virtual machine I have not observed the above slowdown >>>>> to >>>>>> that extent >>>>>> while the benefit of zero after free in a virtualisation environment >>>>> is >>>>>> obvious: >>>>>> >>>>>> 1) zero pages can easily be merged by ksm or other technique. >>>>>> 2) zero (dup) pages are a lot faster to transfer in case of >>>>> migration. >>>>> >>>>> The other approach is a memory page "discard" mechanism - which >>>>> obviously requires more code changes than zeroing freed pages. >>>>> >>>>> The advantage is that we don't take the brute-force and CPU intensive >>>>> approach of zeroing pages. It would be like a fine-grained ballooning >>>>> feature. >>>>> >>>> >>>> I don't think that it is CPU intensive. All user pages are zeroed anyway, but >>>> at allocation time it shouldn't be a big difference in terms of CPU power. >>> >>> It's easy to find a scenario where eagerly zeroing pages is wasteful. >>> Imagine a process that uses all of physical memory. Once it >>> terminates the system is going to run processes that only use a small >>> set of pages. It's pointless zeroing all those pages if we're not >>> going to use them anymore. >> >> Perhaps the middle path is to zero pages but do it after a grace >> timeout. I wonder if this helps eliminate the 2-3% slowdown you >> noticed when compiling. > > Gah, it's too early in the morning. I don't think this timer actually > makes sense. ok, that would be the idea of an asynchronous page zeroing in the guest. I also think this is too complicated. Maybe the other idea is too simple: is it possible to give the guest a hint that KSM is enabled on the host (let's say in a way like it's done with kvmclock)? If KSM is enabled on the host, the administrator has already made the decision that performance is not so important and he/she is eager to save physical memory. What if, if and only if this flag is set, the guest switched from zero-on-allocate to zero-after-free? I think the whole thing is less than 10-20 lines of code, and it's code that has been proven to work well in grsecurity for ages. This might introduce a little (2-3%) overhead, but only if a lot of non GFP_FREE memory is allocated, and it's definitely faster than swapping. Of course, it has to be guaranteed that this code does not slow down normal systems due to additional branches (would it be enough to mark the if statements as unlikely()?) Peter
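Until a hint like this exists, the effect being discussed can be approximated by hand from inside a Linux guest: zero-filling free memory once leaves those pages zeroed from the host's point of view, so KSM can merge them and live migration can send them as duplicate pages. A rough sketch (the size and path are placeholders; this competes with the guest's page cache, so size it well below the free memory reported by free -m):

    # Crude stand-in for zero-after-free: overwrite part of the guest's free memory
    # with zeros so the host can merge it via KSM or skip it cheaply during migration.
    dd if=/dev/zero of=/tmp/zerofill bs=1M count=1024
    rm -f /tmp/zerofill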