Re: kvm-unit-test failures
Il 03/09/2014 20:25, Chris J Arges ha scritto: snip I'm not sure about the reason for the warp, but indeed the offset and uptime match (I'll check them against the trace tomorrow) so it's just that the VM's TSC base is not taken into account correctly. Can you gather another trace with the problematic patch reverted? Paolo Here is the third trace running with 0d3da0d2 reverted from the latest kvm queue branch 11cc9ea3: http://people.canonical.com/~arges/kvm/trace-3.dat.xz Thanks! And---yay!---I reproduced it on another machine. Paolo $ uptime 18:25:13 up 5 min, 1 user, load average: 0.21, 0.74, 0.44 qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409768537 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44e520 kvm-clock: cpu 0, msr 0x:44e520 Wallclock test, threshold 5 Seconds get from host: 1409768537 Seconds get from kvmclock: 1409768538 Offset:1 Wallclock test, threshold 5 Seconds get from host: 1409768537 Seconds get from kvmclock: 1409768538 Offset:1 Check the stability of raw cycle ... Total vcpus: 2 Test loops: 1000 Total warps: 0 Total stalls: 0 Worst warp: 0 Raw cycle is stable Monotonic cycle test: Total vcpus: 2 Test loops: 1000 Total warps: 0 Total stalls: 0 Worst warp: 0 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1241970306 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 3266701026 Return value from qemu: 1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
Il 04/09/2014 11:53, Paolo Bonzini ha scritto: Il 03/09/2014 20:25, Chris J Arges ha scritto: snip I'm not sure about the reason for the warp, but indeed the offset and uptime match (I'll check them against the trace tomorrow) so it's just that the VM's TSC base is not taken into account correctly. Can you gather another trace with the problematic patch reverted? Paolo Here is the third trace running with 0d3da0d2 reverted from the latest kvm queue branch 11cc9ea3: http://people.canonical.com/~arges/kvm/trace-3.dat.xz Thanks! And---yay!---I reproduced it on another machine. And my bisection landed on the merge of the timer branch (commit e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f). Here is the log: $ git bisect bad origin/master $ git bisect good v3.16 $ git bisect good kvm-3.17-1 # 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b good[ 6929.863545] loaded kvm module (v3.17-rc1-158-g451fd72219dd) bad [ 6971.932790] loaded kvm module (for-linus) bad [ 7216.073128] loaded kvm module (v3.16-6426-gae045e245542) good[ 7286.198948] loaded kvm module (v3.16-3283-g53ee983378ff) good[ 7350.534060] loaded kvm module (v3.16-rc7-1668-gaef4f5b6db65) good[ 7439.037038] loaded kvm module (v3.16-4006-g91c2ff7708d4) good[ 7481.188637] loaded kvm module (v3.16-rc6-450-g7ba3c21c17d0) bad [ 7535.292730] loaded kvm module (v3.16-4635-ge7fda6c4c3c1) good[ 7589.722691] loaded kvm module (v3.16-rc5-110-g9b0fd802e8c0) good[ 7630.286418] loaded kvm module (v3.16-4467-ged5c41d30ef2) good[ 7712.470986] loaded kvm module (v3.16-rc1-35-g885d078bfe92) good[ 7763.443626] loaded kvm module (v3.16-rc1-381-g1b0733837a9b) good[ 7825.497414] loaded kvm module (v3.16-rc5-116-g7806f60e1d20) good[ 7893.174056] loaded kvm module (v3.16-rc1-384-gc6f1224573c3) This means that: - Tomasz's patch (commit 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c) is fine, it just enables the (wrong) master clock more often - the failure is within that branch. I then cherry-picked Tomasz's patch during a new bisection, and landed on one of my original suspects: commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80 Author: Thomas Gleixner t...@linutronix.de Date: Wed Jul 16 21:04:54 2014 + x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based Convert the relevant base data right away to nanoseconds instead of doing the conversion on every readout. Reduces text size by 160 bytes. Signed-off-by: Thomas Gleixner t...@linutronix.de Cc: Gleb Natapov g...@kernel.org Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: John Stultz john.stu...@linaro.org Again, here is the log: $ git bisect bad 953dec21aed4038464fec02f96a2f1b8701a5bce $ git bisect good 1af447bd8cbfb808a320885d214555fb2d32e6e6 good[ 8384.334892] loaded kvm module (v3.16-rc5-81-g68f6783d2831) bad [ 8525.975170] loaded kvm module (v3.16-rc5-99-gf519b1a2e08c) good[ 8562.204988] loaded kvm module (v3.16-rc5-90-g41fa4215f8e8) bad [ 8629.133287] loaded kvm module (v3.16-rc5-94-g48f18fd6addc) bad [ 8772.846612] loaded kvm module (v3.16-rc5-92-gcbcf2dd3b3d4) good[ 8836.509602] loaded kvm module (v3.16-rc5-91-gbb0b58127c5a) Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
On Thu, Sep 04, 2014 at 01:33:10PM +0200, Paolo Bonzini wrote: Il 04/09/2014 11:53, Paolo Bonzini ha scritto: Il 03/09/2014 20:25, Chris J Arges ha scritto: snip I'm not sure about the reason for the warp, but indeed the offset and uptime match (I'll check them against the trace tomorrow) so it's just that the VM's TSC base is not taken into account correctly. Can you gather another trace with the problematic patch reverted? Paolo Here is the third trace running with 0d3da0d2 reverted from the latest kvm queue branch 11cc9ea3: http://people.canonical.com/~arges/kvm/trace-3.dat.xz Thanks! And---yay!---I reproduced it on another machine. And my bisection landed on the merge of the timer branch (commit e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f). Here is the log: $ git bisect bad origin/master $ git bisect good v3.16 $ git bisect good kvm-3.17-1 # 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b good[ 6929.863545] loaded kvm module (v3.17-rc1-158-g451fd72219dd) bad [ 6971.932790] loaded kvm module (for-linus) bad [ 7216.073128] loaded kvm module (v3.16-6426-gae045e245542) good[ 7286.198948] loaded kvm module (v3.16-3283-g53ee983378ff) good[ 7350.534060] loaded kvm module (v3.16-rc7-1668-gaef4f5b6db65) good[ 7439.037038] loaded kvm module (v3.16-4006-g91c2ff7708d4) good[ 7481.188637] loaded kvm module (v3.16-rc6-450-g7ba3c21c17d0) bad [ 7535.292730] loaded kvm module (v3.16-4635-ge7fda6c4c3c1) good[ 7589.722691] loaded kvm module (v3.16-rc5-110-g9b0fd802e8c0) good[ 7630.286418] loaded kvm module (v3.16-4467-ged5c41d30ef2) good[ 7712.470986] loaded kvm module (v3.16-rc1-35-g885d078bfe92) good[ 7763.443626] loaded kvm module (v3.16-rc1-381-g1b0733837a9b) good[ 7825.497414] loaded kvm module (v3.16-rc5-116-g7806f60e1d20) good[ 7893.174056] loaded kvm module (v3.16-rc1-384-gc6f1224573c3) This means that: - Tomasz's patch (commit 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c) is fine, it just enables the (wrong) master clock more often - the failure is within that branch. I then cherry-picked Tomasz's patch during a new bisection, and landed on one of my original suspects: commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80 Author: Thomas Gleixner t...@linutronix.de Date: Wed Jul 16 21:04:54 2014 + x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based Convert the relevant base data right away to nanoseconds instead of doing the conversion on every readout. Reduces text size by 160 bytes. Signed-off-by: Thomas Gleixner t...@linutronix.de Cc: Gleb Natapov g...@kernel.org Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: John Stultz john.stu...@linaro.org Yes, I also look into this bad commit recently which lead to guest hang after live migration or after local save/restore. Regards, Wanpeng Li Again, here is the log: $ git bisect bad 953dec21aed4038464fec02f96a2f1b8701a5bce $ git bisect good 1af447bd8cbfb808a320885d214555fb2d32e6e6 good[ 8384.334892] loaded kvm module (v3.16-rc5-81-g68f6783d2831) bad [ 8525.975170] loaded kvm module (v3.16-rc5-99-gf519b1a2e08c) good[ 8562.204988] loaded kvm module (v3.16-rc5-90-g41fa4215f8e8) bad [ 8629.133287] loaded kvm module (v3.16-rc5-94-g48f18fd6addc) bad [ 8772.846612] loaded kvm module (v3.16-rc5-92-gcbcf2dd3b3d4) good[ 8836.509602] loaded kvm module (v3.16-rc5-91-gbb0b58127c5a) Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
Il 04/09/2014 14:24, Wanpeng Li ha scritto: On Thu, Sep 04, 2014 at 01:33:10PM +0200, Paolo Bonzini wrote: Il 04/09/2014 11:53, Paolo Bonzini ha scritto: Il 03/09/2014 20:25, Chris J Arges ha scritto: snip I'm not sure about the reason for the warp, but indeed the offset and uptime match (I'll check them against the trace tomorrow) so it's just that the VM's TSC base is not taken into account correctly. Can you gather another trace with the problematic patch reverted? Paolo Here is the third trace running with 0d3da0d2 reverted from the latest kvm queue branch 11cc9ea3: http://people.canonical.com/~arges/kvm/trace-3.dat.xz Thanks! And---yay!---I reproduced it on another machine. And my bisection landed on the merge of the timer branch (commit e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f). Here is the log: $ git bisect bad origin/master $ git bisect good v3.16 $ git bisect good kvm-3.17-1 # 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b good[ 6929.863545] loaded kvm module (v3.17-rc1-158-g451fd72219dd) bad [ 6971.932790] loaded kvm module (for-linus) bad [ 7216.073128] loaded kvm module (v3.16-6426-gae045e245542) good[ 7286.198948] loaded kvm module (v3.16-3283-g53ee983378ff) good[ 7350.534060] loaded kvm module (v3.16-rc7-1668-gaef4f5b6db65) good[ 7439.037038] loaded kvm module (v3.16-4006-g91c2ff7708d4) good[ 7481.188637] loaded kvm module (v3.16-rc6-450-g7ba3c21c17d0) bad [ 7535.292730] loaded kvm module (v3.16-4635-ge7fda6c4c3c1) good[ 7589.722691] loaded kvm module (v3.16-rc5-110-g9b0fd802e8c0) good[ 7630.286418] loaded kvm module (v3.16-4467-ged5c41d30ef2) good[ 7712.470986] loaded kvm module (v3.16-rc1-35-g885d078bfe92) good[ 7763.443626] loaded kvm module (v3.16-rc1-381-g1b0733837a9b) good[ 7825.497414] loaded kvm module (v3.16-rc5-116-g7806f60e1d20) good[ 7893.174056] loaded kvm module (v3.16-rc1-384-gc6f1224573c3) This means that: - Tomasz's patch (commit 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c) is fine, it just enables the (wrong) master clock more often - the failure is within that branch. I then cherry-picked Tomasz's patch during a new bisection, and landed on one of my original suspects: commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80 Author: Thomas Gleixner t...@linutronix.de Date: Wed Jul 16 21:04:54 2014 + x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based Convert the relevant base data right away to nanoseconds instead of doing the conversion on every readout. Reduces text size by 160 bytes. Signed-off-by: Thomas Gleixner t...@linutronix.de Cc: Gleb Natapov g...@kernel.org Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: John Stultz john.stu...@linaro.org Yes, I also look into this bad commit recently which lead to guest hang after live migration or after local save/restore. Thanks for the report! Wanpeng, can you test and/or review the patch I just posted ([PATCH] KVM: x86: fix kvmclock breakage from timers branch merge)? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
Il 02/09/2014 21:57, Chris J Arges ha scritto: Can you please trace the test using trace-cmd (http://www.linux-kvm.org/page/Tracing) and send the output? Paolo Paolo, I have posted the trace data here: http://people.canonical.com/~arges/kvm/trace.dat.xz Can you try running the test again (no need to get a new trace) with clocksource=hpet on the kernel command line? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
Il 02/09/2014 21:57, Chris J Arges ha scritto: Seconds get from host: 1409687073 Seconds get from kvmclock: 1409333034 Offset:-354039 offset too large! Check the stability of raw cycle ... Worst warp -354462672821748 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -354462672821748 Raw cycle is not stable Monotonic cycle test: Worst warp -354455286691490 Looks like one CPU is not being initialized correctly: - The next correction in the trace is 18445647546048704244, and (next-2^64) / -354039 is about 3.1*10^9. This is a pretty plausible value of the TSC frequency. As a comparison, on my machine I have next=18446366988261784997 and an uptime of 29:12 hours, and the two match nicely with the CPU clock: -(18446366988261784997-2^64) / (29.2 * 3600 * 10^9) = 3.587 $ grep -m1 model.name /proc/cpuinfo model name: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz - The offset in seconds * 10^9 is pretty close to the warp in nanoseconds. Can you: 1) try this patch 2) gather a new trace 3) include uptime and cpuinfo in your report? All this without clocksource=hpet of course. Thanks, Paolo diff --git a/x86/kvmclock_test.c b/x86/kvmclock_test.c index 52a43fb..f68881c 100644 --- a/x86/kvmclock_test.c +++ b/x86/kvmclock_test.c @@ -7,6 +7,9 @@ #define DEFAULT_TEST_LOOPS 1L #define DEFAULT_THRESHOLD 5L +long threshold = DEFAULT_THRESHOLD; +int nerr = 0; + struct test_info { struct spinlock lock; long loops; /* test loops */ @@ -20,8 +23,9 @@ struct test_info { struct test_info ti[4]; -static int wallclock_test(long sec, long threshold) +static void wallclock_test(void *p_sec) { + long sec = *(long *)p_sec; long ksec, offset; struct timespec ts; @@ -36,10 +40,8 @@ static int wallclock_test(long sec, long threshold) if (offset threshold || offset -threshold) { printf(offset too large!\n); -return 1; +nerr++; } - -return 0; } static void kvm_clock_test(void *data) @@ -116,10 +118,9 @@ static int cycle_test(int ncpus, long loops, int check, struct test_info *ti) int main(int ac, char **av) { int ncpus; -int nerr = 0, i; +int i; long loops = DEFAULT_TEST_LOOPS; long sec = 0; -long threshold = DEFAULT_THRESHOLD; if (ac 1) loops = atol(av[1]); @@ -137,7 +138,8 @@ int main(int ac, char **av) on_cpu(i, kvm_clock_init, (void *)0); if (ac 2) -nerr += wallclock_test(sec, threshold); + for (i = 0; i ncpus; ++i) + on_cpu(i, wallclock_test, sec); printf(Check the stability of raw cycle ...\n); pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
On 09/03/2014 09:47 AM, Paolo Bonzini wrote: Il 02/09/2014 21:57, Chris J Arges ha scritto: Can you please trace the test using trace-cmd (http://www.linux-kvm.org/page/Tracing) and send the output? Paolo Paolo, I have posted the trace data here: http://people.canonical.com/~arges/kvm/trace.dat.xz Can you try running the test again (no need to get a new trace) with clocksource=hpet on the kernel command line? Paolo ./x86-run x86/kvmclock_test.flat -smp 2 --append 1000 `date +%s` qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409757645 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409757645 Seconds get from kvmclock: 1409757645 Offset:0 Check the stability of raw cycle ... Total vcpus: 2 Test loops: 1000 Total warps: 0 Total stalls: 0 Worst warp: 0 Raw cycle is stable Monotonic cycle test: Total vcpus: 2 Test loops: 1000 Total warps: 0 Total stalls: 0 Worst warp: 0 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1106490299 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 3463433372 Return value from qemu: 1 Ok this passes, I'll now try the patch without the cmdline option. --chris -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
On 09/03/2014 09:59 AM, Paolo Bonzini wrote: Il 02/09/2014 21:57, Chris J Arges ha scritto: Seconds get from host: 1409687073 Seconds get from kvmclock: 1409333034 Offset:-354039 offset too large! Check the stability of raw cycle ... Worst warp -354462672821748 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -354462672821748 Raw cycle is not stable Monotonic cycle test: Worst warp -354455286691490 Looks like one CPU is not being initialized correctly: - The next correction in the trace is 18445647546048704244, and (next-2^64) / -354039 is about 3.1*10^9. This is a pretty plausible value of the TSC frequency. As a comparison, on my machine I have next=18446366988261784997 and an uptime of 29:12 hours, and the two match nicely with the CPU clock: -(18446366988261784997-2^64) / (29.2 * 3600 * 10^9) = 3.587 $ grep -m1 model.name /proc/cpuinfo model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz - The offset in seconds * 10^9 is pretty close to the warp in nanoseconds. Can you: 1) try this patch 2) gather a new trace 3) include uptime and cpuinfo in your report? All this without clocksource=hpet of course. Thanks, Paolo diff --git a/x86/kvmclock_test.c b/x86/kvmclock_test.c index 52a43fb..f68881c 100644 --- a/x86/kvmclock_test.c +++ b/x86/kvmclock_test.c @@ -7,6 +7,9 @@ #define DEFAULT_TEST_LOOPS 1L #define DEFAULT_THRESHOLD 5L +long threshold = DEFAULT_THRESHOLD; +int nerr = 0; + struct test_info { struct spinlock lock; long loops; /* test loops */ @@ -20,8 +23,9 @@ struct test_info { struct test_info ti[4]; -static int wallclock_test(long sec, long threshold) +static void wallclock_test(void *p_sec) { + long sec = *(long *)p_sec; long ksec, offset; struct timespec ts; @@ -36,10 +40,8 @@ static int wallclock_test(long sec, long threshold) if (offset threshold || offset -threshold) { printf(offset too large!\n); -return 1; +nerr++; } - -return 0; } static void kvm_clock_test(void *data) @@ -116,10 +118,9 @@ static int cycle_test(int ncpus, long loops, int check, struct test_info *ti) int main(int ac, char **av) { int ncpus; -int nerr = 0, i; +int i; long loops = DEFAULT_TEST_LOOPS; long sec = 0; -long threshold = DEFAULT_THRESHOLD; if (ac 1) loops = atol(av[1]); @@ -137,7 +138,8 @@ int main(int ac, char **av) on_cpu(i, kvm_clock_init, (void *)0); if (ac 2) -nerr += wallclock_test(sec, threshold); + for (i = 0; i ncpus; ++i) + on_cpu(i, wallclock_test, sec); printf(Check the stability of raw cycle ...\n); pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT Here are the results of the trace as you requested: http://people.canonical.com/~arges/kvm/trace-2.dat.xz $ uptime 16:18:31 up 53 min, 1 user, load average: 1.16, 0.39, 0.17 $ grep -m1 model.name /proc/cpuinfo model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz Here is the output of the command: qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409761075 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44e520 kvm-clock: cpu 0, msr 0x:44e520 Wallclock test, threshold 5 Seconds get from host: 1409761075 Seconds get from kvmclock: 1409757927 Offset:-3148 offset too large! Wallclock test, threshold 5 Seconds get from host: 1409761075 Seconds get from kvmclock: 1409757927 Offset:-3148 offset too large! Check the stability of raw cycle ... Worst warp -3147762665310 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -3147762665310 Raw cycle is not stable Monotonic cycle test: Worst warp -3142929472775 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -3142929472775 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1242044050 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1242665486 Return value from qemu: 3 --chris -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
Il 03/09/2014 18:23, Chris J Arges ha scritto: $ uptime 16:18:31 up 53 min, 1 user, load average: 1.16, 0.39, 0.17 $ grep -m1 model.name /proc/cpuinfo model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz Here is the output of the command: qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409761075 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44e520 kvm-clock: cpu 0, msr 0x:44e520 Wallclock test, threshold 5 Seconds get from host: 1409761075 Seconds get from kvmclock: 1409757927 Offset:-3148 offset too large! Wallclock test, threshold 5 Seconds get from host: 1409761075 Seconds get from kvmclock: 1409757927 Offset:-3148 offset too large! Check the stability of raw cycle ... Worst warp -3147762665310 I'm not sure about the reason for the warp, but indeed the offset and uptime match (I'll check them against the trace tomorrow) so it's just that the VM's TSC base is not taken into account correctly. Can you gather another trace with the problematic patch reverted? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
snip I'm not sure about the reason for the warp, but indeed the offset and uptime match (I'll check them against the trace tomorrow) so it's just that the VM's TSC base is not taken into account correctly. Can you gather another trace with the problematic patch reverted? Paolo Here is the third trace running with 0d3da0d2 reverted from the latest kvm queue branch 11cc9ea3: http://people.canonical.com/~arges/kvm/trace-3.dat.xz $ uptime 18:25:13 up 5 min, 1 user, load average: 0.21, 0.74, 0.44 qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409768537 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44e520 kvm-clock: cpu 0, msr 0x:44e520 Wallclock test, threshold 5 Seconds get from host: 1409768537 Seconds get from kvmclock: 1409768538 Offset:1 Wallclock test, threshold 5 Seconds get from host: 1409768537 Seconds get from kvmclock: 1409768538 Offset:1 Check the stability of raw cycle ... Total vcpus: 2 Test loops: 1000 Total warps: 0 Total stalls: 0 Worst warp: 0 Raw cycle is stable Monotonic cycle test: Total vcpus: 2 Test loops: 1000 Total warps: 0 Total stalls: 0 Worst warp: 0 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1241970306 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 3266701026 Return value from qemu: 1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
On 08/31/2014 11:05 AM, Paolo Bonzini wrote: Il 29/08/2014 23:05, Chris J Arges ha scritto: And indeed there is a condition where matched already_matched are both true. In this case we don't zero or increment nr_vcpus_matched_tsc. Incrementing nr_vcpus_matched_tsc in that last else clause allows the test to pass; however this is identical to the logic before the patch. Can you please trace the test using trace-cmd (http://www.linux-kvm.org/page/Tracing) and send the output? Paolo Paolo, I have posted the trace data here: http://people.canonical.com/~arges/kvm/trace.dat.xz Here is the output from the actual test case: enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409687073 Seconds get from kvmclock: 1409333034 Offset:-354039 offset too large! Check the stability of raw cycle ... Worst warp -354462672821748 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -354462672821748 Raw cycle is not stable Monotonic cycle test: Worst warp -354455286691490 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -354455286691490 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1234719818 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1234750103 I also enabled dynamic debugging on arch/x86/kvm/x86.c (on a separate run): [354558.460867] kvm_get_time_scale: base_khz 100 = 2593993, shift 2, mul 2785278775 [354558.461034] kvm: new tsc generation 1, clock 0 [354558.461812] kvm_get_time_scale: base_khz 100 = 2593993, shift 2, mul 2785278775 [354558.461846] kvm: matched tsc offset for 0 [354558.477542] kvm: matched tsc offset for 0 [354558.477614] kvm: matched tsc offset for 0 [354558.513729] kvm: matched tsc offset for 0 [354558.541724] kvm: matched tsc offset for 0 [354558.741532] kvm_get_time_scale: base_khz 2593993 = 100, shift -1, mul 3311471770 [354558.742551] kvm_get_time_scale: base_khz 2593993 = 100, shift -1, mul 3311471770 Also I ensured that this was reproducible with the latest qemu, and these results were gathered with the latest version as of today. Any other places to look into? --chris j arges -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
Il 29/08/2014 23:05, Chris J Arges ha scritto: And indeed there is a condition where matched already_matched are both true. In this case we don't zero or increment nr_vcpus_matched_tsc. Incrementing nr_vcpus_matched_tsc in that last else clause allows the test to pass; however this is identical to the logic before the patch. Can you please trace the test using trace-cmd (http://www.linux-kvm.org/page/Tracing) and send the output? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
On 08/27/2014 05:05 PM, Paolo Bonzini wrote: - Messaggio originale - Da: Chris J Arges chris.j.ar...@canonical.com A: Paolo Bonzini pbonz...@redhat.com, kvm@vger.kernel.org Inviato: Mercoledì, 27 agosto 2014 23:24:14 Oggetto: kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to run_tests configuration) snip Thanks, looks good. Are there more failures? Paolo Paolo, Thanks for applying those patches! I now only see the two failures on my machine: model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz I'm running with the tip of kvm master: 0ac625df43ce9d085d4ff54c1f739611f4308b13 (Merge tag 'kvm-s390-20140825') sudo ./x86-run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline enabling apic enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 apic version: 1050014 x2apic enabled FAIL: tsc deadline timer clearing tsc deadline timer enabled This is fixed in kvm/next (3.18). SUMMARY: 16 tests, 1 unexpected failures Return value from qemu: 3 sudo ./x86-run x86/kvmclock_test.flat -smp 2 --append 1000 `date +%s` qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409174399 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409174399 Seconds get from kvmclock: 1409173176 Offset:-1223 Weird, your clock is 20 minutes behind in the VM than it is in the host. Is the offset always around -1200? What happens if you reboot? (I get 0, 1 or sometimes 2). Paolo Hi Paolo, Results building with kvm queue tree (fd2752352bbc98850d83b5448a288d8991590317): CPU: model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz I still get failures with the following test, I actually tested on multiple machines with identical hardware and the same failure occurred. In v3.13/v3.16 series kernels this passes. I'll look into which commit changed this result for me. I suspect it was fairly recent. ./x86-run x86/kvmclock_test.flat -smp 2 --append 1000 `date +%s` | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409160326 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d520 kvm-clock: cpu 0, msr 0x:44d520 Wallclock test, threshold 5 Seconds get from host: 1409160326 Seconds get from kvmclock: 1409153484 Offset:-6842 offset too large! Check the stability of raw cycle ... Worst warp -6841795339348 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -6841795339348 Raw cycle is not stable Monotonic cycle test: Worst warp -6836691572679 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -6836691572679 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1098400654 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1106302952 Return value from qemu: 3 This is another test that fails or hangs, this passes in 3.13, but fails on 3.16 with my testing. I'll dig into this more perhaps to find out which commit changes things. ./x86-run x86/vmx.flat -smp 1 -cpu host,+vmx | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/vmx.flat -smp 1 -cpu host,+vmx enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 Test suite: VMX capability reporting Test suite: vmenter Test suite: preemption timer Test suite: control field PAT Test suite: control field EFER Test suite: CR shadowing Test suite: I/O bitmap Test suite: instruction intercept Test suite: EPT framework FAIL: EPT violation - paging structure Test suite: interrupt `ASS: running a guest with interrupt acknowledgement set Test suite: debug controls SUMMARY: 109 tests, 1 unexpected failures Return value from qemu: 3 Here is the test case hanging: ./x86-run x86/vmx.flat -smp 1 -cpu host,+vmx | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/vmx.flat -smp 1 -cpu host,+vmx enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 Test suite: VMX capability reporting Test suite: vmenter Test suite: preemption timer Test suite: control field PAT Test suite: control field EFER Test suite
Re: kvm-unit-test failures
On 08/29/2014 12:36 PM, Chris J Arges wrote: On 08/27/2014 05:05 PM, Paolo Bonzini wrote: - Messaggio originale - Da: Chris J Arges chris.j.ar...@canonical.com A: Paolo Bonzini pbonz...@redhat.com, kvm@vger.kernel.org Inviato: Mercoledì, 27 agosto 2014 23:24:14 Oggetto: kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to run_tests configuration) snip Thanks, looks good. Are there more failures? Paolo Paolo, Thanks for applying those patches! I now only see the two failures on my machine: model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz I'm running with the tip of kvm master: 0ac625df43ce9d085d4ff54c1f739611f4308b13 (Merge tag 'kvm-s390-20140825') sudo ./x86-run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline enabling apic enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 apic version: 1050014 x2apic enabled FAIL: tsc deadline timer clearing tsc deadline timer enabled This is fixed in kvm/next (3.18). SUMMARY: 16 tests, 1 unexpected failures Return value from qemu: 3 sudo ./x86-run x86/kvmclock_test.flat -smp 2 --append 1000 `date +%s` qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409174399 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409174399 Seconds get from kvmclock: 1409173176 Offset:-1223 Weird, your clock is 20 minutes behind in the VM than it is in the host. Is the offset always around -1200? What happens if you reboot? (I get 0, 1 or sometimes 2). Paolo Hi Paolo, Results building with kvm queue tree (fd2752352bbc98850d83b5448a288d8991590317): CPU: model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz I still get failures with the following test, I actually tested on multiple machines with identical hardware and the same failure occurred. In v3.13/v3.16 series kernels this passes. I'll look into which commit changed this result for me. I suspect it was fairly recent. I did a quick grep through the patches between v3.16 and the latest and I found: (0d3da0d26e3c3515997c99451ce3b0ad1a69a36c KVM: x86: fix TSC matching). I reverted this patch and the test case passed for me. Looking at the patch, I added an extra else statement as so: if (!matched) { kvm-arch.nr_vcpus_matched_tsc = 0; } else if (!already_matched) { kvm-arch.nr_vcpus_matched_tsc++; } else { printk(kvm: do we get here?\n); } And indeed there is a condition where matched already_matched are both true. In this case we don't zero or increment nr_vcpus_matched_tsc. Incrementing nr_vcpus_matched_tsc in that last else clause allows the test to pass; however this is identical to the logic before the patch. Any suggestions for fixing this case? Thanks, --chris j arges ./x86-run x86/kvmclock_test.flat -smp 2 --append 1000 `date +%s` | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409160326 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d520 kvm-clock: cpu 0, msr 0x:44d520 Wallclock test, threshold 5 Seconds get from host: 1409160326 Seconds get from kvmclock: 1409153484 Offset:-6842 offset too large! Check the stability of raw cycle ... Worst warp -6841795339348 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -6841795339348 Raw cycle is not stable Monotonic cycle test: Worst warp -6836691572679 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -6836691572679 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1098400654 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1106302952 Return value from qemu: 3 This is another test that fails or hangs, this passes in 3.13, but fails on 3.16 with my testing. I'll dig into this more perhaps to find out which commit changes things. ./x86-run x86/vmx.flat -smp 1 -cpu host,+vmx | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/vmx.flat -smp 1 -cpu host,+vmx enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 Test suite: VMX
Re: kvm-unit-test failures
Il 29/08/2014 19:36, Chris J Arges ha scritto: I still get failures with the following test, I actually tested on multiple machines with identical hardware and the same failure occurred. In v3.13/v3.16 series kernels this passes. I'll look into which commit changed this result for me. I suspect it was fairly recent. I would try bisecting between 0e5ac3a8b100469ea154f87dd57b685fbdd356f6 (might be bad) and 41fa4215f8e8150bdc5d2a5f8704915f1b059fa8 (might be good). Is this a laptop or desktop? Any relationship between the Offset and how long the machine has been up? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures
On 08/29/2014 04:08 PM, Paolo Bonzini wrote: Il 29/08/2014 19:36, Chris J Arges ha scritto: I still get failures with the following test, I actually tested on multiple machines with identical hardware and the same failure occurred. In v3.13/v3.16 series kernels this passes. I'll look into which commit changed this result for me. I suspect it was fairly recent. I would try bisecting between 0e5ac3a8b100469ea154f87dd57b685fbdd356f6 (might be bad) and 41fa4215f8e8150bdc5d2a5f8704915f1b059fa8 (might be good). Just sent the other email with my bisect results, the 'bad' commit for me is: 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c A revert on this commit allows the tests to pass again. Is this a laptop or desktop? Any relationship between the Offset and how long the machine has been up? Paolo This is a server; and no, I've reboot the machine and re-run the test. I've also run it after days of uptime. --chris -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to run_tests configuration)
snip Thanks, looks good. Are there more failures? Paolo Paolo, Thanks for applying those patches! I now only see the two failures on my machine: model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz I'm running with the tip of kvm master: 0ac625df43ce9d085d4ff54c1f739611f4308b13 (Merge tag 'kvm-s390-20140825') sudo ./x86-run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline enabling apic enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 apic version: 1050014 x2apic enabled FAIL: tsc deadline timer clearing tsc deadline timer enabled SUMMARY: 16 tests, 1 unexpected failures Return value from qemu: 3 sudo ./x86-run x86/kvmclock_test.flat -smp 2 --append 1000 `date +%s` qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409174399 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409174399 Seconds get from kvmclock: 1409173176 Offset:-1223 offset too large! Check the stability of raw cycle ... Worst warp -1222831419357 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -1222831419357 Raw cycle is not stable Monotonic cycle test: Worst warp -1219118621614 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -1219118621614 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1065145046 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1126981511 Return value from qemu: 3 Let me know if anything comes to mind. I can also look more deeply into these failures. Thanks, --chris j arges -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to run_tests configuration)
- Messaggio originale - Da: Chris J Arges chris.j.ar...@canonical.com A: Paolo Bonzini pbonz...@redhat.com, kvm@vger.kernel.org Inviato: Mercoledì, 27 agosto 2014 23:24:14 Oggetto: kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to run_tests configuration) snip Thanks, looks good. Are there more failures? Paolo Paolo, Thanks for applying those patches! I now only see the two failures on my machine: model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz I'm running with the tip of kvm master: 0ac625df43ce9d085d4ff54c1f739611f4308b13 (Merge tag 'kvm-s390-20140825') sudo ./x86-run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline enabling apic enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 apic version: 1050014 x2apic enabled FAIL: tsc deadline timer clearing tsc deadline timer enabled This is fixed in kvm/next (3.18). SUMMARY: 16 tests, 1 unexpected failures Return value from qemu: 3 sudo ./x86-run x86/kvmclock_test.flat -smp 2 --append 1000 `date +%s` qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409174399 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409174399 Seconds get from kvmclock: 1409173176 Offset:-1223 Weird, your clock is 20 minutes behind in the VM than it is in the host. Is the offset always around -1200? What happens if you reboot? (I get 0, 1 or sometimes 2). Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html