Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-05 Thread Xiao Guangrong
On 07/05/2012 07:12 AM, Peter Lieven wrote:
 On 07/03/12 15:13, Avi Kivity wrote:
 On 07/03/2012 04:01 PM, Peter Lieven wrote:
 Further output from my testing.

 Working:
 Linux 2.6.38 with included kvm module
 Linux 3.0.0 with included kvm module

 Not-Working:
 Linux 3.2.0 with included kvm module
 Linux 2.6.28 with kvm-kmod 3.4
 Linux 3.0.0 with kvm-kmod 3.4
 Linux 3.2.0 with kvm-kmod 3.4

 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
 It might be that the code was introduced somewhere between 3.0.0
 and 3.2.0 in the kvm kernel module and that the flaw is not
 in qemu-kvm.

 Any hints?

 A bisect could tell us where the problem is.

 To avoid bisecting all of linux, try

 git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm
 here we go:
 
 commit ca7d58f375c650cf36900cb1da1ca2cc99b13393
 Author: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
 Date:   Wed Jul 13 14:31:08 2011 +0800
 
 KVM: x86: fix broken read emulation spans a page boundary

Ah, i will try to reproduce it and fix it.

Thanks for your work.



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-05 Thread Xiao Guangrong
On 06/28/2012 05:11 PM, Peter Lieven wrote:

 that here is bascially whats going on:
 
   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 
 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
 gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
 KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read 
 len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
 gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
 KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read 
 len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
 gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
 KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read 
 len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
 gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
 KVM_EXIT_MMIO (6)
 

There are two mmio emulations after the userspace exit; this is caused by an mmio
read access which spans two pages. But it should be fixed by:

commit f78146b0f9230765c6315b2e14f56112513389ad
Author: Avi Kivity a...@redhat.com
Date:   Wed Apr 18 19:22:47 2012 +0300

KVM: Fix page-crossing MMIO

MMIO that are split across a page boundary are currently broken - the
code does not expect to be aborted by the exit to userspace for the
first MMIO fragment.

This patch fixes the problem by generalizing the current code for handling
16-byte MMIOs to handle a number of fragments, and changes the MMIO
code to create those fragments.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
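
To illustrate the idea in that commit message, here is a minimal C sketch
(illustrative names only, this is not the kernel implementation) of splitting
a page-crossing MMIO access into per-page fragments, each of which would then
be completed by one KVM_EXIT_MMIO round trip:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL

struct mmio_fragment {          /* illustrative, not the kernel struct */
    uint64_t gpa;               /* guest physical address of this piece */
    unsigned len;               /* length of this piece in bytes */
};

/* Split [gpa, gpa + len) into pieces that do not cross a page boundary. */
static int split_mmio(uint64_t gpa, unsigned len,
                      struct mmio_fragment *frags, int max_frags)
{
    int n = 0;

    while (len && n < max_frags) {
        /* bytes left until the end of the current page */
        unsigned chunk = PAGE_SIZE - (gpa & (PAGE_SIZE - 1));

        if (chunk > len)
            chunk = len;
        frags[n].gpa = gpa;
        frags[n].len = chunk;
        n++;
        gpa += chunk;
        len -= chunk;
    }
    return n;
}

int main(void)
{
    struct mmio_fragment frags[2];
    /* a 4-byte access starting 1 byte below a page boundary (made-up address) */
    int n = split_mmio(0x1ffff, 4, frags, 2);

    for (int i = 0; i < n; i++)
        printf("fragment %d: gpa=0x%llx len=%u\n",
               i, (unsigned long long)frags[i].gpa, frags[i].len);
    /* each fragment is then completed by one exit to userspace */
    return 0;
}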

Could you please pull the code from:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
and trace it again?



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-05 Thread Peter Lieven

On 05.07.2012 10:51, Xiao Guangrong wrote:

On 06/28/2012 05:11 PM, Peter Lieven wrote:


that here is bascially whats going on:

   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 3 
gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 
3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 
3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read len 
3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 0xa 
gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)


There are two mmio emulation after user-space-exit, it is caused by mmio
read access which spans two pages. But it should be fixed by:

commit f78146b0f9230765c6315b2e14f56112513389ad
Author: Avi Kivity a...@redhat.com
Date:   Wed Apr 18 19:22:47 2012 +0300

 KVM: Fix page-crossing MMIO

 MMIO that are split across a page boundary are currently broken - the
 code does not expect to be aborted by the exit to userspace for the
 first MMIO fragment.

 This patch fixes the problem by generalizing the current code for handling
 16-byte MMIOs to handle a number of fragments, and changes the MMIO
 code to create those fragments.

 Signed-off-by: Avi Kivity a...@redhat.com
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Could you please pull the code from:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
and trace it again?

Thank you very much, this fixes the issue I have seen.

Thanks,
Peter



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-04 Thread Peter Lieven

On 07/03/12 15:25, Avi Kivity wrote:

On 07/03/2012 04:15 PM, Peter Lieven wrote:

On 03.07.2012 15:13, Avi Kivity wrote:

On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?


A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

 git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm



would it also be ok to bisect kvm-kmod?

Yes, but note that kvm-kmod is spread across two repositories which are
not often tested out of sync, so you may get build failures.

ok, i just started with this with a 3.0 (good) and 3.2 (bad) vanilla
kernel. i can confirm the bug and i am now starting to bisect. it will
take a while with my equipment; if anyone has a powerful testbed to run
this i would greatly appreciate help.

if anyone wants to reproduce:

a) v3.2 from git.kernel.org
b) qemu-kvm 1.0.1 from sourceforge
c) ubuntu 64-bit 12.04 server cd
d) empty (e.g. all zero) hard disk image

cmdline:
./qemu-system-x86_64 -m 512 -cdrom 
/home/lieven/Downloads/ubuntu-12.04-server-amd64.iso -hda 
/dev/hd1/vmtest -vnc :1 -monitor stdio -boot dc


then choose boot from first harddisk and try to quit the qemu monitor
with 'quit' - the hypervisor hangs.


peter





Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-04 Thread Peter Lieven

On 07/03/12 15:13, Avi Kivity wrote:

On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?


A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm

here we go:

commit ca7d58f375c650cf36900cb1da1ca2cc99b13393
Author: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Date:   Wed Jul 13 14:31:08 2011 +0800

KVM: x86: fix broken read emulation spans a page boundary




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Peter Lieven

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?

Thanks,
Peter


On 02.07.2012 17:05, Avi Kivity wrote:

On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know whats that here in handle_mmio?

 /* hack: Red Hat 7.1 generates these weird accesses. */
 if ((addr > 0xa-4 && addr <= 0xa) && kvm_run->mmio.len == 3)
     return 0;


Just what it says.  There is a 4-byte access to address 0x9.  The
first byte lies in RAM, the next three bytes are in mmio.  qemu is
geared to power-of-two accesses even though x86 can generate accesses to
any number of bytes between 1 and 8.

It appears that this has happened with your guest.  It's not impossible
that it's genuine.





Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Avi Kivity
On 07/03/2012 04:01 PM, Peter Lieven wrote:
 Further output from my testing.
 
 Working:
 Linux 2.6.38 with included kvm module
 Linux 3.0.0 with included kvm module
 
 Not-Working:
 Linux 3.2.0 with included kvm module
 Linux 2.6.28 with kvm-kmod 3.4
 Linux 3.0.0 with kvm-kmod 3.4
 Linux 3.2.0 with kvm-kmod 3.4
 
 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
 It might be that the code was introduced somewhere between 3.0.0
 and 3.2.0 in the kvm kernel module and that the flaw is not
 in qemu-kvm.
 
 Any hints?
 

A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

   git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm



-- 
error compiling committee.c: too many arguments to function




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Peter Lieven

On 03.07.2012 15:13, Avi Kivity wrote:

On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
Linux 2.6.38 with included kvm module
Linux 3.0.0 with included kvm module

Not-Working:
Linux 3.2.0 with included kvm module
Linux 2.6.28 with kvm-kmod 3.4
Linux 3.0.0 with kvm-kmod 3.4
Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
It might be that the code was introduced somewhere between 3.0.0
and 3.2.0 in the kvm kernel module and that the flaw is not
in qemu-kvm.

Any hints?


A bisect could tell us where the problem is.

To avoid bisecting all of linux, try

git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm



would it also be ok to bisect kvm-kmod?

thanks,
peter



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-03 Thread Avi Kivity
On 07/03/2012 04:15 PM, Peter Lieven wrote:
 On 03.07.2012 15:13, Avi Kivity wrote:
 On 07/03/2012 04:01 PM, Peter Lieven wrote:
 Further output from my testing.

 Working:
 Linux 2.6.38 with included kvm module
 Linux 3.0.0 with included kvm module

 Not-Working:
 Linux 3.2.0 with included kvm module
 Linux 2.6.28 with kvm-kmod 3.4
 Linux 3.0.0 with kvm-kmod 3.4
 Linux 3.2.0 with kvm-kmod 3.4

 I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1.
 It might be that the code was introduced somewhere between 3.0.0
 and 3.2.0 in the kvm kernel module and that the flaw is not
 in qemu-kvm.

 Any hints?

 A bisect could tell us where the problem is.

 To avoid bisecting all of linux, try

 git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm


 would it also be ok to bisect kvm-kmod?

Yes, but note that kvm-kmod is spread across two repositories which are
not often tested out of sync, so you may get build failures.


-- 
error compiling committee.c: too many arguments to function




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-02 Thread Avi Kivity
On 06/28/2012 12:38 PM, Peter Lieven wrote:
 does anyone know whats that here in handle_mmio?
 
 /* hack: Red Hat 7.1 generates these weird accesses. */
 if ((addr > 0xa-4 && addr <= 0xa) && kvm_run->mmio.len == 3)
     return 0;
 

Just what it says.  There is a 4-byte access to address 0x9.  The
first byte lies in RAM, the next three bytes are in mmio.  qemu is
geared to power-of-two accesses even though x86 can generate accesses to
any number of bytes between 1 and 8.

It appears that this has happened with your guest.  It's not impossible
that it's genuine.
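
As a self-contained illustration of the case described above, here is a small
C sketch of the condition the quoted hack checks for. The 0xa0000 window start
and the function name are assumptions (the archived snippet has its operators
and addresses mangled), so treat this as illustration rather than the actual
qemu-kvm code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MMIO_WINDOW 0xa0000ULL  /* assumed start of the legacy VGA MMIO hole */

/*
 * A 4-byte read whose first byte lies just below MMIO_WINDOW leaves a
 * 3-byte tail inside the MMIO window; that non-power-of-two tail is what
 * the hack recognizes and completes silently.
 */
static bool is_redhat71_quirk(uint64_t addr, unsigned len)
{
    return len == 3 && addr > MMIO_WINDOW - 4 && addr <= MMIO_WINDOW;
}

int main(void)
{
    /* the 3-byte read at the window start seen over and over in the trace */
    printf("quirk matches: %d\n", is_redhat71_quirk(MMIO_WINDOW, 3));
    return 0;
}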

-- 
error compiling committee.c: too many arguments to function




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-07-02 Thread Peter Lieven

On 02.07.2012 17:05, Avi Kivity wrote:

On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know whats that here in handle_mmio?

 /* hack: Red Hat 7.1 generates these weird accesses. */
 if ((addr > 0xa-4 && addr <= 0xa) && kvm_run->mmio.len == 3)
     return 0;


Just what it says.  There is a 4-byte access to address 0x9.  The
first byte lies in RAM, the next three bytes are in mmio.  qemu is
geared to power-of-two accesses even though x86 can generate accesses to
any number of bytes between 1 and 8.

I just stumbled across the word hack in the comment. When the race
occurs the CPU is basically reading from 0xa in an endless loop.

It appears that this has happened with your guest.  It's not impossible
that it's genuine.


I had a lot to do over the last few days, but I updated our build
environment to Ubuntu LTS 12.04 64-bit Server which is based on Linux
3.2.0. I still see the issue. If I use the kvm module provided with the
kernel it is working correctly. If I use kvm-kmod-3.4 with qemu-kvm-1.0.1
(both from sourceforge) I can reproduce the race condition.

I will keep you posted when I have more evidence.

Thanks,
Peter


Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth.
I believed that kvm-kmod-3.0 is basically what is in the vanilla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig too deep - see below.

Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

here is basically what's going on:

  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva 
0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason 
KVM_EXIT_MMIO (6)


it's doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit complicated, but maybe this is already
sufficient. otherwise i will of course gather this info as well.


thanks
peter



Jan

[1] http://www.linux-kvm.org/page/Tracing


---

Hi,

we recently came across multiple VMs running into a race and no longer
working. It seems to happen when the system is at 100% cpu.
One way to reproduce this is:
qemu-kvm-1.0.1 with vnc-thread enabled

cmdline (or similar):
/usr/bin/qemu-kvm-1.0.1 -net
tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net
nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive
format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native
-m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor
tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name
02-debug-race -boot order=dc,menu=off -cdrom
/home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile
/var/run/qemu/vm-221.pid -mem-prealloc -cpu
host,+x2apic,model_id=Intel(R) Xeon(R) CPU   L5640  @
2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus

it is important that the attached virtio image contains only zeroes. if
the system boots from cd, select boot from first harddisk.
the hypervisor then hangs at 100% cpu and neither monitor nor qmp are
responsive anymore.

i have also seen customers reporting this when a VM is shut down.

if this is connected to the threaded vnc server it might be important to
be connected at this time.

debug backtrace attached.

Thanks,
Peter

--

(gdb) file /usr/bin/qemu-kvm-1.0.1
Reading symbols from /usr/bin/qemu-kvm-1.0.1...done.
(gdb) attach 5145
Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7f54d08b9700 (LWP 5253)]
[New Thread 0x7f5552757700 (LWP 5152)]
[New Thread 0x7f5552f58700 (LWP 5151)]
0x7f5553c6b5a3 in select () from /lib/libc.so.6
(gdb) info threads
 

Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Jan Kiszka
On 2012-06-28 11:11, Peter Lieven wrote:
 On 27.06.2012 18:54, Jan Kiszka wrote:
 On 2012-06-27 17:39, Peter Lieven wrote:
 Hi all,

 i debugged this further and found out that kvm-kmod-3.0 is working with
 qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
 working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
 Has anyone a clue which new KVM feature could cause this if a vcpu is in
 an infinite loop?
 Before accusing kvm-kmod ;), can you check if the effect is visible with
 an original Linux 3.3.x or 3.4.x kernel as well?
 sorry, i should have been more specific. maybe I also misunderstood sth.
 I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel
 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
 I use
 a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
 however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.

 Then, bisection the change in qemu-kvm that apparently resolved the
 issue would be interesting.

 If we have to dig deeper, tracing [1] the lockup would likely be helpful
 (all events of the qemu process, not just KVM related ones: trace-cmd
 record -e all qemu-system-x86_64 ...).
 that here is bascially whats going on:
 
   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
 len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)
 
 its doing that forever. this is tracing the kvm module. doing the
 qemu-system-x86_64 trace is a bit compilcated, but
 maybe this is already sufficient. otherwise i will of course gather this
 info as well.

That's only tracing KVM events, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

However, let's focus on bisecting at kernel and qemu-kvm level first.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth.
I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather this
info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. it's basically booting from an empty
harddisk.


if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just too short.
if i understand further correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spent in this state might be rather short.
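
for reference, here is a schematic, compile-only C sketch of such a per-vcpu
loop against the KVM userspace API (this is not qemu-kvm's kvm_cpu_exec; VM,
vcpu and guest-memory setup are omitted and the helper name is made up; the
monitor/qmp are handled by a separate thread, not inside this loop):

#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* a real implementation would dispatch to the emulated device here */
static void handle_mmio_exit(struct kvm_run *run)
{
    printf("mmio %s at 0x%llx, len %u\n",
           run->mmio.is_write ? "write" : "read",
           (unsigned long long)run->mmio.phys_addr, run->mmio.len);
}

/* run is the mmap()ed struct kvm_run of this vcpu */
int vcpu_loop(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        /* the vcpu thread sits inside the kernel for the whole KVM_RUN */
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
            return -1;

        switch (run->exit_reason) {
        case KVM_EXIT_MMIO:     /* the exits repeating in the trace above */
            handle_mmio_exit(run);
            break;
        case KVM_EXIT_HLT:
            return 0;
        default:                /* shutdown, internal error, ... */
            return 0;
        }
    }
}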

my concern is not that the machine hangs, just that the hypervisor is
unresponsive and it's impossible to reset or quit gracefully. the only
way to end the hypervisor is via SIGKILL.

thanks
peter


Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

does anyone know what that is here in handle_mmio?

/* hack: Red Hat 7.1 generates these weird accesses. */
if ((addr > 0xa-4 && addr <= 0xa) && kvm_run->mmio.len == 3)
    return 0;

thanks,
peter

On 28.06.2012 11:31, Peter Lieven wrote:

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working 
with

qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace 
(qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu 
is in

an infinite loop?
Before accusing kvm-kmod ;), can you check if the effect is visible 
with

an original Linux 3.3.x or 3.4.x kernel as well?
sorry, i should have been more specific. maybe I also misunderstood 
sth.
I was believing that kvm-kmod-3.0 is basically what is in vanialla 
kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it 
works, if

I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be 
helpful

(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

   qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio 
read

len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
 qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
 qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   
reason

KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather 
this

info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. its basically booting from an empty 
harddisk.


if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just to short.
if i understand furhter correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spend in this state might be rather short.

my concern is not that the machine hangs, just the the hypervisor is 
unresponsive
and its impossible to reset or quit gracefully. the only way to get 
the hypervisor

ended is via SIGKILL.

thanks
peter




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Jan Kiszka
On 2012-06-28 11:31, Peter Lieven wrote:
 On 28.06.2012 11:21, Jan Kiszka wrote:
 On 2012-06-28 11:11, Peter Lieven wrote:
 On 27.06.2012 18:54, Jan Kiszka wrote:
 On 2012-06-27 17:39, Peter Lieven wrote:
 Hi all,

 i debugged this further and found out that kvm-kmod-3.0 is working with
 qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
 working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
 Has anyone a clue which new KVM feature could cause this if a vcpu is in
 an infinite loop?
 Before accusing kvm-kmod ;), can you check if the effect is visible with
 an original Linux 3.3.x or 3.4.x kernel as well?
 sorry, i should have been more specific. maybe I also misunderstood sth.
 I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel
 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
 I use
 a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
 however, maybe we don't have to dig to deep - see below.
 kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
 working on an older kernel. This step may introduce bugs of its own.
 Therefore my suggestion to use a real 3.x kernel to exclude that risk
 first of all.

 Then, bisection the change in qemu-kvm that apparently resolved the
 issue would be interesting.

 If we have to dig deeper, tracing [1] the lockup would likely be helpful
 (all events of the qemu process, not just KVM related ones: trace-cmd
 record -e all qemu-system-x86_64 ...).
 that here is bascially whats going on:

qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
 len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
 0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
 unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
 KVM_EXIT_MMIO (6)

 its doing that forever. this is tracing the kvm module. doing the
 qemu-system-x86_64 trace is a bit compilcated, but
 maybe this is already sufficient. otherwise i will of course gather this
 info as well.
 That's only tracing KVM event, and it's tracing when things went wrong
 already. We may need a full trace (-e all) specifically for the period
 when this pattern above started.
 i will do that. maybe i should explain that the vcpu is executing
 garbage when this above starts. its basically booting from an empty 
 harddisk.
 
 if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);
 
 maybe the time to handle the monitor/qmp connection is just to short.
 if i understand furhter correctly, it can only handle monitor connections
 while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
 wrong here? the time spend in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.

 
 my concern is not that the machine hangs, just the the hypervisor is 
 unresponsive
 and its impossible to reset or quit gracefully. the only way to get the 
 hypervisor
 ended is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the vm
via the monitor etc. If not, that's a bug.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 28.06.2012 11:39, Jan Kiszka wrote:

On 2012-06-28 11:31, Peter Lieven wrote:

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth.
I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather this
info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. its basically booting from an empty
harddisk.

if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just to short.
if i understand furhter correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spend in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.

I have a 1.1GB (85MB compressed) trace-file. If you have time to
look at it I could drop it somewhere.

We currently run all VMs with nice 1 because we observed that this
improves the controllability of the node in case all VMs have excessive
CPU load. Running the VM unniced does not change the behaviour
unfortunately.

Peter

my concern is not that the machine hangs, just the the hypervisor is
unresponsive
and its impossible to reset or quit gracefully. the only way to get the
hypervisor
ended is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the vm
via the monitor etc. If not, that's a bug.

Jan




Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-28 Thread Peter Lieven

On 28.06.2012 11:39, Jan Kiszka wrote:

On 2012-06-28 11:31, Peter Lieven wrote:

On 28.06.2012 11:21, Jan Kiszka wrote:

On 2012-06-28 11:11, Peter Lieven wrote:

On 27.06.2012 18:54, Jan Kiszka wrote:

On 2012-06-27 17:39, Peter Lieven wrote:

Hi all,

i debugged this further and found out that kvm-kmod-3.0 is working with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Has anyone a clue which new KVM feature could cause this if a vcpu is in
an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth.
I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel
3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if
I use
a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
however, maybe we don't have to dig to deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
working on an older kernel. This step may introduce bugs of its own.
Therefore my suggestion to use a real 3.x kernel to exclude that risk
first of all.


Then, bisection the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

that here is bascially whats going on:

qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio read
len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
read len 3 gpa 0xa val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:  gva
0xa gpa 0xa Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio: mmio
unsatisfied-read len 1 gpa 0xa val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason
KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the
qemu-system-x86_64 trace is a bit compilcated, but
maybe this is already sufficient. otherwise i will of course gather this
info as well.

That's only tracing KVM event, and it's tracing when things went wrong
already. We may need a full trace (-e all) specifically for the period
when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing
garbage when this above starts. its basically booting from an empty
harddisk.

if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/qmp connection is just to short.
if i understand furhter correctly, it can only handle monitor connections
while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i
wrong here? the time spend in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.


my concern is not that the machine hangs, just the the hypervisor is
unresponsive
and its impossible to reset or quit gracefully. the only way to get the
hypervisor
ended is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the vm
via the monitor etc. If not, that's a bug.

what i observed just now is that the monitor is working up to the
point i try to quit the hypervisor or try to reset the cpu.

so we were looking at a completely wrong place...

it seems from this short excerpt that the deadlock appears not
on execution but when the vcpus shall be paused.

Program received signal SIGINT, Interrupt.
0x7fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/libpthread.so.0

(gdb) thread apply all bt

Thread 4 (Thread 
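
For context on the pause path implicated above, here is a generic C sketch of
the kind of stop/pause handshake a hypervisor uses (assumed names, not qemu's
actual code): the controlling thread waits on a condition variable until the
vcpu thread acknowledges the stop request, and if that acknowledgement never
comes it stays blocked in pthread_cond_wait(), which is where the backtrace
above was interrupted:

#include <pthread.h>
#include <stdbool.h>

struct vcpu {
    bool stop_requested;
    bool stopped;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t paused_cond = PTHREAD_COND_INITIALIZER;

/* called by the monitor/main thread when the vm should be paused or reset */
void pause_vcpu(struct vcpu *v)
{
    pthread_mutex_lock(&lock);
    v->stop_requested = true;
    /* a real hypervisor would also kick the vcpu out of KVM_RUN here */
    while (!v->stopped)
        pthread_cond_wait(&paused_cond, &lock);  /* hangs if never signalled */
    pthread_mutex_unlock(&lock);
}

/* called by the vcpu thread between KVM_RUN iterations */
void vcpu_check_stop(struct vcpu *v)
{
    pthread_mutex_lock(&lock);
    if (v->stop_requested) {
        v->stopped = true;
        pthread_cond_broadcast(&paused_cond);
    }
    pthread_mutex_unlock(&lock);
}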

Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1

2012-06-27 Thread Jan Kiszka
On 2012-06-27 17:39, Peter Lieven wrote:
 Hi all,
 
 i debugged this further and found out that kvm-kmod-3.0 is working with
 qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is
 working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
 Has anyone a clue which new KVM feature could cause this if a vcpu is in
 an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?

Then, bisecting the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM related ones: trace-cmd
record -e all qemu-system-x86_64 ...).

Jan

[1] http://www.linux-kvm.org/page/Tracing

 ---
 
 Hi,
 
 we recently came across multiple VMs racing and stopping working. It
 seems to happen when the system is at 100% cpu.
 One way to reproduce this is:
 qemu-kvm-1.0.1 with vnc-thread enabled
 
 cmdline (or similar):
 /usr/bin/qemu-kvm-1.0.1 -net
 tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net
 nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive
 format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native
 -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor
 tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name
 02-debug-race -boot order=dc,menu=off -cdrom
 /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile
 /var/run/qemu/vm-221.pid -mem-prealloc -cpu
 host,+x2apic,model_id=Intel(R) Xeon(R) CPU   L5640  @
 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus
 
 it is important that the attached virtio image contains only zeroes. if
 the system boots from cd, select boot from first harddisk.
 the hypervisor then hangs at 100% cpu and neither monitor nor qmp are
 responsive anymore.
 
 i have also seen customers reporting this when a VM is shut down.
 
 if this is connected to the threaded vnc server it might be important to
 connected at this time.
 
 debug backtrace attached.
 
 Thanks,
 Peter
 
 -- 
 
 (gdb) file /usr/bin/qemu-kvm-1.0.1
 Reading symbols from /usr/bin/qemu-kvm-1.0.1...done.
 (gdb) attach 5145
 Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145
 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
 found)...done.
 Loaded symbols for /lib64/ld-linux-x86-64.so.2
 [Thread debugging using libthread_db enabled]
 [New Thread 0x7f54d08b9700 (LWP 5253)]
 [New Thread 0x7f5552757700 (LWP 5152)]
 [New Thread 0x7f5552f58700 (LWP 5151)]
 0x7f5553c6b5a3 in select () from /lib/libc.so.6
 (gdb) info threads
   4 Thread 0x7f5552f58700 (LWP 5151)  0x7f5553c6a747 in ioctl ()
 from /lib/libc.so.6
   3 Thread 0x7f5552757700 (LWP 5152)  0x7f5553c6a747 in ioctl ()
 from /lib/libc.so.6
   2 Thread 0x7f54d08b9700 (LWP 5253)  0x7f5553f1a85c in
 pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
 * 1 Thread 0x7f50d700 (LWP 5145)  0x7f5553c6b5a3 in select ()
 from /lib/libc.so.6
 (gdb) thread apply all bt
 
 Thread 4 (Thread 0x7f5552f58700 (LWP 5151)):
 #0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
 #1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10,
 type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
 #2  0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at
 /usr/src/qemu-kvm-1.0.1/kvm-all.c:987
 #3  0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f5557652f10) at
 /usr/src/qemu-kvm-1.0.1/cpus.c:740
 #4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
 #5  0x7f5553c72cdd in clone () from /lib/libc.so.6
 #6  0x in ?? ()
 
 Thread 3 (Thread 0x7f5552757700 (LWP 5152)):
 #0  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
 #1  0x7f727830 in kvm_vcpu_ioctl (env=0x7f555766ae60,
 type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
 #2  0x7f72728a in kvm_cpu_exec (env=0x7f555766ae60) at
 /usr/src/qemu-kvm-1.0.1/kvm-all.c:987
 #3  0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f555766ae60) at
 /usr/src/qemu-kvm-1.0.1/cpus.c:740
 #4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
 #5  0x7f5553c72cdd in clone () from /lib/libc.so.6
 #6  0x in ?? ()
 
 Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)):
 #0  0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from
 /lib/libpthread.so.0
 #1  0x7f679f5d in qemu_cond_wait (cond=0x7f5557ede1e0,
 mutex=0x7f5557ede210) at qemu-thread-posix.c:113
 #2  0x7f6b06a1 in vnc_worker_thread_loop (queue=0x7f5557ede1e0)
 at ui/vnc-jobs-async.c:222
 #3  0x7f6b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at
 ui/vnc-jobs-async.c:318
 #4  0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
 #5  0x7f5553c72cdd in clone () from /lib/libc.so.6
 #6  0x in ?? ()
 
 Thread 1 (Thread