Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-22 Thread Peter Lieven

On 08/21/12 10:23, Stefan Hajnoczi wrote:

On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka jan.kis...@siemens.com wrote:

On 2012-08-19 11:42, Avi Kivity wrote:

On 08/17/2012 06:04 PM, Jan Kiszka wrote:

Can anyone imagine that such a barrier may actually be required? If it
is currently possible that env->stop is evaluated before we called into
sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
signal without properly processing its reason (stop).

Should not be required (TM): Both signal eating / stop checking and stop
setting / signal generation happens under the BQL, thus the ordering
must not make a difference here.

Agree.



Don't see where we could lose a signal. Maybe due to a subtle memory
corruption that sets thread_kicked to non-zero, preventing the kicking
this way.

Cannot be ruled out, yet too much of a coincidence.

Could be a kernel bug (either in kvm or elsewhere), we've had several
before in this area.

Is this reproducible?

Not for me. Stefan only hit it very rarely, Peter obviously more easily.

I have only hit this once and was not able to reproduce it.

For me it was very reproducible, but my issue was fixed by:

http://www.mail-archive.com/kvm@vger.kernel.org/msg70908.html

Never seen this since then,
Peter


Stefan





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-21 Thread Jan Kiszka
On 2012-08-19 11:42, Avi Kivity wrote:
 On 08/17/2012 06:04 PM, Jan Kiszka wrote:
  
 Can anyone imagine that such a barrier may actually be required? If it
 is currently possible that env->stop is evaluated before we called into
 sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
 signal without properly processing its reason (stop).

 Should not be required (TM): Both signal eating / stop checking and stop
 setting / signal generation happens under the BQL, thus the ordering
 must not make a difference here.
 
 Agree.
 
 
 Don't see where we could lose a signal. Maybe due to a subtle memory
 corruption that sets thread_kicked to non-zero, preventing the kicking
 this way.
 
 Cannot be ruled out, yet too much of a coincidence.
 
 Could be a kernel bug (either in kvm or elsewhere), we've had several
 before in this area.
 
 Is this reproducible?

Not for me. Stefan only hit it very rarely, Peter obviously more easily.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-21 Thread Stefan Hajnoczi
On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-08-19 11:42, Avi Kivity wrote:
 On 08/17/2012 06:04 PM, Jan Kiszka wrote:

 Can anyone imagine that such a barrier may actually be required? If it
 is currently possible that env->stop is evaluated before we called into
 sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
 signal without properly processing its reason (stop).

 Should not be required (TM): Both signal eating / stop checking and stop
 setting / signal generation happens under the BQL, thus the ordering
 must not make a difference here.

 Agree.


 Don't see where we could lose a signal. Maybe due to a subtle memory
 corruption that sets thread_kicked to non-zero, preventing the kicking
 this way.

 Cannot be ruled out, yet too much of a coincidence.

 Could be a kernel bug (either in kvm or elsewhere), we've had several
 before in this area.

 Is this reproducible?

 Not for me. Stefan only hit it very rarely, Peter obviously more easily.

I have only hit this once and was not able to reproduce it.

Stefan



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-19 Thread Avi Kivity
On 08/17/2012 06:04 PM, Jan Kiszka wrote:
  
 Can anyone imagine that such a barrier may actually be required? If it
 is currently possible that env->stop is evaluated before we called into
 sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
 signal without properly processing its reason (stop).
 
 Should not be required (TM): Both signal eating / stop checking and stop
 setting / signal generation happens under the BQL, thus the ordering
 must not make a difference here.

Agree.


 Don't see where we could lose a signal. Maybe due to a subtle memory
 corruption that sets thread_kicked to non-zero, preventing the kicking
 this way.

Cannot be ruled out, yet too much of a coincidence.

Could be a kernel bug (either in kvm or elsewhere), we've had several
before in this area.

Is this reproducible?

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-06 17:11, Stefan Hajnoczi wrote:
 On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven p...@dlhnet.de wrote:
 i debugged my initial problem further and found out that the problem happens
 to be that
 the main thread is stuck in pause_all_vcpus() on reset or quit commands in
 the monitor
 if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
 condition from while (ret == 0)
 to while ((ret == 0) && !env->stop); it works, but is this the right fix?
 Quit command seems to work, but on Reset the VM enters pause state.
 
 I think I'm hitting something similar.  I installed a F17 amd64 guest
 (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
 The guest seemed unresponsive so I switched to the monitor, which also
 froze shortly afterwards.  The VNC screen ended up being all black.
 
 qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
 Linux 3.2.0-3-amd64 from Debian testing
 
 $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
 if=virtio,cache=none,file=f17.img,aio=native -serial stdio
 
 (gdb) thread apply all bt
 
 Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
 #0  0x7f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
 #1  0x7f80137b92c9 in kvm_vcpu_ioctl
 (env=env@entry=0x7f8015b49640, type=type@entry=44672)
 at /home/stefanha/qemu-kvm/kvm-all.c:1619
 #2  0x7f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
 at /home/stefanha/qemu-kvm/kvm-all.c:1506
 #3  0x7f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
 at /home/stefanha/qemu-kvm/cpus.c:756
 #4  0x7f800fb4db50 in start_thread (arg=<optimized out>) at
 pthread_create.c:304
 #5  0x7f800f8986dd in clone () at
 ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
 #6  0x in ?? ()
 
 This vcpu is still executing guest code and I've seen it successfully
 dispatching I/O.  The problem is it's missing the exit_request...
 
 Thread 2 (Thread 0x7f8008622700 (LWP 368)):
 #0  pthread_cond_wait@@GLIBC_2.3.2 ()
 at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
 #1  0x7f801372b229 in qemu_cond_wait (cond=<optimized out>,
 mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
 #2  0x7f8013766eff in qemu_kvm_wait_io_event (env=<optimized out>)
 at /home/stefanha/qemu-kvm/cpus.c:724
 #3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
 /home/stefanha/qemu-kvm/cpus.c:761
 #4  0x7f800fb4db50 in start_thread (arg=<optimized out>) at
 pthread_create.c:304
 #5  0x7f800f8986dd in clone () at
 ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
 #6  0x in ?? ()
 
 No problems here.
 
 Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
 #0  pthread_cond_wait@@GLIBC_2.3.2 ()
 at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
 #1  0x7f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
 mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
 #2  0x7f8013768949 in pause_all_vcpus () at
 /home/stefanha/qemu-kvm/cpus.c:962
 #3  0x7f80136028c8 in main (argc=<optimized out>, argv=<optimized out>,
 envp=<optimized out>) at /home/stefanha/qemu-kvm/vl.c:3695
 
 We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
 Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
 
 Here are the vcpus:
 
 (gdb) p first_cpu
 $6 = (struct CPUX86State *) 0x7f8015b49640
 (gdb) p first_cpu->next_cpu
 $7 = (struct CPUX86State *) 0x7f8015b67450
 (gdb) p first_cpu->next_cpu->next_cpu
 $8 = (struct CPUX86State *) 0x0
 
 (gdb) p first_cpu->stop
 $9 = 1
 (gdb) p first_cpu->stopped
 $10 = 0
 (gdb) p first_cpu->exit_request
 $11 = 0

CPUState::exit_request is only set on specific synchronous events, see
target-i386/kvm.c.

More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
will skip the kicking via a signal. Maybe there is some race. Let me
think about such possibilities again...

Jan
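
The thread_kicked behaviour described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not QEMU's
code: struct vcpu_sketch, kick() and consume_kick() are hypothetical names,
and a counter stands in for the signal a real kick would send to the vcpu
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vcpu state; not QEMU's CPUState. */
struct vcpu_sketch {
    bool thread_kicked;  /* a kick is already pending for this vcpu */
    int signals_sent;    /* counts kicks, in place of sending a real signal */
};

/* Send a kick unless one is already recorded as pending. */
static void kick(struct vcpu_sketch *cpu)
{
    assert(cpu != NULL);
    if (!cpu->thread_kicked) {
        cpu->signals_sent++;        /* real code would signal the thread here */
        cpu->thread_kicked = true;
    }
}

/* The vcpu side clears the flag once it has processed the kick. */
static void consume_kick(struct vcpu_sketch *cpu)
{
    assert(cpu != NULL);
    cpu->thread_kicked = false;
}
```

If the flag is ever left set without a signal actually pending (for example
through memory corruption), every later kick() is silently skipped, which
would match a vcpu that never leaves the guest.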

 
 :(
 
 This isn't easy to reproduce.  I tried entering the GRUB boot menu
 again and there was no deadlock.
 
 Stefan
 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-17 15:11, Jan Kiszka wrote:
 CPUState::exit_request is only set on specific synchronous events, see
 target-i386/kvm.c.
 
 More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
 will skip the kicking via a signal. Maybe there is some race. Let me
 think about such possibilities again...

diff --git a/cpus.c b/cpus.c
index e476a3c..30f3228 100644
--- a/cpus.c
+++ b/cpus.c
@@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
     }
 
     qemu_kvm_eat_signals(env);
+    /* Ensure that checking env->stop cannot overtake signal processing so
+     * that we lose the latter without stopping. */
+    smp_rmb();
     qemu_wait_io_event_common(env);
 }
 
Can anyone imagine that such a barrier may actually be required? If it
is currently possible that env->stop is evaluated before we called into
sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
signal without properly processing its reason (stop).
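
The ordering in question can be tried out in isolation with a small
single-threaded sketch. Everything here is an assumption made for
illustration: SIGUSR1 stands in for the kick signal, stop_requested for
env->stop, the function names are made up, and the barrier is a
compiler-only barrier written as GCC/Clang inline asm rather than the
smp_rmb() from the patch.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Stand-in for env->stop. */
static volatile sig_atomic_t stop_requested;

/* Consume all pending SIGUSR1 without blocking; returns how many were eaten. */
static int drain_kick_signals(void)
{
    sigset_t waitset;
    siginfo_t info;
    struct timespec zero = { 0, 0 };
    int eaten = 0;

    sigemptyset(&waitset);
    sigaddset(&waitset, SIGUSR1);
    /* With a zero timeout, sigtimedwait() returns -1 once nothing is pending. */
    while (sigtimedwait(&waitset, &info, &zero) == SIGUSR1) {
        eaten++;
    }
    return eaten;
}

/* Requester sets the reason, then kicks; the consumer eats the signal and
 * only afterwards looks at the reason. Returns true if the stop was seen. */
static bool kick_then_check(void)
{
    sigset_t block;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, NULL);   /* keep the signal pending */

    stop_requested = 1;                     /* reason first ... */
    raise(SIGUSR1);                         /* ... then the kick */

    int eaten = drain_kick_signals();
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
    return eaten == 1 && stop_requested;
}
```

In this single-threaded form the flag is always visible after the signal is
eaten; the sketch only shows the sequence being discussed, not a reproducer
for the hang.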

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-17 16:36, Jan Kiszka wrote:
 On 2012-08-17 15:11, Jan Kiszka wrote:
 CPUState::exit_request is only set on specific synchronous events, see
 target-i386/kvm.c.

 More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
 will skip the kicking via a signal. Maybe there is some race. Let me
 think about such possibilities again...
 
 diff --git a/cpus.c b/cpus.c
 index e476a3c..30f3228 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
  }
  
  qemu_kvm_eat_signals(env);
 +/* Ensure that checking env->stop cannot overtake signal processing so
 + * that we lose the latter without stopping. */
 +smp_rmb();

rmb is nonsense. Should be a plain barrier() - if at all.

  qemu_wait_io_event_common(env);
  }
  
 Can anyone imagine that such a barrier may actually be required? If it
 is currently possible that env->stop is evaluated before we called into
 sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
 signal without properly processing its reason (stop).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-17 16:41, Jan Kiszka wrote:
 On 2012-08-17 16:36, Jan Kiszka wrote:
 On 2012-08-17 15:11, Jan Kiszka wrote:
 CPUState::exit_request is only set on specific synchronous events, see
 target-i386/kvm.c.

 More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
 will skip the kicking via a signal. Maybe there is some race. Let me
 think about such possibilities again...

 diff --git a/cpus.c b/cpus.c
 index e476a3c..30f3228 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
  }
  
  qemu_kvm_eat_signals(env);
 +/* Ensure that checking env->stop cannot overtake signal processing so
 + * that we lose the latter without stopping. */
 +smp_rmb();
 
 rmb is nonsense. Should be a plain barrier() - if at all.
 
  qemu_wait_io_event_common(env);
  }
  
 Can anyone imagine that such a barrier may actually be required? If it
 is currently possible that env->stop is evaluated before we called into
 sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
 signal without properly processing its reason (stop).

Should not be required (TM): Both signal eating / stop checking and stop
setting / signal generation happen under the BQL, thus the ordering
must not make a difference here.

Don't see where we could lose a signal. Maybe due to a subtle memory
corruption that sets thread_kicked to non-zero, preventing the kicking
this way.

Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-06 Thread Stefan Hajnoczi
On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven p...@dlhnet.de wrote:
 i debugged my initial problem further and found out that the problem happens
 to be that
 the main thread is stuck in pause_all_vcpus() on reset or quit commands in
 the monitor
 if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
 condition from while (ret == 0)
 to while ((ret == 0) && !env->stop); it works, but is this the right fix?
 Quit command seems to work, but on Reset the VM enters pause state.

I think I'm hitting something similar.  I installed a F17 amd64 guest
(3.5 kernel) but before booting entered the GRUB boot menu edit mode.
The guest seemed unresponsive so I switched to the monitor, which also
froze shortly afterwards.  The VNC screen ended up being all black.

qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
Linux 3.2.0-3-amd64 from Debian testing

$ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
if=virtio,cache=none,file=f17.img,aio=native -serial stdio

(gdb) thread apply all bt

Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
#0  0x7f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x7f80137b92c9 in kvm_vcpu_ioctl
(env=env@entry=0x7f8015b49640, type=type@entry=44672)
at /home/stefanha/qemu-kvm/kvm-all.c:1619
#2  0x7f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
at /home/stefanha/qemu-kvm/kvm-all.c:1506
#3  0x7f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
at /home/stefanha/qemu-kvm/cpus.c:756
#4  0x7f800fb4db50 in start_thread (arg=<optimized out>) at
pthread_create.c:304
#5  0x7f800f8986dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x in ?? ()

This vcpu is still executing guest code and I've seen it successfully
dispatching I/O.  The problem is it's missing the exit_request...

Thread 2 (Thread 0x7f8008622700 (LWP 368)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x7f801372b229 in qemu_cond_wait (cond=<optimized out>,
mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
#2  0x7f8013766eff in qemu_kvm_wait_io_event (env=<optimized out>)
at /home/stefanha/qemu-kvm/cpus.c:724
#3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
/home/stefanha/qemu-kvm/cpus.c:761
#4  0x7f800fb4db50 in start_thread (arg=<optimized out>) at
pthread_create.c:304
#5  0x7f800f8986dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x in ?? ()

No problems here.

Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x7f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
#2  0x7f8013768949 in pause_all_vcpus () at
/home/stefanha/qemu-kvm/cpus.c:962
#3  0x7f80136028c8 in main (argc=<optimized out>, argv=<optimized out>,
envp=<optimized out>) at /home/stefanha/qemu-kvm/vl.c:3695

We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.

Here are the vcpus:

(gdb) p first_cpu
$6 = (struct CPUX86State *) 0x7f8015b49640
(gdb) p first_cpu->next_cpu
$7 = (struct CPUX86State *) 0x7f8015b67450
(gdb) p first_cpu->next_cpu->next_cpu
$8 = (struct CPUX86State *) 0x0

(gdb) p first_cpu->stop
$9 = 1
(gdb) p first_cpu->stopped
$10 = 0
(gdb) p first_cpu->exit_request
$11 = 0

:(

This isn't easy to reproduce.  I tried entering the GRUB boot menu
again and there was no deadlock.

Stefan



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-02 Thread Jan Kiszka
On 2012-07-01 21:18, Peter Lieven wrote:
 
 Am 01.07.2012 um 10:19 schrieb Avi Kivity:
 
 On 06/28/2012 10:27 PM, Peter Lieven wrote:

 Am 28.06.2012 um 18:32 schrieb Avi Kivity:

 On 06/28/2012 07:29 PM, Peter Lieven wrote:
 Yes. A signal is sent, and KVM returns from the guest to userspace on
 pending signals.

 is there a description available how this process exactly works?

 The kernel part is in vcpu_enter_guest(), see the check for
 signal_pending().  But this hasn't seen changes for quite a long while.

 Thank you, I will have a look. I noticed a few patches that were submitted
 during the last year, maybe one of them is related:

 Switch SIG_IPI to SIGUSR1
 Fix signal handling of SIG_IPI when io-thread is enabled

 In the first commit there is mentioned a 32-on-64-bit Linux kernel bug
 is there any reference to that?


 http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
 running 32-on-64?
 
 I think the issue occurs when running a 32-bit guest on a 64-bit system.
 Afaik, the isolinux loader where I see the race is 32-bit although it is a
 64-bit Ubuntu LTS CD image. The second case where I have seen the race is on
 shutdown of a Windows 2000 Server, which is also 32-bit.

32-on-64 particularly means using a 32-bit QEMU[-kvm] binary on a
64-bit host kernel. What does file qemu-system-x86_64 report about yours?
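
What file reports for this question comes down to the ELF class byte in the
binary's header. As a rough sketch, assuming an ELF binary, the same check
can be written in C; elf_class_bits() is a hypothetical helper name, not
part of QEMU or of file(1).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads the first bytes of an open stream and returns 32 or 64 for an ELF
 * image of that class, or 0 if the data is not recognisable as ELF. */
static int elf_class_bits(FILE *stream)
{
    unsigned char ident[5];

    assert(stream != NULL);
    if (fread(ident, 1, sizeof(ident), stream) != sizeof(ident)) {
        return 0;
    }
    if (memcmp(ident, "\177ELF", 4) != 0) {
        return 0;                 /* missing ELF magic */
    }
    if (ident[4] == 1) {
        return 32;                /* EI_CLASS == ELFCLASS32 */
    }
    if (ident[4] == 2) {
        return 64;                /* EI_CLASS == ELFCLASS64 */
    }
    return 0;
}
```

Opening the qemu-system-x86_64 binary with fopen(path, "rb") and passing the
stream in would return 32 for the "32-on-64" case meant here, when the host
kernel is 64-bit.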

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-02 Thread Peter Lieven

On 02.07.2012 09:05, Jan Kiszka wrote:

On 2012-07-01 21:18, Peter Lieven wrote:

Am 01.07.2012 um 10:19 schrieb Avi Kivity:


On 06/28/2012 10:27 PM, Peter Lieven wrote:

Am 28.06.2012 um 18:32 schrieb Avi Kivity:


On 06/28/2012 07:29 PM, Peter Lieven wrote:

Yes. A signal is sent, and KVM returns from the guest to userspace on
pending signals.

is there a description available how this process exactly works?

The kernel part is in vcpu_enter_guest(), see the check for
signal_pending().  But this hasn't seen changes for quite a long while.

Thank you, I will have a look. I noticed a few patches that were submitted
during the last year, maybe one of them is related:

Switch SIG_IPI to SIGUSR1
Fix signal handling of SIG_IPI when io-thread is enabled

In the first commit there is mentioned a 32-on-64-bit Linux kernel bug
is there any reference to that?


http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
running 32-on-64?

I think the issue occurs when running a 32-bit guest on a 64-bit system.
Afaik, the isolinux loader where I see the race is 32-bit although it is a
64-bit Ubuntu LTS CD image. The second case where I have seen the race is on
shutdown of a Windows 2000 Server, which is also 32-bit.

32-on-64 particularly means using a 32-bit QEMU[-kvm] binary on a
64-bit host kernel. What does file qemu-system-x86_64 report about yours?

It's a custom build on a 64-bit Linux as a 64-bit application. I will try to
continue to find out today what's going wrong. Any help or hints appreciated ;-)

Thanks,
Peter


Jan






Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-01 Thread Avi Kivity
On 06/28/2012 10:27 PM, Peter Lieven wrote:
 
 Am 28.06.2012 um 18:32 schrieb Avi Kivity:
 
 On 06/28/2012 07:29 PM, Peter Lieven wrote:
 Yes. A signal is sent, and KVM returns from the guest to userspace on
 pending signals.

 is there a description available how this process exactly works?

 The kernel part is in vcpu_enter_guest(), see the check for
 signal_pending().  But this hasn't seen changes for quite a long while.
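
From userspace, the effect of that signal_pending() check resembles any
blocking call that returns early with EINTR when a signal arrives. The
sketch below is only an analogy under stated assumptions: nanosleep() stands
in for the KVM_RUN ioctl, SIGALRM from a 10 ms timer stands in for the kick
signal, and the function names are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t kicked;

/* Handler only records that the kick arrived. */
static void on_kick(int sig)
{
    (void)sig;
    kicked = 1;
}

/* Returns 1 if the blocking call was cut short by the signal, 0 if it ran
 * to completion, -1 if the setup failed. */
static int blocking_call_is_interrupted(void)
{
    struct sigaction sa;
    struct itimerval timer;
    struct timespec long_sleep = { 5, 0 };   /* far longer than the timer */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_kick;                 /* no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0) {
        return -1;
    }

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 10000;          /* one-shot, 10 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0) {
        return -1;
    }

    int rc = nanosleep(&long_sleep, NULL);   /* "guest entry" stand-in */
    return rc == -1 && errno == EINTR && kicked;
}
```

If the signal never reaches the thread, the call simply keeps blocking,
which is the userspace view of a vcpu that stays in the guest.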
 
 Thank you, I will have a look. I noticed a few patches that were submitted
 during the last year, maybe one of them is related:
 
 Switch SIG_IPI to SIGUSR1
 Fix signal handling of SIG_IPI when io-thread is enabled
 
 In the first commit there is mentioned a 32-on-64-bit Linux kernel bug
 is there any reference to that?


http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
running 32-on-64?


-- 
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-01 Thread Peter Lieven

Am 01.07.2012 um 10:19 schrieb Avi Kivity:

 On 06/28/2012 10:27 PM, Peter Lieven wrote:
 
 Am 28.06.2012 um 18:32 schrieb Avi Kivity:
 
 On 06/28/2012 07:29 PM, Peter Lieven wrote:
 Yes. A signal is sent, and KVM returns from the guest to userspace on
 pending signals.
 
 is there a description available how this process exactly works?
 
 The kernel part is in vcpu_enter_guest(), see the check for
 signal_pending().  But this hasn't seen changes for quite a long while.
 
 Thank you, I will have a look. I noticed a few patches that were submitted
 during the last year, maybe one of them is related:
 
 Switch SIG_IPI to SIGUSR1
 Fix signal handling of SIG_IPI when io-thread is enabled
 
 In the first commit there is mentioned a 32-on-64-bit Linux kernel bug
 is there any reference to that?
 
 
 http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
 running 32-on-64?

I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik,
the isolinux loader where i see the race is 32-bit, although it is a 64-bit
ubuntu lts cd image. The second case where i have seen the race is on shutdown
of a Windows 2000 Server, which is also 32-bit.

Peter

 
 
 -- 
 error compiling committee.c: too many arguments to function
 
 




[Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

Hi,

i debugged my initial problem further and found out that the main thread is
stuck in pause_all_vcpus() on reset or quit commands in the monitor if one cpu
is stuck in the do-while loop in kvm_cpu_exec. If I modify the condition from

while (ret == 0)

to

while ((ret == 0) && !env->stop);

it works, but is this the right fix?
The quit command seems to work, but on reset the VM enters pause state.

Thanks,
Peter




Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Jan Kiszka
On 2012-06-28 15:05, Peter Lieven wrote:
 Hi,
 
 i debugged my initial problem further and found out that the problem
 happens to be that
 the main thread is stuck in pause_all_vcpus() on reset or quit commands
 in the monitor
 if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
 condition from while (ret == 0)
 to while ((ret == 0) && !env->stop); it works, but is this the right fix?
 Quit command seems to work, but on Reset the VM enters pause state.

Before entering the wait loop in pause_all_vcpus, there are kicks sent
to all vcpus. Now we need to find out why some of those kicks apparently
don't reach the destination.

Again:
 - on which host kernels does this occur, and which change may have
   changed it?
 - with which qemu-kvm version is it reproducible, and which commit
   introduced or fixed it?

I failed reproducing so far.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

On 28.06.2012 15:25, Jan Kiszka wrote:

On 2012-06-28 15:05, Peter Lieven wrote:

Hi,

i debugged my initial problem further and found out that the problem
happens to be that
the main thread is stuck in pause_all_vcpus() on reset or quit commands
in the monitor
if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
condition from while (ret == 0)
to while ((ret == 0) && !env->stop); it works, but is this the right fix?
Quit command seems to work, but on Reset the VM enters pause state.

Before entering the wait loop in pause_all_vcpus, there are kicks sent
to all vcpus. Now we need to find out why some of those kicks apparently
don't reach the destination.

can you explain briefly what exactly these kicks do? do these kicks lead
to leaving kernel mode and returning to userspace?

Again:
  - on which host kernels does this occur, and which change may have
changed it?

I do not see it in 3.0.0 and have also not seen it in 2.6.38. both are
the mainline 64-bit ubuntu-server kernels (for oneiric / natty
respectively).

If I compile a more recent kvm-kmod 3.3 or 3.4 on these machines,
it is no longer working.

  - with which qemu-kvm version is it reproducible, and which commit
introduced or fixed it?

qemu-kvm-1.0.1 from sourceforge. to get into the scenario it
is not sufficient to boot from an empty harddisk. to reproduce
i have to use a live cd like ubuntu-server 12.04 and choose to
boot from the first harddisk. i think the isolinux loader does
not check for a valid bootsector and just executes what is found
in sector 0. this leads to the mmio reads i posted and 100%
cpu load (most spent in kernel). at that time the monitor/qmp
is still responsive. if i send a command that pauses all vcpus,
the first cpu is looping in kvm_cpu_exec and the main thread
is waiting. at that time the monitor stops responding.
i have also seen this issue on very old windows 2000 servers
where the system fails to power off and is just halted. maybe
this is also a busy loop.

i will try to bisect this asap and let you know, maybe the above
info helps you already to reproduce.

thanks,
peter


I failed reproducing so far.

Jan






Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Jan Kiszka
On 2012-06-28 17:02, Peter Lieven wrote:
 On 28.06.2012 15:25, Jan Kiszka wrote:
 On 2012-06-28 15:05, Peter Lieven wrote:
 Hi,

 i debugged my initial problem further and found out that the problem
 happens to be that
 the main thread is stuck in pause_all_vcpus() on reset or quit commands
 in the monitor
 if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
 condition from while (ret == 0)
 to while ((ret == 0) && !env->stop); it works, but is this the right fix?
 Quit command seems to work, but on Reset the VM enters pause state.
 Before entering the wait loop in pause_all_vcpus, there are kicks sent
 to all vcpus. Now we need to find out why some of those kicks apparently
 don't reach the destination.
 can you explain briefly what exactly these kicks do? do these kicks lead
 to leaving kernel mode and returning to userspace?

Yes. A signal is sent, and KVM returns from the guest to userspace on
pending signals.
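
The kick mechanism can be illustrated in plain userspace C. This is a sketch under stated assumptions, not QEMU code: struct fake_vcpu, fake_vcpu_start() and pause_fake_vcpu() are hypothetical names, and a blocking read() on a pipe stands in for the thread sitting in the guest. The handler is installed without SA_RESTART so the blocked call returns EINTR when the thread is kicked with pthread_kill():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical model of one vcpu thread; not QEMU's data structure. */
struct fake_vcpu {
    pthread_t thread;
    int pipe_fd[2];        /* nothing is ever written, so read() blocks */
    atomic_bool stop;      /* pause request from the main thread */
    bool stopped;          /* acknowledgement, protected by lock */
    int saw_eintr;         /* set when the blocking call was interrupted */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static void kick_handler(int sig) { (void)sig; /* only interrupts the call */ }

static void *vcpu_thread(void *opaque)
{
    struct fake_vcpu *cpu = opaque;
    char byte;

    do {
        /* Stand-in for running the guest: blocks until a signal arrives. */
        if (read(cpu->pipe_fd[0], &byte, 1) < 0 && errno == EINTR) {
            cpu->saw_eintr = 1;
        }
    } while (!atomic_load(&cpu->stop));

    pthread_mutex_lock(&cpu->lock);
    cpu->stopped = true;
    pthread_cond_signal(&cpu->cond);
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

static int fake_vcpu_start(struct fake_vcpu *cpu)
{
    struct sigaction sa;

    assert(cpu != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = kick_handler;   /* sa_flags == 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(cpu->pipe_fd) != 0) {
        return -1;
    }
    atomic_init(&cpu->stop, false);
    cpu->stopped = false;
    cpu->saw_eintr = 0;
    pthread_mutex_init(&cpu->lock, NULL);
    pthread_cond_init(&cpu->cond, NULL);
    return pthread_create(&cpu->thread, NULL, vcpu_thread, cpu);
}

/* Set the stop flag, kick, then wait; re-kick on timeout. Returns kicks sent. */
static int pause_fake_vcpu(struct fake_vcpu *cpu)
{
    int kicks = 0;

    atomic_store(&cpu->stop, true);
    pthread_mutex_lock(&cpu->lock);
    while (!cpu->stopped) {
        struct timespec deadline;

        pthread_kill(cpu->thread, SIGUSR1);
        kicks++;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 10 * 1000 * 1000;          /* wait up to 10 ms */
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }
        pthread_cond_timedwait(&cpu->cond, &cpu->lock, &deadline);
    }
    pthread_mutex_unlock(&cpu->lock);
    return kicks;
}
```

The retry in pause_fake_vcpu() is a choice made for this demo: a kick that lands just before the thread enters read() would otherwise be lost and the waiter would hang, which is the kind of symptom discussed in this thread.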

 Again:
   - on which host kernels does this occur, and which change may have
 changed it?
 I do not see it in 3.0.0 and have also not seen it in 2.6.38. both are
 the mainline 64-bit ubuntu-server kernels (for oneiric / natty
 respectively).
 If I compile a more recent kvm-kmod 3.3 or 3.4 on these machines,
 it is no longer working.

I was asking for kernel 3.3 or 3.4 without kvm-kmod.

   - with which qemu-kvm version is it reproducible, and which commit
 introduced or fixed it?
 qemu-kvm-1.0.1 from sourceforge. to get into the scenario it
 is not sufficient to boot from an empty harddisk. to reproduce

Please also try qemu-kvm git to see if something fixed it there.

 i have to use a live cd like ubuntu-server 12.04 and choose to
 boot from the first harddisk. i think the isolinux loader does
 not check for a valid bootsector and just executes what is found
 in sector 0. this leads to the mmio reads i posted and 100%
 cpu load (most spent in kernel). at that time the monitor/qmp
 is still responsive. if i send a command that pauses all vcpus,
 the first cpu is looping in kvm_cpu_exec and the main thread
 is waiting. at that time the monitor stops responding.
 i have also seen this issue on very old windows 2000 servers
 where the system fails to power off and is just halted. maybe
 this is also a busy loop.
 
 i will try to bisect this asap and let you know, maybe the above
 info helps you already to reproduce.

OK, thanks,

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

On 28.06.2012 17:22, Jan Kiszka wrote:

On 2012-06-28 17:02, Peter Lieven wrote:

On 28.06.2012 15:25, Jan Kiszka wrote:

On 2012-06-28 15:05, Peter Lieven wrote:

Hi,

i debugged my initial problem further and found out that the problem
happens to be that
the main thread is stuck in pause_all_vcpus() on reset or quit commands
in the monitor
if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
condition from while (ret == 0)
to while ((ret == 0) && !env->stop); it works, but is this the right fix?
Quit command seems to work, but on Reset the VM enters pause state.

Before entering the wait loop in pause_all_vcpus, there are kicks sent
to all vcpus. Now we need to find out why some of those kicks apparently
don't reach the destination.

can you explain briefly what exactly these kicks do? do these kicks lead
to leaving kernel mode and returning to userspace?

Yes. A signal is sent, and KVM returns from the guest to userspace on
pending signals.

is there a description available how this process exactly works?

thanks
peter




Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Avi Kivity
On 06/28/2012 07:29 PM, Peter Lieven wrote:
 Yes. A signal is sent, and KVM returns from the guest to userspace on
 pending signals.

 is there a description available how this process exactly works?

The kernel part is in vcpu_enter_guest(), see the check for
signal_pending().  But this hasn't seen changes for quite a long while.

-- 
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

Am 28.06.2012 um 18:32 schrieb Avi Kivity:

 On 06/28/2012 07:29 PM, Peter Lieven wrote:
 Yes. A signal is sent, and KVM returns from the guest to userspace on
 pending signals.
 
 is there a description available how this process exactly works?
 
 The kernel part is in vcpu_enter_guest(), see the check for
 signal_pending().  But this hasn't seen changes for quite a long while.

Thank you, i will have a look. I noticed a few patches that were submitted
during the last year, maybe one of them is related:

Switch SIG_IPI to SIGUSR1
Fix signal handling of SIG_IPI when io-thread is enabled

The first commit mentions a 32-on-64-bit Linux kernel bug;
is there any reference to that?

Thank you,
Peter

 
 -- 
 error compiling committee.c: too many arguments to function