Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-22 Thread Peter Lieven

On 08/21/12 10:23, Stefan Hajnoczi wrote:
> On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka  wrote:
>> On 2012-08-19 11:42, Avi Kivity wrote:
>>> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>>>> is currently possible that env->stop is evaluated before we called into
>>>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>>>> signal without properly processing its reason (stop).
>>>>
>>>> Should not be required (TM): Both signal eating / stop checking and stop
>>>> setting / signal generation happens under the BQL, thus the ordering
>>>> must not make a difference here.
>>>
>>> Agree.
>>>
>>>> Don't see where we could lose a signal. Maybe due to a subtle memory
>>>> corruption that sets thread_kicked to non-zero, preventing the kicking
>>>> this way.
>>>
>>> Cannot be ruled out, yet too much of a coincidence.
>>>
>>> Could be a kernel bug (either in kvm or elsewhere), we've had several
>>> before in this area.
>>>
>>> Is this reproducible?
>>
>> Not for me. Stefan only hit it very rarely, Peter obviously more easily.
>
> I have only hit this once and was not able to reproduce it.

For me it was very reproducible, but my issue was fixed by:

http://www.mail-archive.com/kvm@vger.kernel.org/msg70908.html

Never seen this since then,
Peter

> Stefan





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-21 Thread Stefan Hajnoczi
On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka  wrote:
> On 2012-08-19 11:42, Avi Kivity wrote:
>> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>>
>>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>>> is currently possible that env->stop is evaluated before we called into
>>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>>> signal without properly processing its reason (stop).
>>>
>>> Should not be required (TM): Both signal eating / stop checking and stop
>>> setting / signal generation happens under the BQL, thus the ordering
>>> must not make a difference here.
>>
>> Agree.
>>
>>
>>> Don't see where we could lose a signal. Maybe due to a subtle memory
>>> corruption that sets thread_kicked to non-zero, preventing the kicking
>>> this way.
>>
>> Cannot be ruled out, yet too much of a coincidence.
>>
>> Could be a kernel bug (either in kvm or elsewhere), we've had several
>> before in this area.
>>
>> Is this reproducible?
>
> Not for me. Stefan only hit it very rarely, Peter obviously more easily.

I have only hit this once and was not able to reproduce it.

Stefan



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-21 Thread Jan Kiszka
On 2012-08-19 11:42, Avi Kivity wrote:
> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>  
>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>> is currently possible that env->stop is evaluated before we called into
>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>> signal without properly processing its reason (stop).
>>
>> Should not be required (TM): Both signal eating / stop checking and stop
>> setting / signal generation happens under the BQL, thus the ordering
>> must not make a difference here.
> 
> Agree.
> 
> 
>> Don't see where we could lose a signal. Maybe due to a subtle memory
>> corruption that sets thread_kicked to non-zero, preventing the kicking
>> this way.
> 
> Cannot be ruled out, yet too much of a coincidence.
> 
> Could be a kernel bug (either in kvm or elsewhere), we've had several
> before in this area.
> 
> Is this reproducible?

Not for me. Stefan only hit it very rarely, Peter obviously more easily.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-19 Thread Avi Kivity
On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>  
>>> Can anyone imagine that such a barrier may actually be required? If it
>>> is currently possible that env->stop is evaluated before we called into
>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>> signal without properly processing its reason (stop).
> 
> Should not be required (TM): Both signal eating / stop checking and stop
> setting / signal generation happens under the BQL, thus the ordering
> must not make a difference here.

Agree.


> Don't see where we could lose a signal. Maybe due to a subtle memory
> corruption that sets thread_kicked to non-zero, preventing the kicking
> this way.

Cannot be ruled out, yet too much of a coincidence.

Could be a kernel bug (either in kvm or elsewhere), we've had several
before in this area.

Is this reproducible?

-- 
error compiling committee.c: too many arguments to function
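
The locking argument quoted above can be modelled in a few lines. This is only a sketch under assumptions: a pthread mutex stands in for the BQL, a boolean stands in for the pending SIG_IPI (real signals are delivered by the kernel, not under a userspace lock), and all names are hypothetical rather than QEMU code. It shows that if "set stop + kick" and "eat kick + check stop" are each one critical section, a kick can never be eaten while stop is still unset.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* models the BQL */
static bool stop_flag;      /* models env->stop */
static bool kick_pending;   /* models a queued kick signal */

/* Requester side: flag and kick are published in one critical section. */
void request_stop_and_kick(void)
{
    pthread_mutex_lock(&big_lock);
    stop_flag = true;
    kick_pending = true;
    pthread_mutex_unlock(&big_lock);
}

/* vcpu side: eat the kick and check the flag in one critical section.
 * Returns true only if a kick was eaten while stop was not visible. */
bool eat_kick_lost_stop(void)
{
    bool lost = false;

    pthread_mutex_lock(&big_lock);
    if (kick_pending) {
        kick_pending = false;
        if (stop_flag) {
            stop_flag = false;   /* stop request handled */
        } else {
            lost = true;         /* kick eaten without its reason */
        }
    }
    pthread_mutex_unlock(&big_lock);
    return lost;
}

static void *requester(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        request_stop_and_kick();
    }
    return NULL;
}

/* Runs both sides concurrently and counts lost stop requests. */
int count_lost_stops(int iterations)
{
    pthread_t tid;
    int lost = 0;

    if (pthread_create(&tid, NULL, requester, &iterations) != 0) {
        return -1;
    }
    for (int i = 0; i < iterations; i++) {
        if (eat_kick_lost_stop()) {
            lost++;
        }
    }
    pthread_join(tid, NULL);
    return lost;
}
```

In this model count_lost_stops() stays at zero for any iteration count, which matches the "should not be required" reasoning; it says nothing about kernel-side bugs or memory corruption, which the model cannot capture.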



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-17 16:41, Jan Kiszka wrote:
> On 2012-08-17 16:36, Jan Kiszka wrote:
>> On 2012-08-17 15:11, Jan Kiszka wrote:
>>> On 2012-08-06 17:11, Stefan Hajnoczi wrote:
>>>> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven  wrote:
>>>>> i debugged my initial problem further and found out that the problem happens
>>>>> to be that
>>>>> the main thread is stuck in pause_all_vcpus() on reset or quit commands in
>>>>> the monitor
>>>>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>>>>> condition from while (ret == 0)
>>>>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>>>>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
>>>>
>>>> I think I'm hitting something similar.  I installed a F17 amd64 guest
>>>> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
>>>> The guest seemed unresponsive so I switched to the monitor, which also
>>>> froze shortly afterwards.  The VNC screen ended up being all black.
>>>>
>>>> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
>>>> Linux 3.2.0-3-amd64 from Debian testing
>>>>
>>>> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
>>>> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
>>>>
>>>> (gdb) thread apply all bt
>>>>
>>>> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
>>>> #0  0x7f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
>>>> #1  0x7f80137b92c9 in kvm_vcpu_ioctl
>>>> (env=env@entry=0x7f8015b49640, type=type@entry=44672)
>>>> at /home/stefanha/qemu-kvm/kvm-all.c:1619
>>>> #2  0x7f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
>>>> at /home/stefanha/qemu-kvm/kvm-all.c:1506
>>>> #3  0x7f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
>>>> at /home/stefanha/qemu-kvm/cpus.c:756
>>>> #4  0x7f800fb4db50 in start_thread (arg=) at
>>>> pthread_create.c:304
>>>> #5  0x7f800f8986dd in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>>> #6  0x in ?? ()
>>>>
>>>> This vcpu is still executing guest code and I've seen it successfully
>>>> dispatching I/O.  The problem is it's missing the exit_request...
>>>>
>>>> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>>> #1  0x7f801372b229 in qemu_cond_wait (cond=,
>>>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>>> #2  0x7f8013766eff in qemu_kvm_wait_io_event (env=)
>>>> at /home/stefanha/qemu-kvm/cpus.c:724
>>>> #3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
>>>> /home/stefanha/qemu-kvm/cpus.c:761
>>>> #4  0x7f800fb4db50 in start_thread (arg=) at
>>>> pthread_create.c:304
>>>> #5  0x7f800f8986dd in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>>> #6  0x in ?? ()
>>>>
>>>> No problems here.
>>>>
>>>> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>>> #1  0x7f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
>>>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>>> #2  0x7f8013768949 in pause_all_vcpus () at
>>>> /home/stefanha/qemu-kvm/cpus.c:962
>>>> #3  0x7f80136028c8 in main (argc=, argv=,
>>>> envp=) at /home/stefanha/qemu-kvm/vl.c:3695
>>>>
>>>> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
>>>> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>>>>
>>>> Here are the vcpus:
>>>>
>>>> (gdb) p first_cpu
>>>> $6 = (struct CPUX86State *) 0x7f8015b49640
>>>> (gdb) p first_cpu->next_cpu
>>>> $7 = (struct CPUX86State *) 0x7f8015b67450
>>>> (gdb) p first_cpu->next_cpu->next_cpu
>>>> $8 = (struct CPUX86State *) 0x0
>>>>
>>>> (gdb) p first_cpu->stop
>>>> $9 = 1
>>>> (gdb) p first_cpu->stopped
>>>> $10 = 0
>>>> (gdb) p first_cpu->exit_request
>>>> $11 = 0
>>>
>>> CPUState::exit_request is only set on specific synchronous events, see
>>> target-i386/kvm.c.
>>>
>>> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
>>> will skip the kicking via a signal. Maybe there is some race. Let me
>>> think about such possibilities again...
>>
>> diff --git a/cpus.c b/cpus.c
>> index e476a3c..30f3228 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>>  }
>>  
>>  qemu_kvm_eat_signals(env);
>> +/* Ensure that checking env->stop cannot overtake signal processing so
>> + * that we lose the latter without stopping. */
>> +smp_rmb();
> 
> rmb is nonsense. Should be a plain barrier() - if at all.
> 
>>  qemu_wait_io_event_common(env);
>>  }
>>  
>> Can anyone imagine that such a barrier may actually be required? If it
>> is currently possible that env->stop is evaluated before we called into
>> sigtimedwait in qemu_kvm_eat_signals, then w

Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-17 16:36, Jan Kiszka wrote:
> On 2012-08-17 15:11, Jan Kiszka wrote:
>> On 2012-08-06 17:11, Stefan Hajnoczi wrote:
>>> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven  wrote:
>>>> i debugged my initial problem further and found out that the problem happens
>>>> to be that
>>>> the main thread is stuck in pause_all_vcpus() on reset or quit commands in
>>>> the monitor
>>>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>>>> condition from while (ret == 0)
>>>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>>>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
>>>
>>> I think I'm hitting something similar.  I installed a F17 amd64 guest
>>> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
>>> The guest seemed unresponsive so I switched to the monitor, which also
>>> froze shortly afterwards.  The VNC screen ended up being all black.
>>>
>>> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
>>> Linux 3.2.0-3-amd64 from Debian testing
>>>
>>> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
>>> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
>>>
>>> (gdb) thread apply all bt
>>>
>>> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
>>> #0  0x7f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
>>> #1  0x7f80137b92c9 in kvm_vcpu_ioctl
>>> (env=env@entry=0x7f8015b49640, type=type@entry=44672)
>>> at /home/stefanha/qemu-kvm/kvm-all.c:1619
>>> #2  0x7f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
>>> at /home/stefanha/qemu-kvm/kvm-all.c:1506
>>> #3  0x7f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
>>> at /home/stefanha/qemu-kvm/cpus.c:756
>>> #4  0x7f800fb4db50 in start_thread (arg=) at
>>> pthread_create.c:304
>>> #5  0x7f800f8986dd in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #6  0x in ?? ()
>>>
>>> This vcpu is still executing guest code and I've seen it successfully
>>> dispatching I/O.  The problem is it's missing the exit_request...
>>>
>>> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
>>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>> #1  0x7f801372b229 in qemu_cond_wait (cond=,
>>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>> #2  0x7f8013766eff in qemu_kvm_wait_io_event (env=)
>>> at /home/stefanha/qemu-kvm/cpus.c:724
>>> #3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
>>> /home/stefanha/qemu-kvm/cpus.c:761
>>> #4  0x7f800fb4db50 in start_thread (arg=) at
>>> pthread_create.c:304
>>> #5  0x7f800f8986dd in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #6  0x in ?? ()
>>>
>>> No problems here.
>>>
>>> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
>>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>> #1  0x7f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
>>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>> #2  0x7f8013768949 in pause_all_vcpus () at
>>> /home/stefanha/qemu-kvm/cpus.c:962
>>> #3  0x7f80136028c8 in main (argc=, argv=,
>>> envp=) at /home/stefanha/qemu-kvm/vl.c:3695
>>>
>>> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
>>> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>>>
>>> Here are the vcpus:
>>>
>>> (gdb) p first_cpu
>>> $6 = (struct CPUX86State *) 0x7f8015b49640
>>> (gdb) p first_cpu->next_cpu
>>> $7 = (struct CPUX86State *) 0x7f8015b67450
>>> (gdb) p first_cpu->next_cpu->next_cpu
>>> $8 = (struct CPUX86State *) 0x0
>>>
>>> (gdb) p first_cpu->stop
>>> $9 = 1
>>> (gdb) p first_cpu->stopped
>>> $10 = 0
>>> (gdb) p first_cpu->exit_request
>>> $11 = 0
>>
>> CPUState::exit_request is only set on specific synchronous events, see
>> target-i386/kvm.c.
>>
>> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
>> will skip the kicking via a signal. Maybe there is some race. Let me
>> think about such possibilities again...
> 
> diff --git a/cpus.c b/cpus.c
> index e476a3c..30f3228 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>  }
>  
>  qemu_kvm_eat_signals(env);
> +/* Ensure that checking env->stop cannot overtake signal processing so
> + * that we lose the latter without stopping. */
> +smp_rmb();

rmb is nonsense. Should be a plain barrier() - if at all.

>  qemu_wait_io_event_common(env);
>  }
>  
> Can anyone imagine that such a barrier may actually be required? If it
> is currently possible that env->stop is evaluated before we called into
> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
> signal without properly processing its reason (stop).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Compete
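
For context on the barrier() versus smp_rmb() remark: a plain compiler barrier only stops the compiler from caching or reordering memory accesses across it and emits no CPU instruction, whereas a read memory barrier also orders loads at the CPU level. A minimal sketch of the compiler-only variant follows, assuming a GNU-compatible compiler (GCC or Clang); the macro and function names are hypothetical and not taken from the QEMU tree.

```c
/* Compiler-only barrier: the empty asm with a "memory" clobber tells the
 * compiler that memory may have changed, so values cached in registers
 * must be reloaded afterwards. No fence instruction is generated. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int stop_flag;   /* deliberately not volatile */

void request_stop(void)
{
    stop_flag = 1;
}

int check_stop_after_barrier(void)
{
    compiler_barrier();   /* stop_flag is re-read from memory below */
    return stop_flag;
}
```

Whether such a barrier is needed at that point in qemu_kvm_wait_io_event() is exactly what the thread questions; the sketch only illustrates what the primitive does.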

Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-17 15:11, Jan Kiszka wrote:
> On 2012-08-06 17:11, Stefan Hajnoczi wrote:
>> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven  wrote:
>>> i debugged my initial problem further and found out that the problem happens
>>> to be that
>>> the main thread is stuck in pause_all_vcpus() on reset or quit commands in
>>> the monitor
>>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>>> condition from while (ret == 0)
>>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
>>
>> I think I'm hitting something similar.  I installed a F17 amd64 guest
>> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
>> The guest seemed unresponsive so I switched to the monitor, which also
>> froze shortly afterwards.  The VNC screen ended up being all black.
>>
>> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
>> Linux 3.2.0-3-amd64 from Debian testing
>>
>> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
>> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
>>
>> (gdb) thread apply all bt
>>
>> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
>> #0  0x7f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
>> #1  0x7f80137b92c9 in kvm_vcpu_ioctl
>> (env=env@entry=0x7f8015b49640, type=type@entry=44672)
>> at /home/stefanha/qemu-kvm/kvm-all.c:1619
>> #2  0x7f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
>> at /home/stefanha/qemu-kvm/kvm-all.c:1506
>> #3  0x7f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
>> at /home/stefanha/qemu-kvm/cpus.c:756
>> #4  0x7f800fb4db50 in start_thread (arg=) at
>> pthread_create.c:304
>> #5  0x7f800f8986dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #6  0x in ?? ()
>>
>> This vcpu is still executing guest code and I've seen it successfully
>> dispatching I/O.  The problem is it's missing the exit_request...
>>
>> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1  0x7f801372b229 in qemu_cond_wait (cond=,
>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>> #2  0x7f8013766eff in qemu_kvm_wait_io_event (env=)
>> at /home/stefanha/qemu-kvm/cpus.c:724
>> #3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
>> /home/stefanha/qemu-kvm/cpus.c:761
>> #4  0x7f800fb4db50 in start_thread (arg=) at
>> pthread_create.c:304
>> #5  0x7f800f8986dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #6  0x in ?? ()
>>
>> No problems here.
>>
>> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1  0x7f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
>> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>> #2  0x7f8013768949 in pause_all_vcpus () at
>> /home/stefanha/qemu-kvm/cpus.c:962
>> #3  0x7f80136028c8 in main (argc=, argv=,
>> envp=) at /home/stefanha/qemu-kvm/vl.c:3695
>>
>> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
>> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>>
>> Here are the vcpus:
>>
>> (gdb) p first_cpu
>> $6 = (struct CPUX86State *) 0x7f8015b49640
>> (gdb) p first_cpu->next_cpu
>> $7 = (struct CPUX86State *) 0x7f8015b67450
>> (gdb) p first_cpu->next_cpu->next_cpu
>> $8 = (struct CPUX86State *) 0x0
>>
>> (gdb) p first_cpu->stop
>> $9 = 1
>> (gdb) p first_cpu->stopped
>> $10 = 0
>> (gdb) p first_cpu->exit_request
>> $11 = 0
> 
> CPUState::exit_request is only set on specific synchronous events, see
> target-i386/kvm.c.
> 
> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
> will skip the kicking via a signal. Maybe there is some race. Let me
> think about such possibilities again...

diff --git a/cpus.c b/cpus.c
index e476a3c..30f3228 100644
--- a/cpus.c
+++ b/cpus.c
@@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
 }
 
 qemu_kvm_eat_signals(env);
+/* Ensure that checking env->stop cannot overtake signal processing so
+ * that we lose the latter without stopping. */
+smp_rmb();
 qemu_wait_io_event_common(env);
 }
 
Can anyone imagine that such a barrier may actually be required? If it
is currently possible that env->stop is evaluated before we called into
sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
signal without properly processing its reason (stop).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
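
As background for the question above, "eating" signals means draining already-pending, blocked signals with a zero-timeout sigtimedwait() so they are consumed without running a handler. The following is a minimal sketch of that technique, assuming the signal is blocked in the calling thread; drain_pending_signal() is a hypothetical helper and not the body of qemu_kvm_eat_signals().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Consume every pending instance of signo without blocking.
 * The caller must have signo blocked. Returns how many were eaten. */
int drain_pending_signal(int signo)
{
    sigset_t set;
    struct timespec zero = { 0, 0 };   /* poll, never wait */
    int eaten = 0;

    assert(signo > 0);
    sigemptyset(&set);
    sigaddset(&set, signo);

    for (;;) {
        int r = sigtimedwait(&set, NULL, &zero);
        if (r == signo) {
            eaten++;                   /* one pending signal consumed */
            continue;
        }
        if (r == -1 && errno == EINTR) {
            continue;                  /* interrupted by another signal */
        }
        break;                         /* EAGAIN: nothing left pending */
    }
    return eaten;
}
```

Because the signal is consumed here, any state that explains why it was sent (such as a stop flag) has to be checked afterwards, which is why the ordering of the two steps is being discussed.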



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-17 Thread Jan Kiszka
On 2012-08-06 17:11, Stefan Hajnoczi wrote:
> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven  wrote:
>> i debugged my initial problem further and found out that the problem happens
>> to be that
>> the main thread is stuck in pause_all_vcpus() on reset or quit commands in
>> the monitor
>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>> condition from while (ret == 0)
>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
> 
> I think I'm hitting something similar.  I installed a F17 amd64 guest
> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
> The guest seemed unresponsive so I switched to the monitor, which also
> froze shortly afterwards.  The VNC screen ended up being all black.
> 
> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
> Linux 3.2.0-3-amd64 from Debian testing
> 
> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
> 
> (gdb) thread apply all bt
> 
> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
> #0  0x7f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
> #1  0x7f80137b92c9 in kvm_vcpu_ioctl
> (env=env@entry=0x7f8015b49640, type=type@entry=44672)
> at /home/stefanha/qemu-kvm/kvm-all.c:1619
> #2  0x7f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
> at /home/stefanha/qemu-kvm/kvm-all.c:1506
> #3  0x7f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
> at /home/stefanha/qemu-kvm/cpus.c:756
> #4  0x7f800fb4db50 in start_thread (arg=) at
> pthread_create.c:304
> #5  0x7f800f8986dd in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #6  0x in ?? ()
> 
> This vcpu is still executing guest code and I've seen it successfully
> dispatching I/O.  The problem is it's missing the exit_request...
> 
> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> #1  0x7f801372b229 in qemu_cond_wait (cond=,
> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
> #2  0x7f8013766eff in qemu_kvm_wait_io_event (env=)
> at /home/stefanha/qemu-kvm/cpus.c:724
> #3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
> /home/stefanha/qemu-kvm/cpus.c:761
> #4  0x7f800fb4db50 in start_thread (arg=) at
> pthread_create.c:304
> #5  0x7f800f8986dd in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #6  0x in ?? ()
> 
> No problems here.
> 
> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> #1  0x7f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
> mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
> #2  0x7f8013768949 in pause_all_vcpus () at
> /home/stefanha/qemu-kvm/cpus.c:962
> #3  0x7f80136028c8 in main (argc=, argv=,
> envp=) at /home/stefanha/qemu-kvm/vl.c:3695
> 
> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
> 
> Here are the vcpus:
> 
> (gdb) p first_cpu
> $6 = (struct CPUX86State *) 0x7f8015b49640
> (gdb) p first_cpu->next_cpu
> $7 = (struct CPUX86State *) 0x7f8015b67450
> (gdb) p first_cpu->next_cpu->next_cpu
> $8 = (struct CPUX86State *) 0x0
> 
> (gdb) p first_cpu->stop
> $9 = 1
> (gdb) p first_cpu->stopped
> $10 = 0
> (gdb) p first_cpu->exit_request
> $11 = 0

CPUState::exit_request is only set on specific synchronous events, see
target-i386/kvm.c.

More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
will skip the kicking via a signal. Maybe there is some race. Let me
think about such possibilities again...

Jan

> 
> :(
> 
> This isn't easy to reproduce.  I tried entering the GRUB boot menu
> again and there was no deadlock.
> 
> Stefan
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
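
The thread_kicked concern can be illustrated with a small model. This is a sketch under assumptions, not the QEMU implementation: a counter stands in for the actual signal delivery, and the struct and function names are hypothetical. It shows how a flag that is set without a signal in flight (for example through memory corruption) would suppress every later kick.

```c
#include <stdbool.h>

struct vcpu_kick_state {
    bool thread_kicked;   /* a kick is believed to be in flight */
    int signals_sent;     /* stands in for sending the kick signal */
};

/* Send a kick unless one is already believed pending.
 * Returns true if a signal was actually sent. */
bool kick_vcpu(struct vcpu_kick_state *s)
{
    if (s->thread_kicked) {
        return false;     /* skipped: relies on the earlier kick */
    }
    s->signals_sent++;
    s->thread_kicked = true;
    return true;
}

/* Called by the vcpu thread once it has processed the kick. */
void kick_consumed(struct vcpu_kick_state *s)
{
    s->thread_kicked = false;
}
```

With a state initialised as { true, 0 }, kick_vcpu() never sends anything until kick_consumed() runs, and kick_consumed() only runs on the vcpu thread, which in the reported hang is still in guest mode.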



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-08-06 Thread Stefan Hajnoczi
On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven  wrote:
> i debugged my initial problem further and found out that the problem happens
> to be that
> the main thread is stuck in pause_all_vcpus() on reset or quit commands in
> the monitor
> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
> condition from while (ret == 0)
> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
> "Quit" command seems to work, but on "Reset" the VM enters pause state.

I think I'm hitting something similar.  I installed a F17 amd64 guest
(3.5 kernel) but before booting entered the GRUB boot menu edit mode.
The guest seemed unresponsive so I switched to the monitor, which also
froze shortly afterwards.  The VNC screen ended up being all black.

qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
Linux 3.2.0-3-amd64 from Debian testing

$ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
if=virtio,cache=none,file=f17.img,aio=native -serial stdio

(gdb) thread apply all bt

Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
#0  0x7f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x7f80137b92c9 in kvm_vcpu_ioctl
(env=env@entry=0x7f8015b49640, type=type@entry=44672)
at /home/stefanha/qemu-kvm/kvm-all.c:1619
#2  0x7f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
at /home/stefanha/qemu-kvm/kvm-all.c:1506
#3  0x7f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
at /home/stefanha/qemu-kvm/cpus.c:756
#4  0x7f800fb4db50 in start_thread (arg=) at
pthread_create.c:304
#5  0x7f800f8986dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x in ?? ()

This vcpu is still executing guest code and I've seen it successfully
dispatching I/O.  The problem is it's missing the exit_request...

Thread 2 (Thread 0x7f8008622700 (LWP 368)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x7f801372b229 in qemu_cond_wait (cond=,
mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
#2  0x7f8013766eff in qemu_kvm_wait_io_event (env=)
at /home/stefanha/qemu-kvm/cpus.c:724
#3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
/home/stefanha/qemu-kvm/cpus.c:761
#4  0x7f800fb4db50 in start_thread (arg=) at
pthread_create.c:304
#5  0x7f800f8986dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x in ?? ()

No problems here.

Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x7f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
#2  0x7f8013768949 in pause_all_vcpus () at
/home/stefanha/qemu-kvm/cpus.c:962
#3  0x7f80136028c8 in main (argc=, argv=,
envp=) at /home/stefanha/qemu-kvm/vl.c:3695

We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.

Here are the vcpus:

(gdb) p first_cpu
$6 = (struct CPUX86State *) 0x7f8015b49640
(gdb) p first_cpu->next_cpu
$7 = (struct CPUX86State *) 0x7f8015b67450
(gdb) p first_cpu->next_cpu->next_cpu
$8 = (struct CPUX86State *) 0x0

(gdb) p first_cpu->stop
$9 = 1
(gdb) p first_cpu->stopped
$10 = 0
(gdb) p first_cpu->exit_request
$11 = 0

:(

This isn't easy to reproduce.  I tried entering the GRUB boot menu
again and there was no deadlock.

Stefan
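
The state shown in the gdb session can be expressed as a small check. This is a simplified sketch, assuming that the waiter in pause_all_vcpus() sleeps until every vcpu reports itself stopped; the struct below is a hypothetical stand-in that keeps only the fields printed above, and the value of the second vcpu's stopped field in the usage note is an assumption, since it was not printed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-vcpu state; field names follow the gdb output. */
struct vcpu {
    int stop;               /* pause has been requested */
    int stopped;            /* vcpu thread has acknowledged the pause */
    int exit_request;
    struct vcpu *next_cpu;  /* singly linked list, NULL-terminated */
};

/* True once every vcpu in the list has acknowledged the stop request. */
bool all_vcpus_stopped(const struct vcpu *first_cpu)
{
    for (const struct vcpu *c = first_cpu; c != NULL; c = c->next_cpu) {
        if (!c->stopped) {
            return false;
        }
    }
    return true;
}
```

With first_cpu at stop=1, stopped=0 the check stays false no matter what the second vcpu reports, so a waiter that sleeps on a condition variable until it becomes true never wakes up unless vcpu #0 leaves guest mode.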



Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-02 Thread Peter Lieven

On 02.07.2012 09:05, Jan Kiszka wrote:
> On 2012-07-01 21:18, Peter Lieven wrote:
>> On 01.07.2012 at 10:19, Avi Kivity wrote:
>>> On 06/28/2012 10:27 PM, Peter Lieven wrote:
>>>> On 28.06.2012 at 18:32, Avi Kivity wrote:
>>>>> On 06/28/2012 07:29 PM, Peter Lieven wrote:
>>>>>>> Yes. A signal is sent, and KVM returns from the guest to userspace on
>>>>>>> pending signals.
>>>>>>
>>>>>> is there a description available how this process exactly works?
>>>>>
>>>>> The kernel part is in vcpu_enter_guest(), see the check for
>>>>> signal_pending().  But this hasn't seen changes for quite a long while.
>>>>
>>>> Thank you, i will have a look. I noticed a few patches that were submitted
>>>> during the last year, maybe one of them is related:
>>>>
>>>> Switch SIG_IPI to SIGUSR1
>>>> Fix signal handling of SIG_IPI when io-thread is enabled
>>>>
>>>> In the first commit there is mentioned a "32-on-64-bit Linux kernel bug"
>>>> is there any reference to that?
>>>
>>> http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
>>> running 32-on-64?
>>
>> I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik, the
>> isolinux loader where I see the race is 32-bit although it is a 64-bit Ubuntu LTS
>> CD image. The second case where I have seen the race is on shutdown of a
>> Windows 2000 Server which is also 32-bit.
>
> "32-on-64" particularly means using a 32-bit QEMU[-kvm] binary on a
> 64-bit host kernel. What does "file qemu-system-x86_64" report about yours?

It's a custom build on 64-bit Linux, built as a 64-bit application. I will try to
continue today to find out what's going wrong. Any help or hints appreciated ;-)

Thanks,
Peter

> Jan






Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-02 Thread Jan Kiszka
On 2012-07-01 21:18, Peter Lieven wrote:
> 
> On 01.07.2012 at 10:19, Avi Kivity wrote:
> 
>> On 06/28/2012 10:27 PM, Peter Lieven wrote:
>>>
>>> On 28.06.2012 at 18:32, Avi Kivity wrote:
>>>
>>>> On 06/28/2012 07:29 PM, Peter Lieven wrote:
>>>>>> Yes. A signal is sent, and KVM returns from the guest to userspace on
>>>>>> pending signals.
>>>>
>>>>> is there a description available how this process exactly works?
>>>>
>>>> The kernel part is in vcpu_enter_guest(), see the check for
>>>> signal_pending().  But this hasn't seen changes for quite a long while.
>>>
>>> Thank you, i will have a look. I noticed a few patches that were submitted
>>> during the last year, maybe one of them is related:
>>>
>>> Switch SIG_IPI to SIGUSR1
>>> Fix signal handling of SIG_IPI when io-thread is enabled
>>>
>>> In the first commit there is mentioned a "32-on-64-bit Linux kernel bug"
>>> is there any reference to that?
>>
>>
>> http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
>> running 32-on-64?
> 
> I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik, the
> isolinux loader where I see the race is 32-bit although it is a 64-bit Ubuntu LTS
> CD image. The second case where I have seen the race is on shutdown of a
> Windows 2000 Server which is also 32-bit.

"32-on-64" particularly means using a 32-bit QEMU[-kvm] binary on a
64-bit host kernel. What does "file qemu-system-x86_64" report about yours?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
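
What "file" reports for the 32-bit versus 64-bit question is determined by the class byte in the ELF identification header. A minimal sketch of the same check follows, assuming a Linux/glibc system where <elf.h> is available; the function names are hypothetical helpers for illustration.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 32 or 64 for a valid ELF identification block, 0 otherwise. */
int elf_class_bits(const unsigned char *ident, size_t len)
{
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0) {
        return 0;                      /* too short or not an ELF file */
    }
    switch (ident[EI_CLASS]) {
    case ELFCLASS32:
        return 32;
    case ELFCLASS64:
        return 64;
    default:
        return 0;
    }
}

/* Reads the identification bytes of a file and classifies it.
 * Returns 0 if the file cannot be read or is not ELF. */
int elf_class_of_file(const char *path)
{
    unsigned char ident[EI_NIDENT];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL) {
        return 0;
    }
    n = fread(ident, 1, sizeof(ident), f);
    fclose(f);
    return elf_class_bits(ident, n);
}
```

Calling elf_class_of_file() on the qemu-system-x86_64 binary would answer the same question as the file command: a result of 32 on a 64-bit host kernel is the "32-on-64" case.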





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-01 Thread Peter Lieven

On 01.07.2012 at 10:19, Avi Kivity wrote:

> On 06/28/2012 10:27 PM, Peter Lieven wrote:
>> 
>> On 28.06.2012 at 18:32, Avi Kivity wrote:
>> 
>>> On 06/28/2012 07:29 PM, Peter Lieven wrote:
>>>>> Yes. A signal is sent, and KVM returns from the guest to userspace on
>>>>> pending signals.
>>> 
>>>> is there a description available how this process exactly works?
>>> 
>>> The kernel part is in vcpu_enter_guest(), see the check for
>>> signal_pending().  But this hasn't seen changes for quite a long while.
>> 
>> Thank you, i will have a look. I noticed a few patches that were submitted
>> during the last year, maybe one of them is related:
>> 
>> Switch SIG_IPI to SIGUSR1
>> Fix signal handling of SIG_IPI when io-thread is enabled
>> 
>> In the first commit there is mentioned a "32-on-64-bit Linux kernel bug"
>> is there any reference to that?
> 
> 
> http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
> running 32-on-64?

I think the issue occurs when running a 32-bit guest on a 64-bit system. Afaik, the
isolinux loader where I see the race is 32-bit although it is a 64-bit Ubuntu LTS
CD image. The second case where I have seen the race is on shutdown of a
Windows 2000 Server which is also 32-bit.

Peter

> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> 
> 




Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-07-01 Thread Avi Kivity
On 06/28/2012 10:27 PM, Peter Lieven wrote:
> 
> Am 28.06.2012 um 18:32 schrieb Avi Kivity:
> 
>> On 06/28/2012 07:29 PM, Peter Lieven wrote:
 Yes. A signal is sent, and KVM returns from the guest to userspace on
 pending signals.
>>
>>> is there a description available how this process exactly works?
>>
>> The kernel part is in vcpu_enter_guest(), see the check for
>> signal_pending().  But this hasn't seen changes for quite a long while.
> 
> Thank you, i will have a look. I noticed a few patches that were submitted
> during the last year, maybe one of them is related:
> 
> Switch SIG_IPI to SIGUSR1
> Fix signal handling of SIG_IPI when io-thread is enabled
> 
> In the first commit there is mentioned a "32-on-64-bit Linux kernel bug"
> is there any reference to that?


http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
running 32-on-64?


-- 
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

Am 28.06.2012 um 18:32 schrieb Avi Kivity:

> On 06/28/2012 07:29 PM, Peter Lieven wrote:
>>> Yes. A signal is sent, and KVM returns from the guest to userspace on
>>> pending signals.
> 
>> is there a description available how this process exactly works?
> 
> The kernel part is in vcpu_enter_guest(), see the check for
> signal_pending().  But this hasn't seen changes for quite a long while.

Thank you, i will have a look. I noticed a few patches that were submitted
during the last year, maybe one of them is related:

Switch SIG_IPI to SIGUSR1
Fix signal handling of SIG_IPI when io-thread is enabled

In the first commit there is mentioned a "32-on-64-bit Linux kernel bug"
is there any reference to that?

Thank you,
Peter




Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Avi Kivity
On 06/28/2012 07:29 PM, Peter Lieven wrote:
>> Yes. A signal is sent, and KVM returns from the guest to userspace on
>> pending signals.

> is there a description available how this process exactly works?

The kernel part is in vcpu_enter_guest(), see the check for
signal_pending().  But this hasn't seen changes for quite a long while.
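
The kick mechanism can be illustrated in plain userspace C, without KVM.
What follows is only a minimal sketch under stated assumptions: sigsuspend()
stands in for the KVM_RUN ioctl, SIGUSR1 stands in for the kick signal, and
kick_handler(), vcpu_thread() and kick_demo() are hypothetical names that do
not appear in QEMU. The signal stays blocked except while the thread "runs
the guest", so a kick sent before the thread gets there remains pending
instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Filled in by the worker thread, read by kick_demo() after the join. */
static int wait_result;
static int wait_errno;

/* Empty handler: its only job is to make the blocking call return. */
static void kick_handler(int sig)
{
    (void)sig;
}

/* Hypothetical stand-in for a vcpu thread. */
static void *vcpu_thread(void *arg)
{
    sigset_t run_mask;

    (void)arg;
    /* Mask used while "in the guest": the current mask minus SIGUSR1. */
    pthread_sigmask(SIG_SETMASK, NULL, &run_mask);
    sigdelset(&run_mask, SIGUSR1);

    /* Atomically unblock SIGUSR1 and wait; stand-in for KVM_RUN. */
    wait_result = sigsuspend(&run_mask);
    wait_errno = errno;
    return NULL;
}

/* Returns 0 if the kick forced the blocking call to return with EINTR. */
int kick_demo(void)
{
    struct sigaction sa;
    sigset_t block;
    pthread_t tid;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    /* Block SIGUSR1 before creating the thread so it inherits the mask. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    rc = pthread_sigmask(SIG_BLOCK, &block, NULL);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, vcpu_thread, NULL);
    assert(rc == 0);

    /* The kick: may arrive before or during sigsuspend(), both are fine. */
    rc = pthread_kill(tid, SIGUSR1);
    assert(rc == 0);

    pthread_join(tid, NULL);
    (void)rc;
    return (wait_result == -1 && wait_errno == EINTR) ? 0 : 1;
}
```

Calling kick_demo() from a small main() and checking for a return value of 0
is enough to try this out. If the signal were left unblocked all the time, a
kick arriving just before the blocking call would run the handler too early
and the thread would then wait with nothing left to wake it.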

-- 
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

On 28.06.2012 17:22, Jan Kiszka wrote:
> On 2012-06-28 17:02, Peter Lieven wrote:
>> On 28.06.2012 15:25, Jan Kiszka wrote:
>>> On 2012-06-28 15:05, Peter Lieven wrote:
>>>> Hi,
>>>>
>>>> i debugged my initial problem further and found out that the problem
>>>> happens to be that
>>>> the main thread is stuck in pause_all_vcpus() on reset or quit commands
>>>> in the monitor
>>>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>>>> condition from while (ret == 0)
>>>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>>>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
>>> Before entering the wait loop in pause_all_vcpus, there are kicks sent
>>> to all vcpus. Now we need to find out why some of those kicks apparently
>>> don't reach the destination.
>> can you explain briefly what exactly these kicks do? do these kicks lead
>> to leaving the kernel mode and returning to userspace?
>
> Yes. A signal is sent, and KVM returns from the guest to userspace on
> pending signals.

is there a description available how this process exactly works?

thanks
peter




Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Jan Kiszka
On 2012-06-28 17:02, Peter Lieven wrote:
> On 28.06.2012 15:25, Jan Kiszka wrote:
>> On 2012-06-28 15:05, Peter Lieven wrote:
>>> Hi,
>>>
>>> i debugged my initial problem further and found out that the problem
>>> happens to be that
>>> the main thread is stuck in pause_all_vcpus() on reset or quit commands
>>> in the monitor
>>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>>> condition from while (ret == 0)
>>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
>> Before entering the wait loop in pause_all_vcpus, there are kicks sent
>> to all vcpus. Now we need to find out why some of those kicks apparently
>> don't reach the destination.
> can you explain briefly what exactly these kicks do? do these kicks lead
> to leaving the kernel mode and returning to userspace?

Yes. A signal is sent, and KVM returns from the guest to userspace on
pending signals.

>> Again:
>>   - on which host kernels does this occur, and which change may have
>> changed it?
> I do not see it in 3.0.0 and have also not seen it in 2.6.38. both
> the mainline 64-bit ubuntu-server kernels (for natty / oneiric 
> respectively).
> If I compile a more recent kvm-kmod 3.3 or 3.4 on these machines,
> it is no longer working.

I was asking for kernel 3.3 or 3.4 without kvm-kmod.

>>   - with which qemu-kvm version is it reproducible, and which commit
>> introduced or fixed it?
> qemu-kvm-1.0.1 from sourceforge. to get into the scenario it
> is not sufficient to boot from an empty harddisk. to reproduce

Please also try qemu-kvm git to see if something fixed it there.

> i have to use a live cd like ubuntu-server 12.04 and choose to
> boot from the first harddisk. i think the isolinux loader does
> not check for a valid bootsector and just executes what is found
> in sector 0. this leads to the mmio reads i posted and 100%
> cpu load (most spent in kernel). at that time the monitor/qmp
> is still responsive. if i send a command that pauses all vcpus,
> the first cpu is looping in kvm_cpu_exec and the main thread
> is waiting. at that time the monitor stops responding.
> i have also seen this issue on very old windows 2000 servers
> where the system fails to power off and is just halted. maybe
> this is also a busy loop.
> 
> i will try to bisect this asap and let you know, maybe the above
> info helps you already to reproduce.

OK, thanks,

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux





Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Peter Lieven

On 28.06.2012 15:25, Jan Kiszka wrote:
> On 2012-06-28 15:05, Peter Lieven wrote:
>> Hi,
>>
>> i debugged my initial problem further and found out that the problem
>> happens to be that
>> the main thread is stuck in pause_all_vcpus() on reset or quit commands
>> in the monitor
>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>> condition from while (ret == 0)
>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
>
> Before entering the wait loop in pause_all_vcpus, there are kicks sent
> to all vcpus. Now we need to find out why some of those kicks apparently
> don't reach the destination.

can you explain briefly what exactly these kicks do? do these kicks lead
to leaving the kernel mode and returning to userspace?

> Again:
>  - on which host kernels does this occur, and which change may have
>    changed it?

I do not see it in 3.0.0 and have also not seen it in 2.6.38. both
the mainline 64-bit ubuntu-server kernels (for natty / oneiric respectively).

If I compile a more recent kvm-kmod 3.3 or 3.4 on these machines,
it is no longer working.

>  - with which qemu-kvm version is it reproducible, and which commit
>    introduced or fixed it?

qemu-kvm-1.0.1 from sourceforge. to get into the scenario it
is not sufficient to boot from an empty harddisk. to reproduce
i have to use a live cd like ubuntu-server 12.04 and choose to
boot from the first harddisk. i think the isolinux loader does
not check for a valid bootsector and just executes what is found
in sector 0. this leads to the mmio reads i posted and 100%
cpu load (most spent in kernel). at that time the monitor/qmp
is still responsive. if i send a command that pauses all vcpus,
the first cpu is looping in kvm_cpu_exec and the main thread
is waiting. at that time the monitor stops responding.
i have also seen this issue on very old windows 2000 servers
where the system fails to power off and is just halted. maybe
this is also a busy loop.

i will try to bisect this asap and let you know, maybe the above
info helps you already to reproduce.

thanks,
peter

> I failed reproducing so far.
>
> Jan






Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

2012-06-28 Thread Jan Kiszka
On 2012-06-28 15:05, Peter Lieven wrote:
> Hi,
> 
> i debugged my initial problem further and found out that the problem
> happens to be that
> the main thread is stuck in pause_all_vcpus() on reset or quit commands
> in the monitor
> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
> condition from while (ret == 0)
> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
> "Quit" command seems to work, but on "Reset" the VM enters pause state.

Before entering the wait loop in pause_all_vcpus, there are kicks sent
to all vcpus. Now we need to find out why some of those kicks apparently
don't reach the destination.
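
The handshake described here (set the stop request, kick, then wait for the
vcpu to acknowledge) can be sketched roughly as below. This is a simplified
model and not QEMU code: struct vcpu, run_guest_once(), cpu_exec(),
pause_vcpu() and pause_demo() are hypothetical names, an atomic flag stands
in for the kick signal, a mutex stands in for the BQL, and the vcpu thread
simply exits once it has acknowledged the stop request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one vcpu; not the QEMU data structure. */
struct vcpu {
    atomic_bool kicked;   /* stand-in for a pending kick signal */
    bool stop;            /* pause request, protected by big_lock */
    bool stopped;         /* acknowledgement, protected by big_lock */
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pause_cond = PTHREAD_COND_INITIALIZER;

/* One "guest entry": 0 means re-enter, nonzero means back to userspace. */
static int run_guest_once(struct vcpu *cpu)
{
    return atomic_exchange(&cpu->kicked, false) ? -1 : 0;
}

/* Models the do-while loop: it only ends when a guest entry is interrupted. */
static void cpu_exec(struct vcpu *cpu)
{
    int ret;

    do {
        ret = run_guest_once(cpu);
    } while (ret == 0);
}

static void *vcpu_thread(void *arg)
{
    struct vcpu *cpu = arg;

    for (;;) {
        cpu_exec(cpu);                      /* runs without the lock */
        pthread_mutex_lock(&big_lock);
        if (cpu->stop) {
            cpu->stopped = true;            /* acknowledge the pause */
            pthread_cond_signal(&pause_cond);
            pthread_mutex_unlock(&big_lock);
            return NULL;
        }
        pthread_mutex_unlock(&big_lock);
    }
}

/* Models the pause path: request stop, kick, wait for the acknowledgement. */
static void pause_vcpu(struct vcpu *cpu)
{
    pthread_mutex_lock(&big_lock);
    cpu->stop = true;
    atomic_store(&cpu->kicked, true);       /* the kick */
    while (!cpu->stopped)
        pthread_cond_wait(&pause_cond, &big_lock);
    pthread_mutex_unlock(&big_lock);
}

/* Returns 0 if the vcpu acknowledged the pause request. */
int pause_demo(void)
{
    struct vcpu cpu;
    pthread_t tid;
    int rc;

    atomic_init(&cpu.kicked, false);
    cpu.stop = false;
    cpu.stopped = false;

    rc = pthread_create(&tid, NULL, vcpu_thread, &cpu);
    assert(rc == 0);
    (void)rc;

    pause_vcpu(&cpu);
    pthread_join(tid, NULL);
    return cpu.stopped ? 0 : 1;
}
```

In this model, a kick that never reaches run_guest_once() leaves cpu_exec()
spinning and pause_vcpu() waiting forever, which matches the reported hang.
Testing the stop flag in the loop condition as well would let the loop end
without the kick, but it would not explain why the kick went missing.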

Again:
 - on which host kernels does this occur, and which change may have
   changed it?
 - with which qemu-kvm version is it reproducible, and which commit
   introduced or fixed it?

I failed reproducing so far.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux