Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-28 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> @@ -369,6 +372,10 @@ static void *ap_main_loop(void *_env)
>  sigfillset(&signals);
>  sigprocmask(SIG_BLOCK, &signals, NULL);
>  kvm_create_vcpu(kvm_context, env->cpu_index);
> +pthread_mutex_lock(&vcpu_mutex);
> +vcpu->created = 1;
> +pthread_cond_signal(&qemu_vcpuup_cond);
> +pthread_mutex_unlock(&vcpu_mutex);
>  kvm_qemu_init_env(env);
>  kvm_main_loop_cpu(env);
>  return NULL;

Still no locking needed on any CPU we support.  The memory access is
atomic and that's all that counts here.  With the mutex taken the woken
thread immediately runs into a brick wall and has to be put to sleep again.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFIFm/w2ijCOnn/RHQRAmxRAKDHoem5zvdt6gSVdoX6vQoYPr106QCcC3IA
RzKcqRUNgUmUDzqnrkjHrAI=
=MNbq
-END PGP SIGNATURE-

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-28 Thread Anthony Liguori
Ryan Harper wrote:
> * Avi Kivity <[EMAIL PROTECTED]> [2008-04-26 02:23]:
>   
>> Please reuse qemu_mutex for this, no need for a new one.
>> 
>
> I'm having a little trouble wrapping my head around all of the locking
> here.  If I avoid qemu_mutex and use a new one, I've got everything
> working.  However, attemping to use qemu_mutex is stepping into a pile
> of locking crap.  I'm sure there is a good reason... 
>
> The current code looks like this:
>
> Thread1:
> main()
>   kvm_qemu_init() // mutex_lock()
>   machine->init()
>pc_init1()
> pc_new_cpu()
>  cpu_init()
>   cpu_x86_init()
>kvm_init_new_ap() // create vcpu Thread2
><--  
>   <--  
>  <--  
> <--  
><--  
>   <--  
>   kvm_main_loop() // mutex_unlock()
> 
> Thread2:
> ap_main_loop()
>  /* vcpu init */
>  kvm_main_loop_cpu()
>   kvm_main_loop_wait() // mutex_unlock() on enter, lock on exit
>kvm_eat_signals()   // mutex_lock() on enter, unlock on exit
><--
>   <--
>  <--
>   

The qemu_mutex is meant to ensure that the QEMU code is only ever 
entered by one thread at a time.  QEMU is not thread-safe so this is 
necessary.  It's a little odd because we're taking this lock initially 
by being called from QEMU code.  Here's the basic theory of locking AFAICT:

kvm_main_loop_cpu() is the main loop for each VCPU.  It must acquire the 
QEMU mutex since it will call into normal QEMU code to process events.  
Whenever a VCPU is allowed to run, or when the VCPU is idling, it needs 
to release the QEMU mutex.  The former is done via the post_kvm_run() 
and pre_kvm_run() hooks.  The later is done within kvm_main_loop_wait().

kvm_main_loop_wait() will release the QEMU mutex and call 
kvm_eat_signals() which calls kvm_eat_signal() which will issue a 
sigtimedwait().  This is where we actually idle (and why SIGIO is so 
important right now).  We don't want to idle with the QEMU mutex held as 
that may result in dead lock so this is why we release it here.

kvm_eat_signal() has to acquire the lock again in order to dispatch IO 
events (via kvm_process_signal()).

Regards,

Anthony Liguori

> It wedges up in kvm_init_new_ap() if I attempt acquire qemu_mutex.
> Quite obvious after I looked at the call trace and discovered
> kvm_qemu_init() locking on exit.  I see other various functions that
> unlock and then lock; I really don't want to wade into this mess...
> rather whomever cooked it up should do some cleanup.  I tried the
> unlock, then re-lock on exit in kvm_init_new_ap() but that also wedged.
>
> Here is a rework with a new flag in vcpu_info indicating vcpu
> creation.  Tested this with 64 1VCPU guests booting with 1 second delay,
> and single 16-way SMP guest boot.
>
>   


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-28 Thread Ryan Harper
* Avi Kivity <[EMAIL PROTECTED]> [2008-04-26 02:23]:
> 
> Please reuse qemu_mutex for this, no need for a new one.

I'm having a little trouble wrapping my head around all of the locking
here.  If I avoid qemu_mutex and use a new one, I've got everything
working.  However, attemping to use qemu_mutex is stepping into a pile
of locking crap.  I'm sure there is a good reason... 

The current code looks like this:

Thread1:
main()
  kvm_qemu_init() // mutex_lock()
  machine->init()
   pc_init1()
pc_new_cpu()
 cpu_init()
  cpu_x86_init()
   kvm_init_new_ap() // create vcpu Thread2
   <--  
  <--  
 <--  
<--  
   <--  
  <--  
  kvm_main_loop() // mutex_unlock()

Thread2:
ap_main_loop()
 /* vcpu init */
 kvm_main_loop_cpu()
  kvm_main_loop_wait() // mutex_unlock() on enter, lock on exit
   kvm_eat_signals()   // mutex_lock() on enter, unlock on exit
   <--
  <--
 <--


It wedges up in kvm_init_new_ap() if I attempt acquire qemu_mutex.
Quite obvious after I looked at the call trace and discovered
kvm_qemu_init() locking on exit.  I see other various functions that
unlock and then lock; I really don't want to wade into this mess...
rather whomever cooked it up should do some cleanup.  I tried the
unlock, then re-lock on exit in kvm_init_new_ap() but that also wedged.

Here is a rework with a new flag in vcpu_info indicating vcpu
creation.  Tested this with 64 1VCPU guests booting with 1 second delay,
and single 16-way SMP guest boot.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]


diffstat output:
 qemu-kvm.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

Signed-off-by: Ryan Harper <[EMAIL PROTECTED]>
---
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 78127de..2768ea5 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -31,7 +31,9 @@ extern int smp_cpus;
 static int qemu_kvm_reset_requested;
 
 pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t vcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
 pthread_cond_t qemu_aio_cond = PTHREAD_COND_INITIALIZER;
+pthread_cond_t qemu_vcpuup_cond = PTHREAD_COND_INITIALIZER;
 __thread struct vcpu_info *vcpu;
 
 struct qemu_kvm_signal_table {
@@ -53,6 +55,7 @@ struct vcpu_info {
 int stop;
 int stopped;
 int reload_regs;
+int created;
 } vcpu_info[256];
 
 pthread_t io_thread;
@@ -369,6 +372,10 @@ static void *ap_main_loop(void *_env)
 sigfillset(&signals);
 sigprocmask(SIG_BLOCK, &signals, NULL);
 kvm_create_vcpu(kvm_context, env->cpu_index);
+pthread_mutex_lock(&vcpu_mutex);
+vcpu->created = 1;
+pthread_cond_signal(&qemu_vcpuup_cond);
+pthread_mutex_unlock(&vcpu_mutex);
 kvm_qemu_init_env(env);
 kvm_main_loop_cpu(env);
 return NULL;
@@ -388,9 +395,12 @@ static void kvm_add_signal(struct qemu_kvm_signal_table 
*sigtab, int signum)
 
 void kvm_init_new_ap(int cpu, CPUState *env)
 {
+pthread_mutex_lock(&vcpu_mutex);
 pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
-/* FIXME: wait for thread to spin up */
-usleep(200);
+while (vcpu_info[cpu].created == 0) {
+pthread_cond_wait(&qemu_vcpuup_cond, &vcpu_mutex);
+}
+pthread_mutex_unlock(&vcpu_mutex);
 }
 
 static void qemu_kvm_init_signal_tables(void)

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-26 Thread Anthony Liguori
Ulrich Drepper wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Ryan Harper wrote:
>   
>> @@ -388,9 +395,10 @@ static void kvm_add_signal(struct qemu_kvm_signal_table 
>> *sigtab, int signum)
>>  
>>  void kvm_init_new_ap(int cpu, CPUState *env)
>>  {
>> +pthread_mutex_lock(&vcpu_mutex);
>>  pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
>> -/* FIXME: wait for thread to spin up */
>> -usleep(200);
>> +pthread_cond_wait(&qemu_vcpuup_cond, &vcpu_mutex);
>> +pthread_mutex_unlock(&vcpu_mutex);
>> 
>
> And something is very wrong here.  The pattern for using a condvar is
>
> 1 take mutex
>
> 2 check condition
>
> 3 if condition is not fulfilled
> 3a   call cond_wait
> 3b   when returning, go back to step 2
>
> 4 unlock mutex
>
>
> Anything else is buggy.
>
> So, either your condvar use is wrong or you don't really want a condvar
> in the first place.  I haven't checked the code.
>   

A flag is needed in the vcpu structure that indicates whether the vcpu 
spun up or not.  This is what the while () condition should be.  This is 
needed as the thread may spin up before it gets to the 
pthread_cond_wait() in which case the signal happens when noone is 
waiting on it.  The other reason a while () is usually needed is that 
cond_signal may not wake up the right thread so it's necessary to check 
whether you really have something to do.  Not really a problem here but 
the former race is.

A condvar is definitely the right thing to use here.

Regards,

Anthony Liguori

> - --
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.7 (GNU/Linux)
>
> iD8DBQFIE2CD2ijCOnn/RHQRAs4kAJ40kbWjNJAzj2gGdbo/sSxZTx5b0ACglbis
> kw7ST4eJK9CXhNbjKphNsUo=
> =ISaC
> -END PGP SIGNATURE-
>
> -
> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
> Don't miss this year's exciting event. There's still time to save $100. 
> Use priority code J8TL2D2. 
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> ___
> kvm-devel mailing list
> kvm-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>   


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-26 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Ryan Harper wrote:
> @@ -388,9 +395,10 @@ static void kvm_add_signal(struct qemu_kvm_signal_table 
> *sigtab, int signum)
>  
>  void kvm_init_new_ap(int cpu, CPUState *env)
>  {
> +pthread_mutex_lock(&vcpu_mutex);
>  pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
> -/* FIXME: wait for thread to spin up */
> -usleep(200);
> +pthread_cond_wait(&qemu_vcpuup_cond, &vcpu_mutex);
> +pthread_mutex_unlock(&vcpu_mutex);

And something is very wrong here.  The pattern for using a condvar is

1 take mutex

2 check condition

3 if condition is not fulfilled
3a   call cond_wait
3b   when returning, go back to step 2

4 unlock mutex


Anything else is buggy.

So, either your condvar use is wrong or you don't really want a condvar
in the first place.  I haven't checked the code.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFIE2CD2ijCOnn/RHQRAs4kAJ40kbWjNJAzj2gGdbo/sSxZTx5b0ACglbis
kw7ST4eJK9CXhNbjKphNsUo=
=ISaC
-END PGP SIGNATURE-

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-26 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Ryan Harper wrote:
> +/* block until cond_wait occurs */
> +pthread_mutex_lock(&vcpu_mutex);
> +/* now we can signal */
> +pthread_cond_signal(&qemu_vcpuup_cond);
> +pthread_mutex_unlock(&vcpu_mutex);

It is not necessary to take the mutex for a condvar if you're just
waking a waiter.  This just unnecessarily slows things down.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFIE19D2ijCOnn/RHQRAvLqAJ9j+7ICPHpB/gCthCelDgYn5poGMgCfaZVy
+tE5zOuxqlBMUR7Fgufw/wY=
=5j4s
-END PGP SIGNATURE-

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-26 Thread Avi Kivity
Ryan Harper wrote:
> There is a race between when the vcpu thread issues a create ioctl and when
> apic_reset() gets called resulting in getting a badfd error.
>
>   

The problem is indeed there, but the fix is wrong:

> main threadvcpu thread
> diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
> index 78127de..3513e8c 100644
> --- a/qemu/qemu-kvm.c
> +++ b/qemu/qemu-kvm.c
> @@ -31,7 +31,9 @@ extern int smp_cpus;
>  static int qemu_kvm_reset_requested;
>  
>  pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;
> +pthread_mutex_t vcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
>  pthread_cond_t qemu_aio_cond = PTHREAD_COND_INITIALIZER;
> +pthread_cond_t qemu_vcpuup_cond = PTHREAD_COND_INITIALIZER;
>  __thread struct vcpu_info *vcpu;
>  
>  struct qemu_kvm_signal_table {
> @@ -369,6 +371,11 @@ static void *ap_main_loop(void *_env)
>  sigfillset(&signals);
>  sigprocmask(SIG_BLOCK, &signals, NULL);
>  kvm_create_vcpu(kvm_context, env->cpu_index);
> +/* block until cond_wait occurs */
> +pthread_mutex_lock(&vcpu_mutex);
> +/* now we can signal */
> +pthread_cond_signal(&qemu_vcpuup_cond);
> +pthread_mutex_unlock(&vcpu_mutex);
>  kvm_qemu_init_env(env);
>  kvm_main_loop_cpu(env);
>  return NULL;
> @@ -388,9 +395,10 @@ static void kvm_add_signal(struct qemu_kvm_signal_table 
> *sigtab, int signum)
>  
>  void kvm_init_new_ap(int cpu, CPUState *env)
>  {
> +pthread_mutex_lock(&vcpu_mutex);
>  pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
> -/* FIXME: wait for thread to spin up */
> -usleep(200);
> +pthread_cond_wait(&qemu_vcpuup_cond, &vcpu_mutex);
>   

pthread_cond_wait() is never correct outside a loop.  The signal may 
arrive before wait is called.

The usual idiom is

while (condition is not fulfilled)
 pthread_cond_wait();

I see you have something there to ensure we block, but please use the 
right idiom.

> +pthread_mutex_unlock(&vcpu_mutex);
>  }
>  

Please reuse qemu_mutex for this, no need for a new one.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-25 Thread Ryan Harper
* Ryan Harper <[EMAIL PROTECTED]> [2008-04-26 00:27]:
> There is a race between when the vcpu thread issues a create ioctl and when
> apic_reset() gets called resulting in getting a badfd error.
> 
> main threadvcpu thread

guilt refresh clipped my text short.


main threadvcpu thread
------
qemu/hw/pc.c:pc_new_cpu()
cpu_init()
cpu_x86_init()
kvm_init_new_ap()  ap_main_loop()
  *blocks*
usleep()
apic_init()
kvm_set_lapic()
kvm_ioctl with unitilized context
badfd

To fix this, ensure we create the vcpu in the vcpu thread before returning from
kvm_init_new_ap.  Synchronize on a new mutux, vcpu_mutex, and wait for the
vcpuup condition before signaling to ensure the main thread is waiting before we
send the signal.

With this patch, I can launch 64 kvm guests, 1 second apart and not see any Bad
File descriptor errors.

Signed-off-by: Ryan Harper <[EMAIL PROTECTED]>


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH] Fix QEMU vcpu thread race with apic_reset

2008-04-25 Thread Ryan Harper
There is a race between when the vcpu thread issues a create ioctl and when
apic_reset() gets called resulting in getting a badfd error.

main threadvcpu thread
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 78127de..3513e8c 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -31,7 +31,9 @@ extern int smp_cpus;
 static int qemu_kvm_reset_requested;
 
 pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t vcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
 pthread_cond_t qemu_aio_cond = PTHREAD_COND_INITIALIZER;
+pthread_cond_t qemu_vcpuup_cond = PTHREAD_COND_INITIALIZER;
 __thread struct vcpu_info *vcpu;
 
 struct qemu_kvm_signal_table {
@@ -369,6 +371,11 @@ static void *ap_main_loop(void *_env)
 sigfillset(&signals);
 sigprocmask(SIG_BLOCK, &signals, NULL);
 kvm_create_vcpu(kvm_context, env->cpu_index);
+/* block until cond_wait occurs */
+pthread_mutex_lock(&vcpu_mutex);
+/* now we can signal */
+pthread_cond_signal(&qemu_vcpuup_cond);
+pthread_mutex_unlock(&vcpu_mutex);
 kvm_qemu_init_env(env);
 kvm_main_loop_cpu(env);
 return NULL;
@@ -388,9 +395,10 @@ static void kvm_add_signal(struct qemu_kvm_signal_table 
*sigtab, int signum)
 
 void kvm_init_new_ap(int cpu, CPUState *env)
 {
+pthread_mutex_lock(&vcpu_mutex);
 pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
-/* FIXME: wait for thread to spin up */
-usleep(200);
+pthread_cond_wait(&qemu_vcpuup_cond, &vcpu_mutex);
+pthread_mutex_unlock(&vcpu_mutex);
 }
 
 static void qemu_kvm_init_signal_tables(void)

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel