Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution

2016-03-23 Thread Jan Kiszka
On 2016-03-23 17:27, Alex Bennée wrote:
> 
> KONRAD Frederic  writes:
> 
>> Hi Alex,
>>
>> Thanks for having pulled all that together!
>> About this patch: the original author was Jan Kiszka
>> (https://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03323.html)
>>
>> This has probably been dropped during a rebase.
> 
> I'll amend the author on v2.
> 
> Jan,
> 
> The original didn't have an s-o-b tag from you. Are you happy to give one
> for its current iteration?

You can consider it

Signed-off-by: Jan Kiszka 

At that point, I didn't consider it ready for upstream (see the log), so
I left out the signature.

Jan

> [...]

Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution

2016-03-23 Thread Alex Bennée

KONRAD Frederic  writes:

> Hi Alex,
>
> Thanks for having pulled all that together!
> About this patch: the original author was Jan Kiszka
> (https://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03323.html)
>
> This has probably been dropped during a rebase.

I'll amend the author on v2.

Jan,

The original didn't have an s-o-b tag from you. Are you happy to give one
for its current iteration?

> [...]

Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution

2016-03-23 Thread KONRAD Frederic

Hi Alex,

Thanks for having pulled all that together!
About this patch: the original author was Jan Kiszka
(https://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03323.html)


This has probably been dropped during a rebase.

Thanks,
Fred

Le 18/03/2016 17:18, Alex Bennée a écrit :

[...]

[Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution

2016-03-18 Thread Alex Bennée
From: KONRAD Frederic 

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.
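
A minimal sketch of that pattern (an editorial illustration, not part
of the patch): qemu_mutex_lock_iothread(), qemu_mutex_unlock_iothread()
and cpu_exec() are real QEMU entry points, while the wrapper function
below is invented for illustration and assumes the usual execution
headers.

#include "qemu/main-loop.h"   /* qemu_mutex_{lock,unlock}_iothread() */
#include "qom/cpu.h"          /* CPUState */

static void tcg_exec_one_cpu(CPUState *cpu)
{
    int ret;

    /* Pure guest code runs without the global mutex held... */
    qemu_mutex_unlock_iothread();
    ret = cpu_exec(cpu);
    /* ...and the lock is taken back before any device or timer work. */
    qemu_mutex_lock_iothread();

    if (ret == EXCP_INTERRUPT) {
        /* Interrupt and I/O servicing happens under the lock. */
    }
}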

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.
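
Concretely, the reacquire-on-MMIO side (the softmmu_template.h hunks in
the diffstat below) boils down to a conditional lock/unlock around the
I/O dispatch. A hedged approximation of that shape, where do_mmio_read()
is a placeholder and only the qemu_mutex_iothread_locked()/lock/unlock
helpers are actual QEMU APIs:

static uint64_t mmio_read_with_bql(hwaddr addr, unsigned size)
{
    bool locked = qemu_mutex_iothread_locked();
    uint64_t val;

    if (!locked) {
        qemu_mutex_lock_iothread();   /* device models still expect the BQL */
    }
    val = do_mmio_read(addr, size);   /* placeholder for the real dispatch */
    if (!locked) {
        qemu_mutex_unlock_iothread();
    }
    return val;
}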

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independently on a second core. Without the patch we don't get beyond

32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
Signed-off-by: KONRAD Frederic 
[AJB: -smp single-threaded fix, rm old info from commit msg]
Signed-off-by: Alex Bennée 

---
v1 (ajb):
  - SMP failure now fixed by previous commit
Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c | 10 ++
 cpus.c | 26 +++---
 cputlb.c   |  4 
 exec.c | 12 
 hw/i386/kvmvapic.c |  3 +++
 softmmu_template.h | 17 +
 translate-all.c| 11 +--
 7 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 3572256..76891fd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -28,6 +28,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -449,6 +450,13 @@ int cpu_exec(CPUState *cpu)
             for(;;) {
                 interrupt_request = cpu->interrupt_request;
                 if (unlikely(interrupt_request)) {
+                    /* FIXME: this needs to take the iothread lock.
+                     * For this we need to find all places in
+                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
+                     * and call qemu_unlock_iothread_mutex() there.  Else,
+                     * add a flag telling cpu_loop_exit() to unlock it.
+                     */
+
                     if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
                         /* Mask out external interrupts for this step. */
                         interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
@@ -501,6 +509,7 @@ int cpu_exec(CPUState *cpu)
                        the program flow was changed */
                     next_tb = 0;
                 }
+
             }
             if (unlikely(cpu->exit_request
                          || replay_has_interrupt())) {
@@ -618,6 +627,7 @@ int cpu_exec(CPUState *cpu)
             g_assert(env == &x86_cpu->env);
 #endif
 #endif /* buggy compiler */
+
             cpu->can_do_io = 1;
             tb_lock_reset();
         }
diff --git a/cpus.c b/cpus.c
index a87fbf9..0e3d5f9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -911,8 +911,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 #endif /* _WIN32 */
 
 static QemuMutex qemu_global_mutex;
-static QemuCond qemu_io_proceeded_cond;
-static unsigned iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
@@ -928,7 +926,6 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_cpu_cond);
     qemu_cond_init(&qemu_pause_cond);
     qemu_cond_init(&qemu_work_cond);
-    qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
     qemu_thread_get_self(&io_thread);
@@ -1043,10 +1040,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    while (iothread_requesting_mutex) {
-        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
-    }
-
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
@@ -1314,22 +1307,7 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->created) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if 
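
(The hunk above is cut off by the archive. For reference, in the later
merged form of this work qemu_mutex_lock_iothread() reduces to roughly
the following; this is a reconstruction of the eventual upstream
direction, not a quote of this RFC:)

void qemu_mutex_lock_iothread(void)
{
    g_assert(!qemu_mutex_iothread_locked());
    qemu_mutex_lock(&qemu_global_mutex);
    iothread_locked = true;   /* thread-local flag behind
                                 qemu_mutex_iothread_locked() */
}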

Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution

2016-03-18 Thread Paolo Bonzini


On 18/03/2016 17:18, Alex Bennée wrote:
> From: KONRAD Frederic 
> [...]

You should probably squash this and patch 10 together.

Paolo