Re: [Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-30 Thread Richard Henderson
On 01/30/2017 01:57 AM, Alex Bennée wrote:
>>> diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
>>> index a9ee7fddf9..2624d8d909 100644
>>> --- a/hw/intc/arm_gicv3_cpuif.c
>>> +++ b/hw/intc/arm_gicv3_cpuif.c
>>> @@ -14,6 +14,7 @@
>>>
>>>  #include "qemu/osdep.h"
>>>  #include "qemu/bitops.h"
>>> +#include "qemu/main-loop.h"
>>>  #include "trace.h"
>>>  #include "gicv3_internal.h"
>>>  #include "cpu.h"
>>> @@ -733,6 +734,8 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
>>>  ARMCPU *cpu = ARM_CPU(cs->cpu);
>>>  CPUARMState *env = >env;
>>>
>>> +g_assert(qemu_mutex_iothread_locked());
>>> +
>> tcg_debug_assert()?
> Depends if KVM can use this for emulation as well (which I think it can).
> 

It probably could, but it shouldn't.  Let's not leak tcg_* into hw/.


r~



Re: [Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-30 Thread Alex Bennée

Pranith Kumar  writes:

> Alex Bennée writes:
>
>> From: Jan Kiszka 
>>
>> This finally allows TCG to benefit from the iothread introduction: Drop
>> the global mutex while running pure TCG CPU code. Reacquire the lock
>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>>
>> We have to revert a few optimization for the current TCG threading
>> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
>> kicking it in qemu_cpu_kick. We also need to disable RAM block
>> reordering until we have a more efficient locking mechanism at hand.
>>
>> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
>> These numbers demonstrate where we gain something:
>>
>> 20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
>> 20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>>
>> The guest CPU was fully loaded, but the iothread could still run mostly
>> independent on a second core. Without the patch we don't get beyond
>>
>> 32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
>> 32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>>
>> We don't benefit significantly, though, when the guest is not fully
>> loading a host CPU.
>>
>> Signed-off-by: Jan Kiszka 
>> Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
>> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
>> Signed-off-by: KONRAD Frederic 
>> [EGC: fixed iothread lock for cpu-exec IRQ handling]
>> Signed-off-by: Emilio G. Cota 
>> [AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
>> Signed-off-by: Alex Bennée 
>> Reviewed-by: Richard Henderson 
>>
>> ---
>> v8:
>>  - merged in BQL fixes for PPC target: ppc_set_irq
>>  - merged in BQL fixes for ARM target: ARM_CP_IO helpers
>>  - merged in BQL fixes for ARM target: arm_call_el_change_hook
>>
>> v5 (ajb, base patches):
>>  - added an assert to BQL unlock/lock functions instead of hanging
>>  - ensure all cpu->interrupt_requests *modifications* protected by BQL
>>  - add a re-read on cpu->interrupt_request for correctness
>>  - BQL fixes for:
>>- assert BQL held for PPC hypercalls (emulate_spar_hypercall)
>>- SCLP service calls on s390x
>>  - merge conflict with kick timer patch
>> v4 (ajb, base patches):
>>  - protect cpu->interrupt updates with BQL
>>  - fix wording io_mem_notdirty calls
>>  - s/we/with/
>> v3 (ajb, base-patches):
>>   - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
>>   with it in the longjmp).
>>   - fix re-base conflicts
>> v2 (ajb):
>>   - merge with tcg: grab iothread lock in cpu-exec interrupt handling
>>   - use existing fns for tracking lock state
>>   - lock iothread for mem_region
>> - add assert on mem region modification
>> - ensure smm_helper holds iothread
>>   - Add JK s-o-b
>>   - Fix-up FK s-o-b annotation
>> v1 (ajb, base-patches):
>>   - SMP failure now fixed by previous commit
>>
>> Changes from Fred Konrad (mttcg-v7 via paolo):
>>   * Rebase on the current HEAD.
>>   * Fixes a deadlock in qemu_devices_reset().
>>   * Remove the mutex in address_space_*
>> ---
>>  cpu-exec.c | 20 ++--
>>  cpus.c | 28 +---
>>  cputlb.c   | 21 -
>>  exec.c | 12 +---
>>  hw/core/irq.c  |  1 +
>>  hw/i386/kvmvapic.c |  4 ++--
>>  hw/intc/arm_gicv3_cpuif.c  |  3 +++
>>  hw/ppc/ppc.c   | 16 +++-
>>  hw/ppc/spapr.c |  3 +++
>>  include/qom/cpu.h  |  1 +
>>  memory.c   |  2 ++
>>  qom/cpu.c  | 10 ++
>>  target/arm/helper.c|  6 ++
>>  target/arm/op_helper.c | 43 +++
>>  target/i386/smm_helper.c   |  7 +++
>>  target/s390x/misc_helper.c |  5 -
>>  translate-all.c|  9 +++--
>>  translate-common.c | 21 +++--
>>  18 files changed, 163 insertions(+), 49 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index f9e836c8dd..f42a128bdf 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -29,6 +29,7 @@
>>  #include "qemu/rcu.h"
>>  #include "exec/tb-hash.h"
>>  #include "exec/log.h"
>> +#include "qemu/main-loop.h"
>>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>>  #include "hw/i386/apic.h"
>>  #endif
>> @@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
>>  if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
>>  && replay_interrupt()) {
>>  X86CPU *x86_cpu = X86_CPU(cpu);
>> +qemu_mutex_lock_iothread();
>>  apic_poll_irq(x86_cpu->apic_state);
>>  cpu_reset_interrupt(cpu, 

Re: [Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-29 Thread Pranith Kumar

Alex Bennée writes:

> From: Jan Kiszka 
>
> This finally allows TCG to benefit from the iothread introduction: Drop
> the global mutex while running pure TCG CPU code. Reacquire the lock
> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>
> We have to revert a few optimization for the current TCG threading
> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
> kicking it in qemu_cpu_kick. We also need to disable RAM block
> reordering until we have a more efficient locking mechanism at hand.
>
> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
> These numbers demonstrate where we gain something:
>
> 20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
> 20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>
> The guest CPU was fully loaded, but the iothread could still run mostly
> independent on a second core. Without the patch we don't get beyond
>
> 32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
> 32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>
> We don't benefit significantly, though, when the guest is not fully
> loading a host CPU.
>
> Signed-off-by: Jan Kiszka 
> Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
> Signed-off-by: KONRAD Frederic 
> [EGC: fixed iothread lock for cpu-exec IRQ handling]
> Signed-off-by: Emilio G. Cota 
> [AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
> Signed-off-by: Alex Bennée 
> Reviewed-by: Richard Henderson 
>
> ---
> v8:
>  - merged in BQL fixes for PPC target: ppc_set_irq
>  - merged in BQL fixes for ARM target: ARM_CP_IO helpers
>  - merged in BQL fixes for ARM target: arm_call_el_change_hook
>
> v5 (ajb, base patches):
>  - added an assert to BQL unlock/lock functions instead of hanging
>  - ensure all cpu->interrupt_requests *modifications* protected by BQL
>  - add a re-read on cpu->interrupt_request for correctness
>  - BQL fixes for:
>- assert BQL held for PPC hypercalls (emulate_spar_hypercall)
>- SCLP service calls on s390x
>  - merge conflict with kick timer patch
> v4 (ajb, base patches):
>  - protect cpu->interrupt updates with BQL
>  - fix wording io_mem_notdirty calls
>  - s/we/with/
> v3 (ajb, base-patches):
>   - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
>   with it in the longjmp).
>   - fix re-base conflicts
> v2 (ajb):
>   - merge with tcg: grab iothread lock in cpu-exec interrupt handling
>   - use existing fns for tracking lock state
>   - lock iothread for mem_region
> - add assert on mem region modification
> - ensure smm_helper holds iothread
>   - Add JK s-o-b
>   - Fix-up FK s-o-b annotation
> v1 (ajb, base-patches):
>   - SMP failure now fixed by previous commit
>
> Changes from Fred Konrad (mttcg-v7 via paolo):
>   * Rebase on the current HEAD.
>   * Fixes a deadlock in qemu_devices_reset().
>   * Remove the mutex in address_space_*
> ---
>  cpu-exec.c | 20 ++--
>  cpus.c | 28 +---
>  cputlb.c   | 21 -
>  exec.c | 12 +---
>  hw/core/irq.c  |  1 +
>  hw/i386/kvmvapic.c |  4 ++--
>  hw/intc/arm_gicv3_cpuif.c  |  3 +++
>  hw/ppc/ppc.c   | 16 +++-
>  hw/ppc/spapr.c |  3 +++
>  include/qom/cpu.h  |  1 +
>  memory.c   |  2 ++
>  qom/cpu.c  | 10 ++
>  target/arm/helper.c|  6 ++
>  target/arm/op_helper.c | 43 +++
>  target/i386/smm_helper.c   |  7 +++
>  target/s390x/misc_helper.c |  5 -
>  translate-all.c|  9 +++--
>  translate-common.c | 21 +++--
>  18 files changed, 163 insertions(+), 49 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index f9e836c8dd..f42a128bdf 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -29,6 +29,7 @@
>  #include "qemu/rcu.h"
>  #include "exec/tb-hash.h"
>  #include "exec/log.h"
> +#include "qemu/main-loop.h"
>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>  #include "hw/i386/apic.h"
>  #endif
> @@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
>  if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
>  && replay_interrupt()) {
>  X86CPU *x86_cpu = X86_CPU(cpu);
> +qemu_mutex_lock_iothread();
>  apic_poll_irq(x86_cpu->apic_state);
>  cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
> +qemu_mutex_unlock_iothread();
>  }
>  #endif
>  if (!cpu_has_work(cpu)) {
> @@ -443,7 +446,9 @@ static inline bool 

[Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-27 Thread Alex Bennée
From: Jan Kiszka 

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka 
Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic 
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota 
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v8:
 - merged in BQL fixes for PPC target: ppc_set_irq
 - merged in BQL fixes for ARM target: ARM_CP_IO helpers
 - merged in BQL fixes for ARM target: arm_call_el_change_hook

v5 (ajb, base patches):
 - added an assert to BQL unlock/lock functions instead of hanging
 - ensure all cpu->interrupt_requests *modifications* protected by BQL
 - add a re-read on cpu->interrupt_request for correctness
 - BQL fixes for:
   - assert BQL held for PPC hypercalls (emulate_spar_hypercall)
   - SCLP service calls on s390x
 - merge conflict with kick timer patch
v4 (ajb, base patches):
 - protect cpu->interrupt updates with BQL
 - fix wording io_mem_notdirty calls
 - s/we/with/
v3 (ajb, base-patches):
  - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
  with it in the longjmp).
  - fix re-base conflicts
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
- add assert on mem region modification
- ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
v1 (ajb, base-patches):
  - SMP failure now fixed by previous commit

Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c | 20 ++--
 cpus.c | 28 +---
 cputlb.c   | 21 -
 exec.c | 12 +---
 hw/core/irq.c  |  1 +
 hw/i386/kvmvapic.c |  4 ++--
 hw/intc/arm_gicv3_cpuif.c  |  3 +++
 hw/ppc/ppc.c   | 16 +++-
 hw/ppc/spapr.c |  3 +++
 include/qom/cpu.h  |  1 +
 memory.c   |  2 ++
 qom/cpu.c  | 10 ++
 target/arm/helper.c|  6 ++
 target/arm/op_helper.c | 43 +++
 target/i386/smm_helper.c   |  7 +++
 target/s390x/misc_helper.c |  5 -
 translate-all.c|  9 +++--
 translate-common.c | 21 +++--
 18 files changed, 163 insertions(+), 49 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f9e836c8dd..f42a128bdf 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
 && replay_interrupt()) {
 X86CPU *x86_cpu = X86_CPU(cpu);
+qemu_mutex_lock_iothread();
 apic_poll_irq(x86_cpu->apic_state);
 cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+qemu_mutex_unlock_iothread();
 }
 #endif
 if (!cpu_has_work(cpu)) {
@@ -443,7 +446,9 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 #else
 if (replay_exception()) {
 CPUClass *cc = CPU_GET_CLASS(cpu);
+qemu_mutex_lock_iothread();
 cc->do_interrupt(cpu);
+   

[Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-27 Thread Alex Bennée
From: Jan Kiszka 

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka 
Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic 
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota 
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v8:
 - merged in BQL fixes for PPC target: ppc_set_irq
 - merged in BQL fixes for ARM target: ARM_CP_IO helpers
 - merged in BQL fixes for ARM target: arm_call_el_change_hook

v5 (ajb, base patches):
 - added an assert to BQL unlock/lock functions instead of hanging
 - ensure all cpu->interrupt_requests *modifications* protected by BQL
 - add a re-read on cpu->interrupt_request for correctness
 - BQL fixes for:
   - assert BQL held for PPC hypercalls (emulate_spar_hypercall)
   - SCLP service calls on s390x
 - merge conflict with kick timer patch
v4 (ajb, base patches):
 - protect cpu->interrupt updates with BQL
 - fix wording io_mem_notdirty calls
 - s/we/with/
v3 (ajb, base-patches):
  - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
  with it in the longjmp).
  - fix re-base conflicts
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
- add assert on mem region modification
- ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
v1 (ajb, base-patches):
  - SMP failure now fixed by previous commit

Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c | 20 ++--
 cpus.c | 28 +---
 cputlb.c   | 21 -
 exec.c | 12 +---
 hw/core/irq.c  |  1 +
 hw/i386/kvmvapic.c |  4 ++--
 hw/intc/arm_gicv3_cpuif.c  |  3 +++
 hw/ppc/ppc.c   | 16 +++-
 hw/ppc/spapr.c |  3 +++
 include/qom/cpu.h  |  1 +
 memory.c   |  2 ++
 qom/cpu.c  | 10 ++
 target/arm/helper.c|  6 ++
 target/arm/op_helper.c | 43 +++
 target/i386/smm_helper.c   |  7 +++
 target/s390x/misc_helper.c |  5 -
 translate-all.c|  9 +++--
 translate-common.c | 21 +++--
 18 files changed, 163 insertions(+), 49 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f9e836c8dd..f42a128bdf 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
 && replay_interrupt()) {
 X86CPU *x86_cpu = X86_CPU(cpu);
+qemu_mutex_lock_iothread();
 apic_poll_irq(x86_cpu->apic_state);
 cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+qemu_mutex_unlock_iothread();
 }
 #endif
 if (!cpu_has_work(cpu)) {
@@ -443,7 +446,9 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 #else
 if (replay_exception()) {
 CPUClass *cc = CPU_GET_CLASS(cpu);
+qemu_mutex_lock_iothread();
 cc->do_interrupt(cpu);
+