Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-31 Thread David Hildenbrand
  We have
  - wait (wait bit in PSW)
  - disabled wait (wait bit and interrupt fencing in PSW)
  - STOPPED (not related to PSW, state change usually handled via service 
  processor or hypervisor)
 
  I think we have to differentiate between KVM/TCG. On KVM we always do in 
  kernel halt and qemu sees a halted only for STOPPED or disabled wait. TCG 
  has to take care of the normal wait as well.
 
   From a first glimpse, a disabled wait and STOPPED look similar, but there 
  are (important) differences, e.g. other CPUs get a different a different 
  result from a SIGP SENSE. This makes a big difference, e.g. for Linux 
  guests, that send a SIGP STOP, followed by a SIGP SENSE loop until the CPU 
  is down on hotplug (and shutdown, kexec..) So I think we agree, that 
  handling the cpu states natively makes sense.
 
  The question is now only how to model it correctly without breaking TCG/KVM 
  and reuse as much common code as possible. Correct?
 
  Do I understand you correctly, that your collapsing of stopped and halted 
  is only in the qemu coding sense, IOW maybe we could just modify 
  kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
  implementation does not support SMP anyway?
 
 That works for me, yes.
 
 
 Alex
 

I had a look at it yesterday and it seems like we can totally drop this patch:

1. TCG doesn't support multiple CPUs and the TCG SIGP implementation isn't
ready for proper STOP/START/SENSE. Testing for STOPPED cpus in cpu_has_work()
can be dropped. To be able to support TCG was the main reason for this patch -
as we don't want to do so for now, we can leave it as is. We can still decide
to support the cpu states later using a mechanism suggest by Alex
(interrupt_requests).

Even if cpu_has_work() would make cpu.c:cpu_thread_is_idle() return false,
kvm_arch_process_async_events() called by kvm-all.c:kvm_cpu_exec() would make
it go back to sleep. Therefore a stopped VCPU will never be able to run in the
KVM case (because it always has cs-halted = true).

2. The unhalt in kvm_arch_process_async_events is for a special case where a
VCPU is in disabled wait and receives e.g. a machine-check interrupt. These
might happen in the future, for now we will never see them (the only
way to get a vcpu out of disabled wait are SIGP RESTART/CPU RESET - so we
don't break anything at that point).

David




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Christian Borntraeger
On 28/07/14 16:22, Alexander Graf wrote:
 
 On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:
 

 On 10.07.14 15:10, Christian Borntraeger wrote:
 From: David Hildenbrand d...@linux.vnet.ibm.com

 If a cpu is stopped, it must never be allowed to run and no interrupt may 
 wake it
 up. A cpu also has to be unhalted if it is halted and has work to do - this
 scenario wasn't hit in kvm case yet, as only disabled wait is processed 
 within
 QEMU.

 Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
 Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

 This looks like it's something that generic infrastructure should take 
 care of, no? How does this work for the other archs? They always get an 
 interrupt on the transition between !has_work - has_work. Why don't we 
 get one for s390x?


 Alex



 Well, we have the special case on s390 as a CPU that is in the STOPPED or the
 CHECK STOP state may never run - even if there is an interrupt. It's
 basically like this CPU has been switched off.

 Imagine that it is tried to inject an interrupt into a stopped vcpu. It
 will kick the stopped vcpu and thus lead to a call to
 kvm_arch_process_async_events(). We have to deny that this vcpu will ever
 run as long as it is stopped. It's like a way to suppress the
 interrupt for such a transition you mentioned.
 
 An interrupt kick usually just means we go back into the main loop. From 
 there we check the interrupt bitmap which interrupt to handle. Check out the 
 handling code here:
 
   
 http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580
 
 If you just check for the stopped state in here, do_interrupt() will never 
 get called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
 mistaken :).
 

 Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
 SIGP START to that vcpu).
 
 Yes, in that case that other CPU generates a signal (a different bit in 
 interrupt_request) and the first CPU would see that it has to wake up and 
 wake up.
 
 I am not sure if such a mechanism/scenario is applicable to any other arch. 
 They
 all seem to reset the cs-halted flag if they know they are able to run (e.g.
 due to an interrupt) - they have no such thing as stopped cpus, only
 halted/waiting cpus.
 
 There's not really much difference between the two. The only difference from 
 a software point of view is that a stopped CPU has its external interrupt 
 bits masked off, no?

We have
- wait (wait bit in PSW)
- disabled wait (wait bit and interrupt fencing in PSW)
- STOPPED (not related to PSW, state change usually handled via service 
processor or hypervisor)

I think we have to differentiate between KVM/TCG. On KVM we always do in kernel 
halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take 
care of the normal wait as well.

From a first glimpse, a disabled wait and STOPPED look similar, but there are 
(important) differences, e.g. other CPUs get a different a different result 
from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that 
send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on 
hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu 
states natively makes sense.

The question is now only how to model it correctly without breaking TCG/KVM and 
reuse as much common code as possible. Correct?

Do I understand you correctly, that your collapsing of stopped and halted is 
only in the qemu coding sense, IOW maybe we could just modify 
kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
implementation does not support SMP anyway?
David would that work?

Christian




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Alexander Graf


On 29.07.14 13:44, Christian Borntraeger wrote:

On 28/07/14 16:22, Alexander Graf wrote:

On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:


On 10.07.14 15:10, Christian Borntraeger wrote:

From: David Hildenbrand d...@linux.vnet.ibm.com

If a cpu is stopped, it must never be allowed to run and no interrupt may wake 
it
up. A cpu also has to be unhalted if it is halted and has work to do - this
scenario wasn't hit in kvm case yet, as only disabled wait is processed within
QEMU.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

This looks like it's something that generic infrastructure should take
care of, no? How does this work for the other archs? They always get an
interrupt on the transition between !has_work - has_work. Why don't we
get one for s390x?


Alex



Well, we have the special case on s390 as a CPU that is in the STOPPED or the
CHECK STOP state may never run - even if there is an interrupt. It's
basically like this CPU has been switched off.

Imagine that it is tried to inject an interrupt into a stopped vcpu. It
will kick the stopped vcpu and thus lead to a call to
kvm_arch_process_async_events(). We have to deny that this vcpu will ever
run as long as it is stopped. It's like a way to suppress the
interrupt for such a transition you mentioned.

An interrupt kick usually just means we go back into the main loop. From there 
we check the interrupt bitmap which interrupt to handle. Check out the handling 
code here:

   
http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580

If you just check for the stopped state in here, do_interrupt() will never get 
called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
mistaken :).


Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
SIGP START to that vcpu).

Yes, in that case that other CPU generates a signal (a different bit in 
interrupt_request) and the first CPU would see that it has to wake up and wake 
up.


I am not sure if such a mechanism/scenario is applicable to any other arch. They
all seem to reset the cs-halted flag if they know they are able to run (e.g.
due to an interrupt) - they have no such thing as stopped cpus, only
halted/waiting cpus.

There's not really much difference between the two. The only difference from a software 
point of view is that a stopped CPU has its external interrupt bits masked 
off, no?

We have
- wait (wait bit in PSW)
- disabled wait (wait bit and interrupt fencing in PSW)
- STOPPED (not related to PSW, state change usually handled via service 
processor or hypervisor)

I think we have to differentiate between KVM/TCG. On KVM we always do in kernel 
halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take 
care of the normal wait as well.

 From a first glimpse, a disabled wait and STOPPED look similar, but there are 
(important) differences, e.g. other CPUs get a different a different result 
from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that 
send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on 
hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu 
states natively makes sense.

The question is now only how to model it correctly without breaking TCG/KVM and 
reuse as much common code as possible. Correct?

Do I understand you correctly, that your collapsing of stopped and halted is 
only in the qemu coding sense, IOW maybe we could just modify 
kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
implementation does not support SMP anyway?


That works for me, yes.


Alex




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Paolo Bonzini
Il 28/07/2014 17:03, David Hildenbrand ha scritto:
 Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
 like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like
 a SIPI++ as it performs a psw exchange - NMI). So we basically have two
 paths that can lead to a state change.  All interrupt bits may be in any
 combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START 
 be
 denied).
 
 The other thing may be that on s390, each vcpu (including itself) can put
 another vcpu into the STOPPED state - I assume that this is different for x86 
 
 INIT_RECEIVED. For this reason we have to watch out for bad race conditions
 (e.g. multiple vcpus working on another vcpu)...

You can do that in x86 by sending an INIT inter-processor interrupt.  A
SIPI is ignored if the CPU is not in INIT_RECEIVED state.

Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current
implementation.

- an INIT cancels a previous SIPI;

- if both INIT and SIPI are sent, on real hardware you need to have a
few hundred microseconds between them, but KVM will reliably process
INIT before SIPI.

See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of
the races that can happen.

Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete,
we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.



Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread David Hildenbrand
 Il 28/07/2014 17:03, David Hildenbrand ha scritto:
  Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
  like things (SIGP START) AND a special interrupt (SIGP RESTART - which is 
  like
  a SIPI++ as it performs a psw exchange - NMI). So we basically have two
  paths that can lead to a state change.  All interrupt bits may be in any
  combination (SIGP RESTART interrupts can't be masked out, nor can SIGP 
  START be
  denied).
  
  The other thing may be that on s390, each vcpu (including itself) can put
  another vcpu into the STOPPED state - I assume that this is different for 
  x86 
  INIT_RECEIVED. For this reason we have to watch out for bad race conditions
  (e.g. multiple vcpus working on another vcpu)...
 
 You can do that in x86 by sending an INIT inter-processor interrupt.  A
 SIPI is ignored if the CPU is not in INIT_RECEIVED state.
 
 Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current
 implementation.
 
 - an INIT cancels a previous SIPI;
 
 - if both INIT and SIPI are sent, on real hardware you need to have a
 few hundred microseconds between them, but KVM will reliably process
 INIT before SIPI.
 
 See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of
 the races that can happen.
 
 Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete,
 we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.
 

Thanks for the explanation Paolo!

Looks like from an interrupt point of view, the states have a lot in common.
The major thing that differs on s390 is probably the way these interrupts are
generated and what else they influence (all the power of the SIGP facility :)
+ special check-stop state that can't be left by an interrupt, only by SIGP
  CPU resets).

David




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-28 Thread Alexander Graf


On 10.07.14 15:10, Christian Borntraeger wrote:

From: David Hildenbrand d...@linux.vnet.ibm.com

If a cpu is stopped, it must never be allowed to run and no interrupt may wake 
it
up. A cpu also has to be unhalted if it is halted and has work to do - this
scenario wasn't hit in kvm case yet, as only disabled wait is processed within
QEMU.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com


This looks like it's something that generic infrastructure should take 
care of, no? How does this work for the other archs? They always get an 
interrupt on the transition between !has_work - has_work. Why don't we 
get one for s390x?



Alex




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-28 Thread David Hildenbrand
 
 On 10.07.14 15:10, Christian Borntraeger wrote:
  From: David Hildenbrand d...@linux.vnet.ibm.com
 
  If a cpu is stopped, it must never be allowed to run and no interrupt may 
  wake it
  up. A cpu also has to be unhalted if it is halted and has work to do - this
  scenario wasn't hit in kvm case yet, as only disabled wait is processed 
  within
  QEMU.
 
  Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
  Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
  Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
  Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 
 This looks like it's something that generic infrastructure should take 
 care of, no? How does this work for the other archs? They always get an 
 interrupt on the transition between !has_work - has_work. Why don't we 
 get one for s390x?
 
 
 Alex
 
 

Well, we have the special case on s390 as a CPU that is in the STOPPED or the
CHECK STOP state may never run - even if there is an interrupt. It's
basically like this CPU has been switched off.

Imagine that it is tried to inject an interrupt into a stopped vcpu. It
will kick the stopped vcpu and thus lead to a call to
kvm_arch_process_async_events(). We have to deny that this vcpu will ever
run as long as it is stopped. It's like a way to suppress the
interrupt for such a transition you mentioned.

Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
SIGP START to that vcpu).

I am not sure if such a mechanism/scenario is applicable to any other arch. They
all seem to reset the cs-halted flag if they know they are able to run (e.g.
due to an interrupt) - they have no such thing as stopped cpus, only
halted/waiting cpus.

David




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-28 Thread Paolo Bonzini
Il 28/07/2014 16:16, David Hildenbrand ha scritto:
 Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
 SIGP START to that vcpu).
 
 I am not sure if such a mechanism/scenario is applicable to any other arch. 
 They
 all seem to reset the cs-halted flag if they know they are able to run (e.g.
 due to an interrupt) - they have no such thing as stopped cpus, only
 halted/waiting cpus.

On x86, INIT_RECEIVED is pretty much a stopped CPU.  It can only run
(and receive interrupts) after getting a special startup interrupt (SIPI).

Paolo



Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-28 Thread Alexander Graf

On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:

 
 On 10.07.14 15:10, Christian Borntraeger wrote:
 From: David Hildenbrand d...@linux.vnet.ibm.com
 
 If a cpu is stopped, it must never be allowed to run and no interrupt may 
 wake it
 up. A cpu also has to be unhalted if it is halted and has work to do - this
 scenario wasn't hit in kvm case yet, as only disabled wait is processed 
 within
 QEMU.
 
 Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
 Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 
 This looks like it's something that generic infrastructure should take 
 care of, no? How does this work for the other archs? They always get an 
 interrupt on the transition between !has_work - has_work. Why don't we 
 get one for s390x?
 
 
 Alex
 
 
 
 Well, we have the special case on s390 as a CPU that is in the STOPPED or the
 CHECK STOP state may never run - even if there is an interrupt. It's
 basically like this CPU has been switched off.
 
 Imagine that it is tried to inject an interrupt into a stopped vcpu. It
 will kick the stopped vcpu and thus lead to a call to
 kvm_arch_process_async_events(). We have to deny that this vcpu will ever
 run as long as it is stopped. It's like a way to suppress the
 interrupt for such a transition you mentioned.

An interrupt kick usually just means we go back into the main loop. From there 
we check the interrupt bitmap which interrupt to handle. Check out the handling 
code here:

  
http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580

If you just check for the stopped state in here, do_interrupt() will never get 
called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
mistaken :).

 
 Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
 SIGP START to that vcpu).

Yes, in that case that other CPU generates a signal (a different bit in 
interrupt_request) and the first CPU would see that it has to wake up and wake 
up.

 I am not sure if such a mechanism/scenario is applicable to any other arch. 
 They
 all seem to reset the cs-halted flag if they know they are able to run (e.g.
 due to an interrupt) - they have no such thing as stopped cpus, only
 halted/waiting cpus.

There's not really much difference between the two. The only difference from a 
software point of view is that a stopped CPU has its external interrupt bits 
masked off, no?


Alex




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-28 Thread David Hildenbrand
 
 On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:
 
  
  On 10.07.14 15:10, Christian Borntraeger wrote:
  From: David Hildenbrand d...@linux.vnet.ibm.com
  
  If a cpu is stopped, it must never be allowed to run and no interrupt may 
  wake it
  up. A cpu also has to be unhalted if it is halted and has work to do - 
  this
  scenario wasn't hit in kvm case yet, as only disabled wait is processed 
  within
  QEMU.
  
  Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
  Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
  Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
  Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
  
  This looks like it's something that generic infrastructure should take 
  care of, no? How does this work for the other archs? They always get an 
  interrupt on the transition between !has_work - has_work. Why don't we 
  get one for s390x?
  
  
  Alex
  
  
  
  Well, we have the special case on s390 as a CPU that is in the STOPPED or 
  the
  CHECK STOP state may never run - even if there is an interrupt. It's
  basically like this CPU has been switched off.
  
  Imagine that it is tried to inject an interrupt into a stopped vcpu. It
  will kick the stopped vcpu and thus lead to a call to
  kvm_arch_process_async_events(). We have to deny that this vcpu will ever
  run as long as it is stopped. It's like a way to suppress the
  interrupt for such a transition you mentioned.
 
 An interrupt kick usually just means we go back into the main loop. From 
 there we check the interrupt bitmap which interrupt to handle. Check out the 
 handling code here:
 
   
 http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580
 
 If you just check for the stopped state in here, do_interrupt() will never 
 get called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
 mistaken :).

So you would rather move the check out of has_work() into the main loop in
cpu-exec.c and directly into kvm_arch_process_async_events()?

This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on any
CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to 
run. Is okay?

Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although we
are idle (because we are idle when we are stopped)?

My qemu kvm knowledge is way better than the qemu emulation knowledge, so I
appreciate any insights :)

 
  
  Later, another vcpu might decide to turn that vcpu back on (by e.g. sending 
  a
  SIGP START to that vcpu).
 
 Yes, in that case that other CPU generates a signal (a different bit in 
 interrupt_request) and the first CPU would see that it has to wake up and 
 wake up.
 
  I am not sure if such a mechanism/scenario is applicable to any other arch. 
  They
  all seem to reset the cs-halted flag if they know they are able to run 
  (e.g.
  due to an interrupt) - they have no such thing as stopped cpus, only
  halted/waiting cpus.
 
 There's not really much difference between the two. The only difference from 
 a software point of view is that a stopped CPU has its external interrupt 
 bits masked off, no?

Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like
a SIPI++ as it performs a psw exchange - NMI). So we basically have two
paths that can lead to a state change. All interrupt bits may be in any
combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be
denied).

The other thing may be that on s390, each vcpu (including itself) can put
another vcpu into the STOPPED state - I assume that this is different for x86 
INIT_RECEIVED. For this reason we have to watch out for bad race conditions
(e.g. multiple vcpus working on another vcpu)...

David

 
 
 Alex
 




Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-28 Thread Alexander Graf


On 28.07.14 17:03, David Hildenbrand wrote:

On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:


On 10.07.14 15:10, Christian Borntraeger wrote:

From: David Hildenbrand d...@linux.vnet.ibm.com

If a cpu is stopped, it must never be allowed to run and no interrupt may wake 
it
up. A cpu also has to be unhalted if it is halted and has work to do - this
scenario wasn't hit in kvm case yet, as only disabled wait is processed within
QEMU.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

This looks like it's something that generic infrastructure should take
care of, no? How does this work for the other archs? They always get an
interrupt on the transition between !has_work - has_work. Why don't we
get one for s390x?


Alex



Well, we have the special case on s390 as a CPU that is in the STOPPED or the
CHECK STOP state may never run - even if there is an interrupt. It's
basically like this CPU has been switched off.

Imagine that it is tried to inject an interrupt into a stopped vcpu. It
will kick the stopped vcpu and thus lead to a call to
kvm_arch_process_async_events(). We have to deny that this vcpu will ever
run as long as it is stopped. It's like a way to suppress the
interrupt for such a transition you mentioned.

An interrupt kick usually just means we go back into the main loop. From there 
we check the interrupt bitmap which interrupt to handle. Check out the handling 
code here:

   
http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580

If you just check for the stopped state in here, do_interrupt() will never get 
called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
mistaken :).

So you would rather move the check out of has_work() into the main loop in
cpu-exec.c and directly into kvm_arch_process_async_events()?

This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on any
CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to 
run. Is okay?


Not really I think. We could create a new interrupt_request bit called 
CPU_INTERRUPT_STOPPED that doesn't get unset automatically and simply 
sets cpu-halted = 1 (similar to CPU_INTERRUPT_HALT).




Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although we
are idle (because we are idle when we are stopped)?

My qemu kvm knowledge is way better than the qemu emulation knowledge, so I
appreciate any insights :)


Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
SIGP START to that vcpu).

Yes, in that case that other CPU generates a signal (a different bit in 
interrupt_request) and the first CPU would see that it has to wake up and wake 
up.


I am not sure if such a mechanism/scenario is applicable to any other arch. They
all seem to reset the cs-halted flag if they know they are able to run (e.g.
due to an interrupt) - they have no such thing as stopped cpus, only
halted/waiting cpus.

There's not really much difference between the two. The only difference from a software 
point of view is that a stopped CPU has its external interrupt bits masked 
off, no?

Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like
a SIPI++ as it performs a psw exchange - NMI). So we basically have two
paths that can lead to a state change. All interrupt bits may be in any
combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be
denied).


That's perfectly normal behavior. Just make it two different interrupt 
types:


if (interrupt_request  CPU_INTERRUPT_STOPPED) {
  /* Go back to halted state */
  ...
} else if (interrupt_request  CPU_INTERRUPT_SIGP) {
  env-interrupt_request = ~CPU_INTERRUPT_STOPPED;
  /* swap PSW */
  ...
} else if ((interrupt_request  CPU_INTERRUPT_HARD) 
(env-psw.mask  PSW_MASK_EXT)) {
  ...
}



The other thing may be that on s390, each vcpu (including itself) can put
another vcpu into the STOPPED state - I assume that this is different for x86 
INIT_RECEIVED. For this reason we have to watch out for bad race conditions
(e.g. multiple vcpus working on another vcpu)...


TCG is single-threaded :). And if you stick to the interrupt logic above 
all should be good.



Alex




[Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-10 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

If a cpu is stopped, it must never be allowed to run and no interrupt may wake 
it
up. A cpu also has to be unhalted if it is halted and has work to do - this
scenario wasn't hit in kvm case yet, as only disabled wait is processed within
QEMU.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 target-s390x/cpu.c | 6 ++
 target-s390x/kvm.c | 5 +
 2 files changed, 11 insertions(+)

diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index c5ab98f..1d32f5a 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -72,6 +72,12 @@ static bool s390_cpu_has_work(CPUState *cs)
 S390CPU *cpu = S390_CPU(cs);
 CPUS390XState *env = cpu-env;
 
+/* stopped cpus can never run */
+if (env-cpu_state == CPU_STATE_STOPPED ||
+env-cpu_state == CPU_STATE_CHECK_STOP) {
+return false;
+}
+
 return (cs-interrupt_request  CPU_INTERRUPT_HARD) 
(env-psw.mask  PSW_MASK_EXT);
 }
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index db2e42c..00125f1 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -553,6 +553,11 @@ void kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
 
 int kvm_arch_process_async_events(CPUState *cs)
 {
+if (cs-halted  CPU_GET_CLASS(cs)-has_work(cs)) {
+/* has_work will take care of stopped cpus */
+s390_cpu_unhalt(S390_CPU(cs));
+}
+
 return cs-halted;
 }
 
-- 
1.8.4.2