Marcelo Tosatti wrote:
If a vcpu has been offlined, or not initialized at all, signals
requesting userspace work will result in KVM attempting to re-enter
guest mode.

The problem is that the in-kernel irqchip emulation happily executes vcpus
in the HALTED state. This breaks "savevm" on a Windows SMP installation
(which only boots up a single vcpu), for example.

Fix it by blocking halted vcpus at kvm_arch_vcpu_ioctl_run().
Change the halted->running promotion to happen in vcpu context,
using the information available in kvm_vcpu_block() and the
current mp_state to make the decision:

- If there's an in-kernel timer or irq event, the halted->running
promotion can be evaluated in the kernel; no userspace assistance is needed.

- If there's a signal, either there is userspace work to be performed
in the vcpu's context or the irqchip emulation is in userspace.

This has the nice side effect of avoiding a userspace exit when the iothread injects an irq into a halted vcpu.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -2505,17 +2505,25 @@ void kvm_arch_exit(void)
        kvm_mmu_module_exit();
 }
+static void kvm_vcpu_promote_runnable(struct kvm_vcpu *vcpu)
+{
+       if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
+               vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+}
+
 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 {
        ++vcpu->stat.halt_exits;
        KVMTRACE_0D(HLT, vcpu, handler);
        if (irqchip_in_kernel(vcpu->kvm)) {
+               int ret;

Missing blank line.

@@ -2978,10 +2986,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
        if (vcpu->sigset_active)
                sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
-       if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
-               kvm_vcpu_block(vcpu);
-               r = -EAGAIN;
-               goto out;
+       if (unlikely(!kvm_arch_vcpu_runnable(vcpu))) {
+               if (kvm_vcpu_block(vcpu)) {
+                       r = -EAGAIN;
+                       goto out;
+               }
+               kvm_vcpu_promote_runnable(vcpu);
        }


Any reason this is not in __vcpu_run()?

Our main loop could look like

  while (no reason to stop)
        if (runnable)
             enter guest
        else
             block
        deal with aftermath

kvm_emulate_halt would then simply modify the mp state.
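
To make that concrete, here is a rough, untested sketch of such a loop.
vcpu_enter_guest() is only a stand-in name for the guest-entry half of
today's __vcpu_run(), not an existing function:

  static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
  {
          int r = 1;

          while (r > 0) {
                  if (kvm_arch_vcpu_runnable(vcpu))
                          /* enter guest: the guest-entry half of __vcpu_run() */
                          r = vcpu_enter_guest(vcpu, kvm_run);
                  else
                          /* not runnable: wait for a timer, irq or signal */
                          kvm_vcpu_block(vcpu);

                  /* deal with aftermath: a pending signal always bounces
                   * us back to userspace */
                  if (signal_pending(current)) {
                          r = -EINTR;
                          kvm_run->exit_reason = KVM_EXIT_INTR;
                  }
          }

          return r;
  }

The halted->runnable promotion could then live in one place, right after
kvm_vcpu_block() returns, instead of being scattered across the entry paths.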

/* re-sync apic's tpr */
Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -199,7 +199,7 @@ struct kvm_memory_slot *gfn_to_memslot(s
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
-void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+int kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -818,9 +818,10 @@ void mark_page_dirty(struct kvm *kvm, gf
 /*
  * The vCPU has executed a HLT instruction with in-kernel mode enabled.
  */
-void kvm_vcpu_block(struct kvm_vcpu *vcpu)
+int kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
        DEFINE_WAIT(wait);
+       int ret = 0;
        for (;;) {
                prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
@@ -831,8 +832,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcp
                        break;
                if (kvm_arch_vcpu_runnable(vcpu))
                        break;
-               if (signal_pending(current))
+               if (signal_pending(current)) {
+                       ret = 1;
                        break;
+               }

This is ambiguous: multiple exit conditions could be true at the same time (the vcpu becomes runnable _and_ a signal is pending), so you can't trust the return code. It doesn't affect the usage in the rest of the patch (I think), but it is best to avoid such subtlety.

Can this be done by setting a KVM_REQ_UNHALT bit in vcpu->requests?
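
Something like this, perhaps. KVM_REQ_UNHALT is a new request bit that
doesn't exist yet, so both the name and the placement are only a sketch:

  /* in kvm_vcpu_block(): record *why* we woke up, instead of a return code */
  if (kvm_arch_vcpu_runnable(vcpu)) {
          set_bit(KVM_REQ_UNHALT, &vcpu->requests);
          break;
  }
  if (signal_pending(current))
          break;

  /* in the arch run path: the halted->runnable promotion keys off the bit */
  if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests))
          kvm_vcpu_promote_runnable(vcpu);

That removes the ambiguity: if the vcpu became runnable, the bit is set
whether or not a signal is also pending, so the caller never has to guess
which wakeup condition fired.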

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.
