I have attached a new version of the patch which eliminates the condition 
variable (if only in name, anyway ;)

>>> On Thu, Apr 12, 2007 at  8:49 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
>
> 
> Actually, I am in favor of having well-defined synchronization 
> primitives.  The only issues I see with condition variables are:
> 
> -  there are many variants of mutexes in the kernel (mutex, spinlock, 
> spinlock with irqs disabled, ...), so many condvar variants are needed.

Yeah, I tried to address that with lock_ops, but I agree.

> -  they're very close to waitqueues, so perhaps there's not much 
> justification

Agreed they are very close.  I won't go into my opinion of waitqueues vs 
cond-vars in this forum ;)

> 
> I'm no synchronization primitive expert, though.  My comment had nothing 
> to do with my opinion on condition variables or your implementation 
> thereof.  kvm is simply not the place to introduce them.

No problem.  I think your argument is a good one.

> 
> I referred to the comment.  Maybe just "the hardware"?

Done

> 
> Ah, ok --  I misunderstood the whole thing.  The way to avoid the race is 
> to disable interrupts before entering the guest.  This way the IPI is 
> delayed until you enter guest mode:
> 
>     irq_disable();
> 
>     spin_lock();
>     vcpu->guest_mode = 1;
>     check_whether_an_irq_is_pending_and_if_so_inject_it();
>     spin_unlock();
> 
>     asm ("vm_entry_sequence");
> 
>     vcpu->guest_mode = 0;
>    
>     irq_enable(); // currently in the vm entry sequence
> 
>     // recheck here?
> 
> If the interrupt happens before the spin_lock(), we'll get a non-IPI 
> wakeup and then see it in check_whether().  If it happens after it we'll 
> get an IPI which will be ignored until we're snugly in guest mode.

When I first read this I thought "whoa! you want to disable interrupts during 
the whole time we are in GUEST mode?"  But then it dawned on me what you are 
suggesting: interrupts would be re-enabled after the context switch because we 
re-load the guest's processor state?  Cool!  The thing I can't wrap my head 
around is what happens when the guest has IF=0 and an external interrupt comes 
in.  Would we still exit?
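
To make sure I am reading you right, here is roughly how I picture the 
host-side sequence (just a sketch restating your pseudocode with the 
primitives I would actually reach for; guest_mode, inject_pending_irq_if_any() 
and vm_entry_sequence() are placeholders, not real fields/functions):

    /* assumes <linux/spinlock.h> and <linux/irqflags.h> */

    static void vcpu_enter_guest_sketch(struct kvm_vcpu *vcpu)
    {
            /*
             * With host interrupts disabled, an IPI that arrives after we
             * mark ourselves as "in guest mode" is held off until the
             * hardware VM entry, which loads the guest's processor state
             * and thereby re-enables interrupt delivery on this CPU.
             */
            local_irq_disable();

            spin_lock(&vcpu->irq.lock);
            vcpu->guest_mode = 1;             /* placeholder flag */
            inject_pending_irq_if_any(vcpu);  /* your check_whether...() step */
            spin_unlock(&vcpu->irq.lock);

            vm_entry_sequence(vcpu);          /* VMLAUNCH/VMRUN + exit */

            vcpu->guest_mode = 0;

            local_irq_enable();               /* currently done in the entry path */

            /* recheck for interrupts that raced with the exit? */
    }

If the IPI lands before the spin_lock() we see it in the check; if it lands 
after, it is deferred until we are, as you say, snugly in guest mode.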

But that aside, one problem I see is that (IIUC) IPIs use NMIs, not EXT-INTs.
Assuming that is right, I suppose we might be able to do a similar trick, 
except we would also have to disable NMIs first (is that 
possible/recommended/forbidden?).

> 
> In general I find it useful to pretend there are many userspaces being 
> written for kvm, otherwise we get locked into qemu's current mode of 
> operation.

Sounds reasonable.  I will start to do the same.

> 
>> This makes sense now that I think about it because something like hlt should 
>> cause a suspension of CPU activity until further notice. 
> 
> An alternative is to handle hlt in the kernel in a place where we're 
> ready for the IPI wakeup.  The downside to that is that we have to be 
> prepared for external wakeup sources as well (signal, poll, aio... messy).

Hmmm... interesting.  I wonder if there are advantages that make this worth 
exploring.  Oh well, I will put these thoughts on the back burner until the 
SMP/PV/APIC stuff is sorted out.

> 
> It's best not to use signals internally.  Qemu relies on them and we 
> have to support them, but in kernel we should use existing kernel 
> mechanisms.

But Avi, this was *your* idea to use signals ;)  In all seriousness, though, I 
don't know if I have a choice.  I have two requirements that constrain me:

1) I need an IPI to cause a VMEXIT
2) I need to support 2.6.16, with a strong preference for doing it as a loadable module.

According to my research (which is undoubtedly not 100% definitive), neither 
2.6.16 nor even the newer kernels export most of the IPI facilities.  send_sig() 
happens to be exported, and it happens to invoke a reschedule IPI under the 
hood, so it's convenient.  If there is another way to get access to the IPI 
facility without playing games with signals, I am all ears.  But until then I 
don't know what else to do.  If I had the luxury of modifying the kernel 
source, we could just export what we needed and be done with it.
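
For reference, the kick boils down to something like this (a sketch only; 
kvm_vcpu_kick_sketch() is just an illustrative name, and it relies on the 
vcpu->irq fields and the SIGSTOP/SIGCONT dance from the patch below):

    /* assumes <linux/sched.h> and <linux/signal.h> */

    /*
     * Sketch of the signal-as-poor-man's-IPI kick.  send_sig() is exported
     * on 2.6.16, and delivering a signal to a task currently running on
     * another CPU kicks that CPU with a reschedule IPI, which in turn
     * forces a VMEXIT if that task is sitting in guest mode.  Caller holds
     * vcpu->irq.lock, as in kvm_vcpu_intr() below.
     */
    static void kvm_vcpu_kick_sketch(struct kvm_vcpu *vcpu)
    {
            struct task_struct *task = vcpu->irq.task; /* non-NULL only in guest mode */

            if (task && task != current)
                    send_sig(SIGSTOP, task, 0); /* countered with SIGCONT after the exit */
    }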

> 
> I was interested in how ->pending() and ->read_vector() and the raise 
> callback interact, but got distracted by the, err, uniqueness of the 
> signal thing.

You still haven't weighed in here ;)  Hopefully you now at least have a clearer 
picture of what I was trying to do.  Whether you agree or not is another 
matter.

Regards,
-Greg



KVM: Preemptible VCPU

From:  <>

This adds support for interrupting an executing VCPU.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h      |   11 ++++++++++
 drivers/kvm/kvm_main.c |   54 ++++++++++++++++++++++++++++++++++++++++++++----
 drivers/kvm/svm.c      |   35 +++++++++++++++++++++++++++++++
 drivers/kvm/vmx.c      |   35 +++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+), 5 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 58966d9..70d1bb9 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -271,6 +271,16 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
 
 #define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
 
+/*
+ * structure for maintaining info for interrupting an executing VCPU
+ */
+struct kvm_vcpu_irq {
+       spinlock_t          lock;
+       wait_queue_head_t   wq;
+       struct task_struct *task;
+       int                 pending;
+};
+
 struct kvm_vcpu {
        struct kvm *kvm;
        union {
@@ -284,6 +294,7 @@ struct kvm_vcpu {
        struct kvm_run *run;
        int interrupt_window_open;
        struct kvm_irqdevice irq_dev;
+       struct kvm_vcpu_irq irq;
        unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
        unsigned long rip;      /* needs vcpu_load_rsp_rip() */
 
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 7e00412..1cf4060 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -299,6 +299,11 @@ static struct kvm *kvm_create_vm(void)
                struct kvm_vcpu *vcpu = &kvm->vcpus[i];
 
                mutex_init(&vcpu->mutex);
+
+               memset(&vcpu->irq, 0, sizeof(vcpu->irq));
+               spin_lock_init(&vcpu->irq.lock);
+               init_waitqueue_head(&vcpu->irq.wq);
+
                vcpu->cpu = -1;
                vcpu->kvm = kvm;
                vcpu->mmu.root_hpa = INVALID_PAGE;
@@ -2320,13 +2325,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
         * Our irq device is requesting to interrupt the vcpu.  If it is
         * currently running, we should inject a host IPI to force a VMEXIT 
         */
-       
+       struct kvm_vcpu *vcpu = (struct kvm_vcpu *)this->private;
+
        /*
-        * FIXME: Implement this or the CPU wont notice the interrupt until
-        * the next natural VMEXIT.  Note that this is how the system
-        * has always worked, so nothing is broken here.  This is a future
-        * enhancement
+        * HACK ALERT!
+        *
+        * We want to send a virtual interrupt signal to the task that owns
+        * the guest.  However, the signal will only force a VMEXIT (via
+        * a reschedule IPI) if the task is currently in GUEST mode.  There
+        * is a race condition between the time that we mark the vcpu as
+        * running and the time the system actually enters guest mode.  Since
+        * there doesn't appear to be any way to help with this situation from
+        * the hardware, we are forced to wait to make sure the guest
+        * actually gets interrupted in a reasonable amount of time.  If it
+        * does not, we assume that the IPI failed because it was sent too
+        * early, and we must try again until it succeeds.
+        *
+        * This condvar/spinlock/timeout/retry scheme eliminates the race in a
+        * safe manner, at the expense of making the INTR delivery synchronous.
         */
+       spin_lock(&vcpu->irq.lock);
+       
+       if (vcpu->irq.task) {
+               struct timespec tmo = {
+                       .tv_sec  = 0,
+                       .tv_nsec = 100000 /* 100us */
+               };
+
+               BUG_ON(vcpu->irq.task == current);
+                       
+               while (vcpu->irq.task) {
+                       DEFINE_WAIT(__wait);    
+
+                       send_sig(SIGSTOP, vcpu->irq.task, 0);   
+
+                       prepare_to_wait(&vcpu->irq.wq, &__wait, 
+                                       TASK_UNINTERRUPTIBLE);
+                       spin_unlock(&vcpu->irq.lock);
+                       schedule_timeout(timespec_to_jiffies(&tmo));
+                       spin_lock(&vcpu->irq.lock);
+                       finish_wait(&vcpu->irq.wq, &__wait);
+               }
+               
+               vcpu->irq.pending = 1;
+       }
+       
+       spin_unlock(&vcpu->irq.lock);
 }
 
 static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index e59a548..41765bd 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1463,9 +1463,25 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
        int r;
 
 again:
+       spin_lock(&vcpu->irq.lock);
+
+       /*
+        * Setting vcpu->irq.task signals to outsiders that the vcpu is
+        * effectively in GUEST mode, and therefore the task must be
+        * signalled to transition it back to HOST mode if any new
+        * interrupts arrive.
+        */
+       vcpu->irq.task = current;
+
+       /*
+        * We must also inject interrupts (if any) while vcpu->irq.lock
+        * is held.
+        */
        if (!vcpu->mmio_read_completed)
                do_interrupt_requests(vcpu, kvm_run);
 
+       spin_unlock(&vcpu->irq.lock);
+
        clgi();
 
        pre_svm_run(vcpu);
@@ -1617,6 +1633,25 @@ again:
        reload_tss(vcpu);
 
        /*
+        * Signal that we have transitioned back to host mode 
+        */
+       spin_lock(&vcpu->irq.lock);
+
+       vcpu->irq.task = NULL;
+       wake_up(&vcpu->irq.wq);
+
+       /*
+        * If irq.pending is asserted, someone has undoubtedly sent us a
+        * SIGSTOP signal.  Counter it with a SIGCONT.
+        */
+       if (vcpu->irq.pending) {
+               send_sig(SIGCONT, current, 0);
+               vcpu->irq.pending = 0;
+       }
+
+       spin_unlock(&vcpu->irq.lock);
+
+       /*
         * Profile KVM exit RIPs:
         */
        if (unlikely(prof_on == KVM_PROFILING))
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index a0fdf02..1d5ce85 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1748,9 +1748,25 @@ again:
        vmcs_writel(HOST_GS_BASE, segment_base(gs_sel));
 #endif
 
+       spin_lock(&vcpu->irq.lock);
+
+       /*
+        * Setting vcpu->irq.task signals to outsiders that the vcpu is
+        * effectively in GUEST mode, and therefore the task must be
+        * signalled to transition it back to HOST mode if any new
+        * interrupts arrive.
+        */
+       vcpu->irq.task = current;
+
+       /*
+        * We must also inject interrupts (if any) while vcpu->irq.lock
+        * is held.
+        */
        if (!vcpu->mmio_read_completed)
                do_interrupt_requests(vcpu, kvm_run);
 
+       spin_unlock(&vcpu->irq.lock);
+
        if (vcpu->guest_debug.enabled)
                kvm_guest_debug_pre(vcpu);
 
@@ -1911,6 +1927,25 @@ again:
 
        asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 
+       /*
+        * Signal that we have transitioned back to host mode 
+        */
+       spin_lock(&vcpu->irq.lock);
+
+       vcpu->irq.task = NULL;
+       wake_up(&vcpu->irq.wq);
+
+       /*
+        * If irq.pending is asserted, someone has undoubtedly sent us a
+        * SIGSTOP signal.  Counter it with a SIGCONT.
+        */
+       if (vcpu->irq.pending) {
+               send_sig(SIGCONT, current, 0);
+               vcpu->irq.pending = 0;
+       }
+
+       spin_unlock(&vcpu->irq.lock);
+
        if (fail) {
                kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
                kvm_run->fail_entry.hardware_entry_failure_reason