I have attached a new version of the patch which eliminates the condition
variable (if only by name, anyway ;)
>>> On Thu, Apr 12, 2007 at 8:49 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote:
>
>
> Actually, I am in favor of having well-defined synchronization
> primitives. The only issues I see with condition variables are:
>
> - there are many variants of mutexes in the kernel (mutex, spinlock,
> spinlock with irqs disabled, ...), so many condvar variants are needed.
Yeah, I tried to address that with lock_ops, but I agree.
> - they're very close to waitqueues, so perhaps there's not much
> justification
Agreed they are very close. I won't go into my opinion of waitqueues vs
cond-vars in this forum ;)
>
> I'm no synchronization primitive expert, though. My comment had nothing
> to do with my opinion on condition variables or your implementation
> thereof. kvm is simply not the place to introduce them.
No problem. I think your argument is a good one.
>
> I referred to the comment. Maybe just "the hardware"?
Done
>
> Ah, ok -- I misunderstood the whole thing. The way to avoid the race is
> to disable interrupts before entering the guest. This way the IPI is
> delayed until you enter guest mode:
>
> irq_disable();
>
> spin_lock();
> vcpu->guest_mode = 1;
> check_whether_an_irq_is_pending_and_if_so_inject_it();
> spin_unlock();
>
> asm ("vm_entry_sequence");
>
> vcpu->guest_mode = 0;
>
> irq_enable(); // currently in the vm entry sequence
>
> // recheck here?
>
> If the interrupt happens before the spin_lock(), we'll get a non-IPI
> wakeup and then see it in check_whether(). If it happens after it we'll
> get an IPI which will be ignored until we're snugly in guest mode.
When I first read this I thought "whoa! you want to disable interrupts during
the whole time we are in GUEST mode?" But then it dawned on me what you are
suggesting: Interrupts would be re-enabled after the context switch because we
re-load the processor state? Cool! The thing I can't wrap my head around is
what happens when the guest has IF=0 and an external interrupt comes in?
Would we still exit?
But that aside, a problem that I see is that (IIUC) IPIs use NMIs, not EXT-INTs.
Assuming that is right, I suppose we might be able to do a similar trick
except we also disable NMIs first (is this possible/recommended/forbidden?).
>
> In general I find it useful to pretend there are many userspaces being
> written for kvm, otherwise we get locked into qemu's current mode of
> operation.
Sounds reasonable. I will start to do the same.
>
>> This makes sense now that I think about it because something like hlt should
>> cause a suspension of CPU activity until further notice.
>
> An alternative is to handle hlt in the kernel in a place where we're
> ready for the IPI wakeup. The downside to that is that we have to be
> prepared for external wakeup sources as well (signal, poll, aio... messy).
Hmmm...interesting. I wonder if there are advantages that make this worth
exploring. Oh well, I will back burner these thoughts until the SMP/PV/APIC
stuff is sorted out.
>
> It's best not to use signals internally. Qemu relies on them and we
> have to support them, but in kernel we should use existing kernel
> mechanisms.
But Avi, this was *your* idea to use signals ;) But in all seriousness, I
don't know if I have a choice. I have two requirements which constrain me:
1) I need an IPI to cause a VMEXIT
2) I need to support 2.6.16, preferably as a loadable module.
According to my research (which is undoubtedly not 100% definitive), neither
2.6.16 nor newer kernels export most of the IPI facilities. send_sig()
happens to be exported, and it happens to invoke a reschedule IPI under the
hood, so it's convenient. If there is another way to get access to the IPI
facility without playing games with signals, I am all ears. But until then I
don't know what else to do. If I had the luxury of modifying the kernel
source, we could just export what we needed and be done with it.
>
> I was interested in how ->pending() and ->read_vector() and the raise
> callback interact, but got distracted by the, err, uniqueness of the
> signal thing.
You still haven't weighed in here ;) Hopefully you have a clearer picture of
what I was trying to do now at least. Whether you agree or not is another
matter.
Regards,
-Greg
KVM: Preemptible VCPU
From: <>
This adds support for interrupting an executing VCPU.
Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---
drivers/kvm/kvm.h | 11 ++++++++++
drivers/kvm/kvm_main.c | 54 ++++++++++++++++++++++++++++++++++++++++++++----
drivers/kvm/svm.c | 35 +++++++++++++++++++++++++++++++
drivers/kvm/vmx.c | 35 +++++++++++++++++++++++++++++++
4 files changed, 130 insertions(+), 5 deletions(-)
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 58966d9..70d1bb9 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -271,6 +271,16 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
+/*
+ * structure for maintaining info for interrupting an executing VCPU
+ */
+struct kvm_vcpu_irq {
+ spinlock_t lock;
+ wait_queue_head_t wq;
+ struct task_struct *task;
+ int pending;
+};
+
struct kvm_vcpu {
struct kvm *kvm;
union {
@@ -284,6 +294,7 @@ struct kvm_vcpu {
struct kvm_run *run;
int interrupt_window_open;
struct kvm_irqdevice irq_dev;
+ struct kvm_vcpu_irq irq;
unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
unsigned long rip; /* needs vcpu_load_rsp_rip() */
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 7e00412..1cf4060 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -299,6 +299,11 @@ static struct kvm *kvm_create_vm(void)
struct kvm_vcpu *vcpu = &kvm->vcpus[i];
mutex_init(&vcpu->mutex);
+
+ memset(&vcpu->irq, 0, sizeof(vcpu->irq));
+ spin_lock_init(&vcpu->irq.lock);
+ init_waitqueue_head(&vcpu->irq.wq);
+
vcpu->cpu = -1;
vcpu->kvm = kvm;
vcpu->mmu.root_hpa = INVALID_PAGE;
@@ -2320,13 +2325,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
* Our irq device is requesting to interrupt the vcpu. If it is
* currently running, we should inject a host IPI to force a VMEXIT
*/
-
+ struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
+
/*
- * FIXME: Implement this or the CPU wont notice the interrupt until
- * the next natural VMEXIT. Note that this is how the system
- * has always worked, so nothing is broken here. This is a future
- * enhancement
+ * HACK ALERT!
+ *
+ * We want to send a virtual interrupt signal to the task that owns
+ * the guest. However, the signal will only force a VMEXIT (via
+ * a reschedule IPI) if the task is currently in GUEST mode. There
+ * is a race condition between the time that we mark the vcpu as
+ * running and the time the system actually enters guest mode. Since
+ * there doesn't appear to be any way to help with this situation from
+ * the hardware, we are forced to wait to make sure the guest
+ * actually gets interrupted in a reasonable amount of time. If it
+ * does not, we assume that the IPI failed because it was too early
+ * and must try again until it does.
+ *
+ * This condvar/spinlock/timeout/retry scheme eliminates the race in a
+ * safe manner, at the expense of making INTR delivery synchronous.
*/
+ spin_lock(&vcpu->irq.lock);
+
+ if (vcpu->irq.task) {
+ struct timespec tmo = {
+ .tv_sec = 0,
+ .tv_nsec = 100000 /* 100us */
+ };
+
+ BUG_ON(vcpu->irq.task == current);
+
+ while (vcpu->irq.task) {
+ DEFINE_WAIT(__wait);
+
+ send_sig(SIGSTOP, vcpu->irq.task, 0);
+
+ prepare_to_wait(&vcpu->irq.wq, &__wait,
+ TASK_UNINTERRUPTIBLE);
+ spin_unlock(&vcpu->irq.lock);
+ schedule_timeout(timespec_to_jiffies(&tmo));
+ spin_lock(&vcpu->irq.lock);
+ finish_wait(&vcpu->irq.wq, &__wait);
+ }
+
+ vcpu->irq.pending = 1;
+ }
+
+ spin_unlock(&vcpu->irq.lock);
}
static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index e59a548..41765bd 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1463,9 +1463,25 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct
kvm_run *kvm_run)
int r;
again:
+ spin_lock(&vcpu->irq.lock);
+
+ /*
+ * Setting vcpu->task signals to outsiders that the VMCS is
+ * effectively in GUEST mode, and therefore must be signalled
+ * to transition the task back to HOST mode if any new interrupts
+ * arrive.
+ */
+ vcpu->irq.task = current;
+
+ /*
+ * We also must inject interrupts (if any) while the irq_lock
+ * is held
+ */
if (!vcpu->mmio_read_completed)
do_interrupt_requests(vcpu, kvm_run);
+ spin_unlock(&vcpu->irq.lock);
+
clgi();
pre_svm_run(vcpu);
@@ -1617,6 +1633,25 @@ again:
reload_tss(vcpu);
/*
+ * Signal that we have transitioned back to host mode
+ */
+ spin_lock(&vcpu->irq.lock);
+
+ vcpu->irq.task = NULL;
+ wake_up(&vcpu->irq.wq);
+
+ /*
+ * If irq.pending is asserted, someone undoubtedly has sent us a SIGSTOP
+ * signal. Counter it with a SIGCONT.
+ */
+ if (vcpu->irq.pending) {
+ send_sig(SIGCONT, current, 0);
+ vcpu->irq.pending = 0;
+ }
+
+ spin_unlock(&vcpu->irq.lock);
+
+ /*
* Profile KVM exit RIPs:
*/
if (unlikely(prof_on == KVM_PROFILING))
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index a0fdf02..1d5ce85 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1748,9 +1748,25 @@ again:
vmcs_writel(HOST_GS_BASE, segment_base(gs_sel));
#endif
+ spin_lock(&vcpu->irq.lock);
+
+ /*
+ * Setting vcpu->task signals to outsiders that the VMCS is
+ * effectively in GUEST mode, and therefore must be signalled
+ * to transition the task back to HOST mode if any new interrupts
+ * arrive.
+ */
+ vcpu->irq.task = current;
+
+ /*
+ * We also must inject interrupts (if any) while the irq_lock
+ * is held
+ */
if (!vcpu->mmio_read_completed)
do_interrupt_requests(vcpu, kvm_run);
+ spin_unlock(&vcpu->irq.lock);
+
if (vcpu->guest_debug.enabled)
kvm_guest_debug_pre(vcpu);
@@ -1911,6 +1927,25 @@ again:
asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
+ /*
+ * Signal that we have transitioned back to host mode
+ */
+ spin_lock(&vcpu->irq.lock);
+
+ vcpu->irq.task = NULL;
+ wake_up(&vcpu->irq.wq);
+
+ /*
+ * If irq.pending is asserted, someone undoubtedly has sent us a SIGSTOP
+ * signal. Counter it with a SIGCONT.
+ */
+ if (vcpu->irq.pending) {
+ send_sig(SIGCONT, current, 0);
+ vcpu->irq.pending = 0;
+ }
+
+ spin_unlock(&vcpu->irq.lock);
+
if (fail) {
kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
kvm_run->fail_entry.hardware_entry_failure_reason
_______________________________________________
kvm-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/kvm-devel