Re: [kvm-devel] irqdevice INTR example

Gregory Haskins Thu, 12 Apr 2007 04:55:28 -0700

>>> On Thu, Apr 12, 2007 at  4:02 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
> Gregory Haskins wrote:
>> Hi All,
>>   Attached are the first three patches in my queue.  The first two you are 
> likely familiar with at this point (though I have made some more of the 
> requested changes to 02- irqdevice.patch).  The last item 
> (03- preemptible- cpu.patch) adds an implementation to the previously unused 
> kvm_vcpu_intr() callback.  This acts as a functional example of the INTR 
> callback mechanism as Avi requested.  Note that the work related to 
> IF/NMI/TPR classification of interrupts happens later in my queue and is not 
> mature enough to share yet, but hopefully soon.
>>
>>   
>> KVM: Preemptible VCPU
>>
>> From:  <>
>>
>> This adds support for interrupting an executing CPU
>>
>> diff -- git a/drivers/kvm/condvar.c b/drivers/kvm/condvar.c
>> new file mode 100644
>> index 0000000..87e464a
>> ---  /dev/null
>> +++ b/drivers/kvm/condvar.c
>> @@ - 0,0 +1,109 @@
>> +/*
>> + * Condition Variable
>> + *
>> + * Copyright (C) 2007, Novell
>> + *
>> + * Authors:
>> + *   Gregory Haskins <[EMAIL PROTECTED]>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top- level directory.
>> + *
>> + */
>>   
> 
> If you want condition variables, activate your cryogenically- cooled suit 
> and post on it on lkml.  This cannot be added to the kernel via kvm.


Hehe.  I wasn't aware of the contention surrounding them but that does explain 
why they are notably missing.  I will rip it out this code in favor of a pure 
waitqueue as Christof suggested to avoid more controversy.

> 
> 
> 
>> +
>>      /*
>> -     * FIXME: Implement this or the CPU wont notice the interrupt until
>> -     * the next natural VMEXIT.  Note that this is how the system
>> -     * has always worked, so nothing is broken here.  This is a future
>> -     * enhancement
>> +     * HACK ALERT!
>> +     *
>> +     * We want to send a virtual interrupt signal to the task that owns
>> +     * the guest.  However, the signal will only force a VMEXIT (via
>> +     * a reschedule IPI) if the task is currently in GUEST mode.  There
>> +     * is a race condition between the time that we mark the vcpu as
>> +     * running and the time the system actually enter guest mode.  Since
>> +     * there doesnt appear to be any way to help with this situation from
>> +     * the VT hardware, we are forced to wait to make sure the guest 
>>   
> 
> We support AMD too.

My patch actually supports both, and I didn't mean to infer that AMD was 
skipped.  I was speaking generally when I said "VT".  Is there a better term to 
refer to the concept of VMX/SVN?

> 
>> +     * actually gets interrupted in a reasonable amount of time.  If it
>> +     * does not, we assume that the IPI failed because it was too early
>> +     * and must try again until it does.
>>   
> 
> Waiting seems totally broken.  I'm not even sure what we are waiting for 
> --  certainly you can't wait for re- entry into guest mode, since there's 
> not guarantee that's coming.  

Actually, there is ;)  (unless I missed something)   If you look at the other 
side of this implementation in vcpu_XXX_run(), I only set the vcpu->irq.task 
when the vcpu is just about to enter guest mode.  And I clear it when it 
leaves.  In other words, the kvn_vcpu_intr() function is a no-op if the vcpu is 
not in guest mode (or close to it).

> If it is elsewhere, we need to use one of the many (alas) standard wakeup 
> mechanisms to 
> wake up userspace and let it know the vcpu needs service.

My code assumes we only need to kick the vcpu if its in guest mode because it 
figures it will just pick up the interrupt on the next run cycle otherwise.  I 
haven't studied QEMU deep enough to know that it can suspend the execution of 
the run-cycle...i just thought it looped indefinitely.  This makes sense now 
that I think about it because something like hlt should cause a suspension of 
CPU activity until further notice.  If we need to be able to signal the vcpu 
*anytime* I will have to go back and refactor some things.

> 
> Qemu uses signals, but I think that making the vcpu fd writable is more 
> generic.  For qemu, we can attach a signal to the fd (using fcntl(2)); 
> that has the benefit of leaving the choice of which signal to use to 
> userspace.

Ack,  I will look into this design.

> 
>> +     *
>> +     * This condvar/spinlock/timeout/retry eliminate the race in a safe
>> +     * manner, at the expense of making the INTR delivery synchronous
>>       */
>> +    spin_lock(&vcpu- >irq.lock);
>> +    
>> +    if (vcpu- >irq.task) {
>> +            struct timespec tmo = {
>> +                    .tv_sec  = 0,
>> +                    .tv_nsec = 100000 /* 100us */
>> +            };
>> +
>> +            BUG_ON(vcpu- >irq.task == current);
>> +                    
>> +            while (vcpu- >irq.task) {
>> +                    send_sig(SIGSTOP, vcpu- >irq.task, 0);
>>   
> 
> SIGSTOP?!  I lost the plot here.

Well, I needed a signal to send that would reliably IPI the guest.  I 
originally wrote the code to send RT signal 33 (which I defined as 
KVM_SIGNAL_VIRTUAL_INTERRUPT).  In both cases, the design calls for sending a 
signal (purely for the IPI function of the kernel) and then to remove the 
signal before userspace sees it.  I had some trouble with the code that used 
dequeue_signal() to strip out the RT signal.  So I then changed to sending 
SIGSTOP to IPI, and SIGCONT to effectively strip out the pending SIGSTOP.

Keep in mind that this is really just proof-of-concept code.  The general idea 
of this POC is to demonstrate how I envisioned the irqdevice API to be used to 
deliver interrupts even once concurrent access is possible (e.g., SMP/PV).  The 
details of which signal are used, etc, are incidental for now until we agree 
that the irqdevice API approach is acceptable. 

Regards,
-Greg


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
kvm-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Re: [kvm-devel] irqdevice INTR example

Reply via email to