date:20070508

Re: [kvm-devel] Best practices for tracking KVM development

2007-05-08 Thread Avi Kivity

Wink Saville wrote:
> Hello,
>
> Based on the initial feed back I'm going to start trying to implement
> a PV block device that will use kshmem/ACE. What is the best
> way to track KVM development?
>
>   

See http://kvm.qumranet.com/kvmwiki/Code.

-- 
error compiling committee.c: too many arguments to function

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Re: [kvm-devel] kshmem & ACE

2007-05-08 Thread Avi Kivity

Wink Saville wrote:
>>
>> Most paravirtual devices use atomic operations (or even just raw memory
>> accesses and memory barriers), which don't need any special
>> infrastructure.  This effectively makes them message-passing protocols
>> rather than shared memory protocol.  I can't see offhand why sharing
>> data structures would bring a great improvement, but maybe I'm tied to
>> the old way of thinking.
>>
>
> One of the uses of kshmem/ACE will be an implementation of a message
> passing technique that I hope to be quite general. Since this appears to
> be a common technique then maybe there is nothing new in what I've done
> which may make it redundant or may be a real contribution if it is more
> general than current techniques.
>
> Where might I find the current implementations of the PV devices?
>

A good well-tuned example is the Xen paravirtualized drivers. See 
http://article.gmane.org/gmane.linux.kernel.virtualization/2659 for a 
driver, and 
http://article.gmane.org/gmane.linux.kernel.virtualization/2660 (ring.h) 
for the underlying machinery.
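
As a rough illustration of the barrier-based message passing described above, here is a minimal sketch. It is not the Xen ring.h code: it is a user-space single-producer/single-consumer ring written with C11 atomics, and every name in it (struct ring, ring_put, ring_get, RING_SIZE) is made up for this example. The release store on each index plays the role a write barrier would play in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size; must be a power of two */

struct ring {
    _Atomic uint32_t prod;     /* free-running index, written by the producer only */
    _Atomic uint32_t cons;     /* free-running index, written by the consumer only */
    uint32_t slot[RING_SIZE];  /* message payloads */
};

/* Producer side: returns false if the ring is full. */
static bool ring_put(struct ring *r, uint32_t msg)
{
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);

    if (prod - cons == RING_SIZE)
        return false;
    r->slot[prod & (RING_SIZE - 1)] = msg;
    /* Publish the slot contents before the new producer index becomes visible. */
    atomic_store_explicit(&r->prod, prod + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_get(struct ring *r, uint32_t *msg)
{
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);

    if (prod == cons)
        return false;
    *msg = r->slot[cons & (RING_SIZE - 1)];
    /* Hand the slot back to the producer only after it has been read. */
    atomic_store_explicit(&r->cons, cons + 1, memory_order_release);
    return true;
}
```

Neither side takes a lock: each index has exactly one writer, so the only ordering needed is between filling a slot and advancing the index that exposes it.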

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH 3/4] KVM: Adds ability to preempt an executing VCPU

2007-05-08 Thread Avi Kivity

Gregory Haskins wrote:
>> 
>>>  
>>> vcpu->cpu = -1;
>>> vcpu->kvm = kvm;
>>> @@ -366,13 +370,20 @@ static void free_pio_guest_pages(struct kvm_vcpu 
>>> *vcpu)
>>>  
>>>  static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
>>>  {
>>> +   unsigned long irqsave;
>>> +
>>> if (!vcpu->vmcs)
>>> return;
>>>  
>>> vcpu_load(vcpu);
>>> kvm_mmu_destroy(vcpu);
>>> vcpu_put(vcpu);
>>> +
>>> +   spin_lock_irqsave(&vcpu->irq.lock, irqsave);
>>> +   vcpu->irq.task = NULL;
>>> +   spin_unlock_irqrestore(&vcpu->irq.lock, irqsave);
>>>   
>>>   
>> Can irq.task be non-NULL here at all?  Also, we only free vcpus when we 
>> destroy the vm, and paravirt drivers would hopefully hold a ref to the 
>> vm, so there's nobody to race against here.
>> 
>
> I am perhaps being a bit overzealous here.  What I found in practice is that 
> the LVTT can screw things up on shutdown, so I was being pretty conservative 
> on the synchronization here.  
>
>   

That may point to a different sync problem.  All pending timers 
ought to have been canceled before we reach here.  Please check to make 
sure this isn't papering over another problem.

>>> kvm_irqdevice_destructor(&vcpu->irq.dev);
>>>  
>>> @@ -1868,6 +1880,10 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu 
>>> *vcpu, 
>>>   
>> struct kvm_run *kvm_run)
>> 
>>> kvm_arch_ops->decache_regs(vcpu);
>>> }
>>>  
>>> +   spin_lock_irqsave(&vcpu->irq.lock, irqsaved);
>>> +   vcpu->irq.task = current;
>>> +   spin_unlock_irqrestore(&vcpu->irq.lock, irqsaved);
>>> +
>>>   
>>>   
>> Just assignment + __smp_wmb().
>> 
>
> (This comment applies to all of the subsequent reviews where memory barriers 
> are recommended instead of locks:)
>
> I can't quite wrap my head around whether all these critical sections are 
> correct with just a barrier instead of a full-blown lock.  I would prefer to 
> be conservative and leave them as locks for now.  Someone with better insight 
> could make a second pass and optimize the locks into barriers where 
> appropriate.  I am just uncomfortable doing it without feeling confident that 
> I am not causing races.  If you insist on making the changes before the code 
> is accepted, ok.  Just note that I am not comfortable ;)
>
>   

I approach it from the other direction: to me, a locked assignment says 
that something is fundamentally wrong.  Usually anything under a lock is 
a read-modify-write operation, otherwise the writes just stomp on each 
other.

This is the source of all the race-after-vmexit-irq-check comments 
you've been getting from me.  No matter how many times you explain it, 
every time I see it the automatic race alarm pops up.
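
To make the distinction concrete, here is a small user-space sketch using C11 atomics rather than the kernel primitives. The type and function names (struct vcpu_irq, irq_set_task, irq_raise) are hypothetical and only loosely modelled on the patch under review. A lone assignment is published with a release store, roughly the counterpart of "assignment + write barrier", while the test-and-set of a pending bit is a genuine read-modify-write and therefore needs an atomic operation or a lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vcpu interrupt state in the patch. */
struct vcpu_irq {
    _Atomic(void *) task;   /* owner task, or NULL */
    atomic_ulong pending;   /* one bit per interrupt pin */
};

/*
 * Plain publication: a single store whose old value is never consulted.
 * A lock around it would not order anything that the release store does
 * not already order.
 */
static void irq_set_task(struct vcpu_irq *irq, void *task)
{
    atomic_store_explicit(&irq->task, task, memory_order_release);
}

/*
 * Read-modify-write: the new value depends on the old one, so the update
 * must be atomic. Returns true only if this call newly set the bit.
 */
static bool irq_raise(struct vcpu_irq *irq, unsigned pin)
{
    unsigned long mask = 1UL << pin;
    unsigned long old = atomic_fetch_or_explicit(&irq->pending, mask,
                                                 memory_order_acq_rel);
    return !(old & mask);
}
```

Whether readers of the published value need a matching acquire (or read barrier) depends on what they do with it; this sketch does not settle that question for the patch.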

>>> +/*
>>>   * This function will be invoked whenever the vcpu->irq.dev raises its 
>>> INTR
>>>   * line
>>>   */
>>> @@ -2318,10 +2348,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>>>  {
>>> struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
>>> unsigned long flags;
>>> +   int direct_ipi = -1;
>>>  
>>> spin_lock_irqsave(&vcpu->irq.lock, flags);
>>> -   __set_bit(pin, &vcpu->irq.pending);
>>> +
>>> +   if (!test_bit(pin, &vcpu->irq.pending)) {
>>> +   /*
>>> +* Record the change..
>>> +*/
>>> +   __set_bit(pin, &vcpu->irq.pending);
>>> +
>>> +   /*
>>> +* then wake up the vcpu (if necessary)
>>> +*/
>>> +   if (vcpu->irq.task && (vcpu->irq.task != current)) {
>>> +   if (vcpu->irq.guest_mode) {
>>> +   /*
>>> +* If we are in guest mode, we can optimize
>>> +* the IPI by executing a function directly
>>> +* on the owning processor.
>>> +*/
>>> +   direct_ipi = task_cpu(vcpu->irq.task);
>>> +   BUG_ON(direct_ipi == smp_processor_id());
>>> +   } else
>>> +   /*
>>> +* otherwise, we must assume that we could be
>>> +* blocked anywhere, including userspace. Send
>>> +* a signal to give everyone a chance to get
>>> +* notification
>>> +*/
>>> +   send_sig(vcpu->irq.signo, vcpu->irq.task, 0);
>>> +   }
>>> +   }
>>> +
>>> spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>> +
>>> +   if (direct_ipi != -1) {
>>> +   /*
>>> +* Not sure if disabling preemption is needed.
>>> +* The kick_process() code does this so I copied it
>>> +*/
>>> +   preempt_disable();
>>>   

[preemption is disabled here anyway]

>>> +   smp_call_function_single(direct_ipi,
>>> +kvm_vcpu_guest_intr,
>>> +

Re: [kvm-devel] [PATCH 4/4] KVM: Add support for in-kernel LAPIC model

2007-05-08 Thread Avi Kivity

Gregory Haskins wrote:
>   
>>> +
>>> +static struct kvm_irqdevice *get_irq_dev(struct kvm_kernint *s)
>>> +{
>>> +   struct kvm_irqdevice *dev;
>>> +
>>> +   if (kvm_lapic_enabled(s->vcpu))
>>> +   dev = &s->apic_irq;
>>> +   else
>>> +   dev = s->ext_irq;
>>> +
>>> +   if (!dev)
>>> +   kvm_crash_guest(s->vcpu->kvm);
>>>   
>>>   
>> Can this happen? Doesn't there always have to be an external interrupt 
>> controller when the lapic is disabled?
>> 
>
> Well, if a non-BSP processor calls this, ext_irq could be null.  It's a 
> pathological condition, thus the harsh punishment to the guest.  If it's an 
> AP, it *better* be using its LAPIC.  If it's not, something is busted. 
>
>
>   

Yes, I agree.

>>>   
>>>   
>> ... for example:
>> #define hrtimer timer_list
>>
>> I'd just drop it for now; it makes the code hard to read. We can re-add 
>> it later.
>> 
>
> By "drop it", I assume you mean drop the abstraction and just use native 
> HRTIMER directly for now?  
>
>   

Yes, we can worry about compatibility when this is merged.

>> [I didn't audit the lapic code]
>>
>> Where's vcpu->cr8 gone?
>> 
>
> I know you requested that this entry remain in the VCPU structure.  However, 
> I couldn't make this work reasonably.  The APIC wants to maintain this value 
> itself so the two can very easily get out of sync.  I suppose there could be 
> ways to make the APIC use the vcpu->cr8 variable as storage for TPR (albeit 
> messy), but this idea falls apart when we start looking at optimizations like 
> TPR shadowing.
>
> Based on all that, I felt it was best to just maintain CR8 as the TPR 
> register in the model.
>
>   

I don't understand.  Isn't the tpr read-only from the point of view of 
the lapic?

A simple set_cr8 helper should do the trick.
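
A minimal sketch of what such a helper could look like, assuming the architectural relationship in which CR8[3:0] mirrors bits 7:4 of the local APIC task-priority register. The structures and function names here (toy_vcpu, toy_lapic, toy_set_cr8, toy_set_tpr) are hypothetical and are not the KVM code; the second function covers a direct write to the TPR register so that both copies stay in step whichever way the guest updates the priority.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced models of the LAPIC and vcpu state. */
struct toy_lapic {
    uint32_t tpr;   /* task-priority register, low 8 bits significant */
};

struct toy_vcpu {
    uint64_t cr8;   /* cached copy of the guest's CR8 (4 bits) */
    struct toy_lapic apic;
};

/* Guest executed MOV to CR8: CR8[3:0] becomes TPR[7:4], TPR[3:0] is cleared. */
static void toy_set_cr8(struct toy_vcpu *vcpu, uint64_t cr8)
{
    vcpu->cr8 = cr8 & 0xf;
    vcpu->apic.tpr = (uint32_t)(vcpu->cr8 << 4);
}

/* Guest wrote the TPR register directly: refresh the cached CR8 from TPR[7:4]. */
static void toy_set_tpr(struct toy_vcpu *vcpu, uint32_t tpr)
{
    vcpu->apic.tpr = tpr & 0xff;
    vcpu->cr8 = (vcpu->apic.tpr >> 4) & 0xf;
}
```

Funnelling every update through one of these two setters is what keeps the values from drifting apart; it does not address hardware TPR shadowing.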


-- 
error compiling committee.c: too many arguments to function



Re: [kvm-devel] [PATCH 3/4] KVM: Adds ability to preempt an executing VCPU

2007-05-08 Thread Avi Kivity

Gregory Haskins wrote:
>   
>> Hopefully not by setting the 
>> signal number, but by making the vcpu fd writable (userspace can attach 
>> a signal to the fd if it wishes).
>> 
>
> Can you provide an example of what you would like here?  I am not quite sure 
> what you mean by making the fd writable.
>   

Making it respond to poll(2) as a writable fd.

See http://lwn.net/Articles/226252/ for an example.  It makes the fd 
readable instead of writable, but it's the same mechanism.

The larger picture is that fds and the poll() family are the closest 
thing Linux has to a generic event framework.  If the patchset in the 
article above is accepted, fds _will_ be the generic event framework.  
If something else is accepted, we'll just switch to that.

Qemu currently depends on signals, but you can have a writable fd 
generate a signal, and the mechanism for that is optional and 
configurable from userspace, which is what we want anyway.
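
For illustration, this is how userspace observes fd readiness through poll(2). The sketch uses an ordinary pipe as a stand-in, since the vcpu fd behaviour discussed here is only a proposal; the helper name fd_ready is made up for the example.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Non-blocking readiness check for one fd.
 * Returns 1 if any of the requested events is pending, 0 if none is,
 * and -1 if poll() itself failed.
 */
static int fd_ready(int fd, short events)
{
    struct pollfd p = { .fd = fd, .events = events };
    int n = poll(&p, 1, 0);   /* timeout 0: report current state and return */

    if (n < 0)
        return -1;
    return n > 0 && (p.revents & events) != 0;
}
```

A caller would pass POLLOUT to ask "is this fd writable?" and POLLIN for "is it readable?". Turning readiness into a signal is a separate, optional step that userspace requests itself, typically via fcntl(2) with F_SETOWN and O_ASYNC on fds whose drivers support it.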

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] Clarify old version messages and clean-up

2007-05-08 Thread Avi Kivity

Nguyen Anh Quynh wrote:
> This patch clarifies some "old version" error messages, and deletes
> some unused functions in userspace code.
>
> Signed-off-by: Nguyen Anh Quynh <[EMAIL PROTECTED]>
>

Applied.  Please send unrelated changes in different patches in the future.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH] install kvm modules into drivers/kvm

2007-05-08 Thread Avi Kivity

Nguyen Anh Quynh wrote:
> This patch installs kvm modules into drivers/kvm directory rather than 
> in extra/
>
> Without this patch, I got a problem: after installing kvm (got from
> git repo), modprobe always found and picked kvm modules in drivers/kvm
> (ie old modules) instead of the new ones (in extra/). This confused me
> for some time because the old modules (which are available in 2.6.20) are
> not supported by the current libkvm.
>

I'm not sure about this.  While the problem is very real (in fact, I've 
been bitten by it too), overwriting somebody else's files seems wrong.  
For example, now you can't uninstall kvm and get back the original modules.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH] configure qemu

2007-05-08 Thread Avi Kivity

Jeff Chua wrote:
> Avi,
>
> Here's a little patch to silence ...
>   - sdl-config when SDL is compiled without static library
>   - texi2html when not found
>
>   

As this is just a cleanup, and not strictly necessary, please post it to 
qemu-devel so that we diverge as little as possible from upstream qemu.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] Cursor release failures

2007-05-08 Thread Avi Kivity

Michael Ivanov wrote:
> Avi Kivity wrote:
>
>   
>> Well, it's supposed to be fixed in kvm-22.  Are you 100% certain you 
>> don't have leftover modules somewhere?  Please load them with insmod 
>> instead of modprobe.
>> 
> I loaded modules using insmod directly from kvm-22/kernel and retested.
> I was not exact in my initial mail, sorry: the pointer would not be
> released on ctrl alt, but the whole system hung when the guest system
> terminated. My current results are as follows:
>
>   * ctrl alt does NOT release the pointer
>   

This is strange.

>   * kill -TERM to qemu does NOT hang the system: the vm is terminated
> and the pointer IS released.
>   

Okay.

>   * when the guest system tries to reboot, the host system hangs. I was
> not able to verify whether it was just frozen x windows or the
> whole system crashed.
>   

This is fixed in kvm-23.


-- 
error compiling committee.c: too many arguments to function



[kvm-devel] [PATCH][UPDATE] Shortcut MSR save/restore for lightweight VM Exit (was: RE: shortcut for lightweight VM Exit)

2007-05-08 Thread Dong, Eddie

[EMAIL PROTECTED] wrote:
> BTW, I have another patch in hand to further reduce MSR save/restore
> and thus improve performance more for lightweight VM Exits. Based on my
> observation of a 32-bit FC5 guest, 93% of VM Exits will fall into the
> lightweight path.
> 
This patch further reduces the VM Exit handling cost for lightweight
VM Exits, which account for 93% of VM Exits in the KB case, assuming a
64-bit OS behaves similarly to a 32-bit one. On my old machine, I saw
a 20% KB performance increase within a 64-bit RHEL5 guest, and flat
results for 32-bit FC5.
There is still some room for improvement here, but this patch focuses
on the basic MSR save/restore framework only for now and leaves
optimizing specific MSRs like GS_BASE for the future.
thx,eddie

Signed-off-by:  Yaozu(Eddie) Dong [EMAIL PROTECTED]

against 5cf48c367dec74ba8553c53ed332cd075fa38b88


commit a7294eae555b7d42f7e44b8d7955becad2feebf8
Author: root <[EMAIL PROTECTED](none)>
Date:   Tue May 8 17:32:24 2007 +0800

Avoid MSR save/restore for lightweight VM Exit

Signed-off-by:  Yaozu(Eddie) Dong [EMAIL PROTECTED]

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 11eb25e..86abf2d 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -285,6 +285,7 @@ struct kvm_vcpu {
u64 apic_base;
u64 ia32_misc_enable_msr;
int nmsrs;
+   int sw_save_msrs;
struct vmx_msr_entry *guest_msrs;
struct vmx_msr_entry *host_msrs;
 
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 4e04b85..c2d06b5 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -80,23 +80,11 @@ static const u32 vmx_msr_index[] = {
 #ifdef CONFIG_X86_64
MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR, MSR_KERNEL_GS_BASE,
 #endif
-   MSR_EFER, MSR_K6_STAR,
+   MSR_K6_STAR, MSR_EFER,
 };
+#define NR_HW_SAVE_MSRS 1   /* HW save MSR_EFER */
 #define NR_VMX_MSR ARRAY_SIZE(vmx_msr_index)
 
-#ifdef CONFIG_X86_64
-static unsigned msr_offset_kernel_gs_base;
-#define NR_64BIT_MSRS 4
-/*
- * avoid save/load MSR_SYSCALL_MASK and MSR_LSTAR by std vt
- * mechanism (cpu bug AA24)
- */
-#define NR_BAD_MSRS 2
-#else
-#define NR_64BIT_MSRS 0
-#define NR_BAD_MSRS 0
-#endif
-
 static inline int is_page_fault(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK |
INTR_INFO_VECTOR_MASK |
@@ -339,23 +327,19 @@ static void vmx_inject_gp(struct kvm_vcpu *vcpu, unsigned error_code)
  */
 static void setup_msrs(struct kvm_vcpu *vcpu)
 {
-   int nr_skip, nr_good_msrs;
+   int nr_skip;
 
-   if (is_long_mode(vcpu))
-   nr_skip = NR_BAD_MSRS;
-   else
-   nr_skip = NR_64BIT_MSRS;
-   nr_good_msrs = vcpu->nmsrs - nr_skip;
+   vcpu->sw_save_msrs = nr_skip = vcpu->nmsrs - NR_HW_SAVE_MSRS;
 
/*
 * MSR_K6_STAR is only needed on long mode guests, and only
 * if efer.sce is enabled.
 */
if (find_msr_entry(vcpu, MSR_K6_STAR)) {
-   --nr_good_msrs;
+   --vcpu->sw_save_msrs;
 #ifdef CONFIG_X86_64
if (is_long_mode(vcpu) && (vcpu->shadow_efer & EFER_SCE))
-   ++nr_good_msrs;
+   ++vcpu->sw_save_msrs;
 #endif
}
 
@@ -365,9 +349,9 @@ static void setup_msrs(struct kvm_vcpu *vcpu)
virt_to_phys(vcpu->guest_msrs + nr_skip));
vmcs_writel(VM_EXIT_MSR_LOAD_ADDR,
virt_to_phys(vcpu->host_msrs + nr_skip));
-   vmcs_write32(VM_EXIT_MSR_STORE_COUNT, nr_good_msrs); /* 22.2.2 */
-   vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, nr_good_msrs);  /* 22.2.2 */
-   vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, nr_good_msrs); /* 22.2.2 */
+   vmcs_write32(VM_EXIT_MSR_STORE_COUNT, NR_HW_SAVE_MSRS);
+   vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, NR_HW_SAVE_MSRS);
+   vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, NR_HW_SAVE_MSRS);
 }
 
 /*
@@ -486,7 +470,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
msr = find_msr_entry(vcpu, msr_index);
if (msr)
msr->data = data;
-   load_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
+   load_msrs(vcpu->guest_msrs, vcpu->sw_save_msrs);
break;
 #endif
case MSR_IA32_SYSENTER_CS:
@@ -1218,10 +1202,6 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
vcpu->host_msrs[j].reserved = 0;
vcpu->host_msrs[j].data = data;
vcpu->guest_msrs[j] = vcpu->host_msrs[j];
-#ifdef CONFIG_X86_64
-   if (index == MSR_KERNEL_GS_BASE)
-   msr_offset_kernel_gs_base = j;
-#endif
++vcpu->nmsrs;
}
 
@@ -1861,12 +1841,8 @@ preempted:
fx_restore(vcpu->guest_fx_image);
}
 
-#ifdef CONFIG_X86_64
-   if (is_long_mode(vcpu)) {
-   save_msrs(vcpu->host_msrs + msr_offset_kernel_gs_base, 1);
-   load_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
-   }
-#endif
+   save_msrs(vcpu

Re: [kvm-devel] [PATCH][UPDATE] Shortcut MSR save/restore for lightweight VM Exit

2007-05-08 Thread Avi Kivity

Dong, Eddie wrote:
> [EMAIL PROTECTED] wrote:
>   
>> BTW, I have another patch in hand to further reduce MSR save/restore
>> and thus improve performance more for lightweight VM Exits. Based on my
>> observation of a 32-bit FC5 guest, 93% of VM Exits will fall into the
>> lightweight path.
>>
>> 
> This patch further reduces the VM Exit handling cost for lightweight
> VM Exits, which account for 93% of VM Exits in the KB case, assuming a
> 64-bit OS behaves similarly to a 32-bit one. On my old machine, I saw
> a 20% KB performance increase within a 64-bit RHEL5 guest, and flat
> results for 32-bit FC5.
> There is still some room for improvement here, but this patch focuses
> on the basic MSR save/restore framework only for now and leaves
> optimizing specific MSRs like GS_BASE for the future.
> thx,eddie
>
> Signed-off-by:  Yaozu(Eddie) Dong [EMAIL PROTECTED]
>
> against 5cf48c367dec74ba8553c53ed332cd075fa38b88
>
>   

Much has changed.  Please rebase against HEAD.

Also, there have been a lot of regressions with the msr code.  Please 
test on i386 and on Core Duo i386 (which is a little different) in 
addition to the regular x86_64.

> commit a7294eae555b7d42f7e44b8d7955becad2feebf8
> Author: root <[EMAIL PROTECTED](none)>
> Date:   Tue May 8 17:32:24 2007 +0800
>
> Avoid MSR save/restore for lightweight VM Exit
> 
> Signed-off-by:  Yaozu(Eddie) Dong [EMAIL PROTECTED]
>
> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> index 11eb25e..86abf2d 100644
> --- a/drivers/kvm/kvm.h
> +++ b/drivers/kvm/kvm.h
> @@ -285,6 +285,7 @@ struct kvm_vcpu {
>   u64 apic_base;
>   u64 ia32_misc_enable_msr;
>   int nmsrs;
> + int sw_save_msrs;
>   struct vmx_msr_entry *guest_msrs;
>   struct vmx_msr_entry *host_msrs;
>  
> diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
> index 4e04b85..c2d06b5 100644
> --- a/drivers/kvm/vmx.c
> +++ b/drivers/kvm/vmx.c
> @@ -80,23 +80,11 @@ static const u32 vmx_msr_index[] = {
>  #ifdef CONFIG_X86_64
>   MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR, MSR_KERNEL_GS_BASE,
>  #endif
> - MSR_EFER, MSR_K6_STAR,
> + MSR_K6_STAR, MSR_EFER,
>  };
> +#define NR_HW_SAVE_MSRS  1   /* HW save MSR_EFER */
>  #define NR_VMX_MSR ARRAY_SIZE(vmx_msr_index)
>   

The code has a comment that MSR_K6_STAR should be last... the comment 
should be removed.

Also, does this mean that software msr saving is faster than hardware 
msr saving?  Will this be true in future processors as well?

>  
>  static inline int is_page_fault(u32 intr_info)
>  {
>   return (intr_info & (INTR_INFO_INTR_TYPE_MASK |
> INTR_INFO_VECTOR_MASK |
> @@ -339,23 +327,19 @@ static void vmx_inject_gp(struct kvm_vcpu *vcpu,
> unsigned error_code)
>   */
>  static void setup_msrs(struct kvm_vcpu *vcpu)
>  {
> - int nr_skip, nr_good_msrs;
> + int nr_skip;
>  
> - if (is_long_mode(vcpu))
> - nr_skip = NR_BAD_MSRS;
> - else
> - nr_skip = NR_64BIT_MSRS;
> - nr_good_msrs = vcpu->nmsrs - nr_skip;
> + vcpu->sw_save_msrs = nr_skip = vcpu->nmsrs - NR_HW_SAVE_MSRS;
>   

One assignment per statement please.
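
A sketch of how the bookkeeping in setup_msrs() might read with one assignment per statement. This is a stand-alone model, not the patched function: split_msrs(), struct msr_split and the two boolean parameters are hypothetical, standing in for find_msr_entry(vcpu, MSR_K6_STAR) and the long-mode/EFER.SCE check. It assumes the reordered table from the patch, where MSR_K6_STAR is the last software-saved entry and MSR_EFER is the single hardware-saved entry at the tail.

```c
#include <stdbool.h>

#define NR_HW_SAVE_MSRS 1   /* from the patch: only MSR_EFER is left to the hardware */

struct msr_split {
    int nr_skip;        /* index at which the hardware-switched tail begins */
    int sw_save_msrs;   /* number of leading entries switched in software */
};

/*
 * nmsrs:       number of MSR entries the vcpu actually has
 * has_star:    the table contains MSR_K6_STAR
 * star_needed: the guest is in long mode with EFER.SCE set
 */
static struct msr_split split_msrs(int nmsrs, bool has_star, bool star_needed)
{
    struct msr_split s;

    s.nr_skip = nmsrs - NR_HW_SAVE_MSRS;
    s.sw_save_msrs = s.nr_skip;

    /* MSR_K6_STAR sits just before the hardware tail, so it can be trimmed. */
    if (has_star && !star_needed)
        --s.sw_save_msrs;

    return s;
}
```

With a two-entry table (MSR_K6_STAR, MSR_EFER) and no long mode, this yields zero software-saved MSRs and one hardware-saved MSR.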

 


-- 
error compiling committee.c: too many arguments to function



[kvm-devel] [ANNOUNCE] kvm-24 release

2007-05-08 Thread Avi Kivity

The recent performance enhancements caused another serious regression, 
namely an oops while loading kvm-intel.ko on i386.

Changes from kvm-23:
- fix oops loading kvm-intel module on i386 with highmem

I will be traveling for the rest of this week and unable to introduce 
new regressions, so kvm-24 should last for a while.

Notes:
   If you use the modules from kvm-24, you can use any version of Linux
from 2.6.9 upwards.
   If you use the modules from Linux 2.6.20, you need to use kvm-12.
   If you use the modules from Linux 2.6.21, you need to use kvm-17.

   API/ABI stability is planned for Linux 2.6.22.

http://kvm.qumranet.com

-- 
error compiling committee.c: too many arguments to function



Re: [kvm-devel] [PATCH 3/4] KVM: Adds ability to preempt an executing VCPU

2007-05-08 Thread Gregory Haskins

>>> On Tue, May 8, 2007 at  4:13 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
> Gregory Haskins wrote:
>>
>> I am perhaps being a bit overzealous here.  What I found in practice is that 
> the LVTT can screw things up on shutdown, so I was being pretty conservative 
> on the synchronization here.  
>>
>>   
> 
> That may point out to a different sync problem.  All pending timers 
> ought to have been canceled before we reach here.  Please check to make 
> sure this isn't papering over another problem.
> 

You are definitely right there.  I had added this logic in the early stage of 
debugging.  It turned out that I was missing an apic_dropref, which effectively 
meant the hrtimer_cancel() was never being issued.  That was the root-cause of 
my "LVTT expiration after guest shutdown" bug.  I left the sync code in as a 
conservative measure, but I will clean this up.

>>
>>   
> 
> I approach it from the other direction: to me, a locked assignment says 
> that something is fundamentally wrong.  Usually anything under a lock is 
> a read-modify-write operation, otherwise the writes just stomp on each 
> other.
> 

Interesting. That makes sense.  So if I replace the assignment cases with wmb, 
do I need to sprinkle rmbs anywhere or is that taken care of naturally by the 
places where we take the lock for a compound operation?

>
> 
> [preemption is disabled here anyway]
>

Ack.  I will remove the calls

 +  smp_call_function_single(direct_ipi,
 +   kvm_vcpu_guest_intr,
 +   vcpu, 0, 0);
 +  preempt_enable();
 +  }

>>> I see why you must issue the IPI outside the spin_lock_irqsave(), but 
>>> aren't you now opening a race?  vcpu enters guest mode, irq on other 
>>> cpu, irq sets direct_ipi to wakeup guest, releases lock, vcpu exits to 
>>> userspace (or migrates to another cpu), ipi is issued but nobody cares.
>>> 
>>
>> It's subtle, but I think it's ok.  The race is actually against the setting of 
> the irq.pending.  This *must* happen inside the lock or the guest could exit 
> and miss the interrupt.  Once the pending bit is set, however, the guest can 
> be woken up in any old fashion and the behavior should be correct.  If the 
> guest has already exited before the IPI is issued, it's effectively a no-op 
> (well, really it's just a wasted IPI/reschedule event, but no harm is done).  
> Does this make sense?  Did I miss something else?
>>   
> 
> No, you are correct wrt the vcpu migrating to another cpu.
> 
> What about vs. exit to userspace where we may sleep?

My logic being correct is predicated on the assumption that you and I made a 
week or two ago:  That the user-space will not sleep for anything but HLT.  If 
userspace *can* sleep on other things besides HLT, I agree that there is a race 
here.  If it is limited to HLT, we will be taken care of by virtue of the 
fact that irq.pending will be set before the handle_halt() logic is checked.  I 
admit that I was coding against an assumption that I do not yet know for a fact 
to be true.  I will update the comments to note this assumption so it's clearer, 
and we can address it in the future if it's ever revealed to be false.


Re: [kvm-devel] [PATCH][UPDATE] Shortcut MSR save/restore for lightweight VM Exit

2007-05-08 Thread Dong, Eddie

Avi Kivity wrote:
> Dong, Eddie wrote:
>> This patch further reduces the VM Exit handling cost for lightweight
>> VM Exits, which account for 93% of VM Exits in the KB case, assuming a
>> 64-bit OS behaves similarly to a 32-bit one. On my old machine, I saw
>> a 20% KB performance increase within a 64-bit RHEL5 guest, and flat
>> results for 32-bit FC5.
>> There is still some room for improvement here, but this patch focuses
>> on the basic MSR save/restore framework only for now and leaves
>> optimizing specific MSRs like GS_BASE for the future.
>> thx,eddie
>> 
>> Signed-off-by:  Yaozu(Eddie) Dong [EMAIL PROTECTED]
>> 
>> against 5cf48c367dec74ba8553c53ed332cd075fa38b88
>> 
>> 
> 
> Much has changed.  Please rebase against HEAD.
> 
> Also, there have been a lot of regressions with the msr code.  Please

The previous MSR optimization patch, 5cf48c367dec74ba8553c53ed332cd075fa38b88,
only focused on the heavyweight VM Exit path; it doesn't cover the major
path.
This patch moves save/restore of most MSRs from the hardware mechanism,
which applies to both heavyweight and lightweight VM Exit paths, to a
software path that runs only on heavyweight VM Exits.

> test on i386 and on Core Duo i386 (which is a little different) in
> addition to the regular x86_64.

I have tested this on i386; it behaves exactly the same, since the number of
software-saved MSRs is 0 and hardware-saved is 1, the same as before. I will
test on Conroe/Woodcrest to see the exact performance gain with KB.

thx,eddie


Re: [kvm-devel] [PATCH 3/4] KVM: Adds ability to preempt an executing VCPU

2007-05-08 Thread Avi Kivity

Gregory Haskins wrote:
 On Tue, May 8, 2007 at  4:13 AM, in message <[EMAIL PROTECTED]>,
 
> Avi Kivity <[EMAIL PROTECTED]> wrote: 
>   
>> Gregory Haskins wrote:
>> 
>>> I am perhaps being a bit overzealous here.  What I found in practice is 
>>> that 
>>>   
>> the LVTT can screw things up on shutdown, so I was being pretty conservative 
>> on the synchronization here.  
>> 
>>>   
>>>   
>> That may point out to a different sync problem.  All pending timers 
>> ought to have been canceled before we reach here.  Please check to make 
>> sure this isn't papering over another problem.
>>
>> 
>
> You are definitely right there.  I had added this logic in the early stage of 
> debugging.  It turned out that I was missing an apic_dropref, which 
> effectively meant the hrtimer_cancel() was never being issued.  That was the 
> root-cause of my "LVTT expiration after guest shutdown" bug.  I left the sync 
> code in as a conservative measure, but I will clean this up.
>
>   

Okay.  An alternative to removing it is replacing it with a BUG_ON() to 
make sure the constraint is checked.

>>>   
>>>   
>> I approach it from the other direction: to me, a locked assignment says 
>> that something is fundamentally wrong.  Usually anything under a lock is 
>> a read-modify-write operation, otherwise the writes just stomp on each 
>> other.
>>
>> 
>
> Interesting. That makes sense.  So if I replace the assignment cases with 
> wmb, do I need to sprinkle rmbs anywhere or is that taken care of naturally by 
> the places where we take the lock for a compound operation?
>   

I was going to say yes, but I'm not so sure now.  In any case I'm still 
uneasy about the lack of rmw in there.

See Documentation/memory-barriers.txt for an interesting, if difficult, 
discussion of the subject.

>>>   
>>>   
>> No, you are correct wrt the vcpu migrating to another cpu.
>>
>> What about vs. exit to userspace where we may sleep?
>> 
>
> My logic being correct is predicated on the assumption that you and I made a 
> week or two ago:  That the user-space will not sleep for anything but HLT.  
> If userspace *can* sleep on other things besides HLT, I agree that there is a 
> race here.  If it is limited to HLT, we will be taken care of by virtue 
> of the fact that irq.pending will be set before the handle_halt() logic is 
> checked.  I admit that I was coding against an assumption that I do not yet 
> know for a fact to be true.  I will update the comments to note this 
> assumption so it's clearer, and we can address it in the future if it's ever 
> revealed to be false.
>   

Yeah, I keep forgetting this.

-- 
error compiling committee.c: too many arguments to function



Re: [kvm-devel] [PATCH 4/4] KVM: Add support for in-kernel LAPIC model

2007-05-08 Thread Gregory Haskins

>>> On Tue, May 8, 2007 at  4:19 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
> Gregory Haskins wrote:
>
>> By "drop it", I assume you mean drop the abstraction and just use native 
>> HRTIMER directly for now?  
>>
>>   
> 
> Yes, we can worry about compatibility when this is merged.
>

Ack.  I have removed the abstraction.

>>> [I didn't audit the lapic code]
>>>
>>> Where's vcpu->cr8 gone?
>>> 
>>
>> I know you requested that this entry remain in the VCPU structure.  However, 
>> I couldn't make this work reasonably.  The APIC wants to maintain this value 
>> itself so the two can very easily get out of sync.  I suppose there could be 
>> ways to make the APIC use the vcpu->cr8 variable as storage for TPR (albeit 
>> messy), but this idea falls apart when we start looking at optimizations like 
>> TPR shadowing.
>>
>> Based on all that, I felt it was best to just maintain CR8 as the TPR 
>> register in the model.
>>
>>   
> 
> I don't understand.  Isn't the tpr read-only from the point of view of 
> the lapic?
> 

Not quite.  It's true that the APIC proper views the TPR as read-only.  However, 
TPR can be set by the CPU using either MOV to CR8 or an MMIO operation to 
the TPR register, and MMIOs are handled by the APIC code (on behalf of the vCPU).

> A simple set_cr8 helper should do the trick.

That would certainly solve the simple MOV to CR8 problem, yes.  It gets a little 
goofier when we start looking at the MMIO access path, but even that isn't 
insurmountable.  Where I really hit the wall was when I was thinking about TPR 
shadowing, which really wants direct access to the contiguous register file of 
the LAPIC.  I suppose we could give the CPU a real shadow of the registers and 
simply sync the TPR value on exit.  But it just seemed to be getting hacky for 
the sake of having vcpu->cr8.  What's wrong with simply looking at the LAPIC 
registers when you want to know? ;)
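
As a rough illustration of "just look at the LAPIC registers", here is a
small sketch that keeps the TPR only in an APIC register page and derives
CR8 from it on demand.  It relies on the architectural relationship
CR8[3:0] = TPR[7:4] and on the TPR living at offset 0x80 of the APIC page.
The struct and function names (lapic_regs, lapic_get_cr8, and so on) are
hypothetical and are not the ones used in the patch.

```c
#include <stdint.h>
#include <string.h>

#define APIC_PAGE_SIZE       0x1000
#define APIC_TASKPRI_OFFSET  0x80   /* architectural offset of the TPR */

/* Hypothetical stand-in for the LAPIC model's contiguous register file. */
struct lapic_regs {
	uint8_t page[APIC_PAGE_SIZE];
};

uint32_t lapic_read32(const struct lapic_regs *r, unsigned int off)
{
	uint32_t val;

	memcpy(&val, &r->page[off], sizeof(val));
	return val;
}

void lapic_write32(struct lapic_regs *r, unsigned int off, uint32_t val)
{
	memcpy(&r->page[off], &val, sizeof(val));
}

/* MOV from CR8: the priority class is TPR bits 7:4. */
uint64_t lapic_get_cr8(const struct lapic_regs *r)
{
	return (lapic_read32(r, APIC_TASKPRI_OFFSET) >> 4) & 0xf;
}

/* MOV to CR8: sets TPR[7:4] and clears TPR[3:0]. */
void lapic_set_cr8(struct lapic_regs *r, uint64_t cr8)
{
	lapic_write32(r, APIC_TASKPRI_OFFSET, (uint32_t)((cr8 & 0xf) << 4));
}

/* MMIO write to the TPR register: only the low 8 bits are defined. */
void lapic_mmio_write_tpr(struct lapic_regs *r, uint32_t val)
{
	lapic_write32(r, APIC_TASKPRI_OFFSET, val & 0xff);
}
```

With a single backing store there is nothing to keep in sync: both the CR8
path and the MMIO path end up in the same register.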


Re: [kvm-devel] [PATCH 3/4] KVM: Adds ability to preempt an executing VCPU

2007-05-08 Thread Gregory Haskins

>>> On Tue, May 8, 2007 at  4:26 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
> Gregory Haskins wrote:
>>   
>>> Hopefully not by setting the 
>>> signal number, but by making the vcpu fd writable (userspace can attach 
>>> a signal to the fd if it wishes).
>>> 
>>
>> Can you provide an example of what you would like here?  I am not quite sure 
>> what you mean by making the fd writable.
>>   
> 
> Making it respond to poll(2) as a writable fd.
> 
> See http://lwn.net/Articles/226252/ for an example.  It makes the fd 
> readable instead of writable, but it's the same mechanism.
> 
> The larger picture is that fds and the poll() family are the closest 
> thing Linux has to a generic event framework.  If the patchset in the 
> article above is accepted, fds _will_ be the generic event framework.  
> If something else is accepted, we'll just switch to that.
> 
> Qemu currently depends on signals, but you can have a writable fd 
> generate a signal, and the mechanism for that is optional and 
> configurable from userspace, which is what we want anyway.


Thanks for the link!  I will take a look and add this for my next drop.
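
For reference, a minimal userspace sketch of the consumer side of such an
fd-based event mechanism.  It assumes only standard poll(2) and read(2); the
idea that the fd hands back an int-sized event count is an assumption made
for illustration, and wait_readable() and read_event_count() are
hypothetical helper names.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Consume one int-sized event count from fd.
 * Returns the count, or -1 if a full int could not be read.
 */
int read_event_count(int fd)
{
	int val;
	ssize_t n = read(fd, &val, sizeof(val));

	return n == (ssize_t)sizeof(val) ? val : -1;
}
```

Any pollable fd can stand in for the vcpu fd when trying this out, for
example the read end of a pipe.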

-Greg


Re: [kvm-devel] Cursor release failures

2007-05-08 Thread Michael Ivanov

Avi Kivity wrote:

>>   * ctrl alt does NOT release the pointer
> 
> This is strange.
Just tested it again with kvm-24.
The pointer is not released on ctrl del.
When the guest system reboots the system does not hang anymore though.

Is it possible to turn on some kind of trace/debug output to figure out the 
problem?

Best regards
-- 
 \   / | Michael Ivanov
 (OvO) |
 (^^^) |
  \^/  |
  ^ ^  |


[kvm-devel] paravirtualization status

2007-05-08 Thread Omar Khan


hi,
   What is the status of paravirtualization? Also, when Ingo released his
paravirtualization patch and some results, Avi noted that:

"Very impressive!  The gain probably comes not only from avoiding the 
vmentry/vmexit, but also from avoiding the flushing of the global page 
tlb entries." [http://thread.gmane.org/gmane.linux.kernel/481084]

Can someone please explain briefly what the "global page tlb entries" are? 

Thanks
Omar Khan





[kvm-devel] [PATCH 0/8] in-kernel APIC support "v1"

2007-05-08 Thread Gregory Haskins

Here is my latest series incorporating the feedback and numerous bugfixes.  I
did not keep an official change-log, so it's difficult to say what changed off
the top of my head without an interdiff.  I will keep a changelog from here on
out.  Let's call this drop officially "v1".  I will start tracking versions of
the drop so it's easier to refer to them in review notes, etc.

Here are a few notes:

A) I implemented Avi's idea for a fd-based signaling mechanism.  I didn't quite
get what he meant by "writable-fd".  The way I saw it, it should be readable
so that is how I implemented it.  If that is not satisfactory, please
elaborate on the writable idea and I will change it over.

B) I changed the controversial kvm_irqdevice_ack() mechanism to use an "out"
structure, instead of an int pointer + return bitmap.  Hopefully, this design
puts Avi's mind at ease as the return code is more standard now.  In addition,
this API makes it easier to extend, which I take advantage of later in the
series for the TPR-shadow stuff.

C) I changed the irq.task assignment from a lock to a barrier, per review
comments.  However, I left the irq.guest_mode = 0 assignment in a lock because
I believe it is actually required to eliminate a race.  E.g. we want to make
sure that the irq.pending and IPI-method are decided atomically, and
irq.guest_mode essentially identifies a critical section.  I could be
convinced otherwise, but for now it's still there.

D) Patch #8 is for demonstration purposes only.  Don't apply it (yet) as it
causes the system to error on VMENTRY.  I include it purely so it's clear where
I am going.
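
To illustrate note (B), here is a small userspace sketch of an ack call that
reports its result through an "out" structure and keeps the return code as a
plain 0/-1 status.  The flag names and struct kvm_irqack_data follow the
definitions in patch 2/8; the mock device, mock_raise() and mock_ack() are
hypothetical and are not the LAPIC or userint implementation.

```c
#include <stdint.h>
#include <string.h>

#define KVM_IRQACKDATA_VECTOR_VALID   (1 << 0)
#define KVM_IRQACKDATA_VECTOR_PENDING (1 << 1)
#define KVM_IRQACK_FLAG_PEEK          (1 << 0)

struct kvm_irqack_data {
	int flags;
	int vector;
};

/* Hypothetical toy device: one bit per pending vector (0..255). */
struct mock_irqdev {
	uint8_t pending[256 / 8];
};

void mock_raise(struct mock_irqdev *d, int vector)
{
	d->pending[vector / 8] |= (uint8_t)(1u << (vector % 8));
}

/* Highest pending vector strictly below limit, or -1 if there is none. */
static int highest_pending_below(const struct mock_irqdev *d, int limit)
{
	for (int v = limit - 1; v >= 0; v--)
		if (d->pending[v / 8] & (1u << (v % 8)))
			return v;
	return -1;
}

/* Returns 0 on success and -1 on failure; the result lands in *data. */
int mock_ack(struct mock_irqdev *d, int flags, struct kvm_irqack_data *data)
{
	int v;

	if (!d || !data)
		return -1;

	memset(data, 0, sizeof(*data));
	v = highest_pending_below(d, 256);
	if (v < 0)
		return 0;	/* success, but VECTOR_VALID stays clear */

	data->vector = v;
	data->flags |= KVM_IRQACKDATA_VECTOR_VALID;
	if (highest_pending_below(d, v) >= 0)
		data->flags |= KVM_IRQACKDATA_VECTOR_PENDING;

	/* A peek reports the vector but leaves it pending. */
	if (!(flags & KVM_IRQACK_FLAG_PEEK))
		d->pending[v / 8] &= (uint8_t)~(1u << (v % 8));

	return 0;
}
```

Adding a field to the structure later does not change the function
signature, which is the extensibility argument made above.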

Overall, this code (excluding patch #8) seems to be working quite well from a
pure functional standpoint.  One problem that I see is QEMU remains pretty
busy even when the guest is idle.  I have a feeling it has something to do
with the way signals are delivered...TBD.  Otherwise, it's working from my
perspective.  I would love to hear feedback from testers.

An interesting discovery on my part while working on this is that there is an
apparent mis-emulation in the QEMU LAPIC code.  The kernel that ships as the
SLED-10 installer (2.6.16.21, I think) maps LINT0 as an NMI and masks off all
interrupts in the 8259 except the PIT.  It also leaves the PIT input on the
IOAPIC active. 

This means that every timer tick gets delivered both as a FIXED vector from
the IOAPIC, and as an NMI.  As far as I can tell from reading google, this is
what linux intended.  Note, however, that under QEMU LAPIC, LINT0 is dropped
if the vector is not EXTINT whereas the in-kernel APIC emulates both.
Therefore, cat'ing /proc/interrupts under stock KVM shows only IRQ: 0, and LOC
incrementing, with NMI at 0.  The in-kernel patches show NMIs also
incrementing. 

I could generate a patch to fix the QEMU code, but what I am not sure of is
whether this was intentionally coded to ignore the LINT0 NMI programming? 
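
For readers following along, a minimal sketch of decoding a LINT0/LINT1 LVT
entry by delivery mode.  The bit positions (mask at bit 16, delivery mode in
bits 10:8, with 000 = Fixed, 010 = SMI, 100 = NMI, 111 = ExtINT) are the
architectural ones; the enum and function names are hypothetical, and the
code is taken neither from QEMU nor from the in-kernel patches.

```c
#include <stdint.h>

#define APIC_LVT_MASKED   (1u << 16)           /* LVT mask bit */
#define APIC_LVT_MODE(x)  (((x) >> 8) & 0x7u)  /* delivery mode, bits 10:8 */

enum lint_action {
	LINT_IGNORE,
	LINT_FIXED,
	LINT_SMI,
	LINT_NMI,
	LINT_EXTINT,
};

/* Decide what an assertion of a LINT pin should turn into. */
enum lint_action lint_decode(uint32_t lvt)
{
	if (lvt & APIC_LVT_MASKED)
		return LINT_IGNORE;

	switch (APIC_LVT_MODE(lvt)) {
	case 0x0: return LINT_FIXED;
	case 0x2: return LINT_SMI;
	case 0x4: return LINT_NMI;
	case 0x7: return LINT_EXTINT;
	default:  return LINT_IGNORE;	/* INIT and reserved modes not modelled */
	}
}
```

A model that only acts on the ExtINT case would ignore a LINT0 programmed
for NMI delivery, which matches the behavior described above.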

Regards,
-Greg


[kvm-devel] [PATCH 1/8] KVM: Adds support for in-kernel mmio handlers

2007-05-08 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h  |   60 +++
 drivers/kvm/kvm_main.c |   94 ++--
 2 files changed, 142 insertions(+), 12 deletions(-)
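
The bus declared in this patch is a small fixed-size table searched
linearly.  Since the implementation is cut off in this archive, the
following is only a userspace sketch of that idea under stated assumptions:
the io_* names are hypothetical counterparts of the kvm_io_* declarations,
gpa_t is assumed to be a 64-bit integer, and the register function returns
a status here although the declaration in the patch returns void.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t gpa_t;		/* assumption: 64-bit guest physical address */

#define NR_IOBUS_DEVS 6

struct io_device {
	int (*in_range)(struct io_device *this, gpa_t addr);
	void *private;
};

struct io_bus {
	int dev_count;
	struct io_device *devs[NR_IOBUS_DEVS];
};

void io_bus_init(struct io_bus *bus)
{
	memset(bus, 0, sizeof(*bus));
}

/* Returns 0 on success, -1 if the table is already full. */
int io_bus_register_dev(struct io_bus *bus, struct io_device *dev)
{
	if (bus->dev_count >= NR_IOBUS_DEVS)
		return -1;
	bus->devs[bus->dev_count++] = dev;
	return 0;
}

/* Linear search: the first device that claims the address wins. */
struct io_device *io_bus_find_dev(struct io_bus *bus, gpa_t addr)
{
	for (int i = 0; i < bus->dev_count; i++) {
		struct io_device *dev = bus->devs[i];

		if (dev->in_range(dev, addr))
			return dev;
	}
	return NULL;
}

/* Example device: claims one 4 KiB page starting at the stored base. */
struct page_window {
	gpa_t base;
};

int page_in_range(struct io_device *this, gpa_t addr)
{
	struct page_window *w = this->private;

	return addr >= w->base && addr < w->base + 0x1000;
}
```

A NULL result would correspond to the existing path where the access is
handed to userspace.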

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 9c20d5d..b76631b 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -254,6 +254,65 @@ struct kvm_stat {
u32 light_exits;
 };
 
+struct kvm_io_device {
+   void (*read)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+void *val);
+   void (*write)(struct kvm_io_device *this,
+ gpa_t addr,
+ int len,
+ const void *val);
+   int (*in_range)(struct kvm_io_device *this, gpa_t addr);
+   void (*destructor)(struct kvm_io_device *this);
+
+   void *private;
+};
+
+static inline void kvm_iodevice_read(struct kvm_io_device *dev,
+gpa_t addr,
+int len,
+void *val)
+{
+   dev->read(dev, addr, len, val);
+}
+
+static inline void kvm_iodevice_write(struct kvm_io_device *dev,
+ gpa_t addr,
+ int len,
+ const void *val)
+{
+   dev->write(dev, addr, len, val);
+}
+
+static inline int kvm_iodevice_inrange(struct kvm_io_device *dev, gpa_t addr)
+{
+   return dev->in_range(dev, addr);
+}
+
+static inline void kvm_iodevice_destructor(struct kvm_io_device *dev)
+{
+   dev->destructor(dev);
+}
+
+/*
+ * It would be nice to use something smarter than a linear search, TBD...
+ * Thankfully we don't expect many devices to register (famous last words :),
+ * so until then it will suffice.  At least it's abstracted so we can change
+ * in one place.
+ */
+struct kvm_io_bus {
+   int   dev_count;
+#define NR_IOBUS_DEVS 6
+   struct kvm_io_device *devs[NR_IOBUS_DEVS];
+};
+
+void kvm_io_bus_init(struct kvm_io_bus *bus);
+void kvm_io_bus_destroy(struct kvm_io_bus *bus);
+struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr);
+void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+struct kvm_io_device *dev);
+
 struct kvm_vcpu {
struct kvm *kvm;
union {
@@ -367,6 +426,7 @@ struct kvm {
unsigned long rmap_overflow;
struct list_head vm_list;
struct file *filp;
+   struct kvm_io_bus mmio_bus;
 };
 
 struct descriptor_table {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index a3723dd..2bc5dbb 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -295,6 +295,7 @@ static struct kvm *kvm_create_vm(void)
 
spin_lock_init(&kvm->lock);
INIT_LIST_HEAD(&kvm->active_mmu_pages);
+   kvm_io_bus_init(&kvm->mmio_bus);
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
struct kvm_vcpu *vcpu = &kvm->vcpus[i];
 
@@ -392,6 +393,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
spin_lock(&kvm_lock);
list_del(&kvm->vm_list);
spin_unlock(&kvm_lock);
+   kvm_io_bus_destroy(&kvm->mmio_bus);
kvm_free_vcpus(kvm);
kvm_free_physmem(kvm);
kfree(kvm);
@@ -1015,12 +1017,25 @@ static int emulator_write_std(unsigned long addr,
return X86EMUL_UNHANDLEABLE;
 }
 
+static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu,
+   gpa_t addr)
+{
+   /*
+* Note that it's important to have this wrapper function because
+* in the very near future we will be checking for MMIOs against
+* the LAPIC as well as the general MMIO bus
+*/
+   return kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr);
+}
+
 static int emulator_read_emulated(unsigned long addr,
  void *val,
  unsigned int bytes,
  struct x86_emulate_ctxt *ctxt)
 {
-   struct kvm_vcpu *vcpu = ctxt->vcpu;
+   struct kvm_vcpu  *vcpu = ctxt->vcpu;
+   struct kvm_io_device *mmio_dev;
+   gpa_t gpa;
 
if (vcpu->mmio_read_completed) {
memcpy(val, vcpu->mmio_data, bytes);
@@ -1029,18 +1044,26 @@ static int emulator_read_emulated(unsigned long addr,
} else if (emulator_read_std(addr, val, bytes, ctxt)
   == X86EMUL_CONTINUE)
return X86EMUL_CONTINUE;
-   else {
-   gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
 
-   if (gpa == UNMAPPED_GVA)
-   return X86EMUL_PROPAGATE_FAULT;
-   vcpu->mmio_needed = 1;
-   vcpu->mmio_phys_addr = gpa;
-   vcpu->mmio_size = bytes;
-   vcpu->mmio_is_write = 0;
+   gpa = vcpu->mmu.gva_to_gpa(vc

[kvm-devel] [PATCH 2/8] KVM: Add irqdevice object

2007-05-08 Thread Gregory Haskins

The current code is geared towards using a user-mode (A)PIC.  This patch adds
an "irqdevice" abstraction, and implements a "userint" model to handle the
duties of the original code.  Later, we can develop other irqdevice models
to handle objects like LAPIC, IOAPIC, i8259, etc, as appropriate

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Makefile|2 
 drivers/kvm/irqdevice.h |  176 +
 drivers/kvm/kvm.h   |  107 ++-
 drivers/kvm/kvm_main.c  |   58 +---
 drivers/kvm/svm.c   |  158 -
 drivers/kvm/userint.c   |  223 +++
 drivers/kvm/vmx.c   |  161 +-
 7 files changed, 780 insertions(+), 105 deletions(-)

diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c0a789f..540afbc 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -2,7 +2,7 @@
 # Makefile for Kernel-based Virtual Machine module
 #
 
-kvm-objs := kvm_main.o mmu.o x86_emulate.o
+kvm-objs := kvm_main.o mmu.o x86_emulate.o userint.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/irqdevice.h b/drivers/kvm/irqdevice.h
new file mode 100644
index 000..097d179
--- /dev/null
+++ b/drivers/kvm/irqdevice.h
@@ -0,0 +1,176 @@
+/*
+ * Defines an interface for an abstract interrupt controller.  The model
+ * consists of a unit with an arbitrary number of input lines N (IRQ0-(N-1)),
+ * an arbitrary number of output lines (INTR) (LINT, EXTINT, NMI, etc), and
+ * methods for completing an interrupt-acknowledge cycle (INTA).  A particular
+ * implementation of this model will define various policies, such as
+ * irq-to-vector translation, INTA/auto-EOI policy, etc.
+ *
+ * In addition, the INTR callback mechanism allows the unit to be "wired" to
+ * an interruptible source in a very flexible manner. For instance, an
+ * irqdevice could have its INTR wired to a VCPU (ala LAPIC), or another
+ * interrupt controller (ala cascaded i8259s)
+ *
+ * Copyright (C) 2007 Novell
+ *
+ * Authors:
+ *   Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef __IRQDEVICE_H
+#define __IRQDEVICE_H
+
+struct kvm_irqdevice;
+
+typedef enum {
+   kvm_irqpin_localint,
+   kvm_irqpin_extint,
+   kvm_irqpin_smi,
+   kvm_irqpin_nmi,
+   kvm_irqpin_invalid, /* must always be last */
+} kvm_irqpin_t;
+
+
+struct kvm_irqsink {
+   void (*set_intr)(struct kvm_irqsink *this,
+struct kvm_irqdevice *dev,
+kvm_irqpin_t pin);
+
+   void *private;
+};
+
+#define KVM_IRQACKDATA_VECTOR_VALID   (1 << 0)
+#define KVM_IRQACKDATA_VECTOR_PENDING (1 << 1)
+
+#define KVM_IRQACK_FLAG_PEEK  (1 << 0)
+
+struct kvm_irqack_data {
+   int flags;
+   int vector;
+};
+
+struct kvm_irqdevice {
+   int  (*ack)(struct kvm_irqdevice *this, int flags,
+   struct kvm_irqack_data *data);
+   int  (*set_pin)(struct kvm_irqdevice *this, int pin, int level);
+   void (*destructor)(struct kvm_irqdevice *this);
+
+   void   *private;
+   struct kvm_irqsink  sink;
+};
+
+/**
+ * kvm_irqdevice_init - initialize the kvm_irqdevice for use
+ * @dev: The device
+ *
+ * Description: Initialize the kvm_irqdevice for use.  Should be called before
+ *  calling any derived implementation init functions
+ *
+ * Returns: (void)
+ */
+static inline void kvm_irqdevice_init(struct kvm_irqdevice *dev)
+{
+   memset(dev, 0, sizeof(*dev));
+}
+
+/**
+ * kvm_irqdevice_ack - read and ack the highest priority vector from the device
+ * @dev: The device
+ * @flags: Modifies default behavior
+ *   [ KVM_IRQACK_FLAG_PEEK - Dont ack vector, just check status ]
+ * @data: A pointer to a kvm_irqack_data structure to hold the result
+ *
+ * Description: Read the highest priority pending vector from the device,
+ *  potentially invoking auto-EOI depending on device policy
+ *
+ *  Successful return indicates that the *data* structure is valid
+ *
+ *   data.flags -
+ *  [KVM_IRQACKDATA_VECTOR_VALID - data.vector is valid]
+ *  [KVM_IRQACKDATA_VECTOR_PENDING - more vectors are pending]
+ *
+ * Returns: (int)
+ *   [-1 = failure]
+ *   [ 0 = success]
+ */
+static inline int kvm_irqdevice_ack(struct kvm_irqdevice *dev, int flags,
+   struct kvm_irqack_data *data)
+{
+   return dev->ack(dev, flags, data);
+}
+
+/**
+ * kvm_irqdevice_set_pin - allows the caller to assert/deassert an IRQ
+ * @dev: The device
+ * @pin: The input pin to alter
+ * @level: The value to set (1 = assert, 0 = deassert)
+ *
+ * Description: Allows the caller to assert/deassert an IRQ input pin to t

[kvm-devel] [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU

2007-05-08 Thread Gregory Haskins

The VCPU executes synchronously w.r.t. userspace today, and therefore
interrupt injection is pretty straight forward.  However, we will soon need
to be able to inject interrupts asynchronous to the execution of the VCPU
due to the introduction of SMP, paravirtualized drivers, and asynchronous
hypercalls.  This patch adds support to the interrupt mechanism to force
a VCPU to VMEXIT when a new interrupt is pending.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h  |2 ++
 drivers/kvm/kvm_main.c |   59 +++-
 drivers/kvm/svm.c  |   43 +++
 drivers/kvm/vmx.c  |   43 +++
 4 files changed, 146 insertions(+), 1 deletions(-)
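
The decision made in kvm_vcpu_intr() in this patch can be modelled in a few
lines of plain C.  This is a simplified sketch, not the kernel code: the
struct, the integer task ids and raise_pin() are hypothetical stand-ins, and
the spinlock that protects this section in the patch is only noted in a
comment.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the per-vcpu interrupt state. */
struct vcpu_irq {
	unsigned long pending;	/* one bit per interrupt pin */
	int task_id;		/* 0 = no task bound (stand-in for NULL) */
	int task_cpu;		/* CPU the bound task last ran on */
	bool guest_mode;	/* true while the vcpu executes guest code */
};

/*
 * Record a newly raised pin and decide whether another CPU must be kicked.
 * Returns the CPU to send an IPI to, or -1 if no IPI is needed.
 * (In the patch this section runs with irq.lock held.)
 */
int raise_pin(struct vcpu_irq *irq, int pin, int current_task)
{
	unsigned long bit = 1UL << pin;
	int direct_ipi = -1;

	if (!(irq->pending & bit)) {
		/* Record the change... */
		irq->pending |= bit;

		/* ...then kick the vcpu only if another task runs it in guest mode. */
		if (irq->task_id && irq->task_id != current_task &&
		    irq->guest_mode)
			direct_ipi = irq->task_cpu;
	}

	return direct_ipi;
}
```

Returning the target CPU instead of sending the IPI directly mirrors the
patch, where the IPI is issued only after the lock has been dropped.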

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 059f074..0f6cc32 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -329,6 +329,8 @@ struct kvm_vcpu_irq {
struct kvm_irqdevice dev;
int  pending;
int  deferred;
+   struct task_struct  *task;
+   int  guest_mode;
 };
 
 struct kvm_vcpu {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 199489b..a160638 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1868,6 +1868,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
kvm_arch_ops->decache_regs(vcpu);
}
 
+   vcpu->irq.task = current;
+   smp_wmb();
+
r = kvm_arch_ops->run(vcpu, kvm_run);
 
 out:
@@ -2309,6 +2312,20 @@ out1:
 }
 
 /*
+ * This function is invoked whenever we want to interrupt a vcpu that is
+ * currently executing in guest-mode.  It currently is a no-op because
+ * the simple delivery of the IPI to execute this function accomplishes our
+ * goal: To cause a VMEXIT.  We pass the vcpu (which contains the
+ * vcpu->irq.task, etc) for future use
+ */
+static void kvm_vcpu_guest_intr(void *info)
+{
+#ifdef NOT_YET
+   struct kvm_vcpu *vcpu = (struct kvm_vcpu*)info;
+#endif
+}
+
+/*
  * This function will be invoked whenever the vcpu->irq.dev raises its INTR
  * line
  */
@@ -2318,10 +2335,50 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
 {
struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
unsigned long flags;
+   int direct_ipi = -1;
 
spin_lock_irqsave(&vcpu->irq.lock, flags);
-   __set_bit(pin, &vcpu->irq.pending);
+
+   if (!test_bit(pin, &vcpu->irq.pending)) {
+   /*
+* Record the change..
+*/
+   __set_bit(pin, &vcpu->irq.pending);
+
+   /*
+* then wake up the vcpu (if necessary)
+*/
+   if (vcpu->irq.task && (vcpu->irq.task != current)) {
+   if (vcpu->irq.guest_mode) {
+   /*
+* If we are in guest mode, we can optimize
+* the IPI by executing a function directly
+* on the owning processor.
+*/
+   direct_ipi = task_cpu(vcpu->irq.task);
+   BUG_ON(direct_ipi == smp_processor_id());
+   }
+   }
+   }
+
spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+
+   /*
+* we can safely send the IPI outside of the lock-scope because the
+* irq.pending has already been updated.  This code assumes that
+* userspace will not sleep on anything other than HLT instructions.
+* HLT is covered in a race-free way because irq.pending was updated
+* in the critical section, and handle_halt() checks whether any
+* interrupts are pending before returning to userspace.
+*
+* If it turns out that userspace can sleep on conditions other than
+* HLT, this code will need to be enhanced to allow the irq.pending
+* flags to be exported to userspace
+*/
+   if (direct_ipi != -1)
+   smp_call_function_single(direct_ipi,
+kvm_vcpu_guest_intr,
+vcpu, 0, 0);
 }
 
 static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 4c03881..91546ae 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1542,11 +1542,40 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
u16 gs_selector;
u16 ldt_selector;
int r;
+   unsigned long irq_flags;
 
 again:
+   /*
+* We disable interrupts until the next VMEXIT to eliminate a race
+* condition for delivery of virtual interrupts.  Note that this is
+* probably not as bad as it sounds, as interrupts will still invoke
+* a VMEXIT once transitioned to GUEST mode (and thus exit th

[kvm-devel] [PATCH 4/8] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-08 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h  |2 +
 drivers/kvm/kvm_main.c |   82 
 2 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 0f6cc32..b5bfc91 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -331,6 +331,8 @@ struct kvm_vcpu_irq {
int  deferred;
struct task_struct  *task;
int  guest_mode;
+   wait_queue_head_t    wq;
+   int  usignal;
 };
 
 struct kvm_vcpu {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index a160638..6b40c18 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "x86_emulate.h"
 #include "segment_descriptor.h"
@@ -304,6 +305,7 @@ static struct kvm *kvm_create_vm(void)
memset(&vcpu->irq, 0, sizeof(vcpu->irq));
spin_lock_init(&vcpu->irq.lock);
vcpu->irq.deferred = -1;
+   init_waitqueue_head(&vcpu->irq.wq);
 
vcpu->cpu = -1;
vcpu->kvm = kvm;
@@ -2265,11 +2267,78 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
return 0;
 }
 
+static unsigned int kvm_vcpu_poll(struct file *filp, poll_table *wait)
+{
+   struct kvm_vcpu *vcpu = filp->private_data;
+   unsigned int events = 0;
+   unsigned long flags;
+
+   poll_wait(filp, &vcpu->irq.wq, wait);
+
+   spin_lock_irqsave(&vcpu->irq.lock, flags);
+   if (vcpu->irq.usignal)
+   events |= POLLIN;
+   spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+
+   return events;
+}
+
+static ssize_t kvm_vcpu_read(struct file *filp, char __user *buf, size_t count,
+loff_t *ppos)
+{
+   struct kvm_vcpu *vcpu = filp->private_data;
+   ssize_t res = -EAGAIN;
+   DECLARE_WAITQUEUE(wait, current);
+   unsigned long flags;
+   int val;
+
+   if (count < sizeof(vcpu->irq.usignal))
+   return -EINVAL;
+
+   spin_lock_irqsave(&vcpu->irq.lock, flags);
+
+   val = vcpu->irq.usignal;
+
+   if (val > 0)
+   res = sizeof(val);
+   else if (!(filp->f_flags & O_NONBLOCK)) {
+   __add_wait_queue(&vcpu->irq.wq, &wait);
+   for (res = 0;; val = vcpu->irq.usignal) {
+   set_current_state(TASK_INTERRUPTIBLE);
+   if (val > 0) {
+   res = sizeof(val);
+   break;
+   }
+   if (signal_pending(current)) {
+   res = -ERESTARTSYS;
+   break;
+   }
+   spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+   schedule();
+   spin_lock_irqsave(&vcpu->irq.lock, flags);
+   }
+   __remove_wait_queue(&vcpu->irq.wq, &wait);
+   __set_current_state(TASK_RUNNING);
+   }
+
+   if (res > 0)
+   vcpu->irq.usignal = 0;
+
+   spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+
+   if (res > 0 && put_user(val, (int __user *) buf))
+   return -EFAULT;
+
+   return res;
+}
+
 static struct file_operations kvm_vcpu_fops = {
.release= kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
.compat_ioctl   = kvm_vcpu_ioctl,
.mmap   = kvm_vcpu_mmap,
+   .poll   = kvm_vcpu_poll,
+   .read   = kvm_vcpu_read,
 };
 
 /*
@@ -2336,6 +2405,7 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
unsigned long flags;
int direct_ipi = -1;
+   int indirect_sig = 0;
 
spin_lock_irqsave(&vcpu->irq.lock, flags);
 
@@ -2357,6 +2427,15 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
 */
direct_ipi = task_cpu(vcpu->irq.task);
BUG_ON(direct_ipi == smp_processor_id());
+   } else {
+   /*
+* otherwise, we must assume that we could be
+* blocked anywhere, including userspace. Send
+* a signal to give everyone a chance to get
+* notification
+*/
+   vcpu->irq.usignal++;
+   indirect_sig = 1;
}
}
}
@@ -2379,6 +2458,9 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
smp_call_function_single(direct_ipi,
 kvm_vcpu_guest_intr,
 vcpu, 0, 0);

[kvm-devel] [PATCH 5/8] KVM: Add support for in-kernel LAPIC model

2007-05-08 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Makefile   |2 
 drivers/kvm/kernint.c  |  149 +
 drivers/kvm/kvm.h  |   35 +
 drivers/kvm/kvm_main.c |  179 +-
 drivers/kvm/lapic.c| 1412 
 drivers/kvm/svm.c  |   13 
 drivers/kvm/userint.c  |8 
 drivers/kvm/vmx.c  |   16 -
 include/linux/kvm.h|   16 +
 9 files changed, 1789 insertions(+), 41 deletions(-)

diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 540afbc..1aad737 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -2,7 +2,7 @@
 # Makefile for Kernel-based Virtual Machine module
 #
 
-kvm-objs := kvm_main.o mmu.o x86_emulate.o userint.o
+kvm-objs := kvm_main.o mmu.o x86_emulate.o userint.o lapic.o kernint.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/kernint.c b/drivers/kvm/kernint.c
new file mode 100644
index 000..b5cbcae
--- /dev/null
+++ b/drivers/kvm/kernint.c
@@ -0,0 +1,149 @@
+/*
+ * Kernel Interrupt IRQ device
+ *
+ * Provides a model for connecting in-kernel interrupt resources to a VCPU.
+ *
+ * A typical modern x86 processor has the concept of an internal Local-APIC
+ * and some external signal pins.  The way in which interrupts are injected is
+ * dependent on whether software enables the LAPIC or not.  When enabled,
+ * interrupts are acknowledged through the LAPIC.  Otherwise they are through
+ * an externally connected PIC (typically an i8259 on the BSP)
+ *
+ * Copyright (C) 2007 Novell
+ *
+ * Authors:
+ *   Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "kvm.h"
+
+struct kvm_kernint {
+   struct kvm_vcpu  *vcpu;
+   struct kvm_irqdevice *self_irq;
+   struct kvm_irqdevice *ext_irq;
+   struct kvm_irqdevice  apic_irq;
+
+};
+
+static struct kvm_irqdevice *get_irq_dev(struct kvm_kernint *s)
+{
+   struct kvm_irqdevice *dev;
+
+   if (kvm_lapic_enabled(s->vcpu))
+   dev = &s->apic_irq;
+   else
+   dev = s->ext_irq;
+
+   if (!dev)
+   kvm_crash_guest(s->vcpu->kvm);
+
+   return dev;
+}
+
+static int kernint_irqdev_ack(struct kvm_irqdevice *this, int flags,
+ struct kvm_irqack_data *data)
+{
+   struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+   return kvm_irqdevice_ack(get_irq_dev(s), flags, data);
+}
+
+static int kernint_irqdev_set_pin(struct kvm_irqdevice *this,
+ int irq, int level)
+{
+   /* no-op */
+   return 0;
+}
+
+static void kernint_irqdev_destructor(struct kvm_irqdevice *this)
+{
+   struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+   kvm_irqdevice_destructor(&s->apic_irq);
+   kvm_lapic_destroy(s->vcpu);
+   kfree(s);
+}
+
+static void kvm_apic_intr(struct kvm_irqsink *this,
+ struct kvm_irqdevice *dev,
+ kvm_irqpin_t pin)
+{
+   struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+   /*
+* If the LAPIC sent us an interrupt it *must* be enabled,
+* just forward it on to the CPU
+*/
+   kvm_irqdevice_set_intr(s->self_irq, pin);
+}
+
+static void kvm_ext_intr(struct kvm_irqsink *this,
+struct kvm_irqdevice *dev,
+kvm_irqpin_t pin)
+{
+   struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+   /*
+* If the EXTINT device sent us an interrupt, forward it to the LINT0
+* pin of the LAPIC
+*/
+   if (pin != kvm_irqpin_localint)
+   return;
+
+   /*
+* "irq 0" = LINT0, 1 = LINT1
+*/
+   kvm_irqdevice_set_pin(&s->apic_irq, 0, 1);
+}
+
+int kvm_kernint_init(struct kvm_vcpu *vcpu)
+{
+   struct kvm_irqdevice *irqdev = &vcpu->irq.dev;
+   struct kvm_kernint *s;
+   struct kvm_irqsink apicsink;
+
+   s = kzalloc(sizeof(*s), GFP_KERNEL);
+   if (!s)
+   return -ENOMEM;
+
+   s->vcpu = vcpu;
+
+   /*
+* Configure the irqdevice interface
+*/
+   irqdev->ack = kernint_irqdev_ack;
+   irqdev->set_pin = kernint_irqdev_set_pin;
+   irqdev->destructor  = kernint_irqdev_destructor;
+
+   irqdev->private = s;
+   s->self_irq = irqdev;
+
+   /*
+* Configure the EXTINT device if this is the BSP processor
+*/
+   if (!vcpu_slot(vcpu)) {
+   struct kvm_irqsink extsink = {
+   .set_intr   = kvm_ext_intr,
+   .private= s
+   };
+   s->ext_irq = &vcpu->kvm->isa_irq;
+   kvm_irqdevice_register_sink(s->ext_irq, &extsink);
+   }
+
+   /*
+* Configure the LAP

[kvm-devel] [PATCH 7/8] KVM: Adds basic plumbing to support TPR shadow features

2007-05-08 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/irqdevice.h |3 +++
 drivers/kvm/kvm.h   |1 +
 drivers/kvm/lapic.c |   15 +++
 3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/irqdevice.h b/drivers/kvm/irqdevice.h
index 097d179..173313d 100644
--- a/drivers/kvm/irqdevice.h
+++ b/drivers/kvm/irqdevice.h
@@ -45,12 +45,14 @@ struct kvm_irqsink {
 
 #define KVM_IRQACKDATA_VECTOR_VALID   (1 << 0)
 #define KVM_IRQACKDATA_VECTOR_PENDING (1 << 1)
+#define KVM_IRQACKDATA_NEXT_VALID (1 << 2)
 
 #define KVM_IRQACK_FLAG_PEEK  (1 << 0)
 
 struct kvm_irqack_data {
int flags;
int vector;
+   int next;
 };
 
 struct kvm_irqdevice {
@@ -92,6 +94,7 @@ static inline void kvm_irqdevice_init(struct kvm_irqdevice *dev)
  *   data.flags -
  *  [KVM_IRQACKDATA_VECTOR_VALID - data.vector is valid]
  *  [KVM_IRQACKDATA_VECTOR_PENDING - more vectors are pending]
+ *  [KVM_IRQACKDATA_NEXT_VALID - next-vector is valid]
  *
  * Returns: (int)
  *   [-1 = failure]
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 60710d8..4ae616f 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -178,6 +178,7 @@ void kvm_lapic_save(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 void kvm_lapic_restore(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu);
 int  kvm_lapic_enabled(struct kvm_vcpu *vcpu);
+void *kvm_lapic_get_regs(struct kvm_vcpu *vcpu);
 
 /*
  * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
diff --git a/drivers/kvm/lapic.c b/drivers/kvm/lapic.c
index f7b04f9..da51710 100644
--- a/drivers/kvm/lapic.c
+++ b/drivers/kvm/lapic.c
@@ -1140,6 +1140,13 @@ int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
return ret;
 }
 
+void *kvm_lapic_get_regs(struct kvm_vcpu *vcpu)
+{
+   struct kvm_kern_apic *apic = (struct kvm_kern_apic*)vcpu->apic.dev;
+   return apic->regs;
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_get_regs);
+
 /*
  *--
  * timer interface
@@ -1278,6 +1285,14 @@ static int apic_irqdev_ack(struct kvm_irqdevice *this, int flags,
 */
if (irq > apic_get_reg(apic, APIC_TASKPRI))
data->flags |= KVM_IRQACKDATA_VECTOR_PENDING;
+
+   /*
+* We report the next pending vector here so that the system
+* can assess TPR thresholds for TPR-shadowing purposes
+* (if applicable)
+*/
+   data->next   = irq;
+   data->flags |= KVM_IRQACKDATA_NEXT_VALID;
}
 
  out:
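
The last hunk above makes the LAPIC acknowledge path report the next pending vector alongside the existing pending flag. As a rough illustration only, the reporting step can be modeled in plain C outside the kernel. The struct and macro names below mirror the patch but are hypothetical stand-ins, and the convention that a negative `next_irq` means "nothing pending" is an assumption, since the enclosing condition is not visible in the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirrors of the flags in irqdevice.h. */
#define IRQACKDATA_VECTOR_VALID   (1 << 0)
#define IRQACKDATA_VECTOR_PENDING (1 << 1)
#define IRQACKDATA_NEXT_VALID     (1 << 2)

/* Hypothetical mirror of struct kvm_irqack_data after this patch. */
struct irqack_data {
	int flags;
	int vector;
	int next;
};

/*
 * Record the highest remaining pending vector in the ack data.
 * Assumption: next_irq < 0 means no further vector is pending.
 * As in the patch, the vector is compared against the raw TPR value.
 */
static void report_next_vector(struct irqack_data *data, int next_irq,
			       uint32_t tpr)
{
	assert(data);

	if (next_irq < 0)
		return;

	/* Deliverable right now: the vector outranks the task priority. */
	if ((uint32_t)next_irq > tpr)
		data->flags |= IRQACKDATA_VECTOR_PENDING;

	/* Always expose the vector so the caller can derive a threshold. */
	data->next = next_irq;
	data->flags |= IRQACKDATA_NEXT_VALID;
}
```

Note that NEXT_VALID is set even when the vector is currently masked by the TPR; that masked case is the one a TPR-shadow consumer cares about.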



[kvm-devel] [PATCH 6/8] KVM: Adds support for real NMI injection on VMX processors

2007-05-08 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/vmx.c |   63 +
 drivers/kvm/vmx.h |3 +++
 2 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index bee4831..1c99bc9 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1148,7 +1148,14 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
   PIN_BASED_VM_EXEC_CONTROL,
   PIN_BASED_EXT_INTR_MASK   /* 20.6.1 */
   | PIN_BASED_NMI_EXITING   /* 20.6.1 */
+  | PIN_BASED_VIRTUAL_NMI   /* 20.6.1 */
);
+
+   if (!(vmcs_read32(PIN_BASED_VM_EXEC_CONTROL) & PIN_BASED_VIRTUAL_NMI))
+   printk(KERN_WARNING "KVM: Warning - Host processor does " \
+  "not support virtual-NMI injection.  Using IRQ " \
+  "method\n");
+ 
vmcs_write32_fixedbits(MSR_IA32_VMX_PROCBASED_CTLS,
   CPU_BASED_VM_EXEC_CONTROL,
   CPU_BASED_HLT_EXITING /* 20.6.2 */
@@ -1297,6 +1304,43 @@ static void inject_rmode_irq(struct kvm_vcpu *vcpu, int irq)
vmcs_writel(GUEST_RSP, (vmcs_readl(GUEST_RSP) & ~0x) | (sp - 6));
 }
 
+static int do_nmi_requests(struct kvm_vcpu *vcpu)
+{
+   int nmi_window = 0;
+
+   BUG_ON(!(test_bit(kvm_irqpin_nmi, &vcpu->irq.pending)));
+
+   nmi_window =
+   (((vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 0xb) == 0)
+&& (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD)
+& INTR_INFO_VALID_MASK));
+
+   if (nmi_window) {
+   if (vcpu->rmode.active)
+   inject_rmode_irq(vcpu, 2);
+   else
+   vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+2 |
+INTR_TYPE_NMI |
+INTR_INFO_VALID_MASK);
+
+   __clear_bit(kvm_irqpin_nmi, &vcpu->irq.pending);
+   } else {
+   /*
+* NMIs blocked.  Wait for unblock.
+*/
+   u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+   cbvec |= CPU_BASED_NMI_EXITING;
+   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec); 
+   }
+
+   /*
+* nmi_window correctly reflects whether we handled this interrupt
+* or not, so just return it as the "handled" indicator
+*/
+   return nmi_window;
+}
+
 static int do_intr_requests(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run,
kvm_irqpin_t pin)
@@ -1329,9 +1373,11 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
break;
case kvm_irqpin_nmi:
/*
-* FIXME: Someday we will handle this using the
-* specific VMX NMI features.  For now, just inject
-* the NMI as a standard interrupt on vector 2
+* We should only get here if the processor does
+* not support virtual NMIs.  Inject the NMI as a
+* standard interrupt on vector 2.  The implication is
+* that NMIs are going to be subject to RFLAGS.IF
+* masking, unfortunately.
 */
ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
ack.vector = 2;
@@ -1374,7 +1420,8 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
 static void clear_pending_controls(struct kvm_vcpu *vcpu)
 {
u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
-   cbvec &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+   cbvec &= ~(CPU_BASED_VIRTUAL_INTR_PENDING
+  | CPU_BASED_NMI_EXITING);
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
 }
 
@@ -1391,7 +1438,6 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
switch (pin) {
case kvm_irqpin_localint:
case kvm_irqpin_extint:
-   case kvm_irqpin_nmi:
do_intr_requests(vcpu, kvm_run, pin);
break;
case kvm_irqpin_smi:
@@ -1399,6 +1445,13 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
printk(KERN_WARNING "KVM: dropping unhandled SMI\n");
__clear_bit(pin, &vcpu->irq.pending);
break;
+   case kvm_irqpin_nmi:
+   if (vmcs_read32(PIN_BASED_VM_EXEC_CONTROL)
+   & PIN_BASED_VIRTUAL_NMI)
+   do_nmi_requests(vcpu);
+   else
+   do_intr_requests(vcpu, kvm_run, pin);   
+   break;
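
The setup hunk in this patch requests PIN_BASED_VIRTUAL_NMI and then reads the control back to decide between real NMI injection and the vector-2 fallback. The sketch below models that probe in standalone C. It assumes that `vmcs_write32_fixedbits` masks the requested value against the capability MSR (its body is not part of this patch); `struct vmx_ctl_caps`, `apply_fixed_bits` and the `EX_` constants are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions for the pin-based controls. */
#define EX_PIN_EXT_INTR_MASK (1u << 0)
#define EX_PIN_NMI_EXITING   (1u << 3)
#define EX_PIN_VIRTUAL_NMI   (1u << 5)

/* Hypothetical view of one VMX capability MSR, split into its halves. */
struct vmx_ctl_caps {
	uint32_t must_be_one;	/* bits the processor forces to 1 */
	uint32_t may_be_one;	/* bits the processor allows to be 1 */
};

/* Clamp a requested control value to what the processor supports. */
static uint32_t apply_fixed_bits(struct vmx_ctl_caps caps, uint32_t requested)
{
	/* A required bit that is not also allowed would be inconsistent. */
	assert((caps.must_be_one & ~caps.may_be_one) == 0);

	return (requested & caps.may_be_one) | caps.must_be_one;
}

/* Nonzero if the feature bit survived the clamping. */
static int feature_enabled(uint32_t effective, uint32_t feature_bit)
{
	return (effective & feature_bit) != 0;
}
```

Reading the control back, rather than inspecting the MSR separately, keeps the decision tied to the value that was actually programmed.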

[kvm-devel] [PATCH 8/8] KVM: Adds support for TPR shadowing under VMX processors

2007-05-08 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/vmx.c |   32 ++--
 1 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 1c99bc9..7745bb9 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1159,13 +1159,26 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
vmcs_write32_fixedbits(MSR_IA32_VMX_PROCBASED_CTLS,
   CPU_BASED_VM_EXEC_CONTROL,
   CPU_BASED_HLT_EXITING /* 20.6.2 */
-  | CPU_BASED_CR8_LOAD_EXITING/* 20.6.2 */
-  | CPU_BASED_CR8_STORE_EXITING   /* 20.6.2 */
+  | CPU_BASED_TPR_SHADOW/* 20.6.2 */
   | CPU_BASED_ACTIVATE_IO_BITMAP  /* 20.6.2 */
   | CPU_BASED_MOV_DR_EXITING
   | CPU_BASED_USE_TSC_OFFSETING   /* 21.3 */
);
 
+   if (!(vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) & CPU_BASED_TPR_SHADOW)) {
+   u32 cbvec;
+
+   cbvec  = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+   cbvec |= CPU_BASED_CR8_LOAD_EXITING;/* 20.6.2 */
+   cbvec |= CPU_BASED_CR8_STORE_EXITING;   /* 20.6.2 */
+   vmcs_write32_fixedbits(MSR_IA32_VMX_PROCBASED_CTLS,
+  CPU_BASED_VM_EXEC_CONTROL,
+  cbvec);
+
+   printk(KERN_WARNING "KVM: Warning - Host processor does " \
+  "not support TPR-shadow\n");
+   }
+
vmcs_write32(EXCEPTION_BITMAP, 1 << PF_VECTOR);
vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0);
@@ -1239,7 +1252,7 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
 
 #ifdef CONFIG_X86_64
-   vmcs_writel(VIRTUAL_APIC_PAGE_ADDR, 0);
+   vmcs_writel(VIRTUAL_APIC_PAGE_ADDR, kvm_lapic_get_regs(vcpu));
vmcs_writel(TPR_THRESHOLD, 0);
 #endif
 
@@ -1346,6 +1359,9 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
kvm_irqpin_t pin)
 {
int handled = 0;
+   struct kvm_irqack_data ack;
+
+   memset(&ack, 0, sizeof(ack));
 
vcpu->interrupt_window_open =
((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
@@ -1357,11 +1373,8 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
 * If interrupts enabled, and not blocked by sti or mov ss.
 * Good.
 */
-   struct kvm_irqack_data ack;
int r = 0;
 
-   memset(&ack, 0, sizeof(ack));
-
switch (pin) {
case kvm_irqpin_localint:
r = kvm_vcpu_irq_pop(vcpu, &ack);
@@ -1414,6 +1427,13 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
}
 
+#ifdef CONFIG_X86_64
+   if (ack.flags & KVM_IRQACKDATA_NEXT_VALID)
+   vmcs_write32(TPR_THRESHOLD, ack.next >> 4);
+   else
+   vmcs_write32(TPR_THRESHOLD, 0);
+#endif
+
return handled;
 }
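
The final hunk programs TPR_THRESHOLD with `ack.next >> 4`, which is the priority class of the next pending vector. A minimal standalone sketch of that computation follows. The struct and helper names are hypothetical mirrors of the patch, not kernel API, and the sketch leaves out the actual VMCS write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the flag added in patch 7/8. */
#define IRQACKDATA_NEXT_VALID (1 << 2)

/* Hypothetical mirror of struct kvm_irqack_data. */
struct irqack_data {
	int flags;
	int vector;
	int next;
};

/* Priority class of an x86 interrupt vector: its upper four bits. */
static inline uint32_t vector_priority_class(int vector)
{
	return ((uint32_t)vector >> 4) & 0xf;
}

/*
 * Threshold to program when TPR shadowing is active.  With a pending
 * vector of class N, the guest lowering its TPR class below N is the
 * event worth a VM exit; with nothing pending, 0 never triggers one.
 */
static inline uint32_t tpr_threshold_for(const struct irqack_data *ack)
{
	assert(ack);

	if (ack->flags & IRQACKDATA_NEXT_VALID)
		return vector_priority_class(ack->next);
	return 0;
}
```

For example, a pending vector 0x51 gives a threshold of 5, so the guest can raise or lower its TPR freely within classes 5 to 15 without exiting.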
 



