Re: [PATCH 0/7] Fix emulate_invalid_guest_state=0 part 2

2012-12-21 Thread Gleb Natapov
On Fri, Dec 21, 2012 at 09:55:29PM -0200, Marcelo Tosatti wrote:
> On Wed, Dec 12, 2012 at 07:10:48PM +0200, Gleb Natapov wrote:
> > emulate_invalid_guest_state=0 still does not work, since the last patch of
> > my previous series was not applied. It causes kvm to emulate code that it
> > previously managed to run in vm86 mode, which in turn causes emulation
> > errors for instructions kvm does not know how to emulate. Patch 2 of this
> > series should fix that. Patch 3 is exactly the same as 4/4 from the
> > previous series. The other patches are clean-ups.
> 
> Can't see how guest_state_valid() can affect
> emulate_invalid_guest_state=0.
> 
Patch 3/7 fixes it. Patch 2/7 is needed for 3/7 to not break
emulate_invalid_guest_state=1.

> For the cleanups, Reviewed-by: Marcelo Tosatti .

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] KVM: VMX: relax check for CS register in rmode_segment_valid()

2012-12-21 Thread Gleb Natapov
On Fri, Dec 21, 2012 at 09:17:16PM -0200, Marcelo Tosatti wrote:
> On Wed, Dec 12, 2012 at 07:10:50PM +0200, Gleb Natapov wrote:
> > rmode_segment_valid() checks whether a segment descriptor can be used to
> > enter vm86 mode. The VMX spec mandates that in vm86 mode the CS register
> > is of type data, not code. Let's allow guest entry in vm86 mode if the
> > only problem with the CS register is an incorrect type; otherwise the
> > entire real mode will be emulated.
> > 
> > Signed-off-by: Gleb Natapov 
> > ---
> >  arch/x86/kvm/vmx.c |2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 4df3991..acbe86f 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -3383,6 +3383,8 @@ static bool rmode_segment_valid(struct kvm_vcpu *vcpu, int seg)
> > var.dpl = 0x3;
> > var.g = 0;
> > var.db = 0;
> > +   if (seg == VCPU_SREG_CS)
> > +   var.type = 0x3;
> > ar = vmx_segment_access_rights(&var);
> >  
> > if (var.base != (var.selector << 4))
> > -- 
> > 1.7.10.4
> 
> But with emulate_invalid_guest_state=1, segments are not fixed on 
> transition to real mode. So this patch can result in 
> invalid guest state entry failure.
> 
At the point of this patch emulate_invalid_guest_state=1 is already broken,
and the state is fixed unconditionally during the call to vmx_set_segment();
see the big switch at the end of that function. Note that
rmode_segment_valid() does not check the register properly anyway: it
ignores the db value and allows guest entry with limit > 0xffff. My patches
fix this. After the patches are applied, rmode_segment_valid() still relaxes
the dpl and CS.type checks:

var.dpl = 0x3;
if (seg == VCPU_SREG_CS)
var.type = 0x3;

but fix_rmode_seg() unconditionally does exactly the same:

var.dpl = 0x3;
if (seg == VCPU_SREG_CS)
var.type = 0x3;

> Does this defeat the purpose of emulate_invalid_guest_state=1?
No, since changing dpl to 3 and the segment register type to 3 does not
affect instruction execution in vm86 mode: entering the vcpu in vm86 mode
or fully emulating all instructions will yield exactly the same result no
matter which instructions are executed. If we do not relax this check in
rmode_segment_valid(), all of real mode will be fully emulated, because no
guest ever configures the CS register as a data type and sets the DPL of
all segments to 3 before entering real mode.

--
Gleb.


Re: [PATCH 0/7] Fix emulate_invalid_guest_state=0 part 2

2012-12-21 Thread Marcelo Tosatti
On Wed, Dec 12, 2012 at 07:10:48PM +0200, Gleb Natapov wrote:
> emulate_invalid_guest_state=0 still does not work, since the last patch of
> my previous series was not applied. It causes kvm to emulate code that it
> previously managed to run in vm86 mode, which in turn causes emulation
> errors for instructions kvm does not know how to emulate. Patch 2 of this
> series should fix that. Patch 3 is exactly the same as 4/4 from the
> previous series. The other patches are clean-ups.

Can't see how guest_state_valid() can affect
emulate_invalid_guest_state=0.

For the cleanups, Reviewed-by: Marcelo Tosatti .



Re: [PATCH 2/7] KVM: VMX: relax check for CS register in rmode_segment_valid()

2012-12-21 Thread Marcelo Tosatti
On Wed, Dec 12, 2012 at 07:10:50PM +0200, Gleb Natapov wrote:
> rmode_segment_valid() checks whether a segment descriptor can be used to
> enter vm86 mode. The VMX spec mandates that in vm86 mode the CS register
> is of type data, not code. Let's allow guest entry in vm86 mode if the
> only problem with the CS register is an incorrect type; otherwise the
> entire real mode will be emulated.
> 
> Signed-off-by: Gleb Natapov 
> ---
>  arch/x86/kvm/vmx.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 4df3991..acbe86f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3383,6 +3383,8 @@ static bool rmode_segment_valid(struct kvm_vcpu *vcpu, int seg)
>   var.dpl = 0x3;
>   var.g = 0;
>   var.db = 0;
> + if (seg == VCPU_SREG_CS)
> + var.type = 0x3;
>   ar = vmx_segment_access_rights(&var);
>  
>   if (var.base != (var.selector << 4))
> -- 
> 1.7.10.4

But with emulate_invalid_guest_state=1, segments are not fixed on 
transition to real mode. So this patch can result in 
invalid guest state entry failure.

Does this defeat the purpose of emulate_invalid_guest_state=1?


Re: [PATCH] kvm tools: remove unneeded checks in qcow code

2012-12-21 Thread Jesper Juhl
On Thu, 20 Dec 2012, Sasha Levin wrote:

> We already know q!=NULL at that point, no need to retest.
> 
Right, and even if it were ==NULL, 'free(NULL)' is guaranteed to be a
no-op, so it would still be fine.

Reviewed-by: Jesper Juhl 


> Signed-off-by: Sasha Levin 
> ---
>  tools/kvm/disk/qcow.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
> index ee2992e..64a2550 100644
> --- a/tools/kvm/disk/qcow.c
> +++ b/tools/kvm/disk/qcow.c
> @@ -1361,8 +1361,7 @@ free_header:
>   if (q->header)
>   free(q->header);
>  free_qcow:
> - if (q)
> - free(q);
> + free(q);
>  
>   return NULL;
>  }
> @@ -1492,8 +1491,7 @@ free_header:
>   if (q->header)
>   free(q->header);
>  free_qcow:
> - if (q)
> - free(q);
> + free(q);
>  
>   return NULL;
>  }
> 

-- 
Jesper Juhl            http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.



[PATCH] kvm: Fix memory slot generation updates

2012-12-21 Thread Alex Williamson
The previous patch "kvm: Minor memory slot optimization" overlooked the
generation field of the memory slots.  Re-using the original memory
slots left us with two slightly different memory slots with the
same generation.  To fix this, make update_memslots() take a new
parameter to specify the last generation.  This also makes generation
management more explicit to avoid such problems in the future.

Reported-by: Takuya Yoshikawa 
Signed-off-by: Alex Williamson 
---
 include/linux/kvm_host.h |3 ++-
 virt/kvm/kvm_main.c  |9 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 32fdc45..cbe0d68 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -428,7 +428,8 @@ void kvm_exit(void);
 
 void kvm_get_kvm(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);
-void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new);
+void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
+u64 last_generation);
 
 static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd31096..14cbae8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,7 +672,8 @@ static void sort_memslots(struct kvm_memslots *slots)
slots->id_to_index[slots->memslots[i].id] = i;
 }
 
-void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
+void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
+u64 last_generation)
 {
if (new) {
int id = new->id;
@@ -684,7 +685,7 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
sort_memslots(slots);
}
 
-   slots->generation++;
+   slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -819,7 +820,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
slot = id_to_memslot(slots, mem->slot);
slot->flags |= KVM_MEMSLOT_INVALID;
 
-   update_memslots(slots, NULL);
+   update_memslots(slots, NULL, kvm->memslots->generation);
 
old_memslots = kvm->memslots;
rcu_assign_pointer(kvm->memslots, slots);
@@ -867,7 +868,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
memset(&new.arch, 0, sizeof(new.arch));
}
 
-   update_memslots(slots, &new);
+   update_memslots(slots, &new, kvm->memslots->generation);
old_memslots = kvm->memslots;
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);



Re: [PATCH 0/7] KVM: Alleviate mmu_lock hold time when we start dirty logging

2012-12-21 Thread Alex Williamson
On Fri, 2012-12-21 at 10:54 +0200, Gleb Natapov wrote:
> On Fri, Dec 21, 2012 at 05:02:50PM +0900, Takuya Yoshikawa wrote:
> > On Thu, 20 Dec 2012 07:55:43 -0700
> > Alex Williamson  wrote:
> > 
> > > > Yes, the fix should work, but I do not want to update the
> > > > generation from outside of update_memslots().
> > > 
> > > Ok, then:
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 87089dd..c7b5061 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -413,7 +413,8 @@ void kvm_exit(void);
> > >  
> > >  void kvm_get_kvm(struct kvm *kvm);
> > >  void kvm_put_kvm(struct kvm *kvm);
> > > -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new);
> > > +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> > > + u64 last_generation);
> > >  
> > >  static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
> > >  {
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index c4c8ec1..06961ea 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -667,7 +667,8 @@ static void sort_memslots(struct kvm_memslots *slots)
> > >   slots->id_to_index[slots->memslots[i].id] = i;
> > >  }
> > >  
> > > -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
> > > +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> > > + u64 last_generation)
> > >  {
> > >   if (new) {
> > >   int id = new->id;
> > > @@ -679,7 +680,7 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
> > >   sort_memslots(slots);
> > >   }
> > >  
> > > - slots->generation++;
> > > + slots->generation = last_generation + 1;
> > >  }
> > >  
> > >  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> > > @@ -814,7 +815,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > >   slot = id_to_memslot(slots, mem->slot);
> > >   slot->flags |= KVM_MEMSLOT_INVALID;
> > >  
> > > - update_memslots(slots, NULL);
> > > + update_memslots(slots, NULL, kvm->memslots->generation);
> > >  
> > >   old_memslots = kvm->memslots;
> > >   rcu_assign_pointer(kvm->memslots, slots);
> > > @@ -862,7 +863,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > >   memset(&new.arch, 0, sizeof(new.arch));
> > >   }
> > >  
> > > - update_memslots(slots, &new);
> > > + update_memslots(slots, &new, kvm->memslots->generation);
> > >   old_memslots = kvm->memslots;
> > >   rcu_assign_pointer(kvm->memslots, slots);
> > >   synchronize_srcu_expedited(&kvm->srcu);
> > > 
> > > > > The original patch can be reverted, there are no following 
> > > > > dependencies,
> > > > > but the idea was that we're making the memslot array larger, so there
> > > > > could be more pressure in allocating it, so let's not trivially do 
> > > > > extra
> > > > > frees and allocs.  Thanks,
> > > > 
> > > > I agree that the current set_memory_region() is not good for frequent 
> > > > updates.
> > > > But the alloc/free is not a major factor at the moment: flushing 
> > > > shadows should
> > > > be more problematic.
> > > 
> > > I don't understand why we'd throw away even a minor optimization that's
> > > so easy to fix.  We're not only removing a free/alloc, but we're being
> > > more friendly to the cache by avoiding the extra memcpy.  The above also
> > > makes the generation management a bit more explicit.  Thanks,
> > 
> > Looks good to me!
> > 
> Me too.
> 
> > I just wanted to keep the code readable, so no reason to object to
> > a clean solution.  Any chance to see the fix on kvm.git soon?
> > 
> As soon as Alex sends a proper patch with a Signed-off-by.

I'll test and do that first thing today, thanks,

Alex



Re: [PATCH v7 2/3] x86, apicv: add virtual interrupt delivery support

2012-12-21 Thread Gleb Natapov
On Fri, Dec 21, 2012 at 09:39:20AM -0200, Marcelo Tosatti wrote:
> On Fri, Dec 21, 2012 at 09:51:40AM +0200, Gleb Natapov wrote:
> > On Thu, Dec 20, 2012 at 08:59:11PM -0200, Marcelo Tosatti wrote:
> > > On Mon, Dec 17, 2012 at 01:30:49PM +0800, Yang Zhang wrote:
> > > > From: Yang Zhang 
> > > > 
> > > > Virtual interrupt delivery avoids the need for KVM to inject vAPIC
> > > > interrupts manually; this is fully taken care of by the hardware. It
> > > > needs some special awareness in the existing interrupt injection path:
> > > > 
> > > > - For a pending interrupt, instead of direct injection, we may need to
> > > >   update architecture-specific indicators before resuming the guest.
> > > > 
> > > > - A pending interrupt that is masked by the ISR should also be
> > > >   considered in the above update action, since the hardware will decide
> > > >   when to inject it at the right time. The current has_interrupt and
> > > >   get_interrupt only return a valid vector from the injection p.o.v.
> > > > 
> > > > Signed-off-by: Kevin Tian 
> > > > Signed-off-by: Yang Zhang 
> > > > ---
> > > >  arch/ia64/kvm/lapic.h   |6 ++
> > > >  arch/x86/include/asm/kvm_host.h |6 ++
> > > >  arch/x86/include/asm/vmx.h  |   11 +++
> > > >  arch/x86/kvm/irq.c  |   56 +-
> > > >  arch/x86/kvm/lapic.c|   65 ++---
> > > >  arch/x86/kvm/lapic.h|   28 ++-
> > > >  arch/x86/kvm/svm.c  |   24 ++
> > > >  arch/x86/kvm/vmx.c  |  154 ++-
> > > >  arch/x86/kvm/x86.c  |   11 ++-
> > > >  include/linux/kvm_host.h|2 +
> > > >  virt/kvm/ioapic.c   |   36 +
> > > >  virt/kvm/ioapic.h   |1 +
> > > >  virt/kvm/irq_comm.c |   20 +
> > > >  13 files changed, 379 insertions(+), 41 deletions(-)
> > > > 
> > > > diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
> > > > index c5f92a9..cb59eb4 100644
> > > > --- a/arch/ia64/kvm/lapic.h
> > > > +++ b/arch/ia64/kvm/lapic.h
> > > > @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
> > > >  #define kvm_apic_present(x) (true)
> > > >  #define kvm_lapic_enabled(x) (true)
> > > >  
> > > > +static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
> > > > +   struct kvm_lapic_irq *irq)
> > > > +{
> > > > +   /* IA64 has no apicv supporting, do nothing here */
> > > > +}
> > > > +
> > > >  #endif
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index c431b33..b63a144 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -697,6 +697,11 @@ struct kvm_x86_ops {
> > > > void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
> > > > void (*enable_irq_window)(struct kvm_vcpu *vcpu);
> > > > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> > > > +   int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
> > > > +   void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
> > > > +   void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq);
> > > > +   void (*reset_eoi_exitmap)(struct kvm_vcpu *vcpu);
> > > > +   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
> > > > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> > > > int (*get_tdp_level)(void);
> > > > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> > > 
> > > EOI exit bitmap is problematic (it's racy). Please do this:
> > > 
> > > 1. Make a synchronous (1) KVM_REQ_EOIBITMAP request on IOAPIC register
> > > modifications which require EOI exit bitmap updates.
> > > 2. On VM-entry, during KVM_REQ_EOIBITMAP processing, each vcpu checks the
> > > IOAPIC map and adjusts its own EOI exit bitmap VMCS registers.
> > > 
> > > (1) means waiting until remotely executing VCPUs have acknowledged the
> > > request, using make_all_cpus_request (see virt/kvm/kvm_main.c), similarly
> > > to remote TLB flushes.
> > > 
> > > What is the problem now: there is no control over _when_ a VCPU updates
> > > its EOI exit bitmap VMCS register from the (remotely updated) master EOI
> > > exit bitmap. The VCPU can be processing a KVM_REQ_EOIBITMAP relative to a
> > > preceding IOAPIC register write while the current IOAPIC register write
> > > is updating the EOI exit bitmap. There is no way to fix that without
> > > locking (which can be avoided if the IOAPIC->EOI exit bitmap
> > > synchronization is vcpu-local).
> > > 
> > The race is benign. We have a similar one for interrupt injection, and the
> > same race exists on real HW. The two cases that can happen due to the race
> > are:
> > 
> > 1. exitbitmap bit X is changed from 1 to 0
> >   No problem. It is harmless to do an exit; on the next entry the exitbitmap
> >   will be fixed.
> > 2. exitbitmap bit X is changed from 0 to 1
> >   If the vcpu serves X at the time this happens it was delivered as edge, so
> >   there is no need to exit. The exitbitmap will be updated after the next
> >   vmexit, which will happen due to KVM_REQ_EOIBITMAP processing.

Re: [PATCH v7 2/3] x86, apicv: add virtual interrupt delivery support

2012-12-21 Thread Marcelo Tosatti
On Fri, Dec 21, 2012 at 09:51:40AM +0200, Gleb Natapov wrote:
> On Thu, Dec 20, 2012 at 08:59:11PM -0200, Marcelo Tosatti wrote:
> > On Mon, Dec 17, 2012 at 01:30:49PM +0800, Yang Zhang wrote:
> > > From: Yang Zhang 
> > > 
> > > Virtual interrupt delivery avoids the need for KVM to inject vAPIC
> > > interrupts manually; this is fully taken care of by the hardware. It
> > > needs some special awareness in the existing interrupt injection path:
> > > 
> > > - For a pending interrupt, instead of direct injection, we may need to
> > >   update architecture-specific indicators before resuming the guest.
> > > 
> > > - A pending interrupt that is masked by the ISR should also be
> > >   considered in the above update action, since the hardware will decide
> > >   when to inject it at the right time. The current has_interrupt and
> > >   get_interrupt only return a valid vector from the injection p.o.v.
> > > 
> > > Signed-off-by: Kevin Tian 
> > > Signed-off-by: Yang Zhang 
> > > ---
> > >  arch/ia64/kvm/lapic.h   |6 ++
> > >  arch/x86/include/asm/kvm_host.h |6 ++
> > >  arch/x86/include/asm/vmx.h  |   11 +++
> > >  arch/x86/kvm/irq.c  |   56 +-
> > >  arch/x86/kvm/lapic.c|   65 ++---
> > >  arch/x86/kvm/lapic.h|   28 ++-
> > >  arch/x86/kvm/svm.c  |   24 ++
> > >  arch/x86/kvm/vmx.c  |  154 ++-
> > >  arch/x86/kvm/x86.c  |   11 ++-
> > >  include/linux/kvm_host.h|2 +
> > >  virt/kvm/ioapic.c   |   36 +
> > >  virt/kvm/ioapic.h   |1 +
> > >  virt/kvm/irq_comm.c |   20 +
> > >  13 files changed, 379 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
> > > index c5f92a9..cb59eb4 100644
> > > --- a/arch/ia64/kvm/lapic.h
> > > +++ b/arch/ia64/kvm/lapic.h
> > > @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
> > >  #define kvm_apic_present(x) (true)
> > >  #define kvm_lapic_enabled(x) (true)
> > >  
> > > +static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
> > > + struct kvm_lapic_irq *irq)
> > > +{
> > > + /* IA64 has no apicv supporting, do nothing here */
> > > +}
> > > +
> > >  #endif
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index c431b33..b63a144 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -697,6 +697,11 @@ struct kvm_x86_ops {
> > >   void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
> > >   void (*enable_irq_window)(struct kvm_vcpu *vcpu);
> > >   void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> > > + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
> > > + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
> > > + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq);
> > > + void (*reset_eoi_exitmap)(struct kvm_vcpu *vcpu);
> > > + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
> > >   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> > >   int (*get_tdp_level)(void);
> > >   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> > 
> > EOI exit bitmap is problematic (it's racy). Please do this:
> > 
> > 1. Make a synchronous (1) KVM_REQ_EOIBITMAP request on IOAPIC register
> > modifications which require EOI exit bitmap updates.
> > 2. On VM-entry, during KVM_REQ_EOIBITMAP processing, each vcpu checks the
> > IOAPIC map and adjusts its own EOI exit bitmap VMCS registers.
> > 
> > (1) means waiting until remotely executing VCPUs have acknowledged the
> > request, using make_all_cpus_request (see virt/kvm/kvm_main.c), similarly
> > to remote TLB flushes.
> > 
> > What is the problem now: there is no control over _when_ a VCPU updates
> > its EOI exit bitmap VMCS register from the (remotely updated) master EOI
> > exit bitmap. The VCPU can be processing a KVM_REQ_EOIBITMAP relative to a
> > preceding IOAPIC register write while the current IOAPIC register write
> > is updating the EOI exit bitmap. There is no way to fix that without
> > locking (which can be avoided if the IOAPIC->EOI exit bitmap
> > synchronization is vcpu-local).
> > 
> The race is benign. We have a similar one for interrupt injection, and the
> same race exists on real HW. The two cases that can happen due to the race
> are:
> 
> 1. exitbitmap bit X is changed from 1 to 0
>   No problem. It is harmless to do an exit; on the next entry the exitbitmap
>   will be fixed.
> 2. exitbitmap bit X is changed from 0 to 1
>   If the vcpu serves X at the time this happens it was delivered as edge, so
>   there is no need to exit. The exitbitmap will be updated after the next
>   vmexit, which will happen due to KVM_REQ_EOIBITMAP processing.

1. This misses the case where the bitmap is being reset (where EOI exit bitmaps
are zeroed). In this case the vcpu enters gu

Re: [PATCH 0/7] KVM: Alleviate mmu_lock hold time when we start dirty logging

2012-12-21 Thread Gleb Natapov
On Fri, Dec 21, 2012 at 05:02:50PM +0900, Takuya Yoshikawa wrote:
> On Thu, 20 Dec 2012 07:55:43 -0700
> Alex Williamson  wrote:
> 
> > > Yes, the fix should work, but I do not want to update the
> > > generation from outside of update_memslots().
> > 
> > Ok, then:
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 87089dd..c7b5061 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -413,7 +413,8 @@ void kvm_exit(void);
> >  
> >  void kvm_get_kvm(struct kvm *kvm);
> >  void kvm_put_kvm(struct kvm *kvm);
> > -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new);
> > +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> > + u64 last_generation);
> >  
> >  static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index c4c8ec1..06961ea 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -667,7 +667,8 @@ static void sort_memslots(struct kvm_memslots *slots)
> > slots->id_to_index[slots->memslots[i].id] = i;
> >  }
> >  
> > -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
> > +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> > + u64 last_generation)
> >  {
> > if (new) {
> > int id = new->id;
> > @@ -679,7 +680,7 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
> > sort_memslots(slots);
> > }
> >  
> > -   slots->generation++;
> > +   slots->generation = last_generation + 1;
> >  }
> >  
> >  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> > @@ -814,7 +815,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > slot = id_to_memslot(slots, mem->slot);
> > slot->flags |= KVM_MEMSLOT_INVALID;
> >  
> > -   update_memslots(slots, NULL);
> > +   update_memslots(slots, NULL, kvm->memslots->generation);
> >  
> > old_memslots = kvm->memslots;
> > rcu_assign_pointer(kvm->memslots, slots);
> > @@ -862,7 +863,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > memset(&new.arch, 0, sizeof(new.arch));
> > }
> >  
> > -   update_memslots(slots, &new);
> > +   update_memslots(slots, &new, kvm->memslots->generation);
> > old_memslots = kvm->memslots;
> > rcu_assign_pointer(kvm->memslots, slots);
> > synchronize_srcu_expedited(&kvm->srcu);
> > 
> > > > The original patch can be reverted, there are no following dependencies,
> > > > but the idea was that we're making the memslot array larger, so there
> > > > could be more pressure in allocating it, so let's not trivially do extra
> > > > frees and allocs.  Thanks,
> > > 
> > > I agree that the current set_memory_region() is not good for frequent 
> > > updates.
> > > But the alloc/free is not a major factor at the moment: flushing shadows 
> > > should
> > > be more problematic.
> > 
> > I don't understand why we'd throw away even a minor optimization that's
> > so easy to fix.  We're not only removing a free/alloc, but we're being
> > more friendly to the cache by avoiding the extra memcpy.  The above also
> > makes the generation management a bit more explicit.  Thanks,
> 
> Looks good to me!
> 
Me too.

> I just wanted to keep the code readable, so no reason to object to
> a clean solution.  Any chance to see the fix on kvm.git soon?
> 
As soon as Alex sends a proper patch with a Signed-off-by.

--
Gleb.


Re: [PATCH 0/7] KVM: Alleviate mmu_lock hold time when we start dirty logging

2012-12-21 Thread Takuya Yoshikawa
On Thu, 20 Dec 2012 07:55:43 -0700
Alex Williamson  wrote:

> > Yes, the fix should work, but I do not want to update the
> > generation from outside of update_memslots().
> 
> Ok, then:
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 87089dd..c7b5061 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -413,7 +413,8 @@ void kvm_exit(void);
>  
>  void kvm_get_kvm(struct kvm *kvm);
>  void kvm_put_kvm(struct kvm *kvm);
> -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new);
> +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> + u64 last_generation);
>  
>  static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c4c8ec1..06961ea 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -667,7 +667,8 @@ static void sort_memslots(struct kvm_memslots *slots)
>   slots->id_to_index[slots->memslots[i].id] = i;
>  }
>  
> -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
> +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> + u64 last_generation)
>  {
>   if (new) {
>   int id = new->id;
> @@ -679,7 +680,7 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
>   sort_memslots(slots);
>   }
>  
> - slots->generation++;
> + slots->generation = last_generation + 1;
>  }
>  
>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> @@ -814,7 +815,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   slot = id_to_memslot(slots, mem->slot);
>   slot->flags |= KVM_MEMSLOT_INVALID;
>  
> - update_memslots(slots, NULL);
> + update_memslots(slots, NULL, kvm->memslots->generation);
>  
>   old_memslots = kvm->memslots;
>   rcu_assign_pointer(kvm->memslots, slots);
> @@ -862,7 +863,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   memset(&new.arch, 0, sizeof(new.arch));
>   }
>  
> - update_memslots(slots, &new);
> + update_memslots(slots, &new, kvm->memslots->generation);
>   old_memslots = kvm->memslots;
>   rcu_assign_pointer(kvm->memslots, slots);
>   synchronize_srcu_expedited(&kvm->srcu);
> 
> > > The original patch can be reverted, there are no following dependencies,
> > > but the idea was that we're making the memslot array larger, so there
> > > could be more pressure in allocating it, so let's not trivially do extra
> > > frees and allocs.  Thanks,
> > 
> > I agree that the current set_memory_region() is not good for frequent 
> > updates.
> > But the alloc/free is not a major factor at the moment: flushing shadows 
> > should
> > be more problematic.
> 
> I don't understand why we'd throw away even a minor optimization that's
> so easy to fix.  We're not only removing a free/alloc, but we're being
> more friendly to the cache by avoiding the extra memcpy.  The above also
> makes the generation management a bit more explicit.  Thanks,

Looks good to me!

I just wanted to keep the code readable, so no reason to object to
a clean solution.  Any chance to see the fix on kvm.git soon?

Thanks,
Takuya


Re: [PATCH 0/7] KVM: Alleviate mmu_lock hold time when we start dirty logging

2012-12-21 Thread Takuya Yoshikawa
On Thu, 20 Dec 2012 07:55:43 -0700
Alex Williamson  wrote:

> > Yes, the fix should work, but I do not want to update the
> > generation from outside of update_memslots().
> 
> Ok, then:
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 87089dd..c7b5061 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -413,7 +413,8 @@ void kvm_exit(void);
>  
>  void kvm_get_kvm(struct kvm *kvm);
>  void kvm_put_kvm(struct kvm *kvm);
> -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot 
> *new);
> +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> + u64 last_generation);
>  
>  static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c4c8ec1..06961ea 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -667,7 +667,8 @@ static void sort_memslots(struct kvm_memslots *slots)
>   slots->id_to_index[slots->memslots[i].id] = i;
>  }
>  
> -void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
> +void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new,
> + u64 last_generation)
>  {
>   if (new) {
>   int id = new->id;
> @@ -679,7 +680,7 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
>   sort_memslots(slots);
>   }
>  
> - slots->generation++;
> + slots->generation = last_generation + 1;
>  }
>  
>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> @@ -814,7 +815,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   slot = id_to_memslot(slots, mem->slot);
>   slot->flags |= KVM_MEMSLOT_INVALID;
>  
> - update_memslots(slots, NULL);
> + update_memslots(slots, NULL, kvm->memslots->generation);
>  
>   old_memslots = kvm->memslots;
>   rcu_assign_pointer(kvm->memslots, slots);
> @@ -862,7 +863,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   memset(&new.arch, 0, sizeof(new.arch));
>   }
>  
> - update_memslots(slots, &new);
> + update_memslots(slots, &new, kvm->memslots->generation);
>   old_memslots = kvm->memslots;
>   rcu_assign_pointer(kvm->memslots, slots);
>   synchronize_srcu_expedited(&kvm->srcu);
> 
> > > The original patch can be reverted, there are no following dependencies,
> > > but the idea was that we're making the memslot array larger, so there
> > > could be more pressure in allocating it, so let's not trivially do extra
> > > frees and allocs.  Thanks,
> > 
> > I agree that the current set_memory_region() is not good for frequent updates.
> > But the alloc/free is not a major factor at the moment: flushing shadows
> > should be more problematic.
> 
> I don't understand why we'd throw away even a minor optimization that's
> so easy to fix.  We're not only removing a free/alloc, but we're being
> more friendly to the cache by avoiding the extra memcpy.  The above also
> makes the generation management a bit more explicit.  Thanks,

Looks good to me!

I just wanted to keep the code readable, so no reason to object to
a clean solution.  Any chance to see the fix on kvm.git soon?

Thanks,
Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.

2012-12-21 Thread Gleb Natapov
On Thu, Dec 20, 2012 at 03:58:49PM -0700, Alex Williamson wrote:
> On Thu, 2012-12-20 at 18:43 +0200, Gleb Natapov wrote:
> > On Thu, Dec 20, 2012 at 08:25:41AM -0700, Alex Williamson wrote:
> > > On Thu, 2012-12-20 at 16:57 +0200, Gleb Natapov wrote:
> > > > With emulate_invalid_guest_state=0 if a vcpu is in real mode VMX can
> > > > enter the vcpu with a smaller segment limit than the guest configured.  If the
> > > > guest tries to access past this limit it will get a #GP, at which point the
> > > > instruction will be emulated with the correct segment limit applied. If
> > > > IO is detected during the emulation it is not handled correctly. The vcpu
> > > > thread should exit to userspace to serve the IO, but it returns to the
> > > > guest instead.  Since emulation is not completed until userspace completes
> > > > the IO, the faulty instruction is re-executed ad infinitum.
> > > > 
> > > > The patch fixes that by exiting to userspace if IO happens during
> > > > instruction emulation.
> > > 
> > > Thanks for finding the right fix Gleb.  This originally came about from
> > > an experiment in lazily mapping assigned device MMIO BARs.  That's
> > > something I think might still have value for conserving memory slots,
> > > but now I have to be aware of this bug.  That makes me wonder if we need
> > > to expose a capability for the fix.  I could also just avoid the lazy
> > > path if a device includes a ROM, but it's still difficult to know how
> > > long such a workaround is required.  Thanks,
> > > 
> > I do not like the idea of having capability flags for emulator bugs. We
> > have too many of those, and it is hard to know at which point exactly
> > this capability should be set.  Case in point: apply this patch series
> > and the one it depends upon, but do not apply the last patch that actually
> > fixes the bug you've found. Run your test. It will work. The test
> > case will also work on older kernels on AMD and on modern Intel cpus with
> > the unrestricted guest capability.
> 
> I agree, capabilities are terrible; the only thing worse is having an
> un-probe-able bug that could randomly bite us from an unknown option
> rom.  The capability doesn't eliminate that entirely though: we could
> still have an MMIO BAR that must be emulated because of its size.  I'll
> just have to avoid it as best as I can.  Thanks,
> 
Precisely (as to the "capability doesn't eliminate that entirely" part). You will
not hit the bug with the patch series applied even without the last patch,
because option roms are now fully emulated. The reason is that according to the
BIOS spec option roms are called in big real mode and we cannot enter the vcpu
in this mode; the fact you hit the #GP bug is the result of another
bug that caused us to enter the vcpu anyway (with incorrect segment
registers). Since option roms are now fully emulated we cannot tell
for sure which ones will work and which will not until we implement
all real mode instructions in the emulator. We are almost, but not yet,
there. Note that this is true regardless of your changes: when you assign a
device with an option rom it may fail due to missing emulation.

> Alex
>  
> > > > Reported-by: Alex Williamson 
> > > > Signed-off-by: Gleb Natapov 
> > > > ---
> > > >  arch/x86/kvm/vmx.c |   75 
> > > > 
> > > >  1 file changed, 41 insertions(+), 34 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > index a101dd4..ab5933f 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -4230,28 +4230,9 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
> > > > return 0;
> > > >  }
> > > >  
> > > > -static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > > > - int vec, u32 err_code)
> > > > +static bool rmode_exception(struct kvm_vcpu *vcpu, int vec) 
> > > >  {
> > > > -   /*
> > > > -* Instruction with address size override prefix opcode 0x67
> > > > -* Cause the #SS fault with 0 error code in VM86 mode.
> > > > -*/
> > > > -   if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0)
> > > > -   if (emulate_instruction(vcpu, 0) == EMULATE_DONE)
> > > > -   return 1;
> > > > -   /*
> > > > -* Forward all other exceptions that are valid in real mode.
> > > > -* FIXME: Breaks guest debugging in real mode, needs to be fixed with
> > > > -*the required debugging infrastructure rework.
> > > > -*/
> > > > switch (vec) {
> > > > -   case DB_VECTOR:
> > > > -   if (vcpu->guest_debug &
> > > > -   (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> > > > -   return 0;
> > > > -   kvm_queue_exception(vcpu, vec);
> > > > -   return 1;
> > > > case BP_VECTOR:
> > > > /*
> > > >  * Update instruction l