RE: [PATCH 1/5]KVM: x86, apicv: add APICv register virtualization support

2012-09-18 Thread Li, Jiongxi


> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Friday, September 07, 2012 12:02 AM
> To: Li, Jiongxi
> Cc: kvm@vger.kernel.org
> Subject: Re: [PATCH 1/5]KVM: x86, apicv: add APICv register virtualization
> support
> 
> On 09/05/2012 08:41 AM, Li, Jiongxi wrote:
> > - APIC read doesn't cause VM-Exit
> > - APIC write becomes trap-like
> >
> >
> > +/* emulate APIC access in a trap manner */ int
> > +kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset) {
> > +   u32 val;
> > +
> > +   /* hw has done the conditional check and inst decode */
> > +   offset &= 0xff0;
> > +   if ((offset != APIC_EOI) &&
> > +apic_reg_read(vcpu->arch.apic, offset, 4, &val))
> > +   return 1;
> 
> TMICT is a write-only register IIRC.
> 
> > +
> > +   /* TODO: optimize to just emulate side effect w/o one more write */
> > +   return apic_reg_write(vcpu->arch.apic, offset, val);
> 
> val may be uninitialized here.
> 
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
> > +
> >  void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)  {
> >
> > +static bool __read_mostly enable_apicv_reg = 0;
> 
> Enable by default.
> 
> > +module_param(enable_apicv_reg, bool, S_IRUGO);
> 
> Let's have one module parameter for all related features, called apicv.
>  So modeprobe kvm-intel apicv=0 disables it.
> 
> >
> > +static int handle_apic_write(struct kvm_vcpu *vcpu) {
> > +   unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> > +   u32 offset = exit_qualification & 0xfff;
> > +
> > +   /* APIC-write VM exit is trap-like and thus no need to adjust IP */
> > +   return kvm_apic_write_nodecode(vcpu, offset) == 0; }
> 
> Return 1 here means exit to userspace.  This will go crazy.
> 
> You need to return 0 always.  If this is an msr write to a read-only 
> register, you
> need to inject a #GP (IIRC).

Return 0 means exit to userspace, so it should return 1, right?
__vcpu_run
{
  while (r>0)
  {  
 r = vcpu_enter_guest(vcpu)
 if (r<=0)
   break;
  }
}
> 
> 
> 
> --
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 4/5]KVM:x86, apicv: add interface for poking EOI exit bitmap

2012-09-17 Thread Li, Jiongxi


> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Friday, September 07, 2012 12:38 AM
> To: Li, Jiongxi
> Cc: kvm@vger.kernel.org
> Subject: Re: [PATCH 4/5]KVM:x86, apicv: add interface for poking EOI exit
> bitmap
> 
> On 09/05/2012 08:41 AM, Li, Jiongxi wrote:
> > With APICv virtual interrupt delivery feature, EOI write from non root
> > mode doesn't cause VM-Exit unless set in EOI exit bitmap VMCS field.
> > Basically there're two methods to manipulate EOI exit bitmap:
> 
> Should be folded into the previous patch, otherwise the previous patch breaks
> level interrupts.
> 
> >
> > [Option 1]
> > Ideally only level triggered irq requires a hook in vLAPIC EOI write,
> > so that vIOAPIC EOI is triggered and emulated. So the simplest
> > approach is to manipulate EOI exit bitmap when vLAPIC acks a new
> > interrupt, based on value of TMR. There're several corner cases worthy
> > of note though:
> >
> >   - KVM has specific notifier hooks on vIOAPIC EOI path. So far two
> > sources use it: INT-based device passthrough and PIT pending
> > timers. For the former, it's virtually wired to vIOAPIC and
> > thus TMR already covers it. PIT is special here, which is an
> > edge triggered source. But since other timer sources like
> > vLAPIC timer don't require this notifier hook, possibly PIT
> > can be relaxed in the future too.
> 
> I would like to switch to changing the timer frequency when we need to catch
> up.  But that can be done later.
> 
> >
> >   - posted interrupt will update TMR directly, w/o chance for KVM
> > to update EOI exit bitmap accordingly. This becomes a gap
> 
> Why not? we know what vector the PIT is wired to.
> 
> >
> > [Option 2]
> > Indicate EOI exit bitmap requirement ('need_eoi') directly from every
> > interrupt source device, and then check this requirement when vLAPIC
> > acks a new pending interrupt. This requires more intrusive changes to
> > current vLAPIC/vIOAPIC logic, so that the "irq_source_id" indicating
> > source of interrupt is passed through from origination point to vLAPIC
> > ack point. For natual requirement like vIOAPIC level triggered
> > entries, it can be implicitly deduced.
> > On the other hand for non-natural requirements like aformentioned PIT
> > or posted interrupt, this approach can handle it efficiently.
> >
> > For simplicity reason, now option 1 is used which should be enough to
> > test MSI-based device passthrough.
> 
> You can change kvm_register_irq_ack_notifier() to call the ioapic and pic to 
> find
> out what vectors need EOI exits.
> 
> (alternatively, if we fix the PIT, then we only need ack notifiers for level
> interrupts).
> 
> >
> > Signed-off-by: Kevin Tian 
> > Signed-off-by: Jiongxi Li 
> > ---
> >  arch/x86/include/asm/kvm_host.h |1 +
> >  arch/x86/kvm/lapic.c|7 ++-
> >  arch/x86/kvm/vmx.c  |   37
> +
> >  3 files changed, 44 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h index ef74df5..4e06a82 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -671,6 +671,7 @@ struct kvm_x86_ops {
> > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> > int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
> > void (*update_irq)(struct kvm_vcpu *vcpu);
> > +   void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, int
> > +need_eoi);
> > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> > int (*get_tdp_level)(void);
> > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
> > d203501..4058384 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -499,8 +499,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic,
> int delivery_mode,
> > if (trig_mode) {
> > apic_debug("level trig mode for vector %d", vector);
> > apic_set_vector(vector, apic->regs + APIC_TMR);
> > -   } else
> > +   if (kvm_apic_vid_enabled(vcpu))
> > +   kvm_x86_ops->set_eoi_exitmap(vcpu, vector, 1);
> > +   } else {
> > apic_clear_vector(vector, apic->regs + APIC_TMR);

RE: [PATCH 2/5]KVM:x86, apicv: adjust for virtual interrupt delivery

2012-09-17 Thread Li, Jiongxi


> -Original Message-
> From: Li, Jiongxi
> Sent: Friday, September 14, 2012 10:16 PM
> To: 'Avi Kivity'
> Cc: kvm@vger.kernel.org
> Subject: RE: [PATCH 2/5]KVM:x86, apicv: adjust for virtual interrupt delivery
> 
> Sorry for the late response
> 
> > -Original Message-
> > From: Avi Kivity [mailto:a...@redhat.com]
> > Sent: Friday, September 07, 2012 12:22 AM
> > To: Li, Jiongxi
> > Cc: kvm@vger.kernel.org
> > Subject: Re: [PATCH 2/5]KVM:x86, apicv: adjust for virtual interrupt
> > delivery
> >
> > On 09/05/2012 08:41 AM, Li, Jiongxi wrote:
> > > Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
> > > manually, which is fully taken care of by the hardware. This needs
> > > some special awareness into existing interrupr injection path:
> > >
> > >   - for pending interrupt, instead of direct injection, we may need
> > > update architecture specific indicators before resuming to guest.
> > >
> > >   - A pending interrupt, which is masked by ISR, should be also
> > > considered in above update action, since hardware will decide
> > > when to inject it at right time. Current has_interrupt and
> > > get_interrupt only returns a valid vector from injection p.o.v.
> > >
> > >
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -5194,6 +5194,13 @@ static void inject_pending_event(struct
> > kvm_vcpu *vcpu)
> > >   vcpu->arch.nmi_injected = true;
> > >   kvm_x86_ops->set_nmi(vcpu);
> > >   }
> > > + } else if (kvm_apic_vid_enabled(vcpu)) {
> > > + if (kvm_cpu_has_interrupt_apic_vid(vcpu) &&
> > > + kvm_x86_ops->interrupt_allowed(vcpu)) {
> > > + kvm_queue_interrupt(vcpu,
> > > + kvm_cpu_get_interrupt_apic_vid(vcpu), false);
> > > + kvm_x86_ops->set_irq(vcpu);
> > > + }
> >
> > It may be simpler to change kvm_cpu_{has,get}_interrupt to ignore the
> > apic if virtual interrupt delivery is enabled.
> OKs

Kvm_cpu_has_interrupt is also called in other place, no just used to judge 
whether to inject interrupt manually. For instance, it is called in 
kvm_arch_vcpu_runnable. In that case, apic can't be ingored. So for safety, I 
think it is better to use another function here other than change the original 
kvm_cpu_has_interrupt function.
> >
> > > @@ -5293,16 +5300,27 @@ static int vcpu_enter_guest(struct kvm_vcpu
> > *vcpu)
> > >   }
> > >
> > >   if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > > + /* update archtecture specific hints for APIC virtual interrupt
> > > +delivery
> > */
> > > + if (kvm_apic_vid_enabled(vcpu))
> > > + kvm_x86_ops->update_irq(vcpu);
> > > +
> >
> > Not defined.
> This function is defined in patch 3/5. Because virtual interrupt delivery is 
> not
> enabled in this patch. So this function is not called. Since we will enable 
> this
> feature by default, so maybe we can merge PATCH 2,3,4 together into one
> patch.
> >
> > >   inject_pending_event(vcpu);
> > >
> > >   /* enable NMI/IRQ window open exits if needed */
> > >   if (vcpu->arch.nmi_pending)
> > >   kvm_x86_ops->enable_nmi_window(vcpu);
> > > - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> > > + else if (kvm_apic_vid_enabled(vcpu)) {
> > > + if (kvm_cpu_has_interrupt_apic_vid(vcpu))
> > > + kvm_x86_ops->enable_irq_window(vcpu);
> > > + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> > >   kvm_x86_ops->enable_irq_window(vcpu);
> > >
> > >   if (kvm_lapic_enabled(vcpu)) {
> > > - update_cr8_intercept(vcpu);
> > > + /* no need for tpr_threshold update if APIC virtual
> > > +  * interrupt delivery is enabled
> > > +  */
> > > + if (!kvm_apic_vid_enabled(vcpu))
> > > + update_cr8_intercept(vcpu);
> >
> > Perhaps the arch function should do the ignoring.
> You means putting the 'vid_enabled' judgement in
> 'kvm_x86_ops->update_cr8_intercept'? Is it just out of the reason that
> reducing the code change in common code?
> >
> > >   kvm_lapic_sync_to_vapic(vcpu);
> > >   }
> > >   }
> > >
> >
> >
> > --
> > error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 4/5]KVM:x86, apicv: add interface for poking EOI exit bitmap

2012-09-14 Thread Li, Jiongxi
Sorry for the late response

> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Friday, September 07, 2012 12:38 AM
> To: Li, Jiongxi
> Cc: kvm@vger.kernel.org
> Subject: Re: [PATCH 4/5]KVM:x86, apicv: add interface for poking EOI exit
> bitmap
> 
> On 09/05/2012 08:41 AM, Li, Jiongxi wrote:
> > With APICv virtual interrupt delivery feature, EOI write from non root
> > mode doesn't cause VM-Exit unless set in EOI exit bitmap VMCS field.
> > Basically there're two methods to manipulate EOI exit bitmap:
> 
> Should be folded into the previous patch, otherwise the previous patch breaks
> level interrupts.
OK
> 
> >
> > [Option 1]
> > Ideally only level triggered irq requires a hook in vLAPIC EOI write,
> > so that vIOAPIC EOI is triggered and emulated. So the simplest
> > approach is to manipulate EOI exit bitmap when vLAPIC acks a new
> > interrupt, based on value of TMR. There're several corner cases worthy
> > of note though:
> >
> >   - KVM has specific notifier hooks on vIOAPIC EOI path. So far two
> > sources use it: INT-based device passthrough and PIT pending
> > timers. For the former, it's virtually wired to vIOAPIC and
> > thus TMR already covers it. PIT is special here, which is an
> > edge triggered source. But since other timer sources like
> > vLAPIC timer don't require this notifier hook, possibly PIT
> > can be relaxed in the future too.
> 
> I would like to switch to changing the timer frequency when we need to catch
> up.  But that can be done later.
> 
> >
> >   - posted interrupt will update TMR directly, w/o chance for KVM
> > to update EOI exit bitmap accordingly. This becomes a gap
> 
> Why not? we know what vector the PIT is wired to.
What PIT case are you talking about? Can you elaborate it?
> 
> >
> > [Option 2]
> > Indicate EOI exit bitmap requirement ('need_eoi') directly from every
> > interrupt source device, and then check this requirement when vLAPIC
> > acks a new pending interrupt. This requires more intrusive changes to
> > current vLAPIC/vIOAPIC logic, so that the "irq_source_id" indicating
> > source of interrupt is passed through from origination point to vLAPIC
> > ack point. For natual requirement like vIOAPIC level triggered
> > entries, it can be implicitly deduced.
> > On the other hand for non-natural requirements like aformentioned PIT
> > or posted interrupt, this approach can handle it efficiently.
> >
> > For simplicity reason, now option 1 is used which should be enough to
> > test MSI-based device passthrough.
> 
> You can change kvm_register_irq_ack_notifier() to call the ioapic and pic to 
> find
> out what vectors need EOI exits.
We will look into that, thanks for pointing out. And we will use option 1 here 
first
> 
> (alternatively, if we fix the PIT, then we only need ack notifiers for level
> interrupts).
> 
> >
> > Signed-off-by: Kevin Tian 
> > Signed-off-by: Jiongxi Li 
> > ---
> >  arch/x86/include/asm/kvm_host.h |1 +
> >  arch/x86/kvm/lapic.c|7 ++-
> >  arch/x86/kvm/vmx.c  |   37
> +
> >  3 files changed, 44 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h index ef74df5..4e06a82 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -671,6 +671,7 @@ struct kvm_x86_ops {
> > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> > int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
> > void (*update_irq)(struct kvm_vcpu *vcpu);
> > +   void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, int
> > +need_eoi);
> > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> > int (*get_tdp_level)(void);
> > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
> > d203501..4058384 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -499,8 +499,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic,
> int delivery_mode,
> > if (trig_mode) {
> > apic_debug("level trig mode for vector %d", vector);
> > apic_set_vector(vector, apic->regs + APIC_TMR);
> > -   } else
> > +   if (kvm_apic_vid_enabled(vcpu))
> > + 

RE: [PATCH 3/5]KVM:x86, apicv: enable virtual interrupt delivery for VMX

2012-09-14 Thread Li, Jiongxi
Sorry for the late response

> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Friday, September 07, 2012 12:30 AM
> To: Li, Jiongxi
> Cc: kvm@vger.kernel.org
> Subject: Re: [PATCH 3/5]KVM:x86, apicv: enable virtual interrupt delivery for
> VMX
> 
> On 09/05/2012 08:41 AM, Li, Jiongxi wrote:
> > - before returning to guest, RVI should be updated if any pending IRRs
> 
> process pending interrupts does that for you, so you only need this with
> KVM_SET_APIC.
> 
> > - EOI exit bitmap controls whether an EOI write should cause VM-Exit.
> >   if set, a trap-like induced EOI VM-Exit is triggered. Keep all the
> >   bitmaps cleared for now, which should be enough to allow a MSI based
> >   device passthrough
> 
> What about level-triggered interrupts, or interrupts which have ack notifiers
> set?
> 
We will merge the EOI exit bitmap part patch
> >
> > -static void apic_send_ipi(struct kvm_lapic *apic)
> > +/*
> > + * this interface assumes a trap-like exit, which has already
> > +finished
> > + * desired side effect including vISR and vPPR update.
> > + */
> > +void kvm_apic_set_eoi(struct kvm_vcpu *vcpu, int vector) {
> > +   struct kvm_lapic *apic = vcpu->arch.apic;
> > +   int trigger_mode;
> > +
> > +   if (apic_test_and_clear_vector(vector, apic->regs + APIC_TMR))
> > +   trigger_mode = IOAPIC_LEVEL_TRIG;
> > +   else
> > +   trigger_mode = IOAPIC_EDGE_TRIG;
> > +
> > +   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
> > +   kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> > +   kvm_make_request(KVM_REQ_EVENT, apic->vcpu); }
> > +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi);
> 
> What's the difference between this and apic_set_eoi()?
In kvm_apic_set_eoi, We can use the vector directly from exit_qualification 
while there is EOI-induced VMExit, and doesn't need to do the 'clear isr', 
'update ppr' things which are handled by hardware.
> 
> > +
> > + static void apic_send_ipi(struct kvm_lapic *apic)
> 
> Extra space added.
OK.

> 
> >  /*
> >   * If nested=1, nested virtualization is supported, i.e., guests may use
> >   * VMX and be a hypervisor for its own guests. If nested=0, guests
> > may not @@ -430,6 +433,8 @@ struct vcpu_vmx {
> >
> > bool rdtscp_enabled;
> >
> > +   u64 eoi_exit_bitmap[4];
> > +
> 
> Unused?
This is used in PATCH 4/5

> 
> >
> > @@ -3876,6 +3894,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx
> *vmx)
> > vmx_secondary_exec_control(vmx));
> > }
> >
> > +   if (enable_apicv_vid) {
> > +   vmcs_write64(EOI_EXIT_BITMAP0, 0);
> > +   vmcs_write64(EOI_EXIT_BITMAP1, 0);
> > +   vmcs_write64(EOI_EXIT_BITMAP2, 0);
> > +   vmcs_write64(EOI_EXIT_BITMAP3, 0);
> > +
> > +   vmcs_write16(GUEST_INTR_STATUS, 0);
> 
> Need to update GUEST_INTR_STATUS after live migration (or perhaps also
> when enabling the APIC?)
After live migration, GUEST_INTR_STATUS will be updated in VMEntry. 
'kvm_x86_ops->update_irq(vcpu)' function does that.

> 
> 
> 
> --
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/5]KVM:x86, apicv: adjust for virtual interrupt delivery

2012-09-14 Thread Li, Jiongxi
Sorry for the late response

> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Friday, September 07, 2012 12:22 AM
> To: Li, Jiongxi
> Cc: kvm@vger.kernel.org
> Subject: Re: [PATCH 2/5]KVM:x86, apicv: adjust for virtual interrupt delivery
> 
> On 09/05/2012 08:41 AM, Li, Jiongxi wrote:
> > Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
> > manually, which is fully taken care of by the hardware. This needs
> > some special awareness into existing interrupr injection path:
> >
> >   - for pending interrupt, instead of direct injection, we may need
> > update architecture specific indicators before resuming to guest.
> >
> >   - A pending interrupt, which is masked by ISR, should be also
> > considered in above update action, since hardware will decide
> > when to inject it at right time. Current has_interrupt and
> > get_interrupt only returns a valid vector from injection p.o.v.
> >
> >
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5194,6 +5194,13 @@ static void inject_pending_event(struct
> kvm_vcpu *vcpu)
> > vcpu->arch.nmi_injected = true;
> > kvm_x86_ops->set_nmi(vcpu);
> > }
> > +   } else if (kvm_apic_vid_enabled(vcpu)) {
> > +   if (kvm_cpu_has_interrupt_apic_vid(vcpu) &&
> > +   kvm_x86_ops->interrupt_allowed(vcpu)) {
> > +   kvm_queue_interrupt(vcpu,
> > +   kvm_cpu_get_interrupt_apic_vid(vcpu), false);
> > +   kvm_x86_ops->set_irq(vcpu);
> > +   }
> 
> It may be simpler to change kvm_cpu_{has,get}_interrupt to ignore the apic if
> virtual interrupt delivery is enabled.
OKs

> 
> > @@ -5293,16 +5300,27 @@ static int vcpu_enter_guest(struct kvm_vcpu
> *vcpu)
> > }
> >
> > if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > +   /* update archtecture specific hints for APIC virtual interrupt 
> > delivery
> */
> > +   if (kvm_apic_vid_enabled(vcpu))
> > +   kvm_x86_ops->update_irq(vcpu);
> > +
> 
> Not defined.
This function is defined in patch 3/5. Because virtual interrupt delivery is 
not enabled in this patch. So this function is not called. Since we will enable 
this feature by default, so maybe we can merge PATCH 2,3,4 together into one 
patch.
> 
> > inject_pending_event(vcpu);
> >
> > /* enable NMI/IRQ window open exits if needed */
> > if (vcpu->arch.nmi_pending)
> > kvm_x86_ops->enable_nmi_window(vcpu);
> > -   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> > +   else if (kvm_apic_vid_enabled(vcpu)) {
> > +   if (kvm_cpu_has_interrupt_apic_vid(vcpu))
> > +   kvm_x86_ops->enable_irq_window(vcpu);
> > +   } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> > kvm_x86_ops->enable_irq_window(vcpu);
> >
> > if (kvm_lapic_enabled(vcpu)) {
> > -   update_cr8_intercept(vcpu);
> > +   /* no need for tpr_threshold update if APIC virtual
> > +* interrupt delivery is enabled
> > +*/
> > +   if (!kvm_apic_vid_enabled(vcpu))
> > +   update_cr8_intercept(vcpu);
> 
> Perhaps the arch function should do the ignoring.
You means putting the 'vid_enabled' judgement in 
'kvm_x86_ops->update_cr8_intercept'? Is it just out of the reason that reducing 
the code change in common code?
> 
> > kvm_lapic_sync_to_vapic(vcpu);
> > }
> > }
> >
> 
> 
> --
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/5]KVM:Enable APIC-Register Virtualization and Virtual-interrupt delivery

2012-09-14 Thread Li, Jiongxi
Sorry for the late response.

> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Thursday, September 06, 2012 11:45 PM
> To: Li, Jiongxi
> Cc: kvm@vger.kernel.org
> Subject: Re: [PATCH 0/5]KVM:Enable APIC-Register Virtualization and
> Virtual-interrupt delivery
> 
> On 09/05/2012 08:40 AM, Li, Jiongxi wrote:
> > The VMCS includes controls that enable the virtualization of interrupts and
> the Advanced Programmable Interrupt Controller (APIC).
> > When these controls are used, the processor will emulate many accesses to
> the APIC, track the state of the virtual APIC, and deliver virtual interrupts 
> - all in
> VMX non-root operation without a VM exit.
> > You can refer to Chapter 29 of the latest SDM.
> >
> > APICv support in KVM is split into 5 patches:
> >   0001-x86-apicv-add-APICv-register-virtualization-support.patch - enable
> APICv register virtualization
> >   0002-x86-apicv-adjust-for-virtual-interrupt-delivery.patch - add basic KVM
> frameowrk for virtual interrupt delivery
> >   0003-x86-apicv-enable-virtual-interrupt-delivery-for-VMX.patch - enable
> APICv virtual interrupt delivery
> >   0004-x86-apicv-add-interface-for-poking-EOI-exit-bitmap.patch - EOI exit
> bitmap handling
> >   0005-x86-apicv-add-virtual-x2apic-support.patch - handle MSR style
> > in virtual x2apic
> >
> > Apply them in above order
> > APICv is disabled by default, and use below command to enable it:
> > modprobe enable_apicv_reg=1 enable_apicv_vid=1
> >
> 
> Please use git send-email in the future for correct threading of the messages.
OK, We will use git send-email to send out the v2 patch

> 
> I don't see patches for enabling posted interrupts?  This can improve both
> assigned and virtual interrupt delivery.
We will have a separate patch for posted interrupts after cleaning up this 
patch. Meanwhile it is not ready.

> 
> --
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/5]KVM: x86, apicv: add APICv register virtualization support

2012-09-14 Thread Li, Jiongxi
Sorry for the late response

> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Friday, September 07, 2012 12:02 AM
> To: Li, Jiongxi
> Cc: kvm@vger.kernel.org
> Subject: Re: [PATCH 1/5]KVM: x86, apicv: add APICv register virtualization
> support
> 
> On 09/05/2012 08:41 AM, Li, Jiongxi wrote:
> > - APIC read doesn't cause VM-Exit
> > - APIC write becomes trap-like
> >
> >
> > +/* emulate APIC access in a trap manner */ int
> > +kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset) {
> > +   u32 val;
> > +
> > +   /* hw has done the conditional check and inst decode */
> > +   offset &= 0xff0;
> > +   if ((offset != APIC_EOI) &&
> > +apic_reg_read(vcpu->arch.apic, offset, 4, &val))
> > +   return 1;
> 
> TMICT is a write-only register IIRC.
> 
I haven't seen TMICT write-only in SDM. Also in there is ' 
apic_get_reg(apci,APIC_TMICT)' call in 'apic_get_tmcct' function.
> > +
> > +   /* TODO: optimize to just emulate side effect w/o one more write */
> > +   return apic_reg_write(vcpu->arch.apic, offset, val);
> 
> val may be uninitialized here.
> 
Can you elaborate that? For APIC_EOI, there is no need to use val. For 
apic_reg_read fails case, it will return before apic_reg_write
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
> > +
> >  void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)  {
> >
> > +static bool __read_mostly enable_apicv_reg = 0;
> 
> Enable by default.
> 
> > +module_param(enable_apicv_reg, bool, S_IRUGO);
> 
> Let's have one module parameter for all related features, called apicv.
>  So modeprobe kvm-intel apicv=0 disables it.
Ok, We will change that
> 
> >
> > +static int handle_apic_write(struct kvm_vcpu *vcpu) {
> > +   unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> > +   u32 offset = exit_qualification & 0xfff;
> > +
> > +   /* APIC-write VM exit is trap-like and thus no need to adjust IP */
> > +   return kvm_apic_write_nodecode(vcpu, offset) == 0; }
> 
> Return 1 here means exit to userspace.  This will go crazy.
> 
> You need to return 0 always.  If this is an msr write to a read-only 
> register, you
> need to inject a #GP (IIRC).
Oks.
> 
> 
> 
> --
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5]KVM:x86, apicv: add virtual x2apic support

2012-09-04 Thread Li, Jiongxi
basically to benefit from apicv, we need clear MSR bitmap for
corresponding x2apic MSRs:
0x800 - 0x8ff: no read intercept for apicv register virtualization
TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery

Signed-off-by: Kevin Tian 
Signed-off-by: Jiongxi Li 
---
 arch/x86/kvm/vmx.c |   46 +++---
 1 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 73ff537..2db1ddc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3724,7 +3724,9 @@ static void free_vpid(struct vcpu_vmx *vmx)
spin_unlock(&vmx_vpid_lock);
 }
 
-static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
+#define MSR_TYPE_R 1
+#define MSR_TYPE_W 2
+static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 
msr, int type)
 {
int f = sizeof(unsigned long);
 
@@ -3737,20 +3739,38 @@ static void __vmx_disable_intercept_for_msr(unsigned 
long *msr_bitmap, u32 msr)
 * We can control MSRs 0x-0x1fff and 0xc000-0xc0001fff.
 */
if (msr <= 0x1fff) {
-   __clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */
-   __clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low */
+   if (type & MSR_TYPE_R)
+   __clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */
+   if (type & MSR_TYPE_W)
+   __clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low 
*/
} else if ((msr >= 0xc000) && (msr <= 0xc0001fff)) {
msr &= 0x1fff;
-   __clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high */
-   __clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high */
+   if (type & MSR_TYPE_R)
+   __clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high 
*/
+   if (type & MSR_TYPE_W)
+   __clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high 
*/
}
 }
 
 static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
 {
if (!longmode_only)
-   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr);
-   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr);
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr, 
MSR_TYPE_R | MSR_TYPE_W);
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr, 
MSR_TYPE_R | MSR_TYPE_W);
+}
+
+static void vmx_disable_intercept_for_msr_read(u32 msr, bool longmode_only)
+{
+   if (!longmode_only)
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr, 
MSR_TYPE_R);
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr, 
MSR_TYPE_R);
+}
+
+static void vmx_disable_intercept_for_msr_write(u32 msr, bool longmode_only)
+{
+   if (!longmode_only)
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr, 
MSR_TYPE_W);
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr, 
MSR_TYPE_W);
 }
 
 /*
@@ -7524,6 +7544,18 @@ static int __init vmx_init(void)
vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
 
+   if (enable_apicv_reg) {
+   int msr;
+   for (msr = 0x800; msr <= 0x8ff; msr++)
+   vmx_disable_intercept_for_msr_read(msr, false);
+   }
+
+   if (enable_apicv_vid) {
+   vmx_disable_intercept_for_msr_write(0x808, false); // TPR
+   vmx_disable_intercept_for_msr_write(0x80b, false); // EOI
+   vmx_disable_intercept_for_msr_write(0x83f, false); // SELF-IPI
+   }
+
if (enable_ept) {
kvm_mmu_set_mask_ptes(0ull,
(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5]KVM:x86, apicv: add interface for poking EOI exit bitmap

2012-09-04 Thread Li, Jiongxi
With APICv virtual interrupt delivery feature, EOI write from non
root mode doesn't cause VM-Exit unless set in EOI exit bitmap VMCS
field. Basically there're two methods to manipulate EOI exit bitmap:

[Option 1]
Ideally only level triggered irq requires a hook in vLAPIC EOI write,
so that vIOAPIC EOI is triggered and emulated. So the simplest
approach is to manipulate EOI exit bitmap when vLAPIC acks a new
interrupt, based on value of TMR. There're several corner cases
worthy of note though:

  - KVM has specific notifier hooks on vIOAPIC EOI path. So far two
sources use it: INT-based device passthrough and PIT pending
timers. For the former, it's virtually wired to vIOAPIC and
thus TMR already covers it. PIT is special here, which is an
edge triggered source. But since other timer sources like
vLAPIC timer don't require this notifier hook, possibly PIT
can be relaxed in the future too.

  - posted interrupt will update TMR directly, w/o chance for KVM
to update EOI exit bitmap accordingly. This becomes a gap

[Option 2]
Indicate EOI exit bitmap requirement ('need_eoi') directly from
every interrupt source device, and then check this requirement
when vLAPIC acks a new pending interrupt. This requires more
intrusive changes to current vLAPIC/vIOAPIC logic, so that the
"irq_source_id" indicating source of interrupt is passed through
from origination point to vLAPIC ack point. For natual requirement
like vIOAPIC level triggered entries, it can be implicitly deduced.
On the other hand for non-natural requirements like aformentioned
PIT or posted interrupt, this approach can handle it efficiently.

For simplicity reason, now option 1 is used which should be
enough to test MSI-based device passthrough.

Signed-off-by: Kevin Tian 
Signed-off-by: Jiongxi Li 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/lapic.c|7 ++-
 arch/x86/kvm/vmx.c  |   37 +
 3 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ef74df5..4e06a82 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -671,6 +671,7 @@ struct kvm_x86_ops {
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
void (*update_irq)(struct kvm_vcpu *vcpu);
+   void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, int 
need_eoi);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d203501..4058384 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -499,8 +499,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
if (trig_mode) {
apic_debug("level trig mode for vector %d", vector);
apic_set_vector(vector, apic->regs + APIC_TMR);
-   } else
+   if (kvm_apic_vid_enabled(vcpu))
+   kvm_x86_ops->set_eoi_exitmap(vcpu, vector, 1);
+   } else {
apic_clear_vector(vector, apic->regs + APIC_TMR);
+   if (kvm_apic_vid_enabled(vcpu))
+   kvm_x86_ops->set_eoi_exitmap(vcpu, vector, 0);
+   }
 
result = !apic_test_and_set_irr(vector, apic);
trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 424a09d..73ff537 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -433,6 +433,7 @@ struct vcpu_vmx {
 
bool rdtscp_enabled;
 
+   u32 eoi_exitmap_changed;
u64 eoi_exit_bitmap[4];
 
/* Support for a guest hypervisor (nested VMX) */
@@ -6128,6 +6129,7 @@ static void vmx_update_irq(struct kvm_vcpu *vcpu)
u16 status;
u8 old;
int vector;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
 
vector = kvm_apic_get_highest_irr(vcpu);
if (vector == -1)
@@ -6140,6 +6142,40 @@ static void vmx_update_irq(struct kvm_vcpu *vcpu)
status |= (u8)vector;
vmcs_write16(GUEST_INTR_STATUS, status);
}
+
+   if (vmx->eoi_exitmap_changed) {
+#define UPDATE_EOI_EXITMAP(v, e) { \
+   if (test_and_clear_bit(e, (void *)&(v)->eoi_exitmap_changed))   \
+   vmcs_write64(EOI_EXIT_BITMAP##e, (v)->eoi_exit_bitmap[e]);}
+
+   UPDATE_EOI_EXITMAP(vmx, 0);
+   UPDATE_EOI_EXITMAP(vmx, 1);
+   UPDATE_EOI_EXITMAP(vmx, 2);
+   UPDATE_EOI_EXITMAP(vmx, 3);
+   }
+}
+
+static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu,
+   int vector,
+ 

[PATCH 3/5]KVM:x86, apicv: enable virtual interrupt delivery for VMX

2012-09-04 Thread Li, Jiongxi
- before returning to guest, RVI should be updated if any pending IRRs
- EOI exit bitmap controls whether an EOI write should cause VM-Exit.
  if set, a trap-like induced EOI VM-Exit is triggered. Keep all the
  bitmaps cleared for now, which should be enough to allow a MSI based
  device passthrough

Signed-off-by: Kevin Tian 
Signed-off-by: Jiongxi Li 
---
 arch/x86/include/asm/vmx.h |   11 
 arch/x86/kvm/lapic.c   |   22 +++-
 arch/x86/kvm/lapic.h   |1 +
 arch/x86/kvm/vmx.c |   62 ++-
 4 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 4a8193e..b1eca96 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -60,6 +60,7 @@
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
 #define SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100
+#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY0x0200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x0400
 #define SECONDARY_EXEC_ENABLE_INVPCID  0x1000
 
@@ -97,6 +98,7 @@ enum vmcs_field {
GUEST_GS_SELECTOR   = 0x080a,
GUEST_LDTR_SELECTOR = 0x080c,
GUEST_TR_SELECTOR   = 0x080e,
+   GUEST_INTR_STATUS   = 0x0810,
HOST_ES_SELECTOR= 0x0c00,
HOST_CS_SELECTOR= 0x0c02,
HOST_SS_SELECTOR= 0x0c04,
@@ -124,6 +126,14 @@ enum vmcs_field {
APIC_ACCESS_ADDR_HIGH   = 0x2015,
EPT_POINTER = 0x201a,
EPT_POINTER_HIGH= 0x201b,
+   EOI_EXIT_BITMAP0= 0x201c,
+   EOI_EXIT_BITMAP0_HIGH   = 0x201d,
+   EOI_EXIT_BITMAP1= 0x201e,
+   EOI_EXIT_BITMAP1_HIGH   = 0x201f,
+   EOI_EXIT_BITMAP2= 0x2020,
+   EOI_EXIT_BITMAP2_HIGH   = 0x2021,
+   EOI_EXIT_BITMAP3= 0x2022,
+   EOI_EXIT_BITMAP3_HIGH   = 0x2023,
GUEST_PHYSICAL_ADDRESS  = 0x2400,
GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401,
VMCS_LINK_POINTER   = 0x2800,
@@ -279,6 +289,7 @@ enum vmcs_field {
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS 44
+#define EXIT_REASON_EOI_INDUCED 45
 #define EXIT_REASON_EPT_VIOLATION   48
 #define EXIT_REASON_EPT_MISCONFIG   49
 #define EXIT_REASON_WBINVD 54
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c47f3d3..d203501 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -604,7 +604,27 @@ static int apic_set_eoi(struct kvm_lapic *apic)
return vector;
 }
 
-static void apic_send_ipi(struct kvm_lapic *apic)
+/*
+ * this interface assumes a trap-like exit, which has already finished
+ * desired side effect including vISR and vPPR update.
+ */
+void kvm_apic_set_eoi(struct kvm_vcpu *vcpu, int vector)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   int trigger_mode;
+
+   if (apic_test_and_clear_vector(vector, apic->regs + APIC_TMR))
+   trigger_mode = IOAPIC_LEVEL_TRIG;
+   else
+   trigger_mode = IOAPIC_EDGE_TRIG;
+
+   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
+   kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
+   kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_set_eoi);
+
+ static void apic_send_ipi(struct kvm_lapic *apic)
 {
u32 icr_low = apic_get_reg(apic, APIC_ICR);
u32 icr_high = apic_get_reg(apic, APIC_ICR2);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4e3b435..585337f 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -60,6 +60,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
 
 int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+void kvm_apic_set_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4a26d04..424a09d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO);
 static bool __read_mostly enable_apicv_reg = 0;
 module_param(enable_apicv_reg, bool, S_IRUGO);
 
+static bool __read_mostly enable_apicv_vid = 0;
+module_param(enable_apicv_vid, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -430,6 +433,8 @@ struct vcpu_vmx {

[PATCH 2/5]KVM:x86, apicv: adjust for virtual interrupt delivery

2012-09-04 Thread Li, Jiongxi
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:

  - for pending interrupt, instead of direct injection, we may need
update architecture specific indicators before resuming to guest.

  - A pending interrupt, which is masked by ISR, should be also
considered in above update action, since hardware will decide
when to inject it at right time. Current has_interrupt and
get_interrupt only returns a valid vector from injection p.o.v.

Three new interfaces are introduced accordingly:
kvm_apic_get_highest_irr
kvm_cpu_has_interrupt_apicv_vid
kvm_cpu_get_interrupt_apic_vid

Signed-off-by: Kevin Tian 
Signed-off-by: Jiongxi Li 
---
 arch/x86/include/asm/kvm_host.h |2 +
 arch/x86/kvm/irq.c  |   44 +++
 arch/x86/kvm/lapic.c|   13 +++
 arch/x86/kvm/lapic.h|   10 
 arch/x86/kvm/svm.c  |6 +
 arch/x86/kvm/vmx.c  |6 +
 arch/x86/kvm/x86.c  |   22 +-
 7 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..ef74df5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -669,6 +669,8 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+   int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
+   void (*update_irq)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 7e06ba1..abd3831 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -60,6 +60,29 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
 
 /*
+ * check if there is pending interrupt without
+ * intack. This _apicv version is used when hardware
+ * supports APIC virtualization with virtual interrupt
+ * delivery support. In such case, KVM is not required
+ * to poll pending APIC interrupt, and thus this
+ * interface is used to poll pending interupts from
+ * non-APIC source.
+ */
+int kvm_cpu_has_interrupt_apic_vid(struct kvm_vcpu *v)
+{
+   struct kvm_pic *s;
+
+   if (!irqchip_in_kernel(v->kvm))
+   return v->arch.interrupt.pending;
+
+   if (kvm_apic_accept_pic_intr(v)) {
+   s = pic_irqchip(v->kvm);/* PIC */
+   return s->output;
+   } else
+   return 0;
+}
+
+/*
  * Read pending interrupt vector and intack.
  */
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
@@ -82,6 +105,27 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 
+/*
+ * Read pending interrupt vector and intack.
+ * Similar to kvm_cpu_has_interrupt_apicv, to get
+ * interrupts from non-APIC sources.
+ */
+int kvm_cpu_get_interrupt_apic_vid(struct kvm_vcpu *v)
+{
+   struct kvm_pic *s;
+   int vector = -1;
+
+   if (!irqchip_in_kernel(v->kvm))
+   return v->arch.interrupt.nr;
+
+   if (kvm_apic_accept_pic_intr(v)) {
+   s = pic_irqchip(v->kvm);
+   s->output = 0;  /* PIC */
+   vector = kvm_pic_read_irq(v->kvm);
+   }
+   return vector;
+}
+
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
kvm_inject_apic_timer_irqs(vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4a6d3a4..c47f3d3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1310,6 +1310,8 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
kvm_lapic_reset(vcpu);
kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
 
+   if (kvm_x86_ops->has_virtual_interrupt_delivery(vcpu))
+   apic->vid_enabled = true;
return 0;
 nomem_free_apic:
kfree(apic);
@@ -1333,6 +1335,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
return highest_irr;
 }
 
+int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
+{
+struct kvm_lapic *apic = vcpu->arch.apic;
+
+if (!apic || !apic_enabled(apic))
+return -1;
+
+return apic_find_highest_irr(apic);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr);
+
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 {
u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index cd4875e..4e3b435 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -13,6 +13,7 @@ struct kvm_lapic {
u32 divide_count;
struct kvm_vcpu *vcpu;
 

[PATCH 1/5]KVM: x86, apicv: add APICv register virtualization support

2012-09-04 Thread Li, Jiongxi
- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like

Signed-off-by: Kevin Tian 
Signed-off-by: Jiongxi Li 
---
 arch/x86/include/asm/vmx.h |2 ++
 arch/x86/kvm/lapic.c   |   16 
 arch/x86/kvm/lapic.h   |2 ++
 arch/x86/kvm/vmx.c |   30 ++
 4 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 74fcb96..4a8193e 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -59,6 +59,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID  0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
+#define SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x0400
 #define SECONDARY_EXEC_ENABLE_INVPCID  0x1000
 
@@ -282,6 +283,7 @@ enum vmcs_field {
 #define EXIT_REASON_EPT_MISCONFIG   49
 #define EXIT_REASON_WBINVD 54
 #define EXIT_REASON_XSETBV 55
+#define EXIT_REASON_APIC_WRITE 56
 #define EXIT_REASON_INVPCID58
 
 /*
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ce87878..4a6d3a4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1037,6 +1037,22 @@ static int apic_mmio_write(struct kvm_io_device *this,
return 0;
 }
 
+/* emulate APIC access in a trap manner */
+int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
+{
+   u32 val;
+
+   /* hw has done the conditional check and inst decode */
+   offset &= 0xff0;
+   if ((offset != APIC_EOI) &&
+apic_reg_read(vcpu->arch.apic, offset, 4, &val))
+   return 1;
+
+   /* TODO: optimize to just emulate side effect w/o one more write */
+   return apic_reg_write(vcpu->arch.apic, offset, val);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
+
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4af5405..cd4875e 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -55,6 +55,8 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
 
+int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c00f03d..3d92277 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -83,6 +83,9 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
+static bool __read_mostly enable_apicv_reg = 0;
+module_param(enable_apicv_reg, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -760,6 +763,12 @@ static inline bool 
cpu_has_vmx_virtualize_apic_accesses(void)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 }
 
+static inline bool cpu_has_vmx_apic_register_virt(void)
+{
+   return vmcs_config.cpu_based_2nd_exec_ctrl &
+   SECONDARY_EXEC_APIC_REGISTER_VIRT;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
return cpu_has_vmx_tpr_shadow() &&
@@ -2475,6 +2484,7 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
SECONDARY_EXEC_UNRESTRICTED_GUEST |
SECONDARY_EXEC_PAUSE_LOOP_EXITING |
SECONDARY_EXEC_RDTSCP |
+   SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_ENABLE_INVPCID;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
@@ -2486,6 +2496,11 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
_cpu_based_exec_control &= ~CPU_BASED_TPR_SHADOW;
 #endif
+
+   if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
+   _cpu_based_2nd_exec_control &= ~(
+   SECONDARY_EXEC_APIC_REGISTER_VIRT);
+
if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
   enabled */
@@ -2683,6 +2698,9 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_ple())
ple_gap = 0;
 
+   if (!cpu_has_vmx_apic_register_virt())
+   enable_apicv_reg = 0;
+
if (nested)
nested_vmx_setup_ctls_msrs();
 
@@ -3812,6 +3830,8 @@ static u32 vmx_s

[PATCH 0/5]KVM:Enable APIC-Register Virtualization and Virtual-interrupt delivery

2012-09-04 Thread Li, Jiongxi
The VMCS includes controls that enable the virtualization of interrupts and the 
Advanced Programmable Interrupt Controller (APIC).
When these controls are used, the processor will emulate many accesses to the 
APIC, track the state of the virtual APIC, and deliver virtual interrupts - all 
in VMX non-root operation without a VM exit.
You can refer to Chapter 29 of the latest SDM.

APICv support in KVM is split into 5 patches:
  0001-x86-apicv-add-APICv-register-virtualization-support.patch - enable APICv 
register virtualization
  0002-x86-apicv-adjust-for-virtual-interrupt-delivery.patch - add basic KVM 
frameowrk for virtual interrupt delivery
  0003-x86-apicv-enable-virtual-interrupt-delivery-for-VMX.patch - enable APICv 
virtual interrupt delivery
  0004-x86-apicv-add-interface-for-poking-EOI-exit-bitmap.patch - EOI exit 
bitmap handling
  0005-x86-apicv-add-virtual-x2apic-support.patch - handle MSR style in virtual 
x2apic

Apply them in above order
APICv is disabled by default, and use below command to enable it:
modprobe enable_apicv_reg=1 enable_apicv_vid=1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html