Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-28 Thread Paolo Bonzini
Il 29/08/2014 02:13, Andy Lutomirski ha scritto:
> Hmm.  Then, assuming that someone manages to allocate a
> cross-hypervisor MSR number for this, what am I supposed to do in the
> KVM code?  Just make it available unconditionally?  I don't see why
> that wouldn't work reliably, but it seems like an odd design.

The odd part of it is what Gleb mentioned.

> Also, the one and only native feature flag I tested (rdtscp) actually
> does work: RDTSCP seems to send #UD if QEMU is passed -cpu
> host,-rdtscp.

True, and I'm not sure why.  There are a couple others.  I was thinking
more of things like SSE, AVX or DE (that affects the availability of a
bit in CR4).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set

2014-08-28 Thread Jason Wang
On 08/29/2014 12:07 PM, Zhang, Yang Z wrote:
> Zhang Haoyu wrote on 2014-08-29:
>> > Hi, Yang, Gleb, Michael,
>> > Could you help review below patch please?
> I don't quite understand the background. Why ioacpi->irr is setting before 
> EOI? It should be driver's responsibility to clear the interrupt before 
> issuing EOI.
>

This may happen when a interrupt was injected to guest when its irq
handler (driver) was not registered. So irr was still set even during
EOI broadcast, and then this irq will be injected to guest immediately.
This may cause a dead loop for guest who does not have the ability to
detect and disable interrupt storm.

This may be a bug of device model, but we want to know in real cpu, is
there a small time gap between the finish of EOI broadcast and the
interrupt raised by EOI? If yes, looks like KVM should emulate this
behaviour?

Thanks
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set

2014-08-28 Thread Zhang Haoyu
Hi, Yang, Gleb, Michael,
Could you help review below patch please?

Thanks,
Zhang Haoyu

>> Hi Jason,
>> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
>> But I am going to make a bit change on it, could you help review it?
>>
>>> Currently, we call ioapic_service() immediately when we find the irq is 
>>> still
>>> active during eoi broadcast. But for real hardware, there's some dealy 
>>> between
>>> the EOI writing and irq delivery (system bus latency?). So we need to 
>>> emulate
>>> this behavior. Otherwise, for a guest who haven't register a proper irq 
>>> handler
>>> , it would stay in the interrupt routine as this irq would be re-injected
>>> immediately after guest enables interrupt. This would lead guest can't move
>>> forward and may miss the possibility to get proper irq handler registered 
>>> (one
>>> example is windows guest resuming from hibernation).
>>>
>>> As there's no way to differ the unhandled irq from new raised ones, this 
>>> patch
>>> solve this problems by scheduling a delayed work when the count of irq 
>>> injected
>>> during eoi broadcast exceeds a threshold value. After this patch, the guest 
>>> can
>>> move a little forward when there's no suitable irq handler in case it may
>>> register one very soon and for guest who has a bad irq detection routine ( 
>>> such
>>> as note_interrupt() in linux ), this bad irq would be recognized soon as in 
>>> the
>>> past.
>>>
>>> Signed-off-by: Jason Wang  redhat.com>
>>> ---
>>> virt/kvm/ioapic.c |   47 +--
>>> virt/kvm/ioapic.h |2 ++
>>> 2 files changed, 47 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>> index dcaf272..892253e 100644
>>> --- a/virt/kvm/ioapic.c
>>> +++ b/virt/kvm/ioapic.c
>>> -221,6 +221,24  int kvm_ioapic_set_irq(struct 
>>> kvm_ioapic *ioapic, int irq, int level)
>>> return ret;
>>> }
>>>
>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>>> +{
>>> +   int i, ret;
>>> +   struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>>> +eoi_inject.work);
>>> +   spin_lock(&ioapic->lock);
>>> +   for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>>> +   union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>>> +
>>> +   if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>>> +   continue;
>>> +
>>> +   if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>>> +   ret = ioapic_service(ioapic, i);
>>> +   }
>>> +   spin_unlock(&ioapic->lock);
>>> +}
>>> +
>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>>  int trigger_mode)
>>> {
>>> -249,8 +267,29  static void 
>>> __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>>
>>> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>> ent->fields.remote_irr = 0;
>>> -   if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>>> -   ioapic_service(ioapic, i);
>>> +   if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>>> +   ++ioapic->irq_eoi;
>> -+   ++ioapic->irq_eoi;
>> ++   ++ioapic->irq_eoi[i];
>>> +   if (ioapic->irq_eoi == 100) {
>> -+   if (ioapic->irq_eoi == 100) {
>> ++   if (ioapic->irq_eoi[i] == 100) {
>>> +   /*
>>> +* Real hardware does not deliver the irq so
>>> +* immediately during eoi broadcast, so we need
>>> +* to emulate this behavior. Otherwise, for
>>> +* guests who has not registered handler of a
>>> +* level irq, this irq would be injected
>>> +* immediately after guest enables interrupt
>>> +* (which happens usually at the end of the
>>> +* common interrupt routine). This would lead
>>> +* guest can't move forward and may miss the
>>> +* possibility to get proper irq handler
>>> +* registered. So we need to give some breath to
>>> +* guest. TODO: 1 is too long?
>>> +*/
>>> +   schedule_delayed_work(&ioapic->eoi_inject, 1);
>>> +   ioapic->irq_eoi = 0;
>> -+   ioapic->irq_eoi = 0;
>> ++   ioapic->irq_eoi[i] = 0;
>>> +   } else {
>>> +   ioapic_service(ioapic, i);
>>> +   }
>>> +   }
>> ++   else {
>> ++   ioapic->irq_eoi[i] = 0;
>> ++   }
>>> }
>>> }
>> I think ioapic->irq_eoi is prone to reach to 100, beca

Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr bit always set

2014-08-28 Thread Jason Wang
On 08/28/2014 08:55 PM, Zhang Haoyu wrote:
> Hi Jason,
> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
> But I am going to make a bit change on it, could you help review it?
>
>> Currently, we call ioapic_service() immediately when we find the irq is still
>> active during eoi broadcast. But for real hardware, there's some dealy 
>> between
>> the EOI writing and irq delivery (system bus latency?). So we need to emulate
>> this behavior. Otherwise, for a guest who haven't register a proper irq 
>> handler
>> , it would stay in the interrupt routine as this irq would be re-injected
>> immediately after guest enables interrupt. This would lead guest can't move
>> forward and may miss the possibility to get proper irq handler registered 
>> (one
>> example is windows guest resuming from hibernation).
>>
>> As there's no way to differ the unhandled irq from new raised ones, this 
>> patch
>> solve this problems by scheduling a delayed work when the count of irq 
>> injected
>> during eoi broadcast exceeds a threshold value. After this patch, the guest 
>> can
>> move a little forward when there's no suitable irq handler in case it may
>> register one very soon and for guest who has a bad irq detection routine ( 
>> such
>> as note_interrupt() in linux ), this bad irq would be recognized soon as in 
>> the
>> past.
>>
>> Signed-off-by: Jason Wang  redhat.com>
>> ---
>> virt/kvm/ioapic.c |   47 +--
>> virt/kvm/ioapic.h |2 ++
>> 2 files changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>> index dcaf272..892253e 100644
>> --- a/virt/kvm/ioapic.c
>> +++ b/virt/kvm/ioapic.c
>> -221,6 +221,24  int kvm_ioapic_set_irq(struct 
>> kvm_ioapic *ioapic, int irq, int level)
>>  return ret;
>> }
>>
>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>> +{
>> +int i, ret;
>> +struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>> + eoi_inject.work);
>> +spin_lock(&ioapic->lock);
>> +for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>> +union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>> +
>> +if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>> +continue;
>> +
>> +if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>> +ret = ioapic_service(ioapic, i);
>> +}
>> +spin_unlock(&ioapic->lock);
>> +}
>> +
>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>   int trigger_mode)
>> {
>> -249,8 +267,29  static void 
>> __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>
>>  ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>  ent->fields.remote_irr = 0;
>> -if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>> -ioapic_service(ioapic, i);
>> +if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>> +++ioapic->irq_eoi;
> -+++ioapic->irq_eoi;
> ++++ioapic->irq_eoi[i];
>> +if (ioapic->irq_eoi == 100) {
> -+if (ioapic->irq_eoi == 100) {
> ++if (ioapic->irq_eoi[i] == 100) {
>> +/*
>> + * Real hardware does not deliver the irq so
>> + * immediately during eoi broadcast, so we need
>> + * to emulate this behavior. Otherwise, for
>> + * guests who has not registered handler of a
>> + * level irq, this irq would be injected
>> + * immediately after guest enables interrupt
>> + * (which happens usually at the end of the
>> + * common interrupt routine). This would lead
>> + * guest can't move forward and may miss the
>> + * possibility to get proper irq handler
>> + * registered. So we need to give some breath to
>> + * guest. TODO: 1 is too long?
>> + */
>> +schedule_delayed_work(&ioapic->eoi_inject, 1);
>> +ioapic->irq_eoi = 0;
> -+ioapic->irq_eoi = 0;
> ++ioapic->irq_eoi[i] = 0;
>> +} else {
>> +ioapic_service(ioapic, i);
>> +}
>> +}
> ++else {
> ++ioapic->irq_eoi[i] = 0;
> ++}
>>  }
>> }
> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi 
> broadcast, 
> it's possible that another interrupt's (not current eoi's corresponding 
> int

RE: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it.

2014-08-28 Thread Wang, Wei W
I think we can think about it for another couple of days and see if any corner 
case is not covered.
 
Wei

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, August 28, 2014 7:01 PM
> To: Wang, Wei W; Zhang, Yang Z; kvm@vger.kernel.org
> Cc: alex.william...@redhat.com
> Subject: Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before
> loading it.
> 
> Il 28/08/2014 12:14, Wang, Wei W ha scritto:
> > We will do some more tests on it to make sure there are no problems.
> 
> No, I don't think there are any easily-detected practical problems with the
> patch.  But I'm not sure I understand all the theoretical problems and
> whether possible races are benign.  These would be really hard to debug,
> unless you get a bug that is 100% reproducible.
> 
> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-28 Thread Andy Lutomirski
On Thu, Aug 28, 2014 at 12:46 PM, Paolo Bonzini  wrote:
> Il 28/08/2014 18:22, Andy Lutomirski ha scritto:
>> Is there a non-cpuid interface between QEMU and KVM for this?
>
> No.

Hmm.  Then, assuming that someone manages to allocate a
cross-hypervisor MSR number for this, what am I supposed to do in the
KVM code?  Just make it available unconditionally?  I don't see why
that wouldn't work reliably, but it seems like an odd design.

>
>> AFAICT, even turning off cpuid bits for things like async pf doesn't
>> actually disable the MSRs (which is arguably an attack surface issue).
>
> No, it doesn't.  You cannot disable instructions even if you hide CPUID
> bits, so KVM just extends this to MSRs (both native and paravirtual). It
> sometimes helps too, for example with a particular guest OS that does
> not necessary check CPUID for bits that are always present on Apple
> hardware...

But I bet that no one assumes that KVM paravirt MSRs are available
even if the feature bit isn't set.

Also, the one and only native feature flag I tested (rdtscp) actually
does work: RDTSCP seems to send #UD if QEMU is passed -cpu
host,-rdtscp.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: x86: fix stale mmio cache bug

2014-08-28 Thread David Matlack
On Mon, Aug 18, 2014 at 3:46 PM, David Matlack  wrote:
> The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
> up to userspace:
>
> (1) Guest accesses gpa X without a memory slot. The gfn is cached in
> struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
> the SPTE write-execute-noread so that future accesses cause
> EPT_MISCONFIGs.
>
> (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
> covering the page just accessed.
>
> (3) Guest attempts to read or write to gpa X again. On Intel, this
> generates an EPT_MISCONFIG. The memory slot generation number that
> was incremented in (2) would normally take care of this but we fast
> path mmio faults through quickly_check_mmio_pf(), which only checks
> the per-vcpu mmio cache. Since we hit the cache, KVM passes a
> KVM_EXIT_MMIO up to userspace.
>
> This patch fixes the issue by using the memslot generation number
> to validate the mmio cache.
>
> [ xiaoguangrong: adjust the code to make it simpler for stable-tree fix. ]
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: David Matlack 
> Signed-off-by: Xiao Guangrong 
> ---

Paolo,
It seems like this patch ("[PATCH 2/2] kvm: x86: fix stale mmio cache")
is ready to go. Is there anything blocking it from being merged?

(It should be fine to merge this on its own, independent of the fix
discussed in "[PATCH 1/2] KVM: fix cache stale memslot info with
correct mmio generation number", https://lkml.org/lkml/2014/8/14/62.)

>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/mmu.c  |  4 ++--
>  arch/x86/kvm/mmu.h  |  2 ++
>  arch/x86/kvm/x86.h  | 21 -
>  4 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 49205d0..f518d14 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -479,6 +479,7 @@ struct kvm_vcpu_arch {
> u64 mmio_gva;
> unsigned access;
> gfn_t mmio_gfn;
> +   unsigned int mmio_gen;
>
> struct kvm_pmu pmu;
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9314678..e00fbfe 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -234,7 +234,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
> return gen;
>  }
>
> -static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
> +unsigned int kvm_current_mmio_generation(struct kvm *kvm)
>  {
> /*
>  * Init kvm generation close to MMIO_MAX_GEN to easily test the
> @@ -3163,7 +3163,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
> if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
> return;
>
> -   vcpu_clear_mmio_info(vcpu, ~0ul);
> +   vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
> kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
> if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
> hpa_t root = vcpu->arch.mmu.root_hpa;
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index b982112..e2d902a 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -76,6 +76,8 @@ enum {
>  };
>
>  int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
> direct);
> +unsigned int kvm_current_mmio_generation(struct kvm *kvm);
> +
>  void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
>  void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
> bool execonly);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 8c97bac..ae7006d 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -3,6 +3,7 @@
>
>  #include 
>  #include "kvm_cache_regs.h"
> +#include "mmu.h"
>
>  static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
>  {
> @@ -78,15 +79,23 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu 
> *vcpu,
> vcpu->arch.mmio_gva = gva & PAGE_MASK;
> vcpu->arch.access = access;
> vcpu->arch.mmio_gfn = gfn;
> +   vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm);
> +}
> +
> +static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
> +{
> +   return vcpu->arch.mmio_gen == kvm_current_mmio_generation(vcpu->kvm);
>  }
>
>  /*
> - * Clear the mmio cache info for the given gva,
> - * specially, if gva is ~0ul, we clear all mmio cache info.
> + * Clear the mmio cache info for the given gva. If gva is MMIO_GVA_ANY, we
> + * clear all mmio cache info.
>   */
> +#define MMIO_GVA_ANY (~(gva_t)0)
> +
>  static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
>  {
> -   if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
> +   if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
> return;
>
> vcpu->arch.mmio_gva = 0;
> @@ -94,7 +103,8 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu 
> *vcpu, gva_t gva)
>
>  static inline bool vcpu_match_mmio_gva(struct k

Re: [PATCH v10 0/6] arm: dirty page logging support for ARMv7

2014-08-28 Thread Mario Smarduch
On 08/28/2014 12:26 PM, Joel Schopp wrote:
> 
> On 08/26/2014 06:51 PM, Mario Smarduch wrote:
>> This patch adds support for ARMv7 dirty page logging. Some functions
>> of dirty
>> page logging have been split to generic and arch specific
>> implementations,
>> details below. Dirty page logging is one of serveral features required
>> for
>> live migration, live migration has been tested for ARMv7.
> Any reason not to cc the kvm arm list?  I almost missed this patch set. 
> kvm...@lists.cs.columbia.edu

All previous series were copied to kvm arm list.

This time around email sever setup caused some problems,
including this and few others.

Sorry about that.

- Mario
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-28 Thread Paolo Bonzini
Il 28/08/2014 18:22, Andy Lutomirski ha scritto:
> Is there a non-cpuid interface between QEMU and KVM for this?

No.

> AFAICT, even turning off cpuid bits for things like async pf doesn't
> actually disable the MSRs (which is arguably an attack surface issue).

No, it doesn't.  You cannot disable instructions even if you hide CPUID
bits, so KVM just extends this to MSRs (both native and paravirtual). It
sometimes helps too, for example with a particular guest OS that does
not necessary check CPUID for bits that are always present on Apple
hardware...

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 0/6] arm: dirty page logging support for ARMv7

2014-08-28 Thread Joel Schopp


On 08/26/2014 06:51 PM, Mario Smarduch wrote:

This patch adds support for ARMv7 dirty page logging. Some functions of dirty
page logging have been split to generic and arch specific implementations,
details below. Dirty page logging is one of serveral features required for
live migration, live migration has been tested for ARMv7.
Any reason not to cc the kvm arm list?  I almost missed this patch set.  
kvm...@lists.cs.columbia.edu

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] x86: Check level, xlevel before returning CPUID data

2014-08-28 Thread Eduardo Habkost
None of the existing code using cpuid checks level or xlevel before
running it. Instead of changing all callers, make the cpuid() function
check if the requested leaf is available, before returning any data.

All existing callers of cpuid() and cpuid_indexed() are checks for the
presence of feature bits, so it is safe to return all zeroes when the
requested CPUID leaf is not available.

Signed-off-by: Eduardo Habkost 
---
 lib/x86/processor.h | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index d4e295b..7973879 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -280,7 +280,7 @@ static inline ulong read_dr7(void)
 
 struct cpuid { u32 a, b, c, d; };
 
-static inline struct cpuid cpuid_indexed(u32 function, u32 index)
+static inline struct cpuid raw_cpuid(u32 function, u32 index)
 {
 struct cpuid r;
 asm volatile ("cpuid"
@@ -289,6 +289,14 @@ static inline struct cpuid cpuid_indexed(u32 function, u32 
index)
 return r;
 }
 
+static inline struct cpuid cpuid_indexed(u32 function, u32 index)
+{
+u32 level = raw_cpuid(function & 0xf000, 0).a;
+if (level < function)
+return (struct cpuid) { 0, 0, 0, 0 };
+return raw_cpuid(function, index);
+}
+
 static inline struct cpuid cpuid(u32 function)
 {
 return cpuid_indexed(function, 0);
@@ -296,9 +304,9 @@ static inline struct cpuid cpuid(u32 function)
 
 static inline u8 cpuid_maxphyaddr(void)
 {
-if (cpuid(0x8000).a < 0x8008)
+if (raw_cpuid(0x8000, 0).a < 0x8008)
 return 36;
-return cpuid(0x8008).a & 0xff;
+return raw_cpuid(0x8008, 0).a & 0xff;
 }
 
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] kvm-unit-tests: Check CPUID level/xlevel before using CPUID data

2014-08-28 Thread Eduardo Habkost
Change the kvm-unit-tests x86 code to always check CPUID level/xlevel before
looking at CPUID data. Otherwise, the test code will be looking at bogus data.

Eduardo Habkost (3):
  x86: apic: Look up MAXPHYADDR on CPUID correctly
  x86: vmx: Use cpuid_maxphyaddr()
  x86: Check level, xlevel before returning CPUID data

 lib/x86/processor.h | 18 +-
 x86/apic.c  |  2 +-
 x86/vmx.c   |  6 +++---
 3 files changed, 21 insertions(+), 5 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] x86: vmx: Use cpuid_maxphyaddr()

2014-08-28 Thread Eduardo Habkost
The vmx test code calls cpuid(0x8008) without checking xlevel first.
Change it to use cpuid_maxphyaddr() instead.

Signed-off-by: Eduardo Habkost 
---
 x86/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index ef9caca..132e5f0 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -161,7 +161,7 @@ void print_vmexit_info()
 static void test_vmclear(void)
 {
struct vmcs *tmp_root;
-   int width = cpuid(0x8008).a & 0xff;
+   int width = cpuid_maxphyaddr();
 
/*
 * Note- The tests below do not necessarily have a
@@ -652,7 +652,7 @@ static int test_vmxon(void)
 {
int ret, ret1;
u64 *tmp_region = vmxon_region;
-   int width = cpuid(0x8008).a & 0xff;
+   int width = cpuid_maxphyaddr();
 
/* Unaligned page access */
vmxon_region = (u64 *)((intptr_t)vmxon_region + 1);
@@ -694,7 +694,7 @@ out:
 static void test_vmptrld(void)
 {
struct vmcs *vmcs, *tmp_root;
-   int width = cpuid(0x8008).a & 0xff;
+   int width = cpuid_maxphyaddr();
 
vmcs = alloc_page();
vmcs->revision_id = basic.revision;
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] x86: apic: Look up MAXPHYADDR on CPUID correctly

2014-08-28 Thread Eduardo Habkost
When the CPUID xlevel on QEMU is < 0x8008, we get the following:

$ ./x86-run x86/apic.flat -smp 2 -cpu qemu64,xlevel=0x8007
[...]
FAIL: apicbase: reserved physaddr bits

That happens because CPUID[0x8008].EAX won't have the expected data
if xlevel < 0x8008.

When the CPUID physical address bits information is not available on CPUID,
assume it is 36, as documented on Intel SDM, Volume 3A, section
10.4.4 "Local APIC Status and Location":

> Bits 0 through 7, bits 9 and 10, and bits MAXPHYADDR[1] through 63 in the
> IA32_APIC_BASE MSR are reserved.
>
> [1] The MAXPHYADDR is 36 bits for processors that do not support CPUID
> leaf 8008H, or indicated by CPUID.8008H:EAX[bits 7:0] for
> processors that support CPUID leaf 8008H."

Signed-off-by: Eduardo Habkost 
---
 lib/x86/processor.h | 8 
 x86/apic.c  | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index f5f1c82..d4e295b 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -294,6 +294,14 @@ static inline struct cpuid cpuid(u32 function)
 return cpuid_indexed(function, 0);
 }
 
+static inline u8 cpuid_maxphyaddr(void)
+{
+if (cpuid(0x8000).a < 0x8008)
+return 36;
+return cpuid(0x8008).a & 0xff;
+}
+
+
 static inline void pause(void)
 {
 asm volatile ("pause");
diff --git a/x86/apic.c b/x86/apic.c
index 3f463a5..2619d85 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -124,7 +124,7 @@ static void test_apicbase(void)
 report("relocate apic",
*(volatile u32 *)(ALTERNATE_APIC_BASE + APIC_LVR) == lvr);
 
-value = orig_apicbase | (1UL << (cpuid(0x8008).a & 0xff));
+value = orig_apicbase | (1UL << cpuid_maxphyaddr());
 report("apicbase: reserved physaddr bits",
test_for_exception(GP_VECTOR, do_write_apicbase, &value));
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-28 Thread Andy Lutomirski
On Aug 28, 2014 7:17 AM, "Gleb Natapov"  wrote:
>
> On Tue, Aug 26, 2014 at 04:58:34PM -0700, Andy Lutomirski wrote:
> > hpa pointed out that the ABI that I chose (an MSR from the KVM range
> > and a KVM cpuid bit) is unnecessarily KVM-specific.  It would be nice
> > to allocate an MSR that everyone involved can agree on and, rather
> > than relying on a cpuid bit, just have the guest probe for the MSR.
> >
> CPUID part allows feature to be disabled for machine compatibility purpose
> during migration. Of course interface can explicitly state that one successful
> use of the MSR does not mean that next use will not result in a #GP, but that
> doesn't sound very elegant and is different from any other MSR out there.
>

Is there a non-cpuid interface between QEMU and KVM for this?

AFAICT, even turning off cpuid bits for things like async pf doesn't
actually disable the MSRs (which is arguably an attack surface issue).

--Andy

> --
> Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-28 Thread Gleb Natapov
On Tue, Aug 26, 2014 at 04:58:34PM -0700, Andy Lutomirski wrote:
> hpa pointed out that the ABI that I chose (an MSR from the KVM range
> and a KVM cpuid bit) is unnecessarily KVM-specific.  It would be nice
> to allocate an MSR that everyone involved can agree on and, rather
> than relying on a cpuid bit, just have the guest probe for the MSR.
> 
CPUID part allows feature to be disabled for machine compatibility purpose
during migration. Of course interface can explicitly state that one successful
use of the MSR does not mean that next use will not result in a #GP, but that
doesn't sound very elegant and is different from any other MSR out there.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: remove garbage arg to *hardware_{en,dis}able

2014-08-28 Thread Radim Krčmář
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.

Signed-off-by: Radim Krčmář 
---
 arch/arm/include/asm/kvm_host.h |  2 +-
 arch/arm/kvm/arm.c  |  2 +-
 arch/arm64/include/asm/kvm_host.h   |  2 +-
 arch/ia64/kvm/kvm-ia64.c|  4 ++--
 arch/mips/include/asm/kvm_host.h|  2 +-
 arch/mips/kvm/mips.c|  2 +-
 arch/powerpc/include/asm/kvm_host.h |  2 +-
 arch/powerpc/kvm/powerpc.c  |  2 +-
 arch/s390/include/asm/kvm_host.h|  2 +-
 arch/s390/kvm/kvm-s390.c|  2 +-
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/svm.c  |  4 ++--
 arch/x86/kvm/vmx.c  |  4 ++--
 arch/x86/kvm/x86.c  | 12 ++--
 include/linux/kvm_host.h|  4 ++--
 virt/kvm/kvm_main.c |  4 ++--
 16 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 84291fe..278ecfa 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -233,7 +233,7 @@ static inline void vgic_arch_setup(const struct vgic_params 
*vgic)
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
-static inline void kvm_arch_hardware_disable(void *garbage) {}
+static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 132bb0d..005a7b5 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -87,7 +87,7 @@ struct kvm_vcpu __percpu **kvm_get_running_vcpus(void)
return &kvm_arm_running_vcpu;
 }
 
-int kvm_arch_hardware_enable(void *garbage)
+int kvm_arch_hardware_enable(void)
 {
return 0;
 }
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 94d8a3c..3816c52 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -244,7 +244,7 @@ static inline void vgic_arch_setup(const struct vgic_params 
*vgic)
}
 }
 
-static inline void kvm_arch_hardware_disable(void *garbage) {}
+static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 5e14dca..ec6b9ac 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -125,7 +125,7 @@ long ia64_pal_vp_create(u64 *vpd, u64 *host_iva, u64 
*opt_handler)
 
 static  DEFINE_SPINLOCK(vp_lock);
 
-int kvm_arch_hardware_enable(void *garbage)
+int kvm_arch_hardware_enable(void)
 {
long  status;
long  tmp_base;
@@ -160,7 +160,7 @@ int kvm_arch_hardware_enable(void *garbage)
return 0;
 }
 
-void kvm_arch_hardware_disable(void *garbage)
+void kvm_arch_hardware_disable(void)
 {
 
long status;
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index b4d900a..2f63ca2 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,7 +767,7 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
-static inline void kvm_arch_hardware_disable(void *garbage) {}
+static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 0ec7490..e3b21e5 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -77,7 +77,7 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return 1;
 }
 
-int kvm_arch_hardware_enable(void *garbage)
+int kvm_arch_hardware_enable(void)
 {
return 0;
 }
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 0e646c1..b00113b 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -687,7 +687,7 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
-static inline void kvm_arch_hardware_disable(void *garbage) {}
+static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm) {}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 72c3fc0..da50523 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powe

[PATCH 0/2] KVM: minor cleanup and optimizations

2014-08-28 Thread Radim Krčmář
The first patch answers a demand for inline arch functions.
(There is a lot of constant functions that could be inlined as well.)

Second patch digs a bit into the history of KVM and removes a useless
argument that seemed suspicious when preparing the first patch.


Radim Krčmář (2):
  KVM: static inline empty kvm_arch functions
  KVM: remove garbage arg to *hardware_{en,dis}able

 arch/arm/include/asm/kvm_host.h |  6 +
 arch/arm/kvm/arm.c  | 21 +
 arch/arm64/include/asm/kvm_host.h   |  6 +
 arch/ia64/include/asm/kvm_host.h| 12 ++
 arch/ia64/kvm/kvm-ia64.c| 34 ++--
 arch/mips/include/asm/kvm_host.h| 11 +
 arch/mips/kvm/mips.c| 44 +---
 arch/powerpc/include/asm/kvm_host.h |  8 +++
 arch/powerpc/kvm/powerpc.c  | 31 +
 arch/s390/include/asm/kvm_host.h| 14 
 arch/s390/kvm/kvm-s390.c| 45 +
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/svm.c  |  4 ++--
 arch/x86/kvm/vmx.c  |  4 ++--
 arch/x86/kvm/x86.c  | 12 +-
 include/linux/kvm_host.h|  4 ++--
 virt/kvm/kvm_main.c |  4 ++--
 17 files changed, 79 insertions(+), 185 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: static inline empty kvm_arch functions

2014-08-28 Thread Radim Krčmář
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)

This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
  2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
  e790d9ef6 KVM: add kvm_arch_sched_in

Signed-off-by: Radim Krčmář 
---
 arch/arm/include/asm/kvm_host.h |  6 ++
 arch/arm/kvm/arm.c  | 19 
 arch/arm64/include/asm/kvm_host.h   |  6 ++
 arch/ia64/include/asm/kvm_host.h| 12 +++
 arch/ia64/kvm/kvm-ia64.c| 30 --
 arch/mips/include/asm/kvm_host.h| 11 ++
 arch/mips/kvm/mips.c| 42 
 arch/powerpc/include/asm/kvm_host.h |  8 +++
 arch/powerpc/kvm/powerpc.c  | 29 -
 arch/s390/include/asm/kvm_host.h| 14 
 arch/s390/kvm/kvm-s390.c| 43 -
 11 files changed, 57 insertions(+), 163 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..84291fe 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -233,4 +233,10 @@ static inline void vgic_arch_setup(const struct 
vgic_params *vgic)
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+static inline void kvm_arch_hardware_disable(void *garbage) {}
+static inline void kvm_arch_hardware_unsetup(void) {}
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9f788eb..132bb0d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -97,27 +97,16 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
 }
 
-void kvm_arch_hardware_disable(void *garbage)
-{
-}
-
 int kvm_arch_hardware_setup(void)
 {
return 0;
 }
 
-void kvm_arch_hardware_unsetup(void)
-{
-}
-
 void kvm_arch_check_processor_compat(void *rtn)
 {
*(int *)rtn = 0;
 }
 
-void kvm_arch_sync_events(struct kvm *kvm)
-{
-}
 
 /**
  * kvm_arch_init_vm - initializes a VM data structure
@@ -284,14 +273,6 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
return 0;
 }
 
-void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
-{
-}
-
-void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
-{
-}
-
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
vcpu->cpu = cpu;
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e10c45a..94d8a3c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -244,4 +244,10 @@ static inline void vgic_arch_setup(const struct 
vgic_params *vgic)
}
 }
 
+static inline void kvm_arch_hardware_disable(void *garbage) {}
+static inline void kvm_arch_hardware_unsetup(void) {}
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..353167d 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -595,6 +595,18 @@ void kvm_sal_emul(struct kvm_vcpu *vcpu);
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
 
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_free_memslot(struct kvm *kvm,
+   struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {}
+static inline void kvm_arch_memslots_updated(struct kvm *kvm) {}
+static inline void kvm_arch_commit_memory_region(struct kvm *kvm,
+   struct kvm_userspace_memory_region *mem,
+   const struct kvm_memory_slot *old,
+   enum kvm_mr_change change) {}
+static inline void kvm_arch_hardware_unsetup(void) {}
+
 #endif /* __ASSEMBLY__*/
 
 #endif
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 0729ba6..5e14dca 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1364,10 +1364,6 @@ static void kvm_release_vm_pages(struct kvm *kvm)
}
 }
 
-void kvm_arch_sync_events(struct kvm *kvm)
-{
-}
-
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
kvm_iommu_unmap_guest(kvm);
@@ -1376,10 +1372,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_release_vm_pages(kvm);
 }
 
-void kvm_arch_vcpu_put(struct kvm_

Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr bit always set

2014-08-28 Thread Zhang Haoyu
Hi Jason,
I tested below patch, it's okay, the e1000 interrupt storm disappeared.
But I am going to make a bit change on it, could you help review it?

>Currently, we call ioapic_service() immediately when we find the irq is still
>active during eoi broadcast. But for real hardware, there's some dealy between
>the EOI writing and irq delivery (system bus latency?). So we need to emulate
>this behavior. Otherwise, for a guest who haven't register a proper irq handler
>, it would stay in the interrupt routine as this irq would be re-injected
>immediately after guest enables interrupt. This would lead guest can't move
>forward and may miss the possibility to get proper irq handler registered (one
>example is windows guest resuming from hibernation).
>
>As there's no way to differ the unhandled irq from new raised ones, this patch
>solve this problems by scheduling a delayed work when the count of irq injected
>during eoi broadcast exceeds a threshold value. After this patch, the guest can
>move a little forward when there's no suitable irq handler in case it may
>register one very soon and for guest who has a bad irq detection routine ( such
>as note_interrupt() in linux ), this bad irq would be recognized soon as in the
>past.
>
>Signed-off-by: Jason Wang  redhat.com>
>---
> virt/kvm/ioapic.c |   47 +--
> virt/kvm/ioapic.h |2 ++
> 2 files changed, 47 insertions(+), 2 deletions(-)
>
>diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>index dcaf272..892253e 100644
>--- a/virt/kvm/ioapic.c
>+++ b/virt/kvm/ioapic.c
> -221,6 +221,24  int kvm_ioapic_set_irq(struct 
> kvm_ioapic *ioapic, int irq, int level)
>   return ret;
> }
>
>+static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>+{
>+  int i, ret;
>+  struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>+   eoi_inject.work);
>+  spin_lock(&ioapic->lock);
>+  for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>+  union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>+
>+  if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>+  continue;
>+
>+  if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>+  ret = ioapic_service(ioapic, i);
>+  }
>+  spin_unlock(&ioapic->lock);
>+}
>+
> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>int trigger_mode)
> {
> -249,8 +267,29  static void 
> __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>
>   ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>   ent->fields.remote_irr = 0;
>-  if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>-  ioapic_service(ioapic, i);
>+  if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>+  ++ioapic->irq_eoi;
-+  ++ioapic->irq_eoi;
++  ++ioapic->irq_eoi[i];
>+  if (ioapic->irq_eoi == 100) {
-+  if (ioapic->irq_eoi == 100) {
++  if (ioapic->irq_eoi[i] == 100) {
>+  /*
>+   * Real hardware does not deliver the irq so
>+   * immediately during eoi broadcast, so we need
>+   * to emulate this behavior. Otherwise, for
>+   * guests who has not registered handler of a
>+   * level irq, this irq would be injected
>+   * immediately after guest enables interrupt
>+   * (which happens usually at the end of the
>+   * common interrupt routine). This would lead
>+   * guest can't move forward and may miss the
>+   * possibility to get proper irq handler
>+   * registered. So we need to give some breath to
>+   * guest. TODO: 1 is too long?
>+   */
>+  schedule_delayed_work(&ioapic->eoi_inject, 1);
>+  ioapic->irq_eoi = 0;
-+  ioapic->irq_eoi = 0;
++  ioapic->irq_eoi[i] = 0;
>+  } else {
>+  ioapic_service(ioapic, i);
>+  }
>+  }
++  else {
++  ioapic->irq_eoi[i] = 0;
++  }
>   }
> }
I think ioapic->irq_eoi is prone to reach to 100, because during a eoi 
broadcast, 
it's possible that another interrupt's (not current eoi's corresponding 
interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
and not too long, ioapic->irq_eoi will reach to 100.
I want to add "u32 irq_eoi[IOAPIC_NUM_PIN

Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it.

2014-08-28 Thread Paolo Bonzini
Il 28/08/2014 12:14, Wang, Wei W ha scritto:
> We will do some more tests on it to make sure there are no problems.

No, I don't think there are any easily-detected practical problems with
the patch.  But I'm not sure I understand all the theoretical problems
and whether possible races are benign.  These would be really hard to
debug, unless you get a bug that is 100% reproducible.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it.

2014-08-28 Thread Wang, Wei W
We will do some more tests on it to make sure there are no problems.
Wei

-Original Message-
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
Sent: Thursday, August 28, 2014 4:44 PM
To: Zhang, Yang Z; Wang, Wei W; kvm@vger.kernel.org
Cc: alex.william...@redhat.com
Subject: Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading 
it.

Il 28/08/2014 08:17, Zhang, Yang Z ha scritto:
> > Program edge-triggered MSI for vector 123
> > Interrupt comes in, ISR[123]=1
> > Mask MSI
> > Program level-triggered IOAPIC interrupt for vector 123
> 
> You cannot assign the vector 123 to another trigger mode interrupt 
> before previous IRQ handler finished (means issue EOI). If there is an 
> interrupt comes from the IOAPIC entry immediately after you reprogram 
> the entry, it will update the TMR to 1. Since we are still in previous 
> IRQ handler, it will get confused to see the TMR becomes 1.

Yeah, that could be confusing to real hardware as well.  Still, I'm a bit 
nervous at the possibility of races introduced by these patches...

I wouldn't mind a second review.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] kvmtool: ARM64: Fix compile error for aarch64

2014-08-28 Thread Will Deacon
On Thu, Aug 28, 2014 at 10:56:29AM +0100, Pekka Enberg wrote:
> On 08/07/2014 12:12 PM, Will Deacon wrote:
> > Ok. Pekka, could you merge in 3.16 to the kvmtool master branch please?
> > You'll need my patch below to resolve some ARM build fallout.
> 
> Done.

Thanks, Pekka! We'll give it a spin.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] kvmtool: ARM64: Fix compile error for aarch64

2014-08-28 Thread Pekka Enberg

On 08/07/2014 12:12 PM, Will Deacon wrote:

Ok. Pekka, could you merge in 3.16 to the kvmtool master branch please?
You'll need my patch below to resolve some ARM build fallout.


Done.

- Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 2/2] docs: update ivshmem device spec

2014-08-28 Thread Stefan Hajnoczi
On Tue, Aug 26, 2014 at 01:04:30PM +0200, Paolo Bonzini wrote:
> Il 26/08/2014 08:47, David Marchand ha scritto:
> > 
> > Using a version message supposes we want to keep ivshmem-server and QEMU
> > separated (for example, in two distribution packages) while we can avoid
> > this, so why would we do so ?
> > 
> > If we want the ivshmem-server to come with QEMU, then both are supposed
> > to be aligned on your system.
> 
> What about upgrading QEMU and ivshmem-server while you have existing
> guests?  You cannot restart ivshmem-server, and the new QEMU would have
> to talk to the old ivshmem-server.

Version negotiation also helps avoid confusion if someone combines
ivshmem-server and QEMU from different origins (e.g. built from source
and distro packaged).

It's a safeguard to prevent hard-to-diagnose failures when the system is
misconfigured.

Stefan


pgpFp_zSoZiZ7.pgp
Description: PGP signature


Re: [PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-28 Thread Alexander Graf


On 13.08.14 11:09, Bharat Bhushan wrote:
> This patch emulates debug registers and debug exception
> to support guest using debug resource. This enables running
> gdb/kgdb etc in guest.
> 
> On BOOKE architecture we cannot share debug resources between QEMU and
> guest because:
> When QEMU is using debug resources then debug exception must
> be always enabled. To achieve this we set MSR_DE and also set
> MSRP_DEP so guest cannot change MSR_DE.
> 
> When emulating debug resource for guest we want guest
> to control MSR_DE (enable/disable debug interrupt on need).
> 
> So above mentioned two configuration cannot be supported
> at the same time. So the result is that we cannot share
> debug resources between QEMU and Guest on BOOKE architecture.
> 
> In the current design QEMU gets priority over guest, this means that if
> QEMU is using debug resources then guest cannot use them and if guest is
> using debug resource then QEMU can overwrite them.
> 
> Signed-off-by: Bharat Bhushan 

Thanks, applied to kvm-ppc-queue.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it.

2014-08-28 Thread Paolo Bonzini
Il 28/08/2014 08:17, Zhang, Yang Z ha scritto:
> > Program edge-triggered MSI for vector 123
> > Interrupt comes in, ISR[123]=1
> > Mask MSI
> > Program level-triggered IOAPIC interrupt for vector 123
> 
> You cannot assign the vector 123 to another trigger mode interrupt
> before previous IRQ handler finished (means issue EOI). If there is an
> interrupt comes from the IOAPIC entry immediately after you reprogram
> the entry, it will update the TMR to 1. Since we are still in previous
> IRQ handler, it will get confused to see the TMR becomes 1.

Yeah, that could be confusing to real hardware as well.  Still, I'm a
bit nervous at the possibility of races introduced by these patches...

I wouldn't mind a second review.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseofits correspondingioapic->irr bit always set

2014-08-28 Thread Jason Wang
On 08/27/2014 05:31 PM, Zhang Haoyu wrote:
>>> Hi, all
>>> >>
>>> >> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter 
>>> >> e1000 NIC interrupt storm, 
>>> >> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is 
>>> >> always true in __kvm_ioapic_update_eoi().
>>> >>
>>> >> Any ideas?
>> > We meet this several times: search the autoneg patches for an 
>> > example of
>> > workaround for this in qemu, and patch kvm: ioapic: conditionally 
>> > delay
>> > irq delivery during eoi broadcast for an workaround in kvm 
>> > (rejected).
>> >
>  Thanks, Jason,
>  I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found 
>  below patches, 
>  http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
 >>> This series is the first try to fix the guest hang during guest
 >>> hibernation or driver enable/disable.
>  http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
>  http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
 >>> Those are follow-up that tries to fix the bugs introduced by the 
 >>> autoneg
 >>> hack.
>  which one tries to fix this problem, or all of them?
 >>> As you can see, those kinds of hacking may not as good as we expect
 >>> since we don't know exactly how e1000 works. Only the register function
 >>> description from Intel's manual may not be sufficient. And you can
 >>> search e1000 in the archives and you can find some behaviour of e1000
 >>> registers were not fictionalized like what spec said. It was really
 >>> suggested to use virtio-net instead of e1000 in guest. 
>>> >> Will the "[PATCH] kvm: ioapic: conditionally delay irq delivery during 
>>> >> eoi broadcast" add delay to virtual interrupt injection sometimes,
>>> >> then some time delay sensitive applications will be impacted?
>> >
>> >I don't test it too much but it only give a minor delay of 1% irq in the
>> >hope of guest irq handler will be registered shortly. But I suspect it's
>> >the bug of e1000 who inject the irq in the wrong time. Under what cases
>> >did you meet this issue?
> Some scenarios, not constant and 100% reproducity, 
> e.g., reboot vm, ifdown e1000 nic, install kaspersky(network configuration is 
> performed during installing stage), .etc.

If you want to start the debugging, I suggest to enable e1000 debug and
then analysis the log before the interrupt storm. This may help to
locate the issue.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html