[COMMIT master] KVM: SVM: Do not report xsave in supported cpuid

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

To support xsave properly for the guest, the SVM module needs
software support for it. As long as this is not present, do
not report xsave as a supported feature in CPUID.
As a side effect, this patch moves the bit() helper function
into the x86.h file so that it can be used in svm.c too.
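As an illustration, the moved bit() helper reduces a kernel-style X86_FEATURE_* number (word * 32 + bit within the word) to a mask inside a single 32-bit CPUID register. This is a standalone sketch, not kernel code; the XSAVE encoding (CPUID leaf 1, ECX bit 26) is reproduced here rather than taken from kernel headers:

```c
#include <stdint.h>

/* Sketch of the bit() helper now shared via x86.h: the "& 31" keeps
 * only the bit position within one 32-bit register word. */
static inline uint32_t bit(int bitno)
{
	return 1u << (bitno & 31);
}

/* X86_FEATURE_XSAVE lives in CPUID word 4 (leaf 1 ECX), bit 26. */
#define X86_FEATURE_XSAVE (4 * 32 + 26)

/* Mask the xsave feature out of a leaf-1 ECX value, as the
 * svm_set_supported_cpuid() hunk above does. */
static uint32_t mask_xsave(uint32_t ecx)
{
	return ecx & ~bit(X86_FEATURE_XSAVE);
}
```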

KVM-Stable-Tag.
Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 740884b..9b3d166 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3622,6 +3622,10 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 {
switch (func) {
+   case 0x00000001:
+   /* Mask out xsave bit as long as it is not supported by SVM */
+   entry->ecx &= ~(bit(X86_FEATURE_XSAVE));
+   break;
case 0x80000001:
if (nested)
entry->ecx |= (1 << 2); /* Set SVM bit */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5c62ef2..c195260 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4268,11 +4268,6 @@ static int vmx_get_lpage_level(void)
return PT_PDPE_LEVEL;
 }
 
-static inline u32 bit(int bitno)
-{
-   return 1 << (bitno & 31);
-}
-
 static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb04957..8d76150 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -163,11 +163,6 @@ static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
vcpu->arch.apf.gfns[i] = ~0;
 }
 
-static inline u32 bit(int bitno)
-{
-   return 1 << (bitno & 31);
-}
-
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
unsigned slot;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2cea414..c600da8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -70,6 +70,11 @@ static inline int is_paging(struct kvm_vcpu *vcpu)
return kvm_read_cr0_bits(vcpu, X86_CR0_PG);
 }
 
+static inline u32 bit(int bitno)
+{
+   return 1 << (bitno & 31);
+}
+
 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
 int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq);
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: MMU: Fix incorrect direct page write protection due to ro host page

2010-12-13 Thread Avi Kivity
From: Avi Kivity 

If KVM sees a read-only host page, it will map it as read-only to prevent
breaking a COW.  However, if the page was part of a large guest page, KVM
incorrectly extends the write protection to the entire large page frame
instead of limiting it to the normal host page.

This results in the instantiation of a new shadow page with read-only access.

If this happens for a MOVS instruction that moves memory between two normal
pages, within a single large page frame, and mapped within the guest as a
large page, and if, in addition, the source operand is not writeable in the
host (perhaps due to KSM), then KVM will instantiate a read-only direct
shadow page, instantiate an spte for the source operand, then instantiate
a new read/write direct shadow page and instantiate an spte for the
destination operand.  Since these two sptes are in different shadow pages,
MOVS will never see them at the same time and the guest will not make
progress.

Fix by mapping the direct shadow page read/write, and only marking the
host page read-only.
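The fix can be pictured as moving the write-permission decision from the access bits used to select the shadow page (which previously produced a distinct read-only shadow page) down to the leaf spte for the individual host page. A standalone sketch, with an illustrative ACC_WRITE_MASK value:

```c
#include <stdint.h>

#define ACC_WRITE_MASK (1u << 1)	/* illustrative bit position */

/* The direct shadow page now keeps the full guest access; only the
 * leaf spte for a read-only host page drops the write bit. */
static uint32_t leaf_access(uint32_t access, int map_writable)
{
	return map_writable ? access : (access & ~ACC_WRITE_MASK);
}
```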

Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 146b681..5ca9426 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -511,6 +511,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
link_shadow_page(it.sptep, sp);
}
 
+   if (!map_writable)
+   access &= ~ACC_WRITE_MASK;
+
mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
 user_fault, write_fault, dirty, ptwrite, it.level,
 gw->gfn, pfn, prefault, map_writable);
@@ -593,9 +596,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
if (is_error_pfn(pfn))
return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn);
 
-   if (!map_writable)
-   walker.pte_access &= ~ACC_WRITE_MASK;
-
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;


[COMMIT master] KVM: Fix build error on s390 due to missing tlbs_dirty

2010-12-13 Thread Avi Kivity
From: Avi Kivity 

Make it available for all archs.

Signed-off-by: Avi Kivity 

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd0da8f..b5021db 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -256,8 +256,8 @@ struct kvm {
struct mmu_notifier mmu_notifier;
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
-   long tlbs_dirty;
 #endif
+   long tlbs_dirty;
 };
 
 /* The guest did something we don't support. */


[COMMIT master] KVM: enlarge number of possible CPUID leaves

2010-12-13 Thread Avi Kivity
From: Andre Przywara 

Currently the number of CPUID leaves KVM handles is limited to 40.
My desktop machine (AthlonII) already has 35 and future CPUs will
expand this well beyond the limit. Extend the limit to 80 to make
room for future processors.

KVM-Stable-Tag.
Signed-off-by: Andre Przywara 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b55d789..4461429 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -79,7 +79,7 @@
 #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
 #define KVM_MIN_FREE_MMU_PAGES 5
 #define KVM_REFILL_PAGES 25
-#define KVM_MAX_CPUID_ENTRIES 40
+#define KVM_MAX_CPUID_ENTRIES 80
 #define KVM_NR_FIXED_MTRR_REGION 88
 #define KVM_NR_VAR_MTRR 8
 


[COMMIT master] Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next

2010-12-13 Thread Avi Kivity
From: Avi Kivity 

* 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (94 commits)
  drm/i915: i915 cannot provide switcher services.
  Input: wacom - add new Bamboo PT (0xdb)
  ARM: S3C24XX: Fix mess with gpio {set,get}_pull callbacks
  drm/radeon/kms: fix vram base calculation on rs780/rs880
  drm/radeon/kms: fix formatting of vram and gtt info
  drm/radeon/kms: forbid big bo allocation (fdo 31708) v3
  drm: Don't try and disable an encoder that was never enabled
  drm: Add missing drm_vblank_put() along queue vblank error path
  drm/i915/dp: Only apply the workaround if the select is still active
  MN10300: Fix interrupt mask alteration function call name in gdbstub
  autofs4 - remove ioctl mutex (bz23142)
  drm/i915: Emit a request to clear a flushed and idle ring for unbusy bo
  Linux 2.6.37-rc5
  ARM: tegra: fix regression from addruart rewrite
  Input: add input driver for polled GPIO buttons
  PM / Hibernate: Fix memory corruption related to swap
  PM / Hibernate: Use async I/O when reading compressed hibernation image
  wmi: use memcmp instead of strncmp to compare GUIDs
  perf record: Fix eternal wait for stillborn child
  ARM: 6524/1: GIC irq desciptor bug fix
  ...

Signed-off-by: Avi Kivity 


[COMMIT master] KVM: SVM: Add xsetbv intercept

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the xsetbv intercept in the AMD part
of KVM. This makes AVX usable in a safe way for the guest on
AVX-capable AMD hardware.

The patch was tested by using AVX in the guest and host in
parallel and checking for data corruption. I also ran the
KVM xsave unit tests, and they all pass.
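XSETBV takes its XCR index in ECX and the 64-bit value in EDX:EAX; the handler above rebuilds the value the same way kvm_read_edx_eax() does, and advances RIP by 3 because XSETBV is a 3-byte instruction (0F 01 D1). A minimal standalone sketch of the operand assembly:

```c
#include <stdint.h>

/* Combine EDX:EAX into the 64-bit XCR value, as the xsetbv
 * intercept handler does via kvm_read_edx_eax(). */
static uint64_t read_edx_eax(uint32_t edx, uint32_t eax)
{
	return ((uint64_t)edx << 32) | eax;
}
```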

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 82ecaa3..f7087bf 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -47,6 +47,7 @@ enum {
INTERCEPT_MONITOR,
INTERCEPT_MWAIT,
INTERCEPT_MWAIT_COND,
+   INTERCEPT_XSETBV,
 };
 
 
@@ -329,6 +330,7 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_EXIT_MONITOR   0x08a
 #define SVM_EXIT_MWAIT 0x08b
#define SVM_EXIT_MWAIT_COND    0x08c
+#define SVM_EXIT_XSETBV    0x08d
 #define SVM_EXIT_NPF   0x400
 
 #define SVM_EXIT_ERR   -1
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9b3d166..24b4373 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -935,6 +935,7 @@ static void init_vmcb(struct vcpu_svm *svm)
set_intercept(svm, INTERCEPT_WBINVD);
set_intercept(svm, INTERCEPT_MONITOR);
set_intercept(svm, INTERCEPT_MWAIT);
+   set_intercept(svm, INTERCEPT_XSETBV);
 
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = __pa(svm->msrpm);
@@ -2546,6 +2547,19 @@ static int skinit_interception(struct vcpu_svm *svm)
return 1;
 }
 
+static int xsetbv_interception(struct vcpu_svm *svm)
+{
+   u64 new_bv = kvm_read_edx_eax(&svm->vcpu);
+   u32 index = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX);
+
+   if (kvm_set_xcr(&svm->vcpu, index, new_bv) == 0) {
+   svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+   skip_emulated_instruction(&svm->vcpu);
+   }
+
+   return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm)
 {
kvm_queue_exception(&svm->vcpu, UD_VECTOR);
@@ -2971,6 +2985,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_WBINVD]   = emulate_on_interception,
[SVM_EXIT_MONITOR]  = invalid_op_interception,
[SVM_EXIT_MWAIT]= invalid_op_interception,
+   [SVM_EXIT_XSETBV]   = xsetbv_interception,
[SVM_EXIT_NPF]  = pf_interception,
 };
 
@@ -3622,10 +3637,6 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 {
switch (func) {
-   case 0x00000001:
-   /* Mask out xsave bit as long as it is not supported by SVM */
-   entry->ecx &= ~(bit(X86_FEATURE_XSAVE));
-   break;
case 0x80000001:
if (nested)
entry->ecx |= (1 << 2); /* Set SVM bit */
@@ -3699,6 +3710,7 @@ static const struct trace_print_flags svm_exit_reasons_str[] = {
{ SVM_EXIT_WBINVD,  "wbinvd" },
{ SVM_EXIT_MONITOR, "monitor" },
{ SVM_EXIT_MWAIT,   "mwait" },
+   { SVM_EXIT_XSETBV,  "xsetbv" },
{ SVM_EXIT_NPF, "npf" },
{ -1, NULL }
 };


[COMMIT master] KVM: MMU: Make the way of accessing lpage_info more generic

2010-12-13 Thread Avi Kivity
From: Takuya Yoshikawa 

Large page information has two elements, but only one of them, write_count,
is accessed through a helper function.

This patch replaces that helper with a more generic one that returns the
newly named kvm_lpage_info structure, and uses it to access the other
element, rmap_pde.
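The index computation inside lpage_info_slot() is unchanged by the refactoring: both the gfn and the slot's base gfn are shifted down by the per-level huge-page shift before subtracting, which handles slots that are not large-page aligned. A standalone sketch (the 9-bit shift used for 2M pages on x86-64 is an assumption of the example):

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/* Index of the large-page info entry covering gfn within a memslot,
 * mirroring the math in lpage_info_slot(). */
static unsigned long lpage_index(gfn_t gfn, gfn_t base_gfn, int shift)
{
	return (unsigned long)((gfn >> shift) - (base_gfn >> shift));
}
```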

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1a953ac..5bc820c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -477,46 +477,46 @@ static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
 }
 
 /*
- * Return the pointer to the largepage write count for a given
- * gfn, handling slots that are not large page aligned.
+ * Return the pointer to the large page information for a given gfn,
+ * handling slots that are not large page aligned.
  */
-static int *slot_largepage_idx(gfn_t gfn,
-  struct kvm_memory_slot *slot,
-  int level)
+static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
+ struct kvm_memory_slot *slot,
+ int level)
 {
unsigned long idx;
 
idx = (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
  (slot->base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
-   return &slot->lpage_info[level - 2][idx].write_count;
+   return &slot->lpage_info[level - 2][idx];
 }
 
 static void account_shadowed(struct kvm *kvm, gfn_t gfn)
 {
struct kvm_memory_slot *slot;
-   int *write_count;
+   struct kvm_lpage_info *linfo;
int i;
 
slot = gfn_to_memslot(kvm, gfn);
for (i = PT_DIRECTORY_LEVEL;
 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
-   write_count   = slot_largepage_idx(gfn, slot, i);
-   *write_count += 1;
+   linfo = lpage_info_slot(gfn, slot, i);
+   linfo->write_count += 1;
}
 }
 
 static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
 {
struct kvm_memory_slot *slot;
-   int *write_count;
+   struct kvm_lpage_info *linfo;
int i;
 
slot = gfn_to_memslot(kvm, gfn);
for (i = PT_DIRECTORY_LEVEL;
 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
-   write_count   = slot_largepage_idx(gfn, slot, i);
-   *write_count -= 1;
-   WARN_ON(*write_count < 0);
+   linfo = lpage_info_slot(gfn, slot, i);
+   linfo->write_count -= 1;
+   WARN_ON(linfo->write_count < 0);
}
 }
 
@@ -525,12 +525,12 @@ static int has_wrprotected_page(struct kvm *kvm, int level)
 {
struct kvm_memory_slot *slot;
-   int *largepage_idx;
+   struct kvm_lpage_info *linfo;
 
slot = gfn_to_memslot(kvm, gfn);
if (slot) {
-   largepage_idx = slot_largepage_idx(gfn, slot, level);
-   return *largepage_idx;
+   linfo = lpage_info_slot(gfn, slot, level);
+   return linfo->write_count;
}
 
return 1;
@@ -585,16 +585,15 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
 {
struct kvm_memory_slot *slot;
-   unsigned long idx;
+   struct kvm_lpage_info *linfo;
 
slot = gfn_to_memslot(kvm, gfn);
if (likely(level == PT_PAGE_TABLE_LEVEL))
return &slot->rmap[gfn - slot->base_gfn];
 
-   idx = (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
-   (slot->base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
+   linfo = lpage_info_slot(gfn, slot, level);
 
-   return &slot->lpage_info[level - 2][idx].rmap_pde;
+   return &linfo->rmap_pde;
 }
 
 /*
@@ -882,19 +881,16 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
end = start + (memslot->npages << PAGE_SHIFT);
if (hva >= start && hva < end) {
gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+   gfn_t gfn = memslot->base_gfn + gfn_offset;
 
ret = handler(kvm, &memslot->rmap[gfn_offset], data);
 
for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
-   unsigned long idx;
-   int sh;
-
-   sh = KVM_HPAGE_GFN_SHIFT(PT_DIRECTORY_LEVEL+j);
-   idx = ((memslot->base_gfn+gfn_offset) >> sh) -
-   (memslot->base_gfn >> sh);
-   ret |= handler(kvm,
-   &memslot->lpage_info[j][idx].rmap_pde,
-   data);
+   struct kvm_lpage_info *linfo;
+
+   linfo = lpage_info_slot(gfn, memslot,
+

[COMMIT master] KVM: VMX: add module parameter to avoid trapping HLT instructions (v5)

2010-12-13 Thread Avi Kivity
From: Anthony Liguori 

In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling.  There are many approaches to achieve
this, but the most direct is to simply avoid trapping the HLT instruction,
which lets the guest execute the instruction directly, putting the processor
to sleep.

Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.
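The mechanism is simply to leave CPU_BASED_HLT_EXITING out of the required VM-execution controls when the parameter is off. A standalone sketch (the bit value follows the Intel SDM's primary processor-based controls; the other always-required bits are omitted for brevity):

```c
#include <stdint.h>

#define CPU_BASED_HLT_EXITING (1u << 7)	/* per Intel SDM */

/* Build the minimum required exec controls; with yield_on_hlt off,
 * HLT exiting is no longer demanded of the hardware. */
static uint32_t min_exec_controls(int yield_on_hlt)
{
	uint32_t min = 0;	/* always-required bits omitted */

	if (yield_on_hlt)
		min |= CPU_BASED_HLT_EXITING;
	return min;
}
```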

Signed-off-by: Anthony Liguori 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 42d9590..9642c22 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -297,6 +297,12 @@ enum vmcs_field {
 #define GUEST_INTR_STATE_SMI   0x0004
 #define GUEST_INTR_STATE_NMI   0x0008
 
+/* GUEST_ACTIVITY_STATE flags */
+#define GUEST_ACTIVITY_ACTIVE  0
+#define GUEST_ACTIVITY_HLT 1
+#define GUEST_ACTIVITY_SHUTDOWN2
+#define GUEST_ACTIVITY_WAIT_SIPI   3
+
 /*
  * Exit Qualifications for MOV for Control Register Access
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72cfdb7..5c62ef2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -69,6 +69,9 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 static int __read_mostly vmm_exclusive = 1;
 module_param(vmm_exclusive, bool, S_IRUGO);
 
+static int __read_mostly yield_on_hlt = 1;
+module_param(yield_on_hlt, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  \
(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -1009,6 +1012,17 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
vmx_set_interrupt_shadow(vcpu, 0);
 }
 
+static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
+{
+   /* Ensure that we clear the HLT state in the VMCS.  We don't need to
+* explicitly skip the instruction because if the HLT state is set, then
+* the instruction is already executing and RIP has already been
+* advanced. */
+   if (!yield_on_hlt &&
+   vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT)
+   vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
+}
+
 static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
bool has_error_code, u32 error_code,
bool reinject)
@@ -1035,6 +1049,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
intr_info |= INTR_TYPE_HARD_EXCEPTION;
 
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
+   vmx_clear_hlt(vcpu);
 }
 
 static bool vmx_rdtscp_supported(void)
@@ -1419,7 +1434,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
&_pin_based_exec_control) < 0)
return -EIO;
 
-   min = CPU_BASED_HLT_EXITING |
+   min =
 #ifdef CONFIG_X86_64
  CPU_BASED_CR8_LOAD_EXITING |
  CPU_BASED_CR8_STORE_EXITING |
@@ -1432,6 +1447,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
  CPU_BASED_MWAIT_EXITING |
  CPU_BASED_MONITOR_EXITING |
  CPU_BASED_INVLPG_EXITING;
+
+   if (yield_on_hlt)
+   min |= CPU_BASED_HLT_EXITING;
+
opt = CPU_BASED_TPR_SHADOW |
  CPU_BASED_USE_MSR_BITMAPS |
  CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
@@ -2728,7 +2747,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_writel(GUEST_IDTR_BASE, 0);
vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);
 
-   vmcs_write32(GUEST_ACTIVITY_STATE, 0);
+   vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0);
 
@@ -2821,6 +2840,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
} else
intr |= INTR_TYPE_EXT_INTR;
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
+   vmx_clear_hlt(vcpu);
 }
 
 static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
@@ -2848,6 +2868,7 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
}
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
+   vmx_clear_hlt(vcpu);
 }
 
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)


[COMMIT master] KVM: Fix OSXSAVE after migration

2010-12-13 Thread Avi Kivity
From: Sheng Yang 

CPUID's OSXSAVE bit is a mirror of the CR4.OSXSAVE bit, so we need to update
the cached CPUID entries after migration.
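The relationship being restored: CPUID.01H:ECX bit 27 (OSXSAVE) must reflect CR4 bit 18 (OSXSAVE), so after set_sregs writes a CR4 with that bit set, the cached CPUID entries are refreshed. A standalone sketch of the mirroring:

```c
#include <stdint.h>

#define X86_CR4_OSXSAVE    (1ul << 18)
#define CPUID1_ECX_OSXSAVE (1u << 27)

/* Mirror CR4.OSXSAVE into the guest's CPUID leaf 1 ECX. */
static uint32_t sync_osxsave(uint32_t cpuid_ecx, unsigned long cr4)
{
	if (cr4 & X86_CR4_OSXSAVE)
		return cpuid_ecx | CPUID1_ECX_OSXSAVE;
	return cpuid_ecx & ~CPUID1_ECX_OSXSAVE;
}
```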

KVM-Stable-Tag.
Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 018bb70..bb04957 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5585,6 +5585,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
+   if (sregs->cr4 & X86_CR4_OSXSAVE)
+   update_cpuid(vcpu);
if (!is_long_mode(vcpu) && is_pae(vcpu)) {
load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
mmu_reset_needed = 1;


[COMMIT master] KVM: MMU: retry #PF for softmmu

2010-12-13 Thread Avi Kivity
From: Xiao Guangrong 

Retry a #PF for softmmu only when the current vcpu has the same cr3 as at
the time the #PF occurred.
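The condition this patch adds, as a standalone sketch: a completed async page fault is only replayed on the softmmu path when the vcpu's current cr3 matches the one saved at fault time (direct-map guests are unaffected by the check):

```c
/* Decide whether a finished async #PF should be replayed into the MMU. */
static int should_replay(int direct_map, unsigned long saved_cr3,
			 unsigned long cur_cr3)
{
	return direct_map || saved_cr3 == cur_cr3;
}
```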

Signed-off-by: Xiao Guangrong 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f7e5066..b55d789 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -593,6 +593,7 @@ struct kvm_x86_ops {
 struct kvm_arch_async_pf {
u32 token;
gfn_t gfn;
+   unsigned long cr3;
bool direct_map;
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 04f9033..1a953ac 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2607,9 +2607,11 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
 {
struct kvm_arch_async_pf arch;
+
arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
arch.gfn = gfn;
arch.direct_map = vcpu->arch.mmu.direct_map;
+   arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
 
return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
 }
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 52b3e91..146b681 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -438,7 +438,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 struct guest_walker *gw,
 int user_fault, int write_fault, int hlevel,
-int *ptwrite, pfn_t pfn, bool map_writable)
+int *ptwrite, pfn_t pfn, bool map_writable,
+bool prefault)
 {
unsigned access = gw->pt_access;
struct kvm_mmu_page *sp = NULL;
@@ -512,7 +513,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
 user_fault, write_fault, dirty, ptwrite, it.level,
-gw->gfn, pfn, false, map_writable);
+gw->gfn, pfn, prefault, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
 
return it.sptep;
@@ -568,8 +569,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 */
if (!r) {
pgprintk("%s: guest page fault\n", __func__);
-   inject_page_fault(vcpu, &walker.fault);
-   vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
+   if (!prefault) {
+   inject_page_fault(vcpu, &walker.fault);
+   /* reset fork detector */
+   vcpu->arch.last_pt_write_count = 0;
+   }
return 0;
}
 
@@ -599,7 +603,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
kvm_mmu_free_some_pages(vcpu);
sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
-level, &write_pt, pfn, map_writable);
+level, &write_pt, pfn, map_writable, prefault);
(void)sptep;
pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
 sptep, *sptep, write_pt);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ed373ba..018bb70 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6183,7 +6183,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 {
int r;
 
-   if (!vcpu->arch.mmu.direct_map || !work->arch.direct_map ||
+   if ((vcpu->arch.mmu.direct_map != work->arch.direct_map) ||
  is_error_page(work->page))
return;
 
@@ -6191,6 +6191,10 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
if (unlikely(r))
return;
 
+   if (!vcpu->arch.mmu.direct_map &&
+ work->arch.cr3 != vcpu->arch.mmu.get_cr3(vcpu))
+   return;
+
vcpu->arch.mmu.page_fault(vcpu, work->gva, 0, true);
 }
 


[COMMIT master] KVM: SVM: Use svm_flush_tlb instead of force_new_asid

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch replaces all calls to force_new_asid that are
intended to flush the guest TLB with the more appropriate
function svm_flush_tlb. As a side effect, the
force_new_asid function is removed.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 16334bb..b4aad21 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -421,11 +421,6 @@ static inline void invlpga(unsigned long addr, u32 asid)
asm volatile (__ex(SVM_INVLPGA) : : "a"(addr), "c"(asid));
 }
 
-static inline void force_new_asid(struct kvm_vcpu *vcpu)
-{
-   to_svm(vcpu)->asid_generation--;
-}
-
 static int get_npt_level(void)
 {
 #ifdef CONFIG_X86_64
@@ -999,7 +994,7 @@ static void init_vmcb(struct vcpu_svm *svm)
save->cr3 = 0;
save->cr4 = 0;
}
-   force_new_asid(&svm->vcpu);
+   svm->asid_generation = 0;
 
svm->nested.vmcb = 0;
svm->vcpu.arch.hflags = 0;
@@ -1419,7 +1414,7 @@ static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4;
 
if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
-   force_new_asid(vcpu);
+   svm_flush_tlb(vcpu);
 
vcpu->arch.cr4 = cr4;
if (!npt_enabled)
@@ -1762,7 +1757,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
 
svm->vmcb->control.nested_cr3 = root;
mark_dirty(svm->vmcb, VMCB_NPT);
-   force_new_asid(vcpu);
+   svm_flush_tlb(vcpu);
 }
 
 static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
@@ -2366,7 +2361,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions;
svm->nested.intercept= nested_vmcb->control.intercept;
 
-   force_new_asid(&svm->vcpu);
+   svm_flush_tlb(&svm->vcpu);
svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
svm->vcpu.arch.hflags |= HF_VINTR_MASK;
@@ -3308,7 +3303,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
 
 static void svm_flush_tlb(struct kvm_vcpu *vcpu)
 {
-   force_new_asid(vcpu);
+   to_svm(vcpu)->asid_generation--;
 }
 
 static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
@@ -3560,7 +3555,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 
svm->vmcb->save.cr3 = root;
mark_dirty(svm->vmcb, VMCB_CR);
-   force_new_asid(vcpu);
+   svm_flush_tlb(vcpu);
 }
 
 static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
@@ -3574,7 +3569,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
svm->vmcb->save.cr3 = vcpu->arch.cr3;
mark_dirty(svm->vmcb, VMCB_CR);
 
-   force_new_asid(vcpu);
+   svm_flush_tlb(vcpu);
 }
 
 static int is_disabled(void)


[COMMIT master] KVM: SVM: Implement Flush-By-Asid feature

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch adds the new flush-by-ASID feature of upcoming
AMD processors to the KVM-AMD module.
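With the feature present, the flush is expressed through the VMCB's tlb_ctl field instead of burning a new ASID. A standalone sketch using the TLB_CONTROL_* values from the svm.h hunk below:

```c
#include <stdint.h>

#define TLB_CONTROL_DO_NOTHING 0
#define TLB_CONTROL_FLUSH_ASID 3

struct vcpu_sketch {
	uint8_t tlb_ctl;	/* stands in for vmcb->control.tlb_ctl */
	int asid_generation;
};

/* svm_flush_tlb() logic: a targeted flush if flush-by-ASID exists,
 * otherwise force a fresh ASID on the next VM entry. */
static void flush_tlb(struct vcpu_sketch *v, int has_flushbyasid)
{
	if (has_flushbyasid)
		v->tlb_ctl = TLB_CONTROL_FLUSH_ASID;
	else
		v->asid_generation--;
}
```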

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 235dd73..82ecaa3 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -88,6 +88,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 
 #define TLB_CONTROL_DO_NOTHING 0
 #define TLB_CONTROL_FLUSH_ALL_ASID 1
+#define TLB_CONTROL_FLUSH_ASID 3
+#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
 
 #define V_TPR_MASK 0x0f
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b4aad21..740884b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3158,7 +3158,6 @@ static void pre_svm_run(struct vcpu_svm *svm)
 
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 
-   svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
/* FIXME: handle wraparound of asid_generation */
if (svm->asid_generation != sd->asid_generation)
new_asid(svm, sd);
@@ -3303,7 +3302,12 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
 
 static void svm_flush_tlb(struct kvm_vcpu *vcpu)
 {
-   to_svm(vcpu)->asid_generation--;
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
+   svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+   else
+   svm->asid_generation--;
 }
 
 static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
@@ -3527,6 +3531,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
svm->next_rip = 0;
 
+   svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+
/* if exit due to PF check for async PF */
if (svm->vmcb->control.exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR)
svm->apf_reason = kvm_read_and_reset_pf_reason();


[COMMIT master] KVM: SVM: Add clean-bit for GDT and IDT

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean bit for the base and limit
of the GDT and IDT in the VMCB.
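The clean-bits mechanism: each bit in the VMCB clean field asserts that the corresponding VMCB area is unchanged since the last VMRUN, and mark_dirty() clears a bit so the hardware reloads that state. A standalone sketch (the numeric bit position is illustrative; the earlier enum entries are elided in the patch):

```c
#include <stdint.h>

enum { VMCB_DT = 7 };	/* illustrative position for the GDT/IDT clean bit */

/* mark_dirty(): clear the clean bit so the CPU reloads GDTR/IDTR
 * from the VMCB on the next VMRUN. */
static uint32_t mark_dirty(uint32_t clean_bits, int area)
{
	return clean_bits & ~(1u << area);
}
```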

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 89be0d6..bb640ae 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -194,6 +194,7 @@ enum {
VMCB_NPT,/* npt_en, nCR3, gPAT */
VMCB_CR, /* CR0, CR3, CR4, EFER */
VMCB_DR, /* DR6, DR7 */
+   VMCB_DT, /* GDT, IDT */
VMCB_DIRTY_MAX,
 };
 
@@ -1304,6 +1305,7 @@ static void svm_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 
svm->vmcb->save.idtr.limit = dt->size;
svm->vmcb->save.idtr.base = dt->address ;
+   mark_dirty(svm->vmcb, VMCB_DT);
 }
 
 static void svm_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
@@ -1320,6 +1322,7 @@ static void svm_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 
svm->vmcb->save.gdtr.limit = dt->size;
svm->vmcb->save.gdtr.base = dt->address ;
+   mark_dirty(svm->vmcb, VMCB_DT);
 }
 
 static void svm_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)


[COMMIT master] KVM: SVM: Remove flush_guest_tlb function

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This function is unused, and svm_flush_tlb does the same
thing, so it can be removed.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 05ae90a..16334bb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -426,11 +426,6 @@ static inline void force_new_asid(struct kvm_vcpu *vcpu)
to_svm(vcpu)->asid_generation--;
 }
 
-static inline void flush_guest_tlb(struct kvm_vcpu *vcpu)
-{
-   force_new_asid(vcpu);
-}
-
 static int get_npt_level(void)
 {
 #ifdef CONFIG_X86_64


[COMMIT master] KVM: MMU: fix accessed bit set on prefault path

2010-12-13 Thread Avi Kivity
From: Xiao Guangrong 

Retrying a #PF is the speculative path, so don't set the accessed bit.

Signed-off-by: Xiao Guangrong 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4954de9..04f9033 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2214,7 +2214,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-   int map_writable, int level, gfn_t gfn, pfn_t pfn)
+   int map_writable, int level, gfn_t gfn, pfn_t pfn,
+   bool prefault)
 {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -2229,7 +2230,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
pte_access &= ~ACC_WRITE_MASK;
mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
 0, write, 1, &pt_write,
-level, gfn, pfn, false, map_writable);
+level, gfn, pfn, prefault, map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
@@ -2321,7 +2322,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
-   r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn);
+   r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
+prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
 
@@ -2684,7 +2686,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
r = __direct_map(vcpu, gpa, write, map_writable,
-level, gfn, pfn);
+level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;


[COMMIT master] KVM: MMU: rename 'no_apf' to 'prefault'

2010-12-13 Thread Avi Kivity
From: Xiao Guangrong 

If 'no_apf = 1' the fault is taken on the speculative path, and a later
patch will handle this path specially, so 'prefault' fits the meaning better.

Signed-off-by: Xiao Guangrong 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cfbcbfa..f7e5066 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -241,7 +241,8 @@ struct kvm_mmu {
void (*new_cr3)(struct kvm_vcpu *vcpu);
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
-   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool no_apf);
+   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err,
+ bool prefault);
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
  struct x86_exception *fault);
void (*free)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d75ba1e..4954de9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2284,11 +2284,11 @@ static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn)
return 1;
 }
 
-static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
+static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 gva_t gva, pfn_t *pfn, bool write, bool *writable);
 
 static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
-bool no_apf)
+bool prefault)
 {
int r;
int level;
@@ -2310,7 +2310,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
 
-   if (try_async_pf(vcpu, no_apf, gfn, v, &pfn, write, &map_writable))
+   if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable))
return 0;
 
/* mmio */
@@ -2583,7 +2583,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
 }
 
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
-   u32 error_code, bool no_apf)
+   u32 error_code, bool prefault)
 {
gfn_t gfn;
int r;
@@ -2599,7 +2599,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
gfn = gva >> PAGE_SHIFT;
 
return nonpaging_map(vcpu, gva & PAGE_MASK,
-error_code & PFERR_WRITE_MASK, gfn, no_apf);
+error_code & PFERR_WRITE_MASK, gfn, prefault);
 }
 
 static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
@@ -2621,7 +2621,7 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
return kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
-static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
+static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 gva_t gva, pfn_t *pfn, bool write, bool *writable)
 {
bool async;
@@ -2633,7 +2633,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
 
put_page(pfn_to_page(*pfn));
 
-   if (!no_apf && can_do_async_pf(vcpu)) {
+   if (!prefault && can_do_async_pf(vcpu)) {
trace_kvm_try_async_get_page(gva, gfn);
if (kvm_find_async_pf_gfn(vcpu, gfn)) {
trace_kvm_async_pf_doublefault(gva, gfn);
@@ -2649,7 +2649,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
 }
 
 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
- bool no_apf)
+ bool prefault)
 {
pfn_t pfn;
int r;
@@ -2673,7 +2673,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
 
-   if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn, write, &map_writable))
+   if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
return 0;
 
/* mmio */
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d5a0a11..52b3e91 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -539,7 +539,7 @@ out_gpte_changed:
  *   a negative value on error.
  */
 static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
-bool no_apf)
+bool prefault)
 {
int write_fault = error_code & PFERR_WRITE_MASK;
int user_fault = error_code & PFERR_USER_MASK;
@@ -581,7 +581,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
 
-   if (try_async_pf(vcpu, no_apf, walker.gfn, addr, &pf

[COMMIT master] KVM: SVM: Add clean-bit for LBR state

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean-bit for all LBR related
state. This includes the debugctl, br_from, br_to,
last_excp_from, and last_excp_to msrs.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e5db339..05ae90a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -197,6 +197,7 @@ enum {
VMCB_DT, /* GDT, IDT */
VMCB_SEG,/* CS, DS, SS, ES, CPL */
VMCB_CR2,/* CR2 only */
+   VMCB_LBR,/* DBGCTL, BR_FROM, BR_TO, LAST_EX_FROM, LAST_EX_TO */
VMCB_DIRTY_MAX,
 };
 
@@ -2847,6 +2848,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
return 1;
 
svm->vmcb->save.dbgctl = data;
+   mark_dirty(svm->vmcb, VMCB_LBR);
if (data & (1ULL<<0))
svm_enable_lbrv(svm);
else


[COMMIT master] KVM: SVM: Add clean-bit for CR2 register

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean-bit for the cr2 register in
the vmcb.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 85d3350..e5db339 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -196,11 +196,12 @@ enum {
VMCB_DR, /* DR6, DR7 */
VMCB_DT, /* GDT, IDT */
VMCB_SEG,/* CS, DS, SS, ES, CPL */
+   VMCB_CR2,/* CR2 only */
VMCB_DIRTY_MAX,
 };
 
-/* TPR is always written before VMRUN */
-#define VMCB_ALWAYS_DIRTY_MASK (1U << VMCB_INTR)
+/* TPR and CR2 are always written before VMRUN */
+#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {


[COMMIT master] KVM: SVM: Add clean-bit for DR6 and DR7

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean-bit for the dr6 and dr7
debug registers in the vmcb.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 135727c..89be0d6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -193,6 +193,7 @@ enum {
VMCB_INTR,   /* int_ctl, int_vector */
VMCB_NPT,/* npt_en, nCR3, gPAT */
VMCB_CR, /* CR0, CR3, CR4, EFER */
+   VMCB_DR, /* DR6, DR7 */
VMCB_DIRTY_MAX,
 };
 
@@ -1484,6 +1485,8 @@ static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
else
svm->vmcb->save.dr7 = vcpu->arch.dr7;
 
+   mark_dirty(svm->vmcb, VMCB_DR);
+
update_db_intercept(vcpu);
 }
 
@@ -1506,6 +1509,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
struct vcpu_svm *svm = to_svm(vcpu);
 
svm->vmcb->save.dr7 = value;
+   mark_dirty(svm->vmcb, VMCB_DR);
 }
 
 static int pf_interception(struct vcpu_svm *svm)


[COMMIT master] KVM: SVM: Add clean-bit for Segments and CPL

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean-bit defined for the cs, ds,
ss, and es segments and the current cpl saved in the vmcb.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bb640ae..85d3350 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -195,6 +195,7 @@ enum {
VMCB_CR, /* CR0, CR3, CR4, EFER */
VMCB_DR, /* DR6, DR7 */
VMCB_DT, /* GDT, IDT */
+   VMCB_SEG,/* CS, DS, SS, ES, CPL */
VMCB_DIRTY_MAX,
 };
 
@@ -1457,6 +1458,7 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
	svm->vmcb->save.cpl
		= (svm->vmcb->save.cs.attrib
		   >> SVM_SELECTOR_DPL_SHIFT) & 3;
 
+   mark_dirty(svm->vmcb, VMCB_SEG);
 }
 
 static void update_db_intercept(struct kvm_vcpu *vcpu)


[COMMIT master] KVM: SVM: Add clean-bit for control registers

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the CRx clean-bit for the vmcb. This
bit covers cr0, cr3, cr4, and efer.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2a63dfa..135727c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -192,6 +192,7 @@ enum {
VMCB_ASID,   /* ASID */
VMCB_INTR,   /* int_ctl, int_vector */
VMCB_NPT,/* npt_en, nCR3, gPAT */
+   VMCB_CR, /* CR0, CR3, CR4, EFER */
VMCB_DIRTY_MAX,
 };
 
@@ -441,6 +442,7 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
efer &= ~EFER_LME;
 
to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
+   mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
 }
 
 static int is_external_interrupt(u32 info)
@@ -1338,6 +1340,7 @@ static void update_cr0_intercept(struct vcpu_svm *svm)
*hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK)
| (gcr0 & SVM_CR0_SELECTIVE_MASK);
 
+   mark_dirty(svm->vmcb, VMCB_CR);
 
if (gcr0 == *hcr0 && svm->vcpu.fpu_active) {
clr_cr_intercept(svm, INTERCEPT_CR0_READ);
@@ -1404,6 +1407,7 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 */
cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
svm->vmcb->save.cr0 = cr0;
+   mark_dirty(svm->vmcb, VMCB_CR);
update_cr0_intercept(svm);
 }
 
@@ -1420,6 +1424,7 @@ static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
cr4 |= X86_CR4_PAE;
cr4 |= host_cr4_mce;
to_svm(vcpu)->vmcb->save.cr4 = cr4;
+   mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
 }
 
 static void svm_set_segment(struct kvm_vcpu *vcpu,
@@ -3547,6 +3552,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
struct vcpu_svm *svm = to_svm(vcpu);
 
svm->vmcb->save.cr3 = root;
+   mark_dirty(svm->vmcb, VMCB_CR);
force_new_asid(vcpu);
 }
 
@@ -3559,6 +3565,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 
/* Also sync guest cr3 here in case we live migrate */
svm->vmcb->save.cr3 = vcpu->arch.cr3;
+   mark_dirty(svm->vmcb, VMCB_CR);
 
force_new_asid(vcpu);
 }


[COMMIT master] KVM: SVM: Add clean-bit for NPT state

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean-bit for all nested paging
related state in the vmcb.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b98092d..2a63dfa 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -191,6 +191,7 @@ enum {
VMCB_PERM_MAP,   /* IOPM Base and MSRPM Base */
VMCB_ASID,   /* ASID */
VMCB_INTR,   /* int_ctl, int_vector */
+   VMCB_NPT,/* npt_en, nCR3, gPAT */
VMCB_DIRTY_MAX,
 };
 
@@ -1749,6 +1750,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
struct vcpu_svm *svm = to_svm(vcpu);
 
svm->vmcb->control.nested_cr3 = root;
+   mark_dirty(svm->vmcb, VMCB_NPT);
force_new_asid(vcpu);
 }
 
@@ -3553,6 +3555,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
struct vcpu_svm *svm = to_svm(vcpu);
 
svm->vmcb->control.nested_cr3 = root;
+   mark_dirty(svm->vmcb, VMCB_NPT);
 
/* Also sync guest cr3 here in case we live migrate */
svm->vmcb->save.cr3 = vcpu->arch.cr3;


[COMMIT master] KVM: SVM: Add clean-bit for the ASID

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean-bit for the asid in the
vmcb.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1802f7c..a3fd9ba 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -189,6 +189,7 @@ enum {
VMCB_INTERCEPTS, /* Intercept vectors, TSC offset,
pause filter count */
VMCB_PERM_MAP,   /* IOPM Base and MSRPM Base */
+   VMCB_ASID,   /* ASID */
VMCB_DIRTY_MAX,
 };
 
@@ -1488,6 +1489,8 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
 
svm->asid_generation = sd->asid_generation;
svm->vmcb->control.asid = sd->next_asid++;
+
+   mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
 static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)


[COMMIT master] KVM: SVM: Add clean-bit for interrupt state

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch implements the clean-bit for all interrupt-related
state in the vmcb, which corresponds to vmcb offsets
0x60-0x67.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a3fd9ba..b98092d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -190,10 +190,12 @@ enum {
pause filter count */
VMCB_PERM_MAP,   /* IOPM Base and MSRPM Base */
VMCB_ASID,   /* ASID */
+   VMCB_INTR,   /* int_ctl, int_vector */
VMCB_DIRTY_MAX,
 };
 
-#define VMCB_ALWAYS_DIRTY_MASK 0U
+/* TPR is always written before VMRUN */
+#define VMCB_ALWAYS_DIRTY_MASK (1U << VMCB_INTR)
 
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {
@@ -2508,6 +2510,8 @@ static int clgi_interception(struct vcpu_svm *svm)
svm_clear_vintr(svm);
svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
 
+   mark_dirty(svm->vmcb, VMCB_INTR);
+
return 1;
 }
 
@@ -2878,6 +2882,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
svm_clear_vintr(svm);
svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+   mark_dirty(svm->vmcb, VMCB_INTR);
/*
 * If the user space waits to inject interrupts, exit as soon as
 * possible
@@ -3169,6 +3174,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
control->int_ctl &= ~V_INTR_PRIO_MASK;
control->int_ctl |= V_IRQ_MASK |
((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
+   mark_dirty(svm->vmcb, VMCB_INTR);
 }
 
 static void svm_set_irq(struct kvm_vcpu *vcpu)


[COMMIT master] KVM: SVM: Add clean-bit for intercepts, tsc-offset and pause filter count

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch adds the clean-bit for the intercept vectors, the
TSC offset, and the pause-filter count in the appropriate
places. The IO and MSR permission bitmaps are not covered by
this bit.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0904c11..609f661 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -186,6 +186,8 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, 
unsigned nr,
  bool has_error_code, u32 error_code);
 
 enum {
+   VMCB_INTERCEPTS, /* Intercept vectors, TSC offset,
+   pause filter count */
VMCB_DIRTY_MAX,
 };
 
@@ -217,6 +219,8 @@ static void recalc_intercepts(struct vcpu_svm *svm)
struct vmcb_control_area *c, *h;
struct nested_state *g;
 
+   mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
+
if (!is_guest_mode(&svm->vcpu))
return;
 
@@ -854,6 +858,8 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
}
 
svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
+
+   mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
 static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
@@ -863,6 +869,7 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
svm->vmcb->control.tsc_offset += adjustment;
if (is_guest_mode(vcpu))
svm->nested.hsave->control.tsc_offset += adjustment;
+   mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
 static void init_vmcb(struct vcpu_svm *svm)


[COMMIT master] KVM: SVM: Add clean-bit for IOPM_BASE and MSRPM_BASE

2010-12-13 Thread Avi Kivity
From: Joerg Roedel 

This patch adds the clean-bit for the physical addresses of
the MSRPM and the IOPM. It never needs to be marked dirty in
the code because the only place these values change is the
nested-svm vmrun and vmexit path, and those functions already
mark the complete VMCB as dirty.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 609f661..1802f7c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -188,6 +188,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
 enum {
VMCB_INTERCEPTS, /* Intercept vectors, TSC offset,
pause filter count */
+   VMCB_PERM_MAP,   /* IOPM Base and MSRPM Base */
VMCB_DIRTY_MAX,
 };
 


[COMMIT master] KVM: SVM: Add clean-bits infrastructure code

2010-12-13 Thread Avi Kivity
From: Roedel, Joerg 

This patch adds the infrastructure for the implementation of
the individual clean-bits.

Signed-off-by: Joerg Roedel 
Signed-off-by: Avi Kivity 

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 11dbca7..235dd73 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -79,7 +79,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 event_inj_err;
u64 nested_cr3;
u64 lbr_ctl;
-   u64 reserved_5;
+   u32 clean;
+   u32 reserved_5;
u64 next_rip;
u8 reserved_6[816];
 };
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ae943bb..0904c11 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -185,6 +185,28 @@ static int nested_svm_vmexit(struct vcpu_svm *svm);
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
  bool has_error_code, u32 error_code);
 
+enum {
+   VMCB_DIRTY_MAX,
+};
+
+#define VMCB_ALWAYS_DIRTY_MASK 0U
+
+static inline void mark_all_dirty(struct vmcb *vmcb)
+{
+   vmcb->control.clean = 0;
+}
+
+static inline void mark_all_clean(struct vmcb *vmcb)
+{
+   vmcb->control.clean = ((1 << VMCB_DIRTY_MAX) - 1)
+  & ~VMCB_ALWAYS_DIRTY_MASK;
+}
+
+static inline void mark_dirty(struct vmcb *vmcb, int bit)
+{
+   vmcb->control.clean &= ~(1 << bit);
+}
+
 static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -973,6 +995,8 @@ static void init_vmcb(struct vcpu_svm *svm)
set_intercept(svm, INTERCEPT_PAUSE);
}
 
+   mark_all_dirty(svm->vmcb);
+
enable_gif(svm);
 }
 
@@ -1089,6 +1113,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
if (unlikely(cpu != vcpu->cpu)) {
svm->asid_generation = 0;
+   mark_all_dirty(svm->vmcb);
}
 
 #ifdef CONFIG_X86_64
@@ -2140,6 +2165,8 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
svm->vmcb->save.cpl = 0;
svm->vmcb->control.exit_int_info = 0;
 
+   mark_all_dirty(svm->vmcb);
+
nested_svm_unmap(page);
 
nested_svm_uninit_mmu_context(&svm->vcpu);
@@ -2351,6 +2378,8 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 
enable_gif(svm);
 
+   mark_all_dirty(svm->vmcb);
+
return true;
 }
 
@@ -3488,6 +3517,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
if (unlikely(svm->vmcb->control.exit_code ==
 SVM_EXIT_EXCP_BASE + MC_VECTOR))
svm_handle_mce(svm);
+
+   mark_all_clean(svm->vmcb);
 }
 
 #undef R


[COMMIT master] pci: Error on PCI capability collisions

2010-12-13 Thread Avi Kivity
From: Alex Williamson 

Nothing good can happen when we overlap capabilities

Signed-off-by: Alex Williamson 
Signed-off-by: Avi Kivity 

diff --git a/hw/pci.c b/hw/pci.c
index b08113d..288d6fd 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1845,6 +1845,20 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
 if (!offset) {
 return -ENOSPC;
 }
+} else {
+int i;
+
+for (i = offset; i < offset + size; i++) {
+if (pdev->config_map[i]) {
+fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
+"Attempt to add PCI capability %x at offset "
+"%x overlaps existing capability %x at offset %x\n",
+pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
+PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
+cap_id, offset, pdev->config_map[i], i);
+return -EFAULT;
+}
+}
 }
 
 config = pdev->config + offset;


[COMMIT master] device-assignment: Error checking when adding capabilities

2010-12-13 Thread Avi Kivity
From: Alex Williamson 

Signed-off-by: Alex Williamson 
Signed-off-by: Avi Kivity 

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 1a90a89..0ae04de 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1288,7 +1288,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 {
 AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
 PCIRegion *pci_region = dev->real_device.regions;
-int pos;
+int ret, pos;
 
 /* Clear initial capabilities pointer and status copied from hw */
 pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0);
@@ -1303,7 +1303,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) {
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
 /* Only 32-bit/no-mask currently supported */
-pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10);
+if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10)) < 0) {
+return ret;
+}
 
 pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS,
  pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) &
@@ -1325,7 +1327,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 uint32_t msix_table_entry;
 
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
-pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12);
+if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12)) < 0) {
+return ret;
+}
 
 pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
  pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &


[COMMIT master] device-assignment: pass through and stub more PCI caps

2010-12-13 Thread Avi Kivity
From: Alex Williamson 

Some drivers depend on finding capabilities like power management,
PCI express/X, vital product data, or vendor specific fields.  Now
that we have better capability support, we can pass more of these
tables through to the guest.  Note that VPD and VNDR are direct pass
through capabilies, the rest are mostly empty shells with a few
writable bits where necessary.

It may be possible to consolidate dummy capabilities into common files
for other drivers to use, but I prefer to leave them here for now as
we figure out what bits to handle directly with hardware and what bits
are purely emulated.

Signed-off-by: Alex Williamson 
Signed-off-by: Avi Kivity 

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 0ae04de..50c6408 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -67,6 +67,9 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
  uint32_t address,
  uint32_t val, int len);
 
+static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
+uint32_t address, int len);
+
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
uint32_t addr, int len, uint32_t *val)
 {
@@ -370,11 +373,32 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
 return (uint8_t)assigned_dev_pci_read(d, pos, 1);
 }
 
-static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
+static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
+{
+AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
+ssize_t ret;
+int fd = pci_dev->real_device.config_fd;
+
+again:
+ret = pwrite(fd, &val, len, pos);
+if (ret != len) {
+   if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
+   goto again;
+
+   fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
+   __func__, ret, errno);
+
+   exit(1);
+}
+
+return;
+}
+
+static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start)
 {
 int id;
 int max_cap = 48;
-int pos = PCI_CAPABILITY_LIST;
+int pos = start ? start : PCI_CAPABILITY_LIST;
 int status;
 
 status = assigned_dev_pci_read_byte(d, PCI_STATUS);
@@ -453,10 +477,16 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
 ssize_t ret;
 AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
+if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
+val = assigned_device_pci_cap_read_config(d, address, len);
+DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
+  (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
+return val;
+}
+
 if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
(address >= 0x10 && address <= 0x24) || address == 0x30 ||
-address == 0x34 || address == 0x3c || address == 0x3d ||
-(address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address])) {
+address == 0x34 || address == 0x3c || address == 0x3d) {
 val = pci_default_read_config(d, address, len);
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
@@ -1251,7 +1281,70 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
 #endif
 #endif
 
-static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address,
+/* There can be multiple VNDR capabilities per device, we need to find the
+ * one that starts closest to the given address without going over. */
+static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
+{
+uint8_t cap, pos;
+
+for (cap = pos = 0;
+ (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
+ pos += PCI_CAP_LIST_NEXT) {
+if (pos <= address) {
+cap = MAX(pos, cap);
+}
+}
+return cap;
+}
+
+/* Merge the bits set in mask from mval into val.  Both val and mval are
+ * at the same addr offset, pos is the starting offset of the mask. */
+static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
+   int len, uint8_t pos, uint32_t mask)
+{
+if (!ranges_overlap(addr, len, pos, 4)) {
+return val;
+}
+
+if (addr >= pos) {
+mask >>= (addr - pos) * 8;
+} else {
+mask <<= (pos - addr) * 8;
+}
+mask &= 0xffffffffU >> (4 - len) * 8;
+
+val &= ~mask;
+val |= (mval & mask);
+
+return val;
+}
+
+static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
+uint32_t address, int len)
+{
+uint8_t cap, cap_id = pci_dev->config_map[address];
+uint32_t val;
+
+switch (cap_id) {
+
+case PCI_CA

[COMMIT master] pci: Remove PCI_CAPABILITY_CONFIG_*

2010-12-13 Thread Avi Kivity
From: Alex Williamson 

Half of these aren't used anywhere; the other half are wrong.  Now that
device assignment is trying to match physical hardware offsets for PCI
capabilities, we can't round up the MSI and MSI-X length.  MSI-X is
always 12 bytes.  MSI is variable length depending on features, but for
the current device assignment implementation, it's always the minimum
length of 10 bytes.

Signed-off-by: Alex Williamson 
Signed-off-by: Avi Kivity 

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 6d6e657..1a90a89 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1302,10 +1302,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
  * MSI capability is the 1st capability in capability config */
 if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) {
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
-pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos,
-   PCI_CAPABILITY_CONFIG_MSI_LENGTH);
-
 /* Only 32-bit/no-mask currently supported */
+pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10);
+
 pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS,
  pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) &
  PCI_MSI_FLAGS_QMASK);
@@ -1326,8 +1325,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 uint32_t msix_table_entry;
 
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
-pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos,
-   PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12);
 
 pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
  pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
diff --git a/hw/pci.h b/hw/pci.h
index 34955d8..d579738 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -122,11 +122,6 @@ enum {
 QEMU_PCI_CAP_MULTIFUNCTION = (1 << QEMU_PCI_CAP_MULTIFUNCTION_BITNR),
 };
 
-#define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60
-#define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40
-#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10
-#define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x10
-
 typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector,
   int masked);
 


[COMMIT master] device-assignment: Fix off-by-one in header check

2010-12-13 Thread Avi Kivity
From: Alex Williamson 

Include the first byte at 40h, or else accesses might go to the
hardware instead of the emulated config space; since the ordering
differs, that can result in capability loops.

Signed-off-by: Alex Williamson 
Signed-off-by: Avi Kivity 

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 832c236..6d6e657 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -410,7 +410,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
   ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
   (uint16_t) address, val, len);
 
-if (address > PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
+if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
 return assigned_device_pci_cap_write_config(d, address, val, len);
 }
 
@@ -456,7 +456,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
 if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
(address >= 0x10 && address <= 0x24) || address == 0x30 ||
 address == 0x34 || address == 0x3c || address == 0x3d ||
-(address > PCI_CONFIG_HEADER_SIZE && d->config_map[address])) {
+(address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address])) {
 val = pci_default_read_config(d, address, len);
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);