[PATCH v2] ARM: mm: Fix stage-2 device memory attributes

2014-01-04 Thread Christoffer Dall
The stage-2 memory attributes are distinct from the Hyp memory
attributes and the Stage-1 memory attributes.  We were using the stage-1
memory attributes for stage-2 mappings causing device mappings to be
mapped as normal memory.  Add the S2 equivalent defines for memory
attributes and fix the comments explaining the defines while at it.

Add a prot_pte_s2 field to the mem_type struct and fill out the field
for device mappings accordingly.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changelog[v2]:
 - Guard the use of L_PTE_S2 defines with s2_policy to allow non-LPAE compiles.

 arch/arm/include/asm/pgtable-3level.h | 20 +---
 arch/arm/mm/mm.h  |  1 +
 arch/arm/mm/mmu.c | 15 ++-
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 4f95039..d5e04d6 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -120,13 +120,19 @@
 /*
  * 2nd stage PTE definitions for LPAE.
  */
-#define L_PTE_S2_MT_UNCACHED     (_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */
-#define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */
-#define L_PTE_S2_MT_WRITEBACK    (_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */
-#define L_PTE_S2_RDONLY          (_AT(pteval_t, 1) << 6)   /* HAP[1]   */
-#define L_PTE_S2_RDWR            (_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
-
-#define L_PMD_S2_RDWR            (_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
+#define L_PTE_S2_MT_UNCACHED       (_AT(pteval_t, 0x0) << 2) /* strongly ordered */
+#define L_PTE_S2_MT_WRITETHROUGH   (_AT(pteval_t, 0xa) << 2) /* normal inner write-through */
+#define L_PTE_S2_MT_WRITEBACK      (_AT(pteval_t, 0xf) << 2) /* normal inner write-back */
+#define L_PTE_S2_MT_DEV_SHARED     (_AT(pteval_t, 0x1) << 2) /* device */
+#define L_PTE_S2_MT_DEV_NONSHARED  (_AT(pteval_t, 0x1) << 2) /* device */
+#define L_PTE_S2_MT_DEV_WC         (_AT(pteval_t, 0x5) << 2) /* normal non-cacheable */
+#define L_PTE_S2_MT_DEV_CACHED     (_AT(pteval_t, 0xf) << 2) /* normal inner write-back */
+#define L_PTE_S2_MT_MASK           (_AT(pteval_t, 0xf) << 2)
+
+#define L_PTE_S2_RDONLY            (_AT(pteval_t, 1) << 6)   /* HAP[1]   */
+#define L_PTE_S2_RDWR              (_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
+
+#define L_PMD_S2_RDWR              (_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 
 /*
  * Hyp-mode PL2 PTE definitions for LPAE.
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index d5a982d..7ea641b 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -38,6 +38,7 @@ static inline pmd_t *pmd_off_k(unsigned long virt)
 
 struct mem_type {
pteval_t prot_pte;
+   pteval_t prot_pte_s2;
pmdval_t prot_l1;
pmdval_t prot_sect;
unsigned int domain;
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 580ef2d..44d571f 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -231,36 +231,48 @@ __setup("noalign", noalign_setup);
 #endif /* ifdef CONFIG_CPU_CP15 / else */
 
 #define PROT_PTE_DEVICE		L_PTE_PRESENT|L_PTE_YOUNG|L_PTE_DIRTY|L_PTE_XN
+#define PROT_PTE_S2_DEVICE PROT_PTE_DEVICE
 #define PROT_SECT_DEVICE   PMD_TYPE_SECT|PMD_SECT_AP_WRITE
 
 static struct mem_type mem_types[] = {
[MT_DEVICE] = {   /* Strongly ordered / ARMv6 shared device */
.prot_pte   = PROT_PTE_DEVICE | L_PTE_MT_DEV_SHARED |
  L_PTE_SHARED,
+   .prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) |
+ s2_policy(L_PTE_S2_MT_DEV_SHARED) |
+ L_PTE_SHARED,
.prot_l1= PMD_TYPE_TABLE,
.prot_sect  = PROT_SECT_DEVICE | PMD_SECT_S,
.domain = DOMAIN_IO,
},
[MT_DEVICE_NONSHARED] = { /* ARMv6 non-shared device */
.prot_pte   = PROT_PTE_DEVICE | L_PTE_MT_DEV_NONSHARED,
+   .prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) |
+ s2_policy(L_PTE_S2_MT_DEV_NONSHARED),
.prot_l1= PMD_TYPE_TABLE,
.prot_sect  = PROT_SECT_DEVICE,
.domain = DOMAIN_IO,
},
[MT_DEVICE_CACHED] = {/* ioremap_cached */
.prot_pte   = PROT_PTE_DEVICE | L_PTE_MT_DEV_CACHED,
+   .prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) |
+ s2_policy(L_PTE_S2_MT_DEV_CACHED),
.prot_l1= PMD_TYPE_TABLE,
.prot_sect  = PROT_SECT_DEVICE | PMD_SECT_WB,
.domain = DOMAIN_IO,
},
[MT_DEVICE_WC] = {  /* ioremap_wc */
.prot_pte   = PROT_PTE_DEVICE | L_PTE_MT_DEV_WC,
+   .prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) |
+   

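The ARM patch above turns on the fact that a stage-2 descriptor carries its memory type directly in MemAttr[3:0] (bits [5:2]), so a stage-1 attribute value placed there selects the wrong type. The following is a minimal sketch of that encoding, using the values from the quoted L_PTE_S2_MT_* defines. The DEMO_* names and both helper functions are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Values mirror the L_PTE_S2_MT_* defines quoted in the patch. */
#define DEMO_S2_MT_SHIFT      2
#define DEMO_S2_MT_UNCACHED   ((pteval_t)0x0 << DEMO_S2_MT_SHIFT) /* strongly ordered */
#define DEMO_S2_MT_DEVICE     ((pteval_t)0x1 << DEMO_S2_MT_SHIFT) /* device */
#define DEMO_S2_MT_NORMAL_NC  ((pteval_t)0x5 << DEMO_S2_MT_SHIFT) /* normal non-cacheable */
#define DEMO_S2_MT_WRITEBACK  ((pteval_t)0xf << DEMO_S2_MT_SHIFT) /* normal inner write-back */
#define DEMO_S2_MT_MASK       ((pteval_t)0xf << DEMO_S2_MT_SHIFT)

/* Hypothetical helper: replace the MemAttr field of a stage-2 descriptor. */
static pteval_t demo_s2_set_memattr(pteval_t pte, pteval_t attr)
{
	/* The attribute must not touch bits outside MemAttr[3:0]. */
	assert((attr & ~DEMO_S2_MT_MASK) == 0);
	return (pte & ~DEMO_S2_MT_MASK) | attr;
}

/*
 * Hypothetical helper: non-zero if the descriptor uses one of the two
 * encodings the patch comments label "strongly ordered" or "device".
 */
static int demo_s2_is_device(pteval_t pte)
{
	pteval_t attr = pte & DEMO_S2_MT_MASK;

	return attr == DEMO_S2_MT_UNCACHED || attr == DEMO_S2_MT_DEVICE;
}
```

With these definitions, the old L_PTE_S2_MT_UNCACHED value of 0x5 would be classified as normal non-cacheable memory rather than device memory, which is the mismatch the patch description refers to.
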
[PATCH 0/3] trace-cmd: Updates for kvm plugin

2014-01-04 Thread Jan Kiszka
Patch 1 is resent unchanged from a previous round, patches 2 and 3
improve the output of nested vmexit tracepoints.

Jan Kiszka (3):
  trace-cmd: Report unknown VMX exit reasons with code
  trace-cmd: Factor out print_exit_reason in kvm plugin
  trace-cmd: Fix and cleanup kvm_nested_vmexit tracepoints

 plugin_kvm.c | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

-- 
1.8.1.1.298.ge7eed54

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] trace-cmd: Fix and cleanup kvm_nested_vmexit tracepoints

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Fix several issues of kvm_nested_vmexit[_inject]: field widths aren't
supported by pevent_print, rip was printed twice/incorrectly, the SVM ISA
was hard-coded, and we don't use ':' to separate field names.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 plugin_kvm.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index c407e55..cb52e9e 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -330,19 +330,13 @@ static int kvm_emulate_insn_handler(struct trace_seq *s, struct pevent_record *r
 static int kvm_nested_vmexit_inject_handler(struct trace_seq *s, struct pevent_record *record,
 					    struct event_format *event, void *context)
 {
-	unsigned long long val;
-
-	pevent_print_num_field(s, " rip %0x016llx", event, "rip", record, 1);
-
-	if (pevent_get_field_val(s, event, "exit_code", record, &val, 1) < 0)
+	if (print_exit_reason(s, record, event, "exit_code") < 0)
 		return -1;
 
-	trace_seq_printf(s, "reason %s", find_exit_reason(2, val));
-
-	pevent_print_num_field(s, " ext_inf1: %0x016llx", event, "exit_info1", record, 1);
-	pevent_print_num_field(s, " ext_inf2: %0x016llx", event, "exit_info2", record, 1);
-	pevent_print_num_field(s, " ext_int: %0x016llx", event, "exit_int_info", record, 1);
-	pevent_print_num_field(s, " ext_int_err: %0x016llx", event, "exit_int_info_err", record, 1);
+	pevent_print_num_field(s, " info1 %llx", event, "exit_info1", record, 1);
+	pevent_print_num_field(s, " info2 %llx", event, "exit_info2", record, 1);
+	pevent_print_num_field(s, " int_info %llx", event, "exit_int_info", record, 1);
+	pevent_print_num_field(s, " int_info_err %llx", event, "exit_int_info_err", record, 1);
 
 	return 0;
 }
@@ -350,7 +344,7 @@ static int kvm_nested_vmexit_inject_handler(struct trace_seq *s, struct pevent_r
 static int kvm_nested_vmexit_handler(struct trace_seq *s, struct pevent_record *record,
 				     struct event_format *event, void *context)
 {
-	pevent_print_num_field(s, " rip %0x016llx", event, "rip", record, 1);
+	pevent_print_num_field(s, "rip %lx ", event, "rip", record, 1);
 
 	return kvm_nested_vmexit_inject_handler(s, record, event, context);
 }
-- 
1.8.1.1.298.ge7eed54



[PATCH 2/3] trace-cmd: Factor out print_exit_reason in kvm plugin

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

We will reuse it for nested vmexit tracepoints.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 plugin_kvm.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index 59443e5..c407e55 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -244,15 +244,14 @@ static const char *find_exit_reason(unsigned isa, int val)
 	return strings[i].str;
 }
 
-static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record,
-			    struct event_format *event, void *context)
+static int print_exit_reason(struct trace_seq *s, struct pevent_record *record,
+			     struct event_format *event, const char *field)
 {
 	unsigned long long isa;
 	unsigned long long val;
-	unsigned long long info1 = 0, info2 = 0;
 	const char *reason;
 
-	if (pevent_get_field_val(s, event, "exit_reason", record, &val, 1) < 0)
+	if (pevent_get_field_val(s, event, field, record, &val, 1) < 0)
 		return -1;
 
 	if (pevent_get_field_val(s, event, "isa", record, &isa, 0) < 0)
@@ -263,6 +262,16 @@ static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record,
 		trace_seq_printf(s, "reason %s", reason);
 	else
 		trace_seq_printf(s, "reason UNKNOWN (%llu)", val);
+	return 0;
+}
+
+static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record,
+			    struct event_format *event, void *context)
+{
+	unsigned long long info1 = 0, info2 = 0;
+
+	if (print_exit_reason(s, record, event, "exit_reason") < 0)
+		return -1;
 
 	pevent_print_num_field(s, " rip 0x%lx", event, "guest_rip", record, 1);
 
-- 
1.8.1.1.298.ge7eed54



[PATCH 1/3] trace-cmd: Report unknown VMX exit reasons with code

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This allows parsing the result even if the KVM plugin does not yet
understand a specific exit code.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 plugin_kvm.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/plugin_kvm.c b/plugin_kvm.c
index 8a25cf1..59443e5 100644
--- a/plugin_kvm.c
+++ b/plugin_kvm.c
@@ -240,9 +240,8 @@ static const char *find_exit_reason(unsigned isa, int val)
 	for (i = 0; strings[i].val >= 0; i++)
 		if (strings[i].val == val)
 			break;
-	if (strings[i].str)
-		return strings[i].str;
-	return "UNKNOWN";
+
+	return strings[i].str;
 }
 
 static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record,
@@ -251,6 +250,7 @@ static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record,
 	unsigned long long isa;
 	unsigned long long val;
 	unsigned long long info1 = 0, info2 = 0;
+	const char *reason;
 
 	if (pevent_get_field_val(s, event, "exit_reason", record, &val, 1) < 0)
 		return -1;
@@ -258,7 +258,11 @@ static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record,
 	if (pevent_get_field_val(s, event, "isa", record, &isa, 0) < 0)
 		isa = 1;
 
-	trace_seq_printf(s, "reason %s", find_exit_reason(isa, val));
+	reason = find_exit_reason(isa, val);
+	if (reason)
+		trace_seq_printf(s, "reason %s", reason);
+	else
+		trace_seq_printf(s, "reason UNKNOWN (%llu)", val);
 
 	pevent_print_num_field(s, " rip 0x%lx", event, "guest_rip", record, 1);
 
-- 
1.8.1.1.298.ge7eed54


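The behaviour introduced by patch 1/3 (the lookup returns NULL for an unknown code and the caller prints the raw number) can be tried outside trace-cmd with a small standalone sketch. The table contents, the demo_* names and the use of snprintf instead of trace_seq_printf are assumptions made for illustration; only the control flow follows the quoted diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_reason {
	int val;
	const char *str;
};

/* Illustrative table only; terminated by a negative val with a NULL string. */
static const struct demo_reason demo_reasons[] = {
	{ 0, "EXCEPTION_NMI" },
	{ 1, "EXTERNAL_INTERRUPT" },
	{ 12, "HLT" },
	{ -1, NULL },
};

/* Returns the name for val, or NULL when the terminator is reached. */
static const char *demo_find_reason(int val)
{
	int i;

	for (i = 0; demo_reasons[i].val >= 0; i++)
		if (demo_reasons[i].val == val)
			break;

	return demo_reasons[i].str;
}

/* Formats "reason NAME" or, for unknown codes, "reason UNKNOWN (code)". */
static int demo_format_reason(char *buf, size_t len, unsigned long long val)
{
	const char *reason;

	assert(buf != NULL && len > 0);
	reason = demo_find_reason((int)val);
	if (reason)
		return snprintf(buf, len, "reason %s", reason);
	return snprintf(buf, len, "reason UNKNOWN (%llu)", val);
}
```

Because the numeric code survives in the output, a script consuming the trace can still group or filter on exit reasons the plugin has no name for.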

[PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX

2014-01-04 Thread Jan Kiszka
This is on top of next after merging in the two patches of mine that are
only present in master ATM.

Highlights:
 - reworked fix of DR6 reading on SVM
 - full check for invalid writes to IA32_APIC_BASE
 - fixed support for halting in L2 (nVMX)
 - fully emulated preemption timer (nVMX)
 - tracing of nested vmexits (nVMX)

The patch "KVM: nVMX: Leave VMX mode on clearing of feature control MSR"
is included again, unchanged from the previous posting.

Most fixes are backed by KVM unit tests, to be posted soon as well.

Jan Kiszka (12):
  KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
  KVM: SVM: Fix reading of DR6
  KVM: VMX: Fix DR6 update on #DB exception
  KVM: x86: Validate guest writes to MSR_IA32_APICBASE
  KVM: nVMX: Leave VMX mode on clearing of feature control MSR
  KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
  KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
  KVM: nVMX: Clean up handling of VMX-related MSRs
  KVM: nVMX: Fix nested_run_pending on activity state HLT
  KVM: nVMX: Update guest activity state field on L2 exits
  KVM: nVMX: Rework interception of IRQs and NMIs
  KVM: nVMX: Fully emulate preemption timer

 arch/x86/include/asm/kvm_host.h   |   4 +
 arch/x86/include/uapi/asm/msr-index.h |   1 +
 arch/x86/kvm/cpuid.h  |   8 +
 arch/x86/kvm/lapic.h  |   2 +-
 arch/x86/kvm/svm.c|  15 ++
 arch/x86/kvm/vmx.c| 399 --
 arch/x86/kvm/x86.c|  67 +-
 7 files changed, 318 insertions(+), 178 deletions(-)

-- 
1.8.1.1.298.ge7eed54



[PATCH 01/12] KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Whenever we change arch.dr7, we also have to call kvm_update_dr7. In
case guest debugging is off, this will synchronize the new state into
hardware.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1dc0359..5f75230 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2988,6 +2988,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 	memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db));
 	vcpu->arch.dr6 = dbgregs->dr6;
 	vcpu->arch.dr7 = dbgregs->dr7;
+	kvm_update_dr7(vcpu);
 
 	return 0;
 }
-- 
1.8.1.1.298.ge7eed54



[PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Set guest activity state in L1's VMCS according to the VCPU's mp_state.
This ensures we report the correct state in case L2 executed HLT or
if we put L2 into HLT state and it was now woken up by an event.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bde8ddd..1245ff1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8223,6 +8223,10 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
+	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
+		vmcs12->guest_activity_state = GUEST_ACTIVITY_HLT;
+	else
+		vmcs12->guest_activity_state = GUEST_ACTIVITY_ACTIVE;
 
 	if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
 	    (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
-- 
1.8.1.1.298.ge7eed54



[PATCH 02/12] KVM: SVM: Fix reading of DR6

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

In contrast to VMX, SVM does not automatically transfer DR6 into the
VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
hook to obtain the current value. And as SVM now picks the DR6 state
from its VMCB, we also need a set callback in order to write updates of
DR6 back.

Fixes a regression of 020df0794f.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm.c  | 15 +++
 arch/x86/kvm/vmx.c  | 11 +++
 arch/x86/kvm/x86.c  | 19 +--
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae5d783..e73651b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -699,6 +699,8 @@ struct kvm_x86_ops {
void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+   u64 (*get_dr6)(struct kvm_vcpu *vcpu);
+   void (*set_dr6)(struct kvm_vcpu *vcpu, unsigned long value);
void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c7168a5..e81df8f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1671,6 +1671,19 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
 	mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
+static u64 svm_get_dr6(struct kvm_vcpu *vcpu)
+{
+	return to_svm(vcpu)->vmcb->save.dr6;
+}
+
+static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->save.dr6 = value;
+	mark_dirty(svm->vmcb, VMCB_DR);
+}
+
 static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4286,6 +4299,8 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_idt = svm_set_idt,
.get_gdt = svm_get_gdt,
.set_gdt = svm_set_gdt,
+   .get_dr6 = svm_get_dr6,
+   .set_dr6 = svm_set_dr6,
.set_dr7 = svm_set_dr7,
.cache_reg = svm_cache_reg,
.get_rflags = svm_get_rflags,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e947cba..55cb4b6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5149,6 +5149,15 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static u64 vmx_get_dr6(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.dr6;
+}
+
+static void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val)
+{
+}
+
 static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 {
 	vmcs_writel(GUEST_DR7, val);
@@ -8558,6 +8567,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.set_idt = vmx_set_idt,
.get_gdt = vmx_get_gdt,
.set_gdt = vmx_set_gdt,
+   .get_dr6 = vmx_get_dr6,
+   .set_dr6 = vmx_set_dr6,
.set_dr7 = vmx_set_dr7,
.cache_reg = vmx_cache_reg,
.get_rflags = vmx_get_rflags,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f75230..ea7c6a5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -719,6 +719,12 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
+static void kvm_update_dr6(struct kvm_vcpu *vcpu)
+{
+	if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
+		kvm_x86_ops->set_dr6(vcpu, vcpu->arch.dr6);
+}
+
 static void kvm_update_dr7(struct kvm_vcpu *vcpu)
 {
 	unsigned long dr7;
@@ -747,6 +753,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 		if (val & 0xffffffff00000000ULL)
 			return -1; /* #GP */
 		vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
+		kvm_update_dr6(vcpu);
 		break;
 	case 5:
 		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
@@ -788,7 +795,10 @@ static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
 			return 1;
 		/* fall through */
 	case 6:
-		*val = vcpu->arch.dr6;
+		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+			*val = vcpu->arch.dr6;
+		else
+			*val = kvm_x86_ops->get_dr6(vcpu);
 		break;
 	case 5:
 		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
@@ -2972,8 +2982,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
 					     struct kvm_debugregs *dbgregs)
 {
+	unsigned long val;
+
 	memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
-	dbgregs->dr6 

[PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handler into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback has the task to switch from L2 to L1 if needed
and inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx.c  | 67 +++--
 arch/x86/kvm/x86.c  | 15 +++--
 3 files changed, 53 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e73651b..d195421 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -764,6 +764,8 @@ struct kvm_x86_ops {
   struct x86_instruction_info *info,
   enum x86_intercept_stage stage);
void (*handle_external_intr)(struct kvm_vcpu *vcpu);
+
+   int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1245ff1..ec8a976 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4620,22 +4620,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu)) {
-		if (to_vmx(vcpu)->nested.nested_run_pending)
-			return 0;
-		if (nested_exit_on_nmi(vcpu)) {
-			nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
-					  NMI_VECTOR | INTR_TYPE_NMI_INTR |
-					  INTR_INFO_VALID_MASK, 0);
-			/*
-			 * The NMI-triggered VM exit counts as injection:
-			 * clear this one and block further NMIs.
-			 */
-			vcpu->arch.nmi_pending = 0;
-			vmx_set_nmi_mask(vcpu, true);
-			return 0;
-		}
-	}
+	if (to_vmx(vcpu)->nested.nested_run_pending)
+		return 0;
 
 	if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
 		return 0;
@@ -4647,19 +4633,8 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu)) {
-		if (to_vmx(vcpu)->nested.nested_run_pending)
-			return 0;
-		if (nested_exit_on_intr(vcpu)) {
-			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
-					  0, 0);
-			/*
-			 * fall through to normal code, but now in L1, not L2
-			 */
-		}
-	}
-
-	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
+	return (!to_vmx(vcpu)->nested.nested_run_pending &&
+		vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
 		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }
@@ -8158,6 +8133,35 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 	}
 }
 
+static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
+		if (vmx->nested.nested_run_pending)
+			return -EBUSY;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
+				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
+				  INTR_INFO_VALID_MASK, 0);
+		/*
+		 * The NMI-triggered VM exit counts as injection:
+		 * clear this one and block further NMIs.
+		 */
+		vcpu->arch.nmi_pending = 0;
+		vmx_set_nmi_mask(vcpu, true);
+		return 0;
+	}
+
+	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
+	    nested_exit_on_intr(vcpu)) {
+		if (vmx->nested.nested_run_pending)
+			return -EBUSY;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+	}
+
+	return 0;
+}
+
 /*
  * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
  * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
@@ -8498,6 +8502,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	nested_vmx_succeed(vcpu);
 	if (enable_shadow_vmcs)
 		vmx->nested.sync_shadow_vmcs = true;
+
+	/* in case we halted in L2 */
+   

[PATCH 12/12] KVM: nVMX: Fully emulate preemption timer

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

We cannot rely on the hardware-provided preemption timer support because
we are holding L2 in HLT outside non-root mode. Furthermore, emulating
the preemption will resolve tick rate errata on older Intel CPUs.

The emulation is based on hrtimer which is started on L2 entry, stopped
on L2 exit and evaluated via the new check_nested_events hook. As we no
longer rely on hardware features, we can enable both the preemption
timer support and value saving unconditionally.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c | 151 ++---
 1 file changed, 96 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ec8a976..51d13c7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -31,6 +31,7 @@
 #include <linux/ftrace_event.h>
 #include <linux/slab.h>
 #include <linux/tboot.h>
+#include <linux/hrtimer.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -110,6 +111,8 @@ module_param(nested, bool, S_IRUGO);
 
 #define RMODE_GUEST_OWNED_EFLAGS_BITS (~(X86_EFLAGS_IOPL | X86_EFLAGS_VM))
 
+#define VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE 5
+
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * ple_gap:    upper bound on the amount of time between two successive
@@ -374,6 +377,9 @@ struct nested_vmx {
 	 */
 	struct page *apic_access_page;
 	u64 msr_ia32_feature_control;
+
+	struct hrtimer preemption_timer;
+	bool preemption_timer_expired;
 };
 
 #define POSTED_INTR_ON  0
@@ -1047,6 +1053,12 @@ static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12)
 	return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline bool nested_cpu_has_preemption_timer(struct vmcs12 *vmcs12)
+{
+	return vmcs12->pin_based_vm_exec_control &
+		PIN_BASED_VMX_PREEMPTION_TIMER;
+}
+
 static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
 {
 	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
@@ -2248,9 +2260,9 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	 */
 	nested_vmx_pinbased_ctls_low |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
 	nested_vmx_pinbased_ctls_high &= PIN_BASED_EXT_INTR_MASK |
-		PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS |
+		PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS;
+	nested_vmx_pinbased_ctls_high |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
 		PIN_BASED_VMX_PREEMPTION_TIMER;
-	nested_vmx_pinbased_ctls_high |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
 
 	/*
 	 * Exit controls
@@ -2265,15 +2277,10 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
+		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
 		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
-	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
-	    !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
-		nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
-		nested_vmx_pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
-	}
-	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
-		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER);
 
 	/* entry controls */
 	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
@@ -2342,9 +2349,9 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 
 	/* miscellaneous data */
 	rdmsr(MSR_IA32_VMX_MISC, nested_vmx_misc_low, nested_vmx_misc_high);
-	nested_vmx_misc_low &= VMX_MISC_PREEMPTION_TIMER_RATE_MASK |
-		VMX_MISC_SAVE_EFER_LMA;
-	nested_vmx_misc_low |= VMX_MISC_ACTIVITY_HLT;
+	nested_vmx_misc_low &= VMX_MISC_SAVE_EFER_LMA;
+	nested_vmx_misc_low |= VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
+		VMX_MISC_ACTIVITY_HLT;
 	nested_vmx_misc_high = 0;
 }
 
@@ -5702,6 +5709,18 @@ static void nested_vmx_failValid(struct kvm_vcpu *vcpu,
 	 */
 }
 
+static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer)
+{
+	struct vcpu_vmx *vmx =
+		container_of(timer, struct vcpu_vmx, nested.preemption_timer);
+
+	vmx->nested.preemption_timer_expired = true;
+	kvm_make_request(KVM_REQ_EVENT, &vmx->vcpu);
+	kvm_vcpu_kick(&vmx->vcpu);
+
+	return HRTIMER_NORESTART;
+}
+
 /*
  * Emulate the VMXON instruction.
  * Currently, we just remember that VMX is active, and do not save or even
@@ -5766,6 +5785,10 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
 	vmx->nested.vmcs02_num = 0;
 
+   

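An emulated preemption timer has to turn the guest-programmed count into a wall-clock timeout before an hrtimer can be armed. The truncated diff above does not show that step, so the following is only one plausible conversion, assuming the timer decrements once every 2^rate TSC cycles (with rate taken from the emulated value 5 in the patch) and that the guest TSC frequency is known in kHz. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE from the patch. */
#define DEMO_PREEMPTION_TIMER_RATE 5

/*
 * Convert a preemption timer value into nanoseconds.
 * Assumptions: one timer tick equals 2^rate TSC cycles, and tsc_khz is
 * the guest TSC frequency in kHz (so one cycle lasts 10^6 / tsc_khz ns).
 */
static uint64_t demo_preemption_timer_ns(uint32_t timer_value, uint32_t tsc_khz)
{
	uint64_t tsc_cycles;

	assert(tsc_khz != 0);
	tsc_cycles = (uint64_t)timer_value << DEMO_PREEMPTION_TIMER_RATE;
	/* Worst case is about 2^37 * 10^6, which still fits in 64 bits. */
	return tsc_cycles * 1000000ULL / tsc_khz;
}
```

For a 2 GHz guest TSC, a programmed value of 1000 corresponds to 32000 cycles, or 16 microseconds, under these assumptions.
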
[PATCH 06/12] KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Instead of fixing up the vmcs12 after the nested vmexit, pass key
parameters already when calling nested_vmx_vmexit. This will help
tracing those vmexits.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c | 63 +-
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3edf08f..0bd0509 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1058,7 +1058,9 @@ static inline bool is_exception(u32 intr_info)
 		== (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK);
 }
 
-static void nested_vmx_vmexit(struct kvm_vcpu *vcpu);
+static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
+			      u32 exit_intr_info,
+			      unsigned long exit_qualification);
 static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
 			struct vmcs12 *vmcs12,
 			u32 reason, unsigned long qualification);
@@ -1967,7 +1969,9 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;
 
-	nested_vmx_vmexit(vcpu);
+	nested_vmx_vmexit(vcpu, to_vmx(vcpu)->exit_reason,
+			  vmcs_read32(VM_EXIT_INTR_INFO),
+			  vmcs_readl(EXIT_QUALIFICATION));
 	return 1;
 }
 
@@ -4650,15 +4654,12 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	if (is_guest_mode(vcpu)) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
 		if (to_vmx(vcpu)->nested.nested_run_pending)
 			return 0;
 		if (nested_exit_on_nmi(vcpu)) {
-			nested_vmx_vmexit(vcpu);
-			vmcs12->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI;
-			vmcs12->vm_exit_intr_info = NMI_VECTOR |
-				INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK;
+			nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
+					  NMI_VECTOR | INTR_TYPE_NMI_INTR |
+					  INTR_INFO_VALID_MASK, 0);
 			/*
 			 * The NMI-triggered VM exit counts as injection:
 			 * clear this one and block further NMIs.
@@ -4680,15 +4681,11 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
 	if (is_guest_mode(vcpu)) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
 		if (to_vmx(vcpu)->nested.nested_run_pending)
 			return 0;
 		if (nested_exit_on_intr(vcpu)) {
-			nested_vmx_vmexit(vcpu);
-			vmcs12->vm_exit_reason =
-				EXIT_REASON_EXTERNAL_INTERRUPT;
-			vmcs12->vm_exit_intr_info = 0;
+			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
+					  0, 0);
 			/*
 			 * fall through to normal code, but now in L1, not L2
 			 */
@@ -6853,7 +6850,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 		return handle_invalid_guest_state(vcpu);
 
 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
-		nested_vmx_vmexit(vcpu);
+		nested_vmx_vmexit(vcpu, exit_reason,
+				  vmcs_read32(VM_EXIT_INTR_INFO),
+				  vmcs_readl(EXIT_QUALIFICATION));
 		return 1;
 	}
 
@@ -7594,15 +7593,14 @@ static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 		struct x86_exception *fault)
 {
-	struct vmcs12 *vmcs12;
-	nested_vmx_vmexit(vcpu);
-	vmcs12 = get_vmcs12(vcpu);
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	u32 exit_reason;
 
 	if (fault->error_code & PFERR_RSVD_MASK)
-		vmcs12->vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+		exit_reason = EXIT_REASON_EPT_MISCONFIG;
 	else
-		vmcs12->vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
-	vmcs12->exit_qualification = vcpu->arch.exit_qualification;
+		exit_reason = EXIT_REASON_EPT_VIOLATION;
+	nested_vmx_vmexit(vcpu, exit_reason, 0, vcpu->arch.exit_qualification);
 	vmcs12->guest_physical_address = fault->address;
 }
 
@@ -7640,7 +7638,9 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 
 	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
 	if (vmcs12->exception_bitmap & (1u << PF_VECTOR))
-		nested_vmx_vmexit(vcpu);
+   

[PATCH 05/12] KVM: nVMX: Leave VMX mode on clearing of feature control MSR

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave
root and non-root mode, fully disabling VMX. The register state of the
VCPU is undefined after this step, so userspace has to set it to a
proper state afterward.

This makes it possible to reboot a VM while it is running hypervisor code.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9fa8a1c..3edf08f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2455,6 +2455,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
return 1;
 }
 
+static void vmx_leave_nested(struct kvm_vcpu *vcpu);
+
 static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
u32 msr_index = msr_info->index;
@@ -2470,6 +2472,8 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 FEATURE_CONTROL_LOCKED)
return 0;
to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
+   if (host_initialized && data == 0)
+   vmx_leave_nested(vcpu);
return 1;
}
 
@@ -8507,6 +8511,16 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Forcibly leave nested mode in order to be able to reset the VCPU later on.
+ */
+static void vmx_leave_nested(struct kvm_vcpu *vcpu)
+{
+   if (is_guest_mode(vcpu))
+   nested_vmx_vmexit(vcpu);
+   free_nested(to_vmx(vcpu));
+}
+
+/*
  * L1's failure to enter L2 is a subset of a normal exit, as explained in
  * 23.7 VM-entry failures during or after loading guest state (this also
  * lists the acceptable exit-reason and exit-qualification parameters).
-- 
1.8.1.1.298.ge7eed54

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/12] KVM: nVMX: Clean up handling of VMX-related MSRs

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This simplifies the code and also stops issuing warnings about writes to
unhandled MSRs when VMX is disabled or the Feature Control MSR is
locked - we do handle them all according to the spec.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
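For illustration only (not part of this patch): a minimal user-space
sketch of the range dispatch that the cleanup moves into vmx_get_msr().
The two MSR indices are the ones from msr-index.h; nested_allowed and
read_vmx_msr_sketch() are hypothetical stand-ins for
nested_vmx_allowed() and vmx_get_vmx_msr(), and the sketch assumes the
"0 on success, non-0 otherwise" convention stated in the new comment.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_VMX_BASIC   0x00000480u
#define MSR_IA32_VMX_VMFUNC  0x00000491u

/* Hypothetical stand-in: reports 0 for every VMX capability MSR. */
static int read_vmx_msr_sketch(uint32_t msr_index, uint64_t *pdata)
{
	(void)msr_index;
	*pdata = 0;
	return 0;
}

/* Returns 0 on success, non-0 otherwise. */
static int get_msr_sketch(bool nested_allowed, uint32_t msr_index,
			  uint64_t *pdata)
{
	/* Whole VMX capability range is refused when nested VMX is off. */
	if (msr_index >= MSR_IA32_VMX_BASIC &&
	    msr_index <= MSR_IA32_VMX_VMFUNC) {
		if (!nested_allowed)
			return 1;
		return read_vmx_msr_sketch(msr_index, pdata);
	}

	return 1;	/* this sketch knows no other MSRs */
}
```

The point of the range check is that every VMX capability MSR is either
handled or refused in one place, so nothing in that range falls through
to the generic "unhandled MSR" path.
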
 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c| 79 ++-
 2 files changed, 24 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h 
b/arch/x86/include/uapi/asm/msr-index.h
index 37813b5..2e4a42d 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -527,6 +527,7 @@
 #define MSR_IA32_VMX_TRUE_PROCBASED_CTLS 0x048e
 #define MSR_IA32_VMX_TRUE_EXIT_CTLS  0x048f
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x0490
+#define MSR_IA32_VMX_VMFUNC 0x0491
 
 /* VMX_BASIC bits and bitmasks */
 #define VMX_BASIC_VMCS_SIZE_SHIFT  32
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9cd6eb7..36efd47 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2361,32 +2361,10 @@ static inline u64 vmx_control_msr(u32 low, u32 high)
return low | ((u64)high << 32);
 }
 
-/*
- * If we allow our guest to use VMX instructions (i.e., nested VMX), we should
- * also let it use VMX-specific MSRs.
- * vmx_get_vmx_msr() and vmx_set_vmx_msr() return 1 when we handled a
- * VMX-specific MSR, or 0 when we haven't (and the caller should handle it
- * like all other MSRs).
- */
+/* Returns 0 on success, non-0 otherwise. */
 static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 {
-   if (!nested_vmx_allowed(vcpu) && msr_index >= MSR_IA32_VMX_BASIC &&
-msr_index <= MSR_IA32_VMX_TRUE_ENTRY_CTLS) {
-   /*
-* According to the spec, processors which do not support VMX
-* should throw a #GP(0) when VMX capability MSRs are read.
-*/
-   kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
-   return 1;
-   }
-
switch (msr_index) {
-   case MSR_IA32_FEATURE_CONTROL:
-   if (nested_vmx_allowed(vcpu)) {
-   *pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
-   break;
-   }
-   return 0;
case MSR_IA32_VMX_BASIC:
/*
 * This MSR reports some information about VMX support. We
@@ -2453,38 +2431,9 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
*pdata = nested_vmx_ept_caps;
break;
default:
-   return 0;
-   }
-
-   return 1;
-}
-
-static void vmx_leave_nested(struct kvm_vcpu *vcpu);
-
-static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
-   u32 msr_index = msr_info->index;
-   u64 data = msr_info->data;
-   bool host_initialized = msr_info->host_initiated;
-
-   if (!nested_vmx_allowed(vcpu))
-   return 0;
-
-   if (msr_index == MSR_IA32_FEATURE_CONTROL) {
-   if (!host_initialized &&
-   to_vmx(vcpu)->nested.msr_ia32_feature_control
-& FEATURE_CONTROL_LOCKED)
-   return 0;
-   to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
-   if (host_initialized && data == 0)
-   vmx_leave_nested(vcpu);
return 1;
}
 
-   /*
-* No need to treat VMX capability MSRs specially: If we don't handle
-* them, handle_wrmsr will #GP(0), which is correct (they are readonly)
-*/
return 0;
 }
 
@@ -2530,13 +2479,20 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
case MSR_IA32_SYSENTER_ESP:
data = vmcs_readl(GUEST_SYSENTER_ESP);
break;
+   case MSR_IA32_FEATURE_CONTROL:
+   if (!nested_vmx_allowed(vcpu))
+   return 1;
+   data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+   break;
+   case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
+   if (!nested_vmx_allowed(vcpu))
+   return 1;
+   return vmx_get_vmx_msr(vcpu, msr_index, pdata);
case MSR_TSC_AUX:
if (!to_vmx(vcpu)->rdtscp_enabled)
return 1;
/* Otherwise falls through */
default:
-   if (vmx_get_vmx_msr(vcpu, msr_index, pdata))
-   return 0;
msr = find_msr_entry(to_vmx(vcpu), msr_index);
if (msr) {
data = msr->data;
@@ -2549,6 +2505,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
return 0;
 }
 
+static void vmx_leave_nested(struct kvm_vcpu *vcpu);
+
 /*
  * Writes msr value into into the appropriate 

[PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Check for invalid state transitions on guest-initiated updates of
MSR_IA32_APICBASE. This addresses both enabling of the x2APIC when it is
not supported and all invalid transitions as described in SDM section
10.12.5. It also checks that no reserved bit is set in APICBASE by the
guest.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
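For illustration only (not part of this patch): a stand-alone sketch of
the transition check described above, so the rule can be tried outside
the kernel. It mirrors the condition added to kvm_set_apic_base() but
takes plain values instead of a vcpu; the function name and its
parameters are hypothetical, and phys_bits is assumed to be below 64.

```c
#include <stdbool.h>
#include <stdint.h>

#define APICBASE_ENABLE (1ULL << 11)	/* MSR_IA32_APICBASE_ENABLE */
#define X2APIC_ENABLE   (1ULL << 10)

/*
 * Returns true if a guest-initiated write of 'data' has to be refused.
 * phys_bits must be below 64 (shifting by 64 would be undefined in C).
 */
static bool apicbase_write_invalid(uint64_t old_base, uint64_t data,
				   unsigned int phys_bits, bool has_x2apic)
{
	const uint64_t mode_mask = APICBASE_ENABLE | X2APIC_ENABLE;
	uint64_t old_state = old_base & mode_mask;
	uint64_t new_state = data & mode_mask;
	uint64_t reserved = (~0ULL << phys_bits) | 0x2ff |
			    (has_x2apic ? 0 : X2APIC_ENABLE);

	if (data & reserved)
		return true;
	if (new_state == X2APIC_ENABLE)
		return true;	/* x2APIC bit without global enable */
	if (new_state == APICBASE_ENABLE && old_state == mode_mask)
		return true;	/* x2APIC -> xAPIC */
	if (new_state == mode_mask && old_state == 0)
		return true;	/* disabled -> x2APIC */
	return false;
}
```

With this helper, xAPIC -> x2APIC and x2APIC -> disabled are accepted,
while x2APIC -> xAPIC and disabled -> x2APIC are refused. In the patch
itself the check is skipped for host-initiated writes.
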
 arch/x86/kvm/cpuid.h |  8 
 arch/x86/kvm/lapic.h |  2 +-
 arch/x86/kvm/vmx.c   |  9 +
 arch/x86/kvm/x86.c   | 32 +---
 4 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index f1e4895..a2a1bb7 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -72,4 +72,12 @@ static inline bool guest_cpuid_has_pcid(struct kvm_vcpu 
*vcpu)
return best && (best->ecx & bit(X86_FEATURE_PCID));
 }
 
+static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 1, 0);
+   return best && (best->ecx & bit(X86_FEATURE_X2APIC));
+}
+
 #endif
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c730ac9..3ee60ef 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,7 +65,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map);
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
-void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
+int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2a95ce0..9fa8a1c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4417,7 +4417,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   u64 msr;
+   struct msr_data apic_base_msr;
 
vmx->rmode.vm86_active = 0;
 
@@ -4425,10 +4425,11 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
-   msr = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+   apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
-   msr |= MSR_IA32_APICBASE_BSP;
-   kvm_set_apic_base(&vmx->vcpu, msr);
+   apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
+   apic_base_msr.host_initiated = true;
+   kvm_set_apic_base(&vmx->vcpu, &apic_base_msr);
 
vmx_segment_cache_clear(vmx);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ea7c6a5..559ae75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -254,10 +254,26 @@ u64 kvm_get_apic_base(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_apic_base);
 
-void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
-{
-   /* TODO: reserve bits check */
-   kvm_lapic_set_base(vcpu, data);
+int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+   u64 old_state = vcpu->arch.apic_base &
+   (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
+   u64 new_state = msr_info->data &
+   (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
+   u64 reserved_bits = ((~0ULL) << boot_cpu_data.x86_phys_bits) | 0x2ff |
+   (guest_cpuid_has_x2apic(vcpu) ? 0 : X2APIC_ENABLE);
+
+   if (!msr_info->host_initiated &&
+   ((msr_info->data & reserved_bits) != 0 ||
+new_state == X2APIC_ENABLE ||
+(new_state == MSR_IA32_APICBASE_ENABLE &&
+ old_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) ||
+(new_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE) &&
+ old_state == 0)))
+   return 1;
+
+   kvm_lapic_set_base(vcpu, msr_info->data);
+   return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_set_apic_base);
 
@@ -2027,8 +2043,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case 0x200 ... 0x2ff:
return set_msr_mtrr(vcpu, msr, data);
case MSR_IA32_APICBASE:
-   kvm_set_apic_base(vcpu, data);
-   break;
+   return kvm_set_apic_base(vcpu, msr_info);
case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
return kvm_x2apic_msr_write(vcpu, msr, data);
case MSR_IA32_TSCDEADLINE:
@@ -6459,6 +6474,7 @@ EXPORT_SYMBOL_GPL(kvm_task_switch);
 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
  struct kvm_sregs *sregs)
 {
+   struct msr_data apic_base_msr;
int mmu_reset_needed = 0;
int pending_vec, max_bits, idx;
struct desc_ptr dt;
@@ -6482,7 +6498,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 

[PATCH 07/12] KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Reuse the tracepoints SVM introduced for tracing nested vmexits:
kvm_nested_vmexit marks exits from L2 to L0 while
kvm_nested_vmexit_inject marks vmexits that are reflected to L1.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0bd0509..9cd6eb7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6701,6 +6701,13 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu 
*vcpu)
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
u32 exit_reason = vmx->exit_reason;
 
+   trace_kvm_nested_vmexit(kvm_rip_read(vcpu), exit_reason,
+   vmcs_readl(EXIT_QUALIFICATION),
+   vmx->idt_vectoring_info,
+   intr_info,
+   vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
+   KVM_ISA_VMX);
+
if (vmx->nested.nested_run_pending)
return 0;
 
@@ -8472,6 +8479,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
   exit_qualification);
 
+   trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
+  vmcs12->exit_qualification,
+  vmcs12->idt_vectoring_info_field,
+  vmcs12->vm_exit_intr_info,
+  vmcs12->vm_exit_intr_error_code,
+  KVM_ISA_VMX);
+
cpu = get_cpu();
vmx->loaded_vmcs = &vmx->vmcs01;
vmx_vcpu_put(vcpu);
-- 
1.8.1.1.298.ge7eed54



[PATCH 09/12] KVM: nVMX: Fix nested_run_pending on activity state HLT

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

When we suspend the guest in HLT state, the nested run is no longer
pending - we emulated it completely. So only set nested_run_pending
after checking the activity state.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 36efd47..bde8ddd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8050,8 +8050,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
 
enter_guest_mode(vcpu);
 
-   vmx->nested.nested_run_pending = 1;
-
vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
 
cpu = get_cpu();
@@ -8070,6 +8068,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
return kvm_emulate_halt(vcpu);
 
+   vmx->nested.nested_run_pending = 1;
+
/*
 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
-- 
1.8.1.1.298.ge7eed54



[PATCH 03/12] KVM: VMX: Fix DR6 update on #DB exception

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

According to the SDM, only bits 0-3 of DR6 may be cleared by certain
debug exceptions. So do update them on #DB exception in KVM, but leave
the rest alone, only setting BD and BS in addition to already set bits
in DR6. This also aligns us with kvm_vcpu_check_singlestep.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
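For illustration only (not part of this patch): the DR6 update rule
from the description as a stand-alone helper. The function name is
hypothetical; the bit positions (B0..B3 in bits 0-3, BD in bit 13, BS
in bit 14) are the architectural DR6 layout, and exit_qual is assumed
to carry the new condition bits in those same positions.

```c
#include <stdint.h>

#define DR6_B0_B3  0xfULL		/* breakpoint condition bits B0..B3 */
#define DR6_BD     (1ULL << 13)
#define DR6_BS     (1ULL << 14)

/*
 * Only B0..B3 are cleared before merging in the new conditions;
 * everything else that is already set in DR6 stays set.
 */
static uint64_t update_dr6_on_db(uint64_t old_dr6, uint64_t exit_qual)
{
	uint64_t dr6 = old_dr6;

	dr6 &= ~DR6_B0_B3;
	dr6 |= exit_qual;
	return dr6;
}
```

A stale BS from an earlier single-step therefore survives a later
breakpoint hit, while stale B0..B3 bits do not.
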
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 55cb4b6..2a95ce0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4869,7 +4869,8 @@ static int handle_exception(struct kvm_vcpu *vcpu)
dr6 = vmcs_readl(EXIT_QUALIFICATION);
if (!(vcpu->guest_debug &
  (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
-   vcpu->arch.dr6 = dr6 | DR6_FIXED_1;
+   vcpu->arch.dr6 &= ~15;
+   vcpu->arch.dr6 |= dr6;
kvm_queue_exception(vcpu, DB_VECTOR);
return 1;
}
-- 
1.8.1.1.298.ge7eed54

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/13] kvm-unit-tests: Various improvements for x86 tests

2014-01-04 Thread Jan Kiszka
Highlights:
 - improved preemption timer and interrupt injection tests
   (obsoletes my two patches in vmx queue)
 - tests for IA32_APIC_BASE writes
 - test for unconditional IO exiting (VMX)
 - basic test of debug facilities (hw breakpoints etc.)

Jan Kiszka (13):
  VMX: Add test cases around interrupt injection and halting
  VMX: Extend preemption timer tests
  apic: Remove redundant enable_apic
  VMX: Fix return label in fault-triggering handlers
  lib/x86: Move exception test code into library
  x2apic: Test for invalid state transitions
  lib/x86/apic: Consolidate over MSR_IA32_APICBASE
  apic: Add test case for relocation and writing reserved bits
  VMX: Check unconditional I/O exiting
  Provide common report and report_summary services
  Ignore *.elf build outputs
  svm: Add missing build dependency
  x86: Add debug facility test case

 .gitignore|   1 +
 Makefile  |   3 +-
 config-x86-common.mak |   4 +-
 config-x86_64.mak |   2 +-
 lib/libcflat.h|   4 +
 lib/report.c  |  36 +++
 lib/x86/apic-defs.h   |   3 +
 lib/x86/apic.c|   7 +-
 lib/x86/desc.c|  24 +
 lib/x86/desc.h|   6 ++
 x86/apic.c|  84 +---
 x86/debug.c   | 113 +
 x86/emulator.c|  16 +--
 x86/eventinj.c|  15 +--
 x86/idt_test.c|  21 +---
 x86/msr.c |  15 +--
 x86/pcid.c|  14 +--
 x86/pmu.c |  37 +++
 x86/taskswitch2.c |  15 +--
 x86/unittests.cfg |   3 +
 x86/vmx.c |  57 +++
 x86/vmx.h |   4 +-
 x86/vmx_tests.c   | 264 ++
 23 files changed, 548 insertions(+), 200 deletions(-)
 create mode 100644 lib/report.c
 create mode 100644 x86/debug.c

-- 
1.8.1.1.298.ge7eed54



[PATCH 02/13] VMX: Extend preemption timer tests

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This checks that we properly expire the preemption timer while the guest
is in HLT state and that we do not progress guest execution if the
preemption timer is activated with a timer value of 0.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
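For illustration only (not part of this patch): the expiry condition
the tests rely on, as a stand-alone helper. The name and parameters are
hypothetical; the expression is the one used in the test, where the
elapsed TSC ticks are shifted right by preempt_scale and compared with
the programmed timer value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The preemption timer counts down once every 2^scale TSC ticks, so it
 * has expired once the scaled elapsed time reaches the programmed value.
 */
static bool preempt_timer_expired(uint64_t tsc_start, uint64_t tsc_now,
				  unsigned int scale, uint32_t timer_val)
{
	return ((tsc_now - tsc_start) >> scale) >= timer_val;
}
```

A timer value of 0 satisfies the condition with no elapsed ticks at
all, which is why the last stage expects the exit to arrive before the
guest has executed any further instruction.
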
 x86/vmx_tests.c | 84 +++--
 1 file changed, 64 insertions(+), 20 deletions(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 70efb50..0077f3f 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -99,6 +99,7 @@ int vmenter_exit_handler()
 u32 preempt_scale;
 volatile unsigned long long tsc_val;
 volatile u32 preempt_val;
+u64 saved_rip;
 
 int preemption_timer_init()
 {
@@ -126,17 +127,24 @@ void preemption_timer_main()
if (get_stage() == 1)
vmcall();
}
-   while (1) {
+   set_stage(1);
+   while (get_stage() == 1) {
if (((rdtsc() - tsc_val) >> preempt_scale)
 > 10 * preempt_val) {
set_stage(2);
vmcall();
}
}
+   tsc_val = rdtsc();
+   asm volatile ("hlt");
+   vmcall();
+   set_stage(5);
+   vmcall();
 }
 
 int preemption_timer_exit_handler()
 {
+   bool guest_halted;
u64 guest_rip;
ulong reason;
u32 insn_len;
@@ -147,33 +155,69 @@ int preemption_timer_exit_handler()
insn_len = vmcs_read(EXI_INST_LEN);
switch (reason) {
case VMX_PREEMPT:
-   if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
-   report("Preemption timer", 0);
-   else
-   report("Preemption timer", 1);
+   switch (get_stage()) {
+   case 1:
+   case 2:
+   report("busy-wait for preemption timer",
+  ((rdtsc() - tsc_val) >> preempt_scale) >=
+  preempt_val);
+   set_stage(3);
+   vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+   return VMX_TEST_RESUME;
+   case 3:
+   guest_halted =
+   (vmcs_read(GUEST_ACTV_STATE) == ACTV_HLT);
+   report("preemption timer during hlt",
+  ((rdtsc() - tsc_val) >> preempt_scale) >=
+  preempt_val && guest_halted);
+   set_stage(4);
+   vmcs_write(PIN_CONTROLS,
+  vmcs_read(PIN_CONTROLS) & ~PIN_PREEMPT);
+   vmcs_write(GUEST_ACTV_STATE, ACTV_ACTIVE);
+   return VMX_TEST_RESUME;
+   case 4:
+   report("preemption timer with 0 value",
+  saved_rip == guest_rip);
+   break;
+   default:
+   printf("Invalid stage.\n");
+   print_vmexit_info();
+   break;
+   }
break;
case VMX_VMCALL:
+   vmcs_write(GUEST_RIP, guest_rip + insn_len);
switch (get_stage()) {
case 0:
-   if (vmcs_read(PREEMPT_TIMER_VALUE) != preempt_val)
-   report("Save preemption value", 0);
-   else {
-   set_stage(get_stage() + 1);
-   ctrl_exit = (vmcs_read(EXI_CONTROLS) |
-   EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
-   vmcs_write(EXI_CONTROLS, ctrl_exit);
-   }
-   vmcs_write(GUEST_RIP, guest_rip + insn_len);
+   report("Keep preemption value",
+  vmcs_read(PREEMPT_TIMER_VALUE) == preempt_val);
+   set_stage(1);
+   vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+   ctrl_exit = (vmcs_read(EXI_CONTROLS) |
+   EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
+   vmcs_write(EXI_CONTROLS, ctrl_exit);
return VMX_TEST_RESUME;
case 1:
-   if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
-   report("Save preemption value", 0);
-   else
-   report("Save preemption value", 1);
-   vmcs_write(GUEST_RIP, guest_rip + insn_len);
+   report("Save preemption value",
+  vmcs_read(PREEMPT_TIMER_VALUE) < preempt_val);
return VMX_TEST_RESUME;
case 2:
-   report("Preemption timer", 0);
+   report("busy-wait for preemption timer", 0);
+   

[PATCH 12/13] svm: Add missing build dependency

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 config-x86-common.mak | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config-x86-common.mak b/config-x86-common.mak
index bf88c67..32da7fb 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -86,7 +86,7 @@ $(TEST_DIR)/xsave.elf: $(cstart.o) $(TEST_DIR)/xsave.o
 
 $(TEST_DIR)/rmap_chain.elf: $(cstart.o) $(TEST_DIR)/rmap_chain.o
 
-$(TEST_DIR)/svm.elf: $(cstart.o)
+$(TEST_DIR)/svm.elf: $(cstart.o) $(TEST_DIR)/svm.o
 
 $(TEST_DIR)/kvmclock_test.elf: $(cstart.o) $(TEST_DIR)/kvmclock.o \
 $(TEST_DIR)/kvmclock_test.o
-- 
1.8.1.1.298.ge7eed54



[PATCH 09/13] VMX: Check unconditional I/O exiting

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Test if we ignore "unconditional I/O exiting" as long as "use I/O
bitmaps" is enabled. Also test if unconditional exiting itself works.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
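For illustration only (not part of this patch): the precedence rule the
new stages exercise, as a stand-alone helper. CPU_IO and CPU_IO_BITMAP
are bits 24 and 25 of the primary processor-based execution controls;
the function name is hypothetical, only single-byte accesses are
modelled, and treating I/O bitmaps A and B as one contiguous 8 KiB
array is a simplification of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPU_IO         (1u << 24)	/* unconditional I/O exiting */
#define CPU_IO_BITMAP  (1u << 25)	/* use I/O bitmaps */

/*
 * io_bitmap: 8192 bytes, one bit per port 0x0000..0xffff
 * (bitmaps A and B back to back, an assumption of this sketch).
 */
static bool io_access_exits(uint32_t ctrl_cpu0, const uint8_t *io_bitmap,
			    uint16_t port)
{
	/* With bitmaps in use, the unconditional control is ignored. */
	if (ctrl_cpu0 & CPU_IO_BITMAP)
		return (io_bitmap[port / 8] >> (port % 8)) & 1;

	return (ctrl_cpu0 & CPU_IO) != 0;
}
```

Stage 9 corresponds to both controls set with a clear bitmap bit for
port 0 (no exit expected), stage 10 to CPU_IO alone (exit expected).
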
 x86/vmx_tests.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 0077f3f..2c2d6c4 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -701,13 +701,21 @@ static void iobmp_main()
report("I/O bitmap - overrun", 1);
else
report("I/O bitmap - overrun", 0);
+   set_stage(9);
+   vmcall();
+   outb(0x0, 0x0);
+   report("I/O bitmap - ignore unconditional exiting", stage == 9);
+   set_stage(10);
+   vmcall();
+   outb(0x0, 0x0);
+   report("I/O bitmap - unconditional exiting", stage == 11);
 }
 
 static int iobmp_exit_handler()
 {
u64 guest_rip;
ulong reason, exit_qual;
-   u32 insn_len;
+   u32 insn_len, ctrl_cpu0;
 
guest_rip = vmcs_read(GUEST_RIP);
reason = vmcs_read(EXI_REASON) & 0xff;
@@ -765,6 +773,32 @@ static int iobmp_exit_handler()
if (((exit_qual  VMX_IO_PORT_MASK)  
VMX_IO_PORT_SHIFT) == 0x)
set_stage(stage + 1);
break;
+   case 9:
+   case 10:
+   ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0);
+   vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0 & ~CPU_IO);
+   set_stage(stage + 1);
+   break;
+   default:
+   // Should not reach here
+   printf("ERROR : unexpected stage, %d\n", get_stage());
+   print_vmexit_info();
+   return VMX_TEST_VMEXIT;
+   }
+   vmcs_write(GUEST_RIP, guest_rip + insn_len);
+   return VMX_TEST_RESUME;
+   case VMX_VMCALL:
+   switch (get_stage()) {
+   case 9:
+   ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0);
+   ctrl_cpu0 |= CPU_IO | CPU_IO_BITMAP;
+   vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0);
+   break;
+   case 10:
+   ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0);
+   ctrl_cpu0 = (ctrl_cpu0 & ~CPU_IO_BITMAP) | CPU_IO;
+   vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0);
+   break;
default:
// Should not reach here
printf("ERROR : unexpected stage, %d\n", get_stage());
-- 
1.8.1.1.298.ge7eed54



[PATCH 04/13] VMX: Fix return label in fault-triggering handlers

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Some compiler versions (seen with gcc 4.8.1) move the resume label after
the return statement which, of course, causes severe problems.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 x86/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index a475aec..f9d5493 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -563,7 +563,7 @@ static void do_vmxon_off(void)
vmx_on();
vmx_off();
 resume:
-   return;
+   barrier();
 }
 
 static void do_write_feature_control(void)
@@ -572,7 +572,7 @@ static void do_write_feature_control(void)
barrier();
wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
 resume:
-   return;
+   barrier();
 }
 
 static int test_vmx_feature_control(void)
-- 
1.8.1.1.298.ge7eed54



[PATCH 08/13] apic: Add test case for relocation and writing reserved bits

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Check that the xAPIC is relocatable and that writing a reserved bit to
MSR_IA32_APICBASE triggers a #GP.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 x86/apic.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/x86/apic.c b/x86/apic.c
index 4ebcd4f..8febfa2 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -120,6 +120,32 @@ void test_enable_x2apic(void)
 }
 }
 
+#define ALTERNATE_APIC_BASE 0x42000000
+
+static void test_apicbase(void)
+{
+u64 orig_apicbase = rdmsr(MSR_IA32_APICBASE);
+u32 lvr = apic_read(APIC_LVR);
+u64 value;
+
+wrmsr(MSR_IA32_APICBASE, orig_apicbase & ~(APIC_EN | APIC_EXTD));
+wrmsr(MSR_IA32_APICBASE, ALTERNATE_APIC_BASE | APIC_BSP | APIC_EN);
+
+report("relocate apic",
+   *(volatile u32 *)(ALTERNATE_APIC_BASE + APIC_LVR) == lvr);
+
+value = orig_apicbase | (1UL << (cpuid(0x80000008).a & 0xff));
+report("apicbase: reserved physaddr bits",
+   test_for_exception(GP_VECTOR, do_write_apicbase, &value));
+
+value = orig_apicbase | 1;
+report("apicbase: reserved low bits",
+   test_for_exception(GP_VECTOR, do_write_apicbase, &value));
+
+wrmsr(MSR_IA32_APICBASE, orig_apicbase);
+apic_write(APIC_SPIV, 0x1ff);
+}
+
 static void eoi(void)
 {
 apic_write(APIC_EOI, 0);
@@ -366,6 +392,7 @@ int main()
 
 mask_pic_interrupts();
 test_enable_x2apic();
+test_apicbase();
 
 test_self_ipi();
 
-- 
1.8.1.1.298.ge7eed54



[PATCH 11/13] Ignore *.elf build outputs

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index ed857b7..d6663ec 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,7 @@
 *.d
 *.o
 *.flat
+*.elf
 .pc
 patches
 .stgit-*
-- 
1.8.1.1.298.ge7eed54



[PATCH 13/13] x86: Add debug facility test case

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This adds a basic test for INT3/#BP, hardware breakpoints, hardware
watchpoints and single-stepping.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
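For illustration only (not part of this patch): how a DR7 value such as
the 0x00d0040a used for the watchpoint test can be composed. The helper
and enum names are hypothetical; the field positions (Gn at bit 2n+1,
R/Wn at bit 16+4n, LENn at bit 18+4n, bit 10 always set) follow the
architectural DR7 layout.

```c
#include <stdint.h>

#define DR7_FIXED_1  0x00000400u	/* bit 10 always reads as 1 */

enum dr7_rw  { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_RDWR = 3 };
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

/* Bits for globally enabling breakpoint 'idx' (0..3). */
static uint32_t dr7_global_bp(unsigned int idx, enum dr7_rw rw,
			      enum dr7_len len)
{
	uint32_t v = 1u << (idx * 2 + 1);	/* Gn */

	v |= (uint32_t)rw << (16 + idx * 4);	/* R/Wn */
	v |= (uint32_t)len << (18 + idx * 4);	/* LENn */
	return v;
}
```

Under these definitions, the watchpoint setup is DR7_FIXED_1 plus an
execution breakpoint in slot 0 and a 4-byte write watchpoint in slot 1.
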
 config-x86-common.mak |   2 +
 config-x86_64.mak |   2 +-
 lib/x86/desc.h|   2 +
 x86/debug.c   | 113 ++
 x86/unittests.cfg |   3 ++
 5 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 x86/debug.c

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 32da7fb..aa5a439 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -103,6 +103,8 @@ $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
 
 $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o $(TEST_DIR)/vmx_tests.o
 
+$(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/config-x86_64.mak b/config-x86_64.mak
index bb8ee89..a9a2a9e 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -7,7 +7,7 @@ CFLAGS += -D__x86_64__
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
- $(TEST_DIR)/pcid.flat
+ $(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat
 tests += $(TEST_DIR)/svm.flat
 tests += $(TEST_DIR)/vmx.flat
 
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 5c850b2..b795aad 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -66,6 +66,8 @@ typedef struct {
 .popsection \n\t  \
 :
 
+#define DB_VECTOR   1
+#define BP_VECTOR   3
 #define UD_VECTOR   6
 #define GP_VECTOR   13
 
diff --git a/x86/debug.c b/x86/debug.c
new file mode 100644
index 000..154c7fe
--- /dev/null
+++ b/x86/debug.c
@@ -0,0 +1,113 @@
+/*
+ * Test for x86 debugging facilities
+ *
+ * Copyright (c) Siemens AG, 2014
+ *
+ * Authors:
+ *  Jan Kiszka jan.kis...@siemens.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include "libcflat.h"
+#include "desc.h"
+
+static volatile unsigned long bp_addr[10], dr6[10];
+static volatile unsigned int n;
+static volatile unsigned long value;
+
+static unsigned long get_dr6(void)
+{
+   unsigned long value;
+
+   asm volatile("mov %%dr6,%0" : "=r" (value));
+   return value;
+}
+
+static void set_dr0(void *value)
+{
+   asm volatile("mov %0,%%dr0" : : "r" (value));
+}
+
+static void set_dr1(void *value)
+{
+   asm volatile("mov %0,%%dr1" : : "r" (value));
+}
+
+static void set_dr7(unsigned long value)
+{
+   asm volatile("mov %0,%%dr7" : : "r" (value));
+}
+
+static void handle_db(struct ex_regs *regs)
+{
+   bp_addr[n] = regs->rip;
+   dr6[n] = get_dr6();
+
+   if (dr6[n] & 0x1)
+   regs->rflags |= (1 << 16);
+
+   if (++n >= 10) {
+   regs->rflags &= ~(1 << 8);
+   set_dr7(0x00000400);
+   }
+}
+
+static void handle_bp(struct ex_regs *regs)
+{
+   bp_addr[0] = regs->rip;
+}
+
+int main(int ac, char **av)
+{
+   unsigned long start;
+
+   setup_idt();
+   handle_exception(DB_VECTOR, handle_db);
+   handle_exception(BP_VECTOR, handle_bp);
+
+sw_bp:
+   asm volatile("int3");
+   report("#BP", bp_addr[0] == (unsigned long)&&sw_bp + 1);
+
+   set_dr0(&&hw_bp);
+   set_dr7(0x00000402);
+hw_bp:
+   asm volatile("nop");
+   report("hw breakpoint",
+  n == 1 &&
+  bp_addr[0] == ((unsigned long)&&hw_bp) && dr6[0] == 0xffff0ff1);
+
+   n = 0;
+   asm volatile(
+   "pushf\n\t"
+   "pop %%rax\n\t"
+   "or $(1<<8),%%rax\n\t"
+   "push %%rax\n\t"
+   "lea (%%rip),%0\n\t"
+   "popf\n\t"
+   "and $~(1<<8),%%rax\n\t"
+   "push %%rax\n\t"
+   "popf\n\t"
+   : "=g" (start) : : "rax");
+   report("single step",
+  n == 3 &&
+  bp_addr[0] == start+1+6 && dr6[0] == 0xffff4ff0 &&
+  bp_addr[1] == start+1+6+1 && dr6[1] == 0xffff4ff0 &&
+  bp_addr[2] == start+1+6+1+1 && dr6[2] == 0xffff4ff0);
+
+   n = 0;
+   set_dr1((void *)&value);
+   set_dr7(0x00d0040a);
+
+   asm volatile(
+   "mov $42,%%rax\n\t"
+   "mov %%rax,%0\n\t"
+   : "=m" (value) : : "rax");
+hw_wp:
+   report("hw watchpoint",
+  n == 1 &&
+  bp_addr[0] == ((unsigned long)&&hw_wp) && dr6[0] == 0xffff4ff2);
+
+   return 0;
+}
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 85c36aa..7930c02 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -155,3 +155,6 @@ file = vmx.flat
 extra_params = -cpu host,+vmx
 arch = x86_64
 
+[debug]
+file = debug.flat
+arch = x86_64
-- 
1.8.1.1.298.ge7eed54


[PATCH 01/13] VMX: Add test cases around interrupt injection and halting

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This checks for interrupt delivery to L2, unintercepted hlt in L2 and
explicit L2 suspension via the activity state HLT. All tests are
performed both with direct interrupt injection and external interrupt
interception.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 x86/vmx.c   |   3 +-
 x86/vmx.h   |   3 ++
 x86/vmx_tests.c | 144 
 3 files changed, 148 insertions(+), 2 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index fe950e6..a475aec 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -457,7 +457,7 @@ static void init_vmcs_guest(void)
vmcs_write(GUEST_RFLAGS, 0x2);
 
/* 26.3.1.5 */
-   vmcs_write(GUEST_ACTV_STATE, 0);
+   vmcs_write(GUEST_ACTV_STATE, ACTV_ACTIVE);
vmcs_write(GUEST_INTR_STATE, 0);
 }
 
@@ -482,7 +482,6 @@ static int init_vmcs(struct vmcs **vmcs)
ctrl_pin |= PIN_EXTINT | PIN_NMI | PIN_VIRT_NMI;
ctrl_exit = EXI_LOAD_EFER | EXI_HOST_64;
ctrl_enter = (ENT_LOAD_EFER | ENT_GUEST_64);
-   ctrl_cpu[0] |= CPU_HLT;
/* DIsable IO instruction VMEXIT now */
    ctrl_cpu[0] &= (~(CPU_IO | CPU_IO_BITMAP));
ctrl_cpu[1] = 0;
diff --git a/x86/vmx.h b/x86/vmx.h
index bc8c86f..3867793 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -500,6 +500,9 @@ enum Ctrl1 {
 #define INVEPT_SINGLE  1
 #define INVEPT_GLOBAL  2
 
+#define ACTV_ACTIVE     0
+#define ACTV_HLT        1
+
 extern struct regs regs;
 
 extern union vmx_basic basic;
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index bec34c4..70efb50 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9,6 +9,8 @@
 #include "vm.h"
 #include "io.h"
 #include "fwcfg.h"
+#include "isr.h"
+#include "apic.h"
 
 u64 ia32_pat;
 u64 ia32_efer;
@@ -1117,6 +1119,146 @@ static int ept_exit_handler()
return VMX_TEST_VMEXIT;
 }
 
+#define TIMER_VECTOR   222
+
+static volatile bool timer_fired;
+
+static void timer_isr(isr_regs_t *regs)
+{
+   timer_fired = true;
+   apic_write(APIC_EOI, 0);
+}
+
+static int interrupt_init(struct vmcs *vmcs)
+{
+   msr_bmp_init();
+   vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
+   handle_irq(TIMER_VECTOR, timer_isr);
+   return VMX_TEST_START;
+}
+
+static void interrupt_main(void)
+{
+   long long start, loops;
+
+   set_stage(0);
+
+   apic_write(APIC_LVTT, TIMER_VECTOR);
+   irq_enable();
+
+   apic_write(APIC_TMICT, 1);
+   for (loops = 0; loops  1000  !timer_fired; loops++)
+   asm volatile (nop);
+   report(direct interrupt while running guest, timer_fired);
+
+   apic_write(APIC_TMICT, 0);
+   irq_disable();
+   vmcall();
+   timer_fired = false;
+   apic_write(APIC_TMICT, 1);
+   for (loops = 0; loops  1000  !timer_fired; loops++)
+   asm volatile (nop);
+   report(intercepted interrupt while running guest, timer_fired);
+
+   irq_enable();
+   apic_write(APIC_TMICT, 0);
+   irq_disable();
+   vmcall();
+   timer_fired = false;
+   start = rdtsc();
+   apic_write(APIC_TMICT, 100);
+
+   asm volatile (sti; hlt);
+
+   report(direct interrupt + hlt,
+  rdtsc() - start  100  timer_fired);
+
+   apic_write(APIC_TMICT, 0);
+   irq_disable();
+   vmcall();
+   timer_fired = false;
+   start = rdtsc();
+   apic_write(APIC_TMICT, 100);
+
+   asm volatile (sti; hlt);
+
+   report(intercepted interrupt + hlt,
+  rdtsc() - start  1  timer_fired);
+
+   apic_write(APIC_TMICT, 0);
+   irq_disable();
+   vmcall();
+   timer_fired = false;
+   start = rdtsc();
+   apic_write(APIC_TMICT, 100);
+
+   irq_enable();
+   asm volatile (nop);
+   vmcall();
+
+   report(direct interrupt + activity state hlt,
+  rdtsc() - start  1  timer_fired);
+
+   apic_write(APIC_TMICT, 0);
+   irq_disable();
+   vmcall();
+   timer_fired = false;
+   start = rdtsc();
+   apic_write(APIC_TMICT, 100);
+
+   irq_enable();
+   asm volatile (nop);
+   vmcall();
+
+   report(intercepted interrupt + activity state hlt,
+  rdtsc() - start  1  timer_fired);
+}
+
+static int interrupt_exit_handler(void)
+{
+   u64 guest_rip = vmcs_read(GUEST_RIP);
+   ulong reason = vmcs_read(EXI_REASON) & 0xff;
+   u32 insn_len = vmcs_read(EXI_INST_LEN);
+
+   switch (reason) {
+   case VMX_VMCALL:
+   switch (get_stage()) {
+   case 0:
+   case 2:
+   case 5:
+   vmcs_write(PIN_CONTROLS,
+  vmcs_read(PIN_CONTROLS) | PIN_EXTINT);
+   break;
+   case 1:
+   case 3:
+   vmcs_write(PIN_CONTROLS,
+  vmcs_read(PIN_CONTROLS)  

[PATCH 05/13] lib/x86: Move exception test code into library

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Will also be used by the APIC test. Moving exception_return assignment
out of line, we can drop the explicit compiler barrier.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 lib/x86/desc.c | 24 
 lib/x86/desc.h |  4 
 x86/vmx.c  | 34 +++---
 3 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 7c5c721..f75ec1d 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -353,3 +353,27 @@ void print_current_tss_info(void)
tr, tss[0].prev, tss[i].prev);
 }
 #endif
+
+static bool exception;
+static void *exception_return;
+
+static void exception_handler(struct ex_regs *regs)
+{
+   exception = true;
+   regs->rip = (unsigned long)exception_return;
+}
+
+bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
+   void *data)
+{
+   handle_exception(ex, exception_handler);
+   exception = false;
+   trigger_func(data);
+   handle_exception(ex, NULL);
+   return exception;
+}
+
+void set_exception_return(void *addr)
+{
+   exception_return = addr;
+}
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index f819452..5c850b2 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -84,4 +84,8 @@ void set_intr_task_gate(int e, void *fn);
 void print_current_tss_info(void);
 void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
 
+bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
+   void *data);
+void set_exception_return(void *addr);
+
 #endif
diff --git a/x86/vmx.c b/x86/vmx.c
index f9d5493..4f0bb8d 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -538,38 +538,18 @@ static void init_vmx(void)
memset(guest_syscall_stack, 0, PAGE_SIZE);
 }
 
-static bool exception;
-static void *exception_return;
-
-static void exception_handler(struct ex_regs *regs)
+static void do_vmxon_off(void *data)
 {
-   exception = true;
-   regs->rip = (u64)exception_return;
-}
-
-static int test_for_exception(unsigned int ex, void (*func)(void))
-{
-   handle_exception(ex, exception_handler);
-   exception = false;
-   func();
-   handle_exception(ex, NULL);
-   return exception;
-}
-
-static void do_vmxon_off(void)
-{
-   exception_return = &&resume;
-   barrier();
+   set_exception_return(&&resume);
vmx_on();
vmx_off();
 resume:
barrier();
 }
 
-static void do_write_feature_control(void)
+static void do_write_feature_control(void *data)
 {
-   exception_return = &&resume;
-   barrier();
+   set_exception_return(&&resume);
wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
 resume:
barrier();
@@ -592,18 +572,18 @@ static int test_vmx_feature_control(void)
 
wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
    report("test vmxon with FEATURE_CONTROL cleared",
-  test_for_exception(GP_VECTOR, do_vmxon_off));
+  test_for_exception(GP_VECTOR, do_vmxon_off, NULL));
 
wrmsr(MSR_IA32_FEATURE_CONTROL, 0x4);
    report("test vmxon without FEATURE_CONTROL lock",
-  test_for_exception(GP_VECTOR, do_vmxon_off));
+  test_for_exception(GP_VECTOR, do_vmxon_off, NULL));
 
wrmsr(MSR_IA32_FEATURE_CONTROL, 0x5);
    vmx_enabled = ((rdmsr(MSR_IA32_FEATURE_CONTROL) & 0x5) == 0x5);
    report("test enable VMX in FEATURE_CONTROL", vmx_enabled);
 
    report("test FEATURE_CONTROL lock bit",
-  test_for_exception(GP_VECTOR, do_write_feature_control));
+  test_for_exception(GP_VECTOR, do_write_feature_control, NULL));
 
return !vmx_enabled;
 }
-- 
1.8.1.1.298.ge7eed54
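
The new test_for_exception() takes a trigger callback plus an opaque data
pointer and reports whether the expected fault fired while the callback
ran. The same shape can be tried out in an ordinary hosted program by
substituting a POSIX signal for the CPU exception. This is only an
analogue under that assumption: sigsetjmp()/siglongjmp() stand in for
rewriting regs->rip to the resume label, and every name below is
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t caught;
static sigjmp_buf resume_point;

/* Plays the role of exception_handler(): record the event, then resume. */
static void on_signal(int sig)
{
    (void)sig;
    caught = 1;
    siglongjmp(resume_point, 1);
}

/* Userspace stand-in for test_for_exception(ex, trigger_func, data). */
bool test_for_signal(int sig, void (*trigger_func)(void *data), void *data)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, &old) != 0)
        return false;

    caught = 0;
    /* sigsetjmp returns 0 on the direct call, 1 when the handler jumps back. */
    if (sigsetjmp(resume_point, 1) == 0)
        trigger_func(data);

    sigaction(sig, &old, NULL);  /* like handle_exception(ex, NULL) */
    return caught != 0;
}

/* Example triggers: one raises the signal passed via data, one does not. */
void raise_signal(void *data)
{
    raise(*(int *)data);
}

void do_nothing(void *data)
{
    (void)data;
}
```

Passing the argument through data is what lets a single trigger serve
several cases, which is how the later x2apic patch reuses the helper with
different MSR values.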

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/13] lib/x86/apic: Consolidate over MSR_IA32_APICBASE

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 lib/x86/apic.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/x86/apic.c b/lib/x86/apic.c
index 7bb98ed..6876d85 100644
--- a/lib/x86/apic.c
+++ b/lib/x86/apic.c
@@ -1,5 +1,6 @@
 #include "libcflat.h"
 #include "apic.h"
+#include "msr.h"
 
 static void *g_apic = (void *)0xfee00000;
 static void *g_ioapic = (void *)0xfec00000;
@@ -99,8 +100,6 @@ uint32_t apic_id(void)
     return apic_ops->id();
 }
 
-#define MSR_APIC_BASE 0x001b
-
 int enable_x2apic(void)
 {
 unsigned a, b, c, d;
@@ -108,9 +107,9 @@ int enable_x2apic(void)
     asm ("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d) : "0"(1));
 
     if (c & (1 << 21)) {
-        asm ("rdmsr" : "=a"(a), "=d"(d) : "c"(MSR_APIC_BASE));
+        asm ("rdmsr" : "=a"(a), "=d"(d) : "c"(MSR_IA32_APICBASE));
         a |= 1 << 10;
-        asm ("wrmsr" : : "a"(a), "d"(d), "c"(MSR_APIC_BASE));
+        asm ("wrmsr" : : "a"(a), "d"(d), "c"(MSR_IA32_APICBASE));
         apic_ops = &x2apic_ops;
 return 1;
 } else {
-- 
1.8.1.1.298.ge7eed54



[PATCH 06/13] x2apic: Test for invalid state transitions

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This checks if KVM properly acknowledges invalid state transitions on
MSR_APIC_BASE writes with a #GP.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 lib/x86/apic-defs.h |  3 +++
 x86/apic.c  | 41 -
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/lib/x86/apic-defs.h b/lib/x86/apic-defs.h
index c061e3d..94112b4 100644
--- a/lib/x86/apic-defs.h
+++ b/lib/x86/apic-defs.h
@@ -9,6 +9,9 @@
  */
 
 #define APIC_DEFAULT_PHYS_BASE  0xfee00000
+#define APIC_BSP                (1UL << 8)
+#define APIC_EXTD               (1UL << 10)
+#define APIC_EN                 (1UL << 11)
 
 #define APIC_ID                 0x20
 
diff --git a/x86/apic.c b/x86/apic.c
index d06153f..4ebcd4f 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -4,6 +4,7 @@
 #include "smp.h"
 #include "desc.h"
 #include "isr.h"
+#include "msr.h"
 
 static int g_fail;
 static int g_tests;
@@ -70,14 +71,52 @@ static void test_tsc_deadline_timer(void)
 }
 }
 
-#define MSR_APIC_BASE 0x001b
+static void do_write_apicbase(void *data)
+{
+    set_exception_return(&&resume);
+wrmsr(MSR_IA32_APICBASE, *(u64 *)data);
+resume:
+barrier();
+}
 
 void test_enable_x2apic(void)
 {
+u64 invalid_state = APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EXTD;
+u64 apic_enabled = APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EN;
+u64 x2apic_enabled =
+APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EN | APIC_EXTD;
+
 if (enable_x2apic()) {
         printf("x2apic enabled\n");
+
+        report("x2apic enabled to invalid state",
+               test_for_exception(GP_VECTOR, do_write_apicbase,
+                                  &invalid_state));
+        report("x2apic enabled to apic enabled",
+               test_for_exception(GP_VECTOR, do_write_apicbase,
+                                  &apic_enabled));
+
+        wrmsr(MSR_IA32_APICBASE, APIC_DEFAULT_PHYS_BASE | APIC_BSP);
+        report("disabled to invalid state",
+               test_for_exception(GP_VECTOR, do_write_apicbase,
+                                  &invalid_state));
+        report("disabled to x2apic enabled",
+               test_for_exception(GP_VECTOR, do_write_apicbase,
+                                  &x2apic_enabled));
+
+        wrmsr(MSR_IA32_APICBASE, apic_enabled);
+        report("apic enabled to invalid state",
+               test_for_exception(GP_VECTOR, do_write_apicbase,
+                                  &invalid_state));
+
+        wrmsr(MSR_IA32_APICBASE, x2apic_enabled);
+        apic_write(APIC_SPIV, 0x1ff);
     } else {
         printf("x2apic not detected\n");
+
+        report("enable unsupported x2apic",
+               test_for_exception(GP_VECTOR, do_write_apicbase,
+                                  &x2apic_enabled));
 }
 }
 
-- 
1.8.1.1.298.ge7eed54
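
The transitions this test expects to fault can be written down as a small
predicate over the EN (bit 11) and EXTD (bit 10) bits of the APIC base
MSR. The sketch below encodes the rule set the test cases imply: EXTD
without EN is never valid, x2APIC cannot drop straight back to xAPIC, and
a disabled APIC cannot jump straight to x2APIC. The enum and function
names are hypothetical, and this is not KVM's actual check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_BSP  (1ULL << 8)
#define APIC_EXTD (1ULL << 10)
#define APIC_EN   (1ULL << 11)

/* Hypothetical names for the four EN/EXTD combinations. */
enum apic_mode {
    APIC_MODE_DISABLED,  /* EN=0, EXTD=0 */
    APIC_MODE_XAPIC,     /* EN=1, EXTD=0 */
    APIC_MODE_X2APIC,    /* EN=1, EXTD=1 */
    APIC_MODE_INVALID    /* EN=0, EXTD=1 */
};

enum apic_mode apic_mode_of(uint64_t apicbase)
{
    bool en = (apicbase & APIC_EN) != 0;
    bool extd = (apicbase & APIC_EXTD) != 0;

    if (!en)
        return extd ? APIC_MODE_INVALID : APIC_MODE_DISABLED;
    return extd ? APIC_MODE_X2APIC : APIC_MODE_XAPIC;
}

/* True if writing new_val over old_val is expected to raise #GP. */
bool apicbase_write_faults(uint64_t old_val, uint64_t new_val, bool has_x2apic)
{
    enum apic_mode from = apic_mode_of(old_val);
    enum apic_mode to = apic_mode_of(new_val);

    if (!has_x2apic && (new_val & APIC_EXTD))
        return true;   /* EXTD is reserved without x2APIC support */
    if (to == APIC_MODE_INVALID)
        return true;
    if (from == APIC_MODE_X2APIC && to == APIC_MODE_XAPIC)
        return true;   /* must pass through the disabled state */
    if (from == APIC_MODE_DISABLED && to == APIC_MODE_X2APIC)
        return true;   /* must pass through xAPIC mode */
    return false;
}
```

The legal paths are then disabled to xAPIC, xAPIC to x2APIC, and either
enabled mode back to disabled. This matches the order of the plain
wrmsr() calls the test uses to move between states.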



[PATCH 10/13] Provide common report and report_summary services

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This both reduces code duplication and standardizes the output format a
bit more.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile  |  3 ++-
 lib/libcflat.h|  4 
 lib/report.c  | 36 
 x86/apic.c| 15 +--
 x86/emulator.c| 16 +---
 x86/eventinj.c| 15 +--
 x86/idt_test.c| 21 -
 x86/msr.c | 15 +--
 x86/pcid.c| 14 +-
 x86/pmu.c | 37 +++--
 x86/taskswitch2.c | 15 +--
 x86/vmx.c | 16 +---
 x86/vmx.h |  1 -
 13 files changed, 68 insertions(+), 140 deletions(-)
 create mode 100644 lib/report.c

diff --git a/Makefile b/Makefile
index b6e8759..f5eccc7 100644
--- a/Makefile
+++ b/Makefile
@@ -14,7 +14,8 @@ libcflat := lib/libcflat.a
 cflatobjs := \
lib/panic.o \
lib/printf.o \
-   lib/string.o
+   lib/string.o \
+   lib/report.o
 cflatobjs += lib/argv.o
 
 #include architecure specific make rules
diff --git a/lib/libcflat.h b/lib/libcflat.h
index fadc33d..f734fde 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -45,6 +45,7 @@ extern char *strcat(char *dest, const char *src);
 extern int strcmp(const char *a, const char *b);
 
 extern int printf(const char *fmt, ...);
+extern int snprintf(char *buf, int size, const char *fmt, ...);
 extern int vsnprintf(char *buf, int size, const char *fmt, va_list va);
 
 extern void puts(const char *s);
@@ -58,4 +59,7 @@ extern long atol(const char *ptr);
 #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
 
 #define NULL ((void *)0UL)
+
+void report(const char *msg_fmt, bool pass, ...);
+int report_summary(void);
 #endif
diff --git a/lib/report.c b/lib/report.c
new file mode 100644
index 0000000..ff562a1
--- /dev/null
+++ b/lib/report.c
@@ -0,0 +1,36 @@
+/*
+ * Test result reporting
+ *
+ * Copyright (c) Siemens AG, 2014
+ *
+ * Authors:
+ *  Jan Kiszka jan.kis...@siemens.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#include "libcflat.h"
+
+static unsigned int tests, failures;
+
+void report(const char *msg_fmt, bool pass, ...)
+{
+   char buf[2000];
+   va_list va;
+
+   tests++;
+   printf("%s: ", pass ? "PASS" : "FAIL");
+   va_start(va, pass);
+   vsnprintf(buf, sizeof(buf), msg_fmt, va);
+   va_end(va);
+   puts(buf);
+   puts("\n");
+   if (!pass)
+   failures++;
+}
+
+int report_summary(void)
+{
+   printf("\nSUMMARY: %d tests, %d failures\n", tests, failures);
+   return failures > 0 ? 1 : 0;
+}
diff --git a/x86/apic.c b/x86/apic.c
index 8febfa2..487c248 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -6,17 +6,6 @@
 #include "isr.h"
 #include "msr.h"
 
-static int g_fail;
-static int g_tests;
-
-static void report(const char *msg, int pass)
-{
-++g_tests;
-    printf("%s: %s\n", msg, (pass ? "PASS" : "FAIL"));
-if (!pass)
-++g_fail;
-}
-
 static void test_lapic_existence(void)
 {
 u32 lvr;
@@ -403,7 +392,5 @@ int main()
 
 test_tsc_deadline_timer();
 
-    printf("\nsummary: %d tests, %d failures\n", g_tests, g_fail);
-
-return g_fail != 0;
+return report_summary();
 }
diff --git a/x86/emulator.c b/x86/emulator.c
index b70e540..2e25dd8 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -7,8 +7,6 @@
 #define memset __builtin_memset
 #define TESTDEV_IO_PORT 0xe0
 
-int fails, tests;
-
 static int exceptions;
 
 struct regs {
@@ -25,17 +23,6 @@ struct insn_desc {
size_t len;
 };
 
-void report(const char *name, int result)
-{
-   ++tests;
-   if (result)
-       printf("PASS: %s\n", name);
-   else {
-       printf("FAIL: %s\n", name);
-   ++fails;
-   }
-}
-
 static char st1[] = "abcdefghijklmnop";
 
 void test_stringio()
@@ -1022,6 +1009,5 @@ int main()
 
test_string_io_mmio(mem);
 
-   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
-   return fails ? 1 : 0;
+   return report_summary();
 }
diff --git a/x86/eventinj.c b/x86/eventinj.c
index 9d4392c..a218aaf 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -12,9 +12,6 @@
 #  define R "e"
 #endif
 
-static int g_fail;
-static int g_tests;
-
 static inline void io_delay(void)
 {
 }
@@ -24,14 +21,6 @@ static inline void outl(int addr, int val)
     asm volatile ("outl %1, %w0" : : "d" (addr), "a" (val));
 }
 
-static void report(const char *msg, int pass)
-{
-++g_tests;
-    printf("%s: %s\n", msg, (pass ? "PASS" : "FAIL"));
-if (!pass)
-++g_fail;
-}
-
 void apic_self_ipi(u8 v)
 {
apic_icr_write(APIC_DEST_SELF | APIC_DEST_PHYSICAL | APIC_DM_FIXED |
@@ -416,7 +405,5 @@ int main()
    printf("After int 33 with shadowed stack\n");
    report("int 33 with shadowed stack", test_count == 1);
 
-   printf("\nsummary: %d tests, %d failures\n", g_tests, g_fail);
-
-   return g_fail != 0;
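
The shared report()/report_summary() pair from lib/report.c can be
exercised outside the test framework with the hosted C library. The
sketch below follows the patch closely but makes two assumptions of its
own: it formats into a smaller buffer, and it declares pass as int
rather than bool, because ISO C leaves va_start() undefined when the last
named parameter has a type that changes under default argument promotion.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static unsigned int tests, failures;

/* Print "PASS: msg" or "FAIL: msg" and keep running totals. */
void report(const char *msg_fmt, int pass, ...)
{
    char buf[256];
    va_list va;

    tests++;
    va_start(va, pass);
    vsnprintf(buf, sizeof(buf), msg_fmt, va);  /* truncates if too long */
    va_end(va);
    printf("%s: %s\n", pass ? "PASS" : "FAIL", buf);
    if (!pass)
        failures++;
}

/* Print the totals; the return value doubles as the process exit code. */
int report_summary(void)
{
    printf("\nSUMMARY: %u tests, %u failures\n", tests, failures);
    return failures > 0 ? 1 : 0;
}
```

A test's main() then ends with "return report_summary();", which is the
replacement the per-file hunks in this patch make.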

[PATCH 03/13] apic: Remove redundant enable_apic

2014-01-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Already called by the bootstrap code.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 x86/apic.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/x86/apic.c b/x86/apic.c
index 50e77fc..d06153f 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -326,7 +326,6 @@ int main()
 test_lapic_existence();
 
 mask_pic_interrupts();
-enable_apic();
 test_enable_x2apic();
 
 test_self_ipi();
-- 
1.8.1.1.298.ge7eed54



Re: IOMMU groups ... PEX8606 switch?

2014-01-04 Thread Dana Goyette

On 01/03/2014 04:03 PM, Alex Williamson wrote:

On Mon, 2013-12-30 at 16:13 -0800, Dana Goyette wrote:

On 12/29/2013 08:16 PM, Alex Williamson wrote:

On Sat, 2013-12-28 at 23:32 -0800, Dana Goyette wrote:

On 12/28/2013 7:23 PM, Alex Williamson wrote:

On Sat, 2013-12-28 at 18:31 -0800, Dana Goyette wrote:

I have purchased both a SuperMicro X10SAE and an X10SAT, and I need to
soon decide which one to keep.

The SuperMicro X10SAT has all the PCIe x1 slots hidden behind a PLX
PEX8606 switch, which claims to support ACS.  I'd expect the devices
downstream of the PLX switch to be in separate groups.

With Linux 3.13-rc5 and "enable overrides for missing ACS capabilities"
applied and set for the Intel root ports, the devices behind the switch
remain stuck in the same group.

In terms of passing devices to different VMs, which is better: all
devices on different root ports, or all devices behind the one
ACS-supporting switch?

Can you provide lspci -vvv info?  If you're getting that for groups
either the switch has ACS capabilities, but doesn't support the features
we need or we're doing something wrong.  Thanks,


I initially tried attaching the output as a .txt file, but it's too
large.  Anyway, here's the output of lspci -nnvvv (you may notice that I
moved the Radeon to a different slot).

Well, something seems amiss since the downstream switch ports all seem
to support and enable the correct set of ACS capabilities.  I'm tending
to suspect something wrong with the ACS override patch or how it's being
used since your IOMMU group is still based at the root port.  Each root
port is isolated from the other root ports though, so something is
happening with the override patch.  Can you provide the kernel command
line you use to enable ACS overrides and the override patch you're
using, as it applies to 3.13-rc5?  Thanks,

Alex



I'm using the original acs-override patch from this post:
https://lkml.org/lkml/2013/5/30/513

Kernel parameter is:
pcie_acs_override=id:8086:8c10,id:8086:8c12,id:8086:8c16,id:8086:8c18

When booting a kernel without the override patch, the following devices
are all in the same group: Intel Root Ports 1, 2, 4, 5; ASMedia SATA
controller; PLX PEX8606 switch; Renesas USB controller; TI Firewire
controller; Intel I210 Ethernet controller.

Could you please try the patch below and send dmesg for the system once
booted.  This applies directly to upstream and includes the acs override
patch.  Thanks,


(removed patch from quote.)

Here's the complete dmesg, with pcie_acs_override still set:

http://pastebin.com/YHuKnrTb

Most relevant section:

[0.524362] DMAR: No ATSR found
[0.524386] IOMMU 1 0xfed91000: using Queued invalidation
[0.524389] IOMMU: Setting RMRR:
[0.524398] IOMMU: Setting identity map for device 0000:00:1d.0 [0x7bea1000 - 0x7beaffff]
[0.524423] IOMMU: Setting identity map for device 0000:00:1a.0 [0x7bea1000 - 0x7beaffff]
[0.524441] IOMMU: Setting identity map for device 0000:00:14.0 [0x7bea1000 - 0x7beaffff]
[0.524454] IOMMU: Prepare 0-16MiB unity mapping for LPC
[0.524461] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[0.524548] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
[0.524551] intel_iommu_add_device(0000:00:00.0)
[0.524552] dma_pdev #1: 0000:00:00.0
[0.524553] dma_pdev #2: 0000:00:00.0
[0.524554] dma_pdev #3: 0000:00:00.0
[0.524554] dma_pdev #4: 0000:00:00.0
[0.524565] intel_iommu_add_device(0000:00:01.0)
[0.524566] dma_pdev #1: 0000:00:01.0
[0.524567] dma_pdev #2: 0000:00:01.0
[0.524569] pci_acs_enabled(0000:00:01.0, 001d)
[0.524572] pci_acs_flags_enabled no ACS capability on 0000:00:01.0
[0.524573] pci_acs_flags_enabled(0000:00:01.0, 001d) - false
[0.524574] - false
[0.524575] pci_acs_enabled(0000:00:01.0, 001d)
[0.524577] pci_acs_flags_enabled no ACS capability on 0000:00:01.0
[0.524578] pci_acs_flags_enabled(0000:00:01.0, 001d) - false
[0.524579] - false
[0.524580] dma_pdev #3: 0000:00:01.0
[0.524581] dma_pdev #4: 0000:00:01.0
[0.524584] intel_iommu_add_device(0000:00:01.1)
[0.524586] dma_pdev #1: 0000:00:01.1
[0.524586] dma_pdev #2: 0000:00:01.1
[0.524587] pci_acs_enabled(0000:00:01.1, 001d)
[0.524589] pci_acs_flags_enabled no ACS capability on 0000:00:01.1
[0.524590] pci_acs_flags_enabled(0000:00:01.1, 001d) - false
[0.524591] - false
[0.524592] pci_acs_enabled(0000:00:01.0, 001d)
[0.524593] pci_acs_flags_enabled no ACS capability on 0000:00:01.0
[0.524595] pci_acs_flags_enabled(0000:00:01.0, 001d) - false
[0.524596] - false
[0.524596] dma_pdev #3: 0000:00:01.0
[0.524597] dma_pdev #4: 0000:00:01.0
[0.524599] intel_iommu_add_device(0000:00:02.0)
[0.524601] dma_pdev #1: 

Re: IOMMU groups ... PEX8606 switch?

2014-01-04 Thread Alex Williamson
On Sat, 2014-01-04 at 11:26 -0800, Dana Goyette wrote:
 On 01/03/2014 04:03 PM, Alex Williamson wrote:
  On Mon, 2013-12-30 at 16:13 -0800, Dana Goyette wrote:
  On 12/29/2013 08:16 PM, Alex Williamson wrote:
  On Sat, 2013-12-28 at 23:32 -0800, Dana Goyette wrote:
  On 12/28/2013 7:23 PM, Alex Williamson wrote:
  On Sat, 2013-12-28 at 18:31 -0800, Dana Goyette wrote:
  I have purchased both a SuperMicro X10SAE and an X10SAT, and I need to
  soon decide which one to keep.
 
  The SuperMicro X10SAT has all the PCIe x1 slots hidden behind a PLX
  PEX8606 switch, which claims to support ACS.  I'd expect the devices
  downstream of the PLX switch to be in separate groups.
 
  With Linux 3.13-rc5 and "enable overrides for missing ACS capabilities"
  applied and set for the Intel root ports, the devices behind the switch
  remain stuck in the same group.
 
  In terms of passing devices to different VMs, which is better: all
  devices on different root ports, or all devices behind the one
  ACS-supporting switch?
  Can you provide lspci -vvv info?  If you're getting that for groups
  either the switch has ACS capabilities, but doesn't support the features
  we need or we're doing something wrong.  Thanks,
 
  I initially tried attaching the output as a .txt file, but it's too
  large.  Anyway, here's the output of lspci -nnvvv (you may notice that I
  moved the Radeon to a different slot).
  Well, something seems amiss since the downstream switch ports all seem
  to support and enable the correct set of ACS capabilities.  I'm tending
  to suspect something wrong with the ACS override patch or how it's being
  used since your IOMMU group is still based at the root port.  Each root
  port is isolated from the other root ports though, so something is
  happening with the override patch.  Can you provide the kernel command
  line you use to enable ACS overrides and the override patch you're
  using, as it applies to 3.13-rc5?  Thanks,
 
  Alex
 
 
  I'm using the original acs-override patch from this post:
  https://lkml.org/lkml/2013/5/30/513
 
  Kernel parameter is:
  pcie_acs_override=id:8086:8c10,id:8086:8c12,id:8086:8c16,id:8086:8c18

Actually, you're not:

pcie_acs_override=id:8086:8c10,id:8086:8c16,id:8086:8c18,id:8086:ac1a,id:8086:8c1c,id:8086:8c1e,id:10b5:8606
 

And we register all of them:

[0.000000] PCIe ACS bypass added for 8086:8c10
[0.000000] PCIe ACS bypass added for 8086:8c16
[0.000000] PCIe ACS bypass added for 8086:8c18
[0.000000] PCIe ACS bypass added for 8086:ac1a
[0.000000] PCIe ACS bypass added for 8086:8c1c
[0.000000] PCIe ACS bypass added for 8086:8c1e
[0.000000] PCIe ACS bypass added for 10b5:8606

However, note that the root port causing you trouble is 8086:8c12, which
isn't provided as an override, therefore the code is doing the right
thing and grouping all devices behind that root port together.
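
The mismatch is easy to check mechanically. The sketch below scans a
pcie_acs_override-style string for a given vendor:device pair. It assumes
only the comma-separated "id:vvvv:dddd" form seen in this thread, and it
is a hypothetical helper, not the parser from the out-of-tree override
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if "id:<vendor>:<device>" (hex) appears in the
 * comma-separated override list. Entries in any other form are skipped.
 */
bool acs_override_lists(const char *list, unsigned int vendor,
                        unsigned int device)
{
    const char *p = list;

    while (p && *p) {
        unsigned int v, d;

        if (sscanf(p, "id:%x:%x", &v, &d) == 2 &&
            v == vendor && d == device)
            return true;

        p = strchr(p, ',');  /* advance to the next entry, if any */
        if (p)
            p++;
    }
    return false;
}
```

Run against the command line that was actually booted, a lookup for
8086:8c12 comes back false while 10b5:8606 comes back true, which is
consistent with the grouping seen at that root port.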

 
  When booting a kernel without the override patch, the following devices
  are all in the same group: Intel Root Ports 1, 2, 4, 5; ASMedia SATA
  controller; PLX PEX8606 switch; Renesas USB controller; TI Firewire
  controller; Intel I210 Ethernet controller.
  Could you please try the patch below and send dmesg for the system once
  booted.  This applies directly to upstream and includes the acs override
  patch.  Thanks,
 
 (removed patch from quote.)
 
 Here's the complete dmesg, with pcie_acs_override still set:
 
 http://pastebin.com/YHuKnrTb
 
 Most relevant section:
 
 [0.524362] DMAR: No ATSR found
 [0.524386] IOMMU 1 0xfed91000: using Queued invalidation
 [0.524389] IOMMU: Setting RMRR:
 [0.524398] IOMMU: Setting identity map for device 0000:00:1d.0 [0x7bea1000 - 0x7beaffff]
 [0.524423] IOMMU: Setting identity map for device 0000:00:1a.0 [0x7bea1000 - 0x7beaffff]
 [0.524441] IOMMU: Setting identity map for device 0000:00:14.0 [0x7bea1000 - 0x7beaffff]
 [0.524454] IOMMU: Prepare 0-16MiB unity mapping for LPC
 [0.524461] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
 [0.524548] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
 [0.524551] intel_iommu_add_device(0000:00:00.0)
 [0.524552] dma_pdev #1: 0000:00:00.0
 [0.524553] dma_pdev #2: 0000:00:00.0
 [0.524554] dma_pdev #3: 0000:00:00.0
 [0.524554] dma_pdev #4: 0000:00:00.0
 [0.524565] intel_iommu_add_device(0000:00:01.0)
 [0.524566] dma_pdev #1: 0000:00:01.0
 [0.524567] dma_pdev #2: 0000:00:01.0
 [0.524569] pci_acs_enabled(0000:00:01.0, 001d)
 [0.524572] pci_acs_flags_enabled no ACS capability on 0000:00:01.0
 [0.524573] pci_acs_flags_enabled(0000:00:01.0, 001d) - false
 [0.524574] - false
 [0.524575] pci_acs_enabled(0000:00:01.0, 001d)
 [0.524577] pci_acs_flags_enabled no ACS 

Re: IOMMU groups ... PEX8606 switch?

2014-01-04 Thread Dana Goyette

On 01/04/2014 12:22 PM, Alex Williamson wrote:

On Sat, 2014-01-04 at 11:26 -0800, Dana Goyette wrote:

On 01/03/2014 04:03 PM, Alex Williamson wrote:

On Mon, 2013-12-30 at 16:13 -0800, Dana Goyette wrote:

On 12/29/2013 08:16 PM, Alex Williamson wrote:

On Sat, 2013-12-28 at 23:32 -0800, Dana Goyette wrote:

On 12/28/2013 7:23 PM, Alex Williamson wrote:

On Sat, 2013-12-28 at 18:31 -0800, Dana Goyette wrote:

I have purchased both a SuperMicro X10SAE and an X10SAT, and I need to
soon decide which one to keep.

The SuperMicro X10SAT has all the PCIe x1 slots hidden behind a PLX
PEX8606 switch, which claims to support ACS.  I'd expect the devices
downstream of the PLX switch to be in separate groups.

With Linux 3.13-rc5 and "enable overrides for missing ACS capabilities"
applied and set for the Intel root ports, the devices behind the switch
remain stuck in the same group.

In terms of passing devices to different VMs, which is better: all
devices on different root ports, or all devices behind the one
ACS-supporting switch?

Can you provide lspci -vvv info?  If you're getting that for groups
either the switch has ACS capabilities, but doesn't support the features
we need or we're doing something wrong.  Thanks,


I initially tried attaching the output as a .txt file, but it's too
large.  Anyway, here's the output of lspci -nnvvv (you may notice that I
moved the Radeon to a different slot).

Well, something seems amiss since the downstream switch ports all seem
to support and enable the correct set of ACS capabilities.  I'm tending
to suspect something wrong with the ACS override patch or how it's being
used since your IOMMU group is still based at the root port.  Each root
port is isolated from the other root ports though, so something is
happening with the override patch.  Can you provide the kernel command
line you use to enable ACS overrides and the override patch you're
using, as it applies to 3.13-rc5?  Thanks,

Alex



I'm using the original acs-override patch from this post:
https://lkml.org/lkml/2013/5/30/513

Kernel parameter is:
pcie_acs_override=id:8086:8c10,id:8086:8c12,id:8086:8c16,id:8086:8c18


Actually, you're not:

pcie_acs_override=id:8086:8c10,id:8086:8c16,id:8086:8c18,id:8086:ac1a,id:8086:8c1c,id:8086:8c1e,id:10b5:8606

And we register all of them:

[0.000000] PCIe ACS bypass added for 8086:8c10
[0.000000] PCIe ACS bypass added for 8086:8c16
[0.000000] PCIe ACS bypass added for 8086:8c18
[0.000000] PCIe ACS bypass added for 8086:ac1a
[0.000000] PCIe ACS bypass added for 8086:8c1c
[0.000000] PCIe ACS bypass added for 8086:8c1e
[0.000000] PCIe ACS bypass added for 10b5:8606

However, note that the root port causing you trouble is 8086:8c12, which
isn't provided as an override, therefore the code is doing the right
thing and grouping all devices behind that root port together.



Thanks for catching that -- I certainly missed it!
I've added the override for that root port and removed the override for 
the PLX switch; now all the ports are indeed in separate groups.


Do we yet know if it'll be possible to properly isolate the Intel root 
ports, without this ACS override?

