Re: [PATCH v4 2/2] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
On Wed, Jul 17, 2013 at 2:26 PM, Jan Kiszka  wrote:
> On 2013-07-17 08:05, Arthur Chunqi Li wrote:
>> This is the first version for VMX nested environment test case. It
>> contains the basic VMX instructions test cases, including VMXON/
>> VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
>> also tests the basic execution routine in VMX nested environment and
>> let the VM print "Hello World" to inform its successfully run.
>>
>> New files added:
>> x86/vmx.h : contains all VMX related macro declerations
>> x86/vmx.c : main file for VMX nested test case
>>
>> Signed-off-by: Arthur Chunqi Li 
>> ---
>>  config-x86-common.mak |2 +
>>  config-x86_64.mak |1 +
>>  lib/x86/msr.h |5 +
>>  x86/cstart64.S|4 +
>>  x86/unittests.cfg |6 +
>>  x86/vmx.c |  676 
>> +
>>  x86/vmx.h |  419 ++
>>  7 files changed, 1113 insertions(+)
>>  create mode 100644 x86/vmx.c
>>  create mode 100644 x86/vmx.h
>>
>> diff --git a/config-x86-common.mak b/config-x86-common.mak
>> index 455032b..34a41e1 100644
>> --- a/config-x86-common.mak
>> +++ b/config-x86-common.mak
>> @@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) 
>> $(TEST_DIR)/asyncpf.o
>>
>>  $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
>>
>> +$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
>> +
>>  arch_clean:
>>   $(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
>>   $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
>> diff --git a/config-x86_64.mak b/config-x86_64.mak
>> index 91ffcce..5d9b22a 100644
>> --- a/config-x86_64.mak
>> +++ b/config-x86_64.mak
>> @@ -11,5 +11,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
>> $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
>> $(TEST_DIR)/pcid.flat
>>  tests += $(TEST_DIR)/svm.flat
>> +tests += $(TEST_DIR)/vmx.flat
>>
>>  include config-x86-common.mak
>> diff --git a/lib/x86/msr.h b/lib/x86/msr.h
>> index 509a421..281255a 100644
>> --- a/lib/x86/msr.h
>> +++ b/lib/x86/msr.h
>> @@ -396,6 +396,11 @@
>>  #define MSR_IA32_VMX_VMCS_ENUM  0x048a
>>  #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
>>  #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
>> +#define MSR_IA32_VMX_TRUE_PIN0x048d
>> +#define MSR_IA32_VMX_TRUE_PROC   0x048e
>> +#define MSR_IA32_VMX_TRUE_EXIT   0x048f
>> +#define MSR_IA32_VMX_TRUE_ENTRY  0x0490
>> +
>>
>>  /* AMD-V MSRs */
>>
>> diff --git a/x86/cstart64.S b/x86/cstart64.S
>> index 24df5f8..0fe76da 100644
>> --- a/x86/cstart64.S
>> +++ b/x86/cstart64.S
>> @@ -4,6 +4,10 @@
>>  .globl boot_idt
>>  boot_idt = 0
>>
>> +.globl idt_descr
>> +.globl tss_descr
>> +.globl gdt64_desc
>> +
>>  ipi_vector = 0x20
>>
>>  max_cpus = 64
>> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
>> index bc9643e..85c36aa 100644
>> --- a/x86/unittests.cfg
>> +++ b/x86/unittests.cfg
>> @@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
>>  file = pcid.flat
>>  extra_params = -cpu qemu64,+pcid
>>  arch = x86_64
>> +
>> +[vmx]
>> +file = vmx.flat
>> +extra_params = -cpu host,+vmx
>> +arch = x86_64
>> +
>> diff --git a/x86/vmx.c b/x86/vmx.c
>> new file mode 100644
>> index 000..af48fce
>> --- /dev/null
>> +++ b/x86/vmx.c
>> @@ -0,0 +1,676 @@
>> +#include "libcflat.h"
>> +#include "processor.h"
>> +#include "vm.h"
>> +#include "desc.h"
>> +#include "vmx.h"
>> +#include "msr.h"
>> +#include "smp.h"
>> +#include "io.h"
>> +#include "setjmp.h"
>> +
>> +int fails = 0, tests = 0;
>> +u32 *vmxon_region;
>> +struct vmcs *vmcs_root;
>> +void *io_bmp1, *io_bmp2;
>> +void *msr_bmp;
>> +u32 vpid_ctr;
>> +char *guest_stack, *host_stack;
>> +char *guest_syscall_stack, *host_syscall_stack;
>> +u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
>> +ulong fix_cr0_set, fix_cr0_clr;
>> +ulong fix_cr4_set, fix_cr4_clr;
>> +struct regs regs;
>> +jmp_buf env;
>> +struct vmx_test *current;
>> +
>> +extern u64 gdt64_desc[];
>> +extern u64 idt_descr[];
>> +extern u64 tss_descr[];
>> +extern void *entry_vmx;
>> +extern void *entry_sysenter;
>> +extern void *guest_entry;
>> +
>> +void report(const char *name, int result)
>> +{
>> + ++tests;
>> + if (result)
>> + printf("PASS: %s\n", name);
>> + else {
>> + printf("FAIL: %s\n", name);
>> + ++fails;
>> + }
>> +}
>> +
>> +inline u64 get_rflags(void)
>> +{
>> + u64 r;
>> + asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
>> + return r;
>> +}
>> +
>> +inline void set_rflags(u64 r)
>> +{
>> + asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
>> +}
>> +
>> +int vmcs_clear(struct vmcs *vmcs)
>> +{
>> + bool ret;
>> + asm volatile ("vmclear %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
>> + return ret;
>> +}
>> +
>> +u64 vmcs_read(enum Encoding enc)
>> +{
>> + u64 val;
>> + asm volatile ("

Re: [PATCH v4 2/2] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Jan Kiszka
On 2013-07-17 08:05, Arthur Chunqi Li wrote:
> This is the first version for VMX nested environment test case. It
> contains the basic VMX instructions test cases, including VMXON/
> VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
> also tests the basic execution routine in VMX nested environment and
> let the VM print "Hello World" to inform its successfully run.
> 
> New files added:
> x86/vmx.h : contains all VMX related macro declerations
> x86/vmx.c : main file for VMX nested test case
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
>  config-x86-common.mak |2 +
>  config-x86_64.mak |1 +
>  lib/x86/msr.h |5 +
>  x86/cstart64.S|4 +
>  x86/unittests.cfg |6 +
>  x86/vmx.c |  676 
> +
>  x86/vmx.h |  419 ++
>  7 files changed, 1113 insertions(+)
>  create mode 100644 x86/vmx.c
>  create mode 100644 x86/vmx.h
> 
> diff --git a/config-x86-common.mak b/config-x86-common.mak
> index 455032b..34a41e1 100644
> --- a/config-x86-common.mak
> +++ b/config-x86-common.mak
> @@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
>  
>  $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
>  
> +$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
> +
>  arch_clean:
>   $(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
>   $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
> diff --git a/config-x86_64.mak b/config-x86_64.mak
> index 91ffcce..5d9b22a 100644
> --- a/config-x86_64.mak
> +++ b/config-x86_64.mak
> @@ -11,5 +11,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
> $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
> $(TEST_DIR)/pcid.flat
>  tests += $(TEST_DIR)/svm.flat
> +tests += $(TEST_DIR)/vmx.flat
>  
>  include config-x86-common.mak
> diff --git a/lib/x86/msr.h b/lib/x86/msr.h
> index 509a421..281255a 100644
> --- a/lib/x86/msr.h
> +++ b/lib/x86/msr.h
> @@ -396,6 +396,11 @@
>  #define MSR_IA32_VMX_VMCS_ENUM  0x048a
>  #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
>  #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
> +#define MSR_IA32_VMX_TRUE_PIN0x048d
> +#define MSR_IA32_VMX_TRUE_PROC   0x048e
> +#define MSR_IA32_VMX_TRUE_EXIT   0x048f
> +#define MSR_IA32_VMX_TRUE_ENTRY  0x0490
> +
>  
>  /* AMD-V MSRs */
>  
> diff --git a/x86/cstart64.S b/x86/cstart64.S
> index 24df5f8..0fe76da 100644
> --- a/x86/cstart64.S
> +++ b/x86/cstart64.S
> @@ -4,6 +4,10 @@
>  .globl boot_idt
>  boot_idt = 0
>  
> +.globl idt_descr
> +.globl tss_descr
> +.globl gdt64_desc
> +
>  ipi_vector = 0x20
>  
>  max_cpus = 64
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index bc9643e..85c36aa 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
>  file = pcid.flat
>  extra_params = -cpu qemu64,+pcid
>  arch = x86_64
> +
> +[vmx]
> +file = vmx.flat
> +extra_params = -cpu host,+vmx
> +arch = x86_64
> +
> diff --git a/x86/vmx.c b/x86/vmx.c
> new file mode 100644
> index 000..af48fce
> --- /dev/null
> +++ b/x86/vmx.c
> @@ -0,0 +1,676 @@
> +#include "libcflat.h"
> +#include "processor.h"
> +#include "vm.h"
> +#include "desc.h"
> +#include "vmx.h"
> +#include "msr.h"
> +#include "smp.h"
> +#include "io.h"
> +#include "setjmp.h"
> +
> +int fails = 0, tests = 0;
> +u32 *vmxon_region;
> +struct vmcs *vmcs_root;
> +void *io_bmp1, *io_bmp2;
> +void *msr_bmp;
> +u32 vpid_ctr;
> +char *guest_stack, *host_stack;
> +char *guest_syscall_stack, *host_syscall_stack;
> +u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
> +ulong fix_cr0_set, fix_cr0_clr;
> +ulong fix_cr4_set, fix_cr4_clr;
> +struct regs regs;
> +jmp_buf env;
> +struct vmx_test *current;
> +
> +extern u64 gdt64_desc[];
> +extern u64 idt_descr[];
> +extern u64 tss_descr[];
> +extern void *entry_vmx;
> +extern void *entry_sysenter;
> +extern void *guest_entry;
> +
> +void report(const char *name, int result)
> +{
> + ++tests;
> + if (result)
> + printf("PASS: %s\n", name);
> + else {
> + printf("FAIL: %s\n", name);
> + ++fails;
> + }
> +}
> +
> +inline u64 get_rflags(void)
> +{
> + u64 r;
> + asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
> + return r;
> +}
> +
> +inline void set_rflags(u64 r)
> +{
> + asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
> +}
> +
> +int vmcs_clear(struct vmcs *vmcs)
> +{
> + bool ret;
> + asm volatile ("vmclear %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
> + return ret;
> +}
> +
> +u64 vmcs_read(enum Encoding enc)
> +{
> + u64 val;
> + asm volatile ("vmread %1, %0" : "=rm" (val) : "r" ((u64)enc) : "cc");
> + return val;
> +}
> +
> +int vmcs_write(enum Encoding enc, u64 val)
> +{
> + bool ret;
> + asm volatile ("vmwrite %1, %2; setbe %0"
> + 

Re: [PATCH v4 0/2] Basic nested VMX test suite

2013-07-16 Thread Gleb Natapov
On Wed, Jul 17, 2013 at 02:08:54PM +0800, Arthur Chunqi Li wrote:
> Hi Gleb and Paolo,
> As your suggestion, I add general interface for adding test suite in
> this version. It is similar to the achievement of x86/vmx.c, and I
> also move tests for vmenter (vmlaunch and vmresume test) to an
> independent test suite.
> 
The general interface looks fine, can be extended if needed, but you
ignored my comment about refactoring vmx_run() to make vmexit return
just after vmresume. Do it, you will see how clearer the code and the
logic will be. 99% of code we are dealing with as a programmers is
linear, we are much better following liner logic.

> Arthur
> 
> On Wed, Jul 17, 2013 at 2:05 PM, Arthur Chunqi Li  wrote:
> > These two patches are focued on providing a basic version of VMX nested
> > test suite. This commit provides the architecture of nested VMX similiar
> > to x86/svm.c.
> >
> > When new test suite added, just write functions included in "struct 
> > vmx_test"
> > and register it in array "vmx_tests". GPR in guest can be read from 
> > vmx_test.
> > guest_regs. VM exit times can be read from vmx_test.exits.
> >
> > setjmp/longjmp is designed to avoid exiting VMX in call-back form.
> >
> > Arthur Chunqi Li (2):
> >   kvm-unit-tests : Add setjmp/longjmp to libcflat
> >   kvm-unit-tests : The first version of VMX nested test case
> >
> >  config-x86-common.mak |2 +
> >  config-x86_64.mak |3 +
> >  lib/setjmp.h  |   11 +
> >  lib/x86/msr.h |5 +
> >  lib/x86/setjmp64.S|   27 ++
> >  x86/cstart64.S|4 +
> >  x86/unittests.cfg |6 +
> >  x86/vmx.c |  676 
> > +
> >  x86/vmx.h |  419 ++
> >  9 files changed, 1153 insertions(+)
> >  create mode 100644 lib/setjmp.h
> >  create mode 100644 lib/x86/setjmp64.S
> >  create mode 100644 x86/vmx.c
> >  create mode 100644 x86/vmx.h
> >
> > --
> > 1.7.9.5
> >
> 
> 
> 
> -- 
> Arthur Chunqi Li
> Department of Computer Science
> School of EECS
> Peking University
> Beijing, China

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/2] Basic nested VMX test suite

2013-07-16 Thread Arthur Chunqi Li
Hi Gleb and Paolo,
As your suggestion, I add general interface for adding test suite in
this version. It is similar to the achievement of x86/vmx.c, and I
also move tests for vmenter (vmlaunch and vmresume test) to an
independent test suite.

Arthur

On Wed, Jul 17, 2013 at 2:05 PM, Arthur Chunqi Li  wrote:
> These two patches are focued on providing a basic version of VMX nested
> test suite. This commit provides the architecture of nested VMX similiar
> to x86/svm.c.
>
> When new test suite added, just write functions included in "struct vmx_test"
> and register it in array "vmx_tests". GPR in guest can be read from vmx_test.
> guest_regs. VM exit times can be read from vmx_test.exits.
>
> setjmp/longjmp is designed to avoid exiting VMX in call-back form.
>
> Arthur Chunqi Li (2):
>   kvm-unit-tests : Add setjmp/longjmp to libcflat
>   kvm-unit-tests : The first version of VMX nested test case
>
>  config-x86-common.mak |2 +
>  config-x86_64.mak |3 +
>  lib/setjmp.h  |   11 +
>  lib/x86/msr.h |5 +
>  lib/x86/setjmp64.S|   27 ++
>  x86/cstart64.S|4 +
>  x86/unittests.cfg |6 +
>  x86/vmx.c |  676 
> +
>  x86/vmx.h |  419 ++
>  9 files changed, 1153 insertions(+)
>  create mode 100644 lib/setjmp.h
>  create mode 100644 lib/x86/setjmp64.S
>  create mode 100644 x86/vmx.c
>  create mode 100644 x86/vmx.h
>
> --
> 1.7.9.5
>



-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/2] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
This is the first version for VMX nested environment test case. It
contains the basic VMX instructions test cases, including VMXON/
VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
also tests the basic execution routine in VMX nested environment and
let the VM print "Hello World" to inform its successfully run.

New files added:
x86/vmx.h : contains all VMX related macro declerations
x86/vmx.c : main file for VMX nested test case

Signed-off-by: Arthur Chunqi Li 
---
 config-x86-common.mak |2 +
 config-x86_64.mak |1 +
 lib/x86/msr.h |5 +
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  676 +
 x86/vmx.h |  419 ++
 7 files changed, 1113 insertions(+)
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 455032b..34a41e1 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
 
 $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
 
+$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/config-x86_64.mak b/config-x86_64.mak
index 91ffcce..5d9b22a 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -11,5 +11,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
  $(TEST_DIR)/pcid.flat
 tests += $(TEST_DIR)/svm.flat
+tests += $(TEST_DIR)/vmx.flat
 
 include config-x86-common.mak
diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 509a421..281255a 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -396,6 +396,11 @@
 #define MSR_IA32_VMX_VMCS_ENUM  0x048a
 #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
 #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
+#define MSR_IA32_VMX_TRUE_PIN  0x048d
+#define MSR_IA32_VMX_TRUE_PROC 0x048e
+#define MSR_IA32_VMX_TRUE_EXIT 0x048f
+#define MSR_IA32_VMX_TRUE_ENTRY0x0490
+
 
 /* AMD-V MSRs */
 
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 24df5f8..0fe76da 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -4,6 +4,10 @@
 .globl boot_idt
 boot_idt = 0
 
+.globl idt_descr
+.globl tss_descr
+.globl gdt64_desc
+
 ipi_vector = 0x20
 
 max_cpus = 64
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index bc9643e..85c36aa 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
 file = pcid.flat
 extra_params = -cpu qemu64,+pcid
 arch = x86_64
+
+[vmx]
+file = vmx.flat
+extra_params = -cpu host,+vmx
+arch = x86_64
+
diff --git a/x86/vmx.c b/x86/vmx.c
new file mode 100644
index 000..af48fce
--- /dev/null
+++ b/x86/vmx.c
@@ -0,0 +1,676 @@
+#include "libcflat.h"
+#include "processor.h"
+#include "vm.h"
+#include "desc.h"
+#include "vmx.h"
+#include "msr.h"
+#include "smp.h"
+#include "io.h"
+#include "setjmp.h"
+
+int fails = 0, tests = 0;
+u32 *vmxon_region;
+struct vmcs *vmcs_root;
+void *io_bmp1, *io_bmp2;
+void *msr_bmp;
+u32 vpid_ctr;
+char *guest_stack, *host_stack;
+char *guest_syscall_stack, *host_syscall_stack;
+u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
+ulong fix_cr0_set, fix_cr0_clr;
+ulong fix_cr4_set, fix_cr4_clr;
+struct regs regs;
+jmp_buf env;
+struct vmx_test *current;
+
+extern u64 gdt64_desc[];
+extern u64 idt_descr[];
+extern u64 tss_descr[];
+extern void *entry_vmx;
+extern void *entry_sysenter;
+extern void *guest_entry;
+
+void report(const char *name, int result)
+{
+   ++tests;
+   if (result)
+   printf("PASS: %s\n", name);
+   else {
+   printf("FAIL: %s\n", name);
+   ++fails;
+   }
+}
+
+inline u64 get_rflags(void)
+{
+   u64 r;
+   asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
+   return r;
+}
+
+inline void set_rflags(u64 r)
+{
+   asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
+}
+
+int vmcs_clear(struct vmcs *vmcs)
+{
+   bool ret;
+   asm volatile ("vmclear %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return ret;
+}
+
+u64 vmcs_read(enum Encoding enc)
+{
+   u64 val;
+   asm volatile ("vmread %1, %0" : "=rm" (val) : "r" ((u64)enc) : "cc");
+   return val;
+}
+
+int vmcs_write(enum Encoding enc, u64 val)
+{
+   bool ret;
+   asm volatile ("vmwrite %1, %2; setbe %0"
+   : "=q"(ret) : "rm" (val), "r" ((u64)enc) : "cc");
+   return ret;
+}
+
+int make_vmcs_current(struct vmcs *vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrld %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return ret;
+}
+
+int save_vmcs(struct vmcs **vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrst %1; setbe %0" 

[PATCH v4 0/2] Basic nested VMX test suite

2013-07-16 Thread Arthur Chunqi Li
These two patches are focued on providing a basic version of VMX nested
test suite. This commit provides the architecture of nested VMX similiar
to x86/svm.c.

When new test suite added, just write functions included in "struct vmx_test"
and register it in array "vmx_tests". GPR in guest can be read from vmx_test.
guest_regs. VM exit times can be read from vmx_test.exits.

setjmp/longjmp is designed to avoid exiting VMX in call-back form.

Arthur Chunqi Li (2):
  kvm-unit-tests : Add setjmp/longjmp to libcflat
  kvm-unit-tests : The first version of VMX nested test case

 config-x86-common.mak |2 +
 config-x86_64.mak |3 +
 lib/setjmp.h  |   11 +
 lib/x86/msr.h |5 +
 lib/x86/setjmp64.S|   27 ++
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  676 +
 x86/vmx.h |  419 ++
 9 files changed, 1153 insertions(+)
 create mode 100644 lib/setjmp.h
 create mode 100644 lib/x86/setjmp64.S
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/2] kvm-unit-tests : Add setjmp/longjmp to libcflat

2013-07-16 Thread Arthur Chunqi Li
Add setjmp and longjmp functions to libcflat. Now these two functions
are only supported in X86_64 arch.

New files added:
lib/x86/setjmp64.S
lib/x86/setjmp64.c

Signed-off-by: Arthur Chunqi Li 
---
 config-x86_64.mak  |2 ++
 lib/setjmp.h   |   11 +++
 lib/x86/setjmp64.S |   27 +++
 3 files changed, 40 insertions(+)
 create mode 100644 lib/setjmp.h
 create mode 100644 lib/x86/setjmp64.S

diff --git a/config-x86_64.mak b/config-x86_64.mak
index 4e525f5..91ffcce 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -4,6 +4,8 @@ bits = 64
 ldarch = elf64-x86-64
 CFLAGS += -D__x86_64__
 
+cflatobjs += lib/x86/setjmp64.o
+
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
diff --git a/lib/setjmp.h b/lib/setjmp.h
new file mode 100644
index 000..eca70d9
--- /dev/null
+++ b/lib/setjmp.h
@@ -0,0 +1,11 @@
+#ifndef LIBCFLAT_SETJMP64_H
+#define LIBCFLAT_SETJMP64_H
+
+#include "libcflat.h"
+
+typedef char jmp_buf[64];
+
+void longjmp(jmp_buf env, int val);
+int setjmp(jmp_buf env);
+
+#endif
diff --git a/lib/x86/setjmp64.S b/lib/x86/setjmp64.S
new file mode 100644
index 000..c8ae790
--- /dev/null
+++ b/lib/x86/setjmp64.S
@@ -0,0 +1,27 @@
+.globl setjmp
+setjmp:
+   mov (%rsp), %rsi
+   mov %rsi, (%rdi)
+   mov %rsp, 0x8(%rdi)
+   mov %rbp, 0x10(%rdi)
+   mov %rbx, 0x18(%rdi)
+   mov %r12, 0x20(%rdi)
+   mov %r13, 0x28(%rdi)
+   mov %r14, 0x30(%rdi)
+   mov %r15, 0x38(%rdi)
+   xor %eax, %eax
+   ret
+
+.globl longjmp
+longjmp:
+   mov %esi, %eax
+   mov 0x38(%rdi), %r15
+   mov 0x30(%rdi), %r14
+   mov 0x28(%rdi), %r13
+   mov 0x20(%rdi), %r12
+   mov 0x18(%rdi), %rbx
+   mov 0x10(%rdi), %rbp
+   mov 0x8(%rdi), %rsp
+   mov (%rdi), %rsi
+   mov %rsi, (%rsp)
+   ret
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vga passthrough // vfio // qemu bridge

2013-07-16 Thread Martin Wolf
thank you for the quick response alex,

but i still need your help ;)
i cloned both git trees
(http://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00432.html)
that was the easy part for me (it boots like a charm ...) ... then i
built the qemu tree and found out
that it is just 1.4.50 and something was missing
so qemu would not start up ...

would you be so kind alex and supply me with
the patches you meant yesterday i would need for qemu?

ty in advance



Am 16.07.2013 16:25, schrieb Alex Williamson:
> On Tue, 2013-07-16 at 14:35 +0200, Martin Wolf wrote:
>> Early 2012 i tested the old vga passthrough capabilities of KVM and was
>> partly successful.
>> now with the new vfio driver i tried again according to alex's hints and
>> this guide:
>> https://bbs.archlinux.org/viewtopic.php?id=162768
>>
>> since im primarily using ubuntu i used the daily build of saucy.
>> it ships qemu 1.5 and seabios 1.7.3 so the requirements are met.
>>
>> according to the guide i prepared the vga card (amd 7870)
>>
>> [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-2-generic
>> root=UUID=26fed560-a972-499d-ab14-7fec6439fd3d ro intel_iommu=on
>> pci-stub.ids=1002:6818,1002:aab0 quiet splash vt.handoff=7
>> [0.00] Kernel command line:
>> BOOT_IMAGE=/boot/vmlinuz-3.10.0-2-generic
>> root=UUID=26fed560-a972-499d-ab14-7fec6439fd3d ro intel_iommu=on
>> pci-stub.ids=1002:6818,1002:aab0 quiet splash vt.handoff=7
>> [0.569977] pci-stub: add 1002:6818 sub=:
>> cls=/
>> [0.569987] pci-stub :01:00.0: claimed by stub
>> [0.569994] pci-stub: add 1002:AAB0 sub=:
>> cls=/
>> [0.569998] pci-stub :01:00.1: claimed by stub
>>
>> then did this just to be sure:
>> echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" >
>> /etc/modprobe.d/vfio_iommu_type1.conf
>> (or was that wrong?)
>> im using a z87 haswell mainboard
> Hopefully not needed, only use this option if you need to.  vfio will
> print an error to dmesg and qemu will fail to start if you need it.
>
>> after that i binded the two devices to vfio-pci with:
>> vfio-bind :01:00.0 :01:00.1 (the script in the guide)
>>
>> afterwards i was able to start the kvm with
>> qemu-system-x86_64 -enable-kvm -M q35 -m 8192 -cpu host \
>> -smp 8,sockets=1,cores=4,threads=8 \
>> -bios /usr/share/qemu/bios.bin -vga none \
>> -device
>> ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
>> -device
>> vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
>> -device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 \
>> -device ahci,bus=pcie.0,id=ahci \
>> -drive file=/home/martin/windows.img,if=none,id=disk,format=raw -device
>> virtio-blk-pci,drive=disk \
>> -drive file=/home/martin/X17-59885.iso,id=isocd -device
>> ide-cd,bus=ahci.0,drive=isocd \
>> -net nic,model=virtio \
>> -net user \
>> -usb -usbdevice host:1532:000c \
>> -drive file=/home/martin/Downloads/virtio-win-0.1-59.iso,id=isocd1
>> -device ide-cd,bus=ahci.1,drive=isocd1
>>
>> to my surprise i instantly got the windows installation running
>> installed the virtio drivers for nic and storage
>> and had 15 mins later a working win7 installation.
>> now i installed the amd driver (13.4) and rebooted.
>> i got a bluescreen. similar to my old expiriences so i thought do a
>> clean host reboot and try again.
>> but still the same. so i tried to load the bios.rom for the card (found
>> it on techpowerup) again no luck.
>> maybe someone knows a hint?
> I think most of the folks using the guide you reference are using my
> vfio-vga-reset branches of qemu & kernel (or patches extracted from
> them).  These add an important step for reproducibility, by being able
> to reset the bus for the graphics card, giving us a way to reset the
> device.  The other thing in the qemu branch are improved quirks.  I've
> just posted these to qemu-devel and plan to get them pulled for 1.6.
> Note that for ATI/AMD cards, a critical quirks is intercepting the byte
> at I/O port address 0x3c3.  Without this, the VGA BIOS can't bootstrap
> itself.  The vfio-vga-reset branch has a conditional replacement of
> this, which doesn't seem to work for everyone.  I believe the version I
> posted to qemu-devel yesterday is a better implementation of that quirk.
>
>> ---
>>
>> about qemu bridge
>> i tried to set up a bridge with the config but qemu always told me that
>> qemu-bridge-helper is not present.
>> all i found out that it propably got removed from the package because of
>> the lack of control over the tap
>> devices.
>> now my question how can i still bridge the vm into my network without
>> that helper?
> I don't know what qemu-bridge-helper is/was, but you're probably better
> off asking bridge questions in a separate thread instead of buried here.
> Thanks,
>
> Alex
>

-- 
Adiumentum GmbH
Gf. Martin Wolf
Banderbacherstraße 76
90513 Zirndorf

0911 / 9601470
mw...@adiumentum.com

--
To unsubscribe from this list:

Linux Plumbers ACPI/PM, PCI Microconference

2013-07-16 Thread Myron Stowe
Linux Plumbers has approved an ACPI/PM, PCI microconference. The
overview page is here:

http://wiki.linuxplumbersconf.org/2013:pci_subsystem

We would like to start receiving volunteers for presenting topics of
interest.  There is a lot of activity in these subsystems so please
respond by submitting presentation or discussion proposals that you
would be willing to cover for consideration.  You should also feel
free to submit ideas as proposals that others could cover.  The
instructions for submitting ideas should be at:

http://www.linuxplumbersconf.org/2013/submitting-topic/

This is the "Plumbers" conference so topics concerning userspace
interaction with the kernel and not just topics concerning pure kernel
subsystem internals would be greatly welcomed.  As a possible aid
towards promoting topics for submissions we have listed a few
candidates that seem to meet the criteria in the microconference
proposals wiki located at
http://wiki.linuxplumbersconf.org/2013:pci_subsystem

Last year, we had a good discussion on general issues in PCI space -
that also might be worth giving updates on.

If you have topics that you would like to add, wait until the
instructions get posted at the link above. If you are impatient, feel
free to email me directly (but probably best to drop the broad mailing
lists from the reply).

Thanks!

Len, Bjorn, Myron, Rafael



Yinghai - An update on host bridge hot-add capabilities, what is in
place and what still remains to be done, would seem like a good topic.

Shuah - You brought up the idea about "Converting drivers from Legacy
PM ops to dev_pm_ops"; would you like to present what you have
done/encountered so far?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] powerpc: using reset hcall when kvm,has-reset

2013-07-16 Thread Scott Wood

On 07/16/2013 06:26:40 PM, Scott Wood wrote:

On 07/16/2013 06:21:51 PM, Alexander Graf wrote:


On 16.07.2013, at 00:23, Scott Wood wrote:

> Still, I'm not sure what sort of error you're thinking of.  If the  
guest didn't support the hcall mechanism we would have returned from  
the function by that point.  In fact, seeing kvm,has-reset on a  
different hypervisor ought to mean that that hypervisor is emulating  
KVM in this particular respect.


Imagine we're running on KVM with this reset hcall support. Now if  
the guest has CONFIG_EPAPR_PARAVIRT enabled but CONFIG_KVM_GUEST  
disabled, we would patch the callback, but kvm_hypercall0() would be  
implemented as a nop.


Ugh -- that should be renamed epapr_hypercall and moved to  
epapr_paravirt.c.


Or rather, kvm_hypercall() should become epapr_hypercall() in  
epapr_paravirt.c -- there's nothing KVM-specific about it.


kvm_hypercall0() and friends could become epapr_hypercall0() in  
epapr_hcalls.h, with the KVM_HCALL_TOKEN() moved to the caller.  Or  
they could stay as they are but depend on CONFIG_EPAPR_PARAVIRT rather  
than CONFIG_KVM_GUEST -- there's no real dependency on the rest of the  
KVM guest code.


-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] powerpc: using reset hcall when kvm,has-reset

2013-07-16 Thread Scott Wood

On 07/16/2013 06:21:51 PM, Alexander Graf wrote:


On 16.07.2013, at 00:23, Scott Wood wrote:

> Still, I'm not sure what sort of error you're thinking of.  If the  
guest didn't support the hcall mechanism we would have returned from  
the function by that point.  In fact, seeing kvm,has-reset on a  
different hypervisor ought to mean that that hypervisor is emulating  
KVM in this particular respect.


Imagine we're running on KVM with this reset hcall support. Now if  
the guest has CONFIG_EPAPR_PARAVIRT enabled but CONFIG_KVM_GUEST  
disabled, we would patch the callback, but kvm_hypercall0() would be  
implemented as a nop.


Ugh -- that should be renamed epapr_hypercall and moved to  
epapr_paravirt.c.


-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] powerpc: using reset hcall when kvm,has-reset

2013-07-16 Thread Alexander Graf

On 16.07.2013, at 00:23, Scott Wood wrote:

> On 07/15/2013 03:55:08 PM, Alexander Graf wrote:
>> On 15.07.2013, at 22:52, Scott Wood wrote:
>> > On 07/15/2013 03:28:46 PM, Alexander Graf wrote:
>> >> On 15.07.2013, at 20:21, Scott Wood wrote:
>> >> > On 07/15/2013 10:16:41 AM, Bhushan Bharat-R65777 wrote:
>> >> >> > >>> +printk("error: system reset returned with error %ld\n", 
>> >> >> > >>> ret);
>> >> >> > >>
>> >> >> > >> So we should fall back to the normal reset handler here.
>> >> >> > >
>> >> >> > > Do you mean return normally from here, no BUG() etc?
>> >> >> >
>> >> >> > If we guard the patching against everything, we can treat a broken 
>> >> >> > hcall as BUG.
>> >> >> > However, if we don't we want to fall back to the normal guts based 
>> >> >> > reset.
>> >> >> Will let Scott comment on this?
>> >> >> But ppc_md.restart can point to only one handler and during paravirt 
>> >> >> patching we changed this to new handler. So we cannot jump back to 
>> >> >> guts type handler
>> >> >
>> >> > I don't think it's worth implementing a fall-back scheme -- if KVM 
>> >> > advertises that the reset hcall exists, then it had better exist.
>> >> If we also check for kvm_para_available() I agree. Otherwise QEMU might 
>> >> advertise the reset hcall, but the guest kernel may not implement KVM 
>> >> hypercalls. In that case the device tree check will succeed, but the 
>> >> actual hypercall will not.
>> >
>> > Wouldn't that be a bug in QEMU?  Or in KVM for exposing the hcall 
>> > capability without implementing them?
>> No, because it would be the guest that doesn't know how to handle kvm 
>> hypercalls.
> 
> Oh, I misread "guest kernel" as "host kernel". :-P
> 
> Still, I'm not sure what sort of error you're thinking of.  If the guest 
> didn't support the hcall mechanism we would have returned from the function 
> by that point.  In fact, seeing kvm,has-reset on a different hypervisor ought 
> to mean that that hypervisor is emulating KVM in this particular respect.

Imagine we're running on KVM with this reset hcall support. Now if the guest 
has CONFIG_EPAPR_PARAVIRT enabled but CONFIG_KVM_GUEST disabled, we would patch 
the callback, but kvm_hypercall0() would be implemented as a nop.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] booke: define reset and shutdown hcalls

2013-07-16 Thread Scott Wood

On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:

On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
> >There is no much sense to share hypercalls between architectures.
> >There
> >is zero probability x86 will implement those for instance
>
> This is similar to the question of whether to keep device API
> enumerations per-architecture...  It costs very little to keep it in
> a common place, and it's hard to go back in the other direction if
> we later realize there are things that should be shared.
>
This is different from device API since with device API all arches  
have

to create/destroy devices, so it make sense to put device lifecycle
management into the common code, and device API has single entry point
to the code - device fd ioctl - where it makes sense to handle common
tasks, if any, and despatch others to specific device implementation.

This is totally unlike hypercalls which are, by definition, very
architecture specific (the way they are triggered, the way parameter
are passed from guest to host, what hypercalls arch needs...).


The ABI is architecture specific.  The API doesn't need to be, any more  
than it does with syscalls (I consider the architecture-specific  
definition of syscall numbers and similar constants in Linux to be  
unfortunate, especially for tools such as strace or QEMU's linux-user  
emulation).



> Keeping it in a common place also makes it more visible to people
> looking to add new hcalls, which could cut down on reinventing the
> wheel.
I do not want other arches to start using hypercalls in the way  
powerpc

started to use them: separate device io space, so it is better to hide
this as far away from common code as possible :) But on a more serious
note hypercalls should be a last resort and added only when no other
possibility exists, so people should not look what hcalls others
implemented, so they can add them to their favorite arch, but they
should have a problem at hand that they cannot solve without hcall,  
but
at this point they will have pretty good idea what this hcall should  
do.


Why are hcalls such a bad thing?

Should new Linux syscalls be avoided too, in favor of new emulated  
devices exposed via vfio? :-)



> >(not sure why PPC will want them either instead of emulating
> >devices that do
> >shutdown/reset).
>
> Besides what Alex said, for shutdown we don't have any existing
> device to emulate (our real hardware just doesn't have that
> functionality).  For reset we currently do emulate, but it's awkward
> to describe in the device tree what we actually emulate since the
> reset functionality is part of a kitchen-sink "device" of which we
> emulate virtually nothing other than the reset.  Currently we
> advertise the entire thing and just ignore the rest, but that causes
> problems with the guest seeing the node and trying to use that
> functionality.
>
What about writing virtio device for shutdown


That sounds like quite a bit more work than hcalls.  It also ties up a  
virtual PCI slot -- some machines don't have very many (mpc8544ds has  
2, though we could and should expand that in the paravirt e500 machine).



and add missing emulation
to kitchen-sink device (yeah I know easily said that done),


Not going to happen... there's lots of low-level and chip-specific  
stuff in there.  We'd have to make several versions, even for the  
paravirt platform so it would correspond to some chip that goes with  
the cpu being used.  Even then we couldn't do everything, at least with  
KVM -- one of the things in there is the ability to freeze the  
timebase, but reading the timebase doesn't trap, and we aren't going to  
freeze the host timebase.


-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio interface for platform devices

2013-07-16 Thread Scott Wood

On 07/16/2013 05:41:04 PM, Yoder Stuart-B08248 wrote:



> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, July 16, 2013 5:01 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan  
Bharat-R65777; Sethi Varun-B16395;
> virtualizat...@lists.linux-foundation.org; Antonios Motakis;  
kvm@vger.kernel.org list; kvm-

> p...@vger.kernel.org; kvm...@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
>
> What if the interrupt map is for devices without explicit nodes,  
such

> as with a PCI controller (ignore the fact that we would normally use
> vfio_pci for the indivdual PCI devices instead)?
>
> You could say the same thing about ranges -- why expose ranges  
instead

> of the individual child node regs after translation?

Hmm...yes, I guess ranges and interrupt-map fall into the same
basic type of resource category.  I'm not sure it's realistic
to pass entire bus controllers through to user space vs
just individual devices on a bus, but I guess it's theoretically
possible.


Where "theoretically possible" means "we've done it before in other  
contexts". :-)



So the question is whether we future proof by adding flags
for both ranges and interrupt-map, or wait until there is
an actual need for it.


We don't need to actually add a flag for it, but we should have a  
flag/type for the resources we do support, so that code written to the  
current API would recognize that it doesn't recognize an interrupt-map  
entry if it's added later.


> > > > __u8path[]; /* output: Full path to  
associated

> > > > device tree node */
> > >
> > > How does the caller know what size buffer to supply for this?
>
> Ping

This is in the v2 RFC... the caller invokes the ioctl which returns
the complete/full size, then re-allocs the buffer and calls the
ioctl again.


OK.

Or, as Alex suggested, just use a sufficiently large buffer to start  
with.


It's fine for a user of the API to simplify things by using a large  
fixed buffer, but the API shouldn't force that approach.


-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: RFC: vfio interface for platform devices

2013-07-16 Thread Yoder Stuart-B08248


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, July 16, 2013 5:01 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan 
> Bharat-R65777; Sethi Varun-B16395;
> virtualizat...@lists.linux-foundation.org; Antonios Motakis; 
> kvm@vger.kernel.org list; kvm-
> p...@vger.kernel.org; kvm...@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
> 
> On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:
> > > > 3.  VFIO_DEVICE_GET_REGION_INFO
> > > >
> > > >No changes needed, except perhaps adding a new flag.  Freescale
> > > > has some
> > > >devices with regions that must be mapped cacheable.
> > >
> > > While I don't object to making the information available to the user
> > > just in case, the main thing we need here is to influence what the
> > > kernel does when the user tries to map it.  At least on PPC it's
> > not up
> > > to userspace to select whether a mmap is cacheable.
> >
> > If user space really can't do anything with the 'cacheable'
> > flag, do you think there is good reason to keep it?   Will it
> > help any decision that user space makes?  Maybe we should just
> > drop it.
> 
> As long as we can be sure all architectures will map things correctly
> without any flags needing to be specified, that's fine.
> 
> > > >struct vfio_path_info {
> > > > __u32   argsz;
> > > > __u32   flags;
> > > >#define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region
> > is a
> > > > "ranges" property */
> > >
> > > What about distinguishing a normal interrupt from one found in an
> > > interrupt-map?
> >
> > I'm not sure we need that.  The kernel needs to use the interrupt
> > map to get interrupts hooked up right, but all user space needs to
> > know is that there are N interrupts and possibly device tree
> > paths to help user space interpret which interrupt is which.
> 
> What if the interrupt map is for devices without explicit nodes, such
> as with a PCI controller (ignore the fact that we would normally use
> vfio_pci for the indivdual PCI devices instead)?
> 
> You could say the same thing about ranges -- why expose ranges instead
> of the individual child node regs after translation?

Hmm...yes, I guess ranges and interrupt-map fall into the same
basic type of resource category.  I'm not sure it's realistic
to pass entire bus controllers through to user space vs
just individual devices on a bus, but I guess it's theoretically
possible.

So the question is whether we future proof by adding flags 
for both ranges and interrupt-map, or wait until there is
an actual need for it.

> > > In the case of both ranges and interrupt-maps, we'll also want to
> > > decide what the policy is for when to expose them directly, versus
> > just
> > > using them to translate regs and interrupts of child nodes
> >
> > Yes, not sure the best approach there...but guess we can cross
> > that bridge when we implement this.  It doesn't affect this
> > interface.
> 
> It does affect the interface, because if you allow either of them to be
> mapped directly (rather than implicitly used when mapping a child
> node), you need a way to indicate which type of resource it is you're
> describing (as you already do for reg/ranges).
>
> It also affects how vfio device binding is done, even if only to the
> point of specifying default behavior in the absence of knobs which
> change whether interrupt maps and/or ranges are mapped.

My opinion is that we want to expose the regs and interrupts for
individual nodes by default, not ranges (or interrupt maps).   When someone
needs ranges/interrupt-map in the future they'll need to figure out some
means for the vfio layer to do the right thing.  It's complicated
and I would be surprised to see someone need it.
 
> > > > __u8path[]; /* output: Full path to associated
> > > > device tree node */
> > >
> > > How does the caller know what size buffer to supply for this?
> 
> Ping

This is in the v2 RFC... the caller invokes the ioctl which returns
the complete/full size, then re-allocs the buffer and calls the
ioctl again.  Or, as Alex suggested, just use a sufficiently large
buffer to start with.

Stuart

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio interface for platform devices

2013-07-16 Thread Scott Wood

On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:

> > 3.  VFIO_DEVICE_GET_REGION_INFO
> >
> >No changes needed, except perhaps adding a new flag.  Freescale
> > has some
> >devices with regions that must be mapped cacheable.
>
> While I don't object to making the information available to the user
> just in case, the main thing we need here is to influence what the
> kernel does when the user tries to map it.  At least on PPC it's  
not up

> to userspace to select whether a mmap is cacheable.

If user space really can't do anything with the 'cacheable'
flag, do you think there is good reason to keep it?   Will it
help any decision that user space makes?  Maybe we should just
drop it.


As long as we can be sure all architectures will map things correctly  
without any flags needing to be specified, that's fine.



> >struct vfio_path_info {
> > __u32   argsz;
> > __u32   flags;
> >#define VFIO_DEVTREE_INFO_RANGES  (1 << 3) /* the region  
is a

> > "ranges" property */
>
> What about distinguishing a normal interrupt from one found in an
> interrupt-map?

I'm not sure we need that.  The kernel needs to use the interrupt
map to get interrupts hooked up right, but all user space needs to
know is that there are N interrupts and possibly device tree
paths to help user space interpret which interrupt is which.


What if the interrupt map is for devices without explicit nodes, such  
as with a PCI controller (ignore the fact that we would normally use  
vfio_pci for the indivdual PCI devices instead)?


You could say the same thing about ranges -- why expose ranges instead  
of the individual child node regs after translation?



> In the case of both ranges and interrupt-maps, we'll also want to
> decide what the policy is for when to expose them directly, versus  
just

> using them to translate regs and interrupts of child nodes

Yes, not sure the best approach there...but guess we can cross
that bridge when we implement this.  It doesn't affect this
interface.


It does affect the interface, because if you allow either of them to be  
mapped directly (rather than implicitly used when mapping a child  
node), you need a way to indicate which type of resource it is you're  
describing (as you already do for reg/ranges).


It also affects how vfio device binding is done, even if only to the  
point of specifying default behavior in the absence of knobs which  
change whether interrupt maps and/or ranges are mapped.



> > __u8path[]; /* output: Full path to associated
> > device tree node */
>
> How does the caller know what size buffer to supply for this?


Ping

-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: RFC: vfio interface for platform devices (v2)

2013-07-16 Thread Yoder Stuart-B08248
(sorry for the delayed response, but I've been on PTO)

> > 1.  VFIO_GROUP_GET_DEVICE_FD
> >
> >   User space knows by out-of-band means which device it is accessing
> >   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
> >   to get the device information:
> >
> >   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
> >  "/sys/bus/platform/devices/ffe21.usb"));
> 
> FWIW, I'm in favor of whichever way works out cleaner in the code for
> pre-pending "/sys/bus" or not.  It sort of seems like it's unnecessary.
> It's also a little inconsistent that the returned path doesn't
> pre-pend /sys in the examples below.

Ok.  For the returned path in the examples I have the actual device tree
path which is slightly different from the path in /sys.  The device
tree path is what user space would need to interpret /proc/device-tree.

> > 2.  VFIO_DEVICE_GET_INFO
> >
> >The number of regions corresponds to the regions defined
> >in "reg" and "ranges" in the device tree.
> >
> >Two new flags are added to struct vfio_device_info:
> >
> >#define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
> >#define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info 
> > available */
> >
> >It is possible that there could be platform bus devices
> >that are not in the device tree, so we use 2 flags to
> >allow for that.
> >
> >If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
> >that there are regions and IRQs but no device tree info
> >available.
> >
> >If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
> >there is device tree info available.
> 
> But it would be invalid to only have DEVTREE w/o PLATFORM for now,
> right?

Right.  The way I stated it is incorrect. DEVTREE would never
be set by itself.

> > 3. VFIO_DEVICE_GET_REGION_INFO
> >
> >For platform devices with multiple regions, information
> >is needed to correlate the regions with the device
> >tree structure that drivers use to determine the meaning
> >of device resources.
> >
> >The VFIO_DEVICE_GET_REGION_INFO is extended to provide
> >device tree information.
> >
> >The following information is needed:
> >   -the device tree path to the node corresponding to the
> >region
> >   -whether it corresponds to a "reg" or "ranges" property
> >   -there could be multiple sub-regions per "reg" or "ranges" and
> >the sub-index within the reg/ranges is needed
> >
> >There are 5 new flags added to vfio_region_info :
> >
> >struct vfio_region_info {
> > __u32   argsz;
> > __u32   flags;
> >#define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
> >#define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
> >#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
> >#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
> >#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
> > __u32   index;  /* Region index */
> > __u32   resv;   /* Reserved for alignment */
> > __u64   size;   /* Region size (bytes) */
> > __u64   offset; /* Region offset from start of device fd */
> >};
> >
> >VFIO_REGION_INFO_FLAG_CACHEABLE
> >-if set indicates that the region must be mapped as cacheable
> >
> >VFIO_DEVTREE_REGION_INFO_FLAG_REG
> >-if set indicates that the region corresponds to a "reg" property
> > in the device tree representation of the device
> >
> >VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
> >-if set indicates that the region corresponds to a "ranges" property
> > in the device tree representation of the device
> >
> >VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
> >-if set indicates that there is a dword aligned struct
> > struct vfio_devtree_region_info_index appended to the
> > end of vfio_region_info:
> >
> > struct vfio_devtree_region_info_index
> > {
> >   u32 index;
> > }
> >
> > A reg or ranges property may have multiple regsion.  The index
> > specifies the index within the "reg" or "ranges"
> > that this region corresponds to.
> >
> >VFIO_DEVTREE_REGION_INFO_FLAG_PATH
> >-if set indicates that there is a dword aligned struct
> > struct vfio_devtree_info_path appended to the
> > end of vfio_region_info:
> >
> > struct vfio_devtree_info_path
> > {
> > u32 len;
> > u8 path[];
> > }
> >
> > The path is the full path to the corresponding device
> > tree node.  The len field specifies the length of the
> > path string.
> >
> >If multiple flags are set that indicate that there is
> >an appended struct, the order of the flags indicates
> >the order of the structs.
> >
> >argsz is set by the kernel specifying the total size of
> >struct vfio_region_info and all appended structs.
> >
> >Suggested usage:
>

RE: RFC: vfio interface for platform devices

2013-07-16 Thread Yoder Stuart-B08248

> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, July 03, 2013 5:32 PM
> To: Yoder Stuart-B08248
> Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; Bhushan 
> Bharat-R65777; Sethi Varun-B16395;
> virtualizat...@lists.linux-foundation.org; Antonios Motakis; 
> kvm@vger.kernel.org list; kvm-
> p...@vger.kernel.org; kvm...@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
> 
> On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
> > The write-up below is the first draft of a proposal for how the
> > kernel can expose
> > platform devices to user space using vfio.
> >
> > In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> > allows user space to correlate regions and interrupts to the
> > corresponding
> > device tree node structure that is defined for most platform devices.
> >
> > Regards,
> > Stuart Yoder
> >
> > --
> > VFIO for Platform Devices
> >
> > The existing infrastructure for vfio-pci is pretty close to what we
> > need:
> >-mechanism to create a container
> >-add groups/devices to a container
> >-set the IOMMU model
> >-map DMA regions
> >-get an fd for a specific device, which allows user space to
> > determine
> > info about device regions (e.g. registers) and interrupt info
> >-support for mmapping device regions
> >-mechanism to set how interrupts are signaled
> >
> > Platform devices can get complicated-- potentially with a tree
> > hierarchy
> > of nodes, and links/phandles pointing to other platform
> > devices.   The kernel doesn't expose relationships between
> > devices.  The kernel just exposes mappable register regions and
> > interrupts.
> > It's up to user space to work out relationships between devices
> > if it needs to-- this can be determined in the device tree exposed in
> > /proc/device-tree.
> >
> > I think the changes needed for vfio are around some of the device tree
> > related info that needs to be available with the device fd.
> >
> > 1.  VFIO_GROUP_GET_DEVICE_FD
> >
> >   User space has to know which device it is accessing and will call
> >   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
> >   get the device information:
> >
> >   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
> > "/soc@ffe00/usb@21");
> >
> >   (whether the path is a device tree path or a sysfs path is up for
> >   discussion, e.g. "/sys/bus/platform/devices/ffe21.usb")
> 
> Doesn't VFIO need to operate on an actual Linux device, rather than
> just an OF node?
> 
> Are we going to have a fixed assumption that you always want all the
> children of the node corresponding to the assigned device, or will it
> be possible to exclude some?

My assumption is that you always get all the children of the
node corresponding to the assigned device.

> > 2.  VFIO_DEVICE_GET_INFO
> >
> >Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
> >than adding a new flag identifying a devices as a 'platform'
> >device.
> >
> >This ioctl simply returns the number of regions and number of irqs.
> >
> >The number of regions corresponds to the number of regions
> >that can be mapped for the device-- corresponds to the regions
> > defined
> >in "reg" and "ranges" in the device tree.
> >
> > 3.  VFIO_DEVICE_GET_REGION_INFO
> >
> >No changes needed, except perhaps adding a new flag.  Freescale
> > has some
> >devices with regions that must be mapped cacheable.
> 
> While I don't object to making the information available to the user
> just in case, the main thing we need here is to influence what the
> kernel does when the user tries to map it.  At least on PPC it's not up
> to userspace to select whether a mmap is cacheable.

If user space really can't do anything with the 'cacheable'
flag, do you think there is good reason to keep it?   Will it
help any decision that user space makes?  Maybe we should just
drop it.
 
> > 4. VFIO_DEVICE_GET_DEVTREE_INFO
> >
> >The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
> >expose device regions and interrupts, but it's not enough to know
> >that there are X regions and Y interrupts.  User space needs to
> >know what the resources are for-- to correlate those
> > regions/interrupts
> >to the device tree structure that drivers use.  The device tree
> >structure could consist of multiple nodes and it is necessary to
> >identify the node corresponding to the region/interrupt exposed
> >by VFIO.
> >
> >The following information is needed:
> >   -the device tree path to the node corresponding to the
> >region or interrupt
> >   -for a region, whether it corresponds to a "reg" or "ranges"
> >property
> >   -there could be multiple sub-regions per "reg" or "ranges" and
> >the sub-index within the reg/ranges is needed
> >
> >The VFIO_DEVICE_GET

Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-16 Thread Raghavendra K T

On 07/16/2013 09:18 PM, Peter Zijlstra wrote:

On Tue, Jul 16, 2013 at 09:02:15AM +0300, Gleb Natapov wrote:

BTW can NMI handler take spinlocks?


No -- that is, yes you can using trylock, but you still shouldn't.


Thanks Peter for the clarification.

I had started checking few of nmi handlers code to confirm.
Did saw a raw spinlock in drivers/acpi/apei/ghes.c, but then stopped.




If it can what happens if NMI is
delivered in a section protected by local_irq_save()/local_irq_restore()?


You deadlock.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-16 Thread Raghavendra K T

On 07/16/2013 11:32 AM, Gleb Natapov wrote:

On Tue, Jul 16, 2013 at 09:07:53AM +0530, Raghavendra K T wrote:

On 07/15/2013 04:06 PM, Gleb Natapov wrote:

On Mon, Jul 15, 2013 at 03:20:06PM +0530, Raghavendra K T wrote:

On 07/14/2013 06:42 PM, Gleb Natapov wrote:

On Mon, Jun 24, 2013 at 06:13:42PM +0530, Raghavendra K T wrote:

kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri 


trimming
[...]

+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+   struct kvm_lock_waiting *w;
+   int cpu;
+   u64 start;
+   unsigned long flags;
+
+   w = &__get_cpu_var(lock_waiting);
+   cpu = smp_processor_id();
+   start = spin_time_start();
+
+   /*
+* Make sure an interrupt handler can't upset things in a
+* partially setup state.
+*/
+   local_irq_save(flags);
+
+   /*
+* The ordering protocol on this is that the "lock" pointer
+* may only be set non-NULL if the "want" ticket is correct.
+* If we're updating "want", we must first clear "lock".
+*/
+   w->lock = NULL;
+   smp_wmb();
+   w->want = want;
+   smp_wmb();
+   w->lock = lock;
+
+   add_stats(TAKEN_SLOW, 1);
+
+   /*
+* This uses set_bit, which is atomic but we should not rely on its
+* reordering gurantees. So barrier is needed after this call.
+*/
+   cpumask_set_cpu(cpu, &waiting_cpus);
+
+   barrier();
+
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
+   __ticket_enter_slowpath(lock);
+
+   /*
+* check again make sure it didn't become free while
+* we weren't looking.
+*/
+   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   add_stats(TAKEN_SLOW_PICKUP, 1);
+   goto out;
+   }
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+

So what happens if an interrupt comes here and an interrupt handler
takes another spinlock that goes into the slow path? As far as I see
lock_waiting will become overwritten and cpu will be cleared from
waiting_cpus bitmap by nested kvm_lock_spinning(), so when halt is
called here after returning from the interrupt handler nobody is going
to wake this lock holder. Next random interrupt will "fix" it, but it
may be several milliseconds away, or never. We should probably check
if interrupt were enabled and call native_safe_halt() here.



Okay you mean something like below should be done.
if irq_enabled()
   native_safe_halt()
else
   halt()

It is been a complex stuff for analysis for me.

So in our discussion stack would looking like this.

spinlock()
   kvm_lock_spinning()
   <-- interrupt here
   halt()


 From the halt if we trace


It is to early to trace the halt since it was not executed yet. Guest
stack trace will look something like this:

spinlock(a)
   kvm_lock_spinning(a)
lock_waiting = a
set bit in waiting_cpus
 <-- interrupt here
 spinlock(b)
   kvm_lock_spinning(b)
 lock_waiting = b
 set bit in waiting_cpus
 halt()
 unset bit in waiting_cpus
 lock_waiting = NULL
  --> ret from interrupt
halt()

Now at the time of the last halt above lock_waiting == NULL and
waiting_cpus is empty and not interrupt it pending, so who will unhalt
the waiter?



Yes. if an interrupt occurs between
local_irq_restore() and halt(), this is possible. and since this is
rarest of rare (possiility of irq entering slowpath and then no
random irq to do spurious wakeup), we had never hit this problem in
the past.

I do not think it is very rare to get interrupt between
local_irq_restore() and halt() under load since any interrupt that
occurs between local_irq_save() and local_irq_restore() will be delivered
immediately after local_irq_restore(). Of course the chance of no other
random interrupt waking lock waiter is very low, but waiter can sleep
for much longer then needed and this will be noticeable in performance.


Yes, I meant the entire thing. I did infact turned WARN on w->lock==null 
before halt() [ though we can potentially have irq right after that ], 
but did not hit so far.



BTW can NMI handler take spinlocks? If it can what happens if NMI is
delivered in a section protected by local_irq_save()/local_irq_restore()?



Had another idea if NMI, halts are causing problem until I saw PeterZ's 
reply similar to V2 of pvspinlock posted  here:


 https://lkml.org/lkml/2011/10/23/211

Instead of halt we started with a sleep hypercall in those
 versions. Changed to halt() once Avi suggested to reuse existing sleep.

If we use older hypercall with few changes like below:

kvm_pv_wait_for_kick_op(flags, vcpu, 

Re: [PATCH RFC V10 16/18] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

2013-07-16 Thread Raghavendra K T

On 07/15/2013 09:16 PM, Gleb Natapov wrote:

On Mon, Jul 15, 2013 at 09:06:13PM +0530, Raghavendra K T wrote:

On 07/14/2013 06:54 PM, Gleb Natapov wrote:

On Mon, Jun 24, 2013 at 06:13:53PM +0530, Raghavendra K T wrote:

Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

From: Raghavendra K T 

Note that we are using APIC_DM_REMRD which has reserved usage.
In future if APIC_DM_REMRD usage is standardized, then we should
find some other way or go back to old method.

Suggested-by: Gleb Natapov 
Signed-off-by: Raghavendra K T 
---
  arch/x86/kvm/lapic.c |5 -
  arch/x86/kvm/x86.c   |   25 ++---
  2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e1adbb4..3f5f82e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -706,7 +706,10 @@ out:
break;

case APIC_DM_REMRD:
-   apic_debug("Ignoring delivery mode 3\n");
+   result = 1;
+   vcpu->arch.pv.pv_unhalted = 1;
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+   kvm_vcpu_kick(vcpu);
break;

case APIC_DM_SMI:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 92a9932..b963c86 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5456,27 +5456,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
   */
  static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
  {
-   struct kvm_vcpu *vcpu = NULL;
-   int i;
+   struct kvm_lapic_irq lapic_irq;

-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   if (!kvm_apic_present(vcpu))
-   continue;
+   lapic_irq.shorthand = 0;
+   lapic_irq.dest_mode = 0;
+   lapic_irq.dest_id = apicid;

-   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
-   break;
-   }
-   if (vcpu) {
-   /*
-* Setting unhalt flag here can result in spurious runnable
-* state when unhalt reset does not happen in vcpu_block.
-* But that is harmless since that should soon result in halt.
-*/
-   vcpu->arch.pv.pv_unhalted = true;
-   /* We need everybody see unhalt before vcpu unblocks */
-   smp_wmb();
-   kvm_vcpu_kick(vcpu);
-   }
+   lapic_irq.delivery_mode = APIC_DM_REMRD;

Need to make sure that delivery_mode cannot be set to APIC_DM_REMRD

>from MSI/IOAPIC/IPI path.

I Gleb,
I need your help here since I do not know much about apic.

so are you saying explicitly checking that, kvm_set_msi_irq,
apic_send_ipi, native_setup_ioapic_entry, does not have
APIC_DM_REMRD as delivery_mode set?


Yes, but on a second thought what bad can happen if we will not check it?
If guest configures irq to inject APIC_DM_REMRD interrupt this may
needlessly wakeup sleeping vcpu and it will try to accrue spinlock
again, so no correctness problem only performance. If this is the case
lets leave it as it for now.


Okay.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Gleb Natapov
On Wed, Jul 17, 2013 at 01:13:56AM +0800, Arthur Chunqi Li wrote:
> On Wed, Jul 17, 2013 at 12:45 AM, Gleb Natapov  wrote:
> > On Tue, Jul 16, 2013 at 11:29:20PM +0800, Arthur Chunqi Li wrote:
> >> On Tue, Jul 16, 2013 at 11:20 PM, Gleb Natapov  wrote:
> >> > On Tue, Jul 16, 2013 at 12:28:05PM +0200, Paolo Bonzini wrote:
> >> >> > +void vmx_exit(void)
> >> >> > +{
> >> >> > +   test_vmxoff();
> >> >> > +   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
> >> >> > +   exit(fails ? -1 : 0);
> >> >> > +}
> >> >>
> >> >> Can you try to jump back to main, and do test_vmxoff there?  This will
> >> >> avoid having to write our tests in callback style, which is a pain.
> >> >> Basically something similar to setjmp/longjmp.  In main:
> >> >>
> >> >>   if (setjmp(jmpbuf) == 0) {
> >> >>   vmx_run();
> >> >>   /* Should not reach here */
> >> >>   report("test vmlaunch", 0);
> >> >>   }
> >> >>   test_vmxoff();
> >> >>
> >> >> exit:
> >> >>   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
> >> >>   return fails ? 1 : 0;
> >> >>
> >> >> In vmx_handler:
> >> >>
> >> >>   case VMX_HLT:
> >> >>   printf("\nVM exit.\n");
> >> >>   longjmp(jmpbuf, 1);
> >> >>
> >> > Why not just make vmexit occur after vmlaunch/vmresume like KVM does. It
> >> > will make code much more straightforward and easer to follow.
> >> The concept "easier to follow" may have different meanings in
> >> different view. This achievement puts all the test cases in main
> >> function instead of scattering everywhere, which is another view to
> >> "easy to follow". As this is just a test case, I prefer this one.
> >>
> > I do not see why what I propose will prevent you to put all tests into main.
> >
> > vmx_run() will looks like that:
> >
> >vmlaunch
> >while(1) {
> >vmresume
> >  < vmexit jumps here
> >switch(exit reason) {
> >   case reason1:
> >   break;
> >   case reason2:
> >   break;
> >   case HLT
> >   return;
> >}
> >}
> Yes, this recalls me some KVM codes I have read before. This mixes
> vmlaunch/resume and vmx_handler into one piece of code. It is a good
> way to explicitly show the execution sequence though, it increases LOC
> in one function.
LOC in one function is not an issue to be considered at all. Besides you
can put the switch into separate vmx_handler() function, or have an
array of vmexits.

> >
> >> Besides, this way we can start another VM following the previous one
> >> simply in main function. This is flexible if we want to test re-enter
> >> to VMX mode or so.
> >>
> > That's what I am missing. How do one writes more tests now?
> >
> > I was thinking about interface like that:
> >
> > guest_func_test1()
> > {
> > }
> >
> > tes1t_exit_handlers[] = {test1_handle_hlt, test1_handle_exception, }
> >
> > main()
> > {
> >
> >init_vmcs(); /* generic stuff */
> >init_vmcs_test1(); /* test1 related stuff */
> >r = run_in_guest(guest_func_test1, test1_exit_handlers);
> >report("test1", r);
> > }
> >
> I have thought about this question and I'm not quite sure how to solve
> it now. I have two ways. The first is that we just leave vmx.c as the
> VMX instructions and execution routine test suite, and develop other
> test cases in other files. Since all other tests of nested vmx is
> independent to the basic routine and it is hard for us to put all test
> cases for nested VMX in one file, so we just let this file do simple
> things and reuse some of its functions in other test suites of nested
> vmx. Your proposal of adding new test cases can be implemented in
> other test suites.
> 
> The other way is not splitting nested vmx tests cases in contrast.
> This method may cause a HUGE vmx.c file, and tests for different parts
> are not distinctive.
> 
> Actually, I prefer the former solution.
I do not think we need separate infrastructure just to test basic
instructions. Actually testing those are less interesting part (well
just to sorely test vmlaunch/vmresume we will likely have to write
hundred of tests to verify that all things that should cause failure do
that, but we likely settle for only a couple). I am not worrying about
vmx.c become a huge file as long as writing test is easy and more or
less self contained task.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Paolo Bonzini
Il 16/07/2013 19:13, Arthur Chunqi Li ha scritto:
> On Wed, Jul 17, 2013 at 12:45 AM, Gleb Natapov  wrote:
>> On Tue, Jul 16, 2013 at 11:29:20PM +0800, Arthur Chunqi Li wrote:
>>> On Tue, Jul 16, 2013 at 11:20 PM, Gleb Natapov  wrote:
 On Tue, Jul 16, 2013 at 12:28:05PM +0200, Paolo Bonzini wrote:
>> +void vmx_exit(void)
>> +{
>> +   test_vmxoff();
>> +   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
>> +   exit(fails ? -1 : 0);
>> +}
>
> Can you try to jump back to main, and do test_vmxoff there?  This will
> avoid having to write our tests in callback style, which is a pain.
> Basically something similar to setjmp/longjmp.  In main:
>
>   if (setjmp(jmpbuf) == 0) {
>   vmx_run();
>   /* Should not reach here */
>   report("test vmlaunch", 0);
>   }
>   test_vmxoff();
>
> exit:
>   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
>   return fails ? 1 : 0;
>
> In vmx_handler:
>
>   case VMX_HLT:
>   printf("\nVM exit.\n");
>   longjmp(jmpbuf, 1);
>
 Why not just make vmexit occur after vmlaunch/vmresume like KVM does. It
 will make code much more straightforward and easer to follow.
>>> The concept "easier to follow" may have different meanings in
>>> different view. This achievement puts all the test cases in main
>>> function instead of scattering everywhere, which is another view to
>>> "easy to follow". As this is just a test case, I prefer this one.
>>>
>> I do not see why what I propose will prevent you to put all tests into main.
>>
>> vmx_run() will looks like that:
>>
>>vmlaunch
>>while(1) {
>>vmresume
>>  < vmexit jumps here
>>switch(exit reason) {
>>   case reason1:
>>   break;
>>   case reason2:
>>   break;
>>   case HLT
>>   return;
>>}
>>}
> Yes, this recalls me some KVM codes I have read before. This mixes
> vmlaunch/resume and vmx_handler into one piece of code. It is a good
> way to explicitly show the execution sequence though, it increases LOC
> in one function.
>>
>>> Besides, this way we can start another VM following the previous one
>>> simply in main function. This is flexible if we want to test re-enter
>>> to VMX mode or so.
>>>
>> That's what I am missing. How do one writes more tests now?
>>
>> I was thinking about interface like that:
>>
>> guest_func_test1()
>> {
>> }
>>
>> tes1t_exit_handlers[] = {test1_handle_hlt, test1_handle_exception, }
>>
>> main()
>> {
>>
>>init_vmcs(); /* generic stuff */
>>init_vmcs_test1(); /* test1 related stuff */
>>r = run_in_guest(guest_func_test1, test1_exit_handlers);
>>report("test1", r);
>> }
>>
> I have thought about this question and I'm not quite sure how to solve
> it now.

Why can't you just use a different vmx_handler (e.g. with an indirect
call in entry_vmx) for each test (as in Gleb's test1_exit_handlers)?
run_in_guest would prepare the function pointers and do

init_vmcs(&vmcs_root);

if (setjmp(env) == 0){
vmx_run();
/* Should not reach here */
report("test vmlaunch", 0);
}

as in your current testcase.

vmx.c would be a "library", and testcases could be either grouped in the
same file or spread across many of them, as you see fit.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
On Wed, Jul 17, 2013 at 12:45 AM, Gleb Natapov  wrote:
> On Tue, Jul 16, 2013 at 11:29:20PM +0800, Arthur Chunqi Li wrote:
>> On Tue, Jul 16, 2013 at 11:20 PM, Gleb Natapov  wrote:
>> > On Tue, Jul 16, 2013 at 12:28:05PM +0200, Paolo Bonzini wrote:
>> >> > +void vmx_exit(void)
>> >> > +{
>> >> > +   test_vmxoff();
>> >> > +   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
>> >> > +   exit(fails ? -1 : 0);
>> >> > +}
>> >>
>> >> Can you try to jump back to main, and do test_vmxoff there?  This will
>> >> avoid having to write our tests in callback style, which is a pain.
>> >> Basically something similar to setjmp/longjmp.  In main:
>> >>
>> >>   if (setjmp(jmpbuf) == 0) {
>> >>   vmx_run();
>> >>   /* Should not reach here */
>> >>   report("test vmlaunch", 0);
>> >>   }
>> >>   test_vmxoff();
>> >>
>> >> exit:
>> >>   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
>> >>   return fails ? 1 : 0;
>> >>
>> >> In vmx_handler:
>> >>
>> >>   case VMX_HLT:
>> >>   printf("\nVM exit.\n");
>> >>   longjmp(jmpbuf, 1);
>> >>
>> > Why not just make vmexit occur after vmlaunch/vmresume like KVM does. It
>> > will make code much more straightforward and easer to follow.
>> The concept "easier to follow" may have different meanings in
>> different view. This achievement puts all the test cases in main
>> function instead of scattering everywhere, which is another view to
>> "easy to follow". As this is just a test case, I prefer this one.
>>
> I do not see why what I propose will prevent you to put all tests into main.
>
> vmx_run() will looks like that:
>
>vmlaunch
>while(1) {
>vmresume
>  < vmexit jumps here
>switch(exit reason) {
>   case reason1:
>   break;
>   case reason2:
>   break;
>   case HLT
>   return;
>}
>}
Yes, this recalls me some KVM codes I have read before. This mixes
vmlaunch/resume and vmx_handler into one piece of code. It is a good
way to explicitly show the execution sequence though, it increases LOC
in one function.
>
>> Besides, this way we can start another VM following the previous one
>> simply in main function. This is flexible if we want to test re-enter
>> to VMX mode or so.
>>
> That's what I am missing. How do one writes more tests now?
>
> I was thinking about interface like that:
>
> guest_func_test1()
> {
> }
>
> tes1t_exit_handlers[] = {test1_handle_hlt, test1_handle_exception, }
>
> main()
> {
>
>init_vmcs(); /* generic stuff */
>init_vmcs_test1(); /* test1 related stuff */
>r = run_in_guest(guest_func_test1, test1_exit_handlers);
>report("test1", r);
> }
>
I have thought about this question and I'm not quite sure how to solve
it now. I have two ways. The first is that we just leave vmx.c as the
VMX instructions and execution routine test suite, and develop other
test cases in other files. Since all other tests of nested vmx is
independent to the basic routine and it is hard for us to put all test
cases for nested VMX in one file, so we just let this file do simple
things and reuse some of its functions in other test suites of nested
vmx. Your proposal of adding new test cases can be implemented in
other test suites.

The other way is not splitting nested vmx tests cases in contrast.
This method may cause a HUGE vmx.c file, and tests for different parts
are not distinctive.

Actually, I prefer the former solution.
> --
> Gleb.


Besides, there is also another "pseudo" bug in PATCH 2/2, here:

+int test_vmx_capability(void)
+{
+   struct cpuid r;
+   u64 ret1, ret2;
+   r = cpuid(1);
+   ret1 = ((r.c) >> 5) & 1;
+   ret2 = ((rdmsr(MSR_IA32_FEATURE_CONTROL) & 0x5) == 0x5);
+   report("test vmx capability", ret1 & ret2);
+   return !(ret1 & ret2);
+}

The IA32_FEATURE_CONTROL MSR should be set by seabios. SInce there's
no patch for seabios and in fact software can also set this MSR if its
lock bit is unset. So I prefer to change it like this:

+int test_vmx_capability(void)
+{
+   struct cpuid r;
+   u64 ret1, ret2;
+   u64 ia32_feature_control;
+   r = cpuid(1);
+   ret1 = ((r.c) >> 5) & 1;
+   ia32_feature_control = rdmsr(MSR_IA32_FEATURE_CONTROL);
+   ret2 = ((ia32_feature_control & 0x5) == 0x5);
+   if ((!ret2) && ((ia32_feature_control & 0x1) == 0)){
+   wrmsr(MSR_IA32_FEATURE_CONTROL, 0x5);
+   ia32_feature_control = rdmsr(MSR_IA32_FEATURE_CONTROL);
+   ret2 = ((ia32_feature_control & 0x5) == 0x5);
+   }
+   report("test vmx capability", ret1 & ret2);
+   return !(ret1 & ret2);
+}


Arthur
--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  h

Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Gleb Natapov
On Tue, Jul 16, 2013 at 11:29:20PM +0800, Arthur Chunqi Li wrote:
> On Tue, Jul 16, 2013 at 11:20 PM, Gleb Natapov  wrote:
> > On Tue, Jul 16, 2013 at 12:28:05PM +0200, Paolo Bonzini wrote:
> >> > +void vmx_exit(void)
> >> > +{
> >> > +   test_vmxoff();
> >> > +   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
> >> > +   exit(fails ? -1 : 0);
> >> > +}
> >>
> >> Can you try to jump back to main, and do test_vmxoff there?  This will
> >> avoid having to write our tests in callback style, which is a pain.
> >> Basically something similar to setjmp/longjmp.  In main:
> >>
> >>   if (setjmp(jmpbuf) == 0) {
> >>   vmx_run();
> >>   /* Should not reach here */
> >>   report("test vmlaunch", 0);
> >>   }
> >>   test_vmxoff();
> >>
> >> exit:
> >>   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
> >>   return fails ? 1 : 0;
> >>
> >> In vmx_handler:
> >>
> >>   case VMX_HLT:
> >>   printf("\nVM exit.\n");
> >>   longjmp(jmpbuf, 1);
> >>
> > Why not just make vmexit occur after vmlaunch/vmresume like KVM does. It
> > will make code much more straightforward and easer to follow.
> The concept "easier to follow" may have different meanings in
> different view. This achievement puts all the test cases in main
> function instead of scattering everywhere, which is another view to
> "easy to follow". As this is just a test case, I prefer this one.
> 
I do not see why what I propose will prevent you to put all tests into main.

vmx_run() will looks like that:

   vmlaunch
   while(1) {
   vmresume
 < vmexit jumps here
   switch(exit reason) {
  case reason1:
  break;
  case reason2:
  break;
  case HLT
  return;
   }
   }  
  
> Besides, this way we can start another VM following the previous one
> simply in main function. This is flexible if we want to test re-enter
> to VMX mode or so.
> 
That's what I am missing. How do one writes more tests now?

I was thinking about interface like that:

guest_func_test1()
{
}

tes1t_exit_handlers[] = {test1_handle_hlt, test1_handle_exception, }

main()
{

   init_vmcs(); /* generic stuff */
   init_vmcs_test1(); /* test1 related stuff */
   r = run_in_guest(guest_func_test1, test1_exit_handlers);
   report("test1", r);
}

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-16 Thread Gleb Natapov
On Tue, Jul 16, 2013 at 05:48:52PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 16, 2013 at 09:02:15AM +0300, Gleb Natapov wrote:
> > BTW can NMI handler take spinlocks? 
> 
> No -- that is, yes you can using trylock, but you still shouldn't.
> 
Great news for this code. Thanks.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/2] Basic nested VMX test suite

2013-07-16 Thread Arthur Chunqi Li
On this commit, I delete two "TODO" lines in PATCH 2/2, and it can
only run with all my previous patches.

Arthur

On Tue, Jul 16, 2013 at 11:58 PM, Arthur Chunqi Li  wrote:
> These two patches are focued on providing a basic version of VMX nested
> test suite. This commit provides the architecture of nested VMX similiar
> to x86/svm.c.
>
> setjmp/longjmp is designed to avoid exiting VMX in call-back form.
>
> Arthur Chunqi Li (2):
>   kvm-unit-tests : Add setjmp/longjmp to libcflat
>   kvm-unit-tests : The first version of VMX nested test case
>
>  config-x86-common.mak |2 +
>  config-x86_64.mak |3 +
>  lib/setjmp.h  |   11 +
>  lib/x86/msr.h |5 +
>  lib/x86/setjmp64.S|   27 +++
>  x86/cstart64.S|4 +
>  x86/unittests.cfg |6 +
>  x86/vmx.c |  561 
> +
>  x86/vmx.h |  406 +++
>  9 files changed, 1025 insertions(+)
>  create mode 100644 lib/setjmp.h
>  create mode 100644 lib/x86/setjmp64.S
>  create mode 100644 x86/vmx.c
>  create mode 100644 x86/vmx.h
>
> --
> 1.7.9.5
>



-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
This is the first version for VMX nested environment test case. It
contains the basic VMX instructions test cases, including VMXON/
VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
also tests the basic execution routine in VMX nested environment and
let the VM print "Hello World" to inform its successfully run.

New files added:
x86/vmx.h : contains all VMX related macro declerations
x86/vmx.c : main file for VMX nested test case

Signed-off-by: Arthur Chunqi Li 
---
 config-x86-common.mak |2 +
 config-x86_64.mak |1 +
 lib/x86/msr.h |5 +
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  561 +
 x86/vmx.h |  406 +++
 7 files changed, 985 insertions(+)
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 455032b..34a41e1 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
 
 $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
 
+$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/config-x86_64.mak b/config-x86_64.mak
index 91ffcce..5d9b22a 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -11,5 +11,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
  $(TEST_DIR)/pcid.flat
 tests += $(TEST_DIR)/svm.flat
+tests += $(TEST_DIR)/vmx.flat
 
 include config-x86-common.mak
diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 509a421..281255a 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -396,6 +396,11 @@
 #define MSR_IA32_VMX_VMCS_ENUM  0x048a
 #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
 #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
+#define MSR_IA32_VMX_TRUE_PIN  0x048d
+#define MSR_IA32_VMX_TRUE_PROC 0x048e
+#define MSR_IA32_VMX_TRUE_EXIT 0x048f
+#define MSR_IA32_VMX_TRUE_ENTRY0x0490
+
 
 /* AMD-V MSRs */
 
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 24df5f8..0fe76da 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -4,6 +4,10 @@
 .globl boot_idt
 boot_idt = 0
 
+.globl idt_descr
+.globl tss_descr
+.globl gdt64_desc
+
 ipi_vector = 0x20
 
 max_cpus = 64
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index bc9643e..85c36aa 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
 file = pcid.flat
 extra_params = -cpu qemu64,+pcid
 arch = x86_64
+
+[vmx]
+file = vmx.flat
+extra_params = -cpu host,+vmx
+arch = x86_64
+
diff --git a/x86/vmx.c b/x86/vmx.c
new file mode 100644
index 000..b858400
--- /dev/null
+++ b/x86/vmx.c
@@ -0,0 +1,561 @@
+#include "libcflat.h"
+#include "processor.h"
+#include "vm.h"
+#include "desc.h"
+#include "vmx.h"
+#include "msr.h"
+#include "smp.h"
+#include "io.h"
+#include "setjmp.h"
+
+int fails = 0, tests = 0;
+u32 *vmxon_region;
+struct vmcs *vmcs_root;
+void *io_bmp1, *io_bmp2;
+void *msr_bmp;
+u32 vpid_ctr;
+char *guest_stack, *host_stack;
+char *guest_syscall_stack, *host_syscall_stack;
+u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
+ulong fix_cr0_set, fix_cr0_clr;
+ulong fix_cr4_set, fix_cr4_clr;
+struct regs regs;
+jmp_buf env;
+
+extern u64 gdt64_desc[];
+extern u64 idt_descr[];
+extern u64 tss_descr[];
+extern void *entry_vmx;
+extern void *entry_sysenter;
+extern void *entry_guest;
+
+void report(const char *name, int result)
+{
+   ++tests;
+   if (result)
+   printf("PASS: %s\n", name);
+   else {
+   printf("FAIL: %s\n", name);
+   ++fails;
+   }
+}
+
+inline u64 get_rflags(void)
+{
+   u64 r;
+   asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
+   return r;
+}
+
+inline void set_rflags(u64 r)
+{
+   asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
+}
+
+int vmcs_clear(struct vmcs *vmcs)
+{
+   bool ret;
+   asm volatile ("vmclear %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return ret;
+}
+
+u64 vmcs_read(enum Encoding enc)
+{
+   u64 val;
+   asm volatile ("vmread %1, %0" : "=rm" (val) : "r" ((u64)enc) : "cc");
+   return val;
+}
+
+int vmcs_write(enum Encoding enc, u64 val)
+{
+   bool ret;
+   asm volatile ("vmwrite %1, %2; setbe %0"
+   : "=q"(ret) : "rm" (val), "r" ((u64)enc) : "cc");
+   return ret;
+}
+
+int make_vmcs_current(struct vmcs *vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrld %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return ret;
+}
+
+int save_vmcs(struct vmcs **vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrst %1; setbe %0" : "=q" (ret) : "m" (*vm

[PATCH v3 1/2] kvm-unit-tests : Add setjmp/longjmp to libcflat

2013-07-16 Thread Arthur Chunqi Li
Add setjmp and longjmp functions to libcflat. Now these two functions
are only supported in X86_64 arch.

New files added:
lib/x86/setjmp64.S
lib/x86/setjmp64.c

Signed-off-by: Arthur Chunqi Li 
---
 config-x86_64.mak  |2 ++
 lib/setjmp.h   |   11 +++
 lib/x86/setjmp64.S |   27 +++
 3 files changed, 40 insertions(+)
 create mode 100644 lib/setjmp.h
 create mode 100644 lib/x86/setjmp64.S

diff --git a/config-x86_64.mak b/config-x86_64.mak
index 4e525f5..91ffcce 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -4,6 +4,8 @@ bits = 64
 ldarch = elf64-x86-64
 CFLAGS += -D__x86_64__
 
+cflatobjs += lib/x86/setjmp64.o
+
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
diff --git a/lib/setjmp.h b/lib/setjmp.h
new file mode 100644
index 000..eca70d9
--- /dev/null
+++ b/lib/setjmp.h
@@ -0,0 +1,11 @@
+#ifndef LIBCFLAT_SETJMP64_H
+#define LIBCFLAT_SETJMP64_H
+
+#include "libcflat.h"
+
+typedef char jmp_buf[64];
+
+void longjmp(jmp_buf env, int val);
+int setjmp(jmp_buf env);
+
+#endif
diff --git a/lib/x86/setjmp64.S b/lib/x86/setjmp64.S
new file mode 100644
index 000..c8ae790
--- /dev/null
+++ b/lib/x86/setjmp64.S
@@ -0,0 +1,27 @@
+.globl setjmp
+setjmp:
+   mov (%rsp), %rsi
+   mov %rsi, (%rdi)
+   mov %rsp, 0x8(%rdi)
+   mov %rbp, 0x10(%rdi)
+   mov %rbx, 0x18(%rdi)
+   mov %r12, 0x20(%rdi)
+   mov %r13, 0x28(%rdi)
+   mov %r14, 0x30(%rdi)
+   mov %r15, 0x38(%rdi)
+   xor %eax, %eax
+   ret
+
+.globl longjmp
+longjmp:
+   mov %esi, %eax
+   mov 0x38(%rdi), %r15
+   mov 0x30(%rdi), %r14
+   mov 0x28(%rdi), %r13
+   mov 0x20(%rdi), %r12
+   mov 0x18(%rdi), %rbx
+   mov 0x10(%rdi), %rbp
+   mov 0x8(%rdi), %rsp
+   mov (%rdi), %rsi
+   mov %rsi, (%rsp)
+   ret
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/2] Basic nested VMX test suite

2013-07-16 Thread Arthur Chunqi Li
These two patches are focued on providing a basic version of VMX nested
test suite. This commit provides the architecture of nested VMX similiar
to x86/svm.c.

setjmp/longjmp is designed to avoid exiting VMX in call-back form.

Arthur Chunqi Li (2):
  kvm-unit-tests : Add setjmp/longjmp to libcflat
  kvm-unit-tests : The first version of VMX nested test case

 config-x86-common.mak |2 +
 config-x86_64.mak |3 +
 lib/setjmp.h  |   11 +
 lib/x86/msr.h |5 +
 lib/x86/setjmp64.S|   27 +++
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  561 +
 x86/vmx.h |  406 +++
 9 files changed, 1025 insertions(+)
 create mode 100644 lib/setjmp.h
 create mode 100644 lib/x86/setjmp64.S
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-16 Thread Peter Zijlstra
On Tue, Jul 16, 2013 at 09:02:15AM +0300, Gleb Natapov wrote:
> BTW can NMI handler take spinlocks? 

No -- that is, yes you can using trylock, but you still shouldn't.

> If it can what happens if NMI is
> delivered in a section protected by local_irq_save()/local_irq_restore()?

You deadlock.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Gleb Natapov
On Tue, Jul 16, 2013 at 12:28:05PM +0200, Paolo Bonzini wrote:
> > +void vmx_exit(void)
> > +{
> > +   test_vmxoff();
> > +   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
> > +   exit(fails ? -1 : 0);
> > +}
> 
> Can you try to jump back to main, and do test_vmxoff there?  This will
> avoid having to write our tests in callback style, which is a pain.
> Basically something similar to setjmp/longjmp.  In main:
> 
>   if (setjmp(jmpbuf) == 0) {
>   vmx_run();
>   /* Should not reach here */
>   report("test vmlaunch", 0);
>   }
>   test_vmxoff();
> 
> exit:
>   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
>   return fails ? 1 : 0;
> 
> In vmx_handler:
> 
>   case VMX_HLT:
>   printf("\nVM exit.\n");
>   longjmp(jmpbuf, 1);
> 
Why not just make vmexit occur after vmlaunch/vmresume like KVM does. It
will make code much more straightforward and easer to follow.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
On Tue, Jul 16, 2013 at 11:20 PM, Gleb Natapov  wrote:
> On Tue, Jul 16, 2013 at 12:28:05PM +0200, Paolo Bonzini wrote:
>> > +void vmx_exit(void)
>> > +{
>> > +   test_vmxoff();
>> > +   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
>> > +   exit(fails ? -1 : 0);
>> > +}
>>
>> Can you try to jump back to main, and do test_vmxoff there?  This will
>> avoid having to write our tests in callback style, which is a pain.
>> Basically something similar to setjmp/longjmp.  In main:
>>
>>   if (setjmp(jmpbuf) == 0) {
>>   vmx_run();
>>   /* Should not reach here */
>>   report("test vmlaunch", 0);
>>   }
>>   test_vmxoff();
>>
>> exit:
>>   printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
>>   return fails ? 1 : 0;
>>
>> In vmx_handler:
>>
>>   case VMX_HLT:
>>   printf("\nVM exit.\n");
>>   longjmp(jmpbuf, 1);
>>
> Why not just make vmexit occur after vmlaunch/vmresume like KVM does. It
> will make code much more straightforward and easer to follow.
The concept "easier to follow" may have different meanings in
different view. This achievement puts all the test cases in main
function instead of scattering everywhere, which is another view to
"easy to follow". As this is just a test case, I prefer this one.

Besides, this way we can start another VM following the previous one
simply in main function. This is flexible if we want to test re-enter
to VMX mode or so.

Arthur
>
> --
> Gleb.



--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: RFC: vfio interface for platform devices (v2)

2013-07-16 Thread Yoder Stuart-B08248


> -Original Message-
> From: Mario Smarduch [mailto:mario.smard...@huawei.com]
> Sent: Thursday, July 04, 2013 9:45 AM
> To: Yoder Stuart-B08248
> Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; kvm@vger.kernel.org 
> list; Bhushan Bharat-R65777;
> kvm-...@vger.kernel.org; virtualizat...@lists.linux-foundation.org; Sethi 
> Varun-B16395;
> kvm...@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices (v2)
> 
> 
> I'm having trouble understanding how this works where
> the Guest Device Model != Host. How do you inform the guest
> where the device is mapped in its physical address space,
> and handle GPA faults?

The vfio mechanisms just expose hardware to user space
and the user space app may or may not QEMU.  So there
may be no 'guest' at all.

The intent of this RFC is to provide enough info to user space so
an application can use the device, or in the case of QEMU expose
the device to a VM.  Platform devices are typically exposed via
the device tree and that is how I envision them being presented
to a guest.

Are there real cases you see where guest device model != host?
I don't envision ever presenting a platform device as a PCI device
or vise versa.

Stuart

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] kvm-unit-tests : Add setjmp/longjmp to libcflat

2013-07-16 Thread Paolo Bonzini
Il 16/07/2013 17:11, Arthur Chunqi Li ha scritto:
> Add setjmp and longjmp functions to libcflat. Now these two functions
> are only supported in X86_64 arch.
> 
> New files added:
>   lib/x86/setjmp64.S
>   lib/x86/setjmp64.c
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
>  config-x86_64.mak  |2 ++
>  lib/x86/setjmp64.S |   27 +++
>  lib/x86/setjmp64.h |   15 +++
>  3 files changed, 44 insertions(+)
>  create mode 100644 lib/x86/setjmp64.S
>  create mode 100644 lib/x86/setjmp64.h
> 
> diff --git a/config-x86_64.mak b/config-x86_64.mak
> index 4e525f5..91ffcce 100644
> --- a/config-x86_64.mak
> +++ b/config-x86_64.mak
> @@ -4,6 +4,8 @@ bits = 64
>  ldarch = elf64-x86-64
>  CFLAGS += -D__x86_64__
>  
> +cflatobjs += lib/x86/setjmp64.o
> +
>  tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
> $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
> $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
> diff --git a/lib/x86/setjmp64.S b/lib/x86/setjmp64.S
> new file mode 100644
> index 000..c8ae790
> --- /dev/null
> +++ b/lib/x86/setjmp64.S
> @@ -0,0 +1,27 @@
> +.globl setjmp
> +setjmp:
> + mov (%rsp), %rsi
> + mov %rsi, (%rdi)
> + mov %rsp, 0x8(%rdi)
> + mov %rbp, 0x10(%rdi)
> + mov %rbx, 0x18(%rdi)
> + mov %r12, 0x20(%rdi)
> + mov %r13, 0x28(%rdi)
> + mov %r14, 0x30(%rdi)
> + mov %r15, 0x38(%rdi)
> + xor %eax, %eax
> + ret
> +
> +.globl longjmp
> +longjmp:
> + mov %esi, %eax
> + mov 0x38(%rdi), %r15
> + mov 0x30(%rdi), %r14
> + mov 0x28(%rdi), %r13
> + mov 0x20(%rdi), %r12
> + mov 0x18(%rdi), %rbx
> + mov 0x10(%rdi), %rbp
> + mov 0x8(%rdi), %rsp
> + mov (%rdi), %rsi
> + mov %rsi, (%rsp)
> + ret

Nice!

> diff --git a/lib/x86/setjmp64.h b/lib/x86/setjmp64.h
> new file mode 100644
> index 000..98ff16a
> --- /dev/null
> +++ b/lib/x86/setjmp64.h
> @@ -0,0 +1,15 @@
> +#ifndef LIBCFLAT_SETJMP64_H
> +#define LIBCFLAT_SETJMP64_H
> +
> +#include "libcflat.h"
> +
> +struct jmp_buf {
> + u64 calling_rip;
> + u64 rsp, rbp, rbx;
> + u64 r12, r13, r14, r15;
> +};

Please make it a typedef instead.  In fact, jmp_buf is typically an
array so that longjmp(env, 1) works without the ampersand.  A
"struct-like" declaration can go in setjmp64.S as a comment.

With this change, you can put it in lib/setjmp.h so that implementation
for other arches can be added later.

Otherwise looks good, thanks!

Paolo

> +void longjmp(struct jmp_buf *env, int val);
> +int setjmp(struct jmp_buf *env);
> +
> +#endif
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/2] Basic nested VMX test suite

2013-07-16 Thread Arthur Chunqi Li
These two patches are focued on providing a basic version of VMX nested
test suite. This commit provides the architecture of nested VMX similiar
to x86/svm.c.

setjmp/longjmp is designed to avoid exiting VMX in call-back form.

Arthur Chunqi Li (2):
  kvm-unit-tests : Add setjmp/longjmp to libcflat
  kvm-unit-tests : The first version of VMX nested test case

 config-x86-common.mak |2 +
 config-x86_64.mak |3 +
 lib/x86/msr.h |5 +
 lib/x86/setjmp64.S|   27 +++
 lib/x86/setjmp64.h|   15 ++
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  565 +
 x86/vmx.h |  406 +++
 9 files changed, 1033 insertions(+)
 create mode 100644 lib/x86/setjmp64.S
 create mode 100644 lib/x86/setjmp64.h
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] kvm-unit-tests : Add setjmp/longjmp to libcflat

2013-07-16 Thread Arthur Chunqi Li
Add setjmp and longjmp functions to libcflat. Now these two functions
are only supported in X86_64 arch.

New files added:
lib/x86/setjmp64.S
lib/x86/setjmp64.c

Signed-off-by: Arthur Chunqi Li 
---
 config-x86_64.mak  |2 ++
 lib/x86/setjmp64.S |   27 +++
 lib/x86/setjmp64.h |   15 +++
 3 files changed, 44 insertions(+)
 create mode 100644 lib/x86/setjmp64.S
 create mode 100644 lib/x86/setjmp64.h

diff --git a/config-x86_64.mak b/config-x86_64.mak
index 4e525f5..91ffcce 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -4,6 +4,8 @@ bits = 64
 ldarch = elf64-x86-64
 CFLAGS += -D__x86_64__
 
+cflatobjs += lib/x86/setjmp64.o
+
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
diff --git a/lib/x86/setjmp64.S b/lib/x86/setjmp64.S
new file mode 100644
index 000..c8ae790
--- /dev/null
+++ b/lib/x86/setjmp64.S
@@ -0,0 +1,27 @@
+.globl setjmp
+setjmp:
+   mov (%rsp), %rsi
+   mov %rsi, (%rdi)
+   mov %rsp, 0x8(%rdi)
+   mov %rbp, 0x10(%rdi)
+   mov %rbx, 0x18(%rdi)
+   mov %r12, 0x20(%rdi)
+   mov %r13, 0x28(%rdi)
+   mov %r14, 0x30(%rdi)
+   mov %r15, 0x38(%rdi)
+   xor %eax, %eax
+   ret
+
+.globl longjmp
+longjmp:
+   mov %esi, %eax
+   mov 0x38(%rdi), %r15
+   mov 0x30(%rdi), %r14
+   mov 0x28(%rdi), %r13
+   mov 0x20(%rdi), %r12
+   mov 0x18(%rdi), %rbx
+   mov 0x10(%rdi), %rbp
+   mov 0x8(%rdi), %rsp
+   mov (%rdi), %rsi
+   mov %rsi, (%rsp)
+   ret
diff --git a/lib/x86/setjmp64.h b/lib/x86/setjmp64.h
new file mode 100644
index 000..98ff16a
--- /dev/null
+++ b/lib/x86/setjmp64.h
@@ -0,0 +1,15 @@
+#ifndef LIBCFLAT_SETJMP64_H
+#define LIBCFLAT_SETJMP64_H
+
+#include "libcflat.h"
+
+struct jmp_buf {
+   u64 calling_rip;
+   u64 rsp, rbp, rbx;
+   u64 r12, r13, r14, r15;
+};
+
+void longjmp(struct jmp_buf *env, int val);
+int setjmp(struct jmp_buf *env);
+
+#endif
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
This is the first version for VMX nested environment test case. It
contains the basic VMX instructions test cases, including VMXON/
VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
also tests the basic execution routine in VMX nested environment and
let the VM print "Hello World" to inform its successfully run.

New files added:
x86/vmx.h : contains all VMX related macro declerations
x86/vmx.c : main file for VMX nested test case

Signed-off-by: Arthur Chunqi Li 
---
 config-x86-common.mak |2 +
 config-x86_64.mak |1 +
 lib/x86/msr.h |5 +
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  565 +
 x86/vmx.h |  406 +++
 7 files changed, 989 insertions(+)
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 455032b..34a41e1 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
 
 $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
 
+$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/config-x86_64.mak b/config-x86_64.mak
index 91ffcce..5d9b22a 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -11,5 +11,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
  $(TEST_DIR)/pcid.flat
 tests += $(TEST_DIR)/svm.flat
+tests += $(TEST_DIR)/vmx.flat
 
 include config-x86-common.mak
diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 509a421..281255a 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -396,6 +396,11 @@
 #define MSR_IA32_VMX_VMCS_ENUM  0x048a
 #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
 #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
+#define MSR_IA32_VMX_TRUE_PIN  0x048d
+#define MSR_IA32_VMX_TRUE_PROC 0x048e
+#define MSR_IA32_VMX_TRUE_EXIT 0x048f
+#define MSR_IA32_VMX_TRUE_ENTRY0x0490
+
 
 /* AMD-V MSRs */
 
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 24df5f8..0fe76da 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -4,6 +4,10 @@
 .globl boot_idt
 boot_idt = 0
 
+.globl idt_descr
+.globl tss_descr
+.globl gdt64_desc
+
 ipi_vector = 0x20
 
 max_cpus = 64
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index bc9643e..85c36aa 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
 file = pcid.flat
 extra_params = -cpu qemu64,+pcid
 arch = x86_64
+
+[vmx]
+file = vmx.flat
+extra_params = -cpu host,+vmx
+arch = x86_64
+
diff --git a/x86/vmx.c b/x86/vmx.c
new file mode 100644
index 000..9082fc7
--- /dev/null
+++ b/x86/vmx.c
@@ -0,0 +1,565 @@
+#include "libcflat.h"
+#include "processor.h"
+#include "vm.h"
+#include "desc.h"
+#include "vmx.h"
+#include "msr.h"
+#include "smp.h"
+#include "io.h"
+#include "setjmp64.h"
+
+int fails = 0, tests = 0;
+u32 *vmxon_region;
+struct vmcs *vmcs_root;
+void *io_bmp1, *io_bmp2;
+void *msr_bmp;
+u32 vpid_ctr;
+char *guest_stack, *host_stack;
+char *guest_syscall_stack, *host_syscall_stack;
+u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
+ulong fix_cr0_set, fix_cr0_clr;
+ulong fix_cr4_set, fix_cr4_clr;
+struct regs regs;
+struct jmp_buf env;
+
+extern u64 gdt64_desc[];
+extern u64 idt_descr[];
+extern u64 tss_descr[];
+extern void *entry_vmx;
+extern void *entry_sysenter;
+extern void *entry_guest;
+
+void report(const char *name, int result)
+{
+   ++tests;
+   if (result)
+   printf("PASS: %s\n", name);
+   else {
+   printf("FAIL: %s\n", name);
+   ++fails;
+   }
+}
+
+inline u64 get_rflags(void)
+{
+   u64 r;
+   asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
+   return r;
+}
+
+inline void set_rflags(u64 r)
+{
+   asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
+}
+
+int vmcs_clear(struct vmcs *vmcs)
+{
+   bool ret;
+   asm volatile ("vmclear %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return ret;
+}
+
+u64 vmcs_read(enum Encoding enc)
+{
+   u64 val;
+   asm volatile ("vmread %1, %0" : "=rm" (val) : "r" ((u64)enc) : "cc");
+   return val;
+}
+
+int vmcs_write(enum Encoding enc, u64 val)
+{
+   bool ret;
+   asm volatile ("vmwrite %1, %2; setbe %0"
+   : "=q"(ret) : "rm" (val), "r" ((u64)enc) : "cc");
+   return ret;
+}
+
+int make_vmcs_current(struct vmcs *vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrld %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return ret;
+}
+
+int save_vmcs(struct vmcs **vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrst %1; setbe %0" : "=q" (ret) :

Re: vga passthrough // vfio // qemu bridge

2013-07-16 Thread Alex Williamson
On Tue, 2013-07-16 at 14:35 +0200, Martin Wolf wrote:
> Early 2012 i tested the old vga passthrough capabilities of KVM and was
> partly successful.
> now with the new vfio driver i tried again according to alex's hints and
> this guide:
> https://bbs.archlinux.org/viewtopic.php?id=162768
> 
> since im primarily using ubuntu i used the daily build of saucy.
> it ships qemu 1.5 and seabios 1.7.3 so the requirements are met.
> 
> according to the guide i prepared the vga card (amd 7870)
> 
> [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-2-generic
> root=UUID=26fed560-a972-499d-ab14-7fec6439fd3d ro intel_iommu=on
> pci-stub.ids=1002:6818,1002:aab0 quiet splash vt.handoff=7
> [0.00] Kernel command line:
> BOOT_IMAGE=/boot/vmlinuz-3.10.0-2-generic
> root=UUID=26fed560-a972-499d-ab14-7fec6439fd3d ro intel_iommu=on
> pci-stub.ids=1002:6818,1002:aab0 quiet splash vt.handoff=7
> [0.569977] pci-stub: add 1002:6818 sub=:
> cls=/
> [0.569987] pci-stub :01:00.0: claimed by stub
> [0.569994] pci-stub: add 1002:AAB0 sub=:
> cls=/
> [0.569998] pci-stub :01:00.1: claimed by stub
> 
> then did this just to be sure:
> echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" >
> /etc/modprobe.d/vfio_iommu_type1.conf
> (or was that wrong?)
> im using a z87 haswell mainboard

Hopefully not needed, only use this option if you need to.  vfio will
print an error to dmesg and qemu will fail to start if you need it.

> after that i binded the two devices to vfio-pci with:
> vfio-bind :01:00.0 :01:00.1 (the script in the guide)
> 
> afterwards i was able to start the kvm with
> qemu-system-x86_64 -enable-kvm -M q35 -m 8192 -cpu host \
> -smp 8,sockets=1,cores=4,threads=8 \
> -bios /usr/share/qemu/bios.bin -vga none \
> -device
> ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
> -device
> vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
> -device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 \
> -device ahci,bus=pcie.0,id=ahci \
> -drive file=/home/martin/windows.img,if=none,id=disk,format=raw -device
> virtio-blk-pci,drive=disk \
> -drive file=/home/martin/X17-59885.iso,id=isocd -device
> ide-cd,bus=ahci.0,drive=isocd \
> -net nic,model=virtio \
> -net user \
> -usb -usbdevice host:1532:000c \
> -drive file=/home/martin/Downloads/virtio-win-0.1-59.iso,id=isocd1
> -device ide-cd,bus=ahci.1,drive=isocd1
> 
> to my surprise i instantly got the windows installation running
> installed the virtio drivers for nic and storage
> and had 15 mins later a working win7 installation.
> now i installed the amd driver (13.4) and rebooted.
> i got a bluescreen. similar to my old expiriences so i thought do a
> clean host reboot and try again.
> but still the same. so i tried to load the bios.rom for the card (found
> it on techpowerup) again no luck.
> maybe someone knows a hint?

I think most of the folks using the guide you reference are using my
vfio-vga-reset branches of qemu & kernel (or patches extracted from
them).  These add an important step for reproducibility, by being able
to reset the bus for the graphics card, giving us a way to reset the
device.  The other thing in the qemu branch are improved quirks.  I've
just posted these to qemu-devel and plan to get them pulled for 1.6.
Note that for ATI/AMD cards, a critical quirks is intercepting the byte
at I/O port address 0x3c3.  Without this, the VGA BIOS can't bootstrap
itself.  The vfio-vga-reset branch has a conditional replacement of
this, which doesn't seem to work for everyone.  I believe the version I
posted to qemu-devel yesterday is a better implementation of that quirk.

> ---
> 
> about qemu bridge
> i tried to set up a bridge with the config but qemu always told me that
> qemu-bridge-helper is not present.
> all i found out that it propably got removed from the package because of
> the lack of control over the tap
> devices.
> now my question how can i still bridge the vm into my network without
> that helper?

I don't know what qemu-bridge-helper is/was, but you're probably better
off asking bridge questions in a separate thread instead of buried here.
Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


vga passthrough // vfio // qemu bridge

2013-07-16 Thread Martin Wolf
Early 2012 i tested the old vga passthrough capabilities of KVM and was
partly successful.
now with the new vfio driver i tried again according to alex's hints and
this guide:
https://bbs.archlinux.org/viewtopic.php?id=162768

since im primarily using ubuntu i used the daily build of saucy.
it ships qemu 1.5 and seabios 1.7.3 so the requirements are met.

according to the guide i prepared the vga card (amd 7870)

[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-2-generic
root=UUID=26fed560-a972-499d-ab14-7fec6439fd3d ro intel_iommu=on
pci-stub.ids=1002:6818,1002:aab0 quiet splash vt.handoff=7
[0.00] Kernel command line:
BOOT_IMAGE=/boot/vmlinuz-3.10.0-2-generic
root=UUID=26fed560-a972-499d-ab14-7fec6439fd3d ro intel_iommu=on
pci-stub.ids=1002:6818,1002:aab0 quiet splash vt.handoff=7
[0.569977] pci-stub: add 1002:6818 sub=:
cls=/
[0.569987] pci-stub :01:00.0: claimed by stub
[0.569994] pci-stub: add 1002:AAB0 sub=:
cls=/
[0.569998] pci-stub :01:00.1: claimed by stub

then did this just to be sure:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" >
/etc/modprobe.d/vfio_iommu_type1.conf
(or was that wrong?)
im using a z87 haswell mainboard

after that i binded the two devices to vfio-pci with:
vfio-bind :01:00.0 :01:00.1 (the script in the guide)

afterwards i was able to start the kvm with
qemu-system-x86_64 -enable-kvm -M q35 -m 8192 -cpu host \
-smp 8,sockets=1,cores=4,threads=8 \
-bios /usr/share/qemu/bios.bin -vga none \
-device
ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
-device
vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 \
-device ahci,bus=pcie.0,id=ahci \
-drive file=/home/martin/windows.img,if=none,id=disk,format=raw -device
virtio-blk-pci,drive=disk \
-drive file=/home/martin/X17-59885.iso,id=isocd -device
ide-cd,bus=ahci.0,drive=isocd \
-net nic,model=virtio \
-net user \
-usb -usbdevice host:1532:000c \
-drive file=/home/martin/Downloads/virtio-win-0.1-59.iso,id=isocd1
-device ide-cd,bus=ahci.1,drive=isocd1

to my surprise i instantly got the windows installation running
installed the virtio drivers for nic and storage
and had 15 mins later a working win7 installation.
now i installed the amd driver (13.4) and rebooted.
i got a bluescreen. similar to my old expiriences so i thought do a
clean host reboot and try again.
but still the same. so i tried to load the bios.rom for the card (found
it on techpowerup) again no luck.
maybe someone knows a hint?

---

about qemu bridge
i tried to set up a bridge with the config but qemu always told me that
qemu-bridge-helper is not present.
all i found out that it propably got removed from the package because of
the lack of control over the tap
devices.
now my question how can i still bridge the vm into my network without
that helper?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Naor Shlomo
Paolo,

Thanks for your help in this matter. Issue is resolved.

One more thing I changed in order to get the machine up and running is this:

has been changed to:


I verified by running:
cat /sys/bus/virtio/devices/virtio0/features

That the 15th bit is off hence mergeable buffers are disabled.

Thanks again,
Naor


-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Naor Shlomo
Sent: Tuesday, July 16, 2013 2:06 PM
To: Paolo Bonzini
Cc: kvm@vger.kernel.org
Subject: RE: Disabling mergeable rx buffers for the guest

Paolo, 

Sorry for all the trouble.
We got a progress. Your last qemu:commandline really worked and I was able to 
try and start the machine.
The problem is that I reached up to the tun driver load section and then the 
machine shut itself down.

The last lines of the console showed the following:

[1.568226] Fixed MDIO Bus: probed
[1.568833] tun: Universal TUN/TAP device driver, 1.6
[1.569108] tun: (C) 1999-2004 Max Krasnyansky 

And it got me back to the virsh command line.

Here's an output of the log file:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -no-kvm -m 2069 -smp 
1,sockets=1,cores=1,threads=1 -name NaorDev -uuid 
60b5a0ab-8932-47c1-8674-760c7e1f4743 -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/NaorDev.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-drive 
file=/var/lib/libvirt/images/NaorDev.img,if=none,id=drive-virtio-disk0,format=raw,cache=writeback
 -device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/var/lib/libvirt/images/NaorAddonHdd.qcow,if=none,id=drive-virtio-disk1,format=qcow2,cache=none
 -device 
virtio-blk-pci,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 
-netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c6:b7:a3,bus=pci.0,addr=0x3 
-netdev tap,fd=20,id=hostnet1,vhost=on,vhostfd=21 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:9e:44:90,bus=pci.0,addr=0x4 
-netdev tap,fd=22,id=hostnet2,vhost=on,vhostfd=23 -device 
virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:e7:f9:bf,bus=pci.0,addr=0x7 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-usb -vnc 127.0.0.1:0 -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -global 
virtio-net-pci.mrg_rxbuf=off Domain id=9 is tainted: custom-argv char device 
redirected to /dev/pts/1
2013-07-16 12:48:26.440+: shutting down

and here's the output of lsmod:

vhost_net  32360  0 
macvtap18529  1 vhost_net
macvlan19003  1 macvtap
ip6table_filter12816  0 
ip6_tables 27686  1 ip6table_filter
ebtable_nat12808  0 
ebtables   31024  1 ebtable_nat
ipt_MASQUERADE 12760  3 
iptable_nat13230  1 
nf_nat 25646  2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4  14531  4 iptable_nat,nf_nat
nf_defrag_ipv4 12730  1 nf_conntrack_ipv4
xt_state   12579  1 
nf_conntrack   83300  5 
ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4,xt_state
ipt_REJECT 12577  2 
xt_CHECKSUM12550  1 
iptable_mangle 12735  1 
xt_tcpudp  12604  5 
iptable_filter 12811  1 
ip_tables  27474  3 iptable_nat,iptable_mangle,iptable_filter
x_tables   29892  12 
ip6table_filter,ip6_tables,ebtables,ipt_MASQUERADE,iptable_nat,xt_state,ipt_REJECT,xt_CHECKSUM,iptable_mangle,xt_tcpudp,iptable_filter,ip_tables
dm_crypt   23126  0 
coretemp   13642  0 
kvm_intel 137888  0 
kvm   422160  1 kvm_intel
ext2   73799  1 
gpio_ich   13384  0 
dm_multipath   23306  0 
scsi_dh14589  1 dm_multipath
lp 17800  0 
microcode  23030  0 
parport46563  1 lp
bridge 91498  0 
sb_edac18104  0 
stp12977  1 bridge
llc14598  2 bridge,stp
joydev 17694  0 
edac_core  53053  3 sb_edac
shpchp 37278  0 
lpc_ich17145  0 
mei41410  0 
acpi_power_meter   18124  0 
mac_hid13254  0 
btrfs 781017  0 
zlib_deflate   27140  1 btrfs
libcrc32c  12645  1 btrfs
vesafb 13846  1 
hid_generic12541  0 
usbhid 47259  0 
hid   100815  2 hid_generic,usbhid
ses17418  0 
enclosure  15313  1 ses
ghash_clmulni_intel13221  0 
aesni_intel51134  0 
cryptd 20531  2 ghash_clmulni_intel,aesni_intel
aes_x86_64   

Re: [[Qemu-devel] [PATCH]] nVMX: Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-07-16 Thread Gleb Natapov
On Tue, Jul 16, 2013 at 07:56:25PM +0800, Arthur Chunqi Li wrote:
> On Tue, Jul 16, 2013 at 7:42 PM, Gleb Natapov  wrote:
> > On Sun, Jul 07, 2013 at 11:13:37PM +0800, Arthur Chunqi Li wrote:
> >> The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
> >> to clear this MSR when reset vCPU and keep the value of it when
> >> migration. This patch add this feature.
> >>
> > So what happens if we migrate from qemu that does not have this patch
> > to qemu that does? Since msr_ia32_feature_control will not be migrated
> > it will not be set on the destination so destination will not be able to
> > use nested vmx. Since nested vmx is experimental it may be to early for
> > us to care about it though, and nested vmx does not work with migration
> > anyway.
> In my test, if migration doesn't care about msr_ia32_feature_control,
> the value will be set to 0 in the destination VM and this may cause
> some logical confusions, but the VMX running on it may not aware of
> this (if migration nested vmx is supported in the future) because once
> VMX initialized, it will not check this msr any more in normal cases.
> 
With vmm_exclusive=0 kvm does vmxon/vmxoff while running. But lest not
worry about nested kvm migration for now. There are much harder problems
to overcome before it will work.

> This is also a complex problem since we don't know how many states
> like this msr need to be migrated related to nested virt. If there're
> a lot of states need migrating, it is better to reconstruct the
> relevant codes. But now this patch is enough.
> 
> Besides, though migration is not supported in nested vmx, we should
> keep the machine state consistent during migration. So this patch is
> also meaningful.
> 
> Arthur
> >
> >> Signed-off-by: Arthur Chunqi Li 
> >> ---
> >>  target-i386/cpu.h |2 ++
> >>  target-i386/kvm.c |4 
> >>  target-i386/machine.c |   22 ++
> >>  3 files changed, 28 insertions(+)
> >>
> >> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> >> index 62e3547..a418e17 100644
> >> --- a/target-i386/cpu.h
> >> +++ b/target-i386/cpu.h
> >> @@ -301,6 +301,7 @@
> >>  #define MSR_IA32_APICBASE_BSP   (1<<8)
> >>  #define MSR_IA32_APICBASE_ENABLE(1<<11)
> >>  #define MSR_IA32_APICBASE_BASE  (0xf<<12)
> >> +#define MSR_IA32_FEATURE_CONTROL0x003a
> >>  #define MSR_TSC_ADJUST  0x003b
> >>  #define MSR_IA32_TSCDEADLINE0x6e0
> >>
> >> @@ -813,6 +814,7 @@ typedef struct CPUX86State {
> >>
> >>  uint64_t mcg_status;
> >>  uint64_t msr_ia32_misc_enable;
> >> +uint64_t msr_ia32_feature_control;
> >>
> >>  /* exception/interrupt handling */
> >>  int error_code;
> >> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> >> index 39f4fbb..3cb2161 100644
> >> --- a/target-i386/kvm.c
> >> +++ b/target-i386/kvm.c
> >> @@ -1122,6 +1122,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> >>  if (hyperv_vapic_recommended()) {
> >>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
> >>  }
> >> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL, 
> >> env->msr_ia32_feature_control);
> >>  }
> >>  if (env->mcg_cap) {
> >>  int i;
> >> @@ -1346,6 +1347,7 @@ static int kvm_get_msrs(X86CPU *cpu)
> >>  if (has_msr_misc_enable) {
> >>  msrs[n++].index = MSR_IA32_MISC_ENABLE;
> >>  }
> >> +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
> >>
> >>  if (!env->tsc_valid) {
> >>  msrs[n++].index = MSR_IA32_TSC;
> >> @@ -1444,6 +1446,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> >>  case MSR_IA32_MISC_ENABLE:
> >>  env->msr_ia32_misc_enable = msrs[i].data;
> >>  break;
> >> +case MSR_IA32_FEATURE_CONTROL:
> >> +env->msr_ia32_feature_control = msrs[i].data;
> >>  default:
> >>  if (msrs[i].index >= MSR_MC0_CTL &&
> >>  msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
> >> diff --git a/target-i386/machine.c b/target-i386/machine.c
> >> index 3659db9..94ca914 100644
> >> --- a/target-i386/machine.c
> >> +++ b/target-i386/machine.c
> >> @@ -399,6 +399,14 @@ static bool misc_enable_needed(void *opaque)
> >>  return env->msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
> >>  }
> >>
> >> +static bool feature_control_needed(void *opaque)
> >> +{
> >> +X86CPU *cpu = opaque;
> >> +CPUX86State *env = &cpu->env;
> >> +
> >> +return env->msr_ia32_feature_control != 0;
> >> +}
> >> +
> >>  static const VMStateDescription vmstate_msr_ia32_misc_enable = {
> >>  .name = "cpu/msr_ia32_misc_enable",
> >>  .version_id = 1,
> >> @@ -410,6 +418,17 @@ static const VMStateDescription 
> >> vmstate_msr_ia32_misc_enable = {
> >>  }
> >>  };
> >>
> >> +static const VMStateDescription vmstate_msr_ia32_feature_control = {
> >> +.name = "cpu/msr_ia32_feature_control",
> >> +.version_id = 1,
> >> +

Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Paolo Bonzini
Il 16/07/2013 13:47, Arthur Chunqi Li ha scritto:
> setjmp and longjmp is not implemented in our test environment, and
> these two functions are highly depend on architecture. Do you think we
> need to write a general code for both 32bit and 64bit, or just write
> specific one just for this test case?

Supporting it in x86-64 is enough for now.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[Qemu-devel] [PATCH]] nVMX: Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-07-16 Thread Paolo Bonzini
Il 16/07/2013 13:42, Gleb Natapov ha scritto:
> On Sun, Jul 07, 2013 at 11:13:37PM +0800, Arthur Chunqi Li wrote:
>> The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
>> to clear this MSR when reset vCPU and keep the value of it when
>> migration. This patch add this feature.
>>
> So what happens if we migrate from qemu that does not have this patch
> to qemu that does? Since msr_ia32_feature_control will not be migrated
> it will not be set on the destination so destination will not be able to
> use nested vmx.

Yes, that's the same as with every subsection.

> Since nested vmx is experimental it may be to early for
> us to care about it though, and nested vmx does not work with migration
> anyway.

Exactly.  The next time you start KVM, it will set the MSR to 5 again.

Paolo

>> Signed-off-by: Arthur Chunqi Li 
>> ---
>>  target-i386/cpu.h |2 ++
>>  target-i386/kvm.c |4 
>>  target-i386/machine.c |   22 ++
>>  3 files changed, 28 insertions(+)
>>
>> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
>> index 62e3547..a418e17 100644
>> --- a/target-i386/cpu.h
>> +++ b/target-i386/cpu.h
>> @@ -301,6 +301,7 @@
>>  #define MSR_IA32_APICBASE_BSP   (1<<8)
>>  #define MSR_IA32_APICBASE_ENABLE(1<<11)
>>  #define MSR_IA32_APICBASE_BASE  (0xf<<12)
>> +#define MSR_IA32_FEATURE_CONTROL0x003a
>>  #define MSR_TSC_ADJUST  0x003b
>>  #define MSR_IA32_TSCDEADLINE0x6e0
>>  
>> @@ -813,6 +814,7 @@ typedef struct CPUX86State {
>>  
>>  uint64_t mcg_status;
>>  uint64_t msr_ia32_misc_enable;
>> +uint64_t msr_ia32_feature_control;
>>  
>>  /* exception/interrupt handling */
>>  int error_code;
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index 39f4fbb..3cb2161 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -1122,6 +1122,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>>  if (hyperv_vapic_recommended()) {
>>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
>>  }
>> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL, 
>> env->msr_ia32_feature_control);
>>  }
>>  if (env->mcg_cap) {
>>  int i;
>> @@ -1346,6 +1347,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>>  if (has_msr_misc_enable) {
>>  msrs[n++].index = MSR_IA32_MISC_ENABLE;
>>  }
>> +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
>>  
>>  if (!env->tsc_valid) {
>>  msrs[n++].index = MSR_IA32_TSC;
>> @@ -1444,6 +1446,8 @@ static int kvm_get_msrs(X86CPU *cpu)
>>  case MSR_IA32_MISC_ENABLE:
>>  env->msr_ia32_misc_enable = msrs[i].data;
>>  break;
>> +case MSR_IA32_FEATURE_CONTROL:
>> +env->msr_ia32_feature_control = msrs[i].data;
>>  default:
>>  if (msrs[i].index >= MSR_MC0_CTL &&
>>  msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
>> diff --git a/target-i386/machine.c b/target-i386/machine.c
>> index 3659db9..94ca914 100644
>> --- a/target-i386/machine.c
>> +++ b/target-i386/machine.c
>> @@ -399,6 +399,14 @@ static bool misc_enable_needed(void *opaque)
>>  return env->msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
>>  }
>>  
>> +static bool feature_control_needed(void *opaque)
>> +{
>> +X86CPU *cpu = opaque;
>> +CPUX86State *env = &cpu->env;
>> +
>> +return env->msr_ia32_feature_control != 0;
>> +}
>> +
>>  static const VMStateDescription vmstate_msr_ia32_misc_enable = {
>>  .name = "cpu/msr_ia32_misc_enable",
>>  .version_id = 1,
>> @@ -410,6 +418,17 @@ static const VMStateDescription 
>> vmstate_msr_ia32_misc_enable = {
>>  }
>>  };
>>  
>> +static const VMStateDescription vmstate_msr_ia32_feature_control = {
>> +.name = "cpu/msr_ia32_feature_control",
>> +.version_id = 1,
>> +.minimum_version_id = 1,
>> +.minimum_version_id_old = 1,
>> +.fields  = (VMStateField []) {
>> +VMSTATE_UINT64(env.msr_ia32_feature_control, X86CPU),
>> +VMSTATE_END_OF_LIST()
>> +}
>> +};
>> +
>>  const VMStateDescription vmstate_x86_cpu = {
>>  .name = "cpu",
>>  .version_id = 12,
>> @@ -535,6 +554,9 @@ const VMStateDescription vmstate_x86_cpu = {
>>  }, {
>>  .vmsd = &vmstate_msr_ia32_misc_enable,
>>  .needed = misc_enable_needed,
>> +}, {
>> +.vmsd = &vmstate_msr_ia32_feature_control,
>> +.needed = feature_control_needed,
>>  } , {
>>  /* empty */
>>  }
>> -- 
>> 1.7.9.5
> 
> --
>   Gleb.
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[Qemu-devel] [PATCH]] nVMX: Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-07-16 Thread Arthur Chunqi Li
On Tue, Jul 16, 2013 at 7:42 PM, Gleb Natapov  wrote:
> On Sun, Jul 07, 2013 at 11:13:37PM +0800, Arthur Chunqi Li wrote:
>> The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
>> to clear this MSR when reset vCPU and keep the value of it when
>> migration. This patch add this feature.
>>
> So what happens if we migrate from qemu that does not have this patch
> to qemu that does? Since msr_ia32_feature_control will not be migrated
> it will not be set on the destination so destination will not be able to
> use nested vmx. Since nested vmx is experimental it may be to early for
> us to care about it though, and nested vmx does not work with migration
> anyway.
In my test, if migration doesn't care about msr_ia32_feature_control,
the value will be set to 0 in the destination VM and this may cause
some logical confusions, but the VMX running on it may not aware of
this (if migration nested vmx is supported in the future) because once
VMX initialized, it will not check this msr any more in normal cases.

This is also a complex problem since we don't know how many states
like this msr need to be migrated related to nested virt. If there're
a lot of states need migrating, it is better to reconstruct the
relevant codes. But now this patch is enough.

Besides, though migration is not supported in nested vmx, we should
keep the machine state consistent during migration. So this patch is
also meaningful.

Arthur
>
>> Signed-off-by: Arthur Chunqi Li 
>> ---
>>  target-i386/cpu.h |2 ++
>>  target-i386/kvm.c |4 
>>  target-i386/machine.c |   22 ++
>>  3 files changed, 28 insertions(+)
>>
>> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
>> index 62e3547..a418e17 100644
>> --- a/target-i386/cpu.h
>> +++ b/target-i386/cpu.h
>> @@ -301,6 +301,7 @@
>>  #define MSR_IA32_APICBASE_BSP   (1<<8)
>>  #define MSR_IA32_APICBASE_ENABLE(1<<11)
>>  #define MSR_IA32_APICBASE_BASE  (0xf<<12)
>> +#define MSR_IA32_FEATURE_CONTROL0x003a
>>  #define MSR_TSC_ADJUST  0x003b
>>  #define MSR_IA32_TSCDEADLINE0x6e0
>>
>> @@ -813,6 +814,7 @@ typedef struct CPUX86State {
>>
>>  uint64_t mcg_status;
>>  uint64_t msr_ia32_misc_enable;
>> +uint64_t msr_ia32_feature_control;
>>
>>  /* exception/interrupt handling */
>>  int error_code;
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index 39f4fbb..3cb2161 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -1122,6 +1122,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>>  if (hyperv_vapic_recommended()) {
>>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
>>  }
>> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL, 
>> env->msr_ia32_feature_control);
>>  }
>>  if (env->mcg_cap) {
>>  int i;
>> @@ -1346,6 +1347,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>>  if (has_msr_misc_enable) {
>>  msrs[n++].index = MSR_IA32_MISC_ENABLE;
>>  }
>> +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
>>
>>  if (!env->tsc_valid) {
>>  msrs[n++].index = MSR_IA32_TSC;
>> @@ -1444,6 +1446,8 @@ static int kvm_get_msrs(X86CPU *cpu)
>>  case MSR_IA32_MISC_ENABLE:
>>  env->msr_ia32_misc_enable = msrs[i].data;
>>  break;
>> +case MSR_IA32_FEATURE_CONTROL:
>> +env->msr_ia32_feature_control = msrs[i].data;
>>  default:
>>  if (msrs[i].index >= MSR_MC0_CTL &&
>>  msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
>> diff --git a/target-i386/machine.c b/target-i386/machine.c
>> index 3659db9..94ca914 100644
>> --- a/target-i386/machine.c
>> +++ b/target-i386/machine.c
>> @@ -399,6 +399,14 @@ static bool misc_enable_needed(void *opaque)
>>  return env->msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
>>  }
>>
>> +static bool feature_control_needed(void *opaque)
>> +{
>> +X86CPU *cpu = opaque;
>> +CPUX86State *env = &cpu->env;
>> +
>> +return env->msr_ia32_feature_control != 0;
>> +}
>> +
>>  static const VMStateDescription vmstate_msr_ia32_misc_enable = {
>>  .name = "cpu/msr_ia32_misc_enable",
>>  .version_id = 1,
>> @@ -410,6 +418,17 @@ static const VMStateDescription 
>> vmstate_msr_ia32_misc_enable = {
>>  }
>>  };
>>
>> +static const VMStateDescription vmstate_msr_ia32_feature_control = {
>> +.name = "cpu/msr_ia32_feature_control",
>> +.version_id = 1,
>> +.minimum_version_id = 1,
>> +.minimum_version_id_old = 1,
>> +.fields  = (VMStateField []) {
>> +VMSTATE_UINT64(env.msr_ia32_feature_control, X86CPU),
>> +VMSTATE_END_OF_LIST()
>> +}
>> +};
>> +
>>  const VMStateDescription vmstate_x86_cpu = {
>>  .name = "cpu",
>>  .version_id = 12,
>> @@ -535,6 +554,9 @@ const VMStateDescription vmstate_x86_cpu = {
>>  }, {
>>  .vmsd = &vmstate_msr_ia32_misc_enable,
>> 

Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
Hi Paolo,

On Tue, Jul 16, 2013 at 6:28 PM, Paolo Bonzini  wrote:
> Il 16/07/2013 11:27, Arthur Chunqi Li ha scritto:
>> This is the first version for VMX nested environment test case. It
>> contains the basic VMX instructions test cases, including VMXON/
>> VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
>> also tests the basic execution routine in VMX nested environment and
>> let the VM print "Hello World" to inform its successfully run.
>>
>> New files added:
>> x86/vmx.h : contains all VMX related macro declerations
>> x86/vmx.c : main file for VMX nested test case
>>
>> Signed-off-by: Arthur Chunqi Li 
>> ---
>>  config-x86-common.mak |2 +
>>  config-x86_64.mak |1 +
>>  lib/x86/msr.h |5 +
>>  x86/cstart64.S|4 +
>>  x86/unittests.cfg |6 +
>>  x86/vmx.c |  568 
>> +
>>  x86/vmx.h |  406 +++
>>  7 files changed, 992 insertions(+)
>>  create mode 100644 x86/vmx.c
>>  create mode 100644 x86/vmx.h
>>
>> diff --git a/config-x86-common.mak b/config-x86-common.mak
>> index 455032b..34a41e1 100644
>> --- a/config-x86-common.mak
>> +++ b/config-x86-common.mak
>> @@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) 
>> $(TEST_DIR)/asyncpf.o
>>
>>  $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
>>
>> +$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
>> +
>>  arch_clean:
>>   $(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
>>   $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
>> diff --git a/config-x86_64.mak b/config-x86_64.mak
>> index 4e525f5..bb8ee89 100644
>> --- a/config-x86_64.mak
>> +++ b/config-x86_64.mak
>> @@ -9,5 +9,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
>> $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
>> $(TEST_DIR)/pcid.flat
>>  tests += $(TEST_DIR)/svm.flat
>> +tests += $(TEST_DIR)/vmx.flat
>>
>>  include config-x86-common.mak
>> diff --git a/lib/x86/msr.h b/lib/x86/msr.h
>> index 509a421..281255a 100644
>> --- a/lib/x86/msr.h
>> +++ b/lib/x86/msr.h
>> @@ -396,6 +396,11 @@
>>  #define MSR_IA32_VMX_VMCS_ENUM  0x048a
>>  #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
>>  #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
>> +#define MSR_IA32_VMX_TRUE_PIN0x048d
>> +#define MSR_IA32_VMX_TRUE_PROC   0x048e
>> +#define MSR_IA32_VMX_TRUE_EXIT   0x048f
>> +#define MSR_IA32_VMX_TRUE_ENTRY  0x0490
>> +
>>
>>  /* AMD-V MSRs */
>>
>> diff --git a/x86/cstart64.S b/x86/cstart64.S
>> index 24df5f8..0fe76da 100644
>> --- a/x86/cstart64.S
>> +++ b/x86/cstart64.S
>> @@ -4,6 +4,10 @@
>>  .globl boot_idt
>>  boot_idt = 0
>>
>> +.globl idt_descr
>> +.globl tss_descr
>> +.globl gdt64_desc
>> +
>>  ipi_vector = 0x20
>>
>>  max_cpus = 64
>> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
>> index bc9643e..e846739 100644
>> --- a/x86/unittests.cfg
>> +++ b/x86/unittests.cfg
>> @@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
>>  file = pcid.flat
>>  extra_params = -cpu qemu64,+pcid
>>  arch = x86_64
>> +
>> +[vmx]
>> +file = vmx.flat
>> +extra_params = -cpu Nehalem,+vmx
>
> Should this use "-cpu host" instead? (Or "-cpu host,+vmx", I don't
> remember).
>
>> +arch = x86_64
>> +
>> diff --git a/x86/vmx.c b/x86/vmx.c
>> new file mode 100644
>> index 000..0435746
>> --- /dev/null
>> +++ b/x86/vmx.c
>> @@ -0,0 +1,568 @@
>> +#include "libcflat.h"
>> +#include "processor.h"
>> +#include "vm.h"
>> +#include "desc.h"
>> +#include "vmx.h"
>> +#include "msr.h"
>> +#include "smp.h"
>> +#include "io.h"
>> +
>> +
>> +int fails = 0, tests = 0;
>> +u32 *vmxon_region;
>> +struct vmcs *vmcs_root;
>> +void *io_bmp1, *io_bmp2;
>> +void *msr_bmp;
>> +u32 vpid_ctr;
>> +char *guest_stack, *host_stack;
>> +char *guest_syscall_stack, *host_syscall_stack;
>> +u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
>> +ulong fix_cr0_set, fix_cr0_clr;
>> +ulong fix_cr4_set, fix_cr4_clr;
>> +struct regs regs;
>> +
>> +extern u64 gdt64_desc[];
>> +extern u64 idt_descr[];
>> +extern u64 tss_descr[];
>> +extern void *entry_vmx;
>> +extern void *entry_sysenter;
>> +extern void *entry_guest;
>> +
>> +void report(const char *name, int result)
>> +{
>> + ++tests;
>> + if (result)
>> + printf("PASS: %s\n", name);
>> + else {
>> + printf("FAIL: %s\n", name);
>> + ++fails;
>> + }
>> +}
>> +
>> +inline u64 get_rflags(void)
>> +{
>> + u64 r;
>> + asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
>> + return r;
>> +}
>> +
>> +inline void set_rflags(u64 r)
>> +{
>> + asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
>> +}
>> +
>> +int vmcs_clear(struct vmcs *vmcs)
>> +{
>> + bool ret;
>> + asm volatile ("vmclear %1; seta %0" : "=q" (ret) : "m" (vmcs) : "cc");
>
> You can use "setbe", it's clearer and avoids the ! in the return state

Re: [[Qemu-devel] [PATCH]] nVMX: Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-07-16 Thread Gleb Natapov
On Sun, Jul 07, 2013 at 11:13:37PM +0800, Arthur Chunqi Li wrote:
> The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
> to clear this MSR when reset vCPU and keep the value of it when
> migration. This patch add this feature.
> 
So what happens if we migrate from qemu that does not have this patch
to qemu that does? Since msr_ia32_feature_control will not be migrated
it will not be set on the destination so destination will not be able to
use nested vmx. Since nested vmx is experimental it may be to early for
us to care about it though, and nested vmx does not work with migration
anyway.

> Signed-off-by: Arthur Chunqi Li 
> ---
>  target-i386/cpu.h |2 ++
>  target-i386/kvm.c |4 
>  target-i386/machine.c |   22 ++
>  3 files changed, 28 insertions(+)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 62e3547..a418e17 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -301,6 +301,7 @@
>  #define MSR_IA32_APICBASE_BSP   (1<<8)
>  #define MSR_IA32_APICBASE_ENABLE(1<<11)
>  #define MSR_IA32_APICBASE_BASE  (0xf<<12)
> +#define MSR_IA32_FEATURE_CONTROL0x003a
>  #define MSR_TSC_ADJUST  0x003b
>  #define MSR_IA32_TSCDEADLINE0x6e0
>  
> @@ -813,6 +814,7 @@ typedef struct CPUX86State {
>  
>  uint64_t mcg_status;
>  uint64_t msr_ia32_misc_enable;
> +uint64_t msr_ia32_feature_control;
>  
>  /* exception/interrupt handling */
>  int error_code;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 39f4fbb..3cb2161 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1122,6 +1122,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  if (hyperv_vapic_recommended()) {
>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
>  }
> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL, 
> env->msr_ia32_feature_control);
>  }
>  if (env->mcg_cap) {
>  int i;
> @@ -1346,6 +1347,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>  if (has_msr_misc_enable) {
>  msrs[n++].index = MSR_IA32_MISC_ENABLE;
>  }
> +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
>  
>  if (!env->tsc_valid) {
>  msrs[n++].index = MSR_IA32_TSC;
> @@ -1444,6 +1446,8 @@ static int kvm_get_msrs(X86CPU *cpu)
>  case MSR_IA32_MISC_ENABLE:
>  env->msr_ia32_misc_enable = msrs[i].data;
>  break;
> +case MSR_IA32_FEATURE_CONTROL:
> +env->msr_ia32_feature_control = msrs[i].data;
>  default:
>  if (msrs[i].index >= MSR_MC0_CTL &&
>  msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index 3659db9..94ca914 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -399,6 +399,14 @@ static bool misc_enable_needed(void *opaque)
>  return env->msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
>  }
>  
> +static bool feature_control_needed(void *opaque)
> +{
> +X86CPU *cpu = opaque;
> +CPUX86State *env = &cpu->env;
> +
> +return env->msr_ia32_feature_control != 0;
> +}
> +
>  static const VMStateDescription vmstate_msr_ia32_misc_enable = {
>  .name = "cpu/msr_ia32_misc_enable",
>  .version_id = 1,
> @@ -410,6 +418,17 @@ static const VMStateDescription 
> vmstate_msr_ia32_misc_enable = {
>  }
>  };
>  
> +static const VMStateDescription vmstate_msr_ia32_feature_control = {
> +.name = "cpu/msr_ia32_feature_control",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.minimum_version_id_old = 1,
> +.fields  = (VMStateField []) {
> +VMSTATE_UINT64(env.msr_ia32_feature_control, X86CPU),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  const VMStateDescription vmstate_x86_cpu = {
>  .name = "cpu",
>  .version_id = 12,
> @@ -535,6 +554,9 @@ const VMStateDescription vmstate_x86_cpu = {
>  }, {
>  .vmsd = &vmstate_msr_ia32_misc_enable,
>  .needed = misc_enable_needed,
> +}, {
> +.vmsd = &vmstate_msr_ia32_feature_control,
> +.needed = feature_control_needed,
>  } , {
>  /* empty */
>  }
> -- 
> 1.7.9.5

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Naor Shlomo
Thanks Daniel,

Paolo already got it right, it's:


  
  


-Original Message-
From: Daniel P. Berrange [mailto:berra...@redhat.com] 
Sent: Tuesday, July 16, 2013 1:43 PM
To: Naor Shlomo
Cc: Paolo Bonzini; kvm@vger.kernel.org
Subject: Re: Disabling mergeable rx buffers for the guest

On Tue, Jul 16, 2013 at 10:40:28AM +, Naor Shlomo wrote:
> Hi Paolo,
> 
> For some unknown reason it suddenly started to accept the changes to the XML 
> and the strings you gave me are now in place.
> Upon machine start I now receive the following error messages:
> 
> virsh # start NaorDev
> error: Failed to start domain NaorDev
> error: internal error Process exited while reading console log output: kvm: 
> -global: requires an argument
> 
> Here's the XML:
> 

>   
> 
> 
>   

Presumably what you wanted to do was

   
 
 
   

Rather than setting an environment variable.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf

RE: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Naor Shlomo
Paolo, 

Sorry for all the trouble.
We got a progress. Your last qemu:commandline really worked and I was able to 
try and start the machine.
The problem is that I reached up to the tun driver load section and then the 
machine shut itself down.

The last lines of the console showed the following:

[1.568226] Fixed MDIO Bus: probed
[1.568833] tun: Universal TUN/TAP device driver, 1.6
[1.569108] tun: (C) 1999-2004 Max Krasnyansky 

And it got me back to the virsh command line.

Here's an output of the log file:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -no-kvm -m 2069 -smp 
1,sockets=1,cores=1,threads=1 -name NaorDev -uuid 
60b5a0ab-8932-47c1-8674-760c7e1f4743 -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/NaorDev.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-drive 
file=/var/lib/libvirt/images/NaorDev.img,if=none,id=drive-virtio-disk0,format=raw,cache=writeback
 -device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/var/lib/libvirt/images/NaorAddonHdd.qcow,if=none,id=drive-virtio-disk1,format=qcow2,cache=none
 -device 
virtio-blk-pci,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 
-netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c6:b7:a3,bus=pci.0,addr=0x3 
-netdev tap,fd=20,id=hostnet1,vhost=on,vhostfd=21 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:9e:44:90,bus=pci.0,addr=0x4 
-netdev tap,fd=22,id=hostnet2,vhost=on,vhostfd=23 -device 
virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:e7:f9:bf,bus=pci.0,addr=0x7 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-usb -vnc 127.0.0.1:0 -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -global 
virtio-net-pci.mrg_rxbuf=off
Domain id=9 is tainted: custom-argv
char device redirected to /dev/pts/1
2013-07-16 12:48:26.440+: shutting down

and here's the output of lsmod:

vhost_net  32360  0 
macvtap18529  1 vhost_net
macvlan19003  1 macvtap
ip6table_filter12816  0 
ip6_tables 27686  1 ip6table_filter
ebtable_nat12808  0 
ebtables   31024  1 ebtable_nat
ipt_MASQUERADE 12760  3 
iptable_nat13230  1 
nf_nat 25646  2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4  14531  4 iptable_nat,nf_nat
nf_defrag_ipv4 12730  1 nf_conntrack_ipv4
xt_state   12579  1 
nf_conntrack   83300  5 
ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4,xt_state
ipt_REJECT 12577  2 
xt_CHECKSUM12550  1 
iptable_mangle 12735  1 
xt_tcpudp  12604  5 
iptable_filter 12811  1 
ip_tables  27474  3 iptable_nat,iptable_mangle,iptable_filter
x_tables   29892  12 
ip6table_filter,ip6_tables,ebtables,ipt_MASQUERADE,iptable_nat,xt_state,ipt_REJECT,xt_CHECKSUM,iptable_mangle,xt_tcpudp,iptable_filter,ip_tables
dm_crypt   23126  0 
coretemp   13642  0 
kvm_intel 137888  0 
kvm   422160  1 kvm_intel
ext2   73799  1 
gpio_ich   13384  0 
dm_multipath   23306  0 
scsi_dh14589  1 dm_multipath
lp 17800  0 
microcode  23030  0 
parport46563  1 lp
bridge 91498  0 
sb_edac18104  0 
stp12977  1 bridge
llc14598  2 bridge,stp
joydev 17694  0 
edac_core  53053  3 sb_edac
shpchp 37278  0 
lpc_ich17145  0 
mei41410  0 
acpi_power_meter   18124  0 
mac_hid13254  0 
btrfs 781017  0 
zlib_deflate   27140  1 btrfs
libcrc32c  12645  1 btrfs
vesafb 13846  1 
hid_generic12541  0 
usbhid 47259  0 
hid   100815  2 hid_generic,usbhid
ses17418  0 
enclosure  15313  1 ses
ghash_clmulni_intel13221  0 
aesni_intel51134  0 
cryptd 20531  2 ghash_clmulni_intel,aesni_intel
aes_x86_64 17256  1 aesni_intel
megaraid_sas   82737  2 
ixgbe 184924  0 
dca15233  1 ixgbe
mdio   13808  1 ixgbe
tg3   156594  0 
wmi19257  0

I will appreciate it if you tell me what I am missing now.

Thanks,
Naor

-Original Message-
From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
Sent: Tuesday, July 16, 2013 1:42 PM
To: Naor Shlomo
Cc: kvm@vger.kernel.org
Subject: Re: Disabling mergeable rx buffers for the guest

Il 16/07/2013 12:40, Naor Shlomo ha scritto:
> Hi Pao

Re: [PATCH v5] KVM: nVMX: Set segment infomation of L1 when L2 exits

2013-07-16 Thread Paolo Bonzini
Il 15/07/2013 10:04, Arthur Chunqi Li ha scritto:
> When L2 exits to L1, segment infomations of L1 are not set correctly.
> According to Intel SDM 27.5.2(Loading Host Segment and Descriptor
> Table Registers), segment base/limit/access right of L1 should be
> set to some designed value when L2 exits to L1. This patch fixes
> this.
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
>  arch/x86/kvm/vmx.c |   58 
> +++-
>  1 file changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 064d0be..71eb55b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7948,6 +7948,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>  static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  struct vmcs12 *vmcs12)
>  {
> + struct kvm_segment seg;
> +
>   if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
>   vcpu->arch.efer = vmcs12->host_ia32_efer;
>   else if (vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)
> @@ -8001,16 +8003,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu 
> *vcpu,
>   vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->host_ia32_sysenter_eip);
>   vmcs_writel(GUEST_IDTR_BASE, vmcs12->host_idtr_base);
>   vmcs_writel(GUEST_GDTR_BASE, vmcs12->host_gdtr_base);
> - vmcs_writel(GUEST_TR_BASE, vmcs12->host_tr_base);
> - vmcs_writel(GUEST_GS_BASE, vmcs12->host_gs_base);
> - vmcs_writel(GUEST_FS_BASE, vmcs12->host_fs_base);
> - vmcs_write16(GUEST_ES_SELECTOR, vmcs12->host_es_selector);
> - vmcs_write16(GUEST_CS_SELECTOR, vmcs12->host_cs_selector);
> - vmcs_write16(GUEST_SS_SELECTOR, vmcs12->host_ss_selector);
> - vmcs_write16(GUEST_DS_SELECTOR, vmcs12->host_ds_selector);
> - vmcs_write16(GUEST_FS_SELECTOR, vmcs12->host_fs_selector);
> - vmcs_write16(GUEST_GS_SELECTOR, vmcs12->host_gs_selector);
> - vmcs_write16(GUEST_TR_SELECTOR, vmcs12->host_tr_selector);
>  
>   if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT)
>   vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
> @@ -8018,6 +8010,52 @@ static void load_vmcs12_host_state(struct kvm_vcpu 
> *vcpu,
>   vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL,
>   vmcs12->host_ia32_perf_global_ctrl);
>  
> + /* Set L1 segment info according to Intel SDM
> + 27.5.2 Loading Host Segment and Descriptor-Table Registers */
> + seg = (struct kvm_segment) {
> + .base = 0,
> + .limit = 0x,
> + .selector = vmcs12->host_cs_selector,
> + .type = 11,
> + .present = 1,
> + .s = 1,
> + .g = 1
> + };
> + if (vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)
> + seg.l = 1;
> + else
> + seg.db = 1;
> + vmx_set_segment(vcpu, &seg, VCPU_SREG_CS);
> + seg = (struct kvm_segment) {
> + .base = 0,
> + .limit = 0x,
> + .type = 3,
> + .present = 1,
> + .s = 1,
> + .db = 1,
> + .g = 1
> + };
> + seg.selector = vmcs12->host_ds_selector;
> + vmx_set_segment(vcpu, &seg, VCPU_SREG_DS);
> + seg.selector = vmcs12->host_es_selector;
> + vmx_set_segment(vcpu, &seg, VCPU_SREG_ES);
> + seg.selector = vmcs12->host_ss_selector;
> + vmx_set_segment(vcpu, &seg, VCPU_SREG_SS);
> + seg.selector = vmcs12->host_fs_selector;
> + seg.base = vmcs12->host_fs_base;
> + vmx_set_segment(vcpu, &seg, VCPU_SREG_FS);
> + seg.selector = vmcs12->host_gs_selector;
> + seg.base = vmcs12->host_gs_base;
> + vmx_set_segment(vcpu, &seg, VCPU_SREG_GS);
> + seg = (struct kvm_segment) {
> + .base = 0,
> + .limit = 0x67,
> + .selector = vmcs12->host_tr_selector,
> + .type = 11,
> + .present = 1
> + };
> + vmx_set_segment(vcpu, &seg, VCPU_SREG_TR);
> +
>   kvm_set_dr(vcpu, 7, 0x400);
>   vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
>  }
> 

Applied, thanks.

Paolo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Paolo Bonzini
Il 16/07/2013 12:40, Naor Shlomo ha scritto:
> Hi Paolo,
> 
> For some unknown reason it suddenly started to accept the changes to the XML 
> and the strings you gave me are now in place.

Good.

> Upon machine start I now receive the following error messages:
> 
> virsh # start NaorDev
> error: Failed to start domain NaorDev
> error: internal error Process exited while reading console log output: kvm: 
> -global: requires an argument"
> 

That's because I cut-and-pasted without reading:

>   
> 
> 
>   

The right one is (or at this point I'd better say "should be"):

   
 
 
   

Paolo

> 
> 
> Naor
> 
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
> Sent: Tuesday, July 16, 2013 12:36 PM
> To: Naor Shlomo
> Cc: kvm@vger.kernel.org
> Subject: Re: Disabling mergeable rx buffers for the guest
> 
> Il 16/07/2013 10:06, Naor Shlomo ha scritto:
>> Thanks again Paolo,
>>
>> I used your string and read the documents in the site you referred me to but 
>> could not understand why doesn't it accept the 
>> xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' string.
>>
>> I tried it on the following version:
>> Compiled against library: libvir 0.9.8 Using library: libvir 0.9.8 
>> Using API: QEMU 0.9.8 Running hypervisor: QEMU 1.0.0
>>
>> So according to the site it should be supported.
>>
>> Any idea what am I missing now?
> 
> Not sure.  Can you post here the full XML, and the one you're trying to use?
> 
> Paolo
> 
>> Naor
>>
>> -Original Message-
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> Sent: Tuesday, July 16, 2013 9:42 AM
>> To: Naor Shlomo
>> Cc: kvm@vger.kernel.org
>> Subject: Re: Disabling mergeable rx buffers for the guest
>>
>> Il 16/07/2013 08:40, Naor Shlomo ha scritto:
>>> Hi Paolo and thanks for your quick reply,
>>>
>>> I tried editing (virsh edit) the domain's XML and put the XML excerpt you 
>>> gave me everywhere but with no success.
>>> The moment I exit the edit mode the text was gone (I guess it didn't pass 
>>> some sort of sanity and that's why it was automatically erased).
>>>
>>> What am I doing wrong?
>>
>> My fault.  You need to change the  opening tag to
>>
>> > xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
>>
>> See http://libvirt.org/drvqemu.html#qemucommand for the docs.
>>
>> Paolo
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Daniel P. Berrange
On Tue, Jul 16, 2013 at 10:40:28AM +, Naor Shlomo wrote:
> Hi Paolo,
> 
> For some unknown reason it suddenly started to accept the changes to the XML 
> and the strings you gave me are now in place.
> Upon machine start I now receive the following error messages:
> 
> virsh # start NaorDev
> error: Failed to start domain NaorDev
> error: internal error Process exited while reading console log output: kvm: 
> -global: requires an argument
> 
> Here's the XML:
> 

>   
> 
> 
>   

Presumably what you wanted to do was

   
 
 
   

Rather than setting an environment variable.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Naor Shlomo
Hi Paolo,

For some unknown reason it suddenly started to accept the changes to the XML 
and the strings you gave me are now in place.
Upon machine start I now receive the following error messages:

virsh # start NaorDev
error: Failed to start domain NaorDev
error: internal error Process exited while reading console log output: kvm: 
-global: requires an argument"

Here's the XML:


  NaorDev
  60b5a0ab-8932-47c1-8674-760c7e1f4743
  2118656
  2118656
  1
  
hvm

  
  



  
  
  destroy
  restart
  restart
  
/usr/bin/kvm

  
  
  
  


  
  
  
  


  


  
  
  
  


  
  
  
  


  
  
  
  


  


  




  
  


  

  
  


  


Naor

-Original Message-
From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
Sent: Tuesday, July 16, 2013 12:36 PM
To: Naor Shlomo
Cc: kvm@vger.kernel.org
Subject: Re: Disabling mergeable rx buffers for the guest

Il 16/07/2013 10:06, Naor Shlomo ha scritto:
> Thanks again Paolo,
> 
> I used your string and read the documents in the site you referred me to but 
> could not understand why doesn't it accept the 
> xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' string.
> 
> I tried it on the following version:
> Compiled against library: libvir 0.9.8 Using library: libvir 0.9.8 
> Using API: QEMU 0.9.8 Running hypervisor: QEMU 1.0.0
> 
> So according to the site it should be supported.
> 
> Any idea what am I missing now?

Not sure.  Can you post here the full XML, and the one you're trying to use?

Paolo

> Naor
> 
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, July 16, 2013 9:42 AM
> To: Naor Shlomo
> Cc: kvm@vger.kernel.org
> Subject: Re: Disabling mergeable rx buffers for the guest
> 
> Il 16/07/2013 08:40, Naor Shlomo ha scritto:
>> Hi Paolo and thanks for your quick reply,
>>
>> I tried editing (virsh edit) the domain's XML and put the XML excerpt you 
>> gave me everywhere but with no success.
>> The moment I exit the edit mode the text was gone (I guess it didn't pass 
>> some sort of sanity and that's why it was automatically erased).
>>
>> What am I doing wrong?
> 
> My fault.  You need to change the  opening tag to
> 
>  xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
> 
> See http://libvirt.org/drvqemu.html#qemucommand for the docs.
> 
> Paolo
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Paolo Bonzini
Il 16/07/2013 11:27, Arthur Chunqi Li ha scritto:
> This is the first version for VMX nested environment test case. It
> contains the basic VMX instructions test cases, including VMXON/
> VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
> also tests the basic execution routine in VMX nested environment and
> let the VM print "Hello World" to inform its successfully run.
> 
> New files added:
> x86/vmx.h : contains all VMX related macro declerations
> x86/vmx.c : main file for VMX nested test case
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
>  config-x86-common.mak |2 +
>  config-x86_64.mak |1 +
>  lib/x86/msr.h |5 +
>  x86/cstart64.S|4 +
>  x86/unittests.cfg |6 +
>  x86/vmx.c |  568 
> +
>  x86/vmx.h |  406 +++
>  7 files changed, 992 insertions(+)
>  create mode 100644 x86/vmx.c
>  create mode 100644 x86/vmx.h
> 
> diff --git a/config-x86-common.mak b/config-x86-common.mak
> index 455032b..34a41e1 100644
> --- a/config-x86-common.mak
> +++ b/config-x86-common.mak
> @@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
>  
>  $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
>  
> +$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
> +
>  arch_clean:
>   $(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
>   $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
> diff --git a/config-x86_64.mak b/config-x86_64.mak
> index 4e525f5..bb8ee89 100644
> --- a/config-x86_64.mak
> +++ b/config-x86_64.mak
> @@ -9,5 +9,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
> $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
> $(TEST_DIR)/pcid.flat
>  tests += $(TEST_DIR)/svm.flat
> +tests += $(TEST_DIR)/vmx.flat
>  
>  include config-x86-common.mak
> diff --git a/lib/x86/msr.h b/lib/x86/msr.h
> index 509a421..281255a 100644
> --- a/lib/x86/msr.h
> +++ b/lib/x86/msr.h
> @@ -396,6 +396,11 @@
>  #define MSR_IA32_VMX_VMCS_ENUM  0x048a
>  #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
>  #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
> +#define MSR_IA32_VMX_TRUE_PIN0x048d
> +#define MSR_IA32_VMX_TRUE_PROC   0x048e
> +#define MSR_IA32_VMX_TRUE_EXIT   0x048f
> +#define MSR_IA32_VMX_TRUE_ENTRY  0x0490
> +
>  
>  /* AMD-V MSRs */
>  
> diff --git a/x86/cstart64.S b/x86/cstart64.S
> index 24df5f8..0fe76da 100644
> --- a/x86/cstart64.S
> +++ b/x86/cstart64.S
> @@ -4,6 +4,10 @@
>  .globl boot_idt
>  boot_idt = 0
>  
> +.globl idt_descr
> +.globl tss_descr
> +.globl gdt64_desc
> +
>  ipi_vector = 0x20
>  
>  max_cpus = 64
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index bc9643e..e846739 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
>  file = pcid.flat
>  extra_params = -cpu qemu64,+pcid
>  arch = x86_64
> +
> +[vmx]
> +file = vmx.flat
> +extra_params = -cpu Nehalem,+vmx

Should this use "-cpu host" instead? (Or "-cpu host,+vmx", I don't
remember).

> +arch = x86_64
> +
> diff --git a/x86/vmx.c b/x86/vmx.c
> new file mode 100644
> index 000..0435746
> --- /dev/null
> +++ b/x86/vmx.c
> @@ -0,0 +1,568 @@
> +#include "libcflat.h"
> +#include "processor.h"
> +#include "vm.h"
> +#include "desc.h"
> +#include "vmx.h"
> +#include "msr.h"
> +#include "smp.h"
> +#include "io.h"
> +
> +
> +int fails = 0, tests = 0;
> +u32 *vmxon_region;
> +struct vmcs *vmcs_root;
> +void *io_bmp1, *io_bmp2;
> +void *msr_bmp;
> +u32 vpid_ctr;
> +char *guest_stack, *host_stack;
> +char *guest_syscall_stack, *host_syscall_stack;
> +u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
> +ulong fix_cr0_set, fix_cr0_clr;
> +ulong fix_cr4_set, fix_cr4_clr;
> +struct regs regs;
> +
> +extern u64 gdt64_desc[];
> +extern u64 idt_descr[];
> +extern u64 tss_descr[];
> +extern void *entry_vmx;
> +extern void *entry_sysenter;
> +extern void *entry_guest;
> +
> +void report(const char *name, int result)
> +{
> + ++tests;
> + if (result)
> + printf("PASS: %s\n", name);
> + else {
> + printf("FAIL: %s\n", name);
> + ++fails;
> + }
> +}
> +
> +inline u64 get_rflags(void)
> +{
> + u64 r;
> + asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
> + return r;
> +}
> +
> +inline void set_rflags(u64 r)
> +{
> + asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
> +}
> +
> +int vmcs_clear(struct vmcs *vmcs)
> +{
> + bool ret;
> + asm volatile ("vmclear %1; seta %0" : "=q" (ret) : "m" (vmcs) : "cc");

You can use "setbe", it's clearer and avoids the ! in the return statement.

We should later add tests for failure conditions, since failing to
detect errors could give rise to L2->L1 attack vectors.  When we do so,
we will have to distinguish CF from ZF.


> + return !ret;
> +}
> +
> 

Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Gleb Natapov
On Tue, Jul 16, 2013 at 05:53:56PM +0800, Arthur Chunqi Li wrote:
> On Tue, Jul 16, 2013 at 5:45 PM, Gleb Natapov  wrote:
> > On Tue, Jul 16, 2013 at 05:35:16PM +0800, Arthur Chunqi Li wrote:
> >> Hi there,
> >> This is a version calling for comments. Some minor changes should be
> > Add RFC before PATCH for such submission please.
> >
> >> done before final commitment (TODOs in it), because these places are
> >> related to the bugs I have commited in the previous weeks and the
> >> relevant patches are not accpeted. But these bugs are all about some
> > I am aware of only one patch. Did I miss/forgot something?
> I have only commit one patch related to VMX tests. Other patches cited
> here is about the bug fixes such as
> [http://www.mail-archive.com/kvm@vger.kernel.org/msg92932.html]
> [http://www.mail-archive.com/kvm@vger.kernel.org/msg93046.html], which
> will cause some test cases fail in this patch.
> 
Those are applied to queue already. I am aware of one that is not
applied yet but it looks fine to me, waiting for Paolo's review.

> I have been discussing with Jan in the past weeks because most of the
> questions are technical affairs, which I don't think it's suitable to
> discuss in the community. This is a rather mature version and I think
> I can commit it with some minor changes.
> >
> >> unexpected occasions and the hypervisor can run if we simply ignore
> >> them. After all bugs fixed, everything will be OK.
> >>
> >> Arthur
> >>
> >
> > --
> > Gleb.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
On Tue, Jul 16, 2013 at 5:45 PM, Gleb Natapov  wrote:
> On Tue, Jul 16, 2013 at 05:35:16PM +0800, Arthur Chunqi Li wrote:
>> Hi there,
>> This is a version calling for comments. Some minor changes should be
> Add RFC before PATCH for such submission please.
>
>> done before final commitment (TODOs in it), because these places are
>> related to the bugs I have commited in the previous weeks and the
>> relevant patches are not accpeted. But these bugs are all about some
> I am aware of only one patch. Did I miss/forgot something?
I have only commit one patch related to VMX tests. Other patches cited
here is about the bug fixes such as
[http://www.mail-archive.com/kvm@vger.kernel.org/msg92932.html]
[http://www.mail-archive.com/kvm@vger.kernel.org/msg93046.html], which
will cause some test cases fail in this patch.

I have been discussing with Jan in the past weeks because most of the
questions are technical affairs, which I don't think it's suitable to
discuss in the community. This is a rather mature version and I think
I can commit it with some minor changes.
>
>> unexpected occasions and the hypervisor can run if we simply ignore
>> them. After all bugs fixed, everything will be OK.
>>
>> Arthur
>>
>
> --
> Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Gleb Natapov
On Tue, Jul 16, 2013 at 05:35:16PM +0800, Arthur Chunqi Li wrote:
> Hi there,
> This is a version calling for comments. Some minor changes should be
Add RFC before PATCH for such submission please.

> done before final commitment (TODOs in it), because these places are
> related to the bugs I have commited in the previous weeks and the
> relevant patches are not accpeted. But these bugs are all about some
I am aware of only one patch. Did I miss/forgot something?

> unexpected occasions and the hypervisor can run if we simply ignore
> them. After all bugs fixed, everything will be OK.
> 
> Arthur
> 

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Paolo Bonzini
Il 16/07/2013 10:06, Naor Shlomo ha scritto:
> Thanks again Paolo,
> 
> I used your string and read the documents in the site you referred me to but 
> could not understand why doesn't it accept the 
> xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' string.
> 
> I tried it on the following version:
> Compiled against library: libvir 0.9.8
> Using library: libvir 0.9.8
> Using API: QEMU 0.9.8
> Running hypervisor: QEMU 1.0.0
> 
> So according to the site it should be supported.
> 
> Any idea what am I missing now?

Not sure.  Can you post here the full XML, and the one you're trying to use?

Paolo

> Naor
> 
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
> Sent: Tuesday, July 16, 2013 9:42 AM
> To: Naor Shlomo
> Cc: kvm@vger.kernel.org
> Subject: Re: Disabling mergeable rx buffers for the guest
> 
> Il 16/07/2013 08:40, Naor Shlomo ha scritto:
>> Hi Paolo and thanks for your quick reply,
>>
>> I tried editing (virsh edit) the domain's XML and put the XML excerpt you 
>> gave me everywhere but with no success.
>> The moment I exit the edit mode the text was gone (I guess it didn't pass 
>> some sort of sanity and that's why it was automatically erased).
>>
>> What am I doing wrong?
> 
> My fault.  You need to change the  opening tag to
> 
> 
> 
> See http://libvirt.org/drvqemu.html#qemucommand for the docs.
> 
> Paolo
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
Hi there,
This is a version calling for comments. Some minor changes should be
done before final commitment (TODOs in it), because these places are
related to the bugs I have commited in the previous weeks and the
relevant patches are not accpeted. But these bugs are all about some
unexpected occasions and the hypervisor can run if we simply ignore
them. After all bugs fixed, everything will be OK.

Arthur

On Tue, Jul 16, 2013 at 5:27 PM, Arthur Chunqi Li  wrote:
> This is the first version for VMX nested environment test case. It
> contains the basic VMX instructions test cases, including VMXON/
> VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
> also tests the basic execution routine in VMX nested environment and
> let the VM print "Hello World" to inform its successfully run.
>
> New files added:
> x86/vmx.h : contains all VMX related macro declerations
> x86/vmx.c : main file for VMX nested test case
>
> Signed-off-by: Arthur Chunqi Li 
> ---
>  config-x86-common.mak |2 +
>  config-x86_64.mak |1 +
>  lib/x86/msr.h |5 +
>  x86/cstart64.S|4 +
>  x86/unittests.cfg |6 +
>  x86/vmx.c |  568 
> +
>  x86/vmx.h |  406 +++
>  7 files changed, 992 insertions(+)
>  create mode 100644 x86/vmx.c
>  create mode 100644 x86/vmx.h
>
> diff --git a/config-x86-common.mak b/config-x86-common.mak
> index 455032b..34a41e1 100644
> --- a/config-x86-common.mak
> +++ b/config-x86-common.mak
> @@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
>
>  $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
>
> +$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
> +
>  arch_clean:
> $(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
> $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
> diff --git a/config-x86_64.mak b/config-x86_64.mak
> index 4e525f5..bb8ee89 100644
> --- a/config-x86_64.mak
> +++ b/config-x86_64.mak
> @@ -9,5 +9,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
>   $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
>   $(TEST_DIR)/pcid.flat
>  tests += $(TEST_DIR)/svm.flat
> +tests += $(TEST_DIR)/vmx.flat
>
>  include config-x86-common.mak
> diff --git a/lib/x86/msr.h b/lib/x86/msr.h
> index 509a421..281255a 100644
> --- a/lib/x86/msr.h
> +++ b/lib/x86/msr.h
> @@ -396,6 +396,11 @@
>  #define MSR_IA32_VMX_VMCS_ENUM  0x048a
>  #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
>  #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
> +#define MSR_IA32_VMX_TRUE_PIN  0x048d
> +#define MSR_IA32_VMX_TRUE_PROC 0x048e
> +#define MSR_IA32_VMX_TRUE_EXIT 0x048f
> +#define MSR_IA32_VMX_TRUE_ENTRY0x0490
> +
>
>  /* AMD-V MSRs */
>
> diff --git a/x86/cstart64.S b/x86/cstart64.S
> index 24df5f8..0fe76da 100644
> --- a/x86/cstart64.S
> +++ b/x86/cstart64.S
> @@ -4,6 +4,10 @@
>  .globl boot_idt
>  boot_idt = 0
>
> +.globl idt_descr
> +.globl tss_descr
> +.globl gdt64_desc
> +
>  ipi_vector = 0x20
>
>  max_cpus = 64
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index bc9643e..e846739 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
>  file = pcid.flat
>  extra_params = -cpu qemu64,+pcid
>  arch = x86_64
> +
> +[vmx]
> +file = vmx.flat
> +extra_params = -cpu Nehalem,+vmx
> +arch = x86_64
> +
> diff --git a/x86/vmx.c b/x86/vmx.c
> new file mode 100644
> index 000..0435746
> --- /dev/null
> +++ b/x86/vmx.c
> @@ -0,0 +1,568 @@
> +#include "libcflat.h"
> +#include "processor.h"
> +#include "vm.h"
> +#include "desc.h"
> +#include "vmx.h"
> +#include "msr.h"
> +#include "smp.h"
> +#include "io.h"
> +
> +
> +int fails = 0, tests = 0;
> +u32 *vmxon_region;
> +struct vmcs *vmcs_root;
> +void *io_bmp1, *io_bmp2;
> +void *msr_bmp;
> +u32 vpid_ctr;
> +char *guest_stack, *host_stack;
> +char *guest_syscall_stack, *host_syscall_stack;
> +u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
> +ulong fix_cr0_set, fix_cr0_clr;
> +ulong fix_cr4_set, fix_cr4_clr;
> +struct regs regs;
> +
> +extern u64 gdt64_desc[];
> +extern u64 idt_descr[];
> +extern u64 tss_descr[];
> +extern void *entry_vmx;
> +extern void *entry_sysenter;
> +extern void *entry_guest;
> +
> +void report(const char *name, int result)
> +{
> +   ++tests;
> +   if (result)
> +   printf("PASS: %s\n", name);
> +   else {
> +   printf("FAIL: %s\n", name);
> +   ++fails;
> +   }
> +}
> +
> +inline u64 get_rflags(void)
> +{
> +   u64 r;
> +   asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
> +   return r;
> +}
> +
> +inline void set_rflags(u64 r)
> +{
> +   asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
> +}
> +
> +int vmcs_clear(struct vmcs *vmcs)
> +{
> +   bool ret;
> +   asm volatile ("vmclear

[PATCH] kvm-unit-tests : The first version of VMX nested test case

2013-07-16 Thread Arthur Chunqi Li
This is the first version for VMX nested environment test case. It
contains the basic VMX instructions test cases, including VMXON/
VMXOFF/VMXPTRLD/VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch
also tests the basic execution routine in VMX nested environment and
let the VM print "Hello World" to inform its successfully run.

New files added:
x86/vmx.h : contains all VMX related macro declerations
x86/vmx.c : main file for VMX nested test case

Signed-off-by: Arthur Chunqi Li 
---
 config-x86-common.mak |2 +
 config-x86_64.mak |1 +
 lib/x86/msr.h |5 +
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  568 +
 x86/vmx.h |  406 +++
 7 files changed, 992 insertions(+)
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 455032b..34a41e1 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
 
 $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
 
+$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/config-x86_64.mak b/config-x86_64.mak
index 4e525f5..bb8ee89 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -9,5 +9,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
  $(TEST_DIR)/pcid.flat
 tests += $(TEST_DIR)/svm.flat
+tests += $(TEST_DIR)/vmx.flat
 
 include config-x86-common.mak
diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 509a421..281255a 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -396,6 +396,11 @@
 #define MSR_IA32_VMX_VMCS_ENUM  0x048a
 #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
 #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
+#define MSR_IA32_VMX_TRUE_PIN  0x048d
+#define MSR_IA32_VMX_TRUE_PROC 0x048e
+#define MSR_IA32_VMX_TRUE_EXIT 0x048f
+#define MSR_IA32_VMX_TRUE_ENTRY0x0490
+
 
 /* AMD-V MSRs */
 
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 24df5f8..0fe76da 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -4,6 +4,10 @@
 .globl boot_idt
 boot_idt = 0
 
+.globl idt_descr
+.globl tss_descr
+.globl gdt64_desc
+
 ipi_vector = 0x20
 
 max_cpus = 64
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index bc9643e..e846739 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -149,3 +149,9 @@ extra_params = --append "1000 `date +%s`"
 file = pcid.flat
 extra_params = -cpu qemu64,+pcid
 arch = x86_64
+
+[vmx]
+file = vmx.flat
+extra_params = -cpu Nehalem,+vmx
+arch = x86_64
+
diff --git a/x86/vmx.c b/x86/vmx.c
new file mode 100644
index 000..0435746
--- /dev/null
+++ b/x86/vmx.c
@@ -0,0 +1,568 @@
+#include "libcflat.h"
+#include "processor.h"
+#include "vm.h"
+#include "desc.h"
+#include "vmx.h"
+#include "msr.h"
+#include "smp.h"
+#include "io.h"
+
+
+int fails = 0, tests = 0;
+u32 *vmxon_region;
+struct vmcs *vmcs_root;
+void *io_bmp1, *io_bmp2;
+void *msr_bmp;
+u32 vpid_ctr;
+char *guest_stack, *host_stack;
+char *guest_syscall_stack, *host_syscall_stack;
+u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
+ulong fix_cr0_set, fix_cr0_clr;
+ulong fix_cr4_set, fix_cr4_clr;
+struct regs regs;
+
+extern u64 gdt64_desc[];
+extern u64 idt_descr[];
+extern u64 tss_descr[];
+extern void *entry_vmx;
+extern void *entry_sysenter;
+extern void *entry_guest;
+
+void report(const char *name, int result)
+{
+   ++tests;
+   if (result)
+   printf("PASS: %s\n", name);
+   else {
+   printf("FAIL: %s\n", name);
+   ++fails;
+   }
+}
+
+inline u64 get_rflags(void)
+{
+   u64 r;
+   asm volatile("pushf; pop %0\n\t" : "=q"(r) : : "cc");
+   return r;
+}
+
+inline void set_rflags(u64 r)
+{
+   asm volatile("push %0; popf\n\t" : : "q"(r) : "cc");
+}
+
+int vmcs_clear(struct vmcs *vmcs)
+{
+   bool ret;
+   asm volatile ("vmclear %1; seta %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return !ret;
+}
+
+u64 vmcs_read(enum Encoding enc)
+{
+   u64 val;
+   asm volatile ("vmread %1, %0" : "=rm" (val) : "r" ((u64)enc) : "cc");
+   return val;
+}
+
+int vmcs_write(enum Encoding enc, u64 val)
+{
+   bool ret;
+   asm volatile ("vmwrite %1, %2; seta %0"
+   : "=q"(ret) : "rm" (val), "r" ((u64)enc) : "cc");
+   return !ret;
+}
+
+int make_vmcs_current(struct vmcs *vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrld %1; seta %0" : "=q" (ret) : "m" (vmcs) : "cc");
+   return !ret;
+}
+
+int save_vmcs(struct vmcs **vmcs)
+{
+   bool ret;
+
+   asm volatile ("vmptrst %1; seta %0" : "=q" (ret) : "m" (*vmcs) : "cc");
+   return !ret;

RE: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Naor Shlomo
Thanks again Paolo,

I used your string and read the documents in the site you referred me to but 
could not understand why doesn't it accept the 
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' string.

I tried it on the following version:
Compiled against library: libvir 0.9.8
Using library: libvir 0.9.8
Using API: QEMU 0.9.8
Running hypervisor: QEMU 1.0.0

So according to the site it should be supported.

Any idea what am I missing now?

Naor

-Original Message-
From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
Sent: Tuesday, July 16, 2013 9:42 AM
To: Naor Shlomo
Cc: kvm@vger.kernel.org
Subject: Re: Disabling mergeable rx buffers for the guest

Il 16/07/2013 08:40, Naor Shlomo ha scritto:
> Hi Paolo and thanks for your quick reply,
> 
> I tried editing (virsh edit) the domain's XML and put the XML excerpt you 
> gave me everywhere but with no success.
> The moment I exit the edit mode the text was gone (I guess it didn't pass 
> some sort of sanity and that's why it was automatically erased).
> 
> What am I doing wrong?

My fault.  You need to change the  opening tag to



See http://libvirt.org/drvqemu.html#qemucommand for the docs.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html