Re: [PATCH] KVM: x86: emulate: correct page fault error code for NoWrite instructions

2015-11-19 Thread Nadav Amit
Wanpeng Li  wrote:

> 2015-11-20 10:52 GMT+08:00 Wanpeng Li :
>> Hi Paolo,
>> 2015-02-09 17:03 GMT+08:00 Paolo Bonzini :
>>> NoWrite instructions (e.g. cmp or test) never set the "write access"
>>> bit in the error code, even if one of the operands is treated as a
>>> destination.
>> 
>> Sorry to reply to an old commit, btw, could you point out where in SDM
>> describe above?
> 
> I see.

I don’t understand whether you still need my help, so to clarify: on a
page-fault the error code should indicate whether the access was due to a
write access. Previously KVM marked it as “write access” for instructions
such as test and cmp that do not perform write.

Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix eflags state following processor init/reset

2015-10-28 Thread Nadav Amit
Here are my 5 cents. Note that vmx_vcpu_reset calls:

vmcs_writel(GUEST_RFLAGS, 0x02);

(And the RFLAGS value is not cached by KVM, so no consistency problem should
occur.)

You may want to change the value into constant or call a wrapper function
for setting RFLAGS, but I don’t see something broken in the functionality.

Regards,
Nadav

Wanpeng Li  wrote:

> Ping, :-)
> On 10/21/15 2:28 PM, Wanpeng Li wrote:
>> Reference SDM 3.4.3:
>> 
>> Following initialization of the processor (either by asserting the
>> RESET pin or the INIT pin), the state of the EFLAGS register is
>> 0002H.
>> 
>> However, the eflags fixed bit is not set and other bits are also not
>> cleared during the init/reset in kvm.
>> 
>> This patch fix it by set eflags register to 0002H following
>> initialization of the processor.
>> 
>> Signed-off-by: Wanpeng Li 
>> ---
>>  arch/x86/kvm/vmx.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b680c2e..326f6ea 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4935,6 +4935,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
>> init_event)
>>  vmx_set_efer(vcpu, 0);
>>  vmx_fpu_activate(vcpu);
>>  update_exception_bitmap(vcpu);
>> +vmx_set_rflags(vcpu, X86_EFLAGS_FIXED);
>>  vpid_sync_context(vmx->vpid);
>>  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests] x86: Test watchpoints during emulation

2015-09-24 Thread Nadav Amit
From: Rami Burstein& Anrey Isakov 

This adds a test for data and port watchpoints during instruction emulation.

Signed-off-by: Rami Burstein 
Signed-off-by: Andrey Isakov 

---

This unit tests are based on old kvm-unit-tests version and are only delivered
for reference. They are certainly bloated in their current form.

---
 x86/debug.c | 374 
 1 file changed, 374 insertions(+)

diff --git a/x86/debug.c b/x86/debug.c
index 34e56fb..43a2777 100644
--- a/x86/debug.c
+++ b/x86/debug.c
@@ -11,10 +11,24 @@
 
 #include "libcflat.h"
 #include "desc.h"
+#include "vm.h"
+
+#define TESTDEV_IO_PORT0xe0
 
 static volatile unsigned long bp_addr[10], dr6[10];
 static volatile unsigned int n;
 static volatile unsigned long value;
+static volatile char st1[] = "abcdefghijklmnop";
+
+static void set_dr2(void *value)
+{
+   asm volatile("mov %0,%%dr2" : : "r" (value));
+}
+
+static void set_dr3(void *value)
+{
+   asm volatile("mov %0,%%dr3" : : "r" (value));
+}
 
 static unsigned long get_dr6(void)
 {
@@ -58,6 +72,360 @@ static void handle_bp(struct ex_regs *regs)
bp_addr[0] = regs->rip;
 }
 
+static void test_port_io_bp()
+{
+   unsigned char r = 0;
+   unsigned short port_in = 0;
+   unsigned int port_din = 0;
+
+   n = 0;
+   set_dr0((void*)TESTDEV_IO_PORT);
+   set_dr1((void*)TESTDEV_IO_PORT);
+   set_dr2((void*)TESTDEV_IO_PORT);
+   set_dr3((void*)TESTDEV_IO_PORT);
+   set_dr7(0x00020402);
+
+   // Set DE flag.
+   write_cr4(read_cr4() | 0x8);
+   asm volatile("movw %0, %%dx \n\t" "outsb \n\t"
+   : : "i"((short)TESTDEV_IO_PORT),"S"(st1));
+   asm volatile("inb %1, %0\n\t" : "=a"(r) : "i"((short)TESTDEV_IO_PORT));
+io_0:
+   report("em_IO out_in 1 byte bp through dr0",
+   n == 2 && bp_addr[n-1] == ((unsigned long)&_0) &&
+   dr6[n-1] == 0x4ff1);
+
+   set_dr7(0x00200408);
+   asm volatile("movw %0, %%dx \n\t" "outsb \n\t"
+   : : "i"((short)TESTDEV_IO_PORT), "S"(st1));
+   asm volatile("inb %1, %0\n\t" : "=a"(r)
+   : "i"((short)TESTDEV_IO_PORT));
+io_1:
+   report("em_IO out_in 1 byte bp through dr1",
+   n == 4 && bp_addr[n-1] == ((unsigned long)&_1) &&
+   dr6[n-1] == 0x4ff2);
+
+   set_dr7(0x02000420);
+   asm volatile("movw %0, %%dx \n\t" "outsb \n\t"
+   : : "i"((short)TESTDEV_IO_PORT), "S"(st1));
+   asm volatile("inb %1, %0\n\t" : "=a"(r)
+   : "i"((short)TESTDEV_IO_PORT));
+io_2:
+   report("em_IO out_in 1 byte bp through dr2",
+   n == 6 && bp_addr[n-1] == ((unsigned long)&_2) &&
+   dr6[n-1] == 0x4ff4);
+
+   set_dr7(0x2480);
+   asm volatile("movw %0, %%dx \n\t" "outsb \n\t"
+   : : "i"((short)TESTDEV_IO_PORT), "S"(st1));
+   asm volatile("inb %1, %0\n\t" : "=a"(r)
+   : "i"((short)TESTDEV_IO_PORT));
+io_3:
+   report("em_IO out_in 1 byte bp through dr3",
+   n == 8 && bp_addr[n-1] == ((unsigned long)&_3) &&
+   dr6[n-1] == 0x4ff8);
+
+   n=0;
+   set_dr7(0x6480);
+   asm volatile("movw %0, %%dx \n\t" "outsw \n\t"
+: : "i"((short)TESTDEV_IO_PORT), "S"(st1));
+   asm volatile("inw %1, %0\n\t" : "=r"(port_in)
+   : "i"((short)TESTDEV_IO_PORT));
+io_w:
+   report("em_IO out_in 2 byte bp through dr3",
+   n == 2 && bp_addr[n-1] == ((unsigned 
long)&_w) &&
+   dr6[n-1] == 0x4ff8);
+
+   set_dr7(0xe480);
+   asm volatile("movw %0, %%dx \n\t" "outsl \n\t"
+   : : "i"((short)TESTDEV_IO_PORT), "S"(st1));
+   asm volatile ("insl;" :"+D" (port_din)
+   : "i" ((short)TESTDEV_IO_PORT) : "memory");
+io_dw:
+   report("em_IO out_in 4 byte bp through dr3",
+   n == 4 && bp_addr[n-1] == ((unsigned long)&_dw)
+   && dr6[n-1] == 0x4ff8);
+}
+static void test_port_io_aligned_bp(){
+
+   unsigned char r = 0;
+   unsigned short port_in = 0;
+   unsigned int port_din = 0;
+
+   n = 0;
+   write_cr4(read_cr4() | 0x8);
+
+   set_dr7(0x00020402);
+   set_dr0((void*)(TESTDEV_IO_PORT ));
+   asm volatile("movw %0, %%dx \n\t" "outsb \n\t"
+   : : "i"((short)TESTDEV_IO_PORT),"S"(st1));
+   asm volatile("inb %1, %0\n\t" : "=a"(r)
+   : "i"((short)TESTDEV_IO_PORT));
+aio_0:
+   report("em_IO aligned out_in 1 byte bp through dr0",
+   n == 2 && 

Re: [PATCH 2/2] KVM: x86: Guest watchpoints during emulation.

2015-09-24 Thread Nadav Amit
Looking at the patch again, there two points you may want to change, see
inline.

If you want, I’ll send a v2.

Nadav

Nadav Amit <na...@cs.technion.ac.il> wrote:

> From: Nadav Amit <nadav.a...@gmail.com>
> 
> This adds support for guest data and I/O breakpoints during instruction
> emulation.
> 
> Watchpoints are examined during data and io interceptions: segmented_read,
> segmented_write, em_in, em_out, segmented_read_std and kvm_fast_pio_out.
> 
> When such a breakpoint is triggered, trap is reported by DB_VECTOR
> exception.
> 
> Signed-off-by: Andrey Isakov <andreyisak...@gmail.com>
> Signed-off-by: Rami Burstein <ramib...@gmail.com>
> Signed-off-by: Nadav Amit <nadav.a...@gmail.com>
> ---
> arch/x86/include/asm/kvm_emulate.h |  3 ++
> arch/x86/kvm/emulate.c | 32 +
> arch/x86/kvm/x86.c | 74 +-
> 3 files changed, 92 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_emulate.h 
> b/arch/x86/include/asm/kvm_emulate.h
> index e16466e..f6d5d6c 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -211,6 +211,9 @@ struct x86_emulate_ops {
>   void (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
> u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
>   void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
> + bool (*check_watchpoint)(struct x86_emulate_ctxt *ctxt,
> +  unsigned long addr, unsigned int length,
> +  int type);
> };
> 
> typedef u32 __attribute__((vector_size(16))) sse128_t;
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index b372a75..4e91b7b 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -24,6 +24,7 @@
> #include "kvm_cache_regs.h"
> #include 
> #include 
> +#include 
> #include 
> #include 
> 
> @@ -564,6 +565,17 @@ static int emulate_db(struct x86_emulate_ctxt *ctxt)
>   return emulate_exception(ctxt, DB_VECTOR, 0, false);
> }
> 
> +static void emulate_db_trap(struct x86_emulate_ctxt *ctxt)
> +{
> + /*
> +  * If a fault is later encountered, the exception information will be
> +  * overridden. Otherwise the trap would be handled after the emulation
> +  * is completed.
> +  */
> + (void)emulate_exception(ctxt, DB_VECTOR, 0, false);
> + ctxt->have_exception = true;
> +}
> +
> static int emulate_gp(struct x86_emulate_ctxt *ctxt, int err)
> {
>   return emulate_exception(ctxt, GP_VECTOR, err, true);
> @@ -776,6 +788,10 @@ static int segmented_read_std(struct x86_emulate_ctxt 
> *ctxt,
>   rc = linearize(ctxt, addr, size, false, );
>   if (rc != X86EMUL_CONTINUE)
>   return rc;
> +
> + if (ctxt->ops->check_watchpoint(ctxt, linear, size, DR_RW_READ))
> + emulate_db_trap(ctxt);
> +
>   return ctxt->ops->read_std(ctxt, linear, data, size, >exception);
> }
> 
> @@ -1369,6 +1385,10 @@ static int segmented_read(struct x86_emulate_ctxt 
> *ctxt,
>   rc = linearize(ctxt, addr, size, false, );
>   if (rc != X86EMUL_CONTINUE)
>   return rc;
> +
> + if (ctxt->ops->check_watchpoint(ctxt, linear, size, DR_RW_READ))
> + emulate_db_trap(ctxt);
> +
>   return read_emulated(ctxt, linear, data, size);
> }
> 
> @@ -1383,6 +1403,10 @@ static int segmented_write(struct x86_emulate_ctxt 
> *ctxt,
>   rc = linearize(ctxt, addr, size, true, );
>   if (rc != X86EMUL_CONTINUE)
>   return rc;
> +
> + if (ctxt->ops->check_watchpoint(ctxt, linear, size, DR_RW_WRITE))
> + emulate_db_trap(ctxt);
> +
>   return ctxt->ops->write_emulated(ctxt, linear, data, size,
>>exception);
> }
> @@ -3729,11 +3753,19 @@ static int em_in(struct x86_emulate_ctxt *ctxt)
>>dst.val))
>   return X86EMUL_IO_NEEDED;
> 
> + if (ctxt->ops->check_watchpoint(ctxt, ctxt->src.val, ctxt->dst.bytes,
> + DR_RW_PORT))
> + emulate_db_trap(ctxt);
> +
>   return X86EMUL_CONTINUE;
> }
> 
> static int em_out(struct x86_emulate_ctxt *ctxt)
> {
> + if (ctxt->ops->check_watchpoint(ctxt, ctxt->dst.val, ctxt->src.bytes,
> + DR_RW_PORT))
> + emulate_db_trap(ctxt);
> +
>   ctxt->ops->pio_out_emulated(ctxt, ctxt->src.bytes, ctxt->dst.val,
>   >src.v

[PATCH 2/2] KVM: x86: Guest watchpoints during emulation.

2015-09-24 Thread Nadav Amit
From: Nadav Amit <nadav.a...@gmail.com>

This adds support for guest data and I/O breakpoints during instruction
emulation.

Watchpoints are examined during data and io interceptions: segmented_read,
segmented_write, em_in, em_out, segmented_read_std and kvm_fast_pio_out.

When such a breakpoint is triggered, trap is reported by DB_VECTOR
exception.

Signed-off-by: Andrey Isakov <andreyisak...@gmail.com>
Signed-off-by: Rami Burstein <ramib...@gmail.com>
Signed-off-by: Nadav Amit <nadav.a...@gmail.com>
---
 arch/x86/include/asm/kvm_emulate.h |  3 ++
 arch/x86/kvm/emulate.c | 32 +
 arch/x86/kvm/x86.c | 74 +-
 3 files changed, 92 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index e16466e..f6d5d6c 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -211,6 +211,9 @@ struct x86_emulate_ops {
void (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
  u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
+   bool (*check_watchpoint)(struct x86_emulate_ctxt *ctxt,
+unsigned long addr, unsigned int length,
+int type);
 };
 
 typedef u32 __attribute__((vector_size(16))) sse128_t;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b372a75..4e91b7b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -24,6 +24,7 @@
 #include "kvm_cache_regs.h"
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -564,6 +565,17 @@ static int emulate_db(struct x86_emulate_ctxt *ctxt)
return emulate_exception(ctxt, DB_VECTOR, 0, false);
 }
 
+static void emulate_db_trap(struct x86_emulate_ctxt *ctxt)
+{
+   /*
+* If a fault is later encountered, the exception information will be
+* overridden. Otherwise the trap would be handled after the emulation
+* is completed.
+*/
+   (void)emulate_exception(ctxt, DB_VECTOR, 0, false);
+   ctxt->have_exception = true;
+}
+
 static int emulate_gp(struct x86_emulate_ctxt *ctxt, int err)
 {
return emulate_exception(ctxt, GP_VECTOR, err, true);
@@ -776,6 +788,10 @@ static int segmented_read_std(struct x86_emulate_ctxt 
*ctxt,
rc = linearize(ctxt, addr, size, false, );
if (rc != X86EMUL_CONTINUE)
return rc;
+
+   if (ctxt->ops->check_watchpoint(ctxt, linear, size, DR_RW_READ))
+   emulate_db_trap(ctxt);
+
return ctxt->ops->read_std(ctxt, linear, data, size, >exception);
 }
 
@@ -1369,6 +1385,10 @@ static int segmented_read(struct x86_emulate_ctxt *ctxt,
rc = linearize(ctxt, addr, size, false, );
if (rc != X86EMUL_CONTINUE)
return rc;
+
+   if (ctxt->ops->check_watchpoint(ctxt, linear, size, DR_RW_READ))
+   emulate_db_trap(ctxt);
+
return read_emulated(ctxt, linear, data, size);
 }
 
@@ -1383,6 +1403,10 @@ static int segmented_write(struct x86_emulate_ctxt *ctxt,
rc = linearize(ctxt, addr, size, true, );
if (rc != X86EMUL_CONTINUE)
return rc;
+
+   if (ctxt->ops->check_watchpoint(ctxt, linear, size, DR_RW_WRITE))
+   emulate_db_trap(ctxt);
+
return ctxt->ops->write_emulated(ctxt, linear, data, size,
 >exception);
 }
@@ -3729,11 +3753,19 @@ static int em_in(struct x86_emulate_ctxt *ctxt)
 >dst.val))
return X86EMUL_IO_NEEDED;
 
+   if (ctxt->ops->check_watchpoint(ctxt, ctxt->src.val, ctxt->dst.bytes,
+   DR_RW_PORT))
+   emulate_db_trap(ctxt);
+
return X86EMUL_CONTINUE;
 }
 
 static int em_out(struct x86_emulate_ctxt *ctxt)
 {
+   if (ctxt->ops->check_watchpoint(ctxt, ctxt->dst.val, ctxt->src.bytes,
+   DR_RW_PORT))
+   emulate_db_trap(ctxt);
+
ctxt->ops->pio_out_emulated(ctxt, ctxt->src.bytes, ctxt->dst.val,
>src.val, 1);
/* Disable writeback. */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3e4d032..ba75f76 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4831,6 +4831,55 @@ static void emulator_set_nmi_mask(struct 
x86_emulate_ctxt *ctxt, bool masked)
kvm_x86_ops->set_nmi_mask(emul_to_vcpu(ctxt), masked);
 }
 
+static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 len,
+   u32 dr7, unsigned long *db)
+{
+   u32 dr6 = 0;
+   int i;
+   u32 enable, dr7_rw, dr7_len;
+   unsigned long align_db;
+
+   enable = dr7;
+   dr7_rw

[PATCH 1/2] KVM: x86: Add DR flag definition for IO watchpoints

2015-09-24 Thread Nadav Amit
From: Rami Burstein <ramib...@gmail.com>

Add DR_RW_PROT for I/O watchponts.

Signed-off-by: Andrey Isakov <andreyisak...@gmail.com>
Signed-off-by: Rami Burstein <ramib...@gmail.com>
Signed-off-by: Nadav Amit <nadav.a...@gmail.com>
---
 arch/x86/include/uapi/asm/debugreg.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/uapi/asm/debugreg.h 
b/arch/x86/include/uapi/asm/debugreg.h
index 3c0874d..d815cd4 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -37,6 +37,7 @@
 
 #define DR_RW_EXECUTE (0x0)   /* Settings for the access types to trap on */
 #define DR_RW_WRITE (0x1)
+#define DR_RW_PORT (0x2)
 #define DR_RW_READ (0x3)
 
 #define DR_LEN_1 (0x0) /* Settings for data length to trap on */
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] KVM: x86: Emulated I/O and data breakpoints support

2015-09-24 Thread Nadav Amit
This patch-set adds data and I/O breakpoints support on emulated instructions
to KVM. The first patch just adds a missing definition and all the logic is in
the second one.

These patches were implemented as part of a university project, and then
cleaned and rebased on the latest KVM by me. I have some concerns about their
impact on performance, so it is possible to add a fast-patch for the case all
breakpoints are disabled. In addition, these patches do not handle watchpoints
that are set by the host debugger.

Nadav Amit (1):
  KVM: x86: Guest watchpoints during emulation.

Rami Burstein (1):
  KVM: x86: Add DR flag definition for IO watchpoints

 arch/x86/include/asm/kvm_emulate.h   |  3 ++
 arch/x86/include/uapi/asm/debugreg.h |  1 +
 arch/x86/kvm/emulate.c   | 32 
 arch/x86/kvm/x86.c   | 74 +++-
 4 files changed, 93 insertions(+), 17 deletions(-)

-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] x86: debug: test debug extensions

2015-09-22 Thread Nadav Amit
Sorry for misleading. I forgot there are several issues with the patches, so
I cannot send them. If I find the time (and you don’t implement it first),
I’ll fix and send them.

Nadav

Paolo Bonzini <pbonz...@redhat.com> wrote:

> 
> 
> On 21/09/2015 14:26, Nadav Amit wrote:
>> Paolo,
>> 
>> Two students here implemented emulated I/O and data breakpoints support for
>> KVM under my supervision. I mistakenly graded their project before they
>> actually sent the patches, and at this point (surprisingly) they
>> disappeared. The patches are relatively ok and include unit-tests. I also
>> ran some Intel tests to check the patches, which they (eventually) passed.
>> 
>> I don’t know if I can find the time to rebase them (they are based on 3.19),
>> but let me know if you are interested.
> 
> Yes, I am.  Worst case just send them based on 3.19, I owe you enough
> bugfixes that I can do the rebase myself. :)
> 
> Paolo


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] x86: debug: test debug extensions

2015-09-21 Thread Nadav Amit
Paolo,

Two students here implemented emulated I/O and data breakpoints support for
KVM under my supervision. I mistakenly graded their project before they
actually sent the patches, and at this point (surprisingly) they
disappeared. The patches are relatively ok and include unit-tests. I also
ran some Intel tests to check the patches, which they (eventually) passed.

I don’t know if I can find the time to rebase them (they are based on 3.19),
but let me know if you are interested.

Nadav

Paolo Bonzini  wrote:

> Not committing this yet since KVM does not support I/O breakpoints, but 
> posting
> it because it is useful for TCG too.
> 
> TCG fails the tests because it doesn't preserve DRn_FIXED_1 on mov to dr6 and
> dr7, and also because it lacks support for ICEBP, but it is easy to fix the
> former and disable the ICEBP test.
> 
> Signed-off-by: Paolo Bonzini 
> ---
> x86/debug.c | 49 +
> 1 file changed, 49 insertions(+)
> 
> diff --git a/x86/debug.c b/x86/debug.c
> index d04de23..a71a8ae 100644
> --- a/x86/debug.c
> +++ b/x86/debug.c
> @@ -11,6 +11,7 @@
> 
> #include "libcflat.h"
> #include "desc.h"
> +#include "processor.h"
> 
> static volatile unsigned long bp_addr[10], dr6[10];
> static volatile unsigned int n;
> @@ -71,6 +72,8 @@ int main(int ac, char **av)
>   handle_exception(DB_VECTOR, handle_db);
>   handle_exception(BP_VECTOR, handle_bp);
> 
> + write_cr4(read_cr4() | X86_CR4_DE);
> +
> sw_bp:
>   asm volatile("int3");
>   report("#BP", bp_addr[0] == (unsigned long)&_bp + 1);
> @@ -146,5 +149,51 @@ sw_icebp:
>  bp_addr[0] == (unsigned long)&_icebp + 1 &&
>  dr6[0] == 0x0ff0);
> 
> + n = 0;
> + set_dr1((void *)0xe1ul);
> + set_dr7(0x0020400a);
> + set_dr6(0);
> +
> + asm volatile(
> + "out %ax,$0xe0\n\t"
> + "out %al,$0xe1\n\t");
> +hw_wp_out1:
> +
> + set_dr1((void *)0xe0ul);
> + set_dr7(0x00e0400a);
> + asm volatile(
> + "out %al,$0xe0\n\t");
> +hw_wp_out2:
> +
> + report("hw I/O port watchpoint (out)",
> +n == 3 &&
> +bp_addr[0] == ((unsigned long)&_wp_out1 - 2) &&
> +bp_addr[1] == ((unsigned long)&_wp_out1) &&
> +bp_addr[2] == ((unsigned long)&_wp_out2) &&
> +dr6[0] == 0x0ff2);
> +
> + n = 0;
> + set_dr1((void *)0xe1ul);
> + set_dr7(0x0020400a);
> + set_dr6(0);
> +
> + asm volatile(
> + "in $0xe0,%%ax\n\t"
> + "in $0xe1,%%al\n\t" : : : "rax");
> +hw_wp_in1:
> +
> + set_dr1((void *)0xe0ul);
> + set_dr7(0x00e0400a);
> +
> + asm volatile(
> + "in $0xe0,%%al\n\t" : : : "rax");
> +hw_wp_in2:
> + report("hw I/O port watchpoint (in)",
> +n == 3 &&
> +bp_addr[0] == ((unsigned long)&_wp_in1 - 2) &&
> +bp_addr[1] == ((unsigned long)&_wp_in1) &&
> +bp_addr[2] == ((unsigned long)&_wp_in2) &&
> +dr6[0] == 0x0ff2);
> +
>   return report_summary();
> }
> -- 
> 2.5.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/2] target-i386: disable LINT0 after reset

2015-09-16 Thread Nadav Amit
I don’t happen to have a similar platform. On regular qemu/kvm runs with
q35, I see APIC_LVT0 is set once to 0x8700 on the BSP - as expected:

 qemu-system-x86-19345 [011] d... 2583274.503018: kvm_entry: vcpu 0
 qemu-system-x86-19345 [011] d... 2583274.503019: kvm_exit: reason APIC_ACCESS 
rip 0x7ffb8288 info 1350 0
 qemu-system-x86-19345 [011]  2583274.503020: kvm_emulate_insn: 
0:7ffb8288:c7 05 50 03 e0 fe 00 87 00 00 (prot32)
 qemu-system-x86-19345 [011]  2583274.503021: kvm_mmio: mmio write len 4 
gpa 0xfee00350 val 0x8700
 qemu-system-x86-19345 [011]  2583274.503021: kvm_apic: apic_write 
APIC_LVT0 = 0x8700

If someone sends a trace ( http://www.linux-kvm.org/page/Tracing ) of the
failure, I would be happy to assist.

Nadav

Gerd Hoffmann <kra...@redhat.com> wrote:

> On Mi, 2015-09-16 at 07:23 +0200, Jan Kiszka wrote:
>> On 2015-09-15 23:19, Alex Williamson wrote:
>>> On Mon, 2015-04-13 at 02:32 +0300, Nadav Amit wrote:
>>>> Due to old Seabios bug, QEMU reenable LINT0 after reset. This bug is long 
>>>> gone
>>>> and therefore this hack is no longer needed.  Since it violates the
>>>> specifications, it is removed.
>>>> 
>>>> Signed-off-by: Nadav Amit <na...@cs.technion.ac.il>
>>>> ---
>>>> hw/intc/apic_common.c | 9 -
>>>> 1 file changed, 9 deletions(-)
>>> 
>>> Please see bug: https://bugs.launchpad.net/qemu/+bug/1488363
>>> 
>>> Is this bug perhaps not as long gone as we thought, or is there
>>> something else going on here?  Thanks,
>> 
>> I would say, someone needs to check if the SeaBIOS line that is supposed
>> to enable LINT0 is actually executed on one of the broken systems and,
>> if not, why not.
> 
> There is only one reason (beside miscompiling seabios with
> CONFIG_QEMU=n) why seabios would skip acpi initialization, and that is
> apic not being present according to cpuid:
> 
>cpuid(1, , , , _features);
>if (eax < 1 || !(cpuid_features & CPUID_APIC)) {
>// No apic - only the main cpu is present.
> 
> https://www.kraxel.org/cgit/seabios/tree/src/fw/smp.c#n79
> 
> cheers,
>  Gerd
> 
> PS: coreboot tripped over this too, fixed just a few days ago.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Don't return error on nested bitmap memory allocation failure

2015-04-29 Thread Nadav Amit


Bandan Das b...@redhat.com wrote:

 
 If get_free_page() fails for nested bitmap area, it's evident that
 we are gonna get screwed anyway but returning failure because we failed
 allocating memory for a nested structure seems like an unnecessary big
 hammer. Also, save the call for later; after we are done with other
 non-nested allocations.
 
 Signed-off-by: Bandan Das b...@redhat.com
 ---
 arch/x86/kvm/vmx.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index f7b6168..200bc5c 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -6039,20 +6039,22 @@ static __init int hardware_setup(void)
   if (!vmx_msr_bitmap_longmode_x2apic)
   goto out4;
 
 - if (nested) {
 - vmx_msr_bitmap_nested =
 - (unsigned long *)__get_free_page(GFP_KERNEL);
 - if (!vmx_msr_bitmap_nested)
 - goto out5;
 - }
 -
   vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
   if (!vmx_vmread_bitmap)
 - goto out6;
 + goto out5;
 
   vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
   if (!vmx_vmwrite_bitmap)
 - goto out7;
 + goto out6;
 +
 + if (nested) {
 + vmx_msr_bitmap_nested =
 + (unsigned long *)__get_free_page(GFP_KERNEL);
 + if (!vmx_msr_bitmap_nested) {
 + printk(KERN_WARNING
 +vmx: Failed getting memory for nested 
 bitmap\n);
 + nested = 0;
 + }
 
   memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
   memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
 @@ -6073,7 +6075,7 @@ static __init int hardware_setup(void)
 
   if (setup_vmcs_config(vmcs_config)  0) {
   r = -EIO;
 - goto out8;
 + goto out7;
   }
 
   if (boot_cpu_has(X86_FEATURE_NX))
 @@ -6190,13 +6192,12 @@ static __init int hardware_setup(void)
 
   return alloc_kvm_area();
 
 -out8:
 - free_page((unsigned long)vmx_vmwrite_bitmap);
 out7:
 - free_page((unsigned long)vmx_vmread_bitmap);
 -out6:
   if (nested)
   free_page((unsigned long)vmx_msr_bitmap_nested);
 + free_page((unsigned long)vmx_vmwrite_bitmap);
 +out6:
 + free_page((unsigned long)vmx_vmread_bitmap);
 out5:
   free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
 out4:
 
free_page appears to check whether the address is zero before it actually
frees the page. Perhaps it is better to leverage this behaviour to remove
all the outX and simplify the code.

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Guest handling of IA32_DEBUGCTL MSR

2015-04-28 Thread Nadav Amit
It seems strange that the guest is allowed to set IA32_DEBUGCTL MSR for the
nested VM and get this value to the physical IA32_DEBUGCTL (see
prepare_vmcs02), while it cannot set IA32_DEBUGCTL for itself (see
kvm_set_msr_common).

Am I missing something?

Regards,
Nadav
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Guest handling of IA32_DEBUGCTL MSR

2015-04-28 Thread Nadav Amit
Jan Kiszka jan.kis...@siemens.com wrote:

 Am 2015-04-28 um 13:43 schrieb Paolo Bonzini:
 On 28/04/2015 13:42, Nadav Amit wrote:
 It seems strange that the guest is allowed to set IA32_DEBUGCTL MSR for the
 nested VM and get this value to the physical IA32_DEBUGCTL (see
 prepare_vmcs02), while it cannot set IA32_DEBUGCTL for itself (see
 kvm_set_msr_common).
 
 Am I missing something?
 
 No, it makes no sense.
 
 Are you sure that vmx is not allowing direct access to that MSR while in
 guest mode? We do save/restore it on all Intel CPUs, see
 setup_vmcs_config. Not sure about the AMD situation, though.

Unless you explicitly disable the interception of the MSR in the bitmap, you
trap WRMSR to this MSR. I guess the original intention of saving/restoring
it was to address situations in which the CPU itself implicitly changes it
(e.g., clearing LBR when “freezing LBR on PMI” is set). I think that these
situations do not apply when the MSR is zeroed, as it happens to be in KVM.

Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: x86: Fix update RCX/RDI/RSI on REP-string

2015-04-28 Thread Nadav Amit
When REP-string instruction is preceded with an address-size prefix,
ECX/EDI/ESI are used as the operation counter and pointers.  When they are
updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they
are updated on every 32-bit register operation.  Fix it.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c1bc650..296d58e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -524,13 +524,9 @@ static void masked_increment(ulong *reg, ulong mask, int 
inc)
 static inline void
 register_address_increment(struct x86_emulate_ctxt *ctxt, int reg, int inc)
 {
-   ulong mask;
+   ulong *preg = reg_rmw(ctxt, reg);
 
-   if (ctxt-ad_bytes == sizeof(unsigned long))
-   mask = ~0UL;
-   else
-   mask = ad_mask(ctxt);
-   masked_increment(reg_rmw(ctxt, reg), mask, inc);
+   assign_register(preg, *preg + inc, ctxt-ad_bytes);
 }
 
 static void rsp_increment(struct x86_emulate_ctxt *ctxt, int inc)
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] KVM: x86: Fix REP-string effect on RCX/RSI/RDI

2015-04-28 Thread Nadav Amit
This patch-set fixes KVM behavior when handling a REP-string instruction that
runs with an address-size of 32-bit.  In this case ECX/EDI/ESI are used as
counter and pointers, and the high 32-bits should be cleared.

The first patch handles with the simple case. The second one handles the
corner-case in which ECX is initially zero.  It appears that Intel and AMD
behave differently in this case (and some experiments suggest even different
Intel generations act differently), and I could not find any documentation that
describes it. Yet, the behavior of INS/OUTS can be observed by the guest and
VMware appears to get it right.

Thanks for reviewing the patches.

Nadav Amit (2):
  KVM: x86: Fix update RCX/RDI/RSI on REP-string
  KVM: x86: Fix zero iterations REP-string

 arch/x86/kvm/emulate.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: x86: Fix zero iterations REP-string

2015-04-28 Thread Nadav Amit
When a REP-string is executed in 64-bit mode with an address-size prefix,
ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel
CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits
of the pointers in MOVS/STOS. This behavior is specific to Intel according to
few experiments.

As one may guess, this is an undocumented behavior. Yet, it is observable in
the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that
VMware appears to get it right.  The behavior can be observed using the
following code:

 #include stdio.h

 #define LOW_MASK   (0xull)
 #define ALL_MASK   (0xull)
 #define TEST(opcode)   \
do {\
asm volatile(.byte 0xf2 \n\t .byte 0x67 \n\t .byte  opcode \n\t \
: =S(s), =c(c), =D(d) \
: S(ALL_MASK), c(LOW_MASK), D(ALL_MASK)); \
printf(opcode %s rcx=%llx rsi=%llx rdi=%llx\n,\
opcode, c, s, d);   \
} while(0)

void main()
{
unsigned long long s, d, c;
iopl(3);
TEST(0x6c);
TEST(0x6d);
TEST(0x6e);
TEST(0x6f);
TEST(0xa4);
TEST(0xa5);
TEST(0xa6);
TEST(0xa7);
TEST(0xaa);
TEST(0xab);
TEST(0xae);
TEST(0xaf);
}

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 296d58e..e726c50 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2591,6 +2591,30 @@ static bool emulator_io_permited(struct x86_emulate_ctxt 
*ctxt,
return true;
 }
 
+static void string_registers_quirk(struct x86_emulate_ctxt *ctxt)
+{
+   /*
+* Intel CPUs mask the counter and pointers in quite strange
+* manner when ECX is zero due to REP-string optimizations.
+*/
+#ifdef CONFIG_X86_64
+   if (ctxt-ad_bytes != 4 || !vendor_intel(ctxt))
+   return;
+
+   *reg_write(ctxt, VCPU_REGS_RCX) = 0;
+
+   switch (ctxt-b) {
+   case 0xa4:  /* movsb */
+   case 0xa5:  /* movsd/w */
+   *reg_rmw(ctxt, VCPU_REGS_RSI) = (u32)-1;
+   /* fall through */
+   case 0xaa:  /* stosb */
+   case 0xab:  /* stosd/w */
+   *reg_rmw(ctxt, VCPU_REGS_RDI) = (u32)-1;
+   }
+#endif
+}
+
 static void save_state_to_tss16(struct x86_emulate_ctxt *ctxt,
struct tss_segment_16 *tss)
 {
@@ -5004,6 +5028,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
if (ctxt-rep_prefix  (ctxt-d  String)) {
/* All REP prefixes have the same first termination 
condition */
if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) 
== 0) {
+   string_registers_quirk(ctxt);
ctxt-eip = ctxt-_eip;
ctxt-eflags = ~X86_EFLAGS_RF;
goto done;
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Fix cmpxchg with two 32-bit registers

2015-04-26 Thread Nadav Amit
The emulation of CMPXCHG with two register operands in 64-bit mistakenly masks
the high 32-bits as it performs assignment. Fix it.

Fixes: 2fcf5c8ae244b4c298d2111a288d410a719ac626

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 630bcb0..84c132b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2242,6 +2242,8 @@ static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
ctxt-src.val = ctxt-dst.orig_val;
/* Create write-cycle to dest by writing the same value */
ctxt-dst.val = ctxt-dst.orig_val;
+   if (ctxt-dst.type != OP_MEM)
+   ctxt-dst.type = OP_NONE;
}
return X86EMUL_CONTINUE;
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM x86 emulator design question

2015-04-19 Thread Nadav Amit
Hi,

I would appreciate if someone briefly explains the design choice that leaded
to separating the x86 emulator from the rest of x86 code, i.e., making it
oblivious to VCPUs and using the x86_emulate_ops as an interface.

Thanks,
Nadav
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Mask DR7 correctly on task-switch while debugging

2015-04-19 Thread Nadav Amit
If the host sets hardware breakpoints to debug the guest, and a task-switch
occurs in the guest, the architectural DR7 will not be updated. The effective
DR7 would be updated instead.

This fix uses the standard DR setting mechanism instead of the one that was
previously used. As a bonus, the update of DR7 will now be effective for AMD as
well.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/vmx.c | 3 ---
 arch/x86/kvm/x86.c | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f7a0a7f..8f731c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5703,9 +5703,6 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
return 0;
}
 
-   /* clear all local breakpoint enable flags */
-   vmcs_writel(GUEST_DR7, vmcs_readl(GUEST_DR7)  ~0x155);
-
/*
 * TODO: What about debug traps on tss switch?
 *   Are we supposed to inject them and update dr6?
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2046be4..a170c35 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6816,6 +6816,9 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 
tss_selector, int idt_index,
 
kvm_rip_write(vcpu, ctxt-eip);
kvm_set_rflags(vcpu, ctxt-eflags);
+   ret = __kvm_set_dr(vcpu, 7, vcpu-arch.dr7  ~(DR_LOCAL_ENABLE_MASK |
+  DR_LOCAL_SLOWDOWN));
+   WARN_ON(ret != 0);
kvm_make_request(KVM_REQ_EVENT, vcpu);
return EMULATE_DONE;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: x86: Fix DR7 mask on task-switch while debugging

2015-04-19 Thread Nadav Amit
If the host sets hardware breakpoints to debug the guest, and a task-switch
occurs in the guest, the architectural DR7 will not be updated. The effective
DR7 would be updated instead.

This fix puts the DR7 update during task-switch emulation, so it now uses the
standard DR setting mechanism instead of the one that was previously used. As a
bonus, the update of DR7 will now be effective for AMD as well.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il

---

v1 - v2:
- Move the setting to emulate.c instead of x86.c
- Shorten title
---
 arch/x86/kvm/emulate.c | 6 +-
 arch/x86/kvm/vmx.c | 3 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 630bcb0..4a4555a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -25,6 +25,7 @@
 #include linux/module.h
 #include asm/kvm_emulate.h
 #include linux/stringify.h
+#include asm/debugreg.h
 
 #include x86.h
 #include tss.h
@@ -2849,7 +2850,7 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
ulong old_tss_base =
ops-get_cached_segment_base(ctxt, VCPU_SREG_TR);
u32 desc_limit;
-   ulong desc_addr;
+   ulong desc_addr, dr7;
 
/* FIXME: old_tss_base == ~0 ? */
 
@@ -2934,6 +2935,9 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
ret = em_push(ctxt);
}
 
+   ops-get_dr(ctxt, 7, dr7);
+   ops-set_dr(ctxt, 7, dr7  ~(DR_LOCAL_ENABLE_MASK | DR_LOCAL_SLOWDOWN));
+
return ret;
 }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f7a0a7f..8f731c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5703,9 +5703,6 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
return 0;
}
 
-   /* clear all local breakpoint enable flags */
-   vmcs_writel(GUEST_DR7, vmcs_readl(GUEST_DR7)  ~0x155);
-
/*
 * TODO: What about debug traps on tss switch?
 *   Are we supposed to inject them and update dr6?
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: XP machine freeze

2015-04-19 Thread Nadav Amit
Brad Campbell lists2...@fnarfbargle.com wrote:

 
 On 13/04/15 22:02, Paolo Bonzini wrote:
 On 13/04/2015 14:45, Brad Campbell wrote:
 G'day Paolo,
 
 Yes, on AMD and I've tried hard to reproduce it on Intel and been unable
 to thus far.
 
 Now you mention it may be AMD specific, I have a spare motherboard and
 processor sitting in a drawer. I'll bolt it together tomorrow and see if
 I can reproduce it on another AMD machine. Two machines should let me
 test it twice as fast.
 
 I got a fail this afternoon, so I'm due to reboot tonight. I'll just
 revert that one suspect commit from a known bad kernel and see if that
 cleans it up. If not then I'll work through the remainder of the
 information in your mail. I really appreciate the attention you've paid
 to this, it has been a frustrating bug for me because I'm in a position
 of not knowing what I don't know, and obviously doing something wrong in
 very long bisection processes.
 Actually, if you have time to change your course of action, please
 revert the one that Nadav pointed out (f210f7572bed, KVM: x86:
 Fix lost interrupt on irr_pending race) or cherry-pick it on top of 3.17.
 
 Paolo
 Ok, I think we have a winner. Patch manually plopped on top of vanilla 3.17. 
 It has never gone for anywhere near this long on a bad kernel.
 
 brad@srv:~$ uptime
 23:24:48 up 6 days,  1:01,  3 users,  load average: 1.48, 1.95, 2.48
 
 So this patch went into the kernel during the 3.19 release cycle? Affected 
 kernels 3.16-3.18?

Actually, the original bug seemed to be introduced by commit
33e4c68656a2e461b296ce714ec322978de85412 KVM: Optimize searching for
highest IRR”. So the bug goes all the way back to 2.6.32. The race that this
patch fixes just became more apparent (i.e., likely to happen) on 3.16. It
is fixed in 3.19.

I guess Paolo would push it to stable now. Right?

Regards,
Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Mask DR7 correctly on task-switch while debugging

2015-04-19 Thread Nadav Amit
Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 19/04/2015 14:18, Nadav Amit wrote:
 If the host sets hardware breakpoints to debug the guest, and a task-switch
 occurs in the guest, the architectural DR7 will not be updated. The effective
 DR7 would be updated instead.
 
 This fix uses the standard DR setting mechanism instead of the one that was
 previously used. As a bonus, the update of DR7 will now be effective for AMD 
 as
 well.
 
 Is there a reason not to do it in emulate.c instead?

Not that I can think of. I will make a new version.

Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: x86: Clear CR2 on VCPU reset

2015-04-14 Thread Nadav Amit
Wanpeng Li wanpeng...@linux.intel.com wrote:

 Hi Nadav,
 On Thu, Apr 02, 2015 at 03:10:38AM +0300, Nadav Amit wrote:
 CR2 is not cleared as it should after reset.  See Intel SDM table named 
 IA-32
 Processor States Following Power-up, Reset, or INIT.
 
 How you trigger the reset instead of the Power-up one?

I sent an IPI of INIT for the KVM “reset” flow. I posted a unit-test:
http://www.spinics.net/lists/kvm/msg115525.html

The actual reset is handled by qemu, but KVM is still able to introduce bugs
in it, as it did in not reseting DR0-DR3.

Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] KVM: x86: INIT and reset sequences are different

2015-04-13 Thread Nadav Amit
x86 architecture defines differences between the reset and INIT sequences.
INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP.

References (from Intel SDM):

If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
to a specific processor or system wide) do not cause the MP protocol to be
repeated. [8.4.2: MP Initialization Protocol Requirements and Restrictions]

[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]

If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
changed. [9.2: X87 FPU INITIALIZATION]

The state of the local APIC following an INIT reset is the same as it is after
a power-up or hardware reset, except that the APIC ID and arbitration ID
registers are not affected. [10.4.7.3: Local APIC State After an INIT Reset
(“Wait-for-SIPI” State)]

Signed-off-by: Nadav Amit na...@cs.technion.ac.il

---

v3:

- Leave EFER unchanged on INIT. Instead, set cr0 correctly so vmx_set_cr0 would
  recognize that paging was changed from on to off and clear LMA.
- Clean the surrounding from unnecassary indirection of vmx-vcpu.
- Change svm similarly to vmx (UNTESTED).

v2:

- Same as v1 (was part of a patch-set that was modified due to missing tests)
---
 arch/x86/include/asm/kvm_host.h |  6 ++---
 arch/x86/kvm/lapic.c| 11 +
 arch/x86/kvm/lapic.h|  2 +-
 arch/x86/kvm/svm.c  | 27 +++---
 arch/x86/kvm/vmx.c  | 51 +++--
 arch/x86/kvm/x86.c  | 17 --
 6 files changed, 63 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f80ad59..3a19e30 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -711,7 +711,7 @@ struct kvm_x86_ops {
/* Create, but do not attach this VCPU */
struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
void (*vcpu_free)(struct kvm_vcpu *vcpu);
-   void (*vcpu_reset)(struct kvm_vcpu *vcpu);
+   void (*vcpu_reset)(struct kvm_vcpu *vcpu, bool init_event);
 
void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
@@ -1001,7 +1001,7 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int 
irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-int fx_init(struct kvm_vcpu *vcpu);
+int fx_init(struct kvm_vcpu *vcpu, bool init_event);
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes);
@@ -1145,7 +1145,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
-void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
   unsigned long address);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fe2d89e..a91fb2f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1557,7 +1557,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 
 }
 
-void kvm_lapic_reset(struct kvm_vcpu *vcpu)
+void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
struct kvm_lapic *apic;
int i;
@@ -1571,7 +1571,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
/* Stop the timer in case it's a reset to an active apic */
hrtimer_cancel(apic-lapic_timer.timer);
 
-   kvm_apic_set_id(apic, vcpu-vcpu_id);
+   if (!init_event)
+   kvm_apic_set_id(apic, vcpu-vcpu_id);
kvm_apic_set_version(apic-vcpu);
 
for (i = 0; i  APIC_LVT_NUM; i++)
@@ -1713,7 +1714,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
 
static_key_slow_inc(apic_sw_disabled.key); /* sw disabled at reset */
-   kvm_lapic_reset(vcpu);
+   kvm_lapic_reset(vcpu, false);
kvm_iodevice_init(apic-dev, apic_mmio_ops);
 
return 0;
@@ -2047,8 +2048,8 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
pe = xchg(apic-pending_events, 0);
 
if (test_bit(KVM_APIC_INIT, pe)) {
-   kvm_lapic_reset(vcpu);
-   kvm_vcpu_reset(vcpu);
+   kvm_lapic_reset(vcpu, true);
+   kvm_vcpu_reset(vcpu, true);
if (kvm_vcpu_is_bsp(apic-vcpu))
vcpu-arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 9d28383..793f761 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -48,7 +48,7 @@ int

Re: [PATCH] KVM: x86: Support for disabling quirks

2015-04-13 Thread Nadav Amit
Thanks. If you want a test-case you can apply/try the following on top of
the previous kvm-unit-tests patch-set I sent 
( http://www.spinics.net/lists/kvm/msg115525.html ).

Regards,
Nadav

-- 8 --

From: Nadav Amit na...@cs.technion.ac.il
Subject: [PATCH kvm-unit-tests] x86: Test LINT0 is disabled after reset

Requires x2APIC in order to easily save LINT0 during 16-bit code.  For the test
to pass a fix for qemu is required.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 x86/init.c| 13 -
 x86/unittests.cfg |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/x86/init.c b/x86/init.c
index 2d7ea99..23079ad 100644
--- a/x86/init.c
+++ b/x86/init.c
@@ -70,6 +70,7 @@ static struct expected_state expected[] = {
 #define cr2 (*(volatile int*)0x2010)
 #define sysenter_eip (*(volatile int*)0x2014)
 #define boot_apic_id (*(volatile int *)0x2018)
+#define lvt0 (*(volatile int *)0x201c)
 
 static void set_test_regs(void)
 {
@@ -94,7 +95,10 @@ static bool check_test_regs(bool init)
printf(wrong sysenter_eip msr: %x\n, sysenter_eip);
error = true;
}
-
+   if (lvt0 != 0x1) {
+   printf(wrong lvt0 value: %x\n, lvt0);
+   error = true;
+   }
return error;
 }
 
@@ -240,6 +244,13 @@ asm (
cpuid\n
shrl $24, %ebx\n
mov %ebx, %cs:0x2018\n// apic_id
+   mov $0x1b, %ecx\n // IA32_APIC_BASE
+   rdmsr\n
+   or $0x400, %eax\n
+   wrmsr\n   // Enabling x2apic
+   mov $0x835, %ecx\n
+   rdmsr\n
+mov %eax, %cs:0x201c\n// lvt0
mov $0x0f, %al\n  // rtc_out(0x0f, 0x00);
out %al, $0x70\n
mov $0x00, %al\n
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 2d25801..0d1a42b 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -86,6 +86,7 @@ arch = x86_64
 [init]
 file = init.flat
 smp = 2
+extra_params = -cpu qemu64,+x2apic
 
 [msr]
 file = msr.flat
-- 
1.9.1




Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 13/04/2015 00:53, Nadav Amit wrote:
 Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were 
 previous
 created in order to overcome QEMU issues. Those issue were mostly result of
 invalid VM BIOS.  Currently there are two quirks that can be disabled:
 
 1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot
 2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot
 
 These two issues are already resolved in recent releases of QEMU, and would
 therefore be disabled by QEMU.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
 Documentation/virtual/kvm/api.txt |  3 ++-
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/include/uapi/asm/kvm.h   |  3 +++
 arch/x86/kvm/lapic.c  |  5 +++--
 arch/x86/kvm/svm.c|  3 ++-
 arch/x86/kvm/x86.c| 29 +
 include/uapi/linux/kvm.h  |  1 +
 7 files changed, 42 insertions(+), 4 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index bc9f6fe..3931221 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -959,7 +959,8 @@ documentation when it pops into existence).
 4.37 KVM_ENABLE_CAP
 
 Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
 -Architectures: ppc, s390
 +Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
 +   mips (only KVM_CAP_ENABLE_CAP), ppc, s390
 Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
 Parameters: struct kvm_enable_cap (in)
 Returns: 0 on success; -1 on error
 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index dea2e7e..f80ad59 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -635,6 +635,8 @@ struct kvm_arch {
  #endif
 
  bool boot_vcpu_runs_old_kvmclock;
 +
 +u64 disabled_quirks;
 };
 
 struct kvm_vm_stat {
 diff --git a/arch/x86/include/uapi/asm/kvm.h 
 b/arch/x86/include/uapi/asm/kvm.h
 index d7dcef5..2fec75e 100644
 --- a/arch/x86/include/uapi/asm/kvm.h
 +++ b/arch/x86/include/uapi/asm/kvm.h
 @@ -345,4 +345,7 @@ struct kvm_xcrs {
 struct kvm_sync_regs {
 };
 
 +#define KVM_QUIRK_LINT0_REENABLED   (1  0)
 +#define KVM_QUIRK_CD_NW_CLEARED (1  1)
 +
 #endif /* _ASM_X86_KVM_H */
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 4a6e58a..fe2d89e 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1577,8 +1577,9 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
  for (i = 0; i  APIC_LVT_NUM; i++)
  apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
  apic-lapic_timer.timer_mode = 0;
 -apic_set_reg(apic, APIC_LVT0,
 - SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 +if (!(vcpu-kvm-arch.disabled_quirks  KVM_QUIRK_LINT0_REENABLED))
 +apic_set_reg(apic, APIC_LVT0

Re: [Qemu-devel] [PATCH 2/2] target-i386: kvm: Disable KVM quirks

2015-04-13 Thread Nadav Amit
Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 13/04/2015 16:17, Nadav Amit wrote:
 Paolo Bonzini pbonz...@redhat.com wrote:
 
 On 13/04/2015 01:32, Nadav Amit wrote:
 +if (kvm_check_extension(s, KVM_CAP_ENABLE_CAP_VM)) {
 
 The right capability to check here is KVM_CAP_DISABLE_QUIRKS, not
 KVM_CAP_ENABLE_CAP_VM.
 
 Paolo
 
 +ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS, 0,
 +KVM_QUIRK_LINT0_DISABLED |
 +KVM_QUIRK_CD_NW_CLEARED);
 +if (ret  0) {
 +return ret;
 +}
 +}
 
 Of course…
 Do you want a v2 now, later (after 4.2), or would you change it yourself?
 
 Later, close to 4.2 but not necessarily after it.

Anyhow, in that case the KVM patch is also wrong (not reporting
KVM_CAP_DISABLE_QUIRKS is supported). I don’t want to spam, so I’ll run some
tests and resubmit.

Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] target-i386: kvm: Disable KVM quirks

2015-04-13 Thread Nadav Amit
Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 13/04/2015 01:32, Nadav Amit wrote:
 +if (kvm_check_extension(s, KVM_CAP_ENABLE_CAP_VM)) {
 
 The right capability to check here is KVM_CAP_DISABLE_QUIRKS, not
 KVM_CAP_ENABLE_CAP_VM.
 
 Paolo
 
 +ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS, 0,
 +KVM_QUIRK_LINT0_DISABLED |
 +KVM_QUIRK_CD_NW_CLEARED);
 +if (ret  0) {
 +return ret;
 +}
 +}

Of course…
Do you want a v2 now, later (after 4.2), or would you change it yourself?

Thanks,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: XP machine freeze

2015-04-13 Thread Nadav Amit
Paolo,

I hope I am not misleading or interrupting, and I am obviously very biased —
but couldn’t it be related to the issue that patch f210f7572bed (KVM: x86:
Fix lost interrupt on irr_pending race”) deals with?

I got this issue first when I upgraded to 3.17 in my testing environment,
since apparently a race got worse due to patch 56cc2406d68c. Did anyone try
3.19 that has this fix?

Regards,
Nadav

Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 13/04/2015 06:07, Brad Campbell wrote:
 On 31/03/15 05:11, Paolo Bonzini wrote:
 On 22/03/2015 16:31, Brad Campbell wrote:
 No help I'm afraid, but at least I can conclusively say that 3.16 is
 good, and 3.17 is bad.
 Can you try more specifically around the first KVM pull request?  That
 would be between c9b88e958182 (presumed good) and 8533ce727188 (presumed
 bad)?
 
 
 G'day Paolo.
 
 I can confirm that the fault appears to lie between good and bad as
 specified above.
 Bad failed before 48 hours, good ran for 143 hours. I'm bisecting now.
 
 Thanks!  Remember to bisect only with arch/x86/kvm.
 
 Also:
 
 1) Brad, I see you are on AMD.  Have you ever reproduced it on Intel?
 Saso, are you on AMD as well?
 
 If so, the most likely culprit is this:
 
 commit 6addfc42992be4b073c39137ecfdf4b2aa2d487f
 Author: Paolo Bonzini pbonz...@redhat.com
 Date:   Thu Mar 27 11:29:28 2014 +0100
 
KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
 
Despite the provisions to emulate up to 130 consecutive instructions, in
practice KVM will emulate just one before exiting 
 handle_invalid_guest_state,
because x86_emulate_instruction always sets KVM_REQ_EVENT.
 
However, we only need to do this if an interrupt could be injected,
which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
away; b) if the interrupt flag has just been set (other instructions
than STI can set it without enabling an interrupt shadow).
 
This cuts another 700-900 cycles from the cost of emulating an
instruction (measured on a Sandy Bridge Xeon: 1650-2600 cycles
before the patch on kvm-unit-tests, 925-1700 afterwards).
 
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 
 I would first try this one, and see if it is bad.
 
 Radim, do you think this could cause a missed interrupt injection
 after Windows does a TPR write?
 
 2) For bisection feel free to git bisect skip the following:
 
 03916db9348c079d8d214f971cc114bb51c6b869 Replace NR_VMX_MSR with its 
 definition
 9a2a05b9ed618b1bb6d4cbec0c2e1f80d6636609 KVM: nVMX: clean up 
 nested_release_vmcs12 and code around it
 4fa7734c62cdd8c07edd54fa5a5e91482273071a KVM: nVMX: fix lifetime issues for 
 vmcs02
 c9cdd085bb75226879fd468b88e2e7eb467325b7 KVM: x86: Defining missing x86 
 vectors
 0123be429fef40f067e5b1811576c3994229f59e KVM: x86: Assertions to check no 
 overrun in MSR lists
 296f047502f1b3ddfd63adbc192624ce80740081 KVM: vmx: remove duplicate 
 vmx_mpx_supported() prototype
 963fee1656603ce2e91ebb988cd5a92f2af41369 KVM: nVMX: Fix virtual interrupt 
 delivery injection
 6cbc5f5a80a9ae5a80bc81efc574b5a85bfd4a84 KVM: nSVM: Set correct port for IOIO 
 interception evaluation
 6493f1574e898b46370e2b2315836d76a1980f2c KVM: nSVM: Fix IOIO size reported on 
 emulation
 9bf418335e24da995ea682a028926d7e1036be6f KVM: nSVM: Fix IOIO bitmap evaluation
 62baf44cad3bc6b37115cc21e4228fe53d4f3474 KVM: nSVM: Do not report CLTS via 
 SVM_EXIT_WRITE_CR0 to L1
 5381417f6a51293e7b8af1eb18aefa5d47976a71 KVM: nVMX: Fix returned value of 
 MSR_IA32_VMX_VMCS_ENUM
 2996fca0690f03a5220203588f4a0d8c5acba2b0 KVM: nVMX: Allow to disable 
 VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS
 560b7ee12ca5e1ebc1675d7eb4008bb22708277a KVM: nVMX: Fix returned value of 
 MSR_IA32_VMX_PROCBASED_CTLS
 3dcdf3ec6e48d918741ea11349d4436d0c5aac93 KVM: nVMX: Allow to disable CR3 
 access interception
 3dbcd8da7b564194f93271b003a1c46ef404cbdb KVM: nVMX: Advertise support for 
 MSR_IA32_VMX_TRUE_*_CTLS
 bc39c4db7110f88f338cbbabe53d3e43c7400a59 arch/x86/kvm/vmx.c: use PAGE_ALIGNED 
 instead of IS_ALIGNED(PAGE_SIZE
 e4aa5288ff07766d101751de9a8420d666c61735 KVM: x86: Fix constant value of 
 VM_{EXIT_SAVE,ENTRY_LOAD}_DEBUG_CONTROLS
 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b x86/kvm: Resolve shadow warnings in 
 macro expansion
 b55a8144d1807f9e74c51cb584f0dd198483d86c x86/kvm: Resolve shadow warning from 
 min macro
 98eff52ab5c0ff5cb96940a93e99a1aeb2f11c89 KVM: x86: Fix lapic.c debug prints
 9f6226a762c7ae02f6a23a3d4fc552dafa57ea23 arch: x86: kvm: x86.c: Cleaning up 
 variable is set more than once
 80112c89ed872c725e7dc39ccf6c37d1a585e161 KVM: Synthesize G bit for all 
 segments.
 27e6fb5dae2819d17f38dc9224692b771e989981 KVM: vmx: vmx instructions handling 
 does not consider cs.l
 bdc907222c5e4edd848da0c031deb55b59f1cf9a KVM: emulate: fix harmless typo in 
 MMX decoding
 10e38fc7cabc668738e6a7b7b57cbcddb2234440 KVM: x86: Emulator flag for 
 instruction that only support 16-bit addresses in real mode
 

[Qemu-devel][PATCH 1/2] target-i386: disable LINT0 after reset

2015-04-12 Thread Nadav Amit
Due to old Seabios bug, QEMU reenable LINT0 after reset. This bug is long gone
and therefore this hack is no longer needed.  Since it violates the
specifications, it is removed.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 hw/intc/apic_common.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 042e960..d38d24b 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -243,15 +243,6 @@ static void apic_reset_common(DeviceState *dev)
 info-vapic_base_update(s);
 
 apic_init_reset(dev);
-
-if (bsp) {
-/*
- * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization
- * time typically by BIOS, so PIC interrupt can be delivered to the
- * processor when local APIC is enabled.
- */
-s-lvt[APIC_LVT_LINT0] = 0x700;
-}
 }
 
 /* This function is only used for old state version 1 and 2 */
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Qemu-devel][PATCH 0/2] target-i386: disable LINT0 after reset and init

2015-04-12 Thread Nadav Amit
LINT0 is currently reenabled after reset to circumvent old seabios bug, which
violates x86 specifications.  This patch-set handles this issue, by removing
the old hack from qemu and reporting to kvm that this quirk is no longer
needed.  In addition, we disable another kvm quirk that clears CD and NW from
CR0 after reset.

Thanks for reviewing these patches.

Nadav Amit (2):
  target-i386: disable LINT0 after reset
  target-i386: kvm: Disable KVM quirks

 hw/intc/apic_common.c   | 9 -
 linux-headers/asm-x86/kvm.h | 4 
 linux-headers/linux/kvm.h   | 1 +
 target-i386/kvm.c   | 8 
 4 files changed, 13 insertions(+), 9 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Qemu-devel][PATCH 2/2] target-i386: kvm: Disable KVM quirks

2015-04-12 Thread Nadav Amit
KVM has quirks to overcome legacy QEMU bugs that are already resolved.  Using a
new KVM feature for disabling these quirks.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 linux-headers/asm-x86/kvm.h | 4 
 linux-headers/linux/kvm.h   | 1 +
 target-i386/kvm.c   | 8 
 3 files changed, 13 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index d7dcef5..f8fbb4a 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -345,4 +345,8 @@ struct kvm_xcrs {
 struct kvm_sync_regs {
 };
 
+/* KVM legacy quirks for use with KVM_CAP_DISABLE_QUIRKS */
+#define KVM_QUIRK_LINT0_DISABLED   (1  0)
+#define KVM_QUIRK_CD_NW_CLEARED(1  1)
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 60a54c8..757e869 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -760,6 +760,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ENABLE_HCALL 104
 #define KVM_CAP_CHECK_EXTENSION_VM 105
 #define KVM_CAP_S390_USER_SIGP 106
+#define KVM_CAP_DISABLE_QUIRKS 115
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 41d09e5..3b28122 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -898,6 +898,14 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 return ret;
 }
 }
+if (kvm_check_extension(s, KVM_CAP_ENABLE_CAP_VM)) {
+ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS, 0,
+KVM_QUIRK_LINT0_DISABLED |
+KVM_QUIRK_CD_NW_CLEARED);
+if (ret  0) {
+return ret;
+}
+}
 return 0;
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Fix MSR_IA32_BNDCFGS in msrs_to_save

2015-04-12 Thread Nadav Amit
kvm_init_msr_list is currently called before hardware_setup. As a result,
vmx_mpx_supported always returns false when kvm_init_msr_list checks whether to
save MSR_IA32_BNDCFGS.

Move kvm_init_msr_list after vmx_hardware_setup is called to fix this issue.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il

---

Note that this patch was not tested on a system that supports MPX.
---
 arch/x86/kvm/x86.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8cb1d0..b7e3b05 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5800,7 +5800,6 @@ int kvm_arch_init(void *opaque)
kvm_set_mmio_spte_mask();
 
kvm_x86_ops = ops;
-   kvm_init_msr_list();
 
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0);
@@ -7254,7 +7253,14 @@ void kvm_arch_hardware_disable(void)
 
 int kvm_arch_hardware_setup(void)
 {
-   return kvm_x86_ops-hardware_setup();
+   int r;
+
+   r = kvm_x86_ops-hardware_setup();
+   if (r != 0)
+   return r;
+
+   kvm_init_msr_list();
+   return 0;
 }
 
 void kvm_arch_hardware_unsetup(void)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Question regarding the reset value of LINT0

2015-04-12 Thread Nadav Amit
Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 09/04/2015 21:17, Bandan Das wrote:
 Excluding (1) all of the other issues are related to the VM BIOS. Perhaps
 KVM should somehow realize which VM BIOS runs? (yes, it sounds just as bad.)
 
 How about renaming the toggle Avi mentioned above to something more generic
 (KVM_DISABLE_LEGACY_QUIRKS ?) and grouping all the issues together ? Modern 
 userspace
 will always enable it and get the new correct behavior. When more cases are 
 discovered,
 KVM can just add them to the list.
 
 It can be a VM capability (KVM_FIRMWARE_QUIRKS?) that is enabled via
 KVM_ENABLE_CAP.
 
 The first argument in struct kvm_enable_cap can be used to add more
 quirks in the future.  For now, an argument of zero could be used to:
 
 1) set up LINT0 correctly
 
 2) set up CD and NW correctly in svm_set_cr0
 
 AFAIK the MTRR issue in SeaBIOS was not fixed.  For that, QEMU could
 write to MSR_MTRRcap with KVM_SET_MSR.
 
 Setting the accessed bit in vmx_set_segment is IMHO harmless, and we
 might as well do it even if !enable_unrestricted_guest.

Ok. I’ll send an updated version (as well as updated version of the
init/reset issues) in the next few days.

Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Support for disabling quirks

2015-04-12 Thread Nadav Amit
Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previous
created in order to overcome QEMU issues. Those issue were mostly result of
invalid VM BIOS.  Currently there are two quirks that can be disabled:

1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot
2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot

These two issues are already resolved in recent releases of QEMU, and would
therefore be disabled by QEMU.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 Documentation/virtual/kvm/api.txt |  3 ++-
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/include/uapi/asm/kvm.h   |  3 +++
 arch/x86/kvm/lapic.c  |  5 +++--
 arch/x86/kvm/svm.c|  3 ++-
 arch/x86/kvm/x86.c| 29 +
 include/uapi/linux/kvm.h  |  1 +
 7 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index bc9f6fe..3931221 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -959,7 +959,8 @@ documentation when it pops into existence).
 4.37 KVM_ENABLE_CAP
 
 Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
-Architectures: ppc, s390
+Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
+  mips (only KVM_CAP_ENABLE_CAP), ppc, s390
 Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
 Parameters: struct kvm_enable_cap (in)
 Returns: 0 on success; -1 on error
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dea2e7e..f80ad59 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,8 @@ struct kvm_arch {
#endif
 
bool boot_vcpu_runs_old_kvmclock;
+
+   u64 disabled_quirks;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d7dcef5..2fec75e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -345,4 +345,7 @@ struct kvm_xcrs {
 struct kvm_sync_regs {
 };
 
+#define KVM_QUIRK_LINT0_REENABLED  (1  0)
+#define KVM_QUIRK_CD_NW_CLEARED(1  1)
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4a6e58a..fe2d89e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1577,8 +1577,9 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
for (i = 0; i  APIC_LVT_NUM; i++)
apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
apic-lapic_timer.timer_mode = 0;
-   apic_set_reg(apic, APIC_LVT0,
-SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
+   if (!(vcpu-kvm-arch.disabled_quirks  KVM_QUIRK_LINT0_REENABLED))
+   apic_set_reg(apic, APIC_LVT0,
+SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 
apic_set_reg(apic, APIC_DFR, 0xU);
apic_set_spiv(apic, 0xff);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ce741b8..46299da 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1575,7 +1575,8 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned 
long cr0)
 * does not do it - this results in some delay at
 * reboot
 */
-   cr0 = ~(X86_CR0_CD | X86_CR0_NW);
+   if (!(vcpu-kvm-arch.disabled_quirks  KVM_QUIRK_CD_NW_CLEARED))
+   cr0 = ~(X86_CR0_CD | X86_CR0_NW);
svm-vmcb-save.cr0 = cr0;
mark_dirty(svm-vmcb, VMCB_CR);
update_cr0_intercept(svm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8cb1d0..c3859a6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2778,6 +2778,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
ext)
case KVM_CAP_HYPERV_TIME:
case KVM_CAP_IOAPIC_POLARITY_IGNORED:
case KVM_CAP_TSC_DEADLINE_TIMER:
+   case KVM_CAP_ENABLE_CAP_VM:
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_PCI_2_3:
@@ -3825,6 +3826,26 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
kvm_irq_level *irq_event,
return 0;
 }
 
+static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
+  struct kvm_enable_cap *cap)
+{
+   int r;
+
+   if (cap-flags)
+   return -EINVAL;
+
+   switch (cap-cap) {
+   case KVM_CAP_DISABLE_QUIRKS:
+   kvm-arch.disabled_quirks = cap-args[0];
+   r = 0;
+   break;
+   default:
+   r = -EINVAL;
+   break;
+   }
+   return r;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -4077,7 +4098,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+   case KVM_ENABLE_CAP: {
+   struct kvm_enable_cap cap;
 
+   r = -EFAULT;
+   if (copy_from_user(cap, argp

Re: x86: Question regarding the reset value of LINT0

2015-04-09 Thread Nadav Amit
Avi Kivity avi.kiv...@gmail.com wrote:

 On 04/09/2015 09:21 PM, Nadav Amit wrote:
 Bandan Das b...@redhat.com wrote:
 
 Nadav Amit nadav.a...@gmail.com writes:
 
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 19:40, Nadav Amit wrote:
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 18:59, Nadav Amit wrote:
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 18:40, Nadav Amit wrote:
 Hi,
 
 I would appreciate if someone explains the reason for enabling LINT0 
 during
 APIC reset. This does not correspond with Intel SDM Figure 10-8: 
 “Local
 Vector Table” that says all LVT registers are reset to 0x1.
 
 In kvm_lapic_reset, I see:
 
  apic_set_reg(apic, APIC_LVT0,
  SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 
 Which is actually pretty similar to QEMU’s apic_reset_common:
 
 if (bsp) {
 /*
  * LINT0 delivery mode on CPU #0 is set to ExtInt at 
 initialization
  * time typically by BIOS, so PIC interrupt can be delivered to 
 the
  * processor when local APIC is enabled.
  */
 s-lvt[APIC_LVT_LINT0] = 0x700;
 }
 
 Yet, in both cases, I miss the point - if it is typically done by 
 the BIOS,
 why does QEMU or KVM enable it?
 
 BTW: KVM seems to run fine without it, and I think setting it causes 
 me
 problems in certain cases.
 I suspect it has some historic BIOS backgrounds. Already tried to find
 more information in the git logs of both code bases? Or something that
 indicates of SeaBIOS or BochsBIOS once didn't do this initialization?
 Thanks. I found no indication of such thing.
 
 QEMU’s commit message (0e21e12bb311c4c1095d0269dc2ef81196ccb60a) says:
 
  Don't route PIC interrupts through the local APIC if the local APIC
  config says so. By Ari Kivity.
 
 Maybe Avi Kivity knows this guy.
 ths? That should have been Thiemo Seufer (IIRC), but he just committed
 the code back then (and is no longer with us, sadly).
 Oh… I am sorry - I didn’t know about that.. (I tried to make an unfunny 
 joke
 about Avi knowing “Ari”).
 Ah. No problem. My brain apparently fixed that typo up unnoticed.
 
 But if that commit went in without any BIOS changes around it, QEMU
 simply had to do the job of the latter to keep things working.
 So should I leave it as is? Can I at least disable in KVM during INIT 
 (and
 leave it as is for RESET)?
 No, I don't think there is a need to leave this inaccurate for QEMU if
 our included BIOS gets it right. I don't know what the backward
 bug-compatibility of KVM is, though. Maybe you can identify since when
 our BIOS is fine so that we can discuss time frames.
 I think that it was addressed in commit
 19c1a7692bf65fc40e56f93ad00cc3eefaad22a4 (Initialize the LINT LVTs on the
 local APIC of the BSP.”) So it should be included in seabios 0.5.0, which
 means qemu 0.12 - so we are talking about the end of 2009 or start of 2010.
 The probability that someone will use a newer version of kernel with 
 something
 as old as 0.12 is probably minimal. I think it's ok to change it with a 
 comment
 indicating the reason. To be on the safe side, however, a user changeable 
 switch
 is something worth considering.
 I don’t see any existing mechanism for KVM to be aware of its user type and
 version. I do see another case of KVM hacks that are intended for fixing
 very old QEMU bugs (see 3a624e29c75 changes in vmx_set_segment, which are
 from pretty much the same time-frame of the issue I try to fix).
 
 Since this is something which would follow around, please advise what would
 be the format. A new ioctl that would supply the userspace “type” (according
 to predefined constants) and version?
 
 That would be madness. KVM shouldn't even know that qemu exists, let alone
 track its versions.
 
 Simply add a new toggle KVM_USE_STANDARD_LAPIC_LVT_INIT and document that
 userspace MUST use it. Old userspace won't, and will get the old buggy
 behavior.

I fully agree it would be madness. Yet it appears to be a recurring problem.
Here are similar problems  found from a short search:

1. vmx_set_segment setting segment accessed (3a624e29c75)
2. svm_set_cr0 clearing CD and NW (709ddebf81c)
3. Limited number of MTRRs due to Seabios bus (0d234daf7e0a)

Excluding (1) all of the other issues are related to the VM BIOS. Perhaps
KVM should somehow realize which VM BIOS runs? (yes, it sounds just as bad.)

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Question regarding the reset value of LINT0

2015-04-09 Thread Nadav Amit
Bandan Das b...@redhat.com wrote:

 Nadav Amit nadav.a...@gmail.com writes:
 
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 19:40, Nadav Amit wrote:
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 18:59, Nadav Amit wrote:
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 18:40, Nadav Amit wrote:
 Hi,
 
 I would appreciate if someone explains the reason for enabling LINT0 
 during
 APIC reset. This does not correspond with Intel SDM Figure 10-8: “Local
 Vector Table” that says all LVT registers are reset to 0x1.
 
 In kvm_lapic_reset, I see:
 
apic_set_reg(apic, APIC_LVT0,
SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 
 Which is actually pretty similar to QEMU’s apic_reset_common:
 
 if (bsp) {
 /*
  * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization
  * time typically by BIOS, so PIC interrupt can be delivered to the
  * processor when local APIC is enabled.
  */
 s-lvt[APIC_LVT_LINT0] = 0x700;
 }
 
 Yet, in both cases, I miss the point - if it is typically done by the 
 BIOS,
 why does QEMU or KVM enable it?
 
 BTW: KVM seems to run fine without it, and I think setting it causes me
 problems in certain cases.
 
 I suspect it has some historic BIOS backgrounds. Already tried to find
 more information in the git logs of both code bases? Or something that
 indicates of SeaBIOS or BochsBIOS once didn't do this initialization?
 Thanks. I found no indication of such thing.
 
 QEMU’s commit message (0e21e12bb311c4c1095d0269dc2ef81196ccb60a) says:
 
  Don't route PIC interrupts through the local APIC if the local APIC
  config says so. By Ari Kivity.
 
 Maybe Avi Kivity knows this guy.
 
 ths? That should have been Thiemo Seufer (IIRC), but he just committed
 the code back then (and is no longer with us, sadly).
 Oh… I am sorry - I didn’t know about that.. (I tried to make an unfunny 
 joke
 about Avi knowing “Ari”).
 
 Ah. No problem. My brain apparently fixed that typo up unnoticed.
 
 But if that commit went in without any BIOS changes around it, QEMU
 simply had to do the job of the latter to keep things working.
 So should I leave it as is? Can I at least disable in KVM during INIT (and
 leave it as is for RESET)?
 
 No, I don't think there is a need to leave this inaccurate for QEMU if
 our included BIOS gets it right. I don't know what the backward
 bug-compatibility of KVM is, though. Maybe you can identify since when
 our BIOS is fine so that we can discuss time frames.
 
 I think that it was addressed in commit
 19c1a7692bf65fc40e56f93ad00cc3eefaad22a4 (Initialize the LINT LVTs on the
 local APIC of the BSP.”) So it should be included in seabios 0.5.0, which
 means qemu 0.12 - so we are talking about the end of 2009 or start of 2010.
 
 The probability that someone will use a newer version of kernel with something
 as old as 0.12 is probably minimal. I think it's ok to change it with a 
 comment
 indicating the reason. To be on the safe side, however, a user changeable 
 switch
 is something worth considering.

I don’t see any existing mechanism for KVM to be aware of its user type and
version. I do see another case of KVM hacks that are intended for fixing
very old QEMU bugs (see 3a624e29c75 changes in vmx_set_segment, which are
from pretty much the same time-frame of the issue I try to fix).

Since this is something which would follow around, please advise what would
be the format. A new ioctl that would supply the userspace “type” (according
to predefined constants) and version?

Thanks,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: fix x86 eflags fixed bit

2015-04-08 Thread Nadav Amit
Sorry for that - fixes 0efb04406de834d820f7ba150a00d1d3194aa8a6 (KVM: x86:
removing redundant eflags bits definitions”).

Nadav

Wanpeng Li wanpeng...@linux.intel.com wrote:

 Guest can't be booted w/ ept=0, there is a message dumped as below:
 
 If you're running a guest on an Intel machine without unrestricted mode
 support, the failure can be most likely due to the guest entering an invalid
 state for Intel VT. For example, the guest maybe running in big real mode
 which is not supported on less recent Intel processors.
 
 EAX=0011 EBX=f000d2f6 ECX=6cac EDX=000f8956
 ESI=bffbdf62 EDI= EBP=6c68 ESP=6c68
 EIP=d187 EFL=0004 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =e000 000e  00809300 DPL=0 DS16 [-WA]
 CS =f000 000f  00809b00 DPL=0 CS16 [-RA]
 SS =   00809300 DPL=0 DS16 [-WA]
 DS =   00809300 DPL=0 DS16 [-WA]
 FS =   00809300 DPL=0 DS16 [-WA]
 GS =   00809300 DPL=0 DS16 [-WA]
 LDT=   8200 DPL=0 LDT
 TR =   8b00 DPL=0 TSS32-busy
 GDT= 000f6a80 0037
 IDT= 000f6abe 
 CR0=0011 CR2= CR3= CR4=
 DR0= DR1= DR2= 
 DR3= 
 DR6=0ff0 DR7=0400
 EFER=
 Code=01 1e b8 6a 2e 0f 01 16 74 6a 0f 20 c0 66 83 c8 01 0f 22 c0 66 ea 8f 
 d1 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89 c8 ff e2 89 c1 
 b8X
 
 X86 eflags bit 1 is fixed set, which means that 1  1 is set instead of 1, 
 this patch fix it.
 
 Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
 ---
 arch/x86/kvm/emulate.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index b304728..630bcb0 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2033,7 +2033,7 @@ static int emulate_iret_real(struct x86_emulate_ctxt 
 *ctxt)
X86_EFLAGS_IF | X86_EFLAGS_DF | X86_EFLAGS_OF |
X86_EFLAGS_IOPL | X86_EFLAGS_NT | X86_EFLAGS_RF |
X86_EFLAGS_AC | X86_EFLAGS_ID |
 -  X86_EFLAGS_FIXED_BIT;
 +  X86_EFLAGS_FIXED;
   unsigned long vm86_mask = X86_EFLAGS_VM | X86_EFLAGS_VIF |
 X86_EFLAGS_VIP;
 
 @@ -2072,7 +2072,7 @@ static int emulate_iret_real(struct x86_emulate_ctxt 
 *ctxt)
   }
 
   ctxt-eflags = ~EFLG_RESERVED_ZEROS_MASK; /* Clear reserved zeros */
 - ctxt-eflags |= X86_EFLAGS_FIXED_BIT;
 + ctxt-eflags |= X86_EFLAGS_FIXED;
   ctxt-ops-set_nmi_mask(ctxt, false);
 
   return rc;
 @@ -2390,7 +2390,7 @@ static int em_syscall(struct x86_emulate_ctxt *ctxt)
 
   ops-get_msr(ctxt, MSR_SYSCALL_MASK, msr_data);
   ctxt-eflags = ~msr_data;
 - ctxt-eflags |= X86_EFLAGS_FIXED_BIT;
 + ctxt-eflags |= X86_EFLAGS_FIXED;
 #endif
   } else {
   /* legacy mode */
 -- 
 1.7.1
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


x86: Question regarding the reset value of LINT0

2015-04-08 Thread Nadav Amit
Hi,

I would appreciate if someone explains the reason for enabling LINT0 during
APIC reset. This does not correspond with Intel SDM Figure 10-8: “Local
Vector Table” that says all LVT registers are reset to 0x1.

In kvm_lapic_reset, I see:

apic_set_reg(apic, APIC_LVT0,
SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));

Which is actually pretty similar to QEMU’s apic_reset_common:

if (bsp) {
/*
 * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization
 * time typically by BIOS, so PIC interrupt can be delivered to the
 * processor when local APIC is enabled.
 */
s-lvt[APIC_LVT_LINT0] = 0x700;
}

Yet, in both cases, I miss the point - if it is typically done by the BIOS,
why does QEMU or KVM enable it?

BTW: KVM seems to run fine without it, and I think setting it causes me
problems in certain cases.

Thanks,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Question regarding the reset value of LINT0

2015-04-08 Thread Nadav Amit
Jan Kiszka jan.kis...@siemens.com wrote:

 On 2015-04-08 18:40, Nadav Amit wrote:
 Hi,
 
 I would appreciate if someone explains the reason for enabling LINT0 during
 APIC reset. This does not correspond with Intel SDM Figure 10-8: “Local
 Vector Table” that says all LVT registers are reset to 0x1.
 
 In kvm_lapic_reset, I see:
 
  apic_set_reg(apic, APIC_LVT0,
  SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 
 Which is actually pretty similar to QEMU’s apic_reset_common:
 
if (bsp) {
/*
 * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization
 * time typically by BIOS, so PIC interrupt can be delivered to the
 * processor when local APIC is enabled.
 */
s-lvt[APIC_LVT_LINT0] = 0x700;
}
 
 Yet, in both cases, I miss the point - if it is typically done by the BIOS,
 why does QEMU or KVM enable it?
 
 BTW: KVM seems to run fine without it, and I think setting it causes me
 problems in certain cases.
 
 I suspect it has some historic BIOS backgrounds. Already tried to find
 more information in the git logs of both code bases? Or something that
 indicates of SeaBIOS or BochsBIOS once didn't do this initialization?
Thanks. I found no indication of such thing.

QEMU’s commit message (0e21e12bb311c4c1095d0269dc2ef81196ccb60a) says:

Don't route PIC interrupts through the local APIC if the local APIC
config says so. By Ari Kivity.

Maybe Avi Kivity knows this guy.

Regards,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Question regarding the reset value of LINT0

2015-04-08 Thread Nadav Amit
Jan Kiszka jan.kis...@siemens.com wrote:

 On 2015-04-08 18:59, Nadav Amit wrote:
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 18:40, Nadav Amit wrote:
 Hi,
 
 I would appreciate if someone explains the reason for enabling LINT0 during
 APIC reset. This does not correspond with Intel SDM Figure 10-8: “Local
 Vector Table” that says all LVT registers are reset to 0x1.
 
 In kvm_lapic_reset, I see:
 
apic_set_reg(apic, APIC_LVT0,
SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 
 Which is actually pretty similar to QEMU’s apic_reset_common:
 
   if (bsp) {
   /*
* LINT0 delivery mode on CPU #0 is set to ExtInt at initialization
* time typically by BIOS, so PIC interrupt can be delivered to the
* processor when local APIC is enabled.
*/
   s-lvt[APIC_LVT_LINT0] = 0x700;
   }
 
 Yet, in both cases, I miss the point - if it is typically done by the BIOS,
 why does QEMU or KVM enable it?
 
 BTW: KVM seems to run fine without it, and I think setting it causes me
 problems in certain cases.
 
 I suspect it has some historic BIOS backgrounds. Already tried to find
 more information in the git logs of both code bases? Or something that
 indicates of SeaBIOS or BochsBIOS once didn't do this initialization?
 Thanks. I found no indication of such thing.
 
 QEMU’s commit message (0e21e12bb311c4c1095d0269dc2ef81196ccb60a) says:
 
Don't route PIC interrupts through the local APIC if the local APIC
config says so. By Ari Kivity.
 
 Maybe Avi Kivity knows this guy.
 
 ths? That should have been Thiemo Seufer (IIRC), but he just committed
 the code back then (and is no longer with us, sadly).
Oh… I am sorry - I didn’t know about that.. (I tried to make an unfunny joke
about Avi knowing “Ari”).

 But if that commit went in without any BIOS changes around it, QEMU
 simply had to do the job of the latter to keep things working.
So should I leave it as is? Can I at least disable in KVM during INIT (and
leave it as is for RESET)?

Thanks,
Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Question regarding the reset value of LINT0

2015-04-08 Thread Nadav Amit
Jan Kiszka jan.kis...@siemens.com wrote:

 On 2015-04-08 19:40, Nadav Amit wrote:
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 18:59, Nadav Amit wrote:
 Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2015-04-08 18:40, Nadav Amit wrote:
 Hi,
 
 I would appreciate if someone explains the reason for enabling LINT0 
 during
 APIC reset. This does not correspond with Intel SDM Figure 10-8: “Local
 Vector Table” that says all LVT registers are reset to 0x1.
 
 In kvm_lapic_reset, I see:
 
  apic_set_reg(apic, APIC_LVT0,
  SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 
 Which is actually pretty similar to QEMU’s apic_reset_common:
 
  if (bsp) {
  /*
   * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization
   * time typically by BIOS, so PIC interrupt can be delivered to the
   * processor when local APIC is enabled.
   */
  s-lvt[APIC_LVT_LINT0] = 0x700;
  }
 
 Yet, in both cases, I miss the point - if it is typically done by the 
 BIOS,
 why does QEMU or KVM enable it?
 
 BTW: KVM seems to run fine without it, and I think setting it causes me
 problems in certain cases.
 
 I suspect it has some historic BIOS backgrounds. Already tried to find
 more information in the git logs of both code bases? Or something that
 indicates of SeaBIOS or BochsBIOS once didn't do this initialization?
 Thanks. I found no indication of such thing.
 
 QEMU’s commit message (0e21e12bb311c4c1095d0269dc2ef81196ccb60a) says:
 
   Don't route PIC interrupts through the local APIC if the local APIC
   config says so. By Ari Kivity.
 
 Maybe Avi Kivity knows this guy.
 
 ths? That should have been Thiemo Seufer (IIRC), but he just committed
 the code back then (and is no longer with us, sadly).
 Oh… I am sorry - I didn’t know about that.. (I tried to make an unfunny joke
 about Avi knowing “Ari”).
 
 Ah. No problem. My brain apparently fixed that typo up unnoticed.
 
 But if that commit went in without any BIOS changes around it, QEMU
 simply had to do the job of the latter to keep things working.
 So should I leave it as is? Can I at least disable in KVM during INIT (and
 leave it as is for RESET)?
 
 No, I don't think there is a need to leave this inaccurate for QEMU if
 our included BIOS gets it right. I don't know what the backward
 bug-compatibility of KVM is, though. Maybe you can identify since when
 our BIOS is fine so that we can discuss time frames.

I think that it was addressed in commit
19c1a7692bf65fc40e56f93ad00cc3eefaad22a4 (Initialize the LINT LVTs on the
local APIC of the BSP.”) So it should be included in seabios 0.5.0, which
means qemu 0.12 - so we are talking about the end of 2009 or start of 2010.

What is the verdict?

Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 3/4] x86: check register values after reset

2015-04-01 Thread Nadav Amit
Checking the reset values of dr0, cr2 and sysenter_eip after reset.  Since INIT
does not initialize certain registers, we keep for each state whether it is
INIT or not. It is also required for the next patch.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 x86/init.c | 55 +++
 1 file changed, 55 insertions(+)

diff --git a/x86/init.c b/x86/init.c
index 717bd12..3694d67 100644
--- a/x86/init.c
+++ b/x86/init.c
@@ -1,6 +1,8 @@
 #include libcflat.h
 #include apic.h
 #include io.h
+#include processor.h
+#include msr.h
 
 #define CHECK_HARD_RESET   0
 
@@ -36,9 +38,51 @@ static inline void rtc_out(u8 reg, u8 val)
 
 extern char resume_start, resume_end;
 
+struct expected_state {
+   bool init;
+};
+
+static struct expected_state expected[] = {
+   {.init = false},
+   {.init = false},
+   {.init = false},
+   {.init = false},
+   {.init = true},
+};
+
 #define state (*(volatile int *)0x2000)
 #define bad (*(volatile int *)0x2004)
 #define resumed (*(volatile int *)0x2008)
+#define dr0 (*(volatile int*)0x200c)
+#define cr2 (*(volatile int*)0x2010)
+#define sysenter_eip (*(volatile int*)0x2014)
+
+static void set_test_regs(void)
+{
+   write_dr0(1);
+   write_cr2(1);
+   wrmsr(MSR_IA32_SYSENTER_EIP, 1);
+}
+
+static bool check_test_regs(bool init)
+{
+   bool error = false;
+
+   if (dr0 != 0) {
+   printf(error: dr0 was not initialized: %x\n, dr0);
+   error = true;
+   }
+   if (cr2 != 0) {
+   printf(error: cr2 was not initialized\n, cr2);
+   error = true;
+   }
+   if (init  sysenter_eip != 1) {
+   printf(wrong sysenter_eip msr: %x\n, sysenter_eip);
+   error = true;
+   }
+
+   return error;
+}
 
 int main(int argc, char **argv)
 {
@@ -63,6 +107,9 @@ int main(int argc, char **argv)
bad |= 2;
}
 
+   if (check_test_regs(expected[state-1].init))
+   bad |= 4;
+
 #if CHECK_HARD_RESET
/*
 * Port 92 bit 0 is cleared on system reset.  On a soft reset it
@@ -76,6 +123,7 @@ int main(int argc, char **argv)
 #endif
}
 
+   set_test_regs();
resumed = 0;
 
switch (state++) {
@@ -122,6 +170,13 @@ asm (
.code16\n
resume_start:\n
incb %cs:0x2008\n // resumed++;
+   mov %dr0, %ecx\n
+   mov %ecx, %cs:0x200c\n// dr0
+   mov %cr2, %ecx\n
+   mov %ecx, %cs:0x2010\n// cr2
+   mov $0x176, %ecx\n
+   rdmsr\n
+   mov %eax, %cs:0x2014\n// sysenter_eip
mov $0x0f, %al\n  // rtc_out(0x0f, 0x00);
out %al, $0x70\n
mov $0x00, %al\n
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 4/4] x86: Test BSP change

2015-04-01 Thread Nadav Amit
Testing whether changing the BSP, followed by INIT works correctly. The APIC ID
is saved after resume and compared. For this test to pass, a fix of QEMU is
needed in addition to a KVM fix.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 x86/init.c| 77 ++-
 x86/unittests.cfg |  1 +
 2 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/x86/init.c b/x86/init.c
index 3694d67..2d7ea99 100644
--- a/x86/init.c
+++ b/x86/init.c
@@ -3,6 +3,7 @@
 #include io.h
 #include processor.h
 #include msr.h
+#include smp.h
 
 #define CHECK_HARD_RESET   0
 
@@ -36,18 +37,30 @@ static inline void rtc_out(u8 reg, u8 val)
 outb(val, 0x71);
 }
 
+static inline void send_init(void)
+{
+   char *reset_vec = (void*)0xfff0;
+
+   reset_vec[0] = 0xeb;
+   reset_vec[1] = 0xfe;
+   apic_icr_write(APIC_DEST_ALLINC | APIC_DEST_PHYSICAL
+ | APIC_DM_INIT, 0);
+}
+
 extern char resume_start, resume_end;
 
 struct expected_state {
bool init;
+   unsigned int bsp;
 };
 
 static struct expected_state expected[] = {
-   {.init = false},
-   {.init = false},
-   {.init = false},
-   {.init = false},
-   {.init = true},
+   {.init = false, .bsp = 0},
+   {.init = false, .bsp = 0},
+   {.init = false, .bsp = 0},
+   {.init = false, .bsp = 0},
+   {.init = true, .bsp = 0},
+   {.init = true, .bsp = 1},
 };
 
 #define state (*(volatile int *)0x2000)
@@ -56,6 +69,7 @@ static struct expected_state expected[] = {
 #define dr0 (*(volatile int*)0x200c)
 #define cr2 (*(volatile int*)0x2010)
 #define sysenter_eip (*(volatile int*)0x2014)
+#define boot_apic_id (*(volatile int *)0x2018)
 
 static void set_test_regs(void)
 {
@@ -84,6 +98,39 @@ static bool check_test_regs(bool init)
return error;
 }
 
+static void change_bsp(bool bsp)
+{
+   unsigned long apicbase = rdmsr(MSR_IA32_APICBASE);
+
+   if (bsp)
+   apicbase |= MSR_IA32_APICBASE_BSP;
+   else
+   apicbase = ~MSR_IA32_APICBASE_BSP;
+wrmsr(MSR_IA32_APICBASE, apicbase);
+}
+
+static void set_bsp(void *data)
+{
+   set_test_regs();
+   change_bsp(true);
+}
+
+static bool change_bsp_test(void)
+{
+   int ncpus;
+
+   smp_init();
+   ncpus = cpu_count();
+   if (ncpus != 2) {
+   printf(Skipping BSP test\n);
+   return false;
+   }
+   on_cpu(1, set_bsp, NULL);
+   change_bsp(false);
+   return true;
+}
+
+
 int main(int argc, char **argv)
 {
volatile u16 *resume_vector_ptr = (u16 *)0x467L;
@@ -110,6 +157,12 @@ int main(int argc, char **argv)
if (check_test_regs(expected[state-1].init))
bad |= 4;
 
+   if (expected[state-1].bsp != boot_apic_id) {
+   printf(error: wrong BSP: %x expected: %x\n,
+   boot_apic_id, expected[state-1].bsp);
+   bad |= 8;
+   }
+
 #if CHECK_HARD_RESET
/*
 * Port 92 bit 0 is cleared on system reset.  On a soft reset it
@@ -151,11 +204,17 @@ int main(int argc, char **argv)
 
case 4:
printf(testing init to BSP... );
-   apic_icr_write(APIC_DEST_SELF | APIC_DEST_PHYSICAL
- | APIC_DM_INIT, 0);
+   send_init();
break;
 
case 5:
+   printf(testing init to BSP (2nd core)... );
+   if (change_bsp_test()) {
+   send_init();
+   break;
+   }
+   state++;/* fall through */
+   case 6:
exit(bad);
}
 
@@ -177,6 +236,10 @@ asm (
mov $0x176, %ecx\n
rdmsr\n
mov %eax, %cs:0x2014\n// sysenter_eip
+   mov $1, %eax\n
+   cpuid\n
+   shrl $24, %ebx\n
+   mov %ebx, %cs:0x2018\n// apic_id
mov $0x0f, %al\n  // rtc_out(0x0f, 0x00);
out %al, $0x70\n
mov $0x00, %al\n
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index d7ddd9d..2d25801 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -85,6 +85,7 @@ arch = x86_64
 
 [init]
 file = init.flat
+smp = 2
 
 [msr]
 file = msr.flat
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 1/4] x86: move some common functions to processor.h

2015-04-01 Thread Nadav Amit
Moving some code for reading/writing drs to processor.h. The names of the
functions are changed to be consistent with others.  While we are at it, fixing
lldt so it would work.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 lib/x86/processor.h | 19 ++-
 x86/debug.c | 36 +++-
 2 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index 7973879..47f8d2c 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -232,7 +232,7 @@ static inline void sidt(struct descriptor_table_ptr *ptr)
 
 static inline void lldt(unsigned val)
 {
-asm volatile (lldt %0 : : rm(val));
+asm volatile (lldt %0 : : m(val));
 }
 
 static inline u16 sldt(void)
@@ -254,6 +254,23 @@ static inline u16 str(void)
 return val;
 }
 
+static inline ulong read_dr0(void)
+{
+ulong val;
+asm volatile (mov %%dr0, %0 : =r(val));
+return val;
+}
+
+static inline void write_dr0(ulong val)
+{
+   asm volatile(mov %0,%%dr0 : : r (val));
+}
+
+static inline void write_dr1(ulong val)
+{
+   asm volatile(mov %0,%%dr1 : : r (val));
+}
+
 static inline void write_dr6(ulong val)
 {
 asm volatile (mov %0, %%dr6 : : r(val) : memory);
diff --git a/x86/debug.c b/x86/debug.c
index 34e56fb..68cac20 100644
--- a/x86/debug.c
+++ b/x86/debug.c
@@ -11,45 +11,23 @@
 
 #include libcflat.h
 #include desc.h
+#include processor.h
 
 static volatile unsigned long bp_addr[10], dr6[10];
 static volatile unsigned int n;
 static volatile unsigned long value;
 
-static unsigned long get_dr6(void)
-{
-   unsigned long value;
-
-   asm volatile(mov %%dr6,%0 : =r (value));
-   return value;
-}
-
-static void set_dr0(void *value)
-{
-   asm volatile(mov %0,%%dr0 : : r (value));
-}
-
-static void set_dr1(void *value)
-{
-   asm volatile(mov %0,%%dr1 : : r (value));
-}
-
-static void set_dr7(unsigned long value)
-{
-   asm volatile(mov %0,%%dr7 : : r (value));
-}
-
 static void handle_db(struct ex_regs *regs)
 {
bp_addr[n] = regs-rip;
-   dr6[n] = get_dr6();
+   dr6[n] = read_dr6();
 
if (dr6[n]  0x1)
regs-rflags |= (1  16);
 
if (++n = 10) {
regs-rflags = ~(1  8);
-   set_dr7(0x0400);
+   write_dr7(0x0400);
}
 }
 
@@ -70,8 +48,8 @@ sw_bp:
asm volatile(int3);
report(#BP, bp_addr[0] == (unsigned long)sw_bp + 1);
 
-   set_dr0(hw_bp);
-   set_dr7(0x0402);
+   write_dr0((ulong)hw_bp);
+   write_dr7(0x0402);
 hw_bp:
asm volatile(nop);
report(hw breakpoint,
@@ -97,8 +75,8 @@ hw_bp:
   bp_addr[2] == start+1+6+1+1  dr6[2] == 0x4ff0);
 
n = 0;
-   set_dr1((void *)value);
-   set_dr7(0x00d0040a);
+   write_dr1((ulong)value);
+   write_dr7(0x00d0040a);
 
asm volatile(
mov $42,%%rax\n\t
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 0/4] Init tests for reset reg checks

2015-04-01 Thread Nadav Amit
Reenabling the init tests in order to check whether the registers are set
correctly after reset, and to check whether the BSP can be changed from the VM.

Thanks for reviewing the patches.

Nadav Amit (4):
  x86: move some common functions to processor.h
  x86: Reenable init tests
  x86: check register values after reset
  x86: Test BSP change

 lib/x86/processor.h |  19 +++-
 x86/debug.c |  36 +++
 x86/init.c  | 128 +++-
 x86/unittests.cfg   |   5 +-
 4 files changed, 154 insertions(+), 34 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 2/4] x86: Reenable init tests

2015-04-01 Thread Nadav Amit
Reenabling init tests. Disabling the tests of port 0x92 since it currently
fails.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 x86/init.c| 6 ++
 x86/unittests.cfg | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/x86/init.c b/x86/init.c
index 344dc1c..717bd12 100644
--- a/x86/init.c
+++ b/x86/init.c
@@ -2,6 +2,8 @@
 #include apic.h
 #include io.h
 
+#define CHECK_HARD_RESET   0
+
 #define KBD_CCMD_READ_OUTPORT   0xD0/* read output port */
 #define KBD_CCMD_WRITE_OUTPORT  0xD1/* write output port */
 #define KBD_CCMD_RESET  0xFE/* CPU reset */
@@ -60,14 +62,18 @@ int main(int argc, char **argv)
printf(Uh, resume vector visited %d times?\n, 
resumed);
bad |= 2;
}
+
+#if CHECK_HARD_RESET
/*
 * Port 92 bit 0 is cleared on system reset.  On a soft reset it
 * is left to 1.  Use this to distinguish INIT from hard reset.
+* This test is currently disabled since it currently fails.
 */
if (resumed != 0  (inb(0x92)  1) == 0) {
printf(Uh, hard reset!\n);
bad |= 1;
}
+#endif
}
 
resumed = 0;
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index badb08a..d7ddd9d 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -83,8 +83,8 @@ file = hypercall.flat
 file = idt_test.flat
 arch = x86_64
 
-#[init]
-#file = init.flat
+[init]
+file = init.flat
 
 [msr]
 file = msr.flat
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/4] KVM: x86: INIT and reset sequences are different

2015-04-01 Thread Nadav Amit
x86 architecture defines differences between the reset and INIT sequences.
INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP.

EFER is supposed NOT to be reset according to the SDM, but leaving the LMA and
LME untouched causes failed VM-entry.  Therefore we reset EFER (although it is
unclear whether the rest of EFER bits should be reset).

References (from Intel SDM):

If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
to a specific processor or system wide) do not cause the MP protocol to be
repeated. [8.4.2: MP Initialization Protocol Requirements and Restrictions]

[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]

If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
changed. [9.2: X87 FPU INITIALIZATION]

The state of the local APIC following an INIT reset is the same as it is after
a power-up or hardware reset, except that the APIC ID and arbitration ID
registers are not affected. [10.4.7.3: Local APIC State After an INIT Reset
(“Wait-for-SIPI” State)]

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  6 +++---
 arch/x86/kvm/lapic.c| 11 ++-
 arch/x86/kvm/lapic.h|  2 +-
 arch/x86/kvm/svm.c  |  2 +-
 arch/x86/kvm/vmx.c  | 30 +-
 arch/x86/kvm/x86.c  | 17 ++---
 6 files changed, 38 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bf5a160..59f4374 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -701,7 +701,7 @@ struct kvm_x86_ops {
/* Create, but do not attach this VCPU */
struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
void (*vcpu_free)(struct kvm_vcpu *vcpu);
-   void (*vcpu_reset)(struct kvm_vcpu *vcpu);
+   void (*vcpu_reset)(struct kvm_vcpu *vcpu, bool init_event);
 
void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
@@ -989,7 +989,7 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int 
irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-int fx_init(struct kvm_vcpu *vcpu);
+int fx_init(struct kvm_vcpu *vcpu, bool init_event);
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes);
@@ -1134,7 +1134,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
-void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
   unsigned long address);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bd4e34d..17da6fc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1534,7 +1534,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 
 }
 
-void kvm_lapic_reset(struct kvm_vcpu *vcpu)
+void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
struct kvm_lapic *apic;
int i;
@@ -1548,7 +1548,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
/* Stop the timer in case it's a reset to an active apic */
hrtimer_cancel(apic-lapic_timer.timer);
 
-   kvm_apic_set_id(apic, vcpu-vcpu_id);
+   if (!init_event)
+   kvm_apic_set_id(apic, vcpu-vcpu_id);
kvm_apic_set_version(apic-vcpu);
 
for (i = 0; i  APIC_LVT_NUM; i++)
@@ -1689,7 +1690,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
 
static_key_slow_inc(apic_sw_disabled.key); /* sw disabled at reset */
-   kvm_lapic_reset(vcpu);
+   kvm_lapic_reset(vcpu, false);
kvm_iodevice_init(apic-dev, apic_mmio_ops);
 
return 0;
@@ -2023,8 +2024,8 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
pe = xchg(apic-pending_events, 0);
 
if (test_bit(KVM_APIC_INIT, pe)) {
-   kvm_lapic_reset(vcpu);
-   kvm_vcpu_reset(vcpu);
+   kvm_lapic_reset(vcpu, true);
+   kvm_vcpu_reset(vcpu, true);
if (kvm_vcpu_is_bsp(apic-vcpu))
vcpu-arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0bc6c65..e4c82dc 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -48,7 +48,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
 void

[PATCH v2 2/4] KVM: x86: BSP in MSR_IA32_APICBASE is writable

2015-04-01 Thread Nadav Amit
After reset, the CPU can change the BSP, which will be used upon INIT.  Reset
should return the BSP which QEMU asked for, and therefore handled accordingly.

To quote: If the MP protocol has completed and a BSP is chosen, subsequent
INITs (either to a specific processor or system wide) do not cause the MP
protocol to be repeated.
[Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions]

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/lapic.c | 2 --
 arch/x86/kvm/svm.c   | 2 +-
 arch/x86/kvm/vmx.c   | 2 +-
 arch/x86/kvm/x86.c   | 2 +-
 include/linux/kvm_host.h | 7 ++-
 5 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 17da6fc..b0dbf68 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1498,8 +1498,6 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
return;
}
 
-   if (!kvm_vcpu_is_bsp(apic-vcpu))
-   value = ~MSR_IA32_APICBASE_BSP;
vcpu-arch.apic_base = value;
 
/* update jump label if enable bit changes */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1ef4c0d..ef5bf21 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1261,7 +1261,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
 
svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
   MSR_IA32_APICBASE_ENABLE;
-   if (kvm_vcpu_is_bsp(svm-vcpu))
+   if (kvm_vcpu_is_reset_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
svm_init_osvw(svm-vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e9c94c6..5925c65 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4709,7 +4709,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
if (!init_event) {
apic_base_msr.data = APIC_DEFAULT_PHYS_BASE |
 MSR_IA32_APICBASE_ENABLE;
-   if (kvm_vcpu_is_bsp(vmx-vcpu))
+   if (kvm_vcpu_is_reset_bsp(vmx-vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
kvm_set_apic_base(vmx-vcpu, apic_base_msr);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 324e639..c93b1b5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7264,7 +7264,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
vcpu-arch.pv.pv_unhalted = false;
vcpu-arch.emulate_ctxt.ops = emulate_ops;
-   if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
+   if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_reset_bsp(vcpu))
vcpu-arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
vcpu-arch.mp_state = KVM_MP_STATE_UNINITIALIZED;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0f574eb..8365cae 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -968,11 +968,16 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct 
kvm_ioeventfd *args)
 #endif /* CONFIG_HAVE_KVM_EVENTFD */
 
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
-static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
+static inline bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
 {
return vcpu-kvm-bsp_vcpu_id == vcpu-vcpu_id;
 }
 
+static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
+{
+   return (vcpu-arch.apic_base  MSR_IA32_APICBASE_BSP) != 0;
+}
+
 bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu);
 
 #else
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/4] KVM: x86: Reset fixes

2015-04-01 Thread Nadav Amit
This set includes 2 previous patches that deal with the INIT flow that is not
distinguished from regular boot, and allowing the VM to change BSP (which is
used in very certain testing environments). The next 2 patches are new, dealing
with regression that cause DR0-DR3 not to be reset (even when QEMU initiates
the RESET) and CR2 not cleared after INIT.

The second patch regarding BSP requires an additional fix for QEMU, as
otherwise reset fails. A separate patch was submitted to QEMU mailing-list.

Thanks for reviewing the patches.

Nadav Amit (4):
  KVM: x86: INIT and reset sequences are different
  KVM: x86: BSP in MSR_IA32_APICBASE is writable
  KVM: x86: DR0-DR3 are not clear on reset
  KVM: x86: Clear CR2 on VCPU reset

 arch/x86/include/asm/kvm_host.h |  7 ---
 arch/x86/kvm/lapic.c| 13 ++---
 arch/x86/kvm/lapic.h|  2 +-
 arch/x86/kvm/svm.c  |  4 ++--
 arch/x86/kvm/vmx.c  | 30 +-
 arch/x86/kvm/x86.c  | 35 +++
 include/linux/kvm_host.h|  7 ++-
 7 files changed, 63 insertions(+), 35 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/4] KVM: x86: DR0-DR3 are not clear on reset

2015-04-01 Thread Nadav Amit
DR0-DR3 are not cleared as they should during reset and when they are set from
userspace.  It appears to be caused by c77fb5fe6f03 (KVM: x86: Allow the guest
to run with dirty debug registers).

Force their reload on these situations.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c  | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 59f4374..495fcb6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -345,6 +345,7 @@ struct kvm_pmu {
 enum {
KVM_DEBUGREG_BP_ENABLED = 1,
KVM_DEBUGREG_WONT_EXIT = 2,
+   KVM_DEBUGREG_RELOAD = 4,
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c93b1b5..e4ac17e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -801,6 +801,17 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
+static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+   if (!(vcpu-guest_debug  KVM_GUESTDBG_USE_HW_BP)) {
+   for (i = 0; i  KVM_NR_DB_REGS; i++)
+   vcpu-arch.eff_db[i] = vcpu-arch.db[i];
+   vcpu-arch.switch_db_regs |= KVM_DEBUGREG_RELOAD;
+   }
+}
+
 static void kvm_update_dr6(struct kvm_vcpu *vcpu)
 {
if (!(vcpu-guest_debug  KVM_GUESTDBG_USE_HW_BP))
@@ -3150,6 +3161,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct 
kvm_vcpu *vcpu,
return -EINVAL;
 
memcpy(vcpu-arch.db, dbgregs-db, sizeof(vcpu-arch.db));
+   kvm_update_dr0123(vcpu);
vcpu-arch.dr6 = dbgregs-dr6;
kvm_update_dr6(vcpu);
vcpu-arch.dr7 = dbgregs-dr7;
@@ -6322,6 +6334,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(vcpu-arch.eff_db[2], 2);
set_debugreg(vcpu-arch.eff_db[3], 3);
set_debugreg(vcpu-arch.dr6, 6);
+   vcpu-arch.switch_db_regs = ~KVM_DEBUGREG_RELOAD;
}
 
trace_kvm_entry(vcpu-vcpu_id);
@@ -7098,6 +7111,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
kvm_clear_exception_queue(vcpu);
 
memset(vcpu-arch.db, 0, sizeof(vcpu-arch.db));
+   kvm_update_dr0123(vcpu);
vcpu-arch.dr6 = DR6_INIT;
kvm_update_dr6(vcpu);
vcpu-arch.dr7 = DR7_FIXED_1;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/4] KVM: x86: Clear CR2 on VCPU reset

2015-04-01 Thread Nadav Amit
CR2 is not cleared as it should after reset.  See Intel SDM table named IA-32
Processor States Following Power-up, Reset, or INIT.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e4ac17e..8fdad04 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7117,6 +7117,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
vcpu-arch.dr7 = DR7_FIXED_1;
kvm_update_dr7(vcpu);
 
+   vcpu-arch.cr2 = 0;
+
kvm_make_request(KVM_REQ_EVENT, vcpu);
vcpu-arch.apf.msr_val = 0;
vcpu-arch.st.msr_val = 0;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] KVM: x86: INIT and reset sequences are different

2015-03-30 Thread Nadav Amit
x86 architecture defines differences between the reset and INIT sequences.
INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP.

References (from Intel SDM):

If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
to a specific processor or system wide) do not cause the MP protocol to be
repeated. [8.4.2: MP Initialization Protocol Requirements and Restrictions]

[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]

If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
changed. [9.2: X87 FPU INITIALIZATION]

The state of the local APIC following an INIT reset is the same as it is after
a power-up or hardware reset, except that the APIC ID and arbitration ID
registers are not affected. [10.4.7.3: Local APIC State After an INIT Reset
(“Wait-for-SIPI” State)]

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  6 +++---
 arch/x86/kvm/lapic.c| 11 ++-
 arch/x86/kvm/lapic.h|  2 +-
 arch/x86/kvm/svm.c  |  2 +-
 arch/x86/kvm/vmx.c  | 33 +++--
 arch/x86/kvm/x86.c  | 17 ++---
 6 files changed, 40 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bf5a160..59f4374 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -701,7 +701,7 @@ struct kvm_x86_ops {
/* Create, but do not attach this VCPU */
struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
void (*vcpu_free)(struct kvm_vcpu *vcpu);
-   void (*vcpu_reset)(struct kvm_vcpu *vcpu);
+   void (*vcpu_reset)(struct kvm_vcpu *vcpu, bool init_event);
 
void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
@@ -989,7 +989,7 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int 
irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-int fx_init(struct kvm_vcpu *vcpu);
+int fx_init(struct kvm_vcpu *vcpu, bool init_event);
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes);
@@ -1134,7 +1134,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
-void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
   unsigned long address);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bd4e34d..17da6fc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1534,7 +1534,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 
 }
 
-void kvm_lapic_reset(struct kvm_vcpu *vcpu)
+void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
struct kvm_lapic *apic;
int i;
@@ -1548,7 +1548,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
/* Stop the timer in case it's a reset to an active apic */
hrtimer_cancel(apic-lapic_timer.timer);
 
-   kvm_apic_set_id(apic, vcpu-vcpu_id);
+   if (!init_event)
+   kvm_apic_set_id(apic, vcpu-vcpu_id);
kvm_apic_set_version(apic-vcpu);
 
for (i = 0; i  APIC_LVT_NUM; i++)
@@ -1689,7 +1690,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
 
static_key_slow_inc(apic_sw_disabled.key); /* sw disabled at reset */
-   kvm_lapic_reset(vcpu);
+   kvm_lapic_reset(vcpu, false);
kvm_iodevice_init(apic-dev, apic_mmio_ops);
 
return 0;
@@ -2023,8 +2024,8 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
pe = xchg(apic-pending_events, 0);
 
if (test_bit(KVM_APIC_INIT, pe)) {
-   kvm_lapic_reset(vcpu);
-   kvm_vcpu_reset(vcpu);
+   kvm_lapic_reset(vcpu, true);
+   kvm_vcpu_reset(vcpu, true);
if (kvm_vcpu_is_bsp(apic-vcpu))
vcpu-arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0bc6c65..e4c82dc 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -48,7 +48,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
 void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
-void kvm_lapic_reset(struct kvm_vcpu *vcpu);
+void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void

[PATCH 5/5] KVM: x86: BSP in MSR_IA32_APICBASE is writable

2015-03-30 Thread Nadav Amit
After reset, the CPU can change the BSP, which will be used upon INIT.  Reset
should return the BSP which QEMU asked for, and therefore handled accordingly.

To quote: If the MP protocol has completed and a BSP is chosen, subsequent
INITs (either to a specific processor or system wide) do not cause the MP
protocol to be repeated.
[Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions]

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/lapic.c | 2 --
 arch/x86/kvm/svm.c   | 2 +-
 arch/x86/kvm/vmx.c   | 2 +-
 include/linux/kvm_host.h | 7 ++-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 17da6fc..b0dbf68 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1498,8 +1498,6 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
return;
}
 
-   if (!kvm_vcpu_is_bsp(apic-vcpu))
-   value = ~MSR_IA32_APICBASE_BSP;
vcpu-arch.apic_base = value;
 
/* update jump label if enable bit changes */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1ef4c0d..ef5bf21 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1261,7 +1261,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
 
svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
   MSR_IA32_APICBASE_ENABLE;
-   if (kvm_vcpu_is_bsp(svm-vcpu))
+   if (kvm_vcpu_is_reset_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
svm_init_osvw(svm-vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8aee6db..7e370b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4709,7 +4709,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
if (!init_event) {
apic_base_msr.data = APIC_DEFAULT_PHYS_BASE |
 MSR_IA32_APICBASE_ENABLE;
-   if (kvm_vcpu_is_bsp(vmx-vcpu))
+   if (kvm_vcpu_is_reset_bsp(vmx-vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
kvm_set_apic_base(vmx-vcpu, apic_base_msr);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0f574eb..8365cae 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -968,11 +968,16 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct 
kvm_ioeventfd *args)
 #endif /* CONFIG_HAVE_KVM_EVENTFD */
 
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
-static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
+static inline bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
 {
return vcpu-kvm-bsp_vcpu_id == vcpu-vcpu_id;
 }
 
+static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
+{
+   return (vcpu-arch.apic_base  MSR_IA32_APICBASE_BSP) != 0;
+}
+
 bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu);
 
 #else
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] KVM: x86: CMOV emulation on legacy mode is wrong

2015-03-30 Thread Nadav Amit
On legacy mode CMOV emulation should still clear bits [63:32] even if the
assignment is not done. The previous fix 140bad89fd (KVM: x86: emulation of
dword cmov on long-mode should clear [63:32]) was incomplete.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c941abe..62f7a39 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5126,8 +5126,7 @@ twobyte_insn:
case 0x40 ... 0x4f: /* cmov */
if (test_cc(ctxt-b, ctxt-eflags))
ctxt-dst.val = ctxt-src.val;
-   else if (ctxt-mode != X86EMUL_MODE_PROT64 ||
-ctxt-op_bytes != 4)
+   else if (ctxt-op_bytes != 4)
ctxt-dst.type = OP_NONE; /* no writeback */
break;
case 0x80 ... 0x8f: /* jnz rel, etc*/
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] KVM: x86: BSF and BSR emulation change register unnecassarily

2015-03-30 Thread Nadav Amit
If the source of BSF and BSR is zero, the destination register should not
change. That is how real hardware behaves.  If we set the destination even with
the same value that we had before, we may clear bits [63:32] unnecassarily.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4961dc5..7004577 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -962,6 +962,22 @@ FASTOP2(xadd);
 
 FASTOP2R(cmp, cmp_r);
 
+static int em_bsf_c(struct x86_emulate_ctxt *ctxt)
+{
+   /* If src is zero, do not writeback, but update flags */
+   if (ctxt-src.val == 0)
+   ctxt-dst.type = OP_NONE;
+   return fastop(ctxt, em_bsf);
+}
+
+static int em_bsr_c(struct x86_emulate_ctxt *ctxt)
+{
+   /* If src is zero, do not writeback, but update flags */
+   if (ctxt-src.val == 0)
+   ctxt-dst.type = OP_NONE;
+   return fastop(ctxt, em_bsr);
+}
+
 static u8 test_cc(unsigned int condition, unsigned long flags)
 {
u8 rc;
@@ -4188,7 +4204,8 @@ static const struct opcode twobyte_table[256] = {
N, N,
G(BitOp, group8),
F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_btc),
-   F(DstReg | SrcMem | ModRM, em_bsf), F(DstReg | SrcMem | ModRM, em_bsr),
+   I(DstReg | SrcMem | ModRM, em_bsf_c),
+   I(DstReg | SrcMem | ModRM, em_bsr_c),
D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
/* 0xC0 - 0xC7 */
F2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd),
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] KVM: x86: 64/32 bit fixes and INIT/BSP fixes

2015-03-30 Thread Nadav Amit
This patch-set handles 2 issues. Patches 1-3 deal with some more cases in which
bits [63:32] are not cleared when using dword opsize.  Patches 4-5 handle
anomalies with INIT/BSP (INIT does not behave exactly as reset).

Thanks for reviewing the patches.

Nadav Amit (5):
  KVM: x86: CMOV emulation on legacy mode is wrong
  KVM: x86: POPA emulation may not clear bits [63:32]
  KVM: x86: BSF and BSR emulation change register unnecassarily
  KVM: x86: INIT and reset sequences are different
  KVM: x86: BSP in MSR_IA32_APICBASE is writable

 arch/x86/include/asm/kvm_host.h |  6 ++--
 arch/x86/kvm/emulate.c  | 61 -
 arch/x86/kvm/lapic.c| 13 -
 arch/x86/kvm/lapic.h|  2 +-
 arch/x86/kvm/svm.c  |  4 +--
 arch/x86/kvm/vmx.c  | 33 --
 arch/x86/kvm/x86.c  | 17 +++-
 include/linux/kvm_host.h|  7 -
 8 files changed, 89 insertions(+), 54 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] KVM: x86: POPA emulation may not clear bits [63:32]

2015-03-30 Thread Nadav Amit
POPA should assign the values to the registers as usual registers are assigned.
In other words, 32-bits register assignments should clear bits [63:32] of the
register.

Split the code of register assignments that will be used by future changes as
well.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 39 +++
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 62f7a39..4961dc5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -478,6 +478,25 @@ static void assign_masked(ulong *dest, ulong src, ulong 
mask)
*dest = (*dest  ~mask) | (src  mask);
 }
 
+static void assign_register(unsigned long *reg, u64 val, int bytes)
+{
+   /* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
+   switch (bytes) {
+   case 1:
+   *(u8 *)reg = (u8)val;
+   break;
+   case 2:
+   *(u16 *)reg = (u16)val;
+   break;
+   case 4:
+   *reg = (u32)val;
+   break;  /* 64b: zero-extend */
+   case 8:
+   *reg = val;
+   break;
+   }
+}
+
 static inline unsigned long ad_mask(struct x86_emulate_ctxt *ctxt)
 {
return (1UL  (ctxt-ad_bytes  3)) - 1;
@@ -1691,21 +1710,7 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
 
 static void write_register_operand(struct operand *op)
 {
-   /* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
-   switch (op-bytes) {
-   case 1:
-   *(u8 *)op-addr.reg = (u8)op-val;
-   break;
-   case 2:
-   *(u16 *)op-addr.reg = (u16)op-val;
-   break;
-   case 4:
-   *op-addr.reg = (u32)op-val;
-   break;  /* 64b: zero-extend */
-   case 8:
-   *op-addr.reg = op-val;
-   break;
-   }
+   return assign_register(op-addr.reg, op-val, op-bytes);
 }
 
 static int writeback(struct x86_emulate_ctxt *ctxt, struct operand *op)
@@ -1926,6 +1931,7 @@ static int em_popa(struct x86_emulate_ctxt *ctxt)
 {
int rc = X86EMUL_CONTINUE;
int reg = VCPU_REGS_RDI;
+   u32 val;
 
while (reg = VCPU_REGS_RAX) {
if (reg == VCPU_REGS_RSP) {
@@ -1933,9 +1939,10 @@ static int em_popa(struct x86_emulate_ctxt *ctxt)
--reg;
}
 
-   rc = emulate_pop(ctxt, reg_rmw(ctxt, reg), ctxt-op_bytes);
+   rc = emulate_pop(ctxt, val, ctxt-op_bytes);
if (rc != X86EMUL_CONTINUE)
break;
+   assign_register(reg_rmw(ctxt, reg), val, ctxt-op_bytes);
--reg;
}
return rc;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: BSP in MSR_IA32_APICBASE is writable

2015-03-30 Thread Nadav Amit
Paolo,

It appears you are right and I have not tested 4 and 5 well enough. I’ll
repost them (the others were tested presumably well enough).

Two short questions:
Can I use init.c in the kvm-unit-tests ? 
Why is it disabled?

Nadav


Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 30/03/2015 14:39, Nadav Amit wrote:
 After reset, the CPU can change the BSP, which will be used upon INIT.  Reset
 should return the BSP which QEMU asked for, and therefore handled 
 accordingly.
 
 To quote: If the MP protocol has completed and a BSP is chosen, subsequent
 INITs (either to a specific processor or system wide) do not cause the MP
 protocol to be repeated.
 [Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions]
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 
 Please provide a kvm-unit-tests testcase for this.
 
 Paolo
 
 ---
 arch/x86/kvm/lapic.c | 2 --
 arch/x86/kvm/svm.c   | 2 +-
 arch/x86/kvm/vmx.c   | 2 +-
 include/linux/kvm_host.h | 7 ++-
 4 files changed, 8 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 17da6fc..b0dbf68 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1498,8 +1498,6 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 
 value)
  return;
  }
 
 -if (!kvm_vcpu_is_bsp(apic-vcpu))
 -value = ~MSR_IA32_APICBASE_BSP;
  vcpu-arch.apic_base = value;
 
  /* update jump label if enable bit changes */
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 1ef4c0d..ef5bf21 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1261,7 +1261,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm 
 *kvm, unsigned int id)
 
  svm-vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
 MSR_IA32_APICBASE_ENABLE;
 -if (kvm_vcpu_is_bsp(svm-vcpu))
 +if (kvm_vcpu_is_reset_bsp(svm-vcpu))
  svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
  svm_init_osvw(svm-vcpu);
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 8aee6db..7e370b2 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -4709,7 +4709,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
 init_event)
  if (!init_event) {
  apic_base_msr.data = APIC_DEFAULT_PHYS_BASE |
   MSR_IA32_APICBASE_ENABLE;
 -if (kvm_vcpu_is_bsp(vmx-vcpu))
 +if (kvm_vcpu_is_reset_bsp(vmx-vcpu))
  apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
  apic_base_msr.host_initiated = true;
  kvm_set_apic_base(vmx-vcpu, apic_base_msr);
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 0f574eb..8365cae 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -968,11 +968,16 @@ static inline int kvm_ioeventfd(struct kvm *kvm, 
 struct kvm_ioeventfd *args)
 #endif /* CONFIG_HAVE_KVM_EVENTFD */
 
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
 -static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
 +static inline bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
 {
  return vcpu-kvm-bsp_vcpu_id == vcpu-vcpu_id;
 }
 
 +static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
 +{
 +return (vcpu-arch.apic_base  MSR_IA32_APICBASE_BSP) != 0;
 +}
 +
 bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu);
 
 #else
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: BSP in MSR_IA32_APICBASE is writable

2015-03-30 Thread Nadav Amit
Paolo Bonzini pbonz...@redhat.com wrote:

 On 30/03/2015 16:40, Nadav Amit wrote:
 Paolo,
 
 It appears you are right and I have not tested 4 and 5 well enough. I’ll
 repost them (the others were tested presumably well enough).
 
 Two short questions:
 Can I use init.c in the kvm-unit-tests ? 
 Why is it disabled?
 
 Because QEMU support for INIT is incomplete, so the tests would fail
 (Uh, hard reset!).  IIRC sending init to BSP with APIC_DEST_SELF is
 also not supported by actual hardware (or at least not supported
 officially) so that test would also have to be changed to not use a
 shortcut.

So, I would revive the init unit-test and disable the failing assertions,
right?

BTW: It appears that there is another bug - DR[0..3] are not reloaded after
reset.

Regards,
Nadav

-- 8 --

From: Nadav Amit na...@cs.technion.ac.il
Subject: [PATCH] KVM: x86: DR0-DR3 are not clear on reset

DR0-DR3 are not cleared as they should during reset and when they are set from
userspace.  It appears to be caused by c77fb5fe6f03 (KVM: x86: Allow the guest
to run with dirty debug registers).

Force their reload on these situations.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c  | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bf5a160..913ae41 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -345,6 +345,7 @@ struct kvm_pmu {
 enum {
KVM_DEBUGREG_BP_ENABLED = 1,
KVM_DEBUGREG_WONT_EXIT = 2,
+   KVM_DEBUGREG_RELOAD = 4,
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc2c759..1f65c3a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -801,6 +801,17 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
+static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+   if (!(vcpu-guest_debug  KVM_GUESTDBG_USE_HW_BP)) {
+   for (i = 0; i  KVM_NR_DB_REGS; i++)
+   vcpu-arch.eff_db[i] = vcpu-arch.db[i];
+   vcpu-arch.switch_db_regs |= KVM_DEBUGREG_RELOAD;
+   }
+}
+
 static void kvm_update_dr6(struct kvm_vcpu *vcpu)
 {
if (!(vcpu-guest_debug  KVM_GUESTDBG_USE_HW_BP))
@@ -3150,6 +3161,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct 
kvm_vcpu *vcpu,
return -EINVAL;
 
memcpy(vcpu-arch.db, dbgregs-db, sizeof(vcpu-arch.db));
+   kvm_update_dr0123(vcpu);
vcpu-arch.dr6 = dbgregs-dr6;
kvm_update_dr6(vcpu);
vcpu-arch.dr7 = dbgregs-dr7;
@@ -6322,6 +6334,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(vcpu-arch.eff_db[2], 2);
set_debugreg(vcpu-arch.eff_db[3], 3);
set_debugreg(vcpu-arch.dr6, 6);
+   vcpu-arch.switch_db_regs = ~KVM_DEBUGREG_RELOAD;
}
 
trace_kvm_entry(vcpu-vcpu_id);
@@ -7096,6 +7109,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_clear_exception_queue(vcpu);
 
memset(vcpu-arch.db, 0, sizeof(vcpu-arch.db));
+   kvm_update_dr0123(vcpu);
vcpu-arch.dr6 = DR6_INIT;
kvm_update_dr6(vcpu);
vcpu-arch.dr7 = DR7_FIXED_1;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: x86: Remove redundant definitions

2015-03-29 Thread Nadav Amit
Some constants are redfined in emulate.c. Avoid it.

s/SELECTOR_RPL_MASK/SEGMENT_RPL_MASK
s/SELECTOR_TI_MASK/SEGMENT_TI_MASK

No functional change.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  3 ---
 arch/x86/kvm/emulate.c  |  6 +++---
 arch/x86/kvm/vmx.c  | 18 +-
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7ba3d9d..30b28dc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -81,9 +81,6 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, 
int level)
(base_gfn  KVM_HPAGE_GFN_SHIFT(level));
 }
 
-#define SELECTOR_TI_MASK (1  2)
-#define SELECTOR_RPL_MASK 0x03
-
 #define KVM_PERMILLE_MMU_PAGES 20
 #define KVM_MIN_ALLOC_MMU_PAGES 64
 #define KVM_MMU_HASH_SHIFT 10
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9a578a1..a48bcd7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2412,7 +2412,7 @@ static int em_sysenter(struct x86_emulate_ctxt *ctxt)
return emulate_gp(ctxt, 0);
 
ctxt-eflags = ~(X86_EFLAGS_VM | X86_EFLAGS_IF);
-   cs_sel = (u16)msr_data  ~SELECTOR_RPL_MASK;
+   cs_sel = (u16)msr_data  ~SEGMENT_RPL_MASK;
ss_sel = cs_sel + 8;
if (efer  EFER_LMA) {
cs.d = 0;
@@ -2479,8 +2479,8 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
return emulate_gp(ctxt, 0);
break;
}
-   cs_sel |= SELECTOR_RPL_MASK;
-   ss_sel |= SELECTOR_RPL_MASK;
+   cs_sel |= SEGMENT_RPL_MASK;
+   ss_sel |= SEGMENT_RPL_MASK;
 
ops-set_segment(ctxt, cs_sel, cs, 0, VCPU_SREG_CS);
ops-set_segment(ctxt, ss_sel, ss, 0, VCPU_SREG_SS);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fdd9f8b..63ca692 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3263,8 +3263,8 @@ static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg,
 * default value.
 */
if (seg == VCPU_SREG_CS || seg == VCPU_SREG_SS)
-   save-selector = ~SELECTOR_RPL_MASK;
-   save-dpl = save-selector  SELECTOR_RPL_MASK;
+   save-selector = ~SEGMENT_RPL_MASK;
+   save-dpl = save-selector  SEGMENT_RPL_MASK;
save-s = 1;
}
vmx_set_segment(vcpu, save, seg);
@@ -3837,7 +3837,7 @@ static bool code_segment_valid(struct kvm_vcpu *vcpu)
unsigned int cs_rpl;
 
vmx_get_segment(vcpu, cs, VCPU_SREG_CS);
-   cs_rpl = cs.selector  SELECTOR_RPL_MASK;
+   cs_rpl = cs.selector  SEGMENT_RPL_MASK;
 
if (cs.unusable)
return false;
@@ -3865,7 +3865,7 @@ static bool stack_segment_valid(struct kvm_vcpu *vcpu)
unsigned int ss_rpl;
 
vmx_get_segment(vcpu, ss, VCPU_SREG_SS);
-   ss_rpl = ss.selector  SELECTOR_RPL_MASK;
+   ss_rpl = ss.selector  SEGMENT_RPL_MASK;
 
if (ss.unusable)
return true;
@@ -3887,7 +3887,7 @@ static bool data_segment_valid(struct kvm_vcpu *vcpu, int 
seg)
unsigned int rpl;
 
vmx_get_segment(vcpu, var, seg);
-   rpl = var.selector  SELECTOR_RPL_MASK;
+   rpl = var.selector  SEGMENT_RPL_MASK;
 
if (var.unusable)
return true;
@@ -3914,7 +3914,7 @@ static bool tr_valid(struct kvm_vcpu *vcpu)
 
if (tr.unusable)
return false;
-   if (tr.selector  SELECTOR_TI_MASK) /* TI = 1 */
+   if (tr.selector  SEGMENT_TI_MASK)  /* TI = 1 */
return false;
if (tr.type != 3  tr.type != 11) /* TODO: Check if guest is in IA32e 
mode */
return false;
@@ -3932,7 +3932,7 @@ static bool ldtr_valid(struct kvm_vcpu *vcpu)
 
if (ldtr.unusable)
return true;
-   if (ldtr.selector  SELECTOR_TI_MASK)   /* TI = 1 */
+   if (ldtr.selector  SEGMENT_TI_MASK)/* TI = 1 */
return false;
if (ldtr.type != 2)
return false;
@@ -3949,8 +3949,8 @@ static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu)
vmx_get_segment(vcpu, cs, VCPU_SREG_CS);
vmx_get_segment(vcpu, ss, VCPU_SREG_SS);
 
-   return ((cs.selector  SELECTOR_RPL_MASK) ==
-(ss.selector  SELECTOR_RPL_MASK));
+   return ((cs.selector  SEGMENT_RPL_MASK) ==
+(ss.selector  SEGMENT_RPL_MASK));
 }
 
 /*
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: x86: removing redundant eflags bits definitions

2015-03-29 Thread Nadav Amit
The eflags are redefined (using other defines) in emulate.c.
Use the definition from processor-flags.h as some mess already started.
No functional change.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |   2 -
 arch/x86/kvm/emulate.c  | 105 ++--
 2 files changed, 46 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bf5a160..7ba3d9d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -84,8 +84,6 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, 
int level)
 #define SELECTOR_TI_MASK (1  2)
 #define SELECTOR_RPL_MASK 0x03
 
-#define IOPL_SHIFT 12
-
 #define KVM_PERMILLE_MMU_PAGES 20
 #define KVM_MIN_ALLOC_MMU_PAGES 64
 #define KVM_MMU_HASH_SHIFT 10
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c941abe..9a578a1 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -248,27 +248,7 @@ struct mode_dual {
struct opcode mode64;
 };
 
-/* EFLAGS bit definitions. */
-#define EFLG_ID (121)
-#define EFLG_VIP (120)
-#define EFLG_VIF (119)
-#define EFLG_AC (118)
-#define EFLG_VM (117)
-#define EFLG_RF (116)
-#define EFLG_IOPL (312)
-#define EFLG_NT (114)
-#define EFLG_OF (111)
-#define EFLG_DF (110)
-#define EFLG_IF (19)
-#define EFLG_TF (18)
-#define EFLG_SF (17)
-#define EFLG_ZF (16)
-#define EFLG_AF (14)
-#define EFLG_PF (12)
-#define EFLG_CF (10)
-
 #define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
-#define EFLG_RESERVED_ONE_MASK 2
 
 enum x86_transfer_type {
X86_TRANSFER_NONE,
@@ -317,7 +297,8 @@ static void invalidate_registers(struct x86_emulate_ctxt 
*ctxt)
  * These EFLAGS bits are restored from saved value during emulation, and
  * any changes are written back to the saved value after emulation.
  */
-#define EFLAGS_MASK (EFLG_OF|EFLG_SF|EFLG_ZF|EFLG_AF|EFLG_PF|EFLG_CF)
+#define EFLAGS_MASK (X86_EFLAGS_OF|X86_EFLAGS_SF|X86_EFLAGS_ZF|X86_EFLAGS_AF|\
+X86_EFLAGS_PF|X86_EFLAGS_CF)
 
 #ifdef CONFIG_X86_64
 #define ON64(x) x
@@ -1399,7 +1380,7 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
unsigned int in_page, n;
unsigned int count = ctxt-rep_prefix ?
address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) : 1;
-   in_page = (ctxt-eflags  EFLG_DF) ?
+   in_page = (ctxt-eflags  X86_EFLAGS_DF) ?
offset_in_page(reg_read(ctxt, VCPU_REGS_RDI)) :
PAGE_SIZE - offset_in_page(reg_read(ctxt, 
VCPU_REGS_RDI));
n = min3(in_page, (unsigned int)sizeof(rc-data) / size, count);
@@ -1412,7 +1393,7 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
}
 
if (ctxt-rep_prefix  (ctxt-d  String) 
-   !(ctxt-eflags  EFLG_DF)) {
+   !(ctxt-eflags  X86_EFLAGS_DF)) {
ctxt-dst.data = rc-data + rc-pos;
ctxt-dst.type = OP_MEM_STR;
ctxt-dst.count = (rc-end - rc-pos) / size;
@@ -1792,32 +1773,34 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
 {
int rc;
unsigned long val, change_mask;
-   int iopl = (ctxt-eflags  X86_EFLAGS_IOPL)  IOPL_SHIFT;
+   int iopl = (ctxt-eflags  X86_EFLAGS_IOPL)  X86_EFLAGS_IOPL_BIT;
int cpl = ctxt-ops-cpl(ctxt);
 
rc = emulate_pop(ctxt, val, len);
if (rc != X86EMUL_CONTINUE)
return rc;
 
-   change_mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF | EFLG_OF
-   | EFLG_TF | EFLG_DF | EFLG_NT | EFLG_AC | EFLG_ID;
+   change_mask = X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
+ X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF |
+ X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_NT |
+ X86_EFLAGS_AC | X86_EFLAGS_ID;
 
switch(ctxt-mode) {
case X86EMUL_MODE_PROT64:
case X86EMUL_MODE_PROT32:
case X86EMUL_MODE_PROT16:
if (cpl == 0)
-   change_mask |= EFLG_IOPL;
+   change_mask |= X86_EFLAGS_IOPL;
if (cpl = iopl)
-   change_mask |= EFLG_IF;
+   change_mask |= X86_EFLAGS_IF;
break;
case X86EMUL_MODE_VM86:
if (iopl  3)
return emulate_gp(ctxt, 0);
-   change_mask |= EFLG_IF;
+   change_mask |= X86_EFLAGS_IF;
break;
default: /* real mode */
-   change_mask |= (EFLG_IOPL | EFLG_IF);
+   change_mask |= (X86_EFLAGS_IOPL | X86_EFLAGS_IF);
break;
}
 
@@ -1918,7 +1901,7 @@ static int em_pusha(struct x86_emulate_ctxt *ctxt)
 
 static int em_pushf(struct x86_emulate_ctxt *ctxt)
 {
-   ctxt-src.val = (unsigned long)ctxt-eflags  ~EFLG_VM;
+   ctxt-src.val = (unsigned long)ctxt-eflags

[PATCH 0/2] KVM: x86: Removing redundant definitions from emulator

2015-03-29 Thread Nadav Amit
There are several redundant definitions in processor-flags.h and emulator.c.

Slowly, but surely they will get mixed, so removing those of emulator.c seems
like a reasonable move (unless I am missing something, e.g., kvm-kmod
consideration).

Nadav Amit (2):
  KVM: x86: removing redundant eflags bits definitions
  KVM: x86: Remove redundant definitions

 arch/x86/include/asm/kvm_host.h |   5 --
 arch/x86/kvm/emulate.c  | 111 ++--
 arch/x86/kvm/vmx.c  |  18 +++
 3 files changed, 58 insertions(+), 76 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2 CPU Conformance Issue in KVM/x86

2015-03-09 Thread Nadav Amit
Avi Kivity avi.kiv...@gmail.com wrote:

 On 03/03/2015 11:52 AM, Paolo Bonzini wrote:
 In this
 case, the VM might expect exceptions when PTE bits which are higher than the
 maximum (reported) address width are set, and it would not get such
 exceptions. This problem can easily be experienced by small change to the
 existing KVM unit-tests.
 
 There are many variants to this problem, and the only solution which I
 consider complete is to report to the VM the maximum (52) physical address
 width to the VM, configure the VM to exit on #PF with reserved-bit
 error-codes, and then emulate these faulting instructions.
 Not even that would be a definitive solution.  If the guest tries to map
 RAM (e.g. a PCI BAR that is backed by RAM) above the host MAXPHYADDR,
 you would get EPT misconfiguration vmexits.
 
 I think there is no way to emulate physical address width correctly,
 except by disabling EPT.
 
 Is the issue emulating a higher MAXPHYADDR on the guest than is available
 on the host? I don't think there's any need to support that.
 
 Emulating a lower setting on the guest than is available on the host is, I
 think, desirable. Whether it would work depends on the relative priority
 of EPT misconfiguration exits vs. page table permission faults.

Thanks for the feedback.

Guest page-table permissions faults got priority over EPT misconfiguration.
KVM can even be set to trap page-table permission faults, at least in VT-x.
Anyhow, I don’t think it is enough. Here is an example

My machine has MAXPHYADDR of 46. I modified kvm-unit-tests access test to
set pte.45 instead of pte.51, which from the VM point-of-view should cause
the #PF error-code indicate the reserved bits are set (just as pte.51 does).
Here is one error from the log:

test pte.p pte.45 pde.p user: FAIL: error code 5 expected d
Dump mapping: address: 1234
--L4: 304b007
--L3: 304c007
--L2: 304d001
--L1: 2201

As you can see, the #PF should have had two reasons: reserved bits, and user
access to supervisor only page. The error-code however does not indicate the
reserved-bits are set.

Note that KVM did not trap any exit on that faulting instruction, as
otherwise it would try to emulate the instruction and assuming it is
supported (and that the #PF was not on an instruction fetch), should be able
to emulate the #PF correctly. 
[ The test actually crashes soon after this error due to these reasons. ]

Anyhow, that is the reason for me to assume that having the maximum
MAXPHYADDR is better.

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2 CPU Conformance Issue in KVM/x86

2015-03-09 Thread Nadav Amit
Avi Kivity avi.kiv...@gmail.com wrote:

 On 03/09/2015 07:51 PM, Nadav Amit wrote:
 Avi Kivity avi.kiv...@gmail.com wrote:
 
 On 03/03/2015 11:52 AM, Paolo Bonzini wrote:
 In this
 case, the VM might expect exceptions when PTE bits which are higher than 
 the
 maximum (reported) address width are set, and it would not get such
 exceptions. This problem can easily be experienced by small change to the
 existing KVM unit-tests.
 
 There are many variants to this problem, and the only solution which I
 consider complete is to report to the VM the maximum (52) physical address
 width to the VM, configure the VM to exit on #PF with reserved-bit
 error-codes, and then emulate these faulting instructions.
 Not even that would be a definitive solution.  If the guest tries to map
 RAM (e.g. a PCI BAR that is backed by RAM) above the host MAXPHYADDR,
 you would get EPT misconfiguration vmexits.
 
 I think there is no way to emulate physical address width correctly,
 except by disabling EPT.
 Is the issue emulating a higher MAXPHYADDR on the guest than is available
 on the host? I don't think there's any need to support that.
 
 Emulating a lower setting on the guest than is available on the host is, I
 think, desirable. Whether it would work depends on the relative priority
 of EPT misconfiguration exits vs. page table permission faults.
 Thanks for the feedback.
 
 Guest page-table permissions faults got priority over EPT misconfiguration.
 KVM can even be set to trap page-table permission faults, at least in VT-x.
 Anyhow, I don’t think it is enough.
 
 Why is it not enough? If you trap a permission fault, you can inject any 
 exception error code you like.

Because there is no real permission fault. In the following example, the VM
expects one (VM’s MAXPHYADDR=40), but there isn’t (Host’s MAXPHYADDR=46), so
the hypervisor cannot trap it. It can only trap all #PF, which is obviously
too intrusive.

  Here is an example
 
 My machine has MAXPHYADDR of 46. I modified kvm-unit-tests access test to
 set pte.45 instead of pte.51, which from the VM point-of-view should cause
 the #PF error-code indicate the reserved bits are set (just as pte.51 does).
 Here is one error from the log:
 
 test pte.p pte.45 pde.p user: FAIL: error code 5 expected d
 Dump mapping: address: 1234
 --L4: 304b007
 --L3: 304c007
 --L2: 304d001
 --L1: 2201
 
 This is with an ept misconfig programmed into that address, yes?
A reserved bit in the PTE is set - from the VM point-of-view. If there
wasn’t another cause for #PF, it would lead to EPT violation/misconfig.

 As you can see, the #PF should have had two reasons: reserved bits, and user
 access to supervisor only page. The error-code however does not indicate the
 reserved-bits are set.
 
 Note that KVM did not trap any exit on that faulting instruction, as
 otherwise it would try to emulate the instruction and assuming it is
 supported (and that the #PF was not on an instruction fetch), should be able
 to emulate the #PF correctly.
 [ The test actually crashes soon after this error due to these reasons. ]
 
 Anyhow, that is the reason for me to assume that having the maximum
 MAXPHYADDR is better.
 
 Well, that doesn't work for the reasons Paolo noted.  The guest can have a 
 ivshmem device attached, and map it above a host-supported virtual address, 
 and suddenly it goes slow.

I fully understand. That’s the reason I don’t have a reasonable solution.

Regards,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


2 CPU Conformance Issue in KVM/x86

2015-03-03 Thread Nadav Amit
I got two conformance issues in x86/KVM. For the first one I have no
solution. For the latter, my solution is not “great”. Ideas and feedback
would be appreciated.

The first problem is caused by the deprecating of FPU CS/DS in new Intel
CPUs. Assume the VM executes a floating point instruction in real mode (when
CS != 0), and later KVM exits to userspace, causing XSAVE/XRSTOR to save and
restore the FPU state. At this point FPU CS/DS in new CPUs are zero. If the
VM then executes FSAVE in real-mode the save FPU IP would be wrong, since it
is actually calculated by the CPU as [FPU CS] * 4 + [FPU IP].

The second problem occurs when the maximum physical address width that KVM
reports to the VM is different than the real one. Assume the real one is
greater than the reported one (which in KVM is not greater than 40). In this
case, the VM might expect exceptions when PTE bits which are higher than the
maximum (reported) address width are set, and it would not get such
exceptions. This problem can easily be experienced by small change to the
existing KVM unit-tests.

There are many variants to this problem, and the only solution which I
consider complete is to report to the VM the maximum (52) physical address
width to the VM, configure the VM to exit on #PF with reserved-bit
error-codes, and then emulate these faulting instructions.

Thoughts/ideas?

Regards,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2 CPU Conformance Issue in KVM/x86

2015-03-03 Thread Nadav Amit
Paolo,

Thanks for the feedback. One small comment below.

Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 03/03/2015 09:34, Nadav Amit wrote:
 I got two conformance issues in x86/KVM. For the first one I have no
 solution. For the latter, my solution is not “great”. Ideas and feedback
 would be appreciated.
 
 The first problem is caused by the deprecating of FPU CS/DS in new Intel
 CPUs. Assume the VM executes a floating point instruction in real mode (when
 CS != 0), and later KVM exits to userspace, causing XSAVE/XRSTOR to save and
 restore the FPU state. At this point FPU CS/DS in new CPUs are zero. If the
 VM then executes FSAVE in real-mode the save FPU IP would be wrong, since it
 is actually calculated by the CPU as [FPU CS] * 4 + [FPU IP].
 
 I think this was analyzed a couple years ago and we decided that this bit
 was not virtualizable.

I am fully aware of the previous reports ( https://lkml.org/lkml/2013/10/16/258 
).

However, one might have expected the conformance problem to be fully
resolved now, since [FPU CS] and [FPU DS] are deprecated in new CPUs.
Indeed, the problem is resolved in all modes, but not in real-mode.

Regards,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)

2015-02-26 Thread Nadav Amit
Marcello, Radim,

As you know - I can run some tests on the patches and whether they comply
with real hardware. Please let me know which version to test and I’ll try to
do next week.

Regards,
Nadav

Marcelo Tosatti mtosa...@redhat.com wrote:

 Radim,
 
 On Thu, Feb 12, 2015 at 07:41:30PM +0100, Radim Krčmář wrote:
 Each patch has a diff from v1, here is only a prologue on the mythical
 mixed xAPIC and x2APIC mode:
 
 There is one interesting alias in xAPIC and x2APIC ICR destination, the
 0xff00, which is a broadcast in xAPIC and either a physical message
 to high x2APIC ID or a message to an empty set in x2APIC cluster.
 
 This corner case in mixed mode can be triggered by a weird message
 1) when we have no x2APIC ID 0xff00, but send x2APIC message there
 
 10.7 SYSTEM AND APIC BUS ARBITRATION
 
 Note that except for the SIPI IPI (see Section 10.6.1, “Interrupt
 Command Register (ICR)”), all bus messages that fail to be delivered
 to their specified destination or destinations are automatically
 retried. Software should avoid situations in which IPIs are sent to
 disabled or nonexistent local APICs, causing the messages to be resent
 repeatedly.
 
 or after something that isn't forbidden in SDM most likely because they
 didn't think that anyone would ever consider it
 2) we have x2APIC ID 0xff00 and reset other x2APICs into xAPIC mode
 
 Or just x2APIC initialization in non lockstep:
 
 
   vcpu0   vcpu1
 T0)   xapic mode  xapic mode
 T1)   x2apic enable
 T2)   broadcast IPI
 
 Current KVM doesn't need to consider (2), so there only is a slim chance
 that some hobbyist OS pushes the specification to its limits.
 
 The problem is that SDM for xAPIC doesn't believably specify[1] if all
 messages beginning with 0xff are considered as broadcasts, 10.6.2.1:
  A broadcast IPI (bits 28-31 of the MDA are 1's)
 
 No volunteer came to clear this up, so I hacked Linux to have one x2APIC
 core between xAPIC ones.  Physical IPI to 0xff00 got delivered only
 to CPU0, like most other destinations, Logical IPI to 0xff00 got
 dropped and only 0x worked as a broadcast in both modes.
 
 I think it is because Intel never considered mixed mode to be valid, and
 seen delivery might be an emergent feature of QPI routing.
 
 Luckily, broadcast from xAPIC isn't delivered to x2APIC.
 
 In real hardware?
 
 Real exploration would require greater modifications to Linux (ideally
 writing a custom kernel), so this series implements something that makes
 some sense and isn't too far from reality.
 
 Ok, so the problem is that broadcast (ICR.destination == 0xff00)
 from xAPIC CPU is not delivered to x2APIC CPUs ?
 
 Radim Krčmář (4):
  KVM: x86: use MDA for interrupt matching
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: simplify kvm_apic_map
 
 I can't find any restriction against (cpu0==x2APIC, cpu1==xAPIC) in 
 the documentation.
 
 Anyway, emulation should match physical hardware. From your message above,
 it is not clear what is the behaviour there:
 
 No volunteer came to clear this up, so I hacked Linux to have one x2APIC
 core between xAPIC ones.  Physical IPI to 0xff00 got delivered only
 to CPU0, like most other destinations, Logical IPI to 0xff00 got
 dropped and only 0x worked as a broadcast in both modes.
 
 Luckily, broadcast from xAPIC isn't delivered to x2APIC.
 
 From the x2APIC CPU or the xAPIC ones?
 
 It should be easy to write kvm-unit-test testcases that match physical
 hardware behaviour (in general, i am having a hard time figure out
 in what way mixed mode is supposed to behave, please describe it
 clearly).
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: emulate: correct page fault error code for NoWrite instructions

2015-02-10 Thread Nadav Amit
Sorry for that. I’ll try to run some few more tests next week.

Nadav

Paolo Bonzini pbonz...@redhat.com wrote:

 NoWrite instructions (e.g. cmp or test) never set the write access
 bit in the error code, even if one of the operands is treated as a
 destination.
 
 Fixes: c205fb7d7d4f81e46fc577b707ceb9e356af1456
 Cc: Nadav Amit na...@cs.technion.ac.il
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
 arch/x86/kvm/emulate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 81dcf7964701..a943bf0c06d0 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -4954,7 +4954,8 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
   rc = segmented_read(ctxt, ctxt-dst.addr.mem,
  ctxt-dst.val, ctxt-dst.bytes);
   if (rc != X86EMUL_CONTINUE) {
 - if (rc == X86EMUL_PROPAGATE_FAULT 
 + if (!(ctxt-d  NoWrite) 
 + rc == X86EMUL_PROPAGATE_FAULT 
   ctxt-exception.vector == PF_VECTOR)
   ctxt-exception.error_code |= PFERR_WRITE_MASK;
   goto done;
 -- 
 1.8.3.1
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] KVM: x86: Dirty the dest op page on cmpxchg emulation

2015-01-25 Thread Nadav Amit
Intel SDM says for CMPXCHG: To simplify the interface to the processor’s bus,
the destination operand receives a write cycle without regard to the result of
the comparison.. This means the destination page should be dirtied.

Fix it to by writing back the original value if cmpxchg failed.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef23c1e..aa27254 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2205,12 +2205,15 @@ static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
fastop(ctxt, em_cmp);
 
if (ctxt-eflags  EFLG_ZF) {
-   /* Success: write back to memory. */
+   /* Success: write back to memory; no update of EAX */
+   ctxt-src.type = OP_NONE;
ctxt-dst.val = ctxt-src.orig_val;
} else {
/* Failure: write the value we saw to EAX. */
-   ctxt-dst.type = OP_REG;
-   ctxt-dst.addr.reg = reg_rmw(ctxt, VCPU_REGS_RAX);
+   ctxt-src.type = OP_REG;
+   ctxt-src.addr.reg = reg_rmw(ctxt, VCPU_REGS_RAX);
+   ctxt-src.val = ctxt-dst.orig_val;
+   /* Create write-cycle to dest by writing the same value */
ctxt-dst.val = ctxt-dst.orig_val;
}
return X86EMUL_CONTINUE;
@@ -4157,7 +4160,7 @@ static const struct opcode twobyte_table[256] = {
F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
GD(0, group15), F(DstReg | SrcMem | ModRM, em_imul),
/* 0xB0 - 0xB7 */
-   I2bv(DstMem | SrcReg | ModRM | Lock | PageTable, em_cmpxchg),
+   I2bv(DstMem | SrcReg | ModRM | Lock | PageTable | SrcWrite, em_cmpxchg),
I(DstReg | SrcMemFAddr | ModRM | Src2SS, em_lseg),
F(DstMem | SrcReg | ModRM | BitOp | Lock, em_btr),
I(DstReg | SrcMemFAddr | ModRM | Src2FS, em_lseg),
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] KVM: x86: Fix defines in emulator.c

2015-01-25 Thread Nadav Amit
Unnecassary define was left after commit 7d882ffa81d5 (KVM: x86: Revert
NoBigReal patch in the emulator”).

Commit 39f062ff51b2 (KVM: x86: Generate #UD when memory operand is required”)
was missing undef.

Fix it.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db3cf39..997c9eb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -167,7 +167,6 @@
 #define NoMod  ((u64)1  47)  /* Mod field is ignored */
 #define Intercept   ((u64)1  48)  /* Has valid intercept field */
 #define CheckPerm   ((u64)1  49)  /* Has valid check_perm field */
-#define NoBigReal   ((u64)1  50)  /* No big real mode */
 #define PrivUD  ((u64)1  51)  /* #UD instead of #GP on CPL  0 */
 #define NearBranch  ((u64)1  52)  /* Near branches */
 #define No16   ((u64)1  53)  /* No 16 bit operand */
@@ -4246,6 +4245,7 @@ static const struct opcode opcode_map_0f_38[256] = {
 #undef GP
 #undef EXT
 #undef MD
+#undef ID
 
 #undef D2bv
 #undef D2bvIP
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] KVM: x86: ARPL emulation can cause spurious exceptions

2015-01-25 Thread Nadav Amit
ARPL and MOVSXD are encoded the same and their execution depends on the
execution mode.  The operand sizes of each instruction are different.
Currently, ARPL is detected too late, after the decoding was already done, and
therefore may result in spurious exception (instead of failed emulation).

Introduce a group to the emulator to handle instructions according to execution
mode (32/64 bits). Note: in order not to make changes that may affect
performance, the new ModeDual can only be applied to instructions with ModRM.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fa3ca55..db3cf39 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -125,6 +125,7 @@
 #define RMExt   (415) /* Opcode extension in ModRM r/m if mod == 3 */
 #define Escape  (515) /* Escape to coprocessor instruction */
 #define InstrDual   (615) /* Alternate instruction decoding of mod == 3 
*/
+#define ModeDual(715) /* Different instruction for 32/64 bit */
 #define Sse (118) /* SSE Vector instruction */
 /* Generic ModRM decode. */
 #define ModRM   (119)
@@ -215,6 +216,7 @@ struct opcode {
const struct gprefix *gprefix;
const struct escape *esc;
const struct instr_dual *idual;
+   const struct mode_dual *mdual;
void (*fastop)(struct fastop *fake);
} u;
int (*check_perm)(struct x86_emulate_ctxt *ctxt);
@@ -242,6 +244,11 @@ struct instr_dual {
struct opcode mod3;
 };
 
+struct mode_dual {
+   struct opcode mode32;
+   struct opcode mode64;
+};
+
 /* EFLAGS bit definitions. */
 #define EFLG_ID (121)
 #define EFLG_VIP (120)
@@ -3530,6 +3537,12 @@ static int em_clflush(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
 }
 
+static int em_movsxd(struct x86_emulate_ctxt *ctxt)
+{
+   ctxt-dst.val = (s32) ctxt-src.val;
+   return X86EMUL_CONTINUE;
+}
+
 static bool valid_cr(int nr)
 {
switch (nr) {
@@ -3729,6 +3742,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
 #define GD(_f, _g) { .flags = ((_f) | GroupDual | ModRM), .u.gdual = (_g) }
 #define ID(_f, _i) { .flags = ((_f) | InstrDual | ModRM), .u.idual = (_i) }
+#define MD(_f, _m) { .flags = ((_f) | ModeDual), .u.mdual = (_m) }
 #define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) }
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
 #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
@@ -3973,6 +3987,10 @@ static const struct instr_dual instr_dual_0f_c3 = {
I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov), N
 };
 
+static const struct mode_dual mode_dual_63 = {
+   N, I(DstReg | SrcMem32 | ModRM | Mov, em_movsxd)
+};
+
 static const struct opcode opcode_table[256] = {
/* 0x00 - 0x07 */
F6ALU(Lock, em_add),
@@ -4007,7 +4025,7 @@ static const struct opcode opcode_table[256] = {
/* 0x60 - 0x67 */
I(ImplicitOps | Stack | No64, em_pusha),
I(ImplicitOps | Stack | No64, em_popa),
-   N, D(DstReg | SrcMem32 | ModRM | Mov) /* movsxd (x86/64) */ ,
+   N, MD(ModRM, mode_dual_63),
N, N, N, N,
/* 0x68 - 0x6F */
I(SrcImm | Mov | Stack, em_push),
@@ -4227,6 +4245,7 @@ static const struct opcode opcode_map_0f_38[256] = {
 #undef I
 #undef GP
 #undef EXT
+#undef MD
 
 #undef D2bv
 #undef D2bvIP
@@ -4616,6 +4635,12 @@ done_prefixes:
else
opcode = opcode.u.idual-mod012;
break;
+   case ModeDual:
+   if (ctxt-mode == X86EMUL_MODE_PROT64)
+   opcode = opcode.u.mdual-mode64;
+   else
+   opcode = opcode.u.mdual-mode32;
+   break;
default:
return EMULATION_FAILED;
}
@@ -4956,11 +4981,6 @@ special_insn:
goto threebyte_insn;
 
switch (ctxt-b) {
-   case 0x63:  /* movsxd */
-   if (ctxt-mode != X86EMUL_MODE_PROT64)
-   goto cannot_emulate;
-   ctxt-dst.val = (s32) ctxt-src.val;
-   break;
case 0x70 ... 0x7f: /* jcc (short) */
if (test_cc(ctxt-b, ctxt-eflags))
rc = jmp_rel(ctxt, ctxt-src.val);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] KVM: x86: IRET emulation does not clear NMI masking

2015-01-25 Thread Nadav Amit
The IRET instruction should clear NMI masking, but the current implementation
does not do so.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_emulate.h | 1 +
 arch/x86/kvm/emulate.c | 1 +
 arch/x86/kvm/x86.c | 6 ++
 3 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index eb18117..57a9d94 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -208,6 +208,7 @@ struct x86_emulate_ops {
 
void (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
  u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
+   void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
 };
 
 typedef u32 __attribute__((vector_size(16))) sse128_t;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dd71004..fa3ca55 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2047,6 +2047,7 @@ static int emulate_iret_real(struct x86_emulate_ctxt 
*ctxt)
 
ctxt-eflags = ~EFLG_RESERVED_ZEROS_MASK; /* Clear reserved zeros */
ctxt-eflags |= EFLG_RESERVED_ONE_MASK;
+   ctxt-ops-set_nmi_mask(ctxt, false);
 
return rc;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index afa0815..cdd6606 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4963,6 +4963,11 @@ static void emulator_write_gpr(struct x86_emulate_ctxt 
*ctxt, unsigned reg, ulon
kvm_register_write(emul_to_vcpu(ctxt), reg, val);
 }
 
+static void emulator_set_nmi_mask(struct x86_emulate_ctxt *ctxt, bool masked)
+{
+   kvm_x86_ops-set_nmi_mask(emul_to_vcpu(ctxt), masked);
+}
+
 static const struct x86_emulate_ops emulate_ops = {
.read_gpr= emulator_read_gpr,
.write_gpr   = emulator_write_gpr,
@@ -4998,6 +5003,7 @@ static const struct x86_emulate_ops emulate_ops = {
.put_fpu = emulator_put_fpu,
.intercept   = emulator_intercept,
.get_cpuid   = emulator_get_cpuid,
+   .set_nmi_mask= emulator_set_nmi_mask,
 };
 
 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] KVM: x86: 32-bit wraparound read/write not emulated correctly

2015-01-25 Thread Nadav Amit
If we got a wraparound of 32-bit operand, and the limit is 0x, read and
writes should be successful. It just needs to be done in two segments.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 10 +++---
 arch/x86/kvm/x86.c |  2 ++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 997c9eb..c3b0757 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -684,9 +684,13 @@ static __always_inline int __linearize(struct 
x86_emulate_ctxt *ctxt,
}
if (addr.ea  lim)
goto bad;
-   *max_size = min_t(u64, ~0u, (u64)lim + 1 - addr.ea);
-   if (size  *max_size)
-   goto bad;
+   if (lim == 0x)
+   *max_size = ~0u;
+   else {
+   *max_size = (u64)lim + 1 - addr.ea;
+   if (size  *max_size)
+   goto bad;
+   }
la = (u32)-1;
break;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cdd6606..1e10e3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4495,6 +4495,8 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, 
unsigned long addr,
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
+   if (ctxt-mode != X86EMUL_MODE_PROT64)
+   addr = (u32)addr;
val += now;
bytes -= now;
}
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] KVM: x86: Emulation of call may use incorrect stack size

2015-01-25 Thread Nadav Amit
On long-mode, when far call that changes cs.l takes place, the stack size is
determined by the new mode.  For instance, if we go from 32-bit mode to 64-bit
mode, the stack-size if 64.  KVM uses the old stack size.

Fix it.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c3b0757..81dcf79 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -741,19 +741,26 @@ static int assign_eip_far(struct x86_emulate_ctxt *ctxt, 
ulong dst,
  const struct desc_struct *cs_desc)
 {
enum x86emul_mode mode = ctxt-mode;
+   int rc;
 
 #ifdef CONFIG_X86_64
-   if (ctxt-mode = X86EMUL_MODE_PROT32  cs_desc-l) {
-   u64 efer = 0;
+   if (ctxt-mode = X86EMUL_MODE_PROT16) {
+   if (cs_desc-l) {
+   u64 efer = 0;
 
-   ctxt-ops-get_msr(ctxt, MSR_EFER, efer);
-   if (efer  EFER_LMA)
-   mode = X86EMUL_MODE_PROT64;
+   ctxt-ops-get_msr(ctxt, MSR_EFER, efer);
+   if (efer  EFER_LMA)
+   mode = X86EMUL_MODE_PROT64;
+   } else
+   mode = X86EMUL_MODE_PROT32; /* temporary value */
}
 #endif
if (mode == X86EMUL_MODE_PROT16 || mode == X86EMUL_MODE_PROT32)
mode = cs_desc-d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
-   return assign_eip(ctxt, dst, mode);
+   rc = assign_eip(ctxt, dst, mode);
+   if (rc == X86EMUL_CONTINUE)
+   ctxt-mode = mode;
+   return rc;
 }
 
 static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
@@ -3062,6 +3069,7 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
struct desc_struct old_desc, new_desc;
const struct x86_emulate_ops *ops = ctxt-ops;
int cpl = ctxt-ops-cpl(ctxt);
+   enum x86emul_mode prev_mode = ctxt-mode;
 
old_eip = ctxt-_eip;
ops-get_segment(ctxt, old_cs, old_desc, NULL, VCPU_SREG_CS);
@@ -3085,11 +3093,14 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
rc = em_push(ctxt);
/* If we failed, we tainted the memory, but the very least we should
   restore cs */
-   if (rc != X86EMUL_CONTINUE)
+   if (rc != X86EMUL_CONTINUE) {
+   pr_warn_once(faulting far call emulation tainted memory\n);
goto fail;
+   }
return rc;
 fail:
ops-set_segment(ctxt, old_cs, old_desc, 0, VCPU_SREG_CS);
+   ctxt-mode = prev_mode;
return rc;
 
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] KVM: x86: Wrong operand size for far ret

2015-01-25 Thread Nadav Amit
Indeed, Intel SDM specifically states that for the RET instruction In 64-bit
mode, the default operation size of this instruction is the stack-address size,
i.e. 64 bits.

However, experiments show this is not the case. Here is for example objdump of
small 64-bit asm:

  4004f1:   ca 14 00lret   $0x14
  4004f4:   48 cb   lretq
  4004f6:   48 ca 14 00 lretq  $0x14

Therefore, remove the Stack flag from far-ret instructions.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index aa27254..dd71004 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4062,8 +4062,8 @@ static const struct opcode opcode_table[256] = {
G(ByteOp, group11), G(0, group11),
/* 0xC8 - 0xCF */
I(Stack | SrcImmU16 | Src2ImmByte, em_enter), I(Stack, em_leave),
-   I(ImplicitOps | Stack | SrcImmU16, em_ret_far_imm),
-   I(ImplicitOps | Stack, em_ret_far),
+   I(ImplicitOps | SrcImmU16, em_ret_far_imm),
+   I(ImplicitOps, em_ret_far),
D(ImplicitOps), DI(SrcImmByte, intn),
D(ImplicitOps | No64), II(ImplicitOps, em_iret, iret),
/* 0xD0 - 0xD7 */
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/7] KVM: x86: Emulator fixes

2015-01-25 Thread Nadav Amit
Sorry for sending patches at the last minute. There is nothing critical in this
patch-set.  Yet, if you may want to incorporate something in 3.20 -
specifically 5 (small define mistakes) or 7 (which is somewhat affected by
recent changes).

Thanks for reviewing the patches.

Nadav Amit (7):
  KVM: x86: Dirty the dest op page on cmpxchg emulation
  KVM: x86: Wrong operand size for far ret
  KVM: x86: IRET emulation does not clear NMI masking
  KVM: x86: ARPL emulation can cause spurious exceptions
  KVM: x86: Fix defines in emulator.c
  KVM: x86: 32-bit wraparound read/write not emulated correctly
  KVM: x86: Emulation of call may use incorrect stack size

 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/kvm/emulate.c | 85 +++---
 arch/x86/kvm/x86.c |  8 
 3 files changed, 71 insertions(+), 23 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2015-01-20 Thread Nadav Amit
Radim Kr?má? rkrc...@redhat.com wrote:

 2015-01-14 01:27+, Wu, Feng:
 the new
 hardware even doesn't consider the TPR for lowest priority interrupts
 delivery.
 
 A bold move ... what hardware was the first to do so?
 
 I think it was starting with Nehalem.
 
 Thanks,  (Could be that QPI can't inform about TPR changes anymore ...)
 
 I played with Linux's TPR on Haswell and found that is has no effect.

Sorry for jumping into the discussion, but doesn’t it depend on
IA32_MISC_ENABLE[23]? This bit disables xTPR messages. On my machine it is
set (probably by the BIOS), but since there is no IA32_MISC_ENABLE is not
locked for changes, the OS can control it.

Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong

2014-12-27 Thread Nadav Amit
Feng feng...@intel.com wrote:

 
 
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Nadav Amit
 Sent: Thursday, December 25, 2014 8:52 AM
 To: pbonz...@redhat.com
 Cc: kvm@vger.kernel.org; Nadav Amit
 Subject: [PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong
 
 When emulating an instruction that reads the destination memory operand
 (i.e.,
 instructions without the Mov flag in the emulator), the operand is first 
 read.
 If a page-fault is detected in this phase, the error-code which would be
 delivered to the VM does not indicate that the access that caused the
 exception
 is a write one. This does not conform with real hardware, and may cause the
 VM
 to enter the page-fault handler twice for no reason (once for read, once for
 write).
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
 arch/x86/include/asm/kvm_host.h | 12 
 arch/x86/kvm/emulate.c  |  6 +-
 arch/x86/kvm/mmu.h  | 12 
 3 files changed, 17 insertions(+), 13 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h
 b/arch/x86/include/asm/kvm_host.h
 index 73ccb12..d6f90ca 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -162,6 +162,18 @@ enum {
 #define DR7_FIXED_1  0x0400
 #define DR7_VOLATILE 0x2bff
 
 +#define PFERR_PRESENT_BIT 0
 +#define PFERR_WRITE_BIT 1
 +#define PFERR_USER_BIT 2
 +#define PFERR_RSVD_BIT 3
 +#define PFERR_FETCH_BIT 4
 +
 +#define PFERR_PRESENT_MASK (1U  PFERR_PRESENT_BIT)
 +#define PFERR_WRITE_MASK (1U  PFERR_WRITE_BIT)
 +#define PFERR_USER_MASK (1U  PFERR_USER_BIT)
 +#define PFERR_RSVD_MASK (1U  PFERR_RSVD_BIT)
 +#define PFERR_FETCH_MASK (1U  PFERR_FETCH_BIT)
 +
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC 0
 /*
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 7a9697f..e5a84be 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt
 *ctxt)
  /* optimisation - avoid slow emulated read if Mov */
  rc = segmented_read(ctxt, ctxt-dst.addr.mem,
 ctxt-dst.val, ctxt-dst.bytes);
 
 This is a write operation, do you know why we need to read the dst operand 
 first here?
Some x86 instructions read the destination operand during their operation.

For instance - MOV [EAX], EBX (Intel ASM format), would perform 
[EAX] = [EAX] + EBX. Therefore, it would first read the memory of [EAX] at
this stage.

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-27 Thread Nadav Amit
Feng feng...@intel.com wrote:

 
 
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Nadav Amit
 Sent: Thursday, December 25, 2014 5:55 PM
 To: Chen, Tiejun
 Cc: Paolo Bonzini; kvm list
 Subject: Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
 
 Tiejun tiejun.c...@intel.com wrote:
 
 On 2014/12/25 8:52, Nadav Amit wrote:
 Although pop sreg updates RSP according to the operand size, only 2 bytes
 are
 read.  The current behavior may result in incorrect #GP or #PF exceptions.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
 arch/x86/kvm/emulate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index e5a84be..702da5e 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct
 x86_emulate_ctxt *ctxt)
 unsigned long selector;
int rc;
 
 Looks we just should do similar thing to em_push_sreg(),
 
   unsigned long selector;
   int rc;
 
 +   if (ctxt-op_bytes == 4) {
 +   rsp_increment(ctxt, -2);
 +   ctxt-op_bytes = 2;
 +   }
   rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
   if (rc != X86EMUL_CONTINUE)
   return rc;
 
 Right?
 I don't think so. It seems the behaviour of push and pop is a bit different.
 For push: If the source operand is a segment register (16 bits) and the
 operand size is 64-bits, a zero-extended value is pushed on the stack; if
 the operand size is 32-bits ... all recent Core and Atom processors perform
 a 16-bit move, leaving the upper portion of the stack location unmodified.
 
 Therefore, for push in the case of op_bytes==8 we push zero-extended value.
 
 For pop the behaviour is not well-documented, but experimentally it appears
 only the first two bytes are accessed. I cannot see why it would be
 different when opsize is 8, since it is not like the push case, where the
 segment register value was zero extended.
 
 Let's take 64-bits operand size as an example. When pushing a segment 
 register, it
 uses zero-extended value, so 8 bytes will be pushed on the stack. When 
 popping it,
 the current code return the top 8 bytes in the stack, and it only uses the 
 lowest 2
 bytes for load_segment_descriptor(). what is the issue here?
The issue I try to solve is that during the emulated write operation of the
pop the read is perform using the wrong size (operand size instead of
segment selector size). As you indicated, the destination register/memory of
the pop instruction will be identical before the fix and after the fix.

However, the emulated read may cause #PF if the operand-size that does not
occur on read hardware. Consider for instance a case in which the operand
size is 8, RSP=0xFFC and the page of [0x1000] is non-present. In this case
POP-SREG should not cause a #PF, yet on KVM it does.

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-25 Thread Nadav Amit
Tiejun tiejun.c...@intel.com wrote:

 On 2014/12/25 8:52, Nadav Amit wrote:
 Although pop sreg updates RSP according to the operand size, only 2 bytes are
 read.  The current behavior may result in incorrect #GP or #PF exceptions.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/kvm/emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index e5a84be..702da5e 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
  unsigned long selector;
  int rc;
 
 Looks we just should do similar thing to em_push_sreg(),
 
unsigned long selector;
int rc;
 
 +   if (ctxt-op_bytes == 4) {
 +   rsp_increment(ctxt, -2);
 +   ctxt-op_bytes = 2;
 +   }
rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
 
 Right?
I don’t think so. It seems the behaviour of push and pop is a bit different.
For push: “If the source operand is a segment register (16 bits) and the
operand size is 64-bits, a zero-extended value is pushed on the stack; if
the operand size is 32-bits ... all recent Core and Atom processors perform
a 16-bit move, leaving the upper portion of the stack location unmodified.”

Therefore, for push in the case of op_bytes==8 we push zero-extended value.

For pop the behaviour is not well-documented, but experimentally it appears
only the first two bytes are accessed. I cannot see why it would be
different when opsize is 8, since it is not like the push case, where the
segment register value was zero extended.

If you feel strongly about it, I’ll create a unit test.

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-24 Thread Nadav Amit
Although pop sreg updates RSP according to the operand size, only 2 bytes are
read.  The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5a84be..702da5e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
unsigned long selector;
int rc;
 
-   rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
+   rc = emulate_pop(ctxt, selector, 2);
if (rc != X86EMUL_CONTINUE)
return rc;
 
if (ctxt-modrm_reg == VCPU_SREG_SS)
ctxt-interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
+   if (ctxt-op_bytes  2)
+   rsp_increment(ctxt, ctxt-op_bytes - 2);
 
rc = load_segment_descriptor(ctxt, (u16)selector, seg);
return rc;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/8] KVM: x86: Emulator fixes

2014-12-24 Thread Nadav Amit
Few more emulator fixes. Each is logically independent from the others.

The first one is the most interesting one.  It appears that the current
behavior may cause the VM to enter the page-fault handler twice on certain
faulting write accesses. If you do not like my solution, please propose a
better one.

The fourth (JMP/CALL using call- or task-gate) is not a fix, but returns an
error instead of emulating the wrong (#GP) exception.

Thanks for reviewing the patches.

Nadav Amit (8):
  KVM: x86: #PF error-code on R/W operations is wrong
  KVM: x86: pop sreg accesses only 2 bytes
  KVM: x86: fnstcw and fnstsw may cause spurious exception
  KVM: x86: JMP/CALL using call- or task-gate causes exception
  KVM: x86: em_call_far should return failure result
  KVM: x86: POP [ESP] is not emulated correctly
  KVM: x86: Do not set access bit on accessed segments
  KVM: x86: Access to LDT/GDT that wraparound is incorrect

 arch/x86/include/asm/kvm_host.h |  12 
 arch/x86/kvm/emulate.c  | 138 ++--
 arch/x86/kvm/mmu.h  |  12 
 3 files changed, 103 insertions(+), 59 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/8] KVM: x86: #PF error-code on R/W operations is wrong

2014-12-24 Thread Nadav Amit
When emulating an instruction that reads the destination memory operand (i.e.,
instructions without the Mov flag in the emulator), the operand is first read.
If a page-fault is detected in this phase, the error-code which would be
delivered to the VM does not indicate that the access that caused the exception
is a write one. This does not conform with real hardware, and may cause the VM
to enter the page-fault handler twice for no reason (once for read, once for
write).

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h | 12 
 arch/x86/kvm/emulate.c  |  6 +-
 arch/x86/kvm/mmu.h  | 12 
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 73ccb12..d6f90ca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -162,6 +162,18 @@ enum {
 #define DR7_FIXED_10x0400
 #define DR7_VOLATILE   0x2bff
 
+#define PFERR_PRESENT_BIT 0
+#define PFERR_WRITE_BIT 1
+#define PFERR_USER_BIT 2
+#define PFERR_RSVD_BIT 3
+#define PFERR_FETCH_BIT 4
+
+#define PFERR_PRESENT_MASK (1U  PFERR_PRESENT_BIT)
+#define PFERR_WRITE_MASK (1U  PFERR_WRITE_BIT)
+#define PFERR_USER_MASK (1U  PFERR_USER_BIT)
+#define PFERR_RSVD_MASK (1U  PFERR_RSVD_BIT)
+#define PFERR_FETCH_MASK (1U  PFERR_FETCH_BIT)
+
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC   0
 /*
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7a9697f..e5a84be 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4941,8 +4941,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
/* optimisation - avoid slow emulated read if Mov */
rc = segmented_read(ctxt, ctxt-dst.addr.mem,
   ctxt-dst.val, ctxt-dst.bytes);
-   if (rc != X86EMUL_CONTINUE)
+   if (rc != X86EMUL_CONTINUE) {
+   if (rc == X86EMUL_PROPAGATE_FAULT 
+   ctxt-exception.vector == PF_VECTOR)
+   ctxt-exception.error_code |= PFERR_WRITE_MASK;
goto done;
+   }
}
ctxt-dst.orig_val = ctxt-dst.val;
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6b34876b..daae711 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -44,18 +44,6 @@
 #define PT_DIRECTORY_LEVEL 2
 #define PT_PAGE_TABLE_LEVEL 1
 
-#define PFERR_PRESENT_BIT 0
-#define PFERR_WRITE_BIT 1
-#define PFERR_USER_BIT 2
-#define PFERR_RSVD_BIT 3
-#define PFERR_FETCH_BIT 4
-
-#define PFERR_PRESENT_MASK (1U  PFERR_PRESENT_BIT)
-#define PFERR_WRITE_MASK (1U  PFERR_WRITE_BIT)
-#define PFERR_USER_MASK (1U  PFERR_USER_BIT)
-#define PFERR_RSVD_MASK (1U  PFERR_RSVD_BIT)
-#define PFERR_FETCH_MASK (1U  PFERR_FETCH_BIT)
-
 static inline u64 rsvd_bits(int s, int e)
 {
return ((1ULL  (e - s + 1)) - 1)  s;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/8] KVM: x86: JMP/CALL using call- or task-gate causes exception

2014-12-24 Thread Nadav Amit
The KVM emulator does not emulate JMP and CALL that target a call gate or a
task gate.  This patch does not try to implement these scenario as they are
presumably rare; yet it returns X86EMUL_UNHANDLEABLE error in such cases
instead of generating an exception.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 54 +++---
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 19923e7..fd89471 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -264,6 +264,13 @@ struct instr_dual {
 #define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
 #define EFLG_RESERVED_ONE_MASK 2
 
+enum x86_transfer_type {
+   X86_TRANSFER_NONE,
+   X86_TRANSFER_CALL_JMP,
+   X86_TRANSFER_RET,
+   X86_TRANSFER_TASK_SWITCH,
+};
+
 static ulong reg_read(struct x86_emulate_ctxt *ctxt, unsigned nr)
 {
if (!(ctxt-regs_valid  (1  nr))) {
@@ -1474,7 +1481,7 @@ static int write_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
 /* Does not support long mode */
 static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 u16 selector, int seg, u8 cpl,
-bool in_task_switch,
+enum x86_transfer_type transfer,
 struct desc_struct *desc)
 {
struct desc_struct seg_desc, old_desc;
@@ -1528,11 +1535,15 @@ static int __load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
return ret;
 
err_code = selector  0xfffc;
-   err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR;
+   err_vec = (transfer == X86_TRANSFER_TASK_SWITCH) ? TS_VECTOR :
+  GP_VECTOR;
 
/* can't load system descriptor into segment selector */
-   if (seg = VCPU_SREG_GS  !seg_desc.s)
+   if (seg = VCPU_SREG_GS  !seg_desc.s) {
+   if (transfer == X86_TRANSFER_CALL_JMP)
+   return X86EMUL_UNHANDLEABLE;
goto exception;
+   }
 
if (!seg_desc.p) {
err_vec = (seg == VCPU_SREG_SS) ? SS_VECTOR : NP_VECTOR;
@@ -1630,7 +1641,8 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
   u16 selector, int seg)
 {
u8 cpl = ctxt-ops-cpl(ctxt);
-   return __load_segment_descriptor(ctxt, selector, seg, cpl, false, NULL);
+   return __load_segment_descriptor(ctxt, selector, seg, cpl,
+X86_TRANSFER_NONE, NULL);
 }
 
 static void write_register_operand(struct operand *op)
@@ -2042,7 +2054,8 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
 
memcpy(sel, ctxt-src.valptr + ctxt-op_bytes, 2);
 
-   rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl, false,
+   rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+  X86_TRANSFER_CALL_JMP,
   new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;
@@ -2131,7 +2144,8 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
/* Outer-privilege level return is not implemented */
if (ctxt-mode = X86EMUL_MODE_PROT16  (cs  3)  cpl)
return X86EMUL_UNHANDLEABLE;
-   rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, cpl, false,
+   rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, cpl,
+  X86_TRANSFER_RET,
   new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;
@@ -2569,23 +2583,23 @@ static int load_state_from_tss16(struct 
x86_emulate_ctxt *ctxt,
 * it is handled in a context of new task
 */
ret = __load_segment_descriptor(ctxt, tss-ldt, VCPU_SREG_LDTR, cpl,
-   true, NULL);
+   X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss-es, VCPU_SREG_ES, cpl,
-   true, NULL);
+   X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss-cs, VCPU_SREG_CS, cpl,
-   true, NULL);
+   X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss-ss, VCPU_SREG_SS, cpl,
-   true, NULL);
+   X86_TRANSFER_TASK_SWITCH, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
ret = __load_segment_descriptor(ctxt, tss

[PATCH 8/8] KVM: x86: Access to LDT/GDT that wraparound is incorrect

2014-12-24 Thread Nadav Amit
When access to descriptor in LDT/GDT wraparound outside long-mode, the address
of the descriptor should be truncated to 32-bit.  Citing Intel SDM 2.1.1.1
Global and Local Descriptor Tables in IA-32e Mode: GDTR and LDTR registers
are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode and
compatibility mode).

So in other cases, we need to truncate. Creating new function to return a
pointer to descriptor table to avoid too much code duplication.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4fcd9fd..cb6b8ef 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1446,10 +1446,8 @@ static void get_descriptor_table_ptr(struct 
x86_emulate_ctxt *ctxt,
ops-get_gdt(ctxt, dt);
 }
 
-/* allowed just for 8 bytes segments */
-static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
-  u16 selector, struct desc_struct *desc,
-  ulong *desc_addr_p)
+static int get_descriptor_ptr(struct x86_emulate_ctxt *ctxt,
+ u16 selector, ulong *desc_addr_p)
 {
struct desc_ptr dt;
u16 index = selector  3;
@@ -1460,8 +1458,32 @@ static int read_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
if (dt.size  index * 8 + 7)
return emulate_gp(ctxt, selector  0xfffc);
 
-   *desc_addr_p = addr = dt.address + index * 8;
-   return ctxt-ops-read_std(ctxt, addr, desc, sizeof *desc,
+   addr = dt.address + index * 8;
+
+   if (addr  32 != 0) {
+   u64 efer = 0;
+
+   ctxt-ops-get_msr(ctxt, MSR_EFER, efer);
+   if (!(efer  EFER_LMA))
+   addr = (u32)-1;
+   }
+
+   *desc_addr_p = addr;
+   return X86EMUL_CONTINUE;
+}
+
+/* allowed just for 8 bytes segments */
+static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+  u16 selector, struct desc_struct *desc,
+  ulong *desc_addr_p)
+{
+   int rc;
+
+   rc = get_descriptor_ptr(ctxt, selector, desc_addr_p);
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+
+   return ctxt-ops-read_std(ctxt, *desc_addr_p, desc, sizeof(*desc),
   ctxt-exception);
 }
 
@@ -1469,16 +1491,13 @@ static int read_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
 static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
u16 selector, struct desc_struct *desc)
 {
-   struct desc_ptr dt;
-   u16 index = selector  3;
+   int rc;
ulong addr;
 
-   get_descriptor_table_ptr(ctxt, selector, dt);
-
-   if (dt.size  index * 8 + 7)
-   return emulate_gp(ctxt, selector  0xfffc);
+   rc = get_descriptor_ptr(ctxt, selector, addr);
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
 
-   addr = dt.address + index * 8;
return ctxt-ops-write_std(ctxt, addr, desc, sizeof *desc,
ctxt-exception);
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/8] KVM: x86: POP [ESP] is not emulated correctly

2014-12-24 Thread Nadav Amit
According to Intel SDM: If the ESP register is used as a base register for
addressing a destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register.

The current emulation does not behave so. The fix required to waste another
of the precious instruction flags and to check the flag in decode_modrm.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7f80f01..7bf3548 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -171,6 +171,7 @@
 #define PrivUD  ((u64)1  51)  /* #UD instead of #GP on CPL  0 */
 #define NearBranch  ((u64)1  52)  /* Near branches */
 #define No16   ((u64)1  53)  /* No 16 bit operand */
+#define IncSP   ((u64)1  54)  /* SP is incremented before ModRM calc */
 
 #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -1229,6 +1230,10 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
else {
modrm_ea += reg_read(ctxt, base_reg);
adjust_modrm_seg(ctxt, base_reg);
+   /* Increment ESP on POP [ESP] */
+   if ((ctxt-d  IncSP) 
+   base_reg == VCPU_REGS_RSP)
+   modrm_ea += ctxt-op_bytes;
}
if (index_reg != 4)
modrm_ea += reg_read(ctxt, index_reg)  scale;
@@ -3825,7 +3830,7 @@ static const struct opcode group1[] = {
 };
 
 static const struct opcode group1A[] = {
-   I(DstMem | SrcNone | Mov | Stack, em_pop), N, N, N, N, N, N, N,
+   I(DstMem | SrcNone | Mov | Stack | IncSP, em_pop), N, N, N, N, N, N, N,
 };
 
 static const struct opcode group2[] = {
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/8] KVM: x86: em_call_far should return failure result

2014-12-24 Thread Nadav Amit
Currently, if em_call_far fails it returns success instead of the resulting
error-code. Fix it.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fd89471..7f80f01 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3034,7 +3034,7 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
   X86_TRANSFER_CALL_JMP, new_desc);
if (rc != X86EMUL_CONTINUE)
-   return X86EMUL_CONTINUE;
+   return rc;
 
rc = assign_eip_far(ctxt, ctxt-src.val, new_desc);
if (rc != X86EMUL_CONTINUE)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/8] KVM: x86: fnstcw and fnstsw may cause spurious exception

2014-12-24 Thread Nadav Amit
Since the operand size of fnstcw and fnstsw is updated during the execution,
the emulation may cause spurious exceptions as it reads the memory beforehand.

Marking these instructions as Mov (since the previous value is ignored) and
DstMem16 to simplify the setting of operand size.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 702da5e..19923e7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -86,6 +86,7 @@
 #define DstAcc  (OpAcc  DstShift)
 #define DstDI   (OpDI  DstShift)
 #define DstMem64(OpMem64  DstShift)
+#define DstMem16(OpMem16  DstShift)
 #define DstImmUByte (OpImmUByte  DstShift)
 #define DstDX   (OpDX  DstShift)
 #define DstAccLo(OpAccLo  DstShift)
@@ -1059,8 +1060,6 @@ static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
asm volatile(fnstcw %0: +m(fcw));
ctxt-ops-put_fpu(ctxt);
 
-   /* force 2 byte destination */
-   ctxt-dst.bytes = 2;
ctxt-dst.val = fcw;
 
return X86EMUL_CONTINUE;
@@ -1077,8 +1076,6 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
asm volatile(fnstsw %0: +m(fsw));
ctxt-ops-put_fpu(ctxt);
 
-   /* force 2 byte destination */
-   ctxt-dst.bytes = 2;
ctxt-dst.val = fsw;
 
return X86EMUL_CONTINUE;
@@ -3930,7 +3927,7 @@ static const struct gprefix pfx_0f_e7 = {
 };
 
 static const struct escape escape_d9 = { {
-   N, N, N, N, N, N, N, I(DstMem, em_fnstcw),
+   N, N, N, N, N, N, N, I(DstMem16 | Mov, em_fnstcw),
 }, {
/* 0xC0 - 0xC7 */
N, N, N, N, N, N, N, N,
@@ -3972,7 +3969,7 @@ static const struct escape escape_db = { {
 } };
 
 static const struct escape escape_dd = { {
-   N, N, N, N, N, N, N, I(DstMem, em_fnstsw),
+   N, N, N, N, N, N, N, I(DstMem16 | Mov, em_fnstsw),
 }, {
/* 0xC0 - 0xC7 */
N, N, N, N, N, N, N, N,
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/8] KVM: x86: Do not set access bit on accessed segments

2014-12-24 Thread Nadav Amit
When segment is loaded, the segment access bit is set unconditionally.  In
fact, it should be set conditionally, based on whether the segment had the
accessed bit set before. In addition, it can improve performance.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7bf3548..4fcd9fd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1620,10 +1620,13 @@ static int __load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
 
if (seg_desc.s) {
/* mark segment as accessed */
-   seg_desc.type |= 1;
-   ret = write_segment_descriptor(ctxt, selector, seg_desc);
-   if (ret != X86EMUL_CONTINUE)
-   return ret;
+   if (!(seg_desc.type  1)) {
+   seg_desc.type |= 1;
+   ret = write_segment_descriptor(ctxt, selector,
+  seg_desc);
+   if (ret != X86EMUL_CONTINUE)
+   return ret;
+   }
} else if (ctxt-mode == X86EMUL_MODE_PROT64) {
ret = ctxt-ops-read_std(ctxt, desc_addr+8, base3,
sizeof(base3), ctxt-exception);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Problem with KVM when using XSAVES in host

2014-12-14 Thread Nadav Amit
I encountered an interesting and annoying problem when KVM uses
XSAVES/XRSTORS.

The problem results from the fact XSAVES does not save the exact value of
XINUSE[1]. See Intel SDM 13.10 “Operation of XSAVES”: “...if RFBM[1] = 1 and
MXCSR does not have the value 1F80H, XSAVEC writes XSTATE_BV[1] as 1 even if
XINUSE[1] = 0”. [ there is a typo in the SDM; they refer to XSAVES in this
section, and probably copy-pasted the sentence]. XINUSE[1] marks whether the
SSE state is in its initial state and not saved in the save-area.

The result of this behaviour, is that when KVM uses XSAVES and then XRSTORS,
it may not restore XINUSE[1] correctly, i.e. restore it to 1, although it
was 0.

This behaviour hurts the “equivalence property” - the VM does not behave as
bare-metal system. Moreover, it may hurt the VM performance if the VM uses
XSAVEOPT (and not XSAVES), has MXCSR with value different than the reset
value of 1F80H and has all SSE registers set to zero. In such case, the VM
would save/restore SSE registers unnecessarily. I don’t know whether such
scenario happens in real workloads.

tl;dr - hypervisors which use XSAVES (and XSAVEC) mess the VM state and may
hurt VM performance. Perhaps KVM should use XSAVE/XSAVEOPT instead.

Regards,
Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] KVM: x86: Emulator fixes for VM86

2014-12-10 Thread Nadav Amit
Two minor fixes for emulation of instructions on VM86.

Thanks for reviewing them.

Nadav Amit (2):
  KVM: x86: Do not push eflags.vm on pushf
  KVM: x86: Emulate should check #UD before #GP

 arch/x86/kvm/emulate.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: x86: Emulate should check #UD before #GP

2014-12-10 Thread Nadav Amit
Intel SDM table 6-2 (Priority Among Simultaneous Exceptions and Interrupts)
shows that faults from decoding the next instruction got higher priority than
general protection.  Moving the protected-mode check before the CPL check to
avoid wrong exception on vm86 mode.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5cd5401..0d42aca 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4803,6 +4803,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
goto done;
}
 
+   /* Instruction can only be executed in protected mode */
+   if ((ctxt-d  Prot)  ctxt-mode  X86EMUL_MODE_PROT16) {
+   rc = emulate_ud(ctxt);
+   goto done;
+   }
+
/* Privileged instruction can be executed only in CPL=0 */
if ((ctxt-d  Priv)  ops-cpl(ctxt)) {
if (ctxt-d  PrivUD)
@@ -4812,12 +4818,6 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
goto done;
}
 
-   /* Instruction can only be executed in protected mode */
-   if ((ctxt-d  Prot)  ctxt-mode  X86EMUL_MODE_PROT16) {
-   rc = emulate_ud(ctxt);
-   goto done;
-   }
-
/* Do instruction specific permission checks */
if (ctxt-d  CheckPerm) {
rc = ctxt-check_perm(ctxt);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: x86: Do not push eflags.vm on pushf

2014-12-10 Thread Nadav Amit
The pushf instruction does not push eflags.VM, so emulation should not do so as
well.  Although eflags.RF should not be pushed as well, it is already cleared
by the time pushf is executed.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 33ecfcf..5cd5401 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1863,7 +1863,7 @@ static int em_pusha(struct x86_emulate_ctxt *ctxt)
 
 static int em_pushf(struct x86_emulate_ctxt *ctxt)
 {
-   ctxt-src.val =  (unsigned long)ctxt-eflags;
+   ctxt-src.val = (unsigned long)ctxt-eflags  ~EFLG_VM;
return em_push(ctxt);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests] x86: test_conforming_switch misses es initialization

2014-12-07 Thread Nadav Amit
test_conforming_switch in the taskswitch2 tests, miss es initialization.
Fix it.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 x86/taskswitch2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/x86/taskswitch2.c b/x86/taskswitch2.c
index f55843c..db3e41a 100644
--- a/x86/taskswitch2.c
+++ b/x86/taskswitch2.c
@@ -271,7 +271,8 @@ void test_conforming_switch(void)
 
tss_intr.cs = CONFORM_CS_SEL | 3;
tss_intr.eip = (u32)user_tss;
-   tss_intr.ds = tss_intr.gs = tss_intr.fs = tss_intr.ss = USER_DS;
+   tss_intr.ss = USER_DS;
+   tss_intr.ds = tss_intr.gs = tss_intr.es = tss_intr.fs = tss_intr.ss;
tss_intr.eflags |= 3  IOPL_SHIFT;
set_gdt_entry(CONFORM_CS_SEL, 0, 0x, 0x9f, 0xc0);
asm volatile(lcall $ xstr(TSS_INTR) , $0xf4f4f4f4);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Remove prefix flag when GP macro is used

2014-12-07 Thread Nadav Amit
The macro GP already sets the flag Prefix. Remove the redundant flag for
0f_38_f0 and 0f_38_f1 opcodes.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3817334..b4f4201 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4172,8 +4172,8 @@ static const struct opcode opcode_map_0f_38[256] = {
/* 0x80 - 0xef */
X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
/* 0xf0 - 0xf1 */
-   GP(EmulateOnUD | ModRM | Prefix, three_byte_0f_38_f0),
-   GP(EmulateOnUD | ModRM | Prefix, three_byte_0f_38_f1),
+   GP(EmulateOnUD | ModRM, three_byte_0f_38_f0),
+   GP(EmulateOnUD | ModRM, three_byte_0f_38_f1),
/* 0xf2 - 0xff */
N, N, X4(N), X8(N)
 };
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] kvm: cpuid: fix xsave area size of XSAVEC

2014-12-04 Thread Nadav Amit
Wanpeng Li wanpeng...@linux.intel.com wrote:

 XSAVEC also use the compacted format for the extended region
 of the XSAVE area. This patch fix it by caculate the size of 
 XSAVEC extended region of XSAVE area as compact format.
 
 Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
 ---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
 index 70f0fa1..e16a0c7 100644
 --- a/arch/x86/kvm/cpuid.c
 +++ b/arch/x86/kvm/cpuid.c
 @@ -92,7 +92,7 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
   }
 
   best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
 - if (best  (best-eax  F(XSAVES)))
 + if (best  (best-eax  (F(XSAVES) || F(XSAVEC
Did you want to use | (bitwise or) instead of II (logical or) ?

   best-ebx = xstate_required_size(vcpu-arch.xcr0, true);
 
   /*
 -- 
 1.9.1
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nadav
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-12-03 Thread Nadav Amit
Paolo Bonzini pbonz...@redhat.com wrote:

 Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
 struct xsave_struct might be using the compacted format.  Convert
 in order to preserve userspace ABI.
 
 Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
 but the kernel will pass it to XRSTORS, and we need to convert back.
 
 Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324
 Cc: Fenghua Yu fenghua...@intel.com
 Cc: H. Peter Anvin h...@linux.intel.com
 Cc: Nadav Amit na...@cs.technion.ac.il
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
 arch/x86/kvm/x86.c | 87 +-
 1 file changed, 80 insertions(+), 7 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 08b5657e57ed..373b0ab9a32e 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3132,15 +3132,89 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct 
 kvm_vcpu *vcpu,
   return 0;
 }
 
 +#define XSTATE_COMPACTION_ENABLED (1ULL  63)
 +
 +static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 +{
 + struct xsave_struct *xsave = vcpu-arch.guest_fpu.state-xsave;
 + u64 xstate_bv = vcpu-arch.guest_supported_xcr0 | XSTATE_FPSSE;
 + u64 valid;
 +
 + /*
 +  * Copy legacy XSAVE area, to avoid complications with CPUID
 +  * leaves 0 and 1 in the loop below.
 +  */
 + memcpy(dest, xsave, XSAVE_HDR_OFFSET);
 +
 + /* Set XSTATE_BV */
 + *(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;

I have a problem with this line. I ran some experiments and it has a
side-effect of causing XINUSE (an internal register which saves which state
components are not in the initial state) to be all set. As a results,
after load_xsave runs, when the guest runs xsave instruction, initialised
xsave state components are marked as not-initialised in the guest’s
xstate_bv.

This causes both transparency issues (the VM does not behave as bare-metal
machine). In addition it may cause performance overheads, since from this
point on, xsave and xrstor instructions would save and load state which is
in fact in the initial state.

I think it is better just to replace the last line with:

*(u64 *)(dest + XSAVE_HDR_OFFSET) = xsave-xsave_hdr.xstate_bv

Thanks,
Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: x86: allow 256 logical x2APICs again

2014-11-27 Thread Nadav Amit
Radim Krčmář rkrc...@redhat.com wrote:

 While fixing an x2apic bug,
 17d68b7 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
 we've made only one cluster available.  This means that the amount of
 logically addressible x2APICs was reduced to 16 and VCPUs kept
 overwriting themselves in that region, so even the first cluster wasn't
 set up correctly.
 
 This patch extends x2APIC support back to the logical_map's limit, and
 keeps the CVE fixed as messages for non-present APICs are dropped.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
 arch/x86/kvm/lapic.c | 11 ++-
 arch/x86/kvm/lapic.h |  2 --
 2 files changed, 6 insertions(+), 7 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 049d30f..f6e3369 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -132,8 +132,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
   return (kvm_apic_get_reg(apic, APIC_ID)  24)  0xff;
 }
 
 -#define KVM_X2APIC_CID_BITS 0
 -
 static void recalculate_apic_map(struct kvm *kvm)
 {
   struct kvm_apic_map *new, *old = NULL;
 @@ -163,8 +161,7 @@ static void recalculate_apic_map(struct kvm *kvm)
   if (apic_x2apic_mode(apic)) {
   new-ldr_bits = 32;
   new-cid_shift = 16;
 - new-cid_mask = (1  KVM_X2APIC_CID_BITS) - 1;
 - new-lid_mask = 0x;
 + new-cid_mask = new-lid_mask = 0x;
You set cid_mask to 0x, while there are only 16 clusters. I think it is
risky (if you twist my hand would come with a scenario). Yet, why not to set
cid_mask to (ARRAY_SIZE(map-logical_map) - 1) ?


Nadav--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: x86: allow 256 logical x2APICs again

2014-11-27 Thread Nadav Amit
Radim Krčmář rkrc...@redhat.com wrote:

 2014-11-27 21:53+0200, Nadav Amit:
 Radim Krčmář rkrc...@redhat.com wrote:
 -   new-cid_mask = (1  KVM_X2APIC_CID_BITS) - 1;
 -   new-lid_mask = 0x;
 +   new-cid_mask = new-lid_mask = 0x;
 You set cid_mask to 0x, while there are only 16 clusters. I think it is
 risky (if you twist my hand would come with a scenario).
 
 Let's see :) APIC id is 8 bit, and we compute cluster part of LDR by
 taking four upper bits, so 16 is enough.
To clarify my concern - I am worried that some of the CPUs are still in
xAPIC mode with LDR that does not follow x2APIC LDR scheme.

 It isn't the safest programming practice, but we already fail to check
 physical_map bounds and any boost to maximal APIC ID is going to require
 a rewrite, thus I didn't bother to do it ...
 
 All uses should be covered with the following hunk, I will add it to v2
 after all reviews,
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 6c2b8a5..30e4cc1 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -198,7 +198,7 @@ static void recalculate_apic_map(struct kvm *kvm)
   cid = apic_cluster_id(new, ldr);
   lid = apic_logical_id(new, ldr);
 
 - if (lid)
 + if (lid  cid  ARRAY_SIZE(map-logical_map))
   new-logical_map[cid][ffs(lid) - 1] = apic;
   }
 out:
 
 
 Yet, why not to set
 cid_mask to (ARRAY_SIZE(map-logical_map) - 1) ?
 
 We would incorrectly deliver messages intended for high clusters,
 it has to be 0x.
From the SDM, I am not sure you are correct, but your solution is fine.

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs

2014-11-27 Thread Nadav Amit
Radim Krčmář rkrc...@redhat.com wrote:

 2014-11-26 19:01+0200, Nadav Amit:
 Sorry for the late and long reply, but I got an issue with the new version
 (and my previous version as well). Indeed, the SDM states that DFR should
 be the same for enabled CPUs, and that the BIOS should get all CPUs in
 either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
 in xAPIC/x2APIC mode.
 
 In my tests (which pass on bare-metal), I got a scenario in which some CPUs
 are in xAPIC mode, the BSP changed (which is currently not handled correctly
 by KVM) and the BSP has x2APIC enabled.
 
 How many (V)CPUs were you using?
 (We fail hard with logical destination x2APIC and 16+ VCPUs.)
2 at the moment. What failure do you refer to?

 
All the core APICs are
 software-enabled. The expected behaviour is that the CPUs with x2APIC
 enabled would work in x2APIC mode.
 
 (Nice, I bet that made some Intel designers happy.)
 
 There shouldn't be any message conflict when using APIC IDs 255, so it
 might be possible if the x2APIC isn't programmed to issue weird
 messages, like physical to nonexistent APIC ID 0xff00, which would
 be also interpreted as xAPIC broadcast.
 
 I think such a transitory scenario is possible on real-systems as well,
 perhaps during CPU hot-plug. It appears the previous version (before all of
 our changes) handled it better. I presume the most efficient way is to start
 determining the APIC logical mode from the BSP, and if it is disabled,
 traverse the rest of the CPUs until finding the first one with APIC enabled.
 Yet, I have not finished doing and checking the BSP fix and other dependent
 INIT signal handling fixes.
 
 In the meanwhile, would you be ok with restoring some of the previous
 behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
 whether APIC is software enabled), otherwise use the configuration of the
 last enabled APIC?
 
 I don't think this patch improves anything.
 (Both behaviors are wrong and I think the current one is a bit less so.)
 
 Our x2APIC implementation is a hack that allowed faster IPI thanks to 1
 MSR exit instead of 2 MMIO ones.  No OS, that doesn't know KVM's
 limitations, should have enabled it because we didn't emulate interrupt
 remapping, which is an architectural requirement for x2APIC …
It is a shame - I was under the impression QEMU emulation of the Intel IOMMU
would include it as well, and I now see they only did DMAR…

 And for more concrete points:
 - Physical x2APIC isn't affected (only broadcast, which is incorrect
  either way)
 
 - Logical x2APIC and xAPIC don't work at the same time
No, but it is important to determine what is the “consensus” APIC mode.

  - Btw. logical x2APIC isn't supposed to work (see KVM_X2APIC_CID_BITS)
Why? It is as if there is only a single cluster. You can still send an APIC
message to multiple CPUs within the same cluster (0).

  - Logical xAPIC is shifted incorrectly in x2APIC mode, so they are all
going to be inaccessible (ldr = 0)
  - Our map isn't designed to allow x2APIC and xAPIC at the same time
 
 - Your patch does not cover the case where sw-disabled x2APIC is
  before sw-enabled xAPIC, only if it is after.
I thought I covered it. The rationale was that if any lapic is in x2APIC
mode, then the are all in x2APIC mode. It is done similarly to the previous
version (3.18).

Anyhow, I have my workarounds, so do as you find appropriately. Once I deal
with the BSP issues, I may resubmit another version.

Nadav

 -- 8 —
 Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
 
 ---
 arch/x86/kvm/lapic.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 9c90d31..6dc2be6 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
  struct kvm_apic_map *new, *old = NULL;
  struct kvm_vcpu *vcpu;
  int i;
 +bool any_enabled = false;
 
  new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
 
 @@ -160,13 +161,21 @@ static void recalculate_apic_map(struct kvm *kvm)
  if (!kvm_apic_present(vcpu))
  continue;
 
 +/*
 + * All APICs DFRs have to be configured the same mode by an OS.
 + * We take advatage of this while building logical id lookup
 + * table. After reset APICs are in software disabled mode, so if
 + * we find apic with different setting we assume this is the 
 mode
 + * OS wants all apics to be in; build lookup table accordingly.
 + */
  if (apic_x2apic_mode(apic)) {
  new-ldr_bits = 32;
  new-cid_shift = 16;
  new-cid_mask = (1  KVM_X2APIC_CID_BITS) - 1;
  new-lid_mask = 0x;
  new-broadcast

  1   2   3   4   >