from:"Nadav Amit"

Re: KVM: x86: fix mov immediate emulation for 64-bit operands

2012-12-07 Thread Nadav Amit

On 7/12/12 09:31 , Gleb Natapov wrote:
> On Thu, Dec 06, 2012 at 09:55:10PM -0200, Marcelo Tosatti wrote:
>>
>> From: Nadav Amit 
>>
>> MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand.
>> The previous emulation implementation assumes the operand is no longer than 
>> 32.
>> Adding OpImm64 for this matter.
>>
>> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=881579
>>
>> Signed-off-by: Marcelo Tosatti 
>>
> Needs author's sign-off and test case.
I've already signed-off the patch I sent a while ago.
I was busy, put the test-case implementation low in my priority list,
and forgot about it.
I would appreciate if Marcelo implements the test-case.
Otherwise, let me know - and I'll do it next week.


> 
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 39171cb..6fec09c 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -43,7 +43,7 @@
>>  #define OpCL   9ull  /* CL register (for shifts) */
>>  #define OpImmByte 10ull  /* 8-bit sign extended immediate */
>>  #define OpOne 11ull  /* Implied 1 */
>> -#define OpImm 12ull  /* Sign extended immediate */
>> +#define OpImm 12ull  /* Sign extended up to 32-bit immediate */
>>  #define OpMem16   13ull  /* Memory operand (16-bit). */
>>  #define OpMem32   14ull  /* Memory operand (32-bit). */
>>  #define OpImmU15ull  /* Immediate operand, zero extended */
>> @@ -58,6 +58,7 @@
>>  #define OpFS  24ull  /* FS */
>>  #define OpGS  25ull  /* GS */
>>  #define OpMem826ull  /* 8-bit zero extended memory operand */
>> +#define OpImm64   27ull  /* Sign extended 16/32/64-bit immediate */
>>  
>>  #define OpBits 5  /* Width of operand field */
>>  #define OpMask ((1ull << OpBits) - 1)
>> @@ -101,6 +102,7 @@
>>  #define SrcMemFAddr (OpMemFAddr << SrcShift)
>>  #define SrcAcc  (OpAcc << SrcShift)
>>  #define SrcImmU16   (OpImmU16 << SrcShift)
>> +#define SrcImm64(OpImm64 << SrcShift)
>>  #define SrcDX   (OpDX << SrcShift)
>>  #define SrcMem8 (OpMem8 << SrcShift)
>>  #define SrcMask (OpMask << SrcShift)
>> @@ -3786,7 +3788,7 @@ static const struct opcode opcode_table[256] = {
>>  /* 0xB0 - 0xB7 */
>>  X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
>>  /* 0xB8 - 0xBF */
>> -X8(I(DstReg | SrcImm | Mov, em_mov)),
>> +X8(I(DstReg | SrcImm64 | Mov, em_mov)),
>>  /* 0xC0 - 0xC7 */
>>  D2bv(DstMem | SrcImmByte | ModRM),
>>  I(ImplicitOps | Stack | SrcImmU16, em_ret_near_imm),
>> @@ -3950,6 +3952,9 @@ static int decode_imm(struct x86_emulate_ctxt *ctxt, 
>> struct operand *op,
>>  case 4:
>>  op->val = insn_fetch(s32, ctxt);
>>  break;
>> +case 8:
>> +op->val = insn_fetch(s64, ctxt);
>> +break;
>>  }
>>  if (!sign_extension) {
>>  switch (op->bytes) {
>> @@ -4028,6 +4033,9 @@ static int decode_operand(struct x86_emulate_ctxt 
>> *ctxt, struct operand *op,
>>  case OpImm:
>>  rc = decode_imm(ctxt, op, imm_size(ctxt), true);
>>  break;
>> +case OpImm64:
>> +rc = decode_imm(ctxt, op, ctxt->op_bytes, true);
>> +break;
>>  case OpMem8:
>>  ctxt->memop.bytes = 1;
>>  goto mem_common;
> 
> --
>   Gleb.
> 

Regards,
Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Fwd: [PATCH] KVM: fix test emulation writeback

2011-12-30 Thread Nadav Amit

Test instruction emulation should not perform writeback.

Signed-off-by: Nadav Amit 
---
arch/x86/kvm/emulate.c |4 
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8b4cc5f..0ad0a75 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1708,6 +1708,8 @@ static int em_grp3(struct x86_emulate_ctxt *ctxt)
switch (ctxt->modrm_reg) {
case 0 ... 1:   /* test */
emulate_2op_SrcV("test", ctxt->src, ctxt->dst, ctxt->eflags);
+   /* Disable writeback. */
+   ctxt->dst.type = OP_NONE;
break;
case 2: /* not */
ctxt->dst.val = ~ctxt->dst.val;
@@ -2553,6 +2555,8 @@ static int em_cmp(struct x86_emulate_ctxt *ctxt)
static int em_test(struct x86_emulate_ctxt *ctxt)
{
emulate_2op_SrcV("test", ctxt->src, ctxt->dst, ctxt->eflags);
+   /* Disable writeback. */
+   ctxt->dst.type = OP_NONE;
return X86EMUL_CONTINUE;
}

-- 
1.7.4.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: fix mov immediate emulation for 64-bit operands

2012-01-07 Thread Nadav Amit

MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand.
The previous emulation implementation assumes the operand is no longer than 32.

Signed-off-by: Nadav Amit 
---
arch/x86/kvm/emulate.c |6 +-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..65d1d31 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3502,7 +3502,8 @@ static unsigned imm_size(struct x86_emulate_ctxt *ctxt)
unsigned size;

size = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
-   if (size == 8)
+   /* Immediates are usually no longer than 4 bytes */
+   if (size == 8 && ((ctxt->b & 0xF8) != 0xB8 || ctxt->twobyte))
size = 4;
return size;
}
@@ -3526,6 +3527,9 @@ static int decode_imm(struct x86_emulate_ctxt *ctxt, 
struct operand *op,
case 4:
op->val = insn_fetch(s32, ctxt);
break;
+   case 8:
+   op->val = insn_fetch(s64, ctxt);
+   break;
}
if (!sign_extension) {
switch (op->bytes) {
-- 
1.7.4.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: fix mov immediate emulation for 64-bit operands

2012-01-07 Thread Nadav Amit


On Jan 7, 2012, at 10:25 PM, H. Peter Anvin wrote:

> On 01/07/2012 12:21 PM, Nadav Amit wrote:
>> MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand.
>> The previous emulation implementation assumes the operand is no longer than 
>> 32.
>> 
>> Signed-off-by: Nadav Amit 
> 
> There are exactly two such instructions: MOV immediate (B8-BF) and MOV
> moff (A0-A3); you may want to check the latter too.
> 
>   -hpa
> 

These instructions (A0-A3) seem to be already covered by the decode_abs 
function.

Regards,
Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: fix mov immediate emulation for 64-bit operands

2012-01-08 Thread Nadav Amit

On Sun, Jan 8, 2012 at 3:26 AM, Takuya Yoshikawa
 wrote:
> Hi,
>
> Nadav Amit  wrote:
>> On Jan 7, 2012, at 10:25 PM, H. Peter Anvin wrote:
>>
>> > On 01/07/2012 12:21 PM, Nadav Amit wrote:
>> >> MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand.
>> >> The previous emulation implementation assumes the operand is no longer 
>> >> than 32.
>> >>
>> >> Signed-off-by: Nadav Amit 
>> >
>> > There are exactly two such instructions: MOV immediate (B8-BF) and MOV
>> > moff (A0-A3); you may want to check the latter too.
>> >
>> >     -hpa
>> >
>>
>> These instructions (A0-A3) seem to be already covered by the decode_abs 
>> function.
>
> Like these how about introducing a new flag and change the following entries 
> in the
> decode table to indicate possible 64bit immediate:
>
>        /* 0xB8 - 0xBF */
>        X8(I(DstReg | SrcImm | Mov, em_mov)),
>
> Checking the opcode byte at the operand decoding stage, like below, does not 
> look nice:
>        (IMO so better ask Avi)
>
> +       if (size == 8 && ((ctxt->b & 0xF8) != 0xB8 || ctxt->twobyte))
>                size = 4;
>

I agree. I remembered these flags are expensive (from the time flags
were set in u32).
I guess I can add OpImm64.
Another less preferable alternative is to add a misc. flag or reuse
another flag.

Avi, please acknowledge adding OpImm64.

Regards,
Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: fix mov immediate emulation for 64-bit operands

2012-01-08 Thread Nadav Amit

On Sun, Jan 8, 2012 at 4:08 PM, Avi Kivity  wrote:
> On 01/08/2012 04:05 PM, Avi Kivity wrote:
>> >
>> > Avi, please acknowledge adding OpImm64.
>>
>> Yes, OpImm64 is the cleanest IMO.  Note it doesn't even cost us a bit.

Very well - I'll submit a revised patch later.

>
> Note, usually I'd ask for a unit test to accompany the fix, but the
> current framework only supports testing instructions that have a memroy
> operand.  So you're off the hook for now, unless you'd like to extend
> the framework.

Unfortunately, I don't. I am doing some odd things with KVM that cause
me to encounter emulator bugs.
I am likely to submit another patch or two as the emulation code has
additional "issues" handling EPT violations (especially on non-memory
operands).

Regards,
Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] KVM: fix mov immediate emulation for 64-bit operands

2012-01-11 Thread Nadav Amit

MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand.
The previous emulation implementation assumes the operand is no longer than 32.
Adding OpImm64 for this matter.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..9ad5c0b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -43,7 +43,7 @@
 #define OpCL   9ull  /* CL register (for shifts) */
 #define OpImmByte 10ull  /* 8-bit sign extended immediate */
 #define OpOne 11ull  /* Implied 1 */
-#define OpImm 12ull  /* Sign extended immediate */
+#define OpImm 12ull  /* Sign extended up to 32-bit immediate */
 #define OpMem16   13ull  /* Memory operand (16-bit). */
 #define OpMem32   14ull  /* Memory operand (32-bit). */
 #define OpImmU15ull  /* Immediate operand, zero extended */
@@ -57,6 +57,7 @@
 #define OpDS  23ull  /* DS */
 #define OpFS  24ull  /* FS */
 #define OpGS  25ull  /* GS */
+#define OpImm64  26ull  /* Sign extended 16/32/64-bit 
immediate */
 
 #define OpBits 5  /* Width of operand field */
 #define OpMask ((1ull << OpBits) - 1)
@@ -100,6 +101,7 @@
 #define SrcMemFAddr (OpMemFAddr << SrcShift)
 #define SrcAcc  (OpAcc << SrcShift)
 #define SrcImmU16   (OpImmU16 << SrcShift)
+#define SrcImm64(OpImm64 << SrcShift)
 #define SrcDX   (OpDX << SrcShift)
 #define SrcMask (OpMask << SrcShift)
 #define BitOp   (1<<11)
@@ -3365,7 +3367,7 @@ static struct opcode opcode_table[256] = {
/* 0xB0 - 0xB7 */
X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
/* 0xB8 - 0xBF */
-   X8(I(DstReg | SrcImm | Mov, em_mov)),
+   X8(I(DstReg | SrcImm64 | Mov, em_mov)),
/* 0xC0 - 0xC7 */
D2bv(DstMem | SrcImmByte | ModRM),
I(ImplicitOps | Stack | SrcImmU16, em_ret_near_imm),
@@ -3526,6 +3528,9 @@ static int decode_imm(struct x86_emulate_ctxt *ctxt, 
struct operand *op,
case 4:
op->val = insn_fetch(s32, ctxt);
break;
+   case 8:
+   op->val = insn_fetch(s64, ctxt);
+   break;
}
if (!sign_extension) {
switch (op->bytes) {
@@ -3605,6 +3610,9 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, 
struct operand *op,
case OpImm:
rc = decode_imm(ctxt, op, imm_size(ctxt), true);
break;
+   case OpImm64:
+   rc = decode_imm(ctxt, op, ctxt->op_bytes, true);
+   break;
case OpMem16:
ctxt->memop.bytes = 2;
goto mem_common;
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] KVM: Exception during emulation decode should propagate

2012-01-11 Thread Nadav Amit

An exception might occur during decode (e.g., #PF during fetch).
Currently, the exception is ignored and emulation is performed.
Instead, emulation should be skipped and the fault should be injected.
Skipping instruction should report a failure in this case.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c |3 +++
 arch/x86/kvm/x86.c |8 
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..e06dc98 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3869,6 +3869,9 @@ done:
if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative)
ctxt->memopp->addr.mem.ea += ctxt->_eip;
 
+   if (rc == X86EMUL_PROPAGATE_FAULT)
+   ctxt->have_exception = true;
+
return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1171def..05fd3d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4443,10 +4443,17 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
}
 
if (emulation_type & EMULTYPE_SKIP) {
+   if (ctxt->have_exception)
+   return EMULATE_FAIL;
kvm_rip_write(vcpu, ctxt->_eip);
return EMULATE_DONE;
}
 
+   if (ctxt->have_exception) {
+   writeback = false;
+   goto post;
+   }
+
if (retry_instruction(ctxt, cr2, emulation_type))
return EMULATE_DONE;
 
@@ -4470,6 +4477,7 @@ restart:
return handle_emulation_failure(vcpu);
}
 
+post:
if (ctxt->have_exception) {
inject_emulated_exception(vcpu);
r = EMULATE_DONE;
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] KVM: Fix writeback on page boundary that propagate changes in spite of #PF

2012-01-11 Thread Nadav Amit

Consider the case in which an instruction emulation writeback is performed on a 
page boundary.
In such case, if a #PF occurs on the second page, the write to the first page 
already occurred and cannot be retracted.
Therefore, validation of the second page access must be performed prior to 
writeback.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/x86.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05fd3d7..7af3d67 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3626,6 +3626,8 @@ struct read_write_emulator_ops {
   int bytes, void *val);
int (*read_write_exit_mmio)(struct kvm_vcpu *vcpu, gpa_t gpa,
void *val, int bytes);
+   gpa_t (*read_write_validate)(struct kvm_vcpu *vcpu, gva_t gva,
+struct x86_exception *exception);
bool write;
 };
 
@@ -3686,6 +3688,7 @@ static struct read_write_emulator_ops write_emultor = {
.read_write_emulate = write_emulate,
.read_write_mmio = write_mmio,
.read_write_exit_mmio = write_exit_mmio,
+   .read_write_validate = kvm_mmu_gva_to_gpa_write,
.write = true,
 };
 
@@ -3750,6 +3753,16 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, 
unsigned long addr,
int rc, now;
 
now = -addr & ~PAGE_MASK;
+
+   /* First check there is no page-fault on the next page */
+   if (ops->read_write_validate &&
+   ops->read_write_validate(vcpu, addr+now, exception) ==
+   UNMAPPED_GVA) {
+   /* #PF on the first page should be reported first */
+   ops->read_write_validate(vcpu, addr, exception);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+
rc = emulator_read_write_onepage(addr, val, now, exception,
 vcpu, ops);
 
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] KVM: Exception during emulation decode should propagate

2012-01-12 Thread Nadav Amit

On Jan 12, 2012, at 2:26 AM, Takuya Yoshikawa wrote:

> (2012/01/12 7:11), Takuya Yoshikawa wrote:
>> On Wed, 11 Jan 2012 18:53:30 +0200
>> Nadav Amit  wrote:
>> 
>>> An exception might occur during decode (e.g., #PF during fetch).
>>> Currently, the exception is ignored and emulation is performed.
> 
> Note that the decode/emulation will not be continued in such a case.
> 
> insn_fetch() is a bit tricky macro and it contains "goto done" to outside.
> So if an error happens during fetching the instruction, x86_decode_insn()
> will handle the X86EMUL_* fault value and returns FAIL immediately.

You got a point. Yet, a problem still exists.
I now notice I was originally working on previous version (3.0.0)
where the return-code of x86_decode_insn is handled differently.
Nonetheless, I think the current implementation might report emulation
error in such a scenario (instead of triggering #PF/#GP in the guest).

>> 
>> When I cleaned up insn_fetch(), I thought that fetching the instruction
>> which is being executed by the guest cannot cause #PF.
>> 
>> The possibility that a meaningless userspace might similtaneously unmap
>> the page, noted by Avi IIRC, was ignored intentionally, so we just fail
>> in such a case.
>> 
>> Did you see any real problem?

Well, I run some research project for which I emulate instructions quite
often. I do see a real problem with Linux 3.0.0. Please note AFAIK #GP
might occur as well during instruction fetch. I don't think failing is the
right behavior in such case - there is no real reason to fail.

Please tell me whether you are OK with KVM failing in such a scenario.
If not - I'll send an updated patch (in which x86_decode_insn returns
EMULATION_OK when rc == X86EMUL_PROPAGATE_FAULT).

Regards,
Nadav 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] KVM: Fix writeback on page boundary that propagate changes in spite of #PF

2012-01-14 Thread Nadav Amit


On Jan 12, 2012, at 12:27 PM, Gleb Natapov wrote:

> On Thu, Jan 12, 2012 at 12:21:00PM +0200, Avi Kivity wrote:
>> On 01/12/2012 12:12 PM, Gleb Natapov wrote:
>>> On Wed, Jan 11, 2012 at 06:53:31PM +0200, Nadav Amit wrote:
>>>> Consider the case in which an instruction emulation writeback is performed 
>>>> on a page boundary.
>>>> In such case, if a #PF occurs on the second page, the write to the first 
>>>> page already occurred and cannot be retracted.
>>>> Therefore, validation of the second page access must be performed prior to 
>>>> writeback.
>>>> 
>>>> Signed-off-by: Nadav Amit 
>>>> ---
>>>> arch/x86/kvm/x86.c |   13 +
>>>> 1 files changed, 13 insertions(+), 0 deletions(-)
>>>> 
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 05fd3d7..7af3d67 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -3626,6 +3626,8 @@ struct read_write_emulator_ops {
>>>>   int bytes, void *val);
>>>>int (*read_write_exit_mmio)(struct kvm_vcpu *vcpu, gpa_t gpa,
>>>>void *val, int bytes);
>>>> +  gpa_t (*read_write_validate)(struct kvm_vcpu *vcpu, gva_t gva,
>>>> +   struct x86_exception *exception);
>>>>bool write;
>>>> };
>>>> 
>>>> @@ -3686,6 +3688,7 @@ static struct read_write_emulator_ops write_emultor 
>>>> = {
>>>>.read_write_emulate = write_emulate,
>>>>.read_write_mmio = write_mmio,
>>>>.read_write_exit_mmio = write_exit_mmio,
>>>> +  .read_write_validate = kvm_mmu_gva_to_gpa_write,
>>>>.write = true,
>>>> };
>>>> 
>>>> @@ -3750,6 +3753,16 @@ int emulator_read_write(struct x86_emulate_ctxt 
>>>> *ctxt, unsigned long addr,
>>>>int rc, now;
>>>> 
>>>>now = -addr & ~PAGE_MASK;
>>>> +
>>>> +  /* First check there is no page-fault on the next page */
>>>> +  if (ops->read_write_validate &&
>>>> +  ops->read_write_validate(vcpu, addr+now, exception) ==
>>>> +  UNMAPPED_GVA) {
>>>> +  /* #PF on the first page should be reported first */
>>>> +  ops->read_write_validate(vcpu, addr, exception);
>>>> +  return X86EMUL_PROPAGATE_FAULT;
>>>> +  }
>>>> +
>>> This undoes optimization that vcpu_mmio_gva_to_gpa() has for handling
>>> mmio. 
>> 
>> Right.  I suggest changing I/O to have two phases: first, translate the
>> virtual address into an array of two physical addresses; check
>> exceptions and report.  Then do the actual writes.
>> 
>>> Furthermore for common (non faulting) case we will check page
>>> tables twice on each write that crosses page boundary, first time here
>>> and second time in emulator_read_write_onepage().
>> 
>> Those should be very uncommon.
>> 
> Still it is better to have all the checks in one place like you suggest
> above.

I agree. I'll submit a revised version soon.
I'm just struggling with this emulation code as more stuff is broken.

Regards,
Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: Fix MOVSX emulation

2012-01-14 Thread Nadav Amit

The destination register of MOVSX should be decoded similarily to MOVZX.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..7644a83 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3553,7 +3553,8 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, 
struct operand *op,
case OpReg:
decode_register_operand(ctxt, op,
 op == &ctxt->dst &&
-ctxt->twobyte && (ctxt->b == 0xb6 || ctxt->b == 0xb7));
+ctxt->twobyte && ((ctxt->b & 0xfe) == 0xb6 ||
+   (ctxt->b & 0xfe) == 0xbe));
break;
case OpImmUByte:
rc = decode_imm(ctxt, op, 1, false);
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: Fix MOVSX emulation

2012-01-15 Thread Nadav Amit

Ok. I will include MOVSXD as well in the future patch.

Regards,
Nadav

On Sun, Jan 15, 2012 at 11:56 AM, Avi Kivity  wrote:
> On 01/14/2012 08:34 PM, Nadav Amit wrote:
>> The destination register of MOVSX should be decoded similarily to MOVZX.
>>
>> Signed-off-by: Nadav Amit 
>> ---
>>  arch/x86/kvm/emulate.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 05a562b..7644a83 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -3553,7 +3553,8 @@ static int decode_operand(struct x86_emulate_ctxt 
>> *ctxt, struct operand *op,
>>       case OpReg:
>>               decode_register_operand(ctxt, op,
>>                        op == &ctxt->dst &&
>> -                      ctxt->twobyte && (ctxt->b == 0xb6 || ctxt->b == 
>> 0xb7));
>> +                      ctxt->twobyte && ((ctxt->b & 0xfe) == 0xb6 ||
>> +                             (ctxt->b & 0xfe) == 0xbe));
>>               break;
>>       case OpImmUByte:
>>               rc = decode_imm(ctxt, op, 1, false);
>
> Please post a unit test for this.  See
> git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git x86/emulate.c.
>
> --
> error compiling committee.c: too many arguments to function
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 1/2] KVM: Exception during emulation decode should propagate

2012-01-22 Thread Nadav Amit

An exception might occur during decode (e.g., #PF during fetch).
Currently, the exception is ignored and emulation is performed.
Instead, emulation should be skipped and the fault should be injected.
Skipping instruction should report a failure in this case.

Signed-off-by: Nadav Amit 
---
arch/x86/kvm/emulate.c |5 +
arch/x86/kvm/x86.c |8 
2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..9a07dcd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3869,6 +3869,11 @@ done:
if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative)
ctxt->memopp->addr.mem.ea += ctxt->_eip;

+   if (rc == X86EMUL_PROPAGATE_FAULT) {
+   ctxt->have_exception = true;
+   return EMULATION_OK;
+   }
+
return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1171def..05fd3d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4443,10 +4443,17 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
}

if (emulation_type & EMULTYPE_SKIP) {
+   if (ctxt->have_exception)
+   return EMULATE_FAIL;
kvm_rip_write(vcpu, ctxt->_eip);
return EMULATE_DONE;
}

+   if (ctxt->have_exception) {
+   writeback = false;
+   goto post;
+   }
+
if (retry_instruction(ctxt, cr2, emulation_type))
return EMULATE_DONE;

@@ -4470,6 +4477,7 @@ restart:
return handle_emulation_failure(vcpu);
}

+post:
if (ctxt->have_exception) {
inject_emulated_exception(vcpu);
r = EMULATE_DONE;
-- 
1.7.4.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/2] KVM: Fix writeback on page boundary that propagate changes in spite of #PF

2012-01-22 Thread Nadav Amit

Consider the case in which an instruction emulation writeback is performed on a 
page boundary.
In such case, if a #PF occurs on the second page, the write to the first page 
already occurred and cannot be retracted.
Therefore, validation of the second page access must be performed prior to 
writeback.

Signed-off-by: Nadav Amit 
---
arch/x86/kvm/x86.c |   60 +--
1 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05fd3d7..0e86f3a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3689,33 +3689,27 @@ static struct read_write_emulator_ops write_emultor = {
.write = true,
};

-static int emulator_read_write_onepage(unsigned long addr, void *val,
+static int emulator_read_write_onepage(gpa_t gpa, void *val,
   unsigned int bytes,
-  struct x86_exception *exception,
   struct kvm_vcpu *vcpu,
-  struct read_write_emulator_ops *ops)
+  struct read_write_emulator_ops *ops,
+  bool mmio)
{
-   gpa_t gpa;
-   int handled, ret;
+   int handled;
bool write = ops->write;

if (ops->read_write_prepare &&
  ops->read_write_prepare(vcpu, val, bytes))
return X86EMUL_CONTINUE;

-   ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, write);
-
-   if (ret < 0)
-   return X86EMUL_PROPAGATE_FAULT;
-
/* For APIC access vmexit */
-   if (ret)
-   goto mmio;
+   if (mmio)
+   goto do_mmio;

if (ops->read_write_emulate(vcpu, gpa, val, bytes))
return X86EMUL_CONTINUE;

-mmio:
+do_mmio:
/*
 * Is this MMIO handled locally?
 */
@@ -3744,24 +3738,38 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, 
unsigned long addr,
struct read_write_emulator_ops *ops)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+   int i, rc;
+   bool write = ops->write;
+   gpa_t gpa[2];
+   int npages = (((addr + bytes - 1) ^ addr) & PAGE_MASK) ? 2 : 1;
+   unsigned int offset[2] = {0};
+   unsigned int p_bytes[2] = {bytes, 0};
+   bool mmio[2];
+
+   if (npages == 2) {
+   p_bytes[0] = offset[1] = -addr & ~PAGE_MASK;
+   p_bytes[1] = bytes - p_bytes[0];
+   }
+
+   /* First check there is no page-fault on the next page */
+   for (i = 0; i < npages; i++) {
+   rc = vcpu_mmio_gva_to_gpa(vcpu, addr + offset[i], &gpa[i],
+   exception, write);
+   if (rc < 0)
+   return X86EMUL_PROPAGATE_FAULT;
+   mmio[i] = (rc > 0);
+   }

-   /* Crossing a page boundary? */
-   if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
-   int rc, now;
-
-   now = -addr & ~PAGE_MASK;
-   rc = emulator_read_write_onepage(addr, val, now, exception,
-vcpu, ops);
-
+   /* Actual read/write */
+   for (i = 0; i < npages; i++) {
+   rc = emulator_read_write_onepage(gpa[i], val + offset[i],
+p_bytes[i], vcpu, ops,
+mmio[i]);
if (rc != X86EMUL_CONTINUE)
return rc;
-   addr += now;
-   val += now;
-   bytes -= now;
}

-   return emulator_read_write_onepage(addr, val, bytes, exception,
-  vcpu, ops);
+   return X86EMUL_CONTINUE;
}

static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
-- 
1.7.4.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: Emulator: Opcode 0x80 gets a byte immediate

2012-04-27 Thread Nadav Amit

The single byte opcode 0x80 should get a byte immediate.
The previous emulation is broken for opcode 0x80.
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8375622..dd53e53 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3464,7 +3464,7 @@ static struct opcode opcode_table[256] = {
/* 0x70 - 0x7F */
X16(D(SrcImmByte)),
/* 0x80 - 0x87 */
-   G(ByteOp | DstMem | SrcImm | ModRM | Group, group1),
+   G(ByteOp | DstMem | SrcImmByte | ModRM | Group, group1),
G(DstMem | SrcImm | ModRM | Group, group1),
G(ByteOp | DstMem | SrcImm | ModRM | No64 | Group, group1),
G(DstMem | SrcImmByte | ModRM | Group, group1),
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 7/9] KVM: x86: rdpmc emulation checks the counter incorrectly

2014-06-02 Thread Nadav Amit

The rdpmc emulation checks that the counter (ECX) is not higher than 2, without
taking into considerations bits 30:31 role (e.g., bit 30 marks whether the
counter is fixed). The fix uses the pmu information for checking the validity
of the pmu counter.

Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/kvm_emulate.h | 1 +
 arch/x86/include/asm/kvm_host.h| 1 +
 arch/x86/kvm/emulate.c | 2 +-
 arch/x86/kvm/pmu.c | 9 +
 arch/x86/kvm/x86.c | 7 +++
 5 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index a04fe4e..ffa2671 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -194,6 +194,7 @@ struct x86_emulate_ops {
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 
*pdata);
+   int (*check_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc);
int (*read_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc, u64 *pdata);
void (*halt)(struct x86_emulate_ctxt *ctxt);
void (*wbinvd)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..63e020b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1070,6 +1070,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu);
 bool kvm_pmu_msr(struct kvm_vcpu *vcpu, u32 msr);
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+int kvm_pmu_check_pmc(struct kvm_vcpu *vcpu, unsigned pmc);
 int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index be3f764..3da8d82 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3507,7 +3507,7 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
 
if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
-   (rcx > 3))
+   ctxt->ops->check_pmc(ctxt, rcx))
return emulate_gp(ctxt, 0);
 
return X86EMUL_CONTINUE;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index cbecaa9..3dd6acc 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -428,6 +428,15 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data 
*msr_info)
return 1;
 }
 
+int kvm_pmu_check_pmc(struct kvm_vcpu *vcpu, unsigned pmc)
+{
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+   bool fixed = pmc & (1u << 30);
+   pmc &= ~(3u << 30);
+   return (!fixed && pmc >= pmu->nr_arch_gp_counters) ||
+   (fixed && pmc >= pmu->nr_arch_fixed_counters);
+}
+
 int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data)
 {
struct kvm_pmu *pmu = &vcpu->arch.pmu;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 57eac30..2c6fccd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4759,6 +4759,12 @@ static int emulator_set_msr(struct x86_emulate_ctxt 
*ctxt,
return kvm_set_msr(emul_to_vcpu(ctxt), &msr);
 }
 
+static int emulator_check_pmc(struct x86_emulate_ctxt *ctxt,
+ u32 pmc)
+{
+   return kvm_pmu_check_pmc(emul_to_vcpu(ctxt), pmc);
+}
+
 static int emulator_read_pmc(struct x86_emulate_ctxt *ctxt,
 u32 pmc, u64 *pdata)
 {
@@ -4835,6 +4841,7 @@ static const struct x86_emulate_ops emulate_ops = {
.set_dr  = emulator_set_dr,
.set_msr = emulator_set_msr,
.get_msr = emulator_get_msr,
+   .check_pmc   = emulator_check_pmc,
.read_pmc= emulator_read_pmc,
.halt= emulator_halt,
.wbinvd  = emulator_wbinvd,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/9] KVM: x86: movnti minimum op size of 32-bit is not kept

2014-06-02 Thread Nadav Amit

If the operand-size prefix (0x66) is used in 64-bit mode, the emulator would
assume the destination operand is 64-bit, when it should be 32-bit.

Reminder: movnti does not support 16-bit operands and its default operand size
is 32-bit.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4cb0da6..be3f764 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4836,8 +4836,8 @@ twobyte_insn:
break;
case 0xc3:  /* movnti */
ctxt->dst.bytes = ctxt->op_bytes;
-   ctxt->dst.val = (ctxt->op_bytes == 4) ? (u32) ctxt->src.val :
-   (u64) ctxt->src.val;
+   ctxt->dst.val = (ctxt->op_bytes == 8) ? (u64) ctxt->src.val :
+   (u32) ctxt->src.val;
break;
default:
goto cannot_emulate;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/9] KVM: x86: Mark VEX-prefix instructions emulation as unimplemented

2014-06-02 Thread Nadav Amit

Currently the emulator does not recognize vex-prefix instructions.  However, it
may incorrectly decode lgdt/lidt instructions and try to execute them. This
patch returns unhandlable error on their emulation.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e833d..8ec4a3e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4314,6 +4314,13 @@ done_prefixes:
if (ctxt->d & ModRM)
ctxt->modrm = insn_fetch(u8, ctxt);
 
+   /* vex-prefix instructions are not implemented */
+   if (ctxt->opcode_len == 1 && (ctxt->b == 0xc5 || ctxt->b == 0xc4) &&
+   (mode == X86EMUL_MODE_PROT64 ||
+   (mode >= X86EMUL_MODE_PROT16 && (ctxt->modrm & 0x80 {
+   ctxt->d = NotImpl;
+   }
+
while (ctxt->d & GroupMask) {
switch (ctxt->d & GroupMask) {
case Group:
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/9] KVM: x86: cmpxchg emulation should compare in reverse order

2014-06-02 Thread Nadav Amit

The current implementation of cmpxchg does not update the flags correctly,
since the accumulator should be compared with the destination and not the other
way around. The current implementation does not update the flags correctly.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a16bf22..4cb0da6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2052,8 +2052,10 @@ static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt)
 static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
 {
/* Save real source value, then compare EAX against destination. */
+   ctxt->dst.orig_val = ctxt->dst.val;
+   ctxt->dst.val = reg_read(ctxt, VCPU_REGS_RAX);
ctxt->src.orig_val = ctxt->src.val;
-   ctxt->src.val = reg_read(ctxt, VCPU_REGS_RAX);
+   ctxt->src.val = ctxt->dst.orig_val;
fastop(ctxt, em_cmp);
 
if (ctxt->eflags & EFLG_ZF) {
@@ -2063,6 +2065,7 @@ static int em_cmpxchg(struct x86_emulate_ctxt *ctxt)
/* Failure: write the value we saw to EAX. */
ctxt->dst.type = OP_REG;
ctxt->dst.addr.reg = reg_rmw(ctxt, VCPU_REGS_RAX);
+   ctxt->dst.val = ctxt->dst.orig_val;
}
return X86EMUL_CONTINUE;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 9/9] KVM: x86: smsw emulation is incorrect in 64-bit mode

2014-06-02 Thread Nadav Amit

In 64-bit mode, when the destination is a register, the assignment is done
according to the operand size. Otherwise (memory operand or no 64-bit mode), a
16-bit assignment is performed.

Currently, 16-bit assignment is always done to the destination.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a151f8d..9b5d97d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3235,7 +3235,8 @@ static int em_lidt(struct x86_emulate_ctxt *ctxt)
 
 static int em_smsw(struct x86_emulate_ctxt *ctxt)
 {
-   ctxt->dst.bytes = 2;
+   if (ctxt->dst.type == OP_MEM)
+   ctxt->dst.bytes = 2;
ctxt->dst.val = ctxt->ops->get_cr(ctxt, 0);
return X86EMUL_CONTINUE;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 8/9] KVM: x86: Return error on cmpxchg16b emulation

2014-06-02 Thread Nadav Amit

cmpxchg16b is currently unimplemented in the emulator. The least we can do is
return error upon the emulation of this instruction.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3da8d82..a151f8d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1999,6 +1999,9 @@ static int em_cmpxchg8b(struct x86_emulate_ctxt *ctxt)
 {
u64 old = ctxt->dst.orig_val64;
 
+   if (ctxt->dst.bytes == 16)
+   return X86EMUL_UNHANDLEABLE;
+
if (((u32) (old >> 0) != (u32) reg_read(ctxt, VCPU_REGS_RAX)) ||
((u32) (old >> 32) != (u32) reg_read(ctxt, VCPU_REGS_RDX))) {
*reg_write(ctxt, VCPU_REGS_RAX) = (u32) (old >> 0);
@@ -4077,7 +4080,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, 
struct operand *op,
op->orig_val = op->val;
break;
case OpMem64:
-   ctxt->memop.bytes = 8;
+   ctxt->memop.bytes = (ctxt->op_bytes == 8) ? 16 : 8;
goto mem_common;
case OpAcc:
op->type = OP_REG;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/9] KVM: x86: sgdt and sidt are not privilaged

2014-06-02 Thread Nadav Amit

The SGDT and SIDT instructions are not privilaged, i.e. they can be executed
with CPL>0.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7e4a45c..a16bf22 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3642,8 +3642,8 @@ static const struct opcode group6[] = {
 };
 
 static const struct group_dual group7 = { {
-   II(Mov | DstMem | Priv, em_sgdt, sgdt),
-   II(Mov | DstMem | Priv, em_sidt, sidt),
+   II(Mov | DstMem,em_sgdt, sgdt),
+   II(Mov | DstMem,em_sidt, sidt),
II(SrcMem | Priv,   em_lgdt, lgdt),
II(SrcMem | Priv,   em_lidt, lidt),
II(SrcNone | DstMem | Mov,  em_smsw, smsw), N,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/9] KVM: x86: Fixes for various emulator bugs

2014-06-02 Thread Nadav Amit

The x86 emulator of KVM is buggy. This series of patches includes fixes for
various bugs which were detected. Each patch stands on its own.  Two patches do
not fix KVM emulation, but cause the emulator to fail more nicely by returning
an unhandlable error, instead of performing wrong emulation (VEX-prefix and
cmpxchg16b).  The fix for rdpmc is a bit intrusive to keep SVM behavior intact.

Thanks for reviewing the patches.

Nadav Amit (9):
  KVM: x86: Mark VEX-prefix instructions emulation as unimplemented
  KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR
  KVM: x86: Loading segments on 64-bit mode may be wrong
  KVM: x86: sgdt and sidt are not privilaged
  KVM: x86: cmpxchg emulation should compare in reverse order
  KVM: x86: movnti minimum op size of 32-bit is not kept
  KVM: x86: rdpmc emulation checks the counter incorrectly
  KVM: x86: Return error on cmpxchg16b emulation
  KVM: x86: smsw emulation is incorrect in 64-bit mode

 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/include/asm/kvm_host.h|  1 +
 arch/x86/kvm/emulate.c | 44 --
 arch/x86/kvm/pmu.c |  9 
 arch/x86/kvm/x86.c |  7 ++
 5 files changed, 51 insertions(+), 11 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/9] KVM: x86: Loading segments on 64-bit mode may be wrong

2014-06-02 Thread Nadav Amit

The current emulator implementation ignores the high 32 bits of the base in
long-mode.  During segment load from the LDT, the base of the LDT is calculated
incorrectly and may cause the wrong segment to be loaded.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 136088f..7e4a45c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1358,17 +1358,19 @@ static void get_descriptor_table_ptr(struct 
x86_emulate_ctxt *ctxt,
 u16 selector, struct desc_ptr *dt)
 {
const struct x86_emulate_ops *ops = ctxt->ops;
+   u32 base3 = 0;
 
if (selector & 1 << 2) {
struct desc_struct desc;
u16 sel;
 
memset (dt, 0, sizeof *dt);
-   if (!ops->get_segment(ctxt, &sel, &desc, NULL, VCPU_SREG_LDTR))
+   if (!ops->get_segment(ctxt, &sel, &desc, &base3,
+ VCPU_SREG_LDTR))
return;
 
dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? 
*/
-   dt->address = get_desc_base(&desc);
+   dt->address = get_desc_base(&desc) | ((u64)base3 << 32);
} else
ops->get_gdt(ctxt, dt);
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/9] KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR

2014-06-02 Thread Nadav Amit

The current implementation ignores the LDTR/TR base high 32-bits on long-mode.
As a result the loaded segment descriptor may be incorrect.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8ec4a3e..136088f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1422,6 +1422,7 @@ static int __load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
ulong desc_addr;
int ret;
u16 dummy;
+   u32 base3 = 0;
 
memset(&seg_desc, 0, sizeof seg_desc);
 
@@ -1538,9 +1539,14 @@ static int __load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
ret = write_segment_descriptor(ctxt, selector, &seg_desc);
if (ret != X86EMUL_CONTINUE)
return ret;
+   } else if (ctxt->mode == X86EMUL_MODE_PROT64) {
+   ret = ctxt->ops->read_std(ctxt, desc_addr+8, &base3,
+   sizeof(base3), &ctxt->exception);
+   if (ret != X86EMUL_CONTINUE)
+   return ret;
}
 load:
-   ctxt->ops->set_segment(ctxt, selector, &seg_desc, 0, seg);
+   ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg);
return X86EMUL_CONTINUE;
 exception:
emulate_exception(ctxt, err_vec, err_code, true);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop

2014-06-04 Thread Nadav Amit


On Jun 4, 2014, at 7:34 PM, Paolo Bonzini  wrote:

> Il 04/06/2014 16:44, Alexander Graf ha scritto:
>> 
>> 
>>> Obviously, if you really like the current behavior better you can
>>> always reject whatever patch I'll come up with, but I'd like to at
>>> least try and see what it would look like :)
>> 
>> I think it's perfectly fine to leave mwait always implemented as NOP -
>> it's valid behavior.
>> 
>> As for the CPUID exposure, that should be a pure QEMU thing. If
>> overriding CPUID bits the kernel mask tells us doesn't work today, we
>> should just make it possible :).
> 
> That should be the purpose of KVM_GET_EMULATED_CPUID, so MWAIT could be added 
> in __do_cpuid_ent_emulated.  However, the corresponding QEMU patches were 
> never included.  Borislav, can you refresh them?


Regardless to the whole discussion of what the guest is informed about, I think 
it might be better to implement mwait and monitor correctly according to the 
spec and let the instructions to be fully emulated.
Both mwait and monitor may encounter exceptions (#GP, #PF, regardless of #UD), 
so this behaviour should be correct.
If you want me, I’ll send my version which looks something like:

static int em_monitor(struct x86_emulate_ctxt *ctxt)
{
int rc;
struct segmented_address addr;
u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
u8 byte;

rc = check_mwait_supported(ctxt);
if (rc != X86EMUL_CONTINUE)
return rc;

if (ctxt->mode != X86EMUL_MODE_PROT64)
rcx = (u32)rcx;

if (rcx != 0)
return emulate_gp(ctxt, 0);

addr.seg = seg_override(ctxt);
addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;

rc = segmented_read(ctxt, addr, &byte, 1);
if (rc != X86EMUL_CONTINUE)
return rc;

return X86EMUL_CONTINUE;
}

static int em_mwait(struct x86_emulate_ctxt *ctxt)
{
u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
int rc = check_mwait_supported(ctxt);
if (rc != X86EMUL_CONTINUE)
return rc;
if (ctxt->mode != X86EMUL_MODE_PROT64)
rcx = (u32)rcx;

if ((rcx & ~(u64)1) != 0)
return emulate_gp(ctxt, 0);

if (rcx & 1) {
/* Interrupt as break event */
u32 ebx, ecx, edx, eax;
eax = 5;
ecx = 0;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
if (!(ecx & 1))
return emulate_gp(ctxt, 0);
}
return X86EMUL_CONTINUE;
}



signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop

2014-06-04 Thread Nadav Amit

On Jun 4, 2014, at 10:43 PM, Gabriel L. Somlo  wrote:

My implementation still emulates the instruction as a NOP, but first checks for 
an exception.
> On Wed, Jun 04, 2014 at 10:12:39PM +0300, Nadav Amit wrote:
> 
> I'd be curious how you're dealing with the "hidden" CPU state which
> tells MWAIT to sleep until someone or something writes to the
> monitored memory area set up by a corresponding MONITOR instruction.
>> Regardless to the whole discussion of what the guest is informed about, I 
>> think it might be better to implement mwait and monitor correctly according 
>> to the spec and let the instructions to be fully emulated.
>> Both mwait and monitor may encounter exceptions (#GP, #PF, regardless of 
>> #UD), so this behaviour should be correct.
>> If you want me, I?ll send my version which looks something like:
>> 
>> static int em_monitor(struct x86_emulate_ctxt *ctxt)
>> {
>>  int rc;
>>  struct segmented_address addr;
>>  u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>>  u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
>>  u8 byte;
>> 
>>  rc = check_mwait_supported(ctxt);
>>  if (rc != X86EMUL_CONTINUE)
>>  return rc;
>> 
>>  if (ctxt->mode != X86EMUL_MODE_PROT64)
>>  rcx = (u32)rcx;
>> 
>>  if (rcx != 0)
>>  return emulate_gp(ctxt, 0);
>> 
>>  addr.seg = seg_override(ctxt);
>>  addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;
>> 
>>  rc = segmented_read(ctxt, addr, &byte, 1);
>>  if (rc != X86EMUL_CONTINUE)
>>  return rc;
>> 
>>  return X86EMUL_CONTINUE;
>> }
>> 
>> static int em_mwait(struct x86_emulate_ctxt *ctxt)
>> {
>>  u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>>  int rc = check_mwait_supported(ctxt);
>>  if (rc != X86EMUL_CONTINUE)
>>  return rc;
>>  if (ctxt->mode != X86EMUL_MODE_PROT64)
>>  rcx = (u32)rcx;
>> 
>>  if ((rcx & ~(u64)1) != 0)
>>  return emulate_gp(ctxt, 0);
>> 
>>  if (rcx & 1) {
>>  /* Interrupt as break event */
>>  u32 ebx, ecx, edx, eax;
>>  eax = 5;
>>  ecx = 0;
>>  ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
>>  if (!(ecx & 1))
>>  return emulate_gp(ctxt, 0);
>>  }
>>  return X86EMUL_CONTINUE;
>> }

Anyhow, if you want a real mwait emulation, you can write-protect the page of 
the monitored memory area in the EPT of the other VCPUs and set a callback once 
a write to the area takes place. You may want the host to cause a spurious 
wakeup after you do the write-protection, so you will not miss a write of 
another VCPU to the monitored area. After the spurious wake-up, the VM is 
likely to issue an additional mwait, using the same monitored cache-line.

Additional care for DMAs (emulated and paravirtual) might be needed with the 
assistance of QEMU. The complicated case is dealing with the DMAs of assigned 
devices due to the lack of support for I/O page-faules.

Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop

2014-06-04 Thread Nadav Amit


On Jun 4, 2014, at 11:11 PM, Gabriel L. Somlo  wrote:

> On Wed, Jun 04, 2014 at 11:01:50PM +0300, Nadav Amit wrote:
>> On Jun 4, 2014, at 10:43 PM, Gabriel L. Somlo  wrote:
>> 
>> My implementation still emulates the instruction as a NOP, but first checks 
>> for an exception.
> 
> [...] 
> 
>> Anyhow, if you want a real mwait emulation, you can write-protect the page 
>> of the monitored memory area in the EPT of the other VCPUs and set a 
>> callback once a write to the area takes place. You may want the host to 
>> cause a spurious wakeup after you do the write-protection, so you will not 
>> miss a write of another VCPU to the monitored area. After the spurious 
>> wake-up, the VM is likely to issue an additional mwait, using the same 
>> monitored cache-line.
>> 
>> Additional care for DMAs (emulated and paravirtual) might be needed with the 
>> assistance of QEMU. The complicated case is dealing with the DMAs of 
>> assigned devices due to the lack of support for I/O page-faules.
> 
> I took a stab at something like that a while ago:
> 
> http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/kvm-mwait-emu-20140205.patch
> 
> with limited success, probably due to my lack of familiarity with
> the fine details of the KVM code base... :)
> 
> My main interest was to get it working well enough to be useful for
> idle loops (which is the only thing I know of that either Linux or
> OS X use monitor and mwait for, currently -- so DMA wasn't a huge
> priority).
> 
> Even if we got it working well enough in the general case (any number
> of vcpus, etc) I think it would still suck for idle loops when
> compared to simply falling back to HLT (mainly due to all the TLB
> shootdowns required to make it work) :)

MWAIT is likely not to perform as good as HLT, but you may play with other 
techniques, like clearing the dirty flag of the page on other VCPUs and 
periodically check it.
You will still need TLB shootdowns at that case, and wakeup might be slower, 
but at least you will have fewer emulations and exits.
Anyhow, I would try to avoid using the emulator for your purpose (especially 
since your guest crashes), and try to use the MTF feature instead - at least to 
get the system to work.

Regards,
Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 9/9] KVM: x86: smsw emulation is incorrect in 64-bit mode

2014-06-05 Thread Nadav Amit


On Jun 5, 2014, at 5:53 PM, Paolo Bonzini  wrote:

> Il 02/06/2014 17:34, Nadav Amit ha scritto:
>> In 64-bit mode, when the destination is a register, the assignment is done
>> according to the operand size. Otherwise (memory operand or no 64-bit mode), 
>> a
>> 16-bit assignment is performed.
> 
> I'm sorry, I'm missing the place where 64-bit mode is taken into account?
It is not, since on 32-bit mode the high-order 16 bits of a register 
destination are undefined. 
If I recall correctly, in this case the high-order 16-bits on native system 
actually reflect the high-order 16-bits of CR0.

Therefore, regardless to the mode of operation:
1. If the target is memory - the destination is 16-bit
2. If the target is a register - the register is written according to the 
operand size.

Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 2/2] x86: realmode: test smsw behavior with register operand

2014-06-05 Thread Nadav Amit

The smsw instruction has an undocumented behavior, in which the high-order
16-bits of CR0 are also saved in a 32-bit destination register.  This is
similar to the way smsw behaves in long-mode. However, it is hard to test the
long-mode case, since we need to cause an "invalid guest state" in long-mode.

The test works as follows: it sets CR0.CD (bit 30), so any of the high 16-bits
would be set.  It then executes smsw to register destination and compares the
register value with that of CR0.  CR0 value is restored when the test is done.

This test is expected to fail only when unrestricted mode is disabled or
unsupported.

Signed-off-by: Nadav Amit 
---
 x86/realmode.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 839ac34..6e74883 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1646,6 +1646,20 @@ void test_dr_mod(void)
report("mov dr with mod bits", R_AX | R_BX, outregs.eax == 0xaced);
 }
 
+void test_smsw(void)
+{
+   MK_INSN(smsw, "movl %cr0, %ebx\n\t"
+ "movl %ebx, %ecx\n\t"
+ "or $0x4000, %ebx\n\t"
+ "movl %ebx, %cr0\n\t"
+ "smswl %eax\n\t"
+ "movl %ecx, %cr0\n\t");
+   inregs.eax = 0x12345678;
+   exec_in_big_real_mode(&insn_smsw);
+   report("smsw", R_AX | R_BX | R_CX, outregs.eax == outregs.ebx);
+}
+
+
 void realmode_start(void)
 {
test_null();
@@ -1692,6 +1706,7 @@ void realmode_start(void)
test_salc();
test_fninit();
test_dr_mod();
+   test_smsw();
test_nopl();
test_perf_loop();
test_perf_mov();
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests1/2] x86: emulator: additional smsw test-case

2014-06-05 Thread Nadav Amit

An additional test case for the emulator was added to test smsw which is
trapped by the emulator.  The other existing test-cases occur in the guest (at
least on VMX), since the values are read directly from the CR0 read shadow.

Signed-off-by: Nadav Amit 
---
 x86/emulator.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index 20e3a45..033f246 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -336,7 +336,7 @@ void test_incdecnotneg(void *mem)
 report("lock notb", *mb == vb);
 }
 
-void test_smsw(void)
+void test_smsw(uint64_t *h_mem)
 {
char mem[16];
unsigned short msw, msw_orig, *pmsw;
@@ -355,6 +355,12 @@ void test_smsw(void)
if (i != 4 && pmsw[i])
zero = 0;
report("smsw (2)", msw == pmsw[4] && zero);
+
+   /* Trigger exit on smsw */
+   *h_mem = 0x12345678abcdeful;
+   asm volatile("smsw %0" : "=m"(*h_mem));
+   report("smsw (3)", msw == (unsigned short)*h_mem &&
+   (*h_mem & ~0xul) == 0x12345678abul);
 }
 
 void test_lmsw(void)
@@ -998,7 +1004,7 @@ int main()
 
test_cr8();
 
-   test_smsw();
+   test_smsw(mem);
test_lmsw();
test_ljmp(mem);
test_stringio();
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 0/2] x86: Additional smsw tests

2014-06-05 Thread Nadav Amit

This patch set adds two tests for smsw. The first one is intended to add
coverage of smsw. It covers the case smsw is executed with memory operand in a
page which is write-protected by the hypervisor. Note that the existing smsw
tests are not supposed to be trapped by the hypervisor. This test was added
just for additional coverage.

The realmode smsw test covers the recent patch that saves the high 16-bits to
32-bit register operand. Implementing a long-mode test is difficult since we
need to cause an "invalid guest state" in long-mode.

Nadav Amit (2):
  x86: emulator: additional smsw test-case
  x86: realmode: test smsw behavior with register operand

 x86/emulator.c | 10 --
 x86/realmode.c | 15 +++
 2 files changed, 23 insertions(+), 2 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] x86: realmode: report failures

2014-06-05 Thread Nadav Amit

The current realmode tests always report success when done, regardless to
whether any of the tests failed.  Although the log includes the individual test
results, this behavior complicates the life of the tester.

Signed-off-by: Nadav Amit 
---
 x86/realmode.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index 6e74883..dc4a1d3 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -103,6 +103,8 @@ static void print_serial_u32(u32 value)
print_serial(p);
 }
 
+static int failed;
+
 static void exit(int code)
 {
outb(code, 0xf4);
@@ -222,6 +224,8 @@ static void report(const char *name, u16 regs_ignore, _Bool 
ok)
 print_serial(ok ? "PASS: " : "FAIL: ");
 print_serial(name);
 print_serial("\n");
+if (!ok)
+   failed = 1;
 }
 
 #define MK_INSN(name, str) \
@@ -1715,7 +1719,7 @@ void realmode_start(void)
test_perf_memory_store();
test_perf_memory_rmw();
 
-   exit(0);
+   exit(failed);
 }
 
 unsigned long long r_gdt[] = { 0, 0x9b00, 0x9300 };
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests v2] x86: emulator: long mode smsw tests

2014-06-08 Thread Nadav Amit

SMSW instruction on long-mode is performed according to the operand size, if
the destination operand is a register.  This patch tests whether it is
performed correctly instead of always using two-bytes operands (as KVM
previously did).  Note that when a dword destination operand is used, the
result is zero-extended to qword on long-mode.

Signed-off-by: Nadav Amit 
---
 x86/emulator.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 033f246..f653127 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -825,6 +825,26 @@ static void test_movabs(uint64_t *mem, uint8_t *insn_page,
 report("64-bit mov imm2", outregs.rcx == 0x9090909090909090);
 }
 
+static void test_smsw_reg(uint64_t *mem, uint8_t *insn_page,
+ uint8_t *alt_insn_page, void *insn_ram)
+{
+   unsigned long cr0 = read_cr0();
+   inregs = (struct regs){ .rax = 0x1234567890abcdeful };
+
+   MK_INSN(smsww, "smsww %ax\n\t");
+   trap_emulator(mem, alt_insn_page, &insn_smsww);
+   report("16-bit smsw reg", (u16)outregs.rax == (u16)cr0 &&
+ outregs.rax >> 16 == inregs.rax >> 16);
+
+   MK_INSN(smswl, "smswl %eax\n\t");
+   trap_emulator(mem, alt_insn_page, &insn_smswl);
+   report("32-bit smsw reg", outregs.rax == (u32)cr0);
+
+   MK_INSN(smswq, "smswq %rax\n\t");
+   trap_emulator(mem, alt_insn_page, &insn_smswq);
+   report("64-bit smsw reg", outregs.rax == cr0);
+}
+
 static void test_crosspage_mmio(volatile uint8_t *mem)
 {
 volatile uint16_t w, *pw;
@@ -1024,6 +1044,7 @@ int main()
 
test_mmx_movq_mf(mem, insn_page, alt_insn_page, insn_ram);
test_movabs(mem, insn_page, alt_insn_page, insn_ram);
+   test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram);
 
test_crosspage_mmio(mem);
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/6] KVM: x86: bit-ops emulation ignores offset on 64-bit

2014-06-15 Thread Nadav Amit

The current emulation of bit operations ignores the offset from the destination
on 64-bit target memory operands. This patch fixes this behavior.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e833d..f0b0a10 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1220,12 +1220,14 @@ static void fetch_bit_operand(struct x86_emulate_ctxt 
*ctxt)
long sv = 0, mask;
 
if (ctxt->dst.type == OP_MEM && ctxt->src.type == OP_REG) {
-   mask = ~(ctxt->dst.bytes * 8 - 1);
+   mask = ~((long)ctxt->dst.bytes * 8 - 1);
 
if (ctxt->src.bytes == 2)
sv = (s16)ctxt->src.val & (s16)mask;
else if (ctxt->src.bytes == 4)
sv = (s32)ctxt->src.val & (s32)mask;
+   else
+   sv = (s64)ctxt->src.val & (s64)mask;
 
ctxt->dst.addr.mem.ea += (sv >> 3);
}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode

2014-06-15 Thread Nadav Amit

From: Nadav Amit 

When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and
otherwise injects a #GP exception. This exception should only be injected only
if running in long-mode.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/x86.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 57eac30..71fe841 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -756,6 +756,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu)
vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
 }
 
+static bool is_64_bit_mode(struct kvm_vcpu *vcpu)
+{
+   int cs_db, cs_l;
+   if (!is_long_mode(vcpu))
+   return false;
+   kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
+   return cs_l;
+}
+
 static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 {
switch (dr) {
@@ -769,7 +778,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, 
unsigned long val)
return 1; /* #UD */
/* fall through */
case 6:
-   if (val & 0xULL)
+   if ((val & 0xULL) && is_64_bit_mode(vcpu))
return -1; /* #GP */
vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
kvm_update_dr6(vcpu);
@@ -779,7 +788,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, 
unsigned long val)
return 1; /* #UD */
/* fall through */
default: /* 7 */
-   if (val & 0xULL)
+   if ((val & 0xULL) && is_64_bit_mode(vcpu))
return -1; /* #GP */
vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
kvm_update_dr7(vcpu);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/6] KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX

2014-06-15 Thread Nadav Amit

On long-mode the current NOP (0x90) emulation still writes back to RAX.  As a
result, EAX is zero-extended and the high 32-bits of RAX are cleared.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b354531..eb93eb4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4663,8 +4663,9 @@ special_insn:
break;
case 0x90 ... 0x97: /* nop / xchg reg, rax */
if (ctxt->dst.addr.reg == reg_rmw(ctxt, VCPU_REGS_RAX))
-   break;
-   rc = em_xchg(ctxt);
+   ctxt->dst.type = OP_NONE;
+   else
+   rc = em_xchg(ctxt);
break;
case 0x98: /* cbw/cwde/cdqe */
switch (ctxt->op_bytes) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/6] KVM: x86: Wrong emulation on 'xadd X, X'

2014-06-15 Thread Nadav Amit

The emulator does not emulate the xadd instruction correctly if the two
operands are the same.  In this (unlikely) situation the result should be the
sum of X and X (2X) when it is currently X.  The solution is to first perform
writeback to the source, before writing to the destination.  The only
instruction which should be affected is xadd, as the other instructions that
perform writeback to the source use the extended accumlator (e.g., RAX:RDX).

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f0b0a10..3c8d867 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4711,17 +4711,17 @@ special_insn:
goto done;
 
 writeback:
-   if (!(ctxt->d & NoWrite)) {
-   rc = writeback(ctxt, &ctxt->dst);
-   if (rc != X86EMUL_CONTINUE)
-   goto done;
-   }
if (ctxt->d & SrcWrite) {
BUG_ON(ctxt->src.type == OP_MEM || ctxt->src.type == 
OP_MEM_STR);
rc = writeback(ctxt, &ctxt->src);
if (rc != X86EMUL_CONTINUE)
goto done;
}
+   if (!(ctxt->d & NoWrite)) {
+   rc = writeback(ctxt, &ctxt->dst);
+   if (rc != X86EMUL_CONTINUE)
+   goto done;
+   }
 
/*
 * restore dst type in case the decoding will be reused
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/6] KVM: x86: emulation of dword cmov on long-mode should clear [63:32]

2014-06-15 Thread Nadav Amit

Even if the condition of cmov is not satisfied, bits[63:32] should be cleared.
This is clearly stated in Intel's CMOVcc documentation.  The solution is to
reassign the destination onto itself if the condition is unsatisfied.  For that
matter the original destination value needs to be read.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0183350..b354531 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3905,7 +3905,7 @@ static const struct opcode twobyte_table[256] = {
N, N,
N, N, N, N, N, N, N, N,
/* 0x40 - 0x4F */
-   X16(D(DstReg | SrcMem | ModRM | Mov)),
+   X16(D(DstReg | SrcMem | ModRM)),
/* 0x50 - 0x5F */
N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N,
/* 0x60 - 0x6F */
@@ -4799,8 +4799,10 @@ twobyte_insn:
ops->get_dr(ctxt, ctxt->modrm_reg, &ctxt->dst.val);
break;
case 0x40 ... 0x4f: /* cmov */
-   ctxt->dst.val = ctxt->dst.orig_val = ctxt->src.val;
-   if (!test_cc(ctxt->b, ctxt->eflags))
+   if (test_cc(ctxt->b, ctxt->eflags))
+   ctxt->dst.val = ctxt->src.val;
+   else if (ctxt->mode != X86EMUL_MODE_PROT64 ||
+ctxt->op_bytes != 4)
ctxt->dst.type = OP_NONE; /* no writeback */
break;
case 0x80 ... 0x8f: /* jnz rel, etc*/
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/6] KVM: x86: More emulator bugs

2014-06-15 Thread Nadav Amit

This patch-set resolves several emulator bugs. Each fix is independent of the
others.  The DR6/7 bug can occur during DR-access exit (regardless to
unrestricted mode, MMIO and SPT).

Thanks for reviewing the patches,
Nadav

Nadav Amit (6):
  KVM: x86: bit-ops emulation ignores offset on 64-bit
  KVM: x86: Wrong emulation on 'xadd X, X'
  KVM: x86: Inter privilage level ret emulation is not implemeneted
  KVM: x86: emulation of dword cmov on long-mode should clear [63:32]
  KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX
  KVM: x86: check DR6/7 high-bits are clear only on long-mode

 arch/x86/kvm/emulate.c | 31 ---
 arch/x86/kvm/x86.c | 13 +++--
 2 files changed, 31 insertions(+), 13 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/6] KVM: x86: Inter privilage level ret emulation is not implemeneted

2014-06-15 Thread Nadav Amit

Return unhandlable error on inter-privilage level ret instruction.  This is
since the current emulation does not check the privilage level correctly when
loading the CS, and does not pop RSP/SS as needed.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3c8d867..0183350 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2019,6 +2019,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 {
int rc;
unsigned long cs;
+   int cpl = ctxt->ops->cpl(ctxt);
 
rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
@@ -2028,6 +2029,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
+   /* Outer-privilage level return is not implemented */
+   if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
+   return X86EMUL_UNHANDLEABLE;
rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
return rc;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode

2014-06-16 Thread Nadav Amit


On 6/16/14, 1:17 PM, Paolo Bonzini wrote:

Il 15/06/2014 15:13, Nadav Amit ha scritto:

From: Nadav Amit 

When the guest sets DR6 and DR7, KVM asserts the high 32-bits are
clear, and
otherwise injects a #GP exception. This exception should only be
injected only
if running in long-mode.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/x86.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 57eac30..71fe841 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -756,6 +756,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu)
 vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
 }

+static bool is_64_bit_mode(struct kvm_vcpu *vcpu)
+{
+int cs_db, cs_l;
+if (!is_long_mode(vcpu))
+return false;
+kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
+return cs_l;
+}
+
 static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long
val)
 {
 switch (dr) {
@@ -769,7 +778,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int
dr, unsigned long val)
 return 1; /* #UD */
 /* fall through */
 case 6:
-if (val & 0xULL)
+if ((val & 0xULL) && is_64_bit_mode(vcpu))
 return -1; /* #GP */
 vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
 kvm_update_dr6(vcpu);
@@ -779,7 +788,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int
dr, unsigned long val)
 return 1; /* #UD */
 /* fall through */
 default: /* 7 */
-if (val & 0xULL)
+if ((val & 0xULL) && is_64_bit_mode(vcpu))
 return -1; /* #GP */
 vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
 kvm_update_dr7(vcpu);



Do you get this if the input register has bit 31 set?
No. To be frank, the scenario may be considered a bit synthetic: the 
guest assigns a value to a general-purpose register in 64-bit mode, 
setting the high 32-bits to some non-zero value. Then, later, in 32-bit 
mode, the guest performs MOV DR instruction. In between the two 
assignments, the general purpose register is unmodified, so the high 
32-bits of the general purpose registers are still set.


Note that this scenario does not occur when MOV DR is emulated, but when 
handle_dr() is called. In this case, the entire 64-bits of the general 
purpose register used for MOV DR are read, regardless to the execution 
mode of the guest.


Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode

2014-06-16 Thread Nadav Amit


On 6/16/14, 2:09 PM, Paolo Bonzini wrote:

Il 16/06/2014 12:33, Nadav Amit ha scritto:


Do you get this if the input register has bit 31 set?

No. To be frank, the scenario may be considered a bit synthetic: the
guest assigns a value to a general-purpose register in 64-bit mode,
setting the high 32-bits to some non-zero value. Then, later, in 32-bit
mode, the guest performs MOV DR instruction. In between the two
assignments, the general purpose register is unmodified, so the high
32-bits of the general purpose registers are still set.

Note that this scenario does not occur when MOV DR is emulated, but when
handle_dr() is called. In this case, the entire 64-bits of the general
purpose register used for MOV DR are read, regardless to the execution
mode of the guest.


I wonder if the same bug happens elsewhere.  For example,
kvm_emulate_hypercall doesn't look at CS.L/CS.DB, which is really a
corner case but arguably also a bug.  kvm_hv_hypercall instead does it
right.

Perhaps we need a variant of kvm_register_read that (on 64-bit hosts)
checks EFER/CS.L/CS.DB and masks the returned value accordingly.  You
could call it kvm_register_readl.


There are two questions that come in mind:
1. Should we ignore CS.DB? It would make it consistent with 
kvm_hv_hypercall and handle_dr. I think this is the proper behavior.


2. Reading CS.L once and masking all the registers (i.e., changing the 
is_long_mode in kvm_emulate_hypercall to is_64_bit_mode) is likely to be 
more efficient.


Regards,
Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode

2014-06-16 Thread Nadav Amit


On 6/16/14, 5:56 PM, Paolo Bonzini wrote:

Il 16/06/2014 13:53, Nadav Amit ha scritto:

On 6/16/14, 2:09 PM, Paolo Bonzini wrote:

Il 16/06/2014 12:33, Nadav Amit ha scritto:


Do you get this if the input register has bit 31 set?

No. To be frank, the scenario may be considered a bit synthetic: the
guest assigns a value to a general-purpose register in 64-bit mode,
setting the high 32-bits to some non-zero value. Then, later, in 32-bit
mode, the guest performs MOV DR instruction. In between the two
assignments, the general purpose register is unmodified, so the high
32-bits of the general purpose registers are still set.

Note that this scenario does not occur when MOV DR is emulated, but
when
handle_dr() is called. In this case, the entire 64-bits of the general
purpose register used for MOV DR are read, regardless to the execution
mode of the guest.


I wonder if the same bug happens elsewhere.  For example,
kvm_emulate_hypercall doesn't look at CS.L/CS.DB, which is really a
corner case but arguably also a bug.  kvm_hv_hypercall instead does it
right.

Perhaps we need a variant of kvm_register_read that (on 64-bit hosts)
checks EFER/CS.L/CS.DB and masks the returned value accordingly.  You
could call it kvm_register_readl.


There are two questions that come in mind:
1. Should we ignore CS.DB? It would make it consistent with
kvm_hv_hypercall and handle_dr. I think this is the proper behavior.


It depends on what you're using it for, but as a start yes.


2. Reading CS.L once and masking all the registers (i.e., changing the
is_long_mode in kvm_emulate_hypercall to is_64_bit_mode) is likely to be
more efficient.


Yes, for the case of kvm_emulate_hypercall.  Then you can build
kvm_register_readl on top of is_64bit_mode and fix this bug with that
function.  Did you check that handle_cr is unaffected?


handle_cr is affected, and there are some other instances of affected 
register reads. Actually, most of the vmx instruction exit handling 
routines ignore long-mode and cs.l when considering the register operand 
size.


I'll make the necessary changes and resubmit.

Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10

2014-06-18 Thread Nadav Amit

Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.

Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..0bab29d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, 
int level)
 #define KVM_REFILL_PAGES 25
 #define KVM_MAX_CPUID_ENTRIES 80
 #define KVM_NR_FIXED_MTRR_REGION 88
-#define KVM_NR_VAR_MTRR 8
+#define KVM_NR_VAR_MTRR 10
 
 #define ASYNC_PF_PER_VCPU 64
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] KVM: x86: Emulator support for #UD on CPL>0

2014-06-18 Thread Nadav Amit

Certain instructions (e.g., mwait and monitor) cause a #UD exception when they
are executed in privilaged mode. This is in contrast to the regular privilaged
instructions which cause #GP. In order not to mess with SVM interception of
mwait and monitor which assumes privilage level assertions take place before
interception, a flag has been added.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f90194d..ef7a5a0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -163,6 +163,7 @@
 #define SrcWrite((u64)1 << 46)  /* Write back src operand */
 #define NoMod  ((u64)1 << 47)  /* Mod field is ignored */
 #define NoBigReal   ((u64)1 << 48)  /* No big real mode */
+#define UDOnPriv((u64)1 << 49)  /* #UD instead of #GP on CPL > 0 */
 
 #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -4560,7 +4561,10 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
/* Privileged instruction can be executed only in CPL=0 */
if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
-   rc = emulate_gp(ctxt, 0);
+   if (ctxt->d & UDOnPriv)
+   rc = emulate_ud(ctxt);
+   else
+   rc = emulate_gp(ctxt, 0);
goto done;
}
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/3] Correct monitor-mwait emulation as nop

2014-06-18 Thread Nadav Amit

KVM handles monitor-mwait as nop, but does not check any of the preconditions
for the instructions.  These instructions may generate all kind of exceptions
(#UD, #PF, #GP, #SS). They can also be executed in real-mode.  This patch-set
moves the handling of monitor-mwait to the emulator, to allow their execution
in either real-mode or protected-mode.  It tries to follow the SDM in checking
the preconditions and generating the necassary exceptions.

Thanks for reviewing the patch. Please try it with OS X to make sure it works
properly without generating unnecassary exception.

Nadav Amit (3):
  KVM: x86: Emulator flag for instruction with no big real mode
  KVM: x86: Emulator support for #UD on CPL>0
  KVM: x86: correct mwait and monitor emulation

 arch/x86/kvm/emulate.c | 55 ++
 arch/x86/kvm/svm.c | 22 ++--
 arch/x86/kvm/vmx.c | 27 ++---
 3 files changed, 64 insertions(+), 40 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] KVM: x86: Emulator flag for instruction with no big real mode

2014-06-18 Thread Nadav Amit

Certain instructions, such as monitor and xsave do not support big real mode
and cause a #GP exception if any of the accessed bytes effective address are
not within [0, 0x].  This patch introduces a flag to mark these
instructions, including the necassary checks.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e833d..f90194d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -162,6 +162,7 @@
 #define NoWrite ((u64)1 << 45)  /* No writeback */
 #define SrcWrite((u64)1 << 46)  /* Write back src operand */
 #define NoMod  ((u64)1 << 47)  /* Mod field is ignored */
+#define NoBigReal   ((u64)1 << 48)  /* No big real mode */
 
 #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -651,7 +652,12 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
if (!fetch && (desc.type & 8) && !(desc.type & 2))
goto bad;
lim = desc_limit_scaled(&desc);
-   if ((desc.type & 8) || !(desc.type & 4)) {
+   if ((ctxt->mode == X86EMUL_MODE_REAL) && !fetch &&
+   (ctxt->d & NoBigReal)) {
+   /* la is between zero and 0x */
+   if (la > 0x || (u32)(la + size - 1) > 0x)
+   goto bad;
+   } else if ((desc.type & 8) || !(desc.type & 4)) {
/* expand-up segment */
if (addr.ea > lim || (u32)(addr.ea + size - 1) > lim)
goto bad;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] KVM: x86: correct mwait and monitor emulation

2014-06-18 Thread Nadav Amit

mwait and monitor are currently handled as nop. Considering this behavior, they
should still be handled correctly, i.e., check execution conditions and generate
exceptions when required. mwait and monitor may also be executed in real-mode
and are not handled in that case.  This patch performs the emulation of
monitor-mwait according to Intel SDM (other than checking whether interrupt can
be used as a break event).

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 41 +++--
 arch/x86/kvm/svm.c | 22 ++
 arch/x86/kvm/vmx.c | 27 +++
 3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef7a5a0..424b58d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
 }
 
+static int em_monitor(struct x86_emulate_ctxt *ctxt)
+{
+   int rc;
+   struct segmented_address addr;
+   u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
+   u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
+   u8 byte;
+
+   if (ctxt->mode != X86EMUL_MODE_PROT64)
+   rcx = (u32)rcx;
+   if (rcx != 0)
+   return emulate_gp(ctxt, 0);
+
+   addr.seg = seg_override(ctxt);
+   addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;
+   rc = segmented_read(ctxt, addr, &byte, 1);
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+
+   printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
+   return X86EMUL_CONTINUE;
+}
+
+static int em_mwait(struct x86_emulate_ctxt *ctxt)
+{
+   u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
+
+   if (ctxt->mode != X86EMUL_MODE_PROT64)
+   rcx = (u32)rcx;
+   if ((rcx & ~1UL) != 0)
+   return emulate_gp(ctxt, 0);
+
+   /* Accepting interrupt as break event regardless to cpuid */
+   printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
+   return X86EMUL_CONTINUE;
+}
+
 static bool valid_cr(int nr)
 {
switch (nr) {
@@ -3557,8 +3594,8 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
 
 static const struct opcode group7_rm1[] = {
-   DI(SrcNone | Priv, monitor),
-   DI(SrcNone | Priv, mwait),
+   II(SrcNone | Priv | NoBigReal | UDOnPriv, em_monitor, monitor),
+   II(SrcNone | Priv | NoBigReal | UDOnPriv, em_mwait, mwait),
N, N, N, N, N, N,
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..a524e04 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3274,24 +3274,6 @@ static int pause_interception(struct vcpu_svm *svm)
return 1;
 }
 
-static int nop_interception(struct vcpu_svm *svm)
-{
-   skip_emulated_instruction(&(svm->vcpu));
-   return 1;
-}
-
-static int monitor_interception(struct vcpu_svm *svm)
-{
-   printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
-   return nop_interception(svm);
-}
-
-static int mwait_interception(struct vcpu_svm *svm)
-{
-   printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
-   return nop_interception(svm);
-}
-
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = cr_interception,
[SVM_EXIT_READ_CR3] = cr_interception,
@@ -3349,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm 
*svm) = {
[SVM_EXIT_CLGI] = clgi_interception,
[SVM_EXIT_SKINIT]   = skinit_interception,
[SVM_EXIT_WBINVD]   = emulate_on_interception,
-   [SVM_EXIT_MONITOR]  = monitor_interception,
-   [SVM_EXIT_MWAIT]= mwait_interception,
+   [SVM_EXIT_MONITOR]  = emulate_on_interception,
+   [SVM_EXIT_MWAIT]= emulate_on_interception,
[SVM_EXIT_XSETBV]   = xsetbv_interception,
[SVM_EXIT_NPF]  = pf_interception,
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..7023e71 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5672,22 +5672,17 @@ static int handle_pause(struct kvm_vcpu *vcpu)
return 1;
 }
 
-static int handle_nop(struct kvm_vcpu *vcpu)
+static int handle_emulate(struct kvm_vcpu *vcpu)
 {
-   skip_emulated_instruction(vcpu);
-   return 1;
-}
+   int err = emulate_instruction(vcpu, 0);
 
-static int handle_mwait(struct kvm_vcpu *vcpu)
-{
-   printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
-   return handle_nop(vcpu);
-}
-
-static int handle_monitor(struct kvm_vcpu *vcpu)
-{
-   printk_on

[PATCH v2 9/9] KVM: vmx: vmx instructions handling does not consider cs.l

2014-06-18 Thread Nadav Amit

VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in
64-bit mode.  The current implementation is broken since it does not use the
register operands correctly, and always uses 64-bit for reads and writes.
Moreover, write to memory in vmwrite only considers long-mode, so it ignores
cs.l. This patch fixes this behavior.  The field of vmread/vmwrite is kept
intentionally as 64-bit read since if bits [63:32] are not cleared the
instruction should fail, according to Intel SDM.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/vmx.c | 8 
 arch/x86/kvm/x86.h | 9 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cbfbb8b..75dc888 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6397,7 +6397,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 * on the guest's mode (32 or 64 bit), not on the given field's length.
 */
if (vmx_instruction_info & (1u << 10)) {
-   kvm_register_write(vcpu, (((vmx_instruction_info) >> 3) & 0xf),
+   kvm_register_writel(vcpu, (((vmx_instruction_info) >> 3) & 0xf),
field_value);
} else {
if (get_vmx_mem_address(vcpu, exit_qualification,
@@ -6434,14 +6434,14 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return 1;
 
if (vmx_instruction_info & (1u << 10))
-   field_value = kvm_register_read(vcpu,
+   field_value = kvm_register_readl(vcpu,
(((vmx_instruction_info) >> 3) & 0xf));
else {
if (get_vmx_mem_address(vcpu, exit_qualification,
vmx_instruction_info, &gva))
return 1;
if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva,
-  &field_value, (is_long_mode(vcpu) ? 8 : 4), &e)) {
+  &field_value, (is_64_bit_mode(vcpu) ? 8 : 4), &e)) {
kvm_inject_page_fault(vcpu, &e);
return 1;
}
@@ -6571,7 +6571,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
}
 
vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
-   type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
+   type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
 
types = (nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c5b61a7..306a1b7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -126,6 +126,15 @@ static inline unsigned long kvm_register_readl(struct 
kvm_vcpu *vcpu,
return is_64_bit_mode(vcpu) ? val : (u32)val;
 }
 
+static inline void kvm_register_writel(struct kvm_vcpu *vcpu,
+  enum kvm_reg reg,
+  unsigned long val)
+{
+   if (!is_64_bit_mode(vcpu))
+   val = (u32)val;
+   return kvm_register_write(vcpu, reg, val);
+}
+
 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
 int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 1/9] KVM: x86: bit-ops emulation ignores offset on 64-bit

2014-06-18 Thread Nadav Amit

The current emulation of bit operations ignores the offset from the destination
on 64-bit target memory operands. This patch fixes this behavior.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e833d..f0b0a10 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1220,12 +1220,14 @@ static void fetch_bit_operand(struct x86_emulate_ctxt 
*ctxt)
long sv = 0, mask;
 
if (ctxt->dst.type == OP_MEM && ctxt->src.type == OP_REG) {
-   mask = ~(ctxt->dst.bytes * 8 - 1);
+   mask = ~((long)ctxt->dst.bytes * 8 - 1);
 
if (ctxt->src.bytes == 2)
sv = (s16)ctxt->src.val & (s16)mask;
else if (ctxt->src.bytes == 4)
sv = (s32)ctxt->src.val & (s32)mask;
+   else
+   sv = (s64)ctxt->src.val & (s64)mask;
 
ctxt->dst.addr.mem.ea += (sv >> 3);
}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 6/9] KVM: x86: check DR6/7 high-bits are clear only on long-mode

2014-06-18 Thread Nadav Amit

From: Nadav Amit 

When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and
otherwise injects a #GP exception. This exception should only be injected only
if running in long-mode.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/vmx.c |  2 +-
 arch/x86/kvm/x86.h | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..c0f53a0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5165,7 +5165,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
return 1;
kvm_register_write(vcpu, reg, val);
} else
-   if (kvm_set_dr(vcpu, dr, kvm_register_read(vcpu, reg)))
+   if (kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg)))
return 1;
 
skip_emulated_instruction(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 8c97bac..c5b61a7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -47,6 +47,16 @@ static inline int is_long_mode(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
+{
+   int cs_db, cs_l;
+
+   if (!is_long_mode(vcpu))
+   return false;
+   kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
+   return cs_l;
+}
+
 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
@@ -108,6 +118,14 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu 
*vcpu, gpa_t gpa)
return false;
 }
 
+static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu,
+  enum kvm_reg reg)
+{
+   unsigned long val = kvm_register_read(vcpu, reg);
+
+   return is_64_bit_mode(vcpu) ? val : (u32)val;
+}
+
 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
 int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 7/9] KVM: x86: Hypercall handling does not considers opsize correctly

2014-06-18 Thread Nadav Amit

Currently, the hypercall handling routine only considers LME as an indication
to whether the guest uses 32/64-bit mode. This is incosistent with hyperv
hypercalls handling and against the common sense of considering cs.l as well.
This patch uses is_64_bit_mode instead of is_long_mode for that matter. In
addition, the result is masked in respect to the guest execution mode. Last, it
changes kvm_hv_hypercall to use is_64_bit_mode as well to simplify the code.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/x86.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..c6dfe47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5662,7 +5662,6 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
u64 param, ingpa, outgpa, ret;
uint16_t code, rep_idx, rep_cnt, res = HV_STATUS_SUCCESS, rep_done = 0;
bool fast, longmode;
-   int cs_db, cs_l;
 
/*
 * hypercall generates UD from non zero cpl and real mode
@@ -5673,8 +5672,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
return 0;
}
 
-   kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
-   longmode = is_long_mode(vcpu) && cs_l == 1;
+   longmode = is_64_bit_mode(vcpu);
 
if (!longmode) {
param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) |
@@ -5739,7 +5737,7 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned 
long flags, int apicid)
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
unsigned long nr, a0, a1, a2, a3, ret;
-   int r = 1;
+   int op_64_bit, r = 1;
 
if (kvm_hv_hypercall_enabled(vcpu->kvm))
return kvm_hv_hypercall(vcpu);
@@ -5752,7 +5750,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 
trace_kvm_hypercall(nr, a0, a1, a2, a3);
 
-   if (!is_long_mode(vcpu)) {
+   op_64_bit = is_64_bit_mode(vcpu);
+   if (!op_64_bit) {
nr &= 0x;
a0 &= 0x;
a1 &= 0x;
@@ -5778,6 +5777,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
break;
}
 out:
+   if (!op_64_bit)
+   ret = (u32)ret;
kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
++vcpu->stat.hypercalls;
return r;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 5/9] KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX

2014-06-18 Thread Nadav Amit

On long-mode the current NOP (0x90) emulation still writes back to RAX.  As a
result, EAX is zero-extended and the high 32-bits of RAX are cleared.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b354531..eb93eb4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4663,8 +4663,9 @@ special_insn:
break;
case 0x90 ... 0x97: /* nop / xchg reg, rax */
if (ctxt->dst.addr.reg == reg_rmw(ctxt, VCPU_REGS_RAX))
-   break;
-   rc = em_xchg(ctxt);
+   ctxt->dst.type = OP_NONE;
+   else
+   rc = em_xchg(ctxt);
break;
case 0x98: /* cbw/cwde/cdqe */
switch (ctxt->op_bytes) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 0/5] x86: Tests for recent emulator bugs

2014-06-18 Thread Nadav Amit

This patch-set checks recent emulator bugs as well as monitor-mwait emulation
in real-mode.

Nadav Amit (5):
  x86: Testing nop instruction on 64-bit
  x86: test xadd with two identical operands
  x86: Test btcq with operand larger than 64
  x86: check cmov instruction on 64-bit
  x86: Test monitor and mwait on real-mode

 x86/emulator.c | 30 --
 x86/realmode.c | 20 
 2 files changed, 48 insertions(+), 2 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 4/5] x86: check cmov instruction on 64-bit

2014-06-18 Thread Nadav Amit

cmov instruction on 64-bit with dword destination register operand should clear
bits [63:32].  This test checks this behavior due to previous KVM bug.

Signed-off-by: Nadav Amit 
---
 x86/emulator.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index 460949f..1fd0ca6 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -710,6 +710,18 @@ static void test_shld_shrd(u32 *mem)
 report("shrd (cl)", *mem == ((0x12345678 >> 3) | (5u << 29)));
 }
 
+static void test_cmov(u32 *mem)
+{
+   u64 val;
+   *mem = 0xabcdef12u;
+   asm ("movq $0x1234567812345678, %%rax\n\t"
+"cmpl %%eax, %%eax\n\t"
+"cmovnel (%[mem]), %%eax\n\t"
+"movq %%rax, %[val]\n\t"
+: [val]"=r"(val) : [mem]"r"(mem) : "%rax", "cc");
+   report("cmovnel", val == 0x12345678ul);
+}
+
 #define INSN_XCHG_ALL  \
"xchg %rax, 0+save \n\t"\
"xchg %rbx, 8+save \n\t"\
@@ -1054,12 +1066,12 @@ int main()
test_sreg(mem);
test_lldt(mem);
test_ltr(mem);
+   test_cmov(mem);
 
test_mmx_movq_mf(mem, insn_page, alt_insn_page, insn_ram);
test_movabs(mem, insn_page, alt_insn_page, insn_ram);
test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram);
test_nop(mem, insn_page, alt_insn_page, insn_ram);
-
test_crosspage_mmio(mem);
 
test_string_io_mmio(mem);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 5/5] x86: Test monitor and mwait on real-mode

2014-06-18 Thread Nadav Amit

monitor and mwait are now considered to behave as nop.  New patch enables
monitor and mwait in realmode as well.  This test checks whether they are
handled as nop in realmode as well.

Signed-off-by: Nadav Amit 
---
 x86/realmode.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 10c3e03..0a62b5d 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1671,6 +1671,16 @@ void test_xadd(void)
report("xadd", R_AX, outregs.eax == inregs.eax * 2);
 }
 
+void test_monitor_mwait(void)
+{
+   MK_INSN(monitor, "monitor\n\t"
+"mwait\n\t");
+   inregs.ecx = 0;
+   inregs.eax = 0;
+   exec_in_big_real_mode(&insn_monitor);
+   report("monitor", 0, 1);
+}
+ 
 
 void realmode_start(void)
 {
@@ -1721,6 +1731,7 @@ void realmode_start(void)
test_smsw();
test_nopl();
test_xadd();
+   test_monitor_mwait();
test_perf_loop();
test_perf_mov();
test_perf_arith();
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 8/9] KVM: vmx: handle_cr ignores 32/64-bit mode

2014-06-18 Thread Nadav Amit

On 32-bit mode only bits [31:0] of the CR should be used for setting the CR
value.  Otherwise, the host may incorrectly assume the value is invalid if bits
[63:32] are not zero.  Moreover, the CR is currently being read twice when CR8
is used.  Last, nested mov-cr exiting is modified to handle the CR value
correctly as well.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c0f53a0..cbfbb8b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5039,7 +5039,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
reg = (exit_qualification >> 8) & 15;
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
-   val = kvm_register_read(vcpu, reg);
+   val = kvm_register_readl(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -5056,7 +5056,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
return 1;
case 8: {
u8 cr8_prev = kvm_get_cr8(vcpu);
-   u8 cr8 = kvm_register_read(vcpu, reg);
+   u8 cr8 = (u8)val;
err = kvm_set_cr8(vcpu, cr8);
kvm_complete_insn_gp(vcpu, err);
if (irqchip_in_kernel(vcpu->kvm))
@@ -6751,7 +6751,7 @@ static bool nested_vmx_exit_handled_cr(struct kvm_vcpu 
*vcpu,
unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
int cr = exit_qualification & 15;
int reg = (exit_qualification >> 8) & 15;
-   unsigned long val = kvm_register_read(vcpu, reg);
+   unsigned long val = kvm_register_readl(vcpu, reg);
 
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 3/9] KVM: x86: Inter privilage level ret emulation is not implemeneted

2014-06-18 Thread Nadav Amit

Return unhandlable error on inter-privilage level ret instruction.  This is
since the current emulation does not check the privilage level correctly when
loading the CS, and does not pop RSP/SS as needed.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3c8d867..0183350 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2019,6 +2019,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 {
int rc;
unsigned long cs;
+   int cpl = ctxt->ops->cpl(ctxt);
 
rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
@@ -2028,6 +2029,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
+   /* Outer-privilage level return is not implemented */
+   if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
+   return X86EMUL_UNHANDLEABLE;
rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
return rc;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/9] KVM: x86: Wrong emulation on 'xadd X, X'

2014-06-18 Thread Nadav Amit

The emulator does not emulate the xadd instruction correctly if the two
operands are the same.  In this (unlikely) situation the result should be the
sum of X and X (2X) when it is currently X.  The solution is to first perform
writeback to the source, before writing to the destination.  The only
instruction which should be affected is xadd, as the other instructions that
perform writeback to the source use the extended accumlator (e.g., RAX:RDX).

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f0b0a10..3c8d867 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4711,17 +4711,17 @@ special_insn:
goto done;
 
 writeback:
-   if (!(ctxt->d & NoWrite)) {
-   rc = writeback(ctxt, &ctxt->dst);
-   if (rc != X86EMUL_CONTINUE)
-   goto done;
-   }
if (ctxt->d & SrcWrite) {
BUG_ON(ctxt->src.type == OP_MEM || ctxt->src.type == 
OP_MEM_STR);
rc = writeback(ctxt, &ctxt->src);
if (rc != X86EMUL_CONTINUE)
goto done;
}
+   if (!(ctxt->d & NoWrite)) {
+   rc = writeback(ctxt, &ctxt->dst);
+   if (rc != X86EMUL_CONTINUE)
+   goto done;
+   }
 
/*
 * restore dst type in case the decoding will be reused
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 4/9] KVM: x86: emulation of dword cmov on long-mode should clear [63:32]

2014-06-18 Thread Nadav Amit

Even if the condition of cmov is not satisfied, bits[63:32] should be cleared.
This is clearly stated in Intel's CMOVcc documentation.  The solution is to
reassign the destination onto itself if the condition is unsatisfied.  For that
matter the original destination value needs to be read.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0183350..b354531 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3905,7 +3905,7 @@ static const struct opcode twobyte_table[256] = {
N, N,
N, N, N, N, N, N, N, N,
/* 0x40 - 0x4F */
-   X16(D(DstReg | SrcMem | ModRM | Mov)),
+   X16(D(DstReg | SrcMem | ModRM)),
/* 0x50 - 0x5F */
N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N,
/* 0x60 - 0x6F */
@@ -4799,8 +4799,10 @@ twobyte_insn:
ops->get_dr(ctxt, ctxt->modrm_reg, &ctxt->dst.val);
break;
case 0x40 ... 0x4f: /* cmov */
-   ctxt->dst.val = ctxt->dst.orig_val = ctxt->src.val;
-   if (!test_cc(ctxt->b, ctxt->eflags))
+   if (test_cc(ctxt->b, ctxt->eflags))
+   ctxt->dst.val = ctxt->src.val;
+   else if (ctxt->mode != X86EMUL_MODE_PROT64 ||
+ctxt->op_bytes != 4)
ctxt->dst.type = OP_NONE; /* no writeback */
break;
case 0x80 ... 0x8f: /* jnz rel, etc*/
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 0/9] KVM: x86: More emulator bugs

2014-06-18 Thread Nadav Amit

This patch-set resolves several emulator bugs. Each fix is independent of the
others.  The DR6 bug can occur during DR-access exit (regardless to 
unrestricted mode, MMIO and SPT).


Changes in v2:

Introduced kvm_register_readl and kvm_register_writel which consider long-mode
and cs.l when reading the registers.

Fixing the register read to respect 32/64 bit in hypercall handling, CR exit
handling and VMX instructions handling.

Thanks for re-reviewing the patch

Nadav Amit (9):
  KVM: x86: bit-ops emulation ignores offset on 64-bit
  KVM: x86: Wrong emulation on 'xadd X, X'
  KVM: x86: Inter privilage level ret emulation is not implemeneted
  KVM: x86: emulation of dword cmov on long-mode should clear [63:32]
  KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX
  KVM: x86: check DR6/7 high-bits are clear only on long-mode
  KVM: x86: Hypercall handling does not considers opsize correctly
  KVM: vmx: handle_cr ignores 32/64-bit mode
  KVM: vmx: vmx instructions handling does not consider cs.l

 arch/x86/kvm/emulate.c | 31 ---
 arch/x86/kvm/vmx.c | 16 
 arch/x86/kvm/x86.c | 11 ++-
 arch/x86/kvm/x86.h | 27 +++
 4 files changed, 61 insertions(+), 24 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 3/5] x86: Test btcq with operand larger than 64

2014-06-18 Thread Nadav Amit

Previously, KVM did not calculate the offset for bit-operations correctly when
quad-word operands were used.  This test checks btcq when operand is larger
than 64 in order to check this scenario.

Signed-off-by: Nadav Amit 
---
 x86/emulator.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index bf8a873..460949f 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -496,7 +496,7 @@ void test_btc(void *mem)
 {
unsigned int *a = mem;
 
-   memset(mem, 0, 3 * sizeof(unsigned int));
+   memset(mem, 0, 4 * sizeof(unsigned int));
 
asm ("btcl $32, %0" :: "m"(a[0]) : "memory");
asm ("btcl $1, %0" :: "m"(a[1]) : "memory");
@@ -505,6 +505,10 @@ void test_btc(void *mem)
 
asm ("btcl %1, %0" :: "m"(a[3]), "r"(-1) : "memory");
report("btcl reg, r/m", a[0] == 1 && a[1] == 2 && a[2] == 0x8004);
+
+   asm ("btcq %1, %0" : : "m"(a[2]), "r"(-1l) : "memory");
+   report("btcq reg, r/m", a[0] == 1 && a[1] == 0x8002 &&
+   a[2] == 0x8004 && a[3] == 0);
 }
 
 void test_bsfbsr(void *mem)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 2/5] x86: test xadd with two identical operands

2014-06-18 Thread Nadav Amit

Previously, KVM emulated xadd incorrectly when the source and destination
operands were identical.  The expected result is that the register would hold
the sum (2x) and not the previous value (x).  This test checks this behavior.
It should be executed with a disabled unrestricted mode.

Signed-off-by: Nadav Amit 
---
 x86/realmode.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index dc4a1d3..10c3e03 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1663,6 +1663,14 @@ void test_smsw(void)
report("smsw", R_AX | R_BX | R_CX, outregs.eax == outregs.ebx);
 }
 
+void test_xadd(void)
+{
+   MK_INSN(xadd, "xaddl %eax, %eax\n\t");
+   inregs.eax = 0x12345678;
+   exec_in_big_real_mode(&insn_xadd);
+   report("xadd", R_AX, outregs.eax == inregs.eax * 2);
+}
+
 
 void realmode_start(void)
 {
@@ -1712,6 +1720,7 @@ void realmode_start(void)
test_dr_mod();
test_smsw();
test_nopl();
+   test_xadd();
test_perf_loop();
test_perf_mov();
test_perf_arith();
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests 1/5] x86: Testing nop instruction on 64-bit

2014-06-18 Thread Nadav Amit

Previously, nop instruction emulation on 64-bit caused RAX bits [63:32] to be
cleared.  This test checks the behavior is correct and RAX is unmodified.

Signed-off-by: Nadav Amit 
---
 x86/emulator.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index f653127..bf8a873 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -845,6 +845,15 @@ static void test_smsw_reg(uint64_t *mem, uint8_t 
*insn_page,
report("64-bit smsw reg", outregs.rax == cr0);
 }
 
+static void test_nop(uint64_t *mem, uint8_t *insn_page,
+   uint8_t *alt_insn_page, void *insn_ram)
+{
+   inregs = (struct regs){ .rax = 0x1234567890abcdeful };
+   MK_INSN(nop, "nop\n\t");
+   trap_emulator(mem, alt_insn_page, &insn_nop);
+   report("nop", outregs.rax == inregs.rax);
+}
+
 static void test_crosspage_mmio(volatile uint8_t *mem)
 {
 volatile uint16_t w, *pw;
@@ -1045,6 +1054,7 @@ int main()
test_mmx_movq_mf(mem, insn_page, alt_insn_page, insn_ram);
test_movabs(mem, insn_page, alt_insn_page, insn_ram);
test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram);
+   test_nop(mem, insn_page, alt_insn_page, insn_ram);
 
test_crosspage_mmio(mem);
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 9/9] KVM: vmx: vmx instructions handling does not consider cs.l

2014-06-18 Thread Nadav Amit


On 6/18/14, 6:41 PM, Paolo Bonzini wrote:

Il 18/06/2014 16:19, Nadav Amit ha scritto:

VMX instructions use 32-bit operands in 32-bit mode, and 64-bit
operands in
64-bit mode.  The current implementation is broken since it does not
use the
register operands correctly, and always uses 64-bit for reads and writes.
Moreover, write to memory in vmwrite only considers long-mode, so it
ignores
cs.l. This patch fixes this behavior.  The field of vmread/vmwrite is
kept
intentionally as 64-bit read since if bits [63:32] are not cleared the
instruction should fail, according to Intel SDM.


This is not how I read the SDM:

"These instructions fail if given, in 64-bit mode, an operand that sets
an encoding bit beyond bit 32." (Section 24.11.1.2)

"Outside IA-32e mode, the source operand has 32 bits, regardless of the
value of CS.D. In 64-bit mode, the source operand has 64 bits; however,
if bits 63:32 of the source operand are not zero, VMREAD will fail due
to an attempt to access an unsupported VMCS component (see operation
section)." (Description of VMREAD in Chapter 30).

I'll fix up the patch myself.



Perhaps I am missing something, but I don't see where my mistake is.
The VMREAD source operand is always read as 64-bits and I made no 
changes there. Therefore, if bits 63:32 are not zero, the instruction 
should fail when attempting to access the field.


The value in the source operand of VMWRITE which represents the value 
which should be written is zero-extended outside 64-bit mode.
Quoting: "The effective size of the primary source operand, which may be 
a register or in memory, is always 32 bits outside IA-32e mode (the 
setting of CS.D is ignored with respect to operand size) and 64 bits in 
64-bit mode." (Description of VMWRITE in chapter 30).


Regards,
Nadav


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 9/9] KVM: vmx: vmx instructions handling does not consider cs.l

2014-06-18 Thread Nadav Amit


On Jun 18, 2014, at 7:06 PM, Paolo Bonzini  wrote:

> Il 18/06/2014 18:01, Nadav Amit ha scritto:
>>> 
>> 
>> Perhaps I am missing something, but I don't see where my mistake is.
>> The VMREAD source operand is always read as 64-bits and I made no
>> changes there. Therefore, if bits 63:32 are not zero, the instruction
>> should fail when attempting to access the field.
> 
> In 32-bit mode, it should be read as 32 bits: "Outside IA-32e mode, the 
> source operand has 32 bits, regardless of the value of CS.D" (so it's never 
> 16 bits, but it's also never 64 bits).

Oh. I now get it. I misunderstood what the SDM said, as I was thinking that 
62:32 will lead to failure also on 32-bit mode.
If you fix it, please fix both VMREAD and VMWRITE. If not, I would resubmit.

Thanks,
Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

2014-06-18 Thread Nadav Amit


On 6/18/14, 8:59 PM, Eric Northup wrote:

On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit  wrote:

mwait and monitor are currently handled as nop. Considering this behavior, they
should still be handled correctly, i.e., check execution conditions and generate
exceptions when required. mwait and monitor may also be executed in real-mode
and are not handled in that case.  This patch performs the emulation of
monitor-mwait according to Intel SDM (other than checking whether interrupt can
be used as a break event).

Signed-off-by: Nadav Amit 
---
  arch/x86/kvm/emulate.c | 41 +++--
  arch/x86/kvm/svm.c | 22 ++
  arch/x86/kvm/vmx.c | 27 +++
  3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef7a5a0..424b58d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
 return X86EMUL_CONTINUE;
  }

+static int em_monitor(struct x86_emulate_ctxt *ctxt)
+{
+   int rc;
+   struct segmented_address addr;
+   u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
+   u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
+   u8 byte;


I'd request:

u32 ebx, ecx, edx, eax = 1;
ctxt->opt->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
if (!(ecx & FFL(MWAIT)))
 return emulate_ud(ctxt);

and also in em_mwait.



I had similar implementation on previous version, which also checked on 
mwait whether "interrupt as break event" matches ECX value. However, I 
was under the impression that it was decided that MWAIT will always be 
emulated as NOP to avoid misbehaving VMs that ignore CPUID (see the 
discussion at http://www.spinics.net/lists/kvm/msg102766.html ).


Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

2014-06-19 Thread Nadav Amit


On 6/19/14, 2:23 PM, Gleb Natapov wrote:

On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:


On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin  wrote:


On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:

On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:

On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit  wrote:

mwait and monitor are currently handled as nop. Considering this behavior, they
should still be handled correctly, i.e., check execution conditions and generate
exceptions when required. mwait and monitor may also be executed in real-mode
and are not handled in that case.  This patch performs the emulation of
monitor-mwait according to Intel SDM (other than checking whether interrupt can
be used as a break event).

Signed-off-by: Nadav Amit 


How about this instead (details in the commit log below) ? Please let
me know what you think, and if you'd prefer me to send it out as a
separate patch rather than a reply to this thread.

Thanks,
--Gabriel


If there's an easy workaround, I'm inclined to agree.
We can always go back to Gabriel's patch (and then we'll need
Nadav's one too) but if we release a kernel with this
support it becomes an ABI and we can't go back.

So let's be careful here, and revert the hack for 3.16.


Acked-by: Michael S. Tsirkin 


Personally, I got a custom guest which requires mwait for executing correctly.

Can you elaborate on this guest a little bit. With nop implementation
for mwait the guest will hog a host cpu. Do you consider this to be
"executing correctly?"

--


mwait is not as "clean" as it may appear. It encounters false wake-ups 
due to a variety of reasons, and any code need to recheck the wake-up 
condition afterwards. Actually, some CPUs had bugs that caused excessive 
wake-ups that degraded performance considerably (Nehalem, if I am not 
mistaken).
Therefore, handling mwait as nop is logically correct (although it may 
degrade performance).


For the reference, if you look at the SDM 8.10.4, you'll see:
"Multiple events other than a write to the triggering address range can 
cause a processor that executed MWAIT to wake up. These include events 
that would lead to voluntary or involuntary context switches, such as..."


Note the words "include" in the sentence "These include events". 
Software has no way of controlling whether it gets false wake-ups and 
cannot rely on the wake-up as indication to anything.


Nadav


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

2014-06-19 Thread Nadav Amit


On 6/19/14, 3:07 PM, Gleb Natapov wrote:

On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:

On 6/19/14, 2:23 PM, Gleb Natapov wrote:

On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:


On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin  wrote:


On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:

On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:

On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit  wrote:

mwait and monitor are currently handled as nop. Considering this behavior, they
should still be handled correctly, i.e., check execution conditions and generate
exceptions when required. mwait and monitor may also be executed in real-mode
and are not handled in that case.  This patch performs the emulation of
monitor-mwait according to Intel SDM (other than checking whether interrupt can
be used as a break event).

Signed-off-by: Nadav Amit 


How about this instead (details in the commit log below) ? Please let
me know what you think, and if you'd prefer me to send it out as a
separate patch rather than a reply to this thread.

Thanks,
--Gabriel


If there's an easy workaround, I'm inclined to agree.
We can always go back to Gabriel's patch (and then we'll need
Nadav's one too) but if we release a kernel with this
support it becomes an ABI and we can't go back.

So let's be careful here, and revert the hack for 3.16.


Acked-by: Michael S. Tsirkin 


Personally, I got a custom guest which requires mwait for executing correctly.

Can you elaborate on this guest a little bit. With nop implementation
for mwait the guest will hog a host cpu. Do you consider this to be
"executing correctly?"

--


mwait is not as "clean" as it may appear. It encounters false wake-ups due
to a variety of reasons, and any code need to recheck the wake-up condition
afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
degraded performance considerably (Nehalem, if I am not mistaken).
Therefore, handling mwait as nop is logically correct (although it may
degrade performance).

For the reference, if you look at the SDM 8.10.4, you'll see:
"Multiple events other than a write to the triggering address range can
cause a processor that executed MWAIT to wake up. These include events that
would lead to voluntary or involuntary context switches, such as..."

Note the words "include" in the sentence "These include events". Software
has no way of controlling whether it gets false wake-ups and cannot rely on
the wake-up as indication to anything.


That's all well and good and I didn't say that nop is not a valid
mwait implementation, it is, though there is a big difference between
"encounters false wake-ups" and never sleeps.  What I asked is do you
consider your guest hogging host cpu to be "executing correctly?". What
this guest is doing that such behaviour is tolerated and shouldn't it
be better to just poll for a condition you are waiting for instead of
executing expensive vmexits. This will also hog 100% host cpu, but will
be actually faster.

You are correct, but unfortunately I have no control over the guest 
workload. In this specific workload I do not care about performance but 
only about correctness.


Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

2014-06-19 Thread Nadav Amit


On 6/19/14, 3:17 PM, Michael S. Tsirkin wrote:

On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote:

On 6/19/14, 3:07 PM, Gleb Natapov wrote:

On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:

On 6/19/14, 2:23 PM, Gleb Natapov wrote:

On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:


On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin  wrote:


On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:

On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:

On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit  wrote:

mwait and monitor are currently handled as nop. Considering this behavior, they
should still be handled correctly, i.e., check execution conditions and generate
exceptions when required. mwait and monitor may also be executed in real-mode
and are not handled in that case.  This patch performs the emulation of
monitor-mwait according to Intel SDM (other than checking whether interrupt can
be used as a break event).

Signed-off-by: Nadav Amit 


How about this instead (details in the commit log below) ? Please let
me know what you think, and if you'd prefer me to send it out as a
separate patch rather than a reply to this thread.

Thanks,
--Gabriel


If there's an easy workaround, I'm inclined to agree.
We can always go back to Gabriel's patch (and then we'll need
Nadav's one too) but if we release a kernel with this
support it becomes an ABI and we can't go back.

So let's be careful here, and revert the hack for 3.16.


Acked-by: Michael S. Tsirkin 


Personally, I got a custom guest which requires mwait for executing correctly.

Can you elaborate on this guest a little bit. With nop implementation
for mwait the guest will hog a host cpu. Do you consider this to be
"executing correctly?"

--


mwait is not as "clean" as it may appear. It encounters false wake-ups due
to a variety of reasons, and any code need to recheck the wake-up condition
afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
degraded performance considerably (Nehalem, if I am not mistaken).
Therefore, handling mwait as nop is logically correct (although it may
degrade performance).

For the reference, if you look at the SDM 8.10.4, you'll see:
"Multiple events other than a write to the triggering address range can
cause a processor that executed MWAIT to wake up. These include events that
would lead to voluntary or involuntary context switches, such as..."

Note the words "include" in the sentence "These include events". Software
has no way of controlling whether it gets false wake-ups and cannot rely on
the wake-up as indication to anything.


That's all well and good and I didn't say that nop is not a valid
mwait implementation, it is, though there is a big difference between
"encounters false wake-ups" and never sleeps.  What I asked is do you
consider your guest hogging host cpu to be "executing correctly?". What
this guest is doing that such behaviour is tolerated and shouldn't it
be better to just poll for a condition you are waiting for instead of
executing expensive vmexits. This will also hog 100% host cpu, but will
be actually faster.


You are correct, but unfortunately I have no control over the guest
workload. In this specific workload I do not care about performance but only
about correctness.

Nadav


No one prevents you from patching your kernel to run this workload.  But
is this of use to anyone else? If yes why?

I do not say it should be the default behavior, and I can try to push to 
qemu some setting to turn it on by demand.


Anyhow, I believe there are cases you may want mwait support - either an 
OS X guest which was not modified to run without mwait, or for debugging 
the monitor-mwait flow of a guest OS.


I am not going to argue too much. Since I was under the impression there 
are needs for mwait, other than mine, I thought it would make all of our 
lives easier to have a better implementation.


Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: Fix lapic.c debug prints

2014-06-29 Thread Nadav Amit

In two cases lapic.c does not use the apic_debug macro correctly. This patch
fixes them.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/lapic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0069118..3855103 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1451,7 +1451,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
vcpu->arch.apic_arb_prio = 0;
vcpu->arch.apic_attention = 0;
 
-   apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
+   apic_debug("%s: vcpu=%p, id=%d, base_msr="
   "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
   vcpu, kvm_apic_id(apic),
   vcpu->arch.apic_base, apic->base_address);
@@ -1895,7 +1895,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
-   pr_debug("vcpu %d received sipi with vector # %x\n",
+   apic_debug("vcpu %d received sipi with vector # %x\n",
 vcpu->vcpu_id, sipi_vector);
kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: Fix lapic.c debug prints

2014-06-29 Thread Nadav Amit


On 6/30/14, 3:48 AM, Bandan Das wrote:

Nadav Amit  writes:


In two cases lapic.c does not use the apic_debug macro correctly. This patch
fixes them.

Signed-off-by: Nadav Amit 
---
  arch/x86/kvm/lapic.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0069118..3855103 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1451,7 +1451,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
vcpu->arch.apic_arb_prio = 0;
vcpu->arch.apic_attention = 0;

-   apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
+   apic_debug("%s: vcpu=%p, id=%d, base_msr="
   "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
   vcpu, kvm_apic_id(apic),
   vcpu->arch.apic_base, apic->base_address);
@@ -1895,7 +1895,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
-   pr_debug("vcpu %d received sipi with vector # %x\n",
+   apic_debug("vcpu %d received sipi with vector # %x\n",


Why don't we just use pr_debug all throughout ?

I don't know. I just tried to make it consistent, since it really bugged 
me while I was debugging the local-apic.
If you prefer the other way around (which does seem to be better), I can 
do the search-and-replace.


Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: Pending interrupt may be delivered after INIT

2014-06-30 Thread Nadav Amit

We encountered a scenario in which after an INIT is delivered, a pending
interrupt is delivered, although it was sent before the INIT.  As the SDM
states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT as
KVM does.  This also means that pending interrupts should be cleared.  This
patch clears upon reset (and INIT) the pending interrupts; and at the same
occassion clears the pending exceptions, since they may cause a similar issue.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..863ac07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6835,6 +6835,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
atomic_set(&vcpu->arch.nmi_queued, 0);
vcpu->arch.nmi_pending = 0;
vcpu->arch.nmi_injected = false;
+   vcpu->arch.interrupt.pending = false;
+   vcpu->arch.exception.pending = false;
 
memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
vcpu->arch.dr6 = DR6_FIXED_1;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] KVM: x86: Pending interrupt may be delivered after INIT

2014-06-30 Thread Nadav Amit

We encountered a scenario in which after an INIT is delivered, a pending
interrupt is delivered, although it was sent before the INIT.  As the SDM
states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT as
KVM does.  This also means that pending interrupts should be cleared.  This
patch clears upon reset (and INIT) the pending interrupts; and at the same
occassion clears the pending exceptions, since they may cause a similar issue.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..6425a31 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6835,6 +6835,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
atomic_set(&vcpu->arch.nmi_queued, 0);
vcpu->arch.nmi_pending = 0;
vcpu->arch.nmi_injected = false;
+   kvm_clear_interrupt_queue(vcpu);
+   kvm_clear_exception_queue(vcpu);
 
memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
vcpu->arch.dr6 = DR6_FIXED_1;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 2/5] KVM: MMU: allow pinning spte translations (TDP-only)

2014-07-01 Thread Nadav Amit


I tried to use the patches (regardless of PEBS).
When the host uses huge-pages, I get an a oops.
I did not delve into the patch details, but I think there is some mess 
with the indexes - see below:


On 6/19/14, 2:12 AM, mtosa...@redhat.com wrote:

+
+static bool direct_pin_sptes(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   u64 *sptes[4];
+   int r, i, level;
+
+   r = get_sptep_hierarchy(vcpu, gfn << PAGE_SHIFT, sptes);
+   if (!r)
+   return false;
+
+   level = 5 - r;
+   if (!is_last_spte(*sptes[r-1], level))

should be *sptes[level-1]

+   return false;
+   if (!is_shadow_present_pte(*sptes[r-1]))

should be *sptes[level-1]

+   return false;
+
+   for (i = 0; i < r; i++) {
+   u64 *sptep = sptes[i];

should be sptes[3 - i];

+   struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+   sp->pinned = true;
+   set_bit(SPTE_PINNED_BIT, (unsigned long *)sptep);
+   }
+
+   return true;
+}



Regards,
Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-07 Thread Nadav Amit


Tang,

Running some (unrelated) tests I see that KVM does not handle APIC base 
relocation correctly. When the base is changed, kvm_lapic_set_base just 
changes lapic->base_address without taking further action (i.e., 
modifying the VMCS apic address in VMX).


This patch follows KVM bad behavior by using the constant 
VMX_APIC_ACCESS_PAGE_ADDR instead of lapic->base_address.


Anyhow, I didn't see anything that would make my life (in fixing the 
lapic base issue) too difficult. Yet, feel free in making it more 
"fix-friendly".


Thanks,
Nadav

On 7/7/14, 12:52 PM, Tang Chen wrote:

Hi Gleb,

The guest hang problem has been solved.

When mmu_notifier is called, I set VMCS APIC_ACCESS_ADDR to the new value
instead of setting it to 0. And only update kvm->arch.apic_access_page in
the next ept violation.

The guest is running well now.

I'll post the new patches tomorrow. ;)

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-07 Thread Nadav Amit


On 7/7/14, 2:54 PM, Gleb Natapov wrote:

On Mon, Jul 07, 2014 at 02:42:27PM +0300, Nadav Amit wrote:

Tang,

Running some (unrelated) tests I see that KVM does not handle APIC base
relocation correctly. When the base is changed, kvm_lapic_set_base just
changes lapic->base_address without taking further action (i.e., modifying
the VMCS apic address in VMX).

This patch follows KVM bad behavior by using the constant
VMX_APIC_ACCESS_PAGE_ADDR instead of lapic->base_address.

There is no OS out there that relocates APIC base (in fact it was not always
relocatable on real HW), so there is not point in complicating the code to 
support
it. In fact current APIC_ACCESS_ADDR handling relies on the fact that all vcpus
has apic mapped at the same address.



Anyhow, I didn't see anything that would make my life (in fixing the lapic
base issue) too difficult. Yet, feel free in making it more "fix-friendly".


Why would you want to fix it?

If there is no general need, I will not send a fix. However, I think the 
very least a warning message should be appear if the guest relocates the 
APIC base.


Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-07 Thread Nadav Amit


Tang,

I am sorry if I caused any confusion.

Following Gleb response, there is no apparent need for dealing with the 
scenario I mentioned (relocating the APIC base), so you don't need to do 
any changes to your patch, and I will post another patch later to warn 
if the guest relocates its APIC (from the default address to another 
guest physical address). My answers to your questions are below.


On 7/8/14, 4:44 AM, Tang Chen wrote:

Hi Nadav,

Thanks for the reply, please see below.

On 07/07/2014 08:10 PM, Nadav Amit wrote:

On 7/7/14, 2:54 PM, Gleb Natapov wrote:

On Mon, Jul 07, 2014 at 02:42:27PM +0300, Nadav Amit wrote:

Tang,

Running some (unrelated) tests I see that KVM does not handle APIC base
relocation correctly. When the base is changed, kvm_lapic_set_base just
changes lapic->base_address without taking further action (i.e.,
modifying
the VMCS apic address in VMX).

This patch follows KVM bad behavior by using the constant
VMX_APIC_ACCESS_PAGE_ADDR instead of lapic->base_address.

There is no OS out there that relocates APIC base (in fact it was not
always
relocatable on real HW), so there is not point in complicating the
code to support
it. In fact current APIC_ACCESS_ADDR handling relies on the fact that
all vcpus
has apic mapped at the same address.



Anyhow, I didn't see anything that would make my life (in fixing the
lapic
base issue) too difficult. Yet, feel free in making it more
"fix-friendly".


Why would you want to fix it?


If there is no general need, I will not send a fix. However, I think the
very least a warning message should be appear if the guest relocates the
APIC base.


Maybe I didn't understand you question correctly. If I'm wrong, please
tell me.

This patch does not relocate APIC base in guest, but in host. Host migrates
the apic page to somewhere else, and KVM updates ept pagetable to track it.
In guest, apic base address (gpa) doesn't change.
The last claim is true in practice, according to Gleb, but it is not 
necessarily so according to the specifications. Pentium 4, Intel Xeon 
and P6 family processors support APIC base relocation. See the Intel SDM 
section 10.4.5. Anyhow, Gleb claims it is not used by any OS.




Is this lapic->base_address a hpa ?

No, it is guest physical address.



Is there anywhere I need to update in my patch ?


No. I'll send another patch on top of yours that prints a warning if the 
APIC base is relocated (i.e., the guest physical address of the APIC 
base is changed). Such relocation is done explicitly by the guest, not 
by your patch.


Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: DR6/7.RTM cannot be written

2014-07-15 Thread Nadav Amit

Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.

Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/kvm_host.h |  8 +---
 arch/x86/kvm/cpuid.h|  8 
 arch/x86/kvm/vmx.c  |  4 ++--
 arch/x86/kvm/x86.c  | 22 --
 4 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b8a4480..a84eaf7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -152,14 +152,16 @@ enum {
 
 #define DR6_BD (1 << 13)
 #define DR6_BS (1 << 14)
-#define DR6_FIXED_10x0ff0
-#define DR6_VOLATILE   0xe00f
+#define DR6_RTM(1 << 16)
+#define DR6_FIXED_10xfffe0ff0
+#define DR6_INIT   0x0ff0
+#define DR6_VOLATILE   0x0001e00f
 
 #define DR7_BP_EN_MASK 0x00ff
 #define DR7_GE (1 << 9)
 #define DR7_GD (1 << 13)
 #define DR7_FIXED_10x0400
-#define DR7_VOLATILE   0x23ff
+#define DR7_VOLATILE   0x2bff
 
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC   0
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index f908731..a538059 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -95,4 +95,12 @@ static inline bool guest_cpuid_has_gbpages(struct kvm_vcpu 
*vcpu)
best = kvm_find_cpuid_entry(vcpu, 0x8001, 0);
return best && (best->edx & bit(X86_FEATURE_GBPAGES));
 }
+
+static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 7, 0);
+   return best && (best->ebx & bit(X86_FEATURE_RTM));
+}
 #endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0c9569b..1fd3598 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4892,7 +4892,7 @@ static int handle_exception(struct kvm_vcpu *vcpu)
if (!(vcpu->guest_debug &
  (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
vcpu->arch.dr6 &= ~15;
-   vcpu->arch.dr6 |= dr6;
+   vcpu->arch.dr6 |= dr6 | DR6_RTM;
if (!(dr6 & ~DR6_RESERVED)) /* icebp */
skip_emulated_instruction(vcpu);
 
@@ -5151,7 +5151,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
return 0;
} else {
vcpu->arch.dr7 &= ~DR7_GD;
-   vcpu->arch.dr6 |= DR6_BD;
+   vcpu->arch.dr6 |= DR6_BD | DR6_RTM;
vmcs_writel(GUEST_DR7, vcpu->arch.dr7);
kvm_queue_exception(vcpu, DB_VECTOR);
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f750b69..fae064f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -759,6 +759,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu)
vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
 }
 
+static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
+{
+   u64 fixed = DR6_FIXED_1;
+
+   if (!guest_cpuid_has_rtm(vcpu))
+   fixed |= DR6_RTM;
+   return fixed;
+}
+
 static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 {
switch (dr) {
@@ -774,7 +783,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, 
unsigned long val)
case 6:
if (val & 0xULL)
return -1; /* #GP */
-   vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
+   vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu);
kvm_update_dr6(vcpu);
break;
case 5:
@@ -5115,7 +5124,8 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu 
*vcpu, unsigned long rflag
 */
if (unlikely(rflags & X86_EFLAGS_TF)) {
if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
-   kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
+   kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 |
+ DR6_RTM;
kvm_run->debug.arch.pc = vcpu->arch.singlestep_rip;
kvm_run->debug.arch.exception = DB_VECTOR;
kvm_run->exit_reason = KVM_EXIT_DEBUG;
@@ -5

[PATCH kvm-unit-tests] x86: Check DR6.RTM is writable

2014-07-15 Thread Nadav Amit

Recently discovered bug shows DR6.RTM is fixed to one. The bug is only apparent
when the host emulates the MOV-DR instruction or when the host debugs the
guest kernel. This patch tests whether DR6.RTM is indeed accessible according
to RTM support as reported by cpuid.

Signed-off-by: Nadav Amit 
---
 x86/emulator.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 1fd0ca6..f68882f 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -3,6 +3,7 @@
 #include "libcflat.h"
 #include "desc.h"
 #include "types.h"
+#include "processor.h"
 
 #define memset __builtin_memset
 #define TESTDEV_IO_PORT 0xe0
@@ -870,6 +871,19 @@ static void test_nop(uint64_t *mem, uint8_t *insn_page,
report("nop", outregs.rax == inregs.rax);
 }
 
+static void test_mov_dr(uint64_t *mem, uint8_t *insn_page,
+   uint8_t *alt_insn_page, void *insn_ram)
+{
+   bool rtm_support = cpuid(7).b & (1 << 11);
+   unsigned long dr6_fixed_1 = rtm_support ? 0xfffe0ff0ul : 0x0ff0ul;
+   inregs = (struct regs){ .rax = 0 };
+   MK_INSN(mov_to_dr6, "movq %rax, %dr6\n\t");
+   trap_emulator(mem, alt_insn_page, &insn_mov_to_dr6);
+   MK_INSN(mov_from_dr6, "movq %dr6, %rax\n\t");
+   trap_emulator(mem, alt_insn_page, &insn_mov_from_dr6);
+   report("mov_dr6", outregs.rax == dr6_fixed_1);
+}
+
 static void test_crosspage_mmio(volatile uint8_t *mem)
 {
 volatile uint16_t w, *pw;
@@ -1072,6 +1086,7 @@ int main()
test_movabs(mem, insn_page, alt_insn_page, insn_ram);
test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram);
test_nop(mem, insn_page, alt_insn_page, insn_ram);
+   test_mov_dr(mem, insn_page, alt_insn_page, insn_ram);
test_crosspage_mmio(mem);
 
test_string_io_mmio(mem);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: emulator injects #DB when RFLAGS.RF is set

2014-07-16 Thread Nadav Amit

If the RFLAGS.RF is set, then no #DB should occur on instruction breakpoints.
However, the KVM emulator injects #DB regardless to RFLAGS.RF. This patch fixes
this behavior. KVM, however, still appears not to update RFLAGS.RF correctly,
regardless of this patch.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fae064f..e341a81 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5168,7 +5168,8 @@ static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu 
*vcpu, int *r)
}
}
 
-   if (unlikely(vcpu->arch.dr7 & DR7_BP_EN_MASK)) {
+   if (unlikely(vcpu->arch.dr7 & DR7_BP_EN_MASK) &&
+   !(kvm_get_rflags(vcpu) & X86_EFLAGS_RF)) {
dr6 = kvm_vcpu_check_hw_bp(eip, 0,
   vcpu->arch.dr7,
   vcpu->arch.db);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)

2014-07-17 Thread Nadav Amit


Small question if I may regarding kvm_mmu_pin_pages:

On 7/9/14, 10:12 PM, mtosa...@redhat.com wrote:

+
+static int kvm_mmu_pin_pages(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pinned_page_range *p;
+   int r = 1;
+
+   if (is_guest_mode(vcpu))
+   return r;
+
+   if (!vcpu->arch.mmu.direct_map)
+   return r;
+
+   ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
+   gfn_t gfn_offset;
+
+   for (gfn_offset = 0; gfn_offset < p->npages; gfn_offset++) {
+   gfn_t gfn = p->base_gfn + gfn_offset;
+   int r;
+   bool pinned = false;
+
+   r = vcpu->arch.mmu.page_fault(vcpu, gfn << PAGE_SHIFT,
+PFERR_WRITE_MASK, false,
+true, &pinned);


I understand that the current use-case is for pinning only few pages. 
Yet, wouldn't it be better (for performance) to check whether the gfn 
uses a large page and if so to skip forward, increasing gfn_offset to 
point to the next large page?


Thanks,
Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)

2014-07-24 Thread Nadav Amit


On Jul 18, 2014, at 12:38 AM, Marcelo Tosatti  wrote:

> On Thu, Jul 17, 2014 at 08:18:03PM +0300, Nadav Amit wrote:
>> Small question if I may regarding kvm_mmu_pin_pages:
>> 
...
>> I understand that the current use-case is for pinning only few
>> pages. Yet, wouldn't it be better (for performance) to check whether
>> the gfn uses a large page and if so to skip forward, increasing
>> gfn_offset to point to the next large page?
> 
> Sure, that can be a lazy optimization and performed when necessary?
> 
> (feel free to do it in advance if you're interested in doing it 
> now).
I would do it once your patch is applied (I don’t see it in the queue).

Thanks,
Nadav


signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: [PATCH] KVM: x86: Remove redundant and incorrect cpl check on task-switch

2014-07-31 Thread Nadav Amit

cc'ing the kvm mailing list that was mistakenly omitted.

On 7/30/14 9:57 AM, Nadav Amit wrote:
> Task-switch emulation checks the privilage level prior to performing the
> task-switch.  This check is incorrect in the case of task-gates, in which the
> tss.dpl is ignored, and can cause superfluous exceptions.  Moreover this check
> is unnecassary, since the CPU checks the privilage levels prior to exiting.
> Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly
> outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior
> to exiting.  AMD 15.14.1 says "The intercept is checked before the task switch
> takes place but after the incoming TSS and task gate (if one was involved) 
> have
> been checked for correctness."
> 
> This patch removes the CPL checks for CALL and JMP.
> 
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/kvm/emulate.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 56657b0..a9b2bd6 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2698,7 +2698,8 @@ static int emulator_do_task_switch(struct 
> x86_emulate_ctxt *ctxt,
>*
>* 1. jmp/call/int to task gate: Check against DPL of the task gate
>* 2. Exception/IRQ/iret: No check is performed
> -  * 3. jmp/call to TSS: Check against DPL of the TSS
> +  * 3. jmp/call to TSS/task-gate: No check is performed since the
> +  *hardware checks it before exiting.
>*/
>   if (reason == TASK_SWITCH_GATE) {
>   if (idt_index != -1) {
> @@ -2715,13 +2716,8 @@ static int emulator_do_task_switch(struct 
> x86_emulate_ctxt *ctxt,
>   if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl)
>   return emulate_gp(ctxt, (idt_index << 3) | 0x2);
>   }
> - } else if (reason != TASK_SWITCH_IRET) {
> - int dpl = next_tss_desc.dpl;
> - if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl)
> - return emulate_gp(ctxt, tss_selector);
>   }
>  
> -
>   desc_limit = desc_limit_scaled(&next_tss_desc);
>   if (!next_tss_desc.p ||
>   ((desc_limit < 0x67 && (next_tss_desc.type & 8)) ||
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: cluster mode broadcast does not work

2014-07-31 Thread Nadav Amit

cc'ing the kvm mailing list that was mistakenly omitted.

On 7/30/14 10:03 AM, Nadav Amit wrote:
> Local-apic enables cluster mode broadcast. As Intel SDM 10.6.2.2 says:
> "Broadcast to all local APICs is achieved by setting all destination bits to
> one." This patch enables cluster mode broadcast.
> 
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/kvm/lapic.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index efb1939..05c9a46 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -560,8 +560,8 @@ int kvm_apic_match_logical_addr(struct kvm_lapic *apic, 
> u8 mda)
>   result = 1;
>   break;
>   case APIC_DFR_CLUSTER:
> - if (((logical_id >> 4) == (mda >> 0x4))
> - && (logical_id & mda & 0xf))
> + if logical_id >> 4) == (mda >> 0x4))
> + && (logical_id & mda & 0xf)) || mda == 0xff)
>   result = 1;
>   break;
>   default:
> @@ -644,6 +644,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
>  
> + if (irq->dest_id == 0xff &&
> + kvm_apic_get_reg(src, APIC_DFR) == APIC_DFR_CLUSTER)
> + goto out; /* cluster mode broadcast */
> +
>   dst = map->logical_map[apic_cluster_id(map, mda)];
>  
>   bitmap = apic_logical_id(map, mda);
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly

2014-08-13 Thread Nadav Amit

Commit d40a6898e5 mistakenly caused instructions which are not marked as
EmulateOnUD to be emulated upon #UD exception. The commit caused the check of
whether the instruction flags include EmulateOnUD to never be evaluated. As a
result instructions whose emulation is broken may be emulated.  This fix moves
the evaluation of EmulateOnUD so it would be evaluated.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 56657b0..37a83b2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4394,6 +4394,9 @@ done_prefixes:
 
ctxt->execute = opcode.u.execute;
 
+   if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
+   return EMULATION_FAILED;
+
if (unlikely(ctxt->d &
 
(NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
/*
@@ -4406,9 +4409,6 @@ done_prefixes:
if (ctxt->d & NotImpl)
return EMULATION_FAILED;
 
-   if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
-   return EMULATION_FAILED;
-
if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
ctxt->op_bytes = 8;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly

2014-08-13 Thread Nadav Amit

Correction: the word “never” in the message is too harsh.
Nonetheless, there is a regression bug. I encountered it with “wrfsbase” 
instruction.

Nadav

On Aug 13, 2014, at 4:50 PM, Nadav Amit  wrote:

> Commit d40a6898e5 mistakenly caused instructions which are not marked as
> EmulateOnUD to be emulated upon #UD exception. The commit caused the check of
> whether the instruction flags include EmulateOnUD to never be evaluated. As a
> result instructions whose emulation is broken may be emulated.  This fix moves
> the evaluation of EmulateOnUD so it would be evaluated.
> 
> Signed-off-by: Nadav Amit 
> ---
> arch/x86/kvm/emulate.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 56657b0..37a83b2 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4394,6 +4394,9 @@ done_prefixes:
> 
>   ctxt->execute = opcode.u.execute;
> 
> + if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
> + return EMULATION_FAILED;
> +
>   if (unlikely(ctxt->d &
>
> (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
>   /*
> @@ -4406,9 +4409,6 @@ done_prefixes:
>   if (ctxt->d & NotImpl)
>   return EMULATION_FAILED;
> 
> - if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
> - return EMULATION_FAILED;
> -
>   if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
>   ctxt->op_bytes = 8;
> 
> -- 
> 1.9.1
> 



signature.asc
Description: Message signed with OpenPGP using GPGMail

Regression problem with commit 5045b46803

2014-08-13 Thread Nadav Amit

Commit 5045b46803 added a check that cs.dpl equals cs.rpl during
task-switch. Unfortunately, is causes some of my tests that run well on
bare-metal to fail.

Although this check is mentioned in table 7-1 of the SDM as causing a
#TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS
conditions" which cause #TSS exceptions.

Thus, I recommend on reverting commit 5045b46803, or alternatively
rechecking task-switch behavior on bare-metal.

Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Regression problem with commit 5045b46803

2014-08-16 Thread Nadav Amit


On Aug 17, 2014, at 9:17 AM, Paolo Bonzini  wrote:

> Il 13/08/2014 19:14, Nadav Amit ha scritto:
>> Commit 5045b46803 added a check that cs.dpl equals cs.rpl during
>> task-switch. Unfortunately, is causes some of my tests that run well on
>> bare-metal to fail.
>> 
>> Although this check is mentioned in table 7-1 of the SDM as causing a
>> #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS
>> conditions" which cause #TSS exceptions.
>> 
>> Thus, I recommend on reverting commit 5045b46803, or alternatively
>> rechecking task-switch behavior on bare-metal.
> 
> Can you provide a kvm-unit-tests test for this?

I need to put time into writing one. However, the unit test would prove nothing.
The question is how bare-metal behaves. According to what I see, it does not 
perform this check.

Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Regression problem with commit 5045b46803

2014-08-17 Thread Nadav Amit


On Aug 17, 2014, at 9:28 AM, Paolo Bonzini  wrote:

> Il 17/08/2014 08:23, Nadav Amit ha scritto:
>>>>>> Although this check is mentioned in table 7-1 of the SDM as causing a
>>>>>> #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS
>>>>>> conditions" which cause #TSS exceptions.
>>>>>> 
>>>>>> Thus, I recommend on reverting commit 5045b46803, or alternatively
>>>>>> rechecking task-switch behavior on bare-metal.
>>>> 
>>>> Can you provide a kvm-unit-tests test for this?
>> 
>> The question is how bare-metal behaves. According to what I see, it does not 
>> perform this check.
> 
> Yes, but I still prefer to have a test, in case I find some time to fix
> kvm-unit-tests to work on bare metal (or on Bochs).
> 
> There are already a few task switch tests (32-bit only).
> 

Can you please assist in running the existing task-switch tests first?

In x86_64 the task-switch tests are skipped. If I configure the tests to use 
the i386 architecture, the build fails.
Even after fixing the compilation errors, I don’t manage to get the task-switch 
tests to pass.

Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH kvm-unit-tests] x86: Test task-switch with cs.rpl != cs.dpl

2014-08-17 Thread Nadav Amit

Commit 5045b46803 added a check that cs.dpl equals cs.rpl during task-switch.
This is a wrong check, and this test introduces a test in which cs.dpl !=
cs.rpl. To do so, it configures tss.cs to be conforming with rpl=3 and dpl=0.
Since the cpl after calling is 3, it does not make any prints in the callee.

Signed-off-by: Nadav Amit 
---
 x86/taskswitch2.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/x86/taskswitch2.c b/x86/taskswitch2.c
index 92fc941..d96853f 100644
--- a/x86/taskswitch2.c
+++ b/x86/taskswitch2.c
@@ -7,6 +7,8 @@
 
 #define MAIN_TSS_SEL (FIRST_SPARE_SEL + 0)
 #define VM86_TSS_SEL (FIRST_SPARE_SEL + 8)
+#define USER_CS_SEL  (FIRST_SPARE_SEL + 16)
+#define USER_DS_SEL  (FIRST_SPARE_SEL + 24)
 
 static volatile int test_count;
 static volatile unsigned int test_divider;
@@ -102,6 +104,14 @@ start:
goto start;
 }
 
+static void user_tss(void)
+{
+start:
+   test_count++;
+   asm volatile ("iret");
+   goto start;
+}
+
 void test_kernel_mode_int()
 {
unsigned int res;
@@ -201,6 +211,18 @@ void test_kernel_mode_int()
asm volatile ("ljmp $" xstr(TSS_INTR) ", $0xf4f4f4f4");
printf("Jump back succeeded\n");
report("ljmp", test_count == 1);
+
+   /* test lcall with conforming segment, cs.dpl != cs.rpl */
+   test_count = 0;
+   set_intr_task_gate(0, user_tss);
+
+   tss_intr.cs = USER_CS_SEL | 3;
+   tss_intr.ss = USER_DS_SEL | 3;
+   tss_intr.ds = tss_intr.gs = tss_intr.fs = tss_intr.ss;
+   set_gdt_entry(USER_CS_SEL, 0, 0x, 0x9f, 0xc0);
+   set_gdt_entry(USER_DS_SEL, 0, 0x, 0xf3, 0xc0);
+   asm volatile("lcall $" xstr(TSS_INTR) ", $0xf4f4f4f4");
+   report("lcall when cs.rpl != cs.dpl", test_count == 1);
 }
 
 void test_vm86_switch(void)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: Revert "check CS.DPL against RPL during task switch"

2014-08-17 Thread Nadav Amit

This reverts commit 5045b468037dfe1c848827ce10e99d87f5669160.  Although the
cs.dpl=cs.rpl check is mentioned in table 7-1 of the SDM as causing a #TSS
exception, it is not mentioned in table 6-6 that lists "invalid TSS conditions"
which cause #TSS exceptions. As it causes some tests, which pass on bare-metal,
to fail - it should be reverted.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/emulate.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 56657b0..b73c9e8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1415,7 +1415,7 @@ static int write_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
 
 /* Does not support long mode */
 static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
-u16 selector, int seg, u8 cpl, bool 
in_task_switch)
+u16 selector, int seg, u8 cpl)
 {
struct desc_struct seg_desc, old_desc;
u8 dpl, rpl;
@@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
goto exception;
break;
case VCPU_SREG_CS:
-   if (in_task_switch && rpl != dpl)
-   goto exception;
-
if (!(seg_desc.type & 8))
goto exception;
 
@@ -1560,7 +1557,7 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
   u16 selector, int seg)
 {
u8 cpl = ctxt->ops->cpl(ctxt);
-   return __load_segment_descriptor(ctxt, selector, seg, cpl, false);
+   return __load_segment_descriptor(ctxt, selector, seg, cpl);
 }
 
 static void write_register_operand(struct operand *op)
@@ -2460,19 +2457,19 @@ static int load_state_from_tss16(struct 
x86_emulate_ctxt *ctxt,
 * Now load segment descriptors. If fault happens at this stage
 * it is handled in a context of new task
 */
-   ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl, 
true);
+   ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
 
@@ -2597,25 +2594,26 @@ static int load_state_from_tss32(struct 
x86_emulate_ctxt *ctxt,
 * Now load segment descriptors. If fault happenes at this stage
 * it is handled in a context of new task
 */
-   ret = __load_segment_descriptor(ctxt, tss->ldt_selector, 
VCPU_SREG_LDTR, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR,
+   cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
-   ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl, true);
+   ret = __load_segment_descriptor(ctxt, tss->

Re: [PATCH] KVM: x86: Revert "check CS.DPL against RPL during task switch"

2014-08-17 Thread Nadav Amit


On Aug 18, 2014, at 12:13 AM, Paolo Bonzini  wrote:

> Il 17/08/2014 23:09, Paolo Bonzini ha scritto:
>> Il 17/08/2014 21:32, Nadav Amit ha scritto:
>>> This reverts commit 5045b468037dfe1c848827ce10e99d87f5669160.  Although the
>>> cs.dpl=cs.rpl check is mentioned in table 7-1 of the SDM as causing a #TSS
>>> exception, it is not mentioned in table 6-6 that lists "invalid TSS 
>>> conditions"
>>> which cause #TSS exceptions. As it causes some tests, which pass on 
>>> bare-metal,
>>> to fail - it should be reverted.
>> 
>> Right.  However, I think reverting the patch is too big a hammer.  We 
>> still need in_task_switch to raise TS_VECTOR instead of GP_VECTOR, so I 
>> propose instead something like:
>> 
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 56657b0bb3bb..cd230b035514 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -1468,7 +1468,7 @@ static int __load_segment_descriptor(struct 
>> x86_emulate_ctxt *ctxt,
>>  return ret;
>> 
>>  err_code = selector & 0xfffc;
>> -err_vec = GP_VECTOR;
>> +err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR;
>> 
>>  /* can't load system descriptor into segment selector */
>>  if (seg <= VCPU_SREG_GS && !seg_desc.s)
>> @@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct 
>> x86_emulate_ctxt *ctxt,
>>  goto exception;
>>  break;
>>  case VCPU_SREG_CS:
>> -if (in_task_switch && rpl != dpl)
>> -goto exception;
>> -
>>  if (!(seg_desc.type & 8))
>>  goto exception;
>> 
> 
> Also, what about the rpl > cpl test below, for non-conforming code
> segments?  It is not mentioned in table 6-6 either.

As far as I understand, after task-switch cpl = cs.rpl. This is how the 
load_state_from_tss32 does it, and follows SDM 7.3, Task Switching: 
"The new task begins executing at the privilege level specified in the CPL 
field of the CS register, which is loaded from the TSS.”
As a result, this condition can never occur during task-switch.

Do I miss anything?

Nadav


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: Revert "check CS.DPL against RPL during task switch"

2014-08-17 Thread Nadav Amit


On Aug 18, 2014, at 12:09 AM, Paolo Bonzini  wrote:

> Il 17/08/2014 21:32, Nadav Amit ha scritto:
>> This reverts commit 5045b468037dfe1c848827ce10e99d87f5669160.  Although the
>> cs.dpl=cs.rpl check is mentioned in table 7-1 of the SDM as causing a #TSS
>> exception, it is not mentioned in table 6-6 that lists "invalid TSS 
>> conditions"
>> which cause #TSS exceptions. As it causes some tests, which pass on 
>> bare-metal,
>> to fail - it should be reverted.
> 
> Right.  However, I think reverting the patch is too big a hammer.  We 
> still need in_task_switch to raise TS_VECTOR instead of GP_VECTOR, so I 
> propose instead something like:
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 56657b0bb3bb..cd230b035514 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1468,7 +1468,7 @@ static int __load_segment_descriptor(struct 
> x86_emulate_ctxt *ctxt,
>   return ret;
> 
>   err_code = selector & 0xfffc;
> - err_vec = GP_VECTOR;
> + err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR;
> 
>   /* can't load system descriptor into segment selector */
>   if (seg <= VCPU_SREG_GS && !seg_desc.s)
> @@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct 
> x86_emulate_ctxt *ctxt,
>   goto exception;
>   break;
>   case VCPU_SREG_CS:
> - if (in_task_switch && rpl != dpl)
> - goto exception;
> -
>   if (!(seg_desc.type & 8))
>   goto exception;
> 
> 
> either in a single patch or as two separate patch.  I'll test this against
> your test case and repost (not before Tuesday).
> 

I missed the TS_VECTOR thing. Yes, in that case, full revert makes no sense. 
I’m in no hurry with this patch, and I posted it during your vacation only 
because it is a recent bug.
Anyhow, you may want to look at another patch I sent, "KVM: x86: Avoid 
emulating instructions on #UD mistakenly”, ASAP. It fixes a bug that was 
introduced into 3.17-RC1 and is visible in userspace.

Nadav





signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10

2014-08-17 Thread Nadav Amit

This should have been a benign patch. I'll try to get windows 7 installation 
disk and check ASAP.

Nadav

> On 18 Aug 2014, at 05:17, Wanpeng Li  wrote:
> 
> Hi Nadav,
>> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote:
>> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
>> sometime make assumptions on CPUs while they ignore capability MSRs, it is
>> better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
>> actually supported has no functional implications.
>> 
>> Signed-off-by: Nadav Amit 
>> ---
>> arch/x86/include/asm/kvm_host.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h 
>> b/arch/x86/include/asm/kvm_host.h
>> index 4931415..0bab29d 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t 
>> base_gfn, int level)
>> #define KVM_REFILL_PAGES 25
>> #define KVM_MAX_CPUID_ENTRIES 80
>> #define KVM_NR_FIXED_MTRR_REGION 88
>> -#define KVM_NR_VAR_MTRR 8
>> +#define KVM_NR_VAR_MTRR 10
> 
> We observed that there is obvious regression caused by this commit, 32bit 
> win7 guest show blue screen during boot.
> 
> Regards,
> Wanpeng Li 
> 
>> #define ASYNC_PF_PER_VCPU 64
>> 
>> -- 
>> 1.9.1
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10

2014-08-18 Thread Nadav Amit

I reproduced the blue-screen. Let me to to figure it out.

Nadav

On Aug 18, 2014, at 11:11 AM, Wanpeng Li  wrote:

> On Mon, Aug 18, 2014 at 09:39:39AM +0300, Nadav Amit wrote:
>> This should have been a benign patch. I'll try to get windows 7 installation 
>> disk and check ASAP.
>> 
> 
> In addition, it just can be reproduced on 32bit win7 w/ MP enabled, in
> case UP can't be reproduced.
> 
> Regards,
> Wanpeng Li 
> 
>> Nadav
>> 
>>> On 18 Aug 2014, at 05:17, Wanpeng Li  wrote:
>>> 
>>> Hi Nadav,
>>>> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote:
>>>> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
>>>> sometime make assumptions on CPUs while they ignore capability MSRs, it is
>>>> better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
>>>> actually supported has no functional implications.
>>>> 
>>>> Signed-off-by: Nadav Amit 
>>>> ---
>>>> arch/x86/include/asm/kvm_host.h | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/arch/x86/include/asm/kvm_host.h 
>>>> b/arch/x86/include/asm/kvm_host.h
>>>> index 4931415..0bab29d 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t 
>>>> base_gfn, int level)
>>>> #define KVM_REFILL_PAGES 25
>>>> #define KVM_MAX_CPUID_ENTRIES 80
>>>> #define KVM_NR_FIXED_MTRR_REGION 88
>>>> -#define KVM_NR_VAR_MTRR 8
>>>> +#define KVM_NR_VAR_MTRR 10
>>> 
>>> We observed that there is obvious regression caused by this commit, 32bit 
>>> win7 guest show blue screen during boot.
>>> 
>>> Regards,
>>> Wanpeng Li 
>>> 
>>>> #define ASYNC_PF_PER_VCPU 64
>>>> 
>>>> -- 
>>>> 1.9.1
>>>> 
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10

2014-08-18 Thread Nadav Amit

The cause for the blue-screen appears to be seabios, which leaves only 0x20 
slots for “smp_mtrr”s.
Apparently, the increase in the variable range MTRR count caused it to exhaust 
the available slots.
As a result, some MSRs are not initialised by the BIOS (specifically, 3.5-4GB 
are not marked as UC), and cause Windows to panic.

Once we increase the size of the array smp_mtrr in seabios, Windows boots.

Paolo, you may wish to revert the patch. Please note that it was applied to 
some stable branches.

Nadav

On Aug 18, 2014, at 12:39 PM, Nadav Amit  wrote:

> I reproduced the blue-screen. Let me to to figure it out.
> 
> Nadav
> 
> On Aug 18, 2014, at 11:11 AM, Wanpeng Li  wrote:
> 
>> On Mon, Aug 18, 2014 at 09:39:39AM +0300, Nadav Amit wrote:
>>> This should have been a benign patch. I'll try to get windows 7 
>>> installation disk and check ASAP.
>>> 
>> 
>> In addition, it just can be reproduced on 32bit win7 w/ MP enabled, in
>> case UP can't be reproduced.
>> 
>> Regards,
>> Wanpeng Li 
>> 
>>> Nadav
>>> 
>>>> On 18 Aug 2014, at 05:17, Wanpeng Li  wrote:
>>>> 
>>>> Hi Nadav,
>>>>> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote:
>>>>> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
>>>>> sometime make assumptions on CPUs while they ignore capability MSRs, it is
>>>>> better for KVM to be consistent with recent CPUs. Reporting more MTRRs 
>>>>> than
>>>>> actually supported has no functional implications.
>>>>> 
>>>>> Signed-off-by: Nadav Amit 
>>>>> ---
>>>>> arch/x86/include/asm/kvm_host.h | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/arch/x86/include/asm/kvm_host.h 
>>>>> b/arch/x86/include/asm/kvm_host.h
>>>>> index 4931415..0bab29d 100644
>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t 
>>>>> base_gfn, int level)
>>>>> #define KVM_REFILL_PAGES 25
>>>>> #define KVM_MAX_CPUID_ENTRIES 80
>>>>> #define KVM_NR_FIXED_MTRR_REGION 88
>>>>> -#define KVM_NR_VAR_MTRR 8
>>>>> +#define KVM_NR_VAR_MTRR 10
>>>> 
>>>> We observed that there is obvious regression caused by this commit, 32bit 
>>>> win7 guest show blue screen during boot.
>>>> 
>>>> Regards,
>>>> Wanpeng Li 
>>>> 
>>>>> #define ASYNC_PF_PER_VCPU 64
>>>>> 
>>>>> -- 
>>>>> 1.9.1
>>>>> 
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>>> the body of a message to majord...@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly

2014-08-18 Thread Nadav Amit


On Aug 18, 2014, at 11:31 AM, Paolo Bonzini  wrote:

> Il 13/08/2014 16:21, Nadav Amit ha scritto:
>> Correction: the word “never” in the message is too harsh.
>> Nonetheless, there is a regression bug. I encountered it with “wrfsbase” 
>> instruction.
> 
> So KVM is emulating wrfsbase even if the host doesn't support it?
KVM doesn’t know wrfsbase - wrfsbase is encoded like clflush with rep-prefix.
Therefore, the emulator thinks it can emulate it as clflush, and eliminates the 
#UD.

> 
> I'm swapping the order of the two operands of &&, since the first one will 
> almost
> always be true and the second one will almost always be false.
> 
> Also, there's now no need to test EmulateOnUD in the condition below.  Does 
> the
> below look good to you?
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 37a83b24e040..ef117b842334 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4394,11 +4394,11 @@ done_prefixes:
> 
>   ctxt->execute = opcode.u.execute;
> 
> - if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
> + if (unlikely(ctxt->ud) && likely(!(ctxt->d & EmulateOnUD)))
>   return EMULATION_FAILED;
> 
>   if (unlikely(ctxt->d &
> -  
> (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
> +  (NotImpl|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
>   /*
>* These are copied unconditionally here, and checked 
> unconditionally
>* in x86_emulate_insn.
> 

Sure. Until I find some bugs in it. ;-)

Thanks,
Nadav


signature.asc
Description: Message signed with OpenPGP using GPGMail

1 2 3 4 >

1 - 100 of 357 matches

Mail list logo