Re: KVM: x86: fix mov immediate emulation for 64-bit operands
On 7/12/12 09:31, Gleb Natapov wrote:
> On Thu, Dec 06, 2012 at 09:55:10PM -0200, Marcelo Tosatti wrote:
>>
>> From: Nadav Amit
>>
>> MOV immediate instruction (opcodes 0xB8-0xBF) may take a 64-bit operand.
>> The previous emulation implementation assumes the operand is no longer
>> than 32 bits.
>> Adding OpImm64 for this matter.
>>
>> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=881579
>>
>> Signed-off-by: Marcelo Tosatti
>>
> Needs author's sign-off and test case.

I've already signed off the patch I sent a while ago. I was busy, put the
test-case implementation low on my priority list, and forgot about it.
I would appreciate it if Marcelo implements the test case. Otherwise, let
me know and I'll do it next week.

>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 39171cb..6fec09c 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -43,7 +43,7 @@
>>  #define OpCL               9ull  /* CL register (for shifts) */
>>  #define OpImmByte         10ull  /* 8-bit sign extended immediate */
>>  #define OpOne             11ull  /* Implied 1 */
>> -#define OpImm             12ull  /* Sign extended immediate */
>> +#define OpImm             12ull  /* Sign extended up to 32-bit immediate */
>>  #define OpMem16           13ull  /* Memory operand (16-bit). */
>>  #define OpMem32           14ull  /* Memory operand (32-bit). */
>>  #define OpImmU            15ull  /* Immediate operand, zero extended */
>> @@ -58,6 +58,7 @@
>>  #define OpFS              24ull  /* FS */
>>  #define OpGS              25ull  /* GS */
>>  #define OpMem8            26ull  /* 8-bit zero extended memory operand */
>> +#define OpImm64           27ull  /* Sign extended 16/32/64-bit immediate */
>>
>>  #define OpBits             5  /* Width of operand field */
>>  #define OpMask            ((1ull << OpBits) - 1)
>> @@ -101,6 +102,7 @@
>>  #define SrcMemFAddr (OpMemFAddr << SrcShift)
>>  #define SrcAcc      (OpAcc << SrcShift)
>>  #define SrcImmU16   (OpImmU16 << SrcShift)
>> +#define SrcImm64    (OpImm64 << SrcShift)
>>  #define SrcDX       (OpDX << SrcShift)
>>  #define SrcMem8     (OpMem8 << SrcShift)
>>  #define SrcMask     (OpMask << SrcShift)
>> @@ -3786,7 +3788,7 @@ static const struct opcode opcode_table[256] = {
>>  	/* 0xB0 - 0xB7 */
>>  	X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
>>  	/* 0xB8 - 0xBF */
>> -	X8(I(DstReg | SrcImm | Mov, em_mov)),
>> +	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
>>  	/* 0xC0 - 0xC7 */
>>  	D2bv(DstMem | SrcImmByte | ModRM),
>>  	I(ImplicitOps | Stack | SrcImmU16, em_ret_near_imm),
>> @@ -3950,6 +3952,9 @@ static int decode_imm(struct x86_emulate_ctxt *ctxt, struct operand *op,
>>  	case 4:
>>  		op->val = insn_fetch(s32, ctxt);
>>  		break;
>> +	case 8:
>> +		op->val = insn_fetch(s64, ctxt);
>> +		break;
>>  	}
>>  	if (!sign_extension) {
>>  		switch (op->bytes) {
>> @@ -4028,6 +4033,9 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
>>  	case OpImm:
>>  		rc = decode_imm(ctxt, op, imm_size(ctxt), true);
>>  		break;
>> +	case OpImm64:
>> +		rc = decode_imm(ctxt, op, ctxt->op_bytes, true);
>> +		break;
>>  	case OpMem8:
>>  		ctxt->memop.bytes = 1;
>>  		goto mem_common;
>
> --
> Gleb.
>

Regards,
Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Fwd: [PATCH] KVM: fix test emulation writeback
Test instruction emulation should not perform writeback.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/emulate.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8b4cc5f..0ad0a75 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1708,6 +1708,8 @@ static int em_grp3(struct x86_emulate_ctxt *ctxt)
 	switch (ctxt->modrm_reg) {
 	case 0 ... 1:	/* test */
 		emulate_2op_SrcV("test", ctxt->src, ctxt->dst, ctxt->eflags);
+		/* Disable writeback. */
+		ctxt->dst.type = OP_NONE;
 		break;
 	case 2:	/* not */
 		ctxt->dst.val = ~ctxt->dst.val;
@@ -2553,6 +2555,8 @@ static int em_cmp(struct x86_emulate_ctxt *ctxt)
 static int em_test(struct x86_emulate_ctxt *ctxt)
 {
 	emulate_2op_SrcV("test", ctxt->src, ctxt->dst, ctxt->eflags);
+	/* Disable writeback. */
+	ctxt->dst.type = OP_NONE;
 	return X86EMUL_CONTINUE;
 }
-- 
1.7.4.1
[PATCH] KVM: fix mov immediate emulation for 64-bit operands
MOV immediate instruction (opcodes 0xB8-0xBF) may take a 64-bit operand.
The previous emulation implementation assumes the operand is no longer
than 32 bits.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/emulate.c |    6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..65d1d31 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3502,7 +3502,8 @@ static unsigned imm_size(struct x86_emulate_ctxt *ctxt)
 	unsigned size;
 
 	size = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
-	if (size == 8)
+	/* Immediates are usually no longer than 4 bytes */
+	if (size == 8 && ((ctxt->b & 0xF8) != 0xB8 || ctxt->twobyte))
 		size = 4;
 	return size;
 }
@@ -3526,6 +3527,9 @@ static int decode_imm(struct x86_emulate_ctxt *ctxt, struct operand *op,
 	case 4:
 		op->val = insn_fetch(s32, ctxt);
 		break;
+	case 8:
+		op->val = insn_fetch(s64, ctxt);
+		break;
 	}
 	if (!sign_extension) {
 		switch (op->bytes) {
-- 
1.7.4.1
Re: [PATCH] KVM: fix mov immediate emulation for 64-bit operands
On Jan 7, 2012, at 10:25 PM, H. Peter Anvin wrote: > On 01/07/2012 12:21 PM, Nadav Amit wrote: >> MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand. >> The previous emulation implementation assumes the operand is no longer than >> 32. >> >> Signed-off-by: Nadav Amit > > There are exactly two such instructions: MOV immediate (B8-BF) and MOV > moff (A0-A3); you may want to check the latter too. > > -hpa > These instructions (A0-A3) seem to be already covered by the decode_abs function. Regards, Nadav-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: fix mov immediate emulation for 64-bit operands
On Sun, Jan 8, 2012 at 3:26 AM, Takuya Yoshikawa wrote: > Hi, > > Nadav Amit wrote: >> On Jan 7, 2012, at 10:25 PM, H. Peter Anvin wrote: >> >> > On 01/07/2012 12:21 PM, Nadav Amit wrote: >> >> MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand. >> >> The previous emulation implementation assumes the operand is no longer >> >> than 32. >> >> >> >> Signed-off-by: Nadav Amit >> > >> > There are exactly two such instructions: MOV immediate (B8-BF) and MOV >> > moff (A0-A3); you may want to check the latter too. >> > >> > -hpa >> > >> >> These instructions (A0-A3) seem to be already covered by the decode_abs >> function. > > Like these how about introducing a new flag and change the following entries > in the > decode table to indicate possible 64bit immediate: > > /* 0xB8 - 0xBF */ > X8(I(DstReg | SrcImm | Mov, em_mov)), > > Checking the opcode byte at the operand decoding stage, like below, does not > look nice: > (IMO so better ask Avi) > > + if (size == 8 && ((ctxt->b & 0xF8) != 0xB8 || ctxt->twobyte)) > size = 4; > I agree. I remembered these flags are expensive (from the time flags were set in u32). I guess I can add OpImm64. Another less preferable alternative is to add a misc. flag or reuse another flag. Avi, please acknowledge adding OpImm64. Regards, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: fix mov immediate emulation for 64-bit operands
On Sun, Jan 8, 2012 at 4:08 PM, Avi Kivity wrote:
> On 01/08/2012 04:05 PM, Avi Kivity wrote:
>> >
>> > Avi, please acknowledge adding OpImm64.
>>
>> Yes, OpImm64 is the cleanest IMO. Note it doesn't even cost us a bit.

Very well - I'll submit a revised patch later.

> Note, usually I'd ask for a unit test to accompany the fix, but the
> current framework only supports testing instructions that have a memory
> operand. So you're off the hook for now, unless you'd like to extend
> the framework.

Unfortunately, I don't. I am doing some odd things with KVM that cause me
to encounter emulator bugs. I am likely to submit another patch or two,
as the emulation code has additional "issues" handling EPT violations
(especially on non-memory operands).

Regards,
Nadav
[PATCH v2] KVM: fix mov immediate emulation for 64-bit operands
MOV immediate instruction (opcodes 0xB8-0xBF) may take a 64-bit operand.
The previous emulation implementation assumes the operand is no longer
than 32 bits.
Adding OpImm64 for this matter.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/emulate.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..9ad5c0b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -43,7 +43,7 @@
 #define OpCL               9ull  /* CL register (for shifts) */
 #define OpImmByte         10ull  /* 8-bit sign extended immediate */
 #define OpOne             11ull  /* Implied 1 */
-#define OpImm             12ull  /* Sign extended immediate */
+#define OpImm             12ull  /* Sign extended up to 32-bit immediate */
 #define OpMem16           13ull  /* Memory operand (16-bit). */
 #define OpMem32           14ull  /* Memory operand (32-bit). */
 #define OpImmU            15ull  /* Immediate operand, zero extended */
@@ -57,6 +57,7 @@
 #define OpDS              23ull  /* DS */
 #define OpFS              24ull  /* FS */
 #define OpGS              25ull  /* GS */
+#define OpImm64           26ull  /* Sign extended 16/32/64-bit immediate */
 
 #define OpBits             5  /* Width of operand field */
 #define OpMask            ((1ull << OpBits) - 1)
@@ -100,6 +101,7 @@
 #define SrcMemFAddr (OpMemFAddr << SrcShift)
 #define SrcAcc      (OpAcc << SrcShift)
 #define SrcImmU16   (OpImmU16 << SrcShift)
+#define SrcImm64    (OpImm64 << SrcShift)
 #define SrcDX       (OpDX << SrcShift)
 #define SrcMask     (OpMask << SrcShift)
 #define BitOp       (1<<11)
@@ -3365,7 +3367,7 @@ static struct opcode opcode_table[256] = {
 	/* 0xB0 - 0xB7 */
 	X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
 	/* 0xB8 - 0xBF */
-	X8(I(DstReg | SrcImm | Mov, em_mov)),
+	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
 	/* 0xC0 - 0xC7 */
 	D2bv(DstMem | SrcImmByte | ModRM),
 	I(ImplicitOps | Stack | SrcImmU16, em_ret_near_imm),
@@ -3526,6 +3528,9 @@ static int decode_imm(struct x86_emulate_ctxt *ctxt, struct operand *op,
 	case 4:
 		op->val = insn_fetch(s32, ctxt);
 		break;
+	case 8:
+		op->val = insn_fetch(s64, ctxt);
+		break;
 	}
 	if (!sign_extension) {
 		switch (op->bytes) {
@@ -3605,6 +3610,9 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 	case OpImm:
 		rc = decode_imm(ctxt, op, imm_size(ctxt), true);
 		break;
+	case OpImm64:
+		rc = decode_imm(ctxt, op, ctxt->op_bytes, true);
+		break;
 	case OpMem16:
 		ctxt->memop.bytes = 2;
 		goto mem_common;
-- 
1.7.4.1
[PATCH 1/2] KVM: Exception during emulation decode should propagate
An exception might occur during decode (e.g., #PF during fetch). Currently, the exception is ignored and emulation is performed. Instead, emulation should be skipped and the fault should be injected. Skipping instruction should report a failure in this case. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c |3 +++ arch/x86/kvm/x86.c |8 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 05a562b..e06dc98 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3869,6 +3869,9 @@ done: if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative) ctxt->memopp->addr.mem.ea += ctxt->_eip; + if (rc == X86EMUL_PROPAGATE_FAULT) + ctxt->have_exception = true; + return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1171def..05fd3d7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4443,10 +4443,17 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, } if (emulation_type & EMULTYPE_SKIP) { + if (ctxt->have_exception) + return EMULATE_FAIL; kvm_rip_write(vcpu, ctxt->_eip); return EMULATE_DONE; } + if (ctxt->have_exception) { + writeback = false; + goto post; + } + if (retry_instruction(ctxt, cr2, emulation_type)) return EMULATE_DONE; @@ -4470,6 +4477,7 @@ restart: return handle_emulation_failure(vcpu); } +post: if (ctxt->have_exception) { inject_emulated_exception(vcpu); r = EMULATE_DONE; -- 1.7.4.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] KVM: Fix writeback on page boundary that propagates changes in spite of #PF
Consider the case in which an instruction emulation writeback is performed on a page boundary. In such case, if a #PF occurs on the second page, the write to the first page already occurred and cannot be retracted. Therefore, validation of the second page access must be performed prior to writeback. Signed-off-by: Nadav Amit --- arch/x86/kvm/x86.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 05fd3d7..7af3d67 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3626,6 +3626,8 @@ struct read_write_emulator_ops { int bytes, void *val); int (*read_write_exit_mmio)(struct kvm_vcpu *vcpu, gpa_t gpa, void *val, int bytes); + gpa_t (*read_write_validate)(struct kvm_vcpu *vcpu, gva_t gva, +struct x86_exception *exception); bool write; }; @@ -3686,6 +3688,7 @@ static struct read_write_emulator_ops write_emultor = { .read_write_emulate = write_emulate, .read_write_mmio = write_mmio, .read_write_exit_mmio = write_exit_mmio, + .read_write_validate = kvm_mmu_gva_to_gpa_write, .write = true, }; @@ -3750,6 +3753,16 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr, int rc, now; now = -addr & ~PAGE_MASK; + + /* First check there is no page-fault on the next page */ + if (ops->read_write_validate && + ops->read_write_validate(vcpu, addr+now, exception) == + UNMAPPED_GVA) { + /* #PF on the first page should be reported first */ + ops->read_write_validate(vcpu, addr, exception); + return X86EMUL_PROPAGATE_FAULT; + } + rc = emulator_read_write_onepage(addr, val, now, exception, vcpu, ops); -- 1.7.4.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: Exception during emulation decode should propagate
On Jan 12, 2012, at 2:26 AM, Takuya Yoshikawa wrote: > (2012/01/12 7:11), Takuya Yoshikawa wrote: >> On Wed, 11 Jan 2012 18:53:30 +0200 >> Nadav Amit wrote: >> >>> An exception might occur during decode (e.g., #PF during fetch). >>> Currently, the exception is ignored and emulation is performed. > > Note that the decode/emulation will not be continued in such a case. > > insn_fetch() is a bit tricky macro and it contains "goto done" to outside. > So if an error happens during fetching the instruction, x86_decode_insn() > will handle the X86EMUL_* fault value and returns FAIL immediately. You got a point. Yet, a problem still exists. I now notice I was originally working on previous version (3.0.0) where the return-code of x86_decode_insn is handled differently. Nonetheless, I think the current implementation might report emulation error in such a scenario (instead of triggering #PF/#GP in the guest). >> >> When I cleaned up insn_fetch(), I thought that fetching the instruction >> which is being executed by the guest cannot cause #PF. >> >> The possibility that a meaningless userspace might similtaneously unmap >> the page, noted by Avi IIRC, was ignored intentionally, so we just fail >> in such a case. >> >> Did you see any real problem? Well, I run some research project for which I emulate instructions quite often. I do see a real problem with Linux 3.0.0. Please note AFAIK #GP might occur as well during instruction fetch. I don't think failing is the right behavior in such case - there is no real reason to fail. Please tell me whether you are OK with KVM failing in such a scenario. If not - I'll send an updated patch (in which x86_decode_insn returns EMULATION_OK when rc == X86EMUL_PROPAGATE_FAULT). Regards, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM: Fix writeback on page boundary that propagates changes in spite of #PF
On Jan 12, 2012, at 12:27 PM, Gleb Natapov wrote: > On Thu, Jan 12, 2012 at 12:21:00PM +0200, Avi Kivity wrote: >> On 01/12/2012 12:12 PM, Gleb Natapov wrote: >>> On Wed, Jan 11, 2012 at 06:53:31PM +0200, Nadav Amit wrote: >>>> Consider the case in which an instruction emulation writeback is performed >>>> on a page boundary. >>>> In such case, if a #PF occurs on the second page, the write to the first >>>> page already occurred and cannot be retracted. >>>> Therefore, validation of the second page access must be performed prior to >>>> writeback. >>>> >>>> Signed-off-by: Nadav Amit >>>> --- >>>> arch/x86/kvm/x86.c | 13 + >>>> 1 files changed, 13 insertions(+), 0 deletions(-) >>>> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>> index 05fd3d7..7af3d67 100644 >>>> --- a/arch/x86/kvm/x86.c >>>> +++ b/arch/x86/kvm/x86.c >>>> @@ -3626,6 +3626,8 @@ struct read_write_emulator_ops { >>>> int bytes, void *val); >>>>int (*read_write_exit_mmio)(struct kvm_vcpu *vcpu, gpa_t gpa, >>>>void *val, int bytes); >>>> + gpa_t (*read_write_validate)(struct kvm_vcpu *vcpu, gva_t gva, >>>> + struct x86_exception *exception); >>>>bool write; >>>> }; >>>> >>>> @@ -3686,6 +3688,7 @@ static struct read_write_emulator_ops write_emultor >>>> = { >>>>.read_write_emulate = write_emulate, >>>>.read_write_mmio = write_mmio, >>>>.read_write_exit_mmio = write_exit_mmio, >>>> + .read_write_validate = kvm_mmu_gva_to_gpa_write, >>>>.write = true, >>>> }; >>>> >>>> @@ -3750,6 +3753,16 @@ int emulator_read_write(struct x86_emulate_ctxt >>>> *ctxt, unsigned long addr, >>>>int rc, now; >>>> >>>>now = -addr & ~PAGE_MASK; >>>> + >>>> + /* First check there is no page-fault on the next page */ >>>> + if (ops->read_write_validate && >>>> + ops->read_write_validate(vcpu, addr+now, exception) == >>>> + UNMAPPED_GVA) { >>>> + /* #PF on the first page should be reported first */ >>>> + ops->read_write_validate(vcpu, addr, exception); >>>> + return X86EMUL_PROPAGATE_FAULT; >>>> + } >>>> + >>> This 
undoes optimization that vcpu_mmio_gva_to_gpa() has for handling >>> mmio. >> >> Right. I suggest changing I/O to have two phases: first, translate the >> virtual address into an array of two physical addresses; check >> exceptions and report. Then do the actual writes. >> >>> Furthermore for common (non faulting) case we will check page >>> tables twice on each write that crosses page boundary, first time here >>> and second time in emulator_read_write_onepage(). >> >> Those should be very uncommon. >> > Still it is better to have all the checks in one place like you suggest > above. I agree. I'll submit a revised version soon. I'm just struggling with this emulation code as more stuff is broken. Regards, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: Fix MOVSX emulation
The destination register of MOVSX should be decoded similarly to MOVZX.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/emulate.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..7644a83 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3553,7 +3553,8 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 	case OpReg:
 		decode_register_operand(ctxt, op,
 			 op == &ctxt->dst &&
-			 ctxt->twobyte && (ctxt->b == 0xb6 || ctxt->b == 0xb7));
+			 ctxt->twobyte && ((ctxt->b & 0xfe) == 0xb6 ||
+					   (ctxt->b & 0xfe) == 0xbe));
 		break;
 	case OpImmUByte:
 		rc = decode_imm(ctxt, op, 1, false);
-- 
1.7.4.1
Re: [PATCH] KVM: Fix MOVSX emulation
Ok. I will include MOVSXD as well in the future patch. Regards, Nadav On Sun, Jan 15, 2012 at 11:56 AM, Avi Kivity wrote: > On 01/14/2012 08:34 PM, Nadav Amit wrote: >> The destination register of MOVSX should be decoded similarily to MOVZX. >> >> Signed-off-by: Nadav Amit >> --- >> arch/x86/kvm/emulate.c | 3 ++- >> 1 files changed, 2 insertions(+), 1 deletions(-) >> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c >> index 05a562b..7644a83 100644 >> --- a/arch/x86/kvm/emulate.c >> +++ b/arch/x86/kvm/emulate.c >> @@ -3553,7 +3553,8 @@ static int decode_operand(struct x86_emulate_ctxt >> *ctxt, struct operand *op, >> case OpReg: >> decode_register_operand(ctxt, op, >> op == &ctxt->dst && >> - ctxt->twobyte && (ctxt->b == 0xb6 || ctxt->b == >> 0xb7)); >> + ctxt->twobyte && ((ctxt->b & 0xfe) == 0xb6 || >> + (ctxt->b & 0xfe) == 0xbe)); >> break; >> case OpImmUByte: >> rc = decode_imm(ctxt, op, 1, false); > > Please post a unit test for this. See > git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git x86/emulate.c. > > -- > error compiling committee.c: too many arguments to function > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/2] KVM: Exception during emulation decode should propagate
An exception might occur during decode (e.g., #PF during fetch). Currently, the exception is ignored and emulation is performed. Instead, emulation should be skipped and the fault should be injected. Skipping instruction should report a failure in this case. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c |5 + arch/x86/kvm/x86.c |8 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 05a562b..9a07dcd 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3869,6 +3869,11 @@ done: if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative) ctxt->memopp->addr.mem.ea += ctxt->_eip; + if (rc == X86EMUL_PROPAGATE_FAULT) { + ctxt->have_exception = true; + return EMULATION_OK; + } + return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1171def..05fd3d7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4443,10 +4443,17 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, } if (emulation_type & EMULTYPE_SKIP) { + if (ctxt->have_exception) + return EMULATE_FAIL; kvm_rip_write(vcpu, ctxt->_eip); return EMULATE_DONE; } + if (ctxt->have_exception) { + writeback = false; + goto post; + } + if (retry_instruction(ctxt, cr2, emulation_type)) return EMULATE_DONE; @@ -4470,6 +4477,7 @@ restart: return handle_emulation_failure(vcpu); } +post: if (ctxt->have_exception) { inject_emulated_exception(vcpu); r = EMULATE_DONE; -- 1.7.4.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/2] KVM: Fix writeback on page boundary that propagates changes in spite of #PF
Consider the case in which an instruction emulation writeback is performed on a page boundary. In such case, if a #PF occurs on the second page, the write to the first page already occurred and cannot be retracted. Therefore, validation of the second page access must be performed prior to writeback. Signed-off-by: Nadav Amit --- arch/x86/kvm/x86.c | 60 +-- 1 files changed, 34 insertions(+), 26 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 05fd3d7..0e86f3a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3689,33 +3689,27 @@ static struct read_write_emulator_ops write_emultor = { .write = true, }; -static int emulator_read_write_onepage(unsigned long addr, void *val, +static int emulator_read_write_onepage(gpa_t gpa, void *val, unsigned int bytes, - struct x86_exception *exception, struct kvm_vcpu *vcpu, - struct read_write_emulator_ops *ops) + struct read_write_emulator_ops *ops, + bool mmio) { - gpa_t gpa; - int handled, ret; + int handled; bool write = ops->write; if (ops->read_write_prepare && ops->read_write_prepare(vcpu, val, bytes)) return X86EMUL_CONTINUE; - ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, write); - - if (ret < 0) - return X86EMUL_PROPAGATE_FAULT; - /* For APIC access vmexit */ - if (ret) - goto mmio; + if (mmio) + goto do_mmio; if (ops->read_write_emulate(vcpu, gpa, val, bytes)) return X86EMUL_CONTINUE; -mmio: +do_mmio: /* * Is this MMIO handled locally? */ @@ -3744,24 +3738,38 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr, struct read_write_emulator_ops *ops) { struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); + int i, rc; + bool write = ops->write; + gpa_t gpa[2]; + int npages = (((addr + bytes - 1) ^ addr) & PAGE_MASK) ? 
2 : 1; + unsigned int offset[2] = {0}; + unsigned int p_bytes[2] = {bytes, 0}; + bool mmio[2]; + + if (npages == 2) { + p_bytes[0] = offset[1] = -addr & ~PAGE_MASK; + p_bytes[1] = bytes - p_bytes[0]; + } + + /* First check there is no page-fault on the next page */ + for (i = 0; i < npages; i++) { + rc = vcpu_mmio_gva_to_gpa(vcpu, addr + offset[i], &gpa[i], + exception, write); + if (rc < 0) + return X86EMUL_PROPAGATE_FAULT; + mmio[i] = (rc > 0); + } - /* Crossing a page boundary? */ - if (((addr + bytes - 1) ^ addr) & PAGE_MASK) { - int rc, now; - - now = -addr & ~PAGE_MASK; - rc = emulator_read_write_onepage(addr, val, now, exception, -vcpu, ops); - + /* Actual read/write */ + for (i = 0; i < npages; i++) { + rc = emulator_read_write_onepage(gpa[i], val + offset[i], +p_bytes[i], vcpu, ops, +mmio[i]); if (rc != X86EMUL_CONTINUE) return rc; - addr += now; - val += now; - bytes -= now; } - return emulator_read_write_onepage(addr, val, bytes, exception, - vcpu, ops); + return X86EMUL_CONTINUE; } static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt, -- 1.7.4.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: Emulator: Opcode 0x80 gets a byte immediate
The single byte opcode 0x80 should get a byte immediate. The previous
emulation is broken for opcode 0x80.
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8375622..dd53e53 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3464,7 +3464,7 @@ static struct opcode opcode_table[256] = {
 	/* 0x70 - 0x7F */
 	X16(D(SrcImmByte)),
 	/* 0x80 - 0x87 */
-	G(ByteOp | DstMem | SrcImm | ModRM | Group, group1),
+	G(ByteOp | DstMem | SrcImmByte | ModRM | Group, group1),
 	G(DstMem | SrcImm | ModRM | Group, group1),
 	G(ByteOp | DstMem | SrcImm | ModRM | No64 | Group, group1),
 	G(DstMem | SrcImmByte | ModRM | Group, group1),
-- 
1.7.4.1
[PATCH 7/9] KVM: x86: rdpmc emulation checks the counter incorrectly
The rdpmc emulation checks that the counter (ECX) is not higher than 2,
without taking into consideration the role of bits 30:31 (e.g., bit 30
marks whether the counter is fixed). The fix uses the pmu information
for checking the validity of the pmu counter.

Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/kvm_emulate.h | 1 +
 arch/x86/include/asm/kvm_host.h    | 1 +
 arch/x86/kvm/emulate.c             | 2 +-
 arch/x86/kvm/pmu.c                 | 9 +++++++++
 arch/x86/kvm/x86.c                 | 7 +++++++
 5 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index a04fe4e..ffa2671 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -194,6 +194,7 @@ struct x86_emulate_ops {
 	int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
 	int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
 	int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
+	int (*check_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc);
 	int (*read_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc, u64 *pdata);
 	void (*halt)(struct x86_emulate_ctxt *ctxt);
 	void (*wbinvd)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..63e020b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1070,6 +1070,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu);
 bool kvm_pmu_msr(struct kvm_vcpu *vcpu, u32 msr);
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+int kvm_pmu_check_pmc(struct kvm_vcpu *vcpu, unsigned pmc);
 int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index be3f764..3da8d82 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3507,7 +3507,7 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
 	u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
 
 	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
-	    (rcx > 3))
+	    ctxt->ops->check_pmc(ctxt, rcx))
 		return emulate_gp(ctxt, 0);
 
 	return X86EMUL_CONTINUE;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index cbecaa9..3dd6acc 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -428,6 +428,15 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return 1;
 }
 
+int kvm_pmu_check_pmc(struct kvm_vcpu *vcpu, unsigned pmc)
+{
+	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	bool fixed = pmc & (1u << 30);
+	pmc &= ~(3u << 30);
+	return (!fixed && pmc >= pmu->nr_arch_gp_counters) ||
+		(fixed && pmc >= pmu->nr_arch_fixed_counters);
+}
+
 int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data)
 {
 	struct kvm_pmu *pmu = &vcpu->arch.pmu;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 57eac30..2c6fccd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4759,6 +4759,12 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
 	return kvm_set_msr(emul_to_vcpu(ctxt), &msr);
 }
 
+static int emulator_check_pmc(struct x86_emulate_ctxt *ctxt,
+			      u32 pmc)
+{
+	return kvm_pmu_check_pmc(emul_to_vcpu(ctxt), pmc);
+}
+
 static int emulator_read_pmc(struct x86_emulate_ctxt *ctxt,
 			     u32 pmc, u64 *pdata)
 {
@@ -4835,6 +4841,7 @@ static const struct x86_emulate_ops emulate_ops = {
 	.set_dr              = emulator_set_dr,
 	.set_msr             = emulator_set_msr,
 	.get_msr             = emulator_get_msr,
+	.check_pmc           = emulator_check_pmc,
 	.read_pmc            = emulator_read_pmc,
 	.halt                = emulator_halt,
 	.wbinvd              = emulator_wbinvd,
-- 
1.9.1
[PATCH 6/9] KVM: x86: movnti minimum op size of 32-bit is not kept
If the operand-size prefix (0x66) is used in 64-bit mode, the emulator would assume the destination operand is 64-bit, when it should be 32-bit. Reminder: movnti does not support 16-bit operands and its default operand size is 32-bit. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 4cb0da6..be3f764 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4836,8 +4836,8 @@ twobyte_insn: break; case 0xc3: /* movnti */ ctxt->dst.bytes = ctxt->op_bytes; - ctxt->dst.val = (ctxt->op_bytes == 4) ? (u32) ctxt->src.val : - (u64) ctxt->src.val; + ctxt->dst.val = (ctxt->op_bytes == 8) ? (u64) ctxt->src.val : + (u32) ctxt->src.val; break; default: goto cannot_emulate; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
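The corrected movnti write-back logic reduces to a one-line truncation rule. A minimal sketch (function name `movnti_dst` is mine, not KVM's): only an 8-byte operand size keeps all 64 bits; anything else — including the 0x66-prefixed case, which movnti still treats as 32-bit — truncates to 32 bits.

```c
#include <stdint.h>

/* Sketch of the fixed movnti destination value: keep 64 bits only when
 * op_bytes == 8 (REX.W); otherwise truncate to 32 bits, matching the
 * instruction's 32-bit default and its lack of a 16-bit form. */
static uint64_t movnti_dst(uint64_t src, int op_bytes)
{
    return (op_bytes == 8) ? src : (uint32_t)src;
}
```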
[PATCH 1/9] KVM: x86: Mark VEX-prefix instructions emulation as unimplemented
Currently the emulator does not recognize VEX-prefix instructions. However, it may incorrectly decode them as LES/LDS instructions (which share the 0xC4/0xC5 opcodes in legacy modes) and try to execute them. This patch returns an unhandleable error on their emulation. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e4e833d..8ec4a3e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4314,6 +4314,13 @@ done_prefixes: if (ctxt->d & ModRM) ctxt->modrm = insn_fetch(u8, ctxt); + /* vex-prefix instructions are not implemented */ + if (ctxt->opcode_len == 1 && (ctxt->b == 0xc5 || ctxt->b == 0xc4) && + (mode == X86EMUL_MODE_PROT64 || + (mode >= X86EMUL_MODE_PROT16 && (ctxt->modrm & 0x80)))) { + ctxt->d = NotImpl; + } + while (ctxt->d & GroupMask) { switch (ctxt->d & GroupMask) { case Group: -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
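The decode-time test in the patch can be sketched as a standalone predicate. This is an illustrative simplification, not the kernel code (`is_vex_prefix` is my name): in 64-bit mode 0xC4/0xC5 are always VEX escapes, while in legacy modes the patch keys on bit 7 of the byte that follows (the would-be ModRM) to separate VEX from LES/LDS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the VEX detection condition used by the patch. In 64-bit
 * mode, opcodes 0xC4/0xC5 are unconditionally VEX escapes; in 16/32-bit
 * protected modes the patch treats them as VEX when bit 7 of the
 * following byte is set. */
static bool is_vex_prefix(uint8_t opcode, uint8_t next_byte, bool long_mode)
{
    if (opcode != 0xc4 && opcode != 0xc5)
        return false;
    return long_mode || (next_byte & 0x80);
}
```

The architectural rule in legacy modes is actually that both top bits of the next byte must be set (ModRM.mod == 11 never names a valid LES/LDS memory operand); the patch's single-bit test is a coarser approximation, which is acceptable here because the outcome is only "refuse to emulate".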
[PATCH 5/9] KVM: x86: cmpxchg emulation should compare in reverse order
The current implementation of cmpxchg does not update the flags correctly: the accumulator should be compared with the destination, not the other way around. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a16bf22..4cb0da6 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2052,8 +2052,10 @@ static int em_ret_far_imm(struct x86_emulate_ctxt *ctxt) static int em_cmpxchg(struct x86_emulate_ctxt *ctxt) { /* Save real source value, then compare EAX against destination. */ + ctxt->dst.orig_val = ctxt->dst.val; + ctxt->dst.val = reg_read(ctxt, VCPU_REGS_RAX); ctxt->src.orig_val = ctxt->src.val; - ctxt->src.val = reg_read(ctxt, VCPU_REGS_RAX); + ctxt->src.val = ctxt->dst.orig_val; fastop(ctxt, em_cmp); if (ctxt->eflags & EFLG_ZF) { @@ -2063,6 +2065,7 @@ static int em_cmpxchg(struct x86_emulate_ctxt *ctxt) /* Failure: write the value we saw to EAX. */ ctxt->dst.type = OP_REG; ctxt->dst.addr.reg = reg_rmw(ctxt, VCPU_REGS_RAX); + ctxt->dst.val = ctxt->dst.orig_val; } return X86EMUL_CONTINUE; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
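The comparison direction matters because CMPXCHG sets the flags as if executing `CMP accumulator, dest` — ZF is symmetric, but CF/SF/OF are not. A minimal sketch of the 32-bit semantics (function name `cmpxchg32` and the explicit flag out-parameters are mine):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of 32-bit cmpxchg semantics: flags come from EAX - dest, not
 * dest - EAX. Returns the new destination value; on failure EAX is
 * loaded with the old destination. */
static uint32_t cmpxchg32(uint32_t dest, uint32_t src, uint32_t *eax,
                          bool *zf, bool *cf)
{
    *zf = (*eax == dest);        /* ZF from CMP EAX, dest */
    *cf = (*eax < dest);         /* CF: borrow of EAX - dest */
    if (*zf)
        return src;              /* success: dest <- src   */
    *eax = dest;                 /* failure: EAX <- dest   */
    return dest;
}
```

With EAX = 3 and dest = 7, CF must be set (3 − 7 borrows); comparing in the wrong order would clear it.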
[PATCH 9/9] KVM: x86: smsw emulation is incorrect in 64-bit mode
In 64-bit mode, when the destination is a register, the assignment is done according to the operand size. Otherwise (memory operand, or not in 64-bit mode), a 16-bit assignment is performed. Currently, a 16-bit assignment is always done to the destination. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a151f8d..9b5d97d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3235,7 +3235,8 @@ static int em_lidt(struct x86_emulate_ctxt *ctxt) static int em_smsw(struct x86_emulate_ctxt *ctxt) { - ctxt->dst.bytes = 2; + if (ctxt->dst.type == OP_MEM) + ctxt->dst.bytes = 2; ctxt->dst.val = ctxt->ops->get_cr(ctxt, 0); return X86EMUL_CONTINUE; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
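The corrected destination sizing can be sketched as a pure function (a simplification; `smsw_result` is my name, not KVM's): a memory destination always receives 16 bits of CR0, while a register destination is written with the full operand size, so `smswl`/`smswq` can expose CR0 bits above bit 15.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the fixed smsw write-back: memory destinations are 16-bit;
 * register destinations follow the operand size. */
static uint64_t smsw_result(uint64_t cr0, bool dst_is_mem, int op_bytes)
{
    int bytes = dst_is_mem ? 2 : op_bytes;

    if (bytes >= 8)
        return cr0;
    return cr0 & ((1ULL << (bytes * 8)) - 1);
}
```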
[PATCH 8/9] KVM: x86: Return error on cmpxchg16b emulation
cmpxchg16b is currently unimplemented in the emulator. The least we can do is return an error upon emulation of this instruction. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 3da8d82..a151f8d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1999,6 +1999,9 @@ static int em_cmpxchg8b(struct x86_emulate_ctxt *ctxt) { u64 old = ctxt->dst.orig_val64; + if (ctxt->dst.bytes == 16) + return X86EMUL_UNHANDLEABLE; + if (((u32) (old >> 0) != (u32) reg_read(ctxt, VCPU_REGS_RAX)) || ((u32) (old >> 32) != (u32) reg_read(ctxt, VCPU_REGS_RDX))) { *reg_write(ctxt, VCPU_REGS_RAX) = (u32) (old >> 0); @@ -4077,7 +4080,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op, op->orig_val = op->val; break; case OpMem64: - ctxt->memop.bytes = 8; + ctxt->memop.bytes = (ctxt->op_bytes == 8) ? 16 : 8; goto mem_common; case OpAcc: op->type = OP_REG; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
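The 8-byte path that the patch keeps (and that must not be reused for the 16-byte REX.W form) behaves as follows. A minimal sketch, not the kernel code — `cmpxchg8b` here is my standalone model of the instruction's architectural semantics:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of cmpxchg8b on a 64-bit memory operand: EDX:EAX is compared
 * with the old value; on match, ECX:EBX is stored and ZF is set; on
 * mismatch, the old value is loaded into EDX:EAX and ZF is cleared.
 * The REX.W form (cmpxchg16b) needs a 16-byte operand and RDX:RAX /
 * RCX:RBX, which is why mis-running it through this path is wrong. */
static bool cmpxchg8b(uint64_t *mem, uint32_t *eax, uint32_t *edx,
                      uint32_t ebx, uint32_t ecx)
{
    uint64_t old = *mem;

    if ((uint32_t)old == *eax && (uint32_t)(old >> 32) == *edx) {
        *mem = ((uint64_t)ecx << 32) | ebx;
        return true;                     /* ZF set */
    }
    *eax = (uint32_t)old;
    *edx = (uint32_t)(old >> 32);
    return false;                        /* ZF clear */
}
```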
[PATCH 4/9] KVM: x86: sgdt and sidt are not privileged
The SGDT and SIDT instructions are not privileged, i.e. they can be executed with CPL > 0. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 7e4a45c..a16bf22 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3642,8 +3642,8 @@ static const struct opcode group6[] = { }; static const struct group_dual group7 = { { - II(Mov | DstMem | Priv, em_sgdt, sgdt), - II(Mov | DstMem | Priv, em_sidt, sidt), + II(Mov | DstMem, em_sgdt, sgdt), + II(Mov | DstMem, em_sidt, sidt), II(SrcMem | Priv, em_lgdt, lgdt), II(SrcMem | Priv, em_lidt, lidt), II(SrcNone | DstMem | Mov, em_smsw, smsw), N, -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/9] KVM: x86: Fixes for various emulator bugs
The x86 emulator of KVM is buggy. This series of patches includes fixes for various bugs which were detected. Each patch stands on its own. Two patches do not fix KVM emulation, but cause the emulator to fail more nicely by returning an unhandleable error, instead of performing wrong emulation (VEX-prefix and cmpxchg16b). The fix for rdpmc is a bit intrusive to keep SVM behavior intact. Thanks for reviewing the patches. Nadav Amit (9): KVM: x86: Mark VEX-prefix instructions emulation as unimplemented KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR KVM: x86: Loading segments on 64-bit mode may be wrong KVM: x86: sgdt and sidt are not privileged KVM: x86: cmpxchg emulation should compare in reverse order KVM: x86: movnti minimum op size of 32-bit is not kept KVM: x86: rdpmc emulation checks the counter incorrectly KVM: x86: Return error on cmpxchg16b emulation KVM: x86: smsw emulation is incorrect in 64-bit mode arch/x86/include/asm/kvm_emulate.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/emulate.c | 44 +++++++++++++++++++++++++++++++++----------- arch/x86/kvm/pmu.c | 9 +++++++++ arch/x86/kvm/x86.c | 7 +++++++ 5 files changed, 51 insertions(+), 11 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/9] KVM: x86: Loading segments on 64-bit mode may be wrong
The current emulator implementation ignores the high 32 bits of the base in long mode. During segment load from the LDT, the base of the LDT is calculated incorrectly and may cause the wrong segment to be loaded. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 136088f..7e4a45c 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1358,17 +1358,19 @@ static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt, u16 selector, struct desc_ptr *dt) { const struct x86_emulate_ops *ops = ctxt->ops; + u32 base3 = 0; if (selector & 1 << 2) { struct desc_struct desc; u16 sel; memset (dt, 0, sizeof *dt); - if (!ops->get_segment(ctxt, &sel, &desc, NULL, VCPU_SREG_LDTR)) + if (!ops->get_segment(ctxt, &sel, &desc, &base3, + VCPU_SREG_LDTR)) return; dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */ - dt->address = get_desc_base(&desc); + dt->address = get_desc_base(&desc) | ((u64)base3 << 32); } else ops->get_gdt(ctxt, dt); } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
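The base arithmetic in the fix is simple but easy to get wrong without the cast. A minimal sketch (`ldt_base` is my name): in long mode a system-segment descriptor (LDT/TSS) is 16 bytes, and bits 63:32 of the base live in the second 8-byte half ("base3").

```c
#include <stdint.h>

/* Sketch of assembling a 64-bit system-segment base: the legacy 32-bit
 * base from the first descriptor half is OR'd with base3 shifted into
 * bits 63:32. The (uint64_t) cast is required; shifting a u32 by 32 is
 * undefined behavior in C. */
static uint64_t ldt_base(uint32_t base_lo, uint32_t base3)
{
    return (uint64_t)base_lo | ((uint64_t)base3 << 32);
}
```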
[PATCH 2/9] KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR
The current implementation ignores the LDTR/TR base high 32-bits on long-mode. As a result the loaded segment descriptor may be incorrect. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8ec4a3e..136088f 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1422,6 +1422,7 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt, ulong desc_addr; int ret; u16 dummy; + u32 base3 = 0; memset(&seg_desc, 0, sizeof seg_desc); @@ -1538,9 +1539,14 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt, ret = write_segment_descriptor(ctxt, selector, &seg_desc); if (ret != X86EMUL_CONTINUE) return ret; + } else if (ctxt->mode == X86EMUL_MODE_PROT64) { + ret = ctxt->ops->read_std(ctxt, desc_addr+8, &base3, + sizeof(base3), &ctxt->exception); + if (ret != X86EMUL_CONTINUE) + return ret; } load: - ctxt->ops->set_segment(ctxt, selector, &seg_desc, 0, seg); + ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg); return X86EMUL_CONTINUE; exception: emulate_exception(ctxt, err_vec, err_code, true); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop
On Jun 4, 2014, at 7:34 PM, Paolo Bonzini wrote: > Il 04/06/2014 16:44, Alexander Graf ha scritto: >> >> >>> Obviously, if you really like the current behavior better you can >>> always reject whatever patch I'll come up with, but I'd like to at >>> least try and see what it would look like :) >> >> I think it's perfectly fine to leave mwait always implemented as NOP - >> it's valid behavior. >> >> As for the CPUID exposure, that should be a pure QEMU thing. If >> overriding CPUID bits the kernel mask tells us doesn't work today, we >> should just make it possible :). > > That should be the purpose of KVM_GET_EMULATED_CPUID, so MWAIT could be added > in __do_cpuid_ent_emulated. However, the corresponding QEMU patches were > never included. Borislav, can you refresh them? Regardless of the whole discussion of what the guest is informed about, I think it might be better to implement mwait and monitor correctly according to the spec and let the instructions be fully emulated. Both mwait and monitor may encounter exceptions (#GP, #PF, regardless of #UD), so this behaviour should be correct. If you want, I'll send my version, which looks something like: static int em_monitor(struct x86_emulate_ctxt *ctxt) { int rc; struct segmented_address addr; u64 rcx = reg_read(ctxt, VCPU_REGS_RCX); u64 rax = reg_read(ctxt, VCPU_REGS_RAX); u8 byte; rc = check_mwait_supported(ctxt); if (rc != X86EMUL_CONTINUE) return rc; if (ctxt->mode != X86EMUL_MODE_PROT64) rcx = (u32)rcx; if (rcx != 0) return emulate_gp(ctxt, 0); addr.seg = seg_override(ctxt); addr.ea = ctxt->ad_bytes == 8 ? 
rax : (u32)rax; rc = segmented_read(ctxt, addr, &byte, 1); if (rc != X86EMUL_CONTINUE) return rc; return X86EMUL_CONTINUE; } static int em_mwait(struct x86_emulate_ctxt *ctxt) { u64 rcx = reg_read(ctxt, VCPU_REGS_RCX); int rc = check_mwait_supported(ctxt); if (rc != X86EMUL_CONTINUE) return rc; if (ctxt->mode != X86EMUL_MODE_PROT64) rcx = (u32)rcx; if ((rcx & ~(u64)1) != 0) return emulate_gp(ctxt, 0); if (rcx & 1) { /* Interrupt as break event */ u32 ebx, ecx, edx, eax; eax = 5; ecx = 0; ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx); if (!(ecx & 1)) return emulate_gp(ctxt, 0); } return X86EMUL_CONTINUE; } signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop
On Jun 4, 2014, at 10:43 PM, Gabriel L. Somlo wrote: My implementation still emulates the instruction as a NOP, but first checks for an exception. > On Wed, Jun 04, 2014 at 10:12:39PM +0300, Nadav Amit wrote: > > I'd be curious how you're dealing with the "hidden" CPU state which > tells MWAIT to sleep until someone or something writes to the > monitored memory area set up by a corresponding MONITOR instruction. >> Regardless to the whole discussion of what the guest is informed about, I >> think it might be better to implement mwait and monitor correctly according >> to the spec and let the instructions to be fully emulated. >> Both mwait and monitor may encounter exceptions (#GP, #PF, regardless of >> #UD), so this behaviour should be correct. >> If you want me, I?ll send my version which looks something like: >> >> static int em_monitor(struct x86_emulate_ctxt *ctxt) >> { >> int rc; >> struct segmented_address addr; >> u64 rcx = reg_read(ctxt, VCPU_REGS_RCX); >> u64 rax = reg_read(ctxt, VCPU_REGS_RAX); >> u8 byte; >> >> rc = check_mwait_supported(ctxt); >> if (rc != X86EMUL_CONTINUE) >> return rc; >> >> if (ctxt->mode != X86EMUL_MODE_PROT64) >> rcx = (u32)rcx; >> >> if (rcx != 0) >> return emulate_gp(ctxt, 0); >> >> addr.seg = seg_override(ctxt); >> addr.ea = ctxt->ad_bytes == 8 ? 
rax : (u32)rax; >> >> rc = segmented_read(ctxt, addr, &byte, 1); >> if (rc != X86EMUL_CONTINUE) >> return rc; >> >> return X86EMUL_CONTINUE; >> } >> >> static int em_mwait(struct x86_emulate_ctxt *ctxt) >> { >> u64 rcx = reg_read(ctxt, VCPU_REGS_RCX); >> int rc = check_mwait_supported(ctxt); >> if (rc != X86EMUL_CONTINUE) >> return rc; >> if (ctxt->mode != X86EMUL_MODE_PROT64) >> rcx = (u32)rcx; >> >> if ((rcx & ~(u64)1) != 0) >> return emulate_gp(ctxt, 0); >> >> if (rcx & 1) { >> /* Interrupt as break event */ >> u32 ebx, ecx, edx, eax; >> eax = 5; >> ecx = 0; >> ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx); >> if (!(ecx & 1)) >> return emulate_gp(ctxt, 0); >> } >> return X86EMUL_CONTINUE; >> } Anyhow, if you want a real mwait emulation, you can write-protect the page of the monitored memory area in the EPT of the other VCPUs and set a callback once a write to the area takes place. You may want the host to cause a spurious wakeup after you do the write-protection, so you will not miss a write of another VCPU to the monitored area. After the spurious wake-up, the VM is likely to issue an additional mwait, using the same monitored cache-line. Additional care for DMAs (emulated and paravirtual) might be needed with the assistance of QEMU. The complicated case is dealing with the DMAs of assigned devices due to the lack of support for I/O page faults. Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop
On Jun 4, 2014, at 11:11 PM, Gabriel L. Somlo wrote: > On Wed, Jun 04, 2014 at 11:01:50PM +0300, Nadav Amit wrote: >> On Jun 4, 2014, at 10:43 PM, Gabriel L. Somlo wrote: >> >> My implementation still emulates the instruction as a NOP, but first checks >> for an exception. > > [...] > >> Anyhow, if you want a real mwait emulation, you can write-protect the page >> of the monitored memory area in the EPT of the other VCPUs and set a >> callback once a write to the area takes place. You may want the host to >> cause a spurious wakeup after you do the write-protection, so you will not >> miss a write of another VCPU to the monitored area. After the spurious >> wake-up, the VM is likely to issue an additional mwait, using the same >> monitored cache-line. >> >> Additional care for DMAs (emulated and paravirtual) might be needed with the >> assistance of QEMU. The complicated case is dealing with the DMAs of >> assigned devices due to the lack of support for I/O page-faules. > > I took a stab at something like that a while ago: > > http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/kvm-mwait-emu-20140205.patch > > with limited success, probably due to my lack of familiarity with > the fine details of the KVM code base... :) > > My main interest was to get it working well enough to be useful for > idle loops (which is the only thing I know of that either Linux or > OS X use monitor and mwait for, currently -- so DMA wasn't a huge > priority). > > Even if we got it working well enough in the general case (any number > of vcpus, etc) I think it would still suck for idle loops when > compared to simply falling back to HLT (mainly due to all the TLB > shootdowns required to make it work) :) MWAIT is likely not to perform as good as HLT, but you may play with other techniques, like clearing the dirty flag of the page on other VCPUs and periodically check it. 
You will still need TLB shootdowns in that case, and wakeup might be slower, but at least you will have fewer emulations and exits. Anyhow, I would try to avoid using the emulator for your purpose (especially since your guest crashes), and try to use the MTF feature instead - at least to get the system to work. Regards, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] KVM: x86: smsw emulation is incorrect in 64-bit mode
On Jun 5, 2014, at 5:53 PM, Paolo Bonzini wrote: > Il 02/06/2014 17:34, Nadav Amit ha scritto: >> In 64-bit mode, when the destination is a register, the assignment is done >> according to the operand size. Otherwise (memory operand or no 64-bit mode), >> a >> 16-bit assignment is performed. > > I'm sorry, I'm missing the place where 64-bit mode is taken into account? It is not, since in 32-bit mode the high-order 16 bits of a register destination are undefined. If I recall correctly, in this case the high-order 16 bits on a native system actually reflect the high-order 16 bits of CR0. Therefore, regardless of the mode of operation: 1. If the target is memory - the destination is 16-bit 2. If the target is a register - the register is written according to the operand size. Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 2/2] x86: realmode: test smsw behavior with register operand
The smsw instruction has an undocumented behavior, in which the high-order 16 bits of CR0 are also saved in a 32-bit destination register. This is similar to the way smsw behaves in long mode. However, it is hard to test the long-mode case, since we need to cause an "invalid guest state" in long mode. The test works as follows: it sets CR0.CD (bit 30), so that one of the high 16 bits of CR0 is set. It then executes smsw with a register destination and compares the register value with that of CR0. The CR0 value is restored when the test is done. This test is expected to fail only when unrestricted mode is disabled or unsupported. Signed-off-by: Nadav Amit --- x86/realmode.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 839ac34..6e74883 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1646,6 +1646,20 @@ void test_dr_mod(void) report("mov dr with mod bits", R_AX | R_BX, outregs.eax == 0xaced); } +void test_smsw(void) +{ + MK_INSN(smsw, "movl %cr0, %ebx\n\t" + "movl %ebx, %ecx\n\t" + "or $0x40000000, %ebx\n\t" + "movl %ebx, %cr0\n\t" + "smswl %eax\n\t" + "movl %ecx, %cr0\n\t"); + inregs.eax = 0x12345678; + exec_in_big_real_mode(&insn_smsw); + report("smsw", R_AX | R_BX | R_CX, outregs.eax == outregs.ebx); +} + + void realmode_start(void) { test_null(); @@ -1692,6 +1706,7 @@ void realmode_start(void) test_salc(); test_fninit(); test_dr_mod(); + test_smsw(); test_nopl(); test_perf_loop(); test_perf_mov(); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 1/2] x86: emulator: additional smsw test-case
An additional test case for the emulator was added to test smsw when it is trapped by the emulator. The other existing test-cases occur in the guest (at least on VMX), since the values are read directly from the CR0 read shadow. Signed-off-by: Nadav Amit --- x86/emulator.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/x86/emulator.c b/x86/emulator.c index 20e3a45..033f246 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -336,7 +336,7 @@ void test_incdecnotneg(void *mem) report("lock notb", *mb == vb); } -void test_smsw(void) +void test_smsw(uint64_t *h_mem) { char mem[16]; unsigned short msw, msw_orig, *pmsw; @@ -355,6 +355,12 @@ void test_smsw(void) if (i != 4 && pmsw[i]) zero = 0; report("smsw (2)", msw == pmsw[4] && zero); + + /* Trigger exit on smsw */ + *h_mem = 0x12345678abcdeful; + asm volatile("smsw %0" : "=m"(*h_mem)); + report("smsw (3)", msw == (unsigned short)*h_mem && + (*h_mem & ~0xfffful) == 0x12345678ab0000ul); } void test_lmsw(void) @@ -998,7 +1004,7 @@ int main() test_cr8(); - test_smsw(); + test_smsw(mem); test_lmsw(); test_ljmp(mem); test_stringio(); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 0/2] x86: Additional smsw tests
This patch set adds two tests for smsw. The first one is intended to add coverage of smsw. It covers the case smsw is executed with memory operand in a page which is write-protected by the hypervisor. Note that the existing smsw tests are not supposed to be trapped by the hypervisor. This test was added just for additional coverage. The realmode smsw test covers the recent patch that saves the high 16-bits to 32-bit register operand. Implementing a long-mode test is difficult since we need to cause an "invalid guest state" in long-mode. Nadav Amit (2): x86: emulator: additional smsw test-case x86: realmode: test smsw behavior with register operand x86/emulator.c | 10 ++++++++-- x86/realmode.c | 15 +++++++++++++++ 2 files changed, 23 insertions(+), 2 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] x86: realmode: report failures
The current realmode tests always report success when done, regardless to whether any of the tests failed. Although the log includes the individual test results, this behavior complicates the life of the tester. Signed-off-by: Nadav Amit --- x86/realmode.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/x86/realmode.c b/x86/realmode.c index 6e74883..dc4a1d3 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -103,6 +103,8 @@ static void print_serial_u32(u32 value) print_serial(p); } +static int failed; + static void exit(int code) { outb(code, 0xf4); @@ -222,6 +224,8 @@ static void report(const char *name, u16 regs_ignore, _Bool ok) print_serial(ok ? "PASS: " : "FAIL: "); print_serial(name); print_serial("\n"); +if (!ok) + failed = 1; } #define MK_INSN(name, str) \ @@ -1715,7 +1719,7 @@ void realmode_start(void) test_perf_memory_store(); test_perf_memory_rmw(); - exit(0); + exit(failed); } unsigned long long r_gdt[] = { 0, 0x9b00, 0x9300 }; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2] x86: emulator: long mode smsw tests
In long mode, the SMSW instruction is performed according to the operand size if the destination operand is a register. This patch tests whether it is performed correctly, instead of always using two-byte operands (as KVM previously did). Note that when a dword destination operand is used, the result is zero-extended to qword in long mode. Signed-off-by: Nadav Amit --- x86/emulator.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 033f246..f653127 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -825,6 +825,26 @@ static void test_movabs(uint64_t *mem, uint8_t *insn_page, report("64-bit mov imm2", outregs.rcx == 0x9090909090909090); } +static void test_smsw_reg(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ + unsigned long cr0 = read_cr0(); + inregs = (struct regs){ .rax = 0x1234567890abcdeful }; + + MK_INSN(smsww, "smsww %ax\n\t"); + trap_emulator(mem, alt_insn_page, &insn_smsww); + report("16-bit smsw reg", (u16)outregs.rax == (u16)cr0 && + outregs.rax >> 16 == inregs.rax >> 16); + + MK_INSN(smswl, "smswl %eax\n\t"); + trap_emulator(mem, alt_insn_page, &insn_smswl); + report("32-bit smsw reg", outregs.rax == (u32)cr0); + + MK_INSN(smswq, "smswq %rax\n\t"); + trap_emulator(mem, alt_insn_page, &insn_smswq); + report("64-bit smsw reg", outregs.rax == cr0); +} + static void test_crosspage_mmio(volatile uint8_t *mem) { volatile uint16_t w, *pw; @@ -1024,6 +1044,7 @@ int main() test_mmx_movq_mf(mem, insn_page, alt_insn_page, insn_ram); test_movabs(mem, insn_page, alt_insn_page, insn_ram); + test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram); test_crosspage_mmio(mem); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] KVM: x86: bit-ops emulation ignores offset on 64-bit
The current emulation of bit operations ignores the offset from the destination on 64-bit target memory operands. This patch fixes this behavior. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e4e833d..f0b0a10 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1220,12 +1220,14 @@ static void fetch_bit_operand(struct x86_emulate_ctxt *ctxt) long sv = 0, mask; if (ctxt->dst.type == OP_MEM && ctxt->src.type == OP_REG) { - mask = ~(ctxt->dst.bytes * 8 - 1); + mask = ~((long)ctxt->dst.bytes * 8 - 1); if (ctxt->src.bytes == 2) sv = (s16)ctxt->src.val & (s16)mask; else if (ctxt->src.bytes == 4) sv = (s32)ctxt->src.val & (s32)mask; + else + sv = (s64)ctxt->src.val & (s64)mask; ctxt->dst.addr.mem.ea += (sv >> 3); } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
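The displacement math behind the fix can be shown in isolation. A minimal sketch (function name `bt_byte_disp` is mine): the register bit offset of BT/BTS/BTR/BTC is sign-masked down to a multiple of the operand width, then converted to a byte displacement that is added to the memory operand's effective address. The fix is to build the mask in a 64-bit type and to handle the 8-byte source explicitly.

```c
#include <stdint.h>

/* Sketch of the bit-test displacement computation for BT-family
 * instructions with a register bit-offset operand. */
static int64_t bt_byte_disp(int64_t src_val, int op_bytes)
{
    int64_t mask = ~((int64_t)op_bytes * 8 - 1), sv;

    if (op_bytes == 2)
        sv = (int16_t)src_val & (int16_t)mask;
    else if (op_bytes == 4)
        sv = (int32_t)src_val & (int32_t)mask;
    else
        sv = src_val & mask;          /* the missing 64-bit case */
    return sv >> 3;                   /* bit offset -> byte offset */
}
```

For example, `bt $0, mem` with a register offset of 100 and a 32-bit operand selects dword index 3, i.e. a byte displacement of 12.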
[PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode
From: Nadav Amit When the guest sets DR6 and DR7, KVM asserts the high 32 bits are clear, and otherwise injects a #GP exception. This exception should be injected only if running in long mode. Signed-off-by: Nadav Amit --- arch/x86/kvm/x86.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 57eac30..71fe841 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -756,6 +756,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu) vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED; } +static bool is_64_bit_mode(struct kvm_vcpu *vcpu) +{ + int cs_db, cs_l; + if (!is_long_mode(vcpu)) + return false; + kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); + return cs_l; +} + static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) { switch (dr) { @@ -769,7 +778,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) return 1; /* #UD */ /* fall through */ case 6: - if (val & 0xffffffff00000000ULL) + if ((val & 0xffffffff00000000ULL) && is_64_bit_mode(vcpu)) return -1; /* #GP */ vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; kvm_update_dr6(vcpu); @@ -779,7 +788,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) return 1; /* #UD */ /* fall through */ default: /* 7 */ - if (val & 0xffffffff00000000ULL) + if ((val & 0xffffffff00000000ULL) && is_64_bit_mode(vcpu)) return -1; /* #GP */ vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1; kvm_update_dr7(vcpu); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
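The corrected condition is a two-input predicate. A minimal sketch (`dr_write_faults` is my name, not KVM's): a DR6/DR7 write faults only when bits 63:32 are set and the vcpu is actually executing 64-bit code; in 32-bit modes those bits do not exist in the guest-visible register, so the value that reaches KVM may legitimately carry garbage there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the fixed DR6/DR7 write check: #GP only when high bits are
 * set while running in 64-bit mode. */
static bool dr_write_faults(uint64_t val, bool in_64_bit_mode)
{
    return (val & 0xffffffff00000000ULL) && in_64_bit_mode;
}
```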
[PATCH 5/6] KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX
On long-mode the current NOP (0x90) emulation still writes back to RAX. As a result, EAX is zero-extended and the high 32-bits of RAX are cleared. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index b354531..eb93eb4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4663,8 +4663,9 @@ special_insn: break; case 0x90 ... 0x97: /* nop / xchg reg, rax */ if (ctxt->dst.addr.reg == reg_rmw(ctxt, VCPU_REGS_RAX)) - break; - rc = em_xchg(ctxt); + ctxt->dst.type = OP_NONE; + else + rc = em_xchg(ctxt); break; case 0x98: /* cbw/cwde/cdqe */ switch (ctxt->op_bytes) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
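The bug is a consequence of the general 64-bit rule that 32-bit register writes zero-extend. A minimal sketch (both function names are mine): emulating NOP as a self-xchg of EAX performs a 32-bit writeback, which clobbers RAX[63:32]; suppressing the writeback preserves the full register.

```c
#include <stdint.h>

/* Sketch: any 32-bit register writeback zero-extends into bits 63:32. */
static uint64_t writeback32(uint64_t old_reg, uint32_t val)
{
    (void)old_reg;        /* 32-bit writes discard the old high half */
    return val;           /* zero-extended result */
}

/* Sketch: true NOP performs no writeback at all. */
static uint64_t nop(uint64_t rax)
{
    return rax;
}
```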
[PATCH 2/6] KVM: x86: Wrong emulation on 'xadd X, X'
The emulator does not emulate the xadd instruction correctly if the two operands are the same. In this (unlikely) situation the result should be the sum of X and X (2X), whereas the current result is X. The solution is to first perform writeback to the source, before writing to the destination. The only instruction which should be affected is xadd, as the other instructions that perform writeback to the source use the extended accumulator (e.g., RAX:RDX). Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index f0b0a10..3c8d867 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4711,17 +4711,17 @@ special_insn: goto done; writeback: - if (!(ctxt->d & NoWrite)) { - rc = writeback(ctxt, &ctxt->dst); - if (rc != X86EMUL_CONTINUE) - goto done; - } if (ctxt->d & SrcWrite) { BUG_ON(ctxt->src.type == OP_MEM || ctxt->src.type == OP_MEM_STR); rc = writeback(ctxt, &ctxt->src); if (rc != X86EMUL_CONTINUE) goto done; } + if (!(ctxt->d & NoWrite)) { + rc = writeback(ctxt, &ctxt->dst); + if (rc != X86EMUL_CONTINUE) + goto done; + } /* * restore dst type in case the decoding will be reused -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
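Why the writeback order matters is easiest to see with a tiny model. This is an illustrative sketch (`xadd_emulate` is my name): when both operands name the same register, whichever writeback runs last wins, so the source (old value) must be written first and the destination (sum) second to get the architecturally correct 2X.

```c
#include <stdint.h>

/* Sketch of xadd with source-first writeback. With reg_dst == reg_src
 * (e.g. "xadd %rax, %rax"), the sum is written last and survives. */
static uint64_t xadd_emulate(uint64_t *reg_dst, uint64_t *reg_src)
{
    uint64_t dst = *reg_dst, src = *reg_src;

    *reg_src = dst;        /* source writeback: old destination value */
    *reg_dst = dst + src;  /* destination writeback: the sum */
    return *reg_dst;
}
```

Passing the same address for both operands models `xadd X, X`: the register ends up holding 2X, whereas reversing the two assignments would leave it holding X.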
[PATCH 4/6] KVM: x86: emulation of dword cmov on long-mode should clear [63:32]
Even if the condition of cmov is not satisfied, bits[63:32] should be cleared. This is clearly stated in Intel's CMOVcc documentation. The solution is to reassign the destination onto itself if the condition is unsatisfied. For that matter the original destination value needs to be read. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 0183350..b354531 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3905,7 +3905,7 @@ static const struct opcode twobyte_table[256] = { N, N, N, N, N, N, N, N, N, N, /* 0x40 - 0x4F */ - X16(D(DstReg | SrcMem | ModRM | Mov)), + X16(D(DstReg | SrcMem | ModRM)), /* 0x50 - 0x5F */ N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, /* 0x60 - 0x6F */ @@ -4799,8 +4799,10 @@ twobyte_insn: ops->get_dr(ctxt, ctxt->modrm_reg, &ctxt->dst.val); break; case 0x40 ... 0x4f: /* cmov */ - ctxt->dst.val = ctxt->dst.orig_val = ctxt->src.val; - if (!test_cc(ctxt->b, ctxt->eflags)) + if (test_cc(ctxt->b, ctxt->eflags)) + ctxt->dst.val = ctxt->src.val; + else if (ctxt->mode != X86EMUL_MODE_PROT64 || + ctxt->op_bytes != 4) ctxt->dst.type = OP_NONE; /* no writeback */ break; case 0x80 ... 0x8f: /* jnz rel, etc*/ -- 1.9.1
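The architectural rule the patch enforces can be condensed into a minimal sketch (editor's illustration; `cmov32` is a hypothetical name, not emulator code):

```c
#include <stdint.h>
#include <assert.h>

/*
 * In 64-bit mode, any instruction with a 32-bit destination register
 * zero-extends the result into bits [63:32] -- for cmov this happens
 * even when the condition is false and the destination merely keeps
 * its low 32 bits.
 */
static uint64_t cmov32(uint64_t dst, uint64_t src, int cond)
{
    uint32_t result = cond ? (uint32_t)src : (uint32_t)dst;
    return result;  /* implicit zero-extension, as on real hardware */
}
```

Note that even the "condition false" path returns a value with a clear high half, which is exactly what the emulator previously got wrong.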
[PATCH 0/6] KVM: x86: More emulator bugs
This patch-set resolves several emulator bugs. Each fix is independent of the others. The DR6/7 bug can occur during a DR-access exit (regardless of unrestricted mode, MMIO and SPT). Thanks for reviewing the patches, Nadav Nadav Amit (6): KVM: x86: bit-ops emulation ignores offset on 64-bit KVM: x86: Wrong emulation on 'xadd X, X' KVM: x86: Inter-privilege level ret emulation is not implemented KVM: x86: emulation of dword cmov on long-mode should clear [63:32] KVM: x86: NOP emulation incorrectly clears the high 32-bits of RAX KVM: x86: check DR6/7 high-bits are clear only on long-mode arch/x86/kvm/emulate.c | 31 --- arch/x86/kvm/x86.c | 13 +++- 2 files changed, 31 insertions(+), 13 deletions(-) -- 1.9.1
[PATCH 3/6] KVM: x86: Inter-privilege level ret emulation is not implemented
Return an unhandleable error on inter-privilege level ret instructions. This is because the current emulation does not check the privilege level correctly when loading the CS, and does not pop RSP/SS as needed. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 3c8d867..0183350 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2019,6 +2019,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt) { int rc; unsigned long cs; + int cpl = ctxt->ops->cpl(ctxt); rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes); if (rc != X86EMUL_CONTINUE) @@ -2028,6 +2029,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt) rc = emulate_pop(ctxt, &cs, ctxt->op_bytes); if (rc != X86EMUL_CONTINUE) return rc; + /* Outer-privilege level return is not implemented */ + if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl) + return X86EMUL_UNHANDLEABLE; rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS); return rc; } -- 1.9.1
Re: [PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode
On 6/16/14, 1:17 PM, Paolo Bonzini wrote: Il 15/06/2014 15:13, Nadav Amit ha scritto: From: Nadav Amit When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and otherwise injects a #GP exception. This exception should be injected only if running in long-mode. Signed-off-by: Nadav Amit --- arch/x86/kvm/x86.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 57eac30..71fe841 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -756,6 +756,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu) vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED; } +static bool is_64_bit_mode(struct kvm_vcpu *vcpu) +{ +int cs_db, cs_l; +if (!is_long_mode(vcpu)) +return false; +kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); +return cs_l; +} + static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) { switch (dr) { @@ -769,7 +778,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) return 1; /* #UD */ /* fall through */ case 6: -if (val & 0xffffffff00000000ULL) +if ((val & 0xffffffff00000000ULL) && is_64_bit_mode(vcpu)) return -1; /* #GP */ vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; kvm_update_dr6(vcpu); @@ -779,7 +788,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) return 1; /* #UD */ /* fall through */ default: /* 7 */ -if (val & 0xffffffff00000000ULL) +if ((val & 0xffffffff00000000ULL) && is_64_bit_mode(vcpu)) return -1; /* #GP */ vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1; kvm_update_dr7(vcpu); Do you get this if the input register has bit 31 set? No. To be frank, the scenario may be considered a bit synthetic: the guest assigns a value to a general-purpose register in 64-bit mode, setting the high 32-bits to some non-zero value. Then, later, in 32-bit mode, the guest performs a MOV DR instruction. In between the two assignments, the general-purpose register is unmodified, so its high 32-bits are still set. 
Note that this scenario does not occur when MOV DR is emulated, but when handle_dr() is called. In this case, the entire 64 bits of the general-purpose register used for MOV DR are read, regardless of the execution mode of the guest. Nadav
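The check this patch implements can be condensed to one predicate (simplified sketch; `dr_write_faults` is a hypothetical name, not a KVM function):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/*
 * Writes to DR6/DR7 with bits [63:32] set should fault with #GP only
 * when the vCPU executes in 64-bit mode; in 32-bit mode the high half
 * of the register is simply not architecturally visible to the guest,
 * so stale high bits must not be treated as an invalid value.
 */
static bool dr_write_faults(uint64_t val, bool is_64_bit_mode)
{
    return (val & 0xffffffff00000000ULL) && is_64_bit_mode;
}
```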
Re: [PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode
On 6/16/14, 2:09 PM, Paolo Bonzini wrote: Il 16/06/2014 12:33, Nadav Amit ha scritto: Do you get this if the input register has bit 31 set? No. To be frank, the scenario may be considered a bit synthetic: the guest assigns a value to a general-purpose register in 64-bit mode, setting the high 32-bits to some non-zero value. Then, later, in 32-bit mode, the guest performs a MOV DR instruction. In between the two assignments, the general-purpose register is unmodified, so its high 32-bits are still set. Note that this scenario does not occur when MOV DR is emulated, but when handle_dr() is called. In this case, the entire 64 bits of the general-purpose register used for MOV DR are read, regardless of the execution mode of the guest. I wonder if the same bug happens elsewhere. For example, kvm_emulate_hypercall doesn't look at CS.L/CS.DB, which is really a corner case but arguably also a bug. kvm_hv_hypercall instead does it right. Perhaps we need a variant of kvm_register_read that (on 64-bit hosts) checks EFER/CS.L/CS.DB and masks the returned value accordingly. You could call it kvm_register_readl. There are two questions that come to mind: 1. Should we ignore CS.DB? It would make it consistent with kvm_hv_hypercall and handle_dr. I think this is the proper behavior. 2. Reading CS.L once and masking all the registers (i.e., changing the is_long_mode in kvm_emulate_hypercall to is_64_bit_mode) is likely to be more efficient. Regards, Nadav
Re: [PATCH 6/6] KVM: x86: check DR6/7 high-bits are clear only on long-mode
On 6/16/14, 5:56 PM, Paolo Bonzini wrote: Il 16/06/2014 13:53, Nadav Amit ha scritto: On 6/16/14, 2:09 PM, Paolo Bonzini wrote: Il 16/06/2014 12:33, Nadav Amit ha scritto: Do you get this if the input register has bit 31 set? No. To be frank, the scenario may be considered a bit synthetic: the guest assigns a value to a general-purpose register in 64-bit mode, setting the high 32-bits to some non-zero value. Then, later, in 32-bit mode, the guest performs MOV DR instruction. In between the two assignments, the general purpose register is unmodified, so the high 32-bits of the general purpose registers are still set. Note that this scenario does not occur when MOV DR is emulated, but when handle_dr() is called. In this case, the entire 64-bits of the general purpose register used for MOV DR are read, regardless to the execution mode of the guest. I wonder if the same bug happens elsewhere. For example, kvm_emulate_hypercall doesn't look at CS.L/CS.DB, which is really a corner case but arguably also a bug. kvm_hv_hypercall instead does it right. Perhaps we need a variant of kvm_register_read that (on 64-bit hosts) checks EFER/CS.L/CS.DB and masks the returned value accordingly. You could call it kvm_register_readl. There are two questions that come in mind: 1. Should we ignore CS.DB? It would make it consistent with kvm_hv_hypercall and handle_dr. I think this is the proper behavior. It depends on what you're using it for, but as a start yes. 2. Reading CS.L once and masking all the registers (i.e., changing the is_long_mode in kvm_emulate_hypercall to is_64_bit_mode) is likely to be more efficient. Yes, for the case of kvm_emulate_hypercall. Then you can build kvm_register_readl on top of is_64bit_mode and fix this bug with that function. Did you check that handle_cr is unaffected? handle_cr is affected, and there are some other instances of affected register reads. 
Actually, most of the vmx instruction exit handling routines ignore long-mode and cs.l when considering the register operand size. I'll make the necessary changes and resubmit. Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: Increase the number of variable MTRR regs to 10
Recent Intel CPUs have 10 variable range MTRRs. Since operating systems sometimes make assumptions about CPUs while ignoring capability MSRs, it is better for KVM to be consistent with recent CPUs. Reporting more MTRRs than actually supported has no functional implications. Signed-off-by: Nadav Amit --- arch/x86/include/asm/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4931415..0bab29d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level) #define KVM_REFILL_PAGES 25 #define KVM_MAX_CPUID_ENTRIES 80 #define KVM_NR_FIXED_MTRR_REGION 88 -#define KVM_NR_VAR_MTRR 8 +#define KVM_NR_VAR_MTRR 10 #define ASYNC_PF_PER_VCPU 64 -- 1.9.1
[PATCH 2/3] KVM: x86: Emulator support for #UD on CPL>0
Certain instructions (e.g., mwait and monitor) cause a #UD exception when they are executed with CPL > 0. This is in contrast to the regular privileged instructions, which cause #GP. In order not to mess with SVM interception of mwait and monitor, which assumes privilege-level assertions take place before interception, a flag has been added. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index f90194d..ef7a5a0 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -163,6 +163,7 @@ #define SrcWrite ((u64)1 << 46) /* Write back src operand */ #define NoMod ((u64)1 << 47) /* Mod field is ignored */ #define NoBigReal ((u64)1 << 48) /* No big real mode */ +#define UDOnPriv ((u64)1 << 49) /* #UD instead of #GP on CPL > 0 */ #define DstXacc (DstAccLo | SrcAccHi | SrcWrite) @@ -4560,7 +4561,10 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt) /* Privileged instruction can be executed only in CPL=0 */ if ((ctxt->d & Priv) && ops->cpl(ctxt)) { - rc = emulate_gp(ctxt, 0); + if (ctxt->d & UDOnPriv) + rc = emulate_ud(ctxt); + else + rc = emulate_gp(ctxt, 0); goto done; } -- 1.9.1
[PATCH 0/3] Correct monitor-mwait emulation as nop
KVM handles monitor-mwait as nop, but does not check any of the preconditions for the instructions. These instructions may generate all kinds of exceptions (#UD, #PF, #GP, #SS). They can also be executed in real-mode. This patch-set moves the handling of monitor-mwait to the emulator, to allow their execution in either real-mode or protected-mode. It tries to follow the SDM in checking the preconditions and generating the necessary exceptions. Thanks for reviewing the patch. Please try it with OS X to make sure it works properly without generating unnecessary exceptions. Nadav Amit (3): KVM: x86: Emulator flag for instruction with no big real mode KVM: x86: Emulator support for #UD on CPL>0 KVM: x86: correct mwait and monitor emulation arch/x86/kvm/emulate.c | 55 ++ arch/x86/kvm/svm.c | 22 ++-- arch/x86/kvm/vmx.c | 27 ++--- 3 files changed, 64 insertions(+), 40 deletions(-) -- 1.9.1
[PATCH 1/3] KVM: x86: Emulator flag for instruction with no big real mode
Certain instructions, such as monitor and xsave, do not support big real mode and cause a #GP exception if the effective address of any of the accessed bytes is not within [0, 0xffff]. This patch introduces a flag to mark these instructions, including the necessary checks. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e4e833d..f90194d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -162,6 +162,7 @@ #define NoWrite ((u64)1 << 45) /* No writeback */ #define SrcWrite ((u64)1 << 46) /* Write back src operand */ #define NoMod ((u64)1 << 47) /* Mod field is ignored */ +#define NoBigReal ((u64)1 << 48) /* No big real mode */ #define DstXacc (DstAccLo | SrcAccHi | SrcWrite) @@ -651,7 +652,12 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, if (!fetch && (desc.type & 8) && !(desc.type & 2)) goto bad; lim = desc_limit_scaled(&desc); - if ((desc.type & 8) || !(desc.type & 4)) { + if ((ctxt->mode == X86EMUL_MODE_REAL) && !fetch && + (ctxt->d & NoBigReal)) { + /* la is between zero and 0xffff */ + if (la > 0xffff || (u32)(la + size - 1) > 0xffff) + goto bad; + } else if ((desc.type & 8) || !(desc.type & 4)) { /* expand-up segment */ if (addr.ea > lim || (u32)(addr.ea + size - 1) > lim) goto bad; -- 1.9.1
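The NoBigReal range test reduces to a two-comparison predicate (editor's sketch with a hypothetical helper name, simplified from the patch above):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/*
 * In real mode, an access by a NoBigReal instruction must lie entirely
 * within [0, 0xffff].  The second comparison catches accesses whose
 * last byte wraps past the 64KB limit even though the start address
 * itself is in range.
 */
static bool nobigreal_access_ok(uint32_t la, unsigned size)
{
    if (la > 0xffff || (uint32_t)(la + size - 1) > 0xffff)
        return false;
    return true;
}
```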
[PATCH 3/3] KVM: x86: correct mwait and monitor emulation
mwait and monitor are currently handled as nop. Considering this behavior, they should still be handled correctly, i.e., check execution conditions and generate exceptions when required. mwait and monitor may also be executed in real-mode and are not handled in that case. This patch performs the emulation of monitor-mwait according to Intel SDM (other than checking whether interrupt can be used as a break event). Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 41 +++-- arch/x86/kvm/svm.c | 22 ++ arch/x86/kvm/vmx.c | 27 +++ 3 files changed, 52 insertions(+), 38 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index ef7a5a0..424b58d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt) return X86EMUL_CONTINUE; } +static int em_monitor(struct x86_emulate_ctxt *ctxt) +{ + int rc; + struct segmented_address addr; + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX); + u64 rax = reg_read(ctxt, VCPU_REGS_RAX); + u8 byte; + + if (ctxt->mode != X86EMUL_MODE_PROT64) + rcx = (u32)rcx; + if (rcx != 0) + return emulate_gp(ctxt, 0); + + addr.seg = seg_override(ctxt); + addr.ea = ctxt->ad_bytes == 8 ? 
rax : (u32)rax; + rc = segmented_read(ctxt, addr, &byte, 1); + if (rc != X86EMUL_CONTINUE) + return rc; + + printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n"); + return X86EMUL_CONTINUE; +} + +static int em_mwait(struct x86_emulate_ctxt *ctxt) +{ + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX); + + if (ctxt->mode != X86EMUL_MODE_PROT64) + rcx = (u32)rcx; + if ((rcx & ~1UL) != 0) + return emulate_gp(ctxt, 0); + + /* Accepting interrupt as break event regardless to cpuid */ + printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n"); + return X86EMUL_CONTINUE; +} + static bool valid_cr(int nr) { switch (nr) { @@ -3557,8 +3594,8 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt) F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e) static const struct opcode group7_rm1[] = { - DI(SrcNone | Priv, monitor), - DI(SrcNone | Priv, mwait), + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_monitor, monitor), + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_mwait, mwait), N, N, N, N, N, N, }; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ec8366c..a524e04 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3274,24 +3274,6 @@ static int pause_interception(struct vcpu_svm *svm) return 1; } -static int nop_interception(struct vcpu_svm *svm) -{ - skip_emulated_instruction(&(svm->vcpu)); - return 1; -} - -static int monitor_interception(struct vcpu_svm *svm) -{ - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n"); - return nop_interception(svm); -} - -static int mwait_interception(struct vcpu_svm *svm) -{ - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n"); - return nop_interception(svm); -} - static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_READ_CR0] = cr_interception, [SVM_EXIT_READ_CR3] = cr_interception, @@ -3349,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_CLGI] = clgi_interception, [SVM_EXIT_SKINIT] = skinit_interception, 
[SVM_EXIT_WBINVD] = emulate_on_interception, - [SVM_EXIT_MONITOR] = monitor_interception, - [SVM_EXIT_MWAIT]= mwait_interception, + [SVM_EXIT_MONITOR] = emulate_on_interception, + [SVM_EXIT_MWAIT]= emulate_on_interception, [SVM_EXIT_XSETBV] = xsetbv_interception, [SVM_EXIT_NPF] = pf_interception, }; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 801332e..7023e71 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5672,22 +5672,17 @@ static int handle_pause(struct kvm_vcpu *vcpu) return 1; } -static int handle_nop(struct kvm_vcpu *vcpu) +static int handle_emulate(struct kvm_vcpu *vcpu) { - skip_emulated_instruction(vcpu); - return 1; -} + int err = emulate_instruction(vcpu, 0); -static int handle_mwait(struct kvm_vcpu *vcpu) -{ - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n"); - return handle_nop(vcpu); -} - -static int handle_monitor(struct kvm_vcpu *vcpu) -{ - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n"); - return handle_nop(vcpu); -}
[PATCH v2 9/9] KVM: vmx: vmx instructions handling does not consider cs.l
VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in 64-bit mode. The current implementation is broken since it does not use the register operands correctly, and always uses 64-bit for reads and writes. Moreover, write to memory in vmwrite only considers long-mode, so it ignores cs.l. This patch fixes this behavior. The field of vmread/vmwrite is kept intentionally as 64-bit read since if bits [63:32] are not cleared the instruction should fail, according to Intel SDM. Signed-off-by: Nadav Amit --- arch/x86/kvm/vmx.c | 8 arch/x86/kvm/x86.h | 9 + 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index cbfbb8b..75dc888 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6397,7 +6397,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu) * on the guest's mode (32 or 64 bit), not on the given field's length. */ if (vmx_instruction_info & (1u << 10)) { - kvm_register_write(vcpu, (((vmx_instruction_info) >> 3) & 0xf), + kvm_register_writel(vcpu, (((vmx_instruction_info) >> 3) & 0xf), field_value); } else { if (get_vmx_mem_address(vcpu, exit_qualification, @@ -6434,14 +6434,14 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu) return 1; if (vmx_instruction_info & (1u << 10)) - field_value = kvm_register_read(vcpu, + field_value = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 3) & 0xf)); else { if (get_vmx_mem_address(vcpu, exit_qualification, vmx_instruction_info, &gva)) return 1; if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, - &field_value, (is_long_mode(vcpu) ? 8 : 4), &e)) { + &field_value, (is_64_bit_mode(vcpu) ? 
8 : 4), &e)) { kvm_inject_page_fault(vcpu, &e); return 1; } @@ -6571,7 +6571,7 @@ static int handle_invept(struct kvm_vcpu *vcpu) } vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); - type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf); + type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf); types = (nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index c5b61a7..306a1b7 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -126,6 +126,15 @@ static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu, return is_64_bit_mode(vcpu) ? val : (u32)val; } +static inline void kvm_register_writel(struct kvm_vcpu *vcpu, + enum kvm_reg reg, + unsigned long val) +{ + if (!is_64_bit_mode(vcpu)) + val = (u32)val; + return kvm_register_write(vcpu, reg, val); +} + void kvm_before_handle_nmi(struct kvm_vcpu *vcpu); void kvm_after_handle_nmi(struct kvm_vcpu *vcpu); int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/9] KVM: x86: bit-ops emulation ignores offset on 64-bit
The current emulation of bit operations ignores the offset from the destination on 64-bit target memory operands. This patch fixes this behavior. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e4e833d..f0b0a10 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1220,12 +1220,14 @@ static void fetch_bit_operand(struct x86_emulate_ctxt *ctxt) long sv = 0, mask; if (ctxt->dst.type == OP_MEM && ctxt->src.type == OP_REG) { - mask = ~(ctxt->dst.bytes * 8 - 1); + mask = ~((long)ctxt->dst.bytes * 8 - 1); if (ctxt->src.bytes == 2) sv = (s16)ctxt->src.val & (s16)mask; else if (ctxt->src.bytes == 4) sv = (s32)ctxt->src.val & (s32)mask; + else + sv = (s64)ctxt->src.val & (s64)mask; ctxt->dst.addr.mem.ea += (sv >> 3); } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
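The displacement math in fetch_bit_operand can be isolated into a small sketch (editor's illustration; `bit_displacement` is a hypothetical name, simplified from the patch above):

```c
#include <stdint.h>
#include <assert.h>

/*
 * For "bt mem, reg" the bit index selects the operand-sized word that
 * contains the bit: the index is sign-extended, masked down to a
 * multiple of the operand width in bits, and converted to a byte
 * offset added to the effective address.  With 8-byte operands the old
 * mask computation was done in 32-bit arithmetic; forcing the multiply
 * to `long` keeps the mask correct for the 64-bit case.
 */
static int64_t bit_displacement(int64_t src, unsigned dst_bytes)
{
    long mask = ~((long)dst_bytes * 8 - 1);
    int64_t sv;

    if (dst_bytes == 2)
        sv = (int16_t)src & (int16_t)mask;
    else if (dst_bytes == 4)
        sv = (int32_t)src & (int32_t)mask;
    else
        sv = (int64_t)src & (int64_t)mask;

    return sv >> 3;  /* byte offset added to the effective address */
}
```

For example, bit index 100 with a 64-bit operand lands 8 bytes past the base (the second qword), and negative indices move the access below the base, as the architecture requires.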
[PATCH v2 6/9] KVM: x86: check DR6/7 high-bits are clear only on long-mode
From: Nadav Amit When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and otherwise injects a #GP exception. This exception should be injected only if running in long-mode. Signed-off-by: Nadav Amit --- arch/x86/kvm/vmx.c | 2 +- arch/x86/kvm/x86.h | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 801332e..c0f53a0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5165,7 +5165,7 @@ static int handle_dr(struct kvm_vcpu *vcpu) return 1; kvm_register_write(vcpu, reg, val); } else - if (kvm_set_dr(vcpu, dr, kvm_register_read(vcpu, reg))) + if (kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg))) return 1; skip_emulated_instruction(vcpu); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 8c97bac..c5b61a7 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -47,6 +47,16 @@ static inline int is_long_mode(struct kvm_vcpu *vcpu) #endif } +static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu) +{ + int cs_db, cs_l; + + if (!is_long_mode(vcpu)) + return false; + kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); + return cs_l; +} + static inline bool mmu_is_nested(struct kvm_vcpu *vcpu) { return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu; @@ -108,6 +118,14 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa) return false; } +static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu, + enum kvm_reg reg) +{ + unsigned long val = kvm_register_read(vcpu, reg); + + return is_64_bit_mode(vcpu) ? val : (u32)val; +} + void kvm_before_handle_nmi(struct kvm_vcpu *vcpu); void kvm_after_handle_nmi(struct kvm_vcpu *vcpu); int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip); -- 1.9.1
[PATCH v2 7/9] KVM: x86: Hypercall handling does not consider opsize correctly
Currently, the hypercall handling routine only considers LME as an indication of whether the guest uses 32/64-bit mode. This is inconsistent with the hyperv hypercall handling and against the common-sense approach of considering cs.l as well. This patch uses is_64_bit_mode instead of is_long_mode for that matter. In addition, the result is masked in respect to the guest execution mode. Last, it changes kvm_hv_hypercall to use is_64_bit_mode as well to simplify the code. Signed-off-by: Nadav Amit --- arch/x86/kvm/x86.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f32a025..c6dfe47 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5662,7 +5662,6 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu) u64 param, ingpa, outgpa, ret; uint16_t code, rep_idx, rep_cnt, res = HV_STATUS_SUCCESS, rep_done = 0; bool fast, longmode; - int cs_db, cs_l; /* * hypercall generates UD from non zero cpl and real mode @@ -5673,8 +5672,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu) return 0; } - kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); - longmode = is_long_mode(vcpu) && cs_l == 1; + longmode = is_64_bit_mode(vcpu); if (!longmode) { param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) | @@ -5739,7 +5737,7 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid) int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) { unsigned long nr, a0, a1, a2, a3, ret; - int r = 1; + int op_64_bit, r = 1; if (kvm_hv_hypercall_enabled(vcpu->kvm)) return kvm_hv_hypercall(vcpu); @@ -5752,7 +5750,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) trace_kvm_hypercall(nr, a0, a1, a2, a3); - if (!is_long_mode(vcpu)) { + op_64_bit = is_64_bit_mode(vcpu); + if (!op_64_bit) { nr &= 0xFFFFFFFF; a0 &= 0xFFFFFFFF; a1 &= 0xFFFFFFFF; @@ -5778,6 +5777,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) break; } out: + if (!op_64_bit) + ret = (u32)ret; kvm_register_write(vcpu, VCPU_REGS_RAX, ret); ++vcpu->stat.hypercalls; return r; -- 
1.9.1
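The masking discipline this patch applies can be sketched in isolation (editor's illustration; the helper names are hypothetical, not KVM functions):

```c
#include <stdint.h>
#include <assert.h>

/*
 * Outside 64-bit mode, the hypercall number, its arguments and its
 * return value are all truncated to 32 bits -- matching what a 32-bit
 * guest can actually observe in its registers, even if stale data is
 * sitting in the host-visible high halves.
 */
static uint64_t hypercall_arg(uint64_t reg, int op_64_bit)
{
    return op_64_bit ? reg : (uint32_t)reg;
}

static uint64_t hypercall_ret(uint64_t raw_ret, int op_64_bit)
{
    if (!op_64_bit)
        raw_ret = (uint32_t)raw_ret;  /* guest sees only EAX */
    return raw_ret;
}
```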
[PATCH v2 5/9] KVM: x86: NOP emulation incorrectly clears the high 32-bits of RAX
On long-mode the current NOP (0x90) emulation still writes back to RAX. As a result, EAX is zero-extended and the high 32-bits of RAX are cleared. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index b354531..eb93eb4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4663,8 +4663,9 @@ special_insn: break; case 0x90 ... 0x97: /* nop / xchg reg, rax */ if (ctxt->dst.addr.reg == reg_rmw(ctxt, VCPU_REGS_RAX)) - break; - rc = em_xchg(ctxt); + ctxt->dst.type = OP_NONE; + else + rc = em_xchg(ctxt); break; case 0x98: /* cbw/cwde/cdqe */ switch (ctxt->op_bytes) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 0/5] x86: Tests for recent emulator bugs
This patch-set checks recent emulator bugs as well as monitor-mwait emulation in real-mode. Nadav Amit (5): x86: Testing nop instruction on 64-bit x86: test xadd with two identical operands x86: Test btcq with operand larger than 64 x86: check cmov instruction on 64-bit x86: Test monitor and mwait on real-mode x86/emulator.c | 30 ++++++++++++++++++++++++++++-- x86/realmode.c | 20 ++++++++++++++++++++ 2 files changed, 48 insertions(+), 2 deletions(-) -- 1.9.1
[PATCH kvm-unit-tests 4/5] x86: check cmov instruction on 64-bit
cmov instruction on 64-bit with dword destination register operand should clear bits [63:32]. This test checks this behavior due to a previous KVM bug. Signed-off-by: Nadav Amit --- x86/emulator.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/x86/emulator.c b/x86/emulator.c index 460949f..1fd0ca6 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -710,6 +710,18 @@ static void test_shld_shrd(u32 *mem) report("shrd (cl)", *mem == ((0x12345678 >> 3) | (5u << 29))); } +static void test_cmov(u32 *mem) +{ + u64 val; + *mem = 0xabcdef12u; + asm ("movq $0x1234567812345678, %%rax\n\t" +"cmpl %%eax, %%eax\n\t" +"cmovnel (%[mem]), %%eax\n\t" +"movq %%rax, %[val]\n\t" +: [val]"=r"(val) : [mem]"r"(mem) : "%rax", "cc"); + report("cmovnel", val == 0x12345678ul); +} + #define INSN_XCHG_ALL \ "xchg %rax, 0+save \n\t"\ "xchg %rbx, 8+save \n\t"\ @@ -1054,12 +1066,12 @@ int main() test_sreg(mem); test_lldt(mem); test_ltr(mem); + test_cmov(mem); test_mmx_movq_mf(mem, insn_page, alt_insn_page, insn_ram); test_movabs(mem, insn_page, alt_insn_page, insn_ram); test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram); test_nop(mem, insn_page, alt_insn_page, insn_ram); - test_crosspage_mmio(mem); test_string_io_mmio(mem); -- 1.9.1
[PATCH kvm-unit-tests 5/5] x86: Test monitor and mwait on real-mode
monitor and mwait are now considered to behave as nop. A new patch enables monitor and mwait in realmode as well. This test checks whether they are handled as nop in realmode. Signed-off-by: Nadav Amit --- x86/realmode.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 10c3e03..0a62b5d 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1671,6 +1671,16 @@ void test_xadd(void) report("xadd", R_AX, outregs.eax == inregs.eax * 2); } +void test_monitor_mwait(void) +{ + MK_INSN(monitor, "monitor\n\t" +"mwait\n\t"); + inregs.ecx = 0; + inregs.eax = 0; + exec_in_big_real_mode(&insn_monitor); + report("monitor", 0, 1); +} + void realmode_start(void) { @@ -1721,6 +1731,7 @@ void realmode_start(void) test_smsw(); test_nopl(); test_xadd(); + test_monitor_mwait(); test_perf_loop(); test_perf_mov(); test_perf_arith(); -- 1.9.1
[PATCH v2 8/9] KVM: vmx: handle_cr ignores 32/64-bit mode
In 32-bit mode, only bits [31:0] of the CR should be used for setting the CR
value. Otherwise, the host may incorrectly assume the value is invalid if bits
[63:32] are not zero. Moreover, the CR is currently read twice when CR8 is
used. Lastly, nested mov-cr exiting is modified to handle the CR value
correctly as well.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c0f53a0..cbfbb8b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5039,7 +5039,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 	reg = (exit_qualification >> 8) & 15;
 	switch ((exit_qualification >> 4) & 3) {
 	case 0: /* mov to cr */
-		val = kvm_register_read(vcpu, reg);
+		val = kvm_register_readl(vcpu, reg);
 		trace_kvm_cr_write(cr, val);
 		switch (cr) {
 		case 0:
@@ -5056,7 +5056,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 			return 1;
 		case 8: {
 			u8 cr8_prev = kvm_get_cr8(vcpu);
-			u8 cr8 = kvm_register_read(vcpu, reg);
+			u8 cr8 = (u8)val;
 			err = kvm_set_cr8(vcpu, cr8);
 			kvm_complete_insn_gp(vcpu, err);
 			if (irqchip_in_kernel(vcpu->kvm))
@@ -6751,7 +6751,7 @@ static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	int cr = exit_qualification & 15;
 	int reg = (exit_qualification >> 8) & 15;
-	unsigned long val = kvm_register_read(vcpu, reg);
+	unsigned long val = kvm_register_readl(vcpu, reg);
 
 	switch ((exit_qualification >> 4) & 3) {
 	case 0: /* mov to cr */
--
1.9.1
[PATCH v2 3/9] KVM: x86: Inter-privilege level ret emulation is not implemented
Return an unhandleable error on inter-privilege level ret instructions. This
is because the current emulation does not check the privilege level correctly
when loading the CS, and does not pop RSP/SS as needed.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/emulate.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3c8d867..0183350 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2019,6 +2019,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 {
 	int rc;
 	unsigned long cs;
+	int cpl = ctxt->ops->cpl(ctxt);
 
 	rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
@@ -2028,6 +2029,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 	rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
+	/* Outer-privilege level return is not implemented */
+	if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
+		return X86EMUL_UNHANDLEABLE;
 	rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
 	return rc;
 }
--
1.9.1
[PATCH v2 2/9] KVM: x86: Wrong emulation on 'xadd X, X'
The emulator does not emulate the xadd instruction correctly if the two
operands are the same. In this (unlikely) situation the result should be the
sum of X and X (2X), whereas it is currently X. The solution is to first
perform writeback to the source, before writing to the destination. The only
instruction which should be affected is xadd, as the other instructions that
perform writeback to the source use the extended accumulator (e.g., RAX:RDX).

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/emulate.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f0b0a10..3c8d867 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4711,17 +4711,17 @@ special_insn:
 		goto done;
 
 writeback:
-	if (!(ctxt->d & NoWrite)) {
-		rc = writeback(ctxt, &ctxt->dst);
-		if (rc != X86EMUL_CONTINUE)
-			goto done;
-	}
 	if (ctxt->d & SrcWrite) {
 		BUG_ON(ctxt->src.type == OP_MEM || ctxt->src.type == OP_MEM_STR);
 		rc = writeback(ctxt, &ctxt->src);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
+	if (!(ctxt->d & NoWrite)) {
+		rc = writeback(ctxt, &ctxt->dst);
+		if (rc != X86EMUL_CONTINUE)
+			goto done;
+	}
 
 	/*
 	 * restore dst type in case the decoding will be reused
--
1.9.1
[PATCH v2 4/9] KVM: x86: emulation of dword cmov on long-mode should clear [63:32]
Even if the condition of cmov is not satisfied, bits [63:32] of a dword
destination should be cleared. This is clearly stated in Intel's CMOVcc
documentation. The solution is to reassign the destination onto itself if the
condition is unsatisfied. For that matter, the original destination value
needs to be read.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/emulate.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0183350..b354531 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3905,7 +3905,7 @@ static const struct opcode twobyte_table[256] = {
 	N, N, N, N, N, N, N, N,
 	N, N,
 	/* 0x40 - 0x4F */
-	X16(D(DstReg | SrcMem | ModRM | Mov)),
+	X16(D(DstReg | SrcMem | ModRM)),
 	/* 0x50 - 0x5F */
 	N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N,
 	/* 0x60 - 0x6F */
@@ -4799,8 +4799,10 @@ twobyte_insn:
 		ops->get_dr(ctxt, ctxt->modrm_reg, &ctxt->dst.val);
 		break;
 	case 0x40 ... 0x4f:	/* cmov */
-		ctxt->dst.val = ctxt->dst.orig_val = ctxt->src.val;
-		if (!test_cc(ctxt->b, ctxt->eflags))
+		if (test_cc(ctxt->b, ctxt->eflags))
+			ctxt->dst.val = ctxt->src.val;
+		else if (ctxt->mode != X86EMUL_MODE_PROT64 ||
+			 ctxt->op_bytes != 4)
 			ctxt->dst.type = OP_NONE; /* no writeback */
 		break;
 	case 0x80 ... 0x8f: /* jnz rel, etc*/
--
1.9.1
[PATCH v2 0/9] KVM: x86: More emulator bugs
This patch set resolves several emulator bugs. Each fix is independent of the
others. The DR6 bug can occur during a DR-access exit (regardless of
unrestricted mode, MMIO and SPT).

Changes in v2:
Introduced kvm_register_readl and kvm_register_writel, which consider
long-mode and cs.l when reading the registers. Fixed the register reads to
respect 32/64-bit operand size in hypercall handling, CR exit handling and VMX
instruction handling.

Thanks for re-reviewing the patch

Nadav Amit (9):
  KVM: x86: bit-ops emulation ignores offset on 64-bit
  KVM: x86: Wrong emulation on 'xadd X, X'
  KVM: x86: Inter-privilege level ret emulation is not implemented
  KVM: x86: emulation of dword cmov on long-mode should clear [63:32]
  KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX
  KVM: x86: check DR6/7 high-bits are clear only on long-mode
  KVM: x86: Hypercall handling does not consider opsize correctly
  KVM: vmx: handle_cr ignores 32/64-bit mode
  KVM: vmx: vmx instructions handling does not consider cs.l

 arch/x86/kvm/emulate.c | 31 ---
 arch/x86/kvm/vmx.c     | 16
 arch/x86/kvm/x86.c     | 11 ++-
 arch/x86/kvm/x86.h     | 27 +++
 4 files changed, 61 insertions(+), 24 deletions(-)

--
1.9.1
[PATCH kvm-unit-tests 3/5] x86: Test btcq with operand larger than 64
Previously, KVM did not calculate the offset of bit-operations correctly when
quad-word operands were used. This test checks btcq when the operand is larger
than 64 in order to check this scenario.

Signed-off-by: Nadav Amit
---
 x86/emulator.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index bf8a873..460949f 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -496,7 +496,7 @@ void test_btc(void *mem)
 {
 	unsigned int *a = mem;
 
-	memset(mem, 0, 3 * sizeof(unsigned int));
+	memset(mem, 0, 4 * sizeof(unsigned int));
 
 	asm ("btcl $32, %0" :: "m"(a[0]) : "memory");
 	asm ("btcl $1, %0" :: "m"(a[1]) : "memory");
@@ -505,6 +505,10 @@ void test_btc(void *mem)
 	asm ("btcl %1, %0" :: "m"(a[3]), "r"(-1) : "memory");
 	report("btcl reg, r/m", a[0] == 1 && a[1] == 2 && a[2] == 0x8004);
+
+	asm ("btcq %1, %0" : : "m"(a[2]), "r"(-1l) : "memory");
+	report("btcq reg, r/m", a[0] == 1 && a[1] == 0x80000002 &&
+		a[2] == 0x8004 && a[3] == 0);
 }
 
 void test_bsfbsr(void *mem)
--
1.9.1
[PATCH kvm-unit-tests 2/5] x86: test xadd with two identical operands
Previously, KVM emulated xadd incorrectly when the source and destination
operands were identical. The expected result is that the register holds the
sum (2x) and not the previous value (x). This test checks this behavior. It
should be executed with unrestricted mode disabled.

Signed-off-by: Nadav Amit
---
 x86/realmode.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index dc4a1d3..10c3e03 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1663,6 +1663,14 @@ void test_smsw(void)
 	report("smsw", R_AX | R_BX | R_CX, outregs.eax == outregs.ebx);
 }
 
+void test_xadd(void)
+{
+	MK_INSN(xadd, "xaddl %eax, %eax\n\t");
+	inregs.eax = 0x12345678;
+	exec_in_big_real_mode(&insn_xadd);
+	report("xadd", R_AX, outregs.eax == inregs.eax * 2);
+}
+
 void realmode_start(void)
 {
 
@@ -1712,6 +1720,7 @@ void realmode_start(void)
 	test_dr_mod();
 	test_smsw();
 	test_nopl();
+	test_xadd();
 	test_perf_loop();
 	test_perf_mov();
 	test_perf_arith();
--
1.9.1
[PATCH kvm-unit-tests 1/5] x86: Testing nop instruction on 64-bit
Previously, nop instruction emulation on 64-bit caused RAX bits [63:32] to be
cleared. This test checks that the behavior is correct and RAX is unmodified.

Signed-off-by: Nadav Amit
---
 x86/emulator.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index f653127..bf8a873 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -845,6 +845,15 @@ static void test_smsw_reg(uint64_t *mem, uint8_t *insn_page,
 	report("64-bit smsw reg", outregs.rax == cr0);
 }
 
+static void test_nop(uint64_t *mem, uint8_t *insn_page,
+		uint8_t *alt_insn_page, void *insn_ram)
+{
+	inregs = (struct regs){ .rax = 0x1234567890abcdeful };
+	MK_INSN(nop, "nop\n\t");
+	trap_emulator(mem, alt_insn_page, &insn_nop);
+	report("nop", outregs.rax == inregs.rax);
+}
+
 static void test_crosspage_mmio(volatile uint8_t *mem)
 {
 	volatile uint16_t w, *pw;
@@ -1045,6 +1054,7 @@ int main()
 	test_mmx_movq_mf(mem, insn_page, alt_insn_page, insn_ram);
 	test_movabs(mem, insn_page, alt_insn_page, insn_ram);
 	test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram);
+	test_nop(mem, insn_page, alt_insn_page, insn_ram);
 
 	test_crosspage_mmio(mem);
--
1.9.1
Re: [PATCH v2 9/9] KVM: vmx: vmx instructions handling does not consider cs.l
On 6/18/14, 6:41 PM, Paolo Bonzini wrote: Il 18/06/2014 16:19, Nadav Amit ha scritto: VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in 64-bit mode. The current implementation is broken since it does not use the register operands correctly, and always uses 64-bit for reads and writes. Moreover, write to memory in vmwrite only considers long-mode, so it ignores cs.l. This patch fixes this behavior. The field of vmread/vmwrite is kept intentionally as 64-bit read since if bits [63:32] are not cleared the instruction should fail, according to Intel SDM. This is not how I read the SDM: "These instructions fail if given, in 64-bit mode, an operand that sets an encoding bit beyond bit 32." (Section 24.11.1.2) "Outside IA-32e mode, the source operand has 32 bits, regardless of the value of CS.D. In 64-bit mode, the source operand has 64 bits; however, if bits 63:32 of the source operand are not zero, VMREAD will fail due to an attempt to access an unsupported VMCS component (see operation section)." (Description of VMREAD in Chapter 30). I'll fix up the patch myself. Perhaps I am missing something, but I don't see where my mistake is. The VMREAD source operand is always read as 64-bits and I made no changes there. Therefore, if bits 63:32 are not zero, the instruction should fail when attempting to access the field. The value in the source operand of VMWRITE which represents the value which should be written is zero-extended outside 64-bit mode. Quoting: "The effective size of the primary source operand, which may be a register or in memory, is always 32 bits outside IA-32e mode (the setting of CS.D is ignored with respect to operand size) and 64 bits in 64-bit mode." (Description of VMWRITE in chapter 30). Regards, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 9/9] KVM: vmx: vmx instructions handling does not consider cs.l
On Jun 18, 2014, at 7:06 PM, Paolo Bonzini wrote:

> Il 18/06/2014 18:01, Nadav Amit ha scritto:
>> Perhaps I am missing something, but I don't see where my mistake is.
>> The VMREAD source operand is always read as 64-bits and I made no
>> changes there. Therefore, if bits 63:32 are not zero, the instruction
>> should fail when attempting to access the field.
>
> In 32-bit mode, it should be read as 32 bits: "Outside IA-32e mode, the
> source operand has 32 bits, regardless of the value of CS.D" (so it's
> never 16 bits, but it's also never 64 bits).

Oh. I now get it. I misunderstood what the SDM said, as I was thinking that
bits 63:32 would lead to failure in 32-bit mode as well. If you fix it, please
fix both VMREAD and VMWRITE. If not, I would resubmit.

Thanks,
Nadav
Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation
On 6/18/14, 8:59 PM, Eric Northup wrote: On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit wrote: mwait and monitor are currently handled as nop. Considering this behavior, they should still be handled correctly, i.e., check execution conditions and generate exceptions when required. mwait and monitor may also be executed in real-mode and are not handled in that case. This patch performs the emulation of monitor-mwait according to Intel SDM (other than checking whether interrupt can be used as a break event). Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 41 +++-- arch/x86/kvm/svm.c | 22 ++ arch/x86/kvm/vmx.c | 27 +++ 3 files changed, 52 insertions(+), 38 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index ef7a5a0..424b58d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt) return X86EMUL_CONTINUE; } +static int em_monitor(struct x86_emulate_ctxt *ctxt) +{ + int rc; + struct segmented_address addr; + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX); + u64 rax = reg_read(ctxt, VCPU_REGS_RAX); + u8 byte; I'd request: u32 ebx, ecx, edx, eax = 1; ctxt->opt->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx); if (!(ecx & FFL(MWAIT))) return emulate_ud(ctxt); and also in em_mwait. I had similar implementation on previous version, which also checked on mwait whether "interrupt as break event" matches ECX value. However, I was under the impression that it was decided that MWAIT will always be emulated as NOP to avoid misbehaving VMs that ignore CPUID (see the discussion at http://www.spinics.net/lists/kvm/msg102766.html ). Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation
On 6/19/14, 2:23 PM, Gleb Natapov wrote: On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote: On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin wrote: On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote: On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote: On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit wrote: mwait and monitor are currently handled as nop. Considering this behavior, they should still be handled correctly, i.e., check execution conditions and generate exceptions when required. mwait and monitor may also be executed in real-mode and are not handled in that case. This patch performs the emulation of monitor-mwait according to Intel SDM (other than checking whether interrupt can be used as a break event). Signed-off-by: Nadav Amit How about this instead (details in the commit log below) ? Please let me know what you think, and if you'd prefer me to send it out as a separate patch rather than a reply to this thread. Thanks, --Gabriel If there's an easy workaround, I'm inclined to agree. We can always go back to Gabriel's patch (and then we'll need Nadav's one too) but if we release a kernel with this support it becomes an ABI and we can't go back. So let's be careful here, and revert the hack for 3.16. Acked-by: Michael S. Tsirkin Personally, I got a custom guest which requires mwait for executing correctly. Can you elaborate on this guest a little bit. With nop implementation for mwait the guest will hog a host cpu. Do you consider this to be "executing correctly?" -- mwait is not as "clean" as it may appear. It encounters false wake-ups due to a variety of reasons, and any code need to recheck the wake-up condition afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that degraded performance considerably (Nehalem, if I am not mistaken). Therefore, handling mwait as nop is logically correct (although it may degrade performance). 
For the reference, if you look at the SDM 8.10.4, you'll see: "Multiple events other than a write to the triggering address range can cause a processor that executed MWAIT to wake up. These include events that would lead to voluntary or involuntary context switches, such as..." Note the words "include" in the sentence "These include events". Software has no way of controlling whether it gets false wake-ups and cannot rely on the wake-up as indication to anything. Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation
On 6/19/14, 3:07 PM, Gleb Natapov wrote: On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote: On 6/19/14, 2:23 PM, Gleb Natapov wrote: On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote: On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin wrote: On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote: On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote: On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit wrote: mwait and monitor are currently handled as nop. Considering this behavior, they should still be handled correctly, i.e., check execution conditions and generate exceptions when required. mwait and monitor may also be executed in real-mode and are not handled in that case. This patch performs the emulation of monitor-mwait according to Intel SDM (other than checking whether interrupt can be used as a break event). Signed-off-by: Nadav Amit How about this instead (details in the commit log below) ? Please let me know what you think, and if you'd prefer me to send it out as a separate patch rather than a reply to this thread. Thanks, --Gabriel If there's an easy workaround, I'm inclined to agree. We can always go back to Gabriel's patch (and then we'll need Nadav's one too) but if we release a kernel with this support it becomes an ABI and we can't go back. So let's be careful here, and revert the hack for 3.16. Acked-by: Michael S. Tsirkin Personally, I got a custom guest which requires mwait for executing correctly. Can you elaborate on this guest a little bit. With nop implementation for mwait the guest will hog a host cpu. Do you consider this to be "executing correctly?" -- mwait is not as "clean" as it may appear. It encounters false wake-ups due to a variety of reasons, and any code need to recheck the wake-up condition afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that degraded performance considerably (Nehalem, if I am not mistaken). 
Therefore, handling mwait as nop is logically correct (although it may degrade performance). For the reference, if you look at the SDM 8.10.4, you'll see: "Multiple events other than a write to the triggering address range can cause a processor that executed MWAIT to wake up. These include events that would lead to voluntary or involuntary context switches, such as..." Note the words "include" in the sentence "These include events". Software has no way of controlling whether it gets false wake-ups and cannot rely on the wake-up as indication to anything. That's all well and good and I didn't say that nop is not a valid mwait implementation, it is, though there is a big difference between "encounters false wake-ups" and never sleeps. What I asked is do you consider your guest hogging host cpu to be "executing correctly?". What this guest is doing that such behaviour is tolerated and shouldn't it be better to just poll for a condition you are waiting for instead of executing expensive vmexits. This will also hog 100% host cpu, but will be actually faster. You are correct, but unfortunately I have no control over the guest workload. In this specific workload I do not care about performance but only about correctness. Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation
On 6/19/14, 3:17 PM, Michael S. Tsirkin wrote: On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote: On 6/19/14, 3:07 PM, Gleb Natapov wrote: On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote: On 6/19/14, 2:23 PM, Gleb Natapov wrote: On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote: On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin wrote: On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote: On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote: On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit wrote: mwait and monitor are currently handled as nop. Considering this behavior, they should still be handled correctly, i.e., check execution conditions and generate exceptions when required. mwait and monitor may also be executed in real-mode and are not handled in that case. This patch performs the emulation of monitor-mwait according to Intel SDM (other than checking whether interrupt can be used as a break event). Signed-off-by: Nadav Amit How about this instead (details in the commit log below) ? Please let me know what you think, and if you'd prefer me to send it out as a separate patch rather than a reply to this thread. Thanks, --Gabriel If there's an easy workaround, I'm inclined to agree. We can always go back to Gabriel's patch (and then we'll need Nadav's one too) but if we release a kernel with this support it becomes an ABI and we can't go back. So let's be careful here, and revert the hack for 3.16. Acked-by: Michael S. Tsirkin Personally, I got a custom guest which requires mwait for executing correctly. Can you elaborate on this guest a little bit. With nop implementation for mwait the guest will hog a host cpu. Do you consider this to be "executing correctly?" -- mwait is not as "clean" as it may appear. It encounters false wake-ups due to a variety of reasons, and any code need to recheck the wake-up condition afterwards. 
Actually, some CPUs had bugs that caused excessive wake-ups that degraded performance considerably (Nehalem, if I am not mistaken). Therefore, handling mwait as nop is logically correct (although it may degrade performance). For the reference, if you look at the SDM 8.10.4, you'll see: "Multiple events other than a write to the triggering address range can cause a processor that executed MWAIT to wake up. These include events that would lead to voluntary or involuntary context switches, such as..." Note the words "include" in the sentence "These include events". Software has no way of controlling whether it gets false wake-ups and cannot rely on the wake-up as indication to anything. That's all well and good and I didn't say that nop is not a valid mwait implementation, it is, though there is a big difference between "encounters false wake-ups" and never sleeps. What I asked is do you consider your guest hogging host cpu to be "executing correctly?". What this guest is doing that such behaviour is tolerated and shouldn't it be better to just poll for a condition you are waiting for instead of executing expensive vmexits. This will also hog 100% host cpu, but will be actually faster. You are correct, but unfortunately I have no control over the guest workload. In this specific workload I do not care about performance but only about correctness. Nadav No one prevents you from patching your kernel to run this workload. But is this of use to anyone else? If yes why? I do not say it should be the default behavior, and I can try to push to qemu some setting to turn it on by demand. Anyhow, I believe there are cases you may want mwait support - either an OS X guest which was not modified to run without mwait, or for debugging the monitor-mwait flow of a guest OS. I am not going to argue too much. Since I was under the impression there are needs for mwait, other than mine, I thought it would make all of our lives easier to have a better implementation. 
Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: Fix lapic.c debug prints
In two cases lapic.c does not use the apic_debug macro correctly. This patch
fixes them.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/lapic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0069118..3855103 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1451,7 +1451,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.apic_arb_prio = 0;
 	vcpu->arch.apic_attention = 0;
 
-	apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
+	apic_debug("%s: vcpu=%p, id=%d, base_msr="
 		   "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
 		   vcpu, kvm_apic_id(apic),
 		   vcpu->arch.apic_base, apic->base_address);
@@ -1895,7 +1895,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 		/* evaluate pending_events before reading the vector */
 		smp_rmb();
 		sipi_vector = apic->sipi_vector;
-		pr_debug("vcpu %d received sipi with vector # %x\n",
+		apic_debug("vcpu %d received sipi with vector # %x\n",
			 vcpu->vcpu_id, sipi_vector);
 		kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
--
1.9.1
Re: [PATCH] KVM: x86: Fix lapic.c debug prints
On 6/30/14, 3:48 AM, Bandan Das wrote: Nadav Amit writes: In two cases lapic.c does not use the apic_debug macro correctly. This patch fixes them. Signed-off-by: Nadav Amit --- arch/x86/kvm/lapic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0069118..3855103 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1451,7 +1451,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu) vcpu->arch.apic_arb_prio = 0; vcpu->arch.apic_attention = 0; - apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr=" + apic_debug("%s: vcpu=%p, id=%d, base_msr=" "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__, vcpu, kvm_apic_id(apic), vcpu->arch.apic_base, apic->base_address); @@ -1895,7 +1895,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu) /* evaluate pending_events before reading the vector */ smp_rmb(); sipi_vector = apic->sipi_vector; - pr_debug("vcpu %d received sipi with vector # %x\n", + apic_debug("vcpu %d received sipi with vector # %x\n", Why don't we just use pr_debug all throughout ? I don't know. I just tried to make it consistent, since it really bugged me while I was debugging the local-apic. If you prefer the other way around (which does seem to be better), I can do the search-and-replace. Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: Pending interrupt may be delivered after INIT
We encountered a scenario in which, after an INIT is delivered, a pending
interrupt is delivered, although it was sent before the INIT. As the SDM
states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT,
as KVM does. This also means that pending interrupts should be cleared. This
patch clears the pending interrupts upon reset (and INIT); on the same
occasion it clears the pending exceptions, since they may cause a similar
issue.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..863ac07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6835,6 +6835,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 	atomic_set(&vcpu->arch.nmi_queued, 0);
 	vcpu->arch.nmi_pending = 0;
 	vcpu->arch.nmi_injected = false;
+	vcpu->arch.interrupt.pending = false;
+	vcpu->arch.exception.pending = false;
 
 	memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
 	vcpu->arch.dr6 = DR6_FIXED_1;
--
1.9.1
[PATCH v2] KVM: x86: Pending interrupt may be delivered after INIT
We encountered a scenario in which, after an INIT is delivered, a pending
interrupt is delivered, although it was sent before the INIT. As the SDM
states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT,
as KVM does. This also means that pending interrupts should be cleared. This
patch clears the pending interrupts upon reset (and INIT); on the same
occasion it clears the pending exceptions, since they may cause a similar
issue.

Signed-off-by: Nadav Amit
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..6425a31 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6835,6 +6835,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 	atomic_set(&vcpu->arch.nmi_queued, 0);
 	vcpu->arch.nmi_pending = 0;
 	vcpu->arch.nmi_injected = false;
+	kvm_clear_interrupt_queue(vcpu);
+	kvm_clear_exception_queue(vcpu);
 
 	memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
 	vcpu->arch.dr6 = DR6_FIXED_1;
--
1.9.1
Re: [patch 2/5] KVM: MMU: allow pinning spte translations (TDP-only)
I tried to use the patches (regardless of PEBS). When the host uses huge-pages, I get an oops. I did not delve into the patch details, but I think there is some mess with the indexes - see below: On 6/19/14, 2:12 AM, mtosa...@redhat.com wrote: + +static bool direct_pin_sptes(struct kvm_vcpu *vcpu, gfn_t gfn) +{ + u64 *sptes[4]; + int r, i, level; + + r = get_sptep_hierarchy(vcpu, gfn << PAGE_SHIFT, sptes); + if (!r) + return false; + + level = 5 - r; + if (!is_last_spte(*sptes[r-1], level)) should be *sptes[level-1] + return false; + if (!is_shadow_present_pte(*sptes[r-1])) should be *sptes[level-1] + return false; + + for (i = 0; i < r; i++) { + u64 *sptep = sptes[i]; should be sptes[3 - i]; + struct kvm_mmu_page *sp = page_header(__pa(sptep)); + + sp->pinned = true; + set_bit(SPTE_PINNED_BIT, (unsigned long *)sptep); + } + + return true; +} Regards, Nadav
Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.
Tang, Running some (unrelated) tests I see that KVM does not handle APIC base relocation correctly. When the base is changed, kvm_lapic_set_base just changes lapic->base_address without taking further action (i.e., modifying the VMCS apic address in VMX). This patch follows KVM's bad behavior by using the constant VMX_APIC_ACCESS_PAGE_ADDR instead of lapic->base_address. Anyhow, I didn't see anything that would make my life (in fixing the lapic base issue) too difficult. Yet, feel free to make it more "fix-friendly". Thanks, Nadav On 7/7/14, 12:52 PM, Tang Chen wrote: Hi Gleb, The guest hang problem has been solved. When mmu_notifier is called, I set VMCS APIC_ACCESS_ADDR to the new value instead of setting it to 0. And only update kvm->arch.apic_access_page in the next ept violation. The guest is running well now. I'll post the new patches tomorrow. ;) Thanks.
Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.
On 7/7/14, 2:54 PM, Gleb Natapov wrote: On Mon, Jul 07, 2014 at 02:42:27PM +0300, Nadav Amit wrote: Tang, Running some (unrelated) tests I see that KVM does not handle APIC base relocation correctly. When the base is changed, kvm_lapic_set_base just changes lapic->base_address without taking further action (i.e., modifying the VMCS apic address in VMX). This patch follows KVM's bad behavior by using the constant VMX_APIC_ACCESS_PAGE_ADDR instead of lapic->base_address. There is no OS out there that relocates the APIC base (in fact it was not always relocatable on real HW), so there is no point in complicating the code to support it. In fact the current APIC_ACCESS_ADDR handling relies on the fact that all vcpus have the apic mapped at the same address. Anyhow, I didn't see anything that would make my life (in fixing the lapic base issue) too difficult. Yet, feel free to make it more "fix-friendly". Why would you want to fix it? If there is no general need, I will not send a fix. However, I think that at the very least a warning message should appear if the guest relocates the APIC base. Nadav
Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.
Tang, I am sorry if I caused any confusion. Following Gleb's response, there is no apparent need for dealing with the scenario I mentioned (relocating the APIC base), so you don't need to make any changes to your patch, and I will post another patch later to warn if the guest relocates its APIC (from the default address to another guest physical address). My answers to your questions are below. On 7/8/14, 4:44 AM, Tang Chen wrote: Hi Nadav, Thanks for the reply, please see below. On 07/07/2014 08:10 PM, Nadav Amit wrote: On 7/7/14, 2:54 PM, Gleb Natapov wrote: On Mon, Jul 07, 2014 at 02:42:27PM +0300, Nadav Amit wrote: Tang, Running some (unrelated) tests I see that KVM does not handle APIC base relocation correctly. When the base is changed, kvm_lapic_set_base just changes lapic->base_address without taking further action (i.e., modifying the VMCS apic address in VMX). This patch follows KVM's bad behavior by using the constant VMX_APIC_ACCESS_PAGE_ADDR instead of lapic->base_address. There is no OS out there that relocates the APIC base (in fact it was not always relocatable on real HW), so there is no point in complicating the code to support it. In fact the current APIC_ACCESS_ADDR handling relies on the fact that all vcpus have the apic mapped at the same address. Anyhow, I didn't see anything that would make my life (in fixing the lapic base issue) too difficult. Yet, feel free to make it more "fix-friendly". Why would you want to fix it? If there is no general need, I will not send a fix. However, I think that at the very least a warning message should appear if the guest relocates the APIC base. Maybe I didn't understand your question correctly. If I'm wrong, please tell me. This patch does not relocate the APIC base in the guest, but in the host. The host migrates the apic page to somewhere else, and KVM updates the ept pagetable to track it. In the guest, the apic base address (gpa) doesn't change. 
The last claim is true in practice, according to Gleb, but it is not necessarily so according to the specifications. Pentium 4, Intel Xeon and P6 family processors support APIC base relocation. See the Intel SDM section 10.4.5. Anyhow, Gleb claims it is not used by any OS. Is this lapic->base_address an hpa? No, it is a guest physical address. Is there anywhere I need to update in my patch? No. I'll send another patch on top of yours that prints a warning if the APIC base is relocated (i.e., the guest physical address of the APIC base is changed). Such relocation is done explicitly by the guest, not by your patch. Nadav
[PATCH] KVM: x86: DR6/7.RTM cannot be written
Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the current KVM implementation. This bug is apparent only if the MOV-DR instruction is emulated or the host also debugs the guest. This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and set respectively. It also sets DR6.RTM upon every debug exception. Obviously, it is not a complete fix, as debugging of RTM is still unsupported. Signed-off-by: Nadav Amit --- arch/x86/include/asm/kvm_host.h | 8 +++++--- arch/x86/kvm/cpuid.h | 8 ++++++++ arch/x86/kvm/vmx.c | 4 ++-- arch/x86/kvm/x86.c | 22 ++++++++++++++++------ 4 files changed, 31 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b8a4480..a84eaf7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -152,14 +152,16 @@ enum { #define DR6_BD (1 << 13) #define DR6_BS (1 << 14) -#define DR6_FIXED_1 0xffff0ff0 -#define DR6_VOLATILE 0x0000e00f +#define DR6_RTM (1 << 16) +#define DR6_FIXED_1 0xfffe0ff0 +#define DR6_INIT 0xffff0ff0 +#define DR6_VOLATILE 0x0001e00f #define DR7_BP_EN_MASK 0x000000ff #define DR7_GE (1 << 9) #define DR7_GD (1 << 13) #define DR7_FIXED_1 0x00000400 -#define DR7_VOLATILE 0xffff23ff +#define DR7_VOLATILE 0xffff2bff /* apic attention bits */ #define KVM_APIC_CHECK_VAPIC 0 diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f908731..a538059 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -95,4 +95,12 @@ static inline bool guest_cpuid_has_gbpages(struct kvm_vcpu *vcpu) best = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); return best && (best->edx & bit(X86_FEATURE_GBPAGES)); } + +static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 7, 0); + return best && (best->ebx & bit(X86_FEATURE_RTM)); +} #endif diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0c9569b..1fd3598 100644 ---
a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4892,7 +4892,7 @@ static int handle_exception(struct kvm_vcpu *vcpu) if (!(vcpu->guest_debug & (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) { vcpu->arch.dr6 &= ~15; - vcpu->arch.dr6 |= dr6; + vcpu->arch.dr6 |= dr6 | DR6_RTM; if (!(dr6 & ~DR6_RESERVED)) /* icebp */ skip_emulated_instruction(vcpu); @@ -5151,7 +5151,7 @@ static int handle_dr(struct kvm_vcpu *vcpu) return 0; } else { vcpu->arch.dr7 &= ~DR7_GD; - vcpu->arch.dr6 |= DR6_BD; + vcpu->arch.dr6 |= DR6_BD | DR6_RTM; vmcs_writel(GUEST_DR7, vcpu->arch.dr7); kvm_queue_exception(vcpu, DB_VECTOR); return 1; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f750b69..fae064f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -759,6 +759,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu) vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED; } +static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu) +{ + u64 fixed = DR6_FIXED_1; + + if (!guest_cpuid_has_rtm(vcpu)) + fixed |= DR6_RTM; + return fixed; +} + static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) { switch (dr) { @@ -774,7 +783,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) case 6: if (val & 0xffffffff00000000ULL) return -1; /* #GP */ - vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; + vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu); kvm_update_dr6(vcpu); break; case 5: @@ -5115,7 +5124,8 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflag */ if (unlikely(rflags & X86_EFLAGS_TF)) { if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) { - kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1; + kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 | + DR6_RTM; kvm_run->debug.arch.pc = vcpu->arch.singlestep_rip; kvm_run->debug.arch.exception = DB_VECTOR; kvm_run->exit_reason = KVM_EXIT_DEBUG; @@ -5
[PATCH kvm-unit-tests] x86: Check DR6.RTM is writable
A recently discovered bug shows that DR6.RTM is fixed to one. The bug is only apparent when the host emulates the MOV-DR instruction or when the host debugs the guest kernel. This patch tests whether DR6.RTM is indeed accessible according to the RTM support reported by cpuid. Signed-off-by: Nadav Amit --- x86/emulator.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 1fd0ca6..f68882f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -3,6 +3,7 @@ #include "libcflat.h" #include "desc.h" #include "types.h" +#include "processor.h" #define memset __builtin_memset #define TESTDEV_IO_PORT 0xe0 @@ -870,6 +871,19 @@ static void test_nop(uint64_t *mem, uint8_t *insn_page, report("nop", outregs.rax == inregs.rax); } +static void test_mov_dr(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ + bool rtm_support = cpuid(7).b & (1 << 11); + unsigned long dr6_fixed_1 = rtm_support ? 0xfffe0ff0ul : 0xffff0ff0ul; + inregs = (struct regs){ .rax = 0 }; + MK_INSN(mov_to_dr6, "movq %rax, %dr6\n\t"); + trap_emulator(mem, alt_insn_page, &insn_mov_to_dr6); + MK_INSN(mov_from_dr6, "movq %dr6, %rax\n\t"); + trap_emulator(mem, alt_insn_page, &insn_mov_from_dr6); + report("mov_dr6", outregs.rax == dr6_fixed_1); +} + static void test_crosspage_mmio(volatile uint8_t *mem) { volatile uint16_t w, *pw; @@ -1072,6 +1086,7 @@ int main() test_movabs(mem, insn_page, alt_insn_page, insn_ram); test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram); test_nop(mem, insn_page, alt_insn_page, insn_ram); + test_mov_dr(mem, insn_page, alt_insn_page, insn_ram); test_crosspage_mmio(mem); test_string_io_mmio(mem); -- 1.9.1
[PATCH] KVM: x86: emulator injects #DB when RFLAGS.RF is set
If RFLAGS.RF is set, then no #DB should occur on instruction breakpoints. However, the KVM emulator injects #DB regardless of RFLAGS.RF. This patch fixes this behavior. KVM, however, still appears not to update RFLAGS.RF correctly, regardless of this patch. Signed-off-by: Nadav Amit --- arch/x86/kvm/x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fae064f..e341a81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5168,7 +5168,8 @@ static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r) } } - if (unlikely(vcpu->arch.dr7 & DR7_BP_EN_MASK)) { + if (unlikely(vcpu->arch.dr7 & DR7_BP_EN_MASK) && + !(kvm_get_rflags(vcpu) & X86_EFLAGS_RF)) { dr6 = kvm_vcpu_check_hw_bp(eip, 0, vcpu->arch.dr7, vcpu->arch.db); -- 1.9.1
Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)
Small question if I may regarding kvm_mmu_pin_pages: On 7/9/14, 10:12 PM, mtosa...@redhat.com wrote: + +static int kvm_mmu_pin_pages(struct kvm_vcpu *vcpu) +{ + struct kvm_pinned_page_range *p; + int r = 1; + + if (is_guest_mode(vcpu)) + return r; + + if (!vcpu->arch.mmu.direct_map) + return r; + + ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); + + list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) { + gfn_t gfn_offset; + + for (gfn_offset = 0; gfn_offset < p->npages; gfn_offset++) { + gfn_t gfn = p->base_gfn + gfn_offset; + int r; + bool pinned = false; + + r = vcpu->arch.mmu.page_fault(vcpu, gfn << PAGE_SHIFT, + PFERR_WRITE_MASK, false, + true, &pinned); I understand that the current use-case is for pinning only a few pages. Yet, wouldn't it be better (for performance) to check whether the gfn uses a large page and if so to skip forward, increasing gfn_offset to point to the next large page? Thanks, Nadav
Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)
On Jul 18, 2014, at 12:38 AM, Marcelo Tosatti wrote: > On Thu, Jul 17, 2014 at 08:18:03PM +0300, Nadav Amit wrote: >> Small question if I may regarding kvm_mmu_pin_pages: >> ... >> I understand that the current use-case is for pinning only few >> pages. Yet, wouldn't it be better (for performance) to check whether >> the gfn uses a large page and if so to skip forward, increasing >> gfn_offset to point to the next large page? > > Sure, that can be a lazy optimization and performed when necessary? > > (feel free to do it in advance if you're interested in doing it > now). I would do it once your patch is applied (I don’t see it in the queue). Thanks, Nadav
Re: [PATCH] KVM: x86: Remove redundant and incorrect cpl check on task-switch
cc'ing the kvm mailing list that was mistakenly omitted. On 7/30/14 9:57 AM, Nadav Amit wrote: > Task-switch emulation checks the privilege level prior to performing the > task-switch. This check is incorrect in the case of task-gates, in which the > tss.dpl is ignored, and can cause superfluous exceptions. Moreover this check > is unnecessary, since the CPU checks the privilege levels prior to exiting. > Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly > outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior > to exiting. AMD 15.14.1 says "The intercept is checked before the task switch > takes place but after the incoming TSS and task gate (if one was involved) > have > been checked for correctness." > > This patch removes the CPL checks for CALL and JMP. > > Signed-off-by: Nadav Amit > --- > arch/x86/kvm/emulate.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index 56657b0..a9b2bd6 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -2698,7 +2698,8 @@ static int emulator_do_task_switch(struct > x86_emulate_ctxt *ctxt, >* >* 1. jmp/call/int to task gate: Check against DPL of the task gate >* 2. Exception/IRQ/iret: No check is performed > - * 3. jmp/call to TSS: Check against DPL of the TSS > + * 3. jmp/call to TSS/task-gate: No check is performed since the > + *hardware checks it before exiting. 
>*/ > if (reason == TASK_SWITCH_GATE) { > if (idt_index != -1) { > @@ -2715,13 +2716,8 @@ static int emulator_do_task_switch(struct > x86_emulate_ctxt *ctxt, > if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl) > return emulate_gp(ctxt, (idt_index << 3) | 0x2); > } > - } else if (reason != TASK_SWITCH_IRET) { > - int dpl = next_tss_desc.dpl; > - if ((tss_selector & 3) > dpl || ops->cpl(ctxt) > dpl) > - return emulate_gp(ctxt, tss_selector); > } > > - > desc_limit = desc_limit_scaled(&next_tss_desc); > if (!next_tss_desc.p || > ((desc_limit < 0x67 && (next_tss_desc.type & 8)) || > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: cluster mode broadcast does not work
cc'ing the kvm mailing list that was mistakenly omitted. On 7/30/14 10:03 AM, Nadav Amit wrote: > The local APIC supports cluster mode broadcast. As Intel SDM 10.6.2.2 says: > "Broadcast to all local APICs is achieved by setting all destination bits to > one." This patch enables cluster mode broadcast. > > Signed-off-by: Nadav Amit > --- > arch/x86/kvm/lapic.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index efb1939..05c9a46 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -560,8 +560,8 @@ int kvm_apic_match_logical_addr(struct kvm_lapic *apic, > u8 mda) > result = 1; > break; > case APIC_DFR_CLUSTER: > - if (((logical_id >> 4) == (mda >> 0x4)) > - && (logical_id & mda & 0xf)) > + if ((((logical_id >> 4) == (mda >> 0x4)) > + && (logical_id & mda & 0xf)) || mda == 0xff) > result = 1; > break; > default: > @@ -644,6 +644,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, > struct kvm_lapic *src, > } else { > u32 mda = irq->dest_id << (32 - map->ldr_bits); > > + if (irq->dest_id == 0xff && > + kvm_apic_get_reg(src, APIC_DFR) == APIC_DFR_CLUSTER) > + goto out; /* cluster mode broadcast */ > + > dst = map->logical_map[apic_cluster_id(map, mda)]; > > bitmap = apic_logical_id(map, mda); >
[PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly
Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result, instructions whose emulation is broken may be emulated. This fix moves the EmulateOnUD check earlier so that it is actually evaluated. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 56657b0..37a83b2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4394,6 +4394,9 @@ done_prefixes: ctxt->execute = opcode.u.execute; + if (!(ctxt->d & EmulateOnUD) && ctxt->ud) + return EMULATION_FAILED; + if (unlikely(ctxt->d & (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { /* @@ -4406,9 +4409,6 @@ done_prefixes: if (ctxt->d & NotImpl) return EMULATION_FAILED; - if (!(ctxt->d & EmulateOnUD) && ctxt->ud) - return EMULATION_FAILED; - if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack)) ctxt->op_bytes = 8; -- 1.9.1
Re: [PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly
Correction: the word “never” in the message is too harsh. Nonetheless, there is a regression bug. I encountered it with the “wrfsbase” instruction. Nadav On Aug 13, 2014, at 4:50 PM, Nadav Amit wrote: > Commit d40a6898e5 mistakenly caused instructions which are not marked as > EmulateOnUD to be emulated upon #UD exception. The commit caused the check of > whether the instruction flags include EmulateOnUD to never be evaluated. As a > result instructions whose emulation is broken may be emulated. This fix moves > the evaluation of EmulateOnUD so it would be evaluated. > > Signed-off-by: Nadav Amit > --- > arch/x86/kvm/emulate.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index 56657b0..37a83b2 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -4394,6 +4394,9 @@ done_prefixes: > > ctxt->execute = opcode.u.execute; > > + if (!(ctxt->d & EmulateOnUD) && ctxt->ud) > + return EMULATION_FAILED; > + > if (unlikely(ctxt->d & > > (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { > /* > @@ -4406,9 +4409,6 @@ done_prefixes: > if (ctxt->d & NotImpl) > return EMULATION_FAILED; > > - if (!(ctxt->d & EmulateOnUD) && ctxt->ud) > - return EMULATION_FAILED; > - > if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack)) > ctxt->op_bytes = 8; > > -- > 1.9.1 >
Regression problem with commit 5045b46803
Commit 5045b46803 added a check that cs.dpl equals cs.rpl during task-switch. Unfortunately, it causes some of my tests that run well on bare-metal to fail. Although this check is mentioned in table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS conditions" which cause #TSS exceptions. Thus, I recommend reverting commit 5045b46803, or alternatively rechecking task-switch behavior on bare-metal. Nadav
Re: Regression problem with commit 5045b46803
On Aug 17, 2014, at 9:17 AM, Paolo Bonzini wrote: > Il 13/08/2014 19:14, Nadav Amit ha scritto: >> Commit 5045b46803 added a check that cs.dpl equals cs.rpl during >> task-switch. Unfortunately, is causes some of my tests that run well on >> bare-metal to fail. >> >> Although this check is mentioned in table 7-1 of the SDM as causing a >> #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS >> conditions" which cause #TSS exceptions. >> >> Thus, I recommend on reverting commit 5045b46803, or alternatively >> rechecking task-switch behavior on bare-metal. > > Can you provide a kvm-unit-tests test for this? I need to put time into writing one. However, the unit test would prove nothing. The question is how bare-metal behaves. According to what I see, it does not perform this check. Nadav-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression problem with commit 5045b46803
On Aug 17, 2014, at 9:28 AM, Paolo Bonzini wrote: > Il 17/08/2014 08:23, Nadav Amit ha scritto: >>>>>> Although this check is mentioned in table 7-1 of the SDM as causing a >>>>>> #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS >>>>>> conditions" which cause #TSS exceptions. >>>>>> >>>>>> Thus, I recommend on reverting commit 5045b46803, or alternatively >>>>>> rechecking task-switch behavior on bare-metal. >>>> >>>> Can you provide a kvm-unit-tests test for this? >> >> The question is how bare-metal behaves. According to what I see, it does not >> perform this check. > > Yes, but I still prefer to have a test, in case I find some time to fix > kvm-unit-tests to work on bare metal (or on Bochs). > > There are already a few task switch tests (32-bit only). > Can you please assist in running the existing task-switch tests first? In x86_64 the task-switch tests are skipped. If I configure the tests to use the i386 architecture, the build fails. Even after fixing the compilation errors, I don’t manage to get the task-switch tests to pass. Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests] x86: Test task-switch with cs.rpl != cs.dpl
Commit 5045b46803 added a check that cs.dpl equals cs.rpl during task-switch. This check is wrong, and this patch introduces a test in which cs.dpl != cs.rpl. To do so, it configures tss.cs to be conforming with rpl=3 and dpl=0. Since the cpl after the call is 3, the callee does not print anything. Signed-off-by: Nadav Amit --- x86/taskswitch2.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/x86/taskswitch2.c b/x86/taskswitch2.c index 92fc941..d96853f 100644 --- a/x86/taskswitch2.c +++ b/x86/taskswitch2.c @@ -7,6 +7,8 @@ #define MAIN_TSS_SEL (FIRST_SPARE_SEL + 0) #define VM86_TSS_SEL (FIRST_SPARE_SEL + 8) +#define USER_CS_SEL (FIRST_SPARE_SEL + 16) +#define USER_DS_SEL (FIRST_SPARE_SEL + 24) static volatile int test_count; static volatile unsigned int test_divider; @@ -102,6 +104,14 @@ start: goto start; } +static void user_tss(void) +{ +start: + test_count++; + asm volatile ("iret"); + goto start; +} + void test_kernel_mode_int() { unsigned int res; @@ -201,6 +211,18 @@ void test_kernel_mode_int() asm volatile ("ljmp $" xstr(TSS_INTR) ", $0xf4f4f4f4"); printf("Jump back succeeded\n"); report("ljmp", test_count == 1); + + /* test lcall with conforming segment, cs.dpl != cs.rpl */ + test_count = 0; + set_intr_task_gate(0, user_tss); + + tss_intr.cs = USER_CS_SEL | 3; + tss_intr.ss = USER_DS_SEL | 3; + tss_intr.ds = tss_intr.gs = tss_intr.fs = tss_intr.ss; + set_gdt_entry(USER_CS_SEL, 0, 0xffffffff, 0x9f, 0xc0); + set_gdt_entry(USER_DS_SEL, 0, 0xffffffff, 0xf3, 0xc0); + asm volatile("lcall $" xstr(TSS_INTR) ", $0xf4f4f4f4"); + report("lcall when cs.rpl != cs.dpl", test_count == 1); } void test_vm86_switch(void) -- 1.9.1
[PATCH] KVM: x86: Revert "check CS.DPL against RPL during task switch"
This reverts commit 5045b468037dfe1c848827ce10e99d87f5669160. Although the cs.dpl=cs.rpl check is mentioned in table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS conditions" which cause #TSS exceptions. As it causes some tests, which pass on bare-metal, to fail - it should be reverted. Signed-off-by: Nadav Amit --- arch/x86/kvm/emulate.c | 32 +++- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 56657b0..b73c9e8 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1415,7 +1415,7 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt, /* Does not support long mode */ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt, -u16 selector, int seg, u8 cpl, bool in_task_switch) +u16 selector, int seg, u8 cpl) { struct desc_struct seg_desc, old_desc; u8 dpl, rpl; @@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt, goto exception; break; case VCPU_SREG_CS: - if (in_task_switch && rpl != dpl) - goto exception; - if (!(seg_desc.type & 8)) goto exception; @@ -1560,7 +1557,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt, u16 selector, int seg) { u8 cpl = ctxt->ops->cpl(ctxt); - return __load_segment_descriptor(ctxt, selector, seg, cpl, false); + return __load_segment_descriptor(ctxt, selector, seg, cpl); } static void write_register_operand(struct operand *op) @@ -2460,19 +2457,19 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt, * Now load segment descriptors. 
If fault happens at this stage * it is handled in a context of new task */ - ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl); if (ret != X86EMUL_CONTINUE) return ret; @@ -2597,25 +2594,26 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt, * Now load segment descriptors. 
If fault happenes at this stage * it is handled in a context of new task */ - ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR, + cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl); if (ret != X86EMUL_CONTINUE) return ret; - ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl, true); + ret = __load_segment_descriptor(ctxt, tss->
Re: [PATCH] KVM: x86: Revert "check CS.DPL against RPL during task switch"
On Aug 18, 2014, at 12:13 AM, Paolo Bonzini wrote: > Il 17/08/2014 23:09, Paolo Bonzini ha scritto: >> Il 17/08/2014 21:32, Nadav Amit ha scritto: >>> This reverts commit 5045b468037dfe1c848827ce10e99d87f5669160. Although the >>> cs.dpl=cs.rpl check is mentioned in table 7-1 of the SDM as causing a #TSS >>> exception, it is not mentioned in table 6-6 that lists "invalid TSS >>> conditions" >>> which cause #TSS exceptions. As it causes some tests, which pass on >>> bare-metal, >>> to fail - it should be reverted. >> >> Right. However, I think reverting the patch is too big a hammer. We >> still need in_task_switch to raise TS_VECTOR instead of GP_VECTOR, so I >> propose instead something like: >> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c >> index 56657b0bb3bb..cd230b035514 100644 >> --- a/arch/x86/kvm/emulate.c >> +++ b/arch/x86/kvm/emulate.c >> @@ -1468,7 +1468,7 @@ static int __load_segment_descriptor(struct >> x86_emulate_ctxt *ctxt, >> return ret; >> >> err_code = selector & 0xfffc; >> -err_vec = GP_VECTOR; >> +err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR; >> >> /* can't load system descriptor into segment selector */ >> if (seg <= VCPU_SREG_GS && !seg_desc.s) >> @@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct >> x86_emulate_ctxt *ctxt, >> goto exception; >> break; >> case VCPU_SREG_CS: >> -if (in_task_switch && rpl != dpl) >> -goto exception; >> - >> if (!(seg_desc.type & 8)) >> goto exception; >> > > Also, what about the rpl > cpl test below, for non-conforming code > segments? It is not mentioned in table 6-6 either. As far as I understand, after task-switch cpl = cs.rpl. This is how the load_state_from_tss32 does it, and follows SDM 7.3, Task Switching: "The new task begins executing at the privilege level specified in the CPL field of the CS register, which is loaded from the TSS.” As a result, this condition can never occur during task-switch. Do I miss anything? 
Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
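The claim above, that CPL after a task switch equals the RPL of the CS selector taken from the TSS, can be sketched as a toy model. This is not KVM code and the helper names are hypothetical; the only assumption is the x86 segment-selector layout (RPL in bits 1:0) and the SDM 7.3 rule quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* RPL is the low two bits of any x86 segment selector. */
static unsigned selector_rpl(uint16_t sel)
{
	return sel & 3;
}

/* Per SDM 7.3, the new task runs at the privilege level given by the
 * RPL/CPL field of the CS selector loaded from the TSS. */
static unsigned cpl_after_task_switch(uint16_t tss_cs)
{
	return selector_rpl(tss_cs);
}
```

With cpl defined this way, a check of the form rpl > cpl on the newly loaded CS compares a value with itself and can never fire, which is the point made in the message.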
Re: [PATCH] KVM: x86: Revert "check CS.DPL against RPL during task switch"
On Aug 18, 2014, at 12:09 AM, Paolo Bonzini wrote: > Il 17/08/2014 21:32, Nadav Amit ha scritto: >> This reverts commit 5045b468037dfe1c848827ce10e99d87f5669160. Although the >> cs.dpl=cs.rpl check is mentioned in table 7-1 of the SDM as causing a #TSS >> exception, it is not mentioned in table 6-6 that lists "invalid TSS >> conditions" >> which cause #TSS exceptions. As it causes some tests, which pass on >> bare-metal, >> to fail - it should be reverted. > > Right. However, I think reverting the patch is too big a hammer. We > still need in_task_switch to raise TS_VECTOR instead of GP_VECTOR, so I > propose instead something like: > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index 56657b0bb3bb..cd230b035514 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -1468,7 +1468,7 @@ static int __load_segment_descriptor(struct > x86_emulate_ctxt *ctxt, > return ret; > > err_code = selector & 0xfffc; > - err_vec = GP_VECTOR; > + err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR; > > /* can't load system descriptor into segment selector */ > if (seg <= VCPU_SREG_GS && !seg_desc.s) > @@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct > x86_emulate_ctxt *ctxt, > goto exception; > break; > case VCPU_SREG_CS: > - if (in_task_switch && rpl != dpl) > - goto exception; > - > if (!(seg_desc.type & 8)) > goto exception; > > > either in a single patch or as two separate patches. I'll test this against > your test case and repost (not before Tuesday). > I missed the TS_VECTOR thing. Yes, in that case, a full revert makes no sense. I'm in no hurry with this patch, and I posted it during your vacation only because it is a recent bug. Anyhow, you may want to look at another patch I sent, "KVM: x86: Avoid emulating instructions on #UD mistakenly", ASAP. It fixes a bug that was introduced in 3.17-rc1 and is visible in userspace. Nadav
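Paolo's proposed change can be distilled into a small sketch: any fault taken while loading a segment descriptor during a task switch should be reported as #TS rather than #GP, with the faulting selector (RPL bits masked off) as the error code. This is illustrative, not the actual __load_segment_descriptor() code; the vector numbers are the architectural ones (#TS = 10, #GP = 13).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_VECTOR 10	/* #TS: invalid TSS */
#define GP_VECTOR 13	/* #GP: general protection */

/* The error code pushed for the fault: the selector with its RPL
 * bits (1:0) cleared, as in "err_code = selector & 0xfffc". */
static uint16_t seg_load_fault_code(uint16_t selector)
{
	return selector & 0xfffc;
}

/* The default fault vector: #TS during a task switch, #GP otherwise,
 * mirroring "err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR". */
static int seg_load_fault_vector(bool in_task_switch)
{
	return in_task_switch ? TS_VECTOR : GP_VECTOR;
}
```

The design point is that only the default vector changes; every existing "goto exception" site in the descriptor checks then reports the right exception for its context without further changes.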
Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10
This should have been a benign patch. I'll try to get a Windows 7 installation disk and check ASAP. Nadav > On 18 Aug 2014, at 05:17, Wanpeng Li wrote: > > Hi Nadav, >> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote: >> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems >> sometime make assumptions on CPUs while they ignore capability MSRs, it is >> better for KVM to be consistent with recent CPUs. Reporting more MTRRs than >> actually supported has no functional implications. >> >> Signed-off-by: Nadav Amit >> --- >> arch/x86/include/asm/kvm_host.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h >> index 4931415..0bab29d 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t >> base_gfn, int level) >> #define KVM_REFILL_PAGES 25 >> #define KVM_MAX_CPUID_ENTRIES 80 >> #define KVM_NR_FIXED_MTRR_REGION 88 >> -#define KVM_NR_VAR_MTRR 8 >> +#define KVM_NR_VAR_MTRR 10 > > We observed that there is obvious regression caused by this commit, 32bit > win7 guest show blue screen during boot. > > Regards, > Wanpeng Li > >> #define ASYNC_PF_PER_VCPU 64 >> >> -- >> 1.9.1
Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10
I reproduced the blue-screen. Let me try to figure it out. Nadav On Aug 18, 2014, at 11:11 AM, Wanpeng Li wrote: > On Mon, Aug 18, 2014 at 09:39:39AM +0300, Nadav Amit wrote: >> This should have been a benign patch. I'll try to get a Windows 7 installation >> disk and check ASAP. >> > > In addition, it just can be reproduced on 32bit win7 w/ MP enabled, in > case UP can't be reproduced. > > Regards, > Wanpeng Li > >> Nadav >> >>> On 18 Aug 2014, at 05:17, Wanpeng Li wrote: >>> >>> Hi Nadav, >>>> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote: >>>> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems >>>> sometime make assumptions on CPUs while they ignore capability MSRs, it is >>>> better for KVM to be consistent with recent CPUs. Reporting more MTRRs than >>>> actually supported has no functional implications. >>>> >>>> Signed-off-by: Nadav Amit >>>> --- >>>> arch/x86/include/asm/kvm_host.h | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/arch/x86/include/asm/kvm_host.h >>>> b/arch/x86/include/asm/kvm_host.h >>>> index 4931415..0bab29d 100644 >>>> --- a/arch/x86/include/asm/kvm_host.h >>>> +++ b/arch/x86/include/asm/kvm_host.h >>>> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t >>>> base_gfn, int level) >>>> #define KVM_REFILL_PAGES 25 >>>> #define KVM_MAX_CPUID_ENTRIES 80 >>>> #define KVM_NR_FIXED_MTRR_REGION 88 >>>> -#define KVM_NR_VAR_MTRR 8 >>>> +#define KVM_NR_VAR_MTRR 10 >>> >>> We observed that there is obvious regression caused by this commit, 32bit >>> win7 guest show blue screen during boot.
>>> >>> Regards, >>> Wanpeng Li >>> >>>> #define ASYNC_PF_PER_VCPU 64 >>>> >>>> -- >>>> 1.9.1
Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10
The cause of the blue-screen appears to be SeaBIOS, which leaves only 0x20 slots for “smp_mtrr”s. Apparently, the increase in the variable range MTRR count caused it to exhaust the available slots. As a result, some MSRs are not initialised by the BIOS (specifically, 3.5-4GB is not marked as UC), which causes Windows to panic. Once we increase the size of the smp_mtrr array in SeaBIOS, Windows boots. Paolo, you may wish to revert the patch. Please note that it was applied to some stable branches. Nadav On Aug 18, 2014, at 12:39 PM, Nadav Amit wrote: > I reproduced the blue-screen. Let me try to figure it out. > > Nadav > > On Aug 18, 2014, at 11:11 AM, Wanpeng Li wrote: > >> On Mon, Aug 18, 2014 at 09:39:39AM +0300, Nadav Amit wrote: >>> This should have been a benign patch. I'll try to get a Windows 7 >>> installation disk and check ASAP. >>> >> >> In addition, it just can be reproduced on 32bit win7 w/ MP enabled, in >> case UP can't be reproduced. >> >> Regards, >> Wanpeng Li >> >>> Nadav >>> >>>> On 18 Aug 2014, at 05:17, Wanpeng Li wrote: >>>> >>>> Hi Nadav, >>>>> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote: >>>>> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems >>>>> sometime make assumptions on CPUs while they ignore capability MSRs, it is >>>>> better for KVM to be consistent with recent CPUs. Reporting more MTRRs >>>>> than >>>>> actually supported has no functional implications.
>>>>> >>>>> Signed-off-by: Nadav Amit >>>>> --- >>>>> arch/x86/include/asm/kvm_host.h | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/arch/x86/include/asm/kvm_host.h >>>>> b/arch/x86/include/asm/kvm_host.h >>>>> index 4931415..0bab29d 100644 >>>>> --- a/arch/x86/include/asm/kvm_host.h >>>>> +++ b/arch/x86/include/asm/kvm_host.h >>>>> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t >>>>> base_gfn, int level) >>>>> #define KVM_REFILL_PAGES 25 >>>>> #define KVM_MAX_CPUID_ENTRIES 80 >>>>> #define KVM_NR_FIXED_MTRR_REGION 88 >>>>> -#define KVM_NR_VAR_MTRR 8 >>>>> +#define KVM_NR_VAR_MTRR 10 >>>> >>>> We observed that there is obvious regression caused by this commit, 32bit >>>> win7 guest show blue screen during boot. >>>> >>>> Regards, >>>> Wanpeng Li >>>> >>>>> #define ASYNC_PF_PER_VCPU 64 >>>>> >>>>> -- >>>>> 1.9.1
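The failure mode described above can be illustrated with back-of-the-envelope MSR accounting. This is a sketch, not SeaBIOS source: the names and the exact bookkeeping are assumptions, and only the 0x20 slot limit comes from the discussion. Each variable range costs two MSRs (PHYSBASE and PHYSMASK) on top of the 11 fixed-range MTRR MSRs and IA32_MTRR_DEF_TYPE.

```c
#include <assert.h>

/* Illustrative accounting only; the slot limit is the 0x20 "smp_mtrr"
 * slots mentioned above, the other counts are the architectural MTRR
 * MSR counts. */
#define SMP_MTRR_SLOTS		0x20
#define FIXED_MTRR_MSRS		11	/* MTRRfix64K_00000 .. MTRRfix4K_F8000 */
#define MTRR_DEF_TYPE_MSRS	1	/* IA32_MTRR_DEF_TYPE */

/* Each variable range needs a PHYSBASE and a PHYSMASK MSR. */
static int mtrr_msrs_to_save(int var_mtrrs)
{
	return FIXED_MTRR_MSRS + MTRR_DEF_TYPE_MSRS + 2 * var_mtrrs;
}
```

With 8 variable ranges this needs 28 slots and fits; with 10 it needs all 32, so any additional entry recorded in the same table silently drops MTRR state (such as the UC marking for 3.5-4GB), which matches the observed blue screen.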
Re: [PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly
On Aug 18, 2014, at 11:31 AM, Paolo Bonzini wrote: > Il 13/08/2014 16:21, Nadav Amit ha scritto: >> Correction: the word “never” in the message is too harsh. >> Nonetheless, there is a regression bug. I encountered it with the “wrfsbase” >> instruction. > > So KVM is emulating wrfsbase even if the host doesn't support it? KVM doesn't know wrfsbase; wrfsbase is encoded like clflush with a REP prefix. Therefore, the emulator thinks it can emulate it as clflush, and eliminates the #UD. > > I'm swapping the order of the two operands of &&, since the first one will > almost > always be true and the second one will almost always be false. > > Also, there's now no need to test EmulateOnUD in the condition below. Does > the > below look good to you? > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index 37a83b24e040..ef117b842334 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -4394,11 +4394,11 @@ done_prefixes: > > ctxt->execute = opcode.u.execute; > > - if (!(ctxt->d & EmulateOnUD) && ctxt->ud) > + if (unlikely(ctxt->ud) && likely(!(ctxt->d & EmulateOnUD))) > return EMULATION_FAILED; > > if (unlikely(ctxt->d & > - > (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { > + (NotImpl|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { > /* >* These are copied unconditionally here, and checked > unconditionally >* in x86_emulate_insn. > Sure. Until I find some bugs in it. ;-) Thanks, Nadav
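The encoding clash Nadav describes can be sketched with a toy decoder for the 0F AE opcode group. This is illustrative only and not the KVM emulator's decode logic: CLFLUSH is 0F AE /7 with a memory operand, while WRFSBASE is F3 0F AE /2 with a register operand, so a decoder that keys only on the 0F AE escape and ignores the mandatory F3 prefix (and the ModRM mod field) will take WRFSBASE for CLFLUSH and clear the #UD it should have let through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { INSN_CLFLUSH, INSN_WRFSBASE, INSN_UNKNOWN } insn_t;

/* Toy disambiguation of the 0F AE group: the reg field of ModRM picks
 * the instruction, the F3 (REP) prefix and register-vs-memory form
 * distinguish WRFSBASE from CLFLUSH. */
static insn_t decode_0fae(bool rep_prefix, uint8_t modrm)
{
	uint8_t reg = (modrm >> 3) & 7;		/* ModRM.reg: the /n digit */
	bool is_reg_form = (modrm >> 6) == 3;	/* mod == 11: register operand */

	if (rep_prefix && is_reg_form && reg == 2)
		return INSN_WRFSBASE;		/* F3 0F AE /2 */
	if (!rep_prefix && !is_reg_form && reg == 7)
		return INSN_CLFLUSH;		/* 0F AE /7, memory operand */
	return INSN_UNKNOWN;
}
```

A decoder that skipped the rep_prefix and is_reg_form tests here would return INSN_CLFLUSH for a WRFSBASE byte sequence, which is exactly the mistaken-emulation path the patch closes off.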