Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
H. Peter Anvin h...@zytor.com writes:
On 06/05/2013 03:08 PM, Anthony Liguori wrote:

Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO BAR for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straightforward. Is there something special about PCI-e here?

It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible.

Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this.

Well, not exactly. Initialization is done in 32-bit mode, but disk reads/writes are done in 16-bit mode, since they have to work from the int 13 interrupt handler. The only way I know to access MMIO BARs from 16-bit code is to use SMM, which we do not have in KVM.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On Wed, Jun 05, 2013 at 02:51:19PM +0200, Stefan Pietsch wrote:
On 05.06.2013 14:10, Gleb Natapov wrote:
On Wed, Jun 05, 2013 at 01:57:25PM +0200, Stefan Pietsch wrote:
On 19.05.2013 14:32, Gleb Natapov wrote:
On Sun, May 19, 2013 at 02:00:31AM +0100, Ben Hutchings wrote:

Dear KVM maintainers, it appears that there is a gap in x86 emulation, at least on a 32-bit host. Stefan found this when running GRML, a live distribution which can be downloaded from http://download.grml.org/grml32-full_2013.02.iso. His original report is at http://bugs.debian.org/707257.

Can you verify with latest linux.git HEAD? It works for me there on 64-bit. There were a lot of problems fixed in this area in the 3.9/3.10 time frame, so it would be helpful if you could test 32-bit before I install one myself.

Kernel version 3.9.4-1 (linux-image-3.9-1-686-pae) made things worse. The virtual machine tries to boot the kernel, but stops after a few seconds, and kern.log shows:

At what point does it stop?

The machine stops at:

Performance Events: Broken PMU hardware detected, using software events only.
Failed to access perfctr msr (MSR c1 is 0)
Enabling APIC mode: Flat. Using 1 I/O APICs

Timer initialization is what comes next.

I tried a 32-bit kernel compiled from the kvm.git next (3.10.0-rc2+) branch and upstream qemu, and I cannot reproduce the problem. The guest boots fine.

--
Gleb.
Re: [PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m
On Thu, Jun 6, 2013 at 1:45 PM, Gleb Natapov g...@redhat.com wrote:
On Thu, Jun 06, 2013 at 01:03:44PM +0800, Arthur Chunqi Li wrote:

Test access to %bpl via the modr/m addressing mode. This case can test another bug in the boot of RHEL5.9 64-bit.

We have a growing number of instruction tests using the same TLB trick. I think it is time to make the code more generic: create a function that receives the instruction to check, and have all the TLB games done by that function.

Should I do some work to merge all these test cases?

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 96576e5..3563971 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -901,6 +901,45 @@ static void test_simplealu(u32 *mem)
 	report("test", *mem == 0x8400);
 }
 
+static void test_bpl_modrm(uint64_t *mem, uint8_t *insn_page,
+		uint8_t *alt_insn_page, void *insn_ram)
+{
+	ulong *cr3 = (ulong *)read_cr3();
+	uint16_t cx = 0;
+
+	// Pad with RET instructions
+	memset(insn_page, 0xc3, 4096);
+	memset(alt_insn_page, 0xc3, 4096);
+	// Place a trapping instruction in the page to trigger a VMEXIT
+	insn_page[0] = 0x66; // mov $0x4321, %cx
+	insn_page[1] = 0xb9;
+	insn_page[2] = 0x21;
+	insn_page[3] = 0x43;
+	insn_page[4] = 0x89; // mov %eax, (%rax)
+	insn_page[5] = 0x00;
+	insn_page[6] = 0x90; // nop
+	// Place mov %cl, %bpl in alt_insn_page for the emulator to execute.
+	// If the emulator gets the %bpl addressing wrong, %cl may be moved
+	// to %ch instead, and %cx ends up 0x2121, not 0x4321.
+	alt_insn_page[4] = 0x40;
+	alt_insn_page[5] = 0x88;
+	alt_insn_page[6] = 0xcd;
+
+	// Load the code TLB with insn_page, but point the page tables at
+	// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
+	// This will make the CPU trap on the insn_page instruction but the
+	// hypervisor will see alt_insn_page.
+	install_page(cr3, virt_to_phys(insn_page), insn_ram);
+	// Load code TLB
+	invlpg(insn_ram);
+	asm volatile("call *%0" : : "r"(insn_ram+3));
+	// Trap, let the hypervisor emulate at alt_insn_page
+	install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
+	asm volatile("call *%0" : : "r"(insn_ram), "a"(mem));
+	asm volatile("" : "=c"(cx));

Why not add the constraint to the previous asm?

I will merge them in the next version.

+	report("access bpl in modr/m", cx == 0x4321);
+}
+
 int main()
 {
 	void *mem;
@@ -964,6 +1003,8 @@ int main()
 
 	test_string_io_mmio(mem);
 
+	test_bpl_modrm(mem, insn_page, alt_insn_page, insn_ram);
+
 	printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
 	return fails ? 1 : 0;
 }
--
1.7.9.5

--
Gleb.
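A rough sketch of the generic helper Gleb asks for, under the assumption that it would take the trapping instruction bytes and the to-be-emulated bytes as parameters. The `install_page()`, `invlpg()` and `virt_to_phys()` functions below are stand-ins for the real kvm-unit-tests primitives; here they only record the call sequence, since the actual TLB game needs a guest:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

static uint8_t insn_page[4096], alt_insn_page[4096];
static void *insn_ram = insn_page;      /* placeholder for the test mapping */
static int page_installs, tlb_flushes;

/* Stand-ins for the kvm-unit-tests primitives (hypothetical here). */
static uintptr_t virt_to_phys(void *p) { return (uintptr_t)p; }
static void install_page(void *cr3, uintptr_t phys, void *va)
{ (void)cr3; (void)phys; (void)va; page_installs++; }
static void invlpg(void *va) { (void)va; tlb_flushes++; }

/* Generic form of the TLB trick: the caller supplies the bytes the CPU
 * should trap on and the bytes the hypervisor should emulate; the helper
 * does all the page-table games itself. */
static void trap_emulate(void *cr3, const uint8_t *trap_insn, size_t trap_len,
                         const uint8_t *emul_insn, size_t emul_len)
{
    /* Pad both pages with RET so stray execution returns immediately. */
    memset(insn_page, 0xc3, sizeof(insn_page));
    memset(alt_insn_page, 0xc3, sizeof(alt_insn_page));
    memcpy(insn_page, trap_insn, trap_len);
    memcpy(alt_insn_page, emul_insn, emul_len);

    /* Prime the code TLB with insn_page... */
    install_page(cr3, virt_to_phys(insn_page), insn_ram);
    invlpg(insn_ram);
    /* (real harness: call into insn_ram here so the CPU traps) */

    /* ...then swap the page tables to alt_insn_page so the hypervisor
     * emulates the alternate bytes on the trap. */
    install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
    /* (real harness: call into insn_ram again and collect results) */
}
```

A per-test wrapper would then only build its two byte strings and check the registers afterward, instead of repeating the memset/install_page/invlpg dance.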
Re: [PATCH] Test case of emulating multibyte NOP
On Thu, Jun 6, 2013 at 1:40 PM, Gleb Natapov g...@redhat.com wrote:
On Thu, Jun 06, 2013 at 12:28:16AM +0800, 李春奇 Arthur Chunqi Li wrote:
On Thu, Jun 6, 2013 at 12:13 AM, Gleb Natapov g...@redhat.com wrote:

This time the email is perfect :)

On Thu, Jun 06, 2013 at 12:02:52AM +0800, Arthur Chunqi Li wrote:

Add a multibyte NOP test case to kvm-unit-tests. This version adds the test cases into x86/realmode.c. This can test one of the bugs hit when booting RHEL5.9 64-bit.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/realmode.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 981be08..e103ca6 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1504,6 +1504,29 @@ static void test_fninit(void)
 	report("fninit", 0, fsw == 0 && (fcw & 0x103f) == 0x003f);
 }
 
+static void test_nopl(void)
+{
+	MK_INSN(nopl1, ".byte 0x90\n\r"); // 1-byte nop
+	MK_INSN(nopl2, ".byte 0x66, 0x90\n\r"); // 2-byte nop
+	MK_INSN(nopl3, ".byte 0x0f, 0x1f, 0x00\n\r"); // 3-byte nop
+	MK_INSN(nopl4, ".byte 0x0f, 0x1f, 0x40, 0x00\n\r"); // 4-byte nop

But all the nops below that are not supported in 16-bit mode. You can disassemble realmode.elf in 16-bit mode (objdump -z -d -mi8086 x86/realmode.elf) and check yourself. Let's not complicate things for now and test only those that are easy to test.

Yes. But what if a 7-byte nop runs in 16-bit mode? Just the same as https://bugzilla.redhat.com/show_bug.cgi?id=967652

It cannot. In 16-bit mode it is decoded as two instructions:

0f 1f 80 00 00    nopw   0x0(%bx,%si)
00 00             add    %al,(%bx,%si)

OK, I will just test the first four nop instructions. Should I commit another patch?

Arthur

DR6=0ff0 DR7=0400 EFER=0500
Code=00 00 e9 50 ff ff ff 00 00 00 00 85 d2 74 20 45 31 c0 31 c9 0f 1f 80 00 00 00 00 0f b6 04 31 41 83 c0 01 88 04 39 48 83 c1 01 41 39 d0 75 ec 48 89 f8

The code dump contains 0f 1f 80 00 00 00 00, which is a 7-byte nop. Will the emulator run well in that case when booting RHEL5.9 64-bit?

+	MK_INSN(nopl5, ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r"); // 5-byte nop
+	MK_INSN(nopl6, ".byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r"); // 6-byte nop
+	MK_INSN(nopl7, ".byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r"); // 7-byte nop
+	MK_INSN(nopl8, ".byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r"); // 8-byte nop
+	MK_INSN(nopl9, ".byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r"); // 9-byte nop
+	exec_in_big_real_mode(&insn_nopl1);
+	exec_in_big_real_mode(&insn_nopl2);
+	exec_in_big_real_mode(&insn_nopl3);
+	exec_in_big_real_mode(&insn_nopl4);
+	exec_in_big_real_mode(&insn_nopl5);
+	exec_in_big_real_mode(&insn_nopl6);
+	exec_in_big_real_mode(&insn_nopl7);
+	exec_in_big_real_mode(&insn_nopl8);
+	exec_in_big_real_mode(&insn_nopl9);
+	report("nopl", 0, 1);
+}
+
 void realmode_start(void)
 {
 	test_null();
@@ -1548,6 +1571,7 @@ void realmode_start(void)
 	test_xlat();
 	test_salc();
 	test_fninit();
+	test_nopl();
 
 	exit(0);
 }
--
1.7.9.5

--
Gleb.
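The nine NOP encodings from the patch can be collected as data; the `emit_nops()` helper below illustrates the usual consumer of such a table (padding alignment gaps with the longest encodings first, the way assemblers do). This is an illustrative sketch, not part of the patch:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* The 1- to 9-byte NOP encodings discussed in the thread. In 16-bit mode
 * only the first four decode as a single instruction; e.g. the 7-byte
 * form splits into nopw 0x0(%bx,%si) (5 bytes) + add %al,(%bx,%si). */
static const uint8_t nopl[9][9] = {
    {0x90},                                                 /* 1 byte  */
    {0x66, 0x90},                                           /* 2 bytes */
    {0x0f, 0x1f, 0x00},                                     /* 3 bytes */
    {0x0f, 0x1f, 0x40, 0x00},                               /* 4 bytes */
    {0x0f, 0x1f, 0x44, 0x00, 0x00},                         /* 5 bytes */
    {0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00},                   /* 6 bytes */
    {0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00},             /* 7 bytes */
    {0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},       /* 8 bytes */
    {0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}, /* 9 bytes */
};

/* Fill buf with exactly n bytes of NOPs, longest encodings first. */
static void emit_nops(uint8_t *buf, size_t n)
{
    while (n > 0) {
        size_t k = n > 9 ? 9 : n;   /* nopl[k-1] is the k-byte encoding */
        memcpy(buf, nopl[k - 1], k);
        buf += k;
        n -= k;
    }
}
```

This also shows why a 64-bit kernel's padding (as in the RHEL5.9 code dump above) is full of long-form NOPs that a 16-bit decoder must handle differently.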
Re: [PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m
On Thu, Jun 06, 2013 at 02:47:49PM +0800, 李春奇 Arthur Chunqi Li wrote:
On Thu, Jun 6, 2013 at 1:45 PM, Gleb Natapov g...@redhat.com wrote:

Test access to %bpl via the modr/m addressing mode. This case can test another bug in the boot of RHEL5.9 64-bit.

We have a growing number of instruction tests using the same TLB trick. I think it is time to make the code more generic: create a function that receives the instruction to check, and have all the TLB games done by that function.

Should I do some work to merge all these test cases?

It would be nice, yes. I also have an idea on how to improve test reliability. Since the trick relies on the TLB being out of sync with the actual page table, a vmexit at the wrong time can break this assumption, and the wrong instruction will be emulated (the one from insn_page instead of alt_insn_page). If we make the instruction on insn_page place a special value somewhere and check it after the test, we can see whether the wrong instruction was executed and rerun the test.
--
Gleb.
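Gleb's rerun idea can be sketched as a retry loop: the instruction left on insn_page (the one that only executes if the TLB trick failed) deposits a sentinel, and if the sentinel survives the run, a stray vmexit resynced the TLB and the test is simply retried. `run_once()` is a hypothetical stand-in that simulates one flaky execution of the trick:

```c
#include <stdint.h>

#define SENTINEL 0xdeadbeefu

static uint32_t trap_result;
static int attempts;

/* Stand-in for one execution of the TLB trick. Simulated outcome: the
 * first run hits a stray vmexit and executes the insn_page bytes
 * (sentinel written); the second run is clean and the emulated path
 * produces the expected value. */
static void run_once(void)
{
    attempts++;
    trap_result = (attempts == 1) ? SENTINEL : 0x4321;
}

/* Retry until the emulated path actually ran, up to max_tries. */
static uint32_t run_reliably(int max_tries)
{
    int i;
    for (i = 0; i < max_tries; i++) {
        trap_result = 0;
        run_once();
        if (trap_result != SENTINEL)
            return trap_result;  /* emulated bytes ran: result is valid */
    }
    return SENTINEL;             /* give up and report the failure */
}
```

The point is that a spurious vmexit turns into a retry instead of a spurious test failure.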
Re: [PATCH] Test case of emulating multibyte NOP
On Thu, Jun 06, 2013 at 02:49:14PM +0800, 李春奇 Arthur Chunqi Li wrote:

OK, I will just test the first four nop instructions. Should I commit another patch?

Yes, all others will have to go into emulator.c.
--
Gleb.
Re: [PATCH] Test case of emulating multibyte NOP
On Thu, Jun 6, 2013 at 3:02 PM, Gleb Natapov g...@redhat.com wrote:

Yes, all others will have to go into emulator.c.

You mean I also need to add another test for nopl5 through nopl9 in emulator.c, using the emulator trick? I will commit a modified patch for realmode.c, since the other work should be done in emulator.c.
--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On Thu, Jun 06, 2013 at 09:42:40AM +0300, Gleb Natapov wrote:

I tried a 32-bit kernel compiled from the kvm.git next (3.10.0-rc2+) branch and upstream qemu, and I cannot reproduce the problem. The guest boots fine.

Actually, the branch I tested was master, not next, but this should not make a difference.

--
Gleb.
Re: [PATCH] Test case of emulating multibyte NOP
On Thu, Jun 6, 2013 at 3:17 PM, 李春奇 Arthur Chunqi Li yzt...@gmail.com wrote:

I will commit a modified patch for realmode.c, since the other work should be done in emulator.c.

Since we need to place some relevant code in emulator.c anyway, why don't we place all the tests in emulator.c?

Arthur
Re: [PATCH] Test case of emulating multibyte NOP
On Thu, Jun 06, 2013 at 03:22:59PM +0800, 李春奇 Arthur Chunqi Li wrote:

Since we need to place some relevant code in emulator.c anyway, why don't we place all the tests in emulator.c?

We can place those 4 in both. I do not always run all tests, so it is nice to cover as much as possible in both.

--
Gleb.
[PATCH] kvm-unit-tests: Test case of emulating multibyte NOP
Add multibyte (1- to 4-byte) NOPL test cases to kvm-unit-tests x86/realmode.c. This test only covers the NOP forms that decode as a single instruction in 16-bit mode; the other cases (5- to 9-byte NOPLs) should be placed in x86/emulator.c.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/realmode.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 981be08..3546771 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1504,6 +1504,19 @@ static void test_fninit(void)
 	report("fninit", 0, fsw == 0 && (fcw & 0x103f) == 0x003f);
 }
 
+static void test_nopl(void)
+{
+	MK_INSN(nopl1, ".byte 0x90\n\r"); // 1-byte nop
+	MK_INSN(nopl2, ".byte 0x66, 0x90\n\r"); // 2-byte nop
+	MK_INSN(nopl3, ".byte 0x0f, 0x1f, 0x00\n\r"); // 3-byte nop
+	MK_INSN(nopl4, ".byte 0x0f, 0x1f, 0x40, 0x00\n\r"); // 4-byte nop
+	exec_in_big_real_mode(&insn_nopl1);
+	exec_in_big_real_mode(&insn_nopl2);
+	exec_in_big_real_mode(&insn_nopl3);
+	exec_in_big_real_mode(&insn_nopl4);
+	report("nopl", 0, 1);
+}
+
 void realmode_start(void)
 {
 	test_null();
@@ -1548,6 +1561,7 @@ void realmode_start(void)
 	test_xlat();
 	test_salc();
 	test_fninit();
+	test_nopl();
 
 	exit(0);
 }
--
1.7.9.5
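For context, MK_INSN in realmode.c wraps an instruction byte string with labels so that exec_in_big_real_mode() can copy and run it from the real-mode test area. A minimal self-contained analog of the pattern (not the actual harness macro, which emits the bytes into the text section) looks like:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical analog of the MK_INSN pattern: pair a named instruction
 * byte sequence with its length so a runner can copy it somewhere
 * executable. */
struct insn_desc {
    const uint8_t *bytes;
    size_t len;
};

#define MK_INSN(name, ...)                                          \
    static const uint8_t insn_##name##_bytes[] = { __VA_ARGS__ };   \
    static const struct insn_desc insn_##name = {                   \
        insn_##name##_bytes, sizeof(insn_##name##_bytes)            \
    }

/* Two of the encodings from the patch, expressed via the macro. */
MK_INSN(nopl3, 0x0f, 0x1f, 0x00);       /* 3-byte nop */
MK_INSN(nopl4, 0x0f, 0x1f, 0x40, 0x00); /* 4-byte nop */
```

A runner would then take an `insn_desc`, copy `len` bytes into the execution buffer, and jump to it, which is the shape of the exec_in_big_real_mode(&insn_nopl1) calls in the patch.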
Re: [PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m
On Thu, Jun 6, 2013 at 3:01 PM, Gleb Natapov g...@redhat.com wrote:

It would be nice, yes. I also have an idea on how to improve test reliability. Since the trick relies on the TLB being out of sync with the actual page table, a vmexit at the wrong time can break this assumption, and the wrong instruction will be emulated (the one from insn_page instead of alt_insn_page). If we make the instruction on insn_page place a special value somewhere and check it after the test, we can see whether the wrong instruction was executed and rerun the test.

If I commit the patch that merges these test cases, should it be based on what I committed before (on top of the patch in this mail), or on master?
Arthur Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 41 + 1 file changed, 41 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..3563971 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,45 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_bpl_modrm(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ + ulong *cr3 = (ulong *)read_cr3(); + uint16_t cx = 0; + + // Pad with RET instructions + memset(insn_page, 0xc3, 4096); + memset(alt_insn_page, 0xc3, 4096); + // Place a trapping instruction in the page to trigger a VMEXIT + insn_page[0] = 0x66; // mov $0x4321, %cx + insn_page[1] = 0xb9; + insn_page[2] = 0x21; + insn_page[3] = 0x43; + insn_page[4] = 0x89; // mov %eax, (%rax) + insn_page[5] = 0x00; + insn_page[6] = 0x90; // nop + // Place mov %cl, %bpl in alt_insn_page for emulator to execuate + // If emulator mistaken addressing %bpl, %cl may be moved to %ch + // %cx will be broken to 0x2121, not 0x4321 + alt_insn_page[4] = 0x40; + alt_insn_page[5] = 0x88; + alt_insn_page[6] = 0xcd; + + // Load the code TLB with insn_page, but point the page tables at + // alt_insn_page (and keep the data TLB clear, for AMD decode assist). + // This will make the CPU trap on the insn_page instruction but the + // hypervisor will see alt_insn_page. + install_page(cr3, virt_to_phys(insn_page), insn_ram); + // Load code TLB + invlpg(insn_ram); + asm volatile(call *%0 : : r(insn_ram+3)); + // Trap, let hypervisor emulate at alt_insn_page + install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); + asm volatile(call *%0 : : r(insn_ram), a(mem)); + asm volatile(:=c(cx)); Why not add the constrain to previous asm? I will merge them in next version. 
+	report("access bpl in modr/m", cx == 0x4321);
+}
+
 int main()
 {
 	void *mem;
@@ -964,6 +1003,8 @@ int main()
 	test_string_io_mmio(mem);
 
+	test_bpl_modrm(mem, insn_page, alt_insn_page, insn_ram);
+
 	printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
 	return fails ? 1 : 0;
 }
--
1.7.9.5
--
Gleb.
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org. More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m
On Thu, Jun 06, 2013 at 03:42:56PM +0800, 李春奇 Arthur Chunqi Li wrote: If I commit the patch to merge these test cases, should the patch be based on what I have committed before (after the patch in this mail), or on the master branch? Master. First patch provides the infrastructure. Second converts existing users. Third adds new tests.
Re: [PATCH] kvm-unit-tests: Test case of emulating multibyte NOP
On Thu, Jun 06, 2013 at 03:38:29PM +0800, Arthur Chunqi Li wrote: Add multibyte (1- to 4-byte) NOPL test cases to kvm-unit-tests x86/realmode.c. This test only consists of 16-bit NOPL instructions; other test cases (5- to 9-byte NOPL) should be placed in x86/emulator.c. Applied, thanks!

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/realmode.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 981be08..3546771 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1504,6 +1504,19 @@ static void test_fninit(void)
 	report("fninit", 0, fsw == 0 && (fcw & 0x103f) == 0x003f);
 }
 
+static void test_nopl(void)
+{
+	MK_INSN(nopl1, ".byte 0x90\n\r"); // 1 byte nop
+	MK_INSN(nopl2, ".byte 0x66, 0x90\n\r"); // 2 bytes nop
+	MK_INSN(nopl3, ".byte 0x0f, 0x1f, 0x00\n\r"); // 3 bytes nop
+	MK_INSN(nopl4, ".byte 0x0f, 0x1f, 0x40, 0x00\n\r"); // 4 bytes nop
+	exec_in_big_real_mode(&insn_nopl1);
+	exec_in_big_real_mode(&insn_nopl2);
+	exec_in_big_real_mode(&insn_nopl3);
+	exec_in_big_real_mode(&insn_nopl4);
+	report("nopl", 0, 1);
+}
+
 void realmode_start(void)
 {
 	test_null();
@@ -1548,6 +1561,7 @@ void realmode_start(void)
 	test_xlat();
 	test_salc();
 	test_fninit();
+	test_nopl();
 	exit(0);
 }
--
1.7.9.5
--
Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: Forcing a guest driver change is a really big deal and I see no reason to do that unless there's a compelling reason to. So we're stuck with the 1.0 config layout for a very long time. We definitely must not force a guest change. The explicit aim of the standard is that legacy and 1.0 be backward compatible. One deliverable is a document detailing how this is done (effectively a summary of changes between what we have and 1.0). If 2.0 is fully backwards compatible, great. It seems like such a difference that that would be impossible but I need to investigate further. Regards, Anthony Liguori If you look at my patches you'll see how it works. Basically old guests use BAR0 new ones don't, so it's easy: BAR0 access means legacy guest. Only started testing but things seem to work fine with old guests so far. I think we need a spec, not just driver code. Rusty what's the plan? Want me to write it? We need both, of course, but the spec work will happen in the OASIS WG. A draft is good, but let's not commit anything to upstream QEMU until we get the spec finalized. And that is proposed to be late this year. Well that would be quite sad really. This means we can't make virtio a spec compliant pci express device, and we can't add any more feature bits, so no flexible buffer optimizations for virtio net. There are probably more projects that will be blocked. 
So how about we keep extending the legacy layout for a bit longer: - add a way to access the device with MMIO - use feature bit 31 to signal 64-bit features (and shift device config accordingly). By my count, net still has 7 feature bits left, so I don't think the feature bits are likely to be a limitation in the next 6 months? Actually I count 5 net-specific ones: 3, 4 and then 25, 26, 27. Unless you count 31, even though it's a generic transport bit? That's still 6 ... MMIO is a bigger problem. Linux guests are happy with it: does it break the Windows drivers? Thanks, Rusty.
Regression after Remove support for reporting coalesced APIC IRQs
Hi Jan, I bisected [1] to f1ed0450a5fac7067590317cbf027f566b6ccbca. Fortunately, further investigation showed that it is not really related to removing APIC timer interrupt reinjection; the real problem is that we cannot assume that __apic_accept_irq() always injects interrupts, as the patch does, because the function skips interrupt injection if the APIC is disabled. This misreporting screws up RTC interrupt tracking, so further RTC interrupts stop being injected. The simplest solution that I see is to revert most of the commit and only leave APIC timer interrupt reinjection. If you have a more elegant solution, let me know. [1] https://bugzilla.kernel.org/show_bug.cgi?id=58931 -- Gleb.
Re: [PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m
On Thu, Jun 6, 2013 at 3:45 PM, Gleb Natapov g...@redhat.com wrote: Master. First patch provides the infrastructure. Second converts existing users. Third adds new tests. There are some problems packaging it. For some test cases, some registers must be set before calling into alt_insn_page, and some registers must be read back afterwards (such as test_movabs). If the emulator part is packed into a generic function, the setup done before and the results collected after may be invalid. Adding two caller-supplied functions for initialization and result collection may make the parameter list too long. Do you have any suggestions about this?
Re: [PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m
On Thu, Jun 06, 2013 at 05:33:52PM +0800, 李春奇 Arthur Chunqi Li wrote: There are some problems packaging it. For some test cases, some registers must be set before calling into alt_insn_page, and some registers must be read back afterwards (such as test_movabs). Adding two caller-supplied functions for initialization and result collection may make the parameter list too long.
Do you have any suggestions about this? You can do what realmode.c does. It has inregs and outregs structures; registers are set to the inregs values before a test and saved to outregs after. You can examine outregs to check correctness instead of checking HW registers directly.
RE: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support
This looks like a bit much for 3.10 (certainly, subject lines like "refactor" and "enhance" and "add support" aren't going to make Linus happy given that we're past rc4), so I think we should apply http://patchwork.ozlabs.org/patch/242896/ for 3.10. Then for 3.11, revert it after applying this patchset. Why not 1/6 plus e6500 removal? 1/6 is not a bugfix. Not sure I get it. Isn't this a better fix for the AltiVec build breakage:

-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 42
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 43
+#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 32
+#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 33

This removes the need for additional kvm_handlers. Obviously this doesn't make AltiVec work, so we still need to disable e6500. -Mike
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On 06.06.2013 08:42, Gleb Natapov wrote: I tried a 32-bit kernel compiled from the kvm.git next (3.10.0-rc2+) branch and upstream qemu and I cannot reproduce the problem. The guest boots fine. I had no success with the Debian kernel 3.10~rc4-1~exp1 (3.10-rc4-686-pae). The machine hangs after "Enabling APIC mode: Flat. Using 1 I/O APICs".
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On Thu, Jun 06, 2013 at 01:35:13PM +0200, Stefan Pietsch wrote: I had no success with the Debian kernel 3.10~rc4-1~exp1 (3.10-rc4-686-pae). The machine hangs after "Enabling APIC mode: Flat. Using 1 I/O APICs". OK, since it looks like it hangs during timer initialization, can you try to disable kvmclock? Add -cpu qemu64,-kvmclock to your command line. Also, can you provide the output of cat /proc/cpuinfo on your host? And the complete serial output before the hang. -- Gleb.
Re: [patch 2/2] tools: lkvm - Filter out cpu vendor string
On Tue, May 28, 2013 at 2:49 PM, Cyrill Gorcunov gorcu...@openvz.org wrote: If the cpu vendor string is not filtered in the case of an AMD host machine, we get unhandled msr reads:

| [1709265.368464] kvm: 25706: cpu6 unhandled rdmsr: 0xc0010048
| [1709265.397161] kvm: 25706: cpu7 unhandled rdmsr: 0xc0010048
| [1709265.425774] kvm: 25706: cpu8 unhandled rdmsr: 0xc0010048

Thus provide our own string, and the kernel will use generic cpu init.

Reported-by: Ingo Molnar mi...@kernel.org
CC: Pekka Enberg penb...@kernel.org
CC: Sasha Levin sasha.le...@oracle.com
CC: Asias He as...@redhat.com
Signed-off-by: Cyrill Gorcunov gorcu...@openvz.org
---
 tools/kvm/x86/cpuid.c | 8
 1 file changed, 8 insertions(+)

Index: linux-2.6.git/tools/kvm/x86/cpuid.c
===
--- linux-2.6.git.orig/tools/kvm/x86/cpuid.c
+++ linux-2.6.git/tools/kvm/x86/cpuid.c
@@ -12,6 +12,7 @@
 static void filter_cpuid(struct kvm_cpuid2 *kvm_cpuid)
 {
+	unsigned int signature[3];
 	unsigned int i;
 
 	/*
@@ -21,6 +22,13 @@ static void filter_cpuid(struct kvm_cpui
 		struct kvm_cpuid_entry2 *entry = &kvm_cpuid->entries[i];
 
 		switch (entry->function) {
+		case 0:
+			/* Vendor name */
+			memcpy(signature, "LKVMLKVMLKVM", 12);
+			entry->ebx = signature[0];
+			entry->ecx = signature[1];
+			entry->edx = signature[2];
+			break;
 		case 1:
 			/* Set X86_FEATURE_HYPERVISOR */
 			if (entry->index == 0)

Ping! Is there someone out there who has an AMD box they could test this on?
Re: [patch 2/2] tools: lkvm - Filter out cpu vendor string
On Thu, Jun 06, 2013 at 03:03:03PM +0300, Pekka Enberg wrote: Ping! Is there someone out there who has an AMD box they could test this on? I don't have it, sorry :-(
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On 06.06.2013 13:40, Gleb Natapov wrote: OK, since it looks like it hangs during timer initialization, can you try to disable kvmclock? Add -cpu qemu64,-kvmclock to your command line. Also can you provide the output of cat /proc/cpuinfo on your host? And complete serial output before hang.

command line: qemu-system-i386 -machine accel=kvm -m 512 -cpu qemu64,-kvmclock -cdrom grml32-full_2013.02.iso -serial file:ttyS0.log

/proc/cpuinfo:
##
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 14
model name	: Intel(R) Core(TM) Duo CPU L2400 @ 1.66GHz
stepping	: 12
microcode	: 0x54
cpu MHz		: 1000.000
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bogomips	: 3325.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 32 bits physical, 32 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 14
model name	: Intel(R) Core(TM) Duo CPU L2400 @ 1.66GHz
stepping	: 12
microcode	: 0x54
cpu MHz		: 1000.000
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bogomips	: 3325.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 32 bits
physical, 32 bits virtual
power management:

ttyS0.log:
##
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.7-1-grml-486 (t...@grml.org) (gcc version 4.7.2 (Debian 4.7.2-5) ) #1 Debian 3.7.9-1+grml.1
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1fffdfff] usable
[0.00] BIOS-e820: [mem 0x1fffe000-0x1fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[0.00] SMBIOS 2.4 present.
[0.00] Hypervisor detected: KVM
[0.00] e820: last_pfn = 0x1fffe max_arch_pfn = 0x10
[0.00] PAT not supported by CPU.
[0.00] found SMP MP-table at [mem 0x000fdb00-0x000fdb0f] mapped at [c00fdb00]
[0.00] init_memory_mapping: [mem 0x-0x1fffdfff]
[0.00] RAMDISK: [mem 0x1f33-0x1ffdbfff]
[0.00] ACPI: RSDP 000fd9a0 00014 (v00 BOCHS )
[0.00] ACPI: RSDT 1fffe4b0 00034 (v01 BOCHS BXPCRSDT 0001 BXPC 0001)
[0.00] ACPI: FACP 1f80 00074 (v01 BOCHS BXPCFACP 0001 BXPC 0001)
[0.00] ACPI: DSDT 1fffe4f0 011A9 (v01 BXPC BXDSDT 0001 INTL 20100528)
[0.00] ACPI: FACS 1f40 00040
[0.00] ACPI: SSDT 1800 00735 (v01 BOCHS BXPCSSDT 0001 BXPC 0001)
[0.00] ACPI: APIC 16e0 00078 (v01 BOCHS BXPCAPIC 0001 BXPC 0001)
[0.00] ACPI: HPET 16a0 00038 (v01 BOCHS BXPCHPET 0001 BXPC 0001)
[0.00] 0MB HIGHMEM available.
[0.00] 511MB LOWMEM available.
[0.00] mapped low ram: 0 - 1fffe000
[0.00] low ram: 0 - 1fffe000
[0.00] Zone ranges:
[0.00]   DMA [mem 0x0001-0x00ff]
[0.00]   Normal [mem 0x0100-0x1fffdfff]
[0.00]   HighMem empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node 0: [mem 0x0001-0x0009efff]
[0.00]   node 0: [mem 0x0010-0x1fffdfff]
[0.00] Using APIC driver default
[0.00] ACPI: PM-Timer IO Port: 0xb008
[0.00] ACPI:
[PATCH net 2/2] vhost: fix ubuf_info cleanup
vhost_net_clear_ubuf_info didn't clear ubuf_info after kfree; this could trigger a double free. Fix this and simplify the code to make it more robust: make sure ubuf_info is always freed through vhost_net_clear_ubuf_info.

Reported-by: Tommi Rantala tt.rant...@gmail.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6b00f64..7fc47f7 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -155,14 +155,11 @@ static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
 
 static void vhost_net_clear_ubuf_info(struct vhost_net *n)
 {
-
-	bool zcopy;
 	int i;
 
-	for (i = 0; i < n->dev.nvqs; ++i) {
-		zcopy = vhost_net_zcopy_mask & (0x1 << i);
-		if (zcopy)
-			kfree(n->vqs[i].ubuf_info);
+	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+		kfree(n->vqs[i].ubuf_info);
+		n->vqs[i].ubuf_info = NULL;
 	}
 }
 
@@ -171,7 +168,7 @@ int vhost_net_set_ubuf_info(struct vhost_net *n)
 	bool zcopy;
 	int i;
 
-	for (i = 0; i < n->dev.nvqs; ++i) {
+	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
 		zcopy = vhost_net_zcopy_mask & (0x1 << i);
 		if (!zcopy)
 			continue;
@@ -183,12 +180,7 @@ int vhost_net_set_ubuf_info(struct vhost_net *n)
 	return 0;
 
 err:
-	while (i--) {
-		zcopy = vhost_net_zcopy_mask & (0x1 << i);
-		if (!zcopy)
-			continue;
-		kfree(n->vqs[i].ubuf_info);
-	}
+	vhost_net_clear_ubuf_info(n);
 	return -ENOMEM;
 }
 
@@ -196,12 +188,12 @@ void vhost_net_vq_reset(struct vhost_net *n)
 {
 	int i;
 
+	vhost_net_clear_ubuf_info(n);
+
 	for (i = 0; i < VHOST_NET_VQ_MAX; i++) {
 		n->vqs[i].done_idx = 0;
 		n->vqs[i].upend_idx = 0;
 		n->vqs[i].ubufs = NULL;
-		kfree(n->vqs[i].ubuf_info);
-		n->vqs[i].ubuf_info = NULL;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
 	}
--
MST
[PATCH net 1/2] vhost: check owner before we overwrite ubuf_info
If the device has an owner, we shouldn't touch ubuf_info since it might be in use.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   | 4
 drivers/vhost/vhost.c | 8 +++-
 drivers/vhost/vhost.h | 1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2b51e23..6b00f64 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1053,6 +1053,10 @@ static long vhost_net_set_owner(struct vhost_net *n)
 	int r;
 
 	mutex_lock(&n->dev.mutex);
+	if (vhost_dev_has_owner(&n->dev)) {
+		r = -EBUSY;
+		goto out;
+	}
 	r = vhost_net_set_ubuf_info(n);
 	if (r)
 		goto out;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index beee7f5..60aa5ad 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -344,13 +344,19 @@ static int vhost_attach_cgroups(struct vhost_dev *dev)
 }
 
 /* Caller should have device mutex */
+bool vhost_dev_has_owner(struct vhost_dev *dev)
+{
+	return dev->mm;
+}
+
+/* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
 	struct task_struct *worker;
 	int err;
 
 	/* Is there an owner already? */
-	if (dev->mm) {
+	if (vhost_dev_has_owner(dev)) {
 		err = -EBUSY;
 		goto err_mm;
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a7ad635..64adcf9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -133,6 +133,7 @@ struct vhost_dev {
 
 long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
 long vhost_dev_set_owner(struct vhost_dev *dev);
+bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
 struct vhost_memory *vhost_dev_reset_owner_prepare(void);
 void vhost_dev_reset_owner(struct vhost_dev *, struct vhost_memory *);
--
MST
[PATCH net 0/2] vhost fixes for 3.10
Two patches fixing the fallout from the vhost cleanup in 3.10. Thanks to Tommi Rantala who reported the issue. Tommi, could you please confirm this fixes the crashes for you?

Michael S. Tsirkin (2):
  vhost: check owner before we overwrite ubuf_info
  vhost: fix ubuf_info cleanup

 drivers/vhost/net.c   | 26 +++---
 drivers/vhost/vhost.c |  8 +++-
 drivers/vhost/vhost.h |  1 +
 3 files changed, 19 insertions(+), 16 deletions(-)

--
MST
Re: [patch 2/2] tools: lkvm - Filter out cpu vendor string
On Thu, Jun 6, 2013 at 8:03 PM, Pekka Enberg penb...@kernel.org wrote: On Tue, May 28, 2013 at 2:49 PM, Cyrill Gorcunov gorcu...@openvz.org wrote: If the cpu vendor string is not filtered, in the case of an AMD host machine we get unhandled msr reads

| [1709265.368464] kvm: 25706: cpu6 unhandled rdmsr: 0xc0010048
| [1709265.397161] kvm: 25706: cpu7 unhandled rdmsr: 0xc0010048
| [1709265.425774] kvm: 25706: cpu8 unhandled rdmsr: 0xc0010048

thus provide our own string and the kernel will use generic cpu init.

Reported-by: Ingo Molnar mi...@kernel.org
CC: Pekka Enberg penb...@kernel.org
CC: Sasha Levin sasha.le...@oracle.com
CC: Asias He as...@redhat.com
Signed-off-by: Cyrill Gorcunov gorcu...@openvz.org
---
 tools/kvm/x86/cpuid.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-2.6.git/tools/kvm/x86/cpuid.c
===
--- linux-2.6.git.orig/tools/kvm/x86/cpuid.c
+++ linux-2.6.git/tools/kvm/x86/cpuid.c
@@ -12,6 +12,7 @@
 static void filter_cpuid(struct kvm_cpuid2 *kvm_cpuid)
 {
+	unsigned int signature[3];
 	unsigned int i;
 
 	/*
@@ -21,6 +22,13 @@ static void filter_cpui
 		struct kvm_cpuid_entry2 *entry = &kvm_cpuid->entries[i];
 
 		switch (entry->function) {
+		case 0:
+			/* Vendor name */
+			memcpy(signature, "LKVMLKVMLKVM", 12);
+			entry->ebx = signature[0];
+			entry->ecx = signature[1];
+			entry->edx = signature[2];
+			break;
 		case 1:
 			/* Set X86_FEATURE_HYPERVISOR */
 			if (entry->index == 0)

Ping! Is there someone out there who has an AMD box they could test this on? I will try to find one.

-- Asias
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/05/2013 11:34 PM, Gleb Natapov wrote: SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. In some ways it is even worse for PXE, since PXE is defined to work from 16-bit protected mode... but the mechanism to request a segment mapping for the high memory area doesn't actually work. -hpa
Re: [PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Wed, Jun 05, 2013 at 09:18:37PM -0400, Luiz Capitulino wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0, which in turn seems to generate an invalid virtio request. I didn't get this in practice, I found it by code review. On the other hand, such an invalid virtio request will cause errors in QEMU, and fill_balloon() also performs the same check implemented by this commit.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
Acked-by: Rafael Aquini aqu...@redhat.com
---
o v2 - Improve changelog

 drivers/virtio/virtio_balloon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bd3ae32..71af7b5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	tell_host(vb, vb->deflate_vq);

Luiz, sorry for not being clearer before. I was referring to adding a commentary in the code, to explain in short words why we should not get rid of this check point.

+	if (vb->num_pfns != 0)
+		tell_host(vb, vb->deflate_vq);
 	mutex_unlock(&vb->balloon_lock);

If the comment is regarded as unnecessary, then just ignore my suggestion. I'm OK with your patch. :)

Cheers!
-- Rafael
Re: [PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Thu, 6 Jun 2013 11:13:58 -0300 Rafael Aquini aqu...@redhat.com wrote: On Wed, Jun 05, 2013 at 09:18:37PM -0400, Luiz Capitulino wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0, which in turn seems to generate an invalid virtio request. I didn't get this in practice, I found it by code review. On the other hand, such an invalid virtio request will cause errors in QEMU, and fill_balloon() also performs the same check implemented by this commit.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
Acked-by: Rafael Aquini aqu...@redhat.com
---
o v2 - Improve changelog

 drivers/virtio/virtio_balloon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bd3ae32..71af7b5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	tell_host(vb, vb->deflate_vq);

Luiz, sorry for not being clearer before. I was referring to adding a commentary in the code, to explain in short words why we should not get rid of this check point.

Oh.

+	if (vb->num_pfns != 0)
+		tell_host(vb, vb->deflate_vq);
 	mutex_unlock(&vb->balloon_lock);

If the comment is regarded as unnecessary, then just ignore my suggestion. I'm OK with your patch. :)

IMHO, the code is clear enough.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Hi Rusty, Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices, potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. Now you have a different compatibility problem; how do you know the guest supports the new virtio-pcie net? We don't care. We would still use virtio-pci for existing devices. Only new devices would use virtio-pcie. If you put a virtio-pci card behind a PCI-e bridge today, it's not compliant, but AFAICT it will Just Work. (Modulo the 16-dev limit). I believe you can put it in legacy mode and then there isn't the 16-dev limit. I believe the only advantage of putting it in native mode is that then you can do native hotplug (as opposed to ACPI hotplug). So sticking with virtio-pci seems reasonable to me. I've been assuming we'd avoid a flag day change; that devices would look like existing virtio-pci with capabilities indicating the new config layout. I don't think that's feasible. Maybe 5 or 10 years from now, we switch the default adapter to virtio-pcie. I think 4 is the best path forward. It's better for users (guests continue to work as they always have). There's less confusion about enabling PCI-e support--you must ask for the virtio-pcie variant and you must have a virtio-pcie driver. It's easy to explain. Removing both forward and backward compatibility is easy to explain, but I think it'll be harder to deploy. This is your area though, so perhaps I'm wrong. My concern is that it's not real backwards compatibility. It also maps to what regular hardware does. I highly doubt that there are any real PCI cards that made the shift from PCI to PCI-e without bumping at least a revision ID. No one expected the new cards to Just Work with old OSes: a new machine meant a new OS and new drivers. Hardware vendors like that. Yup. 
Since virtualization often involves legacy, our priorities might be different. So realistically, I think if we introduce virtio-pcie with a different vendor ID, it will be adopted fairly quickly. The drivers will show up in distros quickly and get backported. New devices can be limited to supporting virtio-pcie and we'll certainly provide a way to use old devices with virtio-pcie too. But for practical reasons, I think we have to continue using virtio-pci by default. Regards, Anthony Liguori Cheers, Rusty.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. SeaBIOS is polling for completion anyway. Regards, Anthony Liguori -- Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/06/13 08:34, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. Exactly. It's only the initialization code which has ASSERT32FLAT() all over the place. Which actually is the majority of the code in most cases, as all the hardware detection and initialization code is there. But kicking I/O requests must work from 16bit mode too. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. For seabios itself this isn't a big issue, see pci_{readl,writel} in src/pci.c. When called in 16bit mode it goes into 32bit mode temporarily, just for accessing the mmio register. The ahci driver uses it, the xhci driver (wip atm) will use that too, and the virtio-{blk,scsi} drivers in seabios can do the same. But as hpa mentioned it will be more tricky for option roms (aka virtio-net). cheers, Gerd
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, Jun 06, 2013 at 05:06:32PM +0200, Gerd Hoffmann wrote: On 06/06/13 08:34, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. Exactly. It's only the initialization code which has ASSERT32FLAT() all over the place. Which actually is the majority of the code in most cases, as all the hardware detection and initialization code is there. But kicking I/O requests must work from 16bit mode too. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. For seabios itself this isn't a big issue, see pci_{readl,writel} in src/pci.c. When called in 16bit mode it goes into 32bit mode temporarily, just for accessing the mmio register. The ahci driver uses it, the xhci driver (wip atm) will use that too, and the virtio-{blk,scsi} drivers in seabios can do the same. Isn't this approach broken? How can SeaBIOS be sure it restores the real mode registers to exactly the same state they were in before entering 32bit mode? But as hpa mentioned it will be more tricky for option roms (aka virtio-net). cheers, Gerd -- Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/06/2013 08:10 AM, Gleb Natapov wrote: On Thu, Jun 06, 2013 at 05:06:32PM +0200, Gerd Hoffmann wrote: Isn't this approach broken? How can SeaBIOS be sure it restores the real mode registers to exactly the same state they were in before entering 32bit mode? It can't... so yes, it is broken. -hpa
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Hi,

For seabios itself this isn't a big issue, see pci_{readl,writel} in src/pci.c. When called in 16bit mode it goes into 32bit mode temporarily, just for accessing the mmio register. The ahci driver uses it, the xhci driver (wip atm) will use that too, and the virtio-{blk,scsi} drivers in seabios can do the same. Isn't this approach broken? How can SeaBIOS be sure it restores the real mode registers to exactly the same state they were in before entering 32bit mode? Don't know the details of that magic. Kevin had some concerns on the stability of this, so maybe there is a theoretic hole. So far I haven't seen any issues in practice, but also didn't stress it too much. Basically I only used that with all kinds of boot loaders; it could very well be it breaks if you try to use it with more esoteric stuff such as dos extenders, then hit unhandled corner cases ... cheers, Gerd
[PATCH 1/2] kvm-unit-tests: Add a func to run instruction in emulator
Add a function trap_emulator to run an instruction in the emulator. Set inregs first (%rax cannot be used because it is used as the return address), put the instruction code in alt_insn and call the function with alt_insn_length. Get results in outregs.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 96576e5..8ab9904 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -11,6 +11,14 @@ int fails, tests;
 
 static int exceptions;
 
+struct regs {
+	u64 rax, rbx, rcx, rdx;
+	u64 rsi, rdi, rsp, rbp;
+	u64 rip, rflags;
+};
+
+static struct regs inregs, outregs;
+
 void report(const char *name, int result)
 {
 	++tests;
@@ -685,6 +693,79 @@ static void test_shld_shrd(u32 *mem)
 	report("shrd (cl)", *mem == ((0x12345678 >> 3) | (5u << 29)));
 }
 
+static void trap_emulator(uint64_t *mem, uint8_t *insn_page,
+			  uint8_t *alt_insn_page, void *insn_ram,
+			  uint8_t *alt_insn, int alt_insn_length)
+{
+	ulong *cr3 = (ulong *)read_cr3();
+	int i;
+
+	// Pad with RET instructions
+	memset(insn_page, 0xc3, 4096);
+	memset(alt_insn_page, 0xc3, 4096);
+
+	// Place a trapping instruction in the page to trigger a VMEXIT
+	insn_page[0] = 0x89; // mov %eax, (%rax)
+	insn_page[1] = 0x00;
+	insn_page[2] = 0x90; // nop
+	insn_page[3] = 0xc3; // ret
+
+	// Place the instruction we want the hypervisor to see in the alternate page
+	for (i = 0; i < alt_insn_length; i++)
+		alt_insn_page[i] = alt_insn[i];
+
+	// Save general registers
+	asm volatile(
+		"push %rax\n\r"
+		"push %rbx\n\r"
+		"push %rcx\n\r"
+		"push %rdx\n\r"
+		"push %rsi\n\r"
+		"push %rdi\n\r"
+		);
+
+	// Load the code TLB with insn_page, but point the page tables at
+	// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
+	// This will make the CPU trap on the insn_page instruction but the
+	// hypervisor will see alt_insn_page.
+	install_page(cr3, virt_to_phys(insn_page), insn_ram);
+	invlpg(insn_ram);
+	// Load code TLB
+	asm volatile("call *%0" : : "r"(insn_ram + 3));
+	install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
+	// Trap, let hypervisor emulate at alt_insn_page
+	asm volatile(
+		"call *%1\n\r"
+
+		"mov %%rax, 0+%[outregs]\n\t"
+		"mov %%rbx, 8+%[outregs]\n\t"
+		"mov %%rcx, 16+%[outregs]\n\t"
+		"mov %%rdx, 24+%[outregs]\n\t"
+		"mov %%rsi, 32+%[outregs]\n\t"
+		"mov %%rdi, 40+%[outregs]\n\t"
+		"mov %%rsp, 48+%[outregs]\n\t"
+		"mov %%rbp, 56+%[outregs]\n\t"
+
+		/* Save RFLAGS in outregs */
+		"pushf\n\t"
+		"popq 72+%[outregs]\n\t"
+		: [outregs]"+m"(outregs)
+		: "r"(insn_ram),
+		  "a"(mem), "b"(inregs.rbx),
+		  "c"(inregs.rcx), "d"(inregs.rdx),
+		  "S"(inregs.rsi), "D"(inregs.rdi)
+		: "memory", "cc"
+		);
+
+	// Restore general registers (reverse order of the pushes above)
+	asm volatile(
+		"pop %rdi\n\r"
+		"pop %rsi\n\r"
+		"pop %rdx\n\r"
+		"pop %rcx\n\r"
+		"pop %rbx\n\r"
+		"pop %rax\n\r"
+		);
+}
+
 static void advance_rip_by_3_and_note_exception(struct ex_regs *regs)
 {
 	++exceptions;
--
1.7.9.5
[PATCH 2/2] kvm-unit-tests: Change two cases to use trap_emulator
Change two functions (test_mmx_movq_mf and test_movabs) to use the unified trap_emulator.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c | 66 ++++++++--------------------------------------
 1 file changed, 14 insertions(+), 52 deletions(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index 8ab9904..fa8993f 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -776,72 +776,34 @@ static void test_mmx_movq_mf(uint64_t *mem, uint8_t *insn_page,
 			     uint8_t *alt_insn_page, void *insn_ram)
 {
 	uint16_t fcw = 0; // all exceptions unmasked
-	ulong *cr3 = (ulong *)read_cr3();
+	uint8_t alt_insn[] = {0x0f, 0x7f, 0x00}; // movq %mm0, (%rax)
 
 	write_cr0(read_cr0() & ~6); // TS, EM
-	// Place a trapping instruction in the page to trigger a VMEXIT
-	insn_page[0] = 0x89; // mov %eax, (%rax)
-	insn_page[1] = 0x00;
-	insn_page[2] = 0x90; // nop
-	insn_page[3] = 0xc3; // ret
-	// Place the instruction we want the hypervisor to see in the alternate page
-	alt_insn_page[0] = 0x0f; // movq %mm0, (%rax)
-	alt_insn_page[1] = 0x7f;
-	alt_insn_page[2] = 0x00;
-	alt_insn_page[3] = 0xc3; // ret
-
 	exceptions = 0;
 	handle_exception(MF_VECTOR, advance_rip_by_3_and_note_exception);
-
-	// Load the code TLB with insn_page, but point the page tables at
-	// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
-	// This will make the CPU trap on the insn_page instruction but the
-	// hypervisor will see alt_insn_page.
-	install_page(cr3, virt_to_phys(insn_page), insn_ram);
 	asm volatile("fninit; fldcw %0" : : "m"(fcw));
 	asm volatile("fldz; fldz; fdivp"); // generate exception
-	invlpg(insn_ram);
-	// Load code TLB
-	asm volatile("call *%0" : : "r"(insn_ram + 3));
-	install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
-	// Trap, let hypervisor emulate at alt_insn_page
-	asm volatile("call *%0" : : "r"(insn_ram), "a"(mem));
+
+	inregs = (struct regs){ 0 };
+	trap_emulator(mem, insn_page, alt_insn_page, insn_ram,
+		      alt_insn, 3);
 	// exit MMX mode
 	asm volatile("fnclex; emms");
-	report("movq mmx generates #MF", exceptions == 1);
+	report("movq mmx generates #MF2", exceptions == 1);
 	handle_exception(MF_VECTOR, 0);
 }
 
 static void test_movabs(uint64_t *mem, uint8_t *insn_page,
 			uint8_t *alt_insn_page, void *insn_ram)
 {
-	uint64_t val = 0;
-	ulong *cr3 = (ulong *)read_cr3();
-
-	// Pad with RET instructions
-	memset(insn_page, 0xc3, 4096);
-	memset(alt_insn_page, 0xc3, 4096);
-	// Place a trapping instruction in the page to trigger a VMEXIT
-	insn_page[0] = 0x89; // mov %eax, (%rax)
-	insn_page[1] = 0x00;
-	// Place the instruction we want the hypervisor to see in the alternate
-	// page. A buggy hypervisor will fetch a 32-bit immediate and return
-	// 0xc3c3c3c3.
-	alt_insn_page[0] = 0x48; // mov $0xc3c3c3c3c3c3c3c3, %rcx
-	alt_insn_page[1] = 0xb9;
-
-	// Load the code TLB with insn_page, but point the page tables at
-	// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
-	// This will make the CPU trap on the insn_page instruction but the
-	// hypervisor will see alt_insn_page.
-	install_page(cr3, virt_to_phys(insn_page), insn_ram);
-	// Load code TLB
-	invlpg(insn_ram);
-	asm volatile("call *%0" : : "r"(insn_ram + 3));
-	// Trap, let hypervisor emulate at alt_insn_page
-	install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
-	asm volatile("call *%1" : "=c"(val) : "r"(insn_ram), "a"(mem), "c"(0));
-	report("64-bit mov imm", val == 0xc3c3c3c3c3c3c3c3);
+	// mov $0xc3c3c3c3c3c3c3c3, %rcx
+	uint8_t alt_insn[] = {0x48, 0xb9, 0xc3, 0xc3, 0xc3,
+			      0xc3, 0xc3, 0xc3, 0xc3, 0xc3};
+	inregs = (struct regs){ .rcx = 0 };
+
+	trap_emulator(mem, insn_page, alt_insn_page, insn_ram,
+		      alt_insn, 10);
+	report("64-bit mov imm2", outregs.rcx == 0xc3c3c3c3c3c3c3c3);
 }
 
 static void test_crosspage_mmio(volatile uint8_t *mem)
--
1.7.9.5
RE: Regression after Remove support for reporting coalesced APIC IRQs
-----Original Message-----
From: Gleb Natapov [mailto:g...@redhat.com]
Sent: Thursday, June 06, 2013 4:54 PM
To: Jan Kiszka
Cc: kvm@vger.kernel.org; Ren, Yongjie
Subject: Regression after "Remove support for reporting coalesced APIC IRQs"

Hi Jan,

I bisected [1] to f1ed0450a5fac7067590317cbf027f566b6ccbca.

Right. Before Gleb's mail, I also just did the bisection for this bug. The first bad commit is Jan's f1ed0450.

Fortunately, further investigation showed that it is not really related to removing APIC timer interrupt reinjection; the real problem is that we cannot assume that __apic_accept_irq() always injects interrupts like the patch does, because the function skips interrupt injection if the APIC is disabled. This misreporting screws up RTC interrupt tracking, so further RTC interrupts stop being injected. The simplest solution that I see is to revert most of the commit and only leave APIC timer interrupt reinjection. If you have a more elegant solution let me know.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=58931

--
Gleb.
[Bug 58931] SMP x64 Windows 2003 guest can't boot up
https://bugzilla.kernel.org/show_bug.cgi?id=58931

--- Comment #1 from Jay Ren yongjie@intel.com 2013-06-06 15:40:58 ---
After bisection, we found the first bad commit is: f1ed0450a5fac7067590317cbf027f566b6ccbca

commit f1ed0450a5fac7067590317cbf027f566b6ccbca
Author: Jan Kiszka jan.kis...@siemens.com
Date: Sun Apr 28 14:00:41 2013 +0200

    KVM: x86: Remove support for reporting coalesced APIC IRQs

Gleb and Jan are trying to fix it.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
[PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
lwepx faults need to be handled by KVM and this implies additional code in the DO_KVM macro to identify the source of the exception originated in host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by searching for the physical address and kmap it. This fixes an infinite loop caused by lwepx's data TLB miss handled in the host and the TODO for TLB eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/include/asm/mmu-book3e.h |  6 ++-
 arch/powerpc/kvm/booke.c              |  6 +++
 arch/powerpc/kvm/booke.h              |  2 +
 arch/powerpc/kvm/bookehv_interrupts.S | 32 ++-
 arch/powerpc/kvm/e500.c               |  4 ++
 arch/powerpc/kvm/e500mc.c             | 69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@
-836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);
+
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);
 
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults. PR-mode
-	 * booke doesn't handle it either. Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry. On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here.
Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
+	 * We don't use external PID support. lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which
+	 * is too intrusive for the host. Get the last instruction in
+	 * kvmppc_handle_exit().
 	 */
-
-	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID
[PATCH 1/2] KVM: PPC: e500mc: Revert "add load inst fixup"
lwepx faults need to be handled by KVM. With the current solution the host kernel searches for the faulting address using its LPID context. If a host translation is found we return to the lwepx instruction instead of the fixup, ending up in an infinite loop. Revert commit 1d628af7 "add load inst fixup". We will address the lwepx issue in a subsequent patch without the need for fixups.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/kvm/bookehv_interrupts.S | 26 +-
 1 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..20c7a54 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -29,7 +29,6 @@
 #include <asm/asm-compat.h>
 #include <asm/asm-offsets.h>
 #include <asm/bitsperlong.h>
-#include <asm/thread_info.h>
 
 #ifdef CONFIG_64BIT
 #include <asm/exception-64e.h>
@@ -162,32 +161,9 @@
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
 	mtspr	SPRN_EPLC, r8
-
-	/* disable preemption, so we are sure we hit the fixup handler */
-	CURRENT_THREAD_INFO(r8, r1)
-	li	r7, 1
-	stw	r7, TI_PREEMPT(r8)
-	isync
-
-	/*
-	 * In case the read goes wrong, we catch it and write an invalid value
-	 * in LAST_INST instead.
-	 */
-1:	lwepx	r9, 0, r5
-2:
-.section .fixup, "ax"
-3:	li	r9, KVM_INST_FETCH_FAILED
-	b	2b
-.previous
-.section __ex_table,"a"
-	PPC_LONG_ALIGN
-	PPC_LONG 1b,3b
-.previous
-
+	lwepx	r9, 0, r5
 	mtspr	SPRN_EPLC, r3
-	li	r7, 0
-	stw	r7, TI_PREEMPT(r8)
 	stw	r9, VCPU_LAST_INST(r4)
 	.endif
--
1.7.4.1
Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx
Hello.

On 06/06/2013 07:27 AM, Jason Wang wrote: When we decide not to use zero-copy, msg.control should be set to NULL, otherwise macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs wrongly. The bug was introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84 (vhost-net: skip head management if no outstanding). This solves the following warnings:

WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
 a0198323 88007c9ebd08 81796b73 88007c9ebd48
 8103d66b 7b773e20 8800779f 8800779f43f0
 8800779f8418 015c 0062 88007c9ebd58
Call Trace:
 [81796b73] dump_stack+0x19/0x1e
 [8103d66b] warn_slowpath_common+0x6b/0xa0
 [8103d6b5] warn_slowpath_null+0x15/0x20
 [a0197627] handle_tx+0x477/0x4b0 [vhost_net]
 [a0197690] handle_tx_kick+0x10/0x20 [vhost_net]
 [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net]
 [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [81061f46] kthread+0xc6/0xd0
 [81061e80] ? kthread_freezable_should_stop+0x70/0x70
 [817a1aec] ret_from_fork+0x7c/0xb0
 [81061e80] ? kthread_freezable_should_stop+0x70/0x70

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/vhost/net.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2b51e23..b07d96b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net)
 				kref_get(&ubufs->kref);
 			}
 			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
-		}
+		} else

You have to use {} on the *else* branch if you have it on the *if* branch (and vice versa), according to Documentation/CodingStyle.

checkpatch.pl didn't complain about this, will send v2. 
Yes, this check is what it still seems to lack. Thanks WBR, Sergei
Re: [PATCH net 0/2] vhost fixes for 3.10
2013/6/6 Michael S. Tsirkin m...@redhat.com: Two patches fixing the fallout from the vhost cleanup in 3.10. Thanks to Tommi Rantala who reported the issue. Tommi, could you please confirm this fixes the crashes for you? Confirmed! With the two patches applied, I can no longer reproduce the crash with trinity. Thanks! Tommi Michael S. Tsirkin (2): vhost: check owner before we overwrite ubuf_info vhost: fix ubuf_info cleanup drivers/vhost/net.c | 26 +++--- drivers/vhost/vhost.c | 8 +++- drivers/vhost/vhost.h | 1 + 3 files changed, 19 insertions(+), 16 deletions(-) -- MST
Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support
On 06/06/2013 04:42:44 AM, Caraman Mihai Claudiu-B02008 wrote: This looks like a bit much for 3.10 (certainly, subject lines like refactor and enhance and add support aren't going to make Linus happy given that we're past rc4), so I think we should apply http://patchwork.ozlabs.org/patch/242896/ for 3.10. Then for 3.11, revert it after applying this patchset. Why not 1/6 plus e6500 removal? 1/6 is not a bugfix. Not sure I get it. Isn't this a better fix for the AltiVec build breakage: -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 42 -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 43 +#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 32 +#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 33 This removes the need for additional kvm_handlers. Obviously this doesn't make AltiVec work, so we still need to disable e6500. OK, didn't realize you meant it as an alternative fix to what was in my patch. -Scott
[PATCH 6/8] kvm/ppc: IRQ disabling cleanup
Simplify the handling of lazy EE by going directly from fully-enabled to hard-disabled. This replaces the lazy_irq_pending() check (including its misplaced kvm_guest_exit() call). As suggested by Tiejun Chen, move the interrupt disabling into kvmppc_prepare_to_enter() rather than have each caller do it. Also move the IRQ enabling on heavyweight exit into kvmppc_prepare_to_enter(). Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/include/asm/kvm_ppc.h |6 ++ arch/powerpc/kvm/book3s_pr.c | 12 +++- arch/powerpc/kvm/booke.c | 11 +++ arch/powerpc/kvm/powerpc.c | 23 ++- 4 files changed, 22 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 6885846..e4474f8 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void) trace_hardirqs_on(); #ifdef CONFIG_PPC64 + /* +* To avoid races, the caller must have gone directly from having +* interrupts fully-enabled to hard-disabled. +*/ + WARN_ON(local_paca-irq_happened != PACA_IRQ_HARD_DIS); + /* Only need to enable IRQs by hard enabling them after this */ local_paca-irq_happened = 0; local_paca-soft_enabled = 1; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 0b97ce4..e61e39e 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -884,14 +884,11 @@ program_interrupt: * and if we really did time things so badly, then we just exit * again due to a host external interrupt. */ - local_irq_disable(); s = kvmppc_prepare_to_enter(vcpu); - if (s = 0) { - local_irq_enable(); + if (s = 0) r = s; - } else { + else kvmppc_fix_ee_before_entry(); - } } trace_kvm_book3s_reenter(r, vcpu); @@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) * really did time things so badly, then we just exit again due to * a host external interrupt. 
*/ - local_irq_disable(); ret = kvmppc_prepare_to_enter(vcpu); - if (ret = 0) { - local_irq_enable(); + if (ret = 0) goto out; - } /* Save FPU state in stack */ if (current-thread.regs-msr MSR_FP) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 08f4aa1..c5270a3 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -617,7 +617,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu) local_irq_enable(); kvm_vcpu_block(vcpu); clear_bit(KVM_REQ_UNHALT, vcpu-requests); - local_irq_disable(); + hard_irq_disable(); kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS); r = 1; @@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) return -EINVAL; } - local_irq_disable(); s = kvmppc_prepare_to_enter(vcpu); if (s = 0) { - local_irq_enable(); ret = s; goto out; } @@ -1161,14 +1159,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, * aren't already exiting to userspace for some other reason. */ if (!(r RESUME_HOST)) { - local_irq_disable(); s = kvmppc_prepare_to_enter(vcpu); - if (s = 0) { - local_irq_enable(); + if (s = 0) r = (s 2) | RESUME_HOST | (r RESUME_FLAG_NV); - } else { + else kvmppc_fix_ee_before_entry(); - } } return r; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 4e05f8c..2f7a221 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) { int r = 1; - WARN_ON_ONCE(!irqs_disabled()); + WARN_ON(irqs_disabled()); + hard_irq_disable(); + while (true) { if (need_resched()) { local_irq_enable(); cond_resched(); - local_irq_disable(); + hard_irq_disable(); continue; } @@ -95,7 +97,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) local_irq_enable(); trace_kvm_check_requests(vcpu); r =
[PATCH 3/8] kvm/ppc/booke: Hold srcu lock when calling gfn functions
KVM core expects arch code to acquire the srcu lock when calling gfn_to_memslot and similar functions. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/kvm/44x_tlb.c |5 + arch/powerpc/kvm/booke.c|7 +++ arch/powerpc/kvm/e500_mmu.c |5 + 3 files changed, 17 insertions(+) diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c index 5dd3ab4..ed03854 100644 --- a/arch/powerpc/kvm/44x_tlb.c +++ b/arch/powerpc/kvm/44x_tlb.c @@ -441,6 +441,7 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws) struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu); struct kvmppc_44x_tlbe *tlbe; unsigned int gtlb_index; + int idx; gtlb_index = kvmppc_get_gpr(vcpu, ra); if (gtlb_index = KVM44x_GUEST_TLB_SIZE) { @@ -473,6 +474,8 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws) return EMULATE_FAIL; } + idx = srcu_read_lock(vcpu-kvm-srcu); + if (tlbe_is_host_safe(vcpu, tlbe)) { gva_t eaddr; gpa_t gpaddr; @@ -489,6 +492,8 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws) kvmppc_mmu_map(vcpu, eaddr, gpaddr, gtlb_index); } + srcu_read_unlock(vcpu-kvm-srcu, idx); + trace_kvm_gtlb_write(gtlb_index, tlbe-tid, tlbe-word0, tlbe-word1, tlbe-word2); diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 1020119..ecbe908 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -832,6 +832,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, { int r = RESUME_HOST; int s; + int idx; /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); @@ -1053,6 +1054,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, break; } + idx = srcu_read_lock(vcpu-kvm-srcu); + gpaddr = kvmppc_mmu_xlate(vcpu, gtlb_index, eaddr); gfn = gpaddr PAGE_SHIFT; @@ -1075,6 +1078,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_account_exit(vcpu, MMIO_EXITS); } + srcu_read_unlock(vcpu-kvm-srcu, idx); break; } @@ -1098,6 
+1102,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_account_exit(vcpu, ITLB_VIRT_MISS_EXITS); + idx = srcu_read_lock(vcpu-kvm-srcu); + gpaddr = kvmppc_mmu_xlate(vcpu, gtlb_index, eaddr); gfn = gpaddr PAGE_SHIFT; @@ -1114,6 +1120,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_MACHINE_CHECK); } + srcu_read_unlock(vcpu-kvm-srcu, idx); break; } diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c index c41a5a9..6d6f153 100644 --- a/arch/powerpc/kvm/e500_mmu.c +++ b/arch/powerpc/kvm/e500_mmu.c @@ -396,6 +396,7 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu) struct kvm_book3e_206_tlb_entry *gtlbe; int tlbsel, esel; int recal = 0; + int idx; tlbsel = get_tlb_tlbsel(vcpu); esel = get_tlb_esel(vcpu, tlbsel); @@ -430,6 +431,8 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu) kvmppc_set_tlb1map_range(vcpu, gtlbe); } + idx = srcu_read_lock(vcpu-kvm-srcu); + /* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */ if (tlbe_is_host_safe(vcpu, gtlbe)) { u64 eaddr = get_tlb_eaddr(gtlbe); @@ -444,6 +447,8 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu) kvmppc_mmu_map(vcpu, eaddr, raddr, index_of(tlbsel, esel)); } + srcu_read_unlock(vcpu-kvm-srcu, idx); + kvmppc_set_exit_type(vcpu, EMULATED_TLBWE_EXITS); return EMULATE_DONE; } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/8] kvm/ppc: fixes for 3.10
Most of these have been posted before, but I grouped them together as there are some contextual dependencies between them. Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these if there's no objection over the next few days? Mihai Caraman (1): kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage Scott Wood (7): kvm/ppc/booke64: Disable e6500 support kvm/ppc/booke: Hold srcu lock when calling gfn functions kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit() kvm/ppc: Call trace_hardirqs_on before entry kvm/ppc: IRQ disabling cleanup kvm/ppc/booke: Delay kvmppc_fix_ee_before_entry kvm/ppc/booke: Don't call kvm_guest_enter twice arch/powerpc/include/asm/kvm_asm.h | 16 ++-- arch/powerpc/include/asm/kvm_ppc.h | 17 ++--- arch/powerpc/kvm/44x_tlb.c |5 + arch/powerpc/kvm/book3s_pr.c | 16 +--- arch/powerpc/kvm/booke.c | 36 arch/powerpc/kvm/e500_mmu.c|5 + arch/powerpc/kvm/e500mc.c |2 -- arch/powerpc/kvm/powerpc.c | 25 ++--- 8 files changed, 73 insertions(+), 49 deletions(-) -- 1.7.10.4
[PATCH 7/8] kvm/ppc/booke: Delay kvmppc_fix_ee_before_entry
kvmppc_fix_ee_before_entry() should be called as late as possible, or else we get things like WARN_ON(preemptible()) in enable_kernel_fp() in configurations where preemptible() works. Note that book3s_pr already waits until just before __kvmppc_vcpu_run to call kvmppc_fix_ee_before_entry(). Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/kvm/booke.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index c5270a3..f953324 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -671,7 +671,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) ret = s; goto out; } - kvmppc_fix_ee_before_entry(); kvm_guest_enter(); @@ -697,6 +696,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) kvmppc_load_guest_fp(vcpu); #endif + kvmppc_fix_ee_before_entry(); + ret = __kvmppc_vcpu_run(kvm_run, vcpu); /* No need for kvm_guest_exit. It's done in handle_exit. -- 1.7.10.4
[PATCH 4/8] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
EE is hard-disabled on entry to kvmppc_handle_exit(), so call hard_irq_disable() so that PACA_IRQ_HARD_DIS is set and soft_enabled is unset. Without this, we get warnings such as arch/powerpc/kernel/time.c:300, and sometimes host kernel hangs. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/kvm/booke.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ecbe908..5cd7ad0 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -834,6 +834,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, int s; int idx; +#ifdef CONFIG_PPC64 + WARN_ON(local_paca->irq_happened != 0); +#endif + + /* +* We enter with interrupts disabled in hardware, but +* we need to call hard_irq_disable anyway to ensure that +* the software state is kept in sync. +*/ + hard_irq_disable(); + /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu);
[PATCH 5/8] kvm/ppc: Call trace_hardirqs_on before entry
Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is consistent with lazy ee state, and so that we don't track more host code as interrupts-enabled than necessary. Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect that this function now has a role on 32-bit as well. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/include/asm/kvm_ppc.h | 11 --- arch/powerpc/kvm/book3s_pr.c |4 ++-- arch/powerpc/kvm/booke.c |4 ++-- arch/powerpc/kvm/powerpc.c |2 -- 4 files changed, 12 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index a5287fe..6885846 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn) } } -/* Please call after prepare_to_enter. This function puts the lazy ee state - back to normal mode, without actually enabling interrupts. */ -static inline void kvmppc_lazy_ee_enable(void) +/* + * Please call after prepare_to_enter. This function puts the lazy ee and irq + * disabled tracking state back to normal mode, without actually enabling + * interrupts. 
+ */ +static inline void kvmppc_fix_ee_before_entry(void) { + trace_hardirqs_on(); + #ifdef CONFIG_PPC64 /* Only need to enable IRQs by hard enabling them after this */ local_paca-irq_happened = 0; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index bdc40b8..0b97ce4 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -890,7 +890,7 @@ program_interrupt: local_irq_enable(); r = s; } else { - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); } } @@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) if (vcpu-arch.shared-msr MSR_FP) kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP); - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); ret = __kvmppc_vcpu_run(kvm_run, vcpu); diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 5cd7ad0..08f4aa1 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) ret = s; goto out; } - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); kvm_guest_enter(); @@ -1167,7 +1167,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, local_irq_enable(); r = (s 2) | RESUME_HOST | (r RESUME_FLAG_NV); } else { - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); } } diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 6316ee3..4e05f8c 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) kvm_guest_exit(); continue; } - - trace_hardirqs_on(); #endif kvm_guest_enter(); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 8/8] kvm/ppc/booke: Don't call kvm_guest_enter twice
kvm_guest_enter() was already called by kvmppc_prepare_to_enter(). Don't call it again. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/kvm/booke.c |2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index f953324..0b4d792 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -672,8 +672,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) goto out; } - kvm_guest_enter(); - #ifdef CONFIG_PPC_FPU /* Save userspace FPU state in stack */ enable_kernel_fp(); -- 1.7.10.4
[PATCH 1/8] kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
From: Mihai Caraman mihai.cara...@freescale.com Interrupt numbers defined for Book3E follow the IVOR definitions. Align BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this rule, which also fixes the build breakage. IVORs 32 and 33 are shared, so reflect this in the interrupt naming. This fixes a build break for 64-bit booke KVM. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/include/asm/kvm_asm.h | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index b9dd382..851bac7 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -54,8 +54,16 @@ #define BOOKE_INTERRUPT_DEBUG 15 /* E500 */ -#define BOOKE_INTERRUPT_SPE_UNAVAIL 32 -#define BOOKE_INTERRUPT_SPE_FP_DATA 33 +#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32 +#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33 +/* + * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines + */ +#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL +#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST +#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL +#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \ + BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST #define BOOKE_INTERRUPT_SPE_FP_ROUND 34 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35 #define BOOKE_INTERRUPT_DOORBELL 36 @@ -67,10 +75,6 @@ #define BOOKE_INTERRUPT_HV_SYSCALL 40 #define BOOKE_INTERRUPT_HV_PRIV 41 -/* altivec */ -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 42 -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 43 - /* book3s */ #define BOOK3S_INTERRUPT_SYSTEM_RESET 0x100 -- 1.7.10.4
[PATCH 2/8] kvm/ppc/booke64: Disable e6500 support
The previous patch made 64-bit booke KVM build again, but Altivec support is still not complete, and we can't prevent the guest from turning on Altivec (which can corrupt host state until state save/restore is implemented). Disable e6500 on KVM until this is fixed. Signed-off-by: Scott Wood scottw...@freescale.com --- Mihai has posted RFC patches for proper Altivec support, so disabling e6500 should only need to be for 3.10. --- arch/powerpc/kvm/e500mc.c |2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 753cc99..19c8379 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -177,8 +177,6 @@ int kvmppc_core_check_processor_compat(void) r = 0; else if (strcmp(cur_cpu_spec->cpu_name, e5500) == 0) r = 0; - else if (strcmp(cur_cpu_spec->cpu_name, e6500) == 0) - r = 0; else r = -ENOTSUPP; -- 1.7.10.4
Re: [PATCH 1/2] kvm-unit-tests: Add a func to run instruction in emulator
This version of save/restore general register seems a bit too ugly, I will change it and commit another patch. Some of the registers cannot be set as realmode.c do, for example %rax used to save return value, wrong %esp %ebp may cause crash, and I think changed %rflags may cause some unknown error. So these registers should not be set by caller. Arthur On Thu, Jun 6, 2013 at 11:24 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add a function trap_emulator to run an instruction in emulator. Set inregs first (%rax is invalid because it is used as return address), put instruction codec in alt_insn and call func with alt_insn_length. Get results in outregs. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 81 1 file changed, 81 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..8ab9904 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -11,6 +11,14 @@ int fails, tests; static int exceptions; +struct regs { + u64 rax, rbx, rcx, rdx; + u64 rsi, rdi, rsp, rbp; + u64 rip, rflags; +}; + +static struct regs inregs, outregs; + void report(const char *name, int result) { ++tests; @@ -685,6 +693,79 @@ static void test_shld_shrd(u32 *mem) report(shrd (cl), *mem == ((0x12345678 3) | (5u 29))); } +static void trap_emulator(uint64_t *mem, uint8_t *insn_page, +uint8_t *alt_insn_page, void *insn_ram, +uint8_t *alt_insn, int alt_insn_length) +{ + ulong *cr3 = (ulong *)read_cr3(); + int i; + + // Pad with RET instructions + memset(insn_page, 0xc3, 4096); + memset(alt_insn_page, 0xc3, 4096); + + // Place a trapping instruction in the page to trigger a VMEXIT + insn_page[0] = 0x89; // mov %eax, (%rax) + insn_page[1] = 0x00; + insn_page[2] = 0x90; // nop + insn_page[3] = 0xc3; // ret + + // Place the instruction we want the hypervisor to see in the alternate page + for (i=0; ialt_insn_length; i++) + alt_insn_page[i] = alt_insn[i]; + + // Save general registers + asm volatile( + push %rax\n\r + push %rbx\n\r + push %rcx\n\r + push %rdx\n\r + push 
%rsi\n\r + push %rdi\n\r + ); + // Load the code TLB with insn_page, but point the page tables at + // alt_insn_page (and keep the data TLB clear, for AMD decode assist). + // This will make the CPU trap on the insn_page instruction but the + // hypervisor will see alt_insn_page. + install_page(cr3, virt_to_phys(insn_page), insn_ram); + invlpg(insn_ram); + // Load code TLB + asm volatile(call *%0 : : r(insn_ram + 3)); + install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); + // Trap, let hypervisor emulate at alt_insn_page + asm volatile( + call *%1\n\r + + mov %%rax, 0+%[outregs] \n\t + mov %%rbx, 8+%[outregs] \n\t + mov %%rcx, 16+%[outregs] \n\t + mov %%rdx, 24+%[outregs] \n\t + mov %%rsi, 32+%[outregs] \n\t + mov %%rdi, 40+%[outregs] \n\t + mov %%rsp,48+ %[outregs] \n\t + mov %%rbp, 56+%[outregs] \n\t + + /* Save RFLAGS in outregs*/ + pushf \n\t + popq 72+%[outregs] \n\t + : [outregs]+m(outregs) + : r(insn_ram), + a(mem), b(inregs.rbx), + c(inregs.rcx), d(inregs.rdx), + S(inregs.rsi), D(inregs.rdi) + : memory, cc + ); + // Restore general registers + asm volatile( + pop %rax\n\r + pop %rbx\n\r + pop %rcx\n\r + pop %rdx\n\r + pop %rsi\n\r + pop %rdi\n\r + ); +} + static void advance_rip_by_3_and_note_exception(struct ex_regs *regs) { ++exceptions; -- 1.7.9.5 -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] kvm-unit-tests: Change two cases to use trap_emulator
Change two functions (test_mmx_movq_mf and test_movabs) using unified trap_emulator. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 66 1 file changed, 14 insertions(+), 52 deletions(-) diff --git a/x86/emulator.c b/x86/emulator.c index 770e8f7..f8a204e 100755 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -762,72 +762,34 @@ static void test_mmx_movq_mf(uint64_t *mem, uint8_t *insn_page, uint8_t *alt_insn_page, void *insn_ram) { uint16_t fcw = 0; // all exceptions unmasked -ulong *cr3 = (ulong *)read_cr3(); +uint8_t alt_insn[] = {0x0f, 0x7f, 0x00}; // movq %mm0, (%rax) write_cr0(read_cr0() ~6); // TS, EM -// Place a trapping instruction in the page to trigger a VMEXIT -insn_page[0] = 0x89; // mov %eax, (%rax) -insn_page[1] = 0x00; -insn_page[2] = 0x90; // nop -insn_page[3] = 0xc3; // ret -// Place the instruction we want the hypervisor to see in the alternate page -alt_insn_page[0] = 0x0f; // movq %mm0, (%rax) -alt_insn_page[1] = 0x7f; -alt_insn_page[2] = 0x00; -alt_insn_page[3] = 0xc3; // ret - exceptions = 0; handle_exception(MF_VECTOR, advance_rip_by_3_and_note_exception); - -// Load the code TLB with insn_page, but point the page tables at -// alt_insn_page (and keep the data TLB clear, for AMD decode assist). -// This will make the CPU trap on the insn_page instruction but the -// hypervisor will see alt_insn_page. 
-install_page(cr3, virt_to_phys(insn_page), insn_ram); asm volatile(fninit; fldcw %0 : : m(fcw)); asm volatile(fldz; fldz; fdivp); // generate exception -invlpg(insn_ram); -// Load code TLB -asm volatile(call *%0 : : r(insn_ram + 3)); -install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); -// Trap, let hypervisor emulate at alt_insn_page -asm volatile(call *%0 : : r(insn_ram), a(mem)); + +inregs = (struct regs){ 0 }; +trap_emulator(mem, insn_page, alt_insn_page, insn_ram, + alt_insn, 3); // exit MMX mode asm volatile(fnclex; emms); -report(movq mmx generates #MF, exceptions == 1); +report(movq mmx generates #MF2, exceptions == 1); handle_exception(MF_VECTOR, 0); } static void test_movabs(uint64_t *mem, uint8_t *insn_page, uint8_t *alt_insn_page, void *insn_ram) { -uint64_t val = 0; -ulong *cr3 = (ulong *)read_cr3(); - -// Pad with RET instructions -memset(insn_page, 0xc3, 4096); -memset(alt_insn_page, 0xc3, 4096); -// Place a trapping instruction in the page to trigger a VMEXIT -insn_page[0] = 0x89; // mov %eax, (%rax) -insn_page[1] = 0x00; -// Place the instruction we want the hypervisor to see in the alternate -// page. A buggy hypervisor will fetch a 32-bit immediate and return -// 0xc3c3c3c3. -alt_insn_page[0] = 0x48; // mov $0xc3c3c3c3c3c3c3c3, %rcx -alt_insn_page[1] = 0xb9; - -// Load the code TLB with insn_page, but point the page tables at -// alt_insn_page (and keep the data TLB clear, for AMD decode assist). -// This will make the CPU trap on the insn_page instruction but the -// hypervisor will see alt_insn_page. 
-install_page(cr3, virt_to_phys(insn_page), insn_ram); -// Load code TLB -invlpg(insn_ram); -asm volatile(call *%0 : : r(insn_ram + 3)); -// Trap, let hypervisor emulate at alt_insn_page -install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); -asm volatile(call *%1 : =c(val) : r(insn_ram), a(mem), c(0)); -report(64-bit mov imm, val == 0xc3c3c3c3c3c3c3c3); +// mov $0xc3c3c3c3c3c3c3c3, %rcx +uint8_t alt_insn[] = {0x48, 0xb9, 0xc3, 0xc3, 0xc3, + 0xc3, 0xc3, 0xc3, 0xc3, 0xc3}; +inregs = (struct regs){ .rcx = 0 }; + +trap_emulator(mem, insn_page, alt_insn_page, insn_ram, + alt_insn, 10); +report(64-bit mov imm2, outregs.rcx == 0xc3c3c3c3c3c3c3c3); } static void test_crosspage_mmio(volatile uint8_t *mem) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] kvm-unit-tests: Add a func to run instruction in emulator
Add a function trap_emulator to run an instruction in emulator. Set inregs first (%rax, %rsp, %rbp, %rflags have special usage and cannot set in inregs), put instruction codec in alt_insn and call func with alt_insn_length. Get results in outregs. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 67 1 file changed, 67 insertions(+) mode change 100644 = 100755 x86/emulator.c diff --git a/x86/emulator.c b/x86/emulator.c old mode 100644 new mode 100755 index 96576e5..770e8f7 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -11,6 +11,13 @@ int fails, tests; static int exceptions; +struct regs { + u64 rax, rbx, rcx, rdx; + u64 rsi, rdi, rsp, rbp; + u64 rip, rflags; +}; +static struct regs inregs, outregs; + void report(const char *name, int result) { ++tests; @@ -685,6 +692,66 @@ static void test_shld_shrd(u32 *mem) report(shrd (cl), *mem == ((0x12345678 3) | (5u 29))); } +static void trap_emulator(uint64_t *mem, uint8_t *insn_page, +uint8_t *alt_insn_page, void *insn_ram, +uint8_t *alt_insn, int alt_insn_length) +{ + ulong *cr3 = (ulong *)read_cr3(); + int i; + static struct regs save; + + // Pad with RET instructions + memset(insn_page, 0xc3, 4096); + memset(alt_insn_page, 0xc3, 4096); + + // Place a trapping instruction in the page to trigger a VMEXIT + insn_page[0] = 0x89; // mov %eax, (%rax) + insn_page[1] = 0x00; + insn_page[2] = 0x90; // nop + insn_page[3] = 0xc3; // ret + + // Place the instruction we want the hypervisor to see in the alternate page + for (i=0; ialt_insn_length; i++) + alt_insn_page[i] = alt_insn[i]; + save = inregs; + + // Load the code TLB with insn_page, but point the page tables at + // alt_insn_page (and keep the data TLB clear, for AMD decode assist). + // This will make the CPU trap on the insn_page instruction but the + // hypervisor will see alt_insn_page. 
+ install_page(cr3, virt_to_phys(insn_page), insn_ram); + invlpg(insn_ram); + // Load code TLB + asm volatile(call *%0 : : r(insn_ram + 3)); + install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); + // Trap, let hypervisor emulate at alt_insn_page + asm volatile( + xchg %%rbx, 8+%[save] \n\t + xchg %%rcx, 16+%[save] \n\t + xchg %%rdx, 24+%[save] \n\t + xchg %%rsi, 32+%[save] \n\t + xchg %%rdi, 40+%[save] \n\t + + call *%1\n\t + + mov %%rax, 0+%[save] \n\t + xchg %%rbx, 8+%[save] \n\t + xchg %%rcx, 16+%[save] \n\t + xchg %%rdx, 24+%[save] \n\t + xchg %%rsi, 32+%[save] \n\t + xchg %%rdi, 40+%[save] \n\t + mov %%rsp, 48+%[save] \n\t + mov %%rbp, 56+%[save] \n\t + /* Save RFLAGS in outregs*/ + pushf \n\t + popq 72+%[save] \n\t + : [save]+m(save) + : r(insn_ram), a(mem) + : memory, cc + ); + outregs = save; +} + static void advance_rip_by_3_and_note_exception(struct ex_regs *regs) { ++exceptions; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori aligu...@us.ibm.com writes: Hi Rusty, Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. Now you have a different compatibility problem; how do you know the guest supports the new virtio-pcie net? We don't care. We would still use virtio-pci for existing devices. Only new devices would use virtio-pcie. My concern is that new features make the virtio-pcie driver so desirable that libvirt is pressured to use it ASAP. Have we just punted the problem upstream? (Of course, feature bits were supposed to avoid such a transition issue, but mistakes accumulate). Cheers, Rusty.
[PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
lwepx faults need to be handled by KVM, and this implies additional code in the DO_KVM macro to identify the source of exceptions originating in host context. This requires checking the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host. Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by looking up the physical address and kmap()ing it. This fixes an infinite loop caused by lwepx's data TLB miss being handled in the host, and addresses the TODO for TLB eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/include/asm/mmu-book3e.h |    6 ++-
 arch/powerpc/kvm/booke.c              |    6 +++
 arch/powerpc/kvm/booke.h              |    2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c               |    4 ++
 arch/powerpc/kvm/e500mc.c             |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@

 /* MAS registers bit definitions */

-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)

 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@
-836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);

+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);
+
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);

diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);

+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults.  PR-mode
-	 * booke doesn't handle it either.  Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry.  On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here.
-	 * Disallowing guest mappings that are execute-but-not-read could be
-	 * an option on e500mc, but not on chips with an LRAT if it is used.
+	 * We don't use external PID support.  lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
+	 * is too intrusive for the host.  Get the last instruction in
+	 * kvmppc_handle_exit().
 	 */
-
-	mfspr	r3, SPRN_EPLC		/* will already have correct ELPID
Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support
On 06/06/2013 04:42:44 AM, Caraman Mihai Claudiu-B02008 wrote: This looks like a bit much for 3.10 (certainly, subject lines like "refactor" and "enhance" and "add support" aren't going to make Linus happy given that we're past rc4), so I think we should apply http://patchwork.ozlabs.org/patch/242896/ for 3.10. Then for 3.11, revert it after applying this patchset. Why not 1/6 plus e6500 removal? 1/6 is not a bugfix. Not sure I get it. Isn't this a better fix for the AltiVec build breakage:

-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 42
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 43
+#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 32
+#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 33

This removes the need for additional kvm_handlers. Obviously this doesn't make AltiVec work, so we still need to disable e6500. OK, didn't realize you meant it as an alternative fix to what was in my patch. -Scott
[PATCH 0/8] kvm/ppc: fixes for 3.10
Most of these have been posted before, but I grouped them together as there are some contextual dependencies between them.

Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these if there's no objection over the next few days?

Mihai Caraman (1):
  kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage

Scott Wood (7):
  kvm/ppc/booke64: Disable e6500 support
  kvm/ppc/booke: Hold srcu lock when calling gfn functions
  kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
  kvm/ppc: Call trace_hardirqs_on before entry
  kvm/ppc: IRQ disabling cleanup
  kvm/ppc/booke: Delay kvmppc_fix_ee_before_entry
  kvm/ppc/booke: Don't call kvm_guest_enter twice

 arch/powerpc/include/asm/kvm_asm.h | 16 ++--
 arch/powerpc/include/asm/kvm_ppc.h | 17 ++---
 arch/powerpc/kvm/44x_tlb.c         |  5 +
 arch/powerpc/kvm/book3s_pr.c       | 16 +---
 arch/powerpc/kvm/booke.c           | 36
 arch/powerpc/kvm/e500_mmu.c        |  5 +
 arch/powerpc/kvm/e500mc.c          |  2 --
 arch/powerpc/kvm/powerpc.c         | 25 ++---
 8 files changed, 73 insertions(+), 49 deletions(-)
-- 
1.7.10.4
[PATCH 3/8] kvm/ppc/booke: Hold srcu lock when calling gfn functions
KVM core expects arch code to acquire the srcu lock when calling gfn_to_memslot and similar functions.

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/kvm/44x_tlb.c  |    5 +
 arch/powerpc/kvm/booke.c    |    7 +++
 arch/powerpc/kvm/e500_mmu.c |    5 +
 3 files changed, 17 insertions(+)

diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index 5dd3ab4..ed03854 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -441,6 +441,7 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
 	struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
 	struct kvmppc_44x_tlbe *tlbe;
 	unsigned int gtlb_index;
+	int idx;

 	gtlb_index = kvmppc_get_gpr(vcpu, ra);
 	if (gtlb_index >= KVM44x_GUEST_TLB_SIZE) {
@@ -473,6 +474,8 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
 		return EMULATE_FAIL;
 	}

+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
 	if (tlbe_is_host_safe(vcpu, tlbe)) {
 		gva_t eaddr;
 		gpa_t gpaddr;
@@ -489,6 +492,8 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
 		kvmppc_mmu_map(vcpu, eaddr, gpaddr, gtlb_index);
 	}

+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
 	trace_kvm_gtlb_write(gtlb_index, tlbe->tid, tlbe->word0,
 			     tlbe->word1, tlbe->word2);

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..ecbe908 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -832,6 +832,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 {
 	int r = RESUME_HOST;
 	int s;
+	int idx;

 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
@@ -1053,6 +1054,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			break;
 		}

+		idx = srcu_read_lock(&vcpu->kvm->srcu);
+
 		gpaddr = kvmppc_mmu_xlate(vcpu, gtlb_index, eaddr);
 		gfn = gpaddr >> PAGE_SHIFT;

@@ -1075,6 +1078,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			kvmppc_account_exit(vcpu, MMIO_EXITS);
 		}

+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		break;
 	}

@@ -1098,6
+1102,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,

 		kvmppc_account_exit(vcpu, ITLB_VIRT_MISS_EXITS);

+		idx = srcu_read_lock(&vcpu->kvm->srcu);
+
 		gpaddr = kvmppc_mmu_xlate(vcpu, gtlb_index, eaddr);
 		gfn = gpaddr >> PAGE_SHIFT;

@@ -1114,6 +1120,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_MACHINE_CHECK);
 		}

+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		break;
 	}

diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index c41a5a9..6d6f153 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -396,6 +396,7 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 	struct kvm_book3e_206_tlb_entry *gtlbe;
 	int tlbsel, esel;
 	int recal = 0;
+	int idx;

 	tlbsel = get_tlb_tlbsel(vcpu);
 	esel = get_tlb_esel(vcpu, tlbsel);
@@ -430,6 +431,8 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 		kvmppc_set_tlb1map_range(vcpu, gtlbe);
 	}

+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
 	/* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
 	if (tlbe_is_host_safe(vcpu, gtlbe)) {
 		u64 eaddr = get_tlb_eaddr(gtlbe);
@@ -444,6 +447,8 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 		kvmppc_mmu_map(vcpu, eaddr, raddr, index_of(tlbsel, esel));
 	}

+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
 	kvmppc_set_exit_type(vcpu, EMULATED_TLBWE_EXITS);
 	return EMULATE_DONE;
 }
-- 
1.7.10.4
[PATCH 6/8] kvm/ppc: IRQ disabling cleanup
Simplify the handling of lazy EE by going directly from fully-enabled to hard-disabled. This replaces the lazy_irq_pending() check (including its misplaced kvm_guest_exit() call).

As suggested by Tiejun Chen, move the interrupt disabling into kvmppc_prepare_to_enter() rather than have each caller do it. Also move the IRQ enabling on heavyweight exit into kvmppc_prepare_to_enter().

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/include/asm/kvm_ppc.h |    6 ++
 arch/powerpc/kvm/book3s_pr.c       |   12 +++-
 arch/powerpc/kvm/booke.c           |   11 +++
 arch/powerpc/kvm/powerpc.c         |   23 ++-
 4 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6885846..e4474f8 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void)
 	trace_hardirqs_on();

 #ifdef CONFIG_PPC64
+	/*
+	 * To avoid races, the caller must have gone directly from having
+	 * interrupts fully-enabled to hard-disabled.
+	 */
+	WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
+
 	/* Only need to enable IRQs by hard enabling them after this */
 	local_paca->irq_happened = 0;
 	local_paca->soft_enabled = 1;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 0b97ce4..e61e39e 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -884,14 +884,11 @@ program_interrupt:
 		 * and if we really did time things so badly, then we just exit
 		 * again due to a host external interrupt.
 		 */
-		local_irq_disable();
 		s = kvmppc_prepare_to_enter(vcpu);
-		if (s <= 0) {
-			local_irq_enable();
+		if (s <= 0)
 			r = s;
-		} else {
+		else
 			kvmppc_fix_ee_before_entry();
-		}
 	}

 	trace_kvm_book3s_reenter(r, vcpu);
@@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	 * really did time things so badly, then we just exit again due to
 	 * a host external interrupt.
 	 */
-	local_irq_disable();
 	ret = kvmppc_prepare_to_enter(vcpu);
-	if (ret <= 0) {
-		local_irq_enable();
+	if (ret <= 0)
 		goto out;
-	}

 	/* Save FPU state in stack */
 	if (current->thread.regs->msr & MSR_FP)
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 08f4aa1..c5270a3 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -617,7 +617,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 		local_irq_enable();
 		kvm_vcpu_block(vcpu);
 		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
-		local_irq_disable();
+		hard_irq_disable();

 		kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
 		r = 1;
@@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}

-	local_irq_disable();
 	s = kvmppc_prepare_to_enter(vcpu);
 	if (s <= 0) {
-		local_irq_enable();
 		ret = s;
 		goto out;
 	}
@@ -1161,14 +1159,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 * aren't already exiting to userspace for some other reason.
 	 */
 	if (!(r & RESUME_HOST)) {
-		local_irq_disable();
 		s = kvmppc_prepare_to_enter(vcpu);
-		if (s <= 0) {
-			local_irq_enable();
+		if (s <= 0)
 			r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
-		} else {
+		else
 			kvmppc_fix_ee_before_entry();
-		}
 	}

 	return r;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4e05f8c..2f7a221 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 {
 	int r = 1;

-	WARN_ON_ONCE(!irqs_disabled());
+	WARN_ON(irqs_disabled());
+	hard_irq_disable();
+
 	while (true) {
 		if (need_resched()) {
 			local_irq_enable();
 			cond_resched();
-			local_irq_disable();
+			hard_irq_disable();
 			continue;
 		}

@@ -95,7 +97,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 			local_irq_enable();
 			trace_kvm_check_requests(vcpu);
 			r =
[PATCH 8/8] kvm/ppc/booke: Don't call kvm_guest_enter twice
kvm_guest_enter() was already called by kvmppc_prepare_to_enter(). Don't call it again.

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/kvm/booke.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index f953324..0b4d792 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -672,8 +672,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		goto out;
 	}

-	kvm_guest_enter();
-
 #ifdef CONFIG_PPC_FPU
 	/* Save userspace FPU state in stack */
 	enable_kernel_fp();
-- 
1.7.10.4
[PATCH 4/8] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
EE is hard-disabled on entry to kvmppc_handle_exit(), so call hard_irq_disable() so that PACA_IRQ_HARD_DIS is set and soft_enabled is unset. Without this, we get warnings such as arch/powerpc/kernel/time.c:300, and sometimes host kernel hangs.

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/kvm/booke.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ecbe908..5cd7ad0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -834,6 +834,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	int s;
 	int idx;

+#ifdef CONFIG_PPC64
+	WARN_ON(local_paca->irq_happened != 0);
+#endif
+
+	/*
+	 * We enter with interrupts disabled in hardware, but
+	 * we need to call hard_irq_disable anyway to ensure that
+	 * the software state is kept in sync.
+	 */
+	hard_irq_disable();
+
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
-- 
1.7.10.4
[PATCH 1/8] kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
From: Mihai Caraman mihai.cara...@freescale.com

Interrupt numbers defined for Book3E follow the IVOR definitions. Align BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this rule, which also fixes the build breakage. IVORs 32 and 33 are shared, so reflect this in the interrupt naming.

This fixes a build break for 64-bit booke KVM.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/include/asm/kvm_asm.h |   16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index b9dd382..851bac7 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -54,8 +54,16 @@
 #define BOOKE_INTERRUPT_DEBUG 15

 /* E500 */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL 32
-#define BOOKE_INTERRUPT_SPE_FP_DATA 33
+#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
+#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
+/*
+ * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines
+ */
+#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
+#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
+#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
+#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
+	BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
 #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
 #define BOOKE_INTERRUPT_DOORBELL 36
@@ -67,10 +75,6 @@
 #define BOOKE_INTERRUPT_HV_SYSCALL 40
 #define BOOKE_INTERRUPT_HV_PRIV 41

-/* altivec */
-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 42
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 43
-
 /* book3s */

 #define BOOK3S_INTERRUPT_SYSTEM_RESET 0x100
-- 
1.7.10.4
[PATCH 5/8] kvm/ppc: Call trace_hardirqs_on before entry
Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvmppc_lazy_ee_enable() so that it is consistent with the lazy EE state, and so that we don't track more host code as interrupts-enabled than necessary.

Rename kvmppc_lazy_ee_enable() to kvmppc_fix_ee_before_entry() to reflect that this function now has a role on 32-bit as well.

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/include/asm/kvm_ppc.h |   11 ---
 arch/powerpc/kvm/book3s_pr.c       |    4 ++--
 arch/powerpc/kvm/booke.c           |    4 ++--
 arch/powerpc/kvm/powerpc.c         |    2 --
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index a5287fe..6885846 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
 	}
 }

-/* Please call after prepare_to_enter. This function puts the lazy ee state
-   back to normal mode, without actually enabling interrupts. */
-static inline void kvmppc_lazy_ee_enable(void)
+/*
+ * Please call after prepare_to_enter. This function puts the lazy ee and irq
+ * disabled tracking state back to normal mode, without actually enabling
+ * interrupts.
+ */
+static inline void kvmppc_fix_ee_before_entry(void)
 {
+	trace_hardirqs_on();
+
 #ifdef CONFIG_PPC64
 	/* Only need to enable IRQs by hard enabling them after this */
 	local_paca->irq_happened = 0;
 	local_paca->soft_enabled = 1;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index bdc40b8..0b97ce4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -890,7 +890,7 @@ program_interrupt:
 			local_irq_enable();
 			r = s;
 		} else {
-			kvmppc_lazy_ee_enable();
+			kvmppc_fix_ee_before_entry();
 		}
 	}

@@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	if (vcpu->arch.shared->msr & MSR_FP)
 		kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);

-	kvmppc_lazy_ee_enable();
+	kvmppc_fix_ee_before_entry();

 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 5cd7ad0..08f4aa1 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		ret = s;
 		goto out;
 	}
-	kvmppc_lazy_ee_enable();
+	kvmppc_fix_ee_before_entry();

 	kvm_guest_enter();

@@ -1167,7 +1167,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		local_irq_enable();
 		r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
 	} else {
-		kvmppc_lazy_ee_enable();
+		kvmppc_fix_ee_before_entry();
 	}
 }

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6316ee3..4e05f8c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 			kvm_guest_exit();
 			continue;
 		}
-
-		trace_hardirqs_on();
 #endif
 		kvm_guest_enter();
-- 
1.7.10.4
[PATCH 2/8] kvm/ppc/booke64: Disable e6500 support
The previous patch made 64-bit booke KVM build again, but AltiVec support is still not complete, and we can't prevent the guest from turning on AltiVec (which can corrupt host state until state save/restore is implemented). Disable e6500 on KVM until this is fixed.

Signed-off-by: Scott Wood scottw...@freescale.com
---
Mihai has posted RFC patches for proper AltiVec support, so disabling e6500 should only need to be for 3.10.
---
 arch/powerpc/kvm/e500mc.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 753cc99..19c8379 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -177,8 +177,6 @@ int kvmppc_core_check_processor_compat(void)
 		r = 0;
 	else if (strcmp(cur_cpu_spec->cpu_name, "e5500") == 0)
 		r = 0;
-	else if (strcmp(cur_cpu_spec->cpu_name, "e6500") == 0)
-		r = 0;
 	else
 		r = -ENOTSUPP;
-- 
1.7.10.4
[PATCH 7/8] kvm/ppc/booke: Delay kvmppc_fix_ee_before_entry
kvmppc_fix_ee_before_entry() should be called as late as possible, or else we get things like WARN_ON(preemptible()) in enable_kernel_fp() in configurations where preemptible() works.

Note that book3s_pr already waits until just before __kvmppc_vcpu_run() to call kvmppc_fix_ee_before_entry().

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/kvm/booke.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c5270a3..f953324 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -671,7 +671,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		ret = s;
 		goto out;
 	}
-	kvmppc_fix_ee_before_entry();

 	kvm_guest_enter();

@@ -697,6 +696,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	kvmppc_load_guest_fp(vcpu);
 #endif

+	kvmppc_fix_ee_before_entry();
+
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);

 	/* No need for kvm_guest_exit. It's done in handle_exit.
-- 
1.7.10.4