Re: IOMMU groups ... PEX8606 switch?
On 01/04/2014 12:22 PM, Alex Williamson wrote:
On Sat, 2014-01-04 at 11:26 -0800, Dana Goyette wrote:
On 01/03/2014 04:03 PM, Alex Williamson wrote:
On Mon, 2013-12-30 at 16:13 -0800, Dana Goyette wrote:
On 12/29/2013 08:16 PM, Alex Williamson wrote:
On Sat, 2013-12-28 at 23:32 -0800, Dana Goyette wrote:
On 12/28/2013 7:23 PM, Alex Williamson wrote:
On Sat, 2013-12-28 at 18:31 -0800, Dana Goyette wrote:

I have purchased both a SuperMicro X10SAE and an X10SAT, and I need to decide soon which one to keep.

The SuperMicro X10SAT has all the PCIe x1 slots hidden behind a PLX PEX8606 switch, which claims to support ACS. I'd expect the devices downstream of the PLX switch to be in separate groups.

With Linux 3.13-rc5 and "enable overrides for missing ACS capabilities" applied and set for the Intel root ports, the devices behind the switch remain stuck in the same group.

In terms of passing devices to different VMs, which is better: all devices on different root ports, or all devices behind the one ACS-supporting switch?

Can you provide lspci -vvv info? If you're getting that grouping, either the switch has ACS capabilities but doesn't support the features we need, or we're doing something wrong. Thanks,

I initially tried attaching the output as a .txt file, but it's too large. Anyway, here's the output of lspci -nnvvv (you may notice that I moved the Radeon to a different slot).

Well, something seems amiss, since the downstream switch ports all seem to support and enable the correct set of ACS capabilities. I'm tending to suspect something wrong with the ACS override patch or how it's being used, since your IOMMU group is still based at the root port. Each root port is isolated from the other root ports, though, so something is happening with the override patch. Can you provide the kernel command line you use to enable ACS overrides and the override patch you're using, as it applies to 3.13-rc5?
Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

I'm using the original acs-override patch from this post:
https://lkml.org/lkml/2013/5/30/513

Kernel parameter is:
pcie_acs_override=id:8086:8c10,id:8086:8c12,id:8086:8c16,id:8086:8c18

Actually, you're not:

pcie_acs_override=id:8086:8c10,id:8086:8c16,id:8086:8c18,id:8086:ac1a,id:8086:8c1c,id:8086:8c1e,id:10b5:8606

And we register all of them:

[0.000000] PCIe ACS bypass added for 8086:8c10
[0.000000] PCIe ACS bypass added for 8086:8c16
[0.000000] PCIe ACS bypass added for 8086:8c18
[0.000000] PCIe ACS bypass added for 8086:ac1a
[0.000000] PCIe ACS bypass added for 8086:8c1c
[0.000000] PCIe ACS bypass added for 8086:8c1e
[0.000000] PCIe ACS bypass added for 10b5:8606

However, note that the root port causing you trouble is 8086:8c12, which isn't provided as an override, therefore the code is doing the right thing and grouping all devices behind that root port together.

Thanks for catching that -- I certainly missed it! I've added the override for that root port and removed the override for the PLX switch; now all the ports are indeed in separate groups.

Do we yet know if it'll be possible to properly isolate the Intel root ports without this ACS override?
Re: IOMMU groups ... PEX8606 switch?
On Sat, 2014-01-04 at 11:26 -0800, Dana Goyette wrote:
> On 01/03/2014 04:03 PM, Alex Williamson wrote:
> On Mon, 2013-12-30 at 16:13 -0800, Dana Goyette wrote:
> On 12/29/2013 08:16 PM, Alex Williamson wrote:
> On Sat, 2013-12-28 at 23:32 -0800, Dana Goyette wrote:
> On 12/28/2013 7:23 PM, Alex Williamson wrote:
> On Sat, 2013-12-28 at 18:31 -0800, Dana Goyette wrote:
>
> I have purchased both a SuperMicro X10SAE and an X10SAT, and I need to
> decide soon which one to keep.
>
> The SuperMicro X10SAT has all the PCIe x1 slots hidden behind a PLX
> PEX8606 switch, which claims to support ACS. I'd expect the devices
> downstream of the PLX switch to be in separate groups.
>
> With Linux 3.13-rc5 and "enable overrides for missing ACS capabilities"
> applied and set for the Intel root ports, the devices behind the switch
> remain stuck in the same group.
>
> In terms of passing devices to different VMs, which is better: all
> devices on different root ports, or all devices behind the one
> ACS-supporting switch?
>
> Can you provide lspci -vvv info? If you're getting that grouping, either
> the switch has ACS capabilities but doesn't support the features we
> need, or we're doing something wrong. Thanks,
>
> I initially tried attaching the output as a .txt file, but it's too
> large. Anyway, here's the output of lspci -nnvvv (you may notice that I
> moved the Radeon to a different slot).
>
> Well, something seems amiss, since the downstream switch ports all seem
> to support and enable the correct set of ACS capabilities. I'm tending
> to suspect something wrong with the ACS override patch or how it's being
> used, since your IOMMU group is still based at the root port. Each root
> port is isolated from the other root ports, though, so something is
> happening with the override patch.
> Can you provide the kernel command line you use to enable ACS overrides
> and the override patch you're using, as it applies to 3.13-rc5? Thanks,
>
> Alex
>
> I'm using the original acs-override patch from this post:
> https://lkml.org/lkml/2013/5/30/513
>
> Kernel parameter is:
> pcie_acs_override=id:8086:8c10,id:8086:8c12,id:8086:8c16,id:8086:8c18

Actually, you're not:

pcie_acs_override=id:8086:8c10,id:8086:8c16,id:8086:8c18,id:8086:ac1a,id:8086:8c1c,id:8086:8c1e,id:10b5:8606

And we register all of them:

[0.000000] PCIe ACS bypass added for 8086:8c10
[0.000000] PCIe ACS bypass added for 8086:8c16
[0.000000] PCIe ACS bypass added for 8086:8c18
[0.000000] PCIe ACS bypass added for 8086:ac1a
[0.000000] PCIe ACS bypass added for 8086:8c1c
[0.000000] PCIe ACS bypass added for 8086:8c1e
[0.000000] PCIe ACS bypass added for 10b5:8606

However, note that the root port causing you trouble is 8086:8c12, which isn't provided as an override, therefore the code is doing the right thing and grouping all devices behind that root port together.

> When booting a kernel without the override patch, the following devices
> are all in the same group: Intel Root Ports 1, 2, 4, 5; ASMedia SATA
> controller; PLX PEX8606 switch; Renesas USB controller; TI Firewire
> controller; Intel I210 Ethernet controller.
>
> Could you please try the patch below and send dmesg for the system once
> booted. This applies directly to upstream and includes the acs override
> patch. Thanks,
>
> (removed patch from quote.)
> Here's the complete dmesg, with pcie_acs_override still set:
> http://pastebin.com/YHuKnrTb
>
> Most relevant section:
>
> [0.524362] DMAR: No ATSR found
> [0.524386] IOMMU 1 0xfed91000: using Queued invalidation
> [0.524389] IOMMU: Setting RMRR:
> [0.524398] IOMMU: Setting identity map for device 0000:00:1d.0 [0x7bea1000 - 0x7beaffff]
> [0.524423] IOMMU: Setting identity map for device 0000:00:1a.0 [0x7bea1000 - 0x7beaffff]
> [0.524441] IOMMU: Setting identity map for device 0000:00:14.0 [0x7bea1000 - 0x7beaffff]
> [0.524454] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [0.524461] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
> [0.524548] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> [0.524551] intel_iommu_add_device(0000:00:00.0)
> [0.524552] dma_pdev #1: 0000:00:00.0
> [0.524553] dma_pdev #2: 0000:00:00.0
> [0.524554] dma_pdev #3: 0000:00:00.0
> [0.524554] dma_pdev #4: 0000:00:00.0
> [0.524565] intel_iommu_add_device(0000:00:01.0)
> [0.524566] dma_pdev #1: 0000:00:01.0
> [0.524567] dma_pdev #2: 0000:00:01.0
> [0.524569] pci_acs_enab
Re: IOMMU groups ... PEX8606 switch?
On 01/03/2014 04:03 PM, Alex Williamson wrote:
On Mon, 2013-12-30 at 16:13 -0800, Dana Goyette wrote:
On 12/29/2013 08:16 PM, Alex Williamson wrote:
On Sat, 2013-12-28 at 23:32 -0800, Dana Goyette wrote:
On 12/28/2013 7:23 PM, Alex Williamson wrote:
On Sat, 2013-12-28 at 18:31 -0800, Dana Goyette wrote:

I have purchased both a SuperMicro X10SAE and an X10SAT, and I need to decide soon which one to keep.

The SuperMicro X10SAT has all the PCIe x1 slots hidden behind a PLX PEX8606 switch, which claims to support ACS. I'd expect the devices downstream of the PLX switch to be in separate groups.

With Linux 3.13-rc5 and "enable overrides for missing ACS capabilities" applied and set for the Intel root ports, the devices behind the switch remain stuck in the same group.

In terms of passing devices to different VMs, which is better: all devices on different root ports, or all devices behind the one ACS-supporting switch?

Can you provide lspci -vvv info? If you're getting that grouping, either the switch has ACS capabilities but doesn't support the features we need, or we're doing something wrong. Thanks,

I initially tried attaching the output as a .txt file, but it's too large. Anyway, here's the output of lspci -nnvvv (you may notice that I moved the Radeon to a different slot).

Well, something seems amiss, since the downstream switch ports all seem to support and enable the correct set of ACS capabilities. I'm tending to suspect something wrong with the ACS override patch or how it's being used, since your IOMMU group is still based at the root port. Each root port is isolated from the other root ports, though, so something is happening with the override patch. Can you provide the kernel command line you use to enable ACS overrides and the override patch you're using, as it applies to 3.13-rc5?
Thanks,

Alex

I'm using the original acs-override patch from this post:
https://lkml.org/lkml/2013/5/30/513

Kernel parameter is:
pcie_acs_override=id:8086:8c10,id:8086:8c12,id:8086:8c16,id:8086:8c18

When booting a kernel without the override patch, the following devices are all in the same group: Intel Root Ports 1, 2, 4, 5; ASMedia SATA controller; PLX PEX8606 switch; Renesas USB controller; TI Firewire controller; Intel I210 Ethernet controller.

Could you please try the patch below and send dmesg for the system once booted. This applies directly to upstream and includes the acs override patch. Thanks,

(removed patch from quote.)

Here's the complete dmesg, with pcie_acs_override still set:
http://pastebin.com/YHuKnrTb

Most relevant section:

[0.524362] DMAR: No ATSR found
[0.524386] IOMMU 1 0xfed91000: using Queued invalidation
[0.524389] IOMMU: Setting RMRR:
[0.524398] IOMMU: Setting identity map for device 0000:00:1d.0 [0x7bea1000 - 0x7beaffff]
[0.524423] IOMMU: Setting identity map for device 0000:00:1a.0 [0x7bea1000 - 0x7beaffff]
[0.524441] IOMMU: Setting identity map for device 0000:00:14.0 [0x7bea1000 - 0x7beaffff]
[0.524454] IOMMU: Prepare 0-16MiB unity mapping for LPC
[0.524461] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[0.524548] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
[0.524551] intel_iommu_add_device(0000:00:00.0)
[0.524552] dma_pdev #1: 0000:00:00.0
[0.524553] dma_pdev #2: 0000:00:00.0
[0.524554] dma_pdev #3: 0000:00:00.0
[0.524554] dma_pdev #4: 0000:00:00.0
[0.524565] intel_iommu_add_device(0000:00:01.0)
[0.524566] dma_pdev #1: 0000:00:01.0
[0.524567] dma_pdev #2: 0000:00:01.0
[0.524569] pci_acs_enabled(0000:00:01.0, 001d)
[0.524572] pci_acs_flags_enabled no ACS capability on 0000:00:01.0
[0.524573] pci_acs_flags_enabled(0000:00:01.0, 001d) -> false
[0.524574] -> false
[0.524575] pci_acs_enabled(0000:00:01.0, 001d)
[0.524577] pci_acs_flags_enabled no ACS capability on 0000:00:01.0
[0.524578] pci_acs_flags_enabled(0000:00:01.0, 001d) -> false
[0.524579] -> false
[0.524580] dma_pdev #3: 0000:00:01.0
[0.524581] dma_pdev #4: 0000:00:01.0
[0.524584] intel_iommu_add_device(0000:00:01.1)
[0.524586] dma_pdev #1: 0000:00:01.1
[0.524586] dma_pdev #2: 0000:00:01.1
[0.524587] pci_acs_enabled(0000:00:01.1, 001d)
[0.524589] pci_acs_flags_enabled no ACS capability on 0000:00:01.1
[0.524590] pci_acs_flags_enabled(0000:00:01.1, 001d) -> false
[0.524591] -> false
[0.524592] pci_acs_enabled(0000:00:01.0, 001d)
[0.524593] pci_acs_flags_enabled no ACS capability on 0000:00:01.0
[0.524595] pci_acs_flags_enabled(0000:00:01.0, 001d) -> false
[0.524596] -> false
[0.524596] dma_pdev #3: 0000:00:01.0
[0.524597] dma_pdev #4: 0000:00:01.0
[0.524599] intel_iommu_add_device(0000:00:02.0)
[0.524601] dma_pd
[PATCH 03/13] apic: Remove redundant enable_apic
From: Jan Kiszka

Already called by the bootstrap code.

Signed-off-by: Jan Kiszka
---
 x86/apic.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/x86/apic.c b/x86/apic.c
index 50e77fc..d06153f 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -326,7 +326,6 @@ int main()
 	test_lapic_existence();
 	mask_pic_interrupts();
-	enable_apic();
 	test_enable_x2apic();
 	test_self_ipi();
--
1.8.1.1.298.ge7eed54
[PATCH 11/13] Ignore *.elf build outputs
From: Jan Kiszka

Signed-off-by: Jan Kiszka
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index ed857b7..d6663ec 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,7 @@
 *.d
 *.o
 *.flat
+*.elf
 .pc
 patches
 .stgit-*
--
1.8.1.1.298.ge7eed54
[PATCH 13/13] x86: Add debug facility test case
From: Jan Kiszka

This adds a basic test for INT3/#BP, hardware breakpoints, hardware
watchpoints and single-stepping.

Signed-off-by: Jan Kiszka
---
 config-x86-common.mak |   2 +
 config-x86_64.mak     |   2 +-
 lib/x86/desc.h        |   2 +
 x86/debug.c           | 113 ++++++++++++++++++++++++++++++++++++++++++
 x86/unittests.cfg     |   3 ++
 5 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 x86/debug.c

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 32da7fb..aa5a439 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -103,6 +103,8 @@ $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
 
 $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o $(TEST_DIR)/vmx_tests.o
 
+$(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
+
 arch_clean:
 	$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
 	$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o

diff --git a/config-x86_64.mak b/config-x86_64.mak
index bb8ee89..a9a2a9e 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -7,7 +7,7 @@ CFLAGS += -D__x86_64__
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
 	  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
 	  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
-	  $(TEST_DIR)/pcid.flat
+	  $(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat
 tests += $(TEST_DIR)/svm.flat
 tests += $(TEST_DIR)/vmx.flat

diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 5c850b2..b795aad 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -66,6 +66,8 @@ typedef struct {
 	".popsection \n\t" \
 	":"
 
+#define DB_VECTOR 1
+#define BP_VECTOR 3
 #define UD_VECTOR 6
 #define GP_VECTOR 13

diff --git a/x86/debug.c b/x86/debug.c
new file mode 100644
index 0000000..154c7fe
--- /dev/null
+++ b/x86/debug.c
@@ -0,0 +1,113 @@
+/*
+ * Test for x86 debugging facilities
+ *
+ * Copyright (c) Siemens AG, 2014
+ *
+ * Authors:
+ *  Jan Kiszka
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include "libcflat.h"
+#include "desc.h"
+
+static volatile unsigned long bp_addr[10], dr6[10];
+static volatile unsigned int n;
+static volatile unsigned long value;
+
+static unsigned long get_dr6(void)
+{
+	unsigned long value;
+
+	asm volatile("mov %%dr6,%0" : "=r" (value));
+	return value;
+}
+
+static void set_dr0(void *value)
+{
+	asm volatile("mov %0,%%dr0" : : "r" (value));
+}
+
+static void set_dr1(void *value)
+{
+	asm volatile("mov %0,%%dr1" : : "r" (value));
+}
+
+static void set_dr7(unsigned long value)
+{
+	asm volatile("mov %0,%%dr7" : : "r" (value));
+}
+
+static void handle_db(struct ex_regs *regs)
+{
+	bp_addr[n] = regs->rip;
+	dr6[n] = get_dr6();
+
+	if (dr6[n] & 0x1)
+		regs->rflags |= (1 << 16);
+
+	if (++n >= 10) {
+		regs->rflags &= ~(1 << 8);
+		set_dr7(0x0400);
+	}
+}
+
+static void handle_bp(struct ex_regs *regs)
+{
+	bp_addr[0] = regs->rip;
+}
+
+int main(int ac, char **av)
+{
+	unsigned long start;
+
+	setup_idt();
+	handle_exception(DB_VECTOR, handle_db);
+	handle_exception(BP_VECTOR, handle_bp);
+
+sw_bp:
+	asm volatile("int3");
+	report("#BP", bp_addr[0] == (unsigned long)&&sw_bp + 1);
+
+	set_dr0(&&hw_bp);
+	set_dr7(0x0402);
+hw_bp:
+	asm volatile("nop");
+	report("hw breakpoint",
+	       n == 1 &&
+	       bp_addr[0] == ((unsigned long)&&hw_bp) && dr6[0] == 0x0ff1);
+
+	n = 0;
+	asm volatile(
+		"pushf\n\t"
+		"pop %%rax\n\t"
+		"or $(1<<8),%%rax\n\t"
+		"push %%rax\n\t"
+		"lea (%%rip),%0\n\t"
+		"popf\n\t"
+		"and $~(1<<8),%%rax\n\t"
+		"push %%rax\n\t"
+		"popf\n\t"
+		: "=g" (start) : : "rax");
+	report("single step",
+	       n == 3 &&
+	       bp_addr[0] == start+1+6 && dr6[0] == 0x4ff0 &&
+	       bp_addr[1] == start+1+6+1 && dr6[1] == 0x4ff0 &&
+	       bp_addr[2] == start+1+6+1+1 && dr6[2] == 0x4ff0);
+
+	n = 0;
+	set_dr1((void *)&value);
+	set_dr7(0x00d0040a);
+
+	asm volatile(
+		"mov $42,%%rax\n\t"
+		"mov %%rax,%0\n\t"
+		: "=m" (value) : : "rax");
+hw_wp:
+	report("hw watchpoint",
+	       n == 1 &&
+	       bp_addr[0] == ((unsigned long)&&hw_wp) && dr6[0] == 0x4ff2);
+
+	return 0;
+}

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 85c36aa..7930c02 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -155,3 +155,6 @@
 file = vmx.flat
 extra_params = -cpu host,+vmx
 arch = x86_64
+[debug]
+file = debug.flat
+arch = x86_64
--
1.8.1.1.298.ge7eed54
[PATCH 01/13] VMX: Add test cases around interrupt injection and halting
From: Jan Kiszka

This checks for interrupt delivery to L2, unintercepted hlt in L2 and
explicit L2 suspension via the activity state HLT. All tests are
performed both with direct interrupt injection and external interrupt
interception.

Signed-off-by: Jan Kiszka
---
 x86/vmx.c       |   3 +-
 x86/vmx.h       |   3 ++
 x86/vmx_tests.c | 144 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 2 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index fe950e6..a475aec 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -457,7 +457,7 @@ static void init_vmcs_guest(void)
 	vmcs_write(GUEST_RFLAGS, 0x2);
 
 	/* 26.3.1.5 */
-	vmcs_write(GUEST_ACTV_STATE, 0);
+	vmcs_write(GUEST_ACTV_STATE, ACTV_ACTIVE);
 	vmcs_write(GUEST_INTR_STATE, 0);
 }
 
@@ -482,7 +482,6 @@ static int init_vmcs(struct vmcs **vmcs)
 	ctrl_pin |= PIN_EXTINT | PIN_NMI | PIN_VIRT_NMI;
 	ctrl_exit = EXI_LOAD_EFER | EXI_HOST_64;
 	ctrl_enter = (ENT_LOAD_EFER | ENT_GUEST_64);
-	ctrl_cpu[0] |= CPU_HLT;
 	/* DIsable IO instruction VMEXIT now */
 	ctrl_cpu[0] &= (~(CPU_IO | CPU_IO_BITMAP));
 	ctrl_cpu[1] = 0;

diff --git a/x86/vmx.h b/x86/vmx.h
index bc8c86f..3867793 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -500,6 +500,9 @@ enum Ctrl1 {
 #define INVEPT_SINGLE 1
 #define INVEPT_GLOBAL 2
 
+#define ACTV_ACTIVE 0
+#define ACTV_HLT    1
+
 extern struct regs regs;
 extern union vmx_basic basic;

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index bec34c4..70efb50 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9,6 +9,8 @@
 #include "vm.h"
 #include "io.h"
 #include "fwcfg.h"
+#include "isr.h"
+#include "apic.h"
 
 u64 ia32_pat;
 u64 ia32_efer;
@@ -1117,6 +1119,146 @@ static int ept_exit_handler()
 	return VMX_TEST_VMEXIT;
 }
 
+#define TIMER_VECTOR	222
+
+static volatile bool timer_fired;
+
+static void timer_isr(isr_regs_t *regs)
+{
+	timer_fired = true;
+	apic_write(APIC_EOI, 0);
+}
+
+static int interrupt_init(struct vmcs *vmcs)
+{
+	msr_bmp_init();
+	vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
+	handle_irq(TIMER_VECTOR, timer_isr);
+	return VMX_TEST_START;
+}
+
+static void interrupt_main(void)
+{
+	long long start, loops;
+
+	set_stage(0);
+
+	apic_write(APIC_LVTT, TIMER_VECTOR);
+	irq_enable();
+
+	apic_write(APIC_TMICT, 1);
+	for (loops = 0; loops < 10000000 && !timer_fired; loops++)
+		asm volatile ("nop");
+	report("direct interrupt while running guest", timer_fired);
+
+	apic_write(APIC_TMICT, 0);
+	irq_disable();
+	vmcall();
+	timer_fired = false;
+	apic_write(APIC_TMICT, 1);
+	for (loops = 0; loops < 10000000 && !timer_fired; loops++)
+		asm volatile ("nop");
+	report("intercepted interrupt while running guest", timer_fired);
+
+	irq_enable();
+	apic_write(APIC_TMICT, 0);
+	irq_disable();
+	vmcall();
+	timer_fired = false;
+	start = rdtsc();
+	apic_write(APIC_TMICT, 100);
+	asm volatile ("sti; hlt");
+	report("direct interrupt + hlt",
+	       rdtsc() - start > 100 && timer_fired);
+
+	apic_write(APIC_TMICT, 0);
+	irq_disable();
+	vmcall();
+	timer_fired = false;
+	start = rdtsc();
+	apic_write(APIC_TMICT, 100);
+	asm volatile ("sti; hlt");
+	report("intercepted interrupt + hlt",
+	       rdtsc() - start > 1 && timer_fired);
+
+	apic_write(APIC_TMICT, 0);
+	irq_disable();
+	vmcall();
+	timer_fired = false;
+	start = rdtsc();
+	apic_write(APIC_TMICT, 100);
+	irq_enable();
+	asm volatile ("nop");
+	vmcall();
+	report("direct interrupt + activity state hlt",
+	       rdtsc() - start > 1 && timer_fired);
+
+	apic_write(APIC_TMICT, 0);
+	irq_disable();
+	vmcall();
+	timer_fired = false;
+	start = rdtsc();
+	apic_write(APIC_TMICT, 100);
+	irq_enable();
+	asm volatile ("nop");
+	vmcall();
+	report("intercepted interrupt + activity state hlt",
+	       rdtsc() - start > 1 && timer_fired);
+}
+
+static int interrupt_exit_handler(void)
+{
+	u64 guest_rip = vmcs_read(GUEST_RIP);
+	ulong reason = vmcs_read(EXI_REASON) & 0xff;
+	u32 insn_len = vmcs_read(EXI_INST_LEN);
+
+	switch (reason) {
+	case VMX_VMCALL:
+		switch (get_stage()) {
+		case 0:
+		case 2:
+		case 5:
+			vmcs_write(PIN_CONTROLS,
+				   vmcs_read(PIN_CONTROLS) | PIN_EXTINT);
+			break;
+		case 1:
+		case 3:
+			vmcs_write(PIN_CONTROLS,
+				   vmcs_read(PIN_CONTR
[PATCH 05/13] lib/x86: Move exception test code into library
From: Jan Kiszka

Will also be used by the APIC test. Moving exception_return assignment
out of line, we can drop the explicit compiler barrier.

Signed-off-by: Jan Kiszka
---
 lib/x86/desc.c | 24 ++++++++++++++++
 lib/x86/desc.h |  4 ++++
 x86/vmx.c      | 34 +++++++-----------------
 3 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 7c5c721..f75ec1d 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -353,3 +353,27 @@ void print_current_tss_info(void)
 		tr, tss[0].prev, tss[i].prev);
 }
 #endif
+
+static bool exception;
+static void *exception_return;
+
+static void exception_handler(struct ex_regs *regs)
+{
+	exception = true;
+	regs->rip = (unsigned long)exception_return;
+}
+
+bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
+			void *data)
+{
+	handle_exception(ex, exception_handler);
+	exception = false;
+	trigger_func(data);
+	handle_exception(ex, NULL);
+	return exception;
+}
+
+void set_exception_return(void *addr)
+{
+	exception_return = addr;
+}

diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index f819452..5c850b2 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -84,4 +84,8 @@ void set_intr_task_gate(int e, void *fn);
 void print_current_tss_info(void);
 void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
 
+bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
+			void *data);
+void set_exception_return(void *addr);
+
 #endif

diff --git a/x86/vmx.c b/x86/vmx.c
index f9d5493..4f0bb8d 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -538,38 +538,18 @@ static void init_vmx(void)
 	memset(guest_syscall_stack, 0, PAGE_SIZE);
 }
 
-static bool exception;
-static void *exception_return;
-
-static void exception_handler(struct ex_regs *regs)
+static void do_vmxon_off(void *data)
 {
-	exception = true;
-	regs->rip = (u64)exception_return;
-}
-
-static int test_for_exception(unsigned int ex, void (*func)(void))
-{
-	handle_exception(ex, exception_handler);
-	exception = false;
-	func();
-	handle_exception(ex, NULL);
-	return exception;
-}
-
-static void do_vmxon_off(void)
-{
-	exception_return = &&resume;
-	barrier();
+	set_exception_return(&&resume);
 	vmx_on();
 	vmx_off();
 resume:
 	barrier();
 }
 
-static void do_write_feature_control(void)
+static void do_write_feature_control(void *data)
 {
-	exception_return = &&resume;
-	barrier();
+	set_exception_return(&&resume);
 	wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
 resume:
 	barrier();
@@ -592,18 +572,18 @@ static int test_vmx_feature_control(void)
 
 	wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
 	report("test vmxon with FEATURE_CONTROL cleared",
-	       test_for_exception(GP_VECTOR, &do_vmxon_off));
+	       test_for_exception(GP_VECTOR, &do_vmxon_off, NULL));
 
 	wrmsr(MSR_IA32_FEATURE_CONTROL, 0x4);
 	report("test vmxon without FEATURE_CONTROL lock",
-	       test_for_exception(GP_VECTOR, &do_vmxon_off));
+	       test_for_exception(GP_VECTOR, &do_vmxon_off, NULL));
 
 	wrmsr(MSR_IA32_FEATURE_CONTROL, 0x5);
 	vmx_enabled = ((rdmsr(MSR_IA32_FEATURE_CONTROL) & 0x5) == 0x5);
 	report("test enable VMX in FEATURE_CONTROL", vmx_enabled);
 	report("test FEATURE_CONTROL lock bit",
-	       test_for_exception(GP_VECTOR, &do_write_feature_control));
+	       test_for_exception(GP_VECTOR, &do_write_feature_control, NULL));
 
 	return !vmx_enabled;
 }
--
1.8.1.1.298.ge7eed54
[PATCH 06/13] x2apic: Test for invalid state transitions
From: Jan Kiszka

This checks if KVM properly acknowledges invalid state transitions on
MSR_APIC_BASE writes with a #GP.

Signed-off-by: Jan Kiszka
---
 lib/x86/apic-defs.h |  3 +++
 x86/apic.c          | 41 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/lib/x86/apic-defs.h b/lib/x86/apic-defs.h
index c061e3d..94112b4 100644
--- a/lib/x86/apic-defs.h
+++ b/lib/x86/apic-defs.h
@@ -9,6 +9,9 @@
  */
 
 #define APIC_DEFAULT_PHYS_BASE	0xfee00000
+#define APIC_BSP		(1UL << 8)
+#define APIC_EXTD		(1UL << 10)
+#define APIC_EN			(1UL << 11)
 
 #define APIC_ID		0x20

diff --git a/x86/apic.c b/x86/apic.c
index d06153f..4ebcd4f 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -4,6 +4,7 @@
 #include "smp.h"
 #include "desc.h"
 #include "isr.h"
+#include "msr.h"
 
 static int g_fail;
 static int g_tests;
@@ -70,14 +71,52 @@ static void test_tsc_deadline_timer(void)
 	}
 }
 
-#define MSR_APIC_BASE 0x0000001b
+static void do_write_apicbase(void *data)
+{
+	set_exception_return(&&resume);
+	wrmsr(MSR_IA32_APICBASE, *(u64 *)data);
+resume:
+	barrier();
+}
 
 void test_enable_x2apic(void)
 {
+	u64 invalid_state = APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EXTD;
+	u64 apic_enabled = APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EN;
+	u64 x2apic_enabled =
+		APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EN | APIC_EXTD;
+
 	if (enable_x2apic()) {
 		printf("x2apic enabled\n");
+
+		report("x2apic enabled to invalid state",
+		       test_for_exception(GP_VECTOR, do_write_apicbase,
+					  &invalid_state));
+		report("x2apic enabled to apic enabled",
+		       test_for_exception(GP_VECTOR, do_write_apicbase,
+					  &apic_enabled));
+
+		wrmsr(MSR_IA32_APICBASE, APIC_DEFAULT_PHYS_BASE | APIC_BSP);
+		report("disabled to invalid state",
+		       test_for_exception(GP_VECTOR, do_write_apicbase,
+					  &invalid_state));
+		report("disabled to x2apic enabled",
+		       test_for_exception(GP_VECTOR, do_write_apicbase,
+					  &x2apic_enabled));
+
+		wrmsr(MSR_IA32_APICBASE, apic_enabled);
+		report("apic enabled to invalid state",
+		       test_for_exception(GP_VECTOR, do_write_apicbase,
+					  &invalid_state));
+
+		wrmsr(MSR_IA32_APICBASE, x2apic_enabled);
+		apic_write(APIC_SPIV, 0x1ff);
 	} else {
 		printf("x2apic not detected\n");
+
+		report("enable unsupported x2apic",
+		       test_for_exception(GP_VECTOR, do_write_apicbase,
+					  &x2apic_enabled));
 	}
 }
--
1.8.1.1.298.ge7eed54
[PATCH 10/13] Provide common report and report_summary services
From: Jan Kiszka

This both reduces code duplication and standardizes the output format a
bit more.

Signed-off-by: Jan Kiszka
---
 Makefile          |  3 ++-
 lib/libcflat.h    |  4 ++++
 lib/report.c      | 36 ++++++++++++++++++++++++++++
 x86/apic.c        | 15 ++-----------
 x86/emulator.c    | 16 ++------------
 x86/eventinj.c    | 15 ++-----------
 x86/idt_test.c    | 21 ++----------------
 x86/msr.c         | 15 ++-----------
 x86/pcid.c        | 14 +-----------
 x86/pmu.c         | 37 ++-----------------------------
 x86/taskswitch2.c | 15 ++-----------
 x86/vmx.c         | 16 ++------------
 x86/vmx.h         |  1 -
 13 files changed, 68 insertions(+), 140 deletions(-)
 create mode 100644 lib/report.c

diff --git a/Makefile b/Makefile
index b6e8759..f5eccc7 100644
--- a/Makefile
+++ b/Makefile
@@ -14,7 +14,8 @@ libcflat := lib/libcflat.a
 cflatobjs := \
 	lib/panic.o \
 	lib/printf.o \
-	lib/string.o
+	lib/string.o \
+	lib/report.o
 cflatobjs += lib/argv.o
 
 #include architecure specific make rules

diff --git a/lib/libcflat.h b/lib/libcflat.h
index fadc33d..f734fde 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -45,6 +45,7 @@ extern char *strcat(char *dest, const char *src);
 extern int strcmp(const char *a, const char *b);
 
 extern int printf(const char *fmt, ...);
+extern int snprintf(char *buf, int size, const char *fmt, ...);
 extern int vsnprintf(char *buf, int size, const char *fmt, va_list va);
 extern void puts(const char *s);
 
@@ -58,4 +59,7 @@ extern long atol(const char *ptr);
 #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
 
 #define NULL ((void *)0UL)
+
+void report(const char *msg_fmt, bool pass, ...);
+int report_summary(void);
 #endif

diff --git a/lib/report.c b/lib/report.c
new file mode 100644
index 0000000..ff562a1
--- /dev/null
+++ b/lib/report.c
@@ -0,0 +1,36 @@
+/*
+ * Test result reporting
+ *
+ * Copyright (c) Siemens AG, 2014
+ *
+ * Authors:
+ *  Jan Kiszka
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#include "libcflat.h"
+
+static unsigned int tests, failures;
+
+void report(const char *msg_fmt, bool pass, ...)
+{
+	char buf[2000];
+	va_list va;
+
+	tests++;
+	printf("%s: ", pass ? "PASS" : "FAIL");
+	va_start(va, pass);
+	vsnprintf(buf, sizeof(buf), msg_fmt, va);
+	va_end(va);
+	puts(buf);
+	puts("\n");
+	if (!pass)
+		failures++;
+}
+
+int report_summary(void)
+{
+	printf("\nSUMMARY: %d tests, %d failures\n", tests, failures);
+	return failures > 0 ? 1 : 0;
+}

diff --git a/x86/apic.c b/x86/apic.c
index 8febfa2..487c248 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -6,17 +6,6 @@
 #include "isr.h"
 #include "msr.h"
 
-static int g_fail;
-static int g_tests;
-
-static void report(const char *msg, int pass)
-{
-	++g_tests;
-	printf("%s: %s\n", msg, (pass ? "PASS" : "FAIL"));
-	if (!pass)
-		++g_fail;
-}
-
 static void test_lapic_existence(void)
 {
 	u32 lvr;
@@ -403,7 +392,5 @@ int main()
 
 	test_tsc_deadline_timer();
 
-	printf("\nsummary: %d tests, %d failures\n", g_tests, g_fail);
-
-	return g_fail != 0;
+	return report_summary();
 }

diff --git a/x86/emulator.c b/x86/emulator.c
index b70e540..2e25dd8 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -7,8 +7,6 @@
 #define memset __builtin_memset
 #define TESTDEV_IO_PORT 0xe0
 
-int fails, tests;
-
 static int exceptions;
 
 struct regs {
@@ -25,17 +23,6 @@ struct insn_desc {
 	size_t len;
 };
 
-void report(const char *name, int result)
-{
-	++tests;
-	if (result)
-		printf("PASS: %s\n", name);
-	else {
-		printf("FAIL: %s\n", name);
-		++fails;
-	}
-}
-
 static char st1[] = "abcdefghijklmnop";
 
 void test_stringio()
@@ -1022,6 +1009,5 @@ int main()
 
 	test_string_io_mmio(mem);
 
-	printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
-	return fails ? 1 : 0;
+	return report_summary();
 }

diff --git a/x86/eventinj.c b/x86/eventinj.c
index 9d4392c..a218aaf 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -12,9 +12,6 @@
 # define R "e"
 #endif
 
-static int g_fail;
-static int g_tests;
-
 static inline void io_delay(void)
 {
 }
@@ -24,14 +21,6 @@ static inline void outl(int addr, int val)
 	asm volatile ("outl %1, %w0" : : "d" (addr), "a" (val));
 }
 
-static void report(const char *msg, int pass)
-{
-	++g_tests;
-	printf("%s: %s\n", msg, (pass ? "PASS" : "FAIL"));
-	if (!pass)
-		++g_fail;
-}
-
 void apic_self_ipi(u8 v)
 {
 	apic_icr_write(APIC_DEST_SELF | APIC_DEST_PHYSICAL | APIC_DM_FIXED |
@@ -416,7 +405,5 @@ int main()
 	printf("After int 33 with shadowed stack\n");
 	report("int 33 with shadowed stack", test_count == 1);
 
-	printf("\nsummary: %d tests, %d failures\n", g_tests, g_fail);
-
-	return g_fail != 0;
+	return
[PATCH 12/13] svm: Add missing build dependency
From: Jan Kiszka

Signed-off-by: Jan Kiszka
---
 config-x86-common.mak | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config-x86-common.mak b/config-x86-common.mak
index bf88c67..32da7fb 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -86,7 +86,7 @@ $(TEST_DIR)/xsave.elf: $(cstart.o) $(TEST_DIR)/xsave.o
 
 $(TEST_DIR)/rmap_chain.elf: $(cstart.o) $(TEST_DIR)/rmap_chain.o
 
-$(TEST_DIR)/svm.elf: $(cstart.o)
+$(TEST_DIR)/svm.elf: $(cstart.o) $(TEST_DIR)/svm.o
 
 $(TEST_DIR)/kvmclock_test.elf: $(cstart.o) $(TEST_DIR)/kvmclock.o \
 				$(TEST_DIR)/kvmclock_test.o
-- 
1.8.1.1.298.ge7eed54
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/13] VMX: Check unconditional I/O exiting
From: Jan Kiszka Test if we ignore "unconditional I/O exiting" as long as "use I/O bitmap" is enabled. Also test if unconditional exiting itself works. Signed-off-by: Jan Kiszka --- x86/vmx_tests.c | 36 +++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index 0077f3f..2c2d6c4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -701,13 +701,21 @@ static void iobmp_main() report("I/O bitmap - overrun", 1); else report("I/O bitmap - overrun", 0); + set_stage(9); + vmcall(); + outb(0x0, 0x0); + report("I/O bitmap - ignore unconditional exiting", stage == 9); + set_stage(10); + vmcall(); + outb(0x0, 0x0); + report("I/O bitmap - unconditional exiting", stage == 11); } static int iobmp_exit_handler() { u64 guest_rip; ulong reason, exit_qual; - u32 insn_len; + u32 insn_len, ctrl_cpu0; guest_rip = vmcs_read(GUEST_RIP); reason = vmcs_read(EXI_REASON) & 0xff; @@ -765,6 +773,32 @@ static int iobmp_exit_handler() if (((exit_qual & VMX_IO_PORT_MASK) >> VMX_IO_PORT_SHIFT) == 0x) set_stage(stage + 1); break; + case 9: + case 10: + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0 & ~CPU_IO); + set_stage(stage + 1); + break; + default: + // Should not reach here + printf("ERROR : unexpected stage, %d\n", get_stage()); + print_vmexit_info(); + return VMX_TEST_VMEXIT; + } + vmcs_write(GUEST_RIP, guest_rip + insn_len); + return VMX_TEST_RESUME; + case VMX_VMCALL: + switch (get_stage()) { + case 9: + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 |= CPU_IO | CPU_IO_BITMAP; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + break; + case 10: + ctrl_cpu0 = vmcs_read(CPU_EXEC_CTRL0); + ctrl_cpu0 = (ctrl_cpu0 & ~CPU_IO_BITMAP) | CPU_IO; + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu0); + break; default: // Should not reach here printf("ERROR : unexpected stage, %d\n", get_stage()); -- 1.8.1.1.298.ge7eed54 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org 
[PATCH 04/13] VMX: Fix return label in fault-triggering handlers
From: Jan Kiszka

Some compiler versions (seen with gcc 4.8.1) move the resume label after
the return statement which, of course, causes severe problems.

Signed-off-by: Jan Kiszka
---
 x86/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index a475aec..f9d5493 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -563,7 +563,7 @@ static void do_vmxon_off(void)
 	vmx_on();
 	vmx_off();
 resume:
-	return;
+	barrier();
 }
 
 static void do_write_feature_control(void)
@@ -572,7 +572,7 @@ static void do_write_feature_control(void)
 	barrier();
 	wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
 resume:
-	return;
+	barrier();
 }
 
 static int test_vmx_feature_control(void)
-- 
1.8.1.1.298.ge7eed54
[PATCH 08/13] apic: Add test case for relocation and writing reserved bits
From: Jan Kiszka

Check that the xAPIC is relocatable and that writing a reserved bit to
MSR_IA32_APICBASE triggers a #GP.

Signed-off-by: Jan Kiszka
---
 x86/apic.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/x86/apic.c b/x86/apic.c
index 4ebcd4f..8febfa2 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -120,6 +120,32 @@ void test_enable_x2apic(void)
 	}
 }
 
+#define ALTERNATE_APIC_BASE	0x42000000
+
+static void test_apicbase(void)
+{
+	u64 orig_apicbase = rdmsr(MSR_IA32_APICBASE);
+	u32 lvr = apic_read(APIC_LVR);
+	u64 value;
+
+	wrmsr(MSR_IA32_APICBASE, orig_apicbase & ~(APIC_EN | APIC_EXTD));
+	wrmsr(MSR_IA32_APICBASE, ALTERNATE_APIC_BASE | APIC_BSP | APIC_EN);
+
+	report("relocate apic",
+	       *(volatile u32 *)(ALTERNATE_APIC_BASE + APIC_LVR) == lvr);
+
+	value = orig_apicbase | (1UL << (cpuid(0x80000008).a & 0xff));
+	report("apicbase: reserved physaddr bits",
+	       test_for_exception(GP_VECTOR, do_write_apicbase, &value));
+
+	value = orig_apicbase | 1;
+	report("apicbase: reserved low bits",
+	       test_for_exception(GP_VECTOR, do_write_apicbase, &value));
+
+	wrmsr(MSR_IA32_APICBASE, orig_apicbase);
+	apic_write(APIC_SPIV, 0x1ff);
+}
+
 static void eoi(void)
 {
 	apic_write(APIC_EOI, 0);
@@ -366,6 +392,7 @@ int main()
 	mask_pic_interrupts();
 
 	test_enable_x2apic();
+	test_apicbase();
 
 	test_self_ipi();
-- 
1.8.1.1.298.ge7eed54
[PATCH 07/13] lib/x86/apic: Consolidate over MSR_IA32_APICBASE
From: Jan Kiszka

Signed-off-by: Jan Kiszka
---
 lib/x86/apic.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/x86/apic.c b/lib/x86/apic.c
index 7bb98ed..6876d85 100644
--- a/lib/x86/apic.c
+++ b/lib/x86/apic.c
@@ -1,5 +1,6 @@
 #include "libcflat.h"
 #include "apic.h"
+#include "msr.h"
 
 static void *g_apic = (void *)0xfee00000;
 static void *g_ioapic = (void *)0xfec00000;
@@ -99,8 +100,6 @@ uint32_t apic_id(void)
 	return apic_ops->id();
 }
 
-#define MSR_APIC_BASE 0x0000001b
-
 int enable_x2apic(void)
 {
 	unsigned a, b, c, d;
@@ -108,9 +107,9 @@ int enable_x2apic(void)
 	asm ("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d) : "0"(1));
 
 	if (c & (1 << 21)) {
-		asm ("rdmsr" : "=a"(a), "=d"(d) : "c"(MSR_APIC_BASE));
+		asm ("rdmsr" : "=a"(a), "=d"(d) : "c"(MSR_IA32_APICBASE));
 		a |= 1 << 10;
-		asm ("wrmsr" : : "a"(a), "d"(d), "c"(MSR_APIC_BASE));
+		asm ("wrmsr" : : "a"(a), "d"(d), "c"(MSR_IA32_APICBASE));
 		apic_ops = &x2apic_ops;
 		return 1;
 	} else {
-- 
1.8.1.1.298.ge7eed54
[PATCH 02/13] VMX: Extend preemption timer tests
From: Jan Kiszka

This checks that we properly expire the preemption timer while the
guest is in HLT state and that we do not progress guest execution if
the preemption timer is activated with a timer value of 0.

Signed-off-by: Jan Kiszka
---
 x86/vmx_tests.c | 84 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 64 insertions(+), 20 deletions(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 70efb50..0077f3f 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -99,6 +99,7 @@ int vmenter_exit_handler()
 u32 preempt_scale;
 volatile unsigned long long tsc_val;
 volatile u32 preempt_val;
+u64 saved_rip;
 
 int preemption_timer_init()
 {
@@ -126,17 +127,24 @@ void preemption_timer_main()
 		if (get_stage() == 1)
 			vmcall();
 	}
-	while (1) {
+	set_stage(1);
+	while (get_stage() == 1) {
 		if (((rdtsc() - tsc_val) >> preempt_scale)
 		    > 10 * preempt_val) {
 			set_stage(2);
 			vmcall();
 		}
 	}
+	tsc_val = rdtsc();
+	asm volatile ("hlt");
+	vmcall();
+	set_stage(5);
+	vmcall();
 }
 
 int preemption_timer_exit_handler()
 {
+	bool guest_halted;
 	u64 guest_rip;
 	ulong reason;
 	u32 insn_len;
@@ -147,33 +155,69 @@ int preemption_timer_exit_handler()
 	insn_len = vmcs_read(EXI_INST_LEN);
 	switch (reason) {
 	case VMX_PREEMPT:
-		if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
-			report("Preemption timer", 0);
-		else
-			report("Preemption timer", 1);
+		switch (get_stage()) {
+		case 1:
+		case 2:
+			report("busy-wait for preemption timer",
+			       ((rdtsc() - tsc_val) >> preempt_scale) >=
+			       preempt_val);
+			set_stage(3);
+			vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+			return VMX_TEST_RESUME;
+		case 3:
+			guest_halted =
+				(vmcs_read(GUEST_ACTV_STATE) == ACTV_HLT);
+			report("preemption timer during hlt",
+			       ((rdtsc() - tsc_val) >> preempt_scale) >=
+			       preempt_val && guest_halted);
+			set_stage(4);
+			vmcs_write(PIN_CONTROLS,
+				   vmcs_read(PIN_CONTROLS) & ~PIN_PREEMPT);
+			vmcs_write(GUEST_ACTV_STATE, ACTV_ACTIVE);
+			return VMX_TEST_RESUME;
+		case 4:
+			report("preemption timer with 0 value",
+			       saved_rip == guest_rip);
+			break;
+		default:
+
printf("Invalid stage.\n"); + print_vmexit_info(); + break; + } break; case VMX_VMCALL: + vmcs_write(GUEST_RIP, guest_rip + insn_len); switch (get_stage()) { case 0: - if (vmcs_read(PREEMPT_TIMER_VALUE) != preempt_val) - report("Save preemption value", 0); - else { - set_stage(get_stage() + 1); - ctrl_exit = (vmcs_read(EXI_CONTROLS) | - EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr; - vmcs_write(EXI_CONTROLS, ctrl_exit); - } - vmcs_write(GUEST_RIP, guest_rip + insn_len); + report("Keep preemption value", + vmcs_read(PREEMPT_TIMER_VALUE) == preempt_val); + set_stage(1); + vmcs_write(PREEMPT_TIMER_VALUE, preempt_val); + ctrl_exit = (vmcs_read(EXI_CONTROLS) | + EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr; + vmcs_write(EXI_CONTROLS, ctrl_exit); return VMX_TEST_RESUME; case 1: - if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val) - report("Save preemption value", 0); - else - report("Save preemption value", 1); - vmcs_write(GUEST_RIP, guest_rip + insn_len); + report("Save preemption value", + vmcs_read(PREEMPT_TIMER_VALUE) < preempt_val); return VMX_TEST_RESUME; case 2: - report("Preemption timer", 0); + report("busy-wait for preemption timer", 0); + set_sta
[PATCH 00/13] kvm-unit-tests: Various improvements for x86 tests
Highlights:
 - improved preemption timer and interrupt injection tests (obsoletes my
   two patches in vmx queue)
 - tests for IA32_APIC_BASE writes
 - test for unconditional IO exiting (VMX)
 - basic test of debug facilities (hw breakpoints etc.)

Jan Kiszka (13):
  VMX: Add test cases around interrupt injection and halting
  VMX: Extend preemption timer tests
  apic: Remove redundant enable_apic
  VMX: Fix return label in fault-triggering handlers
  lib/x86: Move exception test code into library
  x2apic: Test for invalid state transitions
  lib/x86/apic: Consolidate over MSR_IA32_APICBASE
  apic: Add test case for relocation and writing reserved bits
  VMX: Check unconditional I/O exiting
  Provide common report and report_summary services
  Ignore *.elf build outputs
  svm: Add missing build dependency
  x86: Add debug facility test case

 .gitignore            |   1 +
 Makefile              |   3 +-
 config-x86-common.mak |   4 +-
 config-x86_64.mak     |   2 +-
 lib/libcflat.h        |   4 +
 lib/report.c          |  36 +++
 lib/x86/apic-defs.h   |   3 +
 lib/x86/apic.c        |   7 +-
 lib/x86/desc.c        |  24 +
 lib/x86/desc.h        |   6 ++
 x86/apic.c            |  84 +---
 x86/debug.c           | 113 +
 x86/emulator.c        |  16 +--
 x86/eventinj.c        |  15 +--
 x86/idt_test.c        |  21 +---
 x86/msr.c             |  15 +--
 x86/pcid.c            |  14 +--
 x86/pmu.c             |  37 +++
 x86/taskswitch2.c     |  15 +--
 x86/unittests.cfg     |   3 +
 x86/vmx.c             |  57 +++
 x86/vmx.h             |   4 +-
 x86/vmx_tests.c       | 264 ++
 23 files changed, 548 insertions(+), 200 deletions(-)
 create mode 100644 lib/report.c
 create mode 100644 x86/debug.c
-- 
1.8.1.1.298.ge7eed54
[PATCH 07/12] KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
From: Jan Kiszka

As already used by SVM for tracing nested vmexits: kvm_nested_vmexit
marks exits from L2 to L0 while kvm_nested_vmexit_inject marks vmexits
that are reflected to L1.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/vmx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0bd0509..9cd6eb7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6701,6 +6701,13 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	u32 exit_reason = vmx->exit_reason;
 
+	trace_kvm_nested_vmexit(kvm_rip_read(vcpu), exit_reason,
+				vmcs_readl(EXIT_QUALIFICATION),
+				vmx->idt_vectoring_info,
+				intr_info,
+				vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
+				KVM_ISA_VMX);
+
 	if (vmx->nested.nested_run_pending)
 		return 0;
 
@@ -8472,6 +8479,13 @@
 	prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
 		       exit_qualification);
 
+	trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
+				       vmcs12->exit_qualification,
+				       vmcs12->idt_vectoring_info_field,
+				       vmcs12->vm_exit_intr_info,
+				       vmcs12->vm_exit_intr_error_code,
+				       KVM_ISA_VMX);
+
 	cpu = get_cpu();
 	vmx->loaded_vmcs = &vmx->vmcs01;
 	vmx_vcpu_put(vcpu);
-- 
1.8.1.1.298.ge7eed54
[PATCH 09/12] KVM: nVMX: Fix nested_run_pending on activity state HLT
From: Jan Kiszka

When we suspend the guest in HLT state, the nested run is no longer
pending - we emulated it completely. So only set nested_run_pending
after checking the activity state.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 36efd47..bde8ddd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8050,8 +8050,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 
 	enter_guest_mode(vcpu);
 
-	vmx->nested.nested_run_pending = 1;
-
 	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
 
 	cpu = get_cpu();
@@ -8070,6 +8068,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
 		return kvm_emulate_halt(vcpu);
 
+	vmx->nested.nested_run_pending = 1;
+
 	/*
 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
-- 
1.8.1.1.298.ge7eed54
[PATCH 03/12] KVM: VMX: Fix DR6 update on #DB exception
From: Jan Kiszka

According to the SDM, only bits 0-3 of DR6 "may" be cleared by "certain"
debug exceptions. So do update them on #DB exceptions in KVM, but leave
the rest alone, only setting BD and BS in addition to already set bits
in DR6. This also aligns us with kvm_vcpu_check_singlestep.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 55cb4b6..2a95ce0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4869,7 +4869,8 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
 		      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
-			vcpu->arch.dr6 = dr6 | DR6_FIXED_1;
+			vcpu->arch.dr6 &= ~15;
+			vcpu->arch.dr6 |= dr6;
 			kvm_queue_exception(vcpu, DB_VECTOR);
 			return 1;
 		}
-- 
1.8.1.1.298.ge7eed54
[PATCH 05/12] KVM: nVMX: Leave VMX mode on clearing of feature control MSR
From: Jan Kiszka

When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave
root and non-root mode, fully disabling VMX. The register state of the
VCPU is undefined after this step, so userspace has to set it to a
proper state afterward. This makes it possible to reboot a VM while it
is running some hypervisor code.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/vmx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9fa8a1c..3edf08f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2455,6 +2455,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	return 1;
 }
 
+static void vmx_leave_nested(struct kvm_vcpu *vcpu);
+
 static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u32 msr_index = msr_info->index;
@@ -2470,6 +2472,8 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    & FEATURE_CONTROL_LOCKED)
 			return 0;
 		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
+		if (host_initialized && data == 0)
+			vmx_leave_nested(vcpu);
 		return 1;
 	}
 
@@ -8507,6 +8511,16 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Forcibly leave nested mode in order to be able to reset the VCPU later on.
+ */
+static void vmx_leave_nested(struct kvm_vcpu *vcpu)
+{
+	if (is_guest_mode(vcpu))
+		nested_vmx_vmexit(vcpu);
+	free_nested(to_vmx(vcpu));
+}
+
+/*
  * L1's failure to enter L2 is a subset of a normal exit, as explained in
  * 23.7 "VM-entry failures during or after loading guest state" (this also
  * lists the acceptable exit-reason and exit-qualification parameters).
-- 
1.8.1.1.298.ge7eed54
[PATCH 06/12] KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
From: Jan Kiszka Instead of fixing up the vmcs12 after the nested vmexit, pass key parameters already when calling nested_vmx_vmexit. This will help tracing those vmexits. Signed-off-by: Jan Kiszka --- arch/x86/kvm/vmx.c | 63 +- 1 file changed, 34 insertions(+), 29 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3edf08f..0bd0509 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1058,7 +1058,9 @@ static inline bool is_exception(u32 intr_info) == (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK); } -static void nested_vmx_vmexit(struct kvm_vcpu *vcpu); +static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, + u32 exit_intr_info, + unsigned long exit_qualification); static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, u32 reason, unsigned long qualification); @@ -1967,7 +1969,9 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr) if (!(vmcs12->exception_bitmap & (1u << nr))) return 0; - nested_vmx_vmexit(vcpu); + nested_vmx_vmexit(vcpu, to_vmx(vcpu)->exit_reason, + vmcs_read32(VM_EXIT_INTR_INFO), + vmcs_readl(EXIT_QUALIFICATION)); return 1; } @@ -4650,15 +4654,12 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) { if (is_guest_mode(vcpu)) { - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - if (to_vmx(vcpu)->nested.nested_run_pending) return 0; if (nested_exit_on_nmi(vcpu)) { - nested_vmx_vmexit(vcpu); - vmcs12->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI; - vmcs12->vm_exit_intr_info = NMI_VECTOR | - INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK; + nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, + NMI_VECTOR | INTR_TYPE_NMI_INTR | + INTR_INFO_VALID_MASK, 0); /* * The NMI-triggered VM exit counts as injection: * clear this one and block further NMIs. 
@@ -4680,15 +4681,11 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) { if (is_guest_mode(vcpu)) { - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - if (to_vmx(vcpu)->nested.nested_run_pending) return 0; if (nested_exit_on_intr(vcpu)) { - nested_vmx_vmexit(vcpu); - vmcs12->vm_exit_reason = - EXIT_REASON_EXTERNAL_INTERRUPT; - vmcs12->vm_exit_intr_info = 0; + nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, + 0, 0); /* * fall through to normal code, but now in L1, not L2 */ @@ -6853,7 +6850,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) return handle_invalid_guest_state(vcpu); if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) { - nested_vmx_vmexit(vcpu); + nested_vmx_vmexit(vcpu, exit_reason, + vmcs_read32(VM_EXIT_INTR_INFO), + vmcs_readl(EXIT_QUALIFICATION)); return 1; } @@ -7594,15 +7593,14 @@ static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry) static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault) { - struct vmcs12 *vmcs12; - nested_vmx_vmexit(vcpu); - vmcs12 = get_vmcs12(vcpu); + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + u32 exit_reason; if (fault->error_code & PFERR_RSVD_MASK) - vmcs12->vm_exit_reason = EXIT_REASON_EPT_MISCONFIG; + exit_reason = EXIT_REASON_EPT_MISCONFIG; else - vmcs12->vm_exit_reason = EXIT_REASON_EPT_VIOLATION; - vmcs12->exit_qualification = vcpu->arch.exit_qualification; + exit_reason = EXIT_REASON_EPT_VIOLATION; + nested_vmx_vmexit(vcpu, exit_reason, 0, vcpu->arch.exit_qualification); vmcs12->guest_physical_address = fault->address; } @@ -7640,7 +7638,9 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu, /* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */ if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) - nested_vmx_vmexit(vcpu); + nested_vmx_vmexit(vcpu, to_
[PATCH 08/12] KVM: nVMX: Clean up handling of VMX-related MSRs
From: Jan Kiszka This simplifies the code and also stops issuing warning about writing to unhandled MSRs when VMX is disabled or the Feature Control MSR is locked - we do handle them all according to the spec. Signed-off-by: Jan Kiszka --- arch/x86/include/uapi/asm/msr-index.h | 1 + arch/x86/kvm/vmx.c| 79 ++- 2 files changed, 24 insertions(+), 56 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index 37813b5..2e4a42d 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -527,6 +527,7 @@ #define MSR_IA32_VMX_TRUE_PROCBASED_CTLS 0x048e #define MSR_IA32_VMX_TRUE_EXIT_CTLS 0x048f #define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x0490 +#define MSR_IA32_VMX_VMFUNC 0x0491 /* VMX_BASIC bits and bitmasks */ #define VMX_BASIC_VMCS_SIZE_SHIFT 32 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 9cd6eb7..36efd47 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2361,32 +2361,10 @@ static inline u64 vmx_control_msr(u32 low, u32 high) return low | ((u64)high << 32); } -/* - * If we allow our guest to use VMX instructions (i.e., nested VMX), we should - * also let it use VMX-specific MSRs. - * vmx_get_vmx_msr() and vmx_set_vmx_msr() return 1 when we handled a - * VMX-specific MSR, or 0 when we haven't (and the caller should handle it - * like all other MSRs). - */ +/* Returns 0 on success, non-0 otherwise. */ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) { - if (!nested_vmx_allowed(vcpu) && msr_index >= MSR_IA32_VMX_BASIC && -msr_index <= MSR_IA32_VMX_TRUE_ENTRY_CTLS) { - /* -* According to the spec, processors which do not support VMX -* should throw a #GP(0) when VMX capability MSRs are read. 
-*/ - kvm_queue_exception_e(vcpu, GP_VECTOR, 0); - return 1; - } - switch (msr_index) { - case MSR_IA32_FEATURE_CONTROL: - if (nested_vmx_allowed(vcpu)) { - *pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control; - break; - } - return 0; case MSR_IA32_VMX_BASIC: /* * This MSR reports some information about VMX support. We @@ -2453,38 +2431,9 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) *pdata = nested_vmx_ept_caps; break; default: - return 0; - } - - return 1; -} - -static void vmx_leave_nested(struct kvm_vcpu *vcpu); - -static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) -{ - u32 msr_index = msr_info->index; - u64 data = msr_info->data; - bool host_initialized = msr_info->host_initiated; - - if (!nested_vmx_allowed(vcpu)) - return 0; - - if (msr_index == MSR_IA32_FEATURE_CONTROL) { - if (!host_initialized && - to_vmx(vcpu)->nested.msr_ia32_feature_control - & FEATURE_CONTROL_LOCKED) - return 0; - to_vmx(vcpu)->nested.msr_ia32_feature_control = data; - if (host_initialized && data == 0) - vmx_leave_nested(vcpu); return 1; } - /* -* No need to treat VMX capability MSRs specially: If we don't handle -* them, handle_wrmsr will #GP(0), which is correct (they are readonly) -*/ return 0; } @@ -2530,13 +2479,20 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) case MSR_IA32_SYSENTER_ESP: data = vmcs_readl(GUEST_SYSENTER_ESP); break; + case MSR_IA32_FEATURE_CONTROL: + if (!nested_vmx_allowed(vcpu)) + return 1; + data = to_vmx(vcpu)->nested.msr_ia32_feature_control; + break; + case MSR_IA32_VMX_BASIC ... 
MSR_IA32_VMX_VMFUNC: + if (!nested_vmx_allowed(vcpu)) + return 1; + return vmx_get_vmx_msr(vcpu, msr_index, pdata); case MSR_TSC_AUX: if (!to_vmx(vcpu)->rdtscp_enabled) return 1; /* Otherwise falls through */ default: - if (vmx_get_vmx_msr(vcpu, msr_index, pdata)) - return 0; msr = find_msr_entry(to_vmx(vcpu), msr_index); if (msr) { data = msr->data; @@ -2549,6 +2505,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) return 0; } +static void vmx_leave_nested(struct kvm_vcpu *vcpu); + /* * Writes msr value into into the appropriate "register". * Returns 0
[PATCH 04/12] KVM: x86: Validate guest writes to MSR_IA32_APICBASE
From: Jan Kiszka

Check for invalid state transitions on guest-initiated updates of
MSR_IA32_APICBASE. This addresses both enabling of the x2APIC when it
is not supported and all invalid transitions as described in SDM
section 10.12.5. It also checks that no reserved bit is set in APICBASE
by the guest.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/cpuid.h | 8 ++++++++
 arch/x86/kvm/lapic.h | 2 +-
 arch/x86/kvm/vmx.c | 9 +++++----
 arch/x86/kvm/x86.c | 32 +++++++++++++++++++++++++-------
 4 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index f1e4895..a2a1bb7 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -72,4 +72,12 @@ static inline bool guest_cpuid_has_pcid(struct kvm_vcpu *vcpu)
 	return best && (best->ecx & bit(X86_FEATURE_PCID));
 }
 
+static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 1, 0);
+	return best && (best->ecx & bit(X86_FEATURE_X2APIC));
+}
+
 #endif
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c730ac9..3ee60ef 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,7 +65,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map);
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
-void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
+int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2a95ce0..9fa8a1c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4417,7 +4417,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u64 msr;
+	struct msr_data apic_base_msr;
 
 	vmx->rmode.vm86_active = 0;
 
@@ -4425,10 +4425,11 @@ static void
vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val(); kvm_set_cr8(&vmx->vcpu, 0); - msr = 0xfee0 | MSR_IA32_APICBASE_ENABLE; + apic_base_msr.data = 0xfee0 | MSR_IA32_APICBASE_ENABLE; if (kvm_vcpu_is_bsp(&vmx->vcpu)) - msr |= MSR_IA32_APICBASE_BSP; - kvm_set_apic_base(&vmx->vcpu, msr); + apic_base_msr.data |= MSR_IA32_APICBASE_BSP; + apic_base_msr.host_initiated = true; + kvm_set_apic_base(&vmx->vcpu, &apic_base_msr); vmx_segment_cache_clear(vmx); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ea7c6a5..559ae75 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -254,10 +254,26 @@ u64 kvm_get_apic_base(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_get_apic_base); -void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data) -{ - /* TODO: reserve bits check */ - kvm_lapic_set_base(vcpu, data); +int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + u64 old_state = vcpu->arch.apic_base & + (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE); + u64 new_state = msr_info->data & + (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE); + u64 reserved_bits = ((~0ULL) << boot_cpu_data.x86_phys_bits) | 0x2ff | + (guest_cpuid_has_x2apic(vcpu) ? 0 : X2APIC_ENABLE); + + if (!msr_info->host_initiated && + ((msr_info->data & reserved_bits) != 0 || +new_state == X2APIC_ENABLE || +(new_state == MSR_IA32_APICBASE_ENABLE && + old_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) || +(new_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE) && + old_state == 0))) + return 1; + + kvm_lapic_set_base(vcpu, msr_info->data); + return 0; } EXPORT_SYMBOL_GPL(kvm_set_apic_base); @@ -2027,8 +2043,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case 0x200 ... 0x2ff: return set_msr_mtrr(vcpu, msr, data); case MSR_IA32_APICBASE: - kvm_set_apic_base(vcpu, data); - break; + return kvm_set_apic_base(vcpu, msr_info); case APIC_BASE_MSR ... 
APIC_BASE_MSR + 0x3ff: return kvm_x2apic_msr_write(vcpu, msr, data); case MSR_IA32_TSCDEADLINE: @@ -6459,6 +6474,7 @@ EXPORT_SYMBOL_GPL(kvm_task_switch); int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) { + struct msr_data apic_base_msr; int mmu_reset_needed = 0; int pending_vec, max_bits, idx; struct desc_ptr dt; @@ -6482,7 +6498,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, mmu_reset_needed
[PATCH 10/12] KVM: nVMX: Update guest activity state field on L2 exits
From: Jan Kiszka

Set guest activity state in L1's VMCS according to the VCPU's mp_state.
This ensures we report the correct state in case L2 executed HLT or if
we put L2 into HLT state and it was now woken up by an event.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/vmx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bde8ddd..1245ff1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8223,6 +8223,10 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
+	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
+		vmcs12->guest_activity_state = GUEST_ACTIVITY_HLT;
+	else
+		vmcs12->guest_activity_state = GUEST_ACTIVITY_ACTIVE;
 
 	if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
 	    (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
-- 
1.8.1.1.298.ge7eed54
[PATCH 02/12] KVM: SVM: Fix reading of DR6
From: Jan Kiszka

In contrast to VMX, SVM does not automatically transfer DR6 into the
VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
hook to obtain the current value. And as SVM now picks the DR6 state
from its VMCB, we also need a set callback in order to write updates of
DR6 back.

Fixes a regression of 020df0794f.

Signed-off-by: Jan Kiszka
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/svm.c | 15 +++++++++++++++
 arch/x86/kvm/vmx.c | 11 +++++++++++
 arch/x86/kvm/x86.c | 19 +++++++++++++++++--
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae5d783..e73651b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -699,6 +699,8 @@ struct kvm_x86_ops {
 	void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+	u64 (*get_dr6)(struct kvm_vcpu *vcpu);
+	void (*set_dr6)(struct kvm_vcpu *vcpu, unsigned long value);
 	void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
 	void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c7168a5..e81df8f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1671,6 +1671,19 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
 	mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
+static u64 svm_get_dr6(struct kvm_vcpu *vcpu)
+{
+	return to_svm(vcpu)->vmcb->save.dr6;
+}
+
+static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->save.dr6 = value;
+	mark_dirty(svm->vmcb, VMCB_DR);
+}
+
 static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4286,6 +4299,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.set_idt = svm_set_idt,
 	.get_gdt = svm_get_gdt,
 	.set_gdt =
svm_set_gdt, + .get_dr6 = svm_get_dr6, + .set_dr6 = svm_set_dr6, .set_dr7 = svm_set_dr7, .cache_reg = svm_cache_reg, .get_rflags = svm_get_rflags, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e947cba..55cb4b6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5149,6 +5149,15 @@ static int handle_dr(struct kvm_vcpu *vcpu) return 1; } +static u64 vmx_get_dr6(struct kvm_vcpu *vcpu) +{ + return vcpu->arch.dr6; +} + +static void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val) +{ +} + static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) { vmcs_writel(GUEST_DR7, val); @@ -8558,6 +8567,8 @@ static struct kvm_x86_ops vmx_x86_ops = { .set_idt = vmx_set_idt, .get_gdt = vmx_get_gdt, .set_gdt = vmx_set_gdt, + .get_dr6 = vmx_get_dr6, + .set_dr6 = vmx_set_dr6, .set_dr7 = vmx_set_dr7, .cache_reg = vmx_cache_reg, .get_rflags = vmx_get_rflags, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5f75230..ea7c6a5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -719,6 +719,12 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_get_cr8); +static void kvm_update_dr6(struct kvm_vcpu *vcpu) +{ + if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) + kvm_x86_ops->set_dr6(vcpu, vcpu->arch.dr6); +} + static void kvm_update_dr7(struct kvm_vcpu *vcpu) { unsigned long dr7; @@ -747,6 +753,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) if (val & 0xULL) return -1; /* #GP */ vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; + kvm_update_dr6(vcpu); break; case 5: if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) @@ -788,7 +795,10 @@ static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) return 1; /* fall through */ case 6: - *val = vcpu->arch.dr6; + if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) + *val = vcpu->arch.dr6; + else + *val = kvm_x86_ops->get_dr6(vcpu); break; case 5: if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) @@ -2972,8 +2982,11 @@ static int 
kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu, struct kvm_debugregs *dbgregs) { + unsigned long val; + memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db)); - dbgregs->dr6 = vcpu->arch.dr6; +
[PATCH 11/12] KVM: nVMX: Rework interception of IRQs and NMIs
From: Jan Kiszka Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback has the task to switch from L2 to L1 if needed and inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation. Signed-off-by: Jan Kiszka --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/vmx.c | 67 +++-- arch/x86/kvm/x86.c | 15 +++-- 3 files changed, 53 insertions(+), 31 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e73651b..d195421 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -764,6 +764,8 @@ struct kvm_x86_ops { struct x86_instruction_info *info, enum x86_intercept_stage stage); void (*handle_external_intr)(struct kvm_vcpu *vcpu); + + int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr); }; struct kvm_arch_async_pf { diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1245ff1..ec8a976 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4620,22 +4620,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) { - if (is_guest_mode(vcpu)) { - if (to_vmx(vcpu)->nested.nested_run_pending) - return 0; - if (nested_exit_on_nmi(vcpu)) { - nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, - NMI_VECTOR | INTR_TYPE_NMI_INTR | - INTR_INFO_VALID_MASK, 0); - /* -* The NMI-triggered VM exit counts as injection: -* clear this one and block further NMIs. 
-*/ - vcpu->arch.nmi_pending = 0; - vmx_set_nmi_mask(vcpu, true); - return 0; - } - } + if (to_vmx(vcpu)->nested.nested_run_pending) + return 0; if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked) return 0; @@ -4647,19 +4633,8 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) { - if (is_guest_mode(vcpu)) { - if (to_vmx(vcpu)->nested.nested_run_pending) - return 0; - if (nested_exit_on_intr(vcpu)) { - nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, - 0, 0); - /* -* fall through to normal code, but now in L1, not L2 -*/ - } - } - - return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) && + return (!to_vmx(vcpu)->nested.nested_run_pending && + vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) && !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)); } @@ -8158,6 +8133,35 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu, } } +static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) { + if (vmx->nested.nested_run_pending) + return -EBUSY; + nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, + NMI_VECTOR | INTR_TYPE_NMI_INTR | + INTR_INFO_VALID_MASK, 0); + /* +* The NMI-triggered VM exit counts as injection: +* clear this one and block further NMIs. +*/ + vcpu->arch.nmi_pending = 0; + vmx_set_nmi_mask(vcpu, true); + return 0; + } + + if ((kvm_cpu_has_interrupt(vcpu) || external_intr) && + nested_exit_on_intr(vcpu)) { + if (vmx->nested.nested_run_pending) + return -EBUSY; + nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0); + } + + return 0; +} + /* * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits * and we want to prepare to run its L1 parent. 
L1 keeps a vmcs for L2 (vmcs12), @@ -8498,6 +8502,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, nested_vmx_succeed(vcpu); if (enable_shadow_vmcs) vmx->nested.sync_shadow_vmcs = true; + + /* in case we halted in L2 */ + vcpu->arch.mp_state = KV
[PATCH 12/12] KVM: nVMX: Fully emulate preemption timer
From: Jan Kiszka We cannot rely on the hardware-provided preemption timer support because we are holding L2 in HLT outside non-root mode. Furthermore, emulating the preemption will resolve tick rate errata on older Intel CPUs. The emulation is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. As we no longer rely on hardware features, we can enable both the preemption timer support and value saving unconditionally. Signed-off-by: Jan Kiszka --- arch/x86/kvm/vmx.c | 151 ++--- 1 file changed, 96 insertions(+), 55 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ec8a976..51d13c7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -31,6 +31,7 @@ #include #include #include +#include #include "kvm_cache_regs.h" #include "x86.h" @@ -110,6 +111,8 @@ module_param(nested, bool, S_IRUGO); #define RMODE_GUEST_OWNED_EFLAGS_BITS (~(X86_EFLAGS_IOPL | X86_EFLAGS_VM)) +#define VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE 5 + /* * These 2 parameters are used to config the controls for Pause-Loop Exiting: * ple_gap:upper bound on the amount of time between two successive @@ -374,6 +377,9 @@ struct nested_vmx { */ struct page *apic_access_page; u64 msr_ia32_feature_control; + + struct hrtimer preemption_timer; + bool preemption_timer_expired; }; #define POSTED_INTR_ON 0 @@ -1047,6 +1053,12 @@ static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12) return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS; } +static inline bool nested_cpu_has_preemption_timer(struct vmcs12 *vmcs12) +{ + return vmcs12->pin_based_vm_exec_control & + PIN_BASED_VMX_PREEMPTION_TIMER; +} + static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12) { return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT); @@ -2248,9 +2260,9 @@ static __init void nested_vmx_setup_ctls_msrs(void) */ nested_vmx_pinbased_ctls_low |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR; nested_vmx_pinbased_ctls_high &= 
PIN_BASED_EXT_INTR_MASK | - PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS | + PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS; + nested_vmx_pinbased_ctls_high |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR | PIN_BASED_VMX_PREEMPTION_TIMER; - nested_vmx_pinbased_ctls_high |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR; /* * Exit controls @@ -2265,15 +2277,10 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | + VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER | VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; - if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) || - !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { - nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; - nested_vmx_pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER; - } - nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | - VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER); /* entry controls */ rdmsr(MSR_IA32_VMX_ENTRY_CTLS, @@ -2342,9 +2349,9 @@ static __init void nested_vmx_setup_ctls_msrs(void) /* miscellaneous data */ rdmsr(MSR_IA32_VMX_MISC, nested_vmx_misc_low, nested_vmx_misc_high); - nested_vmx_misc_low &= VMX_MISC_PREEMPTION_TIMER_RATE_MASK | - VMX_MISC_SAVE_EFER_LMA; - nested_vmx_misc_low |= VMX_MISC_ACTIVITY_HLT; + nested_vmx_misc_low &= VMX_MISC_SAVE_EFER_LMA; + nested_vmx_misc_low |= VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE | + VMX_MISC_ACTIVITY_HLT; nested_vmx_misc_high = 0; } @@ -5702,6 +5709,18 @@ static void nested_vmx_failValid(struct kvm_vcpu *vcpu, */ } +static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer) +{ + struct vcpu_vmx *vmx = + container_of(timer, struct vcpu_vmx, nested.preemption_timer); + + vmx->nested.preemption_timer_expired = true; + kvm_make_request(KVM_REQ_EVENT, &vmx->vcpu); + 
kvm_vcpu_kick(&vmx->vcpu); + + return HRTIMER_NORESTART; +} + /* * Emulate the VMXON instruction. * Currently, we just remember that VMX is active, and do not save or even @@ -5766,6 +5785,10 @@ static int handle_vmon(struct kvm_vcpu *vcpu) INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool)); vmx->nested.vmcs02_num = 0; + hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC, +HRTIMER_MODE_REL);
[PATCH 00/12] KVM: x86: Fixes for debug registers, IA32_APIC_BASE, and nVMX
This is on top of next after merging in the two patches of mine that are only present in master ATM. Highlights: - reworked fix of DR6 reading on SVM - full check for invalid writes to IA32_APIC_BASE - fixed support for halting in L2 (nVMX) - fully emulated preemption timer (nVMX) - tracing of nested vmexits (nVMX) The patch "KVM: nVMX: Leave VMX mode on clearing of feature control MSR" is included again, unchanged from previous posting. Most fixes are backed by KVM unit tests, to be posted soon as well. Jan Kiszka (12): KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS KVM: SVM: Fix reading of DR6 KVM: VMX: Fix DR6 update on #DB exception KVM: x86: Validate guest writes to MSR_IA32_APICBASE KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Rework interception of IRQs and NMIs KVM: nVMX: Fully emulate preemption timer arch/x86/include/asm/kvm_host.h | 4 + arch/x86/include/uapi/asm/msr-index.h | 1 + arch/x86/kvm/cpuid.h | 8 + arch/x86/kvm/lapic.h | 2 +- arch/x86/kvm/svm.c| 15 ++ arch/x86/kvm/vmx.c| 399 -- arch/x86/kvm/x86.c| 67 +- 7 files changed, 318 insertions(+), 178 deletions(-) -- 1.8.1.1.298.ge7eed54 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/12] KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
From: Jan Kiszka

Whenever we change arch.dr7, we also have to call kvm_update_dr7. In
case guest debugging is off, this will synchronize the new state into
hardware.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1dc0359..5f75230 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2988,6 +2988,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 	memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db));
 	vcpu->arch.dr6 = dbgregs->dr6;
 	vcpu->arch.dr7 = dbgregs->dr7;
+	kvm_update_dr7(vcpu);
 	return 0;
 }
-- 
1.8.1.1.298.ge7eed54
[PATCH 1/3] trace-cmd: Report unknown VMX exit reasons with code
From: Jan Kiszka Allows to parse the result even if the KVM plugin does not yet understand a specific exit code. Signed-off-by: Jan Kiszka --- plugin_kvm.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/plugin_kvm.c b/plugin_kvm.c index 8a25cf1..59443e5 100644 --- a/plugin_kvm.c +++ b/plugin_kvm.c @@ -240,9 +240,8 @@ static const char *find_exit_reason(unsigned isa, int val) for (i = 0; strings[i].val >= 0; i++) if (strings[i].val == val) break; - if (strings[i].str) - return strings[i].str; - return "UNKNOWN"; + + return strings[i].str; } static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record, @@ -251,6 +250,7 @@ static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record, unsigned long long isa; unsigned long long val; unsigned long long info1 = 0, info2 = 0; + const char *reason; if (pevent_get_field_val(s, event, "exit_reason", record, &val, 1) < 0) return -1; @@ -258,7 +258,11 @@ static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record, if (pevent_get_field_val(s, event, "isa", record, &isa, 0) < 0) isa = 1; - trace_seq_printf(s, "reason %s", find_exit_reason(isa, val)); + reason = find_exit_reason(isa, val); + if (reason) + trace_seq_printf(s, "reason %s", reason); + else + trace_seq_printf(s, "reason UNKNOWN (%llu)", val); pevent_print_num_field(s, " rip 0x%lx", event, "guest_rip", record, 1); -- 1.8.1.1.298.ge7eed54 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] trace-cmd: Fix and cleanup kvm_nested_vmexit tracepoints
From: Jan Kiszka Fix several issues of kvm_nested_vmexit[_inject]: field widths aren't supported with pevent_print, rip was printed twice/incorrectly, SVM ISA was hard-coded, we don't use ':' to separate field names. Signed-off-by: Jan Kiszka --- plugin_kvm.c | 18 ++ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/plugin_kvm.c b/plugin_kvm.c index c407e55..cb52e9e 100644 --- a/plugin_kvm.c +++ b/plugin_kvm.c @@ -330,19 +330,13 @@ static int kvm_emulate_insn_handler(struct trace_seq *s, struct pevent_record *r static int kvm_nested_vmexit_inject_handler(struct trace_seq *s, struct pevent_record *record, struct event_format *event, void *context) { - unsigned long long val; - - pevent_print_num_field(s, " rip %0x016llx", event, "rip", record, 1); - - if (pevent_get_field_val(s, event, "exit_code", record, &val, 1) < 0) + if (print_exit_reason(s, record, event, "exit_code") < 0) return -1; - trace_seq_printf(s, "reason %s", find_exit_reason(2, val)); - - pevent_print_num_field(s, " ext_inf1: %0x016llx", event, "exit_info1", record, 1); - pevent_print_num_field(s, " ext_inf2: %0x016llx", event, "exit_info2", record, 1); - pevent_print_num_field(s, " ext_int: %0x016llx", event, "exit_int_info", record, 1); - pevent_print_num_field(s, " ext_int_err: %0x016llx", event, "exit_int_info_err", record, 1); + pevent_print_num_field(s, " info1 %llx", event, "exit_info1", record, 1); + pevent_print_num_field(s, " info2 %llx", event, "exit_info2", record, 1); + pevent_print_num_field(s, " int_info %llx", event, "exit_int_info", record, 1); + pevent_print_num_field(s, " int_info_err %llx", event, "exit_int_info_err", record, 1); return 0; } @@ -350,7 +344,7 @@ static int kvm_nested_vmexit_inject_handler(struct trace_seq *s, struct pevent_r static int kvm_nested_vmexit_handler(struct trace_seq *s, struct pevent_record *record, struct event_format *event, void *context) { - pevent_print_num_field(s, 
"rip %lx ", event, "rip", record, 1); return kvm_nested_vmexit_inject_handler(s, record, event, context); } -- 1.8.1.1.298.ge7eed54 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] trace-cmd: Factor out print_exit_reason in kvm plugin
From: Jan Kiszka We will reuse it for nested vmexit tracepoints. Signed-off-by: Jan Kiszka --- plugin_kvm.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/plugin_kvm.c b/plugin_kvm.c index 59443e5..c407e55 100644 --- a/plugin_kvm.c +++ b/plugin_kvm.c @@ -244,15 +244,14 @@ static const char *find_exit_reason(unsigned isa, int val) return strings[i].str; } -static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record, - struct event_format *event, void *context) +static int print_exit_reason(struct trace_seq *s, struct pevent_record *record, +struct event_format *event, const char *field) { unsigned long long isa; unsigned long long val; - unsigned long long info1 = 0, info2 = 0; const char *reason; - if (pevent_get_field_val(s, event, "exit_reason", record, &val, 1) < 0) + if (pevent_get_field_val(s, event, field, record, &val, 1) < 0) return -1; if (pevent_get_field_val(s, event, "isa", record, &isa, 0) < 0) @@ -263,6 +262,16 @@ static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record, trace_seq_printf(s, "reason %s", reason); else trace_seq_printf(s, "reason UNKNOWN (%llu)", val); + return 0; +} + +static int kvm_exit_handler(struct trace_seq *s, struct pevent_record *record, + struct event_format *event, void *context) +{ + unsigned long long info1 = 0, info2 = 0; + + if (print_exit_reason(s, record, event, "exit_reason") < 0) + return -1; pevent_print_num_field(s, " rip 0x%lx", event, "guest_rip", record, 1); -- 1.8.1.1.298.ge7eed54 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] trace-cmd: Updates for kvm plugin
Patch 1 is resent unchanged from a previous round; patches 2 and 3
improve the output of nested vmexit tracepoints.

Jan Kiszka (3):
  trace-cmd: Report unknown VMX exit reasons with code
  trace-cmd: Factor out print_exit_reason in kvm plugin
  trace-cmd: Fix and cleanup kvm_nested_vmexit tracepoints

 plugin_kvm.c | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

-- 
1.8.1.1.298.ge7eed54
[PATCH v2] ARM: mm: Fix stage-2 device memory attributes
The stage-2 memory attributes are distinct from the Hyp memory attributes and the Stage-1 memory attributes. We were using the stage-1 memory attributes for stage-2 mappings causing device mappings to be mapped as normal memory. Add the S2 equivalent defines for memory attributes and fix the comments explaining the defines while at it. Add a prot_pte_s2 field to the mem_type struct and fill out the field for device mappings accordingly. Signed-off-by: Christoffer Dall --- Changelog[v2]: - Guard the use of L_PTE_S2 defines with s2_policy to allow non-LPAE compiles. arch/arm/include/asm/pgtable-3level.h | 20 +--- arch/arm/mm/mm.h | 1 + arch/arm/mm/mmu.c | 15 ++- 3 files changed, 28 insertions(+), 8 deletions(-) diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 4f95039..d5e04d6 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -120,13 +120,19 @@ /* * 2nd stage PTE definitions for LPAE. */ -#define L_PTE_S2_MT_UNCACHED(_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */ -#define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */ -#define L_PTE_S2_MT_WRITEBACK (_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */ -#define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6) /* HAP[1] */ -#define L_PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* HAP[2:1] */ - -#define L_PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ +#define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0) << 2) /* strongly ordered */ +#define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* normal inner write-through */ +#define L_PTE_S2_MT_WRITEBACK (_AT(pteval_t, 0xf) << 2) /* normal inner write-back */ +#define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1) << 2) /* device */ +#define L_PTE_S2_MT_DEV_NONSHARED (_AT(pteval_t, 0x1) << 2) /* device */ +#define L_PTE_S2_MT_DEV_WC (_AT(pteval_t, 0x5) << 2) /* normal non-cacheable */ +#define L_PTE_S2_MT_DEV_CACHED (_AT(pteval_t, 0xf) << 2) /* normal inner write-back */ +#define 
L_PTE_S2_MT_MASK (_AT(pteval_t, 0xf) << 2) + +#define L_PTE_S2_RDONLY(_AT(pteval_t, 1) << 6) /* HAP[1] */ +#define L_PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* HAP[2:1] */ + +#define L_PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ /* * Hyp-mode PL2 PTE definitions for LPAE. diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h index d5a982d..7ea641b 100644 --- a/arch/arm/mm/mm.h +++ b/arch/arm/mm/mm.h @@ -38,6 +38,7 @@ static inline pmd_t *pmd_off_k(unsigned long virt) struct mem_type { pteval_t prot_pte; + pteval_t prot_pte_s2; pmdval_t prot_l1; pmdval_t prot_sect; unsigned int domain; diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index 580ef2d..44d571f 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -231,36 +231,48 @@ __setup("noalign", noalign_setup); #endif /* ifdef CONFIG_CPU_CP15 / else */ #define PROT_PTE_DEVICE L_PTE_PRESENT|L_PTE_YOUNG|L_PTE_DIRTY|L_PTE_XN +#define PROT_PTE_S2_DEVICE PROT_PTE_DEVICE #define PROT_SECT_DEVICE PMD_TYPE_SECT|PMD_SECT_AP_WRITE static struct mem_type mem_types[] = { [MT_DEVICE] = { /* Strongly ordered / ARMv6 shared device */ .prot_pte = PROT_PTE_DEVICE | L_PTE_MT_DEV_SHARED | L_PTE_SHARED, + .prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) | + s2_policy(L_PTE_S2_MT_DEV_SHARED) | + L_PTE_SHARED, .prot_l1= PMD_TYPE_TABLE, .prot_sect = PROT_SECT_DEVICE | PMD_SECT_S, .domain = DOMAIN_IO, }, [MT_DEVICE_NONSHARED] = { /* ARMv6 non-shared device */ .prot_pte = PROT_PTE_DEVICE | L_PTE_MT_DEV_NONSHARED, + .prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) | + s2_policy(L_PTE_S2_MT_DEV_NONSHARED), .prot_l1= PMD_TYPE_TABLE, .prot_sect = PROT_SECT_DEVICE, .domain = DOMAIN_IO, }, [MT_DEVICE_CACHED] = {/* ioremap_cached */ .prot_pte = PROT_PTE_DEVICE | L_PTE_MT_DEV_CACHED, + .prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) | + s2_policy(L_PTE_S2_MT_DEV_CACHED), .prot_l1= PMD_TYPE_TABLE, .prot_sect = PROT_SECT_DEVICE | PMD_SECT_WB, .domain = DOMAIN_IO, }, [MT_DEVICE_WC] = { /* ioremap_wc */ .prot_pte = PROT_PTE_DEVICE | L_PTE_MT_DEV_WC, + 
.prot_pte_s2= s2_policy(PROT_PTE_S2_DEVICE) | +