Re: [PATCH 0/11] RFC: PCI using capabilities
On 12/08/2011 05:37 PM, Sasha Levin wrote:
> On Thu, 2011-12-08 at 20:52 +1030, Rusty Russell wrote:
> > Here's the patch series I ended up with. I haven't coded up the QEMU
> > side yet, so no idea if the new driver works.
> >
> > Questions:
> > (1) Do we win from separating ISR, NOTIFY and COMMON?
> > (2) I used a u8 bar; should I use a bir and pack it instead? BIR seems
> >     a little obscure (no one else in the kernel source seems to refer to it).
>
> I started implementing it for KVM tools when I noticed a strange thing:
> my vq creation was failing because the driver was reading a value other
> than 0 from the address field of a new vq, and failing.
>
> I added simple prints in the usermode code and saw the following ordering:
>
> 1. queue select vq 0
> 2. queue read address (returns 0 - new vq)
> 3. queue write address (good address of vq)
> 4. queue read address (returns != 0, fails)
> 5. queue select vq 1
>
> From that I understood that the ordering was wrong: the driver was trying
> to read the address before selecting the correct vq. At that point I added
> simple prints to the driver. Initially the code looked as follows:
>
> 	iowrite16(index, &vp_dev->common->queue_select);
> 	switch (ioread64(&vp_dev->common->queue_address)) {
> 	[...]
> 	};
>
> So I added prints before the iowrite16() and after the ioread64(), and saw
> that while the driver prints were ordered, the device ones weren't:
>
> [1.264052] before iowrite index=1
> kvmtool: net returning pfn (vq=0): 310706176
> kvmtool: queue selected: 1
> [1.264890] after ioread index=1
>
> Suspecting that something was wrong with ordering, I added a print between
> the iowrite and the ioread, and it finally started working well. Which
> leads me to the question: are MMIO vs MMIO reads/writes not ordered?

mmios are strictly ordered. Perhaps your printfs are reordered by buffering? Are they from different threads? Are you using coalesced mmio (which is still strictly ordered, if used correctly)?
-- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/11] RFC: PCI using capabilities
On Sun, 2011-12-11 at 11:05 +0200, Avi Kivity wrote:
> mmios are strictly ordered. Perhaps your printfs are reordered by
> buffering? Are they from different threads? Are you using coalesced
> mmio (which is still strictly ordered, if used correctly)?

I print the queue_selector and queue_address values in the printfs; even if the printfs themselves were reordered, they would still show the correct data, which they don't. It's the data in the printfs that matters, not their order.

Both accesses happen on the same vcpu thread, and I'm not using coalesced mmio.

-- Sasha.
[PATCH v5 00/13] KVM/ARM Implementation
The following series implements KVM support for ARM processors, specifically on the Cortex A-15 platform. The patch series applies to commit 0ec4044a029b5ba9ed6dc7c52390c25da717e184 on Catalin Marinas' linux-arm-arch tree. This is version 5 of the patch series, but the first two versions were reviewed outside of the KVM mailing list. Changes can also be pulled from:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v5

The implementation is broken up into a logical set of patches, the first one containing a skeleton of files, makefile changes, the basic user space interface and KVM architecture specific stubs. Subsequent patches implement parts of the system as listed:
  1. Skeleton
  2. Identity Mapping for Hyp mode
  3. Hypervisor initialization
  4. Memory virtualization setup (Hyp mode mappings and 2nd stage)
  5. Inject IRQs and FIQs from userspace
  6. World-switch implementation and Hyp exception vectors
  7. Emulation framework and CP15 emulation
  8. Handle guest user memory aborts
  9. Handle guest MMIO aborts
 10. Support guest wait-for-interrupt instructions
 11. Initial SMP host support (incomplete!)
 12. Fix guest view of MPIDR
 13. Initial SMP guest support (incomplete!)

Testing has been limited: we have run GCC inside a guest, which compiled a small hello-world program that was then run successfully. Hardware is still unavailable, so all testing has been done on ARM Fast Models.
For a guide on how to set up a testing environment and try out these patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf
 https://wiki.linaro.org/PeterMaydell/A15OnFastModels

Still on the to-do list:
 - Reuse VMIDs
 - Fix SMP host support
 - Fix SMP guest support
 - Support guest Thumb mode for MMIO emulation
 - Further testing
 - Performance improvements

Changes since v4:
 - Addressed reviewer comments from v4
   * cleanup debug and trace code
   * remove printks
   * fixup kvm_arch_vcpu_ioctl_run
   * add trace details to mmio emulation
 - Fix from Marc Zyngier: Move kvm_guest_enter/exit into non-preemptible section (squashed into world-switch patch)
 - Cleanup create_hyp_mappings/remove_hyp_mappings from Marc Zyngier (squashed into hypervisor initialization patch)
 - Removed the remove_hyp_mappings feature. Removing hypervisor mappings could potentially unmap other important data shared in the same page.
 - Removed the arm_ prefix from the arch-specific files.
 - Initial SMP host/guest support

Changes since v3:
 - v4 actually works, fully boots a guest
 - Support compiling as a module
 - Use static inlines instead of macros for vcpu_reg and friends
 - Optimize kvm_vcpu_reg function
 - Use Ftrace for trace capabilities
 - Updated documentation and commenting
 - Use KVM_IRQ_LINE instead of KVM_INTERRUPT
 - Emulates load/store instructions not supported through HSR syndrome information
 - Frees 2nd stage translation tables on VM teardown
 - Handles IRQ/FIQ instructions
 - Handles more CP15 accesses
 - Support guest WFI calls
 - Uses debugfs instead of /proc
 - Support compiling in Thumb mode

Changes since v2:
 - Performs world-switch code
 - Maps guest memory using 2nd stage translation
 - Emulates co-processor 15 instructions
 - Forwards I/O faults to QEMU.
---
Christoffer Dall (12):
      ARM: KVM: Initial skeleton to compile KVM support
      ARM: KVM: Hypervisor identity mapping
      ARM: KVM: Add hypervisor initialization
      ARM: KVM: Memory virtualization setup
      ARM: KVM: Inject IRQs and FIQs from userspace
      ARM: KVM: World-switch implementation
      ARM: KVM: Emulation framework and CP15 emulation
      ARM: KVM: Handle guest faults in KVM
      ARM: KVM: Handle I/O aborts
      ARM: KVM: Guest wait-for-interrupts (WFI) support
      ARM: KVM: Support SMP hosts
      ARM: KVM: Support SMP guests

Marc Zyngier (1):
      ARM: KVM: Fix guest view of MPIDR

 Documentation/virtual/kvm/api.txt           |   10
 arch/arm/Kconfig                            |    2
 arch/arm/Makefile                           |    1
 arch/arm/include/asm/kvm.h                  |   75 +++
 arch/arm/include/asm/kvm_arm.h              |  130 +
 arch/arm/include/asm/kvm_asm.h              |   51 ++
 arch/arm/include/asm/kvm_emulate.h          |  100
 arch/arm/include/asm/kvm_host.h             |  112
 arch/arm/include/asm/kvm_mmu.h              |   42 ++
 arch/arm/include/asm/kvm_para.h             |    9
 arch/arm/include/asm/pgtable-3level-hwdef.h |    5
 arch/arm/include/asm/pgtable-3level.h       |   12
 arch/arm/include/asm/pgtable.h              |   11
 arch/arm/include/asm/unified.h              |   12
 arch/arm/kernel/armksyms.c                  |    7
 arch/arm/kernel/asm-offsets.c               |   34 +
 arch/arm/kernel/entry-armv.S                |    1
 arch/arm/kvm/Kconfig                        |   44 ++
 arch/arm/kvm/Makefile                       |   17 +
 arch/arm/kvm/arm.c                          |  716
[PATCH v5 01/13] ARM: KVM: Initial skeleton to compile KVM support
Targets KVM support for Cortex A-15 processors. Contains no real functionality but all the framework components, make files, header files and some tracing functionality. Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/Kconfig |2 arch/arm/Makefile |1 arch/arm/include/asm/kvm.h | 66 + arch/arm/include/asm/kvm_asm.h | 28 arch/arm/include/asm/kvm_emulate.h | 91 arch/arm/include/asm/kvm_host.h| 93 arch/arm/include/asm/kvm_para.h|9 + arch/arm/include/asm/unified.h | 12 ++ arch/arm/kvm/Kconfig | 44 ++ arch/arm/kvm/Makefile | 17 ++ arch/arm/kvm/arm.c | 279 arch/arm/kvm/debug.h | 48 ++ arch/arm/kvm/emulate.c | 121 arch/arm/kvm/exports.c | 16 ++ arch/arm/kvm/guest.c | 148 +++ arch/arm/kvm/init.S| 17 ++ arch/arm/kvm/interrupts.S | 17 ++ arch/arm/kvm/mmu.c | 15 ++ arch/arm/kvm/trace.h | 52 +++ arch/arm/mach-vexpress/Kconfig |1 arch/arm/mm/Kconfig|8 + 21 files changed, 1085 insertions(+), 0 deletions(-) create mode 100644 arch/arm/include/asm/kvm.h create mode 100644 arch/arm/include/asm/kvm_asm.h create mode 100644 arch/arm/include/asm/kvm_emulate.h create mode 100644 arch/arm/include/asm/kvm_host.h create mode 100644 arch/arm/include/asm/kvm_para.h create mode 100644 arch/arm/kvm/Kconfig create mode 100644 arch/arm/kvm/Makefile create mode 100644 arch/arm/kvm/arm.c create mode 100644 arch/arm/kvm/debug.h create mode 100644 arch/arm/kvm/emulate.c create mode 100644 arch/arm/kvm/exports.c create mode 100644 arch/arm/kvm/guest.c create mode 100644 arch/arm/kvm/init.S create mode 100644 arch/arm/kvm/interrupts.S create mode 100644 arch/arm/kvm/mmu.c create mode 100644 arch/arm/kvm/trace.h diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 00e908b..2a65d7b 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -2248,3 +2248,5 @@ source security/Kconfig source crypto/Kconfig source lib/Kconfig + +source arch/arm/kvm/Kconfig diff --git a/arch/arm/Makefile b/arch/arm/Makefile 
index dfcf3b0..621fb8d 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -255,6 +255,7 @@ core-$(CONFIG_VFP) += arch/arm/vfp/ # If we have a machine-specific directory, then include it in the build. core-y += arch/arm/kernel/ arch/arm/mm/ arch/arm/common/ +core-y += arch/arm/kvm/ core-y += $(machdirs) $(platdirs) drivers-$(CONFIG_OPROFILE) += arch/arm/oprofile/ diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h new file mode 100644 index 000..87dc33b --- /dev/null +++ b/arch/arm/include/asm/kvm.h @@ -0,0 +1,66 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + */ + +#ifndef __ARM_KVM_H__ +#define __ARM_KVM_H__ + +#include asm/types.h + +/* + * Modes used for short-hand mode determinition in the world-switch code and + * in emulation code. + * + * Note: These indices do NOT correspond to the value of the CPSR mode bits! + */ +#define MODE_FIQ 0 +#define MODE_IRQ 1 +#define MODE_SVC 2 +#define MODE_ABT 3 +#define MODE_UND 4 +#define MODE_USR 5 +#define MODE_SYS 6 + +struct kvm_regs { + __u32 regs0_7[8]; /* Unbanked regs. (r0 - r7)*/ + __u32 fiq_regs8_12[5]; /* Banked fiq regs. 
(r8 - r12) */
+	__u32 usr_regs8_12[5];	/* Banked usr registers (r8 - r12) */
+	__u32 reg13[6];		/* Banked r13, indexed by MODE_ */
+	__u32 reg14[6];		/* Banked r14, indexed by MODE_ */
+	__u32 reg15;
+	__u32 cpsr;
+	__u32 spsr[5];		/* Banked SPSR, indexed by MODE_ */
+	struct {
+		__u32 c1_sys;
+		__u32 c2_base0;
+		__u32 c2_base1;
+		__u32 c3_dacr;
+	} cp15;
+
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+#endif
[PATCH v5 02/13] ARM: KVM: Hypervisor identity mapping
From: Christoffer Dall cd...@cs.columbia.edu Adds support in the identity mapping feature that allows KVM to setup identity mapping for the Hyp mode with the AP[1] bit set as required by the specification and also supports freeing created sub pmd's after finished use. These two functions: - hyp_identity_mapping_add(pgd, addr, end); - hyp_identity_mapping_del(pgd, addr, end); are essentially calls to the same function as the non-hyp versions but with a different argument value. KVM calls these functions to setup and teardown the identity mapping used to initialize the hypervisor. Note, the hyp-version of the _del function actually frees the pmd's pointed to by the pgd as opposed to the non-hyp version which just clears them. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/pgtable-3level-hwdef.h |1 + arch/arm/include/asm/pgtable.h |6 +++ arch/arm/mm/idmap.c | 54 +++ 3 files changed, 60 insertions(+), 1 deletions(-) diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h index d795282..a2d404e 100644 --- a/arch/arm/include/asm/pgtable-3level-hwdef.h +++ b/arch/arm/include/asm/pgtable-3level-hwdef.h @@ -44,6 +44,7 @@ #define PMD_SECT_XN(_AT(pmdval_t, 1) 54) #define PMD_SECT_AP_WRITE (_AT(pmdval_t, 0)) #define PMD_SECT_AP_READ (_AT(pmdval_t, 0)) +#define PMD_SECT_AP1 (_AT(pmdval_t, 1) 6) #define PMD_SECT_TEX(x)(_AT(pmdval_t, 0)) /* diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index aec18ab..19456f4 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -318,6 +318,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) void identity_mapping_add(pgd_t *, unsigned long, unsigned long); void identity_mapping_del(pgd_t *, unsigned long, unsigned long); +#ifdef CONFIG_KVM_ARM_HOST +void hyp_identity_mapping_add(pgd_t *, unsigned long, unsigned long); +void hyp_identity_mapping_del(pgd_t *pgd, unsigned long addr, + 
unsigned long end); +#endif + #endif /* !__ASSEMBLY__ */ #endif /* CONFIG_MMU */ diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c index 267db72..e29903a 100644 --- a/arch/arm/mm/idmap.c +++ b/arch/arm/mm/idmap.c @@ -1,3 +1,4 @@ +#include linux/module.h #include linux/kernel.h #include asm/cputype.h @@ -54,11 +55,18 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end, } while (pud++, addr = next, addr != end); } -void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end) +static void __identity_mapping_add(pgd_t *pgd, unsigned long addr, + unsigned long end, bool hyp_mapping) { unsigned long prot, next; prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF; + +#ifdef CONFIG_ARM_LPAE + if (hyp_mapping) + prot |= PMD_SECT_AP1; +#endif + if (cpu_architecture() = CPU_ARCH_ARMv5TEJ !cpu_is_xscale()) prot |= PMD_BIT4; @@ -69,6 +77,12 @@ void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end) } while (pgd++, addr = next, addr != end); } +void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end) +{ + __identity_mapping_add(pgd, addr, end, false); +} + + #ifdef CONFIG_SMP static void idmap_del_pmd(pud_t *pud, unsigned long addr, unsigned long end) { @@ -103,6 +117,44 @@ void identity_mapping_del(pgd_t *pgd, unsigned long addr, unsigned long end) } #endif +#ifdef CONFIG_KVM_ARM_HOST +void hyp_identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end) +{ + __identity_mapping_add(pgd, addr, end, true); +} +EXPORT_SYMBOL_GPL(hyp_identity_mapping_add); + +static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr) +{ + pud_t *pud; + pmd_t *pmd; + + pud = pud_offset(pgd, addr); + pmd = pmd_offset(pud, addr); + pmd_free(NULL, pmd); +} + +/* + * This version actually frees the underlying pmds for all pgds in range and + * clear the pgds themselves afterwards. 
+ */
+void hyp_identity_mapping_del(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pgd_t *next_pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		next_pgd = pgd + pgd_index(addr);
+		if (!pgd_none_or_clear_bad(next_pgd)) {
+			hyp_idmap_del_pmd(next_pgd, addr);
+			pgd_clear(next_pgd);
+		}
+	} while (addr = next, addr < end);
+}
+EXPORT_SYMBOL_GPL(hyp_identity_mapping_del);
+#endif
+
 /*
  * In order to soft-boot, we need to insert a 1:1 mapping in place of
  * the user-mode pages. This will then ensure that we have predictable
[PATCH v5 03/13] ARM: KVM: Add hypervisor initialization
Sets up the required registers to run code in HYP-mode from the kernel. No major controversies, but we should consider how to deal with SMP support for hypervisor stack page. By setting the HVBAR the kernel can execute code in Hyp-mode with the MMU disabled. The HVBAR initially points to initialization code, which initializes other Hyp-mode registers and enables the MMU for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode to perform a world-switch into a KVM guest. Also provides memory mapping code to map required code pages and data structures accessed in Hyp mode at the same virtual address as the host kernel virtual addresses, but which conforms to the architectural requirements for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c and is comprised of: - create_hyp_mappings(hyp_pgd, start, end); - free_hyp_pmds(pgd_hyp); See the implementation for more details. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_arm.h | 103 + arch/arm/include/asm/kvm_asm.h | 23 arch/arm/include/asm/kvm_host.h |1 arch/arm/include/asm/kvm_mmu.h | 35 ++ arch/arm/include/asm/pgtable-3level-hwdef.h |4 + arch/arm/include/asm/pgtable-3level.h |4 + arch/arm/include/asm/pgtable.h |1 arch/arm/kvm/arm.c | 166 +++ arch/arm/kvm/exports.c | 10 ++ arch/arm/kvm/init.S | 98 arch/arm/kvm/interrupts.S | 30 + arch/arm/kvm/mmu.c | 152 + mm/memory.c |1 13 files changed, 628 insertions(+), 0 deletions(-) create mode 100644 arch/arm/include/asm/kvm_arm.h create mode 100644 arch/arm/include/asm/kvm_mmu.h diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h new file mode 100644 index 000..835abd1 --- /dev/null +++ b/arch/arm/include/asm/kvm_arm.h @@ -0,0 +1,103 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free 
Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ */
+
+#ifndef __KVM_ARM_H__
+#define __KVM_ARM_H__
+
+#include <asm/types.h>
+
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+#define HCR_GUEST_MASK	(HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | \
+			 HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)
+#define TTBCR_T0SZ	3
+#define HTCR_MASK
[PATCH v5 04/13] ARM: KVM: Memory virtualization setup
This commit introduces the framework for guest memory management through the use of 2nd stage translation. Each VM has a pointer to a level-1 tabled (the pgd field in struct kvm_arch) which is used for the 2nd stage translations. Entries are added when handling guest faults (later patch) and the table itself can be allocated and freed through the following functions implemented in arch/arm/kvm/arm_mmu.c: - kvm_alloc_stage2_pgd(struct kvm *kvm); - kvm_free_stage2_pgd(struct kvm *kvm); Further, each entry in TLBs and caches are tagged with a VMID identifier in addition to ASIDs. The VMIDs are managed using a bitmap and assigned when creating the VM in kvm_arch_init_vm() where the 2nd stage pgd is also allocated. The table is freed in kvm_arch_destroy_vm(). Both functions are called from the main KVM code. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_host.h |4 ++ arch/arm/include/asm/kvm_mmu.h |5 +++ arch/arm/kvm/arm.c | 59 +++-- arch/arm/kvm/mmu.c | 69 +++ 4 files changed, 132 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 6a10467..06d1263 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -31,7 +31,9 @@ struct kvm_vcpu; u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode); struct kvm_arch { - pgd_t *pgd; /* 1-level 2nd stage table */ + u32vmid;/* The VMID used for the virt. 
memory system */ + pgd_t *pgd; /* 1-level 2nd stage table */ + u64vttbr; /* VTTBR value associated with above pgd and vmid */ }; #define EXCEPTION_NONE 0 diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 13fd8dc..9d7440c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -32,4 +32,9 @@ extern pgd_t *kvm_hyp_pgd; int create_hyp_mappings(pgd_t *hyp_pgd, void *from, void *to); void free_hyp_pmds(pgd_t *hyp_pgd); +int kvm_alloc_stage2_pgd(struct kvm *kvm); +void kvm_free_stage2_pgd(struct kvm *kvm); + +int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run); + #endif /* __ARM_KVM_MMU_H__ */ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index e6bdf50..89ba18d 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -94,15 +94,62 @@ void kvm_arch_sync_events(struct kvm *kvm) { } +/** + * kvm_arch_init_vm - initializes a VM data structure + * @kvm: pointer to the KVM struct + */ int kvm_arch_init_vm(struct kvm *kvm) { - return 0; + int ret = 0; + phys_addr_t pgd_phys; + unsigned long vmid; + + mutex_lock(kvm_vmids_mutex); + vmid = find_first_zero_bit(kvm_vmids, VMID_SIZE); + if (vmid = VMID_SIZE) { + mutex_unlock(kvm_vmids_mutex); + return -EBUSY; + } + __set_bit(vmid, kvm_vmids); + kvm-arch.vmid = vmid; + mutex_unlock(kvm_vmids_mutex); + + ret = kvm_alloc_stage2_pgd(kvm); + if (ret) + goto out_fail_alloc; + + pgd_phys = virt_to_phys(kvm-arch.pgd); + kvm-arch.vttbr = pgd_phys ((1LLU 40) - 1) ~((2 VTTBR_X) - 1); + kvm-arch.vttbr |= ((u64)vmid 48); + + ret = create_hyp_mappings(kvm_hyp_pgd, kvm, kvm + 1); + if (ret) + goto out_free_stage2_pgd; + + return ret; +out_free_stage2_pgd: + kvm_free_stage2_pgd(kvm); +out_fail_alloc: + clear_bit(vmid, kvm_vmids); + return ret; } +/** + * kvm_arch_destroy_vm - destroy the VM data structure + * @kvm: pointer to the KVM struct + */ void kvm_arch_destroy_vm(struct kvm *kvm) { int i; + kvm_free_stage2_pgd(kvm); + + if (kvm-arch.vmid != 0) { + 
mutex_lock(kvm_vmids_mutex); + clear_bit(kvm-arch.vmid, kvm_vmids); + mutex_unlock(kvm_vmids_mutex); + } + for (i = 0; i KVM_MAX_VCPUS; ++i) { if (kvm-vcpus[i]) { kvm_arch_vcpu_free(kvm-vcpus[i]); @@ -178,6 +225,10 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_vcpu; + err = create_hyp_mappings(kvm_hyp_pgd, vcpu, vcpu + 1); + if (err) + goto free_vcpu; + return vcpu; free_vcpu: kmem_cache_free(kvm_vcpu_cache, vcpu); @@ -187,7 +238,7 @@ out: void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) { - KVMARM_NOT_IMPLEMENTED(); + kmem_cache_free(kvm_vcpu_cache, vcpu); } void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) @@ -293,8 +344,8 @@ static int init_hyp_mode(void) hyp_stack_ptr = (unsigned long)kvm_arm_hyp_stack_page + PAGE_SIZE; - init_phys_addr = virt_to_phys((void *)__kvm_hyp_init); - init_end_phys_addr = virt_to_phys((void *)__kvm_hyp_init_end); + init_phys_addr
[PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl. This ioctl is used since the semantics are in fact two lines that can be either raised or lowered on the VCPU - the IRQ and FIQ lines.

KVM needs to know which VCPU it must operate on and whether the FIQ or IRQ line is raised/lowered. Hence both pieces of information are packed into the kvm_irq_level->irq field. The irq field value will be:
  IRQ: vcpu_index * 2
  FIQ: (vcpu_index * 2) + 1

This is documented in Documentation/kvm/api.txt. The effect of the ioctl is simply to raise/lower the corresponding virt_irq field on the VCPU struct, which will cause the world-switch code to raise/lower virtual interrupts when running the guest on the next switch. The wait_for_interrupt flag is also cleared for raised IRQs, causing an idle VCPU to become active again.

Note: The custom trace_kvm_irq_line is used despite a generic definition of trace_kvm_set_irq, since trace_kvm_set_irq depends on the x86-specific define of __HAVE_IOAPIC. Either the trace event should be created regardless of this define or it should depend on another ifdef clause, common for both x86 and ARM. However, since the arguments don't really match those used in ARM, I am yet to be convinced why this is necessary.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   10 ++-
 arch/arm/include/asm/kvm.h        |    8 ++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   53 -
 arch/arm/kvm/trace.h              |   21 +++
 include/linux/kvm.h               |    1 +
 6 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7945b0b..4abaa67 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -572,7 +572,7 @@ only go to the IOAPIC. On ia64, a IOSAPIC is created.
4.25 KVM_IRQ_LINE Capability: KVM_CAP_IRQCHIP -Architectures: x86, ia64 +Architectures: x86, ia64, arm Type: vm ioctl Parameters: struct kvm_irq_level Returns: 0 on success, -1 on error @@ -582,6 +582,14 @@ Requires that an interrupt controller model has been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level to be set to 1 and then back to 0. +KVM_CREATE_IRQCHIP (except for ARM). Note that edge-triggered interrupts +require the level to be set to 1 and then back to 0. + +ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the +irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for +FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for +convenience macros. + struct kvm_irq_level { union { __u32 irq; /* GSI */ diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h index 87dc33b..8935062 100644 --- a/arch/arm/include/asm/kvm.h +++ b/arch/arm/include/asm/kvm.h @@ -20,6 +20,14 @@ #include asm/types.h /* + * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index. + */ +enum KVM_ARM_IRQ_LINE_TYPE { + KVM_ARM_IRQ_LINE = 0, + KVM_ARM_FIQ_LINE = 1, +}; + +/* * Modes used for short-hand mode determinition in the world-switch code and * in emulation code. 
* diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index 835abd1..e378a37 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -49,6 +49,7 @@ #define HCR_VM 1 #define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \ HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO) +#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF) /* Hyp System Control Register (HSCTLR) bits */ #define HSCTLR_TE (1 30) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 89ba18d..fc0bd6b 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -299,6 +299,43 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) return -EINVAL; } +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm, + struct kvm_irq_level *irq_level) +{ + u32 mask; + unsigned int vcpu_idx; + struct kvm_vcpu *vcpu; + + vcpu_idx = irq_level-irq / 2; + if (vcpu_idx = KVM_MAX_VCPUS) + return -EINVAL; + + vcpu = kvm_get_vcpu(kvm, vcpu_idx); + if (!vcpu) + return -EINVAL; + + switch (irq_level-irq % 2) { + case KVM_ARM_IRQ_LINE: + mask = HCR_VI; + break; + case KVM_ARM_FIQ_LINE: + mask = HCR_VF; + break; + default: + return -EINVAL; + } + + trace_kvm_irq_line(irq_level-irq % 2, irq_level-level, vcpu_idx); + + if (irq_level-level) { + vcpu-arch.virt_irq |= mask; + vcpu-arch.wait_for_interrupts =
[PATCH v5 06/13] ARM: KVM: World-switch implementation
Provides a complete world-switch implementation to switch to other guests running in non-secure modes. Includes Hyp exception handlers that capture the necessary exception information and store the information on the VCPU and KVM structures.

Switching to Hyp mode is done through a simple HVC instruction. The exception vector code will check that the HVC comes from VMID==0 and, if so, will store the necessary state on the Hyp stack, which will look like this (see hyp_hvc):

   ...
   Hyp_Sp + 4: lr_usr
   Hyp_Sp:     spsr (Host-SVC cpsr)

When returning from Hyp mode to SVC mode, another HVC instruction is executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp stack pointer should be where it was left from the above initial call, since the values on the stack will be used to restore state (see hyp_svc).

Otherwise, the world-switch is pretty straightforward. All state that can be modified by the guest is first backed up on the Hyp stack, and the VCPU values are loaded onto the hardware. State which is not loaded, but theoretically modifiable by the guest, is protected through the virtualization features to generate a trap and cause software emulation. Upon return from the guest, all state is restored from hardware onto the VCPU struct and the original state is restored from the Hyp stack onto the hardware.

One controversy may be the back-door call to __irq_svc (the host kernel's own physical IRQ handler), which is called when a physical IRQ exception is taken in Hyp mode while running in the guest.
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm.h |1 arch/arm/include/asm/kvm_arm.h | 26 ++ arch/arm/include/asm/kvm_host.h |8 + arch/arm/kernel/armksyms.c |7 + arch/arm/kernel/asm-offsets.c | 33 +++ arch/arm/kernel/entry-armv.S|1 arch/arm/kvm/arm.c | 45 arch/arm/kvm/guest.c|2 arch/arm/kvm/interrupts.S | 443 +++ 9 files changed, 562 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h index 8935062..ff88ca0 100644 --- a/arch/arm/include/asm/kvm.h +++ b/arch/arm/include/asm/kvm.h @@ -51,6 +51,7 @@ struct kvm_regs { __u32 cpsr; __u32 spsr[5]; /* Banked SPSR, indexed by MODE_ */ struct { + __u32 c0_midr; __u32 c1_sys; __u32 c2_base0; __u32 c2_base1; diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index e378a37..1769187 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -100,5 +100,31 @@ #define VTTBR_X(5 - VTCR_GUEST_T0SZ) #endif +/* Hyp Syndrome Register (HSR) bits */ +#define HSR_EC_SHIFT (26) +#define HSR_EC (0x3fU HSR_EC_SHIFT) +#define HSR_IL (1U 25) +#define HSR_ISS(HSR_IL - 1) +#define HSR_ISV_SHIFT (24) +#define HSR_ISV(1U HSR_ISV_SHIFT) + +#define HSR_EC_UNKNOWN (0x00) +#define HSR_EC_WFI (0x01) +#define HSR_EC_CP15_32 (0x03) +#define HSR_EC_CP15_64 (0x04) +#define HSR_EC_CP14_MR (0x05) +#define HSR_EC_CP14_LS (0x06) +#define HSR_EC_CP_0_13 (0x07) +#define HSR_EC_CP10_ID (0x08) +#define HSR_EC_JAZELLE (0x09) +#define HSR_EC_BXJ (0x0A) +#define HSR_EC_CP14_64 (0x0C) +#define HSR_EC_SVC_HYP (0x11) +#define HSR_EC_HVC (0x12) +#define HSR_EC_SMC (0x13) +#define HSR_EC_IABT(0x20) +#define HSR_EC_IABT_HYP(0x21) +#define HSR_EC_DABT(0x24) +#define HSR_EC_DABT_HYP(0x25) #endif /* __KVM_ARM_H__ */ diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 06d1263..59fcd15 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -62,6 +62,7 @@ struct kvm_vcpu_arch { 
/* System control coprocessor (cp15) */ struct { + u32 c0_MIDR;/* Main ID Register */ u32 c1_SCTLR; /* System Control Register */ u32 c1_ACTLR; /* Auxiliary Control Register */ u32 c1_CPACR; /* Coprocessor Access Control */ @@ -69,6 +70,12 @@ struct kvm_vcpu_arch { u64 c2_TTBR1; /* Translation Table Base Register 1 */ u32 c2_TTBCR; /* Translation Table Base Control R. */ u32 c3_DACR;/* Domain Access Control Register */ + u32 c10_PRRR; /* Primary Region Remap Register */ + u32 c10_NMRR; /* Normal Memory Remap Register */ + u32 c13_CID;/* Context ID Register */ + u32 c13_TID_URW;/* Thread ID, User R/W */ + u32 c13_TID_URO;/* Thread ID, User R/O */ + u32 c13_TID_PRIV; /* Thread ID, Privileged */ } cp15; u32 virt_irq; /* HCR exception mask */ @@ -78,6 +85,7
[PATCH v5 07/13] ARM: KVM: Emulation framework and CP15 emulation
From: Christoffer Dall cd...@cs.columbia.edu Adds an important new function in the main KVM/ARM code called handle_exit(), which is called from kvm_arch_vcpu_ioctl_run() on returns from guest execution. This function examines the Hyp Syndrome Register (HSR), which contains information telling KVM what caused the exit from the guest. Some of the reasons for an exit are CP15 accesses, which are not allowed from the guest; this commit handles these exits by emulating the intended operation in software and skipping the guest instruction. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_emulate.h |7 + arch/arm/kvm/arm.c | 77 ++ arch/arm/kvm/emulate.c | 195 arch/arm/kvm/trace.h | 28 + 4 files changed, 307 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h index 91d461a..af21fd5 100644 --- a/arch/arm/include/asm/kvm_emulate.h +++ b/arch/arm/include/asm/kvm_emulate.h @@ -40,6 +40,13 @@ static inline unsigned char vcpu_mode(struct kvm_vcpu *vcpu) return modes_table[vcpu->arch.regs.cpsr & 0xf]; } +int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_handle_cp15_access(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run); + /* * Return the SPSR for the specified mode of the virtual CPU.
*/ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index f5d..a6e1763 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -35,6 +35,7 @@ #include asm/kvm_arm.h #include asm/kvm_asm.h #include asm/kvm_mmu.h +#include asm/kvm_emulate.h #include debug.h @@ -306,6 +307,62 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) return 0; } +static inline int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, + int exception_index) +{ + unsigned long hsr_ec; + + if (exception_index == ARM_EXCEPTION_IRQ) + return 0; + + if (exception_index != ARM_EXCEPTION_HVC) { + kvm_err(-EINVAL, Unsupported exception type); + return -EINVAL; + } + + hsr_ec = (vcpu-arch.hsr HSR_EC) HSR_EC_SHIFT; + switch (hsr_ec) { + case HSR_EC_WFI: + return kvm_handle_wfi(vcpu, run); + case HSR_EC_CP15_32: + case HSR_EC_CP15_64: + return kvm_handle_cp15_access(vcpu, run); + case HSR_EC_CP14_MR: + return kvm_handle_cp14_access(vcpu, run); + case HSR_EC_CP14_LS: + return kvm_handle_cp14_load_store(vcpu, run); + case HSR_EC_CP14_64: + return kvm_handle_cp14_access(vcpu, run); + case HSR_EC_CP_0_13: + return kvm_handle_cp_0_13_access(vcpu, run); + case HSR_EC_CP10_ID: + return kvm_handle_cp10_id(vcpu, run); + case HSR_EC_SVC_HYP: + /* SVC called from Hyp mode should never get here */ + kvm_msg(SVC called from Hyp mode shouldn't go here); + BUG(); + case HSR_EC_HVC: + kvm_msg(hvc: %x (at %08x), vcpu-arch.hsr ((1 16) - 1), +vcpu-arch.regs.pc); + kvm_msg( HSR: %8x, vcpu-arch.hsr); + break; + case HSR_EC_IABT: + case HSR_EC_DABT: + return kvm_handle_guest_abort(vcpu, run); + case HSR_EC_IABT_HYP: + case HSR_EC_DABT_HYP: + /* The hypervisor should never cause aborts */ + kvm_msg(The hypervisor itself shouldn't cause aborts); + BUG(); + default: + kvm_msg(Unkown exception class: %08x (%08x), hsr_ec, + vcpu-arch.hsr); + BUG(); + } + + return 0; +} + /** * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code * @vcpu: The VCPU pointer @@ -333,6 +390,26 @@ int 
kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) local_irq_enable(); trace_kvm_exit(vcpu-arch.regs.pc); + + ret = handle_exit(vcpu, run, ret); + if (ret) { + kvm_err(ret, Error in handle_exit); + break; + } + + if (run-exit_reason == KVM_EXIT_MMIO) + break; + + if (need_resched()) { + vcpu_put(vcpu); + schedule(); + vcpu_load(vcpu); + } + + if (signal_pending(current) !(run-exit_reason)) { +
[PATCH v5 08/13] ARM: KVM: Handle guest faults in KVM
From: Christoffer Dall cd...@cs.columbia.edu Handles the guest faults in KVM by mapping in corresponding user pages in the 2nd stage page tables. Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and pgprot_guest variables used to map 2nd stage memory for KVM guests. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/pgtable-3level.h |8 ++ arch/arm/include/asm/pgtable.h|4 + arch/arm/kvm/mmu.c| 107 - arch/arm/mm/mmu.c |3 + 4 files changed, 120 insertions(+), 2 deletions(-) diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index edc3cb9..6dc5331 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -104,6 +104,14 @@ */ #define L_PGD_SWAPPER (_AT(pgdval_t, 1) 55)/* swapper_pg_dir entry */ +/* + * 2-nd stage PTE definitions for LPAE. + */ +#define L_PTE2_READ(_AT(pteval_t, 1) 6) /* HAP[0] */ +#define L_PTE2_WRITE (_AT(pteval_t, 1) 7) /* HAP[1] */ +#define L_PTE2_NORM_WB (_AT(pteval_t, 3) 4) /* MemAttr[3:2] */ +#define L_PTE2_INNER_WB(_AT(pteval_t, 3) 2) /* MemAttr[1:0] */ + #ifndef __ASSEMBLY__ #define pud_none(pud) (!pud_val(pud)) diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 20025cc..778856b 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -76,6 +76,7 @@ extern void __pgd_error(const char *file, int line, pgd_t); extern pgprot_tpgprot_user; extern pgprot_tpgprot_kernel; +extern pgprot_tpgprot_guest; #define _MOD_PROT(p, b)__pgprot(pgprot_val(p) | (b)) @@ -89,6 +90,9 @@ extern pgprot_t pgprot_kernel; #define PAGE_KERNEL_MOD_PROT(pgprot_kernel, L_PTE_XN) #define PAGE_KERNEL_EXEC pgprot_kernel #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_USER) +#define PAGE_KVM_GUEST _MOD_PROT(pgprot_guest, L_PTE2_READ | \ + L_PTE2_WRITE | L_PTE2_NORM_WB | \ + L_PTE2_INNER_WB) #define __PAGE_NONE__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN) #define __PAGE_SHARED 
__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index f7a7b17..d468238 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -229,8 +229,111 @@ void kvm_free_stage2_pgd(struct kvm *kvm) kvm-arch.pgd = NULL; } +static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, + gfn_t gfn, struct kvm_memory_slot *memslot) +{ + pfn_t pfn; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + pte_t *pte, new_pte; + + pfn = gfn_to_pfn(vcpu-kvm, gfn); + + if (is_error_pfn(pfn)) { + kvm_err(-EFAULT, Guest gfn %u (0x%08lx) does not have + corresponding host mapping, + gfn, gfn PAGE_SHIFT); + return -EFAULT; + } + + /* Create 2nd stage page table mapping - Level 1 */ + pgd = vcpu-kvm-arch.pgd + pgd_index(fault_ipa); + pud = pud_offset(pgd, fault_ipa); + if (pud_none(*pud)) { + pmd = pmd_alloc_one(NULL, fault_ipa); + if (!pmd) { + kvm_err(-ENOMEM, Cannot allocate 2nd stage pmd); + return -ENOMEM; + } + pud_populate(NULL, pud, pmd); + pmd += pmd_index(fault_ipa); + } else + pmd = pmd_offset(pud, fault_ipa); + + /* Create 2nd stage page table mapping - Level 2 */ + if (pmd_none(*pmd)) { + pte = pte_alloc_one_kernel(NULL, fault_ipa); + if (!pte) { + kvm_err(-ENOMEM, Cannot allocate 2nd stage pte); + return -ENOMEM; + } + pmd_populate_kernel(NULL, pmd, pte); + pte += pte_index(fault_ipa); + } else + pte = pte_offset_kernel(pmd, fault_ipa); + + /* Create 2nd stage page table mapping - Level 3 */ + new_pte = pfn_pte(pfn, PAGE_KVM_GUEST); + set_pte_ext(pte, new_pte, 0); + + return 0; +} + +#define HSR_ABT_FS (0x3f) +#define HPFAR_MASK (~0xf) + +/** + * kvm_handle_guest_abort - handles all 2nd stage aborts + * @vcpu: the VCPU pointer + * @run: the kvm_run structure + * + * Any abort that gets to the host is almost guaranteed to be caused by a + * missing second stage translation table entry, which can mean that either the + * guest simply needs more memory and we must allocate an appropriate page or it + * can mean that the 
guest tried to access I/O memory, which is emulated by user + * space. The
[PATCH v5 09/13] ARM: KVM: Handle I/O aborts
From: Christoffer Dall cd...@cs.columbia.edu When the guest accesses I/O memory this will create data abort exceptions and they are handled by decoding the HSR information (physical address, read/write, length, register) and forwarding reads and writes to QEMU which performs the device emulation. Certain classes of load/store operations do not support the syndrome information provided in the HSR and we therefore must be able to fetch the offending instruction from guest memory and decode it manually. This requires changing the general flow somewhat since new calls to run the VCPU must check if there's a pending MMIO load and perform the write after userspace has made the data available. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_emulate.h |2 arch/arm/include/asm/kvm_host.h|1 arch/arm/include/asm/kvm_mmu.h |1 arch/arm/kvm/arm.c |8 + arch/arm/kvm/emulate.c | 288 arch/arm/kvm/mmu.c | 155 +++ arch/arm/kvm/trace.h | 22 +++ 7 files changed, 470 insertions(+), 7 deletions(-) diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h index af21fd5..9899474 100644 --- a/arch/arm/include/asm/kvm_emulate.h +++ b/arch/arm/include/asm/kvm_emulate.h @@ -46,6 +46,8 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_handle_cp15_access(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, + unsigned long instr); /* * Return the SPSR for the specified mode of the virtual CPU. 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 59fcd15..86f6cf1 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -88,6 +88,7 @@ struct kvm_vcpu_arch { u64 pc_ipa; /* IPA for the current PC (VA to PA result) */ /* IO related fields */ + bool mmio_sign_extend; /* for byte/halfword loads */ u32 mmio_rd; /* Misc. fields */ diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 9d7440c..e82eae9 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -35,6 +35,7 @@ void free_hyp_pmds(pgd_t *hyp_pgd); int kvm_alloc_stage2_pgd(struct kvm *kvm); void kvm_free_stage2_pgd(struct kvm *kvm); +int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run); #endif /* __ARM_KVM_MMU_H__ */ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index a6e1763..e5348a7 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -379,6 +379,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) int ret; for (;;) { + if (run-exit_reason == KVM_EXIT_MMIO) { + ret = kvm_handle_mmio_return(vcpu, vcpu-run); + if (ret) + break; + } + + run-exit_reason = KVM_EXIT_UNKNOWN; + trace_kvm_entry(vcpu-arch.regs.pc); local_irq_disable(); diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c index fded8c7..4fb5a7d 100644 --- a/arch/arm/kvm/emulate.c +++ b/arch/arm/kvm/emulate.c @@ -20,6 +20,7 @@ #include asm/kvm_emulate.h #include trace/events/kvm.h +#include trace.h #include debug.h #include trace.h @@ -128,8 +129,30 @@ u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode) } /** - * Co-processor emulation + * Utility functions common for all emulation code + */ + +/* + * This one accepts a matrix where the first element is the + * bits as they must be, and the second element is the bitmask. 
*/ +#define INSTR_NONE -1 +static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries) +{ + int i; + u32 mask; + + for (i = 0; i table_entries; i++) { + mask = table[i][1]; + if ((table[i][0] mask) == (instr mask)) + return i; + } + return INSTR_NONE; +} + +/** + * Co-processor emulation + */ struct coproc_params { unsigned long CRm; @@ -228,9 +251,11 @@ static int emulate_cp15_c10_access(struct kvm_vcpu *vcpu, * @vcpu: The VCPU pointer * @p:The coprocessor parameters struct pointer holding trap inst. details * -
[PATCH v5 10/13] ARM: KVM: Guest wait-for-interrupts (WFI) support
From: Christoffer Dall cd...@cs.columbia.edu When the guest executes a WFI instruction the operation is trapped to KVM, which emulates the instruction in software. There is no correlation between a guest executing a WFI instruction and actually putting the hardware into a low-power mode, since a KVM guest is essentially a process and the WFI instruction can be seen as a 'sleep' call from this process. Therefore, we flag the VCPU to be in wait_for_interrupts mode and call the main KVM function kvm_vcpu_block(). This function will put the thread on a wait-queue and call schedule(). When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we signal the VCPU thread and unflag the VCPU to no longer wait for interrupts. All calls to kvm_arch_vcpu_ioctl_run() result in a call to kvm_vcpu_block() as long as the VCPU is in wfi-mode. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/kvm/arm.c | 33 - arch/arm/kvm/emulate.c | 12 arch/arm/kvm/trace.h | 15 +++ 3 files changed, 51 insertions(+), 9 deletions(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index e5348a7..00215a1 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -302,9 +302,16 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, return -EINVAL; } +/** + * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled + * @v: The VCPU pointer + * + * If the guest CPU is not waiting for interrupts then it is by definition + * runnable.
+ */ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) { - return 0; + return !v-arch.wait_for_interrupts; } static inline int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, @@ -379,6 +386,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) int ret; for (;;) { + if (vcpu-arch.wait_for_interrupts) + goto wait_for_interrupts; + if (run-exit_reason == KVM_EXIT_MMIO) { ret = kvm_handle_mmio_return(vcpu, vcpu-run); if (ret) @@ -408,16 +418,19 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) if (run-exit_reason == KVM_EXIT_MMIO) break; - if (need_resched()) { - vcpu_put(vcpu); - schedule(); - vcpu_load(vcpu); - } - - if (signal_pending(current) !(run-exit_reason)) { - run-exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN; + if (need_resched()) + kvm_resched(vcpu); +wait_for_interrupts: + if (signal_pending(current)) { + if (!run-exit_reason) { + ret = -EINTR; + run-exit_reason = KVM_EXIT_INTR; + } break; } + + if (vcpu-arch.wait_for_interrupts) + kvm_vcpu_block(vcpu); } return ret; @@ -454,6 +467,8 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm, if (irq_level-level) { vcpu-arch.virt_irq |= mask; vcpu-arch.wait_for_interrupts = 0; + if (waitqueue_active(vcpu-wq)) + wake_up_interruptible(vcpu-wq); } else vcpu-arch.virt_irq = ~mask; diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c index 4fb5a7d..f60c75a 100644 --- a/arch/arm/kvm/emulate.c +++ b/arch/arm/kvm/emulate.c @@ -335,8 +335,20 @@ unsupp_err_out: return -EINVAL; } +/** + * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest + * @vcpu: the vcpu pointer + * @run: the kvm_run structure pointer + * + * Simply sets the wait_for_interrupts flag on the vcpu structure, which will + * halt execution of world-switches and schedule other host processes until + * there is an incoming IRQ or FIQ to the VM. 
+ */ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run) { + trace_kvm_wfi(vcpu-arch.regs.pc); + if (!vcpu-arch.virt_irq) + vcpu-arch.wait_for_interrupts = 1; return 0; } diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h index 8ba3db9..693da82 100644 --- a/arch/arm/kvm/trace.h +++ b/arch/arm/kvm/trace.h @@ -111,6 +111,21 @@ TRACE_EVENT(kvm_irq_line, __entry-level, __entry-vcpu_idx) ); +TRACE_EVENT(kvm_wfi, + TP_PROTO(unsigned long vcpu_pc), + TP_ARGS(vcpu_pc), + + TP_STRUCT__entry( + __field(unsigned long, vcpu_pc ) + ), + + TP_fast_assign( + __entry-vcpu_pc= vcpu_pc; + ), + + TP_printk(guest executed wfi at: 0x%08lx, __entry-vcpu_pc) +); + #endif /* _TRACE_KVM_H */ -- To
[PATCH v5 11/13] ARM: KVM: Support SMP hosts
In order to support KVM on a SMP host, it is necessary to initialize the hypervisor on all CPUs, mostly by making sure each CPU gets its own hypervisor stack and runs the HYP init code. We also take care of some missing locking of modifications to the hypervisor page tables and ensure synchronized consistency between virtual IRQ masks and wait_for_interrupt flags on the VPUs. Note that this code doesn't handle CPU hotplug yet. Note that this code doesn't support SMP guests. WARNING: This code is in development and guests do not fully boot on SMP hosts yet. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_host.h |4 - arch/arm/include/asm/kvm_mmu.h |1 arch/arm/kvm/arm.c | 175 +++ arch/arm/kvm/emulate.c |2 arch/arm/kvm/mmu.c |9 ++ 5 files changed, 114 insertions(+), 77 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 86f6cf1..a0ffbe8 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -78,8 +78,6 @@ struct kvm_vcpu_arch { u32 c13_TID_PRIV; /* Thread ID, Priveleged */ } cp15; - u32 virt_irq; /* HCR exception mask */ - /* Exception Information */ u32 hsr;/* Hyp Syndrom Register */ u32 hdfar; /* Hyp Data Fault Address Register */ @@ -92,6 +90,8 @@ struct kvm_vcpu_arch { u32 mmio_rd; /* Misc. 
fields */ + spinlock_t irq_lock; + u32 virt_irq; /* HCR exception mask */ u32 wait_for_interrupts; }; diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index e82eae9..917edd7 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -28,6 +28,7 @@ #define PGD2_ORDER get_order(PTRS_PER_PGD2 * sizeof(pgd_t)) extern pgd_t *kvm_hyp_pgd; +extern struct mutex kvm_hyp_pgd_mutex; int create_hyp_mappings(pgd_t *hyp_pgd, void *from, void *to); void free_hyp_pmds(pgd_t *hyp_pgd); diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 00215a1..6e384e2 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -61,7 +61,7 @@ void __kvm_print_msg(char *fmt, ...) spin_unlock(__tmp_log_lock); } -static void *kvm_arm_hyp_stack_page; +static DEFINE_PER_CPU(void *, kvm_arm_hyp_stack_page); /* The VMID used in the VTTBR */ #define VMID_SIZE (18) @@ -257,6 +257,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) unsigned long cpsr; unsigned long sctlr; + spin_lock_init(vcpu-arch.irq_lock); + /* Init execution CPSR */ asm volatile (mrs %[cpsr], cpsr : [cpsr] =r (cpsr)); @@ -464,13 +466,27 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm, trace_kvm_irq_line(irq_level-irq % 2, irq_level-level, vcpu_idx); + spin_lock(vcpu-arch.irq_lock); if (irq_level-level) { vcpu-arch.virt_irq |= mask; + + /* +* Note that we grab the wq.lock before clearing the wfi flag +* since this ensures that a concurrent call to kvm_vcpu_block +* will either sleep before we grab the lock, in which case we +* wake it up, or will never sleep due to +* kvm_arch_vcpu_runnable being true (iow. this avoids having +* to grab the irq_lock in kvm_arch_vcpu_runnable). 
+*/ + spin_lock(vcpu-wq.lock); vcpu-arch.wait_for_interrupts = 0; + if (waitqueue_active(vcpu-wq)) - wake_up_interruptible(vcpu-wq); + __wake_up_locked(vcpu-wq, TASK_INTERRUPTIBLE); + spin_unlock(vcpu-wq.lock); } else vcpu-arch.virt_irq = ~mask; + spin_unlock(vcpu-arch.irq_lock); return 0; } @@ -505,14 +521,49 @@ long kvm_arch_vm_ioctl(struct file *filp, } } +static void cpu_set_vector(void *vector) +{ + /* +* Set the HVBAR +*/ + asm volatile ( + movr0, %[vector_ptr]\n\t + ldrr7, =SMCHYP_HVBAR_W\n\t + smc#0\n\t : : + [vector_ptr] r (vector) : + r0, r7); +} + +static void cpu_init_hyp_mode(void *vector) +{ + unsigned long hyp_stack_ptr; + void *stack_page; + + stack_page = __get_cpu_var(kvm_arm_hyp_stack_page); + hyp_stack_ptr = (unsigned long)stack_page + PAGE_SIZE; + + cpu_set_vector(vector); + + /* +* Call initialization code +*/ + asm volatile ( + movr0, %[pgd_ptr]\n\t + movr1, %[stack_ptr]\n\t + hvc#0\n\t : : + [pgd_ptr] r (virt_to_phys(kvm_hyp_pgd)), + [stack_ptr] r
[PATCH v5 12/13] ARM: KVM: Fix guest view of MPIDR
From: Marc Zyngier marc.zyng...@arm.com A guest may need to know which CPU it has booted on (and Linux does). Now that we can run KVM on a SMP host, QEMU may be running on any CPU. In that case, directly reading MPIDR will give an inconsistent view on the guest CPU number (among other problems). The solution is to use the VMPIDR register, which is computed by using the host MPIDR and overriding the low bits with KVM vcpu_id. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm/include/asm/kvm_host.h |1 + arch/arm/kernel/asm-offsets.c |1 + arch/arm/kvm/arm.c |4 arch/arm/kvm/interrupts.S |8 4 files changed, 14 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index a0ffbe8..7fcc412 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -63,6 +63,7 @@ struct kvm_vcpu_arch { /* System control coprocessor (cp15) */ struct { u32 c0_MIDR;/* Main ID Register */ + u32 c0_MPIDR; /* MultiProcessor ID Register */ u32 c1_SCTLR; /* System Control Register */ u32 c1_ACTLR; /* Auxilliary Control Register */ u32 c1_CPACR; /* Coprocessor Access Control */ diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c index c126cfb..1c6e2ee 100644 --- a/arch/arm/kernel/asm-offsets.c +++ b/arch/arm/kernel/asm-offsets.c @@ -148,6 +148,7 @@ int main(void) #ifdef CONFIG_KVM_ARM_HOST DEFINE(VCPU_KVM, offsetof(struct kvm_vcpu, kvm)); DEFINE(VCPU_MIDR,offsetof(struct kvm_vcpu, arch.cp15.c0_MIDR)); + DEFINE(VCPU_MPIDR, offsetof(struct kvm_vcpu, arch.cp15.c0_MPIDR)); DEFINE(VCPU_SCTLR, offsetof(struct kvm_vcpu, arch.cp15.c1_SCTLR)); DEFINE(VCPU_CPACR, offsetof(struct kvm_vcpu, arch.cp15.c1_CPACR)); DEFINE(VCPU_TTBR0, offsetof(struct kvm_vcpu, arch.cp15.c2_TTBR0)); diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 6e384e2..9c5c38e 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -32,6 +32,7 @@ #include asm/ptrace.h #include asm/mman.h #include asm/tlbflush.h 
+#include asm/cputype.h #include asm/kvm_arm.h #include asm/kvm_asm.h #include asm/kvm_mmu.h @@ -270,6 +271,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) [sctlr] =r (sctlr)); vcpu-arch.cp15.c1_SCTLR = sctlr ~1U; + /* Compute guest MPIDR */ + vcpu-arch.cp15.c0_MPIDR = (read_cpuid_mpidr() ~0xff) | vcpu-vcpu_id; + return 0; } diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S index d516bf4..fbc26ca 100644 --- a/arch/arm/kvm/interrupts.S +++ b/arch/arm/kvm/interrupts.S @@ -245,6 +245,10 @@ ENTRY(__kvm_vcpu_run) ldr r1, [r0, #VCPU_MIDR] mcr p15, 4, r1, c0, c0, 0 + @ Write guest view of MPIDR into VMPIDR + ldr r1, [r0, #VCPU_MPIDR] + mcr p15, 4, r1, c0, c0, 5 + @ Load guest registers add r0, r0, #(VCPU_USR_SP) load_mode_state r0, usr @@ -291,6 +295,10 @@ __kvm_vcpu_return: mrc p15, 0, r2, c0, c0, 0 mcr p15, 4, r2, c0, c0, 0 + @ Back to hardware MPIDR + mrc p15, 0, r2, c0, c0, 5 + mcr p15, 4, r2, c0, c0, 5 + @ Set VMID == 0 mov r2, #0 mov r3, #0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 13/13] ARM: KVM: Support SMP guests
This patch is a beginning attempt to support SMP guests. So far we only add locking for the second stage PGD stored on the kvm_arch struct. WARNING: This code is untested and does not yet support SMP guests. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/include/asm/kvm_host.h | 12 ++-- arch/arm/kvm/arm.c |1 + arch/arm/kvm/mmu.c | 57 +-- 3 files changed, 47 insertions(+), 23 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 7fcc412..555a6f1 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -31,9 +31,15 @@ struct kvm_vcpu; u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode); struct kvm_arch { - u32vmid;/* The VMID used for the virt. memory system */ - pgd_t *pgd; /* 1-level 2nd stage table */ - u64vttbr; /* VTTBR value associated with above pgd and vmid */ + /* The VMID used for the virt. memory system */ + u32vmid; + + /* 1-level 2nd stage table and lock */ + struct mutex pgd_mutex; + pgd_t *pgd; + + /* VTTBR value associated with above pgd and vmid */ + u64vttbr; }; #define EXCEPTION_NONE 0 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 9c5c38e..14ccc4d 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -119,6 +119,7 @@ int kvm_arch_init_vm(struct kvm *kvm) ret = kvm_alloc_stage2_pgd(kvm); if (ret) goto out_fail_alloc; + mutex_init(kvm-arch.pgd_mutex); pgd_phys = virt_to_phys(kvm-arch.pgd); kvm-arch.vttbr = pgd_phys ((1LLU 40) - 1) ~((2 VTTBR_X) - 1); diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 50c9571..baeb8a1 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -177,6 +177,9 @@ out: * Allocates the 1st level table only of size defined by PGD2_ORDER (can * support either full 40-bit input addresses or limited to 32-bit input * addresses). Clears the allocated pages. + * + * Note we don't need locking here as this is only called when the VM is + * destroyed, which can only be done once. 
*/ int kvm_alloc_stage2_pgd(struct kvm *kvm) { @@ -204,6 +207,9 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) * Walks the level-1 page table pointed to by kvm-arch.pgd and frees all * underlying level-2 and level-3 tables before freeing the actual level-1 table * and setting the struct pointer to NULL. + * + * Note we don't need locking here as this is only called when the VM is + * destroyed, which can only be done once. */ void kvm_free_stage2_pgd(struct kvm *kvm) { @@ -239,49 +245,38 @@ void kvm_free_stage2_pgd(struct kvm *kvm) kvm-arch.pgd = NULL; } -static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, - gfn_t gfn, struct kvm_memory_slot *memslot) +static int __user_mem_abort(struct kvm *kvm, phys_addr_t addr, pfn_t pfn) { - pfn_t pfn; pgd_t *pgd; pud_t *pud; pmd_t *pmd; pte_t *pte, new_pte; - pfn = gfn_to_pfn(vcpu-kvm, gfn); - - if (is_error_pfn(pfn)) { - kvm_err(-EFAULT, Guest gfn %u (0x%08lx) does not have - corresponding host mapping, - gfn, gfn PAGE_SHIFT); - return -EFAULT; - } - /* Create 2nd stage page table mapping - Level 1 */ - pgd = vcpu-kvm-arch.pgd + pgd_index(fault_ipa); - pud = pud_offset(pgd, fault_ipa); + pgd = kvm-arch.pgd + pgd_index(addr); + pud = pud_offset(pgd, addr); if (pud_none(*pud)) { - pmd = pmd_alloc_one(NULL, fault_ipa); + pmd = pmd_alloc_one(NULL, addr); if (!pmd) { kvm_err(-ENOMEM, Cannot allocate 2nd stage pmd); return -ENOMEM; } pud_populate(NULL, pud, pmd); - pmd += pmd_index(fault_ipa); + pmd += pmd_index(addr); } else - pmd = pmd_offset(pud, fault_ipa); + pmd = pmd_offset(pud, addr); /* Create 2nd stage page table mapping - Level 2 */ if (pmd_none(*pmd)) { - pte = pte_alloc_one_kernel(NULL, fault_ipa); + pte = pte_alloc_one_kernel(NULL, addr); if (!pte) { kvm_err(-ENOMEM, Cannot allocate 2nd stage pte); return -ENOMEM; } pmd_populate_kernel(NULL, pmd, pte); - pte += pte_index(fault_ipa); + pte += pte_index(addr); } else - pte = pte_offset_kernel(pmd, fault_ipa); + pte = pte_offset_kernel(pmd, addr); /* 
Create 2nd stage page table mapping - Level 3 */ new_pte = pfn_pte(pfn, PAGE_KVM_GUEST); @@ -290,6 +285,28 @@ static int
[PATCH] kvm tools: Add NMI ability to 'kvm debug'
This allows triggering NMI on guests using 'kvm debug -m [cpu]'. Please note that the default behaviour of 'kvm debug' dumping guest's cpu state has been modified to require a '-d'/--dump. Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/builtin-debug.c| 22 +++ tools/kvm/builtin-run.c | 16 +++- tools/kvm/include/kvm/builtin-debug.h| 11 tools/kvm/include/kvm/kvm-cpu.h |1 + tools/kvm/kvm-cpu.c |5 +++ tools/kvm/x86/include/kvm/kvm-cpu-arch.h |1 + tools/kvm/x86/kvm-cpu.c | 41 ++ 7 files changed, 90 insertions(+), 7 deletions(-) diff --git a/tools/kvm/builtin-debug.c b/tools/kvm/builtin-debug.c index 045dc2c..eee26c3 100644 --- a/tools/kvm/builtin-debug.c +++ b/tools/kvm/builtin-debug.c @@ -14,13 +14,10 @@ static bool all; static int instance; +static int nmi = -1; +static bool dump; static const char *instance_name; -struct debug_cmd { - u32 type; - u32 len; -}; - static const char * const debug_usage[] = { kvm debug [--all] [-n name], NULL @@ -30,6 +27,8 @@ static const struct option debug_options[] = { OPT_GROUP(General options:), OPT_BOOLEAN('a', all, all, Debug all instances), OPT_STRING('n', name, instance_name, name, Instance name), + OPT_BOOLEAN('d', dump, dump, Generate a debug dump from guest), + OPT_INTEGER('m', nmi, nmi, Generate NMI on VCPU), OPT_END() }; @@ -51,13 +50,24 @@ void kvm_debug_help(void) static int do_debug(const char *name, int sock) { char buff[BUFFER_SIZE]; - struct debug_cmd cmd = {KVM_IPC_DEBUG, 0}; + struct debug_cmd cmd = {KVM_IPC_DEBUG, 2 * sizeof(u32)}; int r; + if (dump) + cmd.dbg_type |= KVM_DEBUG_CMD_TYPE_DUMP; + + if (nmi != -1) { + cmd.dbg_type |= KVM_DEBUG_CMD_TYPE_NMI; + cmd.cpu = nmi; + } + r = xwrite(sock, cmd, sizeof(cmd)); if (r 0) return r; + if (!dump) + return 0; + do { r = xread(sock, buff, BUFFER_SIZE); if (r 0) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 7969901..7709edb 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -31,6 +31,7 @@ #include 
kvm/guest_compat.h #include kvm/pci-shmem.h #include kvm/kvm-ipc.h +#include kvm/builtin-debug.h #include linux/types.h @@ -464,7 +465,7 @@ static void handle_sigusr1(int sig) struct kvm_cpu *cpu = current_kvm_cpu; int fd = kvm_cpu__get_debug_fd(); - if (!cpu) + if (!cpu || cpu-needs_nmi) return; dprintf(fd, \n #\n # vCPU #%ld's dump:\n #\n, cpu-cpu_id); @@ -495,6 +496,19 @@ static void handle_pause(int fd, u32 type, u32 len, u8 *msg) static void handle_debug(int fd, u32 type, u32 len, u8 *msg) { int i; + u32 dbg_type = *(u32 *)msg; + int vcpu = *(((u32 *)msg) + 1); + + if (dbg_type KVM_DEBUG_CMD_TYPE_NMI) { + if (vcpu = kvm-nrcpus) + return; + + kvm_cpus[vcpu]-needs_nmi = 1; + pthread_kill(kvm_cpus[vcpu]-thread, SIGUSR1); + } + + if (!(dbg_type KVM_DEBUG_CMD_TYPE_DUMP)) + return; for (i = 0; i nrcpus; i++) { struct kvm_cpu *cpu = kvm_cpus[i]; diff --git a/tools/kvm/include/kvm/builtin-debug.h b/tools/kvm/include/kvm/builtin-debug.h index 3fc2469..b24b501 100644 --- a/tools/kvm/include/kvm/builtin-debug.h +++ b/tools/kvm/include/kvm/builtin-debug.h @@ -1,6 +1,17 @@ #ifndef KVM__DEBUG_H #define KVM__DEBUG_H +#include linux/types.h + +struct debug_cmd { + u32 type; + u32 len; + u32 dbg_type; +#define KVM_DEBUG_CMD_TYPE_DUMP(1 0) +#define KVM_DEBUG_CMD_TYPE_NMI (1 1) + u32 cpu; +}; + int kvm_cmd_debug(int argc, const char **argv, const char *prefix); void kvm_debug_help(void); diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h index 15618f1..d4448f6 100644 --- a/tools/kvm/include/kvm/kvm-cpu.h +++ b/tools/kvm/include/kvm/kvm-cpu.h @@ -19,5 +19,6 @@ void kvm_cpu__set_debug_fd(int fd); void kvm_cpu__show_code(struct kvm_cpu *vcpu); void kvm_cpu__show_registers(struct kvm_cpu *vcpu); void kvm_cpu__show_page_tables(struct kvm_cpu *vcpu); +void kvm_cpu__arch_nmi(struct kvm_cpu *cpu); #endif /* KVM__KVM_CPU_H */ diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c index 884a89f..8ec4efa 100644 --- a/tools/kvm/kvm-cpu.c +++ b/tools/kvm/kvm-cpu.c 
@@ -94,6 +94,11 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
 		cpu->paused = 0;
 	}

+	if (cpu->needs_nmi) {
+		kvm_cpu__arch_nmi(cpu);
+		cpu->needs_nmi = 0;
+	}
+
 	kvm_cpu__run(cpu);
Re: [PATCH v5 00/13] KVM/ARM Implementation
On 11 December 2011 10:24, Christoffer Dall c.d...@virtualopensystems.com wrote:

The following series implements KVM support for ARM processors, specifically on the Cortex A-15 platform.

Still on the to-do list:
- Reuse VMIDs
- Fix SMP host support
- Fix SMP guest support
- Support guest Thumb mode for MMIO emulation
- Further testing
- Performance improvements

Other items for this list:
- Support Neon/VFP in guests (the fpu regs struct is empty ATM)
- Support guest debugging

I couldn't see any support for the TLS registers in your cp15 emulation: did I miss it, or do we handle it without needing to trap?

-- PMM
Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
On Sat, Dec 03, 2011 at 03:44:36PM +1030, Rusty Russell wrote: On Sat, 03 Dec 2011 10:09:44 +1100, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Tue, 2011-11-29 at 14:31 +0200, Ohad Ben-Cohen wrote: A trivial, albeit sub-optimal, solution would be to simply revert commit d57ed95 "virtio: use smp_XX barriers on SMP". Obviously, though, that's going to have a negative impact on performance of SMP-based virtualization use cases.

Have you measured the impact of using normal barriers (non-SMP ones), like we use on normal HW drivers, unconditionally? I.e. if the difference is small enough I'd say just go for it and avoid the bloat.

Yep. Plan is:
1) Measure the difference.
2) Difference unmeasurable? Use normal barriers (ie. revert d57ed95).
3) Difference small? Revert d57ed95 for 3.2, revisit for 3.3.
4) Difference large? Runtime switch based on whether you're PCI for 3.2, revisit for 3.3.

Cheers, Rusty.

Forwarding some results by Amos, who ran multiple netperf streams in parallel, from an external box to the guest. TCP_STREAM results were noisy. This could be due to buffering done by TCP, where packet size varies even as message size is constant. TCP_RR results were consistent. In this benchmark, after switching to mandatory barriers, CPU utilization increased by up to 35% while throughput went down by up to 14%. The normalized throughput/cpu regressed consistently, between 7 and 35%.

The fix applied was simply this:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3198f2e..fdccb77 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -23,7 +23,7 @@
 /* virtio guest is communicating with a virtual device that actually runs on
  * a host processor. Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
+#if 0
 /* Where possible, use SMP barriers which are more lightweight than mandatory
  * barriers, because mandatory barriers control MMIO effects on accesses
  * through relaxed memory I/O windows (which virtio does not use). */

-- MST

Fri Dec 9 23:57:33 2011
1 - old-exhost_guest.txt
2 - fixed-exhost_guest.txt

== TCP_STREAM
sessions| size|throughput| cpu| normalize| #tx-pkts| #rx-pkts| #re-trans| #tx-intr| #rx-intr| #io_exit| #irq_inj|#tpkt/#exit| #rpkt/#irq
11| 64|949.64| 10.64|89| 1170134| 1368739| 0|17|487392|488820|504716| 2.39| 2.71
21| 64|946.03| 10.87|87| 1119582| 1325851| 0|17|493763|485865|516161| 2.30| 2.57
% | 0.0| -0.4| +2.2| -2.2| -4.3| -3.1| 0| 0.0| +1.3| -0.6| +2.3| -3.8| -5.2
12| 64| 1877.15| 15.45| 121| 2151267| 2561929| 0|33|923916|971093|969360| 2.22| 2.64
22| 64| 1867.63| 15.06| 124| 2212457| 2607606| 0|33|836160|927721|883964| 2.38| 2.95
% | 0.0| -0.5| -2.5| +2.5| +2.8| +1.8| 0| 0.0| -9.5| -4.5| -8.8| +7.2| +11.7
14| 64| 3577.38| 19.62| 182| 4176151| 5036661| 0|64| 1677417| 1412979| 1859101| 2.96| 2.71
24| 64| 3583.17| 20.05| 178| 4215327| 5063534| 0|65| 1682582| 1549394| 1759033| 2.72| 2.88
% | 0.0| +0.2| +2.2| -2.2| +0.9| +0.5| 0| +1.6| +0.3| +9.7| -5.4| -8.1| +6.3
11| 256| 2654.52| 11.41| 232|925787| 1029214| 0|14|597763|670927|619414| 1.38| 1.66
21| 256| 2632.22| 20.32| 129|977446| 1036094| 0|15|742699|715460|764512| 1.37| 1.36
% | 0.0| -0.8| +78.1| -44.4| +5.6| +0.7| 0| +7.1| +24.2| +6.6| +23.4| -0.7| -18.1
12| 256| 5228.76| 16.94| 308| 1949442| 2082492| 0|30| 1230329| 1323945| 1274262| 1.47| 1.63
22| 256| 5140.98| 19.58| 262| 1991090| 2093206| 0|30| 1400232| 1271363| 1441564| 1.57| 1.45
% | 0.0| -1.7| +15.6| -14.9| +2.1| +0.5| 0| 0.0| +13.8| -4.0| +13.1| +6.8| -11.0
14| 256| 9412.61| 24.04| 391| 2292404| 2351356| 0|35| 1669864|555786| 1741742| 4.12| 1.35
24| 256| 9408.92| 22.80| 412| 2303267|
Re: [PATCH 0/11] RFC: PCI using capabilitities
On Sun, Dec 11, 2011 at 12:03:52PM +0200, Sasha Levin wrote: On Sun, 2011-12-11 at 11:05 +0200, Avi Kivity wrote: mmios are strictly ordered. Perhaps your printfs are reordered by buffering? Are they from different threads? Are you using coalesced mmio (which is still strictly ordered, if used correctly)?

I print the queue_selector and queue_address in the printfs; even if the printfs were reordered they would be printing the data right, unlike they do now. It's the data in the printfs that matters, not their order. Same vcpu thread with both accesses. Not using coalesced mmio.

Not sure why this would matter, but is the BAR a prefetchable one? Rusty's patch uses pci_iomap, which maps a prefetchable BAR as cacheable.

-- Sasha.
Re: [PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host
On 12/07/2011 04:41 PM, Avi Kivity wrote: On 12/05/2011 10:18 PM, Eric B Munson wrote:

Changes from V4:
- Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
- Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt

Changes from V3:
- Include CC's on patch 3
- Drop clear flag ioctl and have the watchdog clear the flag when it is reset

Changes from V2:
- A new kvm function defined in kvm_para.h; the only change to pvclock is the initial flag definition

Changes from V1 (Thanks Marcelo):
- Host code has all been moved to arch/x86/kvm/x86.c
- KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED

When a guest kernel is stopped by the host hypervisor it can look like a soft lockup to the guest kernel. This false warning can mask later soft lockup warnings which may be real. This patch series adds a method for a host hypervisor to communicate to a guest kernel that it is being stopped. The final patch in the series has the watchdog check this flag when it goes to issue a soft lockup warning, and skip the warning if the guest knows it was stopped. It was attempted to solve this in Qemu, but the side effects of saving and restoring the clock and tsc for each vcpu put the wall clock of the guest behind by the amount of time of the pause. This forces a guest to have ntp running in order to keep the wall clock accurate.

Guests need to run NTP regardless; not only does the virtualization layer add some skew, the physical world is not that perfect. btw: a traditional NTP client won't sync the time automatically if the diff is > 0.5%.

Having this controlled from userspace means it doesn't work for SIGSTOP or for long scheduling delays. What about doing this automatically based on preempt notifiers?

Isn't it solved by steal time?
Re: [PATCH 0/11] RFC: PCI using capabilitities
On Thu, Dec 08, 2011 at 05:37:37PM +0200, Sasha Levin wrote: On Thu, 2011-12-08 at 20:52 +1030, Rusty Russell wrote: Here's the patch series I ended up with. I haven't coded up the QEMU side yet, so no idea if the new driver works. Questions: (1) Do we win from separating ISR, NOTIFY and COMMON? (2) I used a u8 bar; should I use a bir and pack it instead? BIR seems a little obscure (no one else in the kernel source seems to refer to it).

I started implementing it for KVM tools, when I noticed a strange thing: my vq creation was failing because the driver was reading a value other than 0 from the address field of a new vq, and failing. I've added simple prints in the usermode code, and saw the following ordering:

1. queue select vq 0
2. queue read address (returns 0 -> new vq)
3. queue write address (good address of vq)
4. queue read address (returns !=0, fails)
5. queue select vq 1

From that I understood that the ordering is wrong; the driver was trying to read the address before selecting the correct vq. At that point, I've added simple prints to the driver. Initially it looked as follows:

	iowrite16(index, vp_dev->common->queue_select);
	switch (ioread64(vp_dev->common->queue_address)) {
	[...]
	};

So I added prints before the iowrite16() and after the ioread64(), and saw that while the driver prints were ordered, the device ones weren't:

[1.264052] before iowrite index=1
kvmtool: net returning pfn (vq=0): 310706176
kvmtool: queue selected: 1
[1.264890] after ioread index=1

Suspecting that something was wrong with ordering, I added a print between the iowrite and the ioread, and it finally started working well. Which leads me to the question: are MMIO vs MMIO reads/writes not ordered?

First, I'd like to answer your questions from the PCI side. Look for the PCI rules in the PCI spec. You will notice that a write is required to be able to pass a read request. It might also pass a read completion. A read request will not pass a write request. There's more or less no ordering between different types of transactions (memory versus io/configuration). That's wrt the question you asked.

But this is not your setup: you have a single vcpu, so you will not initiate a write (select vq) until you get a read completion. So what you are really describing is this setup: guest reads a value, gets the response, then writes out another one, and kvm tool reports the write before the read.

-- Sasha.
Re: [PATCH 0/11] RFC: PCI using capabilitities
On Sun, 2011-12-11 at 14:30 +0200, Michael S. Tsirkin wrote: On Sun, Dec 11, 2011 at 12:03:52PM +0200, Sasha Levin wrote: On Sun, 2011-12-11 at 11:05 +0200, Avi Kivity wrote: mmios are strictly ordered. Perhaps your printfs are reordered by buffering? Are they from different threads? Are you using coalesced mmio (which is still strictly ordered, if used correctly)?

I print the queue_selector and queue_address in the printfs; even if the printfs were reordered they would be printing the data right, unlike they do now. It's the data in the printfs that matters, not their order. Same vcpu thread with both accesses. Not using coalesced mmio.

Not sure why this would matter, but is the BAR a prefetchable one? Rusty's patch uses pci_iomap, which maps a prefetchable BAR as cacheable.

Wasn't defined as prefetchable, but I'm seeing the same thing with or without it.

-- Sasha.
Re: [PATCH 0/11] RFC: PCI using capabilitities
On Sun, 2011-12-11 at 14:47 +0200, Michael S. Tsirkin wrote: First, I'd like to answer your questions from the PCI side. Look for the PCI rules in the PCI spec. You will notice that a write is required to be able to pass a read request. It might also pass a read completion. A read request will not pass a write request. There's more or less no ordering between different types of transactions (memory versus io/configuration). That's wrt the question you asked. But this is not your setup: you have a single vcpu, so you will not initiate a write (select vq) until you get a read completion. So what you are really describing is this setup: guest reads a value, gets the response, then writes out another one, and kvm tool reports the write before the read.

No, it's exactly the opposite. The guest writes a value first and then reads one (writes queue_select and reads queue_address), and kvm tool is reporting the read before the write.

I must add here that the kvm tool doesn't do anything fancy with simple IO/MMIO. There are no thread games or anything similar there. The vcpu thread is doing all the IO/MMIO work.

-- Sasha.
Re: [PATCHv3 00/10] KVM in-guest performance monitoring
On 11/10/2011 02:57 PM, Gleb Natapov wrote: This patchset exposes an emulated version 2 architectural performance monitoring unit to KVM guests. The PMU is emulated using perf_events, so the host kernel can multiplex host-wide, host-user, and the guest on available resources. The patches are against next branch on kvm.git.

Thanks, applied.

-- error compiling committee.c: too many arguments to function
Re: [PATCH 08/10] nEPT: Nested INVEPT
On Thu, Nov 10, 2011, Avi Kivity wrote about "Re: [PATCH 08/10] nEPT: Nested INVEPT": On 11/10/2011 12:01 PM, Nadav Har'El wrote: If we let L1 use EPT, we should probably also support the INVEPT instruction. ..

+	if (vmcs12 && nested_cpu_has_ept(vmcs12) &&
+	    (vmcs12->ept_pointer == operand.eptp) &&
+	    vmx->nested.last_eptp02)
+		ept_sync_context(vmx->nested.last_eptp02);
+	else
+		ept_sync_global();

Are either of these needed? Won't a write to a shadowed EPT table cause them anyway?

This is a very good point... You're right that as it stands, any changes to the guest EPT table (EPT12) will cause changes to the shadow EPT table (EPT02), and these already cause KVM to do an INVEPT, so there is no point doing this again when the guest asks. So basically, I can have INVEPT emulated by doing absolutely nothing (after checking all the checks), right? I wonder if I am missing any reason why a hypervisor might want to do INVEPT without changing the EPT12 table first.

-- Nadav Har'El | Sunday, Dec 11 2011, n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191 | Why do programmers mix up Christmas and Halloween? Because DEC 25 = OCT 31 | http://nadav.harel.org.il
Re: [PATCH 08/10] nEPT: Nested INVEPT
On 12/11/2011 04:24 PM, Nadav Har'El wrote: On Thu, Nov 10, 2011, Avi Kivity wrote about "Re: [PATCH 08/10] nEPT: Nested INVEPT": On 11/10/2011 12:01 PM, Nadav Har'El wrote: If we let L1 use EPT, we should probably also support the INVEPT instruction. ..

+	if (vmcs12 && nested_cpu_has_ept(vmcs12) &&
+	    (vmcs12->ept_pointer == operand.eptp) &&
+	    vmx->nested.last_eptp02)
+		ept_sync_context(vmx->nested.last_eptp02);
+	else
+		ept_sync_global();

Are either of these needed? Won't a write to a shadowed EPT table cause them anyway?

This is a very good point... You're right that as it stands, any changes to the guest EPT table (EPT12) will cause changes to the shadow EPT table (EPT02), and these already cause KVM to do an INVEPT, so there is no point doing this again when the guest asks. So basically, I can have INVEPT emulated by doing absolutely nothing (after checking all the checks), right?

Right. This was the case for INVLPG before we added out-of-sync pages; we didn't even intercept the instruction.

I wonder if I am missing any reason why a hypervisor might want to do INVEPT without changing the EPT12 table first.

Shouldn't happen, but why do you care? If EPT12 has not changed, any access through EPT02 or its TLB entry is valid.

-- error compiling committee.c: too many arguments to function
Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
Just found two, maybe three nits while browsing by: On 2011-12-11 11:24, Christoffer Dall wrote:

Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl. This ioctl is used since the semantics are in fact two lines that can be either raised or lowered on the VCPU - the IRQ and FIQ lines. KVM needs to know which VCPU it must operate on and whether the FIQ or IRQ line is raised/lowered. Hence both pieces of information are packed in the kvm_irq_level->irq field. The irq field value will be:

  IRQ: vcpu_index * 2
  FIQ: (vcpu_index * 2) + 1

This is documented in Documentation/kvm/api.txt. The effect of the ioctl is simply to raise/lower the corresponding virt_irq field on the VCPU struct, which will cause the world-switch code to raise/lower virtual interrupts when running the guest on the next switch. The wait_for_interrupt flag is also cleared for raised IRQs, causing an idle VCPU to become active again.

Note: The custom trace_kvm_irq_line is used despite a generic definition of trace_kvm_set_irq, since trace_kvm_set_irq depends on the x86-specific define of __HAVE_IOAPIC. Either the trace event should be created regardless of this define or it should depend on another ifdef clause, common for both x86 and ARM. However, since the arguments don't really match those used in ARM, I am yet to be convinced why this is necessary.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   10 ++-
 arch/arm/include/asm/kvm.h        |    8 ++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   53 -
 arch/arm/kvm/trace.h              |   21 +++
 include/linux/kvm.h               |    1 +
 6 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7945b0b..4abaa67 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -572,7 +572,7 @@ only go to the IOAPIC. On ia64, a IOSAPIC is created.

 4.25 KVM_IRQ_LINE

 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error

@@ -582,6 +582,14 @@ Requires that an interrupt controller model has been previously created with
 KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level
 to be set to 1 and then back to 0.
+KVM_CREATE_IRQCHIP (except for ARM). Note that edge-triggered interrupts
+require the level to be set to 1 and then back to 0.

You probably wanted to replace the original lines with these two, no?

+
+ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the
+irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
+FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for
+convenience macros.
+
 struct kvm_irq_level {
 	union {
 		__u32 irq; /* GSI */
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 87dc33b..8935062 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -20,6 +20,14 @@
 #include <asm/types.h>

 /*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+	KVM_ARM_IRQ_LINE = 0,
+	KVM_ARM_FIQ_LINE = 1,
+};
+
+/*
  * Modes used for short-hand mode determination in the world-switch code and
  * in emulation code.
  *
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 835abd1..e378a37 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -49,6 +49,7 @@
 #define HCR_VM		1
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
 			HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)

 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 89ba18d..fc0bd6b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -299,6 +299,43 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }

+static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
+				      struct kvm_irq_level *irq_level)
+{
+	u32 mask;
+	unsigned int vcpu_idx;
+	struct kvm_vcpu *vcpu;
+
+	vcpu_idx = irq_level->irq / 2;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	switch (irq_level->irq % 2) {
+	case KVM_ARM_IRQ_LINE:
+		mask = HCR_VI;
+		break;
+	case KVM_ARM_FIQ_LINE:
+		mask = HCR_VF;
+		break;
Re: Current kernel fails to compile with KVM on PowerPC
Alexander Graf wrote on Tue 22. Nov, 22:29 (+0100): On 22.11.2011, at 21:04, Jörg Sommer wrote: Jörg Sommer wrote on Mon 07. Nov, 20:48 (+0100): I'm trying to build the kernel with the git commit-id 31555213f03bca37d2c02e10946296052f4ecfcd, but it fails:

  CHK     include/linux/version.h
  HOSTCC  scripts/mod/modpost.o
  CHK     include/generated/utsrelease.h
  UPD     include/generated/utsrelease.h
  HOSTLD  scripts/mod/modpost
  GEN     include/generated/bounds.h
  CC      arch/powerpc/kernel/asm-offsets.s
In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function ‘compute_tlbie_rb’:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: each undeclared identifier is reported only once for each function it appears in
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: ‘HPTE_V_LARGE’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: warning: right shift count >= width of type [enabled by default]
make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1
make[2]: *** [prepare0] Error 2
make[1]: *** [deb-pkg] Error 2
make: *** [deb-pkg] Error 2

I'm still having this problem. I can't build 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches to make the kernel build and not oops [1] on PowerPC?

The failures above should be fixed by now.

I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git (a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain a suitable commit. Where can I find it?

Bye, Jörg.
--
I hardly know OpenBSD; what are its advantages over Linux and iptables? The foxtail effect is bigger. :-)
Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On 11 December 2011 15:18, Jan Kiszka jan.kis...@web.de wrote: Just found two, maybe three nits while browsing by: On 2011-12-11 11:24, Christoffer Dall wrote:

+ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the
+irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
+FIQs.

This seems to me a slightly obscure way of defining the two fields in this word (ie bits [31..1] cpu number, bit [0] irq-vs-fiq).

+static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
+				      struct kvm_irq_level *irq_level)
+{
+	u32 mask;
+	unsigned int vcpu_idx;
+	struct kvm_vcpu *vcpu;
+
+	vcpu_idx = irq_level->irq / 2;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	switch (irq_level->irq % 2) {
+	case KVM_ARM_IRQ_LINE:
+		mask = HCR_VI;
+		break;
+	case KVM_ARM_FIQ_LINE:
+		mask = HCR_VF;
+		break;
+	default:
+		return -EINVAL;

Due to % 2, default is unreachable. Remove the masking?

Removing the mask would be wrong since the irq field here is encoding both cpu number and irq-vs-fiq. The default is just an unreachable condition. (Why are we using % here rather than the obvious bit operation, incidentally?)

-- PMM
Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On Sun, Dec 11, 2011 at 10:18 AM, Jan Kiszka jan.kis...@web.de wrote: Just found two, maybe three nits while browsing by: On 2011-12-11 11:24, Christoffer Dall wrote:

Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl. This ioctl is used since the semantics are in fact two lines that can be either raised or lowered on the VCPU - the IRQ and FIQ lines. KVM needs to know which VCPU it must operate on and whether the FIQ or IRQ line is raised/lowered. Hence both pieces of information are packed in the kvm_irq_level->irq field. The irq field value will be:

  IRQ: vcpu_index * 2
  FIQ: (vcpu_index * 2) + 1

This is documented in Documentation/kvm/api.txt. The effect of the ioctl is simply to raise/lower the corresponding virt_irq field on the VCPU struct, which will cause the world-switch code to raise/lower virtual interrupts when running the guest on the next switch. The wait_for_interrupt flag is also cleared for raised IRQs, causing an idle VCPU to become active again.

Note: The custom trace_kvm_irq_line is used despite a generic definition of trace_kvm_set_irq, since trace_kvm_set_irq depends on the x86-specific define of __HAVE_IOAPIC. Either the trace event should be created regardless of this define or it should depend on another ifdef clause, common for both x86 and ARM. However, since the arguments don't really match those used in ARM, I am yet to be convinced why this is necessary.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   10 ++-
 arch/arm/include/asm/kvm.h        |    8 ++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   53 -
 arch/arm/kvm/trace.h              |   21 +++
 include/linux/kvm.h               |    1 +
 6 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7945b0b..4abaa67 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -572,7 +572,7 @@ only go to the IOAPIC. On ia64, a IOSAPIC is created.

 4.25 KVM_IRQ_LINE

 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error

@@ -582,6 +582,14 @@ Requires that an interrupt controller model has been previously created with
 KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level
 to be set to 1 and then back to 0.
+KVM_CREATE_IRQCHIP (except for ARM). Note that edge-triggered interrupts
+require the level to be set to 1 and then back to 0.

You probably wanted to replace the original lines with these two, no?

ah yes, some stgit re-ordering artifact.

+
+ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the
+irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
+FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for
+convenience macros.
+
 struct kvm_irq_level {
 	union {
 		__u32 irq; /* GSI */
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 87dc33b..8935062 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -20,6 +20,14 @@
 #include <asm/types.h>

 /*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+	KVM_ARM_IRQ_LINE = 0,
+	KVM_ARM_FIQ_LINE = 1,
+};
+
+/*
  * Modes used for short-hand mode determination in the world-switch code and
  * in emulation code.
  *
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 835abd1..e378a37 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -49,6 +49,7 @@
 #define HCR_VM		1
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
 			HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)

 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 89ba18d..fc0bd6b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -299,6 +299,43 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }

+static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
+				      struct kvm_irq_level *irq_level)
+{
+	u32 mask;
+	unsigned int vcpu_idx;
+	struct kvm_vcpu *vcpu;
+
+	vcpu_idx = irq_level->irq / 2;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	switch (irq_level->irq % 2) {
+	case KVM_ARM_IRQ_LINE:
+		mask = HCR_VI;
+
Re: [PATCH v5 00/13] KVM/ARM Implementation
On Sun, Dec 11, 2011 at 6:32 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 10:24, Christoffer Dall c.d...@virtualopensystems.com wrote: The following series implements KVM support for ARM processors, specifically on the Cortex A-15 platform. Still on the to-do list: - Reuse VMIDs - Fix SMP host support - Fix SMP guest support - Support guest Thumb mode for MMIO emulation - Further testing - Performance improvements Other items for this list: - Support Neon/VFP in guests (the fpu regs struct is empty ATM) - Support guest debugging ok, thanks, will add these to the list. I have a feeling it will keep growing for a while :) I couldn't see any support for the TLS registers in your cp15 emulation: did I miss it, or do we handle it without needing to trap? by TLS you mean the cp15, c13 registers (tid and friends?) If so, I handle these in the world-switch code (look at read_cp15_state and write_cp15_state). otherwise, help me out on the acronym... -Christoffer
Re: [PATCH v5 00/13] KVM/ARM Implementation
On 11 December 2011 19:23, Christoffer Dall c.d...@virtualopensystems.com wrote: by TLS you mean the cp15, c13 registers (tid and friends?) If so, I handle these in the world-switch code (look at read_cp15_state and write_cp15_state). otherwise, help me out on the acronym... Yes, those are the ones (TLS == thread local storage). Thanks for the pointer. -- PMM
Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 15:18, Jan Kiszka jan.kis...@web.de wrote: Just found two, maybe three nits while browsing by: On 2011-12-11 11:24, Christoffer Dall wrote:

+ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the
+irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
+FIQs.

This seems to me a slightly obscure way of defining the two fields in this word (ie bits [31..1] cpu number, bit [0] irq-vs-fiq).

Isn't that just personal preference? The other scheme was suggested by Avi, and nobody else complained then, so I'd be inclined to just leave it as is.

+static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
+				      struct kvm_irq_level *irq_level)
+{
+	u32 mask;
+	unsigned int vcpu_idx;
+	struct kvm_vcpu *vcpu;
+
+	vcpu_idx = irq_level->irq / 2;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	switch (irq_level->irq % 2) {
+	case KVM_ARM_IRQ_LINE:
+		mask = HCR_VI;
+		break;
+	case KVM_ARM_FIQ_LINE:
+		mask = HCR_VF;
+		break;
+	default:
+		return -EINVAL;

Due to % 2, default is unreachable. Remove the masking?

Removing the mask would be wrong since the irq field here is encoding both cpu number and irq-vs-fiq. The default is just an unreachable condition. (Why are we using % here rather than the obvious bit operation, incidentally?)

right, I will remove the default case. I highly doubt that the difference in using a bitop will be measurably more efficient, but if you feel strongly about it, I can change it to a shift and bitwise and, which I assume is what you mean by the obvious bit operation? I think my CS background speaks for using %, but whatever.
Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On 11 December 2011 19:30, Christoffer Dall c.d...@virtualopensystems.com wrote: On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell peter.mayd...@linaro.org wrote: Removing the mask would be wrong since the irq field here is encoding both cpu number and irq-vs-fiq. The default is just an unreachable condition. (Why are we using % here rather than the obvious bit operation, incidentally?) right, I will remove the default case. I highly doubt that the difference in using a bitop will be measurably more efficient, but if you feel strongly about it, I can change it to a shift and bitwise and, which I assume is what you mean by the obvious bit operation? I think my CS background speaks for using %, but whatever. Certainly the compiler ought to be able to figure out the two are the same thing; I just think irq & 1 is more readable than irq % 2 (because it's being clear that it's treating the variable as a pile of bits rather than an integer). This is bikeshedding rather, though, and style issues in kernel code are a matter for the kernel folk. So you can ignore me :-) -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm tools: Clean up LINT assignment code
Just set delivery mode directly without going through ugly casting. This cleans up and simplifies the code. Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/x86/kvm-cpu.c | 10 ++ 1 files changed, 2 insertions(+), 8 deletions(-) diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c index 27b7a8f..cc1f560 100644 --- a/tools/kvm/x86/kvm-cpu.c +++ b/tools/kvm/x86/kvm-cpu.c @@ -81,18 +81,12 @@ static int kvm_cpu__set_lint(struct kvm_cpu *vcpu) { struct kvm_lapic_state klapic; struct local_apic *lapic = (void *)&klapic; - u32 lvt; if (ioctl(vcpu->vcpu_fd, KVM_GET_LAPIC, &klapic)) return -1; - lvt = *(u32 *)&lapic->lvt_lint0; - lvt = SET_APIC_DELIVERY_MODE(lvt, APIC_MODE_EXTINT); - *(u32 *)&lapic->lvt_lint0 = lvt; - - lvt = *(u32 *)&lapic->lvt_lint1; - lvt = SET_APIC_DELIVERY_MODE(lvt, APIC_MODE_NMI); - *(u32 *)&lapic->lvt_lint1 = lvt; + lapic->lvt_lint0.delivery_mode = APIC_MODE_EXTINT; + lapic->lvt_lint1.delivery_mode = APIC_MODE_NMI; return ioctl(vcpu->vcpu_fd, KVM_SET_LAPIC, &klapic); } -- 1.7.8 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On Dec 11, 2011, at 2:48 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 19:30, Christoffer Dall c.d...@virtualopensystems.com wrote: On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell peter.mayd...@linaro.org wrote: Removing the mask would be wrong since the irq field here is encoding both cpu number and irq-vs-fiq. The default is just an unreachable condition. (Why are we using % here rather than the obvious bit operation, incidentally?) right, I will remove the default case. I highly doubt that the difference in using a bitop will be measurably more efficient, but if you feel strongly about it, I can change it to a shift and bitwise and, which I assume is what you mean by the obvious bit operation? I think my CS background speaks for using %, but whatever. Certainly the compiler ought to be able to figure out the two are the same thing; I just think irq & 1 is more readable than irq % 2 (because it's being clear that it's treating the variable as a pile of bits rather than an integer). This is bikeshedding rather, though, and style issues in kernel code are a matter for the kernel folk. So you can ignore me :-) Well, if it was just irq & 1, then I hear you, but it would be (irq >> cpu_idx) & 1 which I don't think is more clear. But yes let's see what the kernel folks say. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On 11 December 2011 20:07, Christoffer Dall christofferd...@christofferdall.dk wrote: Well, if it was just irq & 1, then I hear you, but it would be (irq >> cpu_idx) & 1 which I don't think is more clear. Er, what? The fields are [31..1] CPU index and [0] irqtype, right? So what you have now is: vcpu_idx = irq_level->irq / 2; irqtype = irq_level->irq % 2; and the bitshifting equivalent is: vcpu_idx = irq_level->irq >> 1; irqtype = irq_level->irq & 1; surely? Shifting by the cpuindex is definitely wrong. (Incidentally I fixed a bug in your QEMU-side code which wasn't feeding this field to the kernel in the way it expects: http://git.linaro.org/gitweb?p=qemu/qemu-linaro.git;a=commitdiff;h=2502ba067e795e48d346f9816fad45177ca64bca Sorry, I should have posted that to the list. I'll do that now.) -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On Sun, Dec 11, 2011 at 3:25 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 20:07, Christoffer Dall christofferd...@christofferdall.dk wrote: Well, if it was just irq & 1, then I hear you, but it would be (irq >> cpu_idx) & 1 which I don't think is more clear. Er, what? The fields are [31..1] CPU index and [0] irqtype, right? So what you have now is: vcpu_idx = irq_level->irq / 2; irqtype = irq_level->irq % 2; and the bitshifting equivalent is: vcpu_idx = irq_level->irq >> 1; irqtype = irq_level->irq & 1; surely? Shifting by the cpuindex is definitely wrong. actually, that's not how the irq_level field is defined. If you look in Documentation/virtual/kvm/api.txt: ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for convenience macros. also, in the kernel code the cpu_index is achieved by a simple integer division by 2. as I said, this was the proposal from the last round of reviews after a lengthy discussion, so I stuck with that. we should definitely fix either side, and the only sane argument is that this is an irq_line field, so an index resembling an actual line seems more semantically in line with the field purpose rather than a bit encoding, but I am open to arguments and not married to the current implementation. (Incidentally I fixed a bug in your QEMU-side code which wasn't feeding this field to the kernel in the way it expects: http://git.linaro.org/gitweb?p=qemu/qemu-linaro.git;a=commitdiff;h=2502ba067e795e48d346f9816fad45177ca64bca Sorry, I should have posted that to the list. I'll do that now.) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
On Sat, Dec 10, 2011 at 01:35:39PM +0100, Carsten Otte wrote: This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte co...@de.ibm.com [...] --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,115 @@ void kvm_arch_exit(void) { } +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -EFAULT; + if (addr >= PGDIR_SIZE) + goto out; + + spin_lock(&current->mm->page_table_lock); + ptep = ptep_for_addr(addr); Locking is broken; following order is possible: kvm_s390_keyop() -> spin_lock(&current->mm->page_table_lock) -> ptep_for_addr() -> down_read(&current->mm->mmap_sem) --- Bug 1, we might schedule here -> __pmdp_for_addr() -> __pte_alloc() -> spin_lock(&mm->page_table_lock) --- Bug 2, deadlock -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On 11 December 2011 21:36, Christoffer Dall c.d...@virtualopensystems.com wrote: On Sun, Dec 11, 2011 at 3:25 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 20:07, Christoffer Dall christofferd...@christofferdall.dk wrote: Well, if it was just irq & 1, then I hear you, but it would be (irq >> cpu_idx) & 1 which I don't think is more clear. Er, what? The fields are [31..1] CPU index and [0] irqtype, right? So what you have now is: vcpu_idx = irq_level->irq / 2; irqtype = irq_level->irq % 2; and the bitshifting equivalent is: vcpu_idx = irq_level->irq >> 1; irqtype = irq_level->irq & 1; surely? Shifting by the cpuindex is definitely wrong. actually, that's not how the irq_level field is defined. It's not clear to me which part of my comment this is aimed at. Shifting by the cpuindex doesn't give the right answer whether you define irq_level by bitfields or with the current phrasing you quote below. If you look in Documentation/virtual/kvm/api.txt: ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for convenience macros. That's exactly the same thing, though, right? It's just a matter of how you choose to phrase it (in either text or in code; the values come out identical). When I was sorting out the QEMU side, I started out by looking at the kernel source code, deduced that we were encoding CPU number and irq-vs-fiq as described above (and documenting it in a slightly confusing way as a multiplication) and then wrote the qemu code in what seemed to me the clearest way. (Actually what would be clearest would be if the ioctl took the (interrupt-target, interrupt-line-for-that-target, value-of-line) tuple as three separate values rather than encoding two of them into a single integer, but I assume there's a reason we can't have that.)
we should definitely fix either side, and the only sane argument is that this is an irq_line field, so an index resembling an actual line seems more semantically in line with the field purpose rather than a bit encoding, but I am open to arguments and not married to the current implementation. To be clear, I'm not attempting to suggest a change in the semantics of this field. (The qemu patch fixes the qemu side to adhere to what the kernel requires.) -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] KVM: Make mmu_shrink() scan nr_to_scan shadow pages
This patch set fixes mmu_shrink() as I said last week. Though I did not change tuning parameters, we can do that in the future on top of this: I think the batch size, 128, may be too large. Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Make it clear that this is not related to virtual memory. Remove vm_ prefix from the corresponding member of the struct kvm to avoid kvm->vm_ redundancy alongside. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- Documentation/virtual/kvm/locking.txt |2 +- arch/x86/include/asm/kvm_host.h |2 +- arch/x86/kvm/mmu.c|4 ++-- arch/x86/kvm/x86.c|4 ++-- include/linux/kvm_host.h |2 +- virt/kvm/kvm_main.c | 12 ++-- 6 files changed, 13 insertions(+), 13 deletions(-) diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt index 3b4cd3b..1a851be 100644 --- a/Documentation/virtual/kvm/locking.txt +++ b/Documentation/virtual/kvm/locking.txt @@ -12,7 +12,7 @@ KVM Lock Overview Name: kvm_lock Type: raw_spinlock Arch: any -Protects: - vm_list +Protects: - kvm_list - hardware virtualization enable/disable Comment: 'raw' because hardware enabling/disabling must be atomic /wrt migration. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 020413a..186b2b0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -105,7 +105,7 @@ #define ASYNC_PF_PER_VCPU 64 extern raw_spinlock_t kvm_lock; -extern struct list_head vm_list; +extern struct list_head kvm_list; struct kvm_vcpu; struct kvm; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 2a2a9b4..590f76b 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3911,7 +3911,7 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc) raw_spin_lock(&kvm_lock); - list_for_each_entry(kvm, &vm_list, vm_list) { + list_for_each_entry(kvm, &kvm_list, list) { int idx; LIST_HEAD(invalid_list); @@ -3930,7 +3930,7 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc) srcu_read_unlock(&kvm->srcu, idx); } if (kvm_freed) - list_move_tail(&kvm_freed->vm_list, &vm_list); + list_move_tail(&kvm_freed->list, &kvm_list); raw_spin_unlock(&kvm_lock); diff
--git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index eeeaf2e..96f118b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4566,7 +4566,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va smp_call_function_single(freq->cpu, tsc_khz_changed, freq, 1); raw_spin_lock(&kvm_lock); - list_for_each_entry(kvm, &vm_list, vm_list) { + list_for_each_entry(kvm, &kvm_list, list) { kvm_for_each_vcpu(i, vcpu, kvm) { if (vcpu->cpu != freq->cpu) continue; @@ -5857,7 +5857,7 @@ int kvm_arch_hardware_enable(void *garbage) int i; kvm_shared_msr_cpu_online(); - list_for_each_entry(kvm, &vm_list, vm_list) + list_for_each_entry(kvm, &kvm_list, list) kvm_for_each_vcpu(i, vcpu, kvm) if (vcpu->cpu == smp_processor_id()) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 8c5c303..054b52e 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -256,7 +256,7 @@ struct kvm { struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; atomic_t online_vcpus; int last_boosted_vcpu; - struct list_head vm_list; + struct list_head list; /* the list of kvm instances */ struct mutex lock; struct kvm_io_bus *buses[KVM_NR_BUSES]; #ifdef CONFIG_HAVE_KVM_EVENTFD diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d8bac07..03ae960 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -70,8 +70,8 @@ MODULE_LICENSE("GPL"); * kvm->lock --> kvm->slots_lock --> kvm->irq_lock */ -DEFINE_RAW_SPINLOCK(kvm_lock); -LIST_HEAD(vm_list); +DEFINE_RAW_SPINLOCK(kvm_lock); /* protect kvm_list */ +LIST_HEAD(kvm_list); /* the list of kvm instances */ static cpumask_var_t cpus_hardware_enabled; static int kvm_usage_count = 0; @@ -498,7 +498,7 @@ static struct kvm *kvm_create_vm(void) goto out_err; raw_spin_lock(&kvm_lock); - list_add(&kvm->vm_list, &vm_list); + list_add(&kvm->list, &kvm_list); raw_spin_unlock(&kvm_lock); return kvm; @@ -573,7 +573,7 @@ static void kvm_destroy_vm(struct kvm *kvm) kvm_arch_sync_events(kvm);
raw_spin_lock(&kvm_lock); - list_del(&kvm->vm_list); + list_del(&kvm->list); raw_spin_unlock(&kvm_lock); kvm_free_irq_routing(kvm); for (i = 0; i < KVM_NR_BUSES; i++) @@ -2626,7 +2626,7 @@ static int vm_stat_get(void *_offset, u64 *val) *val = 0;
[PATCH 2/4] KVM: MMU: Make common preparation code for zapping sp into a function
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Use list_entry() instead of container_of() for taking a shadow page from the active_mmu_pages list. Note: the return value of pre_zap_one_sp() will be used later. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/x86/kvm/mmu.c | 45 +++-- 1 files changed, 23 insertions(+), 22 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 590f76b..b1e8270 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1930,6 +1930,26 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, return ret; } +/** + * pre_zap_one_sp - make one shadow page ready for being freed + * @kvm: the kvm instance + * @invalid_list: the list to which we add shadow pages ready for being freed + * + * Take one shadow page from the tail of the active_mmu_pages list and make it + * ready for being freed, then put it into the @invalid_list. Other pages, + * unsync children, may also be put into the @invalid_list. + * + * Return the number of shadow pages added to the @invalid_list this way. 
+ */ +static int pre_zap_one_sp(struct kvm *kvm, struct list_head *invalid_list) +{ + struct kvm_mmu_page *sp; + + sp = list_entry(kvm->arch.active_mmu_pages.prev, + struct kvm_mmu_page, link); + return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); +} + static void kvm_mmu_isolate_pages(struct list_head *invalid_list) { struct kvm_mmu_page *sp; @@ -1999,11 +2019,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages) if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) { while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages && !list_empty(&kvm->arch.active_mmu_pages)) { - struct kvm_mmu_page *page; - - page = container_of(kvm->arch.active_mmu_pages.prev, - struct kvm_mmu_page, link); - kvm_mmu_prepare_zap_page(kvm, page, &invalid_list); + pre_zap_one_sp(kvm, &invalid_list); } kvm_mmu_commit_zap_page(kvm, &invalid_list); goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages; @@ -3719,11 +3735,7 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu) while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES && !list_empty(&vcpu->kvm->arch.active_mmu_pages)) { - struct kvm_mmu_page *sp; - - sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev, - struct kvm_mmu_page, link); - kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); + pre_zap_one_sp(vcpu->kvm, &invalid_list); ++vcpu->kvm->stat.mmu_recycled; } kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); @@ -3890,16 +3902,6 @@ restart: spin_unlock(&kvm->mmu_lock); } -static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm, - struct list_head *invalid_list) -{ - struct kvm_mmu_page *page; - - page = container_of(kvm->arch.active_mmu_pages.prev, - struct kvm_mmu_page, link); - kvm_mmu_prepare_zap_page(kvm, page, invalid_list); -} - static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc) { struct kvm *kvm; @@ -3919,8 +3921,7 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc) spin_lock(&kvm->mmu_lock); if (!kvm_freed && nr_to_scan > 0 && kvm->arch.n_used_mmu_pages > 0) { - kvm_mmu_remove_some_alloc_mmu_pages(kvm, - &invalid_list); + pre_zap_one_sp(kvm, &invalid_list); kvm_freed = kvm; } nr_to_scan--; -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] KVM: MMU: Make preparation for zapping some sp into a separate function
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp This will be used for mmu_shrink() in the following patch. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/x86/kvm/mmu.c | 36 ++-- 1 files changed, 26 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index b1e8270..fcd0dd1 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2003,6 +2003,28 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, } +/** + * pre_zap_some_sp - make some shadow pages ready for being freed + * @kvm: the kvm instance + * @invalid_list: the list to which we add shadow pages ready for being freed + * @nr_to_zap: how many shadow pages we want to zap + * + * Try to make @nr_to_zap shadow pages ready for being freed, then put them + * into the @invalid_list. + * + * Return the number of shadow pages actually added to the @invalid_list. + */ +static int pre_zap_some_sp(struct kvm *kvm, struct list_head *invalid_list, + int nr_to_zap) +{ + int nr_before = kvm->arch.n_used_mmu_pages; + + while (nr_to_zap > 0 && !list_empty(&kvm->arch.active_mmu_pages)) + nr_to_zap -= pre_zap_one_sp(kvm, invalid_list); + + return nr_before - kvm->arch.n_used_mmu_pages; +} + /* * Changing the number of mmu pages allocated to the vm * Note: if goal_nr_mmu_pages is too small, you will get dead lock @@ -2010,17 +2032,11 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages) { LIST_HEAD(invalid_list); - /* -* If we set the number of mmu pages to be smaller be than the -* number of actived pages , we must to free some mmu pages before we -* change the value -*/ + int nr_to_zap = kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages; - if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) { - while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages && - !list_empty(&kvm->arch.active_mmu_pages)) { - pre_zap_one_sp(kvm, &invalid_list); - } + if (nr_to_zap > 0) { + /* free some shadow pages to make the number fit the goal */ + pre_zap_some_sp(kvm, &invalid_list, nr_to_zap); kvm_mmu_commit_zap_page(kvm, &invalid_list); goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages; } -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Currently, mmu_shrink() tries to free a shadow page from one kvm and does not use nr_to_scan correctly. This patch fixes this by making it try to free some shadow pages from each kvm. The number of shadow pages each kvm frees becomes proportional to the number of shadow pages it is using. Note: an easy way to see how this code works is to do echo 3 > /proc/sys/vm/drop_caches while some virtual machines are running. Shadow pages will be zapped as expected by this. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/x86/kvm/mmu.c | 23 ++- 1 files changed, 14 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index fcd0dd1..c6c61dd 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3921,7 +3921,7 @@ restart: static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc) { struct kvm *kvm; - struct kvm *kvm_freed = NULL; + int nr_to_zap, nr_total; int nr_to_scan = sc->nr_to_scan; if (nr_to_scan == 0) @@ -3929,25 +3929,30 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc) raw_spin_lock(&kvm_lock); + nr_total = percpu_counter_read_positive(&kvm_total_used_mmu_pages); + list_for_each_entry(kvm, &kvm_list, list) { int idx; LIST_HEAD(invalid_list); + if (nr_to_scan <= 0) { + /* next time from this kvm */ + list_move_tail(&kvm_list, &kvm->list); + break; + } + idx = srcu_read_lock(&kvm->srcu); spin_lock(&kvm->mmu_lock); - if (!kvm_freed && nr_to_scan > 0 && - kvm->arch.n_used_mmu_pages > 0) { - pre_zap_one_sp(kvm, &invalid_list); - kvm_freed = kvm; - } - nr_to_scan--; + /* proportional to how many shadow pages this kvm is using */ + nr_to_zap = sc->nr_to_scan * kvm->arch.n_used_mmu_pages; + nr_to_zap /= nr_total; + nr_to_scan -= pre_zap_some_sp(kvm, &invalid_list, nr_to_zap); kvm_mmu_commit_zap_page(kvm, &invalid_list); + spin_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, idx); } - if (kvm_freed) - list_move_tail(&kvm_freed->list, &kvm_list);
raw_spin_unlock(&kvm_lock); -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
On Sun, 2011-12-11 at 14:25 +0200, Michael S. Tsirkin wrote: Forwarding some results by Amos, who run multiple netperf streams in parallel, from an external box to the guest. TCP_STREAM results were noisy. This could be due to buffering done by TCP, where packet size varies even as message size is constant. TCP_RR results were consistent. In this benchmark, after switching to mandatory barriers, CPU utilization increased by up to 35% while throughput went down by up to 14%. the normalized throughput/cpu regressed consistently, between 7 and 35% The fix applied was simply this: What machine processor was this ? Cheers, Ben. diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 3198f2e..fdccb77 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -23,7 +23,7 @@ /* virtio guest is communicating with a virtual device that actually runs on * a host processor. Memory barriers are used to control SMP effects. */ -#ifdef CONFIG_SMP +#if 0 /* Where possible, use SMP barriers which are more lightweight than mandatory * barriers, because mandatory barriers control MMIO effects on accesses * through relaxed memory I/O windows (which virtio does not use). */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On 11 December 2011 22:12, Peter Maydell peter.mayd...@linaro.org wrote: (Actually what would be clearest would be if the ioctl took the (interrupt-target, interrupt-line-for-that-target, value-of-line) tuple as three separate values rather than encoding two of them into a single integer, but I assume there's a reason we can't have that.) Have you thought about how this encoding scheme would be extended when we move to using the VGIC and an in-kernel interrupt controller implementation, incidentally? I haven't really looked into that at all, but I assume that then QEMU is going to start having to tell the kernel it wants to deliver interrupt 35 to the GIC, and so on... -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On Sun, Dec 11, 2011 at 5:35 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 22:12, Peter Maydell peter.mayd...@linaro.org wrote: (Actually what would be clearest would be if the ioctl took the (interrupt-target, interrupt-line-for-that-target, value-of-line) tuple as three separate values rather than encoding two of them into a single integer, but I assume there's a reason we can't have that.) Have you thought about how this encoding scheme would be extended when we move to using the VGIC and an in-kernel interrupt controller implementation, incidentally? I haven't really looked into that at all, but I assume that then QEMU is going to start having to tell the kernel it wants to deliver interrupt 35 to the GIC, and so on... no, I haven't looked into that at all. My plan was to decipher the common irq, ioapic stuff for x86 and see how much we can re-use and if there will be some nice way to either use what's there or change some bits to accommodate both existing archs and ARM. But the short answer is, no not really, I was focusing so far on getting a stable implementation upstream. yes, we are going to have to have some interface with QEMU for this and if we need new features from what's already there that should probably be discussed in the same round as the mechanism for handing off CP15 stuff to QEMU that we touched upon earlier. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On 2011-12-11 23:53, Christoffer Dall wrote: On Sun, Dec 11, 2011 at 5:35 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 22:12, Peter Maydell peter.mayd...@linaro.org wrote: (Actually what would be clearest would be if the ioctl took the (interrupt-target, interrupt-line-for-that-target, value-of-line) tuple as three separate values rather than encoding two of them into a single integer, but I assume there's a reason we can't have that.) Have you thought about how this encoding scheme would be extended when we move to using the VGIC and an in-kernel interrupt controller implementation, incidentally? I haven't really looked into that at all, but I assume that then QEMU is going to start having to tell the kernel it wants to deliver interrupt 35 to the GIC, and so on... no, I haven't looked into that at all. My plan was to decipher the common irq, ioapic stuff for x86 and see how much we can re-use and if there will be some nice way to either use what's there or change some bits to accommodate both existing archs and ARM. But the short answer is, no not really, I was focusing so far on getting a stable implementation upstream. yes, we are going to have to have some interface with QEMU for this and if we need new features from what's already there that should probably be discussed in the same round as the mechanism for handing off CP15 stuff to QEMU that we touched upon earlier. Enabling in-kernel irqchips usually means switching worlds. So the semantics of these particular IRQ inject interface details may change without breaking anything. However, things might look different if there will be a need to inject also the CPU IRQs directly, not only the irqchip inputs. In that case, it may make some sense to reserve more space for interrupt types than just one bit and use a common encoding scheme. Jan
Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately
On 09/12/11 19:29, Pekka Enberg wrote: On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin levinsasha...@gmail.com wrote: If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM tools tree as well. Yup, all we need is ACKs from PPC maintainers. Cool, I've pinged them re. __SANE_USERSPACE_TYPES__, and that would be useful to carry in your tree. But, IMHO, the patch I sent to Alex ([PATCH] KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS) should go in via his tree. The KVM/PPC-HV kernel folk will get it quicker in their trees (pulled from Alex) and it's not a build problem, only a limit of SMP CPU numbers. That is, if you're building a kernel for PPC KVM today you'll probably use something more similar to Alex's tree than mainline/kvm tools tree. Cheers, Matt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 23/23] kvm tools: Create arch-specific kvm_cpu__emulate_{mm}io()
On 09/12/11 18:53, Sasha Levin wrote: On Fri, 2011-12-09 at 17:56 +1100, Matt Evans wrote: @@ -30,4 +31,18 @@ struct kvm_cpu { struct kvm_coalesced_mmio_ring *ring; }; +/* + * As these are such simple wrappers, let's have them in the header so they'll + * be cheaper to call: + */ +static inline bool kvm_cpu__emulate_io(struct kvm *kvm, u16 port, void *data, int direction, int size, u32 count) +{ + return kvm__emulate_io(kvm, port, data, direction, size, count); +} + +static inline bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write) +{ + return kvm_cpu__emulate_mmio(kvm, phys_addr, data, len, is_write); This is probably wrong. kvm_cpu__emulate_mmio just calls itself over and over. Urgh, not just probably -- CP strikes again. Consider it fixed. Thanks! Matt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: MMU: Make preparation for zapping some sp into a separate function
Takuya Yoshikawa takuya.yoshik...@gmail.com wrote: @@ -2010,17 +2032,11 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages) { LIST_HEAD(invalid_list); - /* - * If we set the number of mmu pages to be smaller be than the - * number of actived pages , we must to free some mmu pages before we - * change the value - */ + int nr_to_zap = kvm->arch.n_used_mmu_pages goal_nr_mmu_pages; Sorry, should have been: int nr_to_zap = kvm->arch.n_used_mmu_pages - goal_nr_mmu_pages; I will fix this after getting some comments. Takuya - if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) { - while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages && !list_empty(&kvm->arch.active_mmu_pages)) { - pre_zap_one_sp(kvm, &invalid_list); - } + if (nr_to_zap > 0) { + /* free some shadow pages to make the number fit the goal */ + pre_zap_some_sp(kvm, &invalid_list, nr_to_zap); kvm_mmu_commit_zap_page(kvm, &invalid_list); goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages; } -- 1.7.5.4 -- Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
[RFC PATCH] kvm tools, qcow: Add support for growing refcount blocks
This patch enables allocating new refcount blocks, so kvm tools can grow a qcow2 image much larger. Signed-off-by: Lan Tianyu tianyu@intel.com --- tools/kvm/disk/qcow.c | 105 +--- 1 files changed, 89 insertions(+), 16 deletions(-) diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c index e139fa5..929ba69 100644 --- a/tools/kvm/disk/qcow.c +++ b/tools/kvm/disk/qcow.c @@ -12,6 +12,7 @@ #include <string.h> #include <unistd.h> #include <fcntl.h> +#include <errno.h> #ifdef CONFIG_HAS_ZLIB #include <zlib.h> #endif @@ -20,6 +21,10 @@ #include <linux/kernel.h> #include <linux/types.h> +static int update_cluster_refcount(struct qcow *q, u64 clust_idx, u16 append); +static int qcow_write_refcount_table(struct qcow *q); +static u64 qcow_alloc_clusters(struct qcow *q, u64 size, int update_ref); +static void qcow_free_clusters(struct qcow *q, u64 clust_start, u64 size); static inline int qcow_pwrite_sync(int fd, void *buf, size_t count, off_t offset) @@ -657,6 +662,56 @@ static struct qcow_refcount_block *refcount_block_search(struct qcow *q, u64 off return rfb; } +static struct qcow_refcount_block *qcow_grow_refcount_block(struct qcow *q, + u64 clust_idx) +{ + struct qcow_header *header = q->header; + struct qcow_refcount_table *rft = &q->refcount_table; + struct qcow_refcount_block *rfb; + u64 new_block_offset; + u64 rft_idx; + + rft_idx = clust_idx >> (header->cluster_bits - + QCOW_REFCOUNT_BLOCK_SHIFT); + + if (rft_idx >= rft->rf_size) { + pr_warning("Don't support grow refcount block table"); + return NULL; + } + + new_block_offset = qcow_alloc_clusters(q, q->cluster_size, 0); + if (new_block_offset < 0) + return NULL; + + rfb = new_refcount_block(q, new_block_offset); + if (!rfb) + return NULL; + + memset(rfb->entries, 0x00, q->cluster_size); + rfb->dirty = 1; + + /* write refcount block */ + if (write_refcount_block(q, rfb) < 0) + goto free_rfb; + + if (cache_refcount_block(q, rfb) < 0) + goto free_rfb; + + rft->rf_table[rft_idx] = cpu_to_be64(new_block_offset); + if (qcow_write_refcount_table(q) < 0) + goto free_rfb; + + if (update_cluster_refcount(q, new_block_offset >> header->cluster_bits, 1) < 0) + goto free_rfb; + + return rfb; + +free_rfb: + free(rfb); + return NULL; +} + static struct qcow_refcount_block *qcow_read_refcount_block(struct qcow *q, u64 clust_idx) { struct qcow_header *header = q->header; @@ -667,14 +722,11 @@ static struct qcow_refcount_block *qcow_read_refcount_block(struct qcow *q, u64 rft_idx = clust_idx >> (header->cluster_bits - QCOW_REFCOUNT_BLOCK_SHIFT); if (rft_idx >= rft->rf_size) - return NULL; + return (void *)-ENOSPC; rfb_offset = be64_to_cpu(rft->rf_table[rft_idx]); - - if (!rfb_offset) { - pr_warning("Don't support to grow refcount table"); - return NULL; - } + if (!rfb_offset) + return (void *)-ENOSPC; rfb = refcount_block_search(q, rfb_offset); if (rfb) @@ -708,7 +760,8 @@ static u16 qcow_get_refcount(struct qcow *q, u64 clust_idx) if (!rfb) { pr_warning("Error while reading refcount table"); return -1; - } + } else if ((long)rfb == -ENOSPC) + return 0; rfb_idx = clust_idx & (((1ULL << (header->cluster_bits - QCOW_REFCOUNT_BLOCK_SHIFT)) - 1)); @@ -732,6 +785,12 @@ static int update_cluster_refcount(struct qcow *q, u64 clust_idx, u16 append) if (!rfb) { pr_warning("error while reading refcount table"); return -1; + } else if ((long)rfb == -ENOSPC) { + rfb = qcow_grow_refcount_block(q, clust_idx); + if (!rfb) { + pr_warning("error while growing refcount table"); + return -1; + } } rfb_idx = clust_idx & (((1ULL @@ -774,11 +833,11 @@ static void qcow_free_clusters(struct qcow *q, u64 clust_start, u64 size) * can satisfy the size. free_clust_idx is initialized to zero and * Record last position. */ -static u64 qcow_alloc_clusters(struct qcow *q, u64 size) +static u64 qcow_alloc_clusters(struct qcow *q, u64 size, int update_ref) { struct qcow_header *header = q->header; u16 clust_refcount; - u32 clust_idx, i; + u32 clust_idx = 0, i; u64 clust_num; clust_num = (size + (q->cluster_size - 1)) >> header->cluster_bits; @@ -793,12 +852,15 @@ again: goto again; } - for (i = 0; i < clust_num; i++) - if (update_cluster_refcount(q, - q->free_clust_idx -
New Guest OS Creation Problem
Hi All, I am running CentOS release 6.1 (final). I have been using and running Linux KVM quite well for quite some time, but something went wrong after I performed a yum upgrade. I created a new VM yesterday without any problem, using the same exact installation procedure, and installed FreeBSD 8.2. I tried to create a new VM today, after the yum upgrade; it is able to detect the hard disk, but when I start the FreeBSD 8.2 installation it complains that it cannot write to disk, as stated in the error messages below. block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) Below is the log I captured from the VM log file -- Log start -- 2011-12-12 00:23:39.485: starting up LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 1,sockets=1,cores=1,threads=1 -name database -uuid f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -drive file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -drive file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2,password -vga cirrus
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 char device redirected to /dev/pts/6 Using CPU model cpu64-rhel6 block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) block I/O error in device 'drive-ide0-0-0': Invalid argument (22) -- Log End -- Below are the software versions currently running: gpxe-roms-qemu-0.9.7-6.7.el6.noarch qemu-img-0.12.1.2-2.160.el6_1.8.x86_64 qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64 libvirt-client-0.8.7-18.el6_1.4.x86_64 libvirt-python-0.8.7-18.el6_1.4.x86_64 libvirt-0.8.7-18.el6_1.4.x86_64 Are any of you having this problem as well? I am planning to install CentOS as a guest and see whether it has the same problem. Thanks. -- Paul Ooi
[PATCH v3] kvm: make vcpu life cycle separated from kvm instance
From: Liu Ping Fan pingf...@linux.vnet.ibm.com Currently, a vcpu can be destroyed only when the kvm instance is destroyed. Change this so that a vcpu is destroyed when its refcnt drops to zero; a vcpu then must (and can) be destroyed before the kvm instance is. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- arch/x86/kvm/i8254.c | 10 -- arch/x86/kvm/i8259.c | 12 +-- arch/x86/kvm/mmu.c |7 ++-- arch/x86/kvm/x86.c | 54 +++ include/linux/kvm_host.h | 71 ++ virt/kvm/irq_comm.c |7 +++- virt/kvm/kvm_main.c | 62 +-- 7 files changed, 170 insertions(+), 53 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 76e3f1c..ac79598 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -289,7 +289,7 @@ static void pit_do_work(struct work_struct *work) struct kvm_pit *pit = container_of(work, struct kvm_pit, expired); struct kvm *kvm = pit->kvm; struct kvm_vcpu *vcpu; - int i; + struct kvm_iter it; struct kvm_kpit_state *ps = &pit->pit_state; int inject = 0; @@ -315,9 +315,13 @@ static void pit_do_work(struct work_struct *work) * LVT0 to NMI delivery. Other PIC interrupts are just sent to * VCPU0, and only if its LVT0 is in EXTINT mode.
*/ - if (kvm->arch.vapics_in_nmi_mode > 0) - kvm_for_each_vcpu(i, vcpu, kvm) + if (kvm->arch.vapics_in_nmi_mode > 0) { + rcu_read_lock(); + kvm_for_each_vcpu(it, vcpu, kvm) { kvm_apic_nmi_wd_deliver(vcpu); + } + rcu_read_unlock(); + } } } diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index cac4746..2186b30 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -50,25 +50,29 @@ static void pic_unlock(struct kvm_pic *s) { bool wakeup = s->wakeup_needed; struct kvm_vcpu *vcpu, *found = NULL; - int i; + struct kvm *kvm = s->kvm; + struct kvm_iter it; s->wakeup_needed = false; spin_unlock(&s->lock); if (wakeup) { - kvm_for_each_vcpu(i, vcpu, s->kvm) { + rcu_read_lock(); + kvm_for_each_vcpu(it, vcpu, kvm) if (kvm_apic_accept_pic_intr(vcpu)) { found = vcpu; break; } - } - if (!found) + if (!found) { + rcu_read_unlock(); return; + } kvm_make_request(KVM_REQ_EVENT, found); kvm_vcpu_kick(found); + rcu_read_unlock(); } } diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f1b36cf..c16887e 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1833,11 +1833,12 @@ static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm) { - int i; + struct kvm_iter it; struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) + rcu_read_lock(); + kvm_for_each_vcpu(it, vcpu, kvm) vcpu->arch.last_pte_updated = NULL; + rcu_read_unlock(); } static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c38efd7..a302470 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1831,10 +1831,15 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) switch (msr) { case HV_X64_MSR_VP_INDEX: { int r; + struct kvm_iter it; struct kvm_vcpu *v; - kvm_for_each_vcpu(r, v, vcpu->kvm) + struct kvm *kvm = vcpu->kvm; + rcu_read_lock(); + kvm_for_each_vcpu(it, v, kvm) { if (v == vcpu) data = r; + } + rcu_read_unlock(); break; }
case HV_X64_MSR_EOI: @@ -4966,7 +4971,8 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va struct cpufreq_freqs *freq = data; struct kvm *kvm; struct kvm_vcpu *vcpu; - int i, send_ipi = 0; + int send_ipi = 0; + struct kvm_iter it; /* * We allow guests to temporarily run on slowing clocks, @@ -5016,13 +5022,16 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va raw_spin_lock(&kvm_lock); list_for_each_entry(kvm, &vm_list, vm_list) { - kvm_for_each_vcpu(i, vcpu, kvm) { + + rcu_read_lock(); +
Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
On 12/12/11 06:27, Benjamin Herrenschmidt wrote: On Sun, 2011-12-11 at 14:25 +0200, Michael S. Tsirkin wrote: Forwarding some results by Amos, who ran multiple netperf streams in parallel, from an external box to the guest. TCP_STREAM results were noisy. This could be due to buffering done by TCP, where packet size varies even as message size is constant. TCP_RR results were consistent. In this benchmark, after switching to mandatory barriers, CPU utilization increased by up to 35% while throughput went down by up to 14%. The normalized throughput/cpu regressed consistently, between 7 and 35%. The fix applied was simply this: What machine/processor was this? pinned guest memory to numa node 1 # numactl -m 1 qemu-kvm ... pinned guest vcpu threads and vhost thread to single cpu of numa node 1 # taskset -p 0x10 8348 (vhost_net_thread) # taskset -p 0x20 8353 (vcpu 1 thread) # taskset -p 0x40 8357 (vcpu 2 thread) pinned cpu/memory of netperf client process to node 1 # numactl --cpunodebind=1 --membind=1 netperf ...
8 cores --- processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz stepping: 2 microcode : 0xc cpu MHz : 1596.000 cache size : 12288 KB physical id : 1 siblings: 4 core id : 10 cpu cores : 4 apicid : 52 initial apicid : 52 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid bogomips: 4787.76 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: # cat /proc/meminfo MemTotal: 16446616 kB MemFree:15874092 kB Buffers: 30404 kB Cached: 238640 kB SwapCached:0 kB Active: 100204 kB Inactive: 184312 kB Active(anon): 15724 kB Inactive(anon):4 kB Active(file): 84480 kB Inactive(file): 184308 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 8388604 kB SwapFree:8388604 kB Dirty:56 kB Writeback: 0 kB AnonPages: 15548 kB Mapped:11540 kB Shmem: 256 kB Slab: 82444 kB SReclaimable: 19220 kB SUnreclaim:63224 kB KernelStack:1224 kB PageTables: 2256 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:16611912 kB Committed_AS: 209068 kB VmallocTotal: 34359738367 kB VmallocUsed: 224244 kB VmallocChunk: 34351073668 kB HardwareCorrupted: 0 kB AnonHugePages: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k:9876 kB DirectMap2M: 2070528 kB DirectMap1G:14680064 kB # numactl --hardware available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 node 0 size: 8175 MB node 0 free: 7706 MB node 1 cpus: 4 5 6 7 node 1 size: 8192 MB node 1 free: 7796 MB node distances: node 0 1 0: 10 20 1: 20 10 # numactl --show policy: default 
preferred node: current physcpubind: 0 1 2 3 4 5 6 7 cpubind: 0 1 nodebind: 0 1 membind: 0 1 Cheers, Ben. diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 3198f2e..fdccb77 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -23,7 +23,7 @@ /* virtio guest is communicating with a virtual device that actually runs on * a host processor. Memory barriers are used to control SMP effects. */ -#ifdef CONFIG_SMP +#if 0 /* Where possible, use SMP barriers which are more lightweight than mandatory * barriers, because mandatory barriers control MMIO effects on accesses * through relaxed memory I/O windows (which virtio does not use). */
Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion
On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote: From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Make it clear that this is not related to virtual memory. 'vm' means 'virtual machine'... Remove the vm_ prefix from the corresponding member of struct kvm to avoid kvm->vm_ redundancy alongside. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion
(2011/12/12 12:16), Xiao Guangrong wrote: On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote: From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Make it clear that this is not related to virtual memory. 'vm' means 'virtual machine'... Of course I know. That is why I wrote "not related to virtual memory". What's your point? Takuya
Re: New Guest OS Creation Problem
Hi All, I have tried to install CentOS on a guest, and it has the same problem: it cannot read the HDD. I formatted the image as qcow2 for the Linux guest and raw for the FreeBSD one. -- Paul Ooi On Dec 12, 2011, at 10:13 AM, takizo wrote: [...]
Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion
On 12/12/2011 12:04 PM, Takuya Yoshikawa wrote: (2011/12/12 12:16), Xiao Guangrong wrote: On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote: From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Make it clear that this is not related to virtual memory. 'vm' means 'virtual machine'... Of course I know. So I wrote "not related to virtual memory". What's your point? In the code, we have kvm_create_vm()/kvm_destroy_vm(), and then add/delete the 'vm' to/from the vm_list; it is really clear, so I think this name is OK. :)
buildbot failure in kvm on next-s390
The Buildbot has detected a new failure on builder next-s390 while building kvm. Full details are available at: http://buildbot.b1-systems.de/kvm/builders/next-s390/builds/380 Buildbot URL: http://buildbot.b1-systems.de/kvm/ Buildslave for this Build: b1_kvm_1 Build Reason: The Nightly scheduler named 'nightly_next' triggered this build Build Source Stamp: [branch next] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs
On 09/12/11 18:39, Sasha Levin wrote: On Fri, 2011-12-09 at 17:55 +1100, Matt Evans wrote: Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest memory (down in kvm__arch_init()). For x86, guest memory is a normal ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap. Signed-off-by: Matt Evans m...@ozlabs.org --- tools/kvm/builtin-run.c |4 ++- tools/kvm/include/kvm/kvm.h |4 +- tools/kvm/include/kvm/util.h |4 +++ tools/kvm/kvm.c |4 +- tools/kvm/util.c | 45 ++ tools/kvm/x86/kvm.c | 20 +++-- 6 files changed, 73 insertions(+), 8 deletions(-) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 7969901..0acfe81 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -82,6 +82,7 @@ static const char *guest_mac; static const char *host_mac; static const char *script; static const char *guest_name; +static const char *hugetlbfs_path; static struct virtio_net_params *net_params; static bool single_step; static bool readonly_image[MAX_DISK_IMAGES]; @@ -422,6 +423,7 @@ static const struct option options[] = { OPT_CALLBACK('\0', "tty", NULL, "tty id", "Remap guest TTY into a pty on the host", tty_parser), +OPT_STRING('\0', "hugetlbfs", &hugetlbfs_path, "path", "Hugetlbfs path"), OPT_GROUP("Kernel options:"), OPT_STRING('k', "kernel", &kernel_filename, "kernel", @@ -807,7 +809,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) guest_name = default_name; } -kvm = kvm__init(dev, ram_size, guest_name); +kvm = kvm__init(dev, hugetlbfs_path, ram_size, guest_name); kvm->single_step = single_step; diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h index 5fe6e75..7159952 100644 --- a/tools/kvm/include/kvm/kvm.h +++ b/tools/kvm/include/kvm/kvm.h @@ -30,7 +30,7 @@ struct kvm_ext { void kvm__set_dir(const char *fmt, ...); const char *kvm__get_dir(void); -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name); +struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64
ram_size, const char *name); int kvm__recommended_cpus(struct kvm *kvm); int kvm__max_cpus(struct kvm *kvm); void kvm__init_ram(struct kvm *kvm); @@ -54,7 +54,7 @@ int kvm__enumerate_instances(int (*callback)(const char *name, int pid)); void kvm__remove_socket(const char *name); void kvm__arch_set_cmdline(char *cmdline, bool video); -void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name); +void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name); void kvm__arch_setup_firmware(struct kvm *kvm); bool kvm__arch_cpu_supports_vm(void); void kvm__arch_periodic_poll(struct kvm *kvm); diff --git a/tools/kvm/include/kvm/util.h b/tools/kvm/include/kvm/util.h index dc2e0b9..1f6fbbd 100644 --- a/tools/kvm/include/kvm/util.h +++ b/tools/kvm/include/kvm/util.h @@ -20,6 +20,7 @@ #include <limits.h> #include <sys/param.h> #include <sys/types.h> +#include <linux/types.h> #ifdef __GNUC__ #define NORETURN __attribute__((__noreturn__)) @@ -75,4 +76,7 @@ static inline void msleep(unsigned int msecs) { usleep(MSECS_TO_USECS(msecs)); } + +void *mmap_hugetlbfs(const char *htlbfs_path, u64 size); + #endif /* KVM__UTIL_H */ diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c index c54f886..35ca2c5 100644 --- a/tools/kvm/kvm.c +++ b/tools/kvm/kvm.c @@ -306,7 +306,7 @@ int kvm__max_cpus(struct kvm *kvm) return ret; } -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name) +struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name) { struct kvm *kvm; int ret; @@ -339,7 +339,7 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name) if (kvm__check_extensions(kvm)) die("A required KVM extention is not supported by OS"); -kvm__arch_init(kvm, kvm_dev, ram_size, name); +kvm__arch_init(kvm, kvm_dev, hugetlbfs_path, ram_size, name); kvm->name = name; diff --git a/tools/kvm/util.c b/tools/kvm/util.c index 4efbce9..90b6a3b 100644 ---
a/tools/kvm/util.c +++ b/tools/kvm/util.c @@ -4,6 +4,11 @@ #include "kvm/util.h" +#include <linux/magic.h>	/* For HUGETLBFS_MAGIC */ +#include <sys/mman.h> +#include <sys/stat.h> +#include <sys/statfs.h> + static void report(const char *prefix, const char *err, va_list params) { char msg[1024]; @@ -99,3 +104,43 @@ size_t strlcat(char *dest, const char *src, size_t count) return res; } + +void *mmap_hugetlbfs(const char *htlbfs_path, u64 size) +{ +char mpath[PATH_MAX]; +int fd; +int r; +struct statfs sfs; +
Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
On Mon, 12 Dec 2011 11:06:53 +0800, Amos Kong ak...@redhat.com wrote: On 12/12/11 06:27, Benjamin Herrenschmidt wrote: On Sun, 2011-12-11 at 14:25 +0200, Michael S. Tsirkin wrote: Forwarding some results by Amos, who ran multiple netperf streams in parallel, from an external box to the guest. TCP_STREAM results were noisy. This could be due to buffering done by TCP, where packet size varies even as message size is constant. TCP_RR results were consistent. In this benchmark, after switching to mandatory barriers, CPU utilization increased by up to 35% while throughput went down by up to 14%. The normalized throughput/cpu regressed consistently, between 7 and 35%. The fix applied was simply this: What machine/processor was this? pinned guest memory to numa node 1 Please try this patch. How much does the branch cost us? (Compiles, untested). Thanks, Rusty. From: Rusty Russell ru...@rustcorp.com.au Subject: virtio: harsher barriers for virtio-mmio. We were cheating with our barriers; using the smp ones rather than the real device ones. That was fine, until virtio-mmio came along, which could be talking to a real device (a non-SMP CPU). Unfortunately, just putting back the real barriers (reverting d57ed95d) causes a performance regression on virtio-pci. In particular, Amos reports netbench's TCP_RR over virtio_net CPU utilization increased up to 35% while throughput went down by up to 14%. By comparison, this branch costs us???
Reference: https://lkml.org/lkml/2011/12/11/22 Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/lguest/lguest_device.c | 10 ++ drivers/s390/kvm/kvm_virtio.c |2 +- drivers/virtio/virtio_mmio.c |7 --- drivers/virtio/virtio_pci.c|4 ++-- drivers/virtio/virtio_ring.c | 34 +- include/linux/virtio_ring.h|1 + tools/virtio/linux/virtio.h|1 + tools/virtio/virtio_test.c |3 ++- 8 files changed, 38 insertions(+), 24 deletions(-) diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c --- a/drivers/lguest/lguest_device.c +++ b/drivers/lguest/lguest_device.c @@ -291,11 +291,13 @@ static struct virtqueue *lg_find_vq(stru } /* -* OK, tell virtio_ring.c to set up a virtqueue now we know its size -* and we've got a pointer to its pages. +* OK, tell virtio_ring.c to set up a virtqueue now we know its size +* and we've got a pointer to its pages. Note that we set weak_barriers +* to 'true': the host is just a(nother) SMP CPU, so we only need inter-cpu +* barriers. */ - vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN, -vdev, lvq->pages, lg_notify, callback, name); + vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN, vdev, +true, lvq->pages, lg_notify, callback, name); if (!vq) { err = -ENOMEM; goto unmap; diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c --- a/drivers/s390/kvm/kvm_virtio.c +++ b/drivers/s390/kvm/kvm_virtio.c @@ -198,7 +198,7 @@ static struct virtqueue *kvm_find_vq(str goto out; vq = vring_new_virtqueue(config->num, KVM_S390_VIRTIO_RING_ALIGN, -vdev, (void *) config->address, +vdev, true, (void *) config->address, kvm_notify, callback, name); if (!vq) { err = -ENOMEM; diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -309,9 +309,10 @@ static struct virtqueue *vm_setup_vq(str writel(virt_to_phys(info->queue) >> PAGE_SHIFT, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN); - /* Create the vring */ - vq =
vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN, -vdev, info->queue, vm_notify, callback, name); + /* Create the vring: no weak barriers, the other side could +* be an independent device. */ + vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN, vdev, +false, info->queue, vm_notify, callback, name); if (!vq) { err = -ENOMEM; goto error_new_virtqueue; diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -414,8 +414,8 @@ static struct virtqueue *setup_vq(struct vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); /* create the vring */ - vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN, -vdev, info->queue, vp_notify, callback, name); + vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN, vdev, +
Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs
On 09/12/11 19:42, Pekka Enberg wrote: On Fri, Dec 9, 2011 at 8:55 AM, Matt Evans m...@ozlabs.org wrote: Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest memory (down in kvm__arch_init()). For x86, guest memory is a normal ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap. Signed-off-by: Matt Evans m...@ozlabs.org Btw, why don't you want to use MADV_HUGEPAGE for this? You could just do it unconditionally, no? Well, I'm manually mapping from hugetlbfs as currently* PPC KVM requires hugepages to back guest RAM, and MADV_HUGEPAGE is just a hint, no? I also wanted things to work on kernels without transparent hugepages enabled. I think it's safer to do things explicitly: if the user requests hugepages, it's more transparent (I'm thinking benchmarking, etc.) to be definitely using hugepages. Cheers, Matt *: I know Paul's posted patches to implement smallpage support... so this will change in time.
buildbot failure in kvm on next-x86_64
The Buildbot has detected a new failure on builder next-x86_64 while building kvm. Full details are available at: http://buildbot.b1-systems.de/kvm/builders/next-x86_64/builds/378 Buildbot URL: http://buildbot.b1-systems.de/kvm/ Buildslave for this Build: b1_kvm_1 Build Reason: The Nightly scheduler named 'nightly_next' triggered this build Build Source Stamp: [branch next] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in kvm on next-i386
The Buildbot has detected a new failure on builder next-i386 while building kvm. Full details are available at: http://buildbot.b1-systems.de/kvm/builders/next-i386/builds/378 Buildbot URL: http://buildbot.b1-systems.de/kvm/ Buildslave for this Build: b1_kvm_1 Build Reason: The Nightly scheduler named 'nightly_next' triggered this build Build Source Stamp: [branch next] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately
On Mon, 2011-12-12 at 12:03 +1100, Matt Evans wrote: On 09/12/11 19:29, Pekka Enberg wrote: On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin levinsasha...@gmail.com wrote: If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM tools tree as well. Yup, all we need is ACKs from PPC maintainers. Cool, I've pinged them re. __SANE_USERSPACE_TYPES__, and that would be useful to carry in your tree. But, IMHO, the patch I sent to Alex ([PATCH] KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS) should go in via his tree. The KVM/PPC-HV kernel folk will get it quicker in their trees (pulled from Alex) and it's not a build problem, only a limit of SMP CPU numbers. That is, if you're building a kernel for PPC KVM today you'll probably use something more similar to Alex's tree than mainline/kvm tools tree. Definitely. The __SANE_USERSPACE_TYPES__ patch should probably go to powerpc git tree in addition to our tree. Pekka
Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs
On Mon, Dec 12, 2011 at 7:17 AM, Matt Evans m...@ozlabs.org wrote: Well, I'm manually mapping from hugetlbfs as currently* PPC KVM requires hugepages to back guest RAM and MADV_HUGEPAGE is just a hint, no? I also wanted things to work on kernels without transparent hugepages enabled. I think it's safer to do things explicitly, as if the user requests hugepages it's more transparent (I'm thinking benchmarking, etc.) to be definitely using hugepages. OK, makes sense. You should probably mention that in the changelog.
Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs
On 09/12/11 19:38, Pekka Enberg wrote:

On Fri, Dec 9, 2011 at 8:55 AM, Matt Evans m...@ozlabs.org wrote:

Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest memory (down in kvm__arch_init()). For x86, guest memory is a normal ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.

Signed-off-by: Matt Evans m...@ozlabs.org

+void *mmap_hugetlbfs(const char *htlbfs_path, u64 size)
+{
+	char mpath[PATH_MAX];
+	int fd;
+	int r;
+	struct statfs sfs;
+	void *addr;
+
+	do {
+		/*
+		 * QEMU seems to work around this returning EINTR... Let's do
+		 * that too.
+		 */
+		r = statfs(htlbfs_path, &sfs);
+	} while (r && errno == EINTR);

Can this really happen? What about EAGAIN? The retry logic really wants to live in tools/kvm/read-write.c as an xstatfs() wrapper if we do need this.

I don't think it can. As per the comment, I thought QEMU knew something I didn't but I haven't seen any other reason for doing this. I'll remove it, thanks for the sanity jolt.

Matt
Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace
On 11.12.2011, at 20:48, Peter Maydell peter.mayd...@linaro.org wrote: On 11 December 2011 19:30, Christoffer Dall c.d...@virtualopensystems.com wrote: On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell peter.mayd...@linaro.org wrote: Removing the mask would be wrong since the irq field here is encoding both cpu number and irq-vs-fiq. The default is just an unreachable condition. (Why are we using % here rather than the obvious bit operation, incidentally?) Right, I will remove the default case. I highly doubt that the difference in using a bitop will be measurably more efficient, but if you feel strongly about it, I can change it to a shift and bitwise and, which I assume is what you mean by the obvious bit operation? I think my CS background speaks for using %, but whatever. Certainly the compiler ought to be able to figure out the two are the same thing; I just think irq & 1 is more readable than irq % 2 (because it's clear that it's treating the variable as a pile of bits rather than an integer). This is bikeshedding rather, though, and style issues in kernel code are a matter for the kernel folk. So you can ignore me :-) Yes, the general rule of thumb is to use bit operations where you can. And in this case it certainly makes sense :). Plus, bit operations are an order of magnitude faster than div/mod usually. Alex
Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion
(2011/12/12 13:51), Xiao Guangrong wrote: On 12/12/2011 12:04 PM, Takuya Yoshikawa wrote: (2011/12/12 12:16), Xiao Guangrong wrote: On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote: From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Make it clear that this is not related to virtual memory. 'vm' means 'virtual machine'... Of course I know. So I wrote "not related to virtual memory". What's your point? In the code, we have kvm_create_vm()/kvm_destroy_vm(), then add/delete the 'vm' to/from the vm_list, it is really clear, so i think this name is OK. :)

Some reasons I wanted to change this:
- The lock which protects this list is called kvm_lock, not vm_lock
- Some architectures are using vm_list for a vm region member
- The list connects kvm instances (struct kvm) and we are doing list_for_each_entry(kvm, &vm_list, vm_list), not list_for_each_entry(vm, &vm_list, vm_list)

In the case of kvm_create_vm(), it creates not only a kvm instance but also does more virtual machine initialization generally. So _vm is reasonable. (I do not mind if it is static in kvm_main.c but it is more widely used.) But I do not mind dropping this patch if other people also want to keep the name. So I will wait for some more comments. Thanks, Takuya
Re: Current kernel fails to compile with KVM on PowerPC
On 11.12.2011, at 16:16, Jörg Sommer wrote:

Alexander Graf wrote on Tue 22. Nov, 22:29 (+0100):

On 22.11.2011, at 21:04, Jörg Sommer wrote:

Jörg Sommer wrote on Mon 07. Nov, 20:48 (+0100):

I'm trying to build the kernel with the git commit-id 31555213f03bca37d2c02e10946296052f4ecfcd, but it fails:

  CHK     include/linux/version.h
  HOSTCC  scripts/mod/modpost.o
  CHK     include/generated/utsrelease.h
  UPD     include/generated/utsrelease.h
  HOSTLD  scripts/mod/modpost
  GEN     include/generated/bounds.h
  CC      arch/powerpc/kernel/asm-offsets.s
In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function ‘compute_tlbie_rb’:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: each undeclared identifier is reported only once for each function it appears in
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: ‘HPTE_V_LARGE’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: warning: right shift count >= width of type [enabled by default]
make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Fehler 1
make[2]: *** [prepare0] Fehler 2
make[1]: *** [deb-pkg] Fehler 2
make: *** [deb-pkg] Fehler 2

I'm still having this problem. I can't build 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches to make the kernel build and not oops [1] on PowerPC?

The failures above should be fixed by now.

I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git (a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain a suitable commit. Where can I find it?

Please try:

git://github.com/agraf/linux-2.6.git kvm-ppc-next

That's my WIP tree.
I still have a few more patches I want to collect before shoving everything through automated testing and pushing it on to Avi. Alex
Re: Current kernel fails to compile with KVM on PowerPC
Alexander Graf wrote on Tue 22. Nov, 22:29 (+0100):

On 22.11.2011, at 21:04, Jörg Sommer wrote:

Jörg Sommer wrote on Mon 07. Nov, 20:48 (+0100):

I'm trying to build the kernel with the git commit-id 31555213f03bca37d2c02e10946296052f4ecfcd, but it fails:

  CHK     include/linux/version.h
  HOSTCC  scripts/mod/modpost.o
  CHK     include/generated/utsrelease.h
  UPD     include/generated/utsrelease.h
  HOSTLD  scripts/mod/modpost
  GEN     include/generated/bounds.h
  CC      arch/powerpc/kernel/asm-offsets.s
In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function ‘compute_tlbie_rb’:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: each undeclared identifier is reported only once for each function it appears in
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: ‘HPTE_V_LARGE’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: warning: right shift count >= width of type [enabled by default]
make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Fehler 1
make[2]: *** [prepare0] Fehler 2
make[1]: *** [deb-pkg] Fehler 2
make: *** [deb-pkg] Fehler 2

I'm still having this problem. I can't build 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches to make the kernel build and not oops [1] on PowerPC?

The failures above should be fixed by now.

I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git (a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain a suitable commit. Where can I find it?

Bye, Jörg.
--
I hardly know my way around OpenBSD; what are the advantages over Linux and iptables?
The foxtail effect is bigger. :-)
Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately
On 09/12/11 19:29, Pekka Enberg wrote: On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin levinsasha...@gmail.com wrote: If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM tools tree as well. Yup, all we need is ACKs from PPC maintainers. Cool, I've pinged them re. __SANE_USERSPACE_TYPES__, and that would be useful to carry in your tree. But, IMHO, the patch I sent to Alex ([PATCH] KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS) should go in via his tree. The KVM/PPC-HV kernel folk will get it quicker in their trees (pulled from Alex) and it's not a build problem, only a limit of SMP CPU numbers. That is, if you're building a kernel for PPC KVM today you'll probably use something more similar to Alex's tree than mainline/kvm tools tree. Cheers, Matt -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs
On 09/12/11 18:39, Sasha Levin wrote:

On Fri, 2011-12-09 at 17:55 +1100, Matt Evans wrote:

Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest memory (down in kvm__arch_init()). For x86, guest memory is a normal ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c      |    4 ++-
 tools/kvm/include/kvm/kvm.h  |    4 +-
 tools/kvm/include/kvm/util.h |    4 +++
 tools/kvm/kvm.c              |    4 +-
 tools/kvm/util.c             |   45 ++
 tools/kvm/x86/kvm.c          |   20 +++--
 6 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 7969901..0acfe81 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -82,6 +82,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *hugetlbfs_path;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -422,6 +423,7 @@ static const struct option options[] = {
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
 		     "Remap guest TTY into a pty on the host", tty_parser),
+	OPT_STRING('\0', "hugetlbfs", &hugetlbfs_path, "path", "Hugetlbfs path"),
 	OPT_GROUP("Kernel options:"),
 	OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -807,7 +809,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		guest_name = default_name;
 	}
-	kvm = kvm__init(dev, ram_size, guest_name);
+	kvm = kvm__init(dev, hugetlbfs_path, ram_size, guest_name);
 	kvm->single_step = single_step;

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 5fe6e75..7159952 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -30,7 +30,7 @@ struct kvm_ext {
 void kvm__set_dir(const char *fmt, ...);
 const char *kvm__get_dir(void);
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name);
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name);
 int kvm__recommended_cpus(struct kvm *kvm);
 int kvm__max_cpus(struct kvm *kvm);
 void kvm__init_ram(struct kvm *kvm);
@@ -54,7 +54,7 @@ int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 void kvm__arch_set_cmdline(char *cmdline, bool video);
-void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
+void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);

diff --git a/tools/kvm/include/kvm/util.h b/tools/kvm/include/kvm/util.h
index dc2e0b9..1f6fbbd 100644
--- a/tools/kvm/include/kvm/util.h
+++ b/tools/kvm/include/kvm/util.h
@@ -20,6 +20,7 @@
 #include <limits.h>
 #include <sys/param.h>
 #include <sys/types.h>
+#include <linux/types.h>
 #ifdef __GNUC__
 #define NORETURN __attribute__((__noreturn__))
@@ -75,4 +76,7 @@ static inline void msleep(unsigned int msecs)
 {
 	usleep(MSECS_TO_USECS(msecs));
 }
+
+void *mmap_hugetlbfs(const char *htlbfs_path, u64 size);
+
 #endif /* KVM__UTIL_H */

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index c54f886..35ca2c5 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -306,7 +306,7 @@ int kvm__max_cpus(struct kvm *kvm)
 	return ret;
 }
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name)
 {
 	struct kvm *kvm;
 	int ret;
@@ -339,7 +339,7 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 	if (kvm__check_extensions(kvm))
 		die("A required KVM extention is not supported by OS");
-	kvm__arch_init(kvm, kvm_dev, ram_size, name);
+	kvm__arch_init(kvm, kvm_dev, hugetlbfs_path, ram_size, name);
 	kvm->name = name;

diff --git a/tools/kvm/util.c b/tools/kvm/util.c
index 4efbce9..90b6a3b 100644
--- a/tools/kvm/util.c
+++ b/tools/kvm/util.c
@@ -4,6 +4,11 @@
 #include "kvm/util.h"
+#include <linux/magic.h>	/* For HUGETLBFS_MAGIC */
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/statfs.h>
+
 static void report(const char *prefix, const char *err, va_list params)
 {
 	char msg[1024];
@@ -99,3 +104,43 @@ size_t strlcat(char *dest, const char *src, size_t count)
 	return res;
 }
+
+void *mmap_hugetlbfs(const char *htlbfs_path, u64 size)
+{
+	char mpath[PATH_MAX];
+	int fd;
+	int r;
+	struct statfs sfs;
Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately
On Mon, 2011-12-12 at 12:03 +1100, Matt Evans wrote: On 09/12/11 19:29, Pekka Enberg wrote: On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin levinsasha...@gmail.com wrote: If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM tools tree as well. Yup, all we need is ACKs from PPC maintainers. Cool, I've pinged them re. __SANE_USERSPACE_TYPES__, and that would be useful to carry in your tree. But, IMHO, the patch I sent to Alex ([PATCH] KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS) should go in via his tree. The KVM/PPC-HV kernel folk will get it quicker in their trees (pulled from Alex) and it's not a build problem, only a limit of SMP CPU numbers. That is, if you're building a kernel for PPC KVM today you'll probably use something more similar to Alex's tree than mainline/kvm tools tree. Definitely. The __SANE_USERSPACE_TYPES__ patch should probably go to powerpc git tree in addition to our tree. Pekka
Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs
On Mon, Dec 12, 2011 at 7:17 AM, Matt Evans m...@ozlabs.org wrote: Well, I'm manually mapping from hugetlbfs as currently* PPC KVM requires hugepages to back guest RAM and MADV_HUGEPAGE is just a hint, no? I also wanted things to work on kernels without transparent hugepages enabled. I think it's safer to do things explicitly, as if the user requests hugepages it's more transparent (I'm thinking benchmarking, etc.) to be definitely using hugepages. OK, makes sense. You should probably mention that in the changelog.
Re: Current kernel fails to compile with KVM on PowerPC
On 11.12.2011, at 16:16, Jörg Sommer wrote:

Alexander Graf wrote on Tue 22. Nov, 22:29 (+0100):

On 22.11.2011, at 21:04, Jörg Sommer wrote:

Jörg Sommer wrote on Mon 07. Nov, 20:48 (+0100):

I'm trying to build the kernel with the git commit-id 31555213f03bca37d2c02e10946296052f4ecfcd, but it fails:

  CHK     include/linux/version.h
  HOSTCC  scripts/mod/modpost.o
  CHK     include/generated/utsrelease.h
  UPD     include/generated/utsrelease.h
  HOSTLD  scripts/mod/modpost
  GEN     include/generated/bounds.h
  CC      arch/powerpc/kernel/asm-offsets.s
In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function ‘compute_tlbie_rb’:
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: each undeclared identifier is reported only once for each function it appears in
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: ‘HPTE_V_LARGE’ undeclared (first use in this function)
/home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: warning: right shift count >= width of type [enabled by default]
make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Fehler 1
make[2]: *** [prepare0] Fehler 2
make[1]: *** [deb-pkg] Fehler 2
make: *** [deb-pkg] Fehler 2

I'm still having this problem. I can't build 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches to make the kernel build and not oops [1] on PowerPC?

The failures above should be fixed by now.

I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git (a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain a suitable commit. Where can I find it?

Please try:

git://github.com/agraf/linux-2.6.git kvm-ppc-next

That's my WIP tree.
I still have a few more patches I want to collect before shoving everything through automated testing and pushing it on to Avi. Alex