[PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest
SPRG3 is guest accessible and SPRG3 can be clobbered by host or another guest, So this need to be restored when loading guest state. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke_interrupts.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S index 2c6deb5ef..0d3403f 100644 --- a/arch/powerpc/kvm/booke_interrupts.S +++ b/arch/powerpc/kvm/booke_interrupts.S @@ -459,6 +459,8 @@ lightweight_exit: * written directly to the shared area, so we * need to reload them here with the guest's values. */ + PPC_LD(r3, VCPU_SHARED_SPRG3, r5) + mtspr SPRN_SPRG3, r3 PPC_LD(r3, VCPU_SHARED_SPRG4, r5) mtspr SPRN_SPRG4W, r3 PPC_LD(r3, VCPU_SHARED_SPRG5, r5) -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote: virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. Why can't you use RDRAND instruction for that? This is a KVM change: am I supposed to write a unit test somewhere? Andy Lutomirski (4): x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit random,x86: Add arch_get_slow_rng_u64 random: Seed pools from arch_get_slow_rng_u64 at startup x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available Documentation/virtual/kvm/cpuid.txt | 3 +++ arch/x86/Kconfig | 4 arch/x86/boot/compressed/aslr.c | 27 +++ arch/x86/include/asm/archslowrng.h | 30 ++ arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kernel/kvm.c| 22 ++ arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 4 drivers/char/random.c| 14 +- include/linux/random.h | 9 + 10 files changed, 116 insertions(+), 2 deletions(-) create mode 100644 arch/x86/include/asm/archslowrng.h -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 08:41 AM, Gleb Natapov wrote: On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote: virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. Why can't you use RDRAND instruction for that? You mean using it directly? I think simply for the very same reasons as in c2557a303a ... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 09:10:27AM +0200, Daniel Borkmann wrote: On 07/16/2014 08:41 AM, Gleb Natapov wrote: On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote: virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. Why can't you use RDRAND instruction for that? You mean using it directly? I think simply for the very same reasons as in c2557a303a ... So you trust your hypervisor vendor more than you trust your CPU vendor? :) -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
Il 16/07/2014 09:10, Daniel Borkmann ha scritto: On 07/16/2014 08:41 AM, Gleb Natapov wrote: On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote: virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. Why can't you use RDRAND instruction for that? You mean using it directly? I think simply for the very same reasons as in c2557a303a ... No, this is very different. This mechanism provides no guarantee that the result contains any actual entropy. In fact, patch 3 adds a call to the new arch_get_slow_rng_u64 just below a call to arch_get_random_lang aka RDRAND. I agree with Gleb that it's simpler to just expect a relatively recent processor and use RDRAND. BTW, the logic for crediting entropy to RDSEED but not RDRAND escapes me. If you trust the processor, you could use Intel's algorithm to force reseeding of RDRAND. If you don't trust the processor, the same paranoia applies to RDRAND and RDSEED. In a guest you must trust the hypervisor anyway to use RDRAND or RDSEED, since the hypervisor can trap it. A malicious hypervisor is no different from a malicious processor. In any case, is there a matching QEMU patch somewhere? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 73331] Nested Virtualization, L2 cannot boot up on Ivybridge and Haswell
https://bugzilla.kernel.org/show_bug.cgi?id=73331 Zhou, Chao chao.z...@intel.com changed: What|Removed |Added CC||chao.z...@intel.com --- Comment #3 from Zhou, Chao chao.z...@intel.com --- kvm.git + qemu.git:9f6226a7_0e162974 kernel version: 3.16.0-rc1 test on Ivytown_EP after create L2 guest, L2 guest can boot up. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 73331] Nested Virtualization, L2 cannot boot up on Ivybridge and Haswell
https://bugzilla.kernel.org/show_bug.cgi?id=73331 Paolo Bonzini bonz...@gnu.org changed: What|Removed |Added Status|NEW |RESOLVED CC||bonz...@gnu.org Resolution|--- |CODE_FIX -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 73331] Nested Virtualization, L2 cannot boot up on Ivybridge and Haswell
https://bugzilla.kernel.org/show_bug.cgi?id=73331 robert...@intel.com changed: What|Removed |Added Status|RESOLVED|VERIFIED -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Test report, kernel 9f6226a7... qemu 0e162974...
Hi All, This is KVM upstream test result against kvm.git next branch and qemu.git master branch. kvm.git next branch: 9f6226a762c7ae02f6a23a3d4fc552dafa57ea23 based on kernel 3.16.0-rc1 qemu.git master branch: 0e16297461264b3ea8f7282d1195cf53aa8a707c We found no new bug and two fix bugs in the past two months. New issue (0): Fixed issue (1): 1. [Nested kvm on kvm] L2 guest reboot continuously when create a rhel6u5(64bit) as L2 guest. https://bugzilla.kernel.org/show_bug.cgi?id=75981 --Jan Kiszka fixed the bug. 2. Nested Virtualization,L2 cannot boot up on Ivybridge and Haswell. https://bugzilla.kernel.org/show_bug.cgi?id=73331 Old issues (6): -- 1. guest panic with parameter -cpu host in qemu command line (about vPMU issue). https://bugs.launchpad.net/qemu/+bug/994378 2. Guest hang when doing kernel build and writing data in guest. https://bugs.launchpad.net/qemu/+bug/1096814 3. with 'monitor pty', it needs to flush pts device after sending command to it https://bugs.launchpad.net/qemu/+bug/1185228 4. [Nested] Windows XP Mode can not work https://bugzilla.kernel.org/show_bug.cgi?id=60782 5. [Nested]L2 guest failed to start in VMware on KVM https://bugzilla.kernel.org/show_bug.cgi?id=61411 6. [Nested]L1 call trace when create windows 7 guest as L2 guest https://bugzilla.kernel.org/show_bug.cgi?id=72381 Test environment: == Platform IvyBridge-EP Sandybridge-EP Haswell-EP CPU Cores 32 32 56 Memory size 64GB 32GB 32GB Best Regards, Robert Ho -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7 00/14] kvm-unit-tests/arm: initial drop
This is v7 of a series that introduces arm to kvm-unit-tests. Three of the v6 patches were already merged, so this is the remainder. No new patches have been added, but some of the summaries have changed. This series first adds support for device trees (libfdt), and for chr-testdev (virtio). Next, it adds the basic infrastructure for booting a test case (guest), and adds a first test case, a self-test to confirm setup was completed successfully. Finally, it further prepares the framework for more complicated tests by adding vector support, and extends the self-test to test that too. This initial drop doesn't require kvmarm. qemu-system-arm is enough, but qemu must have mach-virt, and the chr-testdev patch[1]. These patches (v7) are also available from a git repo here https://github.com/rhdrjones/kvm-unit-tests/commits/arm/v7-initial-drop The v6 patches are also available from a git repo here https://github.com/rhdrjones/kvm-unit-tests/commits/arm/v6-initial-drop%2Cchr-testdev and, the v5 patches are still available here https://github.com/rhdrjones/kvm-unit-tests/commits/arm/v5-initial-drop The main changes since v6 are the moving/redesigning/reAPIing of arm's memregions to common code as phys_alloc, and the splitting of virtio and virtio-mmio code into separate files. The main change since v5 (as stated in the v6 cover letter) is the switch from virtio-testdev to chr-testdev. Also, as stated in the v6 cover letter, I've kept Christoffer's *-by's, and mainly the already Reviewed-by patches 05/14 add minimal virtio support for devtree virtio-mmio 09/14 arm: initial drop should get a second look (or interdiffed). Thanks in advance for reviews! [1] http://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg01960.html Andrew Jones (12): libfdt: Import libfdt source add support for Linux device trees Introduce asm-generic/*.h files Introduce lib/alloc add minimal virtio support for devtree virtio-mmio lib/asm-generic: add page.h and virt_to_phys/phys_to_virt virtio: add minimal support for virtqueues Introduce chr-testdev arm: initial drop arm: Add arch-specific asm/page.h and __va/__pa arm: add useful headers from the Linux kernel arm: vectors support Christoffer Dall (2): arm: Add spinlock implementation arm: Add IO accessors to avoid register-writeback .gitignore |1 + Makefile | 25 +- arm/cstart.S | 209 ++ arm/flat.lds | 23 + arm/run | 46 ++ arm/selftest.c | 210 ++ arm/unittests.cfg| 30 + config/asm-offsets.mak | 41 ++ config/config-arm.mak| 81 +++ configure| 23 +- lib/alloc.c | 176 + lib/alloc.h | 123 lib/argv.c |9 + lib/arm/.gitignore |1 + lib/arm/asm-offsets.c| 39 ++ lib/arm/asm/asm-offsets.h|1 + lib/arm/asm/barrier.h| 18 + lib/arm/asm/cp15.h | 37 ++ lib/arm/asm/io.h | 94 +++ lib/arm/asm/page.h | 33 + lib/arm/asm/processor.h | 39 ++ lib/arm/asm/ptrace.h | 100 +++ lib/arm/asm/setup.h | 27 + lib/arm/asm/spinlock.h | 11 + lib/arm/eabi_compat.c| 20 + lib/arm/io.c | 65 ++ lib/arm/processor.c | 111 lib/arm/setup.c | 82 +++ lib/arm/spinlock.c | 28 + lib/asm-generic/io.h | 175 + lib/asm-generic/page.h | 28 + lib/asm-generic/spinlock.h |4 + lib/chr-testdev.c| 72 ++ lib/chr-testdev.h| 14 + lib/devicetree.c | 272 lib/devicetree.h | 236 +++ lib/generated/.gitignore |1 + lib/libcflat.h |2 + lib/libfdt/Makefile.libfdt | 10 + lib/libfdt/README|4 + lib/libfdt/fdt.c | 250 +++ lib/libfdt/fdt.h | 111 lib/libfdt/fdt_empty_tree.c | 84 +++ lib/libfdt/fdt_ro.c | 573 lib/libfdt/fdt_rw.c | 492 ++ lib/libfdt/fdt_strerror.c| 96 +++ lib/libfdt/fdt_sw.c | 256 +++ lib/libfdt/fdt_wip.c | 118 lib/libfdt/libfdt.h | 1514 ++ lib/libfdt/libfdt_env.h | 111 lib/libfdt/libfdt_internal.h | 95 +++ lib/libfdt/version.lds | 60 ++ lib/virtio-mmio.c| 175 + lib/virtio-mmio.h| 65 ++ lib/virtio.c | 130 lib/virtio.h | 149 + 56 files changed, 6794 insertions(+), 6 deletions(-) create mode 100644 arm/cstart.S create mode 100644 arm/flat.lds create mode 100755 arm/run create mode 100644 arm/selftest.c create mode 100644 arm/unittests.cfg create mode 100644 config/asm-offsets.mak create mode 100644 config/config-arm.mak create
[PATCH v7 06/14] lib/asm-generic: add page.h and virt_to_phys/phys_to_virt
Signed-off-by: Andrew Jones drjo...@redhat.com --- lib/asm-generic/io.h | 13 + lib/asm-generic/page.h | 28 2 files changed, 41 insertions(+) create mode 100644 lib/asm-generic/page.h diff --git a/lib/asm-generic/io.h b/lib/asm-generic/io.h index f00f4d3e68fe1..a9939d3a5921f 100644 --- a/lib/asm-generic/io.h +++ b/lib/asm-generic/io.h @@ -10,6 +10,7 @@ * This work is licensed under the terms of the GNU LGPL, version 2. */ #include libcflat.h +#include asm/page.h #ifndef __raw_readb static inline u8 __raw_readb(const volatile void *addr) @@ -159,4 +160,16 @@ static inline void *ioremap(u64 phys_addr, size_t size __unused) } #endif +#ifndef virt_to_phys +static inline unsigned long virt_to_phys(volatile void *address) +{ + return __pa((unsigned long)address); +} + +static inline void *phys_to_virt(unsigned long address) +{ + return __va(address); +} +#endif + #endif /* _ASM_GENERIC_IO_H_ */ diff --git a/lib/asm-generic/page.h b/lib/asm-generic/page.h new file mode 100644 index 0..559938fcf0b3f --- /dev/null +++ b/lib/asm-generic/page.h @@ -0,0 +1,28 @@ +#ifndef _ASM_GENERIC_PAGE_H_ +#define _ASM_GENERIC_PAGE_H_ +/* + * asm-generic/page.h + * adapted from the Linux kernel's include/asm-generic/page.h + * + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ + +#define PAGE_SHIFT 12 +#ifndef __ASSEMBLY__ +#define PAGE_SIZE (1UL PAGE_SHIFT) +#else +#define PAGE_SIZE (1 PAGE_SHIFT) +#endif +#define PAGE_MASK (~(PAGE_SIZE-1)) +#define PAGE_ALIGN(addr) (((addr) + (PAGE_SIZE-1)) PAGE_MASK) + +#ifndef __ASSEMBLY__ +#define __va(x)((void *)((unsigned long) (x))) +#define __pa(x)((unsigned long) (x)) +#define virt_to_pfn(kaddr) (__pa(kaddr) PAGE_SHIFT) +#define pfn_to_virt(pfn) __va((pfn) PAGE_SHIFT) +#endif + +#endif -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7 03/14] Introduce asm-generic/*.h files
Architecture neutral code may need to call low-level io accessors, or use spinlocks. Create a generic io.h to ensure those accessors are defined, and a generic spinlock.h that complains when included, as we can't write a generic spinlock. These files can be overridden or extended by architecture specific versions placed in lib/$ARCH/asm/. Signed-off-by: Andrew Jones drjo...@redhat.com Acked-by: Christoffer Dall christoffer.d...@linaro.org --- v5: added a trivial ioremap function [Christoffer Dall] v4: introduce lib/asm symlink to get rid of #ifdef __arm__, add spinlock.h too v3: wrt to io.h (was libio.[ch]) only - get rid of CONFIG_64BIT, replace with asserts - get rid of {read,write}_len() [libio.c] - fix bad *64_to_cpu macros --- .gitignore | 1 + Makefile | 6 +- configure | 11 +++ lib/asm-generic/io.h | 162 + lib/asm-generic/spinlock.h | 4 ++ 5 files changed, 181 insertions(+), 3 deletions(-) create mode 100644 lib/asm-generic/io.h create mode 100644 lib/asm-generic/spinlock.h diff --git a/.gitignore b/.gitignore index 775d0dfd8263e..e21939a8771e9 100644 --- a/.gitignore +++ b/.gitignore @@ -9,6 +9,7 @@ patches .stgit-* cscope.* *.swp +/lib/asm /config.mak /*-run /test.log diff --git a/Makefile b/Makefile index 180189ecd6d8c..dd7e6e94bfe7b 100644 --- a/Makefile +++ b/Makefile @@ -77,11 +77,11 @@ libfdt_clean: $(LIBFDT_objdir)/.*.d distclean: clean libfdt_clean - $(RM) config.mak $(TEST_DIR)-run test.log msr.out cscope.* + $(RM) lib/asm config.mak $(TEST_DIR)-run test.log msr.out cscope.* -cscope: common_dirs = lib lib/libfdt +cscope: common_dirs = lib lib/libfdt lib/asm lib/asm-generic cscope: $(RM) ./cscope.* - find $(TEST_DIR) lib/$(TEST_DIR) $(common_dirs) -maxdepth 1 \ + find -L $(TEST_DIR) lib/$(TEST_DIR) $(common_dirs) -maxdepth 1 \ -name '*.[chsS]' -print | sed 's,^\./,,' ./cscope.files cscope -bk diff --git a/configure b/configure index dbbc6045d214a..aaa1b50ab1b98 100755 --- a/configure +++ b/configure @@ -91,6 +91,17 @@ if [ $exit -eq 0 ]; then fi rm -f lib_test.c +# link lib/asm for the architecture +rm -f lib/asm +asm=asm-generic +if [ -d lib/$arch/asm ]; then + asm=$arch/asm +elif [ -d lib/$testdir/asm ]; then + asm=$testdir/asm +fi +ln -s $asm lib/asm + +# create the config cat EOF config.mak PREFIX=$prefix KERNELDIR=$(readlink -f $kerneldir) diff --git a/lib/asm-generic/io.h b/lib/asm-generic/io.h new file mode 100644 index 0..f00f4d3e68fe1 --- /dev/null +++ b/lib/asm-generic/io.h @@ -0,0 +1,162 @@ +#ifndef _ASM_GENERIC_IO_H_ +#define _ASM_GENERIC_IO_H_ +/* + * asm-generic/io.h + * adapted from the Linux kernel's include/asm-generic/io.h + * and arch/arm/include/asm/io.h + * + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include libcflat.h + +#ifndef __raw_readb +static inline u8 __raw_readb(const volatile void *addr) +{ + return *(const volatile u8 *)addr; +} +#endif + +#ifndef __raw_readw +static inline u16 __raw_readw(const volatile void *addr) +{ + return *(const volatile u16 *)addr; +} +#endif + +#ifndef __raw_readl +static inline u32 __raw_readl(const volatile void *addr) +{ + return *(const volatile u32 *)addr; +} +#endif + +#ifndef __raw_readq +static inline u64 __raw_readq(const volatile void *addr) +{ + assert(sizeof(unsigned long) == sizeof(u64)); + return *(const volatile u64 *)addr; +} +#endif + +#ifndef __raw_writeb +static inline void __raw_writeb(u8 b, volatile void *addr) +{ + *(volatile u8 *)addr = b; +} +#endif + +#ifndef __raw_writew +static inline void __raw_writew(u16 b, volatile void *addr) +{ + *(volatile u16 *)addr = b; +} +#endif + +#ifndef __raw_writel +static inline void __raw_writel(u32 b, volatile void *addr) +{ + *(volatile u32 *)addr = b; +} +#endif + +#ifndef __raw_writeq +static inline void __raw_writeq(u64 b, volatile void *addr) +{ + assert(sizeof(unsigned long) == sizeof(u64)); + *(volatile u64 *)addr = b; +} +#endif + +#ifndef __bswap16 +static inline u16 __bswap16(u16 x) +{ + return ((x 8) 0xff) | ((x 0xff) 8); +} +#endif + +#ifndef __bswap32 +static inline u32 __bswap32(u32 x) +{ + return ((x 0xff00) 24) | ((x 0x00ff) 8) | + ((x 0xff00) 8) | ((x 0x00ff) 24); +} +#endif + +#ifndef __bswap64 +static inline u64 __bswap64(u64 x) +{ + return ((x 0x00ffULL) 56) | + ((x 0xff00ULL) 40) | + ((x 0x00ffULL) 24) | + ((x 0xff00ULL) 8) | + ((x 0x00ffULL) 8) | + ((x 0xff00ULL) 24) | + ((x 0x00ffULL) 40) | + ((x
[PATCH v7 14/14] arm: vectors support
Add support for tests to use exception handlers using install_exception_handler(). This patch also adds start_usr(), which can be used to start a function in USR mode, using a given stack pointer. start_usr() is used by a new selftest test that checks the new vector support. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org --- v7: - selftest.c: s/alloc_aligned/memalign/ - lib/arm/processor.c remove unnecessary include asm/setup.h v6: use alloc() for start_usr v5: rebase change: replace __stringify with libcflat's new xstr macro v4: a couple tweaks to fit changes in the other patches, vectors-usr test now has an 8K usr stack v3: - squashed in 'arm: Simplify exceptions_init in cstart.S' [Christoffer Dall] - suggested function name changes and comment additions [Christoffer Dall] - fix a bug with stack restore from usr mode exceptions that Christoffer pointed out. Add a get_sp() accessor too. --- arm/cstart.S| 174 arm/flat.lds| 7 +- arm/selftest.c | 126 ++- arm/unittests.cfg | 12 config/config-arm.mak | 3 +- lib/arm/asm/processor.h | 39 +++ lib/arm/processor.c | 111 ++ 7 files changed, 469 insertions(+), 3 deletions(-) create mode 100644 lib/arm/asm/processor.h create mode 100644 lib/arm/processor.c diff --git a/arm/cstart.S b/arm/cstart.S index e28251db2950d..cc87ece4b6b40 100644 --- a/arm/cstart.S +++ b/arm/cstart.S @@ -5,6 +5,10 @@ * * This work is licensed under the terms of the GNU LGPL, version 2. */ +#define __ASSEMBLY__ +#include asm/asm-offsets.h +#include asm/ptrace.h +#include asm/cp15.h .arm @@ -17,6 +21,13 @@ start: * See the kernel doc Documentation/arm/Booting */ ldr sp, =stacktop + push{r0-r3} + + /* set up vector table and mode stacks */ + bl exceptions_init + + /* complete setup */ + pop {r0-r3} bl setup /* run the test */ @@ -27,9 +38,172 @@ start: bl exit b halt + +.macro set_mode_stack mode, stack + add \stack, #S_FRAME_SIZE + msr cpsr_c, #(\mode | PSR_I_BIT | PSR_F_BIT) + mov sp, \stack +.endm + +exceptions_init: + mrc p15, 0, r2, c1, c0, 0 @ read SCTLR + bic r2, #CR_V @ SCTLR.V := 0 + mcr p15, 0, r2, c1, c0, 0 @ write SCTLR + ldr r2, =vector_table + mcr p15, 0, r2, c12, c0, 0 @ write VBAR + + mrs r2, cpsr + ldr r1, =exception_stacks + + /* first frame reserved for svc mode */ + set_mode_stack UND_MODE, r1 + set_mode_stack ABT_MODE, r1 + set_mode_stack IRQ_MODE, r1 + set_mode_stack FIQ_MODE, r1 + + msr cpsr_cxsf, r2 @ back to svc mode + mov pc, lr + .text .globl halt halt: 1: wfi b 1b + +/* + * Vector stubs + * Simplified version of the Linux kernel implementation + * arch/arm/kernel/entry-armv.S + * + * Each mode has an S_FRAME_SIZE sized stack initialized + * in exceptions_init + */ +.macro vector_stub, name, vec, mode, correction=0 +.align 5 +vector_\name: +.if \correction + sub lr, lr, #\correction +.endif + /* +* Save r0, r1, lr_exception (parent PC) +* and spsr_exception (parent CPSR) +*/ + str r0, [sp, #S_R0] + str r1, [sp, #S_R1] + str lr, [sp, #S_PC] + mrs r0, spsr + str r0, [sp, #S_PSR] + + /* Prepare for SVC32 mode. */ + mrs r0, cpsr + bic r0, #MODE_MASK + orr r0, #SVC_MODE + msr spsr_cxsf, r0 + + /* Branch to handler in SVC mode */ + mov r0, #\vec + mov r1, sp + ldr lr, =vector_common + movspc, lr +.endm + +vector_stubrst,0, UND_MODE +vector_stubund,1, UND_MODE +vector_stubpabt, 3, ABT_MODE, 4 +vector_stubdabt, 4, ABT_MODE, 8 +vector_stubirq,6, IRQ_MODE, 4 +vector_stubfiq,7, FIQ_MODE, 4 + +.align 5 +vector_svc: + /* +* Save r0, r1, lr_exception (parent PC) +* and spsr_exception (parent CPSR) +*/ + push{ r1 } + ldr r1, =exception_stacks + str r0, [r1, #S_R0] + pop { r0 } + str r0, [r1, #S_R1] + str lr, [r1, #S_PC] + mrs r0, spsr + str r0, [r1, #S_PSR] + + /* +* Branch to handler, still in SVC mode. +* r0 := 2 is the svc vector number. +*/ + mov r0, #2 + ldr lr, =vector_common + mov pc, lr + +vector_common: + /* make room for pt_regs */ + sub sp, #S_FRAME_SIZE + tst sp, #4 @ check stack alignment + subne sp, #4 + + /* store
[PATCH v7 07/14] virtio: add minimal support for virtqueues
Currently only supports sending (outbufs), doesn't have any bells or whistles. Code adapted from the Linux Kernel. Signed-off-by: Andrew Jones drjo...@redhat.com --- v7: - {alloc,alloc_aligned} - {calloc,memalign} - changes now split between virtio.* and virtio-mmio.* files --- lib/virtio-mmio.c | 64 + lib/virtio-mmio.h | 18 + lib/virtio.c | 117 ++ lib/virtio.h | 73 ++ 4 files changed, 272 insertions(+) diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c index 7331abf128cc5..3840838defa1c 100644 --- a/lib/virtio-mmio.c +++ b/lib/virtio-mmio.c @@ -1,4 +1,6 @@ /* + * virtqueue support adapted from the Linux kernel. + * * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com * * This work is licensed under the terms of the GNU LGPL, version 2. @@ -6,6 +8,7 @@ #include libcflat.h #include devicetree.h #include alloc.h +#include asm/page.h #include asm/io.h #include virtio.h #include virtio-mmio.h @@ -32,9 +35,68 @@ static void vm_set(struct virtio_device *vdev, unsigned offset, writeb(p[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); } +static bool vm_notify(struct virtqueue *vq) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq-vdev); + writel(vq-index, vm_dev-base + VIRTIO_MMIO_QUEUE_NOTIFY); + return true; +} + +static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, +unsigned index, +void (*callback)(struct virtqueue *vq), +const char *name) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); + struct vring_virtqueue *vq; + void *queue; + unsigned num = VIRTIO_MMIO_QUEUE_NUM_MIN; + + vq = calloc(1, sizeof(*vq)); + queue = memalign(PAGE_SIZE, VIRTIO_MMIO_QUEUE_SIZE_MIN); + if (!vq || !queue) + return NULL; + + writel(index, vm_dev-base + VIRTIO_MMIO_QUEUE_SEL); + + assert(readl(vm_dev-base + VIRTIO_MMIO_QUEUE_NUM_MAX) = num); + + if (readl(vm_dev-base + VIRTIO_MMIO_QUEUE_PFN) != 0) { + printf(%s: virtqueue %d already setup! base=%p\n, + __func__, index, vm_dev-base); + return NULL; + } + + writel(num, vm_dev-base + VIRTIO_MMIO_QUEUE_NUM); + writel(VIRTIO_MMIO_VRING_ALIGN, + vm_dev-base + VIRTIO_MMIO_QUEUE_ALIGN); + writel(virt_to_pfn(queue), vm_dev-base + VIRTIO_MMIO_QUEUE_PFN); + + vring_init_virtqueue(vq, index, num, VIRTIO_MMIO_VRING_ALIGN, +vdev, queue, vm_notify, callback, name); + + return vq-vq; +} + +static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs, + struct virtqueue *vqs[], vq_callback_t *callbacks[], + const char *names[]) +{ + unsigned i; + + for (i = 0; i nvqs; ++i) { + vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i]); + if (vqs[i] == NULL) + return -1; + } + + return 0; +} + static const struct virtio_config_ops vm_config_ops = { .get = vm_get, .set = vm_set, + .find_vqs = vm_find_vqs, }; static void vm_device_init(struct virtio_mmio_device *vm_dev) @@ -42,6 +104,8 @@ static void vm_device_init(struct virtio_mmio_device *vm_dev) vm_dev-vdev.id.device = readl(vm_dev-base + VIRTIO_MMIO_DEVICE_ID); vm_dev-vdev.id.vendor = readl(vm_dev-base + VIRTIO_MMIO_VENDOR_ID); vm_dev-vdev.config = vm_config_ops; + + writel(PAGE_SIZE, vm_dev-base + VIRTIO_MMIO_GUEST_PAGE_SIZE); } /** diff --git a/lib/virtio-mmio.h b/lib/virtio-mmio.h index 7cd610428b486..8046a4747959a 100644 --- a/lib/virtio-mmio.h +++ b/lib/virtio-mmio.h @@ -8,6 +8,7 @@ * This work is licensed under the terms of the GNU LGPL, version 2. */ #include libcflat.h +#include asm/page.h #include virtio.h #define VIRTIO_MMIO_MAGIC_VALUE0x000 @@ -33,6 +34,23 @@ #define VIRTIO_MMIO_INT_VRING (1 0) #define VIRTIO_MMIO_INT_CONFIG (1 1) +#define VIRTIO_MMIO_VRING_ALIGNPAGE_SIZE + +/* + * The minimum queue size is 2*VIRTIO_MMIO_VRING_ALIGN, which + * means the largest queue num for the minimum queue size is 128, i.e. + * 2*VIRTIO_MMIO_VRING_ALIGN = vring_size(128, VIRTIO_MMIO_VRING_ALIGN), + * where vring_size is + * + * unsigned vring_size(unsigned num, unsigned long align) + * { + * return ((sizeof(struct vring_desc) * num + sizeof(u16) * (3 + num) + * + align - 1) ~(align - 1)) + * + sizeof(u16) * 3 + sizeof(struct vring_used_elem) * num; + * } + */ +#define VIRTIO_MMIO_QUEUE_SIZE_MIN (2*VIRTIO_MMIO_VRING_ALIGN) +#define
[PATCH v7 11/14] arm: Add IO accessors to avoid register-writeback
From: Christoffer Dall christoffer.d...@linaro.org Add IO accessor functions to the arm library functions to avoid register-writeback IO accessors that are not yet supported by the kernel. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org Signed-off-by: Andrew Jones drjo...@redhat.com --- lib/arm/asm/io.h | 57 1 file changed, 57 insertions(+) diff --git a/lib/arm/asm/io.h b/lib/arm/asm/io.h index 51ec6e9aa2e99..1d0abb7d9f405 100644 --- a/lib/arm/asm/io.h +++ b/lib/arm/asm/io.h @@ -3,6 +3,9 @@ #include libcflat.h #include asm/barrier.h +#define __iomem +#define __force + #define __bswap16 bswap16 static inline u16 bswap16(u16 val) { @@ -19,6 +22,60 @@ static inline u32 bswap32(u32 val) return ret; } +#define __raw_readb __raw_readb +static inline u8 __raw_readb(const volatile void __iomem *addr) +{ + u8 val; + asm volatile(ldrb %1, %0 +: +Qo (*(volatile u8 __force *)addr), + =r (val)); + return val; +} + +#define __raw_readw __raw_readw +static inline u16 __raw_readw(const volatile void __iomem *addr) +{ + u16 val; + asm volatile(ldrh %1, %0 +: +Q (*(volatile u16 __force *)addr), + =r (val)); + return val; +} + +#define __raw_readl __raw_readl +static inline u32 __raw_readl(const volatile void __iomem *addr) +{ + u32 val; + asm volatile(ldr %1, %0 +: +Qo (*(volatile u32 __force *)addr), + =r (val)); + return val; +} + +#define __raw_writeb __raw_writeb +static inline void __raw_writeb(u8 val, volatile void __iomem *addr) +{ + asm volatile(strb %1, %0 +: +Qo (*(volatile u8 __force *)addr) +: r (val)); +} + +#define __raw_writew __raw_writew +static inline void __raw_writew(u16 val, volatile void __iomem *addr) +{ + asm volatile(strh %1, %0 +: +Q (*(volatile u16 __force *)addr) +: r (val)); +} + +#define __raw_writel __raw_writel +static inline void __raw_writel(u32 val, volatile void __iomem *addr) +{ + asm volatile(str %1, %0 +: +Qo (*(volatile u32 __force *)addr) +: r (val)); +} + #include asm-generic/io.h #endif /* _ASMARM_IO_H_ */ -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7 13/14] arm: add useful headers from the Linux kernel
We're going to need PSR bit defines and pt_regs. We'll also need pt_regs offsets in assembly code. This patch adapts the Linux kernel's ptrace.h and generated/asm-offsets.h to this framework. It also adapts cp15.h from the kernel, since we'll need bit defines from there too. Signed-off-by: Andrew Jones drjo...@redhat.com Acked-by: Christoffer Dall christoffer.d...@linaro.org --- v4: much improved asm-offsets.h generation based on Kbuild --- config/asm-offsets.mak| 41 +++ config/config-arm.mak | 9 - lib/arm/.gitignore| 1 + lib/arm/asm-offsets.c | 39 ++ lib/arm/asm/asm-offsets.h | 1 + lib/arm/asm/cp15.h| 37 + lib/arm/asm/ptrace.h | 100 ++ lib/generated/.gitignore | 1 + 8 files changed, 227 insertions(+), 2 deletions(-) create mode 100644 config/asm-offsets.mak create mode 100644 lib/arm/.gitignore create mode 100644 lib/arm/asm-offsets.c create mode 100644 lib/arm/asm/asm-offsets.h create mode 100644 lib/arm/asm/cp15.h create mode 100644 lib/arm/asm/ptrace.h create mode 100644 lib/generated/.gitignore diff --git a/config/asm-offsets.mak b/config/asm-offsets.mak new file mode 100644 index 0..b2578a6692f33 --- /dev/null +++ b/config/asm-offsets.mak @@ -0,0 +1,41 @@ +# +# asm-offsets adapted from the kernel, see +# Kbuild +# scripts/Kbuild.include +# scripts/Makefile.build +# +# Authors: Andrew Jones drjo...@redhat.com +# + +define sed-y + /^-/{s:-#\(.*\):/* \1 */:; \ + s:^-\([^ ]*\) [\$$#]*\([-0-9]*\) \(.*\):#define \1 \2 /* \3 */:; \ + s:^-\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; \ + s:-::; p;} +endef + +define make_asm_offsets + (set -e; \ +echo #ifndef __ASM_OFFSETS_H__; \ +echo #define __ASM_OFFSETS_H__; \ +echo /*; \ +echo * Generated file. DO NOT MODIFY.; \ +echo *; \ +echo */; \ +echo ; \ +sed -ne $(sed-y) $; \ +echo ; \ +echo #endif ) $@ +endef + +$(asm-offsets:.h=.s): $(asm-offsets:.h=.c) + $(CC) $(CFLAGS) -fverbose-asm -S -o $@ $ + +$(asm-offsets): $(asm-offsets:.h=.s) + $(call make_asm_offsets) + cp -f $(asm-offsets) lib/generated + +asm_offsets_clean: + $(RM) $(asm-offsets) $(asm-offsets:.h=.s) \ + $(addprefix lib/generated/,$(notdir $(asm-offsets))) + diff --git a/config/config-arm.mak b/config/config-arm.mak index b7239810183d1..f03b96d4c50c5 100644 --- a/config/config-arm.mak +++ b/config/config-arm.mak @@ -30,6 +30,9 @@ CFLAGS += -Wextra CFLAGS += -O2 CFLAGS += -I lib -I lib/libfdt +asm-offsets = lib/arm/asm-offsets.h +include config/asm-offsets.mak + cflatobjs += \ lib/alloc.o \ lib/devicetree.o \ @@ -59,7 +62,7 @@ FLATLIBS = $(libcflat) $(LIBFDT_archive) $(libgcc) $(libeabi) $(libeabi): $(eabiobjs) $(AR) rcs $@ $^ -arch_clean: libfdt_clean +arch_clean: libfdt_clean asm_offsets_clean $(RM) $(TEST_DIR)/*.{o,flat,elf} $(libeabi) $(eabiobjs) \ $(TEST_DIR)/.*.d lib/arm/.*.d @@ -69,7 +72,9 @@ tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg cstart.o = $(TEST_DIR)/cstart.o -test_cases: $(tests-common) $(tests) +generated_files = $(asm-offsets) + +test_cases: $(generated_files) $(tests-common) $(tests) $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o diff --git a/lib/arm/.gitignore b/lib/arm/.gitignore new file mode 100644 index 0..84872bf197c67 --- /dev/null +++ b/lib/arm/.gitignore @@ -0,0 +1 @@ +asm-offsets.[hs] diff --git a/lib/arm/asm-offsets.c b/lib/arm/asm-offsets.c new file mode 100644 index 0..a9c349d2d427c --- /dev/null +++ b/lib/arm/asm-offsets.c @@ -0,0 +1,39 @@ +/* + * Adapted from arch/arm/kernel/asm-offsets.c + * + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include libcflat.h +#include asm/ptrace.h + +#define DEFINE(sym, val) \ + asm volatile(\n- #sym %0 #val : : i (val)) +#define OFFSET(sym, str, mem) DEFINE(sym, offsetof(struct str, mem)) +#define COMMENT(x) asm volatile(\n-# x) +#define BLANK()asm volatile(\n- : : ) + +int main(void) +{ + OFFSET(S_R0, pt_regs, ARM_r0); + OFFSET(S_R1, pt_regs, ARM_r1); + OFFSET(S_R2, pt_regs, ARM_r2); + OFFSET(S_R3, pt_regs, ARM_r3); + OFFSET(S_R4, pt_regs, ARM_r4); + OFFSET(S_R5, pt_regs, ARM_r5); + OFFSET(S_R6, pt_regs, ARM_r6); + OFFSET(S_R7, pt_regs, ARM_r7); + OFFSET(S_R8, pt_regs, ARM_r8); + OFFSET(S_R9, pt_regs, ARM_r9); + OFFSET(S_R10, pt_regs, ARM_r10); + OFFSET(S_FP, pt_regs, ARM_fp); + OFFSET(S_IP, pt_regs, ARM_ip); + OFFSET(S_SP, pt_regs, ARM_sp); + OFFSET(S_LR, pt_regs, ARM_lr); + OFFSET(S_PC, pt_regs, ARM_pc); + OFFSET(S_PSR,
[PATCH v7 12/14] arm: Add arch-specific asm/page.h and __va/__pa
These are pretty much the same as the asm-generic version, but use phys_addr_t. Signed-off-by: Andrew Jones drjo...@redhat.com --- lib/arm/asm/io.h | 13 + lib/arm/asm/page.h | 34 +- 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/lib/arm/asm/io.h b/lib/arm/asm/io.h index 1d0abb7d9f405..bbcbcd0542490 100644 --- a/lib/arm/asm/io.h +++ b/lib/arm/asm/io.h @@ -2,6 +2,7 @@ #define _ASMARM_IO_H_ #include libcflat.h #include asm/barrier.h +#include asm/page.h #define __iomem #define __force @@ -76,6 +77,18 @@ static inline void __raw_writel(u32 val, volatile void __iomem *addr) : r (val)); } +#define virt_to_phys virt_to_phys +static inline phys_addr_t virt_to_phys(const volatile void *x) +{ + return __virt_to_phys((unsigned long)(x)); +} + +#define phys_to_virt phys_to_virt +static inline void *phys_to_virt(phys_addr_t x) +{ + return (void *)__phys_to_virt(x); +} + #include asm-generic/io.h #endif /* _ASMARM_IO_H_ */ diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h index 91a4bc3b7f86e..606d76f5775cf 100644 --- a/lib/arm/asm/page.h +++ b/lib/arm/asm/page.h @@ -1 +1,33 @@ -#include asm-generic/page.h +#ifndef _ASMARM_PAGE_H_ +#define _ASMARM_PAGE_H_ +/* + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ + +#define PAGE_SHIFT 12 +#ifndef __ASSEMBLY__ +#define PAGE_SIZE (1UL PAGE_SHIFT) +#else +#define PAGE_SIZE (1 PAGE_SHIFT) +#endif +#define PAGE_MASK (~(PAGE_SIZE-1)) +#define PAGE_ALIGN(addr) (((addr) + (PAGE_SIZE-1)) PAGE_MASK) + +#ifndef __ASSEMBLY__ +#include asm/setup.h + +#ifndef __virt_to_phys +#define __phys_to_virt(x) ((unsigned long) (x)) +#define __virt_to_phys(x) (x) +#endif + +#define __va(x)((void *)__phys_to_virt((phys_addr_t)(x))) +#define __pa(x)__virt_to_phys((unsigned long)(x)) + +#define virt_to_pfn(kaddr) (__pa(kaddr) PAGE_SHIFT) +#define pfn_to_virt(pfn)__va((pfn) PAGE_SHIFT) +#endif + +#endif -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7 05/14] add minimal virtio support for devtree virtio-mmio
Support the bare minimum of virtio to enable access to the virtio-mmio config space of a device. Currently this implementation must use a device tree to find the device. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org --- v7: - s/alloc/calloc/ - split into virtio.[ch] and virtio-mmio.[ch] [Paolo Bonzini] - dump virtio_bind_busses table [Paolo Bonzini] v6: - switch to using alloc() - s/vmdev/vm_dev/ to be consistent with kernel naming - check for virtio magic in vm_dt_match v5: - use same virtio struct names as kernel - no need to alloc a new virtio_config_ops for each virtio device - use ioremap v4: - split from the virtio-testdev patch - search a table to discover that the device must be DT/virtio-mmio, which doesn't change anything, but looks less hacky than comments saying the device must be DT/virtio-mmio... - manage own pool of virtio-mmio pre-allocated device structures in order to avoid needing access to the heap --- lib/virtio-mmio.c | 111 ++ lib/virtio-mmio.h | 47 +++ lib/virtio.c | 13 +++ lib/virtio.h | 74 4 files changed, 245 insertions(+) create mode 100644 lib/virtio-mmio.c create mode 100644 lib/virtio-mmio.h create mode 100644 lib/virtio.c create mode 100644 lib/virtio.h diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c new file mode 100644 index 0..7331abf128cc5 --- /dev/null +++ b/lib/virtio-mmio.c @@ -0,0 +1,111 @@ +/* + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include libcflat.h +#include devicetree.h +#include alloc.h +#include asm/io.h +#include virtio.h +#include virtio-mmio.h + +static void vm_get(struct virtio_device *vdev, unsigned offset, + void *buf, unsigned len) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); + u8 *p = buf; + unsigned i; + + for (i = 0; i len; ++i) + p[i] = readb(vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); +} + +static void vm_set(struct virtio_device *vdev, unsigned offset, + const void *buf, unsigned len) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); + const u8 *p = buf; + unsigned i; + + for (i = 0; i len; ++i) + writeb(p[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); +} + +static const struct virtio_config_ops vm_config_ops = { + .get = vm_get, + .set = vm_set, +}; + +static void vm_device_init(struct virtio_mmio_device *vm_dev) +{ + vm_dev-vdev.id.device = readl(vm_dev-base + VIRTIO_MMIO_DEVICE_ID); + vm_dev-vdev.id.vendor = readl(vm_dev-base + VIRTIO_MMIO_VENDOR_ID); + vm_dev-vdev.config = vm_config_ops; +} + +/** + * virtio-mmio device tree support + **/ + +struct vm_dt_info { + u32 devid; + void *base; +}; + +static int vm_dt_match(const struct dt_device *dev, int fdtnode) +{ + struct vm_dt_info *info = (struct vm_dt_info *)dev-info; + struct dt_pbus_reg base; + u32 magic; + + dt_device_bind_node((struct dt_device *)dev, fdtnode); + + assert(dt_pbus_get_base(dev, base) == 0); + info-base = ioremap(base.addr, base.size); + + magic = readl(info-base + VIRTIO_MMIO_MAGIC_VALUE); + if (magic != ('v' | 'i' 8 | 'r' 16 | 't' 24)) + return false; + + return readl(info-base + VIRTIO_MMIO_DEVICE_ID) == info-devid; +} + +static struct virtio_device *virtio_mmio_dt_bind(u32 devid) +{ + struct virtio_mmio_device *vm_dev; + struct dt_device dt_dev; + struct dt_bus dt_bus; + struct vm_dt_info info; + int node; + + if (!dt_available()) + return NULL; + + dt_bus_init_defaults(dt_bus); + dt_bus.match = vm_dt_match; + + info.devid = devid; + + dt_device_init(dt_dev, dt_bus, info); + + node = dt_device_find_compatible(dt_dev, virtio,mmio); + assert(node = 0 || node == -FDT_ERR_NOTFOUND); + + if (node == -FDT_ERR_NOTFOUND) + return NULL; + + vm_dev = calloc(1, sizeof(*vm_dev)); + if (!vm_dev) + return NULL; + + vm_dev-base = info.base; + vm_device_init(vm_dev); + + return vm_dev-vdev; +} + +struct virtio_device *virtio_mmio_bind(u32 devid) +{ + return virtio_mmio_dt_bind(devid); +} diff --git a/lib/virtio-mmio.h b/lib/virtio-mmio.h new file mode 100644 index 0..7cd610428b486 --- /dev/null +++ b/lib/virtio-mmio.h @@ -0,0 +1,47 @@ +#ifndef _VIRTIO_MMIO_H_ +#define _VIRTIO_MMIO_H_ +/* + * A minimal implementation of virtio-mmio. Adapted from the Linux Kernel. + * + * Copyright (C) 2014, Red Hat Inc, Andrew
[PATCH v7 04/14] Introduce lib/alloc
alloc supplies three ingredients to the test framework that are all related to the support of dynamic memory allocation. The first is a set of alloc function wrappers for malloc and its friends. Using wrappers allows test code and common code to use the same interface for memory allocation at all stages, even though the implementations may change with the stage, e.g. pre/post paging. The second is a set of implementations for the alloc function interfaces. These implementations are named early_*, as they can be used almost immediately by the test framework. The third is a very simple physical memory allocator, which the early_* alloc functions build on. Signed-off-by: Andrew Jones drjo...@redhat.com --- v7: expanded from only supplying the alloc function wrappers to including early_* and phys_alloc [Paolo Bonzini] --- lib/alloc.c | 176 lib/alloc.h | 123 ++ 2 files changed, 299 insertions(+) create mode 100644 lib/alloc.c create mode 100644 lib/alloc.h diff --git a/lib/alloc.c b/lib/alloc.c new file mode 100644 index 0..5d55e285dcd1d --- /dev/null +++ b/lib/alloc.c @@ -0,0 +1,176 @@ +/* + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include alloc.h +#include asm/spinlock.h +#include asm/io.h + +#define ALIGN_UP_MASK(x, mask) (((x) + (mask)) ~(mask)) +#define ALIGN_UP(x, a) ALIGN_UP_MASK(x, (typeof(x))(a) - 1) +#define MIN(a, b) ((a) (b) ? (a) : (b)) +#define MAX(a, b) ((a) (b) ? (a) : (b)) + +#define PHYS_ALLOC_NR_REGIONS 256 + +struct phys_alloc_region { + phys_addr_t base; + phys_addr_t size; +}; + +static struct phys_alloc_region regions[PHYS_ALLOC_NR_REGIONS]; +static int nr_regions; + +static struct spinlock lock; +static phys_addr_t base, top, align_min; + +void phys_alloc_show(void) +{ + int i; + + spin_lock(lock); + printf(phys_alloc minimum alignment: 0x%llx\n, align_min); + for (i = 0; i nr_regions; ++i) + printf(%016llx-%016llx [%s]\n, + regions[i].base, + regions[i].base + regions[i].size - 1, + USED); + printf(%016llx-%016llx [%s]\n, base, top - 1, FREE); + spin_unlock(lock); +} + +void phys_alloc_init(phys_addr_t base_addr, phys_addr_t size) +{ + spin_lock(lock); + base = base_addr; + top = base + size; + align_min = DEFAULT_MINIMUM_ALIGNMENT; + spin_unlock(lock); +} + +void phys_alloc_set_minimum_alignment(phys_addr_t align) +{ + assert(align !(align (align - 1))); + spin_lock(lock); + align_min = align; + spin_unlock(lock); +} + +static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size, + phys_addr_t align, bool safe) +{ + phys_addr_t addr, size_orig = size; + u64 top_safe = top; + + if (safe sizeof(long) == 4) + top_safe = MIN(top, 1ULL 32); + + align = MAX(align, align_min); + + spin_lock(lock); + + addr = ALIGN_UP(base, align); + size += addr - base; + + if ((top_safe - base) size) { + printf(%s: requested=0x%llx (align=0x%llx), + need=0x%llx, but free=0x%llx. + top=0x%llx, top_safe=0x%llx\n, __func__, + size_orig, align, size, top_safe - base, + top, top_safe); + spin_unlock(lock); + return INVALID_PHYS_ADDR; + } + + base += size; + + if (nr_regions PHYS_ALLOC_NR_REGIONS) { + regions[nr_regions].base = addr; + regions[nr_regions].size = size_orig; + ++nr_regions; + } else { + printf(%s: WARNING: no free log entries, + can't log allocation...\n, __func__); + } + + spin_unlock(lock); + + return addr; +} + +static phys_addr_t phys_zalloc_aligned_safe(phys_addr_t size, + phys_addr_t align, bool safe) +{ + phys_addr_t addr = phys_alloc_aligned_safe(size, align, safe); + if (addr == INVALID_PHYS_ADDR) + return addr; + + memset(phys_to_virt(addr), 0, size); + return addr; +} + +phys_addr_t phys_alloc_aligned(phys_addr_t size, phys_addr_t align) +{ + return phys_alloc_aligned_safe(size, align, false); +} + +phys_addr_t phys_zalloc_aligned(phys_addr_t size, phys_addr_t align) +{ + return phys_zalloc_aligned_safe(size, align, false); +} + +phys_addr_t phys_alloc(phys_addr_t size) +{ + return phys_alloc_aligned(size, align_min); +} + +phys_addr_t phys_zalloc(phys_addr_t size) +{ + return phys_zalloc_aligned(size, align_min); +} + +static void *early_malloc(size_t size) +{ + phys_addr_t addr =
[PATCH v7 02/14] add support for Linux device trees
Build libfdt and add some device tree functions built on it to the arch-neutral lib code in order to facilitate the extraction of boot info and device base addresses. These functions should work on device trees conforming to section III of the kernel doc Documentation/devicetree/booting-without-of.txt. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org --- v7: - squashed the add a make target bit of v6's libfdt: get libfdt to build (now dropped) patch. The rest of that dropped patch has already been merged under libcflat: add more string functions - no need for info to be const in dt_device_init v5: - changed *get_baseaddr* helpers to *get_base* helpers - a couple minor code changes [Christoffer Dall] v4: reworked everything, added lots of comments to devicetree.h --- Makefile | 21 - lib/devicetree.c | 272 +++ lib/devicetree.h | 236 +++ lib/libcflat.h | 2 + 4 files changed, 529 insertions(+), 2 deletions(-) create mode 100644 lib/devicetree.c create mode 100644 lib/devicetree.h diff --git a/Makefile b/Makefile index 78d9ac664ac4b..180189ecd6d8c 100644 --- a/Makefile +++ b/Makefile @@ -22,6 +22,13 @@ cflatobjs := \ lib/abort.o \ lib/report.o +# libfdt paths +LIBFDT_objdir = lib/libfdt +LIBFDT_srcdir = lib/libfdt +LIBFDT_archive = $(LIBFDT_objdir)/libfdt.a +LIBFDT_include = $(addprefix $(LIBFDT_srcdir)/,$(LIBFDT_INCLUDES)) +LIBFDT_version = $(addprefix $(LIBFDT_srcdir)/,$(LIBFDT_VERSION)) + #include architecure specific make rules include config/config-$(ARCH).mak @@ -47,6 +54,11 @@ LDFLAGS += -pthread -lrt $(libcflat): $(cflatobjs) $(AR) rcs $@ $^ +include $(LIBFDT_srcdir)/Makefile.libfdt +$(LIBFDT_archive): CFLAGS += -ffreestanding -I lib -I lib/libfdt -Wno-sign-compare +$(LIBFDT_archive): $(addprefix $(LIBFDT_objdir)/,$(LIBFDT_OBJS)) + $(AR) rcs $@ $^ + %.o: %.S $(CC) $(CFLAGS) -c -nostdlib -o $@ $ @@ -59,10 +71,15 @@ install: clean: arch_clean $(RM) lib/.*.d $(libcflat) $(cflatobjs) -distclean: clean +libfdt_clean: + $(RM) $(LIBFDT_archive) \ + $(addprefix $(LIBFDT_objdir)/,$(LIBFDT_OBJS)) \ + $(LIBFDT_objdir)/.*.d + +distclean: clean libfdt_clean $(RM) config.mak $(TEST_DIR)-run test.log msr.out cscope.* -cscope: common_dirs = lib +cscope: common_dirs = lib lib/libfdt cscope: $(RM) ./cscope.* find $(TEST_DIR) lib/$(TEST_DIR) $(common_dirs) -maxdepth 1 \ diff --git a/lib/devicetree.c b/lib/devicetree.c new file mode 100644 index 0..0f9b4e9942736 --- /dev/null +++ b/lib/devicetree.c @@ -0,0 +1,272 @@ +/* + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include libcflat.h +#include libfdt/libfdt.h +#include devicetree.h + +static const void *fdt; +static u32 root_nr_address_cells, root_nr_size_cells; + +const void *dt_fdt(void) +{ + return fdt; +} + +bool dt_available(void) +{ + return fdt_check_header(fdt) == 0; +} + +int dt_get_nr_cells(int fdtnode, u32 *nr_address_cells, u32 *nr_size_cells) +{ + const struct fdt_property *prop; + u32 *nr_cells; + int len; + + prop = fdt_get_property(fdt, fdtnode, #address-cells, len); + if (prop == NULL) + return len; + + nr_cells = (u32 *)prop-data; + *nr_address_cells = fdt32_to_cpu(*nr_cells); + + prop = fdt_get_property(fdt, fdtnode, #size-cells, len); + if (prop == NULL) + return len; + + nr_cells = (u32 *)prop-data; + *nr_size_cells = fdt32_to_cpu(*nr_cells); + + return 0; +} + +void dt_reg_init(struct dt_reg *reg, u32 nr_address_cells, u32 nr_size_cells) +{ + memset(reg, 0, sizeof(struct dt_reg)); + reg-nr_address_cells = nr_address_cells; + reg-nr_size_cells = nr_size_cells; +} + +int dt_get_reg(int fdtnode, int regidx, struct dt_reg *reg) +{ + const struct fdt_property *prop; + u32 *cells, i; + unsigned nr_tuple_cells; + int len; + + prop = fdt_get_property(fdt, fdtnode, reg, len); + if (prop == NULL) + return len; + + cells = (u32 *)prop-data; + nr_tuple_cells = reg-nr_address_cells + reg-nr_size_cells; + regidx *= nr_tuple_cells; + + if (regidx + nr_tuple_cells len/sizeof(u32)) + return -FDT_ERR_NOTFOUND; + + for (i = 0; i reg-nr_address_cells; ++i) + reg-address_cells[i] = fdt32_to_cpu(cells[regidx + i]); + + regidx += reg-nr_address_cells; + for (i = 0; i reg-nr_size_cells; ++i) + reg-size_cells[i] = fdt32_to_cpu(cells[regidx + i]); + + return 0; +} + +int dt_pbus_translate_node(int fdtnode, int regidx, + struct dt_pbus_reg *pbus_reg) +{ + struct dt_reg
[PATCH v7 08/14] Introduce chr-testdev
chr-testdev is a qemu backend that can be used by test code to send qemu commands. It communicates with qemu through a virtio-console device. The only command currently implemented is quit, which allows the test code to exit with a given status code, i.e. chr_testdev_exit(code). Signed-off-by: Andrew Jones drjo...@redhat.com --- lib/chr-testdev.c | 72 +++ lib/chr-testdev.h | 14 +++ lib/virtio.h | 2 ++ 3 files changed, 88 insertions(+) create mode 100644 lib/chr-testdev.c create mode 100644 lib/chr-testdev.h diff --git a/lib/chr-testdev.c b/lib/chr-testdev.c new file mode 100644 index 0..0c9a173a04886 --- /dev/null +++ b/lib/chr-testdev.c @@ -0,0 +1,72 @@ +/* + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include libcflat.h +#include virtio.h +#include asm/spinlock.h + +#define TESTDEV_NAME chr-testdev + +static struct virtio_device *vcon; +static struct virtqueue *in_vq, *out_vq; +static struct spinlock lock; + +static void __testdev_send(char *buf, size_t len) +{ + int ret; + + ret = virtqueue_add_outbuf(out_vq, buf, len); + virtqueue_kick(out_vq); + + if (ret 0) + return; + + while (!virtqueue_get_buf(out_vq, len)) + ; +} + +void chr_testdev_exit(int code) +{ + char buf[8]; + int len; + + snprintf(buf, sizeof(buf), %dq, code); + len = strlen(buf); + + spin_lock(lock); + + if (!vcon) + goto out; + + __testdev_send(buf, len); + +out: + spin_unlock(lock); +} + +void chr_testdev_init(void) +{ + const char *io_names[] = { input, output }; + struct virtqueue *vqs[2]; + int ret; + + vcon = virtio_bind(VIRTIO_ID_CONSOLE); + if (vcon == NULL) { + printf(%s: %s: can't find a virtio-console\n, + __func__, TESTDEV_NAME); + return; + } + + ret = vcon-config-find_vqs(vcon, 2, vqs, NULL, io_names); + if (ret 0) { + printf(%s: %s: can't init virtqueues\n, + __func__, TESTDEV_NAME); + vcon = NULL; + return; + } + + in_vq = vqs[0]; + out_vq = vqs[1]; +} diff --git a/lib/chr-testdev.h b/lib/chr-testdev.h new file mode 100644 index 0..ffd9a851aa9b9 --- /dev/null +++ b/lib/chr-testdev.h @@ -0,0 +1,14 @@ +#ifndef _CHR_TESTDEV_H_ +#define _CHR_TESTDEV_H_ +/* + * chr-testdev is a driver for the chr-testdev qemu backend. + * The chr-testdev backend exposes a simple control interface to + * qemu for kvm-unit-tests accessible through virtio-console. + * + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +extern void chr_testdev_init(void); +extern void chr_testdev_exit(int code); +#endif diff --git a/lib/virtio.h b/lib/virtio.h index 37ce028b2c2bb..b51899ab998b6 100644 --- a/lib/virtio.h +++ b/lib/virtio.h @@ -10,6 +10,8 @@ */ #include libcflat.h +#define VIRTIO_ID_CONSOLE 3 + struct virtio_device_id { u32 device; u32 vendor; -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7 10/14] arm: Add spinlock implementation
From: Christoffer Dall christoffer.d...@linaro.org Add simple busy-wait spinlock implementation for ARM. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org Signed-off-by: Andrew Jones drjo...@redhat.com --- config/config-arm.mak | 3 ++- lib/arm/asm/spinlock.h | 9 ++--- lib/arm/spinlock.c | 28 3 files changed, 32 insertions(+), 8 deletions(-) create mode 100644 lib/arm/spinlock.c diff --git a/config/config-arm.mak b/config/config-arm.mak index ff965428e0e5b..b7239810183d1 100644 --- a/config/config-arm.mak +++ b/config/config-arm.mak @@ -37,7 +37,8 @@ cflatobjs += \ lib/virtio-mmio.o \ lib/chr-testdev.o \ lib/arm/io.o \ - lib/arm/setup.o + lib/arm/setup.o \ + lib/arm/spinlock.o libeabi = lib/arm/libeabi.a eabiobjs = lib/arm/eabi_compat.o diff --git a/lib/arm/asm/spinlock.h b/lib/arm/asm/spinlock.h index 04f5a1a5538e2..2118a4b3751e0 100644 --- a/lib/arm/asm/spinlock.h +++ b/lib/arm/asm/spinlock.h @@ -5,12 +5,7 @@ struct spinlock { int v; }; -//TODO -static inline void spin_lock(struct spinlock *lock __unused) -{ -} -static inline void spin_unlock(struct spinlock *lock __unused) -{ -} +extern void spin_lock(struct spinlock *lock); +extern void spin_unlock(struct spinlock *lock); #endif /* _ASMARM_SPINLOCK_H_ */ diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c new file mode 100644 index 0..d8a6d4c3383d6 --- /dev/null +++ b/lib/arm/spinlock.c @@ -0,0 +1,28 @@ +#include libcflat.h +#include asm/spinlock.h +#include asm/barrier.h + +void spin_lock(struct spinlock *lock) +{ + u32 val, fail; + + dmb(); + do { + asm volatile( + 1: ldrex %0, [%2]\n + teq %0, #0\n + bne 1b\n + mov %0, #1\n + strex %1, %0, [%2]\n + : =r (val), =r (fail) + : r (lock-v) + : cc ); + } while (fail); + dmb(); +} + +void spin_unlock(struct spinlock *lock) +{ + lock-v = 0; + dmb(); +} -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7 09/14] arm: initial drop
This is the initial drop of the arm test framework and a first test that just checks that setup completed (a selftest). kvm isn't needed to run this test unless testing with smp 1. Try it out with yum install gcc-arm-linux-gnu export QEMU=[qemu with mach-virt and chr-testdev] ./configure --cross-prefix=arm-linux-gnu- --arch=arm make ./run_tests.sh Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org --- v7: - remove memregions (reworked them as phys_alloc in lib/alloc) - selftest: non-functional change: s/argv[i]/var/ - lib/argv:setup_args don't dereference NULL v6: - fixed setup.c comment [Christoffer Dall] - changed arm/run to use chr-testdev instead of virtio-testdev - add align parameter to memregion_new, setup alloc_ops v5: - memregions: check freemem_start is in bounds and document - selftest: rename testnam = testname and properly init it - io.c: use writeb instead of writel in puts() and use ioremap - arm/run script update for new qemu ('-device ?' now requires -machine) - couple other minor changes to setup.c and io.c [Christoffer Dall] v4: - moved fdt to just after stacktop (it was in the middle of free memory) - switched from using heap to memregions - get nr_cpus and added smp=num test - added barrier.h - use new report()/report_summary() - config/config-arm.mak cleanup --- arm/cstart.S | 35 arm/flat.lds | 18 +++ arm/run| 46 +++ arm/selftest.c | 86 ++ arm/unittests.cfg | 18 +++ config/config-arm.mak | 74 +++ configure | 12 +-- lib/argv.c | 9 ++ lib/arm/asm/barrier.h | 18 +++ lib/arm/asm/io.h | 24 ++ lib/arm/asm/page.h | 1 + lib/arm/asm/setup.h| 27 lib/arm/asm/spinlock.h | 16 ++ lib/arm/eabi_compat.c | 20 lib/arm/io.c | 65 ++ lib/arm/setup.c| 82 +++ 16 files changed, 549 insertions(+), 2 deletions(-) create mode 100644 arm/cstart.S create mode 100644 arm/flat.lds create mode 100755 arm/run create mode 100644 arm/selftest.c create mode 100644 arm/unittests.cfg create mode 100644 config/config-arm.mak create mode 100644 lib/arm/asm/barrier.h create mode 100644 lib/arm/asm/io.h create mode 100644 lib/arm/asm/page.h create mode 100644 lib/arm/asm/setup.h create mode 100644 lib/arm/asm/spinlock.h create mode 100644 lib/arm/eabi_compat.c create mode 100644 lib/arm/io.c create mode 100644 lib/arm/setup.c diff --git a/arm/cstart.S b/arm/cstart.S new file mode 100644 index 0..e28251db2950d --- /dev/null +++ b/arm/cstart.S @@ -0,0 +1,35 @@ +/* + * Boot entry point and assembler functions for armv7 tests. + * + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ + +.arm + +.section .init + +.globl start +start: + /* +* bootloader params are in r0-r2 +* See the kernel doc Documentation/arm/Booting +*/ + ldr sp, =stacktop + bl setup + + /* run the test */ + ldr r0, =__argc + ldr r0, [r0] + ldr r1, =__argv + bl main + bl exit + b halt + +.text + +.globl halt +halt: +1: wfi + b 1b diff --git a/arm/flat.lds b/arm/flat.lds new file mode 100644 index 0..3e5d72e24989b --- /dev/null +++ b/arm/flat.lds @@ -0,0 +1,18 @@ + +SECTIONS +{ +.text : { *(.init) *(.text) *(.text.*) } +. = ALIGN(4K); +.data : { *(.data) } +. = ALIGN(16); +.rodata : { *(.rodata) } +. = ALIGN(16); +.bss : { *(.bss) } +. = ALIGN(4K); +edata = .; +. += 8K; +. = ALIGN(4K); +stacktop = .; +} + +ENTRY(start) diff --git a/arm/run b/arm/run new file mode 100755 index 0..a714350225597 --- /dev/null +++ b/arm/run @@ -0,0 +1,46 @@ +#!/bin/bash + +if [ ! -f config.mak ]; then + echo run ./configure first. See ./configure -h + exit 2 +fi +source config.mak + +qemu=${QEMU:-qemu-system-arm} +qpath=$(which $qemu 2/dev/null) + +if [ -z $qpath ]; then + echo $qemu not found. + exit 2 +fi + +if ! $qemu -machine '?' 21 | grep 'ARM Virtual Machine' /dev/null; then + echo $qpath doesn't support mach-virt ('-machine virt'). Exiting. + exit 2 +fi + +M='-machine virt' + +if ! $qemu $M -device '?' 21 | grep virtconsole /dev/null; then + echo $qpath doesn't support virtio-console for chr-testdev. Exiting. + exit 2 +fi + +if $qemu $M -chardev testdev,id=id -kernel . 21 \ + | grep backend /dev/null; then + echo $qpath doesn't support chr-testdev. Exiting. +
Re: [PATCH v7 09/14] arm: initial drop
diff --git a/arm/unittests.cfg b/arm/unittests.cfg new file mode 100644 index 0..da9dfd7b1f118 --- /dev/null +++ b/arm/unittests.cfg @@ -0,0 +1,18 @@ +# Define your new unittest following the convention: +# [unittest_name] +# file = foo.flat # Name of the flat file to be used +# smp = 2# Number of processors the VM will use during this test +# extra_params = -append params... # Additional parameters used +# arch = arm/arm64 # Only if test case is specific to one +# groups = group1 group2 # Used to identify test cases with run_tests -g ... + +# +# Test that the configured number of processors (smp = num), and +# that the configured amount of memory (-m MB) are correctly setup +# by the framework. +# +[selftest::setup] +file = selftest.flat +smp = 1 +extra_params = -m 256 -append 'setup smp=1 mem=256' +groups = selftest Nice. :) diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h new file mode 100644 index 0..91a4bc3b7f86e --- /dev/null +++ b/lib/arm/asm/page.h @@ -0,0 +1 @@ +#include asm-generic/page.h diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h new file mode 100644 index 0..21445ef2085fc --- /dev/null +++ b/lib/arm/asm/setup.h @@ -0,0 +1,27 @@ +#ifndef _ASMARM_SETUP_H_ +#define _ASMARM_SETUP_H_ +/* + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include libcflat.h +#include alloc.h + +#define NR_CPUS8 +extern u32 cpus[NR_CPUS]; +extern int nr_cpus; + +extern phys_addr_t __phys_offset, __phys_end; + +#define PHYS_OFFSET(__phys_offset) +#define PHYS_END (__phys_end) +#define PHYS_SHIFT 40 +#define PHYS_SIZE (1ULL PHYS_SHIFT) +#define PHYS_MASK (PHYS_SIZE - 1ULL) Can you explain these? I'm not sure I understand this: + mem_start = regs[0].addr; + mem_end = mem_start + regs[0].size; + + assert(!(mem_start ~PHYS_MASK) !((mem_end-1) ~PHYS_MASK)); + assert(freemem_start = mem_start freemem_start mem_end); + + __phys_offset = mem_start; /* PHYS_OFFSET */ + __phys_end = mem_end; /* PHYS_END */ and I think the macro indirection (__phys_offset vs. PHYS_OFFSET, __phys_end vs. PHYS_END) is unnecessary: just call the variables phys_offset and phys_end. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 08/14] Introduce chr-testdev
- Original Message - [...] +void chr_testdev_exit(int code) +{ + char buf[8]; + int len; + + snprintf(buf, sizeof(buf), %dq, code); + len = strlen(buf); AFAIK, snprintf returns the number of characters written, so these two statements can be merged into one. Thanks, Levente Kurusa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 09/14] arm: initial drop
On Wed, Jul 16, 2014 at 11:22:18AM +0200, Paolo Bonzini wrote: diff --git a/arm/unittests.cfg b/arm/unittests.cfg new file mode 100644 index 0..da9dfd7b1f118 --- /dev/null +++ b/arm/unittests.cfg @@ -0,0 +1,18 @@ +# Define your new unittest following the convention: +# [unittest_name] +# file = foo.flat # Name of the flat file to be used +# smp = 2# Number of processors the VM will use during this test +# extra_params = -append params... # Additional parameters used +# arch = arm/arm64 # Only if test case is specific to one +# groups = group1 group2 # Used to identify test cases with run_tests -g ... + +# +# Test that the configured number of processors (smp = num), and +# that the configured amount of memory (-m MB) are correctly setup +# by the framework. +# +[selftest::setup] +file = selftest.flat +smp = 1 +extra_params = -m 256 -append 'setup smp=1 mem=256' +groups = selftest Nice. :) diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h new file mode 100644 index 0..91a4bc3b7f86e --- /dev/null +++ b/lib/arm/asm/page.h @@ -0,0 +1 @@ +#include asm-generic/page.h diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h new file mode 100644 index 0..21445ef2085fc --- /dev/null +++ b/lib/arm/asm/setup.h @@ -0,0 +1,27 @@ +#ifndef _ASMARM_SETUP_H_ +#define _ASMARM_SETUP_H_ +/* + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include libcflat.h +#include alloc.h + +#define NR_CPUS 8 +extern u32 cpus[NR_CPUS]; +extern int nr_cpus; + +extern phys_addr_t __phys_offset, __phys_end; + +#define PHYS_OFFSET (__phys_offset) +#define PHYS_END(__phys_end) +#define PHYS_SHIFT 40 +#define PHYS_SIZE (1ULL PHYS_SHIFT) +#define PHYS_MASK (PHYS_SIZE - 1ULL) Can you explain these? I'm not sure I understand this: arm with LPAE can address 40-bit addrs. PHYS_MASK is handy to assert all addresses we expect to be addressable, are. +mem_start = regs[0].addr; +mem_end = mem_start + regs[0].size; + +assert(!(mem_start ~PHYS_MASK) !((mem_end-1) ~PHYS_MASK)); +assert(freemem_start = mem_start freemem_start mem_end); + +__phys_offset = mem_start; /* PHYS_OFFSET */ +__phys_end = mem_end; /* PHYS_END */ and I think the macro indirection (__phys_offset vs. PHYS_OFFSET, __phys_end vs. PHYS_END) is unnecessary: just call the variables phys_offset and phys_end. PHYS_OFFSET is consistent with the kernel naming, so I'd like to keep that. I invented PHYS_END, as it can serve a nice utility (mem_size = PHYS_END - PHYS_OFFSET), and I wouldn't want to leave it as the odd one out by not granting it the privilege of capital letters. drew -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
There are buggy hosts in the wild that advertise invariant TSC and as result host uses TSC as clocksource, but TSC on such host sometimes sporadically jumps backwards. This causes kvmclock to go backwards if host advertises PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock accumulator and returns: pvclock_vcpu_time_info.system_timestamp + offset where 'offset' is calculated using TSC. Since TSC is not virtualized in KVM, it makes guest see TSC jumped backwards and leads to kvmclock going backwards as well. This is defensive patch that keeps per CPU last clock value and ensures that clock will never go backwards even with using PVCLOCK_TSC_STABLE_BIT enabled path. Signed-off-by: Igor Mammedov imamm...@redhat.com --- RHBZ: 1115795 --- arch/x86/kernel/pvclock.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 2f355d2..dd9df0e 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -71,11 +71,14 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src) return flags valid_flags; } +static DEFINE_PER_CPU(cycle_t, last_clock); + cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) { unsigned version; cycle_t ret; - u64 last; + u64 last, *this_cpu_last; + s64 clock_delta; u8 flags; do { @@ -87,6 +90,16 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) pvclock_touch_watchdogs(); } + this_cpu_last = get_cpu_var(last_clock); + clock_delta = ret - *this_cpu_last; + if (likely(clock_delta 0)) { + *this_cpu_last = ret; + } else { + ret = *this_cpu_last; + WARN_ONCE(1, clock went backwards); + } + put_cpu_var(last_clock); + if ((valid_flags PVCLOCK_TSC_STABLE_BIT) (flags PVCLOCK_TSC_STABLE_BIT)) return ret; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
Il 16/07/2014 11:52, Igor Mammedov ha scritto: There are buggy hosts in the wild that advertise invariant TSC and as result host uses TSC as clocksource, but TSC on such host sometimes sporadically jumps backwards. This causes kvmclock to go backwards if host advertises PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock accumulator and returns: pvclock_vcpu_time_info.system_timestamp + offset where 'offset' is calculated using TSC. Since TSC is not virtualized in KVM, it makes guest see TSC jumped backwards and leads to kvmclock going backwards as well. This is defensive patch that keeps per CPU last clock value and ensures that clock will never go backwards even with using PVCLOCK_TSC_STABLE_BIT enabled path. I'm not sure that a per-CPU value is enough; your patch can make the problem much less frequent of course, but I'm not sure neither detection nor correction are 100% reliable. Your addition is basically a faster but less reliable version of the last_value logic. If may be okay to have detection that is faster but not 100% reliable. However, once you find that the host is buggy I think the correct thing to do is to write last_value and kill PVCLOCK_TSC_STABLE_BIT from valid_flags. Did you check that the affected host has the latest microcode? Alternatively, could we simply blacklist some CPU steppings? I'm not sure who we could ask at AMD :( but perhaps there is an erratum. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 09/14] arm: initial drop
Il 16/07/2014 11:39, Andrew Jones ha scritto: PHYS_OFFSET is consistent with the kernel naming, so I'd like to keep that. I invented PHYS_END, as it can serve a nice utility (mem_size = PHYS_END - PHYS_OFFSET), and I wouldn't want to leave it as the odd one out by not granting it the privilege of capital letters. Ok, I see some de-Linuxization coming to the ARM kvm-unit-tests sooner or later, but getting things moving is more important. I'll push the tests as soon as I can try them on the cubietruck. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote: Il 16/07/2014 11:52, Igor Mammedov ha scritto: There are buggy hosts in the wild that advertise invariant TSC and as result host uses TSC as clocksource, but TSC on such host sometimes sporadically jumps backwards. This causes kvmclock to go backwards if host advertises PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock accumulator and returns: pvclock_vcpu_time_info.system_timestamp + offset where 'offset' is calculated using TSC. Since TSC is not virtualized in KVM, it makes guest see TSC jumped backwards and leads to kvmclock going backwards as well. This is defensive patch that keeps per CPU last clock value and ensures that clock will never go backwards even with using PVCLOCK_TSC_STABLE_BIT enabled path. I'm not sure that a per-CPU value is enough; your patch can make the problem much less frequent of course, but I'm not sure neither detection nor correction are 100% reliable. Your addition is basically a faster but less reliable version of the last_value logic. If may be okay to have detection that is faster but not 100% reliable. However, once you find that the host is buggy I think the correct thing to do is to write last_value and kill PVCLOCK_TSC_STABLE_BIT from valid_flags. Did you check that the affected host has the latest microcode? Alternatively, could we simply blacklist some CPU steppings? I'm not sure who we could ask at AMD :( but perhaps there is an erratum. Paolo Igor, Can we move detection to the host TSC clocksource driver ? Because it is responsability of the host system to provide a non backwards clock_gettime() interface as well. How did you prove its the host TSC in fact going backwards? Is it cross-CPU detection? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
On Wed, 16 Jul 2014 08:41:00 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote: Il 16/07/2014 11:52, Igor Mammedov ha scritto: There are buggy hosts in the wild that advertise invariant TSC and as result host uses TSC as clocksource, but TSC on such host sometimes sporadically jumps backwards. This causes kvmclock to go backwards if host advertises PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock accumulator and returns: pvclock_vcpu_time_info.system_timestamp + offset where 'offset' is calculated using TSC. Since TSC is not virtualized in KVM, it makes guest see TSC jumped backwards and leads to kvmclock going backwards as well. This is defensive patch that keeps per CPU last clock value and ensures that clock will never go backwards even with using PVCLOCK_TSC_STABLE_BIT enabled path. I'm not sure that a per-CPU value is enough; your patch can make the problem much less frequent of course, but I'm not sure neither detection nor correction are 100% reliable. Your addition is basically a faster but less reliable version of the last_value logic. How is it less reliable than last_value logic? Alternatively, we can panic in case of backward jump here, so that guest won't hang in random place in case of error. There might not be OOPs but at least coredump will point to a right place. If may be okay to have detection that is faster but not 100% reliable. However, once you find that the host is buggy I think the correct thing to do is to write last_value and kill PVCLOCK_TSC_STABLE_BIT from valid_flags. that might be an option, but what value we need to store into last_value? To make sure that clock won't go back we need to track it on all CPUs and store highest value to last_value, at this point there is no point in switching to last_value path since we have to track per CPU anyway. What this patch doesn't cover is switching from master_clock mode to last_value mode (it happens at CPU hotplug time), I'd need to add what was described above as second patch on top of this one. Did you check that the affected host has the latest microcode? Alternatively, could we simply blacklist some CPU steppings? I'm not sure who we could ask at AMD :( but perhaps there is an erratum. I haven't found anything in this direction yet. I'm still trying to find someone from AMD to look at the issue. Paolo Igor, Can we move detection to the host TSC clocksource driver ? I haven't looked much at host side solution yet, but to detection reliable it needs to be run constantly, from read_native_tsc(). it's possible to put detection into check_system_tsc_reliable() but that would increase boot time and it's not clear for how long test should run to make detection reliable (in my case it takes ~5-10sec to detect first failure). Best we could at boot time is mark TSC as unstable on affected hardware, but for this we need to figure out if it's specific machine or CPU issue to do it properly. (I'm in process of finding out who to bug with it) Because it is responsability of the host system to provide a non backwards clock_gettime() interface as well. vdso_clock_gettime() is not affected since it will use last highest tsc value in case of jump due to usage of vread_tsc(). PS: it appears that host runs stably. but kvm_get_time_and_clockread() is affected since it uses its own version of do_monotonic()-vgettsc() which will lead to cycles go backwards and overflow of nano secs in timespec. We should mimic vread_tsc() here so not to run into this kind of issues. How did you prove its the host TSC in fact going backwards? Is it cross-CPU detection? I've checked with several methods, 1. patched pvclock_clocksource_read() in guest with VCPUs pinned to host CPUs. 2.Ingo's tsc_wrap_test, which fails miserably on affected host. 3 sytemtap script hooked to read_native_tsc(), for source see https://bugzilla.redhat.com/show_bug.cgi?id=1115795#c12 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 12:36 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 16/07/2014 09:10, Daniel Borkmann ha scritto: On 07/16/2014 08:41 AM, Gleb Natapov wrote: On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote: virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. Why can't you use RDRAND instruction for that? You mean using it directly? I think simply for the very same reasons as in c2557a303a ... No, this is very different. This mechanism provides no guarantee that the result contains any actual entropy. In fact, patch 3 adds a call to the new arch_get_slow_rng_u64 just below a call to arch_get_random_lang aka RDRAND. I agree with Gleb that it's simpler to just expect a relatively recent processor and use RDRAND. BTW, the logic for crediting entropy to RDSEED but not RDRAND escapes me. If you trust the processor, you could use Intel's algorithm to force reseeding of RDRAND. If you don't trust the processor, the same paranoia applies to RDRAND and RDSEED. In a guest you must trust the hypervisor anyway to use RDRAND or RDSEED, since the hypervisor can trap it. A malicious hypervisor is no different from a malicious processor. This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND. hpa suggested emulating RDRAND awhile ago, but I think that'll unusably slow -- the kernel uses RDRAND in various places where it's expected to be fast, and not using it at all will be preferable to causing a VM exit for every few bytes. I've been careful to only use this in the guest in places where a few hundred to a few thousand cycles per 64 bits of RNG seed is acceptable. In any case, is there a matching QEMU patch somewhere? What QEMU change is needed? I admit I'm a bit vague on how QEMU and KVM cooperate here, but there's no state to save and restore. I guess that QEMU wants the ability to turn this on and off for migration. How does that work? I couldn't spot the KVM code that allows this type of control. --Andy -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
Il 16/07/2014 15:55, Igor Mammedov ha scritto: On Wed, 16 Jul 2014 08:41:00 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote: Il 16/07/2014 11:52, Igor Mammedov ha scritto: There are buggy hosts in the wild that advertise invariant TSC and as result host uses TSC as clocksource, but TSC on such host sometimes sporadically jumps backwards. This causes kvmclock to go backwards if host advertises PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock accumulator and returns: pvclock_vcpu_time_info.system_timestamp + offset where 'offset' is calculated using TSC. Since TSC is not virtualized in KVM, it makes guest see TSC jumped backwards and leads to kvmclock going backwards as well. This is defensive patch that keeps per CPU last clock value and ensures that clock will never go backwards even with using PVCLOCK_TSC_STABLE_BIT enabled path. I'm not sure that a per-CPU value is enough; your patch can make the problem much less frequent of course, but I'm not sure neither detection nor correction are 100% reliable. Your addition is basically a faster but less reliable version of the last_value logic. How is it less reliable than last_value logic? Suppose CPU 1 is behind by 3 nanoseconds CPU 0 CPU 1 time = 100 (at time 100) time = 99(at time 102) time = 104 (at time 104) time = 105 (at time 108) Your patch will not detect this. If may be okay to have detection that is faster but not 100% reliable. However, once you find that the host is buggy I think the correct thing to do is to write last_value and kill PVCLOCK_TSC_STABLE_BIT from valid_flags. that might be an option, but what value we need to store into last_value? You can write the value that was in the per-CPU variable (not perfect correction)... To make sure that clock won't go back we need to track it on all CPUs and store highest value to last_value, at this point there is no point in switching to last_value path since we have to track per CPU anyway. ... or loop over all CPUs and find the highest value. You would only have to do this once. Can we move detection to the host TSC clocksource driver ? I haven't looked much at host side solution yet, but to detection reliable it needs to be run constantly, from read_native_tsc(). it's possible to put detection into check_system_tsc_reliable() but that would increase boot time and it's not clear for how long test should run to make detection reliable (in my case it takes ~5-10sec to detect first failure). Is 5-10sec the time that it takes for tsc_wrap_test to fail? Best we could at boot time is mark TSC as unstable on affected hardware, but for this we need to figure out if it's specific machine or CPU issue to do it properly. (I'm in process of finding out who to bug with it) Thanks, this would be best. PS: it appears that host runs stably. but kvm_get_time_and_clockread() is affected since it uses its own version of do_monotonic()-vgettsc() which will lead to cycles go backwards and overflow of nano secs in timespec. We should mimic vread_tsc() here so not to run into this kind of issues. I'm not sure I understand, the code is similar: arch/x86/kvm/x86.c arch/x86/vdso/vclock_gettime.c do_monotonicdo_monotonic vgettsc vgetsns read_tscvread_tsc vget_cycles __native_read_tsc __native_read_tsc The VDSO inlines timespec_add_ns. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
Il 16/07/2014 16:07, Andy Lutomirski ha scritto: This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND. Ok. I think an MSR is fine, though I don't think it's useful for the guest to use it if it already has RDRAND and/or RDSEED. In any case, is there a matching QEMU patch somewhere? What QEMU change is needed? I admit I'm a bit vague on how QEMU and KVM cooperate here, but there's no state to save and restore. I guess that QEMU wants the ability to turn this on and off for migration. How does that work? I couldn't spot the KVM code that allows this type of control. It is QEMU who decides the CPUID bits that are visible to the guest. By default it blocks bits that it doesn't know about. You would need to add the bit in the kvm_default_features and kvm_feature_name arrays. For migration, we have versioned machine types, for example pc-2.1. Once the versioned machine type exists, blocking the feature is a one-liner like x86_cpu_compat_disable_kvm_features(FEAT_KVM, KVM_FEATURE_NAME); Unfortunately, QEMU is in hard freeze, so you'd likely be the one creating pc-2.2. This is a boilerplate but relatively complicated patch. But let's cross that bridge when we'll reach it. For now, you can simply add the bit to the two arrays above. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
On Wed, 16 Jul 2014 16:16:17 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Il 16/07/2014 15:55, Igor Mammedov ha scritto: On Wed, 16 Jul 2014 08:41:00 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote: Il 16/07/2014 11:52, Igor Mammedov ha scritto: There are buggy hosts in the wild that advertise invariant TSC and as result host uses TSC as clocksource, but TSC on such host sometimes sporadically jumps backwards. This causes kvmclock to go backwards if host advertises PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock accumulator and returns: pvclock_vcpu_time_info.system_timestamp + offset where 'offset' is calculated using TSC. Since TSC is not virtualized in KVM, it makes guest see TSC jumped backwards and leads to kvmclock going backwards as well. This is defensive patch that keeps per CPU last clock value and ensures that clock will never go backwards even with using PVCLOCK_TSC_STABLE_BIT enabled path. I'm not sure that a per-CPU value is enough; your patch can make the problem much less frequent of course, but I'm not sure neither detection nor correction are 100% reliable. Your addition is basically a faster but less reliable version of the last_value logic. How is it less reliable than last_value logic? Suppose CPU 1 is behind by 3 nanoseconds CPU 0 CPU 1 time = 100 (at time 100) time = 99(at time 102) time = 104 (at time 104) time = 105 (at time 108) Your patch will not detect this. Is it possible for each cpu to have it's own time? If may be okay to have detection that is faster but not 100% reliable. However, once you find that the host is buggy I think the correct thing to do is to write last_value and kill PVCLOCK_TSC_STABLE_BIT from valid_flags. that might be an option, but what value we need to store into last_value? You can write the value that was in the per-CPU variable (not perfect correction)... I'll look at this variant, it's not perfect but it doesn't involve callout to other CPUs! To make sure that clock won't go back we need to track it on all CPUs and store highest value to last_value, at this point there is no point in switching to last_value path since we have to track per CPU anyway. ... or loop over all CPUs and find the highest value. You would only have to do this once. Can we move detection to the host TSC clocksource driver ? I haven't looked much at host side solution yet, but to detection reliable it needs to be run constantly, from read_native_tsc(). it's possible to put detection into check_system_tsc_reliable() but that would increase boot time and it's not clear for how long test should run to make detection reliable (in my case it takes ~5-10sec to detect first failure). Is 5-10sec the time that it takes for tsc_wrap_test tofail? nope, for systemtap script hooked to native_read_tsc(), but it depend on the load for example hotplugging VCPU causes imediate jumps. tsc_wrap_test starts to fail almost imediately, I'll check how much tries it takes to fail for the first time, if it is not too much I guess we could add check to check_system_tsc_reliable(). Best we could at boot time is mark TSC as unstable on affected hardware, but for this we need to figure out if it's specific machine or CPU issue to do it properly. (I'm in process of finding out who to bug with it) Thanks, this would be best. PS: it appears that host runs stably. but kvm_get_time_and_clockread() is affected since it uses its own version of do_monotonic()-vgettsc() which will lead to cycles go backwards and overflow of nano secs in timespec. We should mimic vread_tsc() here so not to run into this kind of issues. I'm not sure I understand, the code is similar: arch/x86/kvm/x86.c arch/x86/vdso/vclock_gettime.c do_monotonicdo_monotonic vgettsc vgetsns read_tscvread_tsc vget_cycles __native_read_tsc __native_read_tsc The VDSO inlines timespec_add_ns. I'm sorry, I haven't looked inside read_tsc() in arch/x86/kvm/x86.c, it's the same as vread_tsc(). Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 04:32:19PM +0200, Paolo Bonzini wrote: Il 16/07/2014 16:07, Andy Lutomirski ha scritto: This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND. Ok. I think an MSR is fine, though I don't think it's useful for the guest to use it if it already has RDRAND and/or RDSEED. Agree. It is unfortunate that we add PV interfaces for a HW that will be extinct in a couple of years though :( -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
Il 16/07/2014 16:51, Igor Mammedov ha scritto: I'm not sure that a per-CPU value is enough; your patch can make the problem much less frequent of course, but I'm not sure neither detection nor correction are 100% reliable. Your addition is basically a faster but less reliable version of the last_value logic. How is it less reliable than last_value logic? Suppose CPU 1 is behind by 3 nanoseconds CPU 0 CPU 1 time = 100 (at time 100) time = 99(at time 102) time = 104 (at time 104) time = 105 (at time 108) Your patch will not detect this. Is it possible for each cpu to have it's own time? Yes, that's one of the reasons for TSC not to be stable (it could also happen simply because the value of TSC_ADJUST MSR is bogus). tsc_wrap_test starts to fail almost imediately, I'll check how much tries it takes to fail for the first time, if it is not too much I guess we could add check to check_system_tsc_reliable(). Thanks! Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
On Wed, 16 Jul 2014 16:55:37 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Il 16/07/2014 16:51, Igor Mammedov ha scritto: I'm not sure that a per-CPU value is enough; your patch can make the problem much less frequent of course, but I'm not sure neither detection nor correction are 100% reliable. Your addition is basically a faster but less reliable version of the last_value logic. How is it less reliable than last_value logic? Suppose CPU 1 is behind by 3 nanoseconds CPU 0 CPU 1 time = 100 (at time 100) time = 99(at time 102) time = 104 (at time 104) time = 105 (at time 108) Your patch will not detect this. Is it possible for each cpu to have it's own time? Yes, that's one of the reasons for TSC not to be stable (it could also happen simply because the value of TSC_ADJUST MSR is bogus). I was wondering not about TSC but kvmclock - sched_clock. If they are percpu and can differ for each CPU than above diagram if fine as far as time on each CPU is monotonic and there is not need to detect that time on each CPU is different. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH qemu] i386,linux-headers: Add support for kvm_get_rng_seed
This updates x86's kvm_para.h for the feature bit definition and target-i386/cpu.c for the feature name and default. Signed-off-by: Andy Lutomirski l...@amacapital.net --- linux-headers/asm-x86/kvm_para.h | 2 ++ target-i386/cpu.c| 5 +++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h index e41c5c1..a9b27ce 100644 --- a/linux-headers/asm-x86/kvm_para.h +++ b/linux-headers/asm-x86/kvm_para.h @@ -24,6 +24,7 @@ #define KVM_FEATURE_STEAL_TIME 5 #define KVM_FEATURE_PV_EOI 6 #define KVM_FEATURE_PV_UNHALT 7 +#define KVM_FEATURE_GET_RNG_SEED 8 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. @@ -40,6 +41,7 @@ #define MSR_KVM_ASYNC_PF_EN 0x4b564d02 #define MSR_KVM_STEAL_TIME 0x4b564d03 #define MSR_KVM_PV_EOI_EN 0x4b564d04 +#define MSR_KVM_GET_RNG_SEED 0x4b564d05 struct kvm_steal_time { __u64 steal; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 8fd1497..4ea7e6c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = { static const char *kvm_feature_name[] = { kvmclock, kvm_nopiodelay, kvm_mmu, kvmclock, kvm_asyncpf, kvm_steal_time, kvm_pv_eoi, kvm_pv_unhalt, -NULL, NULL, NULL, NULL, +kvm_get_rng_seed, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, @@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = { (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_PV_EOI) | -(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT), +(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | +(1 KVM_FEATURE_GET_RNG_SEED), [FEAT_1_ECX] = CPUID_EXT_X2APIC, }; -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 7:32 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 16/07/2014 16:07, Andy Lutomirski ha scritto: This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND. Ok. I think an MSR is fine, though I don't think it's useful for the guest to use it if it already has RDRAND and/or RDSEED. In any case, is there a matching QEMU patch somewhere? What QEMU change is needed? I admit I'm a bit vague on how QEMU and KVM cooperate here, but there's no state to save and restore. I guess that QEMU wants the ability to turn this on and off for migration. How does that work? I couldn't spot the KVM code that allows this type of control. It is QEMU who decides the CPUID bits that are visible to the guest. By default it blocks bits that it doesn't know about. You would need to add the bit in the kvm_default_features and kvm_feature_name arrays. For migration, we have versioned machine types, for example pc-2.1. Once the versioned machine type exists, blocking the feature is a one-liner like x86_cpu_compat_disable_kvm_features(FEAT_KVM, KVM_FEATURE_NAME); Unfortunately, QEMU is in hard freeze, so you'd likely be the one creating pc-2.2. This is a boilerplate but relatively complicated patch. But let's cross that bridge when we'll reach it. For now, you can simply add the bit to the two arrays above. Done. NB: Patch 4 of this series is bad due to an asm constraint issue that I haven't figured out yet. I'll send a replacement once I get it working. *sigh* the x86 kernel loading code is a bit of a compilation mess. Paolo -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 07:07 AM, Andy Lutomirski wrote: This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND. hpa suggested emulating RDRAND awhile ago, but I think that'll unusably slow -- the kernel uses RDRAND in various places where it's expected to be fast, and not using it at all will be preferable to causing a VM exit for every few bytes. I've been careful to only use this in the guest in places where a few hundred to a few thousand cycles per 64 bits of RNG seed is acceptable. I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic feature X will work but will be substantially slower than normal. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
Il 16/07/2014 18:03, H. Peter Anvin ha scritto: I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic feature X will work but will be substantially slower than normal. But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 09:08 AM, Paolo Bonzini wrote: Il 16/07/2014 18:03, H. Peter Anvin ha scritto: I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic feature X will work but will be substantially slower than normal. But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated. OK, so there wasn't any protocol implemented in the end. I sit corrected. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qemu] i386,linux-headers: Add support for kvm_get_rng_seed
Il 16/07/2014 17:52, Andy Lutomirski ha scritto: This updates x86's kvm_para.h for the feature bit definition and target-i386/cpu.c for the feature name and default. Signed-off-by: Andy Lutomirski l...@amacapital.net Thanks, looks good---assuming the kernel side will make it into 3.17, I'll sync the headers once 3.17 is released and then apply the patch. As mentioned in kvm@ someone will have to add the pc-2.2 machine type too. Paolo --- linux-headers/asm-x86/kvm_para.h | 2 ++ target-i386/cpu.c| 5 +++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h index e41c5c1..a9b27ce 100644 --- a/linux-headers/asm-x86/kvm_para.h +++ b/linux-headers/asm-x86/kvm_para.h @@ -24,6 +24,7 @@ #define KVM_FEATURE_STEAL_TIME 5 #define KVM_FEATURE_PV_EOI 6 #define KVM_FEATURE_PV_UNHALT 7 +#define KVM_FEATURE_GET_RNG_SEED 8 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. @@ -40,6 +41,7 @@ #define MSR_KVM_ASYNC_PF_EN 0x4b564d02 #define MSR_KVM_STEAL_TIME 0x4b564d03 #define MSR_KVM_PV_EOI_EN 0x4b564d04 +#define MSR_KVM_GET_RNG_SEED 0x4b564d05 struct kvm_steal_time { __u64 steal; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 8fd1497..4ea7e6c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = { static const char *kvm_feature_name[] = { kvmclock, kvm_nopiodelay, kvm_mmu, kvmclock, kvm_asyncpf, kvm_steal_time, kvm_pv_eoi, kvm_pv_unhalt, -NULL, NULL, NULL, NULL, +kvm_get_rng_seed, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, @@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = { (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_PV_EOI) | -(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT), +(1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | +(1 KVM_FEATURE_GET_RNG_SEED), [FEAT_1_ECX] = CPUID_EXT_X2APIC, }; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote: On 07/16/2014 09:08 AM, Paolo Bonzini wrote: Il 16/07/2014 18:03, H. Peter Anvin ha scritto: I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic feature X will work but will be substantially slower than normal. But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated. OK, so there wasn't any protocol implemented in the end. I sit corrected. That protocol that was implemented is between qemu and kvm, not kvm and a guest. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of host nonblocking entropy. This is independent of virtio-rng for a couple of reasons: - It's intended to be usable during early boot, when a trivial synchronous interface is needed. - virtio-rng gives blocking entropy, and making guest boot wait for the host's /dev/random will cause problems. MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort cryptographically secure data for use as a seed. It provides no guarantee that the result contains any actual entropy. Signed-off-by: Andy Lutomirski l...@amacapital.net --- Documentation/virtual/kvm/cpuid.txt | 3 +++ arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 4 4 files changed, 11 insertions(+), 1 deletion(-) diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 3c65feb..0ab043b 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt @@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit || || before enabling paravirtualized || || spinlock support. -- +KVM_FEATURE_GET_RNG_SEED || 8 || host provides rng seed data via + || || MSR_KVM_GET_RNG_SEED. +-- KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side || || per-cpu warps are expected in || || kvmclock. diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 94dc8ca..e2eaf93 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -24,6 +24,7 @@ #define KVM_FEATURE_STEAL_TIME 5 #define KVM_FEATURE_PV_EOI 6 #define KVM_FEATURE_PV_UNHALT 7 +#define KVM_FEATURE_GET_RNG_SEED 8 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. @@ -40,6 +41,7 @@ #define MSR_KVM_ASYNC_PF_EN 0x4b564d02 #define MSR_KVM_STEAL_TIME 0x4b564d03 #define MSR_KVM_PV_EOI_EN 0x4b564d04 +#define MSR_KVM_GET_RNG_SEED 0x4b564d05 struct kvm_steal_time { __u64 steal; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 38a0afe..40d6763 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_PV_EOI) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | -(1 KVM_FEATURE_PV_UNHALT); +(1 KVM_FEATURE_PV_UNHALT) | +(1 KVM_FEATURE_GET_RNG_SEED); if (sched_info_on()) entry-eax |= (1 KVM_FEATURE_STEAL_TIME); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f644933..4e81853 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -48,6 +48,7 @@ #include linux/pci.h #include linux/timekeeper_internal.h #include linux/pvclock_gtod.h +#include linux/random.h #include trace/events/kvm.h #define CREATE_TRACE_POINTS @@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_KVM_PV_EOI_EN: data = vcpu-arch.pv_eoi.msr_val; break; + case MSR_KVM_GET_RNG_SEED: + get_random_bytes(data, sizeof(data)); + break; case MSR_IA32_P5_MC_ADDR: case MSR_IA32_P5_MC_TYPE: case MSR_IA32_MCG_CAP: -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM. Rather than reinventing all of the cpu feature query code, this fixes native_cpuid to work in PIC objects. I haven't combined it with boot/cpuflags.c's cpuid implementation: including asm/processor.h from boot/cpuflags.c results in a flood of unrelated errors, and fixing it might be messy. Signed-off-by: Andy Lutomirski l...@amacapital.net --- arch/x86/boot/compressed/aslr.c | 27 +++ arch/x86/include/asm/processor.h | 21 ++--- 2 files changed, 45 insertions(+), 3 deletions(-) diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c index fc6091a..8583f0e 100644 --- a/arch/x86/boot/compressed/aslr.c +++ b/arch/x86/boot/compressed/aslr.c @@ -5,6 +5,8 @@ #include asm/archrandom.h #include asm/e820.h +#include uapi/asm/kvm_para.h + #include generated/compile.h #include linux/module.h #include linux/uts.h @@ -15,6 +17,22 @@ static const char build_str[] = UTS_RELEASE ( LINUX_COMPILE_BY @ LINUX_COMPILE_HOST ) ( LINUX_COMPILER ) UTS_VERSION; +static bool kvm_para_has_feature(unsigned int feature) +{ + u32 kvm_base; + u32 features; + + if (!has_cpuflag(X86_FEATURE_HYPERVISOR)) + return false; + + kvm_base = hypervisor_cpuid_base(KVMKVMKVM\0\0\0, KVM_CPUID_FEATURES); + if (!kvm_base) + return false; + + features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES); + return features (1UL feature); +} + #define I8254_PORT_CONTROL 0x43 #define I8254_PORT_COUNTER00x40 #define I8254_CMD_READBACK 0xC0 @@ -81,6 +99,15 @@ static unsigned long get_random_long(void) } } + if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) { + u64 seed; + + debug_putstr( MSR_KVM_GET_RNG_SEED); + rdmsrl(MSR_KVM_GET_RNG_SEED, seed); + random ^= (unsigned long)seed; + use_i8254 = false; + } + if (has_cpuflag(X86_FEATURE_TSC)) { debug_putstr( RDTSC); rdtscll(raw); diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a4ea023..6096f3c 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -189,10 +189,25 @@ static inline int have_cpuid_p(void) static inline void native_cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { - /* ecx is often an input as well as an output. */ - asm volatile(cpuid + /* +* This function can be used from the boot code, so it needs +* to avoid using EBX in constraints in PIC mode. +* +* ecx is often an input as well as an output. +*/ + asm volatile(.ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t +movl %%ebx,%1\n\t +.endif ; .endif \n\t +cpuid \n\t +.ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t +xchgl %%ebx,%1\n\t +.endif ; .endif : =a (*eax), - =b (*ebx), +#if defined(__i386__) defined(__PIC__) + =r (*ebx), /* gcc won't let us use ebx */ +#else + =b (*ebx), /* ebx is okay */ +#endif =c (*ecx), =d (*edx) : 0 (*eax), 2 (*ecx) -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/5] random: Log how many bits we managed to seed with in init_std_data
This is useful for making sure that init_std_data is working correctly and for allaying fear when this happens: random: xyz urandom read with SMALL_NUMBER bits of entropy available Signed-off-by: Andy Lutomirski l...@amacapital.net --- drivers/char/random.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index e2c3d02..10e9642 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r) int i; ktime_t now = ktime_get_real(); unsigned long rv; + int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0; r-last_pulled = jiffies; mix_pool_bytes(r, now, sizeof(now), NULL); for (i = r-poolinfo-poolbytes; i 0; i -= sizeof(rv)) { - if (!arch_get_random_seed_long(rv) - !arch_get_random_long(rv)) + if (arch_get_random_seed_long(rv)) + arch_seed_bits += 8 * sizeof(rv); + else if (arch_get_random_long(rv)) + arch_random_bits += 8 * sizeof(rv); + else rv = random_get_entropy(); mix_pool_bytes(r, rv, sizeof(rv), NULL); } @@ -1265,10 +1269,14 @@ static void init_std_data(struct entropy_store *r) for (i = 0; i 4; i++) { u64 rv64; - if (arch_get_slow_rng_u64(rv64)) + if (arch_get_slow_rng_u64(rv64)) { mix_pool_bytes(r, rv64, sizeof(rv64), NULL); + slow_rng_bits += 8 * sizeof(rv64); } } + + pr_info(random: seeded %s pool with %d bits of arch random seed, %d bits of arch random, and %d bits of arch slow rng\n, + r-name, arch_seed_bits, arch_random_bits, slow_rng_bits); } /* -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/5] random,x86: Add arch_get_slow_rng_u64
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data. Unlike arch_get_random_{bytes,seed}, etc., it makes no claims about entropy content. It's also likely to be much slower and should not be used frequently. That being said, it should be fast enough to call several times during boot without any noticeable slowdown. This initial implementation backs it with MSR_KVM_GET_RNG_SEED if available. The intent is for other hypervisor guest implementations to implement this interface. Signed-off-by: Andy Lutomirski l...@amacapital.net --- arch/x86/Kconfig | 4 arch/x86/include/asm/archslowrng.h | 30 ++ arch/x86/kernel/kvm.c | 22 ++ include/linux/random.h | 9 + 4 files changed, 65 insertions(+) create mode 100644 arch/x86/include/asm/archslowrng.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a8f749e..4dfb539 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -593,6 +593,7 @@ config KVM_GUEST bool KVM Guest support (including kvmclock) depends on PARAVIRT select PARAVIRT_CLOCK + select ARCH_SLOW_RNG default y ---help--- This option enables various optimizations for running under the KVM @@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING config PARAVIRT_CLOCK bool +config ARCH_SLOW_RNG + bool + endif #HYPERVISOR_GUEST config NO_BOOTMEM diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h new file mode 100644 index 000..c8e8d0d --- /dev/null +++ b/arch/x86/include/asm/archslowrng.h @@ -0,0 +1,30 @@ +/* + * This file is part of the Linux kernel. + * + * Copyright (c) 2014 Andy Lutomirski + * Authors: Andy Lutomirski l...@amacapital.net + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef ASM_X86_ARCHSLOWRANDOM_H +#define ASM_X86_ARCHSLOWRANDOM_H + +#ifndef CONFIG_ARCH_SLOW_RNG +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG +#endif + +/* + * Performance is irrelevant here, so there's no point in using the + * paravirt ops mechanism. Instead just use a function pointer. + */ +extern int (*arch_get_slow_rng_u64)(u64 *v); + +#endif /* ASM_X86_ARCHSLOWRANDOM_H */ diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 3dd8e2c..8d64d28 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -416,6 +416,25 @@ void kvm_disable_steal_time(void) wrmsr(MSR_KVM_STEAL_TIME, 0, 0); } +static int nop_get_slow_rng_u64(u64 *v) +{ + return 0; +} + +static int kvm_get_slow_rng_u64(u64 *v) +{ + /* +* Allow migration from a hypervisor with the GET_RNG_SEED +* feature to a hypervisor without it. +*/ + if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0) + return 1; + else + return 0; +} + +int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64; + #ifdef CONFIG_SMP static void __init kvm_smp_prepare_boot_cpu(void) { @@ -493,6 +512,9 @@ void __init kvm_guest_init(void) if (kvmclock_vsyscall) kvm_setup_vsyscall_timeinfo(); + if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) + arch_get_slow_rng_u64 = kvm_get_slow_rng_u64; + #ifdef CONFIG_SMP smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; register_cpu_notifier(kvm_cpu_notifier); diff --git a/include/linux/random.h b/include/linux/random.h index 57fbbff..ceafbcf 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void) } #endif +#ifdef CONFIG_ARCH_SLOW_RNG +# include asm/archslowrng.h +#else +static inline int arch_get_slow_rng_u64(u64 *v) +{ + return 0; +} +#endif + /* Pseudo random number generator from numerical recipes. */ static inline u32 next_pseudo_random32(u32 seed) { -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup
This should help solve the problem of guests starting out with predictable RNG state. Signed-off-by: Andy Lutomirski l...@amacapital.net --- drivers/char/random.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/char/random.c b/drivers/char/random.c index 0a7ac0a..e2c3d02 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1261,6 +1261,14 @@ static void init_std_data(struct entropy_store *r) mix_pool_bytes(r, rv, sizeof(rv), NULL); } mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL); + + for (i = 0; i 4; i++) { + u64 rv64; + + if (arch_get_slow_rng_u64(rv64)) + mix_pool_bytes(r, rv64, sizeof(rv64), NULL); + } + } } /* -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. I sent the corresponding kvm-unit-tests and qemu changes separately. There's room for bikeshedding on the same arch_get_slow_rng_u64. I considered arch_get_rng_seed_u64, but that could be confused with arch_get_random_seed_long, which is not interchangeable. Changes from v1: - Split patches 2 and 3 - Log all arch sources in init_std_data - Fix the 32-bit kaslr build Andy Lutomirski (5): x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit random,x86: Add arch_get_slow_rng_u64 random: Seed pools from arch_get_slow_rng_u64 at startup random: Log how many bits we managed to seed with in init_std_data x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available Documentation/virtual/kvm/cpuid.txt | 3 +++ arch/x86/Kconfig | 4 arch/x86/boot/compressed/aslr.c | 27 +++ arch/x86/include/asm/archslowrng.h | 30 ++ arch/x86/include/asm/processor.h | 21 ++--- arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kernel/kvm.c| 22 ++ arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 4 drivers/char/random.c| 20 ++-- include/linux/random.h | 9 + 11 files changed, 139 insertions(+), 6 deletions(-) create mode 100644 arch/x86/include/asm/archslowrng.h -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
Andy Lutomirski l...@amacapital.net writes: virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. Whoa! the cover letter seems more like virtio-rng bashing rather than introduction to the patchset (and/or it's advantages over existing methods) :) That's ok though I guess, these won't be in the commit log. I sent the corresponding kvm-unit-tests and qemu changes separately. There's room for bikeshedding on the same arch_get_slow_rng_u64. I considered arch_get_rng_seed_u64, but that could be confused with arch_get_random_seed_long, which is not interchangeable. Changes from v1: - Split patches 2 and 3 - Log all arch sources in init_std_data - Fix the 32-bit kaslr build Andy Lutomirski (5): x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit random,x86: Add arch_get_slow_rng_u64 random: Seed pools from arch_get_slow_rng_u64 at startup random: Log how many bits we managed to seed with in init_std_data x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available Documentation/virtual/kvm/cpuid.txt | 3 +++ arch/x86/Kconfig | 4 arch/x86/boot/compressed/aslr.c | 27 +++ arch/x86/include/asm/archslowrng.h | 30 ++ arch/x86/include/asm/processor.h | 21 ++--- arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kernel/kvm.c| 22 ++ arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 4 drivers/char/random.c| 20 ++-- include/linux/random.h | 9 + 11 files changed, 139 insertions(+), 6 deletions(-) create mode 100644 arch/x86/include/asm/archslowrng.h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 11:02 AM, Bandan Das b...@redhat.com wrote: Andy Lutomirski l...@amacapital.net writes: virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts. This introduces a very simple synchronous mechanism to get /dev/urandom-style bits. Whoa! the cover letter seems more like virtio-rng bashing rather than introduction to the patchset (and/or it's advantages over existing methods) :) That's ok though I guess, these won't be in the commit log. Yeah, sorry -- I figured that the biggest objection would be just use virtio-rng. I'll send a v3 later today -- there's a trivial bisectability bug in this version. --Andy I sent the corresponding kvm-unit-tests and qemu changes separately. There's room for bikeshedding on the same arch_get_slow_rng_u64. I considered arch_get_rng_seed_u64, but that could be confused with arch_get_random_seed_long, which is not interchangeable. Changes from v1: - Split patches 2 and 3 - Log all arch sources in init_std_data - Fix the 32-bit kaslr build Andy Lutomirski (5): x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit random,x86: Add arch_get_slow_rng_u64 random: Seed pools from arch_get_slow_rng_u64 at startup random: Log how many bits we managed to seed with in init_std_data x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available Documentation/virtual/kvm/cpuid.txt | 3 +++ arch/x86/Kconfig | 4 arch/x86/boot/compressed/aslr.c | 27 +++ arch/x86/include/asm/archslowrng.h | 30 ++ arch/x86/include/asm/processor.h | 21 ++--- arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kernel/kvm.c| 22 ++ arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 4 drivers/char/random.c| 20 ++-- include/linux/random.h | 9 + 11 files changed, 139 insertions(+), 6 deletions(-) create mode 100644 arch/x86/include/asm/archslowrng.h -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] vfio-pci: Release devices with BusMaster disabled
Our current open/release path looks like this: vfio_pci_open vfio_pci_enable pci_enable_device pci_save_state pci_store_saved_state vfio_pci_release vfio_pci_disable pci_disable_device pci_restore_state pci_enable_device() doesn't modify PCI_COMMAND_MASTER, so if a device comes to us with it enabled, it persists through the open and gets stored as part of the device saved state. We then restore that saved state when released, which can allow the device to attempt to continue to do DMA. When the group is disconnected from the domain, this will get caught by the IOMMU, but if there are other devices in the group, the device may continue running and interfere with the user. Even in the former case, IOMMUs don't necessarily behave well and a stream of blocked DMA can result in unpleasant behavior on the host. Explicitly disable Bus Master as we're enabling the device and slightly re-work release to make sure that pci_disable_device() is the last thing that touches the device. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/vfio/pci/vfio_pci.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 010e0f8..36d8332 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -44,6 +44,9 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev) u16 cmd; u8 msix_pos; + /* Don't allow our initial saved state to include busmaster */ + pci_clear_master(pdev); + ret = pci_enable_device(pdev); if (ret) return ret; @@ -99,7 +102,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) struct pci_dev *pdev = vdev-pdev; int bar; - pci_disable_device(pdev); + /* Stop the device from further DMA */ + pci_clear_master(pdev); vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER, @@ -128,7 +132,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) __func__, dev_name(pdev-dev)); if (!vdev-reset_works) - return; + goto out; pci_save_state(pdev); } @@ -151,6 +155,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) } pci_restore_state(pdev); +out: + pci_disable_device(pdev); } static void vfio_pci_release(void *device_data) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] vfio-pci: Attempt bus/slot reset on release
Each time a device is released, mark whether a local reset was successful or whether a bus/slot reset is needed. If a reset is needed and all of the affected devices are bound to vfio-pci and unused, allow the reset. This is most useful when the userspace driver is killed and releases all the devices in an unclean state, such as when a QEMU VM quits. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/vfio/pci/vfio_pci.c | 112 +++ drivers/vfio/pci/vfio_pci_private.h |1 2 files changed, 113 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index c4949a7..f95b90f 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -39,6 +39,8 @@ MODULE_PARM_DESC(nointxmask, static DEFINE_MUTEX(driver_lock); +static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); + static int vfio_pci_enable(struct vfio_pci_device *vdev) { struct pci_dev *pdev = vdev-pdev; @@ -123,6 +125,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) vdev-barmap[bar] = NULL; } + vdev-needs_reset = true; + /* * If we have saved state, restore it. If we can reset the device, * even better. Resetting with current state seems better than @@ -154,11 +158,15 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) if (ret) pr_warn(%s: Failed to reset device %s (%d)\n, __func__, dev_name(pdev-dev), ret); + else + vdev-needs_reset = false; } pci_restore_state(pdev); out: pci_disable_device(pdev); + + vfio_pci_try_bus_reset(vdev); } static void vfio_pci_release(void *device_data) @@ -917,6 +925,110 @@ static struct pci_driver vfio_pci_driver = { .err_handler= vfio_err_handlers, }; +/* + * Test whether a reset is necessary and possible. We mark devices as + * needs_reset when they are released, but don't have a function-local reset + * available. If any of these exist in the affected devices, we want to do + * a bus/slot reset. We also need all of the affected devices to be unused, + * so we abort if any device has a non-zero refcnt. driver_lock prevents a + * device from being opened during the scan or unbound from vfio-pci. + */ +static int vfio_pci_test_bus_reset(struct pci_dev *pdev, void *data) +{ + bool *needs_reset = data; + struct pci_driver *pci_drv = ACCESS_ONCE(pdev-driver); + int ret = -EBUSY; + + if (pci_drv == vfio_pci_driver) { + struct vfio_device *device; + struct vfio_pci_device *vdev; + + device = vfio_device_get_from_dev(pdev-dev); + if (!device) + return ret; + + vdev = vfio_device_data(device); + if (vdev) { + if (vdev-needs_reset) + *needs_reset = true; + + if (!vdev-refcnt) + ret = 0; + } + + vfio_device_put(device); + } + + /* +* TODO: vfio-core considers groups to be viable even if some devices +* are attached to known drivers, like pci-stub or pcieport. We can't +* freeze devices from being unbound to those drivers like we can +* here though, so it would be racy to test for them. We also can't +* use device_lock() to prevent changes as that would interfere with +* PCI-core taking device_lock during bus reset. For now, we require +* devices to be bound to vfio-pci to get a bus/slot reset on release. +*/ + + return ret; +} + +/* Clear needs_reset on all affected devices after successful bus/slot reset */ +static int vfio_pci_clear_needs_reset(struct pci_dev *pdev, void *data) +{ + struct pci_driver *pci_drv = ACCESS_ONCE(pdev-driver); + + if (pci_drv == vfio_pci_driver) { + struct vfio_device *device; + struct vfio_pci_device *vdev; + + device = vfio_device_get_from_dev(pdev-dev); + if (!device) + return 0; + + vdev = vfio_device_data(device); + if (vdev) + vdev-needs_reset = false; + + vfio_device_put(device); + } + + return 0; +} + +/* + * Attempt to do a bus/slot reset if there are devices affected by a reset for + * this device that are needs_reset and all of the affected devices are unused + * (!refcnt). Callers of this function are required to hold driver_lock such + * that devices can not be unbound from vfio-pci or opened by a user while we + * test for and perform a bus/slot reset. + */ +static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) +{ + bool needs_reset = false, slot = false; + int ret; + + if
[PATCH 0/3] vfio-pci: Reset improvements
This series is intended to improve the state of devices returned back to the host from vfio-pci or re-used by another user. First we make sure that busmaster is disabled in the saved state, so the device cannot continue to do DMA, then we add some serialization, move our reference counting under it to fix an unlikely bug should we fail to initialize a device, and add the ability to do bus/slot reset on device release. To do this, we require all devices affected by the bus/slot reset to be bound to vfio-pci, therefore users sequestering devices with pci-stub will need to bind them to vfio-pci to see this change. The effect of these changes are perhaps most noticeable with GPU assignment to a VM, where killing QEMU results in a static image on the framebuffer since no reset of the device was done. Returning the GPU to a host device at this point was suspect. Other devices, like USB controllers, also don't necessarily appreciate being abruptly disconnected from their IOMMU domain and would generate IOMMU faults in the event the user process is killed. Both of these cases should be resolved here, assuming all the devices on the bus are bound to vfio-pci and at least one of the devices in use does not support a function-local reset. Please test and comment. Thanks, Alex --- Alex Williamson (3): vfio-pci: Attempt bus/slot reset on release vfio-pci: Use mutex around open, release, and remove vfio-pci: Release devices with BusMaster disabled drivers/vfio/pci/vfio_pci.c | 157 --- drivers/vfio/pci/vfio_pci_private.h |3 - 2 files changed, 147 insertions(+), 13 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] vfio-pci: Use mutex around open, release, and remove
Serializing open/release allows us to fix a refcnt error if we fail to enable the device and lets us prevent devices from being unbound or opened, giving us an opportunity to do bus resets on release. No restriction added to serialize binding devices to vfio-pci while the mutex is held though. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/vfio/pci/vfio_pci.c | 35 +-- drivers/vfio/pci/vfio_pci_private.h |2 +- 2 files changed, 26 insertions(+), 11 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 36d8332..c4949a7 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -37,6 +37,8 @@ module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(nointxmask, Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-...@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.); +static DEFINE_MUTEX(driver_lock); + static int vfio_pci_enable(struct vfio_pci_device *vdev) { struct pci_dev *pdev = vdev-pdev; @@ -163,28 +165,39 @@ static void vfio_pci_release(void *device_data) { struct vfio_pci_device *vdev = device_data; - if (atomic_dec_and_test(vdev-refcnt)) + mutex_lock(driver_lock); + + if (!(--vdev-refcnt)) vfio_pci_disable(vdev); + mutex_unlock(driver_lock); + module_put(THIS_MODULE); } static int vfio_pci_open(void *device_data) { struct vfio_pci_device *vdev = device_data; + int ret = 0; if (!try_module_get(THIS_MODULE)) return -ENODEV; - if (atomic_inc_return(vdev-refcnt) == 1) { - int ret = vfio_pci_enable(vdev); + mutex_lock(driver_lock); + + if (!vdev-refcnt) { + ret = vfio_pci_enable(vdev); if (ret) { module_put(THIS_MODULE); - return ret; + goto unlock; } } + vdev-refcnt++; - return 0; +unlock: + mutex_unlock(driver_lock); + + return ret; } static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) @@ -839,7 +852,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) vdev-irq_type = VFIO_PCI_NUM_IRQS; mutex_init(vdev-igate); spin_lock_init(vdev-irqlock); - atomic_set(vdev-refcnt, 0); ret = vfio_add_group_dev(pdev-dev, vfio_pci_ops, vdev); if (ret) { @@ -854,12 +866,15 @@ static void vfio_pci_remove(struct pci_dev *pdev) { struct vfio_pci_device *vdev; + mutex_lock(driver_lock); + vdev = vfio_del_group_dev(pdev-dev); - if (!vdev) - return; + if (vdev) { + iommu_group_put(pdev-dev.iommu_group); + kfree(vdev); + } - iommu_group_put(pdev-dev.iommu_group); - kfree(vdev); + mutex_unlock(driver_lock); } static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 9c6d5d0..31e7a30 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -55,7 +55,7 @@ struct vfio_pci_device { boolbardirty; boolhas_vga; struct pci_saved_state *pci_saved_state; - atomic_trefcnt; + int refcnt; struct eventfd_ctx *err_trigger; }; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 09:21 AM, Gleb Natapov wrote: On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote: On 07/16/2014 09:08 AM, Paolo Bonzini wrote: Il 16/07/2014 18:03, H. Peter Anvin ha scritto: I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic feature X will work but will be substantially slower than normal. But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated. OK, so there wasn't any protocol implemented in the end. I sit corrected. That protocol that was implemented is between qemu and kvm, not kvm and a guest. Either which way, the notion was to have a PV CPUID bit like the proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND. The biggest reason to *not* do this would be that with an MSR it is not available to guest user space, which may be better under the circumstances. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch V2 43/64] x86: kvm: Use ktime_get_boot_ns()
Use the new nanoseconds based interface and get rid of the timespec conversion dance. Signed-off-by: Thomas Gleixner t...@linutronix.de Cc: Gleb Natapov g...@kernel.org Cc: kvm@vger.kernel.org --- arch/x86/kvm/x86.c |6 +- 1 file changed, 1 insertion(+), 5 deletions(-) Index: tip/arch/x86/kvm/x86.c === --- tip.orig/arch/x86/kvm/x86.c +++ tip/arch/x86/kvm/x86.c @@ -1109,11 +1109,7 @@ static void kvm_get_time_scale(uint32_t static inline u64 get_kernel_ns(void) { - struct timespec ts; - - ktime_get_ts(ts); - monotonic_to_bootbased(ts); - return timespec_to_ns(ts); + return ktime_get_boot_ns(); } #ifdef CONFIG_X86_64 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch V2 44/64] x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based
Convert the relevant base data right away to nanoseconds instead of doing the conversion on every readout. Reduces text size by 160 bytes. Signed-off-by: Thomas Gleixner t...@linutronix.de Cc: Gleb Natapov g...@kernel.org Cc: kvm@vger.kernel.org --- arch/x86/kvm/x86.c | 44 ++-- 1 file changed, 14 insertions(+), 30 deletions(-) Index: tip/arch/x86/kvm/x86.c === --- tip.orig/arch/x86/kvm/x86.c +++ tip/arch/x86/kvm/x86.c @@ -984,9 +984,8 @@ struct pvclock_gtod_data { u32 shift; } clock; - /* open coded 'struct timespec' */ - u64 monotonic_time_snsec; - time_t monotonic_time_sec; + u64 boot_ns; + u64 nsec_base; }; static struct pvclock_gtod_data pvclock_gtod_data; @@ -994,6 +993,9 @@ static struct pvclock_gtod_data pvclock_ static void update_pvclock_gtod(struct timekeeper *tk) { struct pvclock_gtod_data *vdata = pvclock_gtod_data; + u64 boot_ns; + + boot_ns = ktime_to_ns(ktime_add(tk-base_mono, tk-offs_boot)); write_seqcount_begin(vdata-seq); @@ -1004,17 +1006,8 @@ static void update_pvclock_gtod(struct t vdata-clock.mult = tk-mult; vdata-clock.shift = tk-shift; - vdata-monotonic_time_sec = tk-xtime_sec - + tk-wall_to_monotonic.tv_sec; - vdata-monotonic_time_snsec = tk-xtime_nsec - + (tk-wall_to_monotonic.tv_nsec -tk-shift); - while (vdata-monotonic_time_snsec = - (((u64)NSEC_PER_SEC) tk-shift)) { - vdata-monotonic_time_snsec -= - ((u64)NSEC_PER_SEC) tk-shift; - vdata-monotonic_time_sec++; - } + vdata-boot_ns = boot_ns; + vdata-nsec_base= tk-xtime_nsec; write_seqcount_end(vdata-seq); } @@ -1371,23 +1364,22 @@ static inline u64 vgettsc(cycle_t *cycle return v * gtod-clock.mult; } -static int do_monotonic(struct timespec *ts, cycle_t *cycle_now) +static int do_monotonic_boot(s64 *t, cycle_t *cycle_now) { + struct pvclock_gtod_data *gtod = pvclock_gtod_data; unsigned long seq; - u64 ns; int mode; - struct pvclock_gtod_data *gtod = pvclock_gtod_data; + u64 ns; - ts-tv_nsec = 0; do { seq = read_seqcount_begin(gtod-seq); mode = gtod-clock.vclock_mode; - ts-tv_sec = gtod-monotonic_time_sec; - ns = gtod-monotonic_time_snsec; + ns = gtod-nsec_base; ns += vgettsc(cycle_now); ns = gtod-clock.shift; + ns += gtod-boot_ns; } while (unlikely(read_seqcount_retry(gtod-seq, seq))); - timespec_add_ns(ts, ns); + *t = ns; return mode; } @@ -1395,19 +1387,11 @@ static int do_monotonic(struct timespec /* returns true if host is using tsc clocksource */ static bool kvm_get_time_and_clockread(s64 *kernel_ns, cycle_t *cycle_now) { - struct timespec ts; - /* checked again under seqlock below */ if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC) return false; - if (do_monotonic(ts, cycle_now) != VCLOCK_TSC) - return false; - - monotonic_to_bootbased(ts); - *kernel_ns = timespec_to_ns(ts); - - return true; + return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC; } #endif -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 1:20 PM, H. Peter Anvin h...@zytor.com wrote: On 07/16/2014 09:21 AM, Gleb Natapov wrote: On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote: On 07/16/2014 09:08 AM, Paolo Bonzini wrote: Il 16/07/2014 18:03, H. Peter Anvin ha scritto: I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic feature X will work but will be substantially slower than normal. But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated. OK, so there wasn't any protocol implemented in the end. I sit corrected. That protocol that was implemented is between qemu and kvm, not kvm and a guest. Either which way, the notion was to have a PV CPUID bit like the proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND. The biggest reason to *not* do this would be that with an MSR it is not available to guest user space, which may be better under the circumstances. On the theory that I see no legitimate reason to expose this to guest user space, I think we shouldn't expose it. If we wanted to add a get_random_bytes syscall, that would be an entirely different story, though. Should I send v3 as one series or should I split it into host and guest parts? --Andy -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 02:32 PM, Andy Lutomirski wrote: On the theory that I see no legitimate reason to expose this to guest user space, I think we shouldn't expose it. If we wanted to add a get_random_bytes syscall, that would be an entirely different story, though. Should I send v3 as one series or should I split it into host and guest parts? It doesn't matter... as long as they are separate *patches*. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
This introduces and uses a very simple synchronous mechanism to get /dev/urandom-style bits appropriate for initial KVM PV guest RNG seeding. virtio-rng is not suitable for this purpose. It's too difficult to enumerate for use in early boot (e.g. KASLR, which runs before we even have an IDT). It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might still be predictable when userspace starts. I sent the corresponding kvm-unit-tests and qemu changes separately. There's room for bikeshedding on the same arch_get_slow_rng_u64. I considered arch_get_rng_seed_u64, but that could be confused with arch_get_random_seed_long, which is not interchangeable. Changes from v2: - Bisection fix (patch 2 had a misplaced brace). The final states is identical to that of v2. - Improve the 0/5 description a little bit. Changes from v1: - Split patches 2 and 3 - Log all arch sources in init_std_data - Fix the 32-bit kaslr build Andy Lutomirski (5): x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit random,x86: Add arch_get_slow_rng_u64 random: Seed pools from arch_get_slow_rng_u64 at startup random: Log how many bits we managed to seed with in init_std_data x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available Documentation/virtual/kvm/cpuid.txt | 3 +++ arch/x86/Kconfig | 4 arch/x86/boot/compressed/aslr.c | 27 +++ arch/x86/include/asm/archslowrng.h | 30 ++ arch/x86/include/asm/processor.h | 21 ++--- arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kernel/kvm.c| 22 ++ arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 4 drivers/char/random.c| 20 ++-- include/linux/random.h | 9 + 11 files changed, 139 insertions(+), 6 deletions(-) create mode 100644 arch/x86/include/asm/archslowrng.h -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup
This should help solve the problem of guests starting out with predictable RNG state. Signed-off-by: Andy Lutomirski l...@amacapital.net --- drivers/char/random.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/char/random.c b/drivers/char/random.c index 0a7ac0a..17ad33d 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1261,6 +1261,13 @@ static void init_std_data(struct entropy_store *r) mix_pool_bytes(r, rv, sizeof(rv), NULL); } mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL); + + for (i = 0; i 4; i++) { + u64 rv64; + + if (arch_get_slow_rng_u64(rv64)) + mix_pool_bytes(r, rv64, sizeof(rv64), NULL); + } } /* -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data. Unlike arch_get_random_{bytes,seed}, etc., it makes no claims about entropy content. It's also likely to be much slower and should not be used frequently. That being said, it should be fast enough to call several times during boot without any noticeable slowdown. This initial implementation backs it with MSR_KVM_GET_RNG_SEED if available. The intent is for other hypervisor guest implementations to implement this interface. Signed-off-by: Andy Lutomirski l...@amacapital.net --- arch/x86/Kconfig | 4 arch/x86/include/asm/archslowrng.h | 30 ++ arch/x86/kernel/kvm.c | 22 ++ include/linux/random.h | 9 + 4 files changed, 65 insertions(+) create mode 100644 arch/x86/include/asm/archslowrng.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a8f749e..4dfb539 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -593,6 +593,7 @@ config KVM_GUEST bool KVM Guest support (including kvmclock) depends on PARAVIRT select PARAVIRT_CLOCK + select ARCH_SLOW_RNG default y ---help--- This option enables various optimizations for running under the KVM @@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING config PARAVIRT_CLOCK bool +config ARCH_SLOW_RNG + bool + endif #HYPERVISOR_GUEST config NO_BOOTMEM diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h new file mode 100644 index 000..c8e8d0d --- /dev/null +++ b/arch/x86/include/asm/archslowrng.h @@ -0,0 +1,30 @@ +/* + * This file is part of the Linux kernel. + * + * Copyright (c) 2014 Andy Lutomirski + * Authors: Andy Lutomirski l...@amacapital.net + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef ASM_X86_ARCHSLOWRANDOM_H +#define ASM_X86_ARCHSLOWRANDOM_H + +#ifndef CONFIG_ARCH_SLOW_RNG +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG +#endif + +/* + * Performance is irrelevant here, so there's no point in using the + * paravirt ops mechanism. Instead just use a function pointer. + */ +extern int (*arch_get_slow_rng_u64)(u64 *v); + +#endif /* ASM_X86_ARCHSLOWRANDOM_H */ diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 3dd8e2c..8d64d28 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -416,6 +416,25 @@ void kvm_disable_steal_time(void) wrmsr(MSR_KVM_STEAL_TIME, 0, 0); } +static int nop_get_slow_rng_u64(u64 *v) +{ + return 0; +} + +static int kvm_get_slow_rng_u64(u64 *v) +{ + /* +* Allow migration from a hypervisor with the GET_RNG_SEED +* feature to a hypervisor without it. +*/ + if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0) + return 1; + else + return 0; +} + +int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64; + #ifdef CONFIG_SMP static void __init kvm_smp_prepare_boot_cpu(void) { @@ -493,6 +512,9 @@ void __init kvm_guest_init(void) if (kvmclock_vsyscall) kvm_setup_vsyscall_timeinfo(); + if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) + arch_get_slow_rng_u64 = kvm_get_slow_rng_u64; + #ifdef CONFIG_SMP smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; register_cpu_notifier(kvm_cpu_notifier); diff --git a/include/linux/random.h b/include/linux/random.h index 57fbbff..ceafbcf 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void) } #endif +#ifdef CONFIG_ARCH_SLOW_RNG +# include asm/archslowrng.h +#else +static inline int arch_get_slow_rng_u64(u64 *v) +{ + return 0; +} +#endif + /* Pseudo random number generator from numerical recipes. */ static inline u32 next_pseudo_random32(u32 seed) { -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM. Rather than reinventing all of the cpu feature query code, this fixes native_cpuid to work in PIC objects. I haven't combined it with boot/cpuflags.c's cpuid implementation: including asm/processor.h from boot/cpuflags.c results in a flood of unrelated errors, and fixing it might be messy. Signed-off-by: Andy Lutomirski l...@amacapital.net --- arch/x86/boot/compressed/aslr.c | 27 +++ arch/x86/include/asm/processor.h | 21 ++--- 2 files changed, 45 insertions(+), 3 deletions(-) diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c index fc6091a..8583f0e 100644 --- a/arch/x86/boot/compressed/aslr.c +++ b/arch/x86/boot/compressed/aslr.c @@ -5,6 +5,8 @@ #include asm/archrandom.h #include asm/e820.h +#include uapi/asm/kvm_para.h + #include generated/compile.h #include linux/module.h #include linux/uts.h @@ -15,6 +17,22 @@ static const char build_str[] = UTS_RELEASE ( LINUX_COMPILE_BY @ LINUX_COMPILE_HOST ) ( LINUX_COMPILER ) UTS_VERSION; +static bool kvm_para_has_feature(unsigned int feature) +{ + u32 kvm_base; + u32 features; + + if (!has_cpuflag(X86_FEATURE_HYPERVISOR)) + return false; + + kvm_base = hypervisor_cpuid_base(KVMKVMKVM\0\0\0, KVM_CPUID_FEATURES); + if (!kvm_base) + return false; + + features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES); + return features (1UL feature); +} + #define I8254_PORT_CONTROL 0x43 #define I8254_PORT_COUNTER00x40 #define I8254_CMD_READBACK 0xC0 @@ -81,6 +99,15 @@ static unsigned long get_random_long(void) } } + if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) { + u64 seed; + + debug_putstr( MSR_KVM_GET_RNG_SEED); + rdmsrl(MSR_KVM_GET_RNG_SEED, seed); + random ^= (unsigned long)seed; + use_i8254 = false; + } + if (has_cpuflag(X86_FEATURE_TSC)) { debug_putstr( RDTSC); rdtscll(raw); diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a4ea023..6096f3c 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -189,10 +189,25 @@ static inline int have_cpuid_p(void) static inline void native_cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { - /* ecx is often an input as well as an output. */ - asm volatile(cpuid + /* +* This function can be used from the boot code, so it needs +* to avoid using EBX in constraints in PIC mode. +* +* ecx is often an input as well as an output. +*/ + asm volatile(.ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t +movl %%ebx,%1\n\t +.endif ; .endif \n\t +cpuid \n\t +.ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t +xchgl %%ebx,%1\n\t +.endif ; .endif : =a (*eax), - =b (*ebx), +#if defined(__i386__) defined(__PIC__) + =r (*ebx), /* gcc won't let us use ebx */ +#else + =b (*ebx), /* ebx is okay */ +#endif =c (*ecx), =d (*edx) : 0 (*eax), 2 (*ecx) -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/5] random: Log how many bits we managed to seed with in init_std_data
This is useful for making sure that init_std_data is working correctly and for allaying fear when this happens: random: xyz urandom read with SMALL_NUMBER bits of entropy available Signed-off-by: Andy Lutomirski l...@amacapital.net --- drivers/char/random.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 17ad33d..10e9642 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r) int i; ktime_t now = ktime_get_real(); unsigned long rv; + int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0; r-last_pulled = jiffies; mix_pool_bytes(r, now, sizeof(now), NULL); for (i = r-poolinfo-poolbytes; i 0; i -= sizeof(rv)) { - if (!arch_get_random_seed_long(rv) - !arch_get_random_long(rv)) + if (arch_get_random_seed_long(rv)) + arch_seed_bits += 8 * sizeof(rv); + else if (arch_get_random_long(rv)) + arch_random_bits += 8 * sizeof(rv); + else rv = random_get_entropy(); mix_pool_bytes(r, rv, sizeof(rv), NULL); } @@ -1265,9 +1269,14 @@ static void init_std_data(struct entropy_store *r) for (i = 0; i 4; i++) { u64 rv64; - if (arch_get_slow_rng_u64(rv64)) + if (arch_get_slow_rng_u64(rv64)) { mix_pool_bytes(r, rv64, sizeof(rv64), NULL); + slow_rng_bits += 8 * sizeof(rv64); + } } + + pr_info(random: seeded %s pool with %d bits of arch random seed, %d bits of arch random, and %d bits of arch slow rng\n, + r-name, arch_seed_bits, arch_random_bits, slow_rng_bits); } /* -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of host nonblocking entropy. This is independent of virtio-rng for a couple of reasons: - It's intended to be usable during early boot, when a trivial synchronous interface is needed. - virtio-rng gives blocking entropy, and making guest boot wait for the host's /dev/random will cause problems. MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort cryptographically secure data for use as a seed. It provides no guarantee that the result contains any actual entropy. Signed-off-by: Andy Lutomirski l...@amacapital.net --- Documentation/virtual/kvm/cpuid.txt | 3 +++ arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 4 4 files changed, 11 insertions(+), 1 deletion(-) diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 3c65feb..0ab043b 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt @@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit || || before enabling paravirtualized || || spinlock support. -- +KVM_FEATURE_GET_RNG_SEED || 8 || host provides rng seed data via + || || MSR_KVM_GET_RNG_SEED. +-- KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side || || per-cpu warps are expected in || || kvmclock. diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 94dc8ca..e2eaf93 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -24,6 +24,7 @@ #define KVM_FEATURE_STEAL_TIME 5 #define KVM_FEATURE_PV_EOI 6 #define KVM_FEATURE_PV_UNHALT 7 +#define KVM_FEATURE_GET_RNG_SEED 8 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. @@ -40,6 +41,7 @@ #define MSR_KVM_ASYNC_PF_EN 0x4b564d02 #define MSR_KVM_STEAL_TIME 0x4b564d03 #define MSR_KVM_PV_EOI_EN 0x4b564d04 +#define MSR_KVM_GET_RNG_SEED 0x4b564d05 struct kvm_steal_time { __u64 steal; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 38a0afe..40d6763 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_PV_EOI) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | -(1 KVM_FEATURE_PV_UNHALT); +(1 KVM_FEATURE_PV_UNHALT) | +(1 KVM_FEATURE_GET_RNG_SEED); if (sched_info_on()) entry-eax |= (1 KVM_FEATURE_STEAL_TIME); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f644933..4e81853 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -48,6 +48,7 @@ #include linux/pci.h #include linux/timekeeper_internal.h #include linux/pvclock_gtod.h +#include linux/random.h #include trace/events/kvm.h #define CREATE_TRACE_POINTS @@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_KVM_PV_EOI_EN: data = vcpu-arch.pv_eoi.msr_val; break; + case MSR_KVM_GET_RNG_SEED: + get_random_bytes(data, sizeof(data)); + break; case MSR_IA32_P5_MC_ADDR: case MSR_IA32_P5_MC_TYPE: case MSR_IA32_MCG_CAP: -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On 07/16/2014 02:45 PM, Andy Lutomirski wrote: diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h new file mode 100644 index 000..c8e8d0d --- /dev/null +++ b/arch/x86/include/asm/archslowrng.h @@ -0,0 +1,30 @@ +/* + * This file is part of the Linux kernel. + * + * Copyright (c) 2014 Andy Lutomirski + * Authors: Andy Lutomirski l...@amacapital.net + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef ASM_X86_ARCHSLOWRANDOM_H +#define ASM_X86_ARCHSLOWRANDOM_H + +#ifndef CONFIG_ARCH_SLOW_RNG +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG +#endif + I'm *seriously* questioning the wisdom of this. A much saner thing would be to do: #ifndef CONFIG_ARCH_SLOW_RNG /* Not supported */ static inline int arch_get_slow_rng_u64(u64 *v) { (void)v; return 0; } #endif ... which is basically what we do for the archrandom stuff. I'm also wondering if it makes sense to have a function which prefers arch_get_random*() over this one as a preferred interface. Something like: int get_random_arch_u64_slow_ok(u64 *v) { int i; u64 x = 0; unsigned long l; for (i = 0; i 64/BITS_PER_LONG; i++) { if (!arch_get_random_long(l)) return arch_get_slow_rng_u64(v); x |= l (i*BITS_PER_LONG); } *v = l; return 0; } This still doesn't address the issue e.g. on x86 where RDRAND is available but we haven't set up alternatives yet. So it might be that what we really want is to encapsulate this fallback in arch code and do a more direct enumeration. + +static int kvm_get_slow_rng_u64(u64 *v) +{ + /* + * Allow migration from a hypervisor with the GET_RNG_SEED + * feature to a hypervisor without it. + */ + if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0) + return 1; + else + return 0; +} How about: return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0; The naming also feels really inconsistent... -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Wed, Jul 16, 2014 at 2:59 PM, H. Peter Anvin h...@zytor.com wrote: On 07/16/2014 02:45 PM, Andy Lutomirski wrote: diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h new file mode 100644 index 000..c8e8d0d --- /dev/null +++ b/arch/x86/include/asm/archslowrng.h @@ -0,0 +1,30 @@ +/* + * This file is part of the Linux kernel. + * + * Copyright (c) 2014 Andy Lutomirski + * Authors: Andy Lutomirski l...@amacapital.net + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef ASM_X86_ARCHSLOWRANDOM_H +#define ASM_X86_ARCHSLOWRANDOM_H + +#ifndef CONFIG_ARCH_SLOW_RNG +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG +#endif + I'm *seriously* questioning the wisdom of this. A much saner thing would be to do: #ifndef CONFIG_ARCH_SLOW_RNG /* Not supported */ static inline int arch_get_slow_rng_u64(u64 *v) { (void)v; return 0; } #endif ... which is basically what we do for the archrandom stuff. The archrandom stuff defines the not supported variant in the generic header, which is what I'm doing here. I could wrap all of asm/archslowrng.h in #ifdef CONFIG_ARCH_SLOW_RNG instead of putting the #error in there, but I have no strong preference. I'm also wondering if it makes sense to have a function which prefers arch_get_random*() over this one as a preferred interface. Something like: int get_random_arch_u64_slow_ok(u64 *v) { int i; u64 x = 0; unsigned long l; for (i = 0; i 64/BITS_PER_LONG; i++) { if (!arch_get_random_long(l)) return arch_get_slow_rng_u64(v); x |= l (i*BITS_PER_LONG); } *v = l; return 0; } I played with something like this earlier, but I dropped it when it ended up having exactly one user. I suspect that the highly paranoid will actually prefer seeding with both sources in init_std_data even if RDRAND is available -- it costs very little and it provides a bit of extra assurance. This still doesn't address the issue e.g. on x86 where RDRAND is available but we haven't set up alternatives yet. So it might be that what we really want is to encapsulate this fallback in arch code and do a more direct enumeration. My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found. Hmm. Does the prandom stuff want to use this? + +static int kvm_get_slow_rng_u64(u64 *v) +{ + /* + * Allow migration from a hypervisor with the GET_RNG_SEED + * feature to a hypervisor without it. + */ + if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0) + return 1; + else + return 0; +} How about: return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0; The naming also feels really inconsistent... Better ideas welcome. I could call the generic function arch_get_pv_random_seed, but maybe someone will come up with a non-paravirt implementation. --Andy -hpa -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: emulator injects #DB when RFLAGS.RF is set
If the RFLAGS.RF is set, then no #DB should occur on instruction breakpoints. However, the KVM emulator injects #DB regardless to RFLAGS.RF. This patch fixes this behavior. KVM, however, still appears not to update RFLAGS.RF correctly, regardless of this patch. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fae064f..e341a81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5168,7 +5168,8 @@ static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r) } } - if (unlikely(vcpu-arch.dr7 DR7_BP_EN_MASK)) { + if (unlikely(vcpu-arch.dr7 DR7_BP_EN_MASK) + !(kvm_get_rflags(vcpu) X86_EFLAGS_RF)) { dr6 = kvm_vcpu_check_hw_bp(eip, 0, vcpu-arch.dr7, vcpu-arch.db); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote: My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found. Hmm. Does the prandom stuff want to use this? prandom isn't even using rdrand. I'd suggest fixing this separately, or even just waiting until someone goes and deletes prandom. --Andy -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On 07/16/2014 03:40 PM, Andy Lutomirski wrote: On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote: My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found. Hmm. Does the prandom stuff want to use this? prandom isn't even using rdrand. I'd suggest fixing this separately, or even just waiting until someone goes and deletes prandom. prandom is exactly the opposite; it is designed for when we need possibly low quality random numbers very quickly. RDRAND is actually too slow. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Jul 16, 2014 4:00 PM, H. Peter Anvin h...@zytor.com wrote: On 07/16/2014 03:40 PM, Andy Lutomirski wrote: On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote: My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found. Hmm. Does the prandom stuff want to use this? prandom isn't even using rdrand. I'd suggest fixing this separately, or even just waiting until someone goes and deletes prandom. prandom is exactly the opposite; it is designed for when we need possibly low quality random numbers very quickly. RDRAND is actually too slow. I meant that prandom isn't using rdrand for early seeding. --Andy -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On 07/16/2014 05:03 PM, Andy Lutomirski wrote: prandom is exactly the opposite; it is designed for when we need possibly low quality random numbers very quickly. RDRAND is actually too slow. I meant that prandom isn't using rdrand for early seeding. We should probably fix that. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit
WARNING: CPU: 9 PID: 7251 at arch/x86/kvm/vmx.c:8719 nested_vmx_vmexit+0xa4/0x233 [kvm_intel]() Modules linked in: tun nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub netconsole kvm_intel kvm bridge stp llc autofs4 8021q ipv6 uinput joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e ixgbe ptp pps_core hwmon mdio i2c_i801 i2c_core tpm_tis tpm ipmi_si ipmi_msghandler isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod CPU: 9 PID: 7251 Comm: qemu-system-x86 Tainted: GW 3.16.0-rc1 #2 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.131329 11/11/2013 220f 880ffd107bf8 81493563 220f 880ffd107c38 8103f0eb 880ffd107c48 a059709a 881ffc9e0040 8800b74b8000 Call Trace: [81493563] dump_stack+0x49/0x5e [8103f0eb] warn_slowpath_common+0x7c/0x96 [a059709a] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [8103f11a] warn_slowpath_null+0x15/0x17 [a059709a] nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [a0594295] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel] [a0537931] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm] [a05972ec] vmx_check_nested_events+0xc3/0xd3 [kvm_intel] [a051ebe9] inject_pending_event+0xd0/0x16e [kvm] [a051efa0] vcpu_enter_guest+0x319/0x704 [kvm] After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to), Acknowledge interrupt on exit behavior can be emulated. Current logic will ask for intr vector if it is nested vmexit and VM_EXIT_ACK_INTR_ON_EXIT is set by L1. However, intr vector for posted intr can't be got by generic read pending interrupt vector and intack routine, there is a requirement to sync from pir to irr. This patch fix it by ask the intr vector after sync pir to irr. Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- arch/x86/kvm/lapic.c | 1 + arch/x86/kvm/vmx.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0069118..b7d45dc 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1637,6 +1637,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu) apic_clear_irr(vector, apic); return vector; } +EXPORT_SYMBOL_GPL(kvm_get_apic_interrupt); void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 4ae5ad8..31f1479 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8697,6 +8697,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) nested_exit_intr_ack_set(vcpu)) { int irq = kvm_cpu_get_interrupt(vcpu); + + if (irq 0 kvm_apic_vid_enabled(vcpu-kvm)) + irq = kvm_get_apic_interrupt(vcpu); WARN_ON(irq 0); vmcs12-vm_exit_intr_info = irq | INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] KVM: nVMX: Fix vmptrld fail and vmwrite error when L1 goes down
This bug can be trigger by L1 goes down directly w/ enable_shadow_vmcs. [ 6413.158950] kvm: vmptrld (null)/7800 failed [ 6413.158954] vmwrite error: reg 401e value 4 (err 1) [ 6413.158957] CPU: 0 PID: 4840 Comm: qemu-system-x86 Tainted: G OE 3.16.0kvm+ #2 [ 6413.158958] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013 [ 6413.158959] 0003 880210c9fb58 81741de9 8800d7433f80 [ 6413.158960] 880210c9fb68 a059fa08 880210c9fb78 a05938bf [ 6413.158962] 880210c9fba8 a059a97f 8800d7433f80 0003 [ 6413.158963] Call Trace: [ 6413.158968] [81741de9] dump_stack+0x45/0x56 [ 6413.158972] [a059fa08] vmwrite_error+0x2c/0x2e [kvm_intel] [ 6413.158974] [a05938bf] vmcs_writel+0x1f/0x30 [kvm_intel] [ 6413.158976] [a059a97f] free_nested.part.73+0x5f/0x170 [kvm_intel] [ 6413.158978] [a059ab13] vmx_free_vcpu+0x33/0x70 [kvm_intel] [ 6413.158991] [a0360324] kvm_arch_vcpu_free+0x44/0x50 [kvm] [ 6413.158998] [a0360f92] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm] Commit 26a865 (KVM: VMX: fix use after free of vmx-loaded_vmcs) fix the use after free bug by move free_loaded_vmcs() before free_nested(), however, this lead to free loaded_vmcs-vmcs premature and vmptrld load a NULL pointer during sync shadow vmcs to vmcs12. In addition, vmwrite which used to disable shadow vmcs and reset VMCS_LINK_POINTER failed since there is no valid current-VMCS. This patch fix it by skipping sync shadow vmcs and reset vmcs field for L1 destroy since they will be reinitialized after L1 recreate. Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- arch/x86/kvm/vmx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index fbce89e..2b28da7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6113,9 +6113,9 @@ static void free_nested(struct vcpu_vmx *vmx) return; vmx-nested.vmxon = false; if (vmx-nested.current_vmptr != -1ull) { - nested_release_vmcs12(vmx); vmx-nested.current_vmptr = -1ull; vmx-nested.current_vmcs12 = NULL; + nested_release_vmcs12(vmx); } if (enable_shadow_vmcs) free_vmcs(vmx-nested.current_shadow_vmcs); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection
This patch fix bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331, after the patch http://www.spinics.net/lists/kvm/msg105230.html applied, there is some progress and the L2 can boot up, however, slowly. The original idea of this fix vid injection patch is from Zhang, Yang Z yang.z.zh...@intel.com. Interrupt which delivered by vid should be injected to L1 by L0 if current is in L1, or should be injected to L2 by L0 through the old injection way if L1 doesn't have set VM_EXIT_ACK_INTR_ON_EXIT. The current logic doen't consider these cases. This patch fix it by vid intr to L1 if current is L1 or L2 through old injection way if L1 doen't have VM_EXIT_ACK_INTR_ON_EXIT set. Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com Signed-off-by: Zhang, Yang Z yang.z.zh...@intel.com --- arch/x86/kvm/vmx.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 021d84a..ad36646 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7112,8 +7112,22 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) { if (max_irr == -1) return; - - vmx_set_rvi(max_irr); + if (!is_guest_mode(vcpu)) { + vmx_set_rvi(max_irr); + } else if (is_guest_mode(vcpu) !nested_exit_on_intr(vcpu)) { + /* +* Fall back to old way to inject the interrupt since there +* is no vAPIC-v for L2. +*/ + if (vcpu-arch.exception.pending || + vcpu-arch.nmi_injected || + vcpu-arch.interrupt.pending) + return; + else if (vmx_interrupt_allowed(vcpu)) { + kvm_queue_interrupt(vcpu, max_irr, false); + vmx_inject_irq(vcpu); + } + } } static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit
Wanpeng Li wrote on 2014-07-17: WARNING: CPU: 9 PID: 7251 at arch/x86/kvm/vmx.c:8719 nested_vmx_vmexit+0xa4/0x233 [kvm_intel]() Modules linked in: tun nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub netconsole kvm_intel kvm bridge stp llc autofs4 8021q ipv6 uinput joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e ixgbe ptp pps_core hwmon mdio i2c_i801 i2c_core tpm_tis tpm ipmi_si ipmi_msghandler isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod CPU: 9 PID: 7251 Comm: qemu-system-x86 Tainted: GW 3.16.0-rc1 #2 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.131329 11/11/2013 220f 880ffd107bf8 81493563 220f 880ffd107c38 8103f0eb 880ffd107c48 a059709a 881ffc9e0040 8800b74b8000 Call Trace: [81493563] dump_stack+0x49/0x5e [8103f0eb] warn_slowpath_common+0x7c/0x96 [a059709a] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [8103f11a] warn_slowpath_null+0x15/0x17 [a059709a] nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [a0594295] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel] [a0537931] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm] [a05972ec] vmx_check_nested_events+0xc3/0xd3 [kvm_intel] [a051ebe9] inject_pending_event+0xd0/0x16e [kvm] [a051efa0] vcpu_enter_guest+0x319/0x704 [kvm] After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to), Acknowledge interrupt on exit behavior can be emulated. Current logic will ask for intr vector if it is nested vmexit and VM_EXIT_ACK_INTR_ON_EXIT is set by L1. However, intr vector for posted intr can't be got by generic read pending interrupt vector and intack routine, there is a requirement to sync from pir to irr. This patch fix it by ask the intr vector after sync pir to irr. Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com Reviewed-by: Yang Zhang yang.z.zh...@intel.com --- arch/x86/kvm/lapic.c | 1 + arch/x86/kvm/vmx.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0069118..b7d45dc 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1637,6 +1637,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu) apic_clear_irr(vector, apic); return vector; } +EXPORT_SYMBOL_GPL(kvm_get_apic_interrupt); void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 4ae5ad8..31f1479 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8697,6 +8697,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) nested_exit_intr_ack_set(vcpu)) { int irq = kvm_cpu_get_interrupt(vcpu); + + if (irq 0 kvm_apic_vid_enabled(vcpu-kvm)) + irq = kvm_get_apic_interrupt(vcpu); WARN_ON(irq 0); vmcs12-vm_exit_intr_info = irq | INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR; Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest
SPRG3 is guest accessible and SPRG3 can be clobbered by host or another guest, So this need to be restored when loading guest state. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke_interrupts.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S index 2c6deb5ef..0d3403f 100644 --- a/arch/powerpc/kvm/booke_interrupts.S +++ b/arch/powerpc/kvm/booke_interrupts.S @@ -459,6 +459,8 @@ lightweight_exit: * written directly to the shared area, so we * need to reload them here with the guest's values. */ + PPC_LD(r3, VCPU_SHARED_SPRG3, r5) + mtspr SPRN_SPRG3, r3 PPC_LD(r3, VCPU_SHARED_SPRG4, r5) mtspr SPRN_SPRG4W, r3 PPC_LD(r3, VCPU_SHARED_SPRG5, r5) -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
The POWER8 processor has a Micro Partition Prefetch Engine, which is a fancy way of saying has way to store and load contents of L2 or L2+MRU way of L3 cache. We initiate the storing of the log (list of addresses) using the logmpp instruction and start restore by writing to a SPR. The logmpp instruction takes parameters in a single 64bit register: - starting address of the table to store log of L2/L2+L3 cache contents - 32kb for L2 - 128kb for L2+L3 - Aligned relative to maximum size of the table (32kb or 128kb) - Log control (no-op, L2 only, L2 and L3, abort logout) We should abort any ongoing logging before initiating one. To initiate restore, we write to the MPPR SPR. The format of what to write to the SPR is similar to the logmpp instruction parameter: - starting address of the table to read from (same alignment requirements) - table size (no data, until end of table) - prefetch rate (from fastest possible to slower. about every 8, 16, 24 or 32 cycles) The idea behind loading and storing the contents of L2/L3 cache is to reduce memory latency in a system that is frequently swapping vcores on a physical CPU. The best case scenario for doing this is when some vcores are doing very cache heavy workloads. The worst case is when they have about 0 cache hits, so we just generate needless memory operations. This implementation just does L2 store/load. In my benchmarks this proves to be useful. Benchmark 1: - 16 core POWER8 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each - No split core/SMT - two guests running sysbench memory test. sysbench --test=memory --num-threads=8 run - one guest running apache bench (of default HTML page) ab -n 49 -c 400 http://localhost/ This benchmark aims to measure performance of real world application (apache) where other guests are cache hot with their own workloads. The sysbench memory benchmark does pointer sized writes to a (small) memory buffer in a loop. In this benchmark with this patch I can see an improvement both in requests per second (~5%) and in mean and median response times (again, about 5%). The spread of minimum and maximum response times were largely unchanged. benchmark 2: - Same VM config as benchmark 1 - all three guests running sysbench memory benchmark This benchmark aims to see if there is a positive or negative affect to this cache heavy benchmark. Although due to the nature of the benchmark (stores) we may not see a difference in performance, but rather hopefully an improvement in consistency of performance (when vcore switched in, don't have to wait many times for cachelines to be pulled in) The results of this benchmark are improvements in consistency of performance rather than performance itself. With this patch, the few outliers in duration go away and we get more consistent performance in each guest. benchmark 3: - same 3 guests and CPU configuration as benchmark 1 and 2. - two idle guests - 1 guest running STREAM benchmark This scenario also saw performance improvement with this patch. On Copy and Scale workloads from STREAM, I got 5-6% improvement with this patch. For Add and triad, it was around 10% (or more). benchmark 4: - same 3 guests as previous benchmarks - two guests running sysbench --memory, distinctly different cache heavy workload - one guest running STREAM benchmark. Similar improvements to benchmark 3. benchmark 5: - 1 guest, 8 VCPUs, Ubuntu 14.04 - Host configured with split core (SMT8, subcores-per-core=4) - STREAM benchmark In this benchmark, we see a 10-20% performance improvement across the board of STREAM benchmark results with this patch. Based on preliminary investigation and microbenchmarks by Prerna Saxena pre...@linux.vnet.ibm.com Signed-off-by: Stewart Smith stew...@linux.vnet.ibm.com -- changes since v2: - based on feedback from Alexander Graf: - move save and restore of cache to separate functions - move allocation of mpp_buffer to vcore creation - get_free_pages() does actually allocate pages aligned to order (Mel Gorman confirms) - make SPR and logmpp parameters a bit less magic, especially around abort changes since v1: - s/mppe/mpp_buffer/ - add MPP_BUFFER_ORDER define. --- arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/include/asm/ppc-opcode.h | 17 +++ arch/powerpc/include/asm/reg.h|1 + arch/powerpc/kvm/book3s_hv.c | 89 + 4 files changed, 98 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 1eaea2d..5769497 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -305,6 +305,8 @@ struct kvmppc_vcore { u32 arch_compat; ulong pcr; ulong dpdes;/* doorbell state (POWER8) */ + unsigned long mpp_buffer; /* Micro Partition Prefetch buffer */ + bool mpp_buffer_is_valid; }; #define VCORE_ENTRY_COUNT(vc)