date:20140716

[PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest

2014-07-16 Thread Bharat Bhushan

SPRG3 is guest accessible and SPRG3 can be clobbered by host
or another guest, So this need to be restored when loading
guest state.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/kvm/booke_interrupts.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/booke_interrupts.S 
b/arch/powerpc/kvm/booke_interrupts.S
index 2c6deb5ef..0d3403f 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -459,6 +459,8 @@ lightweight_exit:
 * written directly to the shared area, so we
 * need to reload them here with the guest's values.
 */
+   PPC_LD(r3, VCPU_SHARED_SPRG3, r5)
+   mtspr   SPRN_SPRG3, r3
PPC_LD(r3, VCPU_SHARED_SPRG4, r5)
mtspr   SPRN_SPRG4W, r3
PPC_LD(r3, VCPU_SHARED_SPRG5, r5)
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Gleb Natapov

On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote:
 virtio-rng is both too complicated and insufficient for initial rng
 seeding.  It's far too complicated to use for KASLR or any other
 early boot random number needs.  It also provides /dev/random-style
 bits, which means that making guest boot wait for virtio-rng is
 unacceptably slow, and doing it asynchronously means that
 /dev/urandom might be predictable when userspace starts.
 
 This introduces a very simple synchronous mechanism to get
 /dev/urandom-style bits.
Why can't you use RDRAND instruction for that?

 
 This is a KVM change: am I supposed to write a unit test somewhere?
 
 Andy Lutomirski (4):
   x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
   random,x86: Add arch_get_slow_rng_u64
   random: Seed pools from arch_get_slow_rng_u64 at startup
   x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
 
  Documentation/virtual/kvm/cpuid.txt  |  3 +++
  arch/x86/Kconfig |  4 
  arch/x86/boot/compressed/aslr.c  | 27 +++
  arch/x86/include/asm/archslowrng.h   | 30 ++
  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
  arch/x86/kernel/kvm.c| 22 ++
  arch/x86/kvm/cpuid.c |  3 ++-
  arch/x86/kvm/x86.c   |  4 
  drivers/char/random.c| 14 +-
  include/linux/random.h   |  9 +
  10 files changed, 116 insertions(+), 2 deletions(-)
  create mode 100644 arch/x86/include/asm/archslowrng.h
 
 -- 
 1.9.3
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Daniel Borkmann


On 07/16/2014 08:41 AM, Gleb Natapov wrote:

On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote:

virtio-rng is both too complicated and insufficient for initial rng
seeding.  It's far too complicated to use for KASLR or any other
early boot random number needs.  It also provides /dev/random-style
bits, which means that making guest boot wait for virtio-rng is
unacceptably slow, and doing it asynchronously means that
/dev/urandom might be predictable when userspace starts.

This introduces a very simple synchronous mechanism to get
/dev/urandom-style bits.



Why can't you use RDRAND instruction for that?


You mean using it directly? I think simply for the very same reasons
as in c2557a303a ...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Gleb Natapov

On Wed, Jul 16, 2014 at 09:10:27AM +0200, Daniel Borkmann wrote:
 On 07/16/2014 08:41 AM, Gleb Natapov wrote:
 On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote:
 virtio-rng is both too complicated and insufficient for initial rng
 seeding.  It's far too complicated to use for KASLR or any other
 early boot random number needs.  It also provides /dev/random-style
 bits, which means that making guest boot wait for virtio-rng is
 unacceptably slow, and doing it asynchronously means that
 /dev/urandom might be predictable when userspace starts.
 
 This introduces a very simple synchronous mechanism to get
 /dev/urandom-style bits.
 
 Why can't you use RDRAND instruction for that?
 
 You mean using it directly? I think simply for the very same reasons
 as in c2557a303a ...
So you trust your hypervisor vendor more than you trust your CPU vendor? :)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 09:10, Daniel Borkmann ha scritto:

On 07/16/2014 08:41 AM, Gleb Natapov wrote:

On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote:

virtio-rng is both too complicated and insufficient for initial rng
seeding.  It's far too complicated to use for KASLR or any other
early boot random number needs.  It also provides /dev/random-style
bits, which means that making guest boot wait for virtio-rng is
unacceptably slow, and doing it asynchronously means that
/dev/urandom might be predictable when userspace starts.

This introduces a very simple synchronous mechanism to get
/dev/urandom-style bits.


Why can't you use RDRAND instruction for that?


You mean using it directly? I think simply for the very same reasons
as in c2557a303a ...


No, this is very different.  This mechanism provides no guarantee that 
the result contains any actual entropy.  In fact, patch 3 adds a call 
to the new arch_get_slow_rng_u64 just below a call to 
arch_get_random_lang aka RDRAND.  I agree with Gleb that it's simpler to 
just expect a relatively recent processor and use RDRAND.


BTW, the logic for crediting entropy to RDSEED but not RDRAND escapes 
me.  If you trust the processor, you could use Intel's algorithm to 
force reseeding of RDRAND.  If you don't trust the processor, the same 
paranoia applies to RDRAND and RDSEED.


In a guest you must trust the hypervisor anyway to use RDRAND or RDSEED, 
since the hypervisor can trap it.  A malicious hypervisor is no 
different from a malicious processor.


In any case, is there a matching QEMU patch somewhere?

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Bug 73331] Nested Virtualization, L2 cannot boot up on Ivybridge and Haswell

2014-07-16 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=73331

Zhou, Chao chao.z...@intel.com changed:

   What|Removed |Added

 CC||chao.z...@intel.com

--- Comment #3 from Zhou, Chao chao.z...@intel.com ---
kvm.git + qemu.git:9f6226a7_0e162974
kernel version: 3.16.0-rc1
test on Ivytown_EP
after create L2 guest, L2 guest can boot up.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Bug 73331] Nested Virtualization, L2 cannot boot up on Ivybridge and Haswell

2014-07-16 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=73331

Paolo Bonzini bonz...@gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||bonz...@gnu.org
 Resolution|--- |CODE_FIX

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Bug 73331] Nested Virtualization, L2 cannot boot up on Ivybridge and Haswell

2014-07-16 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=73331

robert...@intel.com changed:

   What|Removed |Added

 Status|RESOLVED|VERIFIED

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM Test report, kernel 9f6226a7... qemu 0e162974...

2014-07-16 Thread Hu, Robert

Hi All,

This is KVM upstream test result against kvm.git next branch and qemu.git 
master branch.
 kvm.git next branch: 9f6226a762c7ae02f6a23a3d4fc552dafa57ea23 based on 
kernel 3.16.0-rc1
 qemu.git master branch: 0e16297461264b3ea8f7282d1195cf53aa8a707c

We found no new bug and two fix bugs in the past two months.

New issue (0):

Fixed issue (1):
1. [Nested kvm on kvm] L2 guest reboot continuously when create a 
rhel6u5(64bit) as L2 guest.
  https://bugzilla.kernel.org/show_bug.cgi?id=75981
  --Jan Kiszka fixed the bug.
2. Nested Virtualization,L2 cannot boot up on Ivybridge and Haswell.
  https://bugzilla.kernel.org/show_bug.cgi?id=73331

Old issues (6):
--
1. guest panic with parameter -cpu host in qemu command line (about vPMU 
issue).
  https://bugs.launchpad.net/qemu/+bug/994378
2. Guest hang when doing kernel build and writing data in guest.
  https://bugs.launchpad.net/qemu/+bug/1096814
3. with 'monitor pty', it needs to flush pts device after sending command to it 
  https://bugs.launchpad.net/qemu/+bug/1185228
4. [Nested] Windows XP Mode can not work
  https://bugzilla.kernel.org/show_bug.cgi?id=60782
5. [Nested]L2 guest failed to start in VMware on KVM
  https://bugzilla.kernel.org/show_bug.cgi?id=61411
6. [Nested]L1 call trace when create windows 7 guest as L2 guest
  https://bugzilla.kernel.org/show_bug.cgi?id=72381




Test environment:
==
  Platform   IvyBridge-EP    Sandybridge-EP   Haswell-EP
  CPU Cores   32    32   56
  Memory size 64GB  32GB 32GB


Best Regards,
Robert Ho










--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v7 00/14] kvm-unit-tests/arm: initial drop

2014-07-16 Thread Andrew Jones

This is v7 of a series that introduces arm to kvm-unit-tests. Three
of the v6 patches were already merged, so this is the remainder.
No new patches have been added, but some of the summaries have
changed.

This series first adds support for device trees (libfdt), and for
chr-testdev (virtio). Next, it adds the basic infrastructure for booting
a test case (guest), and adds a first test case, a self-test to confirm
setup was completed successfully. Finally, it further prepares the
framework for more complicated tests by adding vector support, and
extends the self-test to test that too.

This initial drop doesn't require kvmarm. qemu-system-arm is enough,
but qemu must have mach-virt, and the chr-testdev patch[1].

These patches (v7) are also available from a git repo here
https://github.com/rhdrjones/kvm-unit-tests/commits/arm/v7-initial-drop

The v6 patches are also available from a git repo here
https://github.com/rhdrjones/kvm-unit-tests/commits/arm/v6-initial-drop%2Cchr-testdev

and, the v5 patches are still available here
https://github.com/rhdrjones/kvm-unit-tests/commits/arm/v5-initial-drop

The main changes since v6 are the moving/redesigning/reAPIing of
arm's memregions to common code as phys_alloc, and the splitting of
virtio and virtio-mmio code into separate files. The main change
since v5 (as stated in the v6 cover letter) is the switch from
virtio-testdev to chr-testdev. Also, as stated in the v6 cover letter,
I've kept Christoffer's *-by's, and mainly the already Reviewed-by
patches

  05/14 add minimal virtio support for devtree virtio-mmio
  09/14 arm: initial drop

should get a second look (or interdiffed).

Thanks in advance for reviews!

[1] http://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg01960.html


Andrew Jones (12):
  libfdt: Import libfdt source
  add support for Linux device trees
  Introduce asm-generic/*.h files
  Introduce lib/alloc
  add minimal virtio support for devtree virtio-mmio
  lib/asm-generic: add page.h and virt_to_phys/phys_to_virt
  virtio: add minimal support for virtqueues
  Introduce chr-testdev
  arm: initial drop
  arm: Add arch-specific asm/page.h and __va/__pa
  arm: add useful headers from the Linux kernel
  arm: vectors support

Christoffer Dall (2):
  arm: Add spinlock implementation
  arm: Add IO accessors to avoid register-writeback

 .gitignore   |1 +
 Makefile |   25 +-
 arm/cstart.S |  209 ++
 arm/flat.lds |   23 +
 arm/run  |   46 ++
 arm/selftest.c   |  210 ++
 arm/unittests.cfg|   30 +
 config/asm-offsets.mak   |   41 ++
 config/config-arm.mak|   81 +++
 configure|   23 +-
 lib/alloc.c  |  176 +
 lib/alloc.h  |  123 
 lib/argv.c   |9 +
 lib/arm/.gitignore   |1 +
 lib/arm/asm-offsets.c|   39 ++
 lib/arm/asm/asm-offsets.h|1 +
 lib/arm/asm/barrier.h|   18 +
 lib/arm/asm/cp15.h   |   37 ++
 lib/arm/asm/io.h |   94 +++
 lib/arm/asm/page.h   |   33 +
 lib/arm/asm/processor.h  |   39 ++
 lib/arm/asm/ptrace.h |  100 +++
 lib/arm/asm/setup.h  |   27 +
 lib/arm/asm/spinlock.h   |   11 +
 lib/arm/eabi_compat.c|   20 +
 lib/arm/io.c |   65 ++
 lib/arm/processor.c  |  111 
 lib/arm/setup.c  |   82 +++
 lib/arm/spinlock.c   |   28 +
 lib/asm-generic/io.h |  175 +
 lib/asm-generic/page.h   |   28 +
 lib/asm-generic/spinlock.h   |4 +
 lib/chr-testdev.c|   72 ++
 lib/chr-testdev.h|   14 +
 lib/devicetree.c |  272 
 lib/devicetree.h |  236 +++
 lib/generated/.gitignore |1 +
 lib/libcflat.h   |2 +
 lib/libfdt/Makefile.libfdt   |   10 +
 lib/libfdt/README|4 +
 lib/libfdt/fdt.c |  250 +++
 lib/libfdt/fdt.h |  111 
 lib/libfdt/fdt_empty_tree.c  |   84 +++
 lib/libfdt/fdt_ro.c  |  573 
 lib/libfdt/fdt_rw.c  |  492 ++
 lib/libfdt/fdt_strerror.c|   96 +++
 lib/libfdt/fdt_sw.c  |  256 +++
 lib/libfdt/fdt_wip.c |  118 
 lib/libfdt/libfdt.h  | 1514 ++
 lib/libfdt/libfdt_env.h  |  111 
 lib/libfdt/libfdt_internal.h |   95 +++
 lib/libfdt/version.lds   |   60 ++
 lib/virtio-mmio.c|  175 +
 lib/virtio-mmio.h|   65 ++
 lib/virtio.c |  130 
 lib/virtio.h |  149 +
 56 files changed, 6794 insertions(+), 6 deletions(-)
 create mode 100644 arm/cstart.S
 create mode 100644 arm/flat.lds
 create mode 100755 arm/run
 create mode 100644 arm/selftest.c
 create mode 100644 arm/unittests.cfg
 create mode 100644 config/asm-offsets.mak
 create mode 100644 config/config-arm.mak
 create

[PATCH v7 06/14] lib/asm-generic: add page.h and virt_to_phys/phys_to_virt

2014-07-16 Thread Andrew Jones

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/asm-generic/io.h   | 13 +
 lib/asm-generic/page.h | 28 
 2 files changed, 41 insertions(+)
 create mode 100644 lib/asm-generic/page.h

diff --git a/lib/asm-generic/io.h b/lib/asm-generic/io.h
index f00f4d3e68fe1..a9939d3a5921f 100644
--- a/lib/asm-generic/io.h
+++ b/lib/asm-generic/io.h
@@ -10,6 +10,7 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 #include libcflat.h
+#include asm/page.h
 
 #ifndef __raw_readb
 static inline u8 __raw_readb(const volatile void *addr)
@@ -159,4 +160,16 @@ static inline void *ioremap(u64 phys_addr, size_t size 
__unused)
 }
 #endif
 
+#ifndef virt_to_phys
+static inline unsigned long virt_to_phys(volatile void *address)
+{
+   return __pa((unsigned long)address);
+}
+
+static inline void *phys_to_virt(unsigned long address)
+{
+   return __va(address);
+}
+#endif
+
 #endif /* _ASM_GENERIC_IO_H_ */
diff --git a/lib/asm-generic/page.h b/lib/asm-generic/page.h
new file mode 100644
index 0..559938fcf0b3f
--- /dev/null
+++ b/lib/asm-generic/page.h
@@ -0,0 +1,28 @@
+#ifndef _ASM_GENERIC_PAGE_H_
+#define _ASM_GENERIC_PAGE_H_
+/*
+ * asm-generic/page.h
+ *  adapted from the Linux kernel's include/asm-generic/page.h
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#define PAGE_SHIFT 12
+#ifndef __ASSEMBLY__
+#define PAGE_SIZE  (1UL  PAGE_SHIFT)
+#else
+#define PAGE_SIZE  (1  PAGE_SHIFT)
+#endif
+#define PAGE_MASK  (~(PAGE_SIZE-1))
+#define PAGE_ALIGN(addr)   (((addr) + (PAGE_SIZE-1))  PAGE_MASK)
+
+#ifndef __ASSEMBLY__
+#define __va(x)((void *)((unsigned long) (x)))
+#define __pa(x)((unsigned long) (x))
+#define virt_to_pfn(kaddr) (__pa(kaddr)  PAGE_SHIFT)
+#define pfn_to_virt(pfn)   __va((pfn)  PAGE_SHIFT)
+#endif
+
+#endif
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v7 03/14] Introduce asm-generic/*.h files

2014-07-16 Thread Andrew Jones

Architecture neutral code may need to call low-level io accessors,
or use spinlocks. Create a generic io.h to ensure those accessors
are defined, and a generic spinlock.h that complains when included,
as we can't write a generic spinlock. These files can be overridden
or extended by architecture specific versions placed in
lib/$ARCH/asm/.

Signed-off-by: Andrew Jones drjo...@redhat.com
Acked-by: Christoffer Dall christoffer.d...@linaro.org
---
v5: added a trivial ioremap function [Christoffer Dall]
v4: introduce lib/asm symlink to get rid of #ifdef __arm__,
add spinlock.h too
v3: wrt to io.h (was libio.[ch]) only
- get rid of CONFIG_64BIT, replace with asserts
- get rid of {read,write}_len() [libio.c]
- fix bad *64_to_cpu macros
---
 .gitignore |   1 +
 Makefile   |   6 +-
 configure  |  11 +++
 lib/asm-generic/io.h   | 162 +
 lib/asm-generic/spinlock.h |   4 ++
 5 files changed, 181 insertions(+), 3 deletions(-)
 create mode 100644 lib/asm-generic/io.h
 create mode 100644 lib/asm-generic/spinlock.h

diff --git a/.gitignore b/.gitignore
index 775d0dfd8263e..e21939a8771e9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -9,6 +9,7 @@ patches
 .stgit-*
 cscope.*
 *.swp
+/lib/asm
 /config.mak
 /*-run
 /test.log
diff --git a/Makefile b/Makefile
index 180189ecd6d8c..dd7e6e94bfe7b 100644
--- a/Makefile
+++ b/Makefile
@@ -77,11 +77,11 @@ libfdt_clean:
$(LIBFDT_objdir)/.*.d
 
 distclean: clean libfdt_clean
-   $(RM) config.mak $(TEST_DIR)-run test.log msr.out cscope.*
+   $(RM) lib/asm config.mak $(TEST_DIR)-run test.log msr.out cscope.*
 
-cscope: common_dirs = lib lib/libfdt
+cscope: common_dirs = lib lib/libfdt lib/asm lib/asm-generic
 cscope:
$(RM) ./cscope.*
-   find $(TEST_DIR) lib/$(TEST_DIR) $(common_dirs) -maxdepth 1 \
+   find -L $(TEST_DIR) lib/$(TEST_DIR) $(common_dirs) -maxdepth 1 \
-name '*.[chsS]' -print | sed 's,^\./,,'  ./cscope.files
cscope -bk
diff --git a/configure b/configure
index dbbc6045d214a..aaa1b50ab1b98 100755
--- a/configure
+++ b/configure
@@ -91,6 +91,17 @@ if [ $exit -eq 0 ]; then
 fi
 rm -f lib_test.c
 
+# link lib/asm for the architecture
+rm -f lib/asm
+asm=asm-generic
+if [ -d lib/$arch/asm ]; then
+   asm=$arch/asm
+elif [ -d lib/$testdir/asm ]; then
+   asm=$testdir/asm
+fi
+ln -s $asm lib/asm
+
+# create the config
 cat EOF  config.mak
 PREFIX=$prefix
 KERNELDIR=$(readlink -f $kerneldir)
diff --git a/lib/asm-generic/io.h b/lib/asm-generic/io.h
new file mode 100644
index 0..f00f4d3e68fe1
--- /dev/null
+++ b/lib/asm-generic/io.h
@@ -0,0 +1,162 @@
+#ifndef _ASM_GENERIC_IO_H_
+#define _ASM_GENERIC_IO_H_
+/*
+ * asm-generic/io.h
+ *  adapted from the Linux kernel's include/asm-generic/io.h
+ *  and arch/arm/include/asm/io.h
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include libcflat.h
+
+#ifndef __raw_readb
+static inline u8 __raw_readb(const volatile void *addr)
+{
+   return *(const volatile u8 *)addr;
+}
+#endif
+
+#ifndef __raw_readw
+static inline u16 __raw_readw(const volatile void *addr)
+{
+   return *(const volatile u16 *)addr;
+}
+#endif
+
+#ifndef __raw_readl
+static inline u32 __raw_readl(const volatile void *addr)
+{
+   return *(const volatile u32 *)addr;
+}
+#endif
+
+#ifndef __raw_readq
+static inline u64 __raw_readq(const volatile void *addr)
+{
+   assert(sizeof(unsigned long) == sizeof(u64));
+   return *(const volatile u64 *)addr;
+}
+#endif
+
+#ifndef __raw_writeb
+static inline void __raw_writeb(u8 b, volatile void *addr)
+{
+   *(volatile u8 *)addr = b;
+}
+#endif
+
+#ifndef __raw_writew
+static inline void __raw_writew(u16 b, volatile void *addr)
+{
+   *(volatile u16 *)addr = b;
+}
+#endif
+
+#ifndef __raw_writel
+static inline void __raw_writel(u32 b, volatile void *addr)
+{
+   *(volatile u32 *)addr = b;
+}
+#endif
+
+#ifndef __raw_writeq
+static inline void __raw_writeq(u64 b, volatile void *addr)
+{
+   assert(sizeof(unsigned long) == sizeof(u64));
+   *(volatile u64 *)addr = b;
+}
+#endif
+
+#ifndef __bswap16
+static inline u16 __bswap16(u16 x)
+{
+   return ((x  8)  0xff) | ((x  0xff)  8);
+}
+#endif
+
+#ifndef __bswap32
+static inline u32 __bswap32(u32 x)
+{
+   return ((x  0xff00)  24) | ((x  0x00ff)   8) |
+  ((x  0xff00)   8) | ((x  0x00ff)  24);
+}
+#endif
+
+#ifndef __bswap64
+static inline u64 __bswap64(u64 x)
+{
+   return ((x  0x00ffULL)  56) |
+  ((x  0xff00ULL)  40) |
+  ((x  0x00ffULL)  24) |
+  ((x  0xff00ULL)   8) |
+  ((x  0x00ffULL)   8) |
+  ((x  0xff00ULL)  24) |
+  ((x  0x00ffULL)  40) |
+  ((x

[PATCH v7 14/14] arm: vectors support

2014-07-16 Thread Andrew Jones

Add support for tests to use exception handlers using
install_exception_handler(). This patch also adds start_usr(),
which can be used to start a function in USR mode, using a given
stack pointer. start_usr() is used by a new selftest test that
checks the new vector support.

Signed-off-by: Andrew Jones drjo...@redhat.com
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
v7:
  - selftest.c: s/alloc_aligned/memalign/
  - lib/arm/processor.c remove unnecessary include asm/setup.h
v6: use alloc() for start_usr
v5: rebase change: replace __stringify with libcflat's new xstr
macro
v4: a couple tweaks to fit changes in the other patches,
vectors-usr test now has an 8K usr stack
v3:
  - squashed in 'arm: Simplify exceptions_init in cstart.S'
[Christoffer Dall]
  - suggested function name changes and comment additions
[Christoffer Dall]
  - fix a bug with stack restore from usr mode exceptions that
Christoffer pointed out. Add a get_sp() accessor too.
---
 arm/cstart.S| 174 
 arm/flat.lds|   7 +-
 arm/selftest.c  | 126 ++-
 arm/unittests.cfg   |  12 
 config/config-arm.mak   |   3 +-
 lib/arm/asm/processor.h |  39 +++
 lib/arm/processor.c | 111 ++
 7 files changed, 469 insertions(+), 3 deletions(-)
 create mode 100644 lib/arm/asm/processor.h
 create mode 100644 lib/arm/processor.c

diff --git a/arm/cstart.S b/arm/cstart.S
index e28251db2950d..cc87ece4b6b40 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -5,6 +5,10 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
+#define __ASSEMBLY__
+#include asm/asm-offsets.h
+#include asm/ptrace.h
+#include asm/cp15.h
 
 .arm
 
@@ -17,6 +21,13 @@ start:
 * See the kernel doc Documentation/arm/Booting
 */
ldr sp, =stacktop
+   push{r0-r3}
+
+   /* set up vector table and mode stacks */
+   bl  exceptions_init
+
+   /* complete setup */
+   pop {r0-r3}
bl  setup
 
/* run the test */
@@ -27,9 +38,172 @@ start:
bl  exit
b   halt
 
+
+.macro set_mode_stack mode, stack
+   add \stack, #S_FRAME_SIZE
+   msr cpsr_c, #(\mode | PSR_I_BIT | PSR_F_BIT)
+   mov sp, \stack
+.endm
+
+exceptions_init:
+   mrc p15, 0, r2, c1, c0, 0   @ read SCTLR
+   bic r2, #CR_V   @ SCTLR.V := 0
+   mcr p15, 0, r2, c1, c0, 0   @ write SCTLR
+   ldr r2, =vector_table
+   mcr p15, 0, r2, c12, c0, 0  @ write VBAR
+
+   mrs r2, cpsr
+   ldr r1, =exception_stacks
+
+   /* first frame reserved for svc mode */
+   set_mode_stack  UND_MODE, r1
+   set_mode_stack  ABT_MODE, r1
+   set_mode_stack  IRQ_MODE, r1
+   set_mode_stack  FIQ_MODE, r1
+
+   msr cpsr_cxsf, r2   @ back to svc mode
+   mov pc, lr
+
 .text
 
 .globl halt
 halt:
 1: wfi
b   1b
+
+/*
+ * Vector stubs
+ * Simplified version of the Linux kernel implementation
+ *   arch/arm/kernel/entry-armv.S
+ *
+ * Each mode has an S_FRAME_SIZE sized stack initialized
+ * in exceptions_init
+ */
+.macro vector_stub, name, vec, mode, correction=0
+.align 5
+vector_\name:
+.if \correction
+   sub lr, lr, #\correction
+.endif
+   /*
+* Save r0, r1, lr_exception (parent PC)
+* and spsr_exception (parent CPSR)
+*/
+   str r0, [sp, #S_R0]
+   str r1, [sp, #S_R1]
+   str lr, [sp, #S_PC]
+   mrs r0, spsr
+   str r0, [sp, #S_PSR]
+
+   /* Prepare for SVC32 mode. */
+   mrs r0, cpsr
+   bic r0, #MODE_MASK
+   orr r0, #SVC_MODE
+   msr spsr_cxsf, r0
+
+   /* Branch to handler in SVC mode */
+   mov r0, #\vec
+   mov r1, sp
+   ldr lr, =vector_common
+   movspc, lr
+.endm
+
+vector_stubrst,0, UND_MODE
+vector_stubund,1, UND_MODE
+vector_stubpabt,   3, ABT_MODE, 4
+vector_stubdabt,   4, ABT_MODE, 8
+vector_stubirq,6, IRQ_MODE, 4
+vector_stubfiq,7, FIQ_MODE, 4
+
+.align 5
+vector_svc:
+   /*
+* Save r0, r1, lr_exception (parent PC)
+* and spsr_exception (parent CPSR)
+*/
+   push{ r1 }
+   ldr r1, =exception_stacks
+   str r0, [r1, #S_R0]
+   pop { r0 }
+   str r0, [r1, #S_R1]
+   str lr, [r1, #S_PC]
+   mrs r0, spsr
+   str r0, [r1, #S_PSR]
+
+   /*
+* Branch to handler, still in SVC mode.
+* r0 := 2 is the svc vector number.
+*/
+   mov r0, #2
+   ldr lr, =vector_common
+   mov pc, lr
+
+vector_common:
+   /* make room for pt_regs */
+   sub sp, #S_FRAME_SIZE
+   tst sp, #4  @ check stack alignment
+   subne   sp, #4
+
+   /* store

[PATCH v7 07/14] virtio: add minimal support for virtqueues

2014-07-16 Thread Andrew Jones

Currently only supports sending (outbufs), doesn't have any
bells or whistles. Code adapted from the Linux Kernel.

Signed-off-by: Andrew Jones drjo...@redhat.com

---
v7:
 - {alloc,alloc_aligned} - {calloc,memalign}
 - changes now split between virtio.* and virtio-mmio.* files
---
 lib/virtio-mmio.c |  64 +
 lib/virtio-mmio.h |  18 +
 lib/virtio.c  | 117 ++
 lib/virtio.h  |  73 ++
 4 files changed, 272 insertions(+)

diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c
index 7331abf128cc5..3840838defa1c 100644
--- a/lib/virtio-mmio.c
+++ b/lib/virtio-mmio.c
@@ -1,4 +1,6 @@
 /*
+ * virtqueue support adapted from the Linux kernel.
+ *
  * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
@@ -6,6 +8,7 @@
 #include libcflat.h
 #include devicetree.h
 #include alloc.h
+#include asm/page.h
 #include asm/io.h
 #include virtio.h
 #include virtio-mmio.h
@@ -32,9 +35,68 @@ static void vm_set(struct virtio_device *vdev, unsigned 
offset,
writeb(p[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i);
 }
 
+static bool vm_notify(struct virtqueue *vq)
+{
+   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq-vdev);
+   writel(vq-index, vm_dev-base + VIRTIO_MMIO_QUEUE_NOTIFY);
+   return true;
+}
+
+static struct virtqueue *vm_setup_vq(struct virtio_device *vdev,
+unsigned index,
+void (*callback)(struct virtqueue *vq),
+const char *name)
+{
+   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+   struct vring_virtqueue *vq;
+   void *queue;
+   unsigned num = VIRTIO_MMIO_QUEUE_NUM_MIN;
+
+   vq = calloc(1, sizeof(*vq));
+   queue = memalign(PAGE_SIZE, VIRTIO_MMIO_QUEUE_SIZE_MIN);
+   if (!vq || !queue)
+   return NULL;
+
+   writel(index, vm_dev-base + VIRTIO_MMIO_QUEUE_SEL);
+
+   assert(readl(vm_dev-base + VIRTIO_MMIO_QUEUE_NUM_MAX) = num);
+
+   if (readl(vm_dev-base + VIRTIO_MMIO_QUEUE_PFN) != 0) {
+   printf(%s: virtqueue %d already setup! base=%p\n,
+   __func__, index, vm_dev-base);
+   return NULL;
+   }
+
+   writel(num, vm_dev-base + VIRTIO_MMIO_QUEUE_NUM);
+   writel(VIRTIO_MMIO_VRING_ALIGN,
+   vm_dev-base + VIRTIO_MMIO_QUEUE_ALIGN);
+   writel(virt_to_pfn(queue), vm_dev-base + VIRTIO_MMIO_QUEUE_PFN);
+
+   vring_init_virtqueue(vq, index, num, VIRTIO_MMIO_VRING_ALIGN,
+vdev, queue, vm_notify, callback, name);
+
+   return vq-vq;
+}
+
+static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+  struct virtqueue *vqs[], vq_callback_t *callbacks[],
+  const char *names[])
+{
+   unsigned i;
+
+   for (i = 0; i  nvqs; ++i) {
+   vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i]);
+   if (vqs[i] == NULL)
+   return -1;
+   }
+
+   return 0;
+}
+
 static const struct virtio_config_ops vm_config_ops = {
.get = vm_get,
.set = vm_set,
+   .find_vqs = vm_find_vqs,
 };
 
 static void vm_device_init(struct virtio_mmio_device *vm_dev)
@@ -42,6 +104,8 @@ static void vm_device_init(struct virtio_mmio_device *vm_dev)
vm_dev-vdev.id.device = readl(vm_dev-base + VIRTIO_MMIO_DEVICE_ID);
vm_dev-vdev.id.vendor = readl(vm_dev-base + VIRTIO_MMIO_VENDOR_ID);
vm_dev-vdev.config = vm_config_ops;
+
+   writel(PAGE_SIZE, vm_dev-base + VIRTIO_MMIO_GUEST_PAGE_SIZE);
 }
 
 /**
diff --git a/lib/virtio-mmio.h b/lib/virtio-mmio.h
index 7cd610428b486..8046a4747959a 100644
--- a/lib/virtio-mmio.h
+++ b/lib/virtio-mmio.h
@@ -8,6 +8,7 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 #include libcflat.h
+#include asm/page.h
 #include virtio.h
 
 #define VIRTIO_MMIO_MAGIC_VALUE0x000
@@ -33,6 +34,23 @@
 #define VIRTIO_MMIO_INT_VRING  (1  0)
 #define VIRTIO_MMIO_INT_CONFIG (1  1)
 
+#define VIRTIO_MMIO_VRING_ALIGNPAGE_SIZE
+
+/*
+ * The minimum queue size is 2*VIRTIO_MMIO_VRING_ALIGN, which
+ * means the largest queue num for the minimum queue size is 128, i.e.
+ * 2*VIRTIO_MMIO_VRING_ALIGN = vring_size(128, VIRTIO_MMIO_VRING_ALIGN),
+ * where vring_size is
+ *
+ * unsigned vring_size(unsigned num, unsigned long align)
+ * {
+ * return ((sizeof(struct vring_desc) * num + sizeof(u16) * (3 + num)
+ *  + align - 1)  ~(align - 1))
+ * + sizeof(u16) * 3 + sizeof(struct vring_used_elem) * num;
+ * }
+ */
+#define VIRTIO_MMIO_QUEUE_SIZE_MIN (2*VIRTIO_MMIO_VRING_ALIGN)
+#define

[PATCH v7 11/14] arm: Add IO accessors to avoid register-writeback

2014-07-16 Thread Andrew Jones

From: Christoffer Dall christoffer.d...@linaro.org

Add IO accessor functions to the arm library functions to avoid
register-writeback IO accessors that are not yet supported by the
kernel.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/arm/asm/io.h | 57 
 1 file changed, 57 insertions(+)

diff --git a/lib/arm/asm/io.h b/lib/arm/asm/io.h
index 51ec6e9aa2e99..1d0abb7d9f405 100644
--- a/lib/arm/asm/io.h
+++ b/lib/arm/asm/io.h
@@ -3,6 +3,9 @@
 #include libcflat.h
 #include asm/barrier.h
 
+#define __iomem
+#define __force
+
 #define __bswap16 bswap16
 static inline u16 bswap16(u16 val)
 {
@@ -19,6 +22,60 @@ static inline u32 bswap32(u32 val)
return ret;
 }
 
+#define __raw_readb __raw_readb
+static inline u8 __raw_readb(const volatile void __iomem *addr)
+{
+   u8 val;
+   asm volatile(ldrb %1, %0
+: +Qo (*(volatile u8 __force *)addr),
+  =r (val));
+   return val;
+}
+
+#define __raw_readw __raw_readw
+static inline u16 __raw_readw(const volatile void __iomem *addr)
+{
+   u16 val;
+   asm volatile(ldrh %1, %0
+: +Q (*(volatile u16 __force *)addr),
+  =r (val));
+   return val;
+}
+
+#define __raw_readl __raw_readl
+static inline u32 __raw_readl(const volatile void __iomem *addr)
+{
+   u32 val;
+   asm volatile(ldr %1, %0
+: +Qo (*(volatile u32 __force *)addr),
+  =r (val));
+   return val;
+}
+
+#define __raw_writeb __raw_writeb
+static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
+{
+   asm volatile(strb %1, %0
+: +Qo (*(volatile u8 __force *)addr)
+: r (val));
+}
+
+#define __raw_writew __raw_writew
+static inline void __raw_writew(u16 val, volatile void __iomem *addr)
+{
+   asm volatile(strh %1, %0
+: +Q (*(volatile u16 __force *)addr)
+: r (val));
+}
+
+#define __raw_writel __raw_writel
+static inline void __raw_writel(u32 val, volatile void __iomem *addr)
+{
+   asm volatile(str %1, %0
+: +Qo (*(volatile u32 __force *)addr)
+: r (val));
+}
+
 #include asm-generic/io.h
 
 #endif /* _ASMARM_IO_H_ */
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v7 13/14] arm: add useful headers from the Linux kernel

2014-07-16 Thread Andrew Jones

We're going to need PSR bit defines and pt_regs. We'll also need
pt_regs offsets in assembly code. This patch adapts the Linux
kernel's ptrace.h and generated/asm-offsets.h to this framework.
It also adapts cp15.h from the kernel, since we'll need bit defines
from there too.

Signed-off-by: Andrew Jones drjo...@redhat.com
Acked-by: Christoffer Dall christoffer.d...@linaro.org
---
v4: much improved asm-offsets.h generation based on Kbuild
---
 config/asm-offsets.mak|  41 +++
 config/config-arm.mak |   9 -
 lib/arm/.gitignore|   1 +
 lib/arm/asm-offsets.c |  39 ++
 lib/arm/asm/asm-offsets.h |   1 +
 lib/arm/asm/cp15.h|  37 +
 lib/arm/asm/ptrace.h  | 100 ++
 lib/generated/.gitignore  |   1 +
 8 files changed, 227 insertions(+), 2 deletions(-)
 create mode 100644 config/asm-offsets.mak
 create mode 100644 lib/arm/.gitignore
 create mode 100644 lib/arm/asm-offsets.c
 create mode 100644 lib/arm/asm/asm-offsets.h
 create mode 100644 lib/arm/asm/cp15.h
 create mode 100644 lib/arm/asm/ptrace.h
 create mode 100644 lib/generated/.gitignore

diff --git a/config/asm-offsets.mak b/config/asm-offsets.mak
new file mode 100644
index 0..b2578a6692f33
--- /dev/null
+++ b/config/asm-offsets.mak
@@ -0,0 +1,41 @@
+#
+# asm-offsets adapted from the kernel, see
+#   Kbuild
+#   scripts/Kbuild.include
+#   scripts/Makefile.build
+#
+#   Authors: Andrew Jones drjo...@redhat.com
+#
+
+define sed-y
+   /^-/{s:-#\(.*\):/* \1 */:; \
+   s:^-\([^ ]*\) [\$$#]*\([-0-9]*\) \(.*\):#define \1 \2 /* \3 */:; \
+   s:^-\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; \
+   s:-::; p;}
+endef
+
+define make_asm_offsets
+   (set -e; \
+echo #ifndef __ASM_OFFSETS_H__; \
+echo #define __ASM_OFFSETS_H__; \
+echo /*; \
+echo  * Generated file. DO NOT MODIFY.; \
+echo  *; \
+echo  */; \
+echo ; \
+sed -ne $(sed-y) $; \
+echo ; \
+echo #endif )  $@
+endef
+
+$(asm-offsets:.h=.s): $(asm-offsets:.h=.c)
+   $(CC) $(CFLAGS) -fverbose-asm -S -o $@ $
+
+$(asm-offsets): $(asm-offsets:.h=.s)
+   $(call make_asm_offsets)
+   cp -f $(asm-offsets) lib/generated
+
+asm_offsets_clean:
+   $(RM) $(asm-offsets) $(asm-offsets:.h=.s) \
+ $(addprefix lib/generated/,$(notdir $(asm-offsets)))
+
diff --git a/config/config-arm.mak b/config/config-arm.mak
index b7239810183d1..f03b96d4c50c5 100644
--- a/config/config-arm.mak
+++ b/config/config-arm.mak
@@ -30,6 +30,9 @@ CFLAGS += -Wextra
 CFLAGS += -O2
 CFLAGS += -I lib -I lib/libfdt
 
+asm-offsets = lib/arm/asm-offsets.h
+include config/asm-offsets.mak
+
 cflatobjs += \
lib/alloc.o \
lib/devicetree.o \
@@ -59,7 +62,7 @@ FLATLIBS = $(libcflat) $(LIBFDT_archive) $(libgcc) $(libeabi)
 $(libeabi): $(eabiobjs)
$(AR) rcs $@ $^
 
-arch_clean: libfdt_clean
+arch_clean: libfdt_clean asm_offsets_clean
$(RM) $(TEST_DIR)/*.{o,flat,elf} $(libeabi) $(eabiobjs) \
  $(TEST_DIR)/.*.d lib/arm/.*.d
 
@@ -69,7 +72,9 @@ tests_and_config = $(TEST_DIR)/*.flat 
$(TEST_DIR)/unittests.cfg
 
 cstart.o = $(TEST_DIR)/cstart.o
 
-test_cases: $(tests-common) $(tests)
+generated_files = $(asm-offsets)
+
+test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 
diff --git a/lib/arm/.gitignore b/lib/arm/.gitignore
new file mode 100644
index 0..84872bf197c67
--- /dev/null
+++ b/lib/arm/.gitignore
@@ -0,0 +1 @@
+asm-offsets.[hs]
diff --git a/lib/arm/asm-offsets.c b/lib/arm/asm-offsets.c
new file mode 100644
index 0..a9c349d2d427c
--- /dev/null
+++ b/lib/arm/asm-offsets.c
@@ -0,0 +1,39 @@
+/*
+ * Adapted from arch/arm/kernel/asm-offsets.c
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include libcflat.h
+#include asm/ptrace.h
+
+#define DEFINE(sym, val) \
+   asm volatile(\n- #sym  %0  #val : : i (val))
+#define OFFSET(sym, str, mem)  DEFINE(sym, offsetof(struct str, mem))
+#define COMMENT(x) asm volatile(\n-# x)
+#define BLANK()asm volatile(\n- : : )
+
+int main(void)
+{
+   OFFSET(S_R0, pt_regs, ARM_r0);
+   OFFSET(S_R1, pt_regs, ARM_r1);
+   OFFSET(S_R2, pt_regs, ARM_r2);
+   OFFSET(S_R3, pt_regs, ARM_r3);
+   OFFSET(S_R4, pt_regs, ARM_r4);
+   OFFSET(S_R5, pt_regs, ARM_r5);
+   OFFSET(S_R6, pt_regs, ARM_r6);
+   OFFSET(S_R7, pt_regs, ARM_r7);
+   OFFSET(S_R8, pt_regs, ARM_r8);
+   OFFSET(S_R9, pt_regs, ARM_r9);
+   OFFSET(S_R10, pt_regs, ARM_r10);
+   OFFSET(S_FP, pt_regs, ARM_fp);
+   OFFSET(S_IP, pt_regs, ARM_ip);
+   OFFSET(S_SP, pt_regs, ARM_sp);
+   OFFSET(S_LR, pt_regs, ARM_lr);
+   OFFSET(S_PC, pt_regs, ARM_pc);
+   OFFSET(S_PSR,

[PATCH v7 12/14] arm: Add arch-specific asm/page.h and va/pa

2014-07-16 Thread Andrew Jones

These are pretty much the same as the asm-generic version,
but use phys_addr_t.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/arm/asm/io.h   | 13 +
 lib/arm/asm/page.h | 34 +-
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/lib/arm/asm/io.h b/lib/arm/asm/io.h
index 1d0abb7d9f405..bbcbcd0542490 100644
--- a/lib/arm/asm/io.h
+++ b/lib/arm/asm/io.h
@@ -2,6 +2,7 @@
 #define _ASMARM_IO_H_
 #include libcflat.h
 #include asm/barrier.h
+#include asm/page.h
 
 #define __iomem
 #define __force
@@ -76,6 +77,18 @@ static inline void __raw_writel(u32 val, volatile void 
__iomem *addr)
 : r (val));
 }
 
+#define virt_to_phys virt_to_phys
+static inline phys_addr_t virt_to_phys(const volatile void *x)
+{
+   return __virt_to_phys((unsigned long)(x));
+}
+
+#define phys_to_virt phys_to_virt
+static inline void *phys_to_virt(phys_addr_t x)
+{
+   return (void *)__phys_to_virt(x);
+}
+
 #include asm-generic/io.h
 
 #endif /* _ASMARM_IO_H_ */
diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
index 91a4bc3b7f86e..606d76f5775cf 100644
--- a/lib/arm/asm/page.h
+++ b/lib/arm/asm/page.h
@@ -1 +1,33 @@
-#include asm-generic/page.h
+#ifndef _ASMARM_PAGE_H_
+#define _ASMARM_PAGE_H_
+/*
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#define PAGE_SHIFT 12
+#ifndef __ASSEMBLY__
+#define PAGE_SIZE  (1UL  PAGE_SHIFT)
+#else
+#define PAGE_SIZE  (1  PAGE_SHIFT)
+#endif
+#define PAGE_MASK  (~(PAGE_SIZE-1))
+#define PAGE_ALIGN(addr)   (((addr) + (PAGE_SIZE-1))  PAGE_MASK)
+
+#ifndef __ASSEMBLY__
+#include asm/setup.h
+
+#ifndef __virt_to_phys
+#define __phys_to_virt(x)  ((unsigned long) (x))
+#define __virt_to_phys(x)  (x)
+#endif
+
+#define __va(x)((void 
*)__phys_to_virt((phys_addr_t)(x)))
+#define __pa(x)__virt_to_phys((unsigned long)(x))
+
+#define virt_to_pfn(kaddr)  (__pa(kaddr)  PAGE_SHIFT)
+#define pfn_to_virt(pfn)__va((pfn)  PAGE_SHIFT)
+#endif
+
+#endif
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v7 05/14] add minimal virtio support for devtree virtio-mmio

2014-07-16 Thread Andrew Jones

Support the bare minimum of virtio to enable access to the virtio-mmio
config space of a device. Currently this implementation must use a
device tree to find the device.

Signed-off-by: Andrew Jones drjo...@redhat.com
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
v7:
 - s/alloc/calloc/
 - split into virtio.[ch] and virtio-mmio.[ch] [Paolo Bonzini]
 - dump virtio_bind_busses table [Paolo Bonzini]
v6:
 - switch to using alloc()
 - s/vmdev/vm_dev/ to be consistent with kernel naming
 - check for virtio magic in vm_dt_match
v5:
 - use same virtio struct names as kernel
 - no need to alloc a new virtio_config_ops for each virtio device
 - use ioremap
v4:
 - split from the virtio-testdev patch
 - search a table to discover that the device must be DT/virtio-mmio,
   which doesn't change anything, but looks less hacky than comments
   saying the device must be DT/virtio-mmio...
 - manage own pool of virtio-mmio pre-allocated device structures in
   order to avoid needing access to the heap
---
 lib/virtio-mmio.c | 111 ++
 lib/virtio-mmio.h |  47 +++
 lib/virtio.c  |  13 +++
 lib/virtio.h  |  74 
 4 files changed, 245 insertions(+)
 create mode 100644 lib/virtio-mmio.c
 create mode 100644 lib/virtio-mmio.h
 create mode 100644 lib/virtio.c
 create mode 100644 lib/virtio.h

diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c
new file mode 100644
index 0..7331abf128cc5
--- /dev/null
+++ b/lib/virtio-mmio.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include libcflat.h
+#include devicetree.h
+#include alloc.h
+#include asm/io.h
+#include virtio.h
+#include virtio-mmio.h
+
+static void vm_get(struct virtio_device *vdev, unsigned offset,
+  void *buf, unsigned len)
+{
+   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+   u8 *p = buf;
+   unsigned i;
+
+   for (i = 0; i  len; ++i)
+   p[i] = readb(vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i);
+}
+
+static void vm_set(struct virtio_device *vdev, unsigned offset,
+  const void *buf, unsigned len)
+{
+   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+   const u8 *p = buf;
+   unsigned i;
+
+   for (i = 0; i  len; ++i)
+   writeb(p[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i);
+}
+
+static const struct virtio_config_ops vm_config_ops = {
+   .get = vm_get,
+   .set = vm_set,
+};
+
+static void vm_device_init(struct virtio_mmio_device *vm_dev)
+{
+   vm_dev-vdev.id.device = readl(vm_dev-base + VIRTIO_MMIO_DEVICE_ID);
+   vm_dev-vdev.id.vendor = readl(vm_dev-base + VIRTIO_MMIO_VENDOR_ID);
+   vm_dev-vdev.config = vm_config_ops;
+}
+
+/**
+ * virtio-mmio device tree support
+ **/
+
+struct vm_dt_info {
+   u32 devid;
+   void *base;
+};
+
+static int vm_dt_match(const struct dt_device *dev, int fdtnode)
+{
+   struct vm_dt_info *info = (struct vm_dt_info *)dev-info;
+   struct dt_pbus_reg base;
+   u32 magic;
+
+   dt_device_bind_node((struct dt_device *)dev, fdtnode);
+
+   assert(dt_pbus_get_base(dev, base) == 0);
+   info-base = ioremap(base.addr, base.size);
+
+   magic = readl(info-base + VIRTIO_MMIO_MAGIC_VALUE);
+   if (magic != ('v' | 'i'  8 | 'r'  16 | 't'  24))
+   return false;
+
+   return readl(info-base + VIRTIO_MMIO_DEVICE_ID) == info-devid;
+}
+
+static struct virtio_device *virtio_mmio_dt_bind(u32 devid)
+{
+   struct virtio_mmio_device *vm_dev;
+   struct dt_device dt_dev;
+   struct dt_bus dt_bus;
+   struct vm_dt_info info;
+   int node;
+
+   if (!dt_available())
+   return NULL;
+
+   dt_bus_init_defaults(dt_bus);
+   dt_bus.match = vm_dt_match;
+
+   info.devid = devid;
+
+   dt_device_init(dt_dev, dt_bus, info);
+
+   node = dt_device_find_compatible(dt_dev, virtio,mmio);
+   assert(node = 0 || node == -FDT_ERR_NOTFOUND);
+
+   if (node == -FDT_ERR_NOTFOUND)
+   return NULL;
+
+   vm_dev = calloc(1, sizeof(*vm_dev));
+   if (!vm_dev)
+   return NULL;
+
+   vm_dev-base = info.base;
+   vm_device_init(vm_dev);
+
+   return vm_dev-vdev;
+}
+
+struct virtio_device *virtio_mmio_bind(u32 devid)
+{
+   return virtio_mmio_dt_bind(devid);
+}
diff --git a/lib/virtio-mmio.h b/lib/virtio-mmio.h
new file mode 100644
index 0..7cd610428b486
--- /dev/null
+++ b/lib/virtio-mmio.h
@@ -0,0 +1,47 @@
+#ifndef _VIRTIO_MMIO_H_
+#define _VIRTIO_MMIO_H_
+/*
+ * A minimal implementation of virtio-mmio. Adapted from the Linux Kernel.
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew

[PATCH v7 04/14] Introduce lib/alloc

2014-07-16 Thread Andrew Jones

alloc supplies three ingredients to the test framework that are all
related to the support of dynamic memory allocation.

The first is a set of alloc function wrappers for malloc and its
friends. Using wrappers allows test code and common code to use the
same interface for memory allocation at all stages, even though the
implementations may change with the stage, e.g. pre/post paging.

The second is a set of implementations for the alloc function
interfaces. These implementations are named early_*, as they can be
used almost immediately by the test framework.

The third is a very simple physical memory allocator, which the
early_* alloc functions build on.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
v7: expanded from only supplying the alloc function wrappers to
including early_* and phys_alloc [Paolo Bonzini]
---
 lib/alloc.c | 176 
 lib/alloc.h | 123 ++
 2 files changed, 299 insertions(+)
 create mode 100644 lib/alloc.c
 create mode 100644 lib/alloc.h

diff --git a/lib/alloc.c b/lib/alloc.c
new file mode 100644
index 0..5d55e285dcd1d
--- /dev/null
+++ b/lib/alloc.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include alloc.h
+#include asm/spinlock.h
+#include asm/io.h
+
+#define ALIGN_UP_MASK(x, mask) (((x) + (mask))  ~(mask))
+#define ALIGN_UP(x, a) ALIGN_UP_MASK(x, (typeof(x))(a) - 1)
+#define MIN(a, b)  ((a)  (b) ? (a) : (b))
+#define MAX(a, b)  ((a)  (b) ? (a) : (b))
+
+#define PHYS_ALLOC_NR_REGIONS  256
+
+struct phys_alloc_region {
+   phys_addr_t base;
+   phys_addr_t size;
+};
+
+static struct phys_alloc_region regions[PHYS_ALLOC_NR_REGIONS];
+static int nr_regions;
+
+static struct spinlock lock;
+static phys_addr_t base, top, align_min;
+
+void phys_alloc_show(void)
+{
+   int i;
+
+   spin_lock(lock);
+   printf(phys_alloc minimum alignment: 0x%llx\n, align_min);
+   for (i = 0; i  nr_regions; ++i)
+   printf(%016llx-%016llx [%s]\n,
+   regions[i].base,
+   regions[i].base + regions[i].size - 1,
+   USED);
+   printf(%016llx-%016llx [%s]\n, base, top - 1, FREE);
+   spin_unlock(lock);
+}
+
+void phys_alloc_init(phys_addr_t base_addr, phys_addr_t size)
+{
+   spin_lock(lock);
+   base = base_addr;
+   top = base + size;
+   align_min = DEFAULT_MINIMUM_ALIGNMENT;
+   spin_unlock(lock);
+}
+
+void phys_alloc_set_minimum_alignment(phys_addr_t align)
+{
+   assert(align  !(align  (align - 1)));
+   spin_lock(lock);
+   align_min = align;
+   spin_unlock(lock);
+}
+
+static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
+  phys_addr_t align, bool safe)
+{
+   phys_addr_t addr, size_orig = size;
+   u64 top_safe = top;
+
+   if (safe  sizeof(long) == 4)
+   top_safe = MIN(top, 1ULL  32);
+
+   align = MAX(align, align_min);
+
+   spin_lock(lock);
+
+   addr = ALIGN_UP(base, align);
+   size += addr - base;
+
+   if ((top_safe - base)  size) {
+   printf(%s: requested=0x%llx (align=0x%llx), 
+  need=0x%llx, but free=0x%llx. 
+  top=0x%llx, top_safe=0x%llx\n, __func__,
+  size_orig, align, size, top_safe - base,
+  top, top_safe);
+   spin_unlock(lock);
+   return INVALID_PHYS_ADDR;
+   }
+
+   base += size;
+
+   if (nr_regions  PHYS_ALLOC_NR_REGIONS) {
+   regions[nr_regions].base = addr;
+   regions[nr_regions].size = size_orig;
+   ++nr_regions;
+   } else {
+   printf(%s: WARNING: no free log entries, 
+  can't log allocation...\n, __func__);
+   }
+
+   spin_unlock(lock);
+
+   return addr;
+}
+
+static phys_addr_t phys_zalloc_aligned_safe(phys_addr_t size,
+   phys_addr_t align, bool safe)
+{
+   phys_addr_t addr = phys_alloc_aligned_safe(size, align, safe);
+   if (addr == INVALID_PHYS_ADDR)
+   return addr;
+
+   memset(phys_to_virt(addr), 0, size);
+   return addr;
+}
+
+phys_addr_t phys_alloc_aligned(phys_addr_t size, phys_addr_t align)
+{
+   return phys_alloc_aligned_safe(size, align, false);
+}
+
+phys_addr_t phys_zalloc_aligned(phys_addr_t size, phys_addr_t align)
+{
+   return phys_zalloc_aligned_safe(size, align, false);
+}
+
+phys_addr_t phys_alloc(phys_addr_t size)
+{
+   return phys_alloc_aligned(size, align_min);
+}
+
+phys_addr_t phys_zalloc(phys_addr_t size)
+{
+   return phys_zalloc_aligned(size, align_min);
+}
+
+static void *early_malloc(size_t size)
+{
+   phys_addr_t addr =

[PATCH v7 02/14] add support for Linux device trees

2014-07-16 Thread Andrew Jones

Build libfdt and add some device tree functions built on it to the
arch-neutral lib code in order to facilitate the extraction of boot
info and device base addresses. These functions should work on device
trees conforming to section III of the kernel doc
Documentation/devicetree/booting-without-of.txt.

Signed-off-by: Andrew Jones drjo...@redhat.com
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
v7:
 - squashed the add a make target bit of v6's libfdt: get libfdt to
   build (now dropped) patch. The rest of that dropped patch has
   already been merged under libcflat: add more string functions
 - no need for info to be const in dt_device_init
v5:
 - changed *get_baseaddr* helpers to *get_base* helpers
 - a couple minor code changes [Christoffer Dall]
v4: reworked everything, added lots of comments to devicetree.h
---
 Makefile |  21 -
 lib/devicetree.c | 272 +++
 lib/devicetree.h | 236 +++
 lib/libcflat.h   |   2 +
 4 files changed, 529 insertions(+), 2 deletions(-)
 create mode 100644 lib/devicetree.c
 create mode 100644 lib/devicetree.h

diff --git a/Makefile b/Makefile
index 78d9ac664ac4b..180189ecd6d8c 100644
--- a/Makefile
+++ b/Makefile
@@ -22,6 +22,13 @@ cflatobjs := \
lib/abort.o \
lib/report.o
 
+# libfdt paths
+LIBFDT_objdir = lib/libfdt
+LIBFDT_srcdir = lib/libfdt
+LIBFDT_archive = $(LIBFDT_objdir)/libfdt.a
+LIBFDT_include = $(addprefix $(LIBFDT_srcdir)/,$(LIBFDT_INCLUDES))
+LIBFDT_version = $(addprefix $(LIBFDT_srcdir)/,$(LIBFDT_VERSION))
+
 #include architecure specific make rules
 include config/config-$(ARCH).mak
 
@@ -47,6 +54,11 @@ LDFLAGS += -pthread -lrt
 $(libcflat): $(cflatobjs)
$(AR) rcs $@ $^
 
+include $(LIBFDT_srcdir)/Makefile.libfdt
+$(LIBFDT_archive): CFLAGS += -ffreestanding -I lib -I lib/libfdt 
-Wno-sign-compare
+$(LIBFDT_archive): $(addprefix $(LIBFDT_objdir)/,$(LIBFDT_OBJS))
+   $(AR) rcs $@ $^
+
 %.o: %.S
$(CC) $(CFLAGS) -c -nostdlib -o $@ $
 
@@ -59,10 +71,15 @@ install:
 clean: arch_clean
$(RM) lib/.*.d $(libcflat) $(cflatobjs)
 
-distclean: clean
+libfdt_clean:
+   $(RM) $(LIBFDT_archive) \
+   $(addprefix $(LIBFDT_objdir)/,$(LIBFDT_OBJS)) \
+   $(LIBFDT_objdir)/.*.d
+
+distclean: clean libfdt_clean
$(RM) config.mak $(TEST_DIR)-run test.log msr.out cscope.*
 
-cscope: common_dirs = lib
+cscope: common_dirs = lib lib/libfdt
 cscope:
$(RM) ./cscope.*
find $(TEST_DIR) lib/$(TEST_DIR) $(common_dirs) -maxdepth 1 \
diff --git a/lib/devicetree.c b/lib/devicetree.c
new file mode 100644
index 0..0f9b4e9942736
--- /dev/null
+++ b/lib/devicetree.c
@@ -0,0 +1,272 @@
+/*
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include libcflat.h
+#include libfdt/libfdt.h
+#include devicetree.h
+
+static const void *fdt;
+static u32 root_nr_address_cells, root_nr_size_cells;
+
+const void *dt_fdt(void)
+{
+   return fdt;
+}
+
+bool dt_available(void)
+{
+   return fdt_check_header(fdt) == 0;
+}
+
+int dt_get_nr_cells(int fdtnode, u32 *nr_address_cells, u32 *nr_size_cells)
+{
+   const struct fdt_property *prop;
+   u32 *nr_cells;
+   int len;
+
+   prop = fdt_get_property(fdt, fdtnode, #address-cells, len);
+   if (prop == NULL)
+   return len;
+
+   nr_cells = (u32 *)prop-data;
+   *nr_address_cells = fdt32_to_cpu(*nr_cells);
+
+   prop = fdt_get_property(fdt, fdtnode, #size-cells, len);
+   if (prop == NULL)
+   return len;
+
+   nr_cells = (u32 *)prop-data;
+   *nr_size_cells = fdt32_to_cpu(*nr_cells);
+
+   return 0;
+}
+
+void dt_reg_init(struct dt_reg *reg, u32 nr_address_cells, u32 nr_size_cells)
+{
+   memset(reg, 0, sizeof(struct dt_reg));
+   reg-nr_address_cells = nr_address_cells;
+   reg-nr_size_cells = nr_size_cells;
+}
+
+int dt_get_reg(int fdtnode, int regidx, struct dt_reg *reg)
+{
+   const struct fdt_property *prop;
+   u32 *cells, i;
+   unsigned nr_tuple_cells;
+   int len;
+
+   prop = fdt_get_property(fdt, fdtnode, reg, len);
+   if (prop == NULL)
+   return len;
+
+   cells = (u32 *)prop-data;
+   nr_tuple_cells = reg-nr_address_cells + reg-nr_size_cells;
+   regidx *= nr_tuple_cells;
+
+   if (regidx + nr_tuple_cells  len/sizeof(u32))
+   return -FDT_ERR_NOTFOUND;
+
+   for (i = 0; i  reg-nr_address_cells; ++i)
+   reg-address_cells[i] = fdt32_to_cpu(cells[regidx + i]);
+
+   regidx += reg-nr_address_cells;
+   for (i = 0; i  reg-nr_size_cells; ++i)
+   reg-size_cells[i] = fdt32_to_cpu(cells[regidx + i]);
+
+   return 0;
+}
+
+int dt_pbus_translate_node(int fdtnode, int regidx,
+  struct dt_pbus_reg *pbus_reg)
+{
+   struct dt_reg

[PATCH v7 08/14] Introduce chr-testdev

2014-07-16 Thread Andrew Jones

chr-testdev is a qemu backend that can be used by test code to send qemu
commands. It communicates with qemu through a virtio-console device. The
only command currently implemented is quit, which allows the test code
to exit with a given status code, i.e. chr_testdev_exit(code).

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/chr-testdev.c | 72 +++
 lib/chr-testdev.h | 14 +++
 lib/virtio.h  |  2 ++
 3 files changed, 88 insertions(+)
 create mode 100644 lib/chr-testdev.c
 create mode 100644 lib/chr-testdev.h

diff --git a/lib/chr-testdev.c b/lib/chr-testdev.c
new file mode 100644
index 0..0c9a173a04886
--- /dev/null
+++ b/lib/chr-testdev.c
@@ -0,0 +1,72 @@
+/*
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include libcflat.h
+#include virtio.h
+#include asm/spinlock.h
+
+#define TESTDEV_NAME chr-testdev
+
+static struct virtio_device *vcon;
+static struct virtqueue *in_vq, *out_vq;
+static struct spinlock lock;
+
+static void __testdev_send(char *buf, size_t len)
+{
+   int ret;
+
+   ret = virtqueue_add_outbuf(out_vq, buf, len);
+   virtqueue_kick(out_vq);
+
+   if (ret  0)
+   return;
+
+   while (!virtqueue_get_buf(out_vq, len))
+   ;
+}
+
+void chr_testdev_exit(int code)
+{
+   char buf[8];
+   int len;
+
+   snprintf(buf, sizeof(buf), %dq, code);
+   len = strlen(buf);
+
+   spin_lock(lock);
+
+   if (!vcon)
+   goto out;
+
+   __testdev_send(buf, len);
+
+out:
+   spin_unlock(lock);
+}
+
+void chr_testdev_init(void)
+{
+   const char *io_names[] = { input, output };
+   struct virtqueue *vqs[2];
+   int ret;
+
+   vcon = virtio_bind(VIRTIO_ID_CONSOLE);
+   if (vcon == NULL) {
+   printf(%s: %s: can't find a virtio-console\n,
+   __func__, TESTDEV_NAME);
+   return;
+   }
+
+   ret = vcon-config-find_vqs(vcon, 2, vqs, NULL, io_names);
+   if (ret  0) {
+   printf(%s: %s: can't init virtqueues\n,
+   __func__, TESTDEV_NAME);
+   vcon = NULL;
+   return;
+   }
+
+   in_vq = vqs[0];
+   out_vq = vqs[1];
+}
diff --git a/lib/chr-testdev.h b/lib/chr-testdev.h
new file mode 100644
index 0..ffd9a851aa9b9
--- /dev/null
+++ b/lib/chr-testdev.h
@@ -0,0 +1,14 @@
+#ifndef _CHR_TESTDEV_H_
+#define _CHR_TESTDEV_H_
+/*
+ * chr-testdev is a driver for the chr-testdev qemu backend.
+ * The chr-testdev backend exposes a simple control interface to
+ * qemu for kvm-unit-tests accessible through virtio-console.
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+extern void chr_testdev_init(void);
+extern void chr_testdev_exit(int code);
+#endif
diff --git a/lib/virtio.h b/lib/virtio.h
index 37ce028b2c2bb..b51899ab998b6 100644
--- a/lib/virtio.h
+++ b/lib/virtio.h
@@ -10,6 +10,8 @@
  */
 #include libcflat.h
 
+#define VIRTIO_ID_CONSOLE 3
+
 struct virtio_device_id {
u32 device;
u32 vendor;
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v7 10/14] arm: Add spinlock implementation

2014-07-16 Thread Andrew Jones

From: Christoffer Dall christoffer.d...@linaro.org

Add simple busy-wait spinlock implementation for ARM.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
Signed-off-by: Andrew Jones drjo...@redhat.com
---
 config/config-arm.mak  |  3 ++-
 lib/arm/asm/spinlock.h |  9 ++---
 lib/arm/spinlock.c | 28 
 3 files changed, 32 insertions(+), 8 deletions(-)
 create mode 100644 lib/arm/spinlock.c

diff --git a/config/config-arm.mak b/config/config-arm.mak
index ff965428e0e5b..b7239810183d1 100644
--- a/config/config-arm.mak
+++ b/config/config-arm.mak
@@ -37,7 +37,8 @@ cflatobjs += \
lib/virtio-mmio.o \
lib/chr-testdev.o \
lib/arm/io.o \
-   lib/arm/setup.o
+   lib/arm/setup.o \
+   lib/arm/spinlock.o
 
 libeabi = lib/arm/libeabi.a
 eabiobjs = lib/arm/eabi_compat.o
diff --git a/lib/arm/asm/spinlock.h b/lib/arm/asm/spinlock.h
index 04f5a1a5538e2..2118a4b3751e0 100644
--- a/lib/arm/asm/spinlock.h
+++ b/lib/arm/asm/spinlock.h
@@ -5,12 +5,7 @@ struct spinlock {
int v;
 };
 
-//TODO
-static inline void spin_lock(struct spinlock *lock __unused)
-{
-}
-static inline void spin_unlock(struct spinlock *lock __unused)
-{
-}
+extern void spin_lock(struct spinlock *lock);
+extern void spin_unlock(struct spinlock *lock);
 
 #endif /* _ASMARM_SPINLOCK_H_ */
diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c
new file mode 100644
index 0..d8a6d4c3383d6
--- /dev/null
+++ b/lib/arm/spinlock.c
@@ -0,0 +1,28 @@
+#include libcflat.h
+#include asm/spinlock.h
+#include asm/barrier.h
+
+void spin_lock(struct spinlock *lock)
+{
+   u32 val, fail;
+
+   dmb();
+   do {
+   asm volatile(
+   1: ldrex   %0, [%2]\n
+  teq %0, #0\n
+  bne 1b\n
+  mov %0, #1\n
+  strex   %1, %0, [%2]\n
+   : =r (val), =r (fail)
+   : r (lock-v)
+   : cc );
+   } while (fail);
+   dmb();
+}
+
+void spin_unlock(struct spinlock *lock)
+{
+   lock-v = 0;
+   dmb();
+}
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v7 09/14] arm: initial drop

2014-07-16 Thread Andrew Jones

This is the initial drop of the arm test framework and a first test
that just checks that setup completed (a selftest). kvm isn't needed
to run this test unless testing with smp  1.

Try it out with
  yum install gcc-arm-linux-gnu
  export QEMU=[qemu with mach-virt and chr-testdev]
  ./configure --cross-prefix=arm-linux-gnu- --arch=arm
  make
  ./run_tests.sh

Signed-off-by: Andrew Jones drjo...@redhat.com
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
v7:
  - remove memregions (reworked them as phys_alloc in lib/alloc)
  - selftest: non-functional change: s/argv[i]/var/
  - lib/argv:setup_args don't dereference NULL
v6:
  - fixed setup.c comment [Christoffer Dall]
  - changed arm/run to use chr-testdev instead of virtio-testdev
  - add align parameter to memregion_new, setup alloc_ops
v5:
  - memregions: check freemem_start is in bounds and document
  - selftest: rename testnam = testname and properly init it
  - io.c: use writeb instead of writel in puts() and use ioremap
  - arm/run script update for new qemu ('-device ?' now requires -machine)
  - couple other minor changes to setup.c and io.c [Christoffer Dall]
v4:
  - moved fdt to just after stacktop (it was in the middle of free memory)
  - switched from using heap to memregions
  - get nr_cpus and added smp=num test
  - added barrier.h
  - use new report()/report_summary()
  - config/config-arm.mak cleanup
---
 arm/cstart.S   | 35 
 arm/flat.lds   | 18 +++
 arm/run| 46 +++
 arm/selftest.c | 86 ++
 arm/unittests.cfg  | 18 +++
 config/config-arm.mak  | 74 +++
 configure  | 12 +--
 lib/argv.c |  9 ++
 lib/arm/asm/barrier.h  | 18 +++
 lib/arm/asm/io.h   | 24 ++
 lib/arm/asm/page.h |  1 +
 lib/arm/asm/setup.h| 27 
 lib/arm/asm/spinlock.h | 16 ++
 lib/arm/eabi_compat.c  | 20 
 lib/arm/io.c   | 65 ++
 lib/arm/setup.c| 82 +++
 16 files changed, 549 insertions(+), 2 deletions(-)
 create mode 100644 arm/cstart.S
 create mode 100644 arm/flat.lds
 create mode 100755 arm/run
 create mode 100644 arm/selftest.c
 create mode 100644 arm/unittests.cfg
 create mode 100644 config/config-arm.mak
 create mode 100644 lib/arm/asm/barrier.h
 create mode 100644 lib/arm/asm/io.h
 create mode 100644 lib/arm/asm/page.h
 create mode 100644 lib/arm/asm/setup.h
 create mode 100644 lib/arm/asm/spinlock.h
 create mode 100644 lib/arm/eabi_compat.c
 create mode 100644 lib/arm/io.c
 create mode 100644 lib/arm/setup.c

diff --git a/arm/cstart.S b/arm/cstart.S
new file mode 100644
index 0..e28251db2950d
--- /dev/null
+++ b/arm/cstart.S
@@ -0,0 +1,35 @@
+/*
+ * Boot entry point and assembler functions for armv7 tests.
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+.arm
+
+.section .init
+
+.globl start
+start:
+   /*
+* bootloader params are in r0-r2
+* See the kernel doc Documentation/arm/Booting
+*/
+   ldr sp, =stacktop
+   bl  setup
+
+   /* run the test */
+   ldr r0, =__argc
+   ldr r0, [r0]
+   ldr r1, =__argv
+   bl  main
+   bl  exit
+   b   halt
+
+.text
+
+.globl halt
+halt:
+1: wfi
+   b   1b
diff --git a/arm/flat.lds b/arm/flat.lds
new file mode 100644
index 0..3e5d72e24989b
--- /dev/null
+++ b/arm/flat.lds
@@ -0,0 +1,18 @@
+
+SECTIONS
+{
+.text : { *(.init) *(.text) *(.text.*) }
+. = ALIGN(4K);
+.data : { *(.data) }
+. = ALIGN(16);
+.rodata : { *(.rodata) }
+. = ALIGN(16);
+.bss : { *(.bss) }
+. = ALIGN(4K);
+edata = .;
+. += 8K;
+. = ALIGN(4K);
+stacktop = .;
+}
+
+ENTRY(start)
diff --git a/arm/run b/arm/run
new file mode 100755
index 0..a714350225597
--- /dev/null
+++ b/arm/run
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+if [ ! -f config.mak ]; then
+   echo run ./configure first. See ./configure -h
+   exit 2
+fi
+source config.mak
+
+qemu=${QEMU:-qemu-system-arm}
+qpath=$(which $qemu 2/dev/null)
+
+if [ -z $qpath ]; then
+   echo $qemu not found.
+   exit 2
+fi
+
+if ! $qemu -machine '?' 21 | grep 'ARM Virtual Machine'  /dev/null; then
+   echo $qpath doesn't support mach-virt ('-machine virt'). Exiting.
+   exit 2
+fi
+
+M='-machine virt'
+
+if ! $qemu $M -device '?' 21 | grep virtconsole  /dev/null; then
+   echo $qpath doesn't support virtio-console for chr-testdev. Exiting.
+   exit 2
+fi
+
+if $qemu $M -chardev testdev,id=id -kernel . 21 \
+   | grep backend  /dev/null; then
+   echo $qpath doesn't support chr-testdev. Exiting.
+

Re: [PATCH v7 09/14] arm: initial drop

2014-07-16 Thread Paolo Bonzini




diff --git a/arm/unittests.cfg b/arm/unittests.cfg
new file mode 100644
index 0..da9dfd7b1f118
--- /dev/null
+++ b/arm/unittests.cfg
@@ -0,0 +1,18 @@
+# Define your new unittest following the convention:
+# [unittest_name]
+# file = foo.flat # Name of the flat file to be used
+# smp  = 2# Number of processors the VM will use during this test
+# extra_params = -append params... # Additional parameters used
+# arch = arm/arm64   # Only if test case is specific to one
+# groups = group1 group2 # Used to identify test cases with run_tests -g ...
+
+#
+# Test that the configured number of processors (smp = num), and
+# that the configured amount of memory (-m MB) are correctly setup
+# by the framework.
+#
+[selftest::setup]
+file = selftest.flat
+smp  = 1
+extra_params = -m 256 -append 'setup smp=1 mem=256'
+groups = selftest


Nice. :)


diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
new file mode 100644
index 0..91a4bc3b7f86e
--- /dev/null
+++ b/lib/arm/asm/page.h
@@ -0,0 +1 @@
+#include asm-generic/page.h
diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
new file mode 100644
index 0..21445ef2085fc
--- /dev/null
+++ b/lib/arm/asm/setup.h
@@ -0,0 +1,27 @@
+#ifndef _ASMARM_SETUP_H_
+#define _ASMARM_SETUP_H_
+/*
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include libcflat.h
+#include alloc.h
+
+#define NR_CPUS8
+extern u32 cpus[NR_CPUS];
+extern int nr_cpus;
+
+extern phys_addr_t __phys_offset, __phys_end;
+
+#define PHYS_OFFSET(__phys_offset)
+#define PHYS_END   (__phys_end)
+#define PHYS_SHIFT 40
+#define PHYS_SIZE  (1ULL  PHYS_SHIFT)
+#define PHYS_MASK  (PHYS_SIZE - 1ULL)


Can you explain these?  I'm not sure I understand this:


+   mem_start = regs[0].addr;
+   mem_end = mem_start + regs[0].size;
+
+   assert(!(mem_start  ~PHYS_MASK)  !((mem_end-1)  ~PHYS_MASK));
+   assert(freemem_start = mem_start  freemem_start  mem_end);
+
+   __phys_offset = mem_start;  /* PHYS_OFFSET */
+   __phys_end = mem_end;   /* PHYS_END */


and I think the macro indirection (__phys_offset vs. PHYS_OFFSET, 
__phys_end vs. PHYS_END) is unnecessary: just call the variables 
phys_offset and phys_end.


Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 08/14] Introduce chr-testdev

2014-07-16 Thread Levente Kurusa

- Original Message -
 [...]
 +void chr_testdev_exit(int code)
 +{
 + char buf[8];
 + int len;
 +
 + snprintf(buf, sizeof(buf), %dq, code);
 + len = strlen(buf);

AFAIK, snprintf returns the number of characters written, so
these two statements can be merged into one.

Thanks,
Levente Kurusa
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 09/14] arm: initial drop

2014-07-16 Thread Andrew Jones

On Wed, Jul 16, 2014 at 11:22:18AM +0200, Paolo Bonzini wrote:
 
 diff --git a/arm/unittests.cfg b/arm/unittests.cfg
 new file mode 100644
 index 0..da9dfd7b1f118
 --- /dev/null
 +++ b/arm/unittests.cfg
 @@ -0,0 +1,18 @@
 +# Define your new unittest following the convention:
 +# [unittest_name]
 +# file = foo.flat # Name of the flat file to be used
 +# smp  = 2# Number of processors the VM will use during this test
 +# extra_params = -append params... # Additional parameters used
 +# arch = arm/arm64   # Only if test case is specific to one
 +# groups = group1 group2 # Used to identify test cases with run_tests -g ...
 +
 +#
 +# Test that the configured number of processors (smp = num), and
 +# that the configured amount of memory (-m MB) are correctly setup
 +# by the framework.
 +#
 +[selftest::setup]
 +file = selftest.flat
 +smp  = 1
 +extra_params = -m 256 -append 'setup smp=1 mem=256'
 +groups = selftest
 
 Nice. :)
 
 diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
 new file mode 100644
 index 0..91a4bc3b7f86e
 --- /dev/null
 +++ b/lib/arm/asm/page.h
 @@ -0,0 +1 @@
 +#include asm-generic/page.h
 diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
 new file mode 100644
 index 0..21445ef2085fc
 --- /dev/null
 +++ b/lib/arm/asm/setup.h
 @@ -0,0 +1,27 @@
 +#ifndef _ASMARM_SETUP_H_
 +#define _ASMARM_SETUP_H_
 +/*
 + * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
 + *
 + * This work is licensed under the terms of the GNU LGPL, version 2.
 + */
 +#include libcflat.h
 +#include alloc.h
 +
 +#define NR_CPUS 8
 +extern u32 cpus[NR_CPUS];
 +extern int nr_cpus;
 +
 +extern phys_addr_t __phys_offset, __phys_end;
 +
 +#define PHYS_OFFSET (__phys_offset)
 +#define PHYS_END(__phys_end)
 +#define PHYS_SHIFT  40
 +#define PHYS_SIZE   (1ULL  PHYS_SHIFT)
 +#define PHYS_MASK   (PHYS_SIZE - 1ULL)
 
 Can you explain these?  I'm not sure I understand this:

arm with LPAE can address 40-bit addrs. PHYS_MASK is handy
to assert all addresses we expect to be addressable, are.

 
 +mem_start = regs[0].addr;
 +mem_end = mem_start + regs[0].size;
 +
 +assert(!(mem_start  ~PHYS_MASK)  !((mem_end-1)  ~PHYS_MASK));
 +assert(freemem_start = mem_start  freemem_start  mem_end);
 +
 +__phys_offset = mem_start;  /* PHYS_OFFSET */
 +__phys_end = mem_end;   /* PHYS_END */
 
 and I think the macro indirection (__phys_offset vs. PHYS_OFFSET, __phys_end
 vs. PHYS_END) is unnecessary: just call the variables phys_offset and
 phys_end.


PHYS_OFFSET is consistent with the kernel naming, so I'd like to keep
that. I invented PHYS_END, as it can serve a nice utility
(mem_size = PHYS_END - PHYS_OFFSET), and I wouldn't want to leave it
as the odd one out by not granting it the privilege of capital letters.

drew
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Igor Mammedov

There are buggy hosts in the wild that advertise invariant
TSC and as result host uses TSC as clocksource, but TSC on
such host sometimes sporadically jumps backwards.

This causes kvmclock to go backwards if host advertises
PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
accumulator and returns:
  pvclock_vcpu_time_info.system_timestamp + offset
where 'offset' is calculated using TSC.
Since TSC is not virtualized in KVM, it makes guest see
TSC jumped backwards and leads to kvmclock going backwards
as well.

This is defensive patch that keeps per CPU last clock value
and ensures that clock will never go backwards even with
using PVCLOCK_TSC_STABLE_BIT enabled path.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
RHBZ: 1115795

---
 arch/x86/kernel/pvclock.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 2f355d2..dd9df0e 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -71,11 +71,14 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
return flags  valid_flags;
 }
 
+static DEFINE_PER_CPU(cycle_t, last_clock);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
unsigned version;
cycle_t ret;
-   u64 last;
+   u64 last, *this_cpu_last;
+   s64 clock_delta;
u8 flags;
 
do {
@@ -87,6 +90,16 @@ cycle_t pvclock_clocksource_read(struct 
pvclock_vcpu_time_info *src)
pvclock_touch_watchdogs();
}
 
+   this_cpu_last = get_cpu_var(last_clock);
+   clock_delta = ret - *this_cpu_last;
+   if (likely(clock_delta  0)) {
+   *this_cpu_last = ret;
+   } else {
+   ret = *this_cpu_last;
+   WARN_ONCE(1, clock went backwards);
+   }
+   put_cpu_var(last_clock);
+
if ((valid_flags  PVCLOCK_TSC_STABLE_BIT) 
(flags  PVCLOCK_TSC_STABLE_BIT))
return ret;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 11:52, Igor Mammedov ha scritto:

There are buggy hosts in the wild that advertise invariant
TSC and as result host uses TSC as clocksource, but TSC on
such host sometimes sporadically jumps backwards.

This causes kvmclock to go backwards if host advertises
PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
accumulator and returns:
  pvclock_vcpu_time_info.system_timestamp + offset
where 'offset' is calculated using TSC.
Since TSC is not virtualized in KVM, it makes guest see
TSC jumped backwards and leads to kvmclock going backwards
as well.

This is defensive patch that keeps per CPU last clock value
and ensures that clock will never go backwards even with
using PVCLOCK_TSC_STABLE_BIT enabled path.


I'm not sure that a per-CPU value is enough; your patch can make the 
problem much less frequent of course, but I'm not sure neither detection 
nor correction are 100% reliable.  Your addition is basically a faster 
but less reliable version of the last_value logic.


If may be okay to have detection that is faster but not 100% reliable. 
However, once you find that the host is buggy I think the correct thing 
to do is to write last_value and kill PVCLOCK_TSC_STABLE_BIT from 
valid_flags.


Did you check that the affected host has the latest microcode? 
Alternatively, could we simply blacklist some CPU steppings?  I'm not 
sure who we could ask at AMD :( but perhaps there is an erratum.


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 09/14] arm: initial drop

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 11:39, Andrew Jones ha scritto:

PHYS_OFFSET is consistent with the kernel naming, so I'd like to keep
that. I invented PHYS_END, as it can serve a nice utility
(mem_size = PHYS_END - PHYS_OFFSET), and I wouldn't want to leave it
as the odd one out by not granting it the privilege of capital letters.


Ok, I see some de-Linuxization coming to the ARM kvm-unit-tests sooner 
or later, but getting things moving is more important.  I'll push the 
tests as soon as I can try them on the cubietruck.


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Marcelo Tosatti

On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote:
 Il 16/07/2014 11:52, Igor Mammedov ha scritto:
 There are buggy hosts in the wild that advertise invariant
 TSC and as result host uses TSC as clocksource, but TSC on
 such host sometimes sporadically jumps backwards.
 
 This causes kvmclock to go backwards if host advertises
 PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
 accumulator and returns:
   pvclock_vcpu_time_info.system_timestamp + offset
 where 'offset' is calculated using TSC.
 Since TSC is not virtualized in KVM, it makes guest see
 TSC jumped backwards and leads to kvmclock going backwards
 as well.
 
 This is defensive patch that keeps per CPU last clock value
 and ensures that clock will never go backwards even with
 using PVCLOCK_TSC_STABLE_BIT enabled path.
 
 I'm not sure that a per-CPU value is enough; your patch can make the
 problem much less frequent of course, but I'm not sure neither
 detection nor correction are 100% reliable.  Your addition is
 basically a faster but less reliable version of the last_value
 logic.
 
 If may be okay to have detection that is faster but not 100%
 reliable. However, once you find that the host is buggy I think the
 correct thing to do is to write last_value and kill
 PVCLOCK_TSC_STABLE_BIT from valid_flags.
 
 Did you check that the affected host has the latest microcode?
 Alternatively, could we simply blacklist some CPU steppings?  I'm
 not sure who we could ask at AMD :( but perhaps there is an erratum.
 
 Paolo

Igor,

Can we move detection to the host TSC clocksource driver ?

Because it is responsability of the host system to provide a non
backwards clock_gettime() interface as well.

How did you prove its the host TSC in fact going backwards?
Is it cross-CPU detection?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Igor Mammedov

On Wed, 16 Jul 2014 08:41:00 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:

 On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote:
  Il 16/07/2014 11:52, Igor Mammedov ha scritto:
  There are buggy hosts in the wild that advertise invariant
  TSC and as result host uses TSC as clocksource, but TSC on
  such host sometimes sporadically jumps backwards.
  
  This causes kvmclock to go backwards if host advertises
  PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
  accumulator and returns:
pvclock_vcpu_time_info.system_timestamp + offset
  where 'offset' is calculated using TSC.
  Since TSC is not virtualized in KVM, it makes guest see
  TSC jumped backwards and leads to kvmclock going backwards
  as well.
  
  This is defensive patch that keeps per CPU last clock value
  and ensures that clock will never go backwards even with
  using PVCLOCK_TSC_STABLE_BIT enabled path.
  
  I'm not sure that a per-CPU value is enough; your patch can make the
  problem much less frequent of course, but I'm not sure neither
  detection nor correction are 100% reliable.  Your addition is
  basically a faster but less reliable version of the last_value
  logic.
How is it less reliable than last_value logic?

Alternatively, we can panic in case of backward jump here, so that guest
won't hang in random place in case of error. There might not be OOPs
but at least coredump will point to a right place.

  
  If may be okay to have detection that is faster but not 100%
  reliable. However, once you find that the host is buggy I think the
  correct thing to do is to write last_value and kill
  PVCLOCK_TSC_STABLE_BIT from valid_flags.
that might be an option, but what value we need to store into
last_value?
To make sure that clock won't go back we need to track
it on all CPUs and store highest value to last_value, at this point
there is no point in switching to last_value path since we have to
track per CPU anyway.

What this patch doesn't cover is switching from master_clock mode to
last_value mode (it happens at CPU hotplug time), I'd need to add
what was described above as second patch on top of this one.

  
  Did you check that the affected host has the latest microcode?
  Alternatively, could we simply blacklist some CPU steppings?  I'm
  not sure who we could ask at AMD :( but perhaps there is an erratum.
I haven't found anything in this direction yet. I'm still trying to
find someone from AMD to look at the issue.

  
  Paolo
 
 Igor,
 
 Can we move detection to the host TSC clocksource driver ?
I haven't looked much at host side solution yet,
but to detection reliable it needs to be run constantly,
from read_native_tsc().

it's possible to put detection into check_system_tsc_reliable() but
that would increase boot time and it's not clear for how long test
should run to make detection reliable (in my case it takes ~5-10sec
to detect first failure).

Best we could at boot time is mark TSC as unstable on affected hardware,
but for this we need to figure out if it's specific machine or CPU issue
to do it properly. (I'm in process of finding out who to bug with it)

 
 Because it is responsability of the host system to provide a non
 backwards clock_gettime() interface as well.
vdso_clock_gettime() is not affected since it will use last highest
tsc value in case of jump due to usage of vread_tsc().
PS: it appears that host runs stably. 

but kvm_get_time_and_clockread() is affected since it uses its own
version of do_monotonic()-vgettsc() which will lead to cycles
go backwards and overflow of nano secs in timespec. We should mimic
vread_tsc() here so not to run into this kind of issues.

 
 How did you prove its the host TSC in fact going backwards?
 Is it cross-CPU detection?
I've checked with several methods, 
1. patched pvclock_clocksource_read() in guest with VCPUs pinned to host
CPUs.
2.Ingo's tsc_wrap_test, which fails miserably on affected host.
3 sytemtap script hooked to read_native_tsc(), for source see
https://bugzilla.redhat.com/show_bug.cgi?id=1115795#c12


 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski

On Wed, Jul 16, 2014 at 12:36 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 16/07/2014 09:10, Daniel Borkmann ha scritto:

 On 07/16/2014 08:41 AM, Gleb Natapov wrote:

 On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote:

 virtio-rng is both too complicated and insufficient for initial rng
 seeding.  It's far too complicated to use for KASLR or any other
 early boot random number needs.  It also provides /dev/random-style
 bits, which means that making guest boot wait for virtio-rng is
 unacceptably slow, and doing it asynchronously means that
 /dev/urandom might be predictable when userspace starts.

 This introduces a very simple synchronous mechanism to get
 /dev/urandom-style bits.


 Why can't you use RDRAND instruction for that?


 You mean using it directly? I think simply for the very same reasons
 as in c2557a303a ...


 No, this is very different.  This mechanism provides no guarantee that the
 result contains any actual entropy.  In fact, patch 3 adds a call to the
 new arch_get_slow_rng_u64 just below a call to arch_get_random_lang aka
 RDRAND.  I agree with Gleb that it's simpler to just expect a relatively
 recent processor and use RDRAND.

 BTW, the logic for crediting entropy to RDSEED but not RDRAND escapes me.
 If you trust the processor, you could use Intel's algorithm to force
 reseeding of RDRAND.  If you don't trust the processor, the same paranoia
 applies to RDRAND and RDSEED.

 In a guest you must trust the hypervisor anyway to use RDRAND or RDSEED,
 since the hypervisor can trap it.  A malicious hypervisor is no different
 from a malicious processor.


This patch has nothing whatsoever to do with how much I trust the CPU
vs the hypervisor.  It's for the enormous installed base of machines
without RDRAND.

hpa suggested emulating RDRAND awhile ago, but I think that'll
unusably slow -- the kernel uses RDRAND in various places where it's
expected to be fast, and not using it at all will be preferable to
causing a VM exit for every few bytes.  I've been careful to only use
this in the guest in places where a few hundred to a few thousand
cycles per 64 bits of RNG seed is acceptable.

 In any case, is there a matching QEMU patch somewhere?

What QEMU change is needed?  I admit I'm a bit vague on how QEMU and
KVM cooperate here, but there's no state to save and restore.  I guess
that QEMU wants the ability to turn this on and off for migration.
How does that work?  I couldn't spot the KVM code that allows this
type of control.

--Andy
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 15:55, Igor Mammedov ha scritto:

On Wed, 16 Jul 2014 08:41:00 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:


On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote:

Il 16/07/2014 11:52, Igor Mammedov ha scritto:

There are buggy hosts in the wild that advertise invariant
TSC and as result host uses TSC as clocksource, but TSC on
such host sometimes sporadically jumps backwards.

This causes kvmclock to go backwards if host advertises
PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
accumulator and returns:
 pvclock_vcpu_time_info.system_timestamp + offset
where 'offset' is calculated using TSC.
Since TSC is not virtualized in KVM, it makes guest see
TSC jumped backwards and leads to kvmclock going backwards
as well.

This is defensive patch that keeps per CPU last clock value
and ensures that clock will never go backwards even with
using PVCLOCK_TSC_STABLE_BIT enabled path.


I'm not sure that a per-CPU value is enough; your patch can make the
problem much less frequent of course, but I'm not sure neither
detection nor correction are 100% reliable.  Your addition is
basically a faster but less reliable version of the last_value
logic.

How is it less reliable than last_value logic?


Suppose CPU 1 is behind by 3 nanoseconds

   CPU 0  CPU 1
   time = 100  (at time 100)
  time = 99(at time 102)
   time = 104  (at time 104)
  time = 105   (at time 108)

Your patch will not detect this.


If may be okay to have detection that is faster but not 100%
reliable. However, once you find that the host is buggy I think the
correct thing to do is to write last_value and kill
PVCLOCK_TSC_STABLE_BIT from valid_flags.

that might be an option, but what value we need to store into
last_value?


You can write the value that was in the per-CPU variable (not perfect 
correction)...



To make sure that clock won't go back we need to track
it on all CPUs and store highest value to last_value, at this point
there is no point in switching to last_value path since we have to
track per CPU anyway.


... or loop over all CPUs and find the highest value.  You would only 
have to do this once.



Can we move detection to the host TSC clocksource driver ?


I haven't looked much at host side solution yet,
but to detection reliable it needs to be run constantly,
from read_native_tsc().

it's possible to put detection into check_system_tsc_reliable() but
that would increase boot time and it's not clear for how long test
should run to make detection reliable (in my case it takes ~5-10sec
to detect first failure).


Is 5-10sec the time that it takes for tsc_wrap_test to fail?


Best we could at boot time is mark TSC as unstable on affected hardware,
but for this we need to figure out if it's specific machine or CPU issue
to do it properly. (I'm in process of finding out who to bug with it)


Thanks, this would be best.


PS: it appears that host runs stably.

but kvm_get_time_and_clockread() is affected since it uses its own
version of do_monotonic()-vgettsc() which will lead to cycles
go backwards and overflow of nano secs in timespec. We should mimic
vread_tsc() here so not to run into this kind of issues.


I'm not sure I understand, the code is similar:

arch/x86/kvm/x86.c  arch/x86/vdso/vclock_gettime.c
do_monotonicdo_monotonic
vgettsc vgetsns
read_tscvread_tsc
vget_cycles
__native_read_tsc   __native_read_tsc

The VDSO inlines timespec_add_ns.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 16:07, Andy Lutomirski ha scritto:

This patch has nothing whatsoever to do with how much I trust the CPU
vs the hypervisor.  It's for the enormous installed base of machines
without RDRAND.


Ok.  I think an MSR is fine, though I don't think it's useful for the 
guest to use it if it already has RDRAND and/or RDSEED.



 In any case, is there a matching QEMU patch somewhere?

What QEMU change is needed?  I admit I'm a bit vague on how QEMU and
KVM cooperate here, but there's no state to save and restore.  I guess
that QEMU wants the ability to turn this on and off for migration.
How does that work?  I couldn't spot the KVM code that allows this
type of control.


It is QEMU who decides the CPUID bits that are visible to the guest.  By 
default it blocks bits that it doesn't know about.  You would need to 
add the bit in the kvm_default_features and kvm_feature_name arrays.


For migration, we have versioned machine types, for example pc-2.1.
Once the versioned machine type exists, blocking the feature is a 
one-liner like


x86_cpu_compat_disable_kvm_features(FEAT_KVM, KVM_FEATURE_NAME);

Unfortunately, QEMU is in hard freeze, so you'd likely be the one 
creating pc-2.2.  This is a boilerplate but relatively complicated 
patch.  But let's cross that bridge when we'll reach it.  For now, you 
can simply add the bit to the two arrays above.


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Igor Mammedov

On Wed, 16 Jul 2014 16:16:17 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

 Il 16/07/2014 15:55, Igor Mammedov ha scritto:
  On Wed, 16 Jul 2014 08:41:00 -0300
  Marcelo Tosatti mtosa...@redhat.com wrote:
 
  On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote:
  Il 16/07/2014 11:52, Igor Mammedov ha scritto:
  There are buggy hosts in the wild that advertise invariant
  TSC and as result host uses TSC as clocksource, but TSC on
  such host sometimes sporadically jumps backwards.
 
  This causes kvmclock to go backwards if host advertises
  PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
  accumulator and returns:
   pvclock_vcpu_time_info.system_timestamp + offset
  where 'offset' is calculated using TSC.
  Since TSC is not virtualized in KVM, it makes guest see
  TSC jumped backwards and leads to kvmclock going backwards
  as well.
 
  This is defensive patch that keeps per CPU last clock value
  and ensures that clock will never go backwards even with
  using PVCLOCK_TSC_STABLE_BIT enabled path.
 
  I'm not sure that a per-CPU value is enough; your patch can make
  the problem much less frequent of course, but I'm not sure neither
  detection nor correction are 100% reliable.  Your addition is
  basically a faster but less reliable version of the last_value
  logic.
  How is it less reliable than last_value logic?
 
 Suppose CPU 1 is behind by 3 nanoseconds
 
 CPU 0  CPU 1
 time = 100  (at time 100)
time = 99(at time 102)
 time = 104  (at time 104)
time = 105   (at time 108)
 
 Your patch will not detect this.
Is it possible for each cpu to have it's own time?

 
  If may be okay to have detection that is faster but not 100%
  reliable. However, once you find that the host is buggy I think
  the correct thing to do is to write last_value and kill
  PVCLOCK_TSC_STABLE_BIT from valid_flags.
  that might be an option, but what value we need to store into
  last_value?
 
 You can write the value that was in the per-CPU variable (not perfect 
 correction)...
I'll look at this variant, it's not perfect but it doesn't involve callout
to other CPUs!

 
  To make sure that clock won't go back we need to track
  it on all CPUs and store highest value to last_value, at this point
  there is no point in switching to last_value path since we have to
  track per CPU anyway.
 
 ... or loop over all CPUs and find the highest value.  You would only 
 have to do this once.
 
  Can we move detection to the host TSC clocksource driver ?
 
  I haven't looked much at host side solution yet,
  but to detection reliable it needs to be run constantly,
  from read_native_tsc().
 
  it's possible to put detection into check_system_tsc_reliable() but
  that would increase boot time and it's not clear for how long test
  should run to make detection reliable (in my case it takes ~5-10sec
  to detect first failure).
 
 Is 5-10sec the time that it takes for tsc_wrap_test tofail?
nope, for systemtap script hooked to native_read_tsc(), but it depend
on the load for example hotplugging VCPU causes imediate jumps.

tsc_wrap_test starts to fail almost imediately,

I'll check how much tries it takes to fail for the first time, if it is
not too much I guess we could add check to check_system_tsc_reliable().


 
  Best we could at boot time is mark TSC as unstable on affected
  hardware, but for this we need to figure out if it's specific
  machine or CPU issue to do it properly. (I'm in process of finding
  out who to bug with it)
 
 Thanks, this would be best.
 
  PS: it appears that host runs stably.
 
  but kvm_get_time_and_clockread() is affected since it uses its own
  version of do_monotonic()-vgettsc() which will lead to cycles
  go backwards and overflow of nano secs in timespec. We should mimic
  vread_tsc() here so not to run into this kind of issues.
 
 I'm not sure I understand, the code is similar:
 
   arch/x86/kvm/x86.c  arch/x86/vdso/vclock_gettime.c
   do_monotonicdo_monotonic
   vgettsc vgetsns
   read_tscvread_tsc
   vget_cycles
   __native_read_tsc   __native_read_tsc
 
 The VDSO inlines timespec_add_ns.
I'm sorry, I haven't looked inside read_tsc() in
arch/x86/kvm/x86.c, it's the same as vread_tsc().

 
 Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Gleb Natapov

On Wed, Jul 16, 2014 at 04:32:19PM +0200, Paolo Bonzini wrote:
 Il 16/07/2014 16:07, Andy Lutomirski ha scritto:
 This patch has nothing whatsoever to do with how much I trust the CPU
 vs the hypervisor.  It's for the enormous installed base of machines
 without RDRAND.
 
 Ok.  I think an MSR is fine, though I don't think it's useful for the guest
 to use it if it already has RDRAND and/or RDSEED.
 
Agree. It is unfortunate that we add PV interfaces for a HW that will be extinct
in a couple of years though :(

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 16:51, Igor Mammedov ha scritto:

I'm not sure that a per-CPU value is enough; your patch can make
the problem much less frequent of course, but I'm not sure neither
detection nor correction are 100% reliable.  Your addition is
basically a faster but less reliable version of the last_value
logic.


How is it less reliable than last_value logic?


Suppose CPU 1 is behind by 3 nanoseconds

CPU 0  CPU 1
time = 100  (at time 100)
   time = 99(at time 102)
time = 104  (at time 104)
   time = 105   (at time 108)

Your patch will not detect this.

Is it possible for each cpu to have it's own time?


Yes, that's one of the reasons for TSC not to be stable (it could also 
happen simply because the value of TSC_ADJUST MSR is bogus).



tsc_wrap_test starts to fail almost imediately,

I'll check how much tries it takes to fail for the first time, if it is
not too much I guess we could add check to check_system_tsc_reliable().


Thanks!

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward

2014-07-16 Thread Igor Mammedov

On Wed, 16 Jul 2014 16:55:37 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

 Il 16/07/2014 16:51, Igor Mammedov ha scritto:
  I'm not sure that a per-CPU value is enough; your patch can make
  the problem much less frequent of course, but I'm not sure
  neither detection nor correction are 100% reliable.  Your
  addition is basically a faster but less reliable version of the
  last_value logic.
 
  How is it less reliable than last_value logic?
 
  Suppose CPU 1 is behind by 3 nanoseconds
 
  CPU 0  CPU 1
  time = 100  (at time 100)
 time = 99(at time 102)
  time = 104  (at time 104)
 time = 105   (at time 108)
 
  Your patch will not detect this.
  Is it possible for each cpu to have it's own time?
 
 Yes, that's one of the reasons for TSC not to be stable (it could
 also happen simply because the value of TSC_ADJUST MSR is bogus).
I was wondering not about TSC but kvmclock - sched_clock.
If they are percpu and can differ for each CPU than above diagram if
fine as far as time on each CPU is monotonic and there is not need to
detect that time on each CPU is different.


 
 Paolo
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH qemu] i386,linux-headers: Add support for kvm_get_rng_seed

2014-07-16 Thread Andy Lutomirski

This updates x86's kvm_para.h for the feature bit definition and
target-i386/cpu.c for the feature name and default.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 linux-headers/asm-x86/kvm_para.h | 2 ++
 target-i386/cpu.c| 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index e41c5c1..a9b27ce 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 8fd1497..4ea7e6c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = {
 static const char *kvm_feature_name[] = {
 kvmclock, kvm_nopiodelay, kvm_mmu, kvmclock,
 kvm_asyncpf, kvm_steal_time, kvm_pv_eoi, kvm_pv_unhalt,
-NULL, NULL, NULL, NULL,
+kvm_get_rng_seed, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = {
 (1  KVM_FEATURE_ASYNC_PF) |
 (1  KVM_FEATURE_STEAL_TIME) |
 (1  KVM_FEATURE_PV_EOI) |
-(1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT),
+(1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1  KVM_FEATURE_GET_RNG_SEED),
 [FEAT_1_ECX] = CPUID_EXT_X2APIC,
 };
 
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski

On Wed, Jul 16, 2014 at 7:32 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 16/07/2014 16:07, Andy Lutomirski ha scritto:

 This patch has nothing whatsoever to do with how much I trust the CPU
 vs the hypervisor.  It's for the enormous installed base of machines
 without RDRAND.


 Ok.  I think an MSR is fine, though I don't think it's useful for the guest
 to use it if it already has RDRAND and/or RDSEED.


  In any case, is there a matching QEMU patch somewhere?

 What QEMU change is needed?  I admit I'm a bit vague on how QEMU and
 KVM cooperate here, but there's no state to save and restore.  I guess
 that QEMU wants the ability to turn this on and off for migration.
 How does that work?  I couldn't spot the KVM code that allows this
 type of control.


 It is QEMU who decides the CPUID bits that are visible to the guest.  By
 default it blocks bits that it doesn't know about.  You would need to add
 the bit in the kvm_default_features and kvm_feature_name arrays.

 For migration, we have versioned machine types, for example pc-2.1.
 Once the versioned machine type exists, blocking the feature is a one-liner
 like

 x86_cpu_compat_disable_kvm_features(FEAT_KVM, KVM_FEATURE_NAME);

 Unfortunately, QEMU is in hard freeze, so you'd likely be the one creating
 pc-2.2.  This is a boilerplate but relatively complicated patch.  But let's
 cross that bridge when we'll reach it.  For now, you can simply add the bit
 to the two arrays above.


Done.

NB: Patch 4 of this series is bad due to an asm constraint issue that
I haven't figured out yet.  I'll send a replacement once I get it
working.  *sigh* the x86 kernel loading code is a bit of a compilation
mess.

 Paolo



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin

On 07/16/2014 07:07 AM, Andy Lutomirski wrote:
 
 This patch has nothing whatsoever to do with how much I trust the CPU
 vs the hypervisor.  It's for the enormous installed base of machines
 without RDRAND.
 
 hpa suggested emulating RDRAND awhile ago, but I think that'll
 unusably slow -- the kernel uses RDRAND in various places where it's
 expected to be fast, and not using it at all will be preferable to
 causing a VM exit for every few bytes.  I've been careful to only use
 this in the guest in places where a few hundred to a few thousand
 cycles per 64 bits of RNG seed is acceptable.
 

I suggested emulating RDRAND *but not set the CPUID bit*.  We already
developed a protocol in KVM/Qemu to enumerate emulated features (created
for MOVBE as I recall), specifically to service the semantic feature X
will work but will be substantially slower than normal.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 18:03, H. Peter Anvin ha scritto:

I suggested emulating RDRAND *but not set the CPUID bit*.  We already
developed a protocol in KVM/Qemu to enumerate emulated features (created
for MOVBE as I recall), specifically to service the semantic feature X
will work but will be substantially slower than normal.


But those will set the CPUID bit.  There is currently no way for KVM 
guests to know if a CPUID bit is real or emulated.


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin

On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
 Il 16/07/2014 18:03, H. Peter Anvin ha scritto:
 I suggested emulating RDRAND *but not set the CPUID bit*.  We already
 developed a protocol in KVM/Qemu to enumerate emulated features (created
 for MOVBE as I recall), specifically to service the semantic feature X
 will work but will be substantially slower than normal.
 
 But those will set the CPUID bit.  There is currently no way for KVM
 guests to know if a CPUID bit is real or emulated.
 

OK, so there wasn't any protocol implemented in the end.  I sit corrected.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH qemu] i386,linux-headers: Add support for kvm_get_rng_seed

2014-07-16 Thread Paolo Bonzini


Il 16/07/2014 17:52, Andy Lutomirski ha scritto:

This updates x86's kvm_para.h for the feature bit definition and
target-i386/cpu.c for the feature name and default.

Signed-off-by: Andy Lutomirski l...@amacapital.net


Thanks, looks good---assuming the kernel side will make it into 3.17, 
I'll sync the headers once 3.17 is released and then apply the patch. 
As mentioned in kvm@ someone will have to add the pc-2.2 machine type too.


Paolo


---
 linux-headers/asm-x86/kvm_para.h | 2 ++
 target-i386/cpu.c| 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index e41c5c1..a9b27ce 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8

 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05

 struct kvm_steal_time {
__u64 steal;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 8fd1497..4ea7e6c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = {
 static const char *kvm_feature_name[] = {
 kvmclock, kvm_nopiodelay, kvm_mmu, kvmclock,
 kvm_asyncpf, kvm_steal_time, kvm_pv_eoi, kvm_pv_unhalt,
-NULL, NULL, NULL, NULL,
+kvm_get_rng_seed, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = {
 (1  KVM_FEATURE_ASYNC_PF) |
 (1  KVM_FEATURE_STEAL_TIME) |
 (1  KVM_FEATURE_PV_EOI) |
-(1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT),
+(1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1  KVM_FEATURE_GET_RNG_SEED),
 [FEAT_1_ECX] = CPUID_EXT_X2APIC,
 };




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Gleb Natapov

On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote:
 On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
  Il 16/07/2014 18:03, H. Peter Anvin ha scritto:
  I suggested emulating RDRAND *but not set the CPUID bit*.  We already
  developed a protocol in KVM/Qemu to enumerate emulated features (created
  for MOVBE as I recall), specifically to service the semantic feature X
  will work but will be substantially slower than normal.
  
  But those will set the CPUID bit.  There is currently no way for KVM
  guests to know if a CPUID bit is real or emulated.
  
 
 OK, so there wasn't any protocol implemented in the end.  I sit corrected.
 
That protocol that was implemented is between qemu and kvm, not kvm and a guest.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-16 Thread Andy Lutomirski

This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c | 3 ++-
 arch/x86/kvm/x86.c   | 4 
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks 
this feature bit
||   || before enabling paravirtualized
||   || spinlock support.
 --
+KVM_FEATURE_GET_RNG_SEED   || 8 || host provides rng seed data via
+   ||   || MSR_KVM_GET_RNG_SEED.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 (1  KVM_FEATURE_ASYNC_PF) |
 (1  KVM_FEATURE_PV_EOI) |
 (1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-(1  KVM_FEATURE_PV_UNHALT);
+(1  KVM_FEATURE_PV_UNHALT) |
+(1  KVM_FEATURE_GET_RNG_SEED);
 
if (sched_info_on())
entry-eax |= (1  KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include linux/pci.h
 #include linux/timekeeper_internal.h
 #include linux/pvclock_gtod.h
+#include linux/random.h
 #include trace/events/kvm.h
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
case MSR_KVM_PV_EOI_EN:
data = vcpu-arch.pv_eoi.msr_val;
break;
+   case MSR_KVM_GET_RNG_SEED:
+   get_random_bytes(data, sizeof(data));
+   break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-16 Thread Andy Lutomirski

It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include asm/archrandom.h
 #include asm/e820.h
 
+#include uapi/asm/kvm_para.h
+
 #include generated/compile.h
 #include linux/module.h
 #include linux/uts.h
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE  ( LINUX_COMPILE_BY @
LINUX_COMPILE_HOST ) ( LINUX_COMPILER )  UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+   u32 kvm_base;
+   u32 features;
+
+   if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+   return false;
+
+   kvm_base = hypervisor_cpuid_base(KVMKVMKVM\0\0\0, KVM_CPUID_FEATURES);
+   if (!kvm_base)
+   return false;
+
+   features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+   return features  (1UL  feature);
+}
+
 #define I8254_PORT_CONTROL 0x43
 #define I8254_PORT_COUNTER00x40
 #define I8254_CMD_READBACK 0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
}
}
 
+   if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+   u64 seed;
+
+   debug_putstr( MSR_KVM_GET_RNG_SEED);
+   rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+   random ^= (unsigned long)seed;
+   use_i8254 = false;
+   }
+
if (has_cpuflag(X86_FEATURE_TSC)) {
debug_putstr( RDTSC);
rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
 {
-   /* ecx is often an input as well as an output. */
-   asm volatile(cpuid
+   /*
+* This function can be used from the boot code, so it needs
+* to avoid using EBX in constraints in PIC mode.
+*
+* ecx is often an input as well as an output.
+*/
+   asm volatile(.ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t
+movl  %%ebx,%1\n\t
+.endif ; .endif   \n\t
+cpuid \n\t
+.ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t
+xchgl %%ebx,%1\n\t
+.endif ; .endif
: =a (*eax),
- =b (*ebx),
+#if defined(__i386__)  defined(__PIC__)
+ =r (*ebx),  /* gcc won't let us use ebx */
+#else
+ =b (*ebx),  /* ebx is okay */
+#endif
  =c (*ecx),
  =d (*edx)
: 0 (*eax), 2 (*ecx)
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 4/5] random: Log how many bits we managed to seed with in init_std_data

2014-07-16 Thread Andy Lutomirski

This is useful for making sure that init_std_data is working
correctly and for allaying fear when this happens:

random: xyz urandom read with SMALL_NUMBER bits of entropy available

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index e2c3d02..10e9642 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r)
int i;
ktime_t now = ktime_get_real();
unsigned long rv;
+   int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0;
 
r-last_pulled = jiffies;
mix_pool_bytes(r, now, sizeof(now), NULL);
for (i = r-poolinfo-poolbytes; i  0; i -= sizeof(rv)) {
-   if (!arch_get_random_seed_long(rv) 
-   !arch_get_random_long(rv))
+   if (arch_get_random_seed_long(rv))
+   arch_seed_bits += 8 * sizeof(rv);
+   else if (arch_get_random_long(rv))
+   arch_random_bits += 8 * sizeof(rv);
+   else
rv = random_get_entropy();
mix_pool_bytes(r, rv, sizeof(rv), NULL);
}
@@ -1265,10 +1269,14 @@ static void init_std_data(struct entropy_store *r)
for (i = 0; i  4; i++) {
u64 rv64;
 
-   if (arch_get_slow_rng_u64(rv64))
+   if (arch_get_slow_rng_u64(rv64)) {
mix_pool_bytes(r, rv64, sizeof(rv64), NULL);
+   slow_rng_bits += 8 * sizeof(rv64);
}
}
+
+   pr_info(random: seeded %s pool with %d bits of arch random seed, %d 
bits of arch random, and %d bits of arch slow rng\n,
+   r-name, arch_seed_bits, arch_random_bits, slow_rng_bits);
 }
 
 /*
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski

arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data.  Unlike
arch_get_random_{bytes,seed}, etc., it makes no claims about entropy
content.  It's also likely to be much slower and should not be used
frequently.  That being said, it should be fast enough to call
several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if
available.  The intent is for other hypervisor guest implementations
to implement this interface.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig   |  4 
 arch/x86/include/asm/archslowrng.h | 30 ++
 arch/x86/kernel/kvm.c  | 22 ++
 include/linux/random.h |  9 +
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
bool KVM Guest support (including kvmclock)
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select ARCH_SLOW_RNG
default y
---help---
  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
bool
 
+config ARCH_SLOW_RNG
+   bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h 
b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism.  Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+   return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+   /*
+* Allow migration from a hypervisor with the GET_RNG_SEED
+* feature to a hypervisor without it.
+*/
+   if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+   return 1;
+   else
+   return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
if (kvmclock_vsyscall)
kvm_setup_vsyscall_timeinfo();
 
+   if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+   arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
register_cpu_notifier(kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include asm/archslowrng.h
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+   return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup

2014-07-16 Thread Andy Lutomirski

This should help solve the problem of guests starting out with
predictable RNG state.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..e2c3d02 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1261,6 +1261,14 @@ static void init_std_data(struct entropy_store *r)
mix_pool_bytes(r, rv, sizeof(rv), NULL);
}
mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+   for (i = 0; i  4; i++) {
+   u64 rv64;
+
+   if (arch_get_slow_rng_u64(rv64))
+   mix_pool_bytes(r, rv64, sizeof(rv64), NULL);
+   }
+   }
 }
 
 /*
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski

virtio-rng is both too complicated and insufficient for initial rng
seeding.  It's far too complicated to use for KASLR or any other
early boot random number needs.  It also provides /dev/random-style
bits, which means that making guest boot wait for virtio-rng is
unacceptably slow, and doing it asynchronously means that
/dev/urandom might be predictable when userspace starts.

This introduces a very simple synchronous mechanism to get
/dev/urandom-style bits.

I sent the corresponding kvm-unit-tests and qemu changes separately.

There's room for bikeshedding on the same arch_get_slow_rng_u64.  I
considered arch_get_rng_seed_u64, but that could be confused with
arch_get_random_seed_long, which is not interchangeable.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  random: Log how many bits we managed to seed with in init_std_data
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig |  4 
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/archslowrng.h   | 30 ++
 arch/x86/include/asm/processor.h | 21 ++---
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 22 ++
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   |  4 
 drivers/char/random.c| 20 ++--
 include/linux/random.h   |  9 +
 11 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h

-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Bandan Das

Andy Lutomirski l...@amacapital.net writes:

 virtio-rng is both too complicated and insufficient for initial rng
 seeding.  It's far too complicated to use for KASLR or any other
 early boot random number needs.  It also provides /dev/random-style
 bits, which means that making guest boot wait for virtio-rng is
 unacceptably slow, and doing it asynchronously means that
 /dev/urandom might be predictable when userspace starts.

 This introduces a very simple synchronous mechanism to get
 /dev/urandom-style bits.

Whoa! the cover letter seems more like virtio-rng bashing rather than
introduction to the patchset (and/or it's advantages over existing methods)
:) That's ok though I guess, these won't be in the commit log.

 I sent the corresponding kvm-unit-tests and qemu changes separately.

 There's room for bikeshedding on the same arch_get_slow_rng_u64.  I
 considered arch_get_rng_seed_u64, but that could be confused with
 arch_get_random_seed_long, which is not interchangeable.

 Changes from v1:
  - Split patches 2 and 3
  - Log all arch sources in init_std_data
  - Fix the 32-bit kaslr build

 Andy Lutomirski (5):
   x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
   random,x86: Add arch_get_slow_rng_u64
   random: Seed pools from arch_get_slow_rng_u64 at startup
   random: Log how many bits we managed to seed with in init_std_data
   x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

  Documentation/virtual/kvm/cpuid.txt  |  3 +++
  arch/x86/Kconfig |  4 
  arch/x86/boot/compressed/aslr.c  | 27 +++
  arch/x86/include/asm/archslowrng.h   | 30 ++
  arch/x86/include/asm/processor.h | 21 ++---
  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
  arch/x86/kernel/kvm.c| 22 ++
  arch/x86/kvm/cpuid.c |  3 ++-
  arch/x86/kvm/x86.c   |  4 
  drivers/char/random.c| 20 ++--
  include/linux/random.h   |  9 +
  11 files changed, 139 insertions(+), 6 deletions(-)
  create mode 100644 arch/x86/include/asm/archslowrng.h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski

On Wed, Jul 16, 2014 at 11:02 AM, Bandan Das b...@redhat.com wrote:
 Andy Lutomirski l...@amacapital.net writes:

 virtio-rng is both too complicated and insufficient for initial rng
 seeding.  It's far too complicated to use for KASLR or any other
 early boot random number needs.  It also provides /dev/random-style
 bits, which means that making guest boot wait for virtio-rng is
 unacceptably slow, and doing it asynchronously means that
 /dev/urandom might be predictable when userspace starts.

 This introduces a very simple synchronous mechanism to get
 /dev/urandom-style bits.

 Whoa! the cover letter seems more like virtio-rng bashing rather than
 introduction to the patchset (and/or it's advantages over existing methods)
 :) That's ok though I guess, these won't be in the commit log.


Yeah, sorry -- I figured that the biggest objection would be just use
virtio-rng.

I'll send a v3 later today -- there's a trivial bisectability bug in
this version.

--Andy

 I sent the corresponding kvm-unit-tests and qemu changes separately.

 There's room for bikeshedding on the same arch_get_slow_rng_u64.  I
 considered arch_get_rng_seed_u64, but that could be confused with
 arch_get_random_seed_long, which is not interchangeable.

 Changes from v1:
  - Split patches 2 and 3
  - Log all arch sources in init_std_data
  - Fix the 32-bit kaslr build

 Andy Lutomirski (5):
   x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
   random,x86: Add arch_get_slow_rng_u64
   random: Seed pools from arch_get_slow_rng_u64 at startup
   random: Log how many bits we managed to seed with in init_std_data
   x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

  Documentation/virtual/kvm/cpuid.txt  |  3 +++
  arch/x86/Kconfig |  4 
  arch/x86/boot/compressed/aslr.c  | 27 +++
  arch/x86/include/asm/archslowrng.h   | 30 ++
  arch/x86/include/asm/processor.h | 21 ++---
  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
  arch/x86/kernel/kvm.c| 22 ++
  arch/x86/kvm/cpuid.c |  3 ++-
  arch/x86/kvm/x86.c   |  4 
  drivers/char/random.c| 20 ++--
  include/linux/random.h   |  9 +
  11 files changed, 139 insertions(+), 6 deletions(-)
  create mode 100644 arch/x86/include/asm/archslowrng.h



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] vfio-pci: Release devices with BusMaster disabled

2014-07-16 Thread Alex Williamson

Our current open/release path looks like this:

vfio_pci_open
  vfio_pci_enable
pci_enable_device
pci_save_state
pci_store_saved_state

vfio_pci_release
  vfio_pci_disable
pci_disable_device
pci_restore_state

pci_enable_device() doesn't modify PCI_COMMAND_MASTER, so if a device
comes to us with it enabled, it persists through the open and gets
stored as part of the device saved state.  We then restore that saved
state when released, which can allow the device to attempt to continue
to do DMA.  When the group is disconnected from the domain, this will
get caught by the IOMMU, but if there are other devices in the group,
the device may continue running and interfere with the user.  Even in
the former case, IOMMUs don't necessarily behave well and a stream of
blocked DMA can result in unpleasant behavior on the host.

Explicitly disable Bus Master as we're enabling the device and
slightly re-work release to make sure that pci_disable_device() is
the last thing that touches the device.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/vfio/pci/vfio_pci.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 010e0f8..36d8332 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -44,6 +44,9 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
u16 cmd;
u8 msix_pos;
 
+   /* Don't allow our initial saved state to include busmaster */
+   pci_clear_master(pdev);
+
ret = pci_enable_device(pdev);
if (ret)
return ret;
@@ -99,7 +102,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
struct pci_dev *pdev = vdev-pdev;
int bar;
 
-   pci_disable_device(pdev);
+   /* Stop the device from further DMA */
+   pci_clear_master(pdev);
 
vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
VFIO_IRQ_SET_ACTION_TRIGGER,
@@ -128,7 +132,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
__func__, dev_name(pdev-dev));
 
if (!vdev-reset_works)
-   return;
+   goto out;
 
pci_save_state(pdev);
}
@@ -151,6 +155,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
}
 
pci_restore_state(pdev);
+out:
+   pci_disable_device(pdev);
 }
 
 static void vfio_pci_release(void *device_data)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] vfio-pci: Attempt bus/slot reset on release

2014-07-16 Thread Alex Williamson

Each time a device is released, mark whether a local reset was
successful or whether a bus/slot reset is needed.  If a reset is
needed and all of the affected devices are bound to vfio-pci and
unused, allow the reset.  This is most useful when the userspace
driver is killed and releases all the devices in an unclean state,
such as when a QEMU VM quits.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/vfio/pci/vfio_pci.c |  112 +++
 drivers/vfio/pci/vfio_pci_private.h |1 
 2 files changed, 113 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index c4949a7..f95b90f 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -39,6 +39,8 @@ MODULE_PARM_DESC(nointxmask,
 
 static DEFINE_MUTEX(driver_lock);
 
+static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
+
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
struct pci_dev *pdev = vdev-pdev;
@@ -123,6 +125,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
vdev-barmap[bar] = NULL;
}
 
+   vdev-needs_reset = true;
+
/*
 * If we have saved state, restore it.  If we can reset the device,
 * even better.  Resetting with current state seems better than
@@ -154,11 +158,15 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
if (ret)
pr_warn(%s: Failed to reset device %s (%d)\n,
__func__, dev_name(pdev-dev), ret);
+   else
+   vdev-needs_reset = false;
}
 
pci_restore_state(pdev);
 out:
pci_disable_device(pdev);
+
+   vfio_pci_try_bus_reset(vdev);
 }
 
 static void vfio_pci_release(void *device_data)
@@ -917,6 +925,110 @@ static struct pci_driver vfio_pci_driver = {
.err_handler= vfio_err_handlers,
 };
 
+/*
+ * Test whether a reset is necessary and possible.  We mark devices as
+ * needs_reset when they are released, but don't have a function-local reset
+ * available.  If any of these exist in the affected devices, we want to do
+ * a bus/slot reset.  We also need all of the affected devices to be unused,
+ * so we abort if any device has a non-zero refcnt.  driver_lock prevents a
+ * device from being opened during the scan or unbound from vfio-pci.
+ */
+static int vfio_pci_test_bus_reset(struct pci_dev *pdev, void *data)
+{
+   bool *needs_reset = data;
+   struct pci_driver *pci_drv = ACCESS_ONCE(pdev-driver);
+   int ret = -EBUSY;
+
+   if (pci_drv == vfio_pci_driver) {
+   struct vfio_device *device;
+   struct vfio_pci_device *vdev;
+
+   device = vfio_device_get_from_dev(pdev-dev);
+   if (!device)
+   return ret;
+
+   vdev = vfio_device_data(device);
+   if (vdev) {
+   if (vdev-needs_reset)
+   *needs_reset = true;
+
+   if (!vdev-refcnt)
+   ret = 0;
+   }
+
+   vfio_device_put(device);
+   }
+
+   /*
+* TODO: vfio-core considers groups to be viable even if some devices
+* are attached to known drivers, like pci-stub or pcieport.  We can't
+* freeze devices from being unbound to those drivers like we can
+* here though, so it would be racy to test for them.  We also can't
+* use device_lock() to prevent changes as that would interfere with
+* PCI-core taking device_lock during bus reset.  For now, we require
+* devices to be bound to vfio-pci to get a bus/slot reset on release.
+*/
+
+   return ret;
+}
+
+/* Clear needs_reset on all affected devices after successful bus/slot reset */
+static int vfio_pci_clear_needs_reset(struct pci_dev *pdev, void *data)
+{
+   struct pci_driver *pci_drv = ACCESS_ONCE(pdev-driver);
+
+   if (pci_drv == vfio_pci_driver) {
+   struct vfio_device *device;
+   struct vfio_pci_device *vdev;
+
+   device = vfio_device_get_from_dev(pdev-dev);
+   if (!device)
+   return 0;
+
+   vdev = vfio_device_data(device);
+   if (vdev)
+   vdev-needs_reset = false;
+
+   vfio_device_put(device);
+   }
+
+   return 0;
+}
+
+/*
+ * Attempt to do a bus/slot reset if there are devices affected by a reset for
+ * this device that are needs_reset and all of the affected devices are unused
+ * (!refcnt).  Callers of this function are required to hold driver_lock such
+ * that devices can not be unbound from vfio-pci or opened by a user while we
+ * test for and perform a bus/slot reset.
+ */
+static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
+{
+   bool needs_reset = false, slot = false;
+   int ret;
+
+   if

[PATCH 0/3] vfio-pci: Reset improvements

2014-07-16 Thread Alex Williamson

This series is intended to improve the state of devices returned back
to the host from vfio-pci or re-used by another user.  First we make
sure that busmaster is disabled in the saved state, so the device
cannot continue to do DMA, then we add some serialization, move our
reference counting under it to fix an unlikely bug should we fail to
initialize a device, and add the ability to do bus/slot reset on
device release.  To do this, we require all devices affected by the
bus/slot reset to be bound to vfio-pci, therefore users sequestering
devices with pci-stub will need to bind them to vfio-pci to see this
change.

The effect of these changes are perhaps most noticeable with GPU
assignment to a VM, where killing QEMU results in a static image on
the framebuffer since no reset of the device was done.  Returning
the GPU to a host device at this point was suspect.  Other devices,
like USB controllers, also don't necessarily appreciate being abruptly
disconnected from their IOMMU domain and would generate IOMMU faults
in the event the user process is killed.  Both of these cases should
be resolved here, assuming all the devices on the bus are bound to
vfio-pci and at least one of the devices in use does not support
a function-local reset.

Please test and comment.  Thanks,

Alex

---

Alex Williamson (3):
  vfio-pci: Attempt bus/slot reset on release
  vfio-pci: Use mutex around open, release, and remove
  vfio-pci: Release devices with BusMaster disabled


 drivers/vfio/pci/vfio_pci.c |  157 ---
 drivers/vfio/pci/vfio_pci_private.h |3 -
 2 files changed, 147 insertions(+), 13 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] vfio-pci: Use mutex around open, release, and remove

2014-07-16 Thread Alex Williamson

Serializing open/release allows us to fix a refcnt error if we fail
to enable the device and lets us prevent devices from being unbound
or opened, giving us an opportunity to do bus resets on release.  No
restriction added to serialize binding devices to vfio-pci while the
mutex is held though.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/vfio/pci/vfio_pci.c |   35 +--
 drivers/vfio/pci/vfio_pci_private.h |2 +-
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 36d8332..c4949a7 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -37,6 +37,8 @@ module_param_named(nointxmask, nointxmask, bool, S_IRUGO | 
S_IWUSR);
 MODULE_PARM_DESC(nointxmask,
  Disable support for PCI 2.3 style INTx masking.  If this 
resolves problems for specific devices, report lspci -vvvxxx to 
linux-...@vger.kernel.org so the device can be fixed automatically via the 
broken_intx_masking flag.);
 
+static DEFINE_MUTEX(driver_lock);
+
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
struct pci_dev *pdev = vdev-pdev;
@@ -163,28 +165,39 @@ static void vfio_pci_release(void *device_data)
 {
struct vfio_pci_device *vdev = device_data;
 
-   if (atomic_dec_and_test(vdev-refcnt))
+   mutex_lock(driver_lock);
+
+   if (!(--vdev-refcnt))
vfio_pci_disable(vdev);
 
+   mutex_unlock(driver_lock);
+
module_put(THIS_MODULE);
 }
 
 static int vfio_pci_open(void *device_data)
 {
struct vfio_pci_device *vdev = device_data;
+   int ret = 0;
 
if (!try_module_get(THIS_MODULE))
return -ENODEV;
 
-   if (atomic_inc_return(vdev-refcnt) == 1) {
-   int ret = vfio_pci_enable(vdev);
+   mutex_lock(driver_lock);
+
+   if (!vdev-refcnt) {
+   ret = vfio_pci_enable(vdev);
if (ret) {
module_put(THIS_MODULE);
-   return ret;
+   goto unlock;
}
}
+   vdev-refcnt++;
 
-   return 0;
+unlock:
+   mutex_unlock(driver_lock);
+
+   return ret;
 }
 
 static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
@@ -839,7 +852,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
vdev-irq_type = VFIO_PCI_NUM_IRQS;
mutex_init(vdev-igate);
spin_lock_init(vdev-irqlock);
-   atomic_set(vdev-refcnt, 0);
 
ret = vfio_add_group_dev(pdev-dev, vfio_pci_ops, vdev);
if (ret) {
@@ -854,12 +866,15 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 {
struct vfio_pci_device *vdev;
 
+   mutex_lock(driver_lock);
+
vdev = vfio_del_group_dev(pdev-dev);
-   if (!vdev)
-   return;
+   if (vdev) {
+   iommu_group_put(pdev-dev.iommu_group);
+   kfree(vdev);
+   }
 
-   iommu_group_put(pdev-dev.iommu_group);
-   kfree(vdev);
+   mutex_unlock(driver_lock);
 }
 
 static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 9c6d5d0..31e7a30 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -55,7 +55,7 @@ struct vfio_pci_device {
boolbardirty;
boolhas_vga;
struct pci_saved_state  *pci_saved_state;
-   atomic_trefcnt;
+   int refcnt;
struct eventfd_ctx  *err_trigger;
 };
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin

On 07/16/2014 09:21 AM, Gleb Natapov wrote:
 On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote:
 On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
 Il 16/07/2014 18:03, H. Peter Anvin ha scritto:
 I suggested emulating RDRAND *but not set the CPUID bit*.  We already
 developed a protocol in KVM/Qemu to enumerate emulated features (created
 for MOVBE as I recall), specifically to service the semantic feature X
 will work but will be substantially slower than normal.

 But those will set the CPUID bit.  There is currently no way for KVM
 guests to know if a CPUID bit is real or emulated.


 OK, so there wasn't any protocol implemented in the end.  I sit corrected.

 That protocol that was implemented is between qemu and kvm, not kvm and a 
 guest.
 

Either which way, the notion was to have a PV CPUID bit like the
proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND.

The biggest reason to *not* do this would be that with an MSR it is not
available to guest user space, which may be better under the circumstances.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch V2 43/64] x86: kvm: Use ktime_get_boot_ns()

2014-07-16 Thread Thomas Gleixner

Use the new nanoseconds based interface and get rid of the timespec
conversion dance.

Signed-off-by: Thomas Gleixner t...@linutronix.de
Cc: Gleb Natapov g...@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/kvm/x86.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

Index: tip/arch/x86/kvm/x86.c
===
--- tip.orig/arch/x86/kvm/x86.c
+++ tip/arch/x86/kvm/x86.c
@@ -1109,11 +1109,7 @@ static void kvm_get_time_scale(uint32_t
 
 static inline u64 get_kernel_ns(void)
 {
-   struct timespec ts;
-
-   ktime_get_ts(ts);
-   monotonic_to_bootbased(ts);
-   return timespec_to_ns(ts);
+   return ktime_get_boot_ns();
 }
 
 #ifdef CONFIG_X86_64


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch V2 44/64] x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based

2014-07-16 Thread Thomas Gleixner

Convert the relevant base data right away to nanoseconds instead of
doing the conversion on every readout. Reduces text size by 160 bytes.

Signed-off-by: Thomas Gleixner t...@linutronix.de
Cc: Gleb Natapov g...@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/kvm/x86.c |   44 ++--
 1 file changed, 14 insertions(+), 30 deletions(-)

Index: tip/arch/x86/kvm/x86.c
===
--- tip.orig/arch/x86/kvm/x86.c
+++ tip/arch/x86/kvm/x86.c
@@ -984,9 +984,8 @@ struct pvclock_gtod_data {
u32 shift;
} clock;
 
-   /* open coded 'struct timespec' */
-   u64 monotonic_time_snsec;
-   time_t  monotonic_time_sec;
+   u64 boot_ns;
+   u64 nsec_base;
 };
 
 static struct pvclock_gtod_data pvclock_gtod_data;
@@ -994,6 +993,9 @@ static struct pvclock_gtod_data pvclock_
 static void update_pvclock_gtod(struct timekeeper *tk)
 {
struct pvclock_gtod_data *vdata = pvclock_gtod_data;
+   u64 boot_ns;
+
+   boot_ns = ktime_to_ns(ktime_add(tk-base_mono, tk-offs_boot));
 
write_seqcount_begin(vdata-seq);
 
@@ -1004,17 +1006,8 @@ static void update_pvclock_gtod(struct t
vdata-clock.mult   = tk-mult;
vdata-clock.shift  = tk-shift;
 
-   vdata-monotonic_time_sec   = tk-xtime_sec
-   + tk-wall_to_monotonic.tv_sec;
-   vdata-monotonic_time_snsec = tk-xtime_nsec
-   + (tk-wall_to_monotonic.tv_nsec
-tk-shift);
-   while (vdata-monotonic_time_snsec =
-   (((u64)NSEC_PER_SEC)  tk-shift)) {
-   vdata-monotonic_time_snsec -=
-   ((u64)NSEC_PER_SEC)  tk-shift;
-   vdata-monotonic_time_sec++;
-   }
+   vdata-boot_ns  = boot_ns;
+   vdata-nsec_base= tk-xtime_nsec;
 
write_seqcount_end(vdata-seq);
 }
@@ -1371,23 +1364,22 @@ static inline u64 vgettsc(cycle_t *cycle
return v * gtod-clock.mult;
 }
 
-static int do_monotonic(struct timespec *ts, cycle_t *cycle_now)
+static int do_monotonic_boot(s64 *t, cycle_t *cycle_now)
 {
+   struct pvclock_gtod_data *gtod = pvclock_gtod_data;
unsigned long seq;
-   u64 ns;
int mode;
-   struct pvclock_gtod_data *gtod = pvclock_gtod_data;
+   u64 ns;
 
-   ts-tv_nsec = 0;
do {
seq = read_seqcount_begin(gtod-seq);
mode = gtod-clock.vclock_mode;
-   ts-tv_sec = gtod-monotonic_time_sec;
-   ns = gtod-monotonic_time_snsec;
+   ns = gtod-nsec_base;
ns += vgettsc(cycle_now);
ns = gtod-clock.shift;
+   ns += gtod-boot_ns;
} while (unlikely(read_seqcount_retry(gtod-seq, seq)));
-   timespec_add_ns(ts, ns);
+   *t = ns;
 
return mode;
 }
@@ -1395,19 +1387,11 @@ static int do_monotonic(struct timespec
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, cycle_t *cycle_now)
 {
-   struct timespec ts;
-
/* checked again under seqlock below */
if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
return false;
 
-   if (do_monotonic(ts, cycle_now) != VCLOCK_TSC)
-   return false;
-
-   monotonic_to_bootbased(ts);
-   *kernel_ns = timespec_to_ns(ts);
-
-   return true;
+   return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
 }
 #endif
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski

On Wed, Jul 16, 2014 at 1:20 PM, H. Peter Anvin h...@zytor.com wrote:
 On 07/16/2014 09:21 AM, Gleb Natapov wrote:
 On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote:
 On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
 Il 16/07/2014 18:03, H. Peter Anvin ha scritto:
 I suggested emulating RDRAND *but not set the CPUID bit*.  We already
 developed a protocol in KVM/Qemu to enumerate emulated features (created
 for MOVBE as I recall), specifically to service the semantic feature X
 will work but will be substantially slower than normal.

 But those will set the CPUID bit.  There is currently no way for KVM
 guests to know if a CPUID bit is real or emulated.


 OK, so there wasn't any protocol implemented in the end.  I sit corrected.

 That protocol that was implemented is between qemu and kvm, not kvm and a 
 guest.


 Either which way, the notion was to have a PV CPUID bit like the
 proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND.

 The biggest reason to *not* do this would be that with an MSR it is not
 available to guest user space, which may be better under the circumstances.

On the theory that I see no legitimate reason to expose this to guest
user space, I think we shouldn't expose it.  If we wanted to add a
get_random_bytes syscall, that would be an entirely different story,
though.

Should I send v3 as one series or should I split it into host and guest parts?

--Andy
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin

On 07/16/2014 02:32 PM, Andy Lutomirski wrote:
 
 On the theory that I see no legitimate reason to expose this to guest
 user space, I think we shouldn't expose it.  If we wanted to add a
 get_random_bytes syscall, that would be an entirely different story,
 though.
 
 Should I send v3 as one series or should I split it into host and guest parts?
 

It doesn't matter... as long as they are separate *patches*.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski

This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

virtio-rng is not suitable for this purpose.  It's too difficult to
enumerate for use in early boot (e.g. KASLR, which runs before we
even have an IDT).  It also provides /dev/random-style bits, which
means that making guest boot wait for virtio-rng is unacceptably
slow, and doing it asynchronously means that /dev/urandom might
still be predictable when userspace starts.

I sent the corresponding kvm-unit-tests and qemu changes separately.

There's room for bikeshedding on the same arch_get_slow_rng_u64.  I
considered arch_get_rng_seed_u64, but that could be confused with
arch_get_random_seed_long, which is not interchangeable.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final states is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  random: Log how many bits we managed to seed with in init_std_data
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig |  4 
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/archslowrng.h   | 30 ++
 arch/x86/include/asm/processor.h | 21 ++---
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 22 ++
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   |  4 
 drivers/char/random.c| 20 ++--
 include/linux/random.h   |  9 +
 11 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h

-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup

2014-07-16 Thread Andy Lutomirski

This should help solve the problem of guests starting out with
predictable RNG state.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..17ad33d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1261,6 +1261,13 @@ static void init_std_data(struct entropy_store *r)
mix_pool_bytes(r, rv, sizeof(rv), NULL);
}
mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+   for (i = 0; i  4; i++) {
+   u64 rv64;
+
+   if (arch_get_slow_rng_u64(rv64))
+   mix_pool_bytes(r, rv64, sizeof(rv64), NULL);
+   }
 }
 
 /*
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski

arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data.  Unlike
arch_get_random_{bytes,seed}, etc., it makes no claims about entropy
content.  It's also likely to be much slower and should not be used
frequently.  That being said, it should be fast enough to call
several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if
available.  The intent is for other hypervisor guest implementations
to implement this interface.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig   |  4 
 arch/x86/include/asm/archslowrng.h | 30 ++
 arch/x86/kernel/kvm.c  | 22 ++
 include/linux/random.h |  9 +
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
bool KVM Guest support (including kvmclock)
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select ARCH_SLOW_RNG
default y
---help---
  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
bool
 
+config ARCH_SLOW_RNG
+   bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h 
b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism.  Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+   return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+   /*
+* Allow migration from a hypervisor with the GET_RNG_SEED
+* feature to a hypervisor without it.
+*/
+   if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+   return 1;
+   else
+   return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
if (kvmclock_vsyscall)
kvm_setup_vsyscall_timeinfo();
 
+   if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+   arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
register_cpu_notifier(kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include asm/archslowrng.h
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+   return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-16 Thread Andy Lutomirski

It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include asm/archrandom.h
 #include asm/e820.h
 
+#include uapi/asm/kvm_para.h
+
 #include generated/compile.h
 #include linux/module.h
 #include linux/uts.h
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE  ( LINUX_COMPILE_BY @
LINUX_COMPILE_HOST ) ( LINUX_COMPILER )  UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+   u32 kvm_base;
+   u32 features;
+
+   if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+   return false;
+
+   kvm_base = hypervisor_cpuid_base(KVMKVMKVM\0\0\0, KVM_CPUID_FEATURES);
+   if (!kvm_base)
+   return false;
+
+   features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+   return features  (1UL  feature);
+}
+
 #define I8254_PORT_CONTROL 0x43
 #define I8254_PORT_COUNTER00x40
 #define I8254_CMD_READBACK 0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
}
}
 
+   if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+   u64 seed;
+
+   debug_putstr( MSR_KVM_GET_RNG_SEED);
+   rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+   random ^= (unsigned long)seed;
+   use_i8254 = false;
+   }
+
if (has_cpuflag(X86_FEATURE_TSC)) {
debug_putstr( RDTSC);
rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
 {
-   /* ecx is often an input as well as an output. */
-   asm volatile(cpuid
+   /*
+* This function can be used from the boot code, so it needs
+* to avoid using EBX in constraints in PIC mode.
+*
+* ecx is often an input as well as an output.
+*/
+   asm volatile(.ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t
+movl  %%ebx,%1\n\t
+.endif ; .endif   \n\t
+cpuid \n\t
+.ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t
+xchgl %%ebx,%1\n\t
+.endif ; .endif
: =a (*eax),
- =b (*ebx),
+#if defined(__i386__)  defined(__PIC__)
+ =r (*ebx),  /* gcc won't let us use ebx */
+#else
+ =b (*ebx),  /* ebx is okay */
+#endif
  =c (*ecx),
  =d (*edx)
: 0 (*eax), 2 (*ecx)
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 4/5] random: Log how many bits we managed to seed with in init_std_data

2014-07-16 Thread Andy Lutomirski

This is useful for making sure that init_std_data is working
correctly and for allaying fear when this happens:

random: xyz urandom read with SMALL_NUMBER bits of entropy available

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 17ad33d..10e9642 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r)
int i;
ktime_t now = ktime_get_real();
unsigned long rv;
+   int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0;
 
r-last_pulled = jiffies;
mix_pool_bytes(r, now, sizeof(now), NULL);
for (i = r-poolinfo-poolbytes; i  0; i -= sizeof(rv)) {
-   if (!arch_get_random_seed_long(rv) 
-   !arch_get_random_long(rv))
+   if (arch_get_random_seed_long(rv))
+   arch_seed_bits += 8 * sizeof(rv);
+   else if (arch_get_random_long(rv))
+   arch_random_bits += 8 * sizeof(rv);
+   else
rv = random_get_entropy();
mix_pool_bytes(r, rv, sizeof(rv), NULL);
}
@@ -1265,9 +1269,14 @@ static void init_std_data(struct entropy_store *r)
for (i = 0; i  4; i++) {
u64 rv64;
 
-   if (arch_get_slow_rng_u64(rv64))
+   if (arch_get_slow_rng_u64(rv64)) {
mix_pool_bytes(r, rv64, sizeof(rv64), NULL);
+   slow_rng_bits += 8 * sizeof(rv64);
+   }
}
+
+   pr_info(random: seeded %s pool with %d bits of arch random seed, %d 
bits of arch random, and %d bits of arch slow rng\n,
+   r-name, arch_seed_bits, arch_random_bits, slow_rng_bits);
 }
 
 /*
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-16 Thread Andy Lutomirski

This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c | 3 ++-
 arch/x86/kvm/x86.c   | 4 
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks 
this feature bit
||   || before enabling paravirtualized
||   || spinlock support.
 --
+KVM_FEATURE_GET_RNG_SEED   || 8 || host provides rng seed data via
+   ||   || MSR_KVM_GET_RNG_SEED.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 (1  KVM_FEATURE_ASYNC_PF) |
 (1  KVM_FEATURE_PV_EOI) |
 (1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-(1  KVM_FEATURE_PV_UNHALT);
+(1  KVM_FEATURE_PV_UNHALT) |
+(1  KVM_FEATURE_GET_RNG_SEED);
 
if (sched_info_on())
entry-eax |= (1  KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include linux/pci.h
 #include linux/timekeeper_internal.h
 #include linux/pvclock_gtod.h
+#include linux/random.h
 #include trace/events/kvm.h
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
case MSR_KVM_PV_EOI_EN:
data = vcpu-arch.pv_eoi.msr_val;
break;
+   case MSR_KVM_GET_RNG_SEED:
+   get_random_bytes(data, sizeof(data));
+   break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread H. Peter Anvin

On 07/16/2014 02:45 PM, Andy Lutomirski wrote:
 diff --git a/arch/x86/include/asm/archslowrng.h 
 b/arch/x86/include/asm/archslowrng.h
 new file mode 100644
 index 000..c8e8d0d
 --- /dev/null
 +++ b/arch/x86/include/asm/archslowrng.h
 @@ -0,0 +1,30 @@
 +/*
 + * This file is part of the Linux kernel.
 + *
 + * Copyright (c) 2014 Andy Lutomirski
 + * Authors: Andy Lutomirski l...@amacapital.net
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + */
 +
 +#ifndef ASM_X86_ARCHSLOWRANDOM_H
 +#define ASM_X86_ARCHSLOWRANDOM_H
 +
 +#ifndef CONFIG_ARCH_SLOW_RNG
 +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
 +#endif
 +

I'm *seriously* questioning the wisdom of this.  A much saner thing
would be to do:

#ifndef CONFIG_ARCH_SLOW_RNG

/* Not supported */
static inline int arch_get_slow_rng_u64(u64 *v)
{
(void)v;
return 0;
}

#endif

... which is basically what we do for the archrandom stuff.

I'm also wondering if it makes sense to have a function which prefers
arch_get_random*() over this one as a preferred interface.  Something like:

int get_random_arch_u64_slow_ok(u64 *v)
{
int i;
u64 x = 0;
unsigned long l;

for (i = 0; i  64/BITS_PER_LONG; i++) {
if (!arch_get_random_long(l))
return arch_get_slow_rng_u64(v);

x |=  l  (i*BITS_PER_LONG);
}
*v = l;
return 0;
}

This still doesn't address the issue e.g. on x86 where RDRAND is
available but we haven't set up alternatives yet.  So it might be that
what we really want is to encapsulate this fallback in arch code and do
a more direct enumeration.

 +
 +static int kvm_get_slow_rng_u64(u64 *v)
 +{
 + /*
 +  * Allow migration from a hypervisor with the GET_RNG_SEED
 +  * feature to a hypervisor without it.
 +  */
 + if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
 + return 1;
 + else
 + return 0;
 +}

How about:

return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0;

The naming also feels really inconsistent...

-hpa

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski

On Wed, Jul 16, 2014 at 2:59 PM, H. Peter Anvin h...@zytor.com wrote:
 On 07/16/2014 02:45 PM, Andy Lutomirski wrote:
 diff --git a/arch/x86/include/asm/archslowrng.h 
 b/arch/x86/include/asm/archslowrng.h
 new file mode 100644
 index 000..c8e8d0d
 --- /dev/null
 +++ b/arch/x86/include/asm/archslowrng.h
 @@ -0,0 +1,30 @@
 +/*
 + * This file is part of the Linux kernel.
 + *
 + * Copyright (c) 2014 Andy Lutomirski
 + * Authors: Andy Lutomirski l...@amacapital.net
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + */
 +
 +#ifndef ASM_X86_ARCHSLOWRANDOM_H
 +#define ASM_X86_ARCHSLOWRANDOM_H
 +
 +#ifndef CONFIG_ARCH_SLOW_RNG
 +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
 +#endif
 +

 I'm *seriously* questioning the wisdom of this.  A much saner thing
 would be to do:

 #ifndef CONFIG_ARCH_SLOW_RNG

 /* Not supported */
 static inline int arch_get_slow_rng_u64(u64 *v)
 {
 (void)v;
 return 0;
 }

 #endif

 ... which is basically what we do for the archrandom stuff.

The archrandom stuff defines the not supported variant in the
generic header, which is what I'm doing here.  I could wrap all of
asm/archslowrng.h in #ifdef CONFIG_ARCH_SLOW_RNG instead of putting
the #error in there, but I have no strong preference.


 I'm also wondering if it makes sense to have a function which prefers
 arch_get_random*() over this one as a preferred interface.  Something like:

 int get_random_arch_u64_slow_ok(u64 *v)
 {
 int i;
 u64 x = 0;
 unsigned long l;

 for (i = 0; i  64/BITS_PER_LONG; i++) {
 if (!arch_get_random_long(l))
 return arch_get_slow_rng_u64(v);

 x |=  l  (i*BITS_PER_LONG);
 }
 *v = l;
 return 0;
 }

I played with something like this earlier, but I dropped it when it
ended up having exactly one user.  I suspect that the highly paranoid
will actually prefer seeding with both sources in init_std_data even
if RDRAND is available -- it costs very little and it provides a bit
of extra assurance.


 This still doesn't address the issue e.g. on x86 where RDRAND is
 available but we haven't set up alternatives yet.  So it might be that
 what we really want is to encapsulate this fallback in arch code and do
 a more direct enumeration.

My personal preference is to defer this until some user shows up.  I
think that even this would be too complicated for KASLR, which is the
only extremely early-boot user that I found.

Hmm.  Does the prandom stuff want to use this?


 +
 +static int kvm_get_slow_rng_u64(u64 *v)
 +{
 + /*
 +  * Allow migration from a hypervisor with the GET_RNG_SEED
 +  * feature to a hypervisor without it.
 +  */
 + if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
 + return 1;
 + else
 + return 0;
 +}

 How about:

 return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0;

 The naming also feels really inconsistent...

Better ideas welcome.  I could call the generic function
arch_get_pv_random_seed, but maybe someone will come up with a
non-paravirt implementation.

--Andy


 -hpa




-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: emulator injects #DB when RFLAGS.RF is set

2014-07-16 Thread Nadav Amit

If the RFLAGS.RF is set, then no #DB should occur on instruction breakpoints.
However, the KVM emulator injects #DB regardless to RFLAGS.RF. This patch fixes
this behavior. KVM, however, still appears not to update RFLAGS.RF correctly,
regardless of this patch.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fae064f..e341a81 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5168,7 +5168,8 @@ static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu 
*vcpu, int *r)
}
}
 
-   if (unlikely(vcpu-arch.dr7  DR7_BP_EN_MASK)) {
+   if (unlikely(vcpu-arch.dr7  DR7_BP_EN_MASK) 
+   !(kvm_get_rflags(vcpu)  X86_EFLAGS_RF)) {
dr6 = kvm_vcpu_check_hw_bp(eip, 0,
   vcpu-arch.dr7,
   vcpu-arch.db);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski

On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote:
 My personal preference is to defer this until some user shows up.  I
 think that even this would be too complicated for KASLR, which is the
 only extremely early-boot user that I found.

 Hmm.  Does the prandom stuff want to use this?

prandom isn't even using rdrand.  I'd suggest fixing this separately,
or even just waiting until someone goes and deletes prandom.

--Andy
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread H. Peter Anvin

On 07/16/2014 03:40 PM, Andy Lutomirski wrote:
 On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote:
 My personal preference is to defer this until some user shows up.  I
 think that even this would be too complicated for KASLR, which is the
 only extremely early-boot user that I found.

 Hmm.  Does the prandom stuff want to use this?
 
 prandom isn't even using rdrand.  I'd suggest fixing this separately,
 or even just waiting until someone goes and deletes prandom.
 

prandom is exactly the opposite; it is designed for when we need
possibly low quality random numbers very quickly.  RDRAND is actually
too slow.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski

On Jul 16, 2014 4:00 PM, H. Peter Anvin h...@zytor.com wrote:

 On 07/16/2014 03:40 PM, Andy Lutomirski wrote:
  On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net 
  wrote:
  My personal preference is to defer this until some user shows up.  I
  think that even this would be too complicated for KASLR, which is the
  only extremely early-boot user that I found.
 
  Hmm.  Does the prandom stuff want to use this?
 
  prandom isn't even using rdrand.  I'd suggest fixing this separately,
  or even just waiting until someone goes and deletes prandom.
 

 prandom is exactly the opposite; it is designed for when we need
 possibly low quality random numbers very quickly.  RDRAND is actually
 too slow.

I meant that prandom isn't using rdrand for early seeding.

--Andy


 -hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread H. Peter Anvin

On 07/16/2014 05:03 PM, Andy Lutomirski wrote:

 prandom is exactly the opposite; it is designed for when we need
 possibly low quality random numbers very quickly.  RDRAND is actually
 too slow.
 
 I meant that prandom isn't using rdrand for early seeding.
 

We should probably fix that.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

2014-07-16 Thread Wanpeng Li

WARNING: CPU: 9 PID: 7251 at arch/x86/kvm/vmx.c:8719 
nested_vmx_vmexit+0xa4/0x233 [kvm_intel]()
Modules linked in: tun nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 
dns_resolver nfs fscache lockd
sunrpc pci_stub netconsole kvm_intel kvm bridge stp llc autofs4 8021q ipv6 
uinput joydev microcode 
pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e ixgbe ptp pps_core hwmon mdio 
i2c_i801 i2c_core 
tpm_tis tpm ipmi_si ipmi_msghandler isci libsas scsi_transport_sas button 
dm_mirror dm_region_hash 
dm_log dm_mod
CPU: 9 PID: 7251 Comm: qemu-system-x86 Tainted: GW 3.16.0-rc1 #2
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS 
RMLSDP.86I.00.29.D696.131329 11/11/2013
 220f 880ffd107bf8 81493563 220f
  880ffd107c38 8103f0eb 880ffd107c48
 a059709a 881ffc9e0040 8800b74b8000 
Call Trace:
 [81493563] dump_stack+0x49/0x5e
 [8103f0eb] warn_slowpath_common+0x7c/0x96
 [a059709a] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
 [8103f11a] warn_slowpath_null+0x15/0x17
 [a059709a] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
 [a0594295] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
 [a0537931] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
 [a05972ec] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
 [a051ebe9] inject_pending_event+0xd0/0x16e [kvm]
 [a051efa0] vcpu_enter_guest+0x319/0x704 [kvm]

After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info if L1 
asks us to), Acknowledge interrupt on exit behavior can be emulated. Current
logic will ask for intr vector if it is nested vmexit and 
VM_EXIT_ACK_INTR_ON_EXIT
is set by L1. However, intr vector for posted intr can't be got by generic read 
pending interrupt vector and intack routine, there is a requirement to sync 
from 
pir to irr. This patch fix it by ask the intr vector after sync pir to irr.

Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
 arch/x86/kvm/lapic.c | 1 +
 arch/x86/kvm/vmx.c   | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0069118..b7d45dc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1637,6 +1637,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
apic_clear_irr(vector, apic);
return vector;
 }
+EXPORT_SYMBOL_GPL(kvm_get_apic_interrupt);
 
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4ae5ad8..31f1479 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8697,6 +8697,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
 nested_exit_intr_ack_set(vcpu)) {
int irq = kvm_cpu_get_interrupt(vcpu);
+
+   if (irq  0  kvm_apic_vid_enabled(vcpu-kvm))
+   irq = kvm_get_apic_interrupt(vcpu);
WARN_ON(irq  0);
vmcs12-vm_exit_intr_info = irq |
INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] KVM: nVMX: Fix vmptrld fail and vmwrite error when L1 goes down

2014-07-16 Thread Wanpeng Li

This bug can be trigger by L1 goes down directly w/ enable_shadow_vmcs.

[ 6413.158950] kvm: vmptrld   (null)/7800 failed
[ 6413.158954] vmwrite error: reg 401e value 4 (err 1)
[ 6413.158957] CPU: 0 PID: 4840 Comm: qemu-system-x86 Tainted: G   OE 
3.16.0kvm+ #2
[ 6413.158958] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 
12/05/2013
[ 6413.158959]  0003 880210c9fb58 81741de9 
8800d7433f80
[ 6413.158960]  880210c9fb68 a059fa08 880210c9fb78 
a05938bf
[ 6413.158962]  880210c9fba8 a059a97f 8800d7433f80 
0003
[ 6413.158963] Call Trace:
[ 6413.158968]  [81741de9] dump_stack+0x45/0x56
[ 6413.158972]  [a059fa08] vmwrite_error+0x2c/0x2e [kvm_intel]
[ 6413.158974]  [a05938bf] vmcs_writel+0x1f/0x30 [kvm_intel]
[ 6413.158976]  [a059a97f] free_nested.part.73+0x5f/0x170 [kvm_intel]
[ 6413.158978]  [a059ab13] vmx_free_vcpu+0x33/0x70 [kvm_intel]
[ 6413.158991]  [a0360324] kvm_arch_vcpu_free+0x44/0x50 [kvm]
[ 6413.158998]  [a0360f92] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]

Commit 26a865 (KVM: VMX: fix use after free of vmx-loaded_vmcs) fix the use 
after free bug by move free_loaded_vmcs() before free_nested(), however, this 
lead to free loaded_vmcs-vmcs premature and vmptrld load a NULL pointer during 
sync shadow vmcs to vmcs12. In addition, vmwrite which used to disable shadow 
vmcs and reset VMCS_LINK_POINTER failed since there is no valid current-VMCS.
This patch fix it by skipping sync shadow vmcs and reset vmcs field for L1 
destroy since they will be reinitialized after L1 recreate.

Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fbce89e..2b28da7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6113,9 +6113,9 @@ static void free_nested(struct vcpu_vmx *vmx)
return;
vmx-nested.vmxon = false;
if (vmx-nested.current_vmptr != -1ull) {
-   nested_release_vmcs12(vmx);
vmx-nested.current_vmptr = -1ull;
vmx-nested.current_vmcs12 = NULL;
+   nested_release_vmcs12(vmx);
}
if (enable_shadow_vmcs)
free_vmcs(vmx-nested.current_shadow_vmcs);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection

2014-07-16 Thread Wanpeng Li

This patch fix bug reported in 
https://bugzilla.kernel.org/show_bug.cgi?id=73331, 
after the patch http://www.spinics.net/lists/kvm/msg105230.html applied, there 
is
some progress and the L2 can boot up, however, slowly. The original idea of 
this 
fix vid injection patch is from Zhang, Yang Z yang.z.zh...@intel.com.

Interrupt which delivered by vid should be injected to L1 by L0 if current is 
in 
L1, or should be injected to L2 by L0 through the old injection way if L1 
doesn't 
have set VM_EXIT_ACK_INTR_ON_EXIT. The current logic doen't consider these 
cases. 
This patch fix it by vid intr to L1 if current is L1 or L2 through old 
injection 
way if L1 doen't have VM_EXIT_ACK_INTR_ON_EXIT set.

Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
Signed-off-by: Zhang, Yang Z yang.z.zh...@intel.com
---
 arch/x86/kvm/vmx.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 021d84a..ad36646 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7112,8 +7112,22 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, 
int max_irr)
 {
if (max_irr == -1)
return;
-
-   vmx_set_rvi(max_irr);
+   if (!is_guest_mode(vcpu)) {
+   vmx_set_rvi(max_irr);
+   } else if (is_guest_mode(vcpu)  !nested_exit_on_intr(vcpu)) {
+   /*
+* Fall back to old way to inject the interrupt since there
+* is no vAPIC-v for L2.
+*/
+   if (vcpu-arch.exception.pending ||
+   vcpu-arch.nmi_injected ||
+   vcpu-arch.interrupt.pending)
+   return;
+   else if (vmx_interrupt_allowed(vcpu)) {
+   kvm_queue_interrupt(vcpu, max_irr, false);
+   vmx_inject_irq(vcpu);
+   }
+   }
 }
 
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

2014-07-16 Thread Zhang, Yang Z

Wanpeng Li wrote on 2014-07-17:
 WARNING: CPU: 9 PID: 7251 at arch/x86/kvm/vmx.c:8719
 nested_vmx_vmexit+0xa4/0x233 [kvm_intel]() Modules linked in: tun
 nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache
 lockd sunrpc pci_stub netconsole kvm_intel kvm bridge stp llc autofs4
 8021q ipv6 uinput joydev microcode pcspkr igb i2c_algo_bit ehci_pci
 ehci_hcd e1000e ixgbe ptp pps_core hwmon mdio i2c_i801 i2c_core
 tpm_tis tpm ipmi_si ipmi_msghandler isci libsas scsi_transport_sas
 button dm_mirror dm_region_hash dm_log dm_mod
 CPU: 9 PID: 7251 Comm: qemu-system-x86 Tainted: GW
 3.16.0-rc1 #2
 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS
 RMLSDP.86I.00.29.D696.131329 11/11/2013  220f
 880ffd107bf8 81493563 220f
   880ffd107c38 8103f0eb 880ffd107c48
  a059709a 881ffc9e0040 8800b74b8000 
  Call Trace: [81493563] dump_stack+0x49/0x5e 
  [8103f0eb]
 warn_slowpath_common+0x7c/0x96  [a059709a] ?
 nested_vmx_vmexit+0xa4/0x233 [kvm_intel]  [8103f11a]
 warn_slowpath_null+0x15/0x17  [a059709a]
 nested_vmx_vmexit+0xa4/0x233 [kvm_intel]  [a0594295] ?
 nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]  [a0537931] ?
 kvm_apic_has_interrupt+0x80/0xd5 [kvm]  [a05972ec]
 vmx_check_nested_events+0xc3/0xd3 [kvm_intel]  [a051ebe9]
 inject_pending_event+0xd0/0x16e [kvm]  [a051efa0]
 vcpu_enter_guest+0x319/0x704 [kvm]
 
 After commit 77b0f5d (KVM: nVMX: Ack and write vector info to
 intr_info if L1 asks us to), Acknowledge interrupt on exit behavior
 can be emulated. Current logic will ask for intr vector if it is
 nested vmexit and VM_EXIT_ACK_INTR_ON_EXIT is set by L1. However, intr
 vector for posted intr can't be got by generic read pending interrupt
 vector and intack routine, there is a requirement to sync from pir to
 irr. This patch fix it by ask the intr vector after sync pir to irr.
 
 Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com

Reviewed-by: Yang Zhang yang.z.zh...@intel.com

 ---
  arch/x86/kvm/lapic.c | 1 +
  arch/x86/kvm/vmx.c   | 3 +++
  2 files changed, 4 insertions(+)
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
 0069118..b7d45dc 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1637,6 +1637,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
   apic_clear_irr(vector, apic);
   return vector;
  }
 +EXPORT_SYMBOL_GPL(kvm_get_apic_interrupt);
 
  void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
   struct kvm_lapic_state *s)
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
 4ae5ad8..31f1479 100644 --- a/arch/x86/kvm/vmx.c +++
 b/arch/x86/kvm/vmx.c @@ -8697,6 +8697,9 @@ static void
 nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
   if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
nested_exit_intr_ack_set(vcpu)) {
   int irq = kvm_cpu_get_interrupt(vcpu);
 +
 + if (irq  0  kvm_apic_vid_enabled(vcpu-kvm))
 + irq = kvm_get_apic_interrupt(vcpu);
   WARN_ON(irq  0);
   vmcs12-vm_exit_intr_info = irq |
   INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;


Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest

2014-07-16 Thread Bharat Bhushan

SPRG3 is guest accessible and SPRG3 can be clobbered by host
or another guest, So this need to be restored when loading
guest state.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/kvm/booke_interrupts.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/booke_interrupts.S 
b/arch/powerpc/kvm/booke_interrupts.S
index 2c6deb5ef..0d3403f 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -459,6 +459,8 @@ lightweight_exit:
 * written directly to the shared area, so we
 * need to reload them here with the guest's values.
 */
+   PPC_LD(r3, VCPU_SHARED_SPRG3, r5)
+   mtspr   SPRN_SPRG3, r3
PPC_LD(r3, VCPU_SHARED_SPRG4, r5)
mtspr   SPRN_SPRG4W, r3
PPC_LD(r3, VCPU_SHARED_SPRG5, r5)
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3] Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8

2014-07-16 Thread Stewart Smith

The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying has way to store and load contents of L2 or
L2+MRU way of L3 cache. We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.

The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating one.

To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
  32 cycles)

The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.

The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves
to be useful.

Benchmark 1:
 - 16 core POWER8
 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
 - No split core/SMT
 - two guests running sysbench memory test.
   sysbench --test=memory --num-threads=8 run
 - one guest running apache bench (of default HTML page)
   ab -n 49 -c 400 http://localhost/

This benchmark aims to measure performance of real world application (apache)
where other guests are cache hot with their own workloads. The sysbench memory
benchmark does pointer sized writes to a (small) memory buffer in a loop.

In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times were largely unchanged.

benchmark 2:
 - Same VM config as benchmark 1
 - all three guests running sysbench memory benchmark

This benchmark aims to see if there is a positive or negative affect to this
cache heavy benchmark. Although due to the nature of the benchmark (stores) we
may not see a difference in performance, but rather hopefully an improvement
in consistency of performance (when vcore switched in, don't have to wait
many times for cachelines to be pulled in)

The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.

benchmark 3:
 - same 3 guests and CPU configuration as benchmark 1 and 2.
 - two idle guests
 - 1 guest running STREAM benchmark

This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and triad, it was around 10% (or more).

benchmark 4:
 - same 3 guests as previous benchmarks
 - two guests running sysbench --memory, distinctly different cache heavy
   workload
 - one guest running STREAM benchmark.

Similar improvements to benchmark 3.

benchmark 5:
 - 1 guest, 8 VCPUs, Ubuntu 14.04
 - Host configured with split core (SMT8, subcores-per-core=4)
 - STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks
by Prerna Saxena pre...@linux.vnet.ibm.com

Signed-off-by: Stewart Smith stew...@linux.vnet.ibm.com

--
changes since v2:
- based on feedback from Alexander Graf:
  - move save and restore of cache to separate functions
  - move allocation of mpp_buffer to vcore creation
  - get_free_pages() does actually allocate pages aligned to order
(Mel Gorman confirms)
  - make SPR and logmpp parameters a bit less magic, especially around abort

changes since v1:
- s/mppe/mpp_buffer/
- add MPP_BUFFER_ORDER define.
---
 arch/powerpc/include/asm/kvm_host.h   |2 +
 arch/powerpc/include/asm/ppc-opcode.h |   17 +++
 arch/powerpc/include/asm/reg.h|1 +
 arch/powerpc/kvm/book3s_hv.c  |   89 +
 4 files changed, 98 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1eaea2d..5769497 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -305,6 +305,8 @@ struct kvmppc_vcore {
u32 arch_compat;
ulong pcr;
ulong dpdes;/* doorbell state (POWER8) */
+   unsigned long mpp_buffer; /* Micro Partition Prefetch buffer */
+   bool mpp_buffer_is_valid;
 };
 
 #define VCORE_ENTRY_COUNT(vc)

81 matches

Mail list logo